Server move notice, Unicode
Hello everyone, On Wednesday evening (US time) July 18 the www.unicode.org server will again be undergoing migration. Downtime / off-line period is expected to be a few hours at most, beginning shortly after 17:00 Pacific time. We apologize for the inconvenience. Rick
Re: UAX #9: applicability of higher-level protocols to bidi plaintext
On Wed, 18 Jul 2018 13:43:36 + (UTC) philip chastney via Unicode wrote:

> On Tue, 17/7/18, Richard Wordingham via Unicode wrote:
>
> > Subject: Re: UAX #9: applicability of higher-level protocols to bidi plaintext
> > To: unicode@unicode.org
> > Date: Tuesday, 17 July, 2018, 3:30 AM
>
> > An interesting ambiguity is "!True" v. "True!".
> > "!True" can be read as "Not true".
>
> true - there are contexts where "!True" can be read as "Not true".

The context I had in mind was terse exchanges between those who have recently programmed in C. Thus, 'true' would be read as 'true' rather than as '1', and '!true' as 'not true'. A longer context would usually eliminate the ambiguity.

Richard.
Re: UAX #9: applicability of higher-level protocols to bidi plaintext
On 7/18/2018 6:43 AM, philip chastney via Unicode wrote:

> except that I remember a conference where one of the participants noted that fully one-third of the time allocated to each presentation was taken up explaining the presenter's notation

:)
Re: UAX #9: applicability of higher-level protocols to bidi plaintext
On 7/18/2018 6:43 AM, philip chastney via Unicode wrote:

> there are also contexts where "Hello World!" can be read as the function "Hello", applied to the factorial value of "World"
>
> even though such a move wouldn't necessarily remove all ambiguity, the easiest solution is to declare that formal notations cannot be "plain" text

Of course they can -- and (usually) should be, as they are designed that way. To state otherwise would just create headaches for designing parsers for formal notations.

I think you are confusing ambiguity of *interpretation* of bits of formal notation, taken out of context, with ambiguity of *display* of formal notations in contexts where one does not know and control the paragraph directionality.

The easiest (and correct) solution, when displaying formal notation for visual interpretation by human readers, is to use tools where one knows and can rely on the paragraph directionality explicitly, so that Unicode bidi doesn't add an out-of-left-field set of display conundrums, as it were, for bidi edge cases that can result in *mis*interpretation by the reader.

In other words, if I am trying to read C program text or regex expressions, I expect that my tooling is not going to silently assume a RTL paragraph directional context and present me with visual garbage to interpret, forcing me to reverse engineer the bidi algorithm in my head, just to read the text. Why would I put up with that?

--Ken
Re: UAX #9: applicability of higher-level protocols to bidi plaintext
On Tue, 17/7/18, Richard Wordingham via Unicode wrote:

> Subject: Re: UAX #9: applicability of higher-level protocols to bidi plaintext
> To: unicode@unicode.org
> Date: Tuesday, 17 July, 2018, 3:30 AM

> An interesting ambiguity is "!True" v. "True!".
> "!True" can be read as "Not true".

true - there are contexts where "!True" can be read as "Not true".

it's unclear from the short sample given whether "True" is a variable name, or a Boolean constant, but there are other contexts where "True!" can be read as "the factorial value of True" and yet others where "!True" can be similarly interpreted

there are also contexts where "Hello World!" can be read as the function "Hello", applied to the factorial value of "World"

even though such a move wouldn't necessarily remove all ambiguity, the easiest solution is to declare that formal notations cannot be "plain" text

use a higher-level protocol to identify what formal notation is being used, perhaps, except that I remember a conference where one of the participants noted that fully one-third of the time allocated to each presentation was taken up explaining the presenter's notation

/phil
Re: UAX #9: applicability of higher-level protocols to bidi plaintext
On 7/18/2018 1:51 AM, Shai Berger via Unicode wrote:

> The trade-off you seem to prefer is to make the "plain text is universally readable" idea from the core Unicode definition, not applicable to BiDi text.

Your idea would simply outlaw being able to view text with a reader-defined stylesheet imposed on it. Such a stylesheet should be perfectly able to impose a paragraph direction. Just as you might make sure that your application gives you a choice of using a stylesheet that "imposes" default paragraph direction.

Just because your choice makes the most sense to you in the scenarios that you imagine, doesn't mean it should somehow be the only choice.

There are cases where such flexibility is much less motivated and, if allowed, potentially more harmful - therefore, you will find little in the way of optional (higher-level protocol) stuff for the basic encoding or even normalization.

A./
Re: UAX #9: applicability of higher-level protocols to bidi plaintext
On 7/18/2018 1:51 AM, Shai Berger via Unicode wrote:

> My claim is that in the absence of an agreed or conveyed higher-level protocol, this default must be respected.

That is not how higher-level protocols work in Unicode.

If you say that you support the default, then you had better support the default; but if you say otherwise, you can single-handedly declare a higher-level protocol (or effectively declare one) and then be conformant if your HLP modifies otherwise modifiable (tailorable) behavior.

There's no concept here of "sender and receiver" or means of expressing binding agreements between them. That holds not just for the UBA, but for all similar cases in Unicode.

Now, you can create a protocol that makes such guarantees, like W3C does for HTML/CSS, where some style values do mean "follow the default" for a given paragraph, and you would not be conformant to either Unicode or the W3C specs if you didn't do that.

A./
Re: UAX #9: applicability of higher-level protocols to bidi plaintext
On Mon, 16 Jul 2018 17:40:50 -0700 Ken Whistler via Unicode wrote:

> So your complaint seems to boil down to the claim that if you
> transmit "Hello, world!" to a process which then renders it
> conformantly according to the Unicode Standard (including
> UBA), then that process must somehow know *and honor*
> your intent that it display in a LTR directional context. That
> information, however, is explicitly *not* contained in
> the plain text string there, and has to be conveyed by means of a
> higher-level protocol.
> (E.g. HTML markup as dir="ltr", etc.)

I believe this is an inaccurate description, but indeed the discrepancy is at the root of the issue here. The UBA defines a default algorithm for determining the directionality of plain text paragraphs. My claim is that in the absence of an agreed or conveyed higher-level protocol, this default must be respected.

> If the receiving process, by whatever means, has raised its hand and
> says, effectively, "I assume a RTL context for all text display",
> that is its right. You can't complain if it displays your "Hello,
> world!" as shown above. Well, you *can* complain, but you wouldn't be
> correct. Basically, you and the receiving process do not share the
> same assumptions about the higher-level protocol involved which
> specifies paragraph direction.

This, essentially, boils down to a claim that the default is not really a default, but itself must be the subject of agreement between sides. My view is that expressed by FAQ #bidi7 -- a higher-level protocol is an agreement. It can be explicit (e.g. HTML) or implicit (e.g. the convention that log files are to be read LTR), but it cannot be applied in a void, or else interoperability is lost.

> OR, you are just unhappy about the bidirectional rendering conundrums
> of some edge cases for the UBA.

I wish they were -- while the "Hello, World!" example is a bit of a contrivance, the "SESU RETHO DNA email ROF plaintext REFERP I" example is quite central to the UBA, and represents an extremely common case: Hebrew paragraphs with embedded English words account for at least whole percentage points of all paragraphs written in Hebrew about technology, for example.

On Mon, 16 Jul 2018 21:51:32 -0700 Asmus Freytag via Unicode wrote:

> [The Unicode Standard's] conformance clause is written to allow
> implementations to solve real-world issues without becoming formally
> non-conformant.

I accept that this was the intention; I claim that, as things are currently written, they cause more real-world issues than they solve. The only example given here of a real-world issue served by abolishing the UBA defaults is performance degradation on some special files -- which are just as easy to treat specially, as Eli described in the case of Emacs and logs.

One other consideration raised boils down to, "it's better to make some texts completely unreadable, than to present some other texts readably, but with the wrong alignment". The trade-off you seem to prefer is to make the "plain text is universally readable" idea from the core Unicode definition, not applicable to BiDi text. Why?

Thanks,
Shai
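For readers following the thread, the default paragraph-direction rule being argued over (rules P2 and P3 of UAX #9: take the first strong character, skipping isolated runs, and fall back to LTR) can be sketched in a few lines of Python. This is an illustrative sketch only, not code from any poster or from a real renderer. The function name `default_paragraph_direction` is hypothetical, and the result depends on the bidi classes reported by the `unicodedata` module of the Python version in use.

```python
import unicodedata

# Bidi classes treated as "strong" by rules P2/P3 of UAX #9.
STRONG_LTR = {"L"}
STRONG_RTL = {"R", "AL"}
ISOLATE_INITIATORS = {"LRI", "RLI", "FSI"}


def default_paragraph_direction(text: str) -> str:
    """Return "ltr" or "rtl" for a plain-text paragraph.

    Hypothetical helper: finds the first strong character, ignoring
    anything between an isolate initiator and its matching PDI, and
    defaults to "ltr" when no strong character is found.
    """
    isolate_depth = 0
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ISOLATE_INITIATORS:
            isolate_depth += 1
        elif bidi_class == "PDI":
            if isolate_depth > 0:
                isolate_depth -= 1
        elif isolate_depth == 0:
            if bidi_class in STRONG_LTR:
                return "ltr"
            if bidi_class in STRONG_RTL:
                return "rtl"
    # No strong character outside isolates: default embedding level is 0.
    return "ltr"
```

Under this rule "Hello, world!" resolves to LTR and a paragraph that opens with a Hebrew letter resolves to RTL. A higher-level protocol of the kind debated in this thread would supply the direction itself instead of calling such a function.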
Re: Variation Sequences (and L2-11/059)
On 7/17/2018 8:56 PM, Janusz S. "Bień" wrote: On Tue, Jul 17 2018 at 8:34 -0700, Asmus Freytag writes: On 7/16/2018 10:04 PM, Janusz S. Bień via Unicode wrote: I understand there is no sufficient demand for the Unicode Consortium maintaining a supplementary non-ideographic variation database. Hence for the time being a kind of Private Use variation database seems to be the only solution - am I right? The question comes down to resources, among other things. As well as to whether there are actual users / implementers waiting for and ready to adopt such a database as solution to their problems. I hope the resources are sufficient to improve wording of the variation sequence FAQ. Do we agree that at present users/implementers are rather misled by it? Sure, we can go either of two ways: we can state that Unicode has no, and will not have any, solution to the issue of such variants for non-ideographic scripts. That part is easy. Or, alternatively we could figure out, what the solution space might be (in the right circumstances), including some external resources for maintaining a database on an ongoing basis, and a larger well-identified community of scholars or archivists that sign up to use and support it. If a non-zero solution space exists, simply saying that there will never be any solution would be equally wrong as the current wording which points at something that is no longer part of the solution space . . . (although at one point, people thought it might be). A strawman proposal could identify these issues and some ways that they might be addressed and then ask for criteria of what the UTC might deem sufficient. Perhaps this statement should be put into FAQ, instead of "you should propose your addition as a variation sequence"? There are some additions that should be proposed for standardization, but the bar is relatively high. A./