Server move notice, Unicode

2018-07-18 Thread Rick McGowan via Unicode

Hello everyone,

On Wednesday evening (US time) July 18 the www.unicode.org server will 
again be undergoing migration. Downtime / off-line period is expected to 
be a few hours at most, beginning shortly after 17:00 Pacific time.


We apologize for the inconvenience.

Rick



Re: UAX #9: applicability of higher-level protocols to bidi plaintext

2018-07-18 Thread Richard Wordingham via Unicode
On Wed, 18 Jul 2018 13:43:36 + (UTC)
philip chastney via Unicode wrote:

> 
> On Tue, 17/7/18, Richard Wordingham via Unicode 
> wrote:
> 
> > Subject: Re: UAX #9: applicability of higher-level protocols to
> > bidi plaintext To: unicode@unicode.org
> > Date: Tuesday, 17 July, 2018, 3:30 AM  
> 
> > An interesting ambiguity is "!True" v. "True!".  
> > "!True" can be read as "Not true".  
>  
> true - there are contexts where "!True" can be read as "Not true".

The context I had in mind was terse exchanges between those who have
recently programmed in C. Thus, 'true' would be read as 'true' rather
than as '1', and '!true' as 'not true'.

A longer context would usually eliminate the ambiguity.

Richard.



Re: UAX #9: applicability of higher-level protocols to bidi plaintext

2018-07-18 Thread Asmus Freytag via Unicode

  
  
On 7/18/2018 6:43 AM, philip chastney
  via Unicode wrote:


> except that I remember a conference where one of the participants noted that
> fully one-third of the time allocated to each presentation was taken up
> explaining the presenter's notation

:)

  



Re: UAX #9: applicability of higher-level protocols to bidi plaintext

2018-07-18 Thread Ken Whistler via Unicode



On 7/18/2018 6:43 AM, philip chastney via Unicode wrote:

> there are also contexts where "Hello World!" can be read as
> the function "Hello", applied to the factorial value of "World"
>
> even though such a move wouldn't necessarily remove all ambiguity,
> the easiest solution is to declare that formal notations cannot be "plain" text



Of course they can -- and (usually) should be, as they are designed that 
way. To state otherwise would just create headaches for designing 
parsers for formal notations.


I think you are confusing ambiguity of *interpretation* of bits of 
formal notation, taken out of context, with ambiguity of *display* of 
formal notations in contexts where one does not know and control the 
paragraph directionality.


The easiest (and correct) solution, when displaying formal notation for 
visual interpretation by human readers, is to use tools where one knows 
and can rely on the paragraph directionality explicitly, so that Unicode 
bidi doesn't add an out-of-left-field set of display conundrums, as it 
were, for bidi edge cases that can result in *mis*interpretation by the 
reader.


In other words, if I am trying to read C program text or regular 
expressions, I expect that my tooling is not going to silently assume an 
RTL paragraph directional context and present me with visual garbage to 
interpret, forcing me to reverse engineer the bidi algorithm in my head, 
just to read the text. Why would I put up with that?


--Ken



Re: UAX #9: applicability of higher-level protocols to bidi plaintext

2018-07-18 Thread philip chastney via Unicode



On Tue, 17/7/18, Richard Wordingham via Unicode wrote:

> Subject: Re: UAX #9: applicability of higher-level protocols to bidi plaintext
> To: unicode@unicode.org
> Date: Tuesday, 17 July, 2018, 3:30 AM

> An interesting ambiguity is "!True" v. "True!".  
> "!True" can be read as "Not true".
 
true - there are contexts where "!True" can be read as "Not true".

it's unclear from the short sample given whether "True" is a variable name,
or a Boolean constant, but there are other contexts where  "True!" 
can be read as "the factorial value of True" 

and yet others where "!True" can be similarly interpreted

there are also contexts where "Hello World!" can be read as 
the function "Hello", applied to the factorial value of "World"

even though such a move wouldn't necessarily remove all ambiguity,
the easiest solution is to declare that formal notations cannot be "plain" text

use a higher-level protocol to identify what formal notation is being used, 
perhaps,
except that I remember a conference where one of the participants noted that 
fully one-third of the time allocated to each presentation was taken up 
explaining the presenter's notation

/phil





Re: UAX #9: applicability of higher-level protocols to bidi plaintext

2018-07-18 Thread Asmus Freytag via Unicode

  
  
On 7/18/2018 1:51 AM, Shai Berger via
  Unicode wrote:


> The trade-off you seem to prefer is to make the "plain text
> is universally readable" idea from the core Unicode definition, not
> applicable to BiDi text.

Your idea would simply outlaw being able to view text with a
reader-defined stylesheet imposed on it. Such a stylesheet should be
perfectly able to impose a paragraph direction.

Just as you might make sure that your application gives you a choice of
using a stylesheet that "imposes" default paragraph direction.

Just because your choice makes the most sense to you in the scenarios
that you imagine, doesn't mean it should somehow be the only choice.

There are cases where such flexibility is much less motivated and, if
allowed, potentially more harmful - therefore, you will find little in
the way of optional (higher-level protocol) stuff for the basic encoding
or even normalization.

A./

  



Re: UAX #9: applicability of higher-level protocols to bidi plaintext

2018-07-18 Thread Asmus Freytag via Unicode

  
  
On 7/18/2018 1:51 AM, Shai Berger via
  Unicode wrote:


> My claim is that in the absence of an agreed
> or conveyed higher-level protocol, this default must be respected.

Not how higher-level protocols work in Unicode.

If you say that you support the default, then you better support the
default but if you say otherwise, you can single-handedly declare a
higher level protocol (or effectively declare one) and then be
conformant if your HLP modifies otherwise modifiable (tailorable)
behavior.

There's no concept here of "sender and receiver" or means of expressing
binding agreements between them.

Not just for UBA, but all similar cases in Unicode.

Now, you can create a protocol that makes such guarantees, like W3C
does for HTML/CSS where some style values do mean "follow the default"
for a given paragraph and you would not be conformant to either Unicode
or the W3C specs if you didn't do that.
  
A./
  
  



Re: UAX #9: applicability of higher-level protocols to bidi plaintext

2018-07-18 Thread Shai Berger via Unicode
On Mon, 16 Jul 2018 17:40:50 -0700
Ken Whistler via Unicode wrote:
> 
> So your complaint seems to boil down to the claim that if you
> transmit "Hello, world!" to a process which then renders it 
> conformantly according to the Unicode Standard (including
> UBA), then that process must somehow know *and honor* 
> your intent that it display in a LTR directional context. That
> information, however, is explicitly *not* contained in
> the plain text string there, and has to be conveyed by means of a 
> higher-level protocol.
> (E.g. HTML markup as dir="ltr", etc.)
> 
I believe this is an inaccurate description, but indeed the
discrepancy is at the root of the issue here.

The UBA defines a default algorithm for determining the directionality
of plain text paragraphs. My claim is that in the absence of an agreed
or conveyed higher-level protocol, this default must be respected.
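
To make the default concrete, here is a minimal Python sketch of the first-strong heuristic that UBA rules P2 and P3 use to pick a paragraph's base direction. The function names and the `override` parameter are illustrative assumptions, not part of any real bidi library; `override` merely stands in for whatever a higher-level protocol might supply. The sketch assumes its input is a single paragraph and implements nothing beyond the paragraph-level decision.

```python
import unicodedata

# Bidi classes that open an isolate; P2 skips everything up to the matching PDI.
ISOLATE_INITIATORS = {"LRI", "RLI", "FSI"}


def first_strong_level(text):
    """Return 0 (LTR) or 1 (RTL) for a single paragraph, following P2 and P3."""
    depth = 0  # nesting depth of isolates currently being skipped
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ISOLATE_INITIATORS:
            depth += 1
        elif bidi_class == "PDI":
            if depth > 0:
                depth -= 1
        elif depth == 0 and bidi_class in ("L", "R", "AL"):
            # P3: a first strong character of type R or AL gives level 1.
            return 1 if bidi_class in ("R", "AL") else 0
    # No strong character found outside isolates: the default level is 0.
    return 0


def paragraph_level(text, override=None):
    """Hypothetical wrapper: a higher-level protocol may pass an explicit level."""
    if override is not None:
        return override
    return first_strong_level(text)
```

Under these assumptions, `paragraph_level("Hello, world!")` resolves to 0 and a paragraph that opens with a Hebrew letter resolves to 1, while `paragraph_level("Hello, world!", override=1)` models a receiving process that assumes an RTL context for all text.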

> If the receiving process, by whatever means, has raised its hand and 
> says, effectively, "I assume a RTL context for all text display",
> that is its right. You can't complain if it displays your "Hello,
> world!" as shown above. Well, you *can* complain, but you wouldn't be
> correct. Basically, you and the receiving process do not share the
> same assumptions about the higher-level protocol involved which
> specifies paragraph direction.
> 

This, essentially, boils down to a claim that the default is not really
a default, but itself must be the subject of agreement between sides.
My view is that expressed by FAQ #bidi7 -- a higher-level protocol is
an agreement. It can be explicit (e.g. HTML) or implicit (e.g. the
convention that log files are to be read LTR), but it cannot be
applied in a void, or else interoperability is lost.

> OR, you are just unhappy about the bidirectional
> rendering conundrums
> of some edge cases for the UBA.

I wish they were -- while the "Hello, World!" example is a bit of a
contrivance, the "SESU RETHO DNA email ROF plaintext REFERP I"
example is quite central to the UBA, and represents an extremely common
case; Hebrew paragraphs with embedded English words make up at least
a few percent of all paragraphs written in Hebrew about technology, for
example.

On Mon, 16 Jul 2018 21:51:32 -0700
Asmus Freytag via Unicode wrote:

> [The Unicode Standard's] conformance clause is written to allow
> implementations to solve real-world issues without becoming formally
> non-conformant.

I accept that this was the intention; I claim that, as things are
currently written, they cause more real-world issues than they solve.

The only example given here of a real-world issue served by abolishing
the UBA defaults is performance degradation on some special files --
which are just as easy to treat specially, as Eli described in the case
of Emacs and logs. One other consideration raised boils down to, "it's
better to make some texts completely unreadable, than to present some
other texts readably, but with the wrong alignment".

The trade-off you seem to prefer is to make the "plain text
is universally readable" idea from the core Unicode definition, not
applicable to BiDi text.

Why?

Thanks,
Shai


Re: Variation Sequences (and L2-11/059)

2018-07-18 Thread Asmus Freytag (c) via Unicode

On 7/17/2018 8:56 PM, Janusz S. "Bień" wrote:

On Tue, Jul 17 2018 at  8:34 -0700, Asmus Freytag writes:

On 7/16/2018 10:04 PM, Janusz S. Bień via Unicode wrote:

  I understand there is no sufficient demand for the Unicode Consortium
maintaining a supplementary non-ideographic variation database. Hence
for the time being  a kind of Private Use variation database seems to be
the only solution - am I right?

The question comes down to resources, among other things. As well as to whether
there are actual users / implementers waiting for and ready to adopt such a
database as a solution to their problems.

I hope the resources are sufficient to improve wording of the variation
sequence FAQ. Do we agree that at present users/implementers are rather
misled by it?


Sure, we can go either of two ways: we can state that Unicode has no, 
and will not have any, solution to the issue of such variants for 
non-ideographic scripts. That part is easy.


Or, alternatively, we could figure out what the solution space might be 
(in the right circumstances), including some external resources for 
maintaining a database on an ongoing basis, and a larger well-identified 
community of scholars or archivists that sign up to use and support it.


If a non-zero solution space exists, simply saying that there will never 
be any solution would be just as wrong as the current wording, which 
points at something that is no longer part of the solution space . . . 
(although at one point, people thought it might be).



A strawman proposal could identify these issues and some ways that they might be
addressed and then ask for criteria of what the UTC might deem sufficient.

Perhaps this statement should be put into FAQ, instead of "you should
propose your addition as a variation sequence"?


There are some additions that should be proposed for standardization, 
but the bar is relatively high.



A./