Re: UAX #9: applicability of higher-level protocols to bidi plaintext

2018-07-20 Thread Shai Berger via Unicode
Hi Ken (and all),

Thanks for your time and patience with this.

On Thu, 19 Jul 2018 18:10:49 -0700
Ken Whistler via Unicode  wrote:

> On 7/19/2018 12:38 AM, Shai Berger via Unicode wrote:
> > If I cannot trust that
> > people I communicate with make the same choices I make, plain text
> > cannot be used.  
> 
> Here is a counterexample [a table rendered in plain text, which is
> only truly legible using a fixed-width font].
> 
> It isn't that "plain text cannot be used" to convey this content. The 
> content is certainly "legible" in the minimal sense required by the 
> Unicode Standard, and it is interchangeable without data corruption.
> The problem is that for optimal display and interpretation as
> intended, I also need to convey (and/or have the reader guess) the
> higher-level protocol requirement that this particular plain text
> needs to be displayed with a monowidth font.
> 

If I understand correctly, you are rejecting my claim that
directionality is an issue of content, and claiming that, just like
the crumbling-down of your table, it is an issue of display. But that
argument is clearly disproved by the mere presence of the
directionality-setting characters (RLM, LRE, etc) in the Unicode
character set; in other words, your example would be convincing if
Unicode included characters like "start table row" and "close table
cell", AND there was an annex saying that your lines (for whatever
reason) are to be treated as table rows unless a higher-level-protocol
said otherwise. I believe this is not the case.

> > If the Unicode standard does not impose a
> > universal default, it does not define interchangeable plain text.  
> 
> And that is simply not the case. If your text is <a, b, c, !>
> (<L, L, L, ON>), that will display as {abc!} in a LTR paragraph
> directional context and as {!abc} in a RTL paragraph directional
> context.


> [...] if plain text doesn't forcefully carry with it and
> require how it must be displayed, well, then it isn't really
> interchangeable.
> 
> But that isn't what the Unicode Standard means by plain text. And
> isn't what it requires for interchangeability of plain text.

If I understood your argument correctly, it amounts to a claim that
Unicode defines plain text as a component in a data format, but not to
be used as a full document. If that is correct, then there is much to
fix -- I think that quite a lot of existing technology assumes the
opposite (e.g. the use of "Content-Type: text/plain; charset=UTF-8" in
MIME should be strongly discouraged, if the people who designed
Unicode and UTF-8 think it is not appropriate for full documents).

If I misunderstood, please correct me.

> >
> > My main point, whose rejection baffles me to no end, is that it
> > should.  
> 
> Well, I'm not expecting that I can make you feel good about the 
> situation. ;-) But perhaps the UTC position will seem a little less 
> baffling.

As I hope I've shown above, there's plenty of reason for bafflement.
The UTC defines code points to encode directionality, but then refuses
to treat directionality as content when it comes to paragraph
directionality; it defines a higher-level-protocol as an agreement, and
then turns around and says the word "agreement" actually means
"decision".

I can guess reasons for why the things are the way they are, but not
justifications. I stay baffled.

Thanks,
Shai.


Re: UAX #9: applicability of higher-level protocols to bidi plaintext

2018-07-19 Thread Ken Whistler via Unicode



On 7/19/2018 12:38 AM, Shai Berger via Unicode wrote:

If I cannot trust that
people I communicate with make the same choices I make, plain text
cannot be used.


Here is a counterexample. The following is a chunk of plain text output 
from the bidi reference implementation:


Trace: Entering br_UBA_IdentifyIsolatingRunSequences [X10]
Current State: 6
  Position:   0    1    2    3    4    5    6    7    8    9 10   
11   12
  Text:    05D0 2067 0061 2066 0061 202B 0061 202C 0061 2069 0061 
2069 0061
  Bidi_Class: R  RLI    L  LRI    L  RLE    L  PDF    L  PDI L  
PDI    L
  Levels: 1    1    3    3    4    x    5    x    4    3 3    
1    1
  Runs:      

  Seqs (L= 1): 


  Seqs (L= 3): 
  Seqs (L= 4):  
  Seqs (L= 5):   

If I just let that default to browser output choices (and assuming you 
read your email with a proportional display font), it becomes almost 
incomprehensible for casual reading, because the output has an 
underlying assumption that there is column alignment across lines, which 
in turn depends on a user choice of a fixed-width font for display. 
Rectifying that, the reader would then see:


Trace: Entering br_UBA_IdentifyIsolatingRunSequences [X10]
Current State: 6
  Position:      0    1    2    3    4    5    6    7    8    9   10   11   12
  Text:       05D0 2067 0061 2066 0061 202B 0061 202C 0061 2069 0061 2069 0061
  Bidi_Class:    R  RLI    L  LRI    L  RLE    L  PDF    L  PDI    L  PDI    L
  Levels:        1    1    3    3    4    x    5    x    4    3    3    1    1
  Runs:      

  Seqs (L= 1): 


  Seqs (L= 3): 
  Seqs (L= 4):    
  Seqs (L= 5): 

where now everything makes sense. (Well, at least if the UBA internals 
are your thing!)
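
The Bidi_Class row of that trace can be reproduced from the Text row with a few lines of Python. This is only a sketch: it assumes a Python whose `unicodedata` tables include the isolate classes (Unicode 6.3 or later), and the helper names `bidi_classes` and `format_row` are made up for illustration; they are not part of the bidi reference implementation.

```python
import unicodedata

# Code points copied from the "Text:" row of the trace above.
TRACE_TEXT = "05D0 2067 0061 2066 0061 202B 0061 202C 0061 2069 0061 2069 0061"


def bidi_classes(hex_code_points):
    """Return the Bidi_Class of each code point given as a hex string."""
    return [unicodedata.bidirectional(chr(int(cp, 16))) for cp in hex_code_points]


def format_row(label, cells, width=4):
    """Right-align each cell so columns line up in a fixed-width font."""
    return f"{label:<13}" + "".join(f" {cell:>{width}}" for cell in cells)


code_points = TRACE_TEXT.split()
classes = bidi_classes(code_points)

print(format_row("  Position:", range(len(code_points))))
print(format_row("  Text:", code_points))
print(format_row("  Bidi_Class:", classes))
```

The column alignment produced by `format_row` survives only if the reader displays the result in a fixed-width font, which is exactly the out-of-band assumption being discussed.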


It isn't that "plain text cannot be used" to convey this content. The 
content is certainly "legible" in the minimal sense required by the 
Unicode Standard, and it is interchangeable without data corruption. The 
problem is that for optimal display and interpretation as intended, I 
also need to convey (and/or have the reader guess) the higher-level 
protocol requirement that this particular plain text needs to be 
displayed with a monowidth font.



If the Unicode standard does not impose a
universal default, it does not define interchangeable plain text.


And that is simply not the case. If your text is <a, b, c, !> (<L, L, L, ON>), 
that will display as {abc!} in a LTR paragraph directional context 
and as {!abc} in a RTL paragraph directional context. Reliably. It isn't 
that we don't have interchangeable plain text. We do. What you cannot do 
is predict exactly how that text will *display*, if you haven't agreed 
with your interlocutor about paragraph direction. But substantively, 
that is no different than the proportional versus monowidth font example 
I just gave.
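
The {abc!} versus {!abc} behaviour can be checked with a deliberately reduced sketch of the UBA. It assumes the text contains only strong characters (L, R, AL) and neutral-like characters (ON, WS, CS, ES, ET), with no digits, no explicit directional controls, no bracket pairing and no mirroring. It covers only neutral resolution (N1/N2), implicit levels (I1/I2) and reordering (L2). The function name `display_order` is hypothetical, and this is not a conformant UBA implementation.

```python
import unicodedata

# Classes this sketch treats as neutral. With no digits in the text,
# the separators and terminators (CS, ES, ET) end up behaving like ON.
NEUTRAL_CLASSES = {"ON", "WS", "CS", "ES", "ET"}


def display_order(text, paragraph_rtl):
    """Return text in visual (left-to-right on screen) order."""
    base_dir = "R" if paragraph_rtl else "L"
    dirs = []
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            dirs.append("L")
        elif cls in ("R", "AL"):
            dirs.append("R")
        elif cls in NEUTRAL_CLASSES:
            dirs.append(None)
        else:
            raise ValueError(f"unsupported Bidi_Class {cls!r} in this sketch")

    # N1/N2: a run of neutrals takes the direction of its neighbours if
    # they agree (paragraph direction stands in at both ends), otherwise
    # it takes the paragraph direction.
    n = len(dirs)
    resolved = list(dirs)
    i = 0
    while i < n:
        if dirs[i] is not None:
            i += 1
            continue
        j = i
        while j < n and dirs[j] is None:
            j += 1
        before = dirs[i - 1] if i > 0 else base_dir
        after = dirs[j] if j < n else base_dir
        fill = before if before == after else base_dir
        resolved[i:j] = [fill] * (j - i)
        i = j

    # I1/I2: L goes up to the next even level, R to the next odd level.
    if paragraph_rtl:
        levels = [2 if d == "L" else 1 for d in resolved]
    else:
        levels = [0 if d == "L" else 1 for d in resolved]

    # L2: from the highest level down to the lowest odd level, reverse
    # every maximal run of characters at that level or higher.
    cells = list(zip(levels, text))
    odd = [lvl for lvl in levels if lvl % 2]
    if odd:
        for level in range(max(levels), min(odd) - 1, -1):
            i = 0
            while i < n:
                if cells[i][0] < level:
                    i += 1
                    continue
                j = i
                while j < n and cells[j][0] >= level:
                    j += 1
                cells[i:j] = cells[i:j][::-1]
                i = j
    return "".join(ch for _, ch in cells)


print(display_order("abc!", paragraph_rtl=False))
print(display_order("abc!", paragraph_rtl=True))
```

The stored character sequence is the same in both calls; only the paragraph direction handed to the function differs.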


So I think this still really boils down to the putative requirement that 
for something like "Hello, world!", bidi is just too weird, and that 
somehow plain text shouldn't be allowed to behave that way. In other 
words, if plain text doesn't forcefully carry with it and require how it 
must be displayed, well, then it isn't really interchangeable.


But that isn't what the Unicode Standard means by plain text. And isn't 
what it requires for interchangeability of plain text. (And yes, bidi is 
weird!)




My main point, whose rejection baffles me to no end, is that it should.


Well, I'm not expecting that I can make you feel good about the 
situation. ;-) But perhaps the UTC position will seem a little less 
baffling.


--Ken



Re: UAX #9: applicability of higher-level protocols to bidi plaintext

2018-07-19 Thread Eli Zaretskii via Unicode
> Date: Thu, 19 Jul 2018 10:38:18 +0300
> Cc: Asmus Freytag 
> From: Shai Berger via Unicode 
> 
> And again -- the point is interoperability. If I cannot trust that
> people I communicate with make the same choices I make, plain text
> cannot be used.

This conclusion is too extreme.  In Real Life™, every reasonable
application that supports bidirectional text will have a knob that
allows the user to force a particular paragraph direction on a region
of text.  So if you display some text you received from outside the
application, and the display looks juggled, let alone illegible, you
force the other paragraph direction, and the problem will usually be
solved.  At least IME, and I do have experience not only with Emacs.


Re: UAX #9: applicability of higher-level protocols to bidi plaintext

2018-07-19 Thread Shai Berger via Unicode



On Wed, 18 Jul 2018 02:55:17 -0700 Asmus Freytag  wrote:

> On 7/18/2018 1:51 AM, Shai Berger via Unicode wrote:
> > The trade-off you seem to prefer is to make the "plain text
> > is universally readable" idea from the core Unicode definition, not
> > applicable to BiDi text.
>
> Your idea would simply outlaw being able to view text with a
> reader-defined stylesheet imposed on it. Such a stylesheet should be
> perfectly able to impose a paragraph direction.
>

This argument is essentially circular: My point is exactly
that such a stylesheet should not be able to impose paragraph
directionality, just like it should not be able to impose any other
random word reordering. Again, text directionality is an issue of
content, not presentation.

(I am well aware that the W3C's CSS definition includes directionality
controls -- I'm arguing that they're appropriate for HTML, but not for
plain text)

> Just as you might make sure that your application gives you a choice
> of using a stylesheet that "imposes" default paragraph direction.
> 

And again -- the point is interoperability. If I cannot trust that
people I communicate with make the same choices I make, plain text
cannot be used. If the Unicode standard does not impose a
universal default, it does not define interchangeable plain text.

My main point, whose rejection baffles me to no end, is that it should.

Shai.


Re: UAX #9: applicability of higher-level protocols to bidi plaintext

2018-07-18 Thread Richard Wordingham via Unicode
On Wed, 18 Jul 2018 13:43:36 +0000 (UTC)
philip chastney via Unicode  wrote:

> 
> On Tue, 17/7/18, Richard Wordingham via Unicode 
> wrote:
> 
> > Subject: Re: UAX #9: applicability of higher-level protocols to
> > bidi plaintext To: unicode@unicode.org
> > Date: Tuesday, 17 July, 2018, 3:30 AM  
> 
> > An interesting ambiguity is "!True" v. "True!".  
> > "!True" can be read as "Not true".  
>  
> true - there are contexts where "!True" can be read as "Not true".

The context I had in mind was terse exchanges between those who have
recently programmed in C. Thus, 'true' would be read as 'true' rather
than as '1', and '!true' as 'not true'.

A longer context would usually eliminate the ambiguity.

Richard.



Re: UAX #9: applicability of higher-level protocols to bidi plaintext

2018-07-18 Thread Asmus Freytag via Unicode

  
  
On 7/18/2018 6:43 AM, philip chastney
  via Unicode wrote:


  except that I remember a conference where one of the participants noted that 
fully one-third of the time allocated to each presentation was taken up 
explaining the presenter's notation

:)

  



Re: UAX #9: applicability of higher-level protocols to bidi plaintext

2018-07-18 Thread Ken Whistler via Unicode



On 7/18/2018 6:43 AM, philip chastney via Unicode wrote:

there are also contexts where "Hello World!" can be read as
the function "Hello", applied to the factorial value of "World"

even though such a move wouldn't necessarily remove all ambiguity,
the easiest solution is to declare that formal notations cannot be "plain" text



Of course they can -- and (usually) should be, as they are designed that 
way. To state otherwise would just create headaches for designing 
parsers for formal notations.


I think you are confusing ambiguity of *interpretation* of bits of 
formal notation, taken out of context, with ambiguity of *display* of 
formal notations in contexts where one does not know and control the 
paragraph directionality.


The easiest (and correct) solution, when displaying formal notation for 
visual interpretation by human readers, is to use tools where one knows 
and can rely on the paragraph directionality explicitly, so that Unicode 
bidi doesn't add an out-of-left-field set of display conundrums, as it 
were, for bidi edge cases that can result in *mis*interpretation by the 
reader.


In other words, if I am trying to read C program text or regex 
expressions, I expect that my tooling is not going to silently assume a 
RTL paragraph directional context and present me with visual garbage to 
interpret, forcing me to reverse engineer the bidi algorithm in my head, 
just to read the text. Why would I put up with that?


--Ken



Re: UAX #9: applicability of higher-level protocols to bidi plaintext

2018-07-18 Thread philip chastney via Unicode



On Tue, 17/7/18, Richard Wordingham via Unicode  wrote:

> Subject: Re: UAX #9: applicability of higher-level protocols to bidi plaintext
> To: unicode@unicode.org
> Date: Tuesday, 17 July, 2018, 3:30 AM

> An interesting ambiguity is "!True" v. "True!".  
> "!True" can be read as "Not true".
 
true - there are contexts where "!True" can be read as "Not true".

it's unclear from the short sample given whether "True" is a variable name,
or a Boolean constant, but there are other contexts where  "True!" 
can be read as "the factorial value of True" 

and yet others where "!True" can be similarly interpreted

there are also contexts where "Hello World!" can be read as 
the function "Hello", applied to the factorial value of "World"

even though such a move wouldn't necessarily remove all ambiguity,
the easiest solution is to declare that formal notations cannot be "plain" text

use a higher-level protocol to identify what formal notation is being used, 
perhaps,
except that I remember a conference where one of the participants noted that 
fully one-third of the time allocated to each presentation was taken up 
explaining the presenter's notation

/phil





Re: UAX #9: applicability of higher-level protocols to bidi plaintext

2018-07-18 Thread Asmus Freytag via Unicode

  
  
On 7/18/2018 1:51 AM, Shai Berger via
  Unicode wrote:


  The trade-off you seem to prefer is to make the "plain text
is universally readable" idea from the core Unicode definition, not
applicable to BiDi text.

Your idea would simply outlaw being able to
view text with a reader-defined stylesheet imposed on it. Such a
stylesheet should be perfectly able to impose a paragraph
direction.
Just as you might make sure that your
application gives you a choice of using a stylesheet that
"imposes" default paragraph direction.
Just because your choice makes the most
sense to you in the scenarios that you imagine, doesn't mean it
should somehow be the only choice.
There are cases where such flexibility is
much less motivated and, if allowed, potentially more harmful -
therefore, you will find little in the way of optional (higher-level
protocol) stuff for the basic encoding or even normalization.
A./

  



Re: UAX #9: applicability of higher-level protocols to bidi plaintext

2018-07-18 Thread Asmus Freytag via Unicode

  
  
On 7/18/2018 1:51 AM, Shai Berger via
  Unicode wrote:


  My claim is that in the absence of an agreed
or conveyed higher-level protocol, this default must be respected.

Not how higher-level protocols work in
Unicode.
If you say that you support the default,
then you better support the default but if you say otherwise,
you can single-handedly declare a higher level protocol (or
effectively declare one) and then be conformant if your HLP modifies
otherwise modifiable (tailorable) behavior.
There's no concept here of "sender and receiver"
or means of expressing binding agreements between them.
Not just for UBA, but all similar cases in
Unicode.
Now, you can create a protocol that makes
such guarantees, like W3C does for HTML/CSS where some style
values do mean "follow the default" for  a given paragraph and
you would not be conformant to either Unicode or the W3C specs
if you didn't do that.
  
A./
  
  



Re: UAX #9: applicability of higher-level protocols to bidi plaintext

2018-07-18 Thread Shai Berger via Unicode
On Mon, 16 Jul 2018 17:40:50 -0700
Ken Whistler via Unicode  wrote:
> 
> So your complaint seems to boil down to the claim that if you
> transmit "Hello, world!" to a process which then renders it 
> conformantly according to the Unicode Standard (including
> UBA), then that process must somehow know *and honor* 
> your intent that it display in a LTR directional context. That
> information, however, is explicitly *not* contained in
> the plain text string there, and has to be conveyed by means of a 
> higher-level protocol.
> (E.g. HTML markup as dir="ltr", etc.)
> 
I believe this is an inaccurate description, but indeed the
discrepancy is at the root of the issue here.

The UBA defines a default algorithm for determining the directionality
of plain text paragraphs. My claim is that in the absence of an agreed
or conveyed higher-level protocol, this default must be respected.
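
For reference, the default in question is the "first strong character" heuristic of UAX #9 (rules P2 and P3). A minimal Python sketch follows, assuming a `unicodedata` recent enough to know the isolate classes; `default_paragraph_level` is a hypothetical name, and the sketch handles a single paragraph only.

```python
import unicodedata


def default_paragraph_level(text):
    """Return 0 (LTR) or 1 (RTL) for one paragraph, per the first-strong rule."""
    isolate_depth = 0
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls in ("LRI", "RLI", "FSI"):
            # Characters inside an isolate are skipped when looking
            # for the first strong character.
            isolate_depth += 1
        elif cls == "PDI":
            if isolate_depth > 0:
                isolate_depth -= 1
        elif isolate_depth == 0 and cls in ("L", "R", "AL"):
            return 1 if cls in ("R", "AL") else 0
    # No strong character found: the default is left-to-right.
    return 0


print(default_paragraph_level("Hello, world!"))
print(default_paragraph_level("\u05D0 plus some English"))
```

A higher-level protocol, in the sense argued about here, is anything that replaces the return value of such a function with a direction chosen by other means.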

> If the receiving process, by whatever means, has raised its hand and 
> says, effectively, "I assume a RTL context for all text display",
> that is its right. You  can't complain if it displays your "Hello,
> world!" as shown above. Well, you *can* complain, but you wouldn't be
> correct. Basically, you and the receiving process do not  share the
> same assumptions about the higher-level protocol involved which
> specifies paragraph direction.
> 

This, essentially, boils down to a claim that the default is not really
a default, but itself must be the subject of agreement between sides.
My view is that expressed by FAQ #bidi7 -- a higher-level protocol is
an agreement. It can be explicit (e.g. HTML) or implicit (e.g. the
convention that log files are to be read LTR), but it cannot be
applied in a void, or else interoperability is lost.

> OR, you are just unhappy about the bidirectional
> rendering conundrums
> of some edge cases for the UBA.

I wish they were -- while the "Hello, World!" example is a bit of a
contrivance, the "SESU RETHO DNA email ROF plaintext REFERP I"
example is quite central to the UBA, and represents an extremely common
case; Hebrew paragraphs with embedded English words make up at least
several percent of all paragraphs written in Hebrew about technology, for
example.

On Mon, 16 Jul 2018 21:51:32 -0700
Asmus Freytag via Unicode  wrote:

> [The Unicode Standard's] conformance clause is written to allow
> implementations to solve real-world issues without becoming formally
> non-conformant.

I accept that this was the intention; I claim that, as things are
currently written, they cause more real-world issues than they solve.

The only example given here of a real-world issue served by abolishing
the UBA defaults is performance degradation on some special files --
which are just as easy to treat specially, as Eli described in the case
of Emacs and logs. One other consideration raised boils down to, "it's
better to make some texts completely unreadable, than to present some
other texts readably, but with the wrong alignment".

The trade-off you seem to prefer is to make the "plain text
is universally readable" idea from the core Unicode definition, not
applicable to BiDi text.

Why?

Thanks,
Shai


Re: UAX #9: applicability of higher-level protocols to bidi plaintext

2018-07-16 Thread Asmus Freytag via Unicode

  
  
On 7/16/2018 8:30 PM, Richard
  Wordingham via Unicode wrote:


  On Mon, 16 Jul 2018 10:53:03 +0300
Shai Berger via Unicode  wrote:


  
What I'm not OK with is:

!Hello, World

Which is what you'll see if your editor decides to use RTL
directionality for this file, as the FAQ says it may.

  
  
Using 'left aligned' for RTL and 'right aligned' for LTR are 'marked'
styles; they are not appropriate for uninterpreted plain text.  Thus if
text is to be displayed as left aligned, LTR defaults are appropriate.
With RTL default and right alignment, what looks like

 !Hello, World

is much more acceptable for "Hello, World!".

An interesting ambiguity is "!True" v. "True!".  "!True" can be read as
"Not true".

The solution may be to encourage the determination of the (default)
paragraph direction from the first paragraph for implementations with
only one margin.  I am not sure if this behaviour is 'standard
compliant'.




The Unicode Standard uses the term "conformant".

Its conformance clause is written to allow implementations to solve
real-world issues without becoming formally non-conformant.

Given that the Unicode Standard is intended to be applicable to all
applications and all texts, a certain latitude is not only expected,
it is essential.

In this case, the rules are clear: implementations may override the
paragraph direction, and there is no constraint as to how they arrive
at their choice.

Ideally, their choice is documented, and users who want something
different would have the choice of a setting (or an alternate
implementation).

Likewise, plain text is generally not sufficient for all real and
imagined contents. At some point you will run into special needs that
require some amount of "styling" information to be sure the receiver
can interpret it unambiguously.

(A mild form of that is the common device of using italics in marking
the stress for some ambiguous sentences in English. A historic example
would have been the alternation between Fraktur font and Roman font
for German texts containing foreign words - there are examples where
you will lose some content if you cannot mark that distinction).

I really like the way Ken put it: essentially, if you (as an author)
want to have control over how the reader sees your text, then you need
to agree on a higher-level protocol. Normally, that would mean styled
text, in practice, but it could also be an agreement on what text
editor to use (perhaps one with two margins ..)

A./
  



Re: UAX #9: applicability of higher-level protocols to bidi plaintext

2018-07-16 Thread Richard Wordingham via Unicode
On Mon, 16 Jul 2018 10:53:03 +0300
Shai Berger via Unicode  wrote:

> What I'm not OK with is:
> 
> !Hello, World
> 
> Which is what you'll see if your editor decides to use RTL
> directionality for this file, as the FAQ says it may.

Using 'left aligned' for RTL and 'right aligned' for LTR are 'marked'
styles; they are not appropriate for uninterpreted plain text.  Thus if
text is to be displayed as left aligned, LTR defaults are appropriate.
With RTL default and right alignment, what looks like

 !Hello, World

is much more acceptable for "Hello, World!".

An interesting ambiguity is "!True" v. "True!".  "!True" can be read as
"Not true".

The solution may be to encourage the determination of the (default)
paragraph direction from the first paragraph for implementations with
only one margin.  I am not sure if this behaviour is 'standard
compliant'.

Richard.


Re: UAX #9: applicability of higher-level protocols to bidi plaintext

2018-07-16 Thread Ken Whistler via Unicode



On 7/16/2018 3:51 PM, Shai Berger via Unicode wrote:

And I should add, in response to the other points raised in this
thread, from the same page in the core standard: "If the same plain text
sequence is given to disparate rendering processes, there is no
expectation that rendered text in each instance should have the same
appearance. Instead, the disparate rendering processes are simply
required to make the text legible according to the intended reading."
That paragraph ends with the following summary, emphasized in the
source:

Plain text must contain enough information to permit the text
to be rendered legibly, and nothing more.

The last answer in http://www.unicode.org/faq/bidi.html violates this
dictum, as I have showed here with different examples. As long as it
stands, the Unicode standard fails its own criteria.


I've been trying to follow your reasoning in this long thread, but am
still not finding much to convince me that there is anything wrong in
the #bidi8 FAQ entry that you keep claiming is wrong.

First, for your "Hello, world!" example, in a rendering that imposes a
RTL directional context, the correct, conformant display of that string
is:

!Hello, world

as you cited in your earlier example. To do otherwise would represent a
*non*-conformant implementation of the UBA.

So your complaint seems to boil down to the claim that if you transmit
"Hello, world!" to a process which then renders it conformantly
according to the Unicode Standard (including UBA), then that process
must somehow know *and honor* your intent that it display in a LTR
directional context. That information, however, is explicitly *not*
contained in the plain text string there, and has to be conveyed by
means of a higher-level protocol.

(E.g. HTML markup as dir="ltr", etc.)
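
As one concrete illustration of conveying the direction through markup, the sketch below wraps a plain text paragraph in an HTML element carrying a `dir` attribute. The helper name `to_html_paragraph` is hypothetical; the `dir` values `ltr`, `rtl` and `auto` are the ones defined by HTML.

```python
from html import escape

ALLOWED_DIRECTIONS = ("ltr", "rtl", "auto")


def to_html_paragraph(text, direction="auto"):
    """Wrap plain text in a <p> element that states its paragraph direction."""
    if direction not in ALLOWED_DIRECTIONS:
        raise ValueError(f"direction must be one of {ALLOWED_DIRECTIONS}")
    # escape() protects characters such as "<" and "&" in the plain text.
    return f'<p dir="{direction}">{escape(text)}</p>'


print(to_html_paragraph("Hello, world!", "ltr"))
```

With `dir="ltr"` the receiving renderer no longer has to guess: the paragraph direction travels with the text, but in the markup rather than in the plain text string itself.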

If the receiving process, by whatever means, has raised its hand and
says, effectively, "I assume a RTL context for all text display", that
is its right. You can't complain if it displays your "Hello, world!" as
shown above. Well, you *can* complain, but you wouldn't be correct.
Basically, you and the receiving process do not share the same
assumptions about the higher-level protocol involved which specifies
paragraph direction.

So as I see it, you are either wanting the plain text to somehow contain
and enforce upon the renderer your assumption about the directional
context that it should be displayed in, OR, you are just unhappy about
the bidirectional rendering conundrums of some edge cases for the UBA.
In either case, the remedy is the application of LTR characters to
provide context (or directional isolate controls, or explicit
higher-level markup).
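
A small sketch of the in-band remedies, assuming LEFT-TO-RIGHT MARK (U+200E) is used as the strong LTR character that provides context and LRI/PDI as the isolate controls. The helper names `anchor_ltr` and `isolate_ltr` are hypothetical.

```python
import unicodedata

LRM = "\u200E"  # LEFT-TO-RIGHT MARK: invisible, strong L
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def anchor_ltr(text):
    """Surround text with LRM so edge neutrals sit between strong L characters."""
    return LRM + text + LRM


def isolate_ltr(text):
    """Wrap text in an LTR isolate, independent of the surrounding direction."""
    return LRI + text + PDI


for ch in anchor_ltr("Hello, world!")[-2:]:
    print(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch))
```

After `anchor_ltr`, the final "!" is followed by a strong L character, so a renderer that assumes a RTL paragraph still resolves it as part of the LTR run instead of moving it to the far side. The cost is that the control characters become part of the stored text.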

--Ken



Re: UAX #9: applicability of higher-level protocols to bidi plaintext

2018-07-16 Thread Shai Berger via Unicode
Hi Eli and all,

On Sat, 14 Jul 2018 14:07:50 +0300
Eli Zaretskii via Unicode  wrote:

> From: Shai Berger 
> > 
> > I have no argument with this, but I do think that in such cases it
> > is wrong for the app to pretend that it is still treating the text
> > as plain.  
> 
> What is "plain text" in this context? 
> 

Plain text here is the thing described in subsection "Plain Text" in the
core unicode standard, Chapter 2 Section 2 "General Structure: Unicode
Design Principles". In terms of composition, it is "a pure sequence of
character codes"; in terms of function, it is "public, standardized,
and universally readable".

> Does, for example, text with bidi formatting controls count as
> "plain"?

So long as the bidi controls are Unicode characters, I'd say "yes" --
according to the definitions above. The one thing I would disagree with
is calling them "formatting controls" -- as I believe they encode
semantics, not appearance.

And I should add, in response to the other points raised in this
thread, from the same page in the core standard: "If the same plain text
sequence is given to disparate rendering processes, there is no
expectation that rendered text in each instance should have the same
appearance. Instead, the disparate rendering processes are simply
required to make the text legible according to the intended reading."
That paragraph ends with the following summary, emphasized in the
source:

Plain text must contain enough information to permit the text
to be rendered legibly, and nothing more.

The last answer in http://www.unicode.org/faq/bidi.html violates this
dictum, as I have showed here with different examples. As long as it
stands, the Unicode standard fails its own criteria.

Thanks,
Shai.


Re: UAX #9: applicability of higher-level protocols to bidi plaintext

2018-07-16 Thread Shai Berger via Unicode
On Sat, 14 Jul 2018 12:14:35 -0700
Asmus Freytag via Unicode  wrote:

> I would say the problem lies in the attempt to exchange arbitrary raw
> data and expect perfectly compatible rendering [...] Editors for
> plain text will wrap or not wrap lines on presentation [...] The bidi
> case is just another such case 

This is not about "perfectly compatible rendering", it is about
legible rendering. As specified in the Unicode standard. Another
example below.

On Sat, 14 Jul 2018 14:15:37 +0100
Richard Wordingham via Unicode  wrote:

> If the display concept is to treat lines as being of unbounded length,
> one needs a left margin, a right margin, or perhaps one centres each
> line.  Centred text does not strike me as 'plain'.

You seem to be confounding directionality with alignment. While, for
plaintext, I would find it preferable if the two always matched, this
is not what I'm asking for, and not what I see as a requirement for
making plain text usable.

To be clear: If I write a file containing a single line (this is all
English, no special use of capitals), the iconic:

Hello, World!

then, when I open this file in a standard-compliant editor, I'm ok with
seeing (centered)

  Hello, World!

or (right aligned)

  Hello, World!

or even (wrapped at a surprisingly short line length)

Hello, 
World!

Indeed, these are presentation issues, where fidelity is not expected,
almost on a level with font and color.

What I'm not OK with is:

!Hello, World

Which is what you'll see if your editor decides to use RTL
directionality for this file, as the FAQ says it may. What I'm asking
is that we stop calling this behavior "standard compliant"; and I refer
you back to my first message in this thread[1] for an example of the
mess that this creates with true BiDi text.

Thanks,
Shai.

[1] http://unicode.org/pipermail/unicode/2018-July/006702.html


Re: UAX #9: applicability of higher-level protocols to bidi plaintext

2018-07-15 Thread Richard Wordingham via Unicode
On Sat, 14 Jul 2018 12:14:35 -0700
Asmus Freytag via Unicode  wrote:

> The bidi case is just another such case where you cannot expect any
> fidelity in presentation whatsoever. (And certainly not in the case
> of degenerate files containing all but one weak character).

It's going a bit far to call an ASCII histogram degenerate.

Richard.


Re: UAX #9: applicability of higher-level protocols to bidi plaintext

2018-07-14 Thread Asmus Freytag via Unicode

  
  
I would say the problem lies in the attempt to exchange arbitrary
raw data and expect perfectly compatible rendering.

In the absence of very explicit markup there's simply no expectation
that all users see precisely the same thing. Editors for plain text
will wrap or not wrap lines on presentation, for example, and if
they do, the wrapping may depend on the width of the window.

The bidi case is just another such case where you cannot expect any
fidelity in presentation whatsoever. (And certainly not in the case
of degenerate files containing all but one weak character).

A./


  



Re: UAX #9: applicability of higher-level protocols to bidi plaintext

2018-07-14 Thread Richard Wordingham via Unicode
On Sat, 14 Jul 2018 13:09:11 +0300
Shai Berger via Unicode  wrote:

> On Fri, 13 Jul 2018 11:22:51 +0300
> Eli Zaretskii via Unicode  wrote:
> 
> > 
> > Different applications will have different needs here, so there's
> > definitely a need to provide applications and users with some
> > control of paragraph direction, and the way to do this is define
> > high-level protocols controlled by some optional variables.  A
> > well-known example of that is the paragraph-direction buttons in
> > Word and similar processors (although they don't produce plain
> > text, so the analogy is limited).  
> 
> I have no argument with this, but I do think that in such cases it is
> wrong for the app to pretend that it is still treating the text as
> plain.

The problem with your concept of 'plain text' is that there is almost no
such thing.  To display text, one has to choose a basic writing
direction - direction within lines (LTR, RTL, TTB or BTT) and direction
from line to line (TTB, BTT, LTR or RTL) - and that's ignoring
boustrophedon variants and specialised cases such as 'round robin' or
the spiral of the Phaistos disc.

If the display concept is to treat lines as being of unbounded length,
one needs a left margin, a right margin, or perhaps one centres each
line.  Centred text does not strike me as 'plain'.  Centred text is the
only one that can handle paragraphs of different directionality well in
this concept.

Lines of unbounded length are the natural choice for editors for
programming languages, where line breaks are often syntactically
significant.  Line breaks are also significant in emails with
point-by-point discussions.

The default BiDi rule for the basic directionality of paragraphs usually
works when there is a left margin and a right margin, though buffering
makes it impossible to bound the amount of memory required.  Note that
several key utilities limit the number of combining marks or the length
of Indic syllables.

Richard.


Re: UAX #9: applicability of higher-level protocols to bidi plaintext

2018-07-14 Thread Eli Zaretskii via Unicode
> Date: Sat, 14 Jul 2018 13:09:11 +0300
> From: Shai Berger 
> Cc: Eli Zaretskii 
> 
> I have no argument with this, but I do think that in such cases it is
> wrong for the app to pretend that it is still treating the text as
> plain.

What is "plain text" in this context?  Does, for example, text with
bidi formatting controls count as "plain"?

> If it is an email client, it should use a mime-type such as
> (just inventing something here) "text/plain:ltr" instead of
> "text/plain".

As long as such mime-types don't exist, we cannot use them, right?

> Emacs should have an LTR-defaulting "Log mode" for log
> files, while keeping the UBA default for Text mode.

That's exactly what Emacs does.

> With the current definitions and FAQ, plain text is simply not a
> viable option for interchange whenever BiDi is involved. The only
> upside I see for them is, essentially, what Eli and Richard noted: The
> possibility for improved performance in fringe use-cases. I
> repeat/rephrase my original question: 

I don't think those use cases are fringe, FWIW.


Re: UAX #9: applicability of higher-level protocols to bidi plaintext

2018-07-14 Thread Shai Berger via Unicode
On Fri, 13 Jul 2018 11:22:51 +0300
Eli Zaretskii via Unicode  wrote:

> 
> Different applications will have different needs here, so there's
> definitely a need to provide applications and users with some control
> of paragraph direction, and the way to do this is define high-level
> protocols controlled by some optional variables.  A well-known example
> of that is the paragraph-direction buttons in Word and similar
> processors (although they don't produce plain text, so the analogy is
> limited).

I have no argument with this, but I do think that in such cases it is
wrong for the app to pretend that it is still treating the text as
plain. If it is an email client, it should use a mime-type such as
(just inventing something here) "text/plain:ltr" instead of
"text/plain". Emacs should have an LTR-defaulting "Log mode" for log
files, while keeping the UBA default for Text mode.

With the current definitions and FAQ, plain text is simply not a
viable option for interchange whenever BiDi is involved. The only
upside I see for them is, essentially, what Eli and Richard noted: The
possibility for improved performance in fringe use-cases. I
repeat/rephrase my original question: 

The preference expressed by the Bidi FAQ, allowing programs to apply
higher-level protocols to plain text with no limitation, affords
performance improvements in fringe cases, for the price of giving up
"Plain text must contain enough information to permit the text to be
rendered legibly"[1] where BiDi is involved. Are there other
upsides? And whether there are or not -- Does the trade-off reflect
the intentions of the UTC? Do they realize how deeply BiDi plaintext is
broken?

Thanks for your attention and consideration,

Shai.

[1] http://www.unicode.org/versions/Unicode11.0.0/ch02.pdf page 19


Re: UAX #9: applicability of higher-level protocols to bidi plaintext

2018-07-13 Thread Eli Zaretskii via Unicode
> Date: Fri, 13 Jul 2018 08:57:25 +0100
> From: Richard Wordingham via Unicode 
> 
> Even just for horizontal text, one problem is the shape of the canvas.
> If it has a left and a right-hand margin, then having an undetermined
> direction by default can work, given enough memory.  The rendering
> system then has to have enough memory to store the entire paragraph -
> the strongly directional character may be the last one in the
> paragraph.  I'm not sure that a protocol is allowed to be based on
> analysing the first 100 characters of a paragraph.

Indeed.  We discovered this problem in Emacs when the UBA was
implemented: some buffers, like those visiting log files, have very
long stretches of weak characters (digits and punctuation), which
force the automatic paragraph-direction search to scan very far
ahead, potentially slowing down the display engine.
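
To illustrate the cost, here is a minimal Python sketch of a
first-strong search with an optional scan limit.  It is not Emacs's
actual implementation: the function name, the max_scan parameter and
the LTR fallback are assumptions made for this example, and the sketch
ignores the UBA's handling of isolates and explicit formatting
characters.

```python
import unicodedata


def first_strong_direction(paragraph, max_scan=None, fallback="LTR"):
    """Return (direction, characters_examined) for one paragraph.

    The first character with a strong bidi class decides: L gives LTR,
    R or AL gives RTL.  If max_scan is set, the search gives up after
    that many characters and returns the fallback direction.
    """
    for index, ch in enumerate(paragraph):
        if max_scan is not None and index >= max_scan:
            return fallback, index
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "LTR", index + 1
        if bidi_class in ("R", "AL"):
            return "RTL", index + 1
    # No strong character at all: fall back (the UBA default is LTR).
    return fallback, len(paragraph)


# A log-like line: only digits, punctuation and spaces, then one Hebrew word.
log_line = "2018-07-13 08:57:25 [12345] " * 1000 + "\u05e9\u05dc\u05d5\u05dd"

# The unbounded search has to walk almost the whole line before deciding.
print(first_strong_direction(log_line))
# The bounded search stops early but may pick a different direction.
print(first_strong_direction(log_line, max_scan=100))
```

The two calls disagree on the direction of the same line, which is the
trade-off under discussion: bounding the search caps the cost but
turns the result into a program-specific choice.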

> However, it is common for displays to provide a window into a canvas
> that is unbounded both downwards and either rightwards or leftwards.
> If it is unbounded rightwards, one needs an LTR paragraph direction: if
> it is unbounded leftwards, one needs an RTL paragraph direction.

Yes.  In Emacs, there are commands that display text derived from
standardized templates.  In these cases, we cannot rely on the default
determination of the paragraph direction, because the first strong
directional character might be unpredictable.  We must force a certain
paragraph direction in those cases.

> I believe that having a mix of paragraphs unbounded on the left and
> paragraphs unbounded on the right would feel distinctly odd; it
> could also be a challenge to manage panning the window.  It also
> raises the question of where the LTR and RTL paragraphs would
> overlap.

Different applications will have different needs here, so there's
definitely a need to provide applications and users with some control
of paragraph direction, and the way to do this is define high-level
protocols controlled by some optional variables.  A well-known example
of that is the paragraph-direction buttons in Word and similar
processors (although they don't produce plain text, so the analogy is
limited).


Re: UAX #9: applicability of higher-level protocols to bidi plaintext

2018-07-13 Thread Richard Wordingham via Unicode
On Tue, 10 Jul 2018 19:40:59 +0300
Shai Berger via Unicode  wrote:

An agreement can take the form of a hidden condition that if one uses
someone else's software, one accepts what that software chooses to do.
(Most notoriously, this applies to the PUA.)  There seem to be no
applicable rules against 'unfair terms'.

> On Tue, 10 Jul 2018 13:37:56 +0200
> Philippe Verdy via Unicode  wrote:

> > A plain text editor should not have a default strong LTR default, it
> > should have a weak undetermined direction,  

> I agree -- but the UTC does not, according to the last entry in
> http://www.unicode.org/faq/bidi.html. I would like to convince them
> otherwise, or to be shown why my position is wrong.

Even just for horizontal text, one problem is the shape of the canvas.
If it has a left and a right-hand margin, then having an undetermined
direction by default can work, given enough memory.  The rendering
system then has to have enough memory to store the entire paragraph -
the strongly directional character may be the last one in the
paragraph.  I'm not sure that a protocol is allowed to be based on
analysing the first 100 characters of a paragraph.

However, it is common for displays to provide a window into a canvas
that is unbounded both downwards and either rightwards or leftwards.
If it is unbounded rightwards, one needs an LTR paragraph direction: if
it is unbounded leftwards, one needs an RTL paragraph direction.  I
believe that having a mix of paragraphs unbounded on the left and
paragraphs unbounded on the right would feel distinctly odd; it could
also be a challenge to manage panning the window.  It also raises the
question of where the LTR and RTL paragraphs would overlap.

Richard.


Re: UAX #9: applicability of higher-level protocols to bidi plaintext

2018-07-10 Thread Shai Berger via Unicode
Hello Philippe,

On Tue, 10 Jul 2018 13:37:56 +0200
Philippe Verdy via Unicode  wrote:

> A plain text editor should not have a default strong LTR default, it
> should have a weak undetermined direction,

I agree -- but the UTC does not, according to the last entry in
http://www.unicode.org/faq/bidi.html. I would like to convince them
otherwise, or to be shown why my position is wrong.

Thanks,
Shai.


Re: UAX #9: applicability of higher-level protocols to bidi plaintext

2018-07-10 Thread Eli Zaretskii via Unicode
> Date: Tue, 10 Jul 2018 13:37:56 +0200
> Cc: unicode Unicode Discussion 
> From: Philippe Verdy via Unicode 
> 
> Your "standard compliant" plain text editor just forces a LTR default
> for the whole document, and does not tolerate that individual
> paragraphs may start with an undetermined direction (which should then
> be determined by the first character on the line that defines a
> direction.)
> In my opinion, even if your text editor still does not enforce the
> default left margin side for aligning the text, it should still treat
> individual paragraphs in isolation and determine the direction to use
> (each paragraph break should cancel the direction inheritance).
> 
> A plain text editor should not have a default strong LTR default, it
> should have a weak undetermined direction, independently of the fact
> that it will align the paragraph to the left or right margin according
> to the resolved direction of the first character.

I think you may be missing the point.  The issue raised by Shai is not
what should be the default, the issue is whether each program can have
its own rules for overriding the default paragraph direction by
applying "higher-level" protocols private to the program, and not
shared by other programs when they present the exact same text.

There's no argument about the default -- it should indeed behave as
described in the UBA, i.e. look for the first strong directional
character in each isolate run.
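
The distinction can be sketched in Python as follows.  This is only an
illustration under simplifying assumptions: paragraphs are taken to be
separated by newlines, isolates and explicit formatting characters are
ignored, and the names base_direction, resolve_paragraphs and the
override parameter are invented for the example.

```python
import unicodedata


def base_direction(paragraph):
    """Default rule: the first strong character decides, else None."""
    for ch in paragraph:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "LTR"
        if bidi_class in ("R", "AL"):
            return "RTL"
    return None


def resolve_paragraphs(text, override=None):
    """Resolve each paragraph separately.

    With override=None every paragraph gets the default (first strong
    character, LTR if there is none).  A program applying its own
    higher-level protocol passes override="LTR" or override="RTL".
    """
    return [
        (override or base_direction(paragraph) or "LTR", paragraph)
        for paragraph in text.split("\n")
    ]


text = "hello\n\u05e9\u05dc\u05d5\u05dd world\n12345"

# Program A follows the default: LTR, RTL, LTR.
print([direction for direction, _ in resolve_paragraphs(text)])
# Program B forces LTR for everything: LTR, LTR, LTR.
print([direction for direction, _ in resolve_paragraphs(text, override="LTR")])
```

Both programs read the same characters; only the second paragraph is
presented differently, and nothing in the text itself records which
program's rule was intended.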


Re: UAX #9: applicability of higher-level protocols to bidi plaintext

2018-07-10 Thread Philippe Verdy via Unicode
Your "standard compliant" plain text editor just forces a LTR default for
the whole document, and does not tolerate that individual paragraphs may
start with an undetermined direction (which should then be determined by
the first character on the line that defines a direction.)
In my opinion, even if your text editor still does not enforce the default
left margin side for aligning the text, it should still treat individual
paragraphs in isolation and determine the direction to use (each paragraph
break should cancel the direction inheritance).

A plain text editor should not have a default strong LTR default, it should
have a weak undetermined direction, independently of the fact that it will
align the paragraph to the left or right margin according to the resolved
direction of the first character. That's what web browsers are doing for
example in input fields (where the automatic side of the start margin does
not change when you start typing some text in the input field and there's no
"text-align:left" or "text-align:right" to force it, just
"text-align:justify" or "text-align:normal"). Note that CSS
"text-align:justify" positions the start margin according to the CSS
direction of the container element; this makes a difference for the last
line of the paragraph, but with automatic determination of an unspecified
direction, a justified paragraph may look ugly if this does not also
properly set the start margin of the paragraph according to the resolved
direction of the first character of the paragraph or block element.

Note also that images or other inline objects embedded in paragraphs/blocks
also don't have a defined strong direction for themselves; they act like
Unicode "isolates", but you may want to style them to set their outer
direction, independently of the inner direction of the isolate. I'm not
sure however if images, e.g. in SVG, may inherit their direction from the
outer context of the isolate (I doubt they can); if they do, then they are
acting more like the old-fashioned Unicode "embeddings" rather than
"isolates", except that what is after the image should not depend on the
last direction used inside the SVG. Images should be completely isolated
from their context of use and completely define their expected rendering.
SVG images also contain their own upper-layer protocol, as they can embed
multiple texts, but in the context of the SVG document. Now, with SVG
elements directly in the HTML5 DOM as plain elements, the situation may have
changed, because they can inherit many things from the HTML5 doc, including
shared stylesheets.


2018-07-10 0:33 GMT+02:00 Shai Berger via Unicode :

> Hello all,
>
> About two and a half years ago, I suggested adding a FAQ about the
> applicability of higher-level protocols for bidirectional plaintext, as
> specified by http://www.unicode.org/reports/tr9/ -- my suggestion was
> to clarify that higher-level protocols can only be applied upon
> agreement between all producers and consumers, and that such agreements
> effectively mean that the text is "special text" -- no longer plain.
>
> In the time since then, I have been mostly removed from this issue, but
> I came back to it recently, to find that my suggested text was
> rejected, and instead, two FAQs were added to
> http://www.unicode.org/faq/bidi.html: The first, which is marked by the
> HTML anchor bidi7, goes with my understanding and defines a
> higher-level protocol as an agreement; but the second, marked as bidi8,
> goes the other way, and explains that actually, agreement is not
> necessary -- a program is at liberty to "implicitly define an overall
> directional context for display, and that implicit definition of
> direction is itself an example of application of a higher-level
> protocol for the purposes of the UBA".
>
> One result of this is the following scenario: I open my
> standard-compliant text editor, and write a line of text (to make
> things accessible to a wider audience, I use capitals for right-to-left
> English and small letters for normal, left-to-right English; note this
> sentence starts from the right):
>
> SESU REHTO DNA email ROF plaintext REFERP I
>
> I save this line in a text file. Then I display it using my
> standards-compliant text viewer, but now it looks like this:
>
> REFERP I plaintext ROF email SESU REHTO DNA
>
> And this is because my standard-compliant text-viewer chooses to apply
> its higher-level protocol and treat the line as a LTR paragraph.
>
> Since bidi8 is a little abstract on this point, and focuses on terminal
> windows rather than editors and viewers, I would like to ask:
> Does this concrete result represent the intents of the UTC?
>
> Thanks for your attention,
>
> Shai.
>


UAX #9: applicability of higher-level protocols to bidi plaintext

2018-07-09 Thread Shai Berger via Unicode
Hello all,

About two and a half years ago, I suggested adding a FAQ about the
applicability of higher-level protocols for bidirectional plaintext, as
specified by http://www.unicode.org/reports/tr9/ -- my suggestion was
to clarify that higher-level protocols can only be applied upon
agreement between all producers and consumers, and that such agreements
effectively mean that the text is "special text" -- no longer plain.

In the time since then, I have been mostly removed from this issue, but
I came back to it recently, to find that my suggested text was
rejected, and instead, two FAQs were added to
http://www.unicode.org/faq/bidi.html: The first, which is marked by the
HTML anchor bidi7, goes with my understanding and defines a
higher-level protocol as an agreement; but the second, marked as bidi8,
goes the other way, and explains that actually, agreement is not
necessary -- a program is at liberty to "implicitly define an overall
directional context for display, and that implicit definition of
direction is itself an example of application of a higher-level
protocol for the purposes of the UBA".

One result of this is the following scenario: I open my
standard-compliant text editor, and write a line of text (to make
things accessible to a wider audience, I use capitals for right-to-left
English and small letters for normal, left-to-right English; note this
sentence starts from the right):

SESU REHTO DNA email ROF plaintext REFERP I

I save this line in a text file. Then I display it using my
standards-compliant text viewer, but now it looks like this:

REFERP I plaintext ROF email SESU REHTO DNA

And this is because my standard-compliant text-viewer chooses to apply
its higher-level protocol and treat the line as a LTR paragraph.
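
The two displays can be reproduced with a small Python sketch that uses
the same convention (capitals stand in for right-to-left letters,
lowercase for left-to-right letters).  It is a heavily simplified model
of the UBA, not a conforming implementation: it handles only letters
and spaces, and the function names are invented for the example.  The
stored (logical) text is assumed to be "I PREFER plaintext FOR email
AND OTHER USES"; note that OTHER displayed right-to-left reads REHTO.

```python
def resolve_levels(logical, rtl_paragraph):
    """Assign an embedding level to each character (simplified)."""
    base_type = "R" if rtl_paragraph else "L"
    types = ["R" if c.isupper() else "L" if c.islower() else "N"
             for c in logical]
    resolved = list(types)
    i, n = 0, len(types)
    while i < n:
        if types[i] != "N":
            i += 1
            continue
        j = i
        while j < n and types[j] == "N":
            j += 1
        # A neutral run takes the direction of its neighbours if they
        # agree, otherwise the paragraph direction.
        before = types[i - 1] if i > 0 else base_type
        after = types[j] if j < n else base_type
        fill = before if before == after else base_type
        resolved[i:j] = [fill] * (j - i)
        i = j
    if rtl_paragraph:
        return [1 if t == "R" else 2 for t in resolved]
    return [1 if t == "R" else 0 for t in resolved]


def display_order(logical, rtl_paragraph):
    """Return the characters in left-to-right visual order."""
    levels = resolve_levels(logical, rtl_paragraph)
    chars = list(logical)
    highest = max(levels, default=0)
    lowest_odd = min((lv for lv in levels if lv % 2), default=highest + 1)
    # From the highest level down to the lowest odd level, reverse every
    # maximal run of characters at that level or higher.
    for level in range(highest, lowest_odd - 1, -1):
        i = 0
        while i < len(chars):
            if levels[i] < level:
                i += 1
                continue
            j = i
            while j < len(chars) and levels[j] >= level:
                j += 1
            chars[i:j] = reversed(chars[i:j])
            i = j
    return "".join(chars)


stored = "I PREFER plaintext FOR email AND OTHER USES"
print(display_order(stored, rtl_paragraph=True))   # as typed in the editor
print(display_order(stored, rtl_paragraph=False))  # as shown by the viewer
```

The stored characters are identical in both calls; only the paragraph
direction chosen by the displaying program differs.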

Since bidi8 is a little abstract on this point, and focuses on terminal
windows rather than editors and viewers, I would like to ask:
Does this concrete result represent the intents of the UTC?

Thanks for your attention,

Shai.