from:"via Unicode"

Re: Proposal for BiDi in terminal emulators

2019-02-01 Thread Richard Wordingham via Unicode

On Fri, 01 Feb 2019 15:18:13 -0700
Doug Ewell via Unicode  wrote:

> Richard Wordingham wrote:
>  
> > Language tagging is already available in Unicode, via the tag
> > characters in the deprecated plane.  
>  
> Plane 14 isn't deprecated -- that isn't a property of planes -- and
> the tag characters U+E0020 through U+E007E have been un-deprecated
> for use with emoji flags. Only U+E0001 LANGUAGE TAG and U+E007F
> CANCEL TAG are deprecated.

Unicode may not deprecate the tag characters, but the characters of
Plane 14 are widely deplored, despised or abhorred.  That is why I think
of it as the deprecated plane.

Richard.

Re: Proposal for BiDi in terminal emulators

2019-02-01 Thread Khaled Hosny via Unicode

On Fri, Feb 01, 2019 at 06:57:43PM +, Richard Wordingham via Unicode wrote:
> On Fri, 1 Feb 2019 13:02:45 +0200
> Khaled Hosny via Unicode  wrote:
> 
> > On Thu, Jan 31, 2019 at 11:17:19PM +, Richard Wordingham via
> > Unicode wrote:
> > > On Thu, 31 Jan 2019 12:46:48 +0100
> > > Egmont Koblinger  wrote:
> > > 
> > > No.  How many cells do CJK ideographs occupy?  We've had a strong
> > > hint that a medial BEH should occupy one cell, while an isolated
> > > BEH should occupy two.  
> > 
> > Monospaced Arabic fonts (there are not that many of them) are designed
> > so that all forms occupy just one cell (most even including the
> > mandatory lam-alef ligatures), unlike CJK fonts.
> > 
> > I can imagine the terminal restricting itself to monspaced fonts,
> > disable “liga” feature just in case, and expect the font to well
> > behave. Any other magic is likely to fail.
> 
> Of course, strictly speaking, a monospaced font cannot support harakat
> as Egmont has proposed.

There are two approaches for handling them in monospaced fonts;
combining them with base characters as usual, or as spacing characters
placed next to their bases. The later approach is a bit unusual, but
makes editing heavily voweled text a bit more pleasant. It requires good
OpenType support, though, so virtually no terminal supports it.

Regards,
Khaled

Re: Proposal for BiDi in terminal emulators

2019-02-01 Thread Kent Karlsson via Unicode

Den 2019-02-01 19:57, skrev "Richard Wordingham via Unicode"
:

> On Fri, 1 Feb 2019 13:02:45 +0200
> Khaled Hosny via Unicode  wrote:
> 
>> On Thu, Jan 31, 2019 at 11:17:19PM +0000, Richard Wordingham via
>> Unicode wrote:
>>> On Thu, 31 Jan 2019 12:46:48 +0100
>>> Egmont Koblinger  wrote:
>>> 
>>> No.  How many cells do CJK ideographs occupy?  We've had a strong
>>> hint that a medial BEH should occupy one cell, while an isolated
>>> BEH should occupy two.
>> 
>> Monospaced Arabic fonts (there are not that many of them) are designed
>> so that all forms occupy just one cell (most even including the
>> mandatory lam-alef ligatures), unlike CJK fonts.
>> 
>> I can imagine the terminal restricting itself to monspaced fonts,
>> disable ³liga² feature just in case, and expect the font to well
>> behave. Any other magic is likely to fail.
> 
> Of course, strictly speaking, a monospaced font cannot support harakat
> as Egmont has proposed.
> 
> Richard.

(harakat: non-spacing vowel mark in Arabic)

"Monospaced font" is really a concept with modification. Even for
"plain old ASCII" there are two advance widths, not just one: 0 for
control characters (and escape/control sequences, neither of which
should directly consult the font; even such things as OSC sequences,
but the latter are a bad idea to have in any line one might wish to
edit (vi/emacs/...) via a terminal emulator window). But terminals
(read terminal emulators) can deal with mixed single width and double
width characters (which is, IIUC, the motivation for the datafile
EastAsianWidth.txt). Likewise non-spacing combining characters should
be possible to deal reasonably with.

It is a lot more difficult to deal with BiDi in a terminal emulator,
also shaping may be hard to do, as well as reordering (or even
splitting) combining characters. All sorts of problems arise; feeding
the emulator a character (or "short" strings) at a time not allowed
to buffer for display (causing reshaping or movement of already
displayed characters, edit position movement even within a single
line, etc.). Even if solvable for a "GUI" text editor (not via a
terminal), they do not seem to be workable in a terminal (emulator)
setting. Esp. not if one also wants to support multiline editing
(vi/emacs/...) or even single-line editing.

As long as editing is limited to a single line (such as the system
line editor, or an "enhanced functionality" line editor (such as
that used for bash; moving in the history sets the edit position
at EOL) even variable width ("proportional) fonts should not pose
a major problem. But for multiline editors (à la vi/emacs) it would
not be possible to synch nicely (unless one accepts strange jums)
the visual edit position and the actual edit position in the edit
buffer: The program would not have access to the advance width data
from the font that the terminal emulator uses, unless one
revolutionise what terminal emulators do... (And I don't see a
case for doing that.) But both a terminal emulator and multiline
editing programs (for terminal emulators) still can have access
to EastAsianWidth data as well as which characters are non-spacing;
those are not font dependent. (There might be some glitches if
the Unicode versions used do not match (the terminal emulator
and the program being run are most often on different systems),
but only for characters where these properties have changed,
e.g. newly allocated non-spacing marks.)

/Kent K

PS
No, I have not done extensive testing of various terminal emulators
on how well the handle the stuff above.

Re: Proposal for BiDi in terminal emulators

2019-02-01 Thread Ken Whistler via Unicode


Richard,

On 2/1/2019 1:30 PM, Richard Wordingham via Unicode wrote:


Language tagging is already available in Unicode, via the tag characters
in the deprecated plane.


Recte:

1. Plane 14 is not a "deprecated plane".

2. The tag characters in Tag Character block (U+E..U+E007F) are not 
deprecated. (They are used, for example, by UTS #51 to specify emoji tag 
sequences.)


3. However, the use of U+E0001 LANGUAGE TAG and the mechanism of using 
tag characters for spelling out language tags are explicitly deprecated 
by the standard. See: "Deprecated Use for Language Tagging" in Section 
23.9 Tag Characters.


https://www.unicode.org/versions/Unicode11.0.0/ch23.pdf#G30427

and PropList.txt:

E0001 ; Deprecated # Cf   LANGUAGE TAG

As I stated earlier: language tags should use BCP 47, and belong in the 
markup level, not in the plain text stream.


--Ken

Re: Proposal for BiDi in terminal emulators

2019-02-01 Thread Andrew West via Unicode

On Fri, 1 Feb 2019 at 22:20, Doug Ewell via Unicode  wrote:
>
> Richard Wordingham wrote:
>
> > Language tagging is already available in Unicode, via the tag
> > characters in the deprecated plane.
>
> Plane 14 isn't deprecated -- that isn't a property of planes -- and the
> tag characters U+E0020 through U+E007E have been un-deprecated for use
> with emoji flags. Only U+E0001 LANGUAGE TAG and U+E007F CANCEL TAG are
> deprecated.

Cancel Tag is not deprecated any longer either
(http://www.unicode.org/Public/UNIDATA/PropList.txt).

Andrew

Re: Proposal for BiDi in terminal emulators

2019-02-01 Thread Doug Ewell via Unicode

Richard Wordingham wrote:

> Language tagging is already available in Unicode, via the tag
> characters in the deprecated plane.

Plane 14 isn't deprecated -- that isn't a property of planes -- and the
tag characters U+E0020 through U+E007E have been un-deprecated for use
with emoji flags. Only U+E0001 LANGUAGE TAG and U+E007F CANCEL TAG are
deprecated.

--
Doug Ewell | Thornton, CO, US | ewellic.org

Re: Proposal for BiDi in terminal emulators

2019-02-01 Thread Richard Wordingham via Unicode

On Fri, 1 Feb 2019 14:47:22 +0100
Egmont Koblinger via Unicode  wrote:

> Hi Ken,
> 
> > [language tag]
> > That is a complete non-starter for the Unicode Standard.  
> 
> Thanks for your input!
> 
> (I hope it was clear that I just started throwing in random ideas, as
> in a brainstorming session. This one is ruled out, then.)

Language tagging is already available in Unicode, via the tag characters
in the deprecated plane.

Richard.

Re: Proposal for BiDi in terminal emulators

2019-02-01 Thread Richard Wordingham via Unicode

On Fri, 1 Feb 2019 13:02:45 +0200
Khaled Hosny via Unicode  wrote:

> On Thu, Jan 31, 2019 at 11:17:19PM +, Richard Wordingham via
> Unicode wrote:
> > On Thu, 31 Jan 2019 12:46:48 +0100
> > Egmont Koblinger  wrote:
> > 
> > No.  How many cells do CJK ideographs occupy?  We've had a strong
> > hint that a medial BEH should occupy one cell, while an isolated
> > BEH should occupy two.  
> 
> Monospaced Arabic fonts (there are not that many of them) are designed
> so that all forms occupy just one cell (most even including the
> mandatory lam-alef ligatures), unlike CJK fonts.
> 
> I can imagine the terminal restricting itself to monspaced fonts,
> disable “liga” feature just in case, and expect the font to well
> behave. Any other magic is likely to fail.

Of course, strictly speaking, a monospaced font cannot support harakat
as Egmont has proposed.

Richard.

Re: Encoding italic

2019-02-01 Thread Philippe Verdy via Unicode

the proposal would contradict the goals of variation selectors and would
pollute ther variation sequences registry (possibly even creating
conflicts). And if we admit it for italics, than another VSn will be
dedicated to bold, and another for monospace, and finally many would follow
for various style modifiers.
Finally we would no longer have enough variation selectors for all
requests).
And what we would have made was only trying to reproduce another existing
styling standard, but very inefficiently (and this use wil be "abused" for
all usages, creating new implementation constraints and contradicting goals
with existing styling languages: they would then decide to make these
characters incompatible for use in conforming applications. The Unicode
encoding would have lost all its interest.
I do not support the idea of encoding generic styles (applicable to more
than 100k+ existing characters) using variation selectors. Their goal is
only to allow semantic distinctions when two glyphs were unified in one
language may occasionnaly (not always) have some significance in specific
languages. But what you propose would apply to all languages, all scripts,
and would definitely reserve some the the few existing VSn for this styling
use, blocking further registration of needed distinctions (VSn characters
are notably needed for sinographic scripts to properly represent toponyms
or person names, or to solve some problems existing with generic character
properties in Unicode that cannot be changed because of stability rules).


Le jeu. 31 janv. 2019 à 16:32, wjgo_10...@btinternet.com via Unicode <
unicode@unicode.org> a écrit :

> Is the way to try to resolve this for a proposal document to be produced
> for using Variation Selector 14 in order to produce italics and for the
> proposal document to be submitted to the Unicode Technical Committee?
>
> If the proposal is allowed to go to the committee rather than being
> ruled out of scope, then we can know whether the Unicode Technical
> Committee will allow the encoding.
>
> William Overington
>
> Thursday 31 January 2019
>
>

Re: Proposal for BiDi in terminal emulators

2019-02-01 Thread Egmont Koblinger via Unicode

Hi,

I'm trying to respond to every question, but I'm having a hard time
keeping up :-)

Thanks a lot for all the precious input about shaping!

Here's my suggestion, for version 0.2 of the recommendation:

- No longer encourage any use of presentation form characters.

- State that it's the terminal emulator's task to perform shaping,
both in implicit and explicit modes.

- Leave it for a future enhancement to handle trickier cases in
explicit mode, such as shaping of a word that's only partially
visible, or prevent shaping when two words happen to touch each other
and are visually separated by other means (e.g. background color).
Leave it for further research whether we could use ZWJ/ZWNJ here,
whether we could use ECMA's SAPV 5-8 & 21-11, or whether we should
invent something new (perhaps even telling the terminal emulator what
neighboring previous/next characters to imagine there for the purpose
of shaping)...

Let me know if you have any remaining problems/concerns/etc.

As for the implementation in VTE: initially I'll still use
presentation form characters, solely because that's a low hanging
fruit approach (low investment, high gain). I've already implemented
it in about an hour (a bit of further hacks will be necessary to
extend it to explicit mode, but still easily doable), whereas
switching to HarfBuzz is expected to take weeks of heavy work. We'll
tackle that in a subsequent version. And if anyone's happy to help,
there's already some bounty for harfbuzz support :)

Thanks again for the great guidance!

cheers,
egmont

On Tue, Jan 29, 2019 at 1:50 PM Egmont Koblinger  wrote:
>
> Hi,
>
> Terminal emulators are a powerful tool used by many people for various
> tasks. Most terminal emulators' bugtracker has a request to add RTL /
> BiDi support. Unicode has supported BiDi for about 20 years now.
> Still, the intersection of these two fields isn't solved. Even some
> Unicode experts have stated over time that no one knows how to do it
> properly.
>
> The only documentation I could find (ECMA TR/53) predates the Unicode
> BiDi algorithm, and as such no surprise that it doesn't follow the
> current state of the art or best practices.
>
> Some terminal emulators decided to run the BiDi algorithm for display
> purposes on its lines (rather than paragraphs, uh), not seeing the big
> picture that such a behavior turns them into a platform on top of
> which it's literally impossible to implement proper BiDi-aware text
> editing (vim, emacs, whatever) experience. In turn, vim, emacs and
> friends stand there clueless, not knowing how to do BiDi in terminals.
>
> With about 5 years of experience in terminal emulator development, and
> some prior BiDi homepage developing experience with the kind mentoring
> of one of the BiDi gurus (Aharon, if you're reading this, hi there!),
> I decided to tackle this issue. I studied and evaluated the
> aforementioned documentation and the behavior of such terminals,
> pointed out the problems, and came up with a draft proposal.
>
> My work isn't complete yet. One of the most important pending issues
> is to figure out how to track BiDi control characters (e.g. which
> character cells they belong to), it is to be addressed in a subsequent
> version. But I sincerely hope I managed to get the basics right and
> clean enough so that work can begin on implementing proper support in
> terminal emulators as well as fullscreen text applications; and as we
> gain experience and feedback, extending the spec to address the
> missing bits too.
>
> You can find this (draft) specification at [1]. Feedback is welcome –
> if it's an actionable one then preferably over there in the project's
> bugtracker.
>
> [1] https://terminal-wg.pages.freedesktop.org/bidi/
>
>
> cheers,
> egmont (GNOME Terminal / VTE co-developer)

Re: Proposal for BiDi in terminal emulators

2019-02-01 Thread Eli Zaretskii via Unicode

> From: Egmont Koblinger 
> Date: Fri, 1 Feb 2019 14:35:35 +0100
> Cc: Frédéric Grosshans , 
>   unicode@unicode.org
> 
> > You could do that, but it will require a lot of non-trivial processing
> > from the applications.  Text-mode applications don't want any complex
> > tinkering, they want just to write their text and be done.  The more
> > overhead you add to that simple task, the less probable it is that
> > applications will support such a terminal.
> 
> I agree with your overall observation, but I'm not sure how much it
> applies to this context.
> 
> Text-mode applications have to run the BiDi algorithm. The one I
> picked can also do shaping (well, the pretty limited one, using
> presentation forms).

Reordering and shaping have different requirements.  Reordering can be
done based only on the codepoints, whereas shaping needs also intimate
knowledge of the fonts being used.  The former can be done by a
text-mode application, the latter cannot, not anywhere close to what
readers of the respective scripts would expect.

Re: Proposal for BiDi in terminal emulators

2019-02-01 Thread Egmont Koblinger via Unicode

Hi Richard,

On Fri, Feb 1, 2019 at 12:19 AM Richard Wordingham via Unicode
 wrote:

> Cropped why?  If the problem is the truncation of lines, one can simple
> store the next character.

Yup, trancation of line for example.

I agree that one could "store the next character". We could extend the
terminal emulation protocol where by some means you can specify that
column 80 contains a letter X, and even though there's no column 81,
an app can still tell the terminal emulator that it should imagine
that column 81 contans the letter Y, and perform shaping accordingly.

This will need to be done not just at the end of the terminal, but at
any position, and for both directions. Think of e.g. a vertically
split tmux. You should be able to tell that column 40 contains X which
should be shaped as if column 41 contained Y, and column 41 contains Z
which should be shaped as if column 40 contained A.

What I canont see at all is how this could be "simply". Could you
please elaborate on that? I don't find this simple at all!

>> > It's not able to
> > separate different UI elements that happen to be adjacent in the
> > terminal, separated by different background color or such.
>
> ZWJ and ZWNJ can handle that.

Wouldn't it be a semantical misuse of these characters, though?

They are supposed to be present in the logical order, and in logical
order (that is: the terminal's implicit mode) they can work as
desired.

Are they okay to be present in visual order (the terminal's explicit
mode, what we're discussing now) too?

Anyway, ZWJ/ZWNJ aren't sufficient to handle the cases I outlined above.

> If a general text manipulating application, e.g. cat, grep or awk, is
> writing to a file, it should not convert normal Arabic characters to
> presentation forms.  You are now asking a general application to
> determine whether it is writing to a terminal or not, and alter its
> output if it is writing to a terminal.

No, this absolutely not what I'm talking about!

There are two vastly different modes of the terminal. For "cat",
"grep" etc. the terminal will be in implicit mode. Absolutely no BiDi
handling is expected from these apps, the terminal will do BiDi and
shaping (perhaps using Harfbuzz; perhaps using presentation form
characters as a temporarily low hanging fruit until a better one is
implemented – the choice is obviously up to the implementation and not
to the specification).

For "emacs" and friends, an explicit mode is required where visual
order is passed to the terminal. What we're discussing is how to
handle shaping in this mode.

> But it as an issue that needs to be addressed.  As a terminal can be
> addressed by cell, an application may need to keep track of what text
> went into each cell. Misery results when the application gets it wrong.

My recommendation doesn't change this principle at all. In the lower
(emulation) layer every character still goes into the cell it used to
go to, and is addressable using cursor motion escapes and so on
exactly as without BiDi.

> How many cells do CJK ideographs occupy?  We've had a strong hint
> that a medial BEH should occupy one cell, while an isolated BEH should
> occupy two.

CJK occupy two, but they do regardless of what's around them. That is,
they already occupy two cells in the logical buffers, in the emulation
layer.

There is absolutely no sane way we can make in terminal emulation a
character's logical width (as in number of cells it occupies) depend
on its neighboring characters. (And even if we could by some terrible
hacks, it would break the principle you just said as "misery
results...", and the principle Eli said that things should remain
reasonably simple, otherwise hardly anyone will bother implementing
them.) This is a compromise Arabic folks will have to accept.

When displayed, it's up for terminal emulators to perhaps
enwiden/shrink cells as it wants to (they might even totally give up
on monospace fonts), but then they'll risk vertical lines not aligning
up perfectly vertically, content overflowing on the right etc. Konsole
does such things.

cheers,
egmont

Re: Proposal for BiDi in terminal emulators

2019-02-01 Thread Egmont Koblinger via Unicode

Hi Ken,

> [language tag]
> That is a complete non-starter for the Unicode Standard.

Thanks for your input!

(I hope it was clear that I just started throwing in random ideas, as
in a brainstorming session. This one is ruled out, then.)

cheers,
egmont

Re: Proposal for BiDi in terminal emulators

2019-02-01 Thread Egmont Koblinger via Unicode

On Thu, Jan 31, 2019 at 4:26 PM Eli Zaretskii  wrote:

> > Yes, I do argue that emacs will need to print a new escape sequence.
> > Which is much-much-much-much-much better than having to tell users to
> > go into the settings of their macOS Terminal / Konsole /
> > gnome-terminal etc. and disable BiDi there, isn't it?
>
> I'm not sure I agree.  Most users can disable bidi reordering of the
> terminal once and for all.  They don't need it.

What users are we talking about? Those who don't need BiDi ever at all?

Everything is already perfect for them! They should't care about the
"enable BiDi" settings of their terminal, either value will result in
the same, correct behavior for them.

Or do we talk about users who care about BiDi inside Emacs, but don't
care about BiDi when echo'ing, cat'ing...? Do such users exist? Well,
even if they do, they're not the only target of my work.

Remember: My proposal aims to address both the Emacs as well as the
echo/cat/... use cases. These are substantially different use cases
that require the terminal emulator to be in a different mode, and thus
automatic switching between the two modes has to be solved.

cheers,
egmont

Re: Proposal for BiDi in terminal emulators

2019-02-01 Thread Eli Zaretskii via Unicode

> From: Egmont Koblinger 
> Date: Fri, 1 Feb 2019 14:16:03 +0100
> Cc: Adam Borowski , unicode@unicode.org
> 
> There's absolutely no way we could reorder first, and then handle
> TAB's cursor movement. TAB's cursor movement happens in the lower
> layer, reordering happens in the upper one.

But that means you won't ever be able to be in compliance with UAX#9,
because TAB has distinct properties that affect the UBA.  If you
reorder after all TABs have been converted to spaces, you will not be
able to implement the support for Segment Separator characters.  Am I
missing something?

Re: Proposal for BiDi in terminal emulators

2019-02-01 Thread Egmont Koblinger via Unicode

Hi,

On Thu, Jan 31, 2019 at 4:14 PM Eli Zaretskii  wrote:>

> I suggest that you show the result to someone who does read Arabic.

I contacted one guy who is pretty knowledgeable in Arabic scripts, as
well as terminal emulation, I sent out an early unpublished version of
the proposal to him, but unfortunately he was busy and didn't have the
chance to respond. Let this thread be one where we invite Arabic folks
to comment :)

> Small changes can be very unpleasant to the eyes of an Arabic reader.

I can easily imagine that!

I can assure you, seeing õ instead of ő in my native language is
extremely unpleasant to my eyes. Depending on the font you're using,
you may not even have spotted any difference.

But could someone argue for example that seeing an "i" and "w" equally
wide is unpleasant to their eyes? Where do we draw the lines of what's
an acceptable compromise on a platform that has technical limitations
(fixed grid) to begin with? We really need input from Arabic folks to
answer this.

I'm also wondering: how unpleasant it is if a letter is cut in half
(e.g. overflows at the edge of the text editor), and is shaped not
according to the entire word but according to the visible part? I took
it from the CSS specification that the desired behavior is to shape it
according to the entire word, but I honestly don't know how acceptable
or how unpleasant the other approach is.

> You could do that, but it will require a lot of non-trivial processing
> from the applications.  Text-mode applications don't want any complex
> tinkering, they want just to write their text and be done.  The more
> overhead you add to that simple task, the less probable it is that
> applications will support such a terminal.

I agree with your overall observation, but I'm not sure how much it
applies to this context.

Text-mode applications have to run the BiDi algorithm. The one I
picked can also do shaping (well, the pretty limited one, using
presentation forms). Shouldn't any BiDi algorithm also provide methods
for shaping that produce some output that can be easily sent to the
terminals? Shouldn't we push for them?

As far as I imagine the ideal solution, doing this part of shaping
shouldn't be any harder for apps than doing BiDi, basically all they
would need to do is hook up to existing API methods.

Of course, given the current APIs, it's probably really not this simple.

cheers,
egmont

Re: Proposal for BiDi in terminal emulators

2019-02-01 Thread Eli Zaretskii via Unicode

> From: Egmont Koblinger 
> Date: Fri, 1 Feb 2019 13:54:02 +0100
> Cc: Adam Borowski , unicode@unicode.org
> 
> For this behavior, the only feature you need from a terminal emulator
> is to have a mode where it doesn't shuffle the characters. Currently
> every emulator I'm aware of has such a mode, although in some of them
> you have to tweak the settings to get to this mode (in my firm opinion
> it's an unacceptable user experience), while in emulators according to
> my specification there'll be an escape sequence for text-mode apps to
> automatically switch to this mode.

Like I said, as long as not every emulator supports this control, an
application will need to detect its support, and that in itself is a
complication.

> > This is indeed a significant issue, because it means applications
> > cannot force the terminal use a certain non-default base paragraph
> > direction.
> 
> They can, since there's a dedicated escape sequence (SCP) for setting
> the base paragraph.

Does this change the base direction globally for the whole screen, or
only for the current text?  The latter is what's needed.

And again, just detecting whether this is supported is a
complication.  Emitting LRM or RLM as needed is much easier.

Re: Proposal for BiDi in terminal emulators

2019-02-01 Thread Eli Zaretskii via Unicode

> From: Egmont Koblinger 
> Date: Fri, 1 Feb 2019 13:40:48 +0100
> Cc: unicode@unicode.org
> 
> I now understand that presentation forms isn't an ideal possible
> approach, and the recommendation should be improved here.
> 
> Until it happens, I'm uncertain whether using presentation form
> characters is a decent low hanging fruit that significantly improves
> the readability in some situations (e.g. "good enough" in some sense
> for Arabic), or is a dead end we shouldn't propagate.

IMNSHO, you shouldn't try solving this problem on your own.  Instead,
use a shaping engine, such as HarfBuzz, to do that for you, since the
emulator does know which fonts it uses, and can access their
properties.  The only problem a terminal emulator does need to solve
in this regard is what to do when N codepoints yield M /= N glyphs
that the shaping engine tells you to emit, or, more generally, when
the width on display after shaping is different from N times the
character cell width.

> I still do not agree however that the entire responsibility can be
> shifted to the emulator. There are certain important bits of
> information that are only available to the application, and not the
> emulator – as with many other aspects, such as reordering,
> copy-pasting, searching in the data in BiDi-aware text editors using
> the terminal's explicit mode, which are all pushed to the application
> because the emulator cannot do them correctly.

As soon as you attempt to target applications that move cursor and use
cursor addressing, you are in trouble, and should IMO refrain from
trying to support such applications.  For example, Emacs doesn't even
write whole lines to the screen, it compares the internal
representation of what's on the screen and what should be there, and
only emits the parts that should be modified.  (It does that to
minimize screen writes, which might be expensive, especially if
writing to a remote terminal.)  In such cases, the emulator doesn't
stand a chance of doing TRT, because the application doesn't provide
enough context for it to reorder text correctly.

So I don't think a bidi-aware terminal emulator can support any
application more complex than those which write full lines to the
terminal, like 'cat', 'sed', 'diff', 'grep', etc.

> I believe we should further study the situation, e.g. see whether
> ECMA-48's SAPV (8.3.18) parameters 5..8 (to explicitly specify whether
> to use isolated/initial/medial/final form for each character) are
> flexible enough to convey all this information, or perhaps a new, more
> powerful means should be crafted.

Once again, I think it's impractical to expect applications to emit
these controls.  The emulator must do this part of the job.

Re: Proposal for BiDi in terminal emulators

2019-02-01 Thread Egmont Koblinger via Unicode

Hi,

On Thu, Jan 31, 2019 at 4:10 PM Eli Zaretskii  wrote:

> The reordering happens before TABs are converted to cursor motion,
> does it not?

No, not at all.

You cannot "mix" handling the input and reordering, since the input is
not available as a single step but arrives continuously in a stream.

Consider a heavy BiDi text such as (I'm making up some random
gibberish, uppercase being RTL):
foo BAR FSI BAz quUX 1234 PDI whatEVer

Someone prints it to the terminal, but due to the internals, the
terminal doesn't receive this in one single step but in two
consecutive ones, broken in the middle. Maybe the app split it in half
(e.g. a shell script printed fragments one by one using printf without
a trailing newline). Maybe the emitter is a "dd" printing blocks of
let's say 4kB and this line happens to cross a boundary. Maybe a
transport layer such as ssh split it for whatever reason.

Then would you take the first half of this text, let's say
foo BAR FSI BAz quU
even with unbalanced BiDi controls, then reorder it, and continue from
it? Continue how? How to remember to reorder the second half too, but
not the first half once again in order to avoid "double BiDi"?

What to do with explicit cursor movement, would they jump to the
visual positon? This would break absolutely basic principles, e.g.
jumping twice to the same location to overwrite a letter twice in a
row may actually end up overwriting two different letters, since
everything was potentially rearranged after the first overwrite
happened? Any application having any existing preconception about
cursor movement would uncontrollably fall apart.

This approach is doomed to fail big time (and was the reason I had to
drop ECMA TR/53's DCSM "presentation" mode).

The only reasonable way is if you have two layers. The bottom layer
does the emulation almost exactly as it used to do, with no BiDi
whatsoever (except for tiny additions, e.g. it tracks BiDi-related
properties such as the paragraph direction). The upper layer displays
the data, and this upper layer performs BiDi solely for display
purposes: using the lower layer's data as input, but not modifying it.

This is, by the way, also what current emulators that shuffle the
characters arond do.

Let's also mention that the lower layer (emulation) should be as fast
as possible. e.g. VTE can handle input in the ballpark of 10MB/s.
Reordering, that is, running BiDi for display purposes needs to happen
much more rarely, maybe 20-60 times per second. It would be a
performance killer having to run the BiDi algorithm upon every
received chunk of data – in fact, to eliminate any possible behavior
difference due to timing difference, it'd need to happen after every
printable character received.

There's absolutely no way we could reorder first, and then handle
TAB's cursor movement. TAB's cursor movement happens in the lower
layer, reordering happens in the upper one.

cheers,
egmont

Re: Proposal for BiDi in terminal emulators

2019-02-01 Thread Egmont Koblinger via Unicode

Hi Eli,

> So we will some day have one such terminal emulator.  That's good, but
> a text-mode application that needs to support bidi cannot rely on its
> users all having access to that single terminal.

No. A text-mode application that needs to support BiDi must do the
BiDi itself and pass visual order to the emulator, and beforehand
switch the emulator to explicit mode so that you don't end up with
"double BiDi". Once you emit visual order, there's no need for any
BiDi control characters.

For this behavior, the only feature you need from a terminal emulator
is to have a mode where it doesn't shuffle the characters. Currently
every emulator I'm aware of has such a mode, although in some of them
you have to tweak the settings to get to this mode (in my firm opinion
it's an unacceptable user experience), while in emulators according to
my specification there'll be an escape sequence for text-mode apps to
automatically switch to this mode.

What BiDi control characters (LRE, LRI, FSI etc.) in implicit mode
will give you – if supported – is that you'll be able to execute "cat
file", and it'll be displayed correctly, even taking FSI and friends
as present in the file into account. Of course this will only work in
terminal emulators that support this.

> This is indeed a significant issue, because it means applications
> cannot force the terminal use a certain non-default base paragraph
> direction.

They can, since there's a dedicated escape sequence (SCP) for setting
the base paragraph.

That being said, not being able to remember FSI at the beginning of a
string is indeed a significant issue, we agree on this. We just need
to figure out how to alter the emulation behavior to remember them,
which I find the next big step to address in the specification.


cheers,
egmont

Re: Proposal for BiDi in terminal emulators

2019-02-01 Thread Egmont Koblinger via Unicode

Hi Eli,

> Arabic presentation forms are more like an exception than a rule, I
> hope you understand this by now.  Most languages/scripts don't have
> such forms, and even for Arabic they cover only a part of what needs
> to be done to present correctly shaped text.  Complex script shaping
> is much more than just substituting some glyphs with others, it
> requires an intimate knowledge of the font being used and its
> capabilities, and the ability to control how various glyphs of a
> grapheme cluster are placed relative to one another, something that an
> application running on a text terminal cannot do.
>
> So I suggest that you don't consider Arabic presentation forms a
> representative of the direction in which terminal emulators supporting
> such scripts should evolve.

Thanks a lot for this information!

I now understand that presentation forms isn't an ideal possible
approach, and the recommendation should be improved here.

Until it happens, I'm uncertain whether using presentation form
characters is a decent low hanging fruit that significantly improves
the readability in some situations (e.g. "good enough" in some sense
for Arabic), or is a dead end we shouldn't propagate.

I still do not agree however that the entire responsibility can be
shifted to the emulator. There are certain important bits of
information that are only available to the application, and not the
emulator – as with many other aspects, such as reordering,
copy-pasting, searching in the data in BiDi-aware text editors using
the terminal's explicit mode, which are all pushed to the application
because the emulator cannot do them correctly.

I believe we should further study the situation, e.g. see whether
ECMA-48's SAPV (8.3.18) parameters 5..8 (to explicitly specify whether
to use isolated/initial/medial/final form for each character) are
flexible enough to convey all this information, or perhaps a new, more
powerful means should be crafted. At this point I lack sufficient
knowledge to fix the design, I'd need to spend a lot of time studying
the situation and/or working together with you guys, if you're up for
it.


Thanks a lot,
egmont

Re: Proposal for BiDi in terminal emulators

2019-02-01 Thread Khaled Hosny via Unicode

On Thu, Jan 31, 2019 at 11:17:19PM +, Richard Wordingham via Unicode wrote:
> On Thu, 31 Jan 2019 12:46:48 +0100
> Egmont Koblinger  wrote:
> 
> No.  How many cells do CJK ideographs occupy?  We've had a strong hint
> that a medial BEH should occupy one cell, while an isolated BEH should
> occupy two.

Monospaced Arabic fonts (there are not that many of them) are designed
so that all forms occupy just one cell (most even including the mandatory
lam-alef ligatures), unlike CJK fonts.

I can imagine the terminal restricting itself to monspaced fonts,
disable “liga” feature just in case, and expect the font to well behave.
Any other magic is likely to fail.

Regards,
Khaled

Re: Encoding italic

2019-02-01 Thread James Kass via Unicode




On 2019-01-31 3:18 PM, Adam Borowski via Unicode wrote:

> They're only from a spammer's point of view.

Spammers need love, too.  They’re just not entitled to any.

Re: Proposal for BiDi in terminal emulators

2019-01-31 Thread Eli Zaretskii via Unicode

> Date: Thu, 31 Jan 2019 23:17:19 +
> From: Richard Wordingham via Unicode 
> 
> Emacs needs a lot of help - I can't write a generic Tai Tham
> OpenType .flt file :-(

Which is why Emacs is migrating towards HarfBuzz.

Re: Proposal for BiDi in terminal emulators

2019-01-31 Thread Richard Wordingham via Unicode

On Thu, 31 Jan 2019 12:46:48 +0100
Egmont Koblinger  wrote:

> Hi Richard,
> 
> > Basic Arabic shaping, at the level of a typewriter, is
> > straightforward enough to leave to a terminal emulator, as Eli has
> > suggested.  
> 
> What is "basic" Arabic shaping exactly?

Just using initial, medial and final forms, with no vertical stacking,
In terms of glyphs, none of glyphs of the presentation forms with
'LIGATURE' in the name would be used.

> I can see problems with leaving it to a terminal. It's not aware of
> the neighboring character if the string is cropped.

Cropped why?  If the problem is the truncation of lines, one can simple
store the next character.

> It's not able to
> separate different UI elements that happen to be adjacent in the
> terminal, separated by different background color or such.

ZWJ and ZWNJ can handle that.

> On the other hand, let's reverse the question:
> 
> "Basic Arabic shaping, at the level of a typewriter, is
> straightforward enough to be implemented in the application, using
> presentation form characters, as I suggest". Could you please point
> out the problems with this statement?

Apart from using presentation form characters in raw text being a sin?

If a general text manipulating application, e.g. cat, grep or awk, is
writing to a file, it should not convert normal Arabic characters to
presentation forms.  You are now asking a general application to
determine whether it is writing to a terminal or not, and alter its
output if it is writing to a terminal.  If the terminal window is
actually an emacs text buffer, I would not want such output to be
converted.  It is entirely natural to convert an emacs text buffer to
a file. 

> > I believe combining marks present issues even in implicit modes.  In
> > implicit mode, one cannot simply delegate the task to normal text
> > rendering, for one has to allocate text to cells.  There are a
> > number of complications that spring to mind:
> >
> > 1) Some characters decompose to two characters that may otherwise
> > lay claim to their own cells:
> >
> > U+06D3 ARABIC LETTER YEH BARREE WITH HAMZA ABOVE decomposes to
> > <06D2,  
> > 0654>.  Do you intend that your scheme be usable by
> > Unicode-compliant processes?  
> 
> Decompose during which step? During shaping?
> 
> Or do you mean they are NFC-NFD counterparts of each other?

The latter.

> > 4) Indic conjuncts.
> > (i) There are some conjuncts, such as Devanagari K.SSA, where a
> > display as ,  is simply unacceptable.  In some
> > closely related scripts, this conjunct has the status of a
> > character.  
> 
> We (in GNOME Terminal / VTE) do have an open bug about Devanagari
> spacing marks (currently they don't show up properly), plus Virama and
> friends. I'd like to address the essentials along with the BiDi
> implementation; although here we should discuss the design and not a
> particular implementation thereof :)
> 
> In case you're interested, at
> https://bugzilla.gnome.org/show_bug.cgi?id=584160 comments 45-48, 95
> and perhaps a few others comments I wondered whether certain joining
> operations should be done on the emulation layer or the display layer.
> The answer is not yet clear. We can't fix suddenly everything, but
> it's nice to move forward step by step. It's also proposed that we
> used HarfBuzz, but it's unclear to me at this point how the grid
> alignment could be preserved in the mean time.

Thanks for the link.

There are two different beasties.  There are text windows into which
the user and the application communicate using text, and this text
tends to be rendered properly, as one might aim to do with HarfBuzz,
and as an Emacs text buffer running the shell tries to do.  (Emacs
needs a lot of help - I can't write a generic Tai Tham OpenType .flt
file :-(  In my opinion, these are highly appropriate for application
like diff, grep and cat.  Do we have a good name for them/  They are,
perhaps, 'teletype emulators'.

> "simply unacceptable" – I'm not familiar with those languages,
> cultures and so on, but I'd be hesitant to go as far as calling
> anything "unacceptable". E.g. there's a physical typewriter in our
> family, as far as I remember it has no digits 1 or 0 (use the letters
> lowercase L and anycase O instead), it doesn't contain all the
> accented letters of my mother tounge so sometimes a similarly looking
> one has to be used. In today's computer world, I'd say such
> limitations are "unacceptable", but at that time this was what we had
> to live with.
> 
> Terminal emulators, due to their strict character grid nature and
> their legacy behavior of many decades, are a platform where a certain
> level of compromise might be necessary for some scripts. I cannot tell
> where to draw the line, cannot tell what is "extremely bad" vs. "not
> nice" vs. "kind of okay but could be better", but we can't do
> everything in a terminal emulator that a graphical app could do. If
> someone wants to have a pixel perfect look, terminal emulators are

Re: Proposal for BiDi in terminal emulators

2019-01-31 Thread Richard Wordingham via Unicode

On Thu, 31 Jan 2019 08:28:41 +
Martin J. Dürst via Unicode  wrote:

> > Basic Arabic shaping, at the level of a typewriter, is
> > straightforward enough to leave to a terminal emulator, as Eli has
> > suggested.  Lam-alif would be trickier - one cell or two?  
> 
> Same for other characters. A medial Beh/Teh/Theh/... (ببب) in any 
> reasonably decent rendering should take quite a bit less space than a 
> Seen or Sheen (سسس). I remember that the multilingual Emacs version 
> mostly written by Ken'ichi Handa (was it called mEmacs or nEmacs or 
> something like that?) had different widths only just for Arabic. In 
> Thunderbird, which is what I'm using here, I get hopelessly 
> stretched/squeezed glyph shapes, which definitely don't look good.

It's a long time since I last knowingly read typewritten Arabic script,
but on reading the description of Haddad's design of the Arabic
typewriter, I see what you mean.  My point is correct, but your point
is another argument for having single- and double-width characters.

Richard.

Re: Encoding italic

2019-01-31 Thread Asmus Freytag via Unicode


  
  
On 1/31/2019 12:55 AM, Tex via Unicode
  wrote:


  As with the many problems with walls not being effective, you choose to ignore the legitimate issues pointed out on the list with the lack of italic standardization for Chinese braille, text to voice readers, etc.
The choice of plain text isn't always voluntary. And the existing alternatives, like math italic characters, are problematic.

The underlying issue is the lack of rich
text support in places where users expect rich text.
The solution is to find ways to enable rich
text layers that are not full documents and make them
interoperable.
The solution is not to push this into plain
text - which then becomes lowest common denominator rich text
instead.
A./

RE: Encoding italic

2019-01-31 Thread Doug Ewell via Unicode

Kent Karlsson wrote:

> ITU T.416/ISO/IEC 8613-6 defines general RGB & CMY(K) colour control
> sequences, which are deferred in ECMA-48/ISO 6429. (The RGB one
> is implemented in Cygwin (sorry for mentioning a product name).)

Fair enough. This thread is mostly about italics and bold and such, not
colors, but the point is well taken that one of these leads invariably
to the others, especially if the standard or flavor in question
implements them.

> ECMA-48/ISO 6429 defines control sequences for CJK emphasising, which
> traditionally does not use bold or italic.

But that's OK. For low-level mechanisms like these, it should be
incumbent on the user to say, "Yes, I can use this styling with that
script, but I shouldn't; it would look terrible and would fly in the
face of convention." ISO 6429 also allows green text on a cyan
background, which is about as good an idea as CJK italics.

> Compare those specified for CSS
> (https://www.w3.org/TR/css-text-decor-3/#propdef-text-decoration-style and
> https://www.w3.org/TR/css-text-decor-3/#propdef-text-emphasis-style).
> These are not at all mentioned in ITU T.416/ISO/IEC 8613-6, but should
> be of interest for the generalised subject of this thread.

I'm hoping we can continue to restrict this thread to plain text.

--
Doug Ewell | Thornton, CO, US | ewellic.org

Re: Proposal for BiDi in terminal emulators

2019-01-31 Thread Doug Ewell via Unicode

Egmont Koblinger wrote:

> "Basic Arabic shaping, at the level of a typewriter, is
> straightforward enough to be implemented in the application, using
> presentation form characters, as I suggest". Could you please point
> out the problems with this statement?

As multiple people have pointed out, Arabic presentation forms don't
cover the whole Arabic script and are not generally recommended for new
applications, though they are not formally deprecated.

If you take a look at the parallel discussion about italics in plain
text, you will see a corollary in the use of Mathematical Alphanumeric
Symbols: they look tempting and are (usually) easy to render, but among
other things, they only cover [A-Za-zıȷΑ-Ωα-ω] and thus miss much
of the text that may need to be italicized.

--
Doug Ewell | Thornton, CO, US | ewellic.org

Re: Proposal for BiDi in terminal emulators

2019-01-31 Thread Ken Whistler via Unicode




On 1/31/2019 1:41 AM, Egmont Koblinger via Unicode wrote:

I mean, for
example we can introduce control characters that specify the language.


That is a complete non-starter for the Unicode Standard. And if the 
terminal implementation introduces such as one-off hacks, they will fail 
completely for interoperability.


https://en.wikipedia.org/wiki/IETF_language_tag

That belongs to the markup level, not to the plain text stream.

--Ken

Re: Proposal for BiDi in terminal emulators

2019-01-31 Thread Eli Zaretskii via Unicode

> Date: Thu, 31 Jan 2019 10:58:54 +0100
> Cc: unicode@unicode.org
> From: Egmont Koblinger via Unicode 
> 
> Yes, I do argue that emacs will need to print a new escape sequence.
> Which is much-much-much-much-much better than having to tell users to
> go into the settings of their macOS Terminal / Konsole /
> gnome-terminal etc. and disable BiDi there, isn't it?

I'm not sure I agree.  Most users can disable bidi reordering of the
terminal once and for all.  They don't need it.

If terminals supported some control sequence to turn on and off the
reordering, it might be a useful feature to support such sequences.
But IME just querying the emulator whether it supports that or not is
a hassle, and generally slows down the application startup.  So it's a
mixed blessing.

> > On the other hand, all that the program can output is a sequence of Unicode
> > codepoints.  These don't include shaping information
> 
> With "presentation form" characters, yes, they can, they do including
> shaping information.

Let's please stop talking about presentation forms, they solve only a
small part of the shaping problem.

> > and it's
> > the terminal who's equipped with _most_ of the needed data
> 
> Why? It's the app that knows the context characters, it's the app that
> knows the language.

But only the emulator knows which fonts it uses, and only the emulator
can access the information about the font, like what OTF features it
supports, what glyphs it has, etc.

Re: Encoding italic

2019-01-31 Thread Adam Borowski via Unicode

On Thu, Jan 31, 2019 at 02:21:40PM +, James Kass via Unicode wrote:
> David Starner wrote,
> > The choice of using single-byte character sets isn't always voluntary.
> > That's why we should use ISO-2022, not Unicode. Or we can expect
> > people to fix their systems. What systems are we talking about, that
> > support Unicode but compel you to use plain text? The use of Twitter
> > is surely voluntary.
> 
> This marketing-related web page,
> 
> https://litmus.com/blog/best-practices-for-plain-text-emails-a-look-at-why-theyre-important
> 
> ...lists various reasons for using plain-text e-mail.

They're only from a spammer's point of view.

> Besides marketing, there’s also newsletters and e-mail discussion groups. 
> Some of those discussion groups are probably scholarly. Anyone involved in
> that would likely embrace ‘super cool Unicode text magic’ and it’s
> surprising if none of them have stumbled across the math alphanumerics yet.

Then there are technical mailing lists.  In particular, on every single list
other than Unicode I'm subscribed to, a HTML-only mail would get you flamed
by several list members; even a plain+HTML alternative can get you an
earful.

Then there's LKML and other lists hosted at vger, where a mail that as much
as has a HTML version attached will get outright rejected at mail software
level.

After 2½ decades of participating mailing in mailing lists, I got aversion
to HTML mails burned in as a kind of involuntary reflex.  Upon seeing Asmus'
mails, the ingrained reflex kicks in, I start getting upset, only to realize
what list I'm reading and that it's him who's a regular here, not me.

So even when in principle adding such features would be possible, many
communities decide to prefer interoperability over newest types of bling.
Some prefer top-posted HTML mails, some prefer Twitter, some Unicode plain
text, some perhaps want plain ASCII only.

> It’s true that people don’t have to use Twitter.  People don’t have to turn
> on their computers, either.

And sometimes they use a Braille reader or a text console.

Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Remember, the S in "IoT" stands for Security, while P stands
⢿⡄⠘⠷⠚⠋⠀ for Privacy.
⠈⠳⣄

Re: Proposal for BiDi in terminal emulators

2019-01-31 Thread Eli Zaretskii via Unicode

> From: Egmont Koblinger 
> Date: Thu, 31 Jan 2019 10:41:02 +0100
> Cc: Frédéric Grosshans , 
>   unicode@unicode.org
> 
> > Personally, I think we should simply assume that complex script
> > shaping is left to the terminal, and if the terminal cannot do that,
> > then that's a restriction of working on a text terminal.
> 
> I cannot read any of the Arabic, Syriac etc. scripts, but I did lots
> of experimenting with picking random Arabic words, displaying in the
> terminal their unshaped as well as shaped (using presentation form
> characters) variants, and compared them to pango-view's (harfbuzz's)
> rendering.
> 
> To my eyes the version I got in the terminal with the presentation
> form characters matched the "expected" (pango-view) rendering
> extremely closely.

I suggest that you show the result to someone who does read Arabic.
Small changes can be very unpleasant to the eyes of an Arabic reader.

As for Arabic presentation forms, I already explained why they cannot
be considered a solution that is anywhere near the complete one.

> > OTOH a terminal emulator who wants to perform shaping needs
> > information from the application
> 
> And the presentation form characters are pretty much exactly that
> information, aren't they (for Arabic)?

Much more is needed for correct shaping.

> Instead of saying that it's not possible, could we perpahs try to
> solve it, or at least substantially improve the situation? I mean, for
> example we can introduce control characters that specify the language.
> We can introduce a flag that tells the terminal whether to do shaping
> or not. There are probably plenty of more ideas to be thrown in for
> discussion and improvement.

You could do that, but it will require a lot of non-trivial processing
from the applications.  Text-mode applications don't want any complex
tinkering, they want just to write their text and be done.  The more
overhead you add to that simple task, the less probable it is that
applications will support such a terminal.

Re: Proposal for BiDi in terminal emulators

2019-01-31 Thread Eli Zaretskii via Unicode

> From: Egmont Koblinger 
> Date: Thu, 31 Jan 2019 10:28:27 +0100
> Cc: Adam Borowski , unicode@unicode.org
> 
> On Wed, Jan 30, 2019 at 5:10 PM Eli Zaretskii  wrote:
> 
> > I think the application could use TAB characters to get to the next
> > cell, then simplistic reordering would also work.
> 
> TAB handling is extremely complicated, because in terminal emulation
> TAB is not a character, TAB is a control instruction (like escape
> sequences) that moves the cursor (and jumps through the existing
> content, if any, without erasing it).

The reordering happens before TABs are converted to cursor motion,
does it not?  If so, their effect on reordering, by virtue of the TAB
being Segment Separator for the UBA purposes, could happen
nonetheless.  Right?

Re: Proposal for BiDi in terminal emulators

2019-01-31 Thread Eli Zaretskii via Unicode

> From: Egmont Koblinger 
> Date: Thu, 31 Jan 2019 10:21:52 +0100
> Cc: Adam Borowski , unicode@unicode.org
> 
> > Does anyone know of a terminal emulator which supports isolates?
> 
> GNOME Terminal's (VTE's) current work-in-progress implementation does
> remember BiDi control characters just like it remembers combining
> accents, that is, tied to the preceding letter's cell. It uses FriBidi
> 1.0 for the BiDi work, so yes, it supports Unicode 6.3's isolates.

So we will some day have one such terminal emulator.  That's good, but
a text-mode application that needs to support bidi cannot rely on its
users all having access to that single terminal.

> There's one significant issue, though. Because we currently just
> misuse our existing infrastructure of combining accents for the BiDi
> controls, BiDi controls at the very beginning of a paragraph are
> dropped. Addressing this issue would need core changes to the terminal
> emulation behavior, such as introducing in-between-cells storage, or
> zero-width special characters belonging to a cell _before_ the cell's
> actual letter, or something like this. I outline one idea in my
> specification, but it's subject to discussion to finalize it.

This is indeed a significant issue, because it means applications
cannot force the terminal use a certain non-default base paragraph
direction.

Re: Proposal for BiDi in terminal emulators

2019-01-31 Thread Eli Zaretskii via Unicode

> From: Egmont Koblinger 
> Date: Thu, 31 Jan 2019 10:11:22 +0100
> Cc: unicode@unicode.org
> 
> > It doesn't do _any_ shaping.  Complex script shaping is left to the
> > terminal, because it's impossible to do shaping in any reasonable way
> > [...]
> 
> Partially, you are right. On the other hand, as far as I know, shaping
> should take into account the neighboring glyphs even if those are not
> visible (e.g. overflow from the viewport), and the terminal is unaware
> of what those glyps are. This is an area that "presentation form"
> characters can address for Arabic – although as it was pointed out,
> not for Syrian and some others.

Arabic presentation forms are more like an exception than a rule, I
hope you understand this by now.  Most languages/scripts don't have
such forms, and even for Arabic they cover only a part of what needs
to be done to present correctly shaped text.  Complex script shaping
is much more than just substituting some glyphs with others, it
requires an intimate knowledge of the font being used and its
capabilities, and the ability to control how various glyphs of a
grapheme cluster are placed relative to one another, something that an
application running on a text terminal cannot do.

So I suggest that you don't consider Arabic presentation forms a
representative of the direction in which terminal emulators supporting
such scripts should evolve.

Re: Encoding italic

2019-01-31 Thread wjgo_10...@btinternet.com via Unicode

Is the way to try to resolve this for a proposal document to be produced 
for using Variation Selector 14 in order to produce italics and for the 
proposal document to be submitted to the Unicode Technical Committee?


If the proposal is allowed to go to the committee rather than being 
ruled out of scope, then we can know whether the Unicode Technical 
Committee will allow the encoding.


William Overington

Thursday 31 January 2019

Re: Proposal for BiDi in terminal emulators

2019-01-31 Thread Frédéric Grosshans via Unicode


Le 31/01/2019 à 10:41, Egmont Koblinger a écrit :

Hi,


Personally, I think we should simply assume that complex script
shaping is left to the terminal, and if the terminal cannot do that,
then that's a restriction of working on a text terminal.

I cannot read any of the Arabic, Syriac etc. scripts, but I did lots
of experimenting with picking random Arabic words, displaying in the
terminal their unshaped as well as shaped (using presentation form
characters) variants, and compared them to pango-view's (harfbuzz's)
rendering.

To my eyes the version I got in the terminal with the presentation
form characters matched the "expected" (pango-view) rendering
extremely closely. Of course there's still some tradeoffs due to fixed
with cells (just as in English, arguably an "i" and "w" having the
same width doesn't look as nice as with proportional fonts). In the
mean time, the unshaped characters looks vastly differently.


OTOH a terminal emulator who wants to perform shaping needs
information from the application

And the presentation form characters are pretty much exactly that
information, aren't they (for Arabic)?


There's nothing you can do here [...] there's no way for the application to 
provide

Instead of saying that it's not possible, could we perpahs try to
solve it, or at least substantially improve the situation? I mean, for
example we can introduce control characters that specify the language.
We can introduce a flag that tells the terminal whether to do shaping
or not. There are probably plenty of more ideas to be thrown in for
discussion and improvement.


cheers,
egmont

Re: Encoding italic

2019-01-31 Thread James Kass via Unicode




David Starner wrote,

> The choice of using single-byte character sets isn't always voluntary.
> That's why we should use ISO-2022, not Unicode. Or we can expect
> people to fix their systems. What systems are we talking about, that
> support Unicode but compel you to use plain text? The use of Twitter
> is surely voluntary.

This marketing-related web page,

https://litmus.com/blog/best-practices-for-plain-text-emails-a-look-at-why-theyre-important

...lists various reasons for using plain-text e-mail.  Here’s an excerpt.

“Some people simply prefer it. Plain and simple—some people prefer text 
emails. ... Some users may also see HTML emails as a security and 
privacy risk, and choose not to load any images and have visibility over 
all links that are included in an email. In addition, the increased 
bandwidth that image-heavy emails tend to consume is another driver of 
why users simply prefer plain-text emails.”


Besides marketing, there’s also newsletters and e-mail discussion 
groups.  Some of those discussion groups are probably scholarly. Anyone 
involved in that would likely embrace ‘super cool Unicode text magic’ 
and it’s surprising if none of them have stumbled across the math 
alphanumerics yet.


A web search for the string “plain text only” leads to all manner of 
applications for which searchers are trying to control their 
environments.  There’s all kinds of reasons why some people prefer to 
use plain-text, it’s often an informed choice and it isn’t limited to 
e-mail.


It’s true that people don’t have to use Twitter.  People don’t have to 
turn on their computers, either.

Re: Proposal for BiDi in terminal emulators

2019-01-31 Thread Egmont Koblinger via Unicode

Hi Richard,

> Basic Arabic shaping, at the level of a typewriter, is straightforward
> enough to leave to a terminal emulator, as Eli has suggested.

What is "basic" Arabic shaping exactly?

I can see problems with leaving it to a terminal. It's not aware of
the neighboring character if the string is cropped. It's not able to
separate different UI elements that happen to be adjacent in the
terminal, separated by different background color or such.

On the other hand, let's reverse the question:

"Basic Arabic shaping, at the level of a typewriter, is
straightforward enough to be implemented in the application, using
presentation form characters, as I suggest". Could you please point
out the problems with this statement?

> I believe combining marks present issues even in implicit modes.  In
> implicit mode, one cannot simply delegate the task to normal text
> rendering, for one has to allocate text to cells.  There are a number
> of complications that spring to mind:
>
> 1) Some characters decompose to two characters that may otherwise lay
> claim to their own cells:
>
> U+06D3 ARABIC LETTER YEH BARREE WITH HAMZA ABOVE decomposes to <06D2,
> 0654>.  Do you intend that your scheme be usable by Unicode-compliant
> processes?

Decompose during which step? During shaping?

Or do you mean they are NFC-NFD counterparts of each other?

Most terminal emulators are able to handle combining accents, and of
course implicit mode would take them into account when rearranging the
letters. Terminal emulators don't do explicit (de)composing, a.k.a.
NFC->NFD or NFD->NFC conversion (at least I'm not aware of any that
does).

> 4) Indic conjuncts.
> (i) There are some conjuncts, such as Devanagari K.SSA, where a
> display as ,  is simply unacceptable.  In some
> closely related scripts, this conjunct has the status of a character.

We (in GNOME Terminal / VTE) do have an open bug about Devanagari
spacing marks (currently they don't show up properly), plus Virama and
friends. I'd like to address the essentials along with the BiDi
implementation; although here we should discuss the design and not a
particular implementation thereof :)

In case you're interested, at
https://bugzilla.gnome.org/show_bug.cgi?id=584160 comments 45-48, 95
and perhaps a few others comments I wondered whether certain joining
operations should be done on the emulation layer or the display layer.
The answer is not yet clear. We can't fix suddenly everything, but
it's nice to move forward step by step. It's also proposed that we
used HarfBuzz, but it's unclear to me at this point how the grid
alignment could be preserved in the mean time.

"simply unacceptable" – I'm not familiar with those languages,
cultures and so on, but I'd be hesitant to go as far as calling
anything "unacceptable". E.g. there's a physical typewriter in our
family, as far as I remember it has no digits 1 or 0 (use the letters
lowercase L and anycase O instead), it doesn't contain all the
accented letters of my mother tounge so sometimes a similarly looking
one has to be used. In today's computer world, I'd say such
limitations are "unacceptable", but at that time this was what we had
to live with.

Terminal emulators, due to their strict character grid nature and
their legacy behavior of many decades, are a platform where a certain
level of compromise might be necessary for some scripts. I cannot tell
where to draw the line, cannot tell what is "extremely bad" vs. "not
nice" vs. "kind of okay but could be better", but we can't do
everything in a terminal emulator that a graphical app could do. If
someone wants to have a pixel perfect look, terminal emulators are not
for them. Maybe looking at typewriters of those scripts could be a
good starting point. Anyway, we've drifted quite far away.


What I've already implemented in VTE (in a work-in-progress branch),
and to my eyes looks quite nice, is Arabic shape using presentation
form characters as done by FriBidi (in implicit mode only). According
to the API of this library, this shaping process keeps a 1:1 mapping
between the original and shaped letters (at least the number of
Unicode codepoints – I haven't double checked their terminal width,
but I really hope they don't mess with us here). That is, I don't have
to deal with a character cell splitting into two, or two character
cells joining into one during shaping. Does this sound okay so far?


cheers,
egmont

Re: Proposal for BiDi in terminal emulators

2019-01-31 Thread Egmont Koblinger via Unicode

On Thu, Jan 31, 2019 at 10:05 AM Richard Wordingham via Unicode
 wrote:

> > How will "ls -l" possibly work?  This is an example of the "table"
> > layout you were already discussing.
>
> I think the answer is that it will use the same trickery as with a
> default setting for the --color argument.  Colour codes are emitted
> only when the output is a terminal.  Presumably the same would go for
> Bidi controls.

Exactly, that's what I have in mind in the long run. If coreutils
folks like the idea, "ls" could have a new option
--bidi=never/auto/always. With BiDi mode, it would enclose each of the
logical segments of strings that potentially contain RTL text
(filenames, dates etc.) separately inside an FSI...PDI block. That way
its output would look as desired (over the terminal's new default
"implicit" mode), since the terminal would take care of BiDi-ing each
FSI...PDI block.


cheers,
egmont

Re: Proposal for BiDi in terminal emulators

2019-01-31 Thread Egmont Koblinger via Unicode

Hi,

> > And if you argue "so make emacs print your
> > new code to disable formatting", so do thousands of other programs that are
> > less sophisticated than emacs.
>
> Yes, I do argue that emacs will need to print a new escape sequence.
> Which is much-much-much-much-much better than having to tell users to
> go into the settings of their macOS Terminal / Konsole /
> gnome-terminal etc. and disable BiDi there, isn't it?

Let me phrase it slightly differently. Emacs will not "need" to print
a new escape sequence, but will have the possibility to do so.

VTE is pretty certainly going to switch its default behavior to what
Konsole, PuTTY, Mlterm, macOS Terminal do now: to perform BiDi on its
contents. This mode is not suitable for Emacs or for any BiDi-aware
text editor.

Similarly to these terminal emulators, GNOME Terminal (and hopefully
other VTE-based frontends) will also most likely have a user setting
to force disable BiDi.

But as opposed to the aforementioned terminals, VTE will also turn off
BiDi upon a designated escape sequence.

VTE is the terminal widget behind several emulator apps, such as GNOME
Terminal, Xfce Terminal, Tilix, Terminator, Guake... I don't have
metrics, but according to various user polls I have the feeling that
VTE's usage share among Linux users is pretty significant, somewhere
in the ballpark of 50%.

Of course Emacs, or any other text editor, can still point its users
to the terminal's setting to disable BiDi. And then if the user also
wishes to have BiDi for "cat", they'll have to keep toggling it back
and forth. Or Emacs can emit the new escape sequence and then it will
be fully automatic.

Which one puts less supporting burden on Emacs's developers and
supporters? Which one is the better for the users? I think the answer
is the same to these two questions, and you sure know which answer I'm
thinking of.

According to this specification, nothing is going to be "worse" than
it already is in those few aforementioned terminal emulators. The new
default behavior will be the same as their behavior. We'll just
further extend it with the possibility of switching back to the old
mode without annoying the user.

I hope this clarifies a lot.


cheers,
egmont

Re: Encoding italic

2019-01-31 Thread James Kass via Unicode




David Starner wrote,

> Emoji, as have been pointed out several times, were in the original
> Unicode standard and date back to the 1980s; the first DOS character
> page has similes at 0x01 and 0x02.

That's disingenuous.

Re: Proposal for BiDi in terminal emulators

2019-01-31 Thread Egmont Koblinger via Unicode

Hi,

> Arabic terminals and terminal emulators existed at the time of Unicode 1.0.

I haven't found any mention of them, let alone any documentation about them.

> If you are trying to emulate those services, for example so that older 
> software can run, you would need to look at how these programs expected to be 
> fed their data.

My goal is not to have those ancient software run. My goal is to look
into the future. Address the requests often seen in current terminal
emulator's bugtrackers. Stop the utterly unacceptable user experience
of current self-proclaimed BiDi-aware terminals where in order to run
Emacs you need to fiddle with the terminal's settings. Show that BiDi
in terminals is a much more complex story than just shuffling around
the characters, thus stopping new emulators from taking this broken
route which causes about as much damage as good. Create a platform on
top of which modern BiDi-aware experience can be created, to make both
"cat" and "emacs" work properly out of the box for BiDi users.

> I see little reason to reinvent things here, because we are talking about 
> emulating legacy hardware. Or are we not?

As per the above, no, not really. I'm not aware of any hardware that
supported BiDi, was there any? I look at terminal emulators as
extremely powerful tools for getting all kinds of work done. They are
continuously being improved, nowadays many terminal emulators contain
plenty of features that weren't there in any hardware one. I'm looking
for smoothlessly extending the terminal emulator experience to the RTL
/ BiDi world.

> It's conceivable, that with modern fonts, one can show some characters that 
> could not be supported on the actual legacy hardware, because that was 
> limited by available character memory and available pre-Unicode character 
> sets. As long as the new characters otherwise fit the paradigm (character per 
> cell) they can be supported without other changes in the protocol beyond 
> change in character set.

Which protocol, the protocol of non-BiDi-aware terminals that lays out
everything from left to right, so the output of "echo", "cat" etc. are
reversed; or the protocol of self-proclaimed BiDi-aware terminals
where it's literally impossible to create a proper BiDi-aware text
editor?

My work focuses on proving that both of these modes are needed, and
how the necessary mode switches could happen automatically behind the
scenes.

> However, I would not expect an emulator to accept data in NFD for example.

Many emulators do.


cheers,
egmont

Re: Encoding italic

2019-01-31 Thread David Starner via Unicode

On Thu, Jan 31, 2019 at 12:56 AM Tex  wrote:
>
> David,
>
> "italics has never been considered part of plain text and has always been 
> considered outside of plain text. "
>
> Time to change the definition if that is what is holding you back.

That's not a definition; that's a fact. Again, it's like the 8-bit
byte; there are systems with other sizes of byte, but you usually
shouldn't worry about it. Building systems that don't have 8-bit bytes
are possible, but it's likely to cost more than it's worth.

> As has been said before, interlinear annotation, emoji and other features of 
> Unicode which  are now considered plain text were not in the original 
> definition.

https://www.w3.org/TR/unicode-xml/#Interlinear (which used to be
Unicode Technical Report #20) says "The interlinear annotation
characters were included in Unicode only in order to reserve code
points for very frequent application-internal use. ... Including
interlinear annotation characters in marked-up text does not work
because the additional formatting information (how to position the
annotation,...) is not available. ... The interlinear annotation
characters are also problematic when used in plain text, and are not
intended for that purpose."

Emoji, as have been pointed out several times, were in the original
Unicode standard and date back to the 1980s; the first DOS character
page has similes at 0x01 and 0x02.

> If Unicode encoded an italic mechanism it would be part of plain text, just 
> as the many other styled spaces, dashes and other characters have become 
> plain text despite being typographic.

If Unicode encoded an italic mechanism, then some "plain text" would
include italics. Maybe it would be successful, and maybe it would join
the interlinear annotation characters as another discouraged poorly
supported feature.

> As with the many problems with walls not being effective, you choose to 
> ignore the legitimate issues pointed out on the list with the lack of italic 
> standardization for Chinese braille, text to voice readers, etc.

Text to voice readers don't have problems with the lack of italic
standardization; they have problems with people using mathematical
characters instead of actual letters.

> The choice of plain text isn't always voluntary.

The choice of using single-byte character sets isn't always voluntary.
That's why we should use ISO-2022, not Unicode. Or we can expect
people to fix their systems. What systems are we talking about, that
support Unicode but compel you to use plain text? The use of Twitter
is surely voluntary.

-- 
Kie ekzistas vivo, ekzistas espero.

Re: Proposal for BiDi in terminal emulators

2019-01-31 Thread Egmont Koblinger via Unicode

Hi,

On Wed, Jan 30, 2019 at 5:31 PM Adam Borowski  wrote:

> The program (emacs in this case) can do arbitrary reordering of characters
> on the grid, it also has lots of information the terminal doesn't.  For
> example, what are you going to do when there's a line longer than what fits
> on the screen?  Emacs will cut and hide part of it; any attempts to reorder
> that paragraph by the terminal are outright broken as you don't _have_ the
> paragraph.  Same for a popup window on the middle of the screen partially
> obscuring some text underneath.

This is absolutely correct so far.

> And if you argue "so make emacs print your
> new code to disable formatting", so do thousands of other programs that are
> less sophisticated than emacs.

Yes, I do argue that emacs will need to print a new escape sequence.
Which is much-much-much-much-much better than having to tell users to
go into the settings of their macOS Terminal / Konsole /
gnome-terminal etc. and disable BiDi there, isn't it?

Could you please give me a brief idea about those "thousands of other
programs" that will need to be adjusted? What other apps can do BiDi?
Not even Vim/NeoVim can do it.

If an app doesn't support BiDi, it's broken anyways when encountering
RTL text. It'll still be broken, just broken differently. Did you mean
all these programs as those thousands?

For ncurses apps there's also a workaround that you could apply:
create a terminfo where the ti/te entries not only switch to/from the
alternate screen but also disable/enable BiDi. In that case all these
thousand ones will be "fixed" (that is: broken in the "old" way rather
than broken in the "new" way).

On the other hand, what you absolutely can *not* do automatically by
emitting escape sequences at the right times, is to enclose the output
of much lighter utilities like "echo", "cat", "grep", "head" and so on
with any kind of BiDi controls.

> On the other hand, all that the program can output is a sequence of Unicode
> codepoints.  These don't include shaping information

With "presentation form" characters, yes, they can, they do including
shaping information.

> and are not supposed
> to.  The shaping is explicitly meant to be done by the terminal,

Why?

> and it's
> the terminal who's equipped with _most_ of the needed data

Why? It's the app that knows the context characters, it's the app that
knows the language.

What is it that the terminal knows, but the app doesn't although
should, or what is it that the terminal doesn't know if presentation
form characters are used?

What is it that the app knows but cannot pass to the terminal?
Shouldn't we then extend the protocol so that it can pass these, too?

e.

Re: Proposal for BiDi in terminal emulators

2019-01-31 Thread Egmont Koblinger via Unicode

Hi,

> Personally, I think we should simply assume that complex script
> shaping is left to the terminal, and if the terminal cannot do that,
> then that's a restriction of working on a text terminal.

I cannot read any of the Arabic, Syriac etc. scripts, but I did lots
of experimenting with picking random Arabic words, displaying in the
terminal their unshaped as well as shaped (using presentation form
characters) variants, and compared them to pango-view's (harfbuzz's)
rendering.

To my eyes the version I got in the terminal with the presentation
form characters matched the "expected" (pango-view) rendering
extremely closely. Of course there's still some tradeoffs due to fixed
with cells (just as in English, arguably an "i" and "w" having the
same width doesn't look as nice as with proportional fonts). In the
mean time, the unshaped characters looks vastly differently.

> OTOH a terminal emulator who wants to perform shaping needs
> information from the application

And the presentation form characters are pretty much exactly that
information, aren't they (for Arabic)?

> There's nothing you can do here [...] there's no way for the application to 
> provide

Instead of saying that it's not possible, could we perpahs try to
solve it, or at least substantially improve the situation? I mean, for
example we can introduce control characters that specify the language.
We can introduce a flag that tells the terminal whether to do shaping
or not. There are probably plenty of more ideas to be thrown in for
discussion and improvement.


cheers,
egmont

Re: Proposal for BiDi in terminal emulators

2019-01-31 Thread Egmont Koblinger via Unicode

Hi Eli,

On Wed, Jan 30, 2019 at 5:10 PM Eli Zaretskii  wrote:

> I think the application could use TAB characters to get to the next
> cell, then simplistic reordering would also work.

TAB handling is extremely complicated, because in terminal emulation
TAB is not a character, TAB is a control instruction (like escape
sequences) that moves the cursor (and jumps through the existing
content, if any, without erasing it). Some terminal emulators perform
some magic to remember TABs in certain circumstances, but they cannot
always do so.

There are plenty of other problems, e.g. how they are handled at the
end of line (no, they don't wrap to the next line), how their
positions are user-configurable and not necessarily at every 8th
column etc., I'm not going into these details now if you don't mind,
it's just not a feasible approach.

cheers,
egmont

Re: Proposal for BiDi in terminal emulators

2019-01-31 Thread Egmont Koblinger via Unicode

Hi Eli,

> Does anyone know of a terminal emulator which supports isolates?

GNOME Terminal's (VTE's) current work-in-progress implementation does
remember BiDi control characters just like it remembers combining
accents, that is, tied to the preceding letter's cell. It uses FriBidi
1.0 for the BiDi work, so yes, it supports Unicode 6.3's isolates.

There's one significant issue, though. Because we currently just
misuse our existing infrastructure of combining accents for the BiDi
controls, BiDi controls at the very beginning of a paragraph are
dropped. Addressing this issue would need core changes to the terminal
emulation behavior, such as introducing in-between-cells storage, or
zero-width special characters belonging to a cell _before_ the cell's
actual letter, or something like this. I outline one idea in my
specification, but it's subject to discussion to finalize it.

(There's also a less significant issue: copy-pasting fragments of text
probably doesn't produce the contents that make the most sense wrt.
BiDi controls. I'm not sure what other software do here, though.)

Mintty is also actively working on BiDi support, I believe its author
just recently added support for isolates. It uses its own BiDi
implementation.


cheers,
egmont

Re: Proposal for BiDi in terminal emulators

2019-01-31 Thread Egmont Koblinger via Unicode

Hi Eli,

> It doesn't do _any_ shaping.  Complex script shaping is left to the
> terminal, because it's impossible to do shaping in any reasonable way
> [...]

Partially, you are right. On the other hand, as far as I know, shaping
should take into account the neighboring glyphs even if those are not
visible (e.g. overflow from the viewport), and the terminal is unaware
of what those glyps are. This is an area that "presentation form"
characters can address for Arabic – although as it was pointed out,
not for Syrian and some others.

I'd say it's subject to further research and improvement to find the
ideal behavior.


cheers,
egmont

Re: Proposal for BiDi in terminal emulators

2019-01-31 Thread Richard Wordingham via Unicode

On Wed, 30 Jan 2019 20:35:36 -0500
"Mark E. Shoulson via Unicode"  wrote:

> On 1/30/19 8:58 AM, Egmont Koblinger via Unicode wrote:
> > There's another side to the entire BiDi story, though. Simple
> > utilities like "echo", "cat", "ls", "grep" and so on, line editing
> > experience of your shell, these kinds. It's absolutely not feasible
> > to add BiDi support to these utilities. Here the only viable
> > approach is to have the terminal emulator do it.  
> 
> How will "ls -l" possibly work?  This is an example of the "table" 
> layout you were already discussing.

I think the answer is that it will use the same trickery as with a
default setting for the --color argument.  Colour codes are emitted
only when the output is a terminal.  Presumably the same would go for
Bidi controls.

> 
> I think us command-line troglodytes just have to deal with not having
> a whole lot of BiDi support.  There's simply no way any terminal
> emulator could possibly know what makes sense and what doesn't for a
> given line of text, coming from some random program.  Your "grep"
> could be grepping from a file with ANY layout, not necessarily one
> conducive to terminal layout, and so on.

So how do editors work now?

To avoid confusion, you will have to work with the terminal set to
having LTR paragraphs (or RTL instead); that is how Notepad works.

Richard.

RE: Encoding italic

2019-01-31 Thread Tex via Unicode

David,

"italics has never been considered part of plain text and has always been 
considered outside of plain text. "

Time to change the definition if that is what is holding you back. As has been 
said before, interlinear annotation, emoji and other features of Unicode which  
are now considered plain text were not in the original definition. If Unicode 
encoded an italic mechanism it would be part of plain text, just as the many 
other styled spaces, dashes and other characters have become plain text despite 
being typographic.

"The fact that italics can be handled elsewhere very much weighs against the 
value of your change. Everything you want to do can be done and is being done, 
except when someone chooses not to do it."

I heard a recent similar argument that goes: walls have been around since 
medieval times and they work really well... (Except they provably don't.)

As with the many problems with walls not being effective, you choose to ignore 
the legitimate issues pointed out on the list with the lack of italic 
standardization for Chinese braille, text to voice readers, etc.
The choice of plain text isn't always voluntary. And the existing alternatives, 
like math italic characters, are problematic.

tex

-Original Message-
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of David Starner 
via Unicode
Sent: Wednesday, January 30, 2019 11:59 PM
To: Unicode Mailing List
Subject: Re: Encoding italic

On Wed, Jan 30, 2019 at 11:37 PM James Kass via Unicode
 wrote:
> As Tex Texin observed, differences of opinion as to where we draw the
> line between text and mark-up are somewhat ideological.  If a compelling
> case for handling italics at the plain-text level can be made, then the
> fact that italics can already be handled elsewhere doesn’t matter.  If a
> compelling case cannot be made, there are always alternatives.

To the extent I'd have ideology here, it's that that line is arbitrary
and needs to fit practical demands. Should we have eight-bit bytes?
I'm not sure that was the best solution, and other systems worked just
fine, but we've got a computing environment that makes anything else
unpractical. Unlike that question, italics has never been considered
part of plain text and has always been considered outside of plain
text. The fact that italics can be handled elsewhere very much weighs
against the value of your change. Everything you want to do can be
done and is being done, except when someone chooses not to do it.

-- 
Kie ekzistas vivo, ekzistas espero.

Re: Proposal for BiDi in terminal emulators

2019-01-31 Thread Martin J . Dürst via Unicode

On 2019/01/31 07:02, Richard Wordingham via Unicode wrote:
> On Wed, 30 Jan 2019 15:33:38 +0100
> Frédéric Grosshans via Unicode  wrote:
> 
>> Le 30/01/2019 à 14:36, Egmont Koblinger via Unicode a écrit :
>>> - It doesn't do Arabic shaping. In my recommendation I'm arguing
>>> that in this mode, where shuffling the characters is the task of
>>> the text editor and not the terminal, so should it be for Arabic
>>> shaping using presentation form characters.
>>
>> I guess Arabic shaping is doable through presentation form
>> characters, because the latter are character inherited from legacy
>> standards using them in such solutions.
> 
> So long as you don't care about local variants, e.g. U+0763 ARABIC
> LETTER KEHEH WITH THREE DOTS ABOVE.  It has no presentation form
> characters.

Same also for characters used for other languages than Arabic.

> Basic Arabic shaping, at the level of a typewriter, is straightforward
> enough to leave to a terminal emulator, as Eli has suggested.  Lam-alif
> would be trickier - one cell or two?

Same for other characters. A medial Beh/Teh/Theh/... (ببب) in any 
reasonably decent rendering should take quite a bit less space than a 
Seen or Sheen (سسس). I remember that the multilingual Emacs version 
mostly written by Ken'ichi Handa (was it called mEmacs or nEmacs or 
something like that?) had different widths only just for Arabic. In 
Thunderbird, which is what I'm using here, I get hopelessly 
stretched/squeezed glyph shapes, which definitely don't look good.

Regards,   Martin.

Re: Encoding italic

2019-01-31 Thread Andrew Cunningham via Unicode

On Thursday, 31 January 2019, James Kass via Unicode 
wrote:.
>
>
> As for use of other variant letter forms enabled by the math
> alphanumerics, the situation exists.  It’s an interesting phenomenon which
> is sometimes worthy of comment and relates to this thread because the math
> alphanumerics include italics.  One of the web pages referring to
> third-party input tools calls the practice “super cool Unicode text magic”.
>
>
Although not all devices can render such text. Many Android handsets on the
market do not have a sufficiently recent version of Android to have system
fonts that can render such existing usage.




-- 
Andrew Cunningham
lang.supp...@gmail.com

Re: Encoding italic

2019-01-31 Thread David Starner via Unicode

On Wed, Jan 30, 2019 at 11:37 PM James Kass via Unicode
 wrote:
> As Tex Texin observed, differences of opinion as to where we draw the
> line between text and mark-up are somewhat ideological.  If a compelling
> case for handling italics at the plain-text level can be made, then the
> fact that italics can already be handled elsewhere doesn’t matter.  If a
> compelling case cannot be made, there are always alternatives.

To the extent I'd have ideology here, it's that that line is arbitrary
and needs to fit practical demands. Should we have eight-bit bytes?
I'm not sure that was the best solution, and other systems worked just
fine, but we've got a computing environment that makes anything else
unpractical. Unlike that question, italics has never been considered
part of plain text and has always been considered outside of plain
text. The fact that italics can be handled elsewhere very much weighs
against the value of your change. Everything you want to do can be
done and is being done, except when someone chooses not to do it.

-- 
Kie ekzistas vivo, ekzistas espero.

RE: Encoding italic

2019-01-30 Thread Tex via Unicode

David, Asmus,
 
·   “without external standards, then it's simply impossible.”

·   “And without external standard, not interoperable.“

As you both know there are de jure as well as de facto standards. So for years 
people typed : - ) as a smiley without a de facto standard and at some point 
long before emoji, systems began converting these to smiley faces.

Even the utf-8 BOM began as one company’s non-interoperable convention for 
encoding identifier which later became part of the de facto standard.

Ideally interoperability means supported everywhere but we have many useful 
mechanisms that simply don’t do harm without being interpreted.

For example, Unicode relies on this for backward compatibility when it 
introduces new characters, properties, algorithms, et al that are not 
understood by all systems but are tolerated by older ones.

=

While I am at it, I am amused by the arguments earlier in this thread as well 
as other threads, that go:

·   If the feature was needed developers would have implemented it by now. 
It isn’t implemented so the standard doesn’t need it.

·   The feature was implemented without the standard, so we don’t need it 
in the standard.

If men were meant to fly they would have wings…

Apparently, for some, it is only when there are many conflicting 
implementations that a feature demonstrates both that it is a requirement and 
also that it should be standardized.

In fact, this is sometimes not a bad view as it prevents adding features to the 
standard that go unused yet add complexity. 

But, it can also set too high a bar. And often it isn’t a true criteria but 
just resistance to change.

You  don’t need italics. When I went to school we just tilted the terminal a 
few degrees and voila.

(You don’t need a car. When I went to school we walked 6 miles to get there. 
Uphill both ways. J )

tex

 

 

From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Asmus Freytag 
via Unicode
Sent: Wednesday, January 30, 2019 10:20 PM
To: unicode@unicode.org
Subject: Re: Encoding italic

 

On 1/30/2019 7:46 PM, David Starner via Unicode wrote:

On Sun, Jan 27, 2019 at 12:04 PM James Kass via Unicode
 <mailto:unicode@unicode.org>  wrote:

A new beta of BabelPad has been released which enables input, storing,
and display of italics, bold, strikethrough, and underline in plain-text

 
Okay? Ed can do that too, along with nano and notepad. It's called
HTML (TeX, Troff). If by plain-text, you mean self-interpeting,
without external standards, then it's simply impossible.
 

It's either "markdown" or control/tag sequences. Both are out of band 
information.

And without external standard, not interoperable.

A./

Re: Encoding italic

2019-01-30 Thread James Kass via Unicode




David Starner wrote,

>> ... italics, bold, strikethrough, and underline in plain-text
>
> Okay? Ed can do that too, along with nano and notepad. It's called
> HTML (TeX, Troff). If by plain-text, you mean self-interpeting,
> without external standards, then it's simply impossible.

HTML source files are in plain-text.  Hopefully everyone on this list 
understands that and has already explored the marvelous benefits offered 
by granting users the ability to make exciting and effective page 
layouts via any plain-text editor.  HTML is standard and interchangeable.


As Tex Texin observed, differences of opinion as to where we draw the 
line between text and mark-up are somewhat ideological.  If a compelling 
case for handling italics at the plain-text level can be made, then the 
fact that italics can already be handled elsewhere doesn’t matter.  If a 
compelling case cannot be made, there are always alternatives.


As for use of other variant letter forms enabled by the math 
alphanumerics, the situation exists.  It’s an interesting phenomenon 
which is sometimes worthy of comment and relates to this thread because 
the math alphanumerics include italics.  One of the web pages referring 
to third-party input tools calls the practice “super cool Unicode text 
magic”.

Re: Encoding italic

2019-01-30 Thread Asmus Freytag via Unicode


  
  
On 1/30/2019 7:46 PM, David Starner via
  Unicode wrote:


  On Sun, Jan 27, 2019 at 12:04 PM James Kass via Unicode
 wrote:

  
A new beta of BabelPad has been released which enables input, storing,
and display of italics, bold, strikethrough, and underline in plain-text

  
  
Okay? Ed can do that too, along with nano and notepad. It's called
HTML (TeX, Troff). If by plain-text, you mean self-interpeting,
without external standards, then it's simply impossible.



It's either "markdown" or control/tag
sequences. Both are out of band information.
And without external standard, not
interoperable.
A./

Re: Encoding italic

2019-01-30 Thread David Starner via Unicode

On Sun, Jan 27, 2019 at 12:04 PM James Kass via Unicode
 wrote:
> A new beta of BabelPad has been released which enables input, storing,
> and display of italics, bold, strikethrough, and underline in plain-text

Okay? Ed can do that too, along with nano and notepad. It's called
HTML (TeX, Troff). If by plain-text, you mean self-interpeting,
without external standards, then it's simply impossible.

-- 
Kie ekzistas vivo, ekzistas espero.

Re: Encoding italic

2019-01-30 Thread Asmus Freytag via Unicode


  
  
On 1/30/2019 4:38 PM, Kent Karlsson via
  Unicode wrote:


  I did say "multiple" and "for instance". But since you ask:

ITU T.416/ISO/IEC 8613-6 defines general RGB & CMY(K) colour control
sequences, which are deferred in ECMA-48/ISO 6429. (The RGB one
is implemented in Cygwin (sorry for mentioning a product name).)



No need to be sorry; we understand that the motivation is not so
  much advertising as giving a concrete example. It would be
  interesting if anything out there implements CMY(K). My
  expectation would be that this would be limited to interfaces for
  printers or their emulators.




  
(The "named" ones, though very popular in terminal emulators, are
all much too stark, I think, and the exact colour for them are
implementation defined.)



Muted colors are something that's become more popular as display
  hardware has improved. Modern displays are able to reproduce these
  both more predictably as well as with the necessary degree of
  contrast (although some users'/designer's fetish for low contrast
  text design is almost as bad as people randomly mixing "stark"
  FG/BG colors in the '90s.)




  

ECMA-48/ISO 6429 defines control sequences for CJK emphasising, which
traditionally does not use bold or italic. Compare those specified for CSS
(https://www.w3.org/TR/css-text-decor-3/#propdef-text-decoration-style and
https://www.w3.org/TR/css-text-decor-3/#propdef-text-emphasis-style).
These are not at all mentioned in ITU T.416/ISO/IEC 8613-6, but should
be of interest for the generalised subject of this thread.



Mapping all of these to CSS would be essential if you want this
  stuff to be interoperable.




  

There are some other differences as well, but those are the major ones
with regard to text styling. (I don't know those standards to a tee.
I've just looked at the "m" control sequences for text styling. And yes,
I looked at the free copies...)

/Kent Karlsson

PS
If people insist on that EACH character in "plain text" italic/bold/etc
"controls" be default ignorable: one could just take the control sequences
as specified, but map the printable characters part to the corresponding
tag characters... Not that I think that that is really necessary.

Systems that support "markdown", i.e. simplified markup to
  provide the most main-stream features of rich-text tend to do that
  with printable characters, for a reason. Perhaps two reasons.
Users find it preferable to have a visible fallback when
  "markdown" is not interpreted by a receiving system and users'
  generally like the ability to edit the markdown directly (even if,
  for convenience) there's some direct UI support for adding text
  styling.
Loading up the text with lots of invisible characters that may be
  deleted or copied out of order by someone working on a system that
  neither interprets nor displays these code points is an
  interoperability nightmare in my opinion.




  


Den 2019-01-30 22:24, skrev "Doug Ewell via Unicode" :


  
Kent Karlsson wrote:
 


  Yes, great. But as I've said, we've ALREADY got a
default-ignorable-in-display (if implemented right)
way of doing such things.

And not only do we already have one, but it is also
standardised in multiple standards from different
standards institutions. See for instance "ISO/IEC 8613-6,
Information technology --- Open Document Architecture (ODA)
and Interchange Format: Character content architecture".


 
I looked at ITU T.416, which I believe is equivalent to ISO 8613-6 but
has the advantage of not costing me USD 179, and it looks very similar
to ISO 6429 (ECMA-48, formerly ANSI X3.64) with regard to the things we
are talking about: setting text display properties such as bold and
italics by means of escape sequences.
 
Can you explain how ISO 8613-6 differs from ISO 6429 for what we are
doing, and if it does not, why we should not simply refer to the more
familiar 6429?
 
--
Doug Ewell | Thornton, CO, US | ewellic.org

Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Mark E. Shoulson via Unicode


On 1/30/19 8:58 AM, Egmont Koblinger via Unicode wrote:

There's another side to the entire BiDi story, though. Simple
utilities like "echo", "cat", "ls", "grep" and so on, line editing
experience of your shell, these kinds. It's absolutely not feasible to
add BiDi support to these utilities. Here the only viable approach is
to have the terminal emulator do it.


How will "ls -l" possibly work?  This is an example of the "table" 
layout you were already discussing.


I think us command-line troglodytes just have to deal with not having a 
whole lot of BiDi support.  There's simply no way any terminal emulator 
could possibly know what makes sense and what doesn't for a given line 
of text, coming from some random program.  Your "grep" could be grepping 
from a file with ANY layout, not necessarily one conducive to terminal 
layout, and so on.


~mark

Re: Encoding italic

2019-01-30 Thread Kent Karlsson via Unicode

I did say "multiple" and "for instance". But since you ask:

ITU T.416/ISO/IEC 8613-6 defines general RGB & CMY(K) colour control
sequences, which are deferred in ECMA-48/ISO 6429. (The RGB one
is implemented in Cygwin (sorry for mentioning a product name).)
(The "named" ones, though very popular in terminal emulators, are
all much too stark, I think, and the exact colour for them are
implementation defined.)

ECMA-48/ISO 6429 defines control sequences for CJK emphasising, which
traditionally does not use bold or italic. Compare those specified for CSS
(https://www.w3.org/TR/css-text-decor-3/#propdef-text-decoration-style and
https://www.w3.org/TR/css-text-decor-3/#propdef-text-emphasis-style).
These are not at all mentioned in ITU T.416/ISO/IEC 8613-6, but should
be of interest for the generalised subject of this thread.

There are some other differences as well, but those are the major ones
with regard to text styling. (I don't know those standards to a tee.
I've just looked at the "m" control sequences for text styling. And yes,
I looked at the free copies...)

/Kent Karlsson

PS
If people insist on that EACH character in "plain text" italic/bold/etc
"controls" be default ignorable: one could just take the control sequences
as specified, but map the printable characters part to the corresponding
tag characters... Not that I think that that is really necessary.

Den 2019-01-30 22:24, skrev "Doug Ewell via Unicode" :

> Kent Karlsson wrote:
>  
>> Yes, great. But as I've said, we've ALREADY got a
>> default-ignorable-in-display (if implemented right)
>> way of doing such things.
>> 
>> And not only do we already have one, but it is also
>> standardised in multiple standards from different
>> standards institutions. See for instance "ISO/IEC 8613-6,
>> Information technology --- Open Document Architecture (ODA)
>> and Interchange Format: Character content architecture".
>  
> I looked at ITU T.416, which I believe is equivalent to ISO 8613-6 but
> has the advantage of not costing me USD 179, and it looks very similar
> to ISO 6429 (ECMA-48, formerly ANSI X3.64) with regard to the things we
> are talking about: setting text display properties such as bold and
> italics by means of escape sequences.
>  
> Can you explain how ISO 8613-6 differs from ISO 6429 for what we are
> doing, and if it does not, why we should not simply refer to the more
> familiar 6429?
>  
> --
> Doug Ewell | Thornton, CO, US | ewellic.org
>

Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Asmus Freytag via Unicode


  
  
Arabic terminals and terminal emulators
  existed at the time of Unicode 1.0. If you are trying to emulate
  those services, for example so that older software can run, you
  would need to look at how these programs expected to be fed their
  data.


I see little reason to reinvent things
  here, because we are talking about emulating legacy hardware. Or
  are we not?


It's conceivable, that with modern
  fonts, one can show some characters that could not be supported on
  the actual legacy hardware, because that was limited by available
  character memory and available pre-Unicode character sets. As long
  as the new characters otherwise fit the paradigm (character per
  cell) they can be supported without other changes in the protocol
  beyond change in character set.


However, I would not expect an emulator
  to accept data in NFD for example.


A./





On 1/30/2019 2:02 PM, Richard
  Wordingham via Unicode wrote:


  On Wed, 30 Jan 2019 15:33:38 +0100
Frédéric Grosshans via Unicode  wrote:


  
Le 30/01/2019 à 14:36, Egmont Koblinger via Unicode a écrit :


  - It doesn't do Arabic shaping. In my recommendation I'm arguing
that in this mode, where shuffling the characters is the task of
the text editor and not the terminal, so should it be for Arabic
shaping using presentation form characters.  



I guess Arabic shaping is doable through presentation form
characters, because the latter are character inherited from legacy
standards using them in such solutions.

  
  
So long as you don't care about local variants, e.g. U+0763 ARABIC
LETTER KEHEH WITH THREE DOTS ABOVE.  It has no presentation form
characters.

Basic Arabic shaping, at the level of a typewriter, is straightforward
enough to leave to a terminal emulator, as Eli has suggested.  Lam-alif
would be trickier - one cell or two?


  
But if you want to support
other “arabic like” scripts (like Syriac, N’ko), or even some LTR
complex scripts, like Myanmar or Khmer, this “solution” cannot work,
because no equivalent of “presentation form characters” exists for
these scripts

  
  
I believe combining marks present issues even in implicit modes.  In
implicit mode, one cannot simply delegate the task to normal text
rendering, for one has to allocate text to cells.  There are a number
of complications that spring to mind:

1) Some characters decompose to two characters that may otherwise lay
claim to their own cells:

U+06D3 ARABIC LETTER YEH BARREE WITH HAMZA ABOVE decomposes to <06D2,
0654>.  Do you intend that your scheme be usable by Unicode-compliant
processes?

2) 2-part vowels, such as U+0D4A MALAYALAM VOWEL SIGN O, which
canonically decomposes into a preceding combining mark U+0D46 MALAYALAM
VOWEL SIGN E and following combining mark U+0D3E MALAYALAM VOWEL SIGN
AA.

3) Similar 2-part vowels that do not decompose, such as U+17C4 KHMER
VOWEL SIGN OO.  OpenType layout decomposes that into a preceding
'U+17C1 KHMER VOWEL SIGN E' and the second part.

4) Indic conjuncts.
(i) There are some conjuncts, such as Devanagari K.SSA, where a
display as ,  is simply unacceptable.  In some
closely related scripts, this conjunct has the status of a character.

(ii) In some scripts, e.g. Khmer, the virama-equivalent is not an
acceptable alternative to form a consonant stack.  Khmer could
equally well have been encoded with a set of subscript consonants in
the same manner as Tibetan.

(iii) In some scripts, there are marks named as medial consonants
which function in exactly the same way as <'virama', consonant>; it is
silly to render them in entirely different manners.

5) Some non-spacing marks are spacing marks in some contexts.  U+102F
MYANMAR VOWEL SIGN U is probably the best known example.

Richard.

Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Richard Wordingham via Unicode

On Wed, 30 Jan 2019 15:33:38 +0100
Frédéric Grosshans via Unicode  wrote:

> Le 30/01/2019 à 14:36, Egmont Koblinger via Unicode a écrit :
> > - It doesn't do Arabic shaping. In my recommendation I'm arguing
> > that in this mode, where shuffling the characters is the task of
> > the text editor and not the terminal, so should it be for Arabic
> > shaping using presentation form characters.  
> 
> I guess Arabic shaping is doable through presentation form
> characters, because the latter are character inherited from legacy
> standards using them in such solutions.

So long as you don't care about local variants, e.g. U+0763 ARABIC
LETTER KEHEH WITH THREE DOTS ABOVE.  It has no presentation form
characters.

Basic Arabic shaping, at the level of a typewriter, is straightforward
enough to leave to a terminal emulator, as Eli has suggested.  Lam-alif
would be trickier - one cell or two?

> But if you want to support
> other “arabic like” scripts (like Syriac, N’ko), or even some LTR
> complex scripts, like Myanmar or Khmer, this “solution” cannot work,
> because no equivalent of “presentation form characters” exists for
> these scripts

I believe combining marks present issues even in implicit modes.  In
implicit mode, one cannot simply delegate the task to normal text
rendering, for one has to allocate text to cells.  There are a number
of complications that spring to mind:

1) Some characters decompose to two characters that may otherwise lay
claim to their own cells:

U+06D3 ARABIC LETTER YEH BARREE WITH HAMZA ABOVE decomposes to <06D2,
0654>.  Do you intend that your scheme be usable by Unicode-compliant
processes?

2) 2-part vowels, such as U+0D4A MALAYALAM VOWEL SIGN O, which
canonically decomposes into a preceding combining mark U+0D46 MALAYALAM
VOWEL SIGN E and following combining mark U+0D3E MALAYALAM VOWEL SIGN
AA.

3) Similar 2-part vowels that do not decompose, such as U+17C4 KHMER
VOWEL SIGN OO.  OpenType layout decomposes that into a preceding
'U+17C1 KHMER VOWEL SIGN E' and the second part.

4) Indic conjuncts.
(i) There are some conjuncts, such as Devanagari K.SSA, where a
display as ,  is simply unacceptable.  In some
closely related scripts, this conjunct has the status of a character.

(ii) In some scripts, e.g. Khmer, the virama-equivalent is not an
acceptable alternative to form a consonant stack.  Khmer could
equally well have been encoded with a set of subscript consonants in
the same manner as Tibetan.

(iii) In some scripts, there are marks named as medial consonants
which function in exactly the same way as <'virama', consonant>; it is
silly to render them in entirely different manners.

5) Some non-spacing marks are spacing marks in some contexts.  U+102F
MYANMAR VOWEL SIGN U is probably the best known example.

Richard.

Re: Encoding italic

2019-01-30 Thread Doug Ewell via Unicode

Kent Karlsson wrote:

> Yes, great. But as I've said, we've ALREADY got a
> default-ignorable-in-display (if implemented right)
> way of doing such things.
>
> And not only do we already have one, but it is also
> standardised in multiple standards from different
> standards institutions. See for instance "ISO/IEC 8613-6,
> Information technology --- Open Document Architecture (ODA)
> and Interchange Format: Character content architecture".

I looked at ITU T.416, which I believe is equivalent to ISO 8613-6 but
has the advantage of not costing me USD 179, and it looks very similar
to ISO 6429 (ECMA-48, formerly ANSI X3.64) with regard to the things we
are talking about: setting text display properties such as bold and
italics by means of escape sequences.

Can you explain how ISO 8613-6 differs from ISO 6429 for what we are
doing, and if it does not, why we should not simply refer to the more
familiar 6429?

--
Doug Ewell | Thornton, CO, US | ewellic.org

Re: Encoding italic

2019-01-30 Thread Doug Ewell via Unicode

Martin J. Dürst wrote:

> Here's a little dirty secret about these tag characters: They were
> placed in one of the astral planes explicitly to make sure they'd use
> 4 bytes per tag character, and thus quite a few bytes for any actual
> complete tags.

Aha. That explains why SCSU had to be banished to the hut, right around
the same time the Plane 14 language tags were deprecated. In SCSU,
astral characters can be 1 byte just like BMP characters.

--
Doug Ewell | Thornton, CO, US | ewellic.org

Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Adam Borowski via Unicode

On Wed, Jan 30, 2019 at 05:56:00PM +0200, Eli Zaretskii via Unicode wrote:
> > - It doesn't do Arabic shaping.
> 
> It doesn't do _any_ shaping.  Complex script shaping is left to the
> terminal, because it's impossible to do shaping in any reasonable way
> without controlling the fonts being used and accessing the font
> information, and this is not possible when you run on a terminal

It's the inverse of the situation with RTL reordering.  The interface
between the program and the terminal is a character cell grid (really, a
sequence of printables and \e-based codes, but that's a technical detail).

The program (emacs in this case) can do arbitrary reordering of characters
on the grid, it also has lots of information the terminal doesn't.  For
example, what are you going to do when there's a line longer than what fits
on the screen?  Emacs will cut and hide part of it; any attempts to reorder
that paragraph by the terminal are outright broken as you don't _have_ the
paragraph.  Same for a popup window on the middle of the screen partially
obscuring some text underneath.  And if you argue "so make emacs print your
new code to disable formatting", so do thousands of other programs that are
less sophisticated than emacs.

On the other hand, all that the program can output is a sequence of Unicode
codepoints.  These don't include shaping information, and are not supposed
to.  The shaping is explicitly meant to be done by the terminal, and it's
the terminal who's equipped with _most_ of the needed data (it might lack
context just outside screen's end or under an overlapped window, but that's
not specific to complex shaping -- same can happen for the other half of a
CJK character).  You know if the font used supports shaping, you can have
access to a graphic view (as opposed to the array of codepoints) -- heck,
it's only you who know the text is rendered on a screen rather than a
Braille device.  And if you miss an opportunity to shape something, the
result is still readable to the user, merely not as good as it could be.

Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Remember, the S in "IoT" stands for Security, while P stands
⢿⡄⠘⠷⠚⠋⠀ for Privacy.
⠈⠳⣄

Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Eli Zaretskii via Unicode

> Date: Wed, 30 Jan 2019 15:49:34 +0100
> Cc: unicode@unicode.org
> From: Egmont Koblinger via Unicode 
> 
> I outline in the document problems that arise from the terminal
> emulator performing shaping on its contents in "explicit" mode, which
> is to be used by Emacs and others. The terminal emulator is not aware
> of the characters that are chopped off at the edge of the screen,
> required for shaping. The terminal emulator is not aware of which
> characters happen to be placed next to each other, but belong to
> semantically different UI elements, that is, shouldn't be shaped.
> 
> (And as a side note, FriBidi doesn't provide a method for doing
> shaping on _visual_ order. I'm unsure about other libraries, and
> unsure if there's an algorithm for it at all.)
> 
> Honestly, I have no idea how to best address all these problems at
> once. This is where we can think of extensions "expliti mode level 2",
> use control characters that explicitly specify how to shape certain
> glyphs. This is subject to further research.

Personally, I think we should simply assume that complex script
shaping is left to the terminal, and if the terminal cannot do that,
then that's a restriction of working on a text terminal.  There's
nothing you can do here, because correct shaping requires too many
features that applications running on text terminals cannot use, and
OTOH a terminal emulator who wants to perform shaping needs
information from the application (like the directionality of the text
and its language) that there's no way for the application to provide.

Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Eli Zaretskii via Unicode

> Date: Wed, 30 Jan 2019 15:25:32 +0100
> Cc: unicode@unicode.org
> From: Egmont Koblinger via Unicode 
> 
> > ╒═══╤══╕
> > │ filename1 │  123 │
> > │ FILENAME2 │   17 │
> > └───┴──┘
> >
> > I'm afraid there's no good way to do BiDi without support from individual
> > programs.
> 
> In this particular example, when the output consists of RTL text in
> logical order (the emitter does not reorder the characters to their
> visual order, nor emit any BiDi controls), combined with line drawing
> and such, there is hardly anything we could do purely on the terminal
> emulator's side.

I think the application could use TAB characters to get to the next
cell, then simplistic reordering would also work.

But in general, yes: this is one of the examples why sophisticated
text-editing applications cannot leave this to the terminal.  (Another
example is handling mouse clicks, if the terminal supports that.)

Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Eli Zaretskii via Unicode

> Date: Wed, 30 Jan 2019 15:07:22 +0100
> Cc: unicode@unicode.org
> From: Egmont Koblinger via Unicode 
> 
> Another possible approach is to leave the terminal doing BiDi, but
> embed all the text fragments in FSI...PDI blocks.

Does anyone know of a terminal emulator which supports isolates?

Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Eli Zaretskii via Unicode

> From: Egmont Koblinger 
> Date: Wed, 30 Jan 2019 14:36:42 +0100
> Cc: unicode@unicode.org
> 
> - GNU Emacs reshuffles the characters according to the BiDi algorithm,
> expecting that the terminal emulator doesn't do any BiDi.

Yes, users are told to disable bidi reordering of the terminal, if the
terminal supports that.

> - It doesn't do Arabic shaping.

It doesn't do _any_ shaping.  Complex script shaping is left to the
terminal, because it's impossible to do shaping in any reasonable way
without controlling the fonts being used and accessing the font
information, and this is not possible when you run on a terminal --
the user configures the terminal emulator, and the emulator chooses
the fonts it likes/needs according to that configuration.

(Emacs does support complex script shaping on GUI displays.)

> - When it comes to visually wrapping a line because it doesn't fit in
> the current width, Emacs goes its own way which doesn't match what the
> Unicode BiDi algorithm says.

Yes, this deviation is documented in the Emacs manuals.  The reason
for that is that the Emacs implementation of the UBA reorders
characters on the fly (i.e., it implements a function that can be
called character by character, and returns the next character in the
visual order).  This was done due to a special structure of the Emacs
display engine and efficiency considerations.

In practice, this problem happens very rarely.

Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Egmont Koblinger via Unicode

> A formatted table is pretty unsuitable for automated processing, and
> obviously meant for human display.

Could you please clarify how exactly that data looks like? Maybe a
tiny hexdump of an example?

Is the RTL piece of text already stored in visual order, that is,
beginning with the leftmost (last logical) letter of the word? If so
then you can sure display it properly in BiDi-unaware rendering
engines (including most terminal emulators currently, as well as in
"explicit" mode according to my specification). That is, whoever
produces that data reverses that word for you?

Or is the RTL piece of text still in its logical order? Then in what
piece of software does this formatted data show up to you in a
readable way?

> You're a terminal emulator maintainer, thus it's natural for you to think
> it's the right place to come up with a solution.

No. I've been a maintainer/developer/contributor to all kinds of
software, including (but not limited to) terminal emulators, apps
running inside terminal emulators, or a pretty complex RTL homepage.
I'm doing my best in looking at the entire ecosystem, and coming up
with a good BiDi-aware interface between terminal emulators and
applications.

> I'd argue that it's not --
> all a terminal emulator can do is to display already formatted text, there's
> no sane way to move things around.

You missed that your use case with this table is not the only possible
use case. There are others where the terminal needs to do BiDi. My
work aims to address multiple use cases at once, yours being one of
them.


cheers,
egmont

Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Adam Borowski via Unicode

On Wed, Jan 30, 2019 at 03:43:10PM +0100, Egmont Koblinger via Unicode wrote:
> On Wed, Jan 30, 2019 at 3:32 PM Adam Borowski  wrote:
> 
> > > > ╒═══╤══╕
> > > > │ filename1 │  123 │
> > > > │ FILENAME2 │   17 │
> > > > └───┴──┘
> 
> > That's possible only if the program in question is running directly attached
> > to the tty.  That's not an option if the output is redirected.  Frames in
> > a plain text file are a perfectly rational, and pretty widespread, use --
> > and your proposal will break them all.  Be it "cat" to the screen, "less" or
> > even "mutt" if the text was sent via a mail.
> 
> I'd argue that if you have such a data stored in a file, with logical
> order used in Arabic or Hebrew text, combined with line drawing chars
> as you showed, then your data is broken to begin with – broken in the
> sense that it's suitable for automated processing (*), but not for
> display.

A formatted table is pretty unsuitable for automated processing, and
obviously meant for human display.

> (*) but then line drawing chars are not really a nice choice over CSV,
> JSON, whatever.

That's why you use CSV and JSON for machine-readable, plain text for humans,
and XML for neither.

> The only possible choice is for some display engine to be aware that
> line drawing characters are part of a "higher level protocol", and
> BiDi should be applied only in the lower scope. I don't think the
> terminal emulator is the right place to make such decisions

At this point, required information is lost.  Any transformations such as
RTL reordering needs to be done earlier, when you still see _unformatted_
version of the data.

You're a terminal emulator maintainer, thus it's natural for you to think
it's the right place to come up with a solution.  I'd argue that it's not --
all a terminal emulator can do is to display already formatted text, there's
no sane way to move things around.  Any changes need to be localized -- for
example, you can do ligatures only if you keep total length unchanged.  Ie,
the terminal emulator is the right layer for things like complex script
shaping, but not RTL reordering.

Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Remember, the S in "IoT" stands for Security, while P stands
⢿⡄⠘⠷⠚⠋⠀ for Privacy.
⠈⠳⣄

Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Egmont Koblinger via Unicode

Hi Frédéric,

> I guess Arabic shaping is doable through presentation form characters,
> because the latter are character inherited from legacy standards using
> them in such solutions. But if you want to support other “arabic like”
> scripts (like Syriac, N’ko), or even some LTR complex scripts, like
> Myanmar or Khmer, this “solution” cannot work, because no equivalent of
> “presentation form characters” exists for these scripts

Unfortunately my knowledge ends here, I'm not familiar with shaping
for Syriac and other similar scripts. I'd really appreciate input from
experts here.

I outline in the document problems that arise from the terminal
emulator performing shaping on its contents in "explicit" mode, which
is to be used by Emacs and others. The terminal emulator is not aware
of the characters that are chopped off at the edge of the screen,
required for shaping. The terminal emulator is not aware of which
characters happen to be placed next to each other, but belong to
semantically different UI elements, that is, shouldn't be shaped.

(And as a side note, FriBidi doesn't provide a method for doing
shaping on _visual_ order. I'm unsure about other libraries, and
unsure if there's an algorithm for it at all.)

Honestly, I have no idea how to best address all these problems at
once. This is where we can think of extensions "expliti mode level 2",
use control characters that explicitly specify how to shape certain
glyphs. This is subject to further research.

cheers,
egmont

Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Egmont Koblinger via Unicode

On Wed, Jan 30, 2019 at 3:32 PM Adam Borowski  wrote:

> > > ╒═══╤══╕
> > > │ filename1 │  123 │
> > > │ FILENAME2 │   17 │
> > > └───┴──┘

> That's possible only if the program in question is running directly attached
> to the tty.  That's not an option if the output is redirected.  Frames in
> a plain text file are a perfectly rational, and pretty widespread, use --
> and your proposal will break them all.  Be it "cat" to the screen, "less" or
> even "mutt" if the text was sent via a mail.

I'd argue that if you have such a data stored in a file, with logical
order used in Arabic or Hebrew text, combined with line drawing chars
as you showed, then your data is broken to begin with – broken in the
sense that it's suitable for automated processing (*), but not for
display. I can't think of any utility that would display it properly,
because that's not what the Unicode BiDi algorithm run over this data
produces.

(*) but then line drawing chars are not really a nice choice over CSV,
JSON, whatever.

The only possible choice is for some display engine to be aware that
line drawing characters are part of a "higher level protocol", and
BiDi should be applied only in the lower scope. I don't think the
terminal emulator is the right place to make such decisions – I don't
think any other generic tool (graphical word processor, browser etc.)
does make such a call either.

cheers,
egmont

Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Frédéric Grosshans via Unicode


Le 30/01/2019 à 14:36, Egmont Koblinger via Unicode a écrit :

- It doesn't do Arabic shaping. In my recommendation I'm arguing that
in this mode, where shuffling the characters is the task of the text
editor and not the terminal, so should it be for Arabic shaping using
presentation form characters.


I guess Arabic shaping is doable through presentation form characters, 
because the latter are character inherited from legacy standards using 
them in such solutions. But if you want to support other “arabic like” 
scripts (like Syriac, N’ko), or even some LTR complex scripts, like 
Myanmar or Khmer, this “solution” cannot work, because no equivalent of 
“presentation form characters” exists for these scripts

Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Adam Borowski via Unicode

On Wed, Jan 30, 2019 at 03:25:32PM +0100, Egmont Koblinger via Unicode wrote:
> One more note, to hopefully clarify:
> > ╒═══╤══╕
> > │ filename1 │  123 │
> > │ FILENAME2 │   17 │
> > └───┴──┘
> >
> > I'm afraid there's no good way to do BiDi without support from individual
> > programs.
> 
> In this particular example, when the output consists of RTL text in
> logical order (the emitter does not reorder the characters to their
> visual order, nor emit any BiDi controls), combined with line drawing
> and such, there is hardly anything we could do purely on the terminal
> emulator's side.
> 
> I did not consider the possibility of certain characters (e.g. line
> drawing ones) being "stop characters", and BiDi to get applied only in
> runs of other characters. Any such magic would be arbitrary, fix a
> subset of the cases while cause other unforeseen breakages elsewhere.
> E.g. what if someone intentionally uses these characters as
> letter-like ones in a BiDi text, like """here I'm talking about the
> '└' shaped corner"""... or what if poor man's ASCII pipe and other
> symbols are used... it's way too risky to go into any kind of
> heuristics.
> 
> In this particular case the terminal cannot magically fix the output
> for you, you'll need to get the application fixed.

That's possible only if the program in question is running directly attached
to the tty.  That's not an option if the output is redirected.  Frames in
a plain text file are a perfectly rational, and pretty widespread, use --
and your proposal will break them all.  Be it "cat" to the screen, "less" or
even "mutt" if the text was sent via a mail.

I don't really see a possibility to do that on the terminal's side, any RTL
reordering would need to be done by the program in question.


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Remember, the S in "IoT" stands for Security, while P stands
⢿⡄⠘⠷⠚⠋⠀ for Privacy.
⠈⠳⣄

Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Egmont Koblinger via Unicode

Hi Adam,

One more note, to hopefully clarify:

> ╒═══╤══╕
> │ filename1 │  123 │
> │ FILENAME2 │   17 │
> └───┴──┘
>
> I'm afraid there's no good way to do BiDi without support from individual
> programs.

In this particular example, when the output consists of RTL text in
logical order (the emitter does not reorder the characters to their
visual order, nor emit any BiDi controls), combined with line drawing
and such, there is hardly anything we could do purely on the terminal
emulator's side.

I did not consider the possibility of certain characters (e.g. line
drawing ones) being "stop characters", and BiDi to get applied only in
runs of other characters. Any such magic would be arbitrary, fix a
subset of the cases while cause other unforeseen breakages elsewhere.
E.g. what if someone intentionally uses these characters as
letter-like ones in a BiDi text, like """here I'm talking about the
'└' shaped corner"""... or what if poor man's ASCII pipe and other
symbols are used... it's way too risky to go into any kind of
heuristics.

In this particular case the terminal cannot magically fix the output
for you, you'll need to get the application fixed.

cheers,
egmont

Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Egmont Koblinger via Unicode

Hi Adam,

> Even a line is way too big a piece to be safely reordered by the terminal.
> What you propose will break every full-screen program that uses line-drawing
> characters:

Certain terminal emulators already perform BiDi on their lines. They
already break every full-screen program with line-drawing and such, as
you pointed out. What my proposal adds, amongst plenty of other
things, is a means to automatically disable the terminal's BiDi,
rather than having to go to its settings. This way you can automate
the fix of the apps that aren't explicitly fixed, e.g. via wrapper
scripts, or terminfo entries with special ti/te definitions.

> I'm afraid there's no good way to do BiDi without support from individual
> programs.

Depends on the use case.

For complex apps, like text editors, you are right, the terminal
emulator must stay out of the game.

For simple utilities, like "cat" and friends, there's no way you can
implement BiDi support in "cat" itself. Here the terminal needs to do
it.

Your use case with tables is perhaps somewhat in the middle. One
possible approach for the emitting utility is to disable BiDi in the
terminal (switch to "explicit" mode) for the scope of this output.
Another possible approach is to leave the terminal doing BiDi, but
embed all the text fragments in FSI...PDI blocks. (This latter is
subject to a bit of further research, to be exactly specified in a
forthcoming version of the specs.)

What is extremely tough here is realizing that there are multiple
conflicting requirements (including the example you gave), and coming
up with a soluiton that satisfies the needs of all. This is what my
work aims to do.

cheers,
egmont

Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Egmont Koblinger via Unicode

Hi Eli,

> My personal experience with bringing BiDi to Emacs led me to a firm
> conclusion that BiDi support by terminal emulators cannot be relied on
> by sophisticated text editing and display applications that are
> BiDi-aware.  The terminal emulator can never be smart enough to do
> what the editing needs require, so the application eventually ends up
> jumping through hoops in order to trick the terminal into doing TRT.
> It is easier to tell users to disable BiDi support of the terminal (if
> it even has one), and do everything in the app.  This is the only way
> of having full control of what is displayed, especially when
> "higher-level protocols" need to be used to tailor the UBA to the need
> of the user, because there's usually no way of asking the terminal to
> apply a behavior which deviates from the UBA.

We are absolutely on the same page here. As long as the use case is
text editing or something similar, it's harmful if the terminal
emulator aims to do any BiDi.

Having to tell users to turn off BiDi in the emulator's settings is in
my firm opinion a user experience no-go. It has to be automatic,
happen under the hood, that is, using escape sequences.

There's another side to the entire BiDi story, though. Simple
utilities like "echo", "cat", "ls", "grep" and so on, line editing
experience of your shell, these kinds. It's absolutely not feasible to
add BiDi support to these utilities. Here the only viable approach is
to have the terminal emulator do it.

Hence, as I confirm ECMA TR/53's realization of 28 years ago, there
have to be two substantially different modes. "Explicit" mode for what
you need for Emacs: the terminal to stay out of the game; and
"implicit" mode where the terminal performs BiDi for the sake of "cat"
and other simple utiltiies.

I'm also arguing that contrary to TR/53, there's no way to hook up a
mode switch to "cat" and a gazillion of other similar tools. The only
reaslisticly implementable approach is if the "implicit" mode is the
default so that simple utilities provide a proper BiDi experience.
Those very few fullscreen apps that do know what they are doing and do
want the terminal to leave the characters at their designated place
(such as Emacs, Vim etc.) will have to request this "explicit" mode
from the terminal.

cheers,
egmont

Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Egmont Koblinger via Unicode

Hi Eli,

> > In turn, vim, emacs and friends stand there clueless, not knowing
> > how to do BiDi in terminals.
>
> This is inaccurate: [...]

I have to admit, I was somewhat sloppy in the phrasing of this
announcement. My bad, apologies.

Currently some terminal emulators shuffle the characters around for
display purposes, while most don't. There's absolutely no way an
editor (no matter if Emacs or any other) could produce the desired
look on both kinds. I actually present a proof that an editor cannot
always produce the desired look on ones that shuffle their contents
around. So it's a somewhat reasonable expectation to produce the
desired look on ones that don't shuffle their cells.

In the document, more precisely at [1] I evalute my findings with GNU
Emacs 25.2. (I've just fixed the page to add "GNU", thanks for
pointing this out!)

Brief summary:

- GNU Emacs reshuffles the characters according to the BiDi algorithm,
expecting that the terminal emulator doesn't do any BiDi.

- According to my recommendation, in order to address BiDi in the
entire ecosystem around terminal emulators, the default behavior will
have to be that terminals shuffle the characters around. Don't worry,
there'll be a mode where this shuffling doesn't occur. Emacs (and all
other BiDi-aware text editors) will have to switch to this mode.

- It doesn't do Arabic shaping. In my recommendation I'm arguing that
in this mode, where shuffling the characters is the task of the text
editor and not the terminal, so should it be for Arabic shaping using
presentation form characters.

- When it comes to visually wrapping a line because it doesn't fit in
the current width, Emacs goes its own way which doesn't match what the
Unicode BiDi algorithm says. I'm not saying Emacs's behavior is bad
per se or unreasonable, and it's out of the scope of my work to try to
get it changed, but I'm making a note that it's different.

[1] https://terminal-wg.pages.freedesktop.org/bidi/prior-work/applications.html

cheers,
egmont

Re: Encoding italic

2019-01-29 Thread Kent Karlsson via Unicode

Yes, great. But as I've said, we've ALREADY got a
default-ignorable-in-display (if implemented right)
way of doing such things.

And not only do we already have one, but it is also
standardised in multiple standards from different
standards institutions. See for instance "ISO/IEC 8613-6,
Information technology --- Open Document Architecture (ODA)
and Interchange Format: Character content architecture".
(In a little experiment I found that it seems that
Cygwin is one of the better implementations of this;
B.t.w. I have no relation to Cygwin other than using it.)

To boot, it's been around for decades and is still
alive and well. I see absolutely no need for a "bold"
new concept here; the one below is not better in any
significant way.

/Kent Karlsson

Den 2019-01-29 23:35, skrev "Andrew West via Unicode" :

> On Mon, 28 Jan 2019 at 01:55, James Kass via Unicode
>  wrote:
>> 
>> This 󠀼󠁢󠀾bold󠀼󠀯󠁢󠀾 new concept was not mine.  When I tested it
>> here, I was using the tag encoding recommended by the developer.
> 
> Congratulations James, you've successfully interchanged tag-styled
> plain text over the internet with no adverse side effects. I copied
> your email into BabelPad and your "bold" is shown bold (see attached
> screenshot).
> 
> Andrew

Re: Encoding italic

2019-01-29 Thread Andrew West via Unicode

On Mon, 28 Jan 2019 at 01:55, James Kass via Unicode
 wrote:
>
> This 󠀼󠁢󠀾bold󠀼󠀯󠁢󠀾 new concept was not mine.  When I tested it
> here, I was using the tag encoding recommended by the developer.

Congratulations James, you've successfully interchanged tag-styled
plain text over the internet with no adverse side effects. I copied
your email into BabelPad and your "bold" is shown bold (see attached
screenshot).

Andrew

Re: Encoding italic

2019-01-29 Thread James Kass via Unicode




Doug Ewell wrote,

> I can't speak for Andrew, but I strongly suspect he implemented this as
> a proof of concept, not to declare himself the Maker of Standards.

BabelPad also offers plain-text styling via math-alpha conversion, 
although this feature isn’t newly added.  Users interested in seeing how 
plain-text italics might work can try out the stateful approach using 
tags contrasted with the character-by-character approach using 
math-range italic letters.  (Of course, the math-range stuff is already 
being interchanged on the WWW, whilst the tagging method does not yet 
appear to be widely supported.)


A few miles upthread, ‘where are the third-party developers’ was asked.  
‘Everywhere’ is the answer.  Since third-party developers have to 
subsist on the crumbs dropped by the large corps, they tend to be 
responsive to user needs and requests.

Re: Encoding italic

2019-01-29 Thread James Kass via Unicode




On 2019-01-29 5:10 PM, Doug Ewell via Unicode wrote:

I thought we had established that someone had mentioned it on this list,
at some time during the past three weeks. Can someone look up what post
that was? I don't have time to go through scores of messages, and there
is no search facility.

http://www.unicode.org/mail-arch/unicode-ml/y2019-m01/0209.html

Re: Ancient Greek apostrophe marking elision

2019-01-29 Thread Richard Wordingham via Unicode

On Mon, 28 Jan 2019 20:55:39 -0500
"Mark E. Shoulson via Unicode"  wrote:

> On 1/28/19 2:31 AM, Mark Davis ☕️ via Unicode wrote:
> >
> > But the question is how important those are in daily life. I'm not 
> > sure why the double-click selection behavior is so much more of a 
> > problem for Ancient Greek users than it is for the somewhat larger 
> > community of English users. Word selection is not normally as 
> > important an operation as line break, which does work as expected.  
> 
> This is a good point.  Bottom line is that word-selection, at least,
> is not going to be _exactly_ right.  Oh, and for another example,
> note that Esperanto also regularly (in poetry, anyway) uses a
> word-final apostrophe (of some kind) to indicate elision of the final
> -o of a nominative singular noun, or the -a of the article "la".
> What shall we say to Esperantists who can't correctly the third word
> in «al la mond’ eterne militanta / Ĝi promesas sanktan harmonion»?  I
> guess "Suck it up and deal with it."  And that may indeed be the
> answer.

Who's going to punish them for using U+02BC?

I found some documentation of an Ancient Greek spell-checker for
OpenOffice.  It listed problem with the apostrophe as one of its
shortcomings.

Richard.

Re: Ancient Greek apostrophe marking elision

2019-01-29 Thread Richard Wordingham via Unicode

On Mon, 28 Jan 2019 21:10:19 -0500
"Mark E. Shoulson via Unicode"  wrote:

> On 1/28/19 3:58 PM, Richard Wordingham via Unicode wrote:
> > Interestingly, bringing this word breaker into line with TUS in the
> > UK may well be in breach of the Equality Act 2010.
> >
> > Richard.  
> 
> OK, I've got to ask: how would that be?  How would this impinge on 
> anyone's equality on the basis of "age, disability, gender
> reassignment, marriage and civil partnership, pregnancy and
> maternity, race, religion or belief, sex, and sexual orientation"?
> (quote from WP)

The most relevant clauses are 9(1), 9(4), 19(2), 29(5) and 29(7).

The change would restrict Thais' access to the provision of a service.
The service provided is to allow one to use a persistent, correctable
spell-checking system for one's native language.  Firefox and
LibreOffice provide this service.  Of course, one may have to supply
the spell-checking databases oneself.  Withdrawing this service for
some ethnic groups would be breach of the law.

By persistent, I means that corrections to the spell-checking remain
when the text is revisited.  For English plain-text, the easy
correction is to remove false positives by adding the word to
'personal dictionaries'.   The difficult correction, not always
possible, is to remove the word from the spell-checker's word list.

For scriptio continua scripts, line_break=complex_context in UCD terms,
there is the additional problem that word-breaking is not infrequently
wrong, even for Thai in Thai script.  (Recent loanwords into Thai can
be a nightmare.  So is Pali in Thai script, though Pali spell-checking
has its own issues.)  Line-breaking can be corrected with WJ and ZWSP.
At present, word-breaking can currently be corrected by inserting these
characters, and then spelling can be negotiated - the visible
characters are non-negotiable. The changes in the text will persist in
plain text. If WJ ceases to be treated as joining words, then the
service of a persistent, *correctable* spell-checking system is lost.

Now, one defence to the denial of the service would be that it would be
unreasonably difficult to allow users to solve the problem of
word-breaks in the wrong place.  However, if one is already providing
that service, that defence cannot be applied to subsequently denying
the service.

Richard.

Re: Encoding italic

2019-01-29 Thread Andrew West via Unicode

On Tue, 29 Jan 2019 at 10:25, Martin J. Dürst via Unicode
 wrote:
>
> The overall tag proposal had the desired effect: The original proposal
> to hijack some unused bytes in UTF-8 was defeated, and the tags itself
> were not actually used and therefore could be depreciated.

And the tag characters (all except E0001) are now no longer
deprecated. As flag tag sequences are now a thing
(http://www.unicode.org/reports/tr51/#valid-emoji-tag-sequences), and
are widely supported (including on Twitter), your and PV's objections
to using tag characters for a plain text font styling protocol simply
because they are tag characters carry zero weight.

Andrew

Re: Proposal for BiDi in terminal emulators

2019-01-29 Thread Adam Borowski via Unicode

On Tue, Jan 29, 2019 at 01:50:31PM +0100, Egmont Koblinger via Unicode wrote:
> Terminal emulators are a powerful tool used by many people for various
> tasks. Most terminal emulators' bugtracker has a request to add RTL /
> BiDi support.
[...]
> Some terminal emulators decided to run the BiDi algorithm for display
> purposes on its lines (rather than paragraphs, uh)

Even a line is way too big a piece to be safely reordered by the terminal.
What you propose will break every full-screen program that uses line-drawing
characters:

╒═══╤══╕
│ filename1 │  123 │
│ FILENAME2 │   17 │
└───┴──┘

would become:

╒═══╤══╕
│ filename1 │  123 │
│ 17   │ 2EMANELIF │
└───┴──┘

You can't even use character properties, because:

+===+==+
| filename1 |  123 |
| FILENAME2 |   17 |
+---+--+

or even, probably more popular:

  filename1123
  FILENAME2 17

I'm afraid there's no good way to do BiDi without support from individual
programs.


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Remember, the S in "IoT" stands for Security, while P stands
⢿⡄⠘⠷⠚⠋⠀ for Privacy.
⠈⠳⣄

Re: Proposal for BiDi in terminal emulators

2019-01-29 Thread Eli Zaretskii via Unicode

> Date: Tue, 29 Jan 2019 13:50:31 +0100
> From: Egmont Koblinger via Unicode 
> 
> [1] https://terminal-wg.pages.freedesktop.org/bidi/

Interesting document, thanks for writing it.

My personal experience with bringing BiDi to Emacs led me to a firm
conclusion that BiDi support by terminal emulators cannot be relied on
by sophisticated text editing and display applications that are
BiDi-aware.  The terminal emulator can never be smart enough to do
what the editing needs require, so the application eventually ends up
jumping through hoops in order to trick the terminal into doing TRT.
It is easier to tell users to disable BiDi support of the terminal (if
it even has one), and do everything in the app.  This is the only way
of having full control of what is displayed, especially when
"higher-level protocols" need to be used to tailor the UBA to the need
of the user, because there's usually no way of asking the terminal to
apply a behavior which deviates from the UBA.

(If needed, I can provide examples of subtle problems with using BiDi
support of a terminal in BiDi-aware editing.  Not sure this will be
interesting to too many readers of this forum, though.)

Re: Encoding italic

2019-01-29 Thread Doug Ewell via Unicode

Martin J. Dürst wrote:

> Here's a little dirty secret about these tag characters: They were
> placed in one of the astral planes explicitly to make sure they'd use
> 4 bytes per tag character, and thus quite a few bytes for any actual
> complete tags. See https://tools.ietf.org/html/rfc2482 for details.
> Note that RFC 2482 has been obsoleted by
> https://tools.ietf.org/html/rfc6082, in parallel with a similar motion
> on the Unicode side.

I don't recall anyone mentioning Plane 14 language tags per se in this
thread. The tag characters themselves were un-deprecated to support
emoji flag sequences. But more on language tags in a moment.

> These tag characters were born only to shoot down an even worse
> proposal, https://tools.ietf.org/html/draft-ietf-acap-mlsf-01. For
> some additional background, please see
> https://tools.ietf.org/html/draft-ietf-acap-langtag-00.
>
> The overall tag proposal had the desired effect: The original proposal
> to hijack some unused bytes in UTF-8 was defeated, and the tags itself
> were not actually used and therefore could be depreciated.

I agree that the ACAP proposal was awful, for many reasons and on many
levels. But in general, introducing a new standardized mechanism SO THAT
it can be deprecated is a crummy idea. It engenders bad feelings and
distrust among loyal users of the standard. Major software vendors, one
in particular starting with M, have been castigated for decades for
employing tactics similar to this.

> Bad ideas turn up once every 10 or 20 years. It usually takes some
> time for some of the people to realize that they are bad ideas. But
> that doesn't make them any better when they turn up again.

The suggestions over the past three weeks to encode basic styling in
plain text (I'm not saying I'm for or against that) have some
similarities with Plane 14 language tags: many people consider both
types of information to be meta-information, unsuitable for plain text,
and many of the suggested mechanisms are stateful, which is an anti-goal
of Unicode. But these are NOT the same idea, and the fact that they both
use Plane 14 tag characters doesn't make them so.

--
Doug Ewell | Thornton, CO, US | ewellic.org

Re: Encoding italic

2019-01-29 Thread Doug Ewell via Unicode

Kent Karlsson wrote:

> We already have a well-established standard for doing this kind of
> things...

I thought we were having this discussion because none of the existing
methods, no matter how well documented, has been accepted on a
widespread basis as "the" standard.

Some people dislike markdown because it looks like lightweight markup
(which it is), not like actual italics and boldface. Some dislike ISO
6429 because escape characters are invisible and might interfere with
other protocols (though they really shouldn't). Some dislike math
alphanumerics abuse because it's abuse, doesn't cover other writing
systems, etc.

I'd be happy to work with Kent to campaign for ISO 6429 as "the"
well-established standard for applying simple styling to plain text, but
we would have to acknowledge the significant challenges.

--
Doug Ewell | Thornton, CO, US | ewellic.org

Re: Encoding italic

2019-01-29 Thread Doug Ewell via Unicode

Philippe Verdy replied to James Kass:
 
> You're not very explicit about the Tag encoding you use for these
> styles.
 
Of course, it was Andrew West who implemented the styling mechanism in a
beta release of BabelPad. James was just reporting on it.
 
> And what is then the interest compared to standard HTML
 
This entire discussion, for more than three weeks now, has been about
how to implement styling (e.g. italics) in plain text. Everyone knows it
can be done, and how to do it, in rich text.
 
> So you used "bold  U+E003E> I.e, you converted from ASCII to tag characters the full HTML
> sequences "" and "", including the HTML element name. I see
> little interest for that approach.
 
I thought we had established that someone had mentioned it on this list,
at some time during the past three weeks. Can someone look up what post
that was? I don't have time to go through scores of messages, and there
is no search facility.
 
I can't speak for Andrew, but I strongly suspect he implemented this as
a proof of concept, not to declare himself the Maker of Standards.
 
--
Doug Ewell | Thornton, CO, US | ewellic.org

Re: Proposal for BiDi in terminal emulators

2019-01-29 Thread Eli Zaretskii via Unicode

> Date: Tue, 29 Jan 2019 13:50:31 +0100
> From: Egmont Koblinger via Unicode 
> 
> In turn, vim, emacs and friends stand there clueless, not knowing
> how to do BiDi in terminals.

This is inaccurate: Emacs (at least the brand known as "GNU Emacs")
supports bidirectional editing in text terminals and GUI displays
alike, since Emacs 24.1, which was released in June 2012.  The latest
released version 26.1 supports all the latest changes in the UBA, up
to and including Unicode 10.0.

What Emacs version did you try which led you to the conclusion that
bidirectional editing and display in Emacs were not supported on text
terminals?

Proposal for BiDi in terminal emulators

2019-01-29 Thread Egmont Koblinger via Unicode

Hi,

Terminal emulators are a powerful tool used by many people for various
tasks. Most terminal emulators' bugtracker has a request to add RTL /
BiDi support. Unicode has supported BiDi for about 20 years now.
Still, the intersection of these two fields isn't solved. Even some
Unicode experts have stated over time that no one knows how to do it
properly.

The only documentation I could find (ECMA TR/53) predates the Unicode
BiDi algorithm, and as such no surprise that it doesn't follow the
current state of the art or best practices.

Some terminal emulators decided to run the BiDi algorithm for display
purposes on its lines (rather than paragraphs, uh), not seeing the big
picture that such a behavior turns them into a platform on top of
which it's literally impossible to implement proper BiDi-aware text
editing (vim, emacs, whatever) experience. In turn, vim, emacs and
friends stand there clueless, not knowing how to do BiDi in terminals.

With about 5 years of experience in terminal emulator development, and
some prior BiDi homepage developing experience with the kind mentoring
of one of the BiDi gurus (Aharon, if you're reading this, hi there!),
I decided to tackle this issue. I studied and evaluated the
aforementioned documentation and the behavior of such terminals,
pointed out the problems, and came up with a draft proposal.

My work isn't complete yet. One of the most important pending issues
is to figure out how to track BiDi control characters (e.g. which
character cells they belong to), it is to be addressed in a subsequent
version. But I sincerely hope I managed to get the basics right and
clean enough so that work can begin on implementing proper support in
terminal emulators as well as fullscreen text applications; and as we
gain experience and feedback, extending the spec to address the
missing bits too.

You can find this (draft) specification at [1]. Feedback is welcome –
if it's an actionable one then preferably over there in the project's
bugtracker.

[1] https://terminal-wg.pages.freedesktop.org/bidi/


cheers,
egmont (GNOME Terminal / VTE co-developer)

Re: Encoding italic

2019-01-29 Thread Martin J . Dürst via Unicode

On 2019/01/28 05:03, James Kass via Unicode wrote:
> 
> A new beta of BabelPad has been released which enables input, storing, 
> and display of italics, bold, strikethrough, and underline in plain-text 
> using the tag characters method described earlier in this thread.  This 
> enhancement is described in the release notes linked on this download page:
> 
> http://www.babelstone.co.uk/Software/index.html
>

I didn't say anything at the time this idea first came up, because I 
hoped people would understand that it was a bad idea.

Here's a little dirty secret about these tag characters: They were 
placed in one of the astral planes explicitly to make sure they'd use 4 
bytes per tag character, and thus quite a few bytes for any actual 
complete tags. See https://tools.ietf.org/html/rfc2482 for details. Note 
that RFC 2482 has been obsoleted by https://tools.ietf.org/html/rfc6082, 
in parallel with a similar motion on the Unicode side.

These tag characters were born only to shoot down an even worse 
proposal, https://tools.ietf.org/html/draft-ietf-acap-mlsf-01. For some 
additional background, please see 
https://tools.ietf.org/html/draft-ietf-acap-langtag-00.

The overall tag proposal had the desired effect: The original proposal 
to hijack some unused bytes in UTF-8 was defeated, and the tags itself 
were not actually used and therefore could be depreciated.

Bad ideas turn up once every 10 or 20 years. It usually takes some time 
for some of the people to realize that they are bad ideas. But that 
doesn't make them any better when they turn up again.

Regards,   Martin.

Re: Encoding italic

2019-01-29 Thread Martin J . Dürst via Unicode

On 2019/01/24 23:49, Andrew West via Unicode wrote:
> On Thu, 24 Jan 2019 at 13:59, James Kass via Unicode
>  wrote:

> We were told time and time again when emoji were first proposed that
> they were required for encoding for interoperability with Japanese
> telecoms whose usage had spilled over to the internet. At that time
> there was no suggestion that encoding emoji was anything other than a
> one-off solution to a specific problem with PUA usage by different
> vendors, and I at least had no idea that emoji encoding would become a
> constant stream with an annual quota of 60+ fast-tracked
> user-suggested novelties. Maybe that was the hidden agenda, and I was
> just naïve.

I don't think this was a hidden agenda. Nobody in the US or Europe 
thought that emoji would catch on like they did, with ordinary people 
and the press. Of course they had been popular in Japan, that's why the 
got into Unicode.

> The ESC and UTC do an appallingly bad job at regulating emoji, and I
> would like to see the Emoji Subcommittee disbanded, and decisions on
> new emoji taken away from the UTC, and handed over to a consortium or
> committee of vendors who would be given a dedicated vendor-use emoji
> plane to play with (kinda like a PUA plane with pre-assigned
> characters with algorithmic names [VENDOR-ASSIGNED EMOJI X] which
> the vendors can then associate with glyphs as they see fit; and as
> emoji seem to evolve over time they would be free to modify and
> reassign glyphs as they like because the Unicode Standard would not
> define the meaning or glyph for any characters in this plane).

To a small extent, that already happens. The example I'm thinking about 
is the transition from a (potentially bullet-carrying) pistol to a 
waterpistol. The Unicode consortium doesn't define the meaning of any of 
it's characters, and doesn't define stardard glyphs for characters, just 
example glyphs. Another example is a presenter at a conference who was 
using lots of emoji saying that he will need to redo his presentation 
because the vendor of his notebook's OS was in the process of changing 
their emoji designs.

Regards,Martin.

Re: Encoding italic

2019-01-28 Thread Phake Nick via Unicode

2019-1-25 13:46, Garth Wallace via Unicode  wrote:

>
> On Wed, Jan 23, 2019 at 1:27 AM James Kass via Unicode <
> unicode@unicode.org> wrote:
>
>>
>> Nobody has really addressed Andrew West's suggestion about using the tag
>> characters.
>>
>> It seems conformant, unobtrusive, requiring no official sanction, and
>> could be supported by third-partiers in the absence of corporate
>> interest if deemed desirable.
>>
>> One argument against it might be:  Whoa, that's just HTML.  Why not just
>> use HTML?  SMH
>>
>> One argument for it might be:  Whoa, that's just HTML!  Most everybody
>> already knows about HTML, so a simple subset of HTML would be
>> recognizable.
>>
>> After revisiting the concept, it does seem elegant and workable. It
>> would provide support for elements of writing in plain-text for anyone
>> desiring it, enabling essential (or frivolous) preservation of
>> editorial/authorial intentions in plain-text.
>>
>> Am I missing something?  (Please be kind if replying.)
>>
>
> There is also RFC 1896 "enriched text", which is an attempt at a
> lightweight HTML substitute for styling in email. But these, and the ANSI
> escape code suggestion, seem like they're trying to solve the wrong problem
> here.
>
> Here's how I understand the situation:
> * Some people using forms of text or mostly-text communication that do not
> provide styling features want to use styling, for emphasis or personal flair
> * Some of these people caught on to the existence of the "styled"
> mathematical alphanumerics and, not caring that this is "wrong", started
> using them as a workaround
> * The use of these symbols, which are not technically equivalent to basic
> Latin, make posts inaccessible to screen readers, among other problems
>
> These are suggestions for Unicode to provide a different, more
> "acceptable" workaround for a lack of functionality in these social media
> systems (this mostly seems to be an issue with Twitter; IME this shows up
> much less on Facebook). But the root problem isn't the kludge, it's the
> lack of functionality in these systems: if Twitter etc. simply implemented
> some styling on their own, the whole thing would be a moot point.
> Essentially, this is trying to add features to Twitter without waiting for
> their development team.
>
> Interoperability is not an issue, since in modern computers copying and
> pasting styled text between apps works just fine.
>

How about outside social media system? For example, Chinese Braille have
symbols that indicate the start and end position of proper name mark and
book name mark punctuation, however when converted to plain text they
cannot be displayed with Unicode text because of the mindset that it should
be the task of styling software to render this punctuation, just because
the two punctuations are basically straight underline and wavy underline
beneath text in normal Chinese text.

>

Re: Encoding italic

2019-01-28 Thread Phake Nick via Unicode

Gmail can do *Märchen* although I am not too sure about how they transmit
such formatting and not sure about how interoperatable are they.

在 2019年1月22日週二 14:43，Adam Borowski via Unicode  寫道：

> On Mon, Jan 21, 2019 at 12:29:42AM -0800, David Starner via Unicode wrote:
> > On Sun, Jan 20, 2019 at 11:53 PM James Kass via Unicode
> >  wrote:
> > >  Even though /we/ know how to do
> > > it and have software installed to help us do it.
> >
> > You're emailing from Gmail, which has support for italics in email.
>
> ... and how exactly can they send italics in an e-mail?  All they can do is
> to bundle a web page as an attachment, which some clients display instead
> of
> the main text.
>
> The e-mail's body text supports anything Unicode does, including
> 𝑖𝑡𝑎𝑙𝑖𝑐 and
> even 𐌏𐌋𐌃 𐌉𐌕𐌀𐌋𐌉𐌂, but, remarkably, not italic umlauted characters,
> thai nor
> han.
>

Re: Ancient Greek apostrophe marking elision

2019-01-28 Thread James Tauber via Unicode

On Mon, Jan 28, 2019 at 10:58 PM James Kass via Unicode 
wrote:

>
> On 2019-01-29 1:55 AM, Mark E. Shoulson via Unicode wrote:
> > I guess "Suck it up and deal with it."  And that may indeed be the
> answer.
>
> It would certainly make for shorter and simpler FAQ pages, anyway.
>

Except people will just respond with "okay, I'll use U+02BC instead" which
is what started all this :-)

James
-- 
*James Tauber*
Eldarion <https://eldarion.com/> | jktauber.com (Greek Linguistics)
<https://jktauber.com/> | Modelling Music
<https://modelling-music.com/> | Digital
Tolkien <https://digitaltolkien.com/>

< 4 5 6 7 8 9 10 11 12 13 >

801 - 900 of 3287 matches

Mail list logo