Re: Proposal for BiDi in terminal emulators
On Fri, 01 Feb 2019 15:18:13 -0700 Doug Ewell via Unicode wrote: > Richard Wordingham wrote: > > > Language tagging is already available in Unicode, via the tag > > characters in the deprecated plane. > > Plane 14 isn't deprecated -- that isn't a property of planes -- and > the tag characters U+E0020 through U+E007E have been un-deprecated > for use with emoji flags. Only U+E0001 LANGUAGE TAG and U+E007F > CANCEL TAG are deprecated. Unicode may not deprecate the tag characters, but the characters of Plane 14 are widely deplored, despised or abhorred. That is why I think of it as the deprecated plane. Richard.
Re: Proposal for BiDi in terminal emulators
On Fri, Feb 01, 2019 at 06:57:43PM +, Richard Wordingham via Unicode wrote: > On Fri, 1 Feb 2019 13:02:45 +0200 > Khaled Hosny via Unicode wrote: > > > On Thu, Jan 31, 2019 at 11:17:19PM +0000, Richard Wordingham via > > Unicode wrote: > > > On Thu, 31 Jan 2019 12:46:48 +0100 > > > Egmont Koblinger wrote: > > > > > > No. How many cells do CJK ideographs occupy? We've had a strong > > > hint that a medial BEH should occupy one cell, while an isolated > > > BEH should occupy two. > > > > Monospaced Arabic fonts (there are not that many of them) are designed > > so that all forms occupy just one cell (most even including the > > mandatory lam-alef ligatures), unlike CJK fonts. > > > > I can imagine the terminal restricting itself to monspaced fonts, > > disable “liga” feature just in case, and expect the font to well > > behave. Any other magic is likely to fail. > > Of course, strictly speaking, a monospaced font cannot support harakat > as Egmont has proposed. There are two approaches for handling them in monospaced fonts: combining them with base characters as usual, or treating them as spacing characters placed next to their bases. The latter approach is a bit unusual, but makes editing heavily voweled text a bit more pleasant. It requires good OpenType support, though, so virtually no terminal supports it. Regards, Khaled
Re: Proposal for BiDi in terminal emulators
On 2019-02-01 19:57, "Richard Wordingham via Unicode" wrote: > On Fri, 1 Feb 2019 13:02:45 +0200 > Khaled Hosny via Unicode wrote: > >> On Thu, Jan 31, 2019 at 11:17:19PM +0000, Richard Wordingham via >> Unicode wrote: >>> On Thu, 31 Jan 2019 12:46:48 +0100 >>> Egmont Koblinger wrote: >>> >>> No. How many cells do CJK ideographs occupy? We've had a strong >>> hint that a medial BEH should occupy one cell, while an isolated >>> BEH should occupy two. >> >> Monospaced Arabic fonts (there are not that many of them) are designed >> so that all forms occupy just one cell (most even including the >> mandatory lam-alef ligatures), unlike CJK fonts. >> >> I can imagine the terminal restricting itself to monspaced fonts, >> disable “liga” feature just in case, and expect the font to well >> behave. Any other magic is likely to fail. > > Of course, strictly speaking, a monospaced font cannot support harakat > as Egmont has proposed. > > Richard. (harakat: non-spacing vowel marks in Arabic) "Monospaced font" is really a concept that needs qualification. Even for "plain old ASCII" there are two advance widths, not just one: 0 for control characters (and for escape/control sequences, neither of which should directly consult the font; this includes such things as OSC sequences, though the latter are a bad idea to have in any line one might wish to edit (vi/emacs/...) via a terminal emulator window), and one cell for everything else. Terminals (read: terminal emulators) can also deal with mixed single-width and double-width characters (which is, IIUC, the motivation for the data file EastAsianWidth.txt). Likewise, it should be possible to deal reasonably with non-spacing combining characters. It is a lot more difficult to deal with BiDi in a terminal emulator; shaping may also be hard to do, as well as reordering (or even splitting) combining characters.
All sorts of problems arise: the emulator is fed a character (or a "short" string) at a time and is not allowed to buffer for display, which causes reshaping or movement of already displayed characters, edit position movement even within a single line, etc. Even if these problems are solvable for a "GUI" text editor (not via a terminal), they do not seem to be workable in a terminal (emulator) setting, especially not if one also wants to support multiline editing (vi/emacs/...) or even single-line editing. As long as editing is limited to a single line (such as the system line editor, or an "enhanced functionality" line editor such as that used for bash, where moving in the history sets the edit position at EOL), even variable-width ("proportional") fonts should not pose a major problem. But for multiline editors (à la vi/emacs) it would not be possible to sync the visual edit position nicely with the actual edit position in the edit buffer (unless one accepts strange jumps): the program would not have access to the advance width data from the font that the terminal emulator uses, unless one revolutionises what terminal emulators do... (And I don't see a case for doing that.) But both a terminal emulator and multiline editing programs (for terminal emulators) can still have access to EastAsianWidth data as well as to which characters are non-spacing; those are not font dependent. (There might be some glitches if the Unicode versions used do not match (the terminal emulator and the program being run are most often on different systems), but only for characters where these properties have changed, e.g. newly allocated non-spacing marks.) /Kent K PS No, I have not done extensive testing of various terminal emulators on how well they handle the stuff above.
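[Editor's illustration, not part of the original message.] The point that cell widths can be derived from font-independent character properties can be sketched in Python as follows. This is only a rough approximation under stated assumptions: the function names `cell_width` and `string_width` are made up for this sketch, the rule set is much cruder than what real emulators use, and the result depends on the Unicode version bundled with the Python interpreter (exactly the version-mismatch caveat raised above).

```python
import unicodedata

def cell_width(ch: str) -> int:
    """Approximate number of terminal cells for one code point (hypothetical helper)."""
    category = unicodedata.category(ch)
    # Non-spacing marks (Mn), enclosing marks (Me), format characters (Cf)
    # and control characters (Cc) are treated as advancing by zero cells.
    if category in ("Mn", "Me", "Cf", "Cc"):
        return 0
    # Wide (W) and Fullwidth (F) characters from EastAsianWidth.txt take two cells.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    # Everything else is assumed to take exactly one cell.
    return 1

def string_width(text: str) -> int:
    """Sum of the per-code-point cell widths; no font is consulted."""
    return sum(cell_width(ch) for ch in text)
```

Both a terminal emulator and a full-screen program could run the same kind of computation independently, which is why the two can agree on column positions without sharing font data.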
Re: Proposal for BiDi in terminal emulators
Richard, On 2/1/2019 1:30 PM, Richard Wordingham via Unicode wrote: Language tagging is already available in Unicode, via the tag characters in the deprecated plane. Recte: 1. Plane 14 is not a "deprecated plane". 2. The tag characters in the Tag Character block (U+E0000..U+E007F) are not deprecated. (They are used, for example, by UTS #51 to specify emoji tag sequences.) 3. However, the use of U+E0001 LANGUAGE TAG and the mechanism of using tag characters for spelling out language tags are explicitly deprecated by the standard. See: "Deprecated Use for Language Tagging" in Section 23.9 Tag Characters. https://www.unicode.org/versions/Unicode11.0.0/ch23.pdf#G30427 and PropList.txt: E0001 ; Deprecated # Cf LANGUAGE TAG As I stated earlier: language tags should use BCP 47, and belong in the markup level, not in the plain text stream. --Ken
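[Editor's illustration, not part of the original message.] The non-deprecated use of tag characters mentioned in point 2 can be sketched as below. The helper names are hypothetical; the layout (a base emoji, tag characters mirroring ASCII at an offset of U+E0000, then U+E007F CANCEL TAG as terminator) follows my reading of UTS #51, and the "gbeng" example is the subdivision flag for England.

```python
TAG_OFFSET = 0xE0000          # tag characters mirror ASCII at this offset
CANCEL_TAG = "\U000E007F"     # terminates an emoji tag sequence
BLACK_FLAG = "\U0001F3F4"     # base character used for subdivision flags

def to_tag_characters(ascii_text: str) -> str:
    """Map printable ASCII (U+0020..U+007E) to tag characters U+E0020..U+E007E."""
    for ch in ascii_text:
        if not 0x20 <= ord(ch) <= 0x7E:
            raise ValueError(f"not representable as a tag character: {ch!r}")
    return "".join(chr(TAG_OFFSET + ord(ch)) for ch in ascii_text)

def emoji_tag_sequence(base: str, tag_spec: str) -> str:
    """Build base + tag characters + CANCEL TAG (hypothetical helper)."""
    return base + to_tag_characters(tag_spec) + CANCEL_TAG

def is_tag_character(ch: str) -> bool:
    """True for U+E0020..U+E007F; deliberately excludes U+E0001 LANGUAGE TAG."""
    return 0xE0020 <= ord(ch) <= 0xE007F

england = emoji_tag_sequence(BLACK_FLAG, "gbeng")
```

Note that nothing here spells out a language tag; that mechanism, introduced by U+E0001, is the part the standard deprecates.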
Re: Proposal for BiDi in terminal emulators
On Fri, 1 Feb 2019 at 22:20, Doug Ewell via Unicode wrote: > > Richard Wordingham wrote: > > > Language tagging is already available in Unicode, via the tag > > characters in the deprecated plane. > > Plane 14 isn't deprecated -- that isn't a property of planes -- and the > tag characters U+E0020 through U+E007E have been un-deprecated for use > with emoji flags. Only U+E0001 LANGUAGE TAG and U+E007F CANCEL TAG are > deprecated. Cancel Tag is not deprecated any longer either (http://www.unicode.org/Public/UNIDATA/PropList.txt). Andrew
Re: Proposal for BiDi in terminal emulators
Richard Wordingham wrote: > Language tagging is already available in Unicode, via the tag > characters in the deprecated plane. Plane 14 isn't deprecated -- that isn't a property of planes -- and the tag characters U+E0020 through U+E007E have been un-deprecated for use with emoji flags. Only U+E0001 LANGUAGE TAG and U+E007F CANCEL TAG are deprecated. -- Doug Ewell | Thornton, CO, US | ewellic.org
Re: Proposal for BiDi in terminal emulators
On Fri, 1 Feb 2019 14:47:22 +0100 Egmont Koblinger via Unicode wrote: > Hi Ken, > > > [language tag] > > That is a complete non-starter for the Unicode Standard. > > Thanks for your input! > > (I hope it was clear that I just started throwing in random ideas, as > in a brainstorming session. This one is ruled out, then.) Language tagging is already available in Unicode, via the tag characters in the deprecated plane. Richard.
Re: Proposal for BiDi in terminal emulators
On Fri, 1 Feb 2019 13:02:45 +0200 Khaled Hosny via Unicode wrote: > On Thu, Jan 31, 2019 at 11:17:19PM +0000, Richard Wordingham via > Unicode wrote: > > On Thu, 31 Jan 2019 12:46:48 +0100 > > Egmont Koblinger wrote: > > > > No. How many cells do CJK ideographs occupy? We've had a strong > > hint that a medial BEH should occupy one cell, while an isolated > > BEH should occupy two. > > Monospaced Arabic fonts (there are not that many of them) are designed > so that all forms occupy just one cell (most even including the > mandatory lam-alef ligatures), unlike CJK fonts. > > I can imagine the terminal restricting itself to monspaced fonts, > disable “liga” feature just in case, and expect the font to well > behave. Any other magic is likely to fail. Of course, strictly speaking, a monospaced font cannot support harakat as Egmont has proposed. Richard.
Re: Encoding italic
The proposal would contradict the goals of variation selectors and would pollute the variation sequences registry (possibly even creating conflicts). And if we admit it for italics, then another VSn will be dedicated to bold, and another to monospace, and finally many would follow for various style modifiers. In the end we would no longer have enough variation selectors for all requests. And all we would have done is try to reproduce another existing styling standard, but very inefficiently. This use will also be "abused" for all purposes, creating new implementation constraints and goals that contradict existing styling languages, which would then decide to make these characters incompatible for use in conforming applications. The Unicode encoding would have lost all its value. I do not support the idea of encoding generic styles (applicable to the 100k+ existing characters) using variation selectors. Their goal is only to allow semantic distinctions when two glyphs that were unified may occasionally (not always) carry some significance in specific languages. But what you propose would apply to all languages, all scripts, and would definitely reserve some of the few existing VSn for this styling use, blocking further registration of needed distinctions (VSn characters are notably needed for sinographic scripts to properly represent toponyms or person names, or to solve some problems existing with generic character properties in Unicode that cannot be changed because of stability rules). On Thu, 31 Jan 2019 at 16:32, wjgo_10...@btinternet.com via Unicode < unicode@unicode.org> wrote: > Is the way to try to resolve this for a proposal document to be produced > for using Variation Selector 14 in order to produce italics and for the > proposal document to be submitted to the Unicode Technical Committee? 
> > If the proposal is allowed to go to the committee rather than being > ruled out of scope, then we can know whether the Unicode Technical > Committee will allow the encoding. > > William Overington > > Thursday 31 January 2019 > >
Re: Proposal for BiDi in terminal emulators
Hi, I'm trying to respond to every question, but I'm having a hard time keeping up :-) Thanks a lot for all the precious input about shaping! Here's my suggestion, for version 0.2 of the recommendation: - No longer encourage any use of presentation form characters. - State that it's the terminal emulator's task to perform shaping, both in implicit and explicit modes. - Leave it for a future enhancement to handle trickier cases in explicit mode, such as shaping of a word that's only partially visible, or prevent shaping when two words happen to touch each other and are visually separated by other means (e.g. background color). Leave it for further research whether we could use ZWJ/ZWNJ here, whether we could use ECMA's SAPV 5-8 & 21-11, or whether we should invent something new (perhaps even telling the terminal emulator what neighboring previous/next characters to imagine there for the purpose of shaping)... Let me know if you have any remaining problems/concerns/etc. As for the implementation in VTE: initially I'll still use presentation form characters, solely because that's a low hanging fruit approach (low investment, high gain). I've already implemented it in about an hour (a bit of further hacks will be necessary to extend it to explicit mode, but still easily doable), whereas switching to HarfBuzz is expected to take weeks of heavy work. We'll tackle that in a subsequent version. And if anyone's happy to help, there's already some bounty for harfbuzz support :) Thanks again for the great guidance! cheers, egmont On Tue, Jan 29, 2019 at 1:50 PM Egmont Koblinger wrote: > > Hi, > > Terminal emulators are a powerful tool used by many people for various > tasks. Most terminal emulators' bugtracker has a request to add RTL / > BiDi support. Unicode has supported BiDi for about 20 years now. > Still, the intersection of these two fields isn't solved. Even some > Unicode experts have stated over time that no one knows how to do it > properly. 
> > The only documentation I could find (ECMA TR/53) predates the Unicode > BiDi algorithm, and as such no surprise that it doesn't follow the > current state of the art or best practices. > > Some terminal emulators decided to run the BiDi algorithm for display > purposes on its lines (rather than paragraphs, uh), not seeing the big > picture that such a behavior turns them into a platform on top of > which it's literally impossible to implement proper BiDi-aware text > editing (vim, emacs, whatever) experience. In turn, vim, emacs and > friends stand there clueless, not knowing how to do BiDi in terminals. > > With about 5 years of experience in terminal emulator development, and > some prior BiDi homepage developing experience with the kind mentoring > of one of the BiDi gurus (Aharon, if you're reading this, hi there!), > I decided to tackle this issue. I studied and evaluated the > aforementioned documentation and the behavior of such terminals, > pointed out the problems, and came up with a draft proposal. > > My work isn't complete yet. One of the most important pending issues > is to figure out how to track BiDi control characters (e.g. which > character cells they belong to), it is to be addressed in a subsequent > version. But I sincerely hope I managed to get the basics right and > clean enough so that work can begin on implementing proper support in > terminal emulators as well as fullscreen text applications; and as we > gain experience and feedback, extending the spec to address the > missing bits too. > > You can find this (draft) specification at [1]. Feedback is welcome – > if it's an actionable one then preferably over there in the project's > bugtracker. > > [1] https://terminal-wg.pages.freedesktop.org/bidi/ > > > cheers, > egmont (GNOME Terminal / VTE co-developer)
Re: Proposal for BiDi in terminal emulators
> From: Egmont Koblinger > Date: Fri, 1 Feb 2019 14:35:35 +0100 > Cc: Frédéric Grosshans , > unicode@unicode.org > > > You could do that, but it will require a lot of non-trivial processing > > from the applications. Text-mode applications don't want any complex > > tinkering, they want just to write their text and be done. The more > > overhead you add to that simple task, the less probable it is that > > applications will support such a terminal. > > I agree with your overall observation, but I'm not sure how much it > applies to this context. > > Text-mode applications have to run the BiDi algorithm. The one I > picked can also do shaping (well, the pretty limited one, using > presentation forms). Reordering and shaping have different requirements. Reordering can be done based only on the codepoints, whereas shaping needs also intimate knowledge of the fonts being used. The former can be done by a text-mode application, the latter cannot, not anywhere close to what readers of the respective scripts would expect.
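[Editor's illustration, not part of the original message.] The claim that reordering needs only code points can be made concrete: the input to the bidi algorithm is each character's Bidi_Class, which is a font-independent property. The sketch below uses Python's standard `unicodedata` module; `first_strong_direction` is a hypothetical helper that implements only a simplified form of paragraph-direction detection (first strong character, ignoring isolates and embeddings), not the full UBA.

```python
import unicodedata

# Strong bidi classes and the paragraph direction they imply.
STRONG = {"L": "ltr", "R": "rtl", "AL": "rtl"}

def bidi_classes(text: str) -> list:
    """Bidi_Class of every code point; no font information is involved."""
    return [unicodedata.bidirectional(ch) for ch in text]

def first_strong_direction(text: str):
    """Simplified paragraph direction: direction of the first strong character."""
    for ch in text:
        direction = STRONG.get(unicodedata.bidirectional(ch))
        if direction is not None:
            return direction
    return None  # no strong character found
```

Nothing comparable exists in `unicodedata` for shaping: which glyph a letter ends up as depends on the font's own tables, which a text-mode application cannot see.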
Re: Proposal for BiDi in terminal emulators
Hi Richard, On Fri, Feb 1, 2019 at 12:19 AM Richard Wordingham via Unicode wrote: > Cropped why? If the problem is the truncation of lines, one can simple > store the next character. Yup, truncation of lines for example. I agree that one could "store the next character". We could extend the terminal emulation protocol so that by some means you can specify that column 80 contains a letter X, and even though there's no column 81, an app can still tell the terminal emulator that it should imagine that column 81 contains the letter Y, and perform shaping accordingly. This will need to be done not just at the end of the terminal, but at any position, and for both directions. Think of e.g. a vertically split tmux. You should be able to tell that column 40 contains X which should be shaped as if column 41 contained Y, and column 41 contains Z which should be shaped as if column 40 contained A. What I cannot see at all is how this could be "simple". Could you please elaborate on that? I don't find this simple at all! > > It's not able to > > separate different UI elements that happen to be adjacent in the > > terminal, separated by different background color or such. > > ZWJ and ZWNJ can handle that. Wouldn't it be a semantic misuse of these characters, though? They are supposed to be present in the logical order, and in logical order (that is: the terminal's implicit mode) they can work as desired. Are they okay to be present in visual order (the terminal's explicit mode, what we're discussing now) too? Anyway, ZWJ/ZWNJ aren't sufficient to handle the cases I outlined above. > If a general text manipulating application, e.g. cat, grep or awk, is > writing to a file, it should not convert normal Arabic characters to > presentation forms. You are now asking a general application to > determine whether it is writing to a terminal or not, and alter its > output if it is writing to a terminal. No, this is absolutely not what I'm talking about! 
There are two vastly different modes of the terminal. For "cat", "grep" etc. the terminal will be in implicit mode. Absolutely no BiDi handling is expected from these apps; the terminal will do BiDi and shaping (perhaps using HarfBuzz; perhaps using presentation form characters as a temporary low hanging fruit until a better one is implemented – the choice is obviously up to the implementation and not to the specification). For "emacs" and friends, an explicit mode is required where visual order is passed to the terminal. What we're discussing is how to handle shaping in this mode. > But it as an issue that needs to be addressed. As a terminal can be > addressed by cell, an application may need to keep track of what text > went into each cell. Misery results when the application gets it wrong. My recommendation doesn't change this principle at all. In the lower (emulation) layer every character still goes into the cell it used to go to, and is addressable using cursor motion escapes and so on exactly as without BiDi. > How many cells do CJK ideographs occupy? We've had a strong hint > that a medial BEH should occupy one cell, while an isolated BEH should > occupy two. CJK characters occupy two, but they do so regardless of what's around them. That is, they already occupy two cells in the logical buffers, in the emulation layer. There is absolutely no sane way in terminal emulation to make a character's logical width (as in the number of cells it occupies) depend on its neighboring characters. (And even if we could by some terrible hacks, it would break the principle you just stated as "misery results...", and the principle Eli stated that things should remain reasonably simple, otherwise hardly anyone will bother implementing them.) This is a compromise Arabic folks will have to accept. 
When displaying, it's up to terminal emulators to perhaps widen/shrink cells as they want to (they might even totally give up on monospace fonts), but then they'll risk vertical lines not lining up perfectly, content overflowing on the right etc. Konsole does such things. cheers, egmont
Re: Proposal for BiDi in terminal emulators
Hi Ken, > [language tag] > That is a complete non-starter for the Unicode Standard. Thanks for your input! (I hope it was clear that I just started throwing in random ideas, as in a brainstorming session. This one is ruled out, then.) cheers, egmont
Re: Proposal for BiDi in terminal emulators
On Thu, Jan 31, 2019 at 4:26 PM Eli Zaretskii wrote: > > Yes, I do argue that emacs will need to print a new escape sequence. > > Which is much-much-much-much-much better than having to tell users to > > go into the settings of their macOS Terminal / Konsole / > > gnome-terminal etc. and disable BiDi there, isn't it? > > I'm not sure I agree. Most users can disable bidi reordering of the > terminal once and for all. They don't need it. What users are we talking about? Those who don't need BiDi ever at all? Everything is already perfect for them! They shouldn't care about the "enable BiDi" settings of their terminal; either value will result in the same, correct behavior for them. Or are we talking about users who care about BiDi inside Emacs, but don't care about BiDi when echo'ing, cat'ing...? Do such users exist? Well, even if they do, they're not the only target of my work. Remember: My proposal aims to address both the Emacs as well as the echo/cat/... use cases. These are substantially different use cases that require the terminal emulator to be in a different mode, and thus automatic switching between the two modes has to be solved. cheers, egmont
Re: Proposal for BiDi in terminal emulators
> From: Egmont Koblinger > Date: Fri, 1 Feb 2019 14:16:03 +0100 > Cc: Adam Borowski , unicode@unicode.org > > There's absolutely no way we could reorder first, and then handle > TAB's cursor movement. TAB's cursor movement happens in the lower > layer, reordering happens in the upper one. But that means you won't ever be able to be in compliance with UAX#9, because TAB has distinct properties that affect the UBA. If you reorder after all TABs have been converted to spaces, you will not be able to implement the support for Segment Separator characters. Am I missing something?
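[Editor's illustration, not part of the original message.] The concern about TAB can be checked directly from character properties: U+0009 has Bidi_Class S (Segment Separator), while the spaces it is replaced with have Bidi_Class WS. The sketch below uses `str.expandtabs` merely as a stand-in for "TAB already converted to cursor movement or blanks in the lower layer"; how a given emulator actually stores a TAB is an assumption here.

```python
import unicodedata

def bidi_class_counts(text: str) -> dict:
    """Count how many code points of each Bidi_Class occur in the text."""
    counts = {}
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        counts[cls] = counts.get(cls, 0) + 1
    return counts

# A line with Latin text, a TAB, and two Hebrew letters.
line = "abc\t\u05D0\u05D1"
# Stand-in for a lower layer that turned the TAB into blank cells.
expanded = line.expandtabs(8)

before = bidi_class_counts(line)      # contains one "S"
after = bidi_class_counts(expanded)   # the "S" has become several "WS"
```

Once the S is gone, a reordering step running on the stored cells has no way to apply the rules that reset segment separators to the paragraph level, which is the loss of information being pointed out.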
Re: Proposal for BiDi in terminal emulators
Hi, On Thu, Jan 31, 2019 at 4:14 PM Eli Zaretskii wrote:> > I suggest that you show the result to someone who does read Arabic. I contacted one guy who is pretty knowledgeable in Arabic scripts, as well as terminal emulation, I sent out an early unpublished version of the proposal to him, but unfortunately he was busy and didn't have the chance to respond. Let this thread be one where we invite Arabic folks to comment :) > Small changes can be very unpleasant to the eyes of an Arabic reader. I can easily imagine that! I can assure you, seeing õ instead of ő in my native language is extremely unpleasant to my eyes. Depending on the font you're using, you may not even have spotted any difference. But could someone argue for example that seeing an "i" and "w" equally wide is unpleasant to their eyes? Where do we draw the lines of what's an acceptable compromise on a platform that has technical limitations (fixed grid) to begin with? We really need input from Arabic folks to answer this. I'm also wondering: how unpleasant it is if a letter is cut in half (e.g. overflows at the edge of the text editor), and is shaped not according to the entire word but according to the visible part? I took it from the CSS specification that the desired behavior is to shape it according to the entire word, but I honestly don't know how acceptable or how unpleasant the other approach is. > You could do that, but it will require a lot of non-trivial processing > from the applications. Text-mode applications don't want any complex > tinkering, they want just to write their text and be done. The more > overhead you add to that simple task, the less probable it is that > applications will support such a terminal. I agree with your overall observation, but I'm not sure how much it applies to this context. Text-mode applications have to run the BiDi algorithm. The one I picked can also do shaping (well, the pretty limited one, using presentation forms). 
Shouldn't any BiDi algorithm also provide methods for shaping that produce some output that can be easily sent to the terminals? Shouldn't we push for them? As far as I imagine the ideal solution, doing this part of shaping shouldn't be any harder for apps than doing BiDi, basically all they would need to do is hook up to existing API methods. Of course, given the current APIs, it's probably really not this simple. cheers, egmont
Re: Proposal for BiDi in terminal emulators
> From: Egmont Koblinger > Date: Fri, 1 Feb 2019 13:54:02 +0100 > Cc: Adam Borowski , unicode@unicode.org > > For this behavior, the only feature you need from a terminal emulator > is to have a mode where it doesn't shuffle the characters. Currently > every emulator I'm aware of has such a mode, although in some of them > you have to tweak the settings to get to this mode (in my firm opinion > it's an unacceptable user experience), while in emulators according to > my specification there'll be an escape sequence for text-mode apps to > automatically switch to this mode. Like I said, as long as not every emulator supports this control, an application will need to detect its support, and that in itself is a complication. > > This is indeed a significant issue, because it means applications > > cannot force the terminal use a certain non-default base paragraph > > direction. > > They can, since there's a dedicated escape sequence (SCP) for setting > the base paragraph. Does this change the base direction globally for the whole screen, or only for the current text? The latter is what's needed. And again, just detecting whether this is supported is a complication. Emitting LRM or RLM as needed is much easier.
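[Editor's illustration, not part of the original message.] The "emit LRM or RLM" approach amounts to prefixing a line with a strong directional mark so that first-strong detection picks the intended base direction. A minimal sketch, assuming the receiving terminal derives paragraph direction from the first strong character (not all do); `with_base_direction` is a hypothetical helper name.

```python
import unicodedata

LRM = "\u200E"  # LEFT-TO-RIGHT MARK, Bidi_Class L
RLM = "\u200F"  # RIGHT-TO-LEFT MARK, Bidi_Class R

def with_base_direction(line: str, rtl: bool) -> str:
    """Prefix the line with an invisible strong mark of the wanted direction."""
    return (RLM if rtl else LRM) + line

# A line starting with a number and Latin text, forced to an RTL base direction.
forced_rtl = with_base_direction("123 abc", rtl=True)
```

The appeal of this approach is that it needs no capability detection: a terminal without bidi support simply ignores the zero-width mark (or at worst shows a placeholder glyph for it).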
Re: Proposal for BiDi in terminal emulators
> From: Egmont Koblinger > Date: Fri, 1 Feb 2019 13:40:48 +0100 > Cc: unicode@unicode.org > > I now understand that presentation forms isn't an ideal possible > approach, and the recommendation should be improved here. > > Until it happens, I'm uncertain whether using presentation form > characters is a decent low hanging fruit that significantly improves > the readability in some situations (e.g. "good enough" in some sense > for Arabic), or is a dead end we shouldn't propagate. IMNSHO, you shouldn't try solving this problem on your own. Instead, use a shaping engine, such as HarfBuzz, to do that for you, since the emulator does know which fonts it uses, and can access their properties. The only problem a terminal emulator does need to solve in this regard is what to do when N codepoints yield M /= N glyphs that the shaping engine tells you to emit, or, more generally, when the width on display after shaping is different from N times the character cell width. > I still do not agree however that the entire responsibility can be > shifted to the emulator. There are certain important bits of > information that are only available to the application, and not the > emulator – as with many other aspects, such as reordering, > copy-pasting, searching in the data in BiDi-aware text editors using > the terminal's explicit mode, which are all pushed to the application > because the emulator cannot do them correctly. As soon as you attempt to target applications that move cursor and use cursor addressing, you are in trouble, and should IMO refrain from trying to support such applications. For example, Emacs doesn't even write whole lines to the screen, it compares the internal representation of what's on the screen and what should be there, and only emits the parts that should be modified. (It does that to minimize screen writes, which might be expensive, especially if writing to a remote terminal.) 
In such cases, the emulator doesn't stand a chance of doing TRT, because the application doesn't provide enough context for it to reorder text correctly. So I don't think a bidi-aware terminal emulator can support any application more complex than those which write full lines to the terminal, like 'cat', 'sed', 'diff', 'grep', etc. > I believe we should further study the situation, e.g. see whether > ECMA-48's SAPV (8.3.18) parameters 5..8 (to explicitly specify whether > to use isolated/initial/medial/final form for each character) are > flexible enough to convey all this information, or perhaps a new, more > powerful means should be crafted. Once again, I think it's impractical to expect applications to emit these controls. The emulator must do this part of the job.
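[Editor's illustration, not part of the original message.] The redisplay strategy described for Emacs (compare what is on screen with what should be there, and write only the differences) can be sketched as follows. This is a toy version, not Emacs's actual algorithm: `changed_spans` is a hypothetical function, and it assumes exactly one character per cell.

```python
def changed_spans(old: str, new: str) -> list:
    """Return (column, text) pairs covering only the cells that differ."""
    width = max(len(old), len(new))
    # Pad both rows with blanks so they can be compared cell by cell.
    old = old.ljust(width)
    new = new.ljust(width)
    spans = []
    col = 0
    while col < width:
        if old[col] == new[col]:
            col += 1
            continue
        # Extend the span over the whole run of differing cells.
        start = col
        while col < width and old[col] != new[col]:
            col += 1
        spans.append((start, new[start:col]))
    return spans
```

For example, going from "abcdef" to "abXdeY" yields two one-cell writes at columns 2 and 5. A terminal that sees only those isolated fragments, each preceded by a cursor move, has no surrounding paragraph from which to infer embedding levels, which is why implicit reordering cannot work for such applications.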
Re: Proposal for BiDi in terminal emulators
Hi, On Thu, Jan 31, 2019 at 4:10 PM Eli Zaretskii wrote: > The reordering happens before TABs are converted to cursor motion, > does it not? No, not at all. You cannot "mix" handling the input and reordering, since the input is not available as a single step but arrives continuously in a stream. Consider a heavy BiDi text such as (I'm making up some random gibberish, uppercase being RTL): foo BAR FSI BAz quUX 1234 PDI whatEVer Someone prints it to the terminal, but due to the internals, the terminal doesn't receive this in one single step but in two consecutive ones, broken in the middle. Maybe the app split it in half (e.g. a shell script printed fragments one by one using printf without a trailing newline). Maybe the emitter is a "dd" printing blocks of let's say 4kB and this line happens to cross a boundary. Maybe a transport layer such as ssh split it for whatever reason. Then would you take the first half of this text, let's say foo BAR FSI BAz quU even with unbalanced BiDi controls, then reorder it, and continue from it? Continue how? How to remember to reorder the second half too, but not the first half once again in order to avoid "double BiDi"? What to do with explicit cursor movements, would they jump to the visual position? This would break absolutely basic principles, e.g. jumping twice to the same location to overwrite a letter twice in a row may actually end up overwriting two different letters, since everything was potentially rearranged after the first overwrite happened. Any application having any existing preconception about cursor movement would uncontrollably fall apart. This approach is doomed to fail big time (and was the reason I had to drop ECMA TR/53's DCSM "presentation" mode). The only reasonable way is if you have two layers. The bottom layer does the emulation almost exactly as it used to do, with no BiDi whatsoever (except for tiny additions, e.g. it tracks BiDi-related properties such as the paragraph direction). 
The upper layer displays the data, and this upper layer performs BiDi solely for display purposes: using the lower layer's data as input, but not modifying it. This is, by the way, also what current emulators that shuffle the characters around do. Let's also mention that the lower layer (emulation) should be as fast as possible. For example, VTE can handle input in the ballpark of 10MB/s. Reordering, that is, running BiDi for display purposes, needs to happen much more rarely, maybe 20-60 times per second. It would be a performance killer having to run the BiDi algorithm upon every received chunk of data – in fact, to eliminate any possible behavior difference due to timing difference, it'd need to happen after every printable character received. There's absolutely no way we could reorder first, and then handle TAB's cursor movement. TAB's cursor movement happens in the lower layer, reordering happens in the upper one. cheers, egmont
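[Editor's illustration, not part of the original message.] The two-layer model can be sketched with a toy emulator. Everything here is hypothetical and deliberately simplistic: `ToyEmulator` has a single row, and `toy_reorder` is NOT the Unicode BiDi algorithm; it merely reverses each run of uppercase letters, borrowing the message's convention that uppercase stands for RTL.

```python
from itertools import groupby

def toy_reorder(line: str) -> str:
    """Toy display reordering: reverse each maximal run of uppercase letters."""
    out = []
    for is_rtl, run in groupby(line, key=str.isupper):
        chars = list(run)
        out.append("".join(reversed(chars)) if is_rtl else "".join(chars))
    return "".join(out)

class ToyEmulator:
    """Lower layer: cells in logical order. Upper layer: render() only."""

    def __init__(self, width: int = 40):
        self.cells = [" "] * width
        self.cursor = 0

    def feed(self, data: str) -> None:
        """Lower layer: store incoming characters; no BiDi happens here."""
        for ch in data:
            if ch == "\r":
                self.cursor = 0
            elif self.cursor < len(self.cells):
                self.cells[self.cursor] = ch
                self.cursor += 1

    def move_to(self, column: int) -> None:
        """Cursor addressing always refers to logical cells."""
        self.cursor = column

    def render(self) -> str:
        """Upper layer: reorder a copy for display; self.cells is untouched."""
        return toy_reorder("".join(self.cells))

split = ToyEmulator(12)
split.feed("foo BA")   # the stream may be broken at any point...
split.feed("R baz")    # ...without affecting the stored cells
whole = ToyEmulator(12)
whole.feed("foo BAR baz")
```

Because reordering happens only in `render()`, it does not matter how the input stream was chunked, and moving the cursor to the same column twice always overwrites the same logical cell.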
Re: Proposal for BiDi in terminal emulators
Hi Eli, > So we will some day have one such terminal emulator. That's good, but > a text-mode application that needs to support bidi cannot rely on its > users all having access to that single terminal. No. A text-mode application that needs to support BiDi must do the BiDi itself and pass visual order to the emulator, and beforehand switch the emulator to explicit mode so that you don't end up with "double BiDi". Once you emit visual order, there's no need for any BiDi control characters. For this behavior, the only feature you need from a terminal emulator is to have a mode where it doesn't shuffle the characters. Currently every emulator I'm aware of has such a mode, although in some of them you have to tweak the settings to get to this mode (in my firm opinion it's an unacceptable user experience), while in emulators according to my specification there'll be an escape sequence for text-mode apps to automatically switch to this mode. What BiDi control characters (LRE, LRI, FSI etc.) in implicit mode will give you – if supported – is that you'll be able to execute "cat file", and it'll be displayed correctly, even taking FSI and friends as present in the file into account. Of course this will only work in terminal emulators that support this. > This is indeed a significant issue, because it means applications > cannot force the terminal use a certain non-default base paragraph > direction. They can, since there's a dedicated escape sequence (SCP) for setting the base paragraph. That being said, not being able to remember FSI at the beginning of a string is indeed a significant issue, we agree on this. We just need to figure out how to alter the emulation behavior to remember them, which I find the next big step to address in the specification. cheers, egmont
Re: Proposal for BiDi in terminal emulators
Hi Eli, > Arabic presentation forms are more like an exception than a rule, I > hope you understand this by now. Most languages/scripts don't have > such forms, and even for Arabic they cover only a part of what needs > to be done to present correctly shaped text. Complex script shaping > is much more than just substituting some glyphs with others, it > requires an intimate knowledge of the font being used and its > capabilities, and the ability to control how various glyphs of a > grapheme cluster are placed relative to one another, something that an > application running on a text terminal cannot do. > > So I suggest that you don't consider Arabic presentation forms a > representative of the direction in which terminal emulators supporting > such scripts should evolve. Thanks a lot for this information! I now understand that presentation forms isn't an ideal possible approach, and the recommendation should be improved here. Until it happens, I'm uncertain whether using presentation form characters is a decent low hanging fruit that significantly improves the readability in some situations (e.g. "good enough" in some sense for Arabic), or is a dead end we shouldn't propagate. I still do not agree however that the entire responsibility can be shifted to the emulator. There are certain important bits of information that are only available to the application, and not the emulator – as with many other aspects, such as reordering, copy-pasting, searching in the data in BiDi-aware text editors using the terminal's explicit mode, which are all pushed to the application because the emulator cannot do them correctly. I believe we should further study the situation, e.g. see whether ECMA-48's SAPV (8.3.18) parameters 5..8 (to explicitly specify whether to use isolated/initial/medial/final form for each character) are flexible enough to convey all this information, or perhaps a new, more powerful means should be crafted. 
At this point I lack sufficient knowledge to fix the design, I'd need to spend a lot of time studying the situation and/or working together with you guys, if you're up for it. Thanks a lot, egmont
Re: Proposal for BiDi in terminal emulators
On Thu, Jan 31, 2019 at 11:17:19PM +, Richard Wordingham via Unicode wrote: > On Thu, 31 Jan 2019 12:46:48 +0100 > Egmont Koblinger wrote: > > No. How many cells do CJK ideographs occupy? We've had a strong hint > that a medial BEH should occupy one cell, while an isolated BEH should > occupy two. Monospaced Arabic fonts (there are not that many of them) are designed so that all forms occupy just one cell (most even including the mandatory lam-alef ligatures), unlike CJK fonts. I can imagine the terminal restricting itself to monospaced fonts, disabling the “liga” feature just in case, and expecting the font to behave well. Any other magic is likely to fail. Regards, Khaled
Re: Encoding italic
On 2019-01-31 3:18 PM, Adam Borowski via Unicode wrote: > They're only from a spammer's point of view. Spammers need love, too. They’re just not entitled to any.
Re: Proposal for BiDi in terminal emulators
> Date: Thu, 31 Jan 2019 23:17:19 + > From: Richard Wordingham via Unicode > > Emacs needs a lot of help - I can't write a generic Tai Tham > OpenType .flt file :-( Which is why Emacs is migrating towards HarfBuzz.
Re: Proposal for BiDi in terminal emulators
On Thu, 31 Jan 2019 12:46:48 +0100 Egmont Koblinger wrote: > Hi Richard, > > > Basic Arabic shaping, at the level of a typewriter, is > > straightforward enough to leave to a terminal emulator, as Eli has > > suggested. > > What is "basic" Arabic shaping exactly? Just using initial, medial and final forms, with no vertical stacking, In terms of glyphs, none of glyphs of the presentation forms with 'LIGATURE' in the name would be used. > I can see problems with leaving it to a terminal. It's not aware of > the neighboring character if the string is cropped. Cropped why? If the problem is the truncation of lines, one can simple store the next character. > It's not able to > separate different UI elements that happen to be adjacent in the > terminal, separated by different background color or such. ZWJ and ZWNJ can handle that. > On the other hand, let's reverse the question: > > "Basic Arabic shaping, at the level of a typewriter, is > straightforward enough to be implemented in the application, using > presentation form characters, as I suggest". Could you please point > out the problems with this statement? Apart from using presentation form characters in raw text being a sin? If a general text manipulating application, e.g. cat, grep or awk, is writing to a file, it should not convert normal Arabic characters to presentation forms. You are now asking a general application to determine whether it is writing to a terminal or not, and alter its output if it is writing to a terminal. If the terminal window is actually an emacs text buffer, I would not want such output to be converted. It is entirely natural to convert an emacs text buffer to a file. > > I believe combining marks present issues even in implicit modes. In > > implicit mode, one cannot simply delegate the task to normal text > > rendering, for one has to allocate text to cells. 
There are a > > number of complications that spring to mind: > > > > 1) Some characters decompose to two characters that may otherwise > > lay claim to their own cells: > > > > U+06D3 ARABIC LETTER YEH BARREE WITH HAMZA ABOVE decomposes to > > <06D2, > > 0654>. Do you intend that your scheme be usable by > > Unicode-compliant processes? > > Decompose during which step? During shaping? > > Or do you mean they are NFC-NFD counterparts of each other? The latter. > > 4) Indic conjuncts. > > (i) There are some conjuncts, such as Devanagari K.SSA, where a > > display as , is simply unacceptable. In some > > closely related scripts, this conjunct has the status of a > > character. > > We (in GNOME Terminal / VTE) do have an open bug about Devanagari > spacing marks (currently they don't show up properly), plus Virama and > friends. I'd like to address the essentials along with the BiDi > implementation; although here we should discuss the design and not a > particular implementation thereof :) > > In case you're interested, at > https://bugzilla.gnome.org/show_bug.cgi?id=584160 comments 45-48, 95 > and perhaps a few others comments I wondered whether certain joining > operations should be done on the emulation layer or the display layer. > The answer is not yet clear. We can't fix suddenly everything, but > it's nice to move forward step by step. It's also proposed that we > used HarfBuzz, but it's unclear to me at this point how the grid > alignment could be preserved in the mean time. Thanks for the link. There are two different beasties. There are text windows into which the user and the application communicate using text, and this text tends to be rendered properly, as one might aim to do with HarfBuzz, and as an Emacs text buffer running the shell tries to do. (Emacs needs a lot of help - I can't write a generic Tai Tham OpenType .flt file :-( In my opinion, these are highly appropriate for application like diff, grep and cat. 
Do we have a good name for them? They are, perhaps, 'teletype emulators'. > "simply unacceptable" – I'm not familiar with those languages, > cultures and so on, but I'd be hesitant to go as far as calling > anything "unacceptable". E.g. there's a physical typewriter in our > family, as far as I remember it has no digits 1 or 0 (use the letters > lowercase L and anycase O instead), it doesn't contain all the > accented letters of my mother tongue so sometimes a similar-looking > one has to be used. In today's computer world, I'd say such > limitations are "unacceptable", but at that time this was what we had > to live with. > > Terminal emulators, due to their strict character grid nature and > their legacy behavior of many decades, are a platform where a certain > level of compromise might be necessary for some scripts. I cannot tell > where to draw the line, cannot tell what is "extremely bad" vs. "not > nice" vs. "kind of okay but could be better", but we can't do > everything in a terminal emulator that a graphical app could do. If > someone wants to have a pixel perfect look, terminal emulators are
Re: Proposal for BiDi in terminal emulators
On Thu, 31 Jan 2019 08:28:41 + Martin J. Dürst via Unicode wrote: > > Basic Arabic shaping, at the level of a typewriter, is > > straightforward enough to leave to a terminal emulator, as Eli has > > suggested. Lam-alif would be trickier - one cell or two? > > Same for other characters. A medial Beh/Teh/Theh/... (ببب) in any > reasonably decent rendering should take quite a bit less space than a > Seen or Sheen (سسس). I remember that the multilingual Emacs version > mostly written by Ken'ichi Handa (was it called mEmacs or nEmacs or > something like that?) had different widths only just for Arabic. In > Thunderbird, which is what I'm using here, I get hopelessly > stretched/squeezed glyph shapes, which definitely don't look good. It's a long time since I last knowingly read typewritten Arabic script, but on reading the description of Haddad's design of the Arabic typewriter, I see what you mean. My point is correct, but your point is another argument for having single- and double-width characters. Richard.
Re: Encoding italic
On 1/31/2019 12:55 AM, Tex via Unicode wrote: As with the many problems with walls not being effective, you choose to ignore the legitimate issues pointed out on the list with the lack of italic standardization for Chinese braille, text to voice readers, etc. The choice of plain text isn't always voluntary. And the existing alternatives, like math italic characters, are problematic. The underlying issue is the lack of rich text support in places where users expect rich text. The solution is to find ways to enable rich text layers that are not full documents and make them interoperable. The solution is not to push this into plain text - which then becomes lowest common denominator rich text instead. A./
RE: Encoding italic
Kent Karlsson wrote: > ITU T.416/ISO/IEC 8613-6 defines general RGB & CMY(K) colour control > sequences, which are deferred in ECMA-48/ISO 6429. (The RGB one > is implemented in Cygwin (sorry for mentioning a product name).) Fair enough. This thread is mostly about italics and bold and such, not colors, but the point is well taken that one of these leads invariably to the others, especially if the standard or flavor in question implements them. > ECMA-48/ISO 6429 defines control sequences for CJK emphasising, which > traditionally does not use bold or italic. But that's OK. For low-level mechanisms like these, it should be incumbent on the user to say, "Yes, I can use this styling with that script, but I shouldn't; it would look terrible and would fly in the face of convention." ISO 6429 also allows green text on a cyan background, which is about as good an idea as CJK italics. > Compare those specified for CSS > (https://www.w3.org/TR/css-text-decor-3/#propdef-text-decoration-style and > https://www.w3.org/TR/css-text-decor-3/#propdef-text-emphasis-style). > These are not at all mentioned in ITU T.416/ISO/IEC 8613-6, but should > be of interest for the generalised subject of this thread. I'm hoping we can continue to restrict this thread to plain text. -- Doug Ewell | Thornton, CO, US | ewellic.org
Re: Proposal for BiDi in terminal emulators
Egmont Koblinger wrote: > "Basic Arabic shaping, at the level of a typewriter, is > straightforward enough to be implemented in the application, using > presentation form characters, as I suggest". Could you please point > out the problems with this statement? As multiple people have pointed out, Arabic presentation forms don't cover the whole Arabic script and are not generally recommended for new applications, though they are not formally deprecated. If you take a look at the parallel discussion about italics in plain text, you will see a corollary in the use of Mathematical Alphanumeric Symbols: they look tempting and are (usually) easy to render, but among other things, they only cover [A-Za-zıȷΑ-Ωα-ω] and thus miss much of the text that may need to be italicized. -- Doug Ewell | Thornton, CO, US | ewellic.org
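The coverage gap in the Mathematical Alphanumeric Symbols can be seen with a small Python sketch. The function name is hypothetical; it relies only on the character names in Python's `unicodedata` module and leaves any letter without a MATHEMATICAL ITALIC counterpart unchanged.

```python
import unicodedata


def math_italic(text):
    """Map Latin letters to MATHEMATICAL ITALIC letters where one exists by name."""
    out = []
    for ch in text:
        name = unicodedata.name(ch, "")
        for case in ("CAPITAL", "SMALL"):
            prefix = f"LATIN {case} LETTER "
            if name.startswith(prefix):
                target = f"MATHEMATICAL ITALIC {case} {name[len(prefix):]}"
                try:
                    ch = unicodedata.lookup(target)
                except KeyError:
                    # No such character (e.g. accented letters): keep the original.
                    pass
                break
        out.append(ch)
    return "".join(out)
```

Running it on "café" italicizes c, a and f but leaves é as it is, which is the kind of hole described above. Lowercase h is also left alone by this name-based lookup, because its italic form is encoded outside the block as U+210E PLANCK CONSTANT.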
Re: Proposal for BiDi in terminal emulators
On 1/31/2019 1:41 AM, Egmont Koblinger via Unicode wrote: I mean, for example we can introduce control characters that specify the language. That is a complete non-starter for the Unicode Standard. And if the terminal implementation introduces such as one-off hacks, they will fail completely for interoperability. https://en.wikipedia.org/wiki/IETF_language_tag That belongs to the markup level, not to the plain text stream. --Ken
Re: Proposal for BiDi in terminal emulators
> Date: Thu, 31 Jan 2019 10:58:54 +0100 > Cc: unicode@unicode.org > From: Egmont Koblinger via Unicode > > Yes, I do argue that emacs will need to print a new escape sequence. > Which is much-much-much-much-much better than having to tell users to > go into the settings of their macOS Terminal / Konsole / > gnome-terminal etc. and disable BiDi there, isn't it? I'm not sure I agree. Most users can disable bidi reordering of the terminal once and for all. They don't need it. If terminals supported some control sequence to turn on and off the reordering, it might be a useful feature to support such sequences. But IME just querying the emulator whether it supports that or not is a hassle, and generally slows down the application startup. So it's a mixed blessing. > > On the other hand, all that the program can output is a sequence of Unicode > > codepoints. These don't include shaping information > > With "presentation form" characters, yes, they can, they do including > shaping information. Let's please stop talking about presentation forms, they solve only a small part of the shaping problem. > > and it's > > the terminal who's equipped with _most_ of the needed data > > Why? It's the app that knows the context characters, it's the app that > knows the language. But only the emulator knows which fonts it uses, and only the emulator can access the information about the font, like what OTF features it supports, what glyphs it has, etc.
Re: Encoding italic
On Thu, Jan 31, 2019 at 02:21:40PM +, James Kass via Unicode wrote: > David Starner wrote, > > The choice of using single-byte character sets isn't always voluntary. > > That's why we should use ISO-2022, not Unicode. Or we can expect > > people to fix their systems. What systems are we talking about, that > > support Unicode but compel you to use plain text? The use of Twitter > > is surely voluntary. > > This marketing-related web page, > > https://litmus.com/blog/best-practices-for-plain-text-emails-a-look-at-why-theyre-important > > ...lists various reasons for using plain-text e-mail. They're only from a spammer's point of view. > Besides marketing, there’s also newsletters and e-mail discussion groups. > Some of those discussion groups are probably scholarly. Anyone involved in > that would likely embrace ‘super cool Unicode text magic’ and it’s > surprising if none of them have stumbled across the math alphanumerics yet. Then there are technical mailing lists. In particular, on every single list other than Unicode I'm subscribed to, an HTML-only mail would get you flamed by several list members; even a plain+HTML alternative can get you an earful. Then there's LKML and other lists hosted at vger, where a mail that so much as has an HTML version attached will get outright rejected at mail software level. After 2½ decades of participating in mailing lists, I got an aversion to HTML mails burned in as a kind of involuntary reflex. Upon seeing Asmus' mails, the ingrained reflex kicks in, I start getting upset, only to realize what list I'm reading and that it's him who's a regular here, not me. So even when in principle adding such features would be possible, many communities decide to prefer interoperability over the newest types of bling. Some prefer top-posted HTML mails, some prefer Twitter, some Unicode plain text, some perhaps want plain ASCII only. > It’s true that people don’t have to use Twitter. People don’t have to turn > on their computers, either. 
And sometimes they use a Braille reader or a text console. Meow! -- ⢀⣴⠾⠻⢶⣦⠀ ⣾⠁⢠⠒⠀⣿⡁ Remember, the S in "IoT" stands for Security, while P stands ⢿⡄⠘⠷⠚⠋⠀ for Privacy. ⠈⠳⣄
Re: Proposal for BiDi in terminal emulators
> From: Egmont Koblinger > Date: Thu, 31 Jan 2019 10:41:02 +0100 > Cc: Frédéric Grosshans , > unicode@unicode.org > > > Personally, I think we should simply assume that complex script > > shaping is left to the terminal, and if the terminal cannot do that, > > then that's a restriction of working on a text terminal. > > I cannot read any of the Arabic, Syriac etc. scripts, but I did lots > of experimenting with picking random Arabic words, displaying in the > terminal their unshaped as well as shaped (using presentation form > characters) variants, and compared them to pango-view's (harfbuzz's) > rendering. > > To my eyes the version I got in the terminal with the presentation > form characters matched the "expected" (pango-view) rendering > extremely closely. I suggest that you show the result to someone who does read Arabic. Small changes can be very unpleasant to the eyes of an Arabic reader. As for Arabic presentation forms, I already explained why they cannot be considered a solution that is anywhere near the complete one. > > OTOH a terminal emulator who wants to perform shaping needs > > information from the application > > And the presentation form characters are pretty much exactly that > information, aren't they (for Arabic)? Much more is needed for correct shaping. > Instead of saying that it's not possible, could we perpahs try to > solve it, or at least substantially improve the situation? I mean, for > example we can introduce control characters that specify the language. > We can introduce a flag that tells the terminal whether to do shaping > or not. There are probably plenty of more ideas to be thrown in for > discussion and improvement. You could do that, but it will require a lot of non-trivial processing from the applications. Text-mode applications don't want any complex tinkering, they want just to write their text and be done. The more overhead you add to that simple task, the less probable it is that applications will support such a terminal.
Re: Proposal for BiDi in terminal emulators
> From: Egmont Koblinger > Date: Thu, 31 Jan 2019 10:28:27 +0100 > Cc: Adam Borowski , unicode@unicode.org > > On Wed, Jan 30, 2019 at 5:10 PM Eli Zaretskii wrote: > > > I think the application could use TAB characters to get to the next > > cell, then simplistic reordering would also work. > > TAB handling is extremely complicated, because in terminal emulation > TAB is not a character, TAB is a control instruction (like escape > sequences) that moves the cursor (and jumps through the existing > content, if any, without erasing it). The reordering happens before TABs are converted to cursor motion, does it not? If so, their effect on reordering, by virtue of the TAB being Segment Separator for the UBA purposes, could happen nonetheless. Right?
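For reference, the bidirectional class that the UBA assigns to TAB can be checked with Python's `unicodedata` module. The helper below is a hypothetical illustration of how Segment Separators divide a logical string; it says nothing about whether a terminal ever sees the TAB at that stage, which is the question being asked here.

```python
import unicodedata


def split_on_segment_separators(text):
    """Split text at characters whose bidi class is S (Segment Separator)."""
    segments, current = [], []
    for ch in text:
        if unicodedata.bidirectional(ch) == "S":
            segments.append("".join(current))
            current = []
        else:
            current.append(ch)
    segments.append("".join(current))
    return segments
```

With this helper, `split_on_segment_separators("ab\tcd")` yields two segments, since `unicodedata.bidirectional("\t")` is `"S"`.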
Re: Proposal for BiDi in terminal emulators
> From: Egmont Koblinger > Date: Thu, 31 Jan 2019 10:21:52 +0100 > Cc: Adam Borowski , unicode@unicode.org > > > Does anyone know of a terminal emulator which supports isolates? > > GNOME Terminal's (VTE's) current work-in-progress implementation does > remember BiDi control characters just like it remembers combining > accents, that is, tied to the preceding letter's cell. It uses FriBidi > 1.0 for the BiDi work, so yes, it supports Unicode 6.3's isolates. So we will some day have one such terminal emulator. That's good, but a text-mode application that needs to support bidi cannot rely on its users all having access to that single terminal. > There's one significant issue, though. Because we currently just > misuse our existing infrastructure of combining accents for the BiDi > controls, BiDi controls at the very beginning of a paragraph are > dropped. Addressing this issue would need core changes to the terminal > emulation behavior, such as introducing in-between-cells storage, or > zero-width special characters belonging to a cell _before_ the cell's > actual letter, or something like this. I outline one idea in my > specification, but it's subject to discussion to finalize it. This is indeed a significant issue, because it means applications cannot force the terminal use a certain non-default base paragraph direction.
Re: Proposal for BiDi in terminal emulators
> From: Egmont Koblinger > Date: Thu, 31 Jan 2019 10:11:22 +0100 > Cc: unicode@unicode.org > > > It doesn't do _any_ shaping. Complex script shaping is left to the > > terminal, because it's impossible to do shaping in any reasonable way > > [...] > > Partially, you are right. On the other hand, as far as I know, shaping > should take into account the neighboring glyphs even if those are not > visible (e.g. overflow from the viewport), and the terminal is unaware > of what those glyphs are. This is an area that "presentation form" > characters can address for Arabic – although as it was pointed out, > not for Syriac and some others. Arabic presentation forms are more like an exception than a rule, I hope you understand this by now. Most languages/scripts don't have such forms, and even for Arabic they cover only a part of what needs to be done to present correctly shaped text. Complex script shaping is much more than just substituting some glyphs with others, it requires an intimate knowledge of the font being used and its capabilities, and the ability to control how various glyphs of a grapheme cluster are placed relative to one another, something that an application running on a text terminal cannot do. So I suggest that you don't consider Arabic presentation forms representative of the direction in which terminal emulators supporting such scripts should evolve.
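How uneven the coverage of presentation forms is can be checked against the Unicode character names, as in this small Python sketch. The function names are hypothetical, and the lookup only finds the single-letter "... ISOLATED/INITIAL/MEDIAL/FINAL FORM" characters, not ligatures.

```python
import unicodedata

FORMS = ("ISOLATED", "INITIAL", "MEDIAL", "FINAL")


def presentation_forms(letter_name):
    """Return the presentation-form characters that exist for a given letter name."""
    found = {}
    for form in FORMS:
        try:
            found[form] = unicodedata.lookup(f"{letter_name} {form} FORM")
        except KeyError:
            # No presentation form of this kind is encoded for the letter.
            pass
    return found


def fold_to_base(ch):
    """Compatibility-normalize a presentation form back to its base letter."""
    return unicodedata.normalize("NFKC", ch)
```

For "ARABIC LETTER BEH" all four forms exist, for "ARABIC LETTER ALEF" only the isolated and final ones, and for "SYRIAC LETTER BETH" none at all. `fold_to_base` shows that a presentation form is only a compatibility variant of the ordinary letter.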
Re: Encoding italic
Is the way to try to resolve this for a proposal document to be produced for using Variation Selector 14 in order to produce italics and for the proposal document to be submitted to the Unicode Technical Committee? If the proposal is allowed to go to the committee rather than being ruled out of scope, then we can know whether the Unicode Technical Committee will allow the encoding. William Overington Thursday 31 January 2019
Re: Proposal for BiDi in terminal emulators
On 31/01/2019 at 10:41, Egmont Koblinger wrote: Hi, Personally, I think we should simply assume that complex script shaping is left to the terminal, and if the terminal cannot do that, then that's a restriction of working on a text terminal. I cannot read any of the Arabic, Syriac etc. scripts, but I did lots of experimenting with picking random Arabic words, displaying in the terminal their unshaped as well as shaped (using presentation form characters) variants, and compared them to pango-view's (harfbuzz's) rendering. To my eyes the version I got in the terminal with the presentation form characters matched the "expected" (pango-view) rendering extremely closely. Of course there are still some tradeoffs due to fixed-width cells (just as in English, arguably an "i" and "w" having the same width doesn't look as nice as with proportional fonts). Meanwhile, the unshaped characters look vastly different. OTOH a terminal emulator who wants to perform shaping needs information from the application And the presentation form characters are pretty much exactly that information, aren't they (for Arabic)? There's nothing you can do here [...] there's no way for the application to provide Instead of saying that it's not possible, could we perhaps try to solve it, or at least substantially improve the situation? I mean, for example we can introduce control characters that specify the language. We can introduce a flag that tells the terminal whether to do shaping or not. There are probably plenty more ideas to be thrown in for discussion and improvement. cheers, egmont
Re: Encoding italic
David Starner wrote, > The choice of using single-byte character sets isn't always voluntary. > That's why we should use ISO-2022, not Unicode. Or we can expect > people to fix their systems. What systems are we talking about, that > support Unicode but compel you to use plain text? The use of Twitter > is surely voluntary. This marketing-related web page, https://litmus.com/blog/best-practices-for-plain-text-emails-a-look-at-why-theyre-important ...lists various reasons for using plain-text e-mail. Here’s an excerpt. “Some people simply prefer it. Plain and simple—some people prefer text emails. ... Some users may also see HTML emails as a security and privacy risk, and choose not to load any images and have visibility over all links that are included in an email. In addition, the increased bandwidth that image-heavy emails tend to consume is another driver of why users simply prefer plain-text emails.” Besides marketing, there’s also newsletters and e-mail discussion groups. Some of those discussion groups are probably scholarly. Anyone involved in that would likely embrace ‘super cool Unicode text magic’ and it’s surprising if none of them have stumbled across the math alphanumerics yet. A web search for the string “plain text only” leads to all manner of applications for which searchers are trying to control their environments. There’s all kinds of reasons why some people prefer to use plain-text, it’s often an informed choice and it isn’t limited to e-mail. It’s true that people don’t have to use Twitter. People don’t have to turn on their computers, either.
Re: Proposal for BiDi in terminal emulators
Hi Richard, > Basic Arabic shaping, at the level of a typewriter, is straightforward > enough to leave to a terminal emulator, as Eli has suggested. What is "basic" Arabic shaping exactly? I can see problems with leaving it to a terminal. It's not aware of the neighboring character if the string is cropped. It's not able to separate different UI elements that happen to be adjacent in the terminal, separated by different background color or such. On the other hand, let's reverse the question: "Basic Arabic shaping, at the level of a typewriter, is straightforward enough to be implemented in the application, using presentation form characters, as I suggest". Could you please point out the problems with this statement? > I believe combining marks present issues even in implicit modes. In > implicit mode, one cannot simply delegate the task to normal text > rendering, for one has to allocate text to cells. There are a number > of complications that spring to mind: > > 1) Some characters decompose to two characters that may otherwise lay > claim to their own cells: > > U+06D3 ARABIC LETTER YEH BARREE WITH HAMZA ABOVE decomposes to <06D2, > 0654>. Do you intend that your scheme be usable by Unicode-compliant > processes? Decompose during which step? During shaping? Or do you mean they are NFC-NFD counterparts of each other? Most terminal emulators are able to handle combining accents, and of course implicit mode would take them into account when rearranging the letters. Terminal emulators don't do explicit (de)composing, a.k.a. NFC->NFD or NFD->NFC conversion (at least I'm not aware of any that does). > 4) Indic conjuncts. > (i) There are some conjuncts, such as Devanagari K.SSA, where a > display as , is simply unacceptable. In some > closely related scripts, this conjunct has the status of a character. We (in GNOME Terminal / VTE) do have an open bug about Devanagari spacing marks (currently they don't show up properly), plus Virama and friends. 
I'd like to address the essentials along with the BiDi implementation; although here we should discuss the design and not a particular implementation thereof :) In case you're interested, at https://bugzilla.gnome.org/show_bug.cgi?id=584160 comments 45-48, 95 and perhaps a few other comments I wondered whether certain joining operations should be done on the emulation layer or the display layer. The answer is not yet clear. We can't suddenly fix everything, but it's nice to move forward step by step. It has also been proposed that we use HarfBuzz, but it's unclear to me at this point how the grid alignment could be preserved at the same time. "simply unacceptable" – I'm not familiar with those languages, cultures and so on, but I'd be hesitant to go as far as calling anything "unacceptable". E.g. there's a physical typewriter in our family, as far as I remember it has no digits 1 or 0 (use the letters lowercase L and anycase O instead), it doesn't contain all the accented letters of my mother tongue so sometimes a similar-looking one has to be used. In today's computer world, I'd say such limitations are "unacceptable", but at that time this was what we had to live with. Terminal emulators, due to their strict character grid nature and their legacy behavior of many decades, are a platform where a certain level of compromise might be necessary for some scripts. I cannot tell where to draw the line, cannot tell what is "extremely bad" vs. "not nice" vs. "kind of okay but could be better", but we can't do everything in a terminal emulator that a graphical app could do. If someone wants to have a pixel perfect look, terminal emulators are not for them. Maybe looking at typewriters of those scripts could be a good starting point. Anyway, we've drifted quite far away. What I've already implemented in VTE (in a work-in-progress branch), and to my eyes looks quite nice, is Arabic shaping using presentation form characters as done by FriBidi (in implicit mode only). 
According to the API of this library, this shaping process keeps a 1:1 mapping between the original and shaped letters (at least the number of Unicode codepoints – I haven't double checked their terminal width, but I really hope they don't mess with us here). That is, I don't have to deal with a character cell splitting into two, or two character cells joining into one during shaping. Does this sound okay so far? cheers, egmont
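The NFC/NFD relationship raised earlier in this message for U+06D3 can be verified with Python's `unicodedata` module. The `naive_cell_count` helper is a hypothetical simplification: it counts only characters with combining class 0 and ignores double-width characters, which a real emulator would have to handle.

```python
import unicodedata

COMPOSED = "\u06d3"  # ARABIC LETTER YEH BARREE WITH HAMZA ABOVE
DECOMPOSED = unicodedata.normalize("NFD", COMPOSED)  # U+06D2 followed by U+0654


def naive_cell_count(text):
    """Count base characters only; combining marks attach to the preceding cell."""
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)
```

Both spellings come out as one cell under this rule, because U+0654 HAMZA ABOVE has a non-zero combining class and so rides on the cell of the preceding U+06D2.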
Re: Proposal for BiDi in terminal emulators
On Thu, Jan 31, 2019 at 10:05 AM Richard Wordingham via Unicode wrote: > > How will "ls -l" possibly work? This is an example of the "table" > > layout you were already discussing. > > I think the answer is that it will use the same trickery as with a > default setting for the --color argument. Colour codes are emitted > only when the output is a terminal. Presumably the same would go for > Bidi controls. Exactly, that's what I have in mind in the long run. If coreutils folks like the idea, "ls" could have a new option --bidi=never/auto/always. With BiDi mode, it would enclose each of the logical segments of strings that potentially contain RTL text (filenames, dates etc.) separately inside an FSI...PDI block. That way its output would look as desired (over the terminal's new default "implicit" mode), since the terminal would take care of BiDi-ing each FSI...PDI block. cheers, egmont
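A minimal Python sketch of the proposed behaviour, modelled on how `--color=auto` works: each field is wrapped in FSI...PDI only when the output is a terminal, or always/never on request. The `--bidi` option does not exist in coreutils; the function names and the `bidi` parameter here are hypothetical.

```python
import sys

FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate(field):
    """Wrap one logical field so its direction is resolved independently."""
    return f"{FSI}{field}{PDI}"


def format_row(fields, bidi="auto", stream=None):
    """Join fields with spaces, isolating each one according to the bidi policy."""
    stream = sys.stdout if stream is None else stream
    if bidi == "always" or (bidi == "auto" and stream.isatty()):
        fields = [isolate(f) for f in fields]
    return " ".join(fields)
```

When the output is redirected to a file or a pipe, `bidi="auto"` leaves the fields untouched, so tools further down the pipeline never see the isolate characters.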
Re: Proposal for BiDi in terminal emulators
Hi, > > And if you argue "so make emacs print your > > new code to disable formatting", so do thousands of other programs that are > > less sophisticated than emacs. > > Yes, I do argue that emacs will need to print a new escape sequence. > Which is much-much-much-much-much better than having to tell users to > go into the settings of their macOS Terminal / Konsole / > gnome-terminal etc. and disable BiDi there, isn't it? Let me phrase it slightly differently. Emacs will not "need" to print a new escape sequence, but will have the possibility to do so. VTE is pretty certainly going to switch its default behavior to what Konsole, PuTTY, Mlterm, macOS Terminal do now: to perform BiDi on its contents. This mode is not suitable for Emacs or for any BiDi-aware text editor. Similarly to these terminal emulators, GNOME Terminal (and hopefully other VTE-based frontends) will also most likely have a user setting to force disable BiDi. But as opposed to the aforementioned terminals, VTE will also turn off BiDi upon a designated escape sequence. VTE is the terminal widget behind several emulator apps, such as GNOME Terminal, Xfce Terminal, Tilix, Terminator, Guake... I don't have metrics, but according to various user polls I have the feeling that VTE's usage share among Linux users is pretty significant, somewhere in the ballpark of 50%. Of course Emacs, or any other text editor, can still point its users to the terminal's setting to disable BiDi. And then if the user also wishes to have BiDi for "cat", they'll have to keep toggling it back and forth. Or Emacs can emit the new escape sequence and then it will be fully automatic. Which one puts less supporting burden on Emacs's developers and supporters? Which one is the better for the users? I think the answer is the same to these two questions, and you sure know which answer I'm thinking of. According to this specification, nothing is going to be "worse" than it already is in those few aforementioned terminal emulators. 
The new default behavior will be the same as their behavior. We'll just further extend it with the possibility of switching back to the old mode without annoying the user. I hope this clarifies a lot. cheers, egmont
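A minimal sketch of the "fully automatic" option described above, as an editor-like program might do it: emit one sequence on startup and another on exit. The function name `bidi_disabled` is hypothetical, and the two sequences are deliberately passed in as parameters because this message does not spell out the actual escape sequences; the strings used in the demo are placeholders, not real control sequences.

```python
import contextlib
import io

@contextlib.contextmanager
def bidi_disabled(stream, disable_seq, enable_seq):
    """Write disable_seq on entry and enable_seq on exit (even on errors)."""
    stream.write(disable_seq)
    stream.flush()
    try:
        yield
    finally:
        stream.write(enable_seq)
        stream.flush()

# Demo against an in-memory stream; "<BIDI-OFF>" and "<BIDI-ON>" are
# placeholders standing in for whatever the specification defines.
demo = io.StringIO()
with bidi_disabled(demo, "<BIDI-OFF>", "<BIDI-ON>"):
    demo.write("full-screen editor output")
```

The `finally` clause matters here: if the application crashes, the terminal is still switched back, so a later "cat" sees the default behavior again.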
Re: Encoding italic
David Starner wrote, > Emoji, as has been pointed out several times, were in the original > Unicode standard and date back to the 1980s; the first DOS character > page has smileys at 0x01 and 0x02. That's disingenuous.
Re: Proposal for BiDi in terminal emulators
Hi, > Arabic terminals and terminal emulators existed at the time of Unicode 1.0. I haven't found any mention of them, let alone any documentation about them. > If you are trying to emulate those services, for example so that older > software can run, you would need to look at how these programs expected to be > fed their data. My goal is not to have that ancient software run. My goal is to look into the future. Address the requests often seen in current terminal emulators' bug trackers. Stop the utterly unacceptable user experience of current self-proclaimed BiDi-aware terminals where in order to run Emacs you need to fiddle with the terminal's settings. Show that BiDi in terminals is a much more complex story than just shuffling around the characters, thus stopping new emulators from taking this broken route which causes about as much damage as good. Create a platform on top of which a modern BiDi-aware experience can be built, to make both "cat" and "emacs" work properly out of the box for BiDi users. > I see little reason to reinvent things here, because we are talking about > emulating legacy hardware. Or are we not? As per the above, no, not really. I'm not aware of any hardware that supported BiDi; was there any? I look at terminal emulators as extremely powerful tools for getting all kinds of work done. They are continuously being improved. Nowadays many terminal emulators contain plenty of features that weren't there in any hardware terminal. I'm looking to seamlessly extend the terminal emulator experience to the RTL / BiDi world. > It's conceivable, that with modern fonts, one can show some characters that > could not be supported on the actual legacy hardware, because that was > limited by available character memory and available pre-Unicode character > sets. As long as the new characters otherwise fit the paradigm (character per > cell) they can be supported without other changes in the protocol beyond > change in character set. 
Which protocol, the protocol of non-BiDi-aware terminals that lays out everything from left to right, so the output of "echo", "cat" etc. are reversed; or the protocol of self-proclaimed BiDi-aware terminals where it's literally impossible to create a proper BiDi-aware text editor? My work focuses on proving that both of these modes are needed, and how the necessary mode switches could happen automatically behind the scenes. > However, I would not expect an emulator to accept data in NFD for example. Many emulators do. cheers, egmont
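To illustrate why NFD input need not break the character-per-cell paradigm, here is a rough sketch of how an emulator might count cells. It is a simplification in the spirit of wcwidth(), not any particular emulator's code; the function name `cell_width` is made up for this example.

```python
import unicodedata

def cell_width(text):
    """Rough count of the terminal cells that text occupies."""
    cells = 0
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue  # combining marks and format controls share a cell
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            cells += 2  # wide and fullwidth characters take two cells
        else:
            cells += 1
    return cells

composed = unicodedata.normalize("NFC", "e\u0301")    # one code point
decomposed = unicodedata.normalize("NFD", composed)   # two code points
```

Both `composed` and `decomposed` come out as a single cell under this rule, which is why an emulator that attaches combining marks to the preceding cell can take either form.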
Re: Encoding italic
On Thu, Jan 31, 2019 at 12:56 AM Tex wrote: > > David, > > "italics has never been considered part of plain text and has always been > considered outside of plain text. " > > Time to change the definition if that is what is holding you back. That's not a definition; that's a fact. Again, it's like the 8-bit byte; there are systems with other sizes of byte, but you usually shouldn't worry about it. Building systems that don't have 8-bit bytes is possible, but it's likely to cost more than it's worth. > As has been said before, interlinear annotation, emoji and other features of > Unicode which are now considered plain text were not in the original > definition. https://www.w3.org/TR/unicode-xml/#Interlinear (which used to be Unicode Technical Report #20) says "The interlinear annotation characters were included in Unicode only in order to reserve code points for very frequent application-internal use. ... Including interlinear annotation characters in marked-up text does not work because the additional formatting information (how to position the annotation,...) is not available. ... The interlinear annotation characters are also problematic when used in plain text, and are not intended for that purpose." Emoji, as has been pointed out several times, were in the original Unicode standard and date back to the 1980s; the first DOS character page has smileys at 0x01 and 0x02. > If Unicode encoded an italic mechanism it would be part of plain text, just > as the many other styled spaces, dashes and other characters have become > plain text despite being typographic. If Unicode encoded an italic mechanism, then some "plain text" would include italics. Maybe it would be successful, and maybe it would join the interlinear annotation characters as another discouraged, poorly supported feature. 
> As with the many problems with walls not being effective, you choose to > ignore the legitimate issues pointed out on the list with the lack of italic > standardization for Chinese braille, text to voice readers, etc. Text to voice readers don't have problems with the lack of italic standardization; they have problems with people using mathematical characters instead of actual letters. > The choice of plain text isn't always voluntary. The choice of using single-byte character sets isn't always voluntary. That's why we should use ISO-2022, not Unicode. Or we can expect people to fix their systems. What systems are we talking about, that support Unicode but compel you to use plain text? The use of Twitter is surely voluntary. -- Kie ekzistas vivo, ekzistas espero.
Re: Proposal for BiDi in terminal emulators
Hi, On Wed, Jan 30, 2019 at 5:31 PM Adam Borowski wrote: > The program (emacs in this case) can do arbitrary reordering of characters > on the grid, it also has lots of information the terminal doesn't. For > example, what are you going to do when there's a line longer than what fits > on the screen? Emacs will cut and hide part of it; any attempts to reorder > that paragraph by the terminal are outright broken as you don't _have_ the > paragraph. Same for a popup window on the middle of the screen partially > obscuring some text underneath. This is absolutely correct so far. > And if you argue "so make emacs print your > new code to disable formatting", so do thousands of other programs that are > less sophisticated than emacs. Yes, I do argue that emacs will need to print a new escape sequence. Which is much-much-much-much-much better than having to tell users to go into the settings of their macOS Terminal / Konsole / gnome-terminal etc. and disable BiDi there, isn't it? Could you please give me a brief idea about those "thousands of other programs" that will need to be adjusted? What other apps can do BiDi? Not even Vim/NeoVim can do it. If an app doesn't support BiDi, it's broken anyways when encountering RTL text. It'll still be broken, just broken differently. Did you mean all these programs as those thousands? For ncurses apps there's also a workaround that you could apply: create a terminfo where the ti/te entries not only switch to/from the alternate screen but also disable/enable BiDi. In that case all these thousand ones will be "fixed" (that is: broken in the "old" way rather than broken in the "new" way). On the other hand, what you absolutely can *not* do automatically by emitting escape sequences at the right times, is to enclose the output of much lighter utilities like "echo", "cat", "grep", "head" and so on with any kind of BiDi controls. > On the other hand, all that the program can output is a sequence of Unicode > codepoints. 
These don't include shaping information With "presentation form" characters, yes, they can, they do include shaping information. > and are not supposed > to. The shaping is explicitly meant to be done by the terminal, Why? > and it's > the terminal who's equipped with _most_ of the needed data Why? It's the app that knows the context characters, it's the app that knows the language. What is it that the terminal knows, but the app doesn't although should, or what is it that the terminal doesn't know if presentation form characters are used? What is it that the app knows but cannot pass to the terminal? Shouldn't we then extend the protocol so that it can pass these, too? e.
Re: Proposal for BiDi in terminal emulators
Hi, > Personally, I think we should simply assume that complex script > shaping is left to the terminal, and if the terminal cannot do that, > then that's a restriction of working on a text terminal. I cannot read any of the Arabic, Syriac etc. scripts, but I did lots of experimenting with picking random Arabic words, displaying in the terminal their unshaped as well as shaped (using presentation form characters) variants, and compared them to pango-view's (harfbuzz's) rendering. To my eyes the version I got in the terminal with the presentation form characters matched the "expected" (pango-view) rendering extremely closely. Of course there are still some tradeoffs due to fixed-width cells (just as in English, arguably an "i" and a "w" having the same width doesn't look as nice as with proportional fonts). Meanwhile, the unshaped characters look vastly different. > OTOH a terminal emulator who wants to perform shaping needs > information from the application And the presentation form characters are pretty much exactly that information, aren't they (for Arabic)? > There's nothing you can do here [...] there's no way for the application to > provide Instead of saying that it's not possible, could we perhaps try to solve it, or at least substantially improve the situation? I mean, for example we can introduce control characters that specify the language. We can introduce a flag that tells the terminal whether to do shaping or not. There are probably plenty more ideas to be thrown in for discussion and improvement. cheers, egmont
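As a rough illustration of application-side shaping with presentation form characters, the sketch below derives the isolated/final/initial/medial forms from the compatibility decompositions in Python's `unicodedata` and picks one per letter from its neighbours. It is a toy under stated assumptions, not the behaviour proposed in the specification: it ignores harakat and other transparent marks, does not form lam-alef ligatures, and only knows letters that have forms in Arabic Presentation Forms-B. The names `build_forms` and `shape` are invented for this example.

```python
import unicodedata

FORM_TAGS = ("<isolated>", "<final>", "<initial>", "<medial>")

def build_forms():
    """Map base letter -> {tag: presentation form}, from U+FE70..U+FEFF."""
    forms = {}
    for cp in range(0xFE70, 0xFF00):
        parts = unicodedata.decomposition(chr(cp)).split()
        # Keep single-letter forms such as "<initial> 0628"; this skips
        # ligatures and spacing marks, whose decompositions are longer.
        if len(parts) == 2 and parts[0] in FORM_TAGS:
            base = chr(int(parts[1], 16))
            forms.setdefault(base, {})[parts[0]] = chr(cp)
    return forms

FORMS = build_forms()

def joins_forward(ch):
    """True if ch can connect to the letter that follows it."""
    return "<initial>" in FORMS.get(ch, {})

def joins_backward(ch):
    """True if ch can connect to the letter that precedes it."""
    return "<final>" in FORMS.get(ch, {})

def shape(word):
    """Replace each known letter by the form its neighbours call for."""
    out = []
    for i, ch in enumerate(word):
        if ch not in FORMS:
            out.append(ch)
            continue
        prev_ch = word[i - 1] if i > 0 else ""
        next_ch = word[i + 1] if i + 1 < len(word) else ""
        to_prev = joins_backward(ch) and joins_forward(prev_ch)
        to_next = joins_forward(ch) and joins_backward(next_ch)
        if to_prev and to_next:
            tag = "<medial>"
        elif to_prev:
            tag = "<final>"
        elif to_next:
            tag = "<initial>"
        else:
            tag = "<isolated>"
        out.append(FORMS[ch][tag])
    return "".join(out)
```

Because every form chosen here has a compatibility decomposition back to its base letter, `unicodedata.normalize("NFKC", shape(word))` recovers the unshaped text, which is one way a receiving program could undo the shaping.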
Re: Proposal for BiDi in terminal emulators
Hi Eli, On Wed, Jan 30, 2019 at 5:10 PM Eli Zaretskii wrote: > I think the application could use TAB characters to get to the next > cell, then simplistic reordering would also work. TAB handling is extremely complicated, because in terminal emulation TAB is not a character, TAB is a control instruction (like escape sequences) that moves the cursor (and jumps through the existing content, if any, without erasing it). Some terminal emulators perform some magic to remember TABs in certain circumstances, but they cannot always do so. There are plenty of other problems, e.g. how they are handled at the end of line (no, they don't wrap to the next line), how their positions are user-configurable and not necessarily at every 8th column etc., I'm not going into these details now if you don't mind, it's just not a feasible approach. cheers, egmont
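The point that TAB is a cursor movement rather than a stored character can be shown with a toy one-line terminal. This is only a sketch: the function `render` and its parameters are made up here, and end-of-line behaviour is simplified to "stay in the last column".

```python
def render(data, width=20, tab_stops=None):
    """Feed data to a one-line toy terminal and return the row as a string."""
    if tab_stops is None:
        tab_stops = list(range(8, width, 8))  # common default, but configurable
    row = [" "] * width
    col = 0
    for ch in data:
        if ch == "\r":
            col = 0
        elif ch == "\t":
            # Jump to the next stop (or the last column). Nothing is erased
            # and nothing is written: the row keeps no trace of the TAB.
            later = [stop for stop in tab_stops if stop > col]
            col = later[0] if later else width - 1
        else:
            row[col] = ch
            if col < width - 1:
                col += 1
    return "".join(row)
```

For instance, `render("abcdefghij\rX\tY", width=12)` leaves the letters between the X and the Y untouched, and afterwards the row alone cannot tell whether a TAB or some other cursor movement put the Y in column 8.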
Re: Proposal for BiDi in terminal emulators
Hi Eli, > Does anyone know of a terminal emulator which supports isolates? GNOME Terminal's (VTE's) current work-in-progress implementation does remember BiDi control characters just like it remembers combining accents, that is, tied to the preceding letter's cell. It uses FriBidi 1.0 for the BiDi work, so yes, it supports Unicode 6.3's isolates. There's one significant issue, though. Because we currently just misuse our existing infrastructure of combining accents for the BiDi controls, BiDi controls at the very beginning of a paragraph are dropped. Addressing this issue would need core changes to the terminal emulation behavior, such as introducing in-between-cells storage, or zero-width special characters belonging to a cell _before_ the cell's actual letter, or something like this. I outline one idea in my specification, but it's subject to discussion to finalize it. (There's also a less significant issue: copy-pasting fragments of text probably doesn't produce the contents that make the most sense wrt. BiDi controls. I'm not sure what other software do here, though.) Mintty is also actively working on BiDi support, I believe its author just recently added support for isolates. It uses its own BiDi implementation. cheers, egmont
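The storage issue described above can be modelled in a few lines. This is a toy model of "attach zero-width characters to the preceding cell", not VTE's actual data structure; the function name `store` is hypothetical.

```python
import unicodedata

def store(text):
    """Return a list of cells: a base character plus attached zero-width ones."""
    cells = []
    for ch in text:
        zero_width = unicodedata.category(ch) in ("Mn", "Me", "Cf")
        if not zero_width:
            cells.append(ch)      # a new cell for each base character
        elif cells:
            cells[-1] += ch       # combining accent or BiDi control: attach
        # else: there is no preceding cell to attach to, so it is dropped
    return cells

RLI, PDI = "\u2067", "\u2069"     # RIGHT-TO-LEFT ISOLATE, POP DIRECTIONAL ISOLATE
paragraph = store(RLI + "abc" + PDI)
```

In this model the closing PDI survives, attached to the cell of "c", while the opening RLI at the very beginning of the paragraph is lost, which is exactly the asymmetry that in-between-cells storage or "before the letter" attachments would address.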
Re: Proposal for BiDi in terminal emulators
Hi Eli, > It doesn't do _any_ shaping. Complex script shaping is left to the > terminal, because it's impossible to do shaping in any reasonable way > [...] Partially, you are right. On the other hand, as far as I know, shaping should take into account the neighboring glyphs even if those are not visible (e.g. overflow from the viewport), and the terminal is unaware of what those glyphs are. This is an area that "presentation form" characters can address for Arabic – although, as was pointed out, not for Syriac and some others. I'd say it's subject to further research and improvement to find the ideal behavior. cheers, egmont
Re: Proposal for BiDi in terminal emulators
On Wed, 30 Jan 2019 20:35:36 -0500 "Mark E. Shoulson via Unicode" wrote: > On 1/30/19 8:58 AM, Egmont Koblinger via Unicode wrote: > > There's another side to the entire BiDi story, though. Simple > > utilities like "echo", "cat", "ls", "grep" and so on, line editing > > experience of your shell, these kinds. It's absolutely not feasible > > to add BiDi support to these utilities. Here the only viable > > approach is to have the terminal emulator do it. > > How will "ls -l" possibly work? This is an example of the "table" > layout you were already discussing. I think the answer is that it will use the same trickery as with a default setting for the --color argument. Colour codes are emitted only when the output is a terminal. Presumably the same would go for Bidi controls. > > I think us command-line troglodytes just have to deal with not having > a whole lot of BiDi support. There's simply no way any terminal > emulator could possibly know what makes sense and what doesn't for a > given line of text, coming from some random program. Your "grep" > could be grepping from a file with ANY layout, not necessarily one > conducive to terminal layout, and so on. So how do editors work now? To avoid confusion, you will have to work with the terminal set to having LTR paragraphs (or RTL instead); that is how Notepad works. Richard.
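The "--color" style trick suggested above might look roughly like this: a utility wraps each field in isolate controls only when its output is a terminal. This is a sketch of the idea, not how any existing "ls" behaves; `format_field` is a hypothetical helper.

```python
import io

FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE

def format_field(text, stream):
    """Isolate a field for display, but leave piped output untouched."""
    if stream.isatty():
        return FSI + text + PDI
    return text

# A pipe or file is not a terminal, so the name passes through unchanged.
piped = format_field("name.txt", io.StringIO())
```

As with colour codes, the benefit is that programs reading the output through a pipe never see the extra control characters.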
RE: Encoding italic
David, "italics has never been considered part of plain text and has always been considered outside of plain text. " Time to change the definition if that is what is holding you back. As has been said before, interlinear annotation, emoji and other features of Unicode which are now considered plain text were not in the original definition. If Unicode encoded an italic mechanism it would be part of plain text, just as the many other styled spaces, dashes and other characters have become plain text despite being typographic. "The fact that italics can be handled elsewhere very much weighs against the value of your change. Everything you want to do can be done and is being done, except when someone chooses not to do it." I heard a recent similar argument that goes: walls have been around since medieval times and they work really well... (Except they provably don't.) As with the many problems with walls not being effective, you choose to ignore the legitimate issues pointed out on the list with the lack of italic standardization for Chinese braille, text to voice readers, etc. The choice of plain text isn't always voluntary. And the existing alternatives, like math italic characters, are problematic. tex
Re: Proposal for BiDi in terminal emulators
On 2019/01/31 07:02, Richard Wordingham via Unicode wrote: > On Wed, 30 Jan 2019 15:33:38 +0100 > Frédéric Grosshans via Unicode wrote: > >> Le 30/01/2019 à 14:36, Egmont Koblinger via Unicode a écrit : >>> - It doesn't do Arabic shaping. In my recommendation I'm arguing >>> that in this mode, where shuffling the characters is the task of >>> the text editor and not the terminal, so should it be for Arabic >>> shaping using presentation form characters. >> >> I guess Arabic shaping is doable through presentation form >> characters, because the latter are character inherited from legacy >> standards using them in such solutions. > > So long as you don't care about local variants, e.g. U+0763 ARABIC > LETTER KEHEH WITH THREE DOTS ABOVE. It has no presentation form > characters. Same also for characters used for other languages than Arabic. > Basic Arabic shaping, at the level of a typewriter, is straightforward > enough to leave to a terminal emulator, as Eli has suggested. Lam-alif > would be trickier - one cell or two? Same for other characters. A medial Beh/Teh/Theh/... (ببب) in any reasonably decent rendering should take quite a bit less space than a Seen or Sheen (سسس). I remember that the multilingual Emacs version mostly written by Ken'ichi Handa (was it called mEmacs or nEmacs or something like that?) had different widths only just for Arabic. In Thunderbird, which is what I'm using here, I get hopelessly stretched/squeezed glyph shapes, which definitely don't look good. Regards, Martin.
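The coverage gap mentioned above can be checked against the character database that ships with Python. The sketch scans the two Arabic presentation form blocks for single-letter forms of a given base letter; `presentation_forms` is a name invented for this example, and the result depends on the Unicode version of the Python in use.

```python
import unicodedata

def presentation_forms(base):
    """List presentation form characters that decompose to exactly base."""
    found = []
    # Arabic Presentation Forms-A (U+FB50..U+FDFF) and -B (U+FE70..U+FEFF)
    for cp in list(range(0xFB50, 0xFE00)) + list(range(0xFE70, 0xFF00)):
        parts = unicodedata.decomposition(chr(cp)).split()
        if (len(parts) == 2 and parts[0].startswith("<")
                and int(parts[1], 16) == ord(base)):
            found.append(chr(cp))
    return found

beh_forms = presentation_forms("\u0628")     # ARABIC LETTER BEH
keheh3_forms = presentation_forms("\u0763")  # KEHEH WITH THREE DOTS ABOVE
```

BEH yields its four contextual forms, while the list for U+0763 should come back empty, so an application relying on presentation forms has nothing to send for that letter.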
Re: Encoding italic
On Thursday, 31 January 2019, James Kass via Unicode wrote: > As for use of other variant letter forms enabled by the math > alphanumerics, the situation exists. It’s an interesting phenomenon which > is sometimes worthy of comment and relates to this thread because the math > alphanumerics include italics. One of the web pages referring to > third-party input tools calls the practice “super cool Unicode text magic”. > > Although not all devices can render such text. Many Android handsets on the market do not have a sufficiently recent version of Android to have system fonts that can render such existing usage. -- Andrew Cunningham lang.supp...@gmail.com
Re: Encoding italic
On Wed, Jan 30, 2019 at 11:37 PM James Kass via Unicode wrote: > As Tex Texin observed, differences of opinion as to where we draw the > line between text and mark-up are somewhat ideological. If a compelling > case for handling italics at the plain-text level can be made, then the > fact that italics can already be handled elsewhere doesn’t matter. If a > compelling case cannot be made, there are always alternatives. To the extent I'd have ideology here, it's that that line is arbitrary and needs to fit practical demands. Should we have eight-bit bytes? I'm not sure that was the best solution, and other systems worked just fine, but we've got a computing environment that makes anything else unpractical. Unlike that question, italics has never been considered part of plain text and has always been considered outside of plain text. The fact that italics can be handled elsewhere very much weighs against the value of your change. Everything you want to do can be done and is being done, except when someone chooses not to do it. -- Kie ekzistas vivo, ekzistas espero.
RE: Encoding italic
David, Asmus, · “without external standards, then it's simply impossible.” · “And without external standard, not interoperable.“ As you both know there are de jure as well as de facto standards. So for years people typed : - ) as a smiley without a de facto standard and at some point long before emoji, systems began converting these to smiley faces. Even the utf-8 BOM began as one company’s non-interoperable convention for encoding identifier which later became part of the de facto standard. Ideally interoperability means supported everywhere but we have many useful mechanisms that simply don’t do harm without being interpreted. For example, Unicode relies on this for backward compatibility when it introduces new characters, properties, algorithms, et al that are not understood by all systems but are tolerated by older ones. = While I am at it, I am amused by the arguments earlier in this thread as well as other threads, that go: · If the feature was needed developers would have implemented it by now. It isn’t implemented so the standard doesn’t need it. · The feature was implemented without the standard, so we don’t need it in the standard. If men were meant to fly they would have wings… Apparently, for some, it is only when there are many conflicting implementations that a feature demonstrates both that it is a requirement and also that it should be standardized. In fact, this is sometimes not a bad view as it prevents adding features to the standard that go unused yet add complexity. But, it can also set too high a bar. And often it isn’t a true criteria but just resistance to change. You don’t need italics. When I went to school we just tilted the terminal a few degrees and voila. (You don’t need a car. When I went to school we walked 6 miles to get there. Uphill both ways. 
:-) ) tex
Re: Encoding italic
David Starner wrote, >> ... italics, bold, strikethrough, and underline in plain-text > > Okay? Ed can do that too, along with nano and notepad. It's called > HTML (TeX, Troff). If by plain-text, you mean self-interpreting, > without external standards, then it's simply impossible. HTML source files are in plain-text. Hopefully everyone on this list understands that and has already explored the marvelous benefits offered by granting users the ability to make exciting and effective page layouts via any plain-text editor. HTML is standard and interchangeable. As Tex Texin observed, differences of opinion as to where we draw the line between text and mark-up are somewhat ideological. If a compelling case for handling italics at the plain-text level can be made, then the fact that italics can already be handled elsewhere doesn’t matter. If a compelling case cannot be made, there are always alternatives. As for use of other variant letter forms enabled by the math alphanumerics, the situation exists. It’s an interesting phenomenon which is sometimes worthy of comment and relates to this thread because the math alphanumerics include italics. One of the web pages referring to third-party input tools calls the practice “super cool Unicode text magic”.
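For readers who have not seen how such third-party tools work, the mapping is little more than an offset into the Mathematical Alphanumeric Symbols block. The sketch below is an illustration only (the function name `to_math_italic` is made up), and it shows one well-known wrinkle: there is no italic small h in that block, because U+210E PLANCK CONSTANT already existed.

```python
import unicodedata

def to_math_italic(text):
    """Map ASCII letters to MATHEMATICAL ITALIC letters; leave the rest."""
    out = []
    for ch in text:
        if ch == "h":
            out.append("\u210E")  # PLANCK CONSTANT fills the reserved U+1D455
        elif "A" <= ch <= "Z":
            out.append(chr(0x1D434 + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(0x1D44E + ord(ch) - ord("a")))
        else:
            out.append(ch)
    return "".join(out)

styled = to_math_italic("Unicode")
plain_again = unicodedata.normalize("NFKC", styled)
```

NFKC normalization folds the styled string back to ordinary letters, which is one way a search index or screen reader could cope with such text, at the cost of losing the styling altogether.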
Re: Encoding italic
On 1/30/2019 7:46 PM, David Starner via Unicode wrote: On Sun, Jan 27, 2019 at 12:04 PM James Kass via Unicode wrote: A new beta of BabelPad has been released which enables input, storing, and display of italics, bold, strikethrough, and underline in plain-text Okay? Ed can do that too, along with nano and notepad. It's called HTML (TeX, Troff). If by plain-text, you mean self-interpreting, without external standards, then it's simply impossible. It's either "markdown" or control/tag sequences. Both are out-of-band information. And without external standard, not interoperable. A./
Re: Encoding italic
On Sun, Jan 27, 2019 at 12:04 PM James Kass via Unicode wrote: > A new beta of BabelPad has been released which enables input, storing, > and display of italics, bold, strikethrough, and underline in plain-text Okay? Ed can do that too, along with nano and notepad. It's called HTML (TeX, Troff). If by plain-text, you mean self-interpreting, without external standards, then it's simply impossible. -- Kie ekzistas vivo, ekzistas espero.
Re: Encoding italic
On 1/30/2019 4:38 PM, Kent Karlsson via Unicode wrote: I did say "multiple" and "for instance". But since you ask: ITU T.416/ISO/IEC 8613-6 defines general RGB & CMY(K) colour control sequences, which are deferred in ECMA-48/ISO 6429. (The RGB one is implemented in Cygwin (sorry for mentioning a product name).) No need to be sorry; we understand that the motivation is not so much advertising as giving a concrete example. It would be interesting if anything out there implements CMY(K). My expectation would be that this would be limited to interfaces for printers or their emulators. (The "named" ones, though very popular in terminal emulators, are all much too stark, I think, and the exact colour for them are implementation defined.) Muted colors are something that's become more popular as display hardware has improved. Modern displays are able to reproduce these both more predictably as well as with the necessary degree of contrast (although some users'/designer's fetish for low contrast text design is almost as bad as people randomly mixing "stark" FG/BG colors in the '90s.) ECMA-48/ISO 6429 defines control sequences for CJK emphasising, which traditionally does not use bold or italic. Compare those specified for CSS (https://www.w3.org/TR/css-text-decor-3/#propdef-text-decoration-style and https://www.w3.org/TR/css-text-decor-3/#propdef-text-emphasis-style). These are not at all mentioned in ITU T.416/ISO/IEC 8613-6, but should be of interest for the generalised subject of this thread. Mapping all of these to CSS would be essential if you want this stuff to be interoperable. There are some other differences as well, but those are the major ones with regard to text styling. (I don't know those standards to a tee. I've just looked at the "m" control sequences for text styling. And yes, I looked at the free copies...) 
/Kent Karlsson PS If people insist on that EACH character in "plain text" italic/bold/etc "controls" be default ignorable: one could just take the control sequences as specified, but map the printable characters part to the corresponding tag characters... Not that I think that that is really necessary. Systems that support "markdown", i.e. simplified markup to provide the most main-stream features of rich-text tend to do that with printable characters, for a reason. Perhaps two reasons. Users find it preferable to have a visible fallback when "markdown" is not interpreted by a receiving system, and users generally like the ability to edit the markdown directly (even if, for convenience, there's some direct UI support for adding text styling). Loading up the text with lots of invisible characters that may be deleted or copied out of order by someone working on a system that neither interprets nor displays these code points is an interoperability nightmare in my opinion. Den 2019-01-30 22:24, skrev "Doug Ewell via Unicode" : Kent Karlsson wrote: Yes, great. But as I've said, we've ALREADY got a default-ignorable-in-display (if implemented right) way of doing such things. And not only do we already have one, but it is also standardised in multiple standards from different standards institutions. See for instance "ISO/IEC 8613-6, Information technology --- Open Document Architecture (ODA) and Interchange Format: Character content architecture". I looked at ITU T.416, which I believe is equivalent to ISO 8613-6 but has the advantage of not costing me USD 179, and it looks very similar to ISO 6429 (ECMA-48, formerly ANSI X3.64) with regard to the things we are talking about: setting text display properties such as bold and italics by means of escape sequences. Can you explain how ISO 8613-6 differs from ISO 6429 for what we are doing, and if it does not, why we should not simply refer to the more familiar 6429? -- Doug Ewell | Thornton, CO, US | ewellic.org
Re: Proposal for BiDi in terminal emulators
On 1/30/19 8:58 AM, Egmont Koblinger via Unicode wrote: There's another side to the entire BiDi story, though. Simple utilities like "echo", "cat", "ls", "grep" and so on, line editing experience of your shell, these kinds. It's absolutely not feasible to add BiDi support to these utilities. Here the only viable approach is to have the terminal emulator do it. How will "ls -l" possibly work? This is an example of the "table" layout you were already discussing. I think us command-line troglodytes just have to deal with not having a whole lot of BiDi support. There's simply no way any terminal emulator could possibly know what makes sense and what doesn't for a given line of text, coming from some random program. Your "grep" could be grepping from a file with ANY layout, not necessarily one conducive to terminal layout, and so on. ~mark
Re: Encoding italic
I did say "multiple" and "for instance". But since you ask: ITU T.416/ISO/IEC 8613-6 defines general RGB & CMY(K) colour control sequences, which are deferred in ECMA-48/ISO 6429. (The RGB one is implemented in Cygwin (sorry for mentioning a product name).) (The "named" ones, though very popular in terminal emulators, are all much too stark, I think, and the exact colour for them are implementation defined.) ECMA-48/ISO 6429 defines control sequences for CJK emphasising, which traditionally does not use bold or italic. Compare those specified for CSS (https://www.w3.org/TR/css-text-decor-3/#propdef-text-decoration-style and https://www.w3.org/TR/css-text-decor-3/#propdef-text-emphasis-style). These are not at all mentioned in ITU T.416/ISO/IEC 8613-6, but should be of interest for the generalised subject of this thread. There are some other differences as well, but those are the major ones with regard to text styling. (I don't know those standards to a tee. I've just looked at the "m" control sequences for text styling. And yes, I looked at the free copies...) /Kent Karlsson PS If people insist on that EACH character in "plain text" italic/bold/etc "controls" be default ignorable: one could just take the control sequences as specified, but map the printable characters part to the corresponding tag characters... Not that I think that that is really necessary. Den 2019-01-30 22:24, skrev "Doug Ewell via Unicode" : > Kent Karlsson wrote: > >> Yes, great. But as I've said, we've ALREADY got a >> default-ignorable-in-display (if implemented right) >> way of doing such things. >> >> And not only do we already have one, but it is also >> standardised in multiple standards from different >> standards institutions. See for instance "ISO/IEC 8613-6, >> Information technology --- Open Document Architecture (ODA) >> and Interchange Format: Character content architecture". 
> > I looked at ITU T.416, which I believe is equivalent to ISO 8613-6 but > has the advantage of not costing me USD 179, and it looks very similar > to ISO 6429 (ECMA-48, formerly ANSI X3.64) with regard to the things we > are talking about: setting text display properties such as bold and > italics by means of escape sequences. > > Can you explain how ISO 8613-6 differs from ISO 6429 for what we are > doing, and if it does not, why we should not simply refer to the more > familiar 6429? > > -- > Doug Ewell | Thornton, CO, US | ewellic.org >
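For concreteness, the "m" control sequences under discussion are the ECMA-48 SGR (Select Graphic Rendition) sequences. The sketch below builds the ones for bold and italic and shows what "ignorable if implemented right" could mean for a receiver that does not style text: it strips them. The helper names are invented for this example; the parameter values 1, 3, 22 and 23 are the ECMA-48 ones for bold, italic, normal intensity and not italic.

```python
import re

CSI = "\x1b["  # Control Sequence Introducer in its 7-bit form, ESC [

def sgr(*params):
    """Build an SGR sequence such as ESC [ 3 m."""
    return CSI + ";".join(str(p) for p in params) + "m"

def italic(text):
    return sgr(3) + text + sgr(23)   # 3 = italic on, 23 = italic off

def bold(text):
    return sgr(1) + text + sgr(22)   # 1 = bold on, 22 = normal intensity

# Matches SGR sequences only; parameters may use ";" or ":" as separators.
SGR_RE = re.compile(r"\x1b\[[0-9;:]*m")

def strip_sgr(text):
    """Return the text with all SGR sequences removed."""
    return SGR_RE.sub("", text)
```

A receiver that neither renders nor strips these sequences will show the bracketed parameters as visible junk, which is the interoperability concern raised elsewhere in this thread.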
Re: Proposal for BiDi in terminal emulators
Arabic terminals and terminal emulators existed at the time of Unicode 1.0. If you are trying to emulate those services, for example so that older software can run, you would need to look at how these programs expected to be fed their data. I see little reason to reinvent things here, because we are talking about emulating legacy hardware. Or are we not? It's conceivable that, with modern fonts, one can show some characters that could not be supported on the actual legacy hardware, because that was limited by available character memory and available pre-Unicode character sets. As long as the new characters otherwise fit the paradigm (one character per cell) they can be supported without other changes in the protocol beyond a change in character set. However, I would not expect an emulator to accept data in NFD, for example. A./ On 1/30/2019 2:02 PM, Richard Wordingham via Unicode wrote: On Wed, 30 Jan 2019 15:33:38 +0100 Frédéric Grosshans via Unicode wrote: On 30/01/2019 at 14:36, Egmont Koblinger via Unicode wrote: - It doesn't do Arabic shaping. In my recommendation I'm arguing that in this mode, where shuffling the characters is the task of the text editor and not the terminal, so should it be for Arabic shaping using presentation form characters. I guess Arabic shaping is doable through presentation form characters, because the latter are characters inherited from legacy standards using them in such solutions. So long as you don't care about local variants, e.g. U+0763 ARABIC LETTER KEHEH WITH THREE DOTS ABOVE. It has no presentation form characters. Basic Arabic shaping, at the level of a typewriter, is straightforward enough to leave to a terminal emulator, as Eli has suggested. Lam-alif would be trickier - one cell or two? 
But if you want to support other “arabic like” scripts (like Syriac, N’ko), or even some LTR complex scripts, like Myanmar or Khmer, this “solution” cannot work, because no equivalent of “presentation form characters” exists for these scripts I believe combining marks present issues even in implicit modes. In implicit mode, one cannot simply delegate the task to normal text rendering, for one has to allocate text to cells. There are a number of complications that spring to mind: 1) Some characters decompose to two characters that may otherwise lay claim to their own cells: U+06D3 ARABIC LETTER YEH BARREE WITH HAMZA ABOVE decomposes to <06D2, 0654>. Do you intend that your scheme be usable by Unicode-compliant processes? 2) 2-part vowels, such as U+0D4A MALAYALAM VOWEL SIGN O, which canonically decomposes into a preceding combining mark U+0D46 MALAYALAM VOWEL SIGN E and following combining mark U+0D3E MALAYALAM VOWEL SIGN AA. 3) Similar 2-part vowels that do not decompose, such as U+17C4 KHMER VOWEL SIGN OO. OpenType layout decomposes that into a preceding 'U+17C1 KHMER VOWEL SIGN E' and the second part. 4) Indic conjuncts. (i) There are some conjuncts, such as Devanagari K.SSA, where a display as , is simply unacceptable. In some closely related scripts, this conjunct has the status of a character. (ii) In some scripts, e.g. Khmer, the virama-equivalent is not an acceptable alternative to form a consonant stack. Khmer could equally well have been encoded with a set of subscript consonants in the same manner as Tibetan. (iii) In some scripts, there are marks named as medial consonants which function in exactly the same way as <'virama', consonant>; it is silly to render them in entirely different manners. 5) Some non-spacing marks are spacing marks in some contexts. U+102F MYANMAR VOWEL SIGN U is probably the best known example. Richard.
Re: Proposal for BiDi in terminal emulators
On Wed, 30 Jan 2019 15:33:38 +0100 Frédéric Grosshans via Unicode wrote: > On 30/01/2019 at 14:36, Egmont Koblinger via Unicode wrote: > > - It doesn't do Arabic shaping. In my recommendation I'm arguing > > that in this mode, where shuffling the characters is the task of > > the text editor and not the terminal, so should it be for Arabic > > shaping using presentation form characters. > > I guess Arabic shaping is doable through presentation form > characters, because the latter are characters inherited from legacy > standards using them in such solutions. So long as you don't care about local variants, e.g. U+0763 ARABIC LETTER KEHEH WITH THREE DOTS ABOVE. It has no presentation form characters. Basic Arabic shaping, at the level of a typewriter, is straightforward enough to leave to a terminal emulator, as Eli has suggested. Lam-alif would be trickier - one cell or two? > But if you want to support > other “arabic like” scripts (like Syriac, N’ko), or even some LTR > complex scripts, like Myanmar or Khmer, this “solution” cannot work, > because no equivalent of “presentation form characters” exists for > these scripts I believe combining marks present issues even in implicit modes. In implicit mode, one cannot simply delegate the task to normal text rendering, for one has to allocate text to cells. There are a number of complications that spring to mind: 1) Some characters decompose to two characters that may otherwise lay claim to their own cells: U+06D3 ARABIC LETTER YEH BARREE WITH HAMZA ABOVE decomposes to <06D2, 0654>. Do you intend that your scheme be usable by Unicode-compliant processes? 2) 2-part vowels, such as U+0D4A MALAYALAM VOWEL SIGN O, which canonically decomposes into a preceding combining mark U+0D46 MALAYALAM VOWEL SIGN E and following combining mark U+0D3E MALAYALAM VOWEL SIGN AA. 3) Similar 2-part vowels that do not decompose, such as U+17C4 KHMER VOWEL SIGN OO. 
OpenType layout decomposes that into a preceding 'U+17C1 KHMER VOWEL SIGN E' and the second part. 4) Indic conjuncts. (i) There are some conjuncts, such as Devanagari K.SSA, where a display as , is simply unacceptable. In some closely related scripts, this conjunct has the status of a character. (ii) In some scripts, e.g. Khmer, the virama-equivalent is not an acceptable alternative to form a consonant stack. Khmer could equally well have been encoded with a set of subscript consonants in the same manner as Tibetan. (iii) In some scripts, there are marks named as medial consonants which function in exactly the same way as <'virama', consonant>; it is silly to render them in entirely different manners. 5) Some non-spacing marks are spacing marks in some contexts. U+102F MYANMAR VOWEL SIGN U is probably the best known example. Richard.
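The canonical decompositions mentioned in points 1) to 3) can be checked with a few lines of Python. This is only a small sketch using the standard unicodedata module; the helper name nfd_codepoints is made up for the illustration, and the results depend on the Unicode data shipped with the Python build in use.

```python
import unicodedata


def nfd_codepoints(ch):
    """Return the NFD form of a character as a list of hex code points."""
    return [f"{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)]


if __name__ == "__main__":
    for ch in ("\u06D3", "\u0D4A", "\u17C4"):
        # Print the character name followed by its canonical decomposition.
        print(unicodedata.name(ch), "->", nfd_codepoints(ch))
```

A terminal that assigns cells per code point would thus see one, two or three candidates for a cell depending on the normalization form the application happens to emit, which is the concern raised in point 1).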
Re: Encoding italic
Kent Karlsson wrote: > Yes, great. But as I've said, we've ALREADY got a > default-ignorable-in-display (if implemented right) > way of doing such things. > > And not only do we already have one, but it is also > standardised in multiple standards from different > standards institutions. See for instance "ISO/IEC 8613-6, > Information technology --- Open Document Architecture (ODA) > and Interchange Format: Character content architecture". I looked at ITU T.416, which I believe is equivalent to ISO 8613-6 but has the advantage of not costing me USD 179, and it looks very similar to ISO 6429 (ECMA-48, formerly ANSI X3.64) with regard to the things we are talking about: setting text display properties such as bold and italics by means of escape sequences. Can you explain how ISO 8613-6 differs from ISO 6429 for what we are doing, and if it does not, why we should not simply refer to the more familiar 6429? -- Doug Ewell | Thornton, CO, US | ewellic.org
Re: Encoding italic
Martin J. Dürst wrote: > Here's a little dirty secret about these tag characters: They were > placed in one of the astral planes explicitly to make sure they'd use > 4 bytes per tag character, and thus quite a few bytes for any actual > complete tags. Aha. That explains why SCSU had to be banished to the hut, right around the same time the Plane 14 language tags were deprecated. In SCSU, astral characters can be 1 byte just like BMP characters. -- Doug Ewell | Thornton, CO, US | ewellic.org
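The "4 bytes per tag character" point is easy to verify. The sketch below maps printable ASCII to the corresponding Plane 14 tag characters (U+E0020 through U+E007E) and measures the encoded size; the function name to_tag_characters is my own, and SCSU itself is not shown because the Python standard library has no SCSU codec.

```python
def to_tag_characters(ascii_text):
    """Map printable ASCII (U+0020..U+007E) to the Plane 14 tag characters."""
    out = []
    for ch in ascii_text:
        cp = ord(ch)
        if not 0x20 <= cp <= 0x7E:
            raise ValueError(f"no tag character corresponds to U+{cp:04X}")
        out.append(chr(0xE0000 + cp))  # tag characters mirror ASCII at offset E0000
    return "".join(out)


if __name__ == "__main__":
    tagged = to_tag_characters("en")
    # Each tag character costs 4 bytes in UTF-8 and 4 bytes (a surrogate pair) in UTF-16.
    print(len(tagged.encode("utf-8")), len(tagged.encode("utf-16-le")))
```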
Re: Proposal for BiDi in terminal emulators
On Wed, Jan 30, 2019 at 05:56:00PM +0200, Eli Zaretskii via Unicode wrote: > > - It doesn't do Arabic shaping. > > It doesn't do _any_ shaping. Complex script shaping is left to the > terminal, because it's impossible to do shaping in any reasonable way > without controlling the fonts being used and accessing the font > information, and this is not possible when you run on a terminal It's the inverse of the situation with RTL reordering. The interface between the program and the terminal is a character cell grid (really, a sequence of printables and \e-based codes, but that's a technical detail). The program (emacs in this case) can do arbitrary reordering of characters on the grid, it also has lots of information the terminal doesn't. For example, what are you going to do when there's a line longer than what fits on the screen? Emacs will cut and hide part of it; any attempts to reorder that paragraph by the terminal are outright broken as you don't _have_ the paragraph. Same for a popup window on the middle of the screen partially obscuring some text underneath. And if you argue "so make emacs print your new code to disable formatting", so do thousands of other programs that are less sophisticated than emacs. On the other hand, all that the program can output is a sequence of Unicode codepoints. These don't include shaping information, and are not supposed to. The shaping is explicitly meant to be done by the terminal, and it's the terminal who's equipped with _most_ of the needed data (it might lack context just outside screen's end or under an overlapped window, but that's not specific to complex shaping -- same can happen for the other half of a CJK character). You know if the font used supports shaping, you can have access to a graphic view (as opposed to the array of codepoints) -- heck, it's only you who know the text is rendered on a screen rather than a Braille device. 
And if you miss an opportunity to shape something, the result is still readable to the user, merely not as good as it could be. Meow! -- ⢀⣴⠾⠻⢶⣦⠀ ⣾⠁⢠⠒⠀⣿⡁ Remember, the S in "IoT" stands for Security, while P stands ⢿⡄⠘⠷⠚⠋⠀ for Privacy. ⠈⠳⣄
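To make the "character cell grid" and the "other half of a CJK character" remarks concrete, here is a rough Python sketch of how a program might estimate how many cells a string occupies. It is a simplification under stated assumptions (wide and fullwidth characters take two cells, nonspacing and enclosing marks and format characters take none) and is not the width logic of any particular terminal; cell_width and line_width are illustrative names.

```python
import unicodedata


def cell_width(ch):
    """Estimate the number of terminal cells one code point occupies."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0  # nonspacing/enclosing marks and format characters share a cell
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # wide and fullwidth characters span two cells
    return 1


def line_width(text):
    """Sum the estimated cell widths of all code points in the text."""
    return sum(cell_width(ch) for ch in text)


if __name__ == "__main__":
    for sample in ("abc", "\u6F22\u5B57", "e\u0301"):
        print(repr(sample), line_width(sample))
```

Note that a per-code-point estimate like this knows nothing about shaping or ligatures, which is one reason the messages above leave those to the terminal.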
Re: Proposal for BiDi in terminal emulators
> Date: Wed, 30 Jan 2019 15:49:34 +0100 > Cc: unicode@unicode.org > From: Egmont Koblinger via Unicode > > I outline in the document problems that arise from the terminal > emulator performing shaping on its contents in "explicit" mode, which > is to be used by Emacs and others. The terminal emulator is not aware > of the characters that are chopped off at the edge of the screen, > required for shaping. The terminal emulator is not aware of which > characters happen to be placed next to each other, but belong to > semantically different UI elements, that is, shouldn't be shaped. > > (And as a side note, FriBidi doesn't provide a method for doing > shaping on _visual_ order. I'm unsure about other libraries, and > unsure if there's an algorithm for it at all.) > > Honestly, I have no idea how to best address all these problems at > once. This is where we can think of extensions "explicit mode level 2", > use control characters that explicitly specify how to shape certain > glyphs. This is subject to further research. Personally, I think we should simply assume that complex script shaping is left to the terminal, and if the terminal cannot do that, then that's a restriction of working on a text terminal. There's nothing you can do here, because correct shaping requires too many features that applications running on text terminals cannot use, and OTOH a terminal emulator that wants to perform shaping needs information from the application (like the directionality of the text and its language) that there's no way for the application to provide.
Re: Proposal for BiDi in terminal emulators
> Date: Wed, 30 Jan 2019 15:25:32 +0100 > Cc: unicode@unicode.org > From: Egmont Koblinger via Unicode > > > ╒═══╤══╕ > > │ filename1 │ 123 │ > > │ FILENAME2 │ 17 │ > > └───┴──┘ > > > > I'm afraid there's no good way to do BiDi without support from individual > > programs. > > In this particular example, when the output consists of RTL text in > logical order (the emitter does not reorder the characters to their > visual order, nor emit any BiDi controls), combined with line drawing > and such, there is hardly anything we could do purely on the terminal > emulator's side. I think the application could use TAB characters to get to the next cell, then simplistic reordering would also work. But in general, yes: this is one of the examples why sophisticated text-editing applications cannot leave this to the terminal. (Another example is handling mouse clicks, if the terminal supports that.)
Re: Proposal for BiDi in terminal emulators
> Date: Wed, 30 Jan 2019 15:07:22 +0100 > Cc: unicode@unicode.org > From: Egmont Koblinger via Unicode > > Another possible approach is to leave the terminal doing BiDi, but > embed all the text fragments in FSI...PDI blocks. Does anyone know of a terminal emulator which supports isolates?
Re: Proposal for BiDi in terminal emulators
> From: Egmont Koblinger > Date: Wed, 30 Jan 2019 14:36:42 +0100 > Cc: unicode@unicode.org > > - GNU Emacs reshuffles the characters according to the BiDi algorithm, > expecting that the terminal emulator doesn't do any BiDi. Yes, users are told to disable bidi reordering of the terminal, if the terminal supports that. > - It doesn't do Arabic shaping. It doesn't do _any_ shaping. Complex script shaping is left to the terminal, because it's impossible to do shaping in any reasonable way without controlling the fonts being used and accessing the font information, and this is not possible when you run on a terminal -- the user configures the terminal emulator, and the emulator chooses the fonts it likes/needs according to that configuration. (Emacs does support complex script shaping on GUI displays.) > - When it comes to visually wrapping a line because it doesn't fit in > the current width, Emacs goes its own way which doesn't match what the > Unicode BiDi algorithm says. Yes, this deviation is documented in the Emacs manuals. The reason for that is that the Emacs implementation of the UBA reorders characters on the fly (i.e., it implements a function that can be called character by character, and returns the next character in the visual order). This was done due to a special structure of the Emacs display engine and efficiency considerations. In practice, this problem happens very rarely.
Re: Proposal for BiDi in terminal emulators
> A formatted table is pretty unsuitable for automated processing, and > obviously meant for human display. Could you please clarify how exactly that data looks like? Maybe a tiny hexdump of an example? Is the RTL piece of text already stored in visual order, that is, beginning with the leftmost (last logical) letter of the word? If so then you can sure display it properly in BiDi-unaware rendering engines (including most terminal emulators currently, as well as in "explicit" mode according to my specification). That is, whoever produces that data reverses that word for you? Or is the RTL piece of text still in its logical order? Then in what piece of software does this formatted data show up to you in a readable way? > You're a terminal emulator maintainer, thus it's natural for you to think > it's the right place to come up with a solution. No. I've been a maintainer/developer/contributor to all kinds of software, including (but not limited to) terminal emulators, apps running inside terminal emulators, or a pretty complex RTL homepage. I'm doing my best in looking at the entire ecosystem, and coming up with a good BiDi-aware interface between terminal emulators and applications. > I'd argue that it's not -- > all a terminal emulator can do is to display already formatted text, there's > no sane way to move things around. You missed that your use case with this table is not the only possible use case. There are others where the terminal needs to do BiDi. My work aims to address multiple use cases at once, yours being one of them. cheers, egmont
Re: Proposal for BiDi in terminal emulators
On Wed, Jan 30, 2019 at 03:43:10PM +0100, Egmont Koblinger via Unicode wrote: > On Wed, Jan 30, 2019 at 3:32 PM Adam Borowski wrote: > > > > > ╒═══╤══╕ > > > > │ filename1 │ 123 │ > > > > │ FILENAME2 │ 17 │ > > > > └───┴──┘ > > > That's possible only if the program in question is running directly attached > > to the tty. That's not an option if the output is redirected. Frames in > > a plain text file are a perfectly rational, and pretty widespread, use -- > > and your proposal will break them all. Be it "cat" to the screen, "less" or > > even "mutt" if the text was sent via a mail. > > I'd argue that if you have such a data stored in a file, with logical > order used in Arabic or Hebrew text, combined with line drawing chars > as you showed, then your data is broken to begin with – broken in the > sense that it's suitable for automated processing (*), but not for > display. A formatted table is pretty unsuitable for automated processing, and obviously meant for human display. > (*) but then line drawing chars are not really a nice choice over CSV, > JSON, whatever. That's why you use CSV and JSON for machine-readable, plain text for humans, and XML for neither. > The only possible choice is for some display engine to be aware that > line drawing characters are part of a "higher level protocol", and > BiDi should be applied only in the lower scope. I don't think the > terminal emulator is the right place to make such decisions At this point, required information is lost. Any transformations such as RTL reordering needs to be done earlier, when you still see _unformatted_ version of the data. You're a terminal emulator maintainer, thus it's natural for you to think it's the right place to come up with a solution. I'd argue that it's not -- all a terminal emulator can do is to display already formatted text, there's no sane way to move things around. Any changes need to be localized -- for example, you can do ligatures only if you keep total length unchanged. 
I.e., the terminal emulator is the right layer for things like complex script shaping, but not RTL reordering. Meow! -- ⢀⣴⠾⠻⢶⣦⠀ ⣾⠁⢠⠒⠀⣿⡁ Remember, the S in "IoT" stands for Security, while P stands ⢿⡄⠘⠷⠚⠋⠀ for Privacy. ⠈⠳⣄
Re: Proposal for BiDi in terminal emulators
Hi Frédéric, > I guess Arabic shaping is doable through presentation form characters, > because the latter are characters inherited from legacy standards using > them in such solutions. But if you want to support other “arabic like” > scripts (like Syriac, N’ko), or even some LTR complex scripts, like > Myanmar or Khmer, this “solution” cannot work, because no equivalent of > “presentation form characters” exists for these scripts Unfortunately my knowledge ends here, I'm not familiar with shaping for Syriac and other similar scripts. I'd really appreciate input from experts here. I outline in the document problems that arise from the terminal emulator performing shaping on its contents in "explicit" mode, which is to be used by Emacs and others. The terminal emulator is not aware of the characters that are chopped off at the edge of the screen, required for shaping. The terminal emulator is not aware of which characters happen to be placed next to each other, but belong to semantically different UI elements, that is, shouldn't be shaped. (And as a side note, FriBidi doesn't provide a method for doing shaping on _visual_ order. I'm unsure about other libraries, and unsure if there's an algorithm for it at all.) Honestly, I have no idea how to best address all these problems at once. This is where we can think of extensions "explicit mode level 2", use control characters that explicitly specify how to shape certain glyphs. This is subject to further research. cheers, egmont
Re: Proposal for BiDi in terminal emulators
On Wed, Jan 30, 2019 at 3:32 PM Adam Borowski wrote: > > > ╒═══╤══╕ > > > │ filename1 │ 123 │ > > > │ FILENAME2 │ 17 │ > > > └───┴──┘ > That's possible only if the program in question is running directly attached > to the tty. That's not an option if the output is redirected. Frames in > a plain text file are a perfectly rational, and pretty widespread, use -- > and your proposal will break them all. Be it "cat" to the screen, "less" or > even "mutt" if the text was sent via a mail. I'd argue that if you have such a data stored in a file, with logical order used in Arabic or Hebrew text, combined with line drawing chars as you showed, then your data is broken to begin with – broken in the sense that it's suitable for automated processing (*), but not for display. I can't think of any utility that would display it properly, because that's not what the Unicode BiDi algorithm run over this data produces. (*) but then line drawing chars are not really a nice choice over CSV, JSON, whatever. The only possible choice is for some display engine to be aware that line drawing characters are part of a "higher level protocol", and BiDi should be applied only in the lower scope. I don't think the terminal emulator is the right place to make such decisions – I don't think any other generic tool (graphical word processor, browser etc.) does make such a call either. cheers, egmont
Re: Proposal for BiDi in terminal emulators
On 30/01/2019 at 14:36, Egmont Koblinger via Unicode wrote: > - It doesn't do Arabic shaping. In my recommendation I'm arguing that in this mode, where shuffling the characters is the task of the text editor and not the terminal, so should it be for Arabic shaping using presentation form characters. I guess Arabic shaping is doable through presentation form characters, because the latter are characters inherited from legacy standards using them in such solutions. But if you want to support other “arabic like” scripts (like Syriac, N’ko), or even some LTR complex scripts, like Myanmar or Khmer, this “solution” cannot work, because no equivalent of “presentation form characters” exists for these scripts.
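Whether a given letter has presentation form characters at all can be looked up from the compatibility decompositions in the Unicode Character Database. The Python sketch below scans the two Arabic Presentation Forms blocks for single-letter forms of a base character; the function name presentation_forms is hypothetical, ligatures (forms that decompose to more than one character) are deliberately ignored, and the outcome depends on the Unicode version of the Python build.

```python
import unicodedata

# Arabic Presentation Forms-A and Arabic Presentation Forms-B.
PRESENTATION_RANGES = [(0xFB50, 0xFDFF), (0xFE70, 0xFEFF)]
POSITIONAL_TAGS = ("<isolated>", "<final>", "<initial>", "<medial>")


def presentation_forms(base):
    """Map positional form name -> presentation form character for one base letter."""
    target = f"{ord(base):04X}"
    forms = {}
    for lo, hi in PRESENTATION_RANGES:
        for cp in range(lo, hi + 1):
            parts = unicodedata.decomposition(chr(cp)).split()
            # Keep only single-letter forms such as "<initial> 0628".
            if len(parts) == 2 and parts[0] in POSITIONAL_TAGS and parts[1] == target:
                forms[parts[0].strip("<>")] = chr(cp)
    return forms


if __name__ == "__main__":
    print(presentation_forms("\u0628"))  # ARABIC LETTER BEH: four positional forms
    print(presentation_forms("\u0763"))  # KEHEH WITH THREE DOTS ABOVE: none
```

For scripts such as Syriac or N’ko the same lookup has nothing to scan, since no presentation form blocks were encoded for them.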
Re: Proposal for BiDi in terminal emulators
On Wed, Jan 30, 2019 at 03:25:32PM +0100, Egmont Koblinger via Unicode wrote: > One more note, to hopefully clarify: > > ╒═══╤══╕ > > │ filename1 │ 123 │ > > │ FILENAME2 │ 17 │ > > └───┴──┘ > > > > I'm afraid there's no good way to do BiDi without support from individual > > programs. > > In this particular example, when the output consists of RTL text in > logical order (the emitter does not reorder the characters to their > visual order, nor emit any BiDi controls), combined with line drawing > and such, there is hardly anything we could do purely on the terminal > emulator's side. > > I did not consider the possibility of certain characters (e.g. line > drawing ones) being "stop characters", and BiDi to get applied only in > runs of other characters. Any such magic would be arbitrary, fix a > subset of the cases while cause other unforeseen breakages elsewhere. > E.g. what if someone intentionally uses these characters as > letter-like ones in a BiDi text, like """here I'm talking about the > '└' shaped corner"""... or what if poor man's ASCII pipe and other > symbols are used... it's way too risky to go into any kind of > heuristics. > > In this particular case the terminal cannot magically fix the output > for you, you'll need to get the application fixed. That's possible only if the program in question is running directly attached to the tty. That's not an option if the output is redirected. Frames in a plain text file are a perfectly rational, and pretty widespread, use -- and your proposal will break them all. Be it "cat" to the screen, "less" or even "mutt" if the text was sent via a mail. I don't really see a possibility to do that on the terminal's side, any RTL reordering would need to be done by the program in question. Meow! -- ⢀⣴⠾⠻⢶⣦⠀ ⣾⠁⢠⠒⠀⣿⡁ Remember, the S in "IoT" stands for Security, while P stands ⢿⡄⠘⠷⠚⠋⠀ for Privacy. ⠈⠳⣄
Re: Proposal for BiDi in terminal emulators
Hi Adam, One more note, to hopefully clarify: > ╒═══╤══╕ > │ filename1 │ 123 │ > │ FILENAME2 │ 17 │ > └───┴──┘ > > I'm afraid there's no good way to do BiDi without support from individual > programs. In this particular example, when the output consists of RTL text in logical order (the emitter does not reorder the characters to their visual order, nor emit any BiDi controls), combined with line drawing and such, there is hardly anything we could do purely on the terminal emulator's side. I did not consider the possibility of certain characters (e.g. line drawing ones) being "stop characters", and BiDi to get applied only in runs of other characters. Any such magic would be arbitrary, fix a subset of the cases while cause other unforeseen breakages elsewhere. E.g. what if someone intentionally uses these characters as letter-like ones in a BiDi text, like """here I'm talking about the '└' shaped corner"""... or what if poor man's ASCII pipe and other symbols are used... it's way too risky to go into any kind of heuristics. In this particular case the terminal cannot magically fix the output for you, you'll need to get the application fixed. cheers, egmont
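One way to see why the table case cannot be fixed by the terminal alone is to look at the bidirectional classes involved. The short Python sketch below (bidi_classes is an illustrative name) lists the Bidi_Class of each character; line drawing characters and the ASCII pipe are both plain neutrals (ON), so the algorithm has no property that would mark them as cell boundaries.

```python
import unicodedata


def bidi_classes(text):
    """Return the Unicode Bidi_Class of every character in the text."""
    return [unicodedata.bidirectional(ch) for ch in text]


if __name__ == "__main__":
    # A box drawing bar, a Hebrew letter, a space, a digit and an ASCII pipe.
    row = "\u2502\u05D0 1|"
    for ch, cls in zip(row, bidi_classes(row)):
        print(f"U+{ord(ch):04X} {cls}")
```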
Re: Proposal for BiDi in terminal emulators
Hi Adam, > Even a line is way too big a piece to be safely reordered by the terminal. > What you propose will break every full-screen program that uses line-drawing > characters: Certain terminal emulators already perform BiDi on their lines. They already break every full-screen program with line-drawing and such, as you pointed out. What my proposal adds, amongst plenty of other things, is a means to automatically disable the terminal's BiDi, rather than having to go to its settings. This way you can automate the fix of the apps that aren't explicitly fixed, e.g. via wrapper scripts, or terminfo entries with special ti/te definitions. > I'm afraid there's no good way to do BiDi without support from individual > programs. Depends on the use case. For complex apps, like text editors, you are right, the terminal emulator must stay out of the game. For simple utilities, like "cat" and friends, there's no way you can implement BiDi support in "cat" itself. Here the terminal needs to do it. Your use case with tables is perhaps somewhat in the middle. One possible approach for the emitting utility is to disable BiDi in the terminal (switch to "explicit" mode) for the scope of this output. Another possible approach is to leave the terminal doing BiDi, but embed all the text fragments in FSI...PDI blocks. (This latter is subject to a bit of further research, to be exactly specified in a forthcoming version of the specs.) What is extremely tough here is realizing that there are multiple conflicting requirements (including the example you gave), and coming up with a solution that satisfies the needs of all. This is what my work aims to do. cheers, egmont
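The FSI...PDI approach mentioned above could look roughly like the following on the emitting side. This is a sketch only: the function names isolate and table_row are invented for the example, the column padding is omitted, and whether any terminal emulator actually honours isolates is exactly the open question in this thread.

```python
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate(fragment):
    """Wrap a text fragment so that its direction is resolved independently."""
    return FSI + fragment + PDI


def table_row(cells, sep="\u2502"):
    """Build one table row with every cell's text placed in its own isolate."""
    return sep + sep.join(f" {isolate(cell)} " for cell in cells) + sep


if __name__ == "__main__":
    # The second cell text is a short Hebrew word standing in for an RTL filename.
    print(table_row(["filename1", "123"]))
    print(table_row(["\u05E9\u05DC\u05D5\u05DD", "17"]))
```

The isolates are invisible format characters, so a consumer that ignores them still sees the same cell contents.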
Re: Proposal for BiDi in terminal emulators
Hi Eli, > My personal experience with bringing BiDi to Emacs led me to a firm > conclusion that BiDi support by terminal emulators cannot be relied on > by sophisticated text editing and display applications that are > BiDi-aware. The terminal emulator can never be smart enough to do > what the editing needs require, so the application eventually ends up > jumping through hoops in order to trick the terminal into doing TRT. > It is easier to tell users to disable BiDi support of the terminal (if > it even has one), and do everything in the app. This is the only way > of having full control of what is displayed, especially when > "higher-level protocols" need to be used to tailor the UBA to the need > of the user, because there's usually no way of asking the terminal to > apply a behavior which deviates from the UBA. We are absolutely on the same page here. As long as the use case is text editing or something similar, it's harmful if the terminal emulator aims to do any BiDi. Having to tell users to turn off BiDi in the emulator's settings is in my firm opinion a user experience no-go. It has to be automatic, happen under the hood, that is, using escape sequences. There's another side to the entire BiDi story, though. Simple utilities like "echo", "cat", "ls", "grep" and so on, the line editing experience of your shell, that kind of thing. It's absolutely not feasible to add BiDi support to these utilities. Here the only viable approach is to have the terminal emulator do it. Hence, as I confirm ECMA TR/53's realization of 28 years ago, there have to be two substantially different modes. "Explicit" mode for what you need for Emacs: the terminal to stay out of the game; and "implicit" mode where the terminal performs BiDi for the sake of "cat" and other simple utilities. I'm also arguing that contrary to TR/53, there's no way to hook up a mode switch to "cat" and a gazillion of other similar tools. 
The only realistically implementable approach is if the "implicit" mode is the default so that simple utilities provide a proper BiDi experience. Those very few fullscreen apps that do know what they are doing and do want the terminal to leave the characters at their designated place (such as Emacs, Vim etc.) will have to request this "explicit" mode from the terminal. cheers, egmont
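As an illustration of what "requesting explicit mode via escape sequences" might look like from a fullscreen application or a wrapper script, here is a small Python sketch. It assumes that the switch is ECMA-48's BDSM (BI-DIRECTIONAL SUPPORT MODE, mode 8, set = implicit, reset = explicit); the message above does not name the sequence, so treat the constants as an assumption rather than as the proposal's final wording. The context manager name explicit_bidi is made up, and restoring to implicit on exit assumes implicit was the terminal's default.

```python
import contextlib
import io
import sys

CSI = "\x1b["
# Assumption: ECMA-48 BDSM is mode 8; "h" sets (implicit), "l" resets (explicit).
BDSM_IMPLICIT = CSI + "8h"
BDSM_EXPLICIT = CSI + "8l"


@contextlib.contextmanager
def explicit_bidi(stream=sys.stdout):
    """Ask the terminal to stay out of BiDi while the block runs, then restore."""
    stream.write(BDSM_EXPLICIT)
    try:
        yield stream
    finally:
        # Assumes implicit mode was the default before the block was entered.
        stream.write(BDSM_IMPLICIT)
        stream.flush()


if __name__ == "__main__":
    demo = io.StringIO()
    with explicit_bidi(demo) as out:
        out.write("text already in visual order")
    print(repr(demo.getvalue()))
```

The same pair of sequences could equally be emitted from terminfo-style enter/exit strings; the point is only that the switch is scoped to the application that needs it.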
Re: Proposal for BiDi in terminal emulators
Hi Eli, > > In turn, vim, emacs and friends stand there clueless, not knowing > > how to do BiDi in terminals. > > This is inaccurate: [...] I have to admit, I was somewhat sloppy in the phrasing of this announcement. My bad, apologies. Currently some terminal emulators shuffle the characters around for display purposes, while most don't. There's absolutely no way an editor (no matter if Emacs or any other) could produce the desired look on both kinds. I actually present a proof that an editor cannot always produce the desired look on ones that shuffle their contents around. So it's a somewhat reasonable expectation to produce the desired look on ones that don't shuffle their cells. In the document, more precisely at [1] I evaluate my findings with GNU Emacs 25.2. (I've just fixed the page to add "GNU", thanks for pointing this out!) Brief summary: - GNU Emacs reshuffles the characters according to the BiDi algorithm, expecting that the terminal emulator doesn't do any BiDi. - According to my recommendation, in order to address BiDi in the entire ecosystem around terminal emulators, the default behavior will have to be that terminals shuffle the characters around. Don't worry, there'll be a mode where this shuffling doesn't occur. Emacs (and all other BiDi-aware text editors) will have to switch to this mode. - It doesn't do Arabic shaping. In my recommendation I'm arguing that in this mode, where shuffling the characters is the task of the text editor and not the terminal, so should it be for Arabic shaping using presentation form characters. - When it comes to visually wrapping a line because it doesn't fit in the current width, Emacs goes its own way which doesn't match what the Unicode BiDi algorithm says. I'm not saying Emacs's behavior is bad per se or unreasonable, and it's out of the scope of my work to try to get it changed, but I'm making a note that it's different. 
[1] https://terminal-wg.pages.freedesktop.org/bidi/prior-work/applications.html cheers, egmont
Re: Encoding italic
Yes, great. But as I've said, we've ALREADY got a default-ignorable-in-display (if implemented right) way of doing such things. And not only do we already have one, but it is also standardised in multiple standards from different standards institutions. See for instance "ISO/IEC 8613-6, Information technology --- Open Document Architecture (ODA) and Interchange Format: Character content architecture". (In a little experiment I found that it seems that Cygwin is one of the better implementations of this; B.t.w. I have no relation to Cygwin other than using it.) To boot, it's been around for decades and is still alive and well. I see absolutely no need for a "bold" new concept here; the one below is not better in any significant way. /Kent Karlsson On 2019-01-29 23:35, "Andrew West via Unicode" wrote: > On Mon, 28 Jan 2019 at 01:55, James Kass via Unicode > wrote: >> >> This bold new concept was not mine. When I tested it >> here, I was using the tag encoding recommended by the developer. > > Congratulations James, you've successfully interchanged tag-styled > plain text over the internet with no adverse side effects. I copied > your email into BabelPad and your "bold" is shown bold (see attached > screenshot). > > Andrew
Re: Encoding italic
On Mon, 28 Jan 2019 at 01:55, James Kass via Unicode wrote: > > This bold new concept was not mine. When I tested it > here, I was using the tag encoding recommended by the developer. Congratulations James, you've successfully interchanged tag-styled plain text over the internet with no adverse side effects. I copied your email into BabelPad and your "bold" is shown bold (see attached screenshot). Andrew
Re: Encoding italic
Doug Ewell wrote, > I can't speak for Andrew, but I strongly suspect he implemented this as > a proof of concept, not to declare himself the Maker of Standards. BabelPad also offers plain-text styling via math-alpha conversion, although this feature isn’t newly added. Users interested in seeing how plain-text italics might work can try out the stateful approach using tags contrasted with the character-by-character approach using math-range italic letters. (Of course, the math-range stuff is already being interchanged on the WWW, whilst the tagging method does not yet appear to be widely supported.) A few miles upthread, ‘where are the third-party developers’ was asked. ‘Everywhere’ is the answer. Since third-party developers have to subsist on the crumbs dropped by the large corps, they tend to be responsive to user needs and requests.
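As an illustration of the character-by-character approach, here is a rough Python sketch of a math-alpha italic conversion. It is not BabelPad's code; the function name is made up for this example. It relies only on the layout of the Mathematical Alphanumeric Symbols block, where italic small h is a reserved slot whose role is filled by U+210E PLANCK CONSTANT.

```python
import unicodedata

def to_math_italic(text: str) -> str:
    """Map ASCII letters to mathematical italic letters; leave the rest alone."""
    out = []
    for ch in text:
        if ch == "h":
            # U+1D455 is reserved; U+210E PLANCK CONSTANT serves as italic h.
            out.append("\u210E")
        elif "a" <= ch <= "z":
            out.append(chr(0x1D44E + ord(ch) - ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr(0x1D434 + ord(ch) - ord("A")))
        else:
            out.append(ch)  # digits, punctuation, accented letters: unchanged
    return "".join(out)

if __name__ == "__main__":
    styled = to_math_italic("The hat")
    print(styled)
    # Compatibility normalization folds the styled letters back to ASCII.
    print(unicodedata.normalize("NFKC", styled))
```

Note that anything outside A-Z and a-z, such as "ä", passes through unstyled, which is one of the limits of this approach.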
Re: Encoding italic
On 2019-01-29 5:10 PM, Doug Ewell via Unicode wrote:
> I thought we had established that someone had mentioned it on this list,
> at some time during the past three weeks. Can someone look up what post
> that was? I don't have time to go through scores of messages, and there
> is no search facility.

http://www.unicode.org/mail-arch/unicode-ml/y2019-m01/0209.html
Re: Ancient Greek apostrophe marking elision
On Mon, 28 Jan 2019 20:55:39 -0500 "Mark E. Shoulson via Unicode" wrote:

> On 1/28/19 2:31 AM, Mark Davis ☕️ via Unicode wrote:
> >
> > But the question is how important those are in daily life. I'm not
> > sure why the double-click selection behavior is so much more of a
> > problem for Ancient Greek users than it is for the somewhat larger
> > community of English users. Word selection is not normally as
> > important an operation as line break, which does work as expected.
>
> This is a good point. Bottom line is that word-selection, at least,
> is not going to be _exactly_ right. Oh, and for another example,
> note that Esperanto also regularly (in poetry, anyway) uses a
> word-final apostrophe (of some kind) to indicate elision of the final
> -o of a nominative singular noun, or the -a of the article "la".
> What shall we say to Esperantists who can't correctly select the third
> word in «al la mond’ eterne militanta / Ĝi promesas sanktan harmonion»? I
> guess "Suck it up and deal with it." And that may indeed be the
> answer. Who's going to punish them for using U+02BC?

I found some documentation of an Ancient Greek spell-checker for OpenOffice. It listed problems with the apostrophe as one of its shortcomings.

Richard.
Re: Ancient Greek apostrophe marking elision
On Mon, 28 Jan 2019 21:10:19 -0500 "Mark E. Shoulson via Unicode" wrote:

> On 1/28/19 3:58 PM, Richard Wordingham via Unicode wrote:
> > Interestingly, bringing this word breaker into line with TUS in the
> > UK may well be in breach of the Equality Act 2010.
> >
> > Richard.
>
> OK, I've got to ask: how would that be? How would this impinge on
> anyone's equality on the basis of "age, disability, gender
> reassignment, marriage and civil partnership, pregnancy and
> maternity, race, religion or belief, sex, and sexual orientation"?
> (quote from WP)

The most relevant clauses are 9(1), 9(4), 19(2), 29(5) and 29(7). The change would restrict Thais' access to the provision of a service. The service provided is to allow one to use a persistent, correctable spell-checking system for one's native language. Firefox and LibreOffice provide this service. Of course, one may have to supply the spell-checking databases oneself. Withdrawing this service for some ethnic groups would be a breach of the law.

By persistent, I mean that corrections to the spell-checking remain when the text is revisited. For English plain text, the easy correction is to remove false positives by adding the word to 'personal dictionaries'. The difficult correction, not always possible, is to remove the word from the spell-checker's word list.

For scriptio continua scripts, line_break=complex_context in UCD terms, there is the additional problem that word-breaking is not infrequently wrong, even for Thai in Thai script. (Recent loanwords into Thai can be a nightmare. So is Pali in Thai script, though Pali spell-checking has its own issues.) Line-breaking can be corrected with WJ and ZWSP. At present, word-breaking can be corrected by inserting these characters, and then spelling can be negotiated - the visible characters are non-negotiable. The changes in the text will persist in plain text.
If WJ ceases to be treated as joining words, then the service of a persistent, *correctable* spell-checking system is lost. Now, one defence to the denial of the service would be that it would be unreasonably difficult to allow users to solve the problem of word-breaks in the wrong place. However, if one is already providing that service, that defence cannot be applied to subsequently denying the service. Richard.
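To make the correction mechanism concrete, here is a toy Python sketch, under loud assumptions: the four-word lexicon, the greedy longest-match rule and the function names are invented for illustration and do not describe how Firefox, LibreOffice or ICU actually segment Thai. It only shows the behaviour described above: ZWSP forces a word boundary, and a run held together by WJ is treated as one word.

```python
ZWSP = "\u200B"  # ZERO WIDTH SPACE: the user asserts a word boundary here
WJ = "\u2060"    # WORD JOINER: the user asserts there is no boundary here

# Toy word list (assumption): just enough to show the classic ambiguity.
LEXICON = {"ตา", "ตาก", "กลม", "ลม"}

def greedy_segment(run: str) -> list[str]:
    """Greedy longest-match segmentation of an unmarked run of text."""
    words, i = [], 0
    while i < len(run):
        for j in range(len(run), i, -1):
            if run[i:j] in LEXICON:
                break
        else:
            j = i + 1  # unknown character: emit it on its own
        words.append(run[i:j])
        i = j
    return words

def segment(text: str) -> list[str]:
    """Segment text, honouring user-inserted ZWSP and WJ corrections."""
    words = []
    for chunk in text.split(ZWSP):
        if WJ in chunk:
            words.append(chunk.replace(WJ, ""))  # joined run is one word
        elif chunk:
            words.extend(greedy_segment(chunk))
    return words

if __name__ == "__main__":
    print(segment("ตากลม"))             # the segmenter's own guess
    print(segment("ตา" + ZWSP + "กลม"))  # corrected by the user
```

Because the corrections are ordinary characters in the text, they survive saving and reopening, which is the persistence property at stake.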
Re: Encoding italic
On Tue, 29 Jan 2019 at 10:25, Martin J. Dürst via Unicode wrote:
>
> The overall tag proposal had the desired effect: The original proposal
> to hijack some unused bytes in UTF-8 was defeated, and the tags themselves
> were not actually used and therefore could be deprecated.

And the tag characters (all except U+E0001) are now no longer deprecated. As flag tag sequences are now a thing (http://www.unicode.org/reports/tr51/#valid-emoji-tag-sequences), and are widely supported (including on Twitter), your and PV's objections to using tag characters for a plain text font styling protocol simply because they are tag characters carry zero weight.

Andrew
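For reference, a flag tag sequence is built from U+1F3F4 WAVING BLACK FLAG, a run of tag characters spelling a region code, and U+E007F CANCEL TAG. A small Python sketch, with the function name chosen only for this example:

```python
BLACK_FLAG = "\U0001F3F4"  # WAVING BLACK FLAG, the base of the sequence
CANCEL_TAG = "\U000E007F"  # CANCEL TAG, terminates the sequence
TAG_BASE = 0xE0000         # tag character = TAG_BASE + ASCII code point

def subdivision_flag(region_code: str) -> str:
    """Build an emoji tag sequence for a subdivision code such as 'gbsct'."""
    tags = "".join(chr(TAG_BASE + ord(c)) for c in region_code.lower())
    return BLACK_FLAG + tags + CANCEL_TAG

if __name__ == "__main__":
    scotland = subdivision_flag("gbsct")
    print(" ".join(f"U+{ord(c):04X}" for c in scotland))
```

Whether the result renders as a flag depends on the font and platform; software without support shows the black flag alone, since the tag characters are default-ignorable.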
Re: Proposal for BiDi in terminal emulators
On Tue, Jan 29, 2019 at 01:50:31PM +0100, Egmont Koblinger via Unicode wrote:
> Terminal emulators are a powerful tool used by many people for various
> tasks. Most terminal emulators' bugtracker has a request to add RTL /
> BiDi support.
[...]
> Some terminal emulators decided to run the BiDi algorithm for display
> purposes on its lines (rather than paragraphs, uh)

Even a line is way too big a piece to be safely reordered by the terminal. What you propose will break every full-screen program that uses line-drawing characters:

╒═══════════╤═════╕
│ filename1 │ 123 │
│ FILENAME2 │  17 │
└───────────┴─────┘

would become:

╒═══════════╤═════╕
│ filename1 │ 123 │
│ 17 │ 2EMANELIF │
└───────────┴─────┘

You can't even use character properties, because:

+===========+=====+
| filename1 | 123 |
| FILENAME2 |  17 |
+-----------+-----+

or even, probably more popular:

filename1    123
FILENAME2     17

I'm afraid there's no good way to do BiDi without support from individual programs.

Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Remember, the S in "IoT" stands for Security, while P stands
⢿⡄⠘⠷⠚⠋⠀ for Privacy.
⠈⠳⣄
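The breakage can be reproduced with a deliberately crude Python model. This is not the Unicode BiDi algorithm: it assumes the line's base direction has been detected as right-to-left (uppercase standing in for RTL letters, as in the example), mirrors the whole line, and then puts each digit run back into left-to-right order. The function name is invented for this sketch.

```python
import re

def toy_rtl_display(line: str) -> str:
    """Crude stand-in for displaying a logical line in an RTL paragraph.

    Everything is mirrored, then runs of digits are flipped back so that
    numbers still read left to right. Not the real UBA.
    """
    mirrored = line[::-1]
    return re.sub(r"\d+", lambda m: m.group()[::-1], mirrored)

if __name__ == "__main__":
    row = "│ FILENAME2 │ 17 │"
    print(row)
    print(toy_rtl_display(row))
```

The vertical bars are direction-neutral, so they travel with the text around them and the two cells swap places, while rows that contain only left-to-right text stay put. That mismatch between rows is what wrecks the table.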
Re: Proposal for BiDi in terminal emulators
> Date: Tue, 29 Jan 2019 13:50:31 +0100 > From: Egmont Koblinger via Unicode > > [1] https://terminal-wg.pages.freedesktop.org/bidi/ Interesting document, thanks for writing it. My personal experience with bringing BiDi to Emacs led me to a firm conclusion that BiDi support by terminal emulators cannot be relied on by sophisticated text editing and display applications that are BiDi-aware. The terminal emulator can never be smart enough to do what the editing needs require, so the application eventually ends up jumping through hoops in order to trick the terminal into doing TRT. It is easier to tell users to disable BiDi support of the terminal (if it even has one), and do everything in the app. This is the only way of having full control of what is displayed, especially when "higher-level protocols" need to be used to tailor the UBA to the need of the user, because there's usually no way of asking the terminal to apply a behavior which deviates from the UBA. (If needed, I can provide examples of subtle problems with using BiDi support of a terminal in BiDi-aware editing. Not sure this will be interesting to too many readers of this forum, though.)
Re: Encoding italic
Martin J. Dürst wrote:

> Here's a little dirty secret about these tag characters: They were
> placed in one of the astral planes explicitly to make sure they'd use
> 4 bytes per tag character, and thus quite a few bytes for any actual
> complete tags. See https://tools.ietf.org/html/rfc2482 for details.
> Note that RFC 2482 has been obsoleted by
> https://tools.ietf.org/html/rfc6082, in parallel with a similar motion
> on the Unicode side.

I don't recall anyone mentioning Plane 14 language tags per se in this thread. The tag characters themselves were un-deprecated to support emoji flag sequences. But more on language tags in a moment.

> These tag characters were born only to shoot down an even worse
> proposal, https://tools.ietf.org/html/draft-ietf-acap-mlsf-01. For
> some additional background, please see
> https://tools.ietf.org/html/draft-ietf-acap-langtag-00.
>
> The overall tag proposal had the desired effect: The original proposal
> to hijack some unused bytes in UTF-8 was defeated, and the tags themselves
> were not actually used and therefore could be deprecated.

I agree that the ACAP proposal was awful, for many reasons and on many levels. But in general, introducing a new standardized mechanism SO THAT it can be deprecated is a crummy idea. It engenders bad feelings and distrust among loyal users of the standard. Major software vendors, one in particular starting with M, have been castigated for decades for employing tactics similar to this.

> Bad ideas turn up once every 10 or 20 years. It usually takes some
> time for some of the people to realize that they are bad ideas. But
> that doesn't make them any better when they turn up again.

The suggestions over the past three weeks to encode basic styling in plain text (I'm not saying I'm for or against that) have some similarities with Plane 14 language tags: many people consider both types of information to be meta-information, unsuitable for plain text, and many of the suggested mechanisms are stateful, which is an anti-goal of Unicode. But these are NOT the same idea, and the fact that they both use Plane 14 tag characters doesn't make them so. -- Doug Ewell | Thornton, CO, US | ewellic.org
Re: Encoding italic
Kent Karlsson wrote: > We already have a well-established standard for doing this kind of > things... I thought we were having this discussion because none of the existing methods, no matter how well documented, has been accepted on a widespread basis as "the" standard. Some people dislike markdown because it looks like lightweight markup (which it is), not like actual italics and boldface. Some dislike ISO 6429 because escape characters are invisible and might interfere with other protocols (though they really shouldn't). Some dislike math alphanumerics abuse because it's abuse, doesn't cover other writing systems, etc. I'd be happy to work with Kent to campaign for ISO 6429 as "the" well-established standard for applying simple styling to plain text, but we would have to acknowledge the significant challenges. -- Doug Ewell | Thornton, CO, US | ewellic.org
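For comparison, this is roughly what the ISO 6429 (ECMA-48) approach looks like in practice, sketched in Python. The SGR parameters used (1 and 22 for bold on/off, 3 and 23 for italic on/off) are the standard ones; the helper names are made up for this example, and how a given terminal or editor renders them is outside the sketch.

```python
import re

ESC = "\x1b"  # ESCAPE, introduces the control sequence

def sgr(text: str, on: int, off: int) -> str:
    """Wrap text in a pair of SGR (Select Graphic Rendition) sequences."""
    return f"{ESC}[{on}m{text}{ESC}[{off}m"

def italic(text: str) -> str:
    return sgr(text, 3, 23)   # 3 = italic, 23 = not italic

def bold(text: str) -> str:
    return sgr(text, 1, 22)   # 1 = bold, 22 = normal intensity

def strip_sgr(text: str) -> str:
    """Remove SGR sequences, leaving the unstyled plain text."""
    return re.sub(r"\x1b\[[0-9;]*m", "", text)

if __name__ == "__main__":
    print("This is " + italic("emphasised") + " and " + bold("strong") + ".")
```

A consumer that does not understand the sequences has to strip them explicitly, as strip_sgr does, which is one of the practical challenges with this route.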
Re: Encoding italic
Philippe Verdy replied to James Kass:

> You're not very explicit about the Tag encoding you use for these
> styles.

Of course, it was Andrew West who implemented the styling mechanism in a beta release of BabelPad. James was just reporting on it.

> And what is then the interest compared to standard HTML

This entire discussion, for more than three weeks now, has been about how to implement styling (e.g. italics) in plain text. Everyone knows it can be done, and how to do it, in rich text.

> So you used "<U+E003C><U+E0062><U+E003E>bold<U+E003C><U+E002F><U+E0062><U+E003E>"
> I.e, you converted from ASCII to tag characters the full HTML
> sequences "<b>" and "</b>", including the HTML element name. I see
> little interest for that approach.

I thought we had established that someone had mentioned it on this list, at some time during the past three weeks. Can someone look up what post that was? I don't have time to go through scores of messages, and there is no search facility.

I can't speak for Andrew, but I strongly suspect he implemented this as a proof of concept, not to declare himself the Maker of Standards.

--
Doug Ewell | Thornton, CO, US | ewellic.org
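The conversion being described, from ASCII markup to tag characters, amounts to adding 0xE0000 to each code point. The Python sketch below assumes the markup is an HTML-like element such as "b"; that choice, and all the function names, are assumptions for illustration rather than a description of BabelPad's actual implementation.

```python
TAG_BASE = 0xE0000  # tag character = TAG_BASE + printable ASCII code point

def to_tag_chars(ascii_text: str) -> str:
    """Convert printable ASCII to the corresponding Plane 14 tag characters."""
    if any(not 0x20 <= ord(c) <= 0x7E for c in ascii_text):
        raise ValueError("only printable ASCII has tag character equivalents")
    return "".join(chr(TAG_BASE + ord(c)) for c in ascii_text)

def tag_styled(text: str, element: str = "b") -> str:
    """Wrap text in invisible open/close markup, e.g. <b>...</b> as tags."""
    return to_tag_chars(f"<{element}>") + text + to_tag_chars(f"</{element}>")

def strip_tag_chars(text: str) -> str:
    """What a non-supporting display effectively shows: tags ignored."""
    return "".join(c for c in text if not 0xE0000 <= ord(c) <= 0xE007F)

if __name__ == "__main__":
    styled = tag_styled("bold")
    print(" ".join(f"U+{ord(c):04X}" for c in styled))
```

Unlike the math-alpha approach this is stateful: the style applies to everything between the opening and closing runs.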
Re: Proposal for BiDi in terminal emulators
> Date: Tue, 29 Jan 2019 13:50:31 +0100 > From: Egmont Koblinger via Unicode > > In turn, vim, emacs and friends stand there clueless, not knowing > how to do BiDi in terminals. This is inaccurate: Emacs (at least the brand known as "GNU Emacs") supports bidirectional editing in text terminals and GUI displays alike, since Emacs 24.1, which was released in June 2012. The latest released version 26.1 supports all the latest changes in the UBA, up to and including Unicode 10.0. What Emacs version did you try which led you to the conclusion that bidirectional editing and display in Emacs were not supported on text terminals?
Proposal for BiDi in terminal emulators
Hi, Terminal emulators are a powerful tool used by many people for various tasks. Most terminal emulators' bugtracker has a request to add RTL / BiDi support. Unicode has supported BiDi for about 20 years now. Still, the intersection of these two fields isn't solved. Even some Unicode experts have stated over time that no one knows how to do it properly. The only documentation I could find (ECMA TR/53) predates the Unicode BiDi algorithm, and as such no surprise that it doesn't follow the current state of the art or best practices. Some terminal emulators decided to run the BiDi algorithm for display purposes on its lines (rather than paragraphs, uh), not seeing the big picture that such a behavior turns them into a platform on top of which it's literally impossible to implement proper BiDi-aware text editing (vim, emacs, whatever) experience. In turn, vim, emacs and friends stand there clueless, not knowing how to do BiDi in terminals. With about 5 years of experience in terminal emulator development, and some prior BiDi homepage developing experience with the kind mentoring of one of the BiDi gurus (Aharon, if you're reading this, hi there!), I decided to tackle this issue. I studied and evaluated the aforementioned documentation and the behavior of such terminals, pointed out the problems, and came up with a draft proposal. My work isn't complete yet. One of the most important pending issues is to figure out how to track BiDi control characters (e.g. which character cells they belong to), it is to be addressed in a subsequent version. But I sincerely hope I managed to get the basics right and clean enough so that work can begin on implementing proper support in terminal emulators as well as fullscreen text applications; and as we gain experience and feedback, extending the spec to address the missing bits too. You can find this (draft) specification at [1]. Feedback is welcome – if it's an actionable one then preferably over there in the project's bugtracker. 
[1] https://terminal-wg.pages.freedesktop.org/bidi/ cheers, egmont (GNOME Terminal / VTE co-developer)
Re: Encoding italic
On 2019/01/28 05:03, James Kass via Unicode wrote:
>
> A new beta of BabelPad has been released which enables input, storing,
> and display of italics, bold, strikethrough, and underline in plain-text
> using the tag characters method described earlier in this thread. This
> enhancement is described in the release notes linked on this download page:
>
> http://www.babelstone.co.uk/Software/index.html
>

I didn't say anything at the time this idea first came up, because I hoped people would understand that it was a bad idea.

Here's a little dirty secret about these tag characters: They were placed in one of the astral planes explicitly to make sure they'd use 4 bytes per tag character, and thus quite a few bytes for any actual complete tags. See https://tools.ietf.org/html/rfc2482 for details. Note that RFC 2482 has been obsoleted by https://tools.ietf.org/html/rfc6082, in parallel with a similar motion on the Unicode side.

These tag characters were born only to shoot down an even worse proposal, https://tools.ietf.org/html/draft-ietf-acap-mlsf-01. For some additional background, please see https://tools.ietf.org/html/draft-ietf-acap-langtag-00.

The overall tag proposal had the desired effect: The original proposal to hijack some unused bytes in UTF-8 was defeated, and the tags themselves were not actually used and therefore could be deprecated.

Bad ideas turn up once every 10 or 20 years. It usually takes some time for some of the people to realize that they are bad ideas. But that doesn't make them any better when they turn up again.

Regards,
Martin.
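The byte cost is easy to check. A short Python sketch, using "en" as an arbitrary example language code and a helper name invented here:

```python
LANGUAGE_TAG = "\U000E0001"  # U+E0001 LANGUAGE TAG (deprecated)
TAG_BASE = 0xE0000

def utf8_len(text: str) -> int:
    """Number of bytes the text occupies in UTF-8."""
    return len(text.encode("utf-8"))

# A Plane 14 language tag for "en": LANGUAGE TAG + TAG 'e' + TAG 'n'.
plane14_en = LANGUAGE_TAG + "".join(chr(TAG_BASE + ord(c)) for c in "en")

if __name__ == "__main__":
    print(utf8_len("en"), "bytes as ASCII")
    print(utf8_len(plane14_en), "bytes as a Plane 14 tag")
```

Every tag character lies above U+FFFF, so each one takes four bytes in UTF-8 (and a surrogate pair in UTF-16).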
Re: Encoding italic
On 2019/01/24 23:49, Andrew West via Unicode wrote:
> On Thu, 24 Jan 2019 at 13:59, James Kass via Unicode
> wrote:

> We were told time and time again when emoji were first proposed that
> they were required for encoding for interoperability with Japanese
> telecoms whose usage had spilled over to the internet. At that time
> there was no suggestion that encoding emoji was anything other than a
> one-off solution to a specific problem with PUA usage by different
> vendors, and I at least had no idea that emoji encoding would become a
> constant stream with an annual quota of 60+ fast-tracked
> user-suggested novelties. Maybe that was the hidden agenda, and I was
> just naïve.

I don't think this was a hidden agenda. Nobody in the US or Europe thought that emoji would catch on like they did, with ordinary people and the press. Of course they had been popular in Japan, that's why they got into Unicode.

> The ESC and UTC do an appallingly bad job at regulating emoji, and I
> would like to see the Emoji Subcommittee disbanded, and decisions on
> new emoji taken away from the UTC, and handed over to a consortium or
> committee of vendors who would be given a dedicated vendor-use emoji
> plane to play with (kinda like a PUA plane with pre-assigned
> characters with algorithmic names [VENDOR-ASSIGNED EMOJI X] which
> the vendors can then associate with glyphs as they see fit; and as
> emoji seem to evolve over time they would be free to modify and
> reassign glyphs as they like because the Unicode Standard would not
> define the meaning or glyph for any characters in this plane).

To a small extent, that already happens. The example I'm thinking about is the transition from a (potentially bullet-carrying) pistol to a water pistol. The Unicode Consortium doesn't define the meaning of any of its characters, and doesn't define standard glyphs for characters, just example glyphs.

Another example is a presenter at a conference who was using lots of emoji, saying that he would need to redo his presentation because the vendor of his notebook's OS was in the process of changing their emoji designs.

Regards,
Martin.
Re: Encoding italic
2019-1-25 13:46, Garth Wallace via Unicode wrote: > > On Wed, Jan 23, 2019 at 1:27 AM James Kass via Unicode < > unicode@unicode.org> wrote: > >> >> Nobody has really addressed Andrew West's suggestion about using the tag >> characters. >> >> It seems conformant, unobtrusive, requiring no official sanction, and >> could be supported by third-partiers in the absence of corporate >> interest if deemed desirable. >> >> One argument against it might be: Whoa, that's just HTML. Why not just >> use HTML? SMH >> >> One argument for it might be: Whoa, that's just HTML! Most everybody >> already knows about HTML, so a simple subset of HTML would be >> recognizable. >> >> After revisiting the concept, it does seem elegant and workable. It >> would provide support for elements of writing in plain-text for anyone >> desiring it, enabling essential (or frivolous) preservation of >> editorial/authorial intentions in plain-text. >> >> Am I missing something? (Please be kind if replying.) >> > > There is also RFC 1896 "enriched text", which is an attempt at a > lightweight HTML substitute for styling in email. But these, and the ANSI > escape code suggestion, seem like they're trying to solve the wrong problem > here. > > Here's how I understand the situation: > * Some people using forms of text or mostly-text communication that do not > provide styling features want to use styling, for emphasis or personal flair > * Some of these people caught on to the existence of the "styled" > mathematical alphanumerics and, not caring that this is "wrong", started > using them as a workaround > * The use of these symbols, which are not technically equivalent to basic > Latin, make posts inaccessible to screen readers, among other problems > > These are suggestions for Unicode to provide a different, more > "acceptable" workaround for a lack of functionality in these social media > systems (this mostly seems to be an issue with Twitter; IME this shows up > much less on Facebook). 
But the root problem isn't the kludge, it's the > lack of functionality in these systems: if Twitter etc. simply implemented > some styling on their own, the whole thing would be a moot point. > Essentially, this is trying to add features to Twitter without waiting for > their development team. > > Interoperability is not an issue, since in modern computers copying and > pasting styled text between apps works just fine. >

How about outside social media systems? For example, Chinese Braille has symbols that indicate the start and end positions of the proper name mark and the book name mark. When converted to plain text, however, these cannot be represented in Unicode, because of the mindset that rendering this punctuation should be the task of styling software, merely because in ordinary Chinese text the two marks are a straight underline and a wavy underline beneath the text.
Re: Encoding italic
Gmail can do *Märchen*, although I am not too sure how they transmit such formatting, nor how interoperable it is.

On Tue, 22 Jan 2019 at 14:43, Adam Borowski via Unicode wrote:

> On Mon, Jan 21, 2019 at 12:29:42AM -0800, David Starner via Unicode wrote:
> > On Sun, Jan 20, 2019 at 11:53 PM James Kass via Unicode
> > wrote:
> > > Even though /we/ know how to do
> > > it and have software installed to help us do it.
> >
> > You're emailing from Gmail, which has support for italics in email.
>
> ... and how exactly can they send italics in an e-mail? All they can do is
> to bundle a web page as an attachment, which some clients display instead
> of the main text.
>
> The e-mail's body text supports anything Unicode does, including
> 𝑖𝑡𝑎𝑙𝑖𝑐 and even 𐌏𐌋𐌃 𐌉𐌕𐌀𐌋𐌉𐌂, but, remarkably, not italic umlauted
> characters, thai nor han.
>
Re: Ancient Greek apostrophe marking elision
On Mon, Jan 28, 2019 at 10:58 PM James Kass via Unicode wrote: > > On 2019-01-29 1:55 AM, Mark E. Shoulson via Unicode wrote: > > I guess "Suck it up and deal with it." And that may indeed be the > answer. > > It would certainly make for shorter and simpler FAQ pages, anyway. > Except people will just respond with "okay, I'll use U+02BC instead" which is what started all this :-) James -- *James Tauber* Eldarion <https://eldarion.com/> | jktauber.com (Greek Linguistics) <https://jktauber.com/> | Modelling Music <https://modelling-music.com/> | Digital Tolkien <https://digitaltolkien.com/>