Re: 0027, 02BC, 2019, or a new character?
On 2018/02/21 12:15, Michael Everson via Unicode wrote:
> I absolutely disagree. There’s a whole lot of related languages out there, and the speakers share some things in common. Orthographic harmonization between these languages can ONLY help any speaker of one to access information in any of the others. That expands people’s worlds.
> That would be a good goal.

It's definitely a good goal. But it's not rocket science to learn the different orthographies. If the languages are similar, then different orthographies are just a minor nuisance. As an example, German and Dutch also have different orthographies, but that's really a very minor issue when learning one language from the other, even though these languages are very close.

Regards, Martin.
Re: Suggestions?
On 22.02.2018 05:01, David Starner via Unicode wrote:
> On Wed, Feb 21, 2018 at 7:55 AM Jeb Eldridge via Unicode wrote:
>> Where can I post suggestions and feedback for Unicode?
> Here is as good a place as any. There are specific places for a few specific things, but if you do have something that's likely to get changed, you'll likely need the help of someone here to get through the process. It is a quarter-century-old technical standard embedded in most electronics, so I would temper any expectations for major changes; it works the way it works because that's the way previous versions worked, and nobody is interested in the trouble changing them would involve.

Yes and no. This list is for informal discussion, so someone unsure about things may start here, but posting on this list does not count as feedback or suggestions to Unicode. So by all means post some of your ideas here and learn more.

Regards
John Knightley
Re: Bidi edge cases in Hangul and Indic
David,

On 2/22/2018 7:21 PM, David Corbett via Unicode wrote:
> My confusion stems from Unicode’s online bidi utility.

That bidi utility has known defects in it. It is not yet conformant with changes to UBA 6.3, let alone later changes to UBA. And the mapping of memory position to display position in that utility does not take into account complex mapping that has to occur in the layout engines and fonts in real applications.

--Ken
Re: Bidi edge cases in Hangul and Indic
On Thu, Feb 22, 2018 at 6:32 PM, Ken Whistler wrote:
>
> If you override the normal left-to-right ordering with bidi override
> controls, then the layout order is reversed, but what is actually laid out
> is those two glyphs. So you just reverse the order of the two syllables for
> display, in either case.
>

My confusion stems from Unicode’s online bidi utility. Compare https://unicode.org/cldr/utility/bidi.jsp?a=%E2%80%AE%EB%B3%B4%EA%B8%B0 (NFC) to https://unicode.org/cldr/utility/bidi.jsp?a=%E2%80%AE%E1%84%87%E1%85%A9%E1%84%80%E1%85%B5 (NFD). Concatenating each one’s characters in reordered display position order produces canonically different results.

Here is a more practical example. A sequence of an emoji modifier base and an emoji modifier in an RTL run will be display-reordered such that the modifier is to the left of the base. Clearly, the right thing is to not reorder them, because they should ligate to form a single glyph. Contrast this with “fl” in an RTL run, which will be display-reordered to “lf”: it would be wrong to apply the previous rationale here just because “fl” may have a single glyph.

It sounds like the UBA doesn’t specify how to reorder the glyphs of the characters within a level run. That’s about what I expected. I was just worried it might require an easily implemented but wrong order, so thanks for the reassurance.
Re: Bidi edge cases in Hangul and Indic
On 2/22/2018 11:39 AM, David Corbett via Unicode wrote:
> For example, after a right-to-left override, the Hangul string 보기 (“bogi”) becomes 기보 (“gibo”) in visual order. However, its NFD form is reordered by jamo instead of by syllable; that is, it looks like “igob”.

Nope. *tilt*

The UBA reorders the display order in layout -- not the underlying string. "bogi" is the sequence <1107, 1169, 1100, 1175> in NFD, or <BCF4, AE30> in NFC. Because of canonical equivalence, for display of the NFD string, the sequence <1107,1169> needs to be mapped onto the same *glyph* as BCF4, and the sequence <1100,1175> onto the same *glyph* as AE30. If you override the normal left-to-right ordering with bidi override controls, then the layout order is reversed, but what is actually laid out is those two glyphs. So you just reverse the order of the two syllables for display, in either case.

You could force display of "igob", but only if you had inserted some character in between the conjoining jamos that was preventing their equivalence to the syllables, anyway.

> I don’t think it is the intent of the algorithm that canonically equivalent strings display so very differently, but I can’t find any explicit guidance. What should a UBA-conformant renderer do?

The right thing. ;-)

--Ken
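Ken's point that <1107, 1169> and BCF4 are the same syllable follows from the algorithmic Hangul syllable composition defined in the Unicode Standard (Chapter 3, conjoining jamo behavior). A minimal Python sketch of that arithmetic, cross-checked against the standard library's NFC normalizer; the function name compose_syllable is just an illustrative helper:

```python
import unicodedata

# Constants from the Unicode Standard's Hangul syllable composition algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28


def compose_syllable(l: int, v: int, t: int = T_BASE) -> int:
    """Map a leading consonant, a vowel and an optional trailing consonant
    (conjoining jamo code points) to the precomposed syllable code point.
    Passing t == T_BASE means 'no trailing consonant'."""
    l_index, v_index, t_index = l - L_BASE, v - V_BASE, t - T_BASE
    if not (0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT
            and 0 <= t_index < T_COUNT):
        raise ValueError("not a composable L+V(+T) jamo sequence")
    return S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index


bo = compose_syllable(0x1107, 0x1169)
gi = compose_syllable(0x1100, 0x1175)
print(f"{bo:04X} {gi:04X}")  # BCF4 AE30

# Cross-check: NFC of the NFD jamo sequence yields the same two syllables.
assert unicodedata.normalize("NFC", "\u1107\u1169\u1100\u1175") == chr(bo) + chr(gi)
```

Because the mapping is purely arithmetic, a renderer can always recover the syllable (and hence the syllable glyph) from the jamo sequence before bidi display reordering is applied.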
Bidi edge cases in Hangul and Indic
Although the Unicode Bidirectional Algorithm clearly defines how to reorder characters in memory, I don’t understand precisely what it means to display one character after another once they’ve been reordered; specifically, when bidi reordering changes the number of user-perceived characters. For example, after a right-to-left override, the Hangul string 보기 (“bogi”) becomes 기보 (“gibo”) in visual order. However, its NFD form is reordered by jamo instead of by syllable; that is, it looks like “igob”. I don’t think it is the intent of the algorithm that canonically equivalent strings display so very differently, but I can’t find any explicit guidance. What should a UBA-conformant renderer do? Another unclear case is Indic clusters. षिक् is unambiguously two clusters, but after an RLO, and after following rule L3 to put combining marks after their bases, it looks like one cluster: क्षि. If Devanagari were actually written right-to-left, I would expect it to stay as two clusters: क्षि. Does the UBA prefer one rendering over the other, or is this outside its scope?
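The difference between reversing code points and reversing user-perceived characters can be shown with a short Python sketch. It uses a deliberately simplified cluster rule (an assumption, not the full UAX #29 extended grapheme cluster algorithm): conjoining Hangul vowels and trailing consonants (U+1160..U+11FF) and any combining mark (general category M*) stay attached to the preceding character. The helper names are hypothetical.

```python
import unicodedata


def attaches_to_previous(ch: str) -> bool:
    """Simplified cluster rule (NOT full UAX #29): conjoining Hangul vowels
    and trailing consonants, plus any combining mark, join the previous
    character's cluster."""
    return 0x1160 <= ord(ch) <= 0x11FF or unicodedata.category(ch).startswith("M")


def clusters(s: str) -> list[str]:
    out: list[str] = []
    for ch in s:
        if out and attaches_to_previous(ch):
            out[-1] += ch
        else:
            out.append(ch)
    return out


def reverse_code_points(s: str) -> str:
    # Naive reversal: concatenate characters in reversed display position.
    return s[::-1]


def reverse_clusters(s: str) -> str:
    # Reverse whole clusters, keeping each cluster's internal order.
    return "".join(reversed(clusters(s)))


nfc = "\uBCF4\uAE30"                     # 보기
nfd = unicodedata.normalize("NFD", nfc)  # <1107, 1169, 1100, 1175>

naive = reverse_code_points(nfd)
print([f"{ord(c):04X}" for c in naive])  # ['1175', '1100', '1169', '1107']
print(unicodedata.normalize("NFC", reverse_clusters(nfd)) == reverse_code_points(nfc))  # True
```

The naive NFD reversal leaves a bare vowel first and a bare consonant last (the "igob" reading), while cluster reversal of either form gives 기보. Applied to षिक्, the same function yields the string क्षि, which a shaping engine would then be free to render as a single conjunct; that is the Devanagari ambiguity raised above, and this sketch does not resolve it.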
Re: Coloured Characters
Richard Wordingham wrote:
> 'Foreground' and 'background' are the only externally defined colours.
> There's no ability to explicitly choose, say 'text stroked sable and dotted
> gules'. Instead, it's 'text stroked sable and dotted proper', with a choice
> of palettes to define 'proper'.

External selection of decoration colours would theoretically be possible; I do not know how difficult this would be to implement.

I remember posting about that somewhere some years ago, but I cannot find it at the moment. The following thread now mentions that possibility and also has, from 2014, an idea of how to have shading from one colour to another.

https://forum.high-logic.com/viewtopic.php?f=37&t=5024

In that thread, on 7 June 2014, I wrote as follows.

quote
The standardization process has a rule that if someone (individual or company) puts forward a proposal for standardization, then that person has to agree to provide a working demonstration. I put forward some ideas for how to extend the COLR/CPAL model so as to provide colour shading of glyphs as well as the existing solid colour. Yet I could not formally propose them for standardization as I do not have the facilities to provide a working demonstration.
end quote

So the ideas are there and maybe they could be implemented, though alas I cannot implement them myself.

William Overington

Thursday 22 February 2018
Re: metric for block coverage
Thanks a lot. If I understand it right, these are examples in the Sanskrit language using Tamil script? More precisely, my question is whether there are examples in (today's) Tamil language using Danda or Double Danda. I tried to detect these characters in Tamil Wikipedia texts, but I didn't find any.

Albrecht

-----Original Message-----
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Richard Wordingham via Unicode
Sent: Tuesday, 20 February 2018 21:13
To: unicode@unicode.org
Subject: Re: metric for block coverage

On Tue, 20 Feb 2018 15:13:16 +
"Dreiheller, Albrecht via Unicode" wrote:

> Could someone please supply an example (web link ...) for usage of
> danda / double danda in Tamil? Thanks, Albrecht

Take your pick from http://www.prapatti.com/slokas/slokasbyname.html . Do they meet your requirements, or do you perhaps want text in the Tamil language as opposed to PDFs of Sanskrit in Tamil script? I found the likes of my example by googling for 'Tamil Shloka' without quotes.

Richard.
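A search like the one Albrecht describes can be approximated with a simple scan. The sketch below is plain Python with hypothetical helper names, not a tool mentioned in the thread: it counts U+0964 DEVANAGARI DANDA and U+0965 DEVANAGARI DOUBLE DANDA only when the nearest preceding non-space character is in the Tamil block (U+0B80..U+0BFF). It checks script, not language, so on its own it cannot separate Sanskrit written in Tamil script from Tamil proper, which is exactly the distinction Richard raises.

```python
from collections import Counter

DANDAS = {"\u0964": "DANDA", "\u0965": "DOUBLE DANDA"}


def is_tamil(ch: str) -> bool:
    # Tamil block: U+0B80..U+0BFF
    return 0x0B80 <= ord(ch) <= 0x0BFF


def danda_usage(text: str) -> Counter:
    """Count dandas whose nearest preceding non-space character is Tamil."""
    counts: Counter = Counter()
    prev_char = None  # last non-space, non-danda character seen
    for ch in text:
        if ch in DANDAS:
            if prev_char is not None and is_tamil(prev_char):
                counts[DANDAS[ch]] += 1
        elif not ch.isspace():
            prev_char = ch
    return counts


sample = "\u0BA8\u0BAE\u0BCB \u0964 abc \u0965 \u0B85\u0965"
print(danda_usage(sample))  # Counter({'DANDA': 1, 'DOUBLE DANDA': 1})
```

The danda after "abc" is ignored because its context is Latin; run over a text dump, the counts give a rough first filter before checking the hits by hand.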
Re: Coloured Characters
On Thu, 22 Feb 2018 10:55:23 + (GMT)
William_J_G Overington wrote:

> Richard Wordingham wrote:
>
> > 'Foreground' and 'background' are the only externally defined
> > colours. There's no ability to explicitly choose, say 'text stroked
> > sable and dotted gules'. Instead, it's 'text stroked sable and
> > dotted proper', with a choice of palettes to define 'proper'.

> External selection of decoration colours would theoretically be
> possible, I do not know how difficult this would be to implement.

The problem lies in changing existing interfaces. I can only speak with any real knowledge for the OpenType COLR/CPAL method. The change would be a major pain in programming languages with obligatory (even if implicit) typing. At present, foreground and background need to be specified (if only by default) and passed into the painting routines. You now want to expand the foreground argument into a list of colours - or possibly a callback routine.

The next issue is what is to happen when the list provided is too short. Without suitable handling, this may cause problems with fonts that already work in applications that at one interface level know nothing about colour fonts. For example, the HTML code that I have been using with my font knows nothing about colour fonts as such. To get colour with my web page, I just select a coloured font.

The final issue that springs to mind is that the COLR table of OpenType allows for 65,535 different colours in glyphs; 0xFFFF is the only reserved colour ID. It represents the foreground colour. If there is only one palette in the font, 0xFFFE can be a legitimate user-defined colour ID. I wouldn't be surprised if such an assignment survived the transition from a proof-of-principle font to a released font.

A less painful method for interfaces might be the selection of palettes by name. However, there are rather more possible colour combinations than can be accommodated in an sfnt name table, so an approximation algorithm would be required.
It would also make the CPAL tables larger and much more difficult to generate. There are also 30 unassigned bits left in the palette's type attribute. Of course, Unicode is not constrained by what is currently available, and as an entity is interested at most in what is feasible rather than the precise mechanisms. Several full members, though, will care about precise mechanisms. Richard.
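The foreground/palette model and one possible shape of the list-of-colours interface Richard describes can be sketched in Python. The function names and the fallback rule (use the font's own palette entry when the caller's list is too short) are assumptions for illustration only; they are not part of OpenType or of any existing rendering API.

```python
from typing import Sequence

RGBA = tuple[int, int, int, int]

# In the OpenType COLR table, palette index 0xFFFF is reserved to mean
# "use the text foreground colour"; other indices select CPAL palette entries.
FOREGROUND_INDEX = 0xFFFF


def layer_colour(index: int, palette: Sequence[RGBA], foreground: RGBA) -> RGBA:
    """Current model: the only externally supplied colour is the foreground."""
    if index == FOREGROUND_INDEX:
        return foreground
    return palette[index]


def layer_colour_with_overrides(index: int, palette: Sequence[RGBA],
                                foreground: RGBA,
                                overrides: Sequence[RGBA]) -> RGBA:
    """HYPOTHETICAL extension: the caller supplies a list of colours that
    replaces the first len(overrides) palette entries. If the list is too
    short, fall back to the font's own palette entry."""
    if index == FOREGROUND_INDEX:
        return foreground
    if index < len(overrides):
        return overrides[index]
    return palette[index]


RED, BLUE, GREEN, BLACK = (255, 0, 0, 255), (0, 0, 255, 255), (0, 255, 0, 255), (0, 0, 0, 255)
palette = [RED, BLUE]
print(layer_colour(FOREGROUND_INDEX, palette, BLACK))           # (0, 0, 0, 255)
print(layer_colour_with_overrides(0, palette, BLACK, [GREEN]))  # (0, 255, 0, 255)
print(layer_colour_with_overrides(1, palette, BLACK, [GREEN]))  # (0, 0, 255, 255)
```

Even in this toy form, the extension adds a parameter to every function on the painting path, and the "list too short" case needs an explicit policy; those are the two interface costs raised in the message above.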
Re: IDC's versus Egyptian format controls
Martin J. Dürst wrote:
> Is it only me or did you get some of this data wrong?

Yes, sorry. There's an offset. I copy/pasted data from an archive which apparently predates the formal release of Ext C, and IIRC there was some shifting. Unfortunately the font I used to view the data matches the data, and so is also incorrect.