Consonant shifters and ZWNJ in Khmer
The section on consonant shifters in the Khmer section of the Unicode standard (page 647 of Unicode 11 [1]) isn’t entirely clear on where the zero width non-joiner should be placed to prevent a consonant shifter that’s followed by an above-base vowel from being changed to a below-base glyph. First, it says “U+200C zero width non-joiner should be inserted before the consonant shifter” to prevent the change. Then it continues “in such cases, U+200C zero width non-joiner is inserted before the vowel sign”, which could be interpreted as “after the consonant shifter”. Finally, the examples show ZWNJ inserted before the consonant shifter.

The OpenType Khmer shaping description [2], on the other hand, expects ZWNJ to be inserted between the consonant shifter (here called RegShift) and the above-base vowel.

Questions to the people here who have dealt with Khmer: How is this handled in real life?

Thanks,
Norbert

[1] https://www.unicode.org/versions/Unicode11.0.0/ch16.pdf
[2] https://docs.microsoft.com/en-us/typography/script-development/khmer
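To make the two readings concrete, here is a minimal Python sketch that builds both candidate code point sequences and lists their characters. The base consonant (U+1794 KHMER LETTER BA), the shifter (U+17C9 KHMER SIGN MUUSIKATOAN) and the vowel (U+17B7 KHMER VOWEL SIGN I) are picked only for illustration, and the names `describe`, `zwnj_before_shifter` and `zwnj_after_shifter` are made up for this example. It does not say which placement is correct.

```python
import unicodedata

ZWNJ = "\u200C"     # ZERO WIDTH NON-JOINER
BASE = "\u1794"     # KHMER LETTER BA (illustrative choice of base consonant)
SHIFTER = "\u17C9"  # KHMER SIGN MUUSIKATOAN (a consonant shifter)
VOWEL = "\u17B7"    # KHMER VOWEL SIGN I (an above-base vowel)


def describe(text):
    """List each character of text as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# Reading 1: ZWNJ before the consonant shifter.
zwnj_before_shifter = BASE + ZWNJ + SHIFTER + VOWEL

# Reading 2: ZWNJ between the consonant shifter and the above-base vowel.
zwnj_after_shifter = BASE + SHIFTER + ZWNJ + VOWEL

for label, seq in [("before shifter", zwnj_before_shifter),
                   ("after shifter", zwnj_after_shifter)]:
    print(label)
    for line in describe(seq):
        print("  " + line)
```

Rendering both strings with the fonts and shaping engines of interest would be one way to check which placement they actually honor.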
Re: metric for block coverage
> On Feb 18, 2018, at 3:26, Khaled Hosny via Unicode wrote:
>
> On Sun, Feb 18, 2018 at 02:14:46AM -0800, James Kass via Unicode wrote:
>> Adam Borowski wrote,
>>
>>> I'm looking for a way to determine a font's coverage of available scripts.
>>> It's probably reasonable to do this per Unicode block. Also, it's a safe
>>> assumption that a font which doesn't know a codepoint can do no complex
>>> shaping of such a glyph, thus looking at just codepoints should be adequate
>>> for our purposes.
>>
>> You probably already know that basic script coverage information is
>> stored internally in OpenType fonts in the OS/2 table.
>>
>> https://docs.microsoft.com/en-us/typography/opentype/spec/os2
>>
>> Parsing the bits in the "ulUnicodeRange..." entries may be the
>> simplest way to get basic script coverage info.
>
> Though this might not be very reliable since OpenType does not have a
> definition of what it means for a Unicode block to be supported; some
> font authoring tools use a percentage, others use the presence of any
> characters in the range, and fonts might even provide incorrect data for
> any reason.
>
> However, I don’t think script or block coverage is that useful, what
> users are usually interested in is the language coverage.
>
> Regards,
> Khaled

All true. In addition, ulUnicodeRange ran out of bits around Unicode 5.1, so scripts/blocks added to Unicode after that, such as Javanese, Tangut, or Adlam, cannot be represented.

Norbert
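As a rough illustration of the per-block approach discussed in this thread, the Python sketch below computes coverage directly from a set of code points rather than from the ulUnicodeRange bits. It makes several assumptions: the set of supported code points is obtained elsewhere (for example from the font's cmap table), only four blocks are hard-coded, the count of assigned characters comes from the Unicode version bundled with the Python interpreter, and `BLOCKS`, `SKIPPED`, `block_coverage` and `font_code_points` are names invented for this example. It reports a fraction and leaves the question of what counts as "supported" to the caller.

```python
import unicodedata

# A few block ranges, hard-coded for illustration (inclusive bounds).
BLOCKS = {
    "Basic Latin": (0x0000, 0x007F),
    "Khmer": (0x1780, 0x17FF),
    "Javanese": (0xA980, 0xA9DF),
    "Adlam": (0x1E900, 0x1E95F),
}

# General categories not counted: unassigned, control, surrogate, private use.
SKIPPED = {"Cn", "Cc", "Cs", "Co"}


def block_coverage(supported):
    """Map each block name to the fraction of its countable code points in `supported`."""
    result = {}
    for name, (first, last) in BLOCKS.items():
        countable = [cp for cp in range(first, last + 1)
                     if unicodedata.category(chr(cp)) not in SKIPPED]
        covered = sum(1 for cp in countable if cp in supported)
        # Guard against a block with no assigned characters in this Unicode version.
        result[name] = covered / len(countable) if countable else 0.0
    return result


# Hypothetical font: printable ASCII plus every code point in the Khmer block.
font_code_points = set(range(0x20, 0x7F)) | set(range(0x1780, 0x1800))

for name, fraction in block_coverage(font_code_points).items():
    print(f"{name}: {fraction:.0%}")
```

A real tool would load the full block list from the Unicode Character Database (Blocks.txt) instead of a hard-coded table.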
Re: Unicode education in Schools
ECMAScript 6 fixed that, largely along the lines of my proposal:
http://norbertlindenberg.com/2012/05/ecmascript-supplementary-characters/index.html

Norbert

> On Aug 24, 2017, at 22:14, Peter Constable via Unicode wrote:
>
> I thought Javascript had a UCS-2 understanding of Unicode strings. Has it
> managed to progress beyond that?
>
> Peter
>
> From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of David Starner via Unicode
> Sent: Thursday, August 24, 2017 5:18 PM
> To: Unicode Mailing List
> Subject: Fwd: Unicode education in Schools
>
> -- Forwarded message -
> From: David Starner
> Date: Thu, Aug 24, 2017, 6:16 PM
> Subject: Re: Unicode education in Schools
> To: Richard Wordingham
>
> On Thu, Aug 24, 2017, 5:26 PM Richard Wordingham via Unicode wrote:
>
> Just steer them away from UTF-16! (And vigorously prohibit the very
> concept of UCS-2).
>
> Richard.
>
> Steer them away from reinventing the wheel. If they use Java, use Java
> strings. If they're using GTK, use strings compatible with GTK. If they're
> writing JavaScript, use JavaScript strings. There's basically no system
> without Unicode strings or that they would be better off rewriting the wheel.
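The UCS-2 versus UTF-16 distinction in this thread comes down to how a supplementary character is counted. The Python sketch below is only an illustration of that difference, not of ECMAScript itself: it encodes a string to UTF-16 to show the code units that a UTF-16-based string type stores, and compares them with the code point count. The helper name `utf16_code_units` and the choice of U+1E900 ADLAM CAPITAL LETTER ALIF as sample character are assumptions made for this example.

```python
import struct


def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-le")
    return list(struct.unpack(f"<{len(data) // 2}H", data))


sample = "\U0001E900"  # ADLAM CAPITAL LETTER ALIF, outside the BMP

units = utf16_code_units(sample)
print("code points:", len(sample))
print("UTF-16 code units:", [f"{u:04X}" for u in units])

# A UCS-2 view would treat each code unit as a separate character;
# a UTF-16 view pairs the two surrogates back into one code point.
high, low = units
code_point = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
print(f"recombined: U+{code_point:04X}")
```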