Consonant shifters and ZWNJ in Khmer

2018-07-20 Thread Norbert Lindenberg via Unicode
The discussion of consonant shifters in the Khmer section of the Unicode 
Standard (page 647 of Unicode 11 [1]) isn’t entirely clear on where the zero 
width non-joiner should be placed to prevent a consonant shifter that’s 
followed by an above-base vowel from being changed to a below-base glyph.

First, it says “U+200C zero width non-joiner should be inserted before the 
consonant shifter” to prevent the change. Then it continues “in such cases, 
U+200C zero width non-joiner is inserted before the vowel sign”, which could be 
interpreted as “after the consonant shifter”. Finally, the examples show ZWNJ 
inserted before the consonant shifter.

The OpenType Khmer shaping description [2], on the other hand, expects ZWNJ to 
be inserted between the consonant shifter (here called RegShift) and the 
above-base vowel.
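
To make the two readings concrete, here is a minimal Python sketch that builds 
both candidate sequences and reports where ZWNJ sits relative to the shifter. 
The sample characters (U+1794 KHMER LETTER BA, U+17CA KHMER SIGN TRIISAP, 
U+17B7 KHMER VOWEL SIGN I) and the helper names are assumptions chosen for 
illustration; they are not taken from either specification, and the sketch 
says nothing about which order a given shaping engine actually honours.

```python
ZWNJ = "\u200c"                      # ZERO WIDTH NON-JOINER
SHIFTERS = {"\u17c9", "\u17ca"}      # MUUSIKATOAN, TRIISAP

# Sample characters, chosen only for illustration.
BA = "\u1794"        # KHMER LETTER BA
TRIISAP = "\u17ca"   # KHMER SIGN TRIISAP
VOWEL_I = "\u17b7"   # KHMER VOWEL SIGN I (an above-base vowel)


def zwnj_before_shifter(base: str, shifter: str, vowel: str) -> str:
    """Reading 1: ZWNJ precedes the consonant shifter."""
    return base + ZWNJ + shifter + vowel


def zwnj_after_shifter(base: str, shifter: str, vowel: str) -> str:
    """Reading 2: ZWNJ sits between the shifter and the vowel."""
    return base + shifter + ZWNJ + vowel


def classify_zwnj(text: str) -> list[str]:
    """Label each ZWNJ in text by its position relative to a shifter."""
    labels = []
    for i, ch in enumerate(text):
        if ch != ZWNJ:
            continue
        prev_ch = text[i - 1] if i > 0 else ""
        next_ch = text[i + 1] if i + 1 < len(text) else ""
        if next_ch in SHIFTERS:
            labels.append("before-shifter")
        elif prev_ch in SHIFTERS:
            labels.append("after-shifter")
        else:
            labels.append("other")
    return labels


if __name__ == "__main__":
    for build in (zwnj_before_shifter, zwnj_after_shifter):
        seq = build(BA, TRIISAP, VOWEL_I)
        print(" ".join(f"U+{ord(c):04X}" for c in seq), classify_zwnj(seq))
```

Running something like `classify_zwnj` over a corpus of existing Khmer text 
would be one way to see which placement occurs in practice.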

Questions to the people here who have dealt with Khmer: How is this handled in 
real life?

Thanks,
Norbert

[1] https://www.unicode.org/versions/Unicode11.0.0/ch16.pdf
[2] https://docs.microsoft.com/en-us/typography/script-development/khmer


Re: metric for block coverage

2018-02-23 Thread Norbert Lindenberg via Unicode

> On Feb 18, 2018, at 3:26, Khaled Hosny via Unicode wrote:
> 
> On Sun, Feb 18, 2018 at 02:14:46AM -0800, James Kass via Unicode wrote:
>> Adam Borowski wrote,
>> 
>>> I'm looking for a way to determine a font's coverage of available scripts.
>>> It's probably reasonable to do this per Unicode block.  Also, it's a safe
>>> assumption that a font which doesn't know a codepoint can do no complex
>>> shaping of such a glyph, thus looking at just codepoints should be adequate
>>> for our purposes.
>> 
>> You probably already know that basic script coverage information is
>> stored internally in OpenType fonts in the OS/2 table.
>> 
>> https://docs.microsoft.com/en-us/typography/opentype/spec/os2
>> 
>> Parsing the bits in the "ulUnicodeRange..." entries may be the
>> simplest way to get basic script coverage info.
> 
> Though this might not be very reliable since OpenType does not have a
> definition of what it means for a Unicode block to be supported; some
> font authoring tools use a percentage, others use the presence of any
> characters in the range, and fonts might even provide incorrect data for
> any reason.
> 
> However, I don’t think script or block coverage is that useful; what
> users are usually interested in is language coverage.
> 
> Regards,
> Khaled


All true. In addition, ulUnicodeRange ran out of bits around Unicode 5.1, so 
scripts/blocks added to Unicode after that, such as Javanese, Tangut, or Adlam, 
cannot be represented. 
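
As a rough illustration of both approaches discussed in this thread, the 
Python sketch below decodes the four 32-bit ulUnicodeRange fields into bit 
numbers 0 to 127, and separately computes per-block coverage directly from a 
set of code points (for instance, one read from a font's cmap). The function 
names are hypothetical, reading the fields or the cmap from an actual font 
file is left out, and dividing by the full block size (including unassigned 
code points) is just one of several possible definitions of coverage.

```python
def unicode_range_bits(range1: int, range2: int, range3: int, range4: int) -> set[int]:
    """Return the set bit numbers (0..127) from ulUnicodeRange1..4.

    ulUnicodeRange1 holds bits 0-31, ulUnicodeRange2 bits 32-63, and so on.
    Mapping bit numbers to blocks requires the table in the OS/2 spec,
    which is not reproduced here.
    """
    bits = set()
    for word_index, word in enumerate((range1, range2, range3, range4)):
        for bit in range(32):
            if (word >> bit) & 1:
                bits.add(word_index * 32 + bit)
    return bits


def block_coverage(codepoints: set[int], first: int, last: int) -> float:
    """Fraction of the code point range first..last present in codepoints.

    Assumption: the denominator is the whole block, including any
    unassigned code points; a stricter metric would count only
    assigned characters.
    """
    covered = sum(1 for cp in codepoints if first <= cp <= last)
    return covered / (last - first + 1)


if __name__ == "__main__":
    # Made-up field values, not taken from any real font.
    print(sorted(unicode_range_bits(0b11, 0, 0, 1 << 31)))

    # Made-up cmap contents: two Khmer letters and one Adlam letter.
    cmap = {0x1780, 0x1781, 0x1E900}
    print(block_coverage(cmap, 0x1780, 0x17FF))    # Khmer block
    print(block_coverage(cmap, 0x1E900, 0x1E95F))  # Adlam block
```

Counting code points this way works for blocks added after the bit field was 
exhausted, though it still measures block coverage rather than the language 
coverage Khaled mentions.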

Norbert




Re: Unicode education in Schools

2017-08-26 Thread Norbert Lindenberg via Unicode
ECMAScript 6 fixed that, largely along the lines of my proposal:
http://norbertlindenberg.com/2012/05/ecmascript-supplementary-characters/index.html
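
For readers unfamiliar with the distinction, the sketch below shows what a 
UCS-2 view of a string gets wrong: a supplementary character is one code 
point but two UTF-16 code units. Python is used here only for illustration 
(it is not ECMAScript), and the helper name is hypothetical. ES6 added code 
point aware facilities such as `String.prototype.codePointAt` and 
`String.fromCodePoint` to address this gap.

```python
def utf16_code_units(text: str) -> list[int]:
    """Return the UTF-16 code units of text as integers."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


if __name__ == "__main__":
    adlam = "\U0001E900"  # ADLAM CAPITAL LETTER ALIF, outside the BMP

    # One code point...
    print(len(adlam))
    # ...but two UTF-16 code units (a surrogate pair), which is what a
    # UCS-2 style string length would count.
    print([f"{u:04X}" for u in utf16_code_units(adlam)])
```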

Norbert


> On Aug 24, 2017, at 22:14, Peter Constable via Unicode wrote:
> 
> I thought JavaScript had a UCS-2 understanding of Unicode strings. Has it 
> managed to progress beyond that?
> 
> Peter
> 
> From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of David Starner 
> via Unicode
> Sent: Thursday, August 24, 2017 5:18 PM
> To: Unicode Mailing List 
> Subject: Fwd: Unicode education in Schools
> 
> ---------- Forwarded message ---------
> From: David Starner 
> Date: Thu, Aug 24, 2017, 6:16 PM
> Subject: Re: Unicode education in Schools
> To: Richard Wordingham 
> 
> On Thu, Aug 24, 2017, 5:26 PM Richard Wordingham via Unicode wrote:
> 
> Just steer them away from UTF-16!  (And vigorously prohibit the very
> concept of UCS-2).
> 
> Richard.
> 
> Steer them away from reinventing the wheel. If they use Java, use Java 
> strings. If they're using GTK, use strings compatible with GTK. If they're 
> writing JavaScript, use JavaScript strings. There's basically no system 
> that lacks Unicode strings, or where they would be better off reinventing 
> the wheel.
>