On 20/1/14 15:26, Dohyun Kim wrote:


I have just tested this kind of input string and the result is a
little disappointing:
The input string <U+1107,U+1109,U+1110,U+1161> is not rendered well. The
output of current (patched) harfbuzz with the UnBatang font is
[uni1121=0+1024|uniD0C0=2+1024], the expected output being
[uniA972.xxxx|uni1161.xxxx]

IIRC, <U+1107,U+1109,U+1110> is *not* canonically equivalent to U+A972, even though it may be perfectly logical to spell the complex jamo as a sequence of simpler jamo letters.
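For anyone who wants to check this themselves, here is a minimal sketch using Python's standard `unicodedata` module. It assumes the Python build ships Unicode 5.2 or later data (where U+A972 was added); the helper name `canonically_equivalent` is just made up for illustration.

```python
import unicodedata

complex_jamo = "\uA972"             # the precomposed complex jamo discussed above
spelled_out = "\u1107\u1109\u1110"  # the same letter spelled as three simple jamo

# An empty decomposition field means the character has no decomposition mapping.
has_decomposition = unicodedata.decomposition(complex_jamo) != ""

def canonically_equivalent(a, b):
    """Two strings are canonically equivalent if their NFD forms are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

equivalent = canonically_equivalent(complex_jamo, spelled_out)

# For contrast, a modern <L, V> pair *is* equivalent to its precomposed syllable.
lv_equivalent = canonically_equivalent("\u1110\u1161", "\uD0C0")

print(has_decomposition, equivalent, lv_equivalent)
```

So normalization alone will never turn the spelled-out sequence into U+A972; any such mapping has to come from somewhere else, such as the font.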


The reason seems to be that we are currently applying the "ccmp" OpenType
feature too late. If the "ccmp" feature could be applied before the
hangul shaper runs, the issue would disappear.

Currently, this example fails because the pair <U+1110,U+1161> gets composed to U+D0C0 during the preprocess_text function, and so by the time any OpenType features are applied, it's too late.

Fixing this is tricky within the current structure of the shaper, as the main hangul shaper function needs to run before we map the Unicode characters to glyphs, but the ccmp feature needs to run after the default char-to-glyph mapping has been done.
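To make the ordering problem concrete, here is a toy model in Python. It is only a sketch and not HarfBuzz code: it works on code points rather than glyphs, `compose_lv` uses the standard Unicode Hangul composition arithmetic for modern <L, V> pairs only, and the `CCMP_RULES` table is a hypothetical stand-in for whatever ligature rules a font's ccmp feature might contain.

```python
L_BASE, V_BASE, S_BASE = 0x1100, 0x1161, 0xAC00
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28

def compose_lv(text):
    """Compose each modern <L, V> jamo pair into a precomposed syllable."""
    out = []
    i = 0
    while i < len(text):
        l = text[i] - L_BASE
        v = text[i + 1] - V_BASE if i + 1 < len(text) else -1
        if 0 <= l < L_COUNT and 0 <= v < V_COUNT:
            out.append(S_BASE + (l * V_COUNT + v) * T_COUNT)
            i += 2
        else:
            out.append(text[i])
            i += 1
    return out

# Hypothetical ccmp-style ligature rules, longest sequence first.
CCMP_RULES = [
    ((0x1107, 0x1109, 0x1110), 0xA972),
    ((0x1107, 0x1109), 0x1121),
]

def apply_ccmp(text):
    """Replace the first matching rule sequence at each position with its ligature."""
    out = []
    i = 0
    while i < len(text):
        for seq, lig in CCMP_RULES:
            if tuple(text[i:i + len(seq)]) == seq:
                out.append(lig)
                i += len(seq)
                break
        else:
            out.append(text[i])
            i += 1
    return out

text = [0x1107, 0x1109, 0x1110, 0x1161]

# Composition first: U+1110 is consumed before the three-jamo rule can match.
current_order = apply_ccmp(compose_lv(text))
# Ligature rules first: the three-jamo rule matches and U+1161 is left alone.
wished_order = compose_lv(apply_ccmp(text))

print([hex(c) for c in current_order])
print([hex(c) for c in wished_order])
```

With these assumed rules, the first ordering gives U+1121 followed by U+D0C0, and the second gives U+A972 followed by U+1161, which corresponds to the reported and the expected output respectively.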

Is this actually important? Note that Windows behaves similarly, and so data that has "spelled-out" representations of complex jamos won't work there either. AIUI, the recommended practice is to use the precomposed Unicode characters such as U+A972 directly - and because these do *not* have decompositions, mixing the two forms will lead to confusion and problems for users. Perhaps it's better that the non-preferred spelling does not render "correctly".

JK

