Re: UCM file and combining character sequences

SADAHIRO Tomoyuki Mon, 22 Sep 2003 08:46:36 -0700

> Hank Tt <[EMAIL PROTECTED]> writes:
> >Hi,
> >
> >I'm trying to make a UCM file to feed to enc2xs.  The legacy encoding for
> >Taiwanese romanization *must* have its code points mapped to Unicode
> >character sequences, for the simple reason that the UCS lacks the
> >corresponding precomposed characters (and is unlikely to have them in the
> >future, as they are composable using existing characters from the Latin
> >script and the Diacritical Combining Marks blocks).  (See [1] for script
> >details.)
> 
> Are the Unicode character sequences in [1] normalized?
> Can you explain what the diacritics mean?  I assume '`^ etc. are tone marks?
> What do the macron and dot and dots-below signify?


Apparently the POJ system uses ten vowels
(a, e, i, m, ng, o, o dot above, u, u diaeresis below) and
five tone marks (acute, grave, circumflex, macron, vertical bar).

However, because <dot above> (U+0307) and <acute> (U+0301) have the
same combining class (230: above), normalization does not reorder them,
so <o + acute + dot above> is not canonically equivalent to
<o + dot above + acute>.
If <o dot above> is a vowel and acute is a tone mark, their
combination <LATIN SMALL LETTER O WITH DOT ABOVE AND ACUTE>
should be encoded as <o + dot above + acute>, I think.
The same applies to <o + dot above + circumflex>,
<o + dot above + grave>, and <o + dot above + macron>.
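
To check the point about combining classes, here is a minimal sketch in
Python using the standard unicodedata module. It is only an illustration
and not part of the UCM file. The variable and function names are my own,
and the dot-below case (U+0323, class 220) is added purely as a contrast
to show marks that do get reordered.

```python
import unicodedata

DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE
ACUTE = "\u0301"      # COMBINING ACUTE ACCENT
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW (contrast case only)

def canonically_equivalent(a, b):
    """Two strings are canonically equivalent iff their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# Canonical combining classes of the two marks discussed above.
classes = (unicodedata.combining(DOT_ABOVE), unicodedata.combining(ACUTE))

# Vowel mark first, then tone mark (the order suggested above).
vowel_then_tone = "o" + DOT_ABOVE + ACUTE
# The opposite order.
tone_then_vowel = "o" + ACUTE + DOT_ABOVE

# Same combining class: canonical ordering leaves both sequences alone,
# so they remain distinct after normalization.
same_class_equivalent = canonically_equivalent(vowel_then_tone, tone_then_vowel)

# Different classes (220 vs. 230): canonical ordering sorts the marks,
# so either input order normalizes to the same sequence.
diff_class_equivalent = canonically_equivalent(
    "o" + ACUTE + DOT_BELOW, "o" + DOT_BELOW + ACUTE
)

print(classes)
print(same_class_equivalent)
print(diff_class_equivalent)
```

If this behaves as I expect, the legacy code point for
<LATIN SMALL LETTER O WITH DOT ABOVE AND ACUTE> has to be mapped to one
fixed order, since a normalizer will not unify the two orders afterwards.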

SADAHIRO Tomoyuki
