> Hank Tt <[EMAIL PROTECTED]> writes:
> >Hi,
> >
> >I'm trying to make a UCM file to feed to enc2xs. The legacy encoding for
> >Taiwanese romanization *must* have its code points mapped to Unicode
> >character sequences, for the simple reason that the UCS lacks the
> >corresponding precomposed characters (and is unlikely to have them in the
> >future, as they are composable using existing characters from the Latin
> >script and the Diacritical Combining Marks blocks). (See [1] for script
> >details.)
>
> Are the Unicode character sequences in [1] normalized?
> Can you explain what the diacritics mean I assume '`^ etc. are tone marks?
> What do the macron and dot and dots-below signify?
Apparently the POJ system uses ten vowels (a, e, i, m, ng, o, o dot above, u, u diaeresis below) and five tone marks (acute, grave, circumflex, macron, vertical bar).

However, <dot above> (U+0307) and <acute> (U+0301) have the same combining class (230, Above). Canonical reordering never swaps marks of equal class, so <o + acute + dot above> is not canonically equivalent to <o + dot above + acute>. If <o dot above> is a vowel and acute is a tone mark, their combination <LATIN SMALL LETTER O WITH DOT ABOVE AND ACUTE> should be encoded as <o + dot above + acute>, I think. The same applies to <o + dot above + circumflex>, <o + dot above + grave>, and <o + dot above + macron>.

SADAHIRO Tomoyuki
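The combining-class argument can be checked with Python's standard `unicodedata` module. This is a minimal sketch, not part of the original message. `same_under_nfd` is a hypothetical helper name. It shows that the two orders of dot above and acute stay distinct under normalization. It also shows that the order matters for NFC: <o + dot above + macron> composes to the existing U+0231 (LATIN SMALL LETTER O WITH DOT ABOVE AND MACRON), while <o + macron + dot above> does not.

```python
import unicodedata

DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE (part of the vowel "o dot above")
ACUTE = "\u0301"      # COMBINING ACUTE ACCENT (tone mark)
MACRON = "\u0304"     # COMBINING MACRON (tone mark)

# Both marks sit above the base letter, so they share combining class 230.
print(unicodedata.combining(DOT_ABOVE), unicodedata.combining(ACUTE))  # 230 230


def same_under_nfd(a: str, b: str) -> bool:
    """Hypothetical helper: True if a and b are canonically equivalent."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


vowel_then_tone = "o" + DOT_ABOVE + ACUTE  # <o + dot above + acute>
tone_then_vowel = "o" + ACUTE + DOT_ABOVE  # <o + acute + dot above>

# Canonical ordering only swaps adjacent marks whose classes differ,
# so marks of equal class keep their order and the strings stay distinct.
print(same_under_nfd(vowel_then_tone, tone_then_vowel))  # False


def code_points(s: str) -> str:
    """Render a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)


# Vowel diacritic first, then tone mark: NFC yields a single precomposed letter.
print(code_points(unicodedata.normalize("NFC", "o" + DOT_ABOVE + MACRON)))  # U+0231
# Tone mark first: NFC composes o + macron and leaves the dot above dangling.
print(code_points(unicodedata.normalize("NFC", "o" + MACRON + DOT_ABOVE)))  # U+014D U+0307
```

A UCM mapping that always emits the vowel's own diacritic before the tone mark therefore produces sequences that compare and compose predictably.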