Sadahiro Tomoyuki <[EMAIL PROTECTED]> writes:
>> Are the Unicode character sequences in [1] normalized?
>> Can you explain what the diacritics mean? I assume '`^ etc. are tone marks?
>> What do the macron and dot and dots-below signify?
>
>Apparently POJ system uses ten vowels
>(a, e, i, m, ng, o, o dot above, u, u diaeresis below)

Wearing my speech-synthesis hat for a change, I would call m and ng nasals
rather than vowels, but the distinction is a fine one. If anyone here knows
what these would be in IPA phonetics, please let me know off-list.

The choice of "o dot above" is asking for trouble when composing glyphs, and
is presumably why "diaeresis below" was used for the u variant rather than
the mainstream Latin-1 one with it above.

>and
>five tone marks (acute, grave, circumflex, macron, vertical bar).
>
>However, <dot above> (U+0307) and <acute> (U+0301) have the same
>combining class (230: above), so <o + acute + dot above> is
>not canonically equivalent to <o + dot above + acute>.
>If <o dot above> is a vowel and acute is a tone mark, their
>combination <LATIN SMALL LETTER O WITH DOT ABOVE AND ACUTE>
>should be encoded as <o + dot above + acute>, I think.
>Similarly <o + dot above + circumflex>, <o + dot above + grave>,
>and <o + dot above + macron>.
>
>SADAHIRO Tomoyuki
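
For anyone who wants to check the ordering point in the quoted text, here is
a minimal sketch using Python's standard unicodedata module. The helper name
canonically_equivalent is made up for illustration, and using U+0324
(COMBINING DIAERESIS BELOW) for the u variant is my assumption about which
code point is meant.

```python
import unicodedata

DOT_ABOVE = "\u0307"        # COMBINING DOT ABOVE
ACUTE = "\u0301"            # COMBINING ACUTE ACCENT
DIAERESIS_BELOW = "\u0324"  # COMBINING DIAERESIS BELOW (assumed code point)


def canonically_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: two strings are canonically equivalent
    when their NFD forms are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# Vowel mark first, then tone mark (the order suggested in the quoted text),
# and the reverse order for comparison.
o_vowel_then_tone = "o" + DOT_ABOVE + ACUTE
o_tone_then_vowel = "o" + ACUTE + DOT_ABOVE

# The same two orders for u with diaeresis below.
u_vowel_then_tone = "u" + DIAERESIS_BELOW + ACUTE
u_tone_then_vowel = "u" + ACUTE + DIAERESIS_BELOW

if __name__ == "__main__":
    for mark in (DOT_ABOVE, ACUTE, DIAERESIS_BELOW):
        print(unicodedata.name(mark), unicodedata.combining(mark))

    # Marks with the same combining class are never reordered, so the
    # two o sequences stay distinct after normalization.
    print(canonically_equivalent(o_vowel_then_tone, o_tone_then_vowel))

    # Marks with different combining classes are sorted into canonical
    # order, so the two u sequences normalize to the same string.
    print(canonically_equivalent(u_vowel_then_tone, u_tone_then_vowel))
```

If I read the normalization rules correctly, this also suggests a second
benefit of putting the diaeresis below the u: because its combining class
differs from that of the tone marks above, the input order of vowel mark and
tone mark stops mattering once the text is normalized.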