Self-comments:
----- Original Message -----
From: "Soobok Lee" <[EMAIL PROTECTED]>
1) Last resort: somewhat tricky
A future TAGALOG encoding may provide two sets of basic TAGALOG letters:
set A in the official lexicographical order, and set B in frequency order
(a sub-optimal ordering is OKAY), with a 1:1 NFKC mapping defined from A onto B.
All basic TAGALOG letters in set A would then be "reordered by NFKC"
in nameprep, not in ACE. A and B share the same glyphs (font).
Valid ACE labels in the TAGALOG script would then contain only characters from B.
This may cause some problems in comparisons, which I have not fully analyzed.
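A minimal sketch of solution 1, assuming a hypothetical frequency ranking and a hypothetical set B placed in the Private Use Area (U+E000 onward) as a stand-in for new code points. Real NFKC tables contain no such mapping, so the 1:1 map is applied by hand where nameprep's NFKC step would apply it; `nfkc_trick` and `is_valid_ace_input` are illustrative names only.

```python
# Set A: a few real Tagalog letters (U+1700 block) in official order.
SET_A = ["\u1700", "\u1703", "\u1704", "\u1706", "\u1708", "\u170B", "\u1710"]  # A KA GA TA NA MA SA

# HYPOTHETICAL frequency ranking of the same letters, most frequent first.
FREQ_RANK = ["\u1708", "\u1700", "\u170B", "\u1706", "\u1703", "\u1710", "\u1704"]
assert sorted(FREQ_RANK) == sorted(SET_A), "ranking must be a permutation of set A"

# HYPOTHETICAL set B: one new code point per rank (Private Use Area stand-ins).
SET_B = [chr(0xE000 + rank) for rank in range(len(FREQ_RANK))]

# The 1:1 mapping A -> B that the NFKC step would apply inside nameprep.
A_TO_B = dict(zip(FREQ_RANK, SET_B))


def nfkc_trick(label: str) -> str:
    """Map every set-A letter to its set-B twin; other characters pass through."""
    return "".join(A_TO_B.get(ch, ch) for ch in label)


def is_valid_ace_input(label: str) -> bool:
    """A valid label of this script contains no set-A letters, only set-B ones."""
    return not any(ch in A_TO_B for ch in label)
```

With this ranking, `nfkc_trick("\u1708\u1700")` returns `"\ue000\ue001"`, which passes `is_valid_ace_input`, while the unmapped input does not.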
2) UTC solution
If the UTC accepts REORDERING as an official normalization form, say
NF-REORDERING, then we need no tricks like the one above, and
TAGALOG support can be done with that NF in the new
NAMEPREP steps: mapping/NFKC/PROHIBIT, then NF-REORDERING.
NF-REORDERING should apply only to newly added script blocks such as TAGALOG.
Like solution 1), NF-REORDERING requires separate code points for the
reordered letters, mapped from the basic letters in the same TAGALOG script block.
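The proposed step order can be sketched as below. This is only an illustration: `str.casefold` and `unicodedata.normalize` are real Python calls used as stand-ins for nameprep's mapping and NFKC steps, while `REORDER`, `PROHIBITED`, and the Private Use Area targets are hypothetical placeholders for a normalization form the UTC has not defined.

```python
import unicodedata

# Real Tagalog block range (U+1700..U+171F); NF-REORDERING touches only this block.
TAGALOG_BLOCK = range(0x1700, 0x1720)

# HYPOTHETICAL reordering table: Tagalog letter -> reordered twin.
# The targets are Private Use Area stand-ins for code points that do not exist.
REORDER = {
    "\u1708": "\ue000",  # NA, assumed most frequent
    "\u1700": "\ue001",  # A
    "\u170B": "\ue002",  # MA
}

# HYPOTHETICAL prohibited set; a real nameprep profile lists far more characters.
PROHIBITED = {" ", "/", "@"}


def nf_reordering(text: str) -> str:
    """Apply the reordering only to characters inside the newly added block."""
    return "".join(
        REORDER.get(ch, ch) if ord(ch) in TAGALOG_BLOCK else ch for ch in text
    )


def nameprep_with_reordering(label: str) -> str:
    mapped = label.casefold()                           # 1. mapping (stand-in)
    normalized = unicodedata.normalize("NFKC", mapped)  # 2. NFKC
    if any(ch in PROHIBITED for ch in normalized):      # 3. PROHIBIT
        raise ValueError("prohibited character in label")
    return nf_reordering(normalized)                    # 4. NF-REORDERING, last
```

Because the reordering runs after NFKC, a full-width `"\uff21"` still comes out as `"a"`, and existing scripts pass through untouched; only letters in the new block are reordered.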
The only difference between solutions 1) and 2) is whether the REORDERING
is done by the NFKC trick or by an official new NF.
With these two sets of basic letters, old nameprep/ACE+REORDERING libraries
can encode the reordered character set and get the compression transparently,
without any code or data upgrades in old applications.
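To see why a frequency-ordered set B could compress better, here is a toy cost model, not any real ACE: each letter costs the number of base-8 digits needed for the zig-zag-encoded difference from the previous letter's position in the ordering. The alphabet subset, the ranking, and the helpers `toy_cost`, `zigzag`, and `digits` are all hypothetical.

```python
from typing import List

# Transliterated subset of the script (hypothetical example alphabet).
OFFICIAL = ["a", "ka", "ga", "ta", "na", "ma", "sa"]      # set A order
BY_FREQUENCY = ["na", "a", "ma", "ta", "ka", "sa", "ga"]  # HYPOTHETICAL set B order


def digits(n: int, base: int = 8) -> int:
    """Number of base-`base` digits needed to write the non-negative integer n."""
    count = 1
    while n >= base:
        n //= base
        count += 1
    return count


def zigzag(d: int) -> int:
    """Fold signed deltas into non-negative integers: 0,-1,1,-2,... -> 0,1,2,3,..."""
    return 2 * d if d >= 0 else -2 * d - 1


def toy_cost(label: List[str], order: List[str]) -> int:
    """Toy cost: digits for each delta between consecutive positions, starting at 0."""
    cost, previous = 0, 0
    for letter in label:
        position = order.index(letter)
        cost += digits(zigzag(position - previous))
        previous = position
    return cost
```

With this ranking, `toy_cost(["na", "a", "ma", "na"], OFFICIAL)` is 6 while the same label costs 4 under `BY_FREQUENCY`: frequent letters sit close together, so the differences stay small. Under this model, an encoder that only sees set B gets the saving without any new table.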
If an application uses the other, un-reordered character set, it will produce
a different, uncompressed ACE label, which is, of course, made equivalent to the
reordered ACE label through the zone master's multiple-registration provisions.
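The multiple-registration provision can be sketched as a toy zone that stores both ACE forms of a label under one registration. Python's built-in `punycode` codec (RFC 3492) stands in for whichever ACE is used; `Zone`, `to_ace`, and the Private Use Area set B are hypothetical.

```python
from typing import Dict, Optional

# HYPOTHETICAL 1:1 table between set A letters and their set B twins.
A_TO_B = {"\u1708": "\ue000", "\u1700": "\ue001", "\u170B": "\ue002"}
B_TO_A = {b: a for a, b in A_TO_B.items()}


def to_ace(label: str) -> str:
    """Stand-in ACE: Python's built-in "punycode" codec, without any prefix."""
    return label.encode("punycode").decode("ascii")


class Zone:
    """Toy zone master: one registration covers both ACE forms of a label."""

    def __init__(self) -> None:
        self.records: Dict[str, str] = {}

    def register(self, label: str, owner: str) -> None:
        unreordered = "".join(B_TO_A.get(ch, ch) for ch in label)
        reordered = "".join(A_TO_B.get(ch, ch) for ch in label)
        # Both ACE labels point at the same owner, so they are equivalent.
        for form in (unreordered, reordered):
            self.records[to_ace(form)] = owner

    def lookup(self, ace_label: str) -> Optional[str]:
        return self.records.get(ace_label)
```

After `zone.register("\u1708\u1700", "owner-1")`, looking up either the compressed or the uncompressed ACE label finds the same owner, even though the two ACE strings differ.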
NF-REORDERING should be applied before ACE encoding and reversed after
ACE decoding, for rendering, in newly added scripts.
Existing scripts could be reordered in ACE, as now.
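A round-trip sketch of that order of operations, assuming a hypothetical reordering table with Private Use Area stand-ins and using Python's `punycode` codec as the ACE; `encode_label` and `decode_for_rendering` are illustrative names. Characters outside the new block, such as Latin letters, are not touched by the reordering.

```python
# HYPOTHETICAL reordering table for the new script (PUA stand-ins for set B).
REORDER = {"\u1708": "\ue000", "\u1700": "\ue001", "\u170B": "\ue002"}
UNREORDER = {b: a for a, b in REORDER.items()}

# Real Tagalog block range; only this newly added script is reordered here.
NEW_SCRIPT = range(0x1700, 0x1720)


def encode_label(label: str) -> str:
    """NF-REORDERING first, then ACE (stand-in: Python's "punycode" codec)."""
    reordered = "".join(
        REORDER.get(ch, ch) if ord(ch) in NEW_SCRIPT else ch for ch in label
    )
    return reordered.encode("punycode").decode("ascii")


def decode_for_rendering(ace_label: str) -> str:
    """ACE decoding first, then undo the reordering so the original letters render."""
    decoded = ace_label.encode("ascii").decode("punycode")
    return "".join(UNREORDER.get(ch, ch) for ch in decoded)
```

Every label survives the round trip unchanged, so users only ever see the official set A letters.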
Any suggestions welcome.
Soobok Lee
Frequency tables are always sub-optimal by nature, and marginal frequency
fluctuations will cause marginal efficiency changes, but in most cases,
I am sure, reordering will benefit most TAGALOG labels; that is why
I am pushing REORDERING into the UTC.
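One way such a table might be built, assuming letters are simply counted over a sample corpus (the proposal does not say how): `frequency_order` is a hypothetical helper, and ties fall back to the official order so the result is deterministic. A table built from one sample can differ from one built from another, which is the marginal fluctuation mentioned above.

```python
from collections import Counter
from typing import List

OFFICIAL = ["a", "ka", "ga", "ta", "na", "ma", "sa"]  # hypothetical transliterated subset


def frequency_order(corpus: List[List[str]], official: List[str] = OFFICIAL) -> List[str]:
    """Rank letters by how often they occur in the corpus, most frequent first."""
    counts = Counter(letter for word in corpus for letter in word if letter in official)
    # Ties (including unseen letters, count 0) keep the official order.
    return sorted(official, key=lambda letter: (-counts[letter], official.index(letter)))
```

For example, the corpus `[["na", "a"], ["ma", "na"], ["na"]]` ranks `na` first, then `a` and `ma`, followed by the unseen letters in official order.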
In conclusion, 2) is my preference.
It is the best way to give REORDERING stability, authority, and applicability.
