Hi Brian,
On 07.08.2017 at 21:07, Brian Inglis wrote:
...
Implementation considerations for handling the Unicode tables described in
http://www.unicode.org/versions/Unicode10.0.0/ch05.pdf
and implemented in
https://www.strchr.com/multi-stage_tables
ICU icu4[cj] uses a folded trie of the properties, where the unique property
combinations are indexed, strings of those indices are generated for fixed size
groups of character codes, unique values of those strings are then indexed, and
those indices assigned to each character code group. The result is a multi-level
indexing operation that returns the required property combination for each
character.
https://slidegur.com/doc/4172411/folded-trie--efficient-data-structure-for-all-of-unicode
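To make the indexing steps above concrete, here is a minimal two-stage sketch in C. It is not ICU's code: the names two_stage, build and lookup, the 7-bit block size and the MAX_BLOCKS limit are all assumptions chosen for illustration. Each fixed-size block of per-character property indices is compared against the blocks already stored, and only unique blocks are kept.

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_BITS 7                    /* assumed block size: 128 code points */
#define BLOCK_SIZE (1u << BLOCK_BITS)
#define MAX_BLOCKS 64                   /* enough for a small demo range only */

/* Hypothetical container: stage1 maps a block number to a unique block id,
   stage2 stores each distinct block of property indices exactly once. */
struct two_stage {
    uint8_t  stage1[MAX_BLOCKS];
    uint8_t  stage2[MAX_BLOCKS][BLOCK_SIZE];
    unsigned n_unique;
};

/* props[] holds one property index per code point, n_blocks * BLOCK_SIZE
   entries in total; n_blocks must not exceed MAX_BLOCKS. */
static void build(struct two_stage *t, const uint8_t *props, unsigned n_blocks)
{
    t->n_unique = 0;
    for (unsigned b = 0; b < n_blocks; b++) {
        const uint8_t *blk = props + (size_t)b * BLOCK_SIZE;
        unsigned u;
        /* Linear search for an identical block stored earlier. */
        for (u = 0; u < t->n_unique; u++)
            if (memcmp(t->stage2[u], blk, BLOCK_SIZE) == 0)
                break;
        if (u == t->n_unique) {         /* not seen before: store it */
            memcpy(t->stage2[u], blk, BLOCK_SIZE);
            t->n_unique++;
        }
        t->stage1[b] = (uint8_t)u;
    }
}

/* Two index operations per character; c must lie inside the built range. */
static uint8_t lookup(const struct two_stage *t, uint32_t c)
{
    return t->stage2[t->stage1[c >> BLOCK_BITS]][c & (BLOCK_SIZE - 1)];
}
```

The space saving comes entirely from how many blocks turn out to be identical, which is why the block size is usually tuned against the actual data.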
The FOX Toolkit uses a similar approach, splitting the 21 bit character code
into 7 bit groups, with two higher levels of 7 bit indices, and more tweaks to
eliminate redundancy.
ftp://ftp.fox-toolkit.org/pub/FOX_Unicode_Tables.pdf
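A 7+7+7 bit split of this kind can be sketched as follows. This is only an illustration of the idea, not the FOX Toolkit's code: the table names level1, level2 and leaves, the function prop_lookup and the toy contents (which mark just 'A' and 'a') are assumptions; real tables would be generated from the Unicode data files.

```c
#include <stdint.h>

/* Hypothetical demo tables. Row 0 of each level is the shared "all default"
   row, so every range without data collapses onto it. */
static const uint8_t level1[128]    = { [0] = 1 };          /* top 7 bits    -> level2 row */
static const uint8_t level2[2][128] = { { 0 }, { [0] = 1 } }; /* middle 7 bits -> leaf row  */
static const uint8_t leaves[2][128] = { { 0 }, { ['A'] = 1, ['a'] = 2 } };

/* Split the 21-bit code point into three 7-bit groups and index once per level. */
static uint8_t prop_lookup(uint32_t c)
{
    if (c > 0x10FFFF)
        return 0;                       /* outside the Unicode code space */
    unsigned i1 = level1[(c >> 14) & 0x7F];
    unsigned i2 = level2[i1][(c >> 7) & 0x7F];
    return leaves[i2][c & 0x7F];
}
```

With these toy tables prop_lookup('A') yields 1 and prop_lookup('a') yields 2, while every other code point falls through the shared default rows and yields 0.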
Thanks for the interesting links, I'll check them out.
But such multi-level tables don't really help without a documented
procedure for updating them (one is only available for the lowest level,
not for the levels embedded in the code).
Also, as I've demonstrated, my more straightforward and more efficient
approach even uses less total space than the multi-level approach if
packed table entries are used.
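For readers without the earlier messages at hand: this mail does not spell the approach out, so the following is only an assumed illustration of what a flat table with packed entries can look like, namely a sorted list of runs searched by binary search. The PACK macro, the 21/5 bit layout, the CAT_* values and the function category are all hypothetical, and the demo data covers ASCII letters only.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed packing: low 21 bits = first code point of a run,
   bits 21..25 = category. A run extends up to the next entry's start. */
#define PACK(start, cat) (((uint32_t)(cat) << 21) | (uint32_t)(start))
#define START_MASK 0x1FFFFFu

enum { CAT_OTHER = 0, CAT_UPPER = 1, CAT_LOWER = 2 };   /* demo categories */

/* Must be sorted by start and begin at code point 0. */
static const uint32_t runs[] = {
    PACK(0x0000, CAT_OTHER),
    PACK(0x0041, CAT_UPPER),    /* A..Z */
    PACK(0x005B, CAT_OTHER),
    PACK(0x0061, CAT_LOWER),    /* a..z */
    PACK(0x007B, CAT_OTHER),    /* everything above, in this demo */
};

/* Binary search for the last run whose start is <= c. */
static unsigned category(uint32_t c)
{
    size_t lo = 0, hi = sizeof runs / sizeof runs[0];
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if ((runs[mid] & START_MASK) <= c)
            lo = mid;
        else
            hi = mid;
    }
    return runs[lo] >> 21;
}
```

Updating such a table means regenerating one sorted array from the Unicode data files, at the cost of a logarithmic search per lookup instead of a fixed number of index operations.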
Thomas