Re: Unicode width data inconsistent/outdated

Thomas Wolff Mon, 07 Aug 2017 12:31:45 -0700

Hi Brian,

On 07.08.2017 at 21:07, Brian Inglis wrote:

...
Implementation considerations for handling the Unicode tables described in
        http://www.unicode.org/versions/Unicode10.0.0/ch05.pdf
and implemented in
        https://www.strchr.com/multi-stage_tables


ICU icu4[cj] uses a folded trie of the properties, where the unique property
combinations are indexed, strings of those indices are generated for fixed size
groups of character codes, unique values of those strings are then indexed, and
those indices assigned to each character code group. The result is a multi-level
indexing operation that returns the required property combination for each
character.

https://slidegur.com/doc/4172411/folded-trie--efficient-data-structure-for-all-of-unicode
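
As a rough illustration of the block-deduplication idea described above, here is
a minimal two-stage sketch in C. It is not ICU's actual trie code. The names
(stage1, stage2, build_tables, lookup_width), the 128-entry block size, and the
toy_width() property are all assumptions made for this example, and the ranges
in toy_width() are a toy stand-in, not real Unicode width data.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_BITS 7
#define BLOCK_SIZE (1u << BLOCK_BITS)
#define MAX_CP     0x110000u
#define NUM_BLOCKS (MAX_CP >> BLOCK_BITS)

/* stage1 maps each block of code points to the index of a unique block. */
static uint16_t stage1[NUM_BLOCKS];
/* stage2 holds the unique blocks; sized for the worst case (no sharing). */
static uint8_t  stage2[NUM_BLOCKS][BLOCK_SIZE];
static unsigned num_unique;

/* Toy property for illustration only; NOT real Unicode data. */
static uint8_t toy_width(uint32_t cp)
{
    if (cp >= 0x0300 && cp <= 0x036F) return 0;
    if (cp >= 0x1100 && cp <= 0x115F) return 2;
    if (cp >= 0x4E00 && cp <= 0x9FFF) return 2;
    return 1;
}

/* Fill each block, then reuse an identical earlier block if one exists. */
static void build_tables(void)
{
    uint8_t block[BLOCK_SIZE];

    num_unique = 0;
    for (uint32_t b = 0; b < NUM_BLOCKS; b++) {
        unsigned u;

        for (uint32_t i = 0; i < BLOCK_SIZE; i++)
            block[i] = toy_width((b << BLOCK_BITS) | i);
        for (u = 0; u < num_unique; u++)
            if (memcmp(stage2[u], block, BLOCK_SIZE) == 0)
                break;
        if (u == num_unique) {
            memcpy(stage2[num_unique], block, BLOCK_SIZE);
            num_unique++;
        }
        stage1[b] = (uint16_t)u;
    }
    assert(num_unique <= NUM_BLOCKS);
}

/* Two index operations: block index, then offset within the shared block. */
static uint8_t lookup_width(uint32_t cp)
{
    assert(cp < MAX_CP);
    return stage2[stage1[cp >> BLOCK_BITS]][cp & (BLOCK_SIZE - 1)];
}
```

Call build_tables() once before the first lookup_width(). With the toy data,
most blocks are identical, so only a handful of distinct blocks need to be kept.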

The FOX Toolkit uses a similar approach, splitting the 21 bit character code
into 7 bit groups, with two higher levels of 7 bit indices, and more tweaks to
eliminate redundancy.

ftp://ftp.fox-toolkit.org/pub/FOX_Unicode_Tables.pdf
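
The 7-bit split described above could look roughly like the following C sketch.
This is not FOX's code. The table names (top, mid, leaf), their sizes, and the
hand-filled toy data (one "wide" range, U+4E00..U+9FFF) are assumptions for
illustration only, and none of the further redundancy tweaks are shown.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Three levels, each indexed by 7 bits of the 21-bit code point. */
static uint8_t top[128];      /* bits 14..20 -> row in mid  */
static uint8_t mid[3][128];   /* bits  7..13 -> row in leaf */
static uint8_t leaf[2][128];  /* bits  0..6  -> property    */

/* Hand-built toy data; NOT real Unicode data. */
static void init_toy_tables(void)
{
    memset(top, 0, sizeof top);          /* default: mid row 0          */
    memset(mid, 0, sizeof mid);          /* default: leaf row 0         */
    memset(leaf[0], 1, sizeof leaf[0]);  /* leaf row 0: all width 1     */
    memset(leaf[1], 2, sizeof leaf[1]);  /* leaf row 1: all width 2     */

    top[1] = 1;                          /* U+4000..U+7FFF -> mid row 1 */
    top[2] = 2;                          /* U+8000..U+BFFF -> mid row 2 */

    /* Mark every 128-code-point block of U+4E00..U+9FFF as wide. */
    for (uint32_t blk = 0x4E00 >> 7; blk <= (0x9FFF >> 7); blk++)
        mid[top[blk >> 7]][blk & 0x7F] = 1;
}

/* Three chained index operations, one per 7-bit group. */
static uint8_t lookup3(uint32_t cp)
{
    uint32_t hi = (cp >> 14) & 0x7F;
    uint32_t mi = (cp >> 7) & 0x7F;
    uint32_t lo = cp & 0x7F;

    assert(cp <= 0x10FFFF);
    return leaf[mid[top[hi]][mi]][lo];
}
```

In this toy setup, all rows that describe "nothing special here" collapse into
mid row 0 and leaf row 0, which is where the space saving of the scheme comes
from.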

Thanks for the interesting links, I'll check them out.

But such multi-level tables don't really help without a given procedure for how
to update them (that's only available for the lowest level, not for the
code-embedded levels).
Also, as I've demonstrated, my more straightforward and more efficient
approach will even use less total space than the multi-level approach if
packed table entries are used.
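
The message does not spell out the data layout, so the following C sketch is
only one plausible reading of a flat table with packed entries: a sorted list
of runs, each packed into one 32-bit word and found by binary search. The
packing (start code point in bits 0..20, value in the bits above), the names
(ENTRY, runs, run_lookup), and the toy ranges are all assumptions, not the
tables discussed in this thread and not real Unicode data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One run per 32-bit word: 21-bit start code point, value in the top bits. */
#define ENTRY(start, val) ((uint32_t)(start) | ((uint32_t)(val) << 21))
#define ENTRY_START(e)    ((e) & 0x1FFFFFu)
#define ENTRY_VALUE(e)    ((e) >> 21)

/* Sorted by start; each run extends up to the next entry's start.
 * Toy data for illustration only. */
static const uint32_t runs[] = {
    ENTRY(0x0000, 1),
    ENTRY(0x0300, 0),
    ENTRY(0x0370, 1),
    ENTRY(0x1100, 2),
    ENTRY(0x1160, 1),
    ENTRY(0x4E00, 2),
    ENTRY(0xA000, 1),
};
#define NUM_RUNS (sizeof runs / sizeof runs[0])

/* Binary search for the last run whose start is <= cp. */
static unsigned run_lookup(uint32_t cp)
{
    size_t lo = 0, hi = NUM_RUNS;

    assert(cp <= 0x10FFFF);
    /* Invariant: start(runs[lo]) <= cp, and cp < start(runs[hi]) if hi is valid. */
    while (hi - lo > 1) {
        size_t m = lo + (hi - lo) / 2;

        if (ENTRY_START(runs[m]) <= cp)
            lo = m;
        else
            hi = m;
    }
    return ENTRY_VALUE(runs[lo]);
}
```

Updating such a table means regenerating a single sorted array from the source
data, with no index levels to keep in sync. The trade-off is a logarithmic
search per lookup instead of a fixed number of index operations.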

Thomas
