Does the existing perl5.8.* Unicode support have a way to efficiently
determine which script(s) or block (in the Unicode sense) a code point
belongs to?
In Unicode-aware Tk I am still battling the mechanism that selects an X11 font to display a particular code point (for now glossing over glyph vs. character issues). The present code is still rather dumb.
That's what Encode::InCharset is for. Available via CPAN.
http://search.cpan.org/author/DANKOGAI/Encode-InCharset-0.03/
It seems to make sense to have a hash which maps script names to probable (font) encodings
(Hiragana | Katakana | Han) => 'jisx0208.1990-0'
With the module, that becomes \p{InJIS0208} ...
(Greek) => 'iso8859-7',
And \p{InISO_8859_7}, respectively.
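The script-to-encoding mapping above could also be sketched with Perl's built-in script properties alone, without Encode::InCharset. This is only an illustration: the table `@FONT_ENCODINGS` and the helper `font_encoding_for` are hypothetical names, and the two entries are just the ones mentioned in this thread.

```perl
use strict;
use warnings;

# Hypothetical table: each entry pairs a regex built from Perl's
# built-in script properties with a probable X11 font encoding.
# Order matters: the first matching entry wins.
my @FONT_ENCODINGS = (
    [ qr/\A[\p{Hiragana}\p{Katakana}\p{Han}]\z/ => 'jisx0208.1990-0' ],
    [ qr/\A\p{Greek}\z/                          => 'iso8859-7'       ],
);

# Return the first matching encoding for a one-character string,
# or undef (in scalar context) if no entry matches.
sub font_encoding_for {
    my ($char) = @_;
    for my $entry (@FONT_ENCODINGS) {
        my ($re, $encoding) = @$entry;
        return $encoding if $char =~ $re;
    }
    return;
}
```

A linear scan like this is fine for a handful of entries; a real font selector would presumably want a fuller table and some caching per code point.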
So given a (1 character) string, how do I get the Unicode script/block it is in?
One caveat, however. It is slightly out of sync with the latest Encode. You should stay away from the vendor encodings that were thoroughly revised between Encode 1.75 and 1.98 (FYI Encode::InCharset is still based upon 1.75).
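As for getting the script or block of a single character directly, one possibility is the core Unicode::UCD module, which exports `charscript` and `charblock`. The sketch below is a minimal illustration and not a tuned solution; the wrapper `script_and_block` is a hypothetical name, and whether the lookup is fast enough for per-character font selection is not something measured here.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charscript charblock);

# Hypothetical helper: take a one-character string and return
# its Unicode script name and block name as a two-element list.
sub script_and_block {
    my ($char) = @_;
    my $cp = ord $char;    # code point of the (first) character
    return (charscript($cp), charblock($cp));
}

my ($script, $block) = script_and_block("\x{3b1}");    # GREEK SMALL LETTER ALPHA
printf "U+%04X: script=%s block=%s\n", 0x3b1, $script, $block;
```

The script name returned here could then serve as the key of the script-to-encoding hash discussed above.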
Dan the Encode Maintainer