Re: Inverse of /\p{script}/

Owen Taylor Fri, 29 Aug 2003 14:18:21 +0000

On Fri, 2003-08-29 at 03:07, Nick Ing-Simmons wrote:
> Jarkko Hietaniemi <[EMAIL PROTECTED]> writes:
> >On Thu, Aug 28, 2003 at 03:16:20PM +0100, [EMAIL PROTECTED] wrote:
> >> 
> >> Does the existing perl5.8.* Unicode support have a way to efficiently
> >> determine which script(s) or block (in unicode sense) a code point belongs
> >> to?
> >
> >     use Unicode::UCD qw(charscript charblock);
> >     print charscript(0x0388);
> >     print charblock (0x30a0);
> 
> Great.
> 
> 
> >
> >> It seems to make sense to have a hash which maps script names to 
> >> probable (font) encodings 
> >> 
> >>  (Hiragana | Katakana | Han) => 'jisx0208.1990-0'
> >>  (Greek)                     => 'iso8859-7',  
> >
> >I dunno about script->font mappings...
> 
> That is Tk's (i.e. my) problem.
> XFree86 has the font encodings bundled so I think I can pre-analyze
> them.


You might want to look at what we did for Pango - see 
pango/modules/basic/tables-big.i in
ftp://ftp.gtk.org/pub/gtk/v2.2/pango-1.2.5.tar.gz.

There is a big map there that lists, for each Unicode codepoint, the
possible encodings, using a moderately clever encoding scheme to save
memory. Then, based on the current language tag (either from
the program or from the current locale setting), there is an order
in which to try encodings.
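
To illustrate the lookup idea in Perl terms, here is a minimal sketch.
It is not Pango's code and does not use its compact table format. It
assumes a coarser per-script table (as proposed earlier in the thread)
instead of a per-codepoint one. The hash names, the
candidate_encodings() helper, and the language preferences are all
hypothetical, and the encodings listed beyond 'jisx0208.1990-0' and
'iso8859-7' are just plausible X font encoding names picked for the
example.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charscript);

# Hypothetical table: script name => X font encodings that may cover it.
my %encodings_for_script = (
    Greek    => ['iso8859-7'],
    Hiragana => ['jisx0208.1990-0'],
    Katakana => ['jisx0208.1990-0'],
    Han      => ['jisx0208.1990-0', 'gb2312.1980-0', 'ksc5601.1987-0'],
);

# Hypothetical preferences: language tag => encodings to try first.
my %preferred_for_lang = (
    'ja'    => ['jisx0208.1990-0'],
    'zh-cn' => ['gb2312.1980-0'],
    'ko'    => ['ksc5601.1987-0'],
);

# Return the candidate encodings for a code point, with the ones
# preferred for the given language tag moved to the front.
sub candidate_encodings {
    my ($codepoint, $lang) = @_;

    my $script = charscript($codepoint);
    return () unless defined $script;

    my @candidates = @{ $encodings_for_script{$script} || [] };
    my $tag        = defined $lang ? lc $lang : '';
    my @preferred  = @{ $preferred_for_lang{$tag} || [] };

    # Preferred encodings that actually cover this script, in preference order.
    my %is_candidate = map { $_ => 1 } @candidates;
    my @first        = grep { $is_candidate{$_} } @preferred;

    # Everything else, in table order.
    my %seen = map { $_ => 1 } @first;
    my @rest = grep { !$seen{$_} } @candidates;

    return (@first, @rest);
}

# U+4E00 is a Han ideograph; try it under a Chinese language tag.
print join(', ', candidate_encodings(0x4E00, 'zh-CN')), "\n";
```

A per-script table like this is much smaller than a per-codepoint map,
but it cannot express that a given encoding covers only part of a
script, which is the kind of detail a per-codepoint table can record.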

We're dropping support for this code and for core X fonts
in the next release of Pango, but if you find it useful, feel
free to borrow the techniques, tables, generation tools, 
or table lookup code and use it under whatever license you
want.

Regards,
                                        Owen
