Re: More character matching bits

Dan Sugalski Fri, 15 Jun 2001 17:21:54 -0700
At 11:28 PM 6/15/2001 +0100, Simon Cozens wrote:
>On Fri, Jun 15, 2001 at 11:50:49AM -0400, Dan Sugalski wrote:
> > Unless I'm missing something (Simon? Hong?) Japanese (and potentially all
> > the languages that use the Han characters) can interpret a particular
> > character as either a number or not a number, depending on context.
>
>Uh, don't think so, no. The numerals are, ooh, let's see:
>U+4E00, U+4E8C, U+4E09, U+56DB, U+4E94, U+4E03, U+516B, U+5341, U+5343,
>U+4E07 and two more I can't find. The rest aren't (usually) treated as
>numbers, no. It's certainly not the case that a given character is both
>non-number and number.

The kanji dictionary I have handy gives non-numeric translations for 
several of the numeric kanji, though it might be something that gets lost 
in translation. Some of the examples for ya (and I don't have the Unicode 
set handy to look up the code points, alas) are "vegetable store" (yaoya) 
and "afternoon refreshments" (oyatsu).

Ah, well, better to assume it's possible and be shown wrong than to have it 
turn out true and not have considered it. (I'm not nearly as dense as I 
sometimes seem. I hope...)

> > >module Locale::Hawaiian;
> > >use re 'class (\w => [aeiouâêîôûhklmnpw`])';
> > >...
> >
> > Sure. I expect Damian will write us something that lets you specify them
> > upside-down in Klingon or something by the time this is done. :)
>
>This is handy, but this means the regexp engine needs to be *VERY* dynamic
>at runtime.

Yep. The trick is to figure out how to do this without it being expensive. 
Handy trick, that one.
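For comparison, Perl 5 can't redefine \w, but its user-defined character 
properties get partway there. Below is a minimal sketch, assuming Perl 5.8 
or later. The property name IsHawaiianWord and the helper hawaiian_words are 
hypothetical, and the letter set is copied from the Locale::Hawaiian example 
quoted above. Perl resolves the property subroutine once and caches the 
result, so it is cheap, but it stays fixed once resolved. That is exactly 
the flexibility-versus-cost tradeoff a fully dynamic engine would have to beat.

```perl
use strict;
use warnings;

# Hypothetical user-defined property standing in for the proposed
# "\w => [aeiouâêîôûhklmnpw`]" override. Perl treats subs named
# Is... or In... as properties usable in \p{...}; each returned line
# is one hex code point (a range would be two hex numbers and a tab).
sub IsHawaiianWord {
    return join("\n",
        qw(0061 0065 0069 006F 0075),            # a e i o u
        qw(00E2 00EA 00EE 00F4 00FB),            # circumflex vowels
        qw(0068 006B 006C 006D 006E 0070 0077),  # h k l m n p w
        '0060',                                  # backtick as the okina
    ) . "\n";
}

# Return every maximal run of "Hawaiian word" characters in $text,
# using the custom property where a plain \w would normally go.
sub hawaiian_words {
    my ($text) = @_;
    return $text =~ /(\p{IsHawaiianWord}+)/g;
}

my @words = hawaiian_words("aloha kakou, mahalo");
print join(",", @words), "\n";
```

The match is case-sensitive as written, so a capital letter falls outside 
the class unless the property also lists the uppercase code points.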

                                        Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                         have teddy bears and even
                                      teddy bears get drunk