Re: More character matching bits

Buddha Buck Tue, 12 Jun 2001 20:11:23 -0700
Jarkko Hietaniemi <[EMAIL PROTECTED]> writes:

> > Perl came from ASCII-centric roots, so it's likely that most of our
> > biases are ASCII-centric.  And for a couple of reasons, it's going to
> > be hard to deal with that:
> > 
> > 1. Backwards compatibility with existing Perl practice,
> > 
> > and
> > 
> > 2. To do language-neutral right is -really- hard; look at locales and
> > Unicode as examples.
> > 
> > As such, instead of trying to make Perl work for all languages out of
> > the box, why not make Perl's language handling extensible from within
> > the language and have it be as language-free as possible (except for
> > backwards compatibility stuff) out of the box?
> 
> Right on.
> 
> > Examples of what we can do:
> > 
> > I. Make ranges work on Unicode code-points (if they don't already).
> 
> Urrrr, yes, they do, if by code-point ranges you mean \x{...}-\x{...}
> but in general I would like to discourage the use of ranges.  What do
> you think [a-\N{KATAKANA LETTER KI}] should mean?  I think it should
> mean a compile time error.  People misuse ranges for classes.  Ranges
> also imply some collation, which is, as discussed, really bad.

I think, following my line of thought, that [a-\N{KATAKANA LETTER KI}]
should be equivalent to [\x{0061}-\x{30AD}], which would match any of
the 12365 code points from \x{0061} through \x{30AD}, inclusive.
Admittedly, this probably isn't that useful a class, but as I read it,
it's what was asked for.
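
To make the arithmetic concrete, here is a minimal sketch. It is in
Python rather than Perl, purely for illustration, and it says nothing
about how Perl does or should implement ranges. The names lo, hi, count
and char_class are made up for this example.

```python
import re
import unicodedata

# Endpoints of the range, as code points.
lo = ord("a")                                           # U+0061
hi = ord(unicodedata.lookup("KATAKANA LETTER KI"))      # U+30AD

# Number of code points in the inclusive range lo..hi.
count = hi - lo + 1

# A character class covering the same code-point range.
# Python's re module treats a range in a class as a code-point range.
char_class = re.compile("[{}-{}]".format(chr(lo), chr(hi)))

if __name__ == "__main__":
    print(hex(lo), hex(hi), count)
    for ch in ["a", "z", "\u00e9", "\u3042", "\u30ad", "A", "\u30ae"]:
        print("U+%04X" % ord(ch), bool(char_class.fullmatch(ch)))
```

Under this reading, everything from Latin lowercase through hiragana
and the first few katakana falls inside the class, while 'A' (U+0041)
and KATAKANA LETTER GI (U+30AE) fall just outside it.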

Collation is something I hadn't considered.  My initial thought is
that collation order would default to code-point order, but that it
should be possible to override that default.

Code-point order at least lets us collate 'a' and KATAKANA LETTER KI;
I can't think of any other sensible way to do that.
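
As a rough sketch of "code-point order by default, but overridable"
(again in Python for illustration only; collate and its key parameter
are hypothetical names, not a proposed Perl interface):

```python
def collate(items, key=None):
    """Sort strings in code-point order unless a key function overrides it.

    Python compares strings code point by code point, so the default
    here is plain code-point order. Passing key=... stands in for a
    user-supplied collation.
    """
    return sorted(items, key=key)


if __name__ == "__main__":
    chars = ["\u30ad", "a", "Z", "\u00e9"]   # KI, a, Z, e-acute
    # Default: code-point order.
    print(collate(chars))
    # Override: a case-insensitive ordering via str.casefold.
    print(collate(chars, key=str.casefold))
```

With the default, 'Z' (U+005A) sorts before 'a' (U+0061), and KATAKANA
LETTER KI sorts last. With the casefold override, 'a' sorts before 'Z',
which is the kind of thing a language-specific collation would want to
change.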
> 
> -- 
> $jhi++; # http://www.iki.fi/jhi/
>         # There is this special biologist word we use for 'stable'.
>         # It is 'dead'. -- Jack Cohen