Jarkko Hietaniemi <[EMAIL PROTECTED]> writes:
> > Perl came from ASCII-centric roots, so it's likely that most of our
> > biases are ASCII-centric. And for a couple of reasons, it's going to
> > be hard to deal with that:
> >
> > 1. Backwards compatibility with existing Perl practice,
> >
> > and
> >
> > 2. To do language-neutral right is -really- hard; look at locales and
> > Unicode as examples.
> >
> > As such, instead of trying to make Perl work for all languages out of
> > the box, why not make Perl's language handling extensible from within
> > the language and have it be as language-free as possible (except for
> > backwards compatibility stuff) out of the box?
>
> Right on.
>
> > Examples of what we can do:
> >
> > I. Make ranges work on Unicode code-points (if they don't already).
>
> Urrrr, yes, they do, if you by code-point ranges mean \x{...}-\x{...}
> but in general I would like to discourage the use of ranges. What do
> you think [a-\N{KATAKANA LETTER KI}] should mean? I think it should
> mean a compile time error. People misuse ranges for classes. Ranges
> also imply some collation, which is, as discussed, really bad.

Following my line of thought, I think [a-\N{KATAKANA LETTER KI}]
should be equivalent to [\x{0061}-\x{30AD}], which would match any of
the 12365 code points from \x{0061} through \x{30AD}, inclusive.
Admittedly, that probably isn't a very useful class, but it is what
was asked for.
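
To make the arithmetic concrete, here is a minimal sketch of how I read
that range. The helper name range_size is made up for illustration, and
the explicit "use charnames" line is only there for older perls; this
shows the proposed code-point interpretation, not a statement of what
any particular perl release does with the \N{...} form inside a class.

```perl
use strict;
use warnings;
use charnames ':full';   # lets \N{...} names resolve on older perls

# Code points of the two endpoints of the proposed range.
my $lo = ord("a");                        # U+0061
my $hi = ord("\N{KATAKANA LETTER KI}");   # U+30AD

# Hypothetical helper: number of code points in the inclusive range.
sub range_size {
    my ($from, $to) = @_;
    return $to - $from + 1;
}

# The same class written with explicit code points.
my $class = qr/^[\x{0061}-\x{30AD}]$/;

printf "U+%04X .. U+%04X spans %d code points\n",
    $lo, $hi, range_size($lo, $hi);

# Probe a few characters on either side of the endpoints.
for my $ch ("a", "z", "\N{KATAKANA LETTER KI}", "A", "\N{KATAKANA LETTER GI}") {
    printf "U+%04X %s\n", ord($ch),
        ($ch =~ $class ? "matches" : "does not match");
}
```

U+0041 ('A') falls below the range and U+30AE (KATAKANA LETTER GI)
falls just above it, so neither matches; everything in between does,
whether or not it belongs to a script you had in mind.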

Collation is something I hadn't considered. My initial thought is that
collation order would default to code-point order, but that it should
be possible to override that default.

Code-point order at least lets us collate 'a' and KATAKANA LETTER KI;
I can't think of any other sensible way to order the two.
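
As a rough sketch of "code-point order by default, but overridable":
the collate function below is hypothetical, not an existing or proposed
Perl builtin. It relies only on the fact that cmp compares character
strings by code point when no locale is in effect, and it accepts an
optional comparator to replace that default.

```perl
use strict;
use warnings;
use charnames ':full';   # lets \N{...} names resolve on older perls

# Hypothetical helper: sort by code point unless a comparator is given.
sub collate {
    my ($items, $cmp) = @_;
    $cmp ||= sub { $_[0] cmp $_[1] };   # default: code-point order
    return sort { $cmp->($a, $b) } @$items;
}

my @chars = ("\N{KATAKANA LETTER KI}", "b", "a", "B");

# Default collation: plain code-point order.
my @default = collate(\@chars);

# Overridden collation: case-insensitive, ties broken by code point.
my @folded = collate(\@chars, sub {
    lc($_[0]) cmp lc($_[1]) || $_[0] cmp $_[1]
});

print join(" ", map { sprintf("U+%04X", ord($_)) } @default), "\n";
# prints: U+0042 U+0061 U+0062 U+30AD
print join(" ", map { sprintf("U+%04X", ord($_)) } @folded), "\n";
# prints: U+0061 U+0042 U+0062 U+30AD
```

Either way KATAKANA LETTER KI sorts after the Latin letters, simply
because its code point is larger; a language-specific comparator could
be dropped in through the same second argument.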
>
> --
> $jhi++; # http://www.iki.fi/jhi/
> # There is this special biologist word we use for 'stable'.
> # It is 'dead'. -- Jack Cohen