On Fri, Jan 18, 2002 at 05:24:00PM +0200, Jarkko Hietaniemi wrote:

> > As for character encodings, we're forcing everything to UTF-32 in
> > regular expressions.  No exceptions.  If you use a string in a regex,
> > it'll be transcoded.  I honestly can't think of a better way to
> > guarantee efficient string indexing.
> 
> I'm fine with that.  The bloat is of course a shame, but as long as
> that's not a real problem for someone, let's not worry about it too
> much.

Forcing everything to UTF-32 in the API?
Or just forcing everything to UTF-32 until perl 6.0 is released, as trying
to do UTF-8 (and UTF-16 ...) regexps now is premature optimisation?

To me it seems that making UTF-32 do everything correctly, so that the real
world can use it while encoding-optimised versions are written, is better
than having a snazzy 4-encoding autoswitcher that is wrong and therefore
not releasable to the world.
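
As I understand the indexing argument quoted above, it comes down to
fixed-width versus variable-width code units. Below is a rough Python
sketch of that trade-off only. It is not Parrot's code, and the helper
names (utf32_index, utf8_index, utf8_len) are made up for illustration.
It assumes well-formed input and little-endian UTF-32 without a BOM.

```python
def utf32_index(buf: bytes, i: int) -> str:
    """Return code point i from a UTF-32-LE buffer.

    Every code point occupies exactly 4 bytes, so the offset is
    computed directly: constant time, no scanning.
    """
    return buf[4 * i : 4 * i + 4].decode("utf-32-le")


def utf8_len(lead: int) -> int:
    """Length in bytes of a UTF-8 sequence, judged from its lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError("not a UTF-8 lead byte")


def utf8_index(buf: bytes, i: int) -> str:
    """Return code point i from a UTF-8 buffer.

    Sequences are 1 to 4 bytes long, so we have to walk over the
    first i code points to find the offset: linear time.
    """
    pos = 0
    for _ in range(i):
        pos += utf8_len(buf[pos])
    return buf[pos : pos + utf8_len(buf[pos])].decode("utf-8")


# Sample text mixing 1-, 2-, 3- and 4-byte UTF-8 sequences.
text = "na\u00efve \u20acuro \U0001d11e clef"
as_utf32 = text.encode("utf-32-le")
as_utf8 = text.encode("utf-8")
```

Both functions return the same character for the same index; the
difference is that the UTF-8 one pays for a scan on every lookup, while
the UTF-32 buffer pays up front in size (4 bytes per code point).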

But I don't know about how the internals of all these things work, so I
may well be wrong on any technical detail.


Nicholas Clark
-- 
ENOCHOCOLATE http://www.ccl4.org/~nick/CV.html
