On Fri, Jan 18, 2002 at 05:24:00PM +0200, Jarkko Hietaniemi wrote:
> > As for character encodings, we're forcing everything to UTF-32 in
> > regular expressions. No exceptions. If you use a string in a regex,
> > it'll be transcoded. I honestly can't think of a better way to
> > guarantee efficient string indexing.
>
> I'm fine with that. The bloat is of course a shame, but as long as
> that's not a real problem for someone, let's not worry about it too
> much.
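The "efficient string indexing" argument rests on UTF-32 being fixed-width. In UTF-32 the n-th code point always starts at byte 4n. In UTF-8 (and in UTF-16 once surrogate pairs appear) finding it means scanning from the start of the buffer. A minimal illustrative sketch in Python follows. It is not how any Perl or Parrot regex engine actually works, and the function names are hypothetical:

```python
def utf8_offset(buf: bytes, n: int) -> int:
    """Byte offset of the n-th code point in a UTF-8 buffer: O(n) scan."""
    count = 0
    for i, byte in enumerate(buf):
        # Continuation bytes are 0b10xxxxxx; any other byte starts a code point.
        if (byte & 0xC0) != 0x80:
            if count == n:
                return i
            count += 1
    raise IndexError("code point index out of range")


def utf32_offset(n: int) -> int:
    """Byte offset of the n-th code point in UTF-32: every one is 4 bytes, O(1)."""
    return 4 * n


text = "na\u00efve caf\u00e9"        # "naive cafe" with i-diaeresis and e-acute
u8 = text.encode("utf-8")            # variable width: 1 to 4 bytes per code point
u32 = text.encode("utf-32-le")       # fixed width, little-endian, no BOM

i = utf8_offset(u8, 3)               # must walk past the 2-byte "\u00ef"
j = utf32_offset(3)                  # plain arithmetic
print(u8[i:i + 1].decode("utf-8"))        # v
print(u32[j:j + 4].decode("utf-32-le"))   # v
```

The trade-off is the "bloat" mentioned above: the UTF-32 copy of this ASCII-heavy string is 40 bytes against 12 for UTF-8.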
Forcing everything to UTF-32 in the API? Or just forcing everything to
UTF-32 until perl 6.0 is released, because trying to do UTF-8 (and
UTF-16 ...) regexps now would be premature optimisation?

To me it seems better to make UTF-32 do everything correctly, so that
the real world can use it while encoding-optimised versions are
written, than to have a snazzy four-encoding autoswitcher that is wrong
and therefore not releasable to the world.

But I don't know how the internals of all these things work, so I may
well be wrong on any technical detail.

Nicholas Clark
-- 
ENOCHOCOLATE http://www.ccl4.org/~nick/CV.html