Benjamin Stuhl writes:
: --- Bart Schuller <[EMAIL PROTECTED]> wrote:
: > Larry knew what he was doing when he decided on utf8.
: 
: It has also led to the perl5 internals being, to put it
: bluntly, a horrible mess. And forget about the regex
: engine.

That's a vast oversimplification.  It has very little to do with
choosing utf8 over utf16.  The internals were already a mess, from the
standpoint of not using vtables, and the standpoint of assuming that
characters are 8 bits.  It's those lingering assumptions that infest
the regex optimizer.  (And the fact that we didn't actually *finish*
the utf8 support for 5.6.0.)

: Perhaps if it was designed in from the beginning things
: would be better, but this is something that needs serious 
: discussion.

Certainly, but I still think that utf8 must be supported as the default
string datatype--at least in any Perl east of the Pacific Ocean.  We
can of course polymorphically support utf16 and utf32 as
well--the language abstraction is such that there's little problem with
that.  The only places we should have to worry about it are in the
interfaces to the outside world.  As far as Perl is concerned (these
days), a string is a sequence of integers.  And utf8 supports that
rather more smoothly than utf16!

What you have to realize is that going to utf16 does not solve your
variable-width character problems.  I consider it a requirement that
Perl handle plane 1 characters as smoothly as it handles plane 0
characters.  To my American mind, the only value of utf16 is that it is
a poorly compressed form of utf32.  People from Japan may differ, of
course.  :-)  But as long as we can allow for those differences in the
design through string polymorphism and lazy conversion, we can
hopefully make everyone happy.
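
To make the variable-width point concrete, here is a small sketch. It is
in Python rather than Perl, purely for illustration. It counts how many
code units each encoding needs for one plane 0 Latin character, one
plane 0 Japanese character, and one plane 1 character. The helper name
`code_units` and the sample characters are assumptions made for this
example, not anything from the Perl sources.

```python
# Size of one code unit, in bytes, for each encoding form.
UNIT_BYTES = {"utf-8": 1, "utf-16-be": 2, "utf-32-be": 4}

def code_units(ch, encoding):
    """Return how many code units `encoding` needs for the single character `ch`."""
    return len(ch.encode(encoding)) // UNIT_BYTES[encoding]

# Sample characters chosen for illustration only.
samples = {
    "plane 0, Latin (U+0041)": "\u0041",
    "plane 0, Japanese (U+65E5)": "\u65e5",
    "plane 1, Deseret (U+10400)": "\U00010400",
}

for label, ch in samples.items():
    widths = {enc: code_units(ch, enc) for enc in UNIT_BYTES}
    print(label, widths)
```

The plane 1 character needs two utf16 code units (a surrogate pair), so
utf16 is variable-width as well; only utf32 is fixed-width.  The
Japanese character takes three bytes in utf8 but two in utf16, which is
the kind of size difference that might make utf16 look more attractive
from Japan.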

Whether or not strings appear to be objects in Perl, they will
certainly need vtables in perl.
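
As a rough sketch of what a vtable buys here (again in Python, and not
a description of any actual perl internals): each string carries a
pointer to a table of operations for its own representation, so callers
see a sequence of integers regardless of the bytes underneath.  The
names `PolyString`, `VTABLES`, `length`, `ord_at`, and `convert` are all
hypothetical.

```python
def make_vtable(encoding):
    """Build a hypothetical table of operations for one string representation."""
    def length(buf):
        # Number of characters, not bytes or code units.
        return len(buf.decode(encoding))

    def ord_at(buf, i):
        # Integer value of the i-th character. Decoding the whole buffer
        # is wasteful; a real implementation would walk the encoding.
        return ord(buf.decode(encoding)[i])

    return {"length": length, "ord_at": ord_at}

# One vtable per supported representation.
VTABLES = {enc: make_vtable(enc) for enc in ("utf-8", "utf-16-be", "utf-32-be")}

class PolyString:
    """A string whose operations dispatch through its representation's vtable."""

    def __init__(self, text, encoding="utf-8"):
        self.encoding = encoding
        self.buf = text.encode(encoding)
        self.vtable = VTABLES[encoding]

    def length(self):
        return self.vtable["length"](self.buf)

    def ord_at(self, i):
        return self.vtable["ord_at"](self.buf, i)

    def convert(self, encoding):
        # Conversion happens only when a caller asks for it.
        return PolyString(self.buf.decode(self.encoding), encoding)

s8 = PolyString("a\U00010400")      # stored as utf8
s16 = s8.convert("utf-16-be")       # same characters, stored as utf16
print(s8.length(), s16.length())
print(hex(s8.ord_at(1)), hex(s16.ord_at(1)))
```

Both objects answer the same questions the same way even though their
buffers differ in length, which is the sense in which the representation
only matters at the interfaces to the outside world.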

Larry
