At 01:42 PM 9/20/2001 -0700, Paul Prescod wrote:
>Dan Sugalski wrote:
> > >I think that the extra complexity of dealing with multiple character
> > >sets has more cost than benefit. What will chr(10203) return?
> >
> > The default character set's chr(10203). In which case it's no different
> > than chr(65), which isn't an A on EBCDIC platforms... :)
>
>Is it really a good idea for the meaning of your Perl program to change
>in this way between platforms?

Beats having ord('A') show as 65 on an EBCDIC platform.

If the code in question wants a default character set, that can be specified.
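
To make the ord('A') point concrete, here is a minimal sketch in Python rather than Perl or Parrot. It assumes cp037 (one common EBCDIC code page that ships with Python) as a stand-in for an EBCDIC platform's default, and the helpers chr_in and ord_in are hypothetical names invented for this example, not part of any Parrot interface.

```python
# Sketch only: the same number names different characters in ASCII and EBCDIC.
# "cp037" standing in for "the platform default" is an assumption of this example.

def chr_in(charset, code):
    """Hypothetical helper: interpret `code` as a single byte in `charset`."""
    return bytes([code]).decode(charset)

def ord_in(charset, ch):
    """Hypothetical helper: the single-byte code of `ch` in `charset`."""
    encoded = ch.encode(charset)
    if len(encoded) != 1:
        raise ValueError(f"{ch!r} is not a single byte in {charset}")
    return encoded[0]

print(ord_in("ascii", "A"))   # 65
print(ord_in("cp037", "A"))   # 193 (0xC1)

# chr(65) is 'A' only when the character set is ASCII-compatible.
print(chr_in("ascii", 65) == "A")
print(chr_in("cp037", 65) == "A")
```

Passing the character set explicitly, as these helpers do, is the "specify it" case: the result no longer depends on which platform runs the code.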

>In XML we tried hard not to do that. Java
>and JavaScript are also good about this. Python does not expose its
>default encoding machinery either (or did not last time I checked).
>
>It seems like just one more platform dependency that the programmer must
>be careful of.

Generally I don't think this'll be a big deal. Parrot will work as expected 
locally. Programs that require a particular set of character semantics 
should specify them. Bytecode-compiled programs will probably have the 
default forced to whatever was in place when they were compiled.

> >...
> > We're only going to do variable width for I/O, and only if the source or
> > destination are in a variable width format. The internal bits that need to
> > care will work on fixed-width representations.
>
>Then you'll pay the memory cost for Unicode up-front. I'd suggest you
>take advantage of the simplification you can get from using its
>character set also.

We only pay the cost if we use it. For most things I'm hoping we won't. 
It'll come in only if explicitly selected, or if mixed strings are dealt 
with. The former is fine, and the latter is likely rare.

>In principle I have nothing against a multi-character set system but I
>have a sense that the details are going to be extremely hairy and I'm
>afraid that maybe those details will bubble up to the programmer and
>make the usage model harder than on VMs that standardize (essentially
>every other VM in the world!).

That's one of the reasons we're making the character interface abstract and 
the interpreter set-agnostic.

I am also willing to put up with some hair if it means things go faster in 
many places. I've my eye on China and Japan (amongst other places) as 
targets for Parrot, and Unicode's not gonna cut it there.

                                        Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                         have teddy bears and even
                                      teddy bears get drunk
