Jarkko Hietaniemi <[EMAIL PROTECTED]> writes:
>On Mon, Dec 18, 2000 at 03:21:05PM +0000, Nick Ing-Simmons wrote:
>> Simon Cozens <[EMAIL PROTECTED]> writes:
>> >
>> >So, before we start even thinking about what we need, it's time to look at the
>> >vexed question of string representation. How do we do Unicode without getting
>> >into the horrendous non-Latin1 cockups we're seeing on p5p right now? 
>> 
>> Well - my theorist's answer is that everything is Unicode - like Java.
>
>That would be nice, yes.
>
>> As I pointed out on p5p even EBCDIC machines can use that model - but 
>> the downside is that ord('A') == 65 which will break backward compatibility 
>> with EBCDIC scripts. 
>
>Maybe we need $ENV{PERL_ENCODING} to control ord() and chr(), too?

That was my suggestion last week some time - though not stated as clearly!

>
>> Tagging a string with a repertoire and encoding is horrible - you are aware 
>
>Indeed.  We have had a very rough ride trying to get just two
>encodings to play well together; trying to support more simultaneously
>would be pure combinatorial masochism.  I say we should strive for
>converting everything to/from one agreed-upon internal encoding.  Yes,
>this is somewhat counter to the idea 'no preferred internal encoding'.
>After pondering about the issue I have come around to "Oh, yes, there
>should be one preferred internal encoding.", otherwise we banish
>ourselves to much gnashing of the teeth.  Off-hand, I think it's only
>when there would be information loss that the One True Encoding
>conversion shouldn't be done.  What's the OTE, then?  Well, UTF-16 or
>UTF-32, I guess.  The redeeming features of UTF-8, that it is 1:1 for
>ASCII, and also compact for ASCII, frankly are getting rather thin in
>my eyes.

But not in mine (yet) - but then IO is just throwing gobs of bytes about
and regexps are introspecting. (And Encode has to handle variable-length
multi-byte gunk anyway.) 
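
To put rough numbers on the trade-off above, here is a minimal sketch. It
assumes Python 3 rather than Perl, and the helper names (encoded_sizes,
native_ord) are made up for illustration. It uses Python's standard cp037
codec as a stand-in for "an EBCDIC machine"; real EBCDIC platforms use
several different code pages.

```python
# Illustrative only: byte cost of the candidate internal encodings, plus the
# ord('A') discrepancy between Unicode and one EBCDIC code page (cp037).

def encoded_sizes(text):
    """Return the byte length of text in each candidate internal encoding."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),  # -le: no byte-order mark
        "utf-32": len(text.encode("utf-32-le")),  # -le: no byte-order mark
    }


def native_ord(ch, encoding):
    """Byte value of one character in a single-byte legacy encoding."""
    return ch.encode(encoding)[0]


if __name__ == "__main__":
    # ASCII text, Latin-1 text, and CJK text, written with escapes so the
    # source file itself stays pure ASCII.
    for text in ("Perl", "na\u00efve", "\u65e5\u672c\u8a9e"):
        print(ascii(text), encoded_sizes(text))

    # Under the "everything is Unicode" model ord('A') is always 65, whereas
    # an EBCDIC script would expect the native byte value.
    print(ord("A"), native_ord("A", "cp037"))
```

For pure ASCII, UTF-8 is byte-for-byte identical to the ASCII encoding and
takes half the space of UTF-16; for the CJK sample the advantage reverses.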

-- 
Nick Ing-Simmons
