Re: encoding neutral unpack

Ton Hospel Mon, 31 Jan 2005 12:53:16 -0800

In article <[EMAIL PROTECTED]>,
        Sam Vilain <[EMAIL PROTECTED]> writes:
> Ton Hospel wrote:
>> compatibility. So what if I leave "C" as old style "look through
>> the encoding", make all other formats (except "c" and "C") encoding 
>> neutral and introduce a new letter for encoding neutral "character",
>> let's say "E" (suggestions for a better letter welcome, a pity "u" is
>> already taken).
> 
> How is it possible to have an encoding neutral conversion from a
> character to its ordinal code?  If anything, I'd say that was *driven*
> by the encoding, no?
>


Basically, if the string doesn't have the utf8 flag set, "W" will do:

val = *s++;

If the utf8 flag is set, "W" will do:

val = utf8n_to_uvchr(s, end-s, &retlen, flags);
..some error detection..
s+= retlen;

This makes the result the same regardless of whether you upgrade or downgrade
the packed string. Always processing "logical characters" is the general idea
behind what I'm trying to do.
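
To make the two branches above concrete, here is a minimal standalone sketch
in plain C. It does not use the Perl API: next_logical_char() is a
hypothetical helper invented for illustration, and its is_utf8 argument
stands in for the string's utf8 flag. The decoder is deliberately simplified
(no overlong or surrogate checks) and only approximates what
utf8n_to_uvchr() plus error detection would do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the Perl API: read one logical
 * character from [*s, end) and advance *s past it.  With is_utf8 == 0
 * every byte is one character (the "val = *s++" case); otherwise one
 * UTF-8 sequence is decoded.  Returns -1 on malformed input. */
static long next_logical_char(const unsigned char **s,
                              const unsigned char *end, int is_utf8)
{
    const unsigned char *p = *s;
    long val;
    int extra, i;

    assert(p < end);                    /* caller must supply at least one byte */

    if (!is_utf8 || p[0] < 0x80) {      /* plain byte, or ASCII inside UTF-8 */
        *s = p + 1;
        return p[0];
    }

    /* The lead byte tells us how many continuation bytes follow. */
    if ((p[0] & 0xE0) == 0xC0)      { val = p[0] & 0x1F; extra = 1; }
    else if ((p[0] & 0xF0) == 0xE0) { val = p[0] & 0x0F; extra = 2; }
    else if ((p[0] & 0xF8) == 0xF0) { val = p[0] & 0x07; extra = 3; }
    else return -1;                     /* stray continuation or invalid lead byte */

    if (end - p <= extra)
        return -1;                      /* sequence is truncated */

    for (i = 1; i <= extra; i++) {
        if ((p[i] & 0xC0) != 0x80)
            return -1;                  /* not a continuation byte */
        val = (val << 6) | (p[i] & 0x3F);
    }

    *s = p + extra + 1;
    return val;
}
```

Under these assumptions, the single byte 0xE9 read with is_utf8 == 0 and the
two bytes 0xC3 0xA9 read with is_utf8 == 1 yield the same value, which is the
"same result whether upgraded or downgraded" property described above.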

> I like the suggestion of keeping unpack("C*", ...) 8-bit clean.

It will actually make "C*" the least clean of the pack formats, since the
result will depend on whether the string is upgraded or not, so to predict the
result you will have to keep track of an internal implementation detail.
Most users of "C" should probably switch to using "W" (I'll go with "W"
unless anyone can think of a better letter).
