On Sat, 2004-05-01 at 15:09, Jarkko Hietaniemi wrote:
> > How are you defining "valid UTF-8"? Is there a codepoint in UTF-8
> > between \x00 and \xff that isn't valid? Is there a reason to ever do
>
> Like, half of them? \x80 .. \xff are all invalid as UTF-8.
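The byte-versus-code-point distinction behind that answer can be checked directly. The following is a small Python sketch, not Parrot code; the helper name `decodes_alone` is made up for illustration. It tries to decode each single byte value as a complete UTF-8 string and records which ones succeed.

```python
def decodes_alone(byte_value: int) -> bool:
    """Return True if this one byte, by itself, is a well-formed UTF-8 string."""
    try:
        bytes([byte_value]).decode("utf-8")  # strict decoding raises on bad input
        return True
    except UnicodeDecodeError:
        return False


# Collect every byte value that is valid UTF-8 when it stands alone.
standalone_ok = [b for b in range(0x100) if decodes_alone(b)]

# Only the ASCII range survives; 0x80..0xFF all fail as lone bytes.
print(len(standalone_ok), hex(standalone_ok[-1]))

# The code point U+00E9 is perfectly valid, but it takes two bytes in UTF-8.
print("\u00e9".encode("utf-8"))
```

So the code points U+0080 through U+00FF exist and are valid; it is the single bytes \x80 .. \xff that cannot appear alone in UTF-8.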
Heh, damn Ken Thompson and his placemat! I am too new to UCS and UTF-8,
and had thought it was always 8-bit. I stand corrected, having read up
on the UTF-8 and Unicode FAQ.

Jeff, yeah, I have to take back my statement. If Perl defaults to UTF-8,
then it's not a valid assumption that a UTF-8 input string won't throw
an exception. I still think that's ok, and better than expanding to the
larger representation and doing the bit-op in that, since that would
mean bit-vectors have to be valid in enum_stringrep_one, _two and _four
as sort of alternate data structures. I don't think we want to go there.

For everything else, as Jeff correctly points out, this has nothing to
do with encoding. It matters only in the sense that the default encoding
in a language like Perl 6 (to take just one example) dictates which
representation you will have to expect to be the common case.

> > bitwise operations on anything other than 8-bit codepoints?
>
> I am very confused. THIS IS WHAT WE ALL SEEM TO BE SAYING. BITOPS ONLY
> ON EIGHT-BIT DATA. AM I WRONG?

No, you're not, and could you please not get emotional about this? It's
what you, Dan and I have been saying, but I was responding to Jeff, who
said:

  "Just FYI, the way I implemented bitwise-not so far, was to
  bitwise-not code points 0x{00}-0x{FF} as uint8-sized things,
  0x{100}-0x{FFFF} as uint16-sized things, and > 0x{FFFF} as
  uint32-sized things (but then bit-masking them with 0xFFFFF to make
  sure that they fell into a valid code point range)."

I thought it was kind of important to deal with the fact that I was
proposing a very different behavior for bit-shifting than currently
exists for boolean operations. The question becomes: should I CHANGE
the existing bit-ops so that they don't work on two- or four-byte
representations, for symmetry?

If this continues to be so contentious, I'm tempted to agree with the
nay-sayers and say that Parrot shouldn't do bit-vectors on strings, and
we should just implement a bit-vector class later on.
Perl will just have to suffer the overhead of translation. This just IS
NOT important enough to waste this many brain cells on.

-- 
Aaron Sherman <[EMAIL PROTECTED]>
Senior Systems Engineer and Toolsmith
"It's the sound of a satellite saying, 'get me down!'" -Shriekback
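For reference, the width-dependent bitwise-not that Jeff describes in the quoted passage can be sketched roughly as below. This is a Python illustration of one reading of that description, not the Parrot implementation. The function names are hypothetical, and applying the 0xFFFFF mask after a 32-bit complement is an assumption about what "bit-masking them with 0xFFFFF" means.

```python
def bitwise_not_codepoint(cp: int) -> int:
    """Complement one code point at the narrowest width that holds it."""
    if cp < 0:
        raise ValueError("code points are non-negative")
    if cp <= 0xFF:
        return ~cp & 0xFF          # treat as a uint8-sized thing
    if cp <= 0xFFFF:
        return ~cp & 0xFFFF        # treat as a uint16-sized thing
    # Treat as uint32, then mask with 0xFFFFF as in the quoted description
    # (assumed order: complement first, mask second).
    return (~cp & 0xFFFFFFFF) & 0xFFFFF


def bitwise_not_codepoints(cps):
    """Apply the per-code-point complement across a sequence of code points."""
    return [bitwise_not_codepoint(cp) for cp in cps]


result = bitwise_not_codepoints([0x41, 0x100, 0x10000])
print([hex(cp) for cp in result])

# The width is chosen from the value, so complementing twice need not
# give back the original: 0xFF00 becomes 0x00FF, which is then treated
# as uint8 and becomes 0x00.
twice = bitwise_not_codepoint(bitwise_not_codepoint(0xFF00))
print(hex(twice))
```

Under this reading the result for a given code point depends on which width bucket it falls into, which is one way the scheme differs from a plain bit-op over eight-bit data.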