Re: [Pharo-project] Who broke UTF8TextConverter?

Mariano Martinez Peck Tue, 28 Aug 2012 05:41:15 -0700

Igor, maybe this is related to
http://code.google.com/p/pharo/issues/detail?id=6565 ?



On Tue, Aug 28, 2012 at 2:35 PM, Igor Stasenko <siguc...@gmail.com> wrote:

> Or was it like that from birth??
>
> arrrrgghhhh...
>
> | stream |
> stream := WriteStream on: (ByteArray new: 100).
>
> UTF8TextConverter new nextPut: (Character value: 129 ) toStream: stream.
>
> stream contents
>  #[129]
>
> This is WRONG! In UTF-8, code point 129 takes two bytes, #[194 129].
> RTFM about UTF-8 encoding, please! :)
>
> ---------
>
> nextPut: aCharacter toStream: aStream
>         | leadingChar nBytes mask shift ucs2code |
>         aStream isBinary ifTrue: [^aCharacter storeBinaryOn: aStream].
>
> In my case the stream is binary, so it goes directly to #storeBinaryOn:
>
> storeBinaryOn: aStream
>         "Store the receiver on a binary (file) stream"
>         value < 256
>                 ifTrue:[aStream basicNextPut: value]
>                 ifFalse:[aStream nextInt32Put: value].
>
> This is not even close to UTF-8.
> If the character code is less than 256, it stores a single byte (wtf?),
> and otherwise it stores the value as a 32-bit integer in
> big-endian order (wtf raisedToPower: 2)..
>
> I wonder what purpose this code path actually serves. This
> stuff is completely useless.
> According to the implementation of storeBinaryOn:,
> there's no way to read the same character value back,
> because it can be 1 byte or 4 bytes, and you simply cannot determine
> which one.
> This is one of the reasons we use UTF-8 encoding, btw ;)
>
> --
> Best regards,
> Igor Stasenko.
>
>
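
For reference, a minimal sketch of the two points in the quoted message,
written in Python rather than Smalltalk so it can be checked outside an
image. Both names are assumptions made for illustration: utf8_encode is a
hand-rolled encoder, and store_binary_sketch is a hypothetical stand-in
that only mimics the behaviour described above for #storeBinaryOn: (one
byte below 256, otherwise four big-endian bytes). Neither is Pharo code.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one code point as UTF-8 (sketch: surrogates are not rejected)."""
    if code_point < 0x80:
        # 1 byte: 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0x10FFFF:
        # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return bytes([0xF0 | (code_point >> 18),
                      0x80 | ((code_point >> 12) & 0x3F),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    raise ValueError("code point out of Unicode range")


def store_binary_sketch(value: int) -> bytes:
    """Hypothetical stand-in for the quoted #storeBinaryOn: behaviour."""
    if value < 256:
        return bytes([value])        # single raw byte
    return value.to_bytes(4, "big")  # 32-bit big-endian integer


# Code point 129 needs two bytes in UTF-8, not the single byte 129.
print(list(utf8_encode(129)))  # prints [194, 129]

# The 1-or-4-byte format is ambiguous: one character with value 256 and
# four characters with values 0, 0, 1, 0 produce identical bytes.
one_char = store_binary_sketch(256)
four_chars = b"".join(store_binary_sketch(v) for v in (0, 0, 1, 0))
print(one_char == four_chars)  # prints True
```

UTF-8 avoids the ambiguity because the lead byte of every sequence states
how many continuation bytes follow, so a reader always knows where one
character ends and the next begins.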


-- 
Mariano
http://marianopeck.wordpress.com
