In other words Max, if you correctly invoke MacRomanTextConverter initializeLatin1MapAndEncodings, the old converter works correctly again. Now that you know that, you can remove the old converters and keep the modernized Zn ones ;)

2016-03-03 0:56 GMT+01:00 Nicolas Cellier <nicolas.cellier.aka.n...@gmail.com>:

> 2016-03-02 23:16 GMT+01:00 Henrik Sperre Johansen <henrik.s.johan...@veloxit.no>:
>
>> I'm not sure I'd say Squeak's (5.0 at least) MacRoman conversion is free of
>> bugs either; at least the "legacy" ByteTextConverter subclass in Pharo
>> passes the following:
>>
>> "U+0152, Latin capital ligature OE, is codepoint 16rCE in mac-roman"
>> ((Character value: 16r0152) asString convertToEncoding: 'mac-roman') first
>> charCode = 16rCE.
>>
>> "Codepoint 170 in MacRoman is the TM sign, U+2122"
>> ((Character value: 170) asString convertFromEncoding: 'mac-roman') first
>> charCode = 16r2122.
>>
>> Cheers,
>> Henry
>
> Yes, you're right: it's because the Squeak tables did, and still do, use
> CP1252 instead of ISO 8859-1, and thus do not match Unicode. That might have
> made sense when porting from Mac to Windows while keeping ByteString, but at
> least since the switch to Unicode that's bogus. I guess it's still here
> because some in-image fonts would support CP1252, but I'm too tired to
> check it now...
>
> My mistake is that the Unicode character 216 -> MacRoman mapping was already
> wrong in Pharo 1.1.
> It was wrong because Pharo picked a bogus table manually crafted from
> internet pages (from the Sophie project?).
>
> Then Sven did correct the table by automagically decoding the URL...
> But this didn't fix anything, because initializeLatin1MapAndEncodings was
> never invoked (it was already missing in Pharo 1.1).
> Unfortunately, those maps are a speed-up cache and will mask any correction
> to the table if they are not updated.
>
> In Squeak, initializeLatin1MapAndEncodings was called from class-side
> initialization right from the beginning, but this was forgotten during the
> port to Pharo. It would be interesting to know why...
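[The mappings quoted above can be cross-checked outside the image. A small sketch in Python rather than Pharo, since Python's standard codecs ship `mac_roman`, `cp1252`, and `latin-1` tables derived from the published mapping files:]

```python
# Cross-check the MacRoman mappings discussed above using Python's
# built-in codecs (mac_roman, cp1252, latin-1).

# U+0152 (Latin capital ligature OE) encodes to byte 0xCE in MacRoman.
assert '\u0152'.encode('mac_roman') == b'\xce'

# Byte 170 (0xAA) in MacRoman decodes to the trademark sign, U+2122.
assert bytes([170]).decode('mac_roman') == '\u2122'

# CP1252 and ISO 8859-1 really do diverge in the 0x80-0x9F range, which
# is why a converter table built on CP1252 cannot match Unicode/Latin-1:
# 0x8C is OE in CP1252 but merely a control character in Latin-1.
assert bytes([0x8C]).decode('cp1252') == '\u0152'
assert bytes([0x8C]).decode('latin-1') == '\x8c'

print('all mappings check out')
```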
> Ah yes, lazy initialization made it work without the need for class
> initialization, but that was a one-shot gun, not robust to further table
> changes; that's the drawback of being lazy.
>
> So, most probably the code was too complex, and that is enough to explain
> the mistakes.
> Why was it too complex?
> Because it was an optimization for speed (fast scanning of bytes NOT
> NEEDING ANY conversion).
> And the initialization was too convoluted because it was reusing a
> convoluted multilingual API.
> My feeling is that it's an effect of the "least possible change that could
> possibly extend functionality".
>
> For me, it's never enough to say "the old converters were broken".
> There's always something to learn from a mistake, and that's why I'm asking.
> My feeling is that the Pharo guys always sprint and never look behind.
> That runs the risk of repeating some mistake...
>
>> --
>> View this message in context:
>> http://forum.world.st/TextConverter-is-broken-tp4882039p4882095.html
>> Sent from the Pharo Smalltalk Developers mailing list archive at
>> Nabble.com.