2016-03-02 23:16 GMT+01:00 Henrik Sperre Johansen < henrik.s.johan...@veloxit.no>:
> Not sure I'd say Squeak's (5.0 at least) MacRoman conversion is free of bugs
> either, at least the "legacy" ByteTextConverter subclass in Pharo passes the
> following:
>
> "U+0152, Latin capital ligature OE is codepoint 16rCE in mac-roman"
> ((Character value: 16r0152) asString convertToEncoding: 'mac-roman') first
> charCode = 16rCE.
>
> "Codepoint 170 in MacRoman is TM sign, U+2122"
> ((Character value: 170) asString convertFromEncoding: 'mac-roman') first
> charCode = 16r2122.
>
> Cheers,
> Henry

Yes, you're right: it's because the Squeak tables did, and still do, use CP1252 instead of ISO 8859-1, and thus do not match Unicode. That might have made sense when porting from Mac to Windows while keeping ByteString, but at least since the switch to Unicode it is bogus. I guess it's still here because some in-image fonts supported CP1252, but I am too tired to check it now...

My mistake: the conversion of Unicode character 216 to MacRoman was already wrong in Pharo 1.1. It was wrong because Pharo picked a bogus table manually crafted from internet pages (from the Sophie project?). Then Sven corrected the table by automagically decoding the URL... But this didn't fix anything, because initializeLatin1MapAndEncodings was never invoked (it was already missing in Pharo 1.1). Unfortunately, those maps are a speed-up cache and will mask any correction to the table if they are not updated.

In Squeak, initializeLatin1MapAndEncodings was called from class-side initialization right from the beginning, but this was forgotten during the port to Pharo. It would be interesting to know why... Ah yes, lazy initialization made it work without the need for class initialization, but that was a one-shot gun, not robust to further table changes; that's the drawback of being lazy.

So, most probably the code was too complex, and this is enough to explain the mistakes. Why was it too complex? Because it was an optimization for speed (fast scanning of bytes NOT NEEDING ANY conversion).
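As a sanity check outside the image, both of Henry's quoted assertions, and the CP1252 vs ISO 8859-1 mismatch described above, can be reproduced with Python's built-in codecs (a minimal sketch; Python is used here only because its mapping tables for these encodings are readily scriptable):

```python
# Cross-check the MacRoman claims from the quoted Smalltalk snippets.

# U+0152 (Latin capital ligature OE) encodes to byte 16rCE (0xCE) in MacRoman.
assert '\u0152'.encode('mac_roman') == b'\xce'

# Byte 170 (0xAA) in MacRoman decodes to the trademark sign, U+2122.
assert bytes([170]).decode('mac_roman') == '\u2122'

# The CP1252 vs ISO 8859-1 mismatch lives in the 0x80-0x9F range:
# ISO 8859-1 maps those bytes to C1 control characters, while CP1252
# reuses most of them for printable characters such as the trademark sign.
assert b'\x99'.decode('latin-1') == '\u0099'   # C1 control character
assert b'\x99'.decode('cp1252') == '\u2122'    # trademark sign

print('all encoding checks pass')
```

So a table built from CP1252 disagrees with a Unicode-correct ISO 8859-1 table precisely in that 0x80-0x9F range, which is why the converted byte strings no longer match Unicode.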
And the initialization was too convoluted because it was reusing the convoluted multilingual API. My feeling is that it's an effect of the "least possible change that could possibly extend functionality".

For me, it's never enough to say "old converters were broken". There's always something to learn from a mistake, and that's why I'm asking. My feeling is that Pharo guys always sprint and never look behind. This runs the risk of repeating some mistakes...

> --
> View this message in context:
> http://forum.world.st/TextConverter-is-broken-tp4882039p4882095.html
> Sent from the Pharo Smalltalk Developers mailing list archive at
> Nabble.com.