2016-03-02 23:16 GMT+01:00 Henrik Sperre Johansen < henrik.s.johan...@veloxit.no>:
> Not sure I'd say Squeak's (5.0 at least) MacRoman conversion is free of bugs
> either, at least the "legacy" ByteTextConverter subclass in Pharo passes the
> following:
>
> "U+0152, Latin capital ligature OE is codepoint 16rCE in mac-roman"
> ((Character value: 16r0152) asString convertToEncoding: 'mac-roman') first
> charCode = 16rCE.
>
> "Codepoint 170 in MacRoman is TM sign, U+2122"
> ((Character value: 170) asString convertFromEncoding: 'mac-roman') first
> charCode = 16r2122.
>
> Cheers,
> Henry

Yes, you're right: it's because the Squeak tables did, and still do, use CP1252 instead of ISO 8859-1, and thus do not match Unicode. That might have made sense when porting from Mac to Windows while keeping ByteString, but at least since the switch to Unicode it is bogus. I guess it's still here because some in-image fonts supported CP1252, but I am too tired to check it now...

My mistake: the conversion of Unicode character 216 to MacRoman was already wrong in Pharo 1.1. It was wrong because Pharo picked a bogus table manually crafted from internet pages (from the Sophie project?). Then Sven corrected the table by automagically decoding the URL... But this didn't fix anything, because initializeLatin1MapAndEncodings was never invoked (it was already missing in Pharo 1.1). Unfortunately, those maps are a speed-up cache and will mask any correction to the table if they are not updated.

In Squeak, initializeLatin1MapAndEncodings was called from class-side initialization right from the beginning, but this was forgotten during the port to Pharo. It would be interesting to know why... Ah yes, lazy initialization made it work without the need for class initialization, but that was a one-shot gun, not robust to further table changes; that's the drawback of being lazy.

So, most probably the code was too complex, and this is enough to explain the mistakes. Why was it too complex? Because it was an optimization for speed (fast scanning of bytes NOT NEEDING ANY conversion).
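As a sanity check outside the image, both of Henry's quoted assertions, and the CP1252 vs ISO 8859-1 mismatch described above, can be reproduced with Python's built-in codecs (a minimal sketch; Python is used here only because its mapping tables for these encodings are readily scriptable):

```python
# Cross-check the MacRoman claims from the quoted Smalltalk snippets.

# U+0152 (Latin capital ligature OE) encodes to byte 16rCE (0xCE) in MacRoman.
assert '\u0152'.encode('mac_roman') == b'\xce'

# Byte 170 (0xAA) in MacRoman decodes to the trademark sign, U+2122.
assert bytes([170]).decode('mac_roman') == '\u2122'

# The CP1252 vs ISO 8859-1 mismatch lives in the 0x80-0x9F range:
# ISO 8859-1 maps those bytes to C1 control characters, while CP1252
# reuses most of them for printable characters such as the trademark sign.
assert b'\x99'.decode('latin-1') == '\u0099'   # C1 control character
assert b'\x99'.decode('cp1252') == '\u2122'    # trademark sign

print('all encoding checks pass')
```

So a table built from CP1252 disagrees with a Unicode-correct ISO 8859-1 table precisely in that 0x80-0x9F range, which is why the converted byte strings no longer match Unicode.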
And the initialization was too convoluted because it was reusing the convoluted multilingual API. My feeling is that it's an effect of the "least possible change that could possibly extend functionality".

For me, it's never enough to say "old converters were broken". There's always something to learn from a mistake, and that's why I'm asking. My feeling is that Pharo guys always sprint and never look behind. This runs the risk of repeating some mistakes...

> --
> View this message in context:
> http://forum.world.st/TextConverter-is-broken-tp4882039p4882095.html
> Sent from the Pharo Smalltalk Developers mailing list archive at
> Nabble.com.