In other words Max, if you correctly invoke MacRomanTextConverter initializeLatin1MapAndEncodings, the old converter works correctly again. Now that you know that, you can remove the old converters and keep the modernized Zn ones ;)

2016-03-03 0:56 GMT+01:00 Nicolas Cellier <nicolas.cellier.aka.n...@gmail.com>:

> 2016-03-02 23:16 GMT+01:00 Henrik Sperre Johansen <henrik.s.johan...@veloxit.no>:
>
>> I'm not sure I'd say Squeak's (5.0 at least) MacRoman conversion is free of
>> bugs either; at least the "legacy" ByteTextConverter subclass in Pharo
>> passes the following:
>>
>> "U+0152, Latin capital ligature OE, is codepoint 16rCE in mac-roman"
>> ((Character value: 16r0152) asString convertToEncoding: 'mac-roman') first
>> charCode = 16rCE.
>>
>> "Codepoint 170 in MacRoman is the TM sign, U+2122"
>> ((Character value: 170) asString convertFromEncoding: 'mac-roman') first
>> charCode = 16r2122.
>>
>> Cheers,
>> Henry
>
> Yes, you're right: it's because the Squeak tables did, and still do, use
> CP1252 instead of ISO 8859-1, and thus do not match Unicode. That might have
> made sense when porting from Mac to Windows while keeping ByteString, but at
> least since the switch to Unicode that's bogus. I guess it's still here
> because some in-image fonts would support CP1252, but I'm too tired to
> check it now...
>
> My mistake is that the Unicode character 216 -> MacRoman mapping was already
> wrong in Pharo 1.1.
> It was wrong because Pharo picked a bogus table manually crafted from
> internet pages (from the Sophie project?).
>
> Then Sven did correct the table by automagically decoding the URL...
> But this didn't fix anything, because initializeLatin1MapAndEncodings was
> never invoked (it was already missing in Pharo 1.1).
> Unfortunately, those maps are a speed-up cache and will mask any correction
> to the table if they are not updated.
>
> In Squeak, initializeLatin1MapAndEncodings was called from class-side
> initialization right from the beginning, but this was forgotten during the
> port to Pharo. It would be interesting to know why...
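[The mappings quoted above can be cross-checked outside the image. A small sketch in Python rather than Pharo, since Python's standard codecs ship `mac_roman`, `cp1252`, and `latin-1` tables derived from the published mapping files:]

```python
# Cross-check the MacRoman mappings discussed above using Python's
# built-in codecs (mac_roman, cp1252, latin-1).

# U+0152 (Latin capital ligature OE) encodes to byte 0xCE in MacRoman.
assert '\u0152'.encode('mac_roman') == b'\xce'

# Byte 170 (0xAA) in MacRoman decodes to the trademark sign, U+2122.
assert bytes([170]).decode('mac_roman') == '\u2122'

# CP1252 and ISO 8859-1 really do diverge in the 0x80-0x9F range, which
# is why a converter table built on CP1252 cannot match Unicode/Latin-1:
# 0x8C is OE in CP1252 but merely a control character in Latin-1.
assert bytes([0x8C]).decode('cp1252') == '\u0152'
assert bytes([0x8C]).decode('latin-1') == '\x8c'

print('all mappings check out')
```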
> Ah yes, lazy initialization made it work without the need for class
> initialization, but that was a one-shot gun, not robust to further table
> changes; that's the drawback of being lazy.
>
> So, most probably the code was too complex, and that is enough to explain
> the mistakes.
> Why was it too complex?
> Because it was an optimization for speed (fast scanning of bytes NOT
> NEEDING ANY conversion).
> And the initialization was too convoluted because it was reusing a
> convoluted multilingual API.
> My feeling is that it's an effect of the "least possible change that could
> possibly extend functionality".
>
> For me, it's never enough to say "the old converters were broken".
> There's always something to learn from a mistake, and that's why I'm asking.
> My feeling is that the Pharo guys always sprint and never look behind.
> That runs the risk of repeating some mistake...
>
>> --
>> View this message in context:
>> http://forum.world.st/TextConverter-is-broken-tp4882039p4882095.html
>> Sent from the Pharo Smalltalk Developers mailing list archive at
>> Nabble.com.