Stephan Eggermont-3 wrote
> On 05-12-17 08:59, Peter Uhnák wrote:
>>  > In my case, it turned out to be a non-UTF8 encoded character in one 
>> of the commit messages.
>> 
>> I've run into this problem in a sister project (tonel-migration), and do 
>> not have a proper resolution yet. I was forcing everything to be 
>> Unicode, so I need a better way to read and write encoded strings. :<
> 
> To be exact, none of the older commits will be UTF-8 encoded. For 
> most it doesn't matter as they are ASCII, but if we want to have a 
> chance of converting older French or German code (or Japanese), we need 
> support for what was done with WideString. That probably needs a look in 
> the Squeak mailing list archives.
> 
> Stephan

The mcz reader used to import the .bin file (which contained correctly
serialized WideStrings), only falling back to reading the .st file if the
.bin was not present. Has this changed?

Or do these tools explicitly ignore the .bin file and try to read the .st
file directly?
If so, the MCDataStream class that was used to read the .bin format still
seems to be in the image...

One could also create a tool to check/convert all mcz files in a repo as a
preprocessing step:
if the .bin contents decode as WideString,
check that the .st starts with a UTF-8 BOM;
if not, convert.
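
A minimal sketch of the .st half of that check, written in Python rather than
Smalltalk so it can run outside the image. It assumes an .mcz is a zip archive
whose source lives in a member named snapshot/source.st; the function names and
the result labels are hypothetical. Decoding the .bin side (to see whether it
actually holds WideStrings) is not attempted here and would still need
MCDataStream in the image, and the conversion step is left out because the
legacy encoding of a given file is not known in advance.

```python
import zipfile

UTF8_BOM = b"\xef\xbb\xbf"
# Assumption: the source file sits at this path inside the .mcz zip archive.
ST_MEMBER = "snapshot/source.st"


def classify_source(data: bytes) -> str:
    """Label the raw bytes of a .st file by how they appear to be encoded."""
    if data.startswith(UTF8_BOM):
        return "utf8-bom"        # already marked as UTF-8
    if all(byte < 0x80 for byte in data):
        return "ascii"           # encoding does not matter for these
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "legacy"          # not valid UTF-8: candidate for conversion
    return "utf8-no-bom"         # decodes as UTF-8 but carries no BOM


def check_mcz(mcz) -> str:
    """Classify the .st member of one .mcz (a path or a file-like object)."""
    with zipfile.ZipFile(mcz) as archive:
        if ST_MEMBER not in archive.namelist():
            return "no-st"
        return classify_source(archive.read(ST_MEMBER))
```

Running check_mcz over every .mcz in a repo and listing the ones that come back
as "legacy" or "utf8-no-bom" would give the set of files worth a closer look
before migration.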

Cheers,
Henry



--
Sent from: http://forum.world.st/Pharo-Smalltalk-Developers-f1294837.html
