It would be great to be able to sanitize the files.
We should add a test that covers the Tonel misbehavior.

Stef

On Thu, Dec 14, 2017 at 3:11 PM, Henrik Sperre Johansen
<[email protected]> wrote:
> Stephan Eggermont-3 wrote
>> On 05-12-17 08:59, Peter Uhnák wrote:
>>>  > In my case, it turned out to be a non-UTF8 encoded character in one
>>> of the commit messages.
>>>
>>> I've run into this problem in a sister project (tonel-migration), and do
>>> not have a proper resolution yet. I was forcing everything to be
>>> unicode, so I need a better way to read and write encoded strings. :<
>>
>> To be exact, none of the older commits will be UTF8 encoded. For
>> most it doesn't matter as they are ASCII, but if we want to have a
>> chance of converting older French or German code (or Japanese), we need
>> support for what was done with WideString. That probably needs a look in
>> the squeak mailing list archives.
>>
>> Stephan
>
> The mcz reader used to import the .bin file (which contained correctly
> serialized WideStrings), only falling back to reading the .st file if .bin
> was not present. Has this changed?
>
> Or do these tools explicitly ignore the .bin file and try to read the .st
> file directly?
> If so, the MCDataStream class used to read .bin format still seems to be in
> the image...
>
> One could also create a tool to check/convert all mcz in a repo as a
> preprocess;
> if .bin contents decode as WideString,
> check that .st starts with utf8 BOM,
> if not, convert.
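
The preprocess check described above could be sketched roughly as follows, in Python rather than Smalltalk. This is only a sketch under several assumptions: the member name `snapshot/source.st` is assumed to be where the .st source lives inside an .mcz archive, the function names are hypothetical, and "contains non-ASCII bytes" is used as a crude stand-in for "the .bin contents decode as WideString", since reading the .bin format would need an MCDataStream-compatible reader. The legacy encoding is not known in general, so the caller has to supply it.

```python
import codecs
import zipfile

# Assumed member name for the source file inside an .mcz archive.
SOURCE_MEMBER = "snapshot/source.st"


def needs_conversion(st_bytes: bytes) -> bool:
    """Return True if the bytes hold non-ASCII data but no UTF-8 BOM.

    Non-ASCII content is a stand-in heuristic here; it does not
    inspect the .bin file at all.
    """
    has_bom = st_bytes.startswith(codecs.BOM_UTF8)
    is_ascii = all(b < 0x80 for b in st_bytes)
    return not is_ascii and not has_bom


def convert(st_bytes: bytes, legacy_encoding: str) -> bytes:
    """Re-encode legacy bytes as UTF-8 with a leading BOM.

    legacy_encoding must be chosen by the caller; a wrong guess
    either raises or silently produces wrong characters.
    """
    text = st_bytes.decode(legacy_encoding)
    return codecs.BOM_UTF8 + text.encode("utf-8")


def check_mcz(mcz) -> bool:
    """Report whether the source member of one .mcz needs conversion.

    mcz may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(mcz) as archive:
        return needs_conversion(archive.read(SOURCE_MEMBER))
```

Running `check_mcz` over every .mcz in a repository would give the list of files to look at; whether `convert` is safe to apply depends on knowing which encoding the old file was actually written in.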
>
> Cheers,
> Henry
>
>
>
> --
> Sent from: http://forum.world.st/Pharo-Smalltalk-Developers-f1294837.html
>
