Hi,

Any news or experiences with the migration tools? I'll try again in the next few days.

Martín

On Sat, Dec 16, 2017 at 5:06 AM, Stephane Ducasse wrote:
> It would be great to be able to sanitize the files.
> We should get a test about Tonel misbehavior.
>
> Stef
>
> On Thu, Dec 14, 2017 at 3:11 PM, Henrik Sperre Johansen wrote:
> > Stephan Eggermont-3 wrote
> >> On 05-12-17 08:59, Peter Uhnák wrote:
> >>> > In my case, it turned out to be a non-UTF8 encoded character in one
> >>> > of the commit messages.
> >>>
> >>> I've run into this problem in a sister project (tonel-migration), and
> >>> do not have a proper resolution yet. I was forcing everything to be
> >>> Unicode, so I need a better way to read and write encoded strings. :<
> >>
> >> To be exact, none of the older commits will be UTF8 encoded. For
> >> most it doesn't matter as they are ASCII, but if we want to have a
> >> chance of converting older French or German code (or Japanese), we need
> >> support for what was done with WideString. That probably needs a look in
> >> the Squeak mailing list archives.
> >>
> >> Stephan
> >
> > The mcz reader used to import the .bin file (which contained correctly
> > serialized WideStrings), only falling back to reading the .st file if
> > .bin was not present. Has this changed?
> >
> > Or do these tools explicitly ignore the .bin file and try to read the .st
> > file directly?
> > If so, the MCDataStream class used to read the .bin format still seems to
> > be in the image...
> >
> > One could also create a tool to check/convert all mcz in a repo as a
> > preprocess;
> > if .bin contents decode as WideString,
> > check that .st starts with utf8 BOM,
> > if not, convert.
> >
> > Cheers,
> > Henry
> >
> > --
> > Sent from: http://forum.world.st/Pharo-Smalltalk-Developers-f1294837.html
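
For what it's worth, the "check that .st starts with utf8 BOM" half of Henry's preprocess idea could be sketched outside the image roughly like this. This is only a sketch in Python, not any of the existing migration tools: it assumes an .mcz is a zip archive whose source lives in a member called snapshot/source.st, and the names classify_source, check_mcz and check_repo are made up for illustration. It does not try to decode the .bin file, which would need something like MCDataStream in the image.

```python
import codecs
import zipfile
from pathlib import Path

# Assumption: the .st source sits at this path inside the .mcz zip archive.
SOURCE_MEMBER = "snapshot/source.st"


def classify_source(data: bytes) -> str:
    """Return 'ascii', 'utf8-bom', 'utf8-no-bom' or 'not-utf8' for raw source bytes."""
    has_bom = data.startswith(codecs.BOM_UTF8)
    try:
        # 'utf-8-sig' accepts the bytes with or without a leading BOM.
        text = data.decode("utf-8-sig")
    except UnicodeDecodeError:
        return "not-utf8"
    if has_bom:
        return "utf8-bom"
    return "ascii" if text.isascii() else "utf8-no-bom"


def check_mcz(path) -> str:
    """Classify the source member of one .mcz file (hypothetical helper)."""
    with zipfile.ZipFile(path) as archive:
        if SOURCE_MEMBER not in archive.namelist():
            return "no-source"
        return classify_source(archive.read(SOURCE_MEMBER))


def check_repo(repo_dir) -> dict:
    """Map each .mcz file name in repo_dir to its classification."""
    return {p.name: check_mcz(p) for p in sorted(Path(repo_dir).glob("*.mcz"))}
```

Anything reported as 'not-utf8' or 'utf8-no-bom' would be a candidate for the conversion step; plain 'ascii' files should be safe to leave alone.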
