Re: [Pharo-dev] Moving from mc to tonel?

2018-01-25 Thread Peter Uhnák
Hi Martin,

there are some pull requests regarding encoding, I finally have a bit more
time so I'll look into that in the coming days.

Peter

On Thu, Jan 25, 2018 at 3:57 AM, Martin Dias  wrote:

> Hi, any news or experiences on the migration tools? I'd try again on next
> days.
> Martín
>
> On Sat, Dec 16, 2017 at 5:06 AM, Stephane Ducasse  > wrote:
>
>> It would be great to be able to sanitize the files.
>> We should get a test about tonel misbehavior.
>>
>> Stef
>>
>> On Thu, Dec 14, 2017 at 3:11 PM, Henrik Sperre Johansen
>>  wrote:
>> > Stephan Eggermont-3 wrote
>> >> On 05-12-17 08:59, Peter Uhnák wrote:
>> >>>  > In my case, it turned out to be a non-UTF8 encoded character in one
>> >>> of the commit messages.
>> >>>
>> >>> I've ran into this problem in a sister project (tonel-migration), and
>> do
>> >>> not have a proper resolution yet. I was forcing everything to be
>> >>> unicode, so I need a better way to read and write encoded strings. :<
>> >>
>> >> To be exact, exactly none of the older commits will be UTF8 encoded.
>> For
>> >> most it doesn't matter as they are ASCII, but if we want to have a
>> >> change of converting older french or german code (or japanese), we need
>> >> support for what was done with WideString. That probably needs a look
>> in
>> >> the squeak mailing list archives.
>> >>
>> >> Stephan
>> >
>> > The mcz reader used to import the .bin file (which contained correctly
>> > serialized WideStrings), only falling back to reading the .st file if
>> .bin
>> > was not present, has this changed?
>> >
>> > Or do these tools explicitly ignore the .bin file and try to read the
>> .st
>> > file directly?
>> > If so, the MCDataStream class used to read .bin format still seems to
>> be in
>> > the image...
>> >
>> > One could also create a tool to check/convert all mcz in a repo as a
>> > preprocess;
>> > if .bin contents decode as WideString,
>> > check that .st starts with utf8 BOM,
>> > if not, convert.
>> >
>> > Cheers,
>> > Henry
>> >
>> >
>> >
>> > --
>> > Sent from: http://forum.world.st/Pharo-Smalltalk-Developers-f1294837.ht
>> ml
>> >
>>
>>
>


Re: [Pharo-dev] Moving from mc to tonel?

2018-01-24 Thread Martin Dias
Hi, any news or experiences on the migration tools? I'd try again on next
days.
Martín

On Sat, Dec 16, 2017 at 5:06 AM, Stephane Ducasse 
wrote:

> It would be great to be able to sanitize the files.
> We should get a test about tonel misbehavior.
>
> Stef
>
> On Thu, Dec 14, 2017 at 3:11 PM, Henrik Sperre Johansen
>  wrote:
> > Stephan Eggermont-3 wrote
> >> On 05-12-17 08:59, Peter Uhnák wrote:
> >>>  > In my case, it turned out to be a non-UTF8 encoded character in one
> >>> of the commit messages.
> >>>
> >>> I've ran into this problem in a sister project (tonel-migration), and
> do
> >>> not have a proper resolution yet. I was forcing everything to be
> >>> unicode, so I need a better way to read and write encoded strings. :<
> >>
> >> To be exact, exactly none of the older commits will be UTF8 encoded. For
> >> most it doesn't matter as they are ASCII, but if we want to have a
> >> change of converting older french or german code (or japanese), we need
> >> support for what was done with WideString. That probably needs a look in
> >> the squeak mailing list archives.
> >>
> >> Stephan
> >
> > The mcz reader used to import the .bin file (which contained correctly
> > serialized WideStrings), only falling back to reading the .st file if
> .bin
> > was not present, has this changed?
> >
> > Or do these tools explicitly ignore the .bin file and try to read the .st
> > file directly?
> > If so, the MCDataStream class used to read .bin format still seems to be
> in
> > the image...
> >
> > One could also create a tool to check/convert all mcz in a repo as a
> > preprocess;
> > if .bin contents decode as WideString,
> > check that .st starts with utf8 BOM,
> > if not, convert.
> >
> > Cheers,
> > Henry
> >
> >
> >
> > --
> > Sent from: http://forum.world.st/Pharo-Smalltalk-Developers-f1294837.
> html
> >
>
>


Re: [Pharo-dev] Moving from mc to tonel?

2017-12-16 Thread Stephane Ducasse
It would be great to be able to sanitize the files.
We should get a test about tonel misbehavior.

Stef

On Thu, Dec 14, 2017 at 3:11 PM, Henrik Sperre Johansen
 wrote:
> Stephan Eggermont-3 wrote
>> On 05-12-17 08:59, Peter Uhnák wrote:
>>>  > In my case, it turned out to be a non-UTF8 encoded character in one
>>> of the commit messages.
>>>
>>> I've ran into this problem in a sister project (tonel-migration), and do
>>> not have a proper resolution yet. I was forcing everything to be
>>> unicode, so I need a better way to read and write encoded strings. :<
>>
>> To be exact, exactly none of the older commits will be UTF8 encoded. For
>> most it doesn't matter as they are ASCII, but if we want to have a
>> change of converting older french or german code (or japanese), we need
>> support for what was done with WideString. That probably needs a look in
>> the squeak mailing list archives.
>>
>> Stephan
>
> The mcz reader used to import the .bin file (which contained correctly
> serialized WideStrings), only falling back to reading the .st file if .bin
> was not present, has this changed?
>
> Or do these tools explicitly ignore the .bin file and try to read the .st
> file directly?
> If so, the MCDataStream class used to read .bin format still seems to be in
> the image...
>
> One could also create a tool to check/convert all mcz in a repo as a
> preprocess;
> if .bin contents decode as WideString,
> check that .st starts with utf8 BOM,
> if not, convert.
>
> Cheers,
> Henry
>
>
>
> --
> Sent from: http://forum.world.st/Pharo-Smalltalk-Developers-f1294837.html
>



Re: [Pharo-dev] Moving from mc to tonel?

2017-12-14 Thread Henrik Sperre Johansen
Stephan Eggermont-3 wrote
> On 05-12-17 08:59, Peter Uhnák wrote:
>>  > In my case, it turned out to be a non-UTF8 encoded character in one 
>> of the commit messages.
>> 
>> I've ran into this problem in a sister project (tonel-migration), and do 
>> not have a proper resolution yet. I was forcing everything to be 
>> unicode, so I need a better way to read and write encoded strings. :<
> 
> To be exact, exactly none of the older commits will be UTF8 encoded. For 
> most it doesn't matter as they are ASCII, but if we want to have a 
> change of converting older french or german code (or japanese), we need 
> support for what was done with WideString. That probably needs a look in 
> the squeak mailing list archives.
> 
> Stephan

The mcz reader used to import the .bin file (which contained correctly
serialized WideStrings), only falling back to reading the .st file if .bin
was not present, has this changed?

Or do these tools explicitly ignore the .bin file and try to read the .st
file directly?
If so, the MCDataStream class used to read .bin format still seems to be in
the image...

One could also create a tool to check/convert all mcz in a repo as a
preprocess;
if .bin contents decode as WideString, 
check that .st starts with utf8 BOM, 
if not, convert.

Cheers,
Henry



--
Sent from: http://forum.world.st/Pharo-Smalltalk-Developers-f1294837.html



Re: [Pharo-dev] Moving from mc to tonel?

2017-12-13 Thread Nicolas Cellier
2017-12-13 11:09 GMT+01:00 stephan :

> On 05-12-17 08:59, Peter Uhnák wrote:
>
>>  > In my case, it turned out to be a non-UTF8 encoded character in one of
>> the commit messages.
>>
>> I've ran into this problem in a sister project (tonel-migration), and do
>> not have a proper resolution yet. I was forcing everything to be unicode,
>> so I need a better way to read and write encoded strings. :<
>>
>
> To be exact, exactly none of the older commits will be UTF8 encoded. For
> most it doesn't matter as they are ASCII, but if we want to have a change
> of converting older french or german code (or japanese), we need support
> for what was done with WideString. That probably needs a look in the squeak
> mailing list archives.
>
> Stephan
>
>
> The thread would be"MC should really write snaphsot/source.st in UTF8"
end of may 2013.
I've made the switch in Squeak short after (june 2013).
I've propsed a fix for Pharo (issues 10801 and 10811) that has been
integrated in 3.0 end of june 2013.

So the .mcz created before this might be problematic...

As I've just reported here, I think there are problems in tonel
(like if it was reading and writing with different encoding...)


Re: [Pharo-dev] Moving from mc to tonel?

2017-12-13 Thread stephan

On 05-12-17 08:59, Peter Uhnák wrote:
 > In my case, it turned out to be a non-UTF8 encoded character in one 
of the commit messages.


I've ran into this problem in a sister project (tonel-migration), and do 
not have a proper resolution yet. I was forcing everything to be 
unicode, so I need a better way to read and write encoded strings. :<


To be exact, exactly none of the older commits will be UTF8 encoded. For 
most it doesn't matter as they are ASCII, but if we want to have a 
change of converting older french or german code (or japanese), we need 
support for what was done with WideString. That probably needs a look in 
the squeak mailing list archives.


Stephan




Re: [Pharo-dev] Moving from mc to tonel?

2017-12-05 Thread Alistair Grant
Hi Peter,

On 5 December 2017 at 11:55, Peter Uhnák  wrote:
>> Would you be open to a PR that incorporates Sven's changes suggested
>> above?
>
> Certainly. One thing to keep in mind is that I had to modify
> MemoryStore/MemoryHandle so it can process unicode at all, but it also
> forces everything to be unicode.

This is probably a good thing.  The fast-import documentation says
that it is strict about character encoding and only accepts UTF-8.

I'll let you know when the PRs are ready.

Thanks,
Alistair


>> I hate having to deal with string encoding, and know as little as
>> possible. :-)
>
> Apparently me taking this approach resulted in the current situation. :-D
>
> Peter
>
> On Tue, Dec 5, 2017 at 10:27 AM, Alistair Grant 
> wrote:
>>
>> On Tue, Dec 05, 2017 at 08:56:53AM +0100, Sven Van Caekenberghe wrote:
>> >
>> > > On 5 Dec 2017, at 08:34, Alistair Grant  wrote:
>> > >
>> > > On 5 December 2017 at 03:41, Martin Dias  wrote:
>> > >> I suspect it's related to the large number of commits in my repo. I
>> > >> made
>> > >> some tweaks and succeeded to create the fast-import file. But I get:
>> > >>
>> > >> fatal: Unsupported command: .
>> > >> fast-import: dumping crash report to .git/fast_import_crash_10301
>> > >>
>> > >> Do you recognize this error?
>> > >> I will check my changes tweaking the git-migration tool to see if I
>> > >> modified
>> > >> some behavior my mistake...
>> > >
>> > > I had the same error just last night.
>> > >
>> > > In my case, it turned out to be a non-UTF8 encoded character in one of
>> > > the commit messages.
>> > >
>> > > I tracked it down by looking at the crash report and searching for a
>> > > nearby command.  I've deleted the crash reports now, but I think it
>> > > was the number associated with a mark command that got me near the
>> > > problem character in the fast-import file.
>> > >
>> > > I also modified the code to halt whenever it found a non-UTF8
>> > > character.  I'm sure there are better ways to do this, but:
>> > >
>> > >
>> > > GitMigrationCommitInfo>>inlineDataFor: aString
>> > >
>> > >| theString |
>> > >theString := aString.
>> > >"Ensure the message has valid UTF-8 encoding (this will raise an
>> > > error if it doesn't)"
>> > >[ (ZnCharacterEncoder newForEncoding: 'utf8') decodeBytes: aString
>> > > asByteArray ]
>> > >on: Error
>> > >do: [ :ex | self halt: 'Illegal string encoding'.
>> > >theString := aString select: [ :each | each asciiValue
>> > > between: 32 and: 128 ] ].
>> > >^ 'data ' , theString size asString , String cr , (theString
>> > > ifEmpty: [ '' ] ifNotEmpty: [ theString , String cr ])
>> >
>> > There is also ByteArray>>#utf8Decoded (as well as
>> > String>>#utf8Encoded), both using the newer ZnCharacterEncoders. So
>> > you could write it shorter as:
>> >
>> >   aString asByteArray utf8Decoded
>> >
>> > Instead of Error you could use the more intention revealing
>> > ZnCharacterEncodingError.
>>
>> I agree, this was a hack to get me past the immediate issue and see if
>> I could get the migration to work.
>>
>>
>> > Apart from that, and I known you did not create this mess, encoding a
>> > String into a String, although it can be done, is so wrong. You encode
>> > a String (a collection of Characters, of Unicode code points) into a
>> > ByteArray and you decode a ByteArray into a String.
>> >
>> > Sending #asByteArray to a String in almost always wrong, as is sending
>> > #asString to a ByteArray. These are implicit conversions (null
>> > conversions, like ZnNullEncoder) that only work for pure ASCII or
>> > Latin1 (iso-8859-1), but not for the much richer set of Characters
>> > that Pharo supports. So these will eventually fail, one day, in a
>> > country far away.
>>
>> I hate having to deal with string encoding, and know as little as
>> possible. :-)
>>
>>
>>
>> On Tue, Dec 05, 2017 at 08:59:34AM +0100, Peter Uhn??k wrote:
>> > > In my case, it turned out to be a non-UTF8 encoded character in one of
>> > > the
>> > commit messages.
>> >
>> > I've ran into this problem in a sister project (tonel-migration), and do
>> > not
>> > have a proper resolution yet. I was forcing everything to be unicode, so
>> > I need
>> > a better way to read and write encoded strings. :<
>>
>> I think this is always going to be a tricky problem since the file may
>> simply be corrupt, and the resolution won't always be the same.  In my
>> case I was happy to simply remove the offending characters.
>>
>> I do think it would be nice if the migration tool at least stopped and
>> warned the user that the input is invalid.
>>
>> Would you be open to a PR that incorporates Sven's changes suggested
>> above?
>>
>>
>>
>> On Tue, Dec 05, 2017 at 09:05:24AM +0100, Sven Van Caekenberghe wrote:
>> >
>> > Maybe ZnCharacterEncoder class>>#detectEncoding: can help, although it
>> > is far from perfect.
>>
>> In this particular 

Re: [Pharo-dev] Moving from mc to tonel?

2017-12-05 Thread Peter Uhnák
> Would you be open to a PR that incorporates Sven's changes suggested
above?

Certainly. One thing to keep in mind is that I had to modify
MemoryStore/MemoryHandle so it can process unicode at all, but it also
forces everything to be unicode.

> I hate having to deal with string encoding, and know as little as possible.
:-)

Apparently me taking this approach resulted in the current situation. :-D

Peter

On Tue, Dec 5, 2017 at 10:27 AM, Alistair Grant 
wrote:

> On Tue, Dec 05, 2017 at 08:56:53AM +0100, Sven Van Caekenberghe wrote:
> >
> > > On 5 Dec 2017, at 08:34, Alistair Grant  wrote:
> > >
> > > On 5 December 2017 at 03:41, Martin Dias  wrote:
> > >> I suspect it's related to the large number of commits in my repo. I
> made
> > >> some tweaks and succeeded to create the fast-import file. But I get:
> > >>
> > >> fatal: Unsupported command: .
> > >> fast-import: dumping crash report to .git/fast_import_crash_10301
> > >>
> > >> Do you recognize this error?
> > >> I will check my changes tweaking the git-migration tool to see if I
> modified
> > >> some behavior my mistake...
> > >
> > > I had the same error just last night.
> > >
> > > In my case, it turned out to be a non-UTF8 encoded character in one of
> > > the commit messages.
> > >
> > > I tracked it down by looking at the crash report and searching for a
> > > nearby command.  I've deleted the crash reports now, but I think it
> > > was the number associated with a mark command that got me near the
> > > problem character in the fast-import file.
> > >
> > > I also modified the code to halt whenever it found a non-UTF8
> > > character.  I'm sure there are better ways to do this, but:
> > >
> > >
> > > GitMigrationCommitInfo>>inlineDataFor: aString
> > >
> > >| theString |
> > >theString := aString.
> > >"Ensure the message has valid UTF-8 encoding (this will raise an
> > > error if it doesn't)"
> > >[ (ZnCharacterEncoder newForEncoding: 'utf8') decodeBytes: aString
> > > asByteArray ]
> > >on: Error
> > >do: [ :ex | self halt: 'Illegal string encoding'.
> > >theString := aString select: [ :each | each asciiValue
> > > between: 32 and: 128 ] ].
> > >^ 'data ' , theString size asString , String cr , (theString
> > > ifEmpty: [ '' ] ifNotEmpty: [ theString , String cr ])
> >
> > There is also ByteArray>>#utf8Decoded (as well as
> > String>>#utf8Encoded), both using the newer ZnCharacterEncoders. So
> > you could write it shorter as:
> >
> >   aString asByteArray utf8Decoded
> >
> > Instead of Error you could use the more intention revealing
> ZnCharacterEncodingError.
>
> I agree, this was a hack to get me past the immediate issue and see if
> I could get the migration to work.
>
>
> > Apart from that, and I known you did not create this mess, encoding a
> > String into a String, although it can be done, is so wrong. You encode
> > a String (a collection of Characters, of Unicode code points) into a
> > ByteArray and you decode a ByteArray into a String.
> >
> > Sending #asByteArray to a String in almost always wrong, as is sending
> > #asString to a ByteArray. These are implicit conversions (null
> > conversions, like ZnNullEncoder) that only work for pure ASCII or
> > Latin1 (iso-8859-1), but not for the much richer set of Characters
> > that Pharo supports. So these will eventually fail, one day, in a
> > country far away.
>
> I hate having to deal with string encoding, and know as little as
> possible. :-)
>
>
>
> On Tue, Dec 05, 2017 at 08:59:34AM +0100, Peter Uhn??k wrote:
> > > In my case, it turned out to be a non-UTF8 encoded character in one of
> the
> > commit messages.
> >
> > I've ran into this problem in a sister project (tonel-migration), and do
> not
> > have a proper resolution yet. I was forcing everything to be unicode, so
> I need
> > a better way to read and write encoded strings. :<
>
> I think this is always going to be a tricky problem since the file may
> simply be corrupt, and the resolution won't always be the same.  In my
> case I was happy to simply remove the offending characters.
>
> I do think it would be nice if the migration tool at least stopped and
> warned the user that the input is invalid.
>
> Would you be open to a PR that incorporates Sven's changes suggested
> above?
>
>
>
> On Tue, Dec 05, 2017 at 09:05:24AM +0100, Sven Van Caekenberghe wrote:
> >
> > Maybe ZnCharacterEncoder class>>#detectEncoding: can help, although it
> > is far from perfect.
>
> In this particular case I think it is better to try decoding with UTF8,
> since #detectEncoding: isn't completely reliable and we just want to
> know if it isn't valid UTF8.
>
>
> Cheers,
> Alistair
>
>


Re: [Pharo-dev] Moving from mc to tonel?

2017-12-05 Thread Alistair Grant
On Tue, Dec 05, 2017 at 08:56:53AM +0100, Sven Van Caekenberghe wrote:
> 
> > On 5 Dec 2017, at 08:34, Alistair Grant  wrote:
> > 
> > On 5 December 2017 at 03:41, Martin Dias  wrote:
> >> I suspect it's related to the large number of commits in my repo. I made
> >> some tweaks and succeeded to create the fast-import file. But I get:
> >> 
> >> fatal: Unsupported command: .
> >> fast-import: dumping crash report to .git/fast_import_crash_10301
> >> 
> >> Do you recognize this error?
> >> I will check my changes tweaking the git-migration tool to see if I 
> >> modified
> >> some behavior my mistake...
> > 
> > I had the same error just last night.
> > 
> > In my case, it turned out to be a non-UTF8 encoded character in one of
> > the commit messages.
> > 
> > I tracked it down by looking at the crash report and searching for a
> > nearby command.  I've deleted the crash reports now, but I think it
> > was the number associated with a mark command that got me near the
> > problem character in the fast-import file.
> > 
> > I also modified the code to halt whenever it found a non-UTF8
> > character.  I'm sure there are better ways to do this, but:
> > 
> > 
> > GitMigrationCommitInfo>>inlineDataFor: aString
> > 
> >| theString |
> >theString := aString.
> >"Ensure the message has valid UTF-8 encoding (this will raise an
> > error if it doesn't)"
> >[ (ZnCharacterEncoder newForEncoding: 'utf8') decodeBytes: aString
> > asByteArray ]
> >on: Error
> >do: [ :ex | self halt: 'Illegal string encoding'.
> >theString := aString select: [ :each | each asciiValue
> > between: 32 and: 128 ] ].
> >^ 'data ' , theString size asString , String cr , (theString
> > ifEmpty: [ '' ] ifNotEmpty: [ theString , String cr ])
> 
> There is also ByteArray>>#utf8Decoded (as well as 
> String>>#utf8Encoded), both using the newer ZnCharacterEncoders. So 
> you could write it shorter as:
> 
>   aString asByteArray utf8Decoded
> 
> Instead of Error you could use the more intention revealing 
> ZnCharacterEncodingError.

I agree, this was a hack to get me past the immediate issue and see if 
I could get the migration to work.


> Apart from that, and I known you did not create this mess, encoding a 
> String into a String, although it can be done, is so wrong. You encode 
> a String (a collection of Characters, of Unicode code points) into a 
> ByteArray and you decode a ByteArray into a String.
> 
> Sending #asByteArray to a String in almost always wrong, as is sending 
> #asString to a ByteArray. These are implicit conversions (null 
> conversions, like ZnNullEncoder) that only work for pure ASCII or 
> Latin1 (iso-8859-1), but not for the much richer set of Characters 
> that Pharo supports. So these will eventually fail, one day, in a 
> country far away.

I hate having to deal with string encoding, and know as little as 
possible. :-)



On Tue, Dec 05, 2017 at 08:59:34AM +0100, Peter Uhn??k wrote:
> > In my case, it turned out to be a non-UTF8 encoded character in one of the
> commit messages.
> 
> I've ran into this problem in a sister project (tonel-migration), and do not
> have a proper resolution yet. I was forcing everything to be unicode, so I 
> need
> a better way to read and write encoded strings. :<

I think this is always going to be a tricky problem since the file may 
simply be corrupt, and the resolution won't always be the same.  In my 
case I was happy to simply remove the offending characters.

I do think it would be nice if the migration tool at least stopped and 
warned the user that the input is invalid.

Would you be open to a PR that incorporates Sven's changes suggested 
above?



On Tue, Dec 05, 2017 at 09:05:24AM +0100, Sven Van Caekenberghe wrote:
> 
> Maybe ZnCharacterEncoder class>>#detectEncoding: can help, although it 
> is far from perfect.

In this particular case I think it is better to try decoding with UTF8, 
since #detectEncoding: isn't completely reliable and we just want to 
know if it isn't valid UTF8.


Cheers,
Alistair



Re: [Pharo-dev] Moving from mc to tonel?

2017-12-05 Thread Sven Van Caekenberghe


> On 5 Dec 2017, at 08:59, Peter Uhnák  wrote:
> 
> > In my case, it turned out to be a non-UTF8 encoded character in one of the 
> > commit messages.
> 
> I've ran into this problem in a sister project (tonel-migration), and do not 
> have a proper resolution yet. I was forcing everything to be unicode, so I 
> need a better way to read and write encoded strings. :<

Maybe ZnCharacterEncoder class>>#detectEncoding: can help, although it is far 
from perfect.

> On Tue, Dec 5, 2017 at 8:56 AM, Sven Van Caekenberghe  wrote:
> 
> 
> > On 5 Dec 2017, at 08:34, Alistair Grant  wrote:
> >
> > On 5 December 2017 at 03:41, Martin Dias  wrote:
> >> I suspect it's related to the large number of commits in my repo. I made
> >> some tweaks and succeeded to create the fast-import file. But I get:
> >>
> >> fatal: Unsupported command: .
> >> fast-import: dumping crash report to .git/fast_import_crash_10301
> >>
> >> Do you recognize this error?
> >> I will check my changes tweaking the git-migration tool to see if I 
> >> modified
> >> some behavior my mistake...
> >
> > I had the same error just last night.
> >
> > In my case, it turned out to be a non-UTF8 encoded character in one of
> > the commit messages.
> >
> > I tracked it down by looking at the crash report and searching for a
> > nearby command.  I've deleted the crash reports now, but I think it
> > was the number associated with a mark command that got me near the
> > problem character in the fast-import file.
> >
> > I also modified the code to halt whenever it found a non-UTF8
> > character.  I'm sure there are better ways to do this, but:
> >
> >
> > GitMigrationCommitInfo>>inlineDataFor: aString
> >
> >| theString |
> >theString := aString.
> >"Ensure the message has valid UTF-8 encoding (this will raise an
> > error if it doesn't)"
> >[ (ZnCharacterEncoder newForEncoding: 'utf8') decodeBytes: aString
> > asByteArray ]
> >on: Error
> >do: [ :ex | self halt: 'Illegal string encoding'.
> >theString := aString select: [ :each | each asciiValue
> > between: 32 and: 128 ] ].
> >^ 'data ' , theString size asString , String cr , (theString
> > ifEmpty: [ '' ] ifNotEmpty: [ theString , String cr ])
> 
> There is also ByteArray>>#utf8Decoded (as well as String>>#utf8Encoded), both 
> using the newer ZnCharacterEncoders. So you could write it shorter as:
> 
>   aString asByteArray utf8Decoded
> 
> Instead of Error you could use the more intention revealing 
> ZnCharacterEncodingError.
> 
> Apart from that, and I known you did not create this mess, encoding a String 
> into a String, although it can be done, is so wrong. You encode a String (a 
> collection of Characters, of Unicode code points) into a ByteArray and you 
> decode a ByteArray into a String.
> 
> Sending #asByteArray to a String in almost always wrong, as is sending 
> #asString to a ByteArray. These are implicit conversions (null conversions, 
> like ZnNullEncoder) that only work for pure ASCII or Latin1 (iso-8859-1), but 
> not for the much richer set of Characters that Pharo supports. So these will 
> eventually fail, one day, in a country far away.
> 
> > Cheers,
> > Alistair
> >
> 
> 
> 




Re: [Pharo-dev] Moving from mc to tonel?

2017-12-05 Thread Peter Uhnák
> In my case, it turned out to be a non-UTF8 encoded character in one of the
commit messages.

I've ran into this problem in a sister project (tonel-migration), and do
not have a proper resolution yet. I was forcing everything to be unicode,
so I need a better way to read and write encoded strings. :<

On Tue, Dec 5, 2017 at 8:56 AM, Sven Van Caekenberghe  wrote:

>
>
> > On 5 Dec 2017, at 08:34, Alistair Grant  wrote:
> >
> > On 5 December 2017 at 03:41, Martin Dias  wrote:
> >> I suspect it's related to the large number of commits in my repo. I made
> >> some tweaks and succeeded to create the fast-import file. But I get:
> >>
> >> fatal: Unsupported command: .
> >> fast-import: dumping crash report to .git/fast_import_crash_10301
> >>
> >> Do you recognize this error?
> >> I will check my changes tweaking the git-migration tool to see if I
> modified
> >> some behavior my mistake...
> >
> > I had the same error just last night.
> >
> > In my case, it turned out to be a non-UTF8 encoded character in one of
> > the commit messages.
> >
> > I tracked it down by looking at the crash report and searching for a
> > nearby command.  I've deleted the crash reports now, but I think it
> > was the number associated with a mark command that got me near the
> > problem character in the fast-import file.
> >
> > I also modified the code to halt whenever it found a non-UTF8
> > character.  I'm sure there are better ways to do this, but:
> >
> >
> > GitMigrationCommitInfo>>inlineDataFor: aString
> >
> >| theString |
> >theString := aString.
> >"Ensure the message has valid UTF-8 encoding (this will raise an
> > error if it doesn't)"
> >[ (ZnCharacterEncoder newForEncoding: 'utf8') decodeBytes: aString
> > asByteArray ]
> >on: Error
> >do: [ :ex | self halt: 'Illegal string encoding'.
> >theString := aString select: [ :each | each asciiValue
> > between: 32 and: 128 ] ].
> >^ 'data ' , theString size asString , String cr , (theString
> > ifEmpty: [ '' ] ifNotEmpty: [ theString , String cr ])
>
> There is also ByteArray>>#utf8Decoded (as well as String>>#utf8Encoded),
> both using the newer ZnCharacterEncoders. So you could write it shorter as:
>
>   aString asByteArray utf8Decoded
>
> Instead of Error you could use the more intention revealing
> ZnCharacterEncodingError.
>
> Apart from that, and I known you did not create this mess, encoding a
> String into a String, although it can be done, is so wrong. You encode a
> String (a collection of Characters, of Unicode code points) into a
> ByteArray and you decode a ByteArray into a String.
>
> Sending #asByteArray to a String in almost always wrong, as is sending
> #asString to a ByteArray. These are implicit conversions (null conversions,
> like ZnNullEncoder) that only work for pure ASCII or Latin1 (iso-8859-1),
> but not for the much richer set of Characters that Pharo supports. So these
> will eventually fail, one day, in a country far away.
>
> > Cheers,
> > Alistair
> >
>
>
>


Re: [Pharo-dev] Moving from mc to tonel?

2017-12-04 Thread Peter Uhnák
> I suspect it's related to the large number of commits in my repo. I made
> some tweaks and succeeded to create the fast-import file. But I get:
>

All files in a single commit will be held temporarily in MemoryStore, so
unless you have GBs of code in _single_ commit it shouldn't be a problem.
There's also a possiblity that Pharo was going too fast and didn't manage
to GC it in time or there's a memory leak.


>
> fatal: Unsupported command: .
> fast-import: dumping crash report to .git/fast_import_crash_10301
>
> Do you recognize this error?
>


Please look into the mentioned file, it should give you a context for the
error (and then you can look into the import file around the same context).

Peter


Re: [Pharo-dev] Moving from mc to tonel?

2017-12-04 Thread Sven Van Caekenberghe


> On 5 Dec 2017, at 08:34, Alistair Grant  wrote:
> 
> On 5 December 2017 at 03:41, Martin Dias  wrote:
>> I suspect it's related to the large number of commits in my repo. I made
>> some tweaks and succeeded to create the fast-import file. But I get:
>> 
>> fatal: Unsupported command: .
>> fast-import: dumping crash report to .git/fast_import_crash_10301
>> 
>> Do you recognize this error?
>> I will check my changes tweaking the git-migration tool to see if I modified
>> some behavior my mistake...
> 
> I had the same error just last night.
> 
> In my case, it turned out to be a non-UTF8 encoded character in one of
> the commit messages.
> 
> I tracked it down by looking at the crash report and searching for a
> nearby command.  I've deleted the crash reports now, but I think it
> was the number associated with a mark command that got me near the
> problem character in the fast-import file.
> 
> I also modified the code to halt whenever it found a non-UTF8
> character.  I'm sure there are better ways to do this, but:
> 
> 
> GitMigrationCommitInfo>>inlineDataFor: aString
> 
>| theString |
>theString := aString.
>"Ensure the message has valid UTF-8 encoding (this will raise an
> error if it doesn't)"
>[ (ZnCharacterEncoder newForEncoding: 'utf8') decodeBytes: aString
> asByteArray ]
>on: Error
>do: [ :ex | self halt: 'Illegal string encoding'.
>theString := aString select: [ :each | each asciiValue
> between: 32 and: 128 ] ].
>^ 'data ' , theString size asString , String cr , (theString
> ifEmpty: [ '' ] ifNotEmpty: [ theString , String cr ])

There is also ByteArray>>#utf8Decoded (as well as String>>#utf8Encoded), both 
using the newer ZnCharacterEncoders. So you could write it shorter as:

  aString asByteArray utf8Decoded

Instead of Error you could use the more intention revealing 
ZnCharacterEncodingError.

Apart from that, and I known you did not create this mess, encoding a String 
into a String, although it can be done, is so wrong. You encode a String (a 
collection of Characters, of Unicode code points) into a ByteArray and you 
decode a ByteArray into a String.

Sending #asByteArray to a String in almost always wrong, as is sending 
#asString to a ByteArray. These are implicit conversions (null conversions, 
like ZnNullEncoder) that only work for pure ASCII or Latin1 (iso-8859-1), but 
not for the much richer set of Characters that Pharo supports. So these will 
eventually fail, one day, in a country far away.

> Cheers,
> Alistair
> 




Re: [Pharo-dev] Moving from mc to tonel?

2017-12-04 Thread Alistair Grant
On 5 December 2017 at 03:41, Martin Dias  wrote:
> I suspect it's related to the large number of commits in my repo. I made
> some tweaks and succeeded to create the fast-import file. But I get:
>
> fatal: Unsupported command: .
> fast-import: dumping crash report to .git/fast_import_crash_10301
>
> Do you recognize this error?
> I will check my changes tweaking the git-migration tool to see if I modified
> some behavior my mistake...

I had the same error just last night.

In my case, it turned out to be a non-UTF8 encoded character in one of
the commit messages.

I tracked it down by looking at the crash report and searching for a
nearby command.  I've deleted the crash reports now, but I think it
was the number associated with a mark command that got me near the
problem character in the fast-import file.

I also modified the code to halt whenever it found a non-UTF8
character.  I'm sure there are better ways to do this, but:


GitMigrationCommitInfo>>inlineDataFor: aString

| theString |
theString := aString.
"Ensure the message has valid UTF-8 encoding (this will raise an
error if it doesn't)"
[ (ZnCharacterEncoder newForEncoding: 'utf8') decodeBytes: aString
asByteArray ]
on: Error
do: [ :ex | self halt: 'Illegal string encoding'.
theString := aString select: [ :each | each asciiValue
between: 32 and: 128 ] ].
^ 'data ' , theString size asString , String cr , (theString
ifEmpty: [ '' ] ifNotEmpty: [ theString , String cr ])



Cheers,
Alistair



Re: [Pharo-dev] Moving from mc to tonel?

2017-12-04 Thread Martin Dias
I suspect it's related to the large number of commits in my repo. I made
some tweaks and succeeded to create the fast-import file. But I get:

fatal: Unsupported command: .
fast-import: dumping crash report to .git/fast_import_crash_10301

Do you recognize this error?
I will check my changes tweaking the git-migration tool to see if I
modified some behavior my mistake...

cheers,
Martín

On Fri, Dec 1, 2017 at 1:42 PM, Martin Dias  wrote:

> Julien, I dodn't know it but anyway my issue was just in the step
> smalltalkhub -> filtree using the migration tool.
>
> Peter, yes it's public. This is the code I used:
>
> 
>
> migration := GitMigration on: 'MartinDias/Epicea'.
> migration cacheAllVersions.
> migration authors: {
> 'md' -> #('Martin Dias' '').
> 'MD' -> #('Martin Dias' '').
> 'MartinDias' -> #('Martin Dias' '').
> 'PabloTesone' -> #('Pablo Tesone' '').
> 'PavelKrivanek' -> #('Pavel Krivanek' '').
> 'DamienCassou' -> #('Damien Cassou' '').
> 'DenisKudryashov' -> #('Denis Kudryashov' '').
> 'GustavoSantos' -> #('Gustavo Santos' '').
> 'ThomasHeniart' -> #('Thomas Heniart' '').
> 'LucasGodoy' -> #('Lucas Godoy' '').
> 'StephaneDucasse' -> #('Stephane Ducasse' '').
> 'VincentBlondeau' -> #('Vincent Blondeau' '').
> 'AlbertoBacchelli' -> #('Alberto Bacchelli' '').
> 'SkipLentz' -> #('Skip Lentz' '').
> 'TheIntegrator' -> #('The Integrator' '').
> }.l
> migration
> fastImportCodeToDirectory: 'src'
> initialCommit: '18a30d8d'
> to: './epicea-fast-import.txt'
>
> --
>
> Martín
>
> On Thu, Nov 30, 2017 at 2:14 PM, Julien 
> wrote:
>
>> If you use Traits and want to have Epicea loadable from GitHub on Pharo
>> 6.1, it will not work.
>>
>> There is an issue when you try to load a Trait in Tonel format in the
>> image.
>>
>> ---
>> Julien Delplanque
>> Doctorant à l’Université de Lille 1
>> http://juliendelplanque.be/phd.html
>> Equipe Rmod, Inria
>> Bâtiment B 40, avenue Halley 59650
>> 
>>  Villeneuve
>> 
>>  d'Ascq
>> 
>> Numéro de téléphone: +333 59 35 86 40
>>
>> Le 6 nov. 2017 à 01:42, Martin Dias  a écrit :
>>
>> Hi all, I'm considering to move Epicea (+Ombu+Hiedra) to github. I read
>> that tonel is the new format and there is Peter's git-migration, which
>> could be the tool to do it.
>>
>> Is it good idea to move now?
>> Is it the right tool?
>>
>> Bests,
>> Martín
>>
>>
>>
>


Re: [Pharo-dev] Moving from mc to tonel?

2017-12-01 Thread Martin Dias
Julien, I dodn't know it but anyway my issue was just in the step
smalltalkhub -> filtree using the migration tool.

Peter, yes it's public. This is the code I used:



migration := GitMigration on: 'MartinDias/Epicea'.
migration cacheAllVersions.
migration authors: {
'md' -> #('Martin Dias' '').
'MD' -> #('Martin Dias' '').
'MartinDias' -> #('Martin Dias' '').
'PabloTesone' -> #('Pablo Tesone' '').
'PavelKrivanek' -> #('Pavel Krivanek' '').
'DamienCassou' -> #('Damien Cassou' '').
'DenisKudryashov' -> #('Denis Kudryashov' '').
'GustavoSantos' -> #('Gustavo Santos' '').
'ThomasHeniart' -> #('Thomas Heniart' '').
'LucasGodoy' -> #('Lucas Godoy' '').
'StephaneDucasse' -> #('Stephane Ducasse' '').
'VincentBlondeau' -> #('Vincent Blondeau' '').
'AlbertoBacchelli' -> #('Alberto Bacchelli' '').
'SkipLentz' -> #('Skip Lentz' '').
'TheIntegrator' -> #('The Integrator' '').
}.l
migration
fastImportCodeToDirectory: 'src'
initialCommit: '18a30d8d'
to: './epicea-fast-import.txt'

--

Martín

On Thu, Nov 30, 2017 at 2:14 PM, Julien  wrote:

> If you use Traits and want to have Epicea loadable from GitHub on Pharo
> 6.1, it will not work.
>
> There is an issue when you try to load a Trait in Tonel format in the
> image.
>
> ---
> Julien Delplanque
> Doctorant à l’Université de Lille 1
> http://juliendelplanque.be/phd.html
> Equipe Rmod, Inria
> Bâtiment B 40, avenue Halley 59650
> 
>  Villeneuve
> 
>  d'Ascq
> 
> Numéro de téléphone: +333 59 35 86 40
>
> Le 6 nov. 2017 à 01:42, Martin Dias  a écrit :
>
> Hi all, I'm considering to move Epicea (+Ombu+Hiedra) to github. I read
> that tonel is the new format and there is Peter's git-migration, which
> could be the tool to do it.
>
> Is it good idea to move now?
> Is it the right tool?
>
> Bests,
> Martín
>
>
>


Re: [Pharo-dev] Moving from mc to tonel?

2017-11-30 Thread Julien
If you use Traits and want to have Epicea loadable from GitHub on Pharo 6.1, it 
will not work.

There is an issue when you try to load a Trait in Tonel format in the image.

---
Julien Delplanque
Doctorant à l’Université de Lille 1
http://juliendelplanque.be/phd.html
Equipe Rmod, Inria
Bâtiment B 40, avenue Halley 59650 Villeneuve d'Ascq
Numéro de téléphone: +333 59 35 86 40

> Le 6 nov. 2017 à 01:42, Martin Dias  a écrit :
> 
> Hi all, I'm considering to move Epicea (+Ombu+Hiedra) to github. I read that 
> tonel is the new format and there is Peter's git-migration, which could be 
> the tool to do it. 
> 
> Is it good idea to move now?
> Is it the right tool?
> 
> Bests,
> Martín



Re: [Pharo-dev] Moving from mc to tonel?

2017-11-30 Thread Peter Uhnák
That sounds like a bug. Is the repo you are trying to migrate public?

On Thu, Nov 30, 2017 at 6:01 PM, Martin Dias  wrote:

> Hi:
>
> 1. thanks for the tool and information.
> 2. I forgot to tell you that I tried the tool some weeks ago and it ran
> out of memory. I should try again with more memory or somehting different...
>
> Cheers,
> Martin
>
> On Mon, Nov 6, 2017 at 5:49 AM, Peter Uhnák  wrote:
>
>> Hi Martin,
>>
>> several people successfully migrated with git-migration... so I guess it
>> works.
>>
>> > Is it good idea to move now?
>>
>> There is no risk in trying to use it. If it fails, you report a bug. :-)
>>
>> Is it the right tool?
>>
>> Yes.
>>
>> As for Tonel... git-migration uses FileTree, but I am looking into making
>> it configurable.
>>
>> (I have filetree-tonel migration tool now... but that is a bit more
>> hardcore and potentially destructive :) (unlike git-migration)).
>>
>> Cheers,
>> Peter
>>
>>
>>
>>
>>
>> On Mon, Nov 6, 2017 at 1:42 AM, Martin Dias  wrote:
>>
>>> Hi all, I'm considering to move Epicea (+Ombu+Hiedra) to github. I read
>>> that tonel is the new format and there is Peter's git-migration, which
>>> could be the tool to do it.
>>>
>>> Is it good idea to move now?
>>> Is it the right tool?
>>>
>>> Bests,
>>> Martín
>>>
>>
>>
>


Re: [Pharo-dev] Moving from mc to tonel?

2017-11-30 Thread Martin Dias
Hi:

1. thanks for the tool and information.
2. I forgot to tell you that I tried the tool some weeks ago and it ran out
of memory. I should try again with more memory or somehting different...

Cheers,
Martin

On Mon, Nov 6, 2017 at 5:49 AM, Peter Uhnák  wrote:

> Hi Martin,
>
> several people successfully migrated with git-migration... so I guess it
> works.
>
> > Is it good idea to move now?
>
> There is no risk in trying to use it. If it fails, you report a bug. :-)
>
> Is it the right tool?
>
> Yes.
>
> As for Tonel... git-migration uses FileTree, but I am looking into making
> it configurable.
>
> (I have filetree-tonel migration tool now... but that is a bit more
> hardcore and potentially destructive :) (unlike git-migration)).
>
> Cheers,
> Peter
>
>
>
>
>
> On Mon, Nov 6, 2017 at 1:42 AM, Martin Dias  wrote:
>
>> Hi all, I'm considering to move Epicea (+Ombu+Hiedra) to github. I read
>> that tonel is the new format and there is Peter's git-migration, which
>> could be the tool to do it.
>>
>> Is it good idea to move now?
>> Is it the right tool?
>>
>> Bests,
>> Martín
>>
>
>


Re: [Pharo-dev] Moving from mc to tonel?

2017-11-06 Thread Peter Uhnák
Hi Martin,

several people successfully migrated with git-migration... so I guess it
works.

> Is it good idea to move now?

There is no risk in trying to use it. If it fails, you report a bug. :-)

Is it the right tool?

Yes.

As for Tonel... git-migration uses FileTree, but I am looking into making
it configurable.

(I have filetree-tonel migration tool now... but that is a bit more
hardcore and potentially destructive :) (unlike git-migration)).

Cheers,
Peter





On Mon, Nov 6, 2017 at 1:42 AM, Martin Dias  wrote:

> Hi all, I'm considering to move Epicea (+Ombu+Hiedra) to github. I read
> that tonel is the new format and there is Peter's git-migration, which
> could be the tool to do it.
>
> Is it good idea to move now?
> Is it the right tool?
>
> Bests,
> Martín
>


[Pharo-dev] Moving from mc to tonel?

2017-11-05 Thread Martin Dias
Hi all, I'm considering to move Epicea (+Ombu+Hiedra) to github. I read
that tonel is the new format and there is Peter's git-migration, which
could be the tool to do it.

Is it good idea to move now?
Is it the right tool?

Bests,
Martín