Re: [Pharo-users] Class name with diacritic character and Pharo

2019-01-27 Thread Sven Van Caekenberghe
Yes, author and class name are both part of the preamble, so either one can 
break it.

Same problem, same solution.

> On 28 Jan 2019, at 05:10, Benoit St-Jean  wrote:
> 
> Sorry, my toothache meds are kicking in! lol
> 
> Correction to my last post:
> 
> I mean *fileouts* don't work when the author name has French diacritic characters!
> 
> Obviously, couldn't test filing in!
> 
> 

Re: [Pharo-users] Class name with diacritic character and Pharo

2019-01-27 Thread Benoit St-Jean via Pharo-users
--- Begin Message ---

Sorry, my toothache meds are kicking in! lol

Correction to my last post:

I mean *fileouts* don't work when the author name has French diacritic 
characters!


Obviously, couldn't test filing in!



Re: [Pharo-users] Class name with diacritic character and Pharo

2019-01-27 Thread Benoit St-Jean via Pharo-users
--- Begin Message ---
While we're at it, a similar problem arises when the author name (in my 
case BenoîtStJean) contains a French diacritic.


Just tested it with Pharo 7 (64 bit on Windows 10)...

Fileout works fine.  But filing in crashes!


Re: [Pharo-users] Class name with diacritic character and Pharo

2019-01-27 Thread Dominique Dartois
Hi Sven.
Thank you for the time spent on your reply.
I have tried the same code in Pharo 6.1 (21.0) instead of 7.1.0 and I had NO 
problem.
It seems the class implementation for ZnUTF8Encoder is different.
Thanks again.
---
Dominique


[Pharo-users] What's wrong?

2019-01-27 Thread eftomi
Every time I start the Pharo 7.0.1 32-bit image (its name is "Pharo 7.0 -
32bit" and it resides in the directory of the same name), a new directory is
created under images directory, with the name "Pharo 7.0 - 64bit
(development version)-01". Within it, Pharo is using directories
"ombu-sessions", "play-cache" to store its run-time information.

I'm using PharoLauncher version: 1.5.1.

The situation is the same if I do this with a fresh 7.0.1 image. System
reporter shows:

Image
-
C:\Users\eftomi.MISKOTI\EFTOMI\PHARO\images\Pharo 7.0 - 32bit\Pharo 7.0 -
32bit.image
Pharo7.0.1
Build information:
Pharo-7.0.1+build.143.sha.eca26da119bccd95e463c7717a44b814453df4e8 (32 Bit)
Unnamed

Virtual Machine
---
C:\Users\eftomi.MISKOTI\EFTOMI\PHARO\vms\70-x86\Pharo.exe
CoInterpreter VMMaker.oscog-eem.2504 uuid:
a00b0fad-c04c-47a6-8a11-5dbff110ac11 Jan  5 2019
StackToRegisterMappingCogit VMMaker.oscog-eem.2504 uuid:
a00b0fad-c04c-47a6-8a11-5dbff110ac11 Jan  5 2019
VM: 201901051900 https://github.com/OpenSmalltalk/opensmalltalk-vm.git Date:
Sat Jan 5 20:00:11 2019 CommitHash: 7a3c6b6 Plugins: 201901051900
https://github.com/OpenSmalltalk/opensmalltalk-vm.git

Win32 built on Jan  5 2019 20:12:30 GMT Compiler: 7.4.0
VMMaker versionString VM: 201901051900
https://github.com/OpenSmalltalk/opensmalltalk-vm.git Date: Sat Jan 5
20:00:11 2019 CommitHash: 7a3c6b6 Plugins: 201901051900
https://github.com/OpenSmalltalk/opensmalltalk-vm.git
CoInterpreter VMMaker.oscog-eem.2504 uuid:
a00b0fad-c04c-47a6-8a11-5dbff110ac11 Jan  5 2019
StackToRegisterMappingCogit VMMaker.oscog-eem.2504 uuid:
a00b0fad-c04c-47a6-8a11-5dbff110ac11 Jan  5 2019 




--
Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html



Re: [Pharo-users] Class name with diacritic character and Pharo

2019-01-27 Thread Sven Van Caekenberghe
Hi Dominique,

> On 27 Jan 2019, at 11:40, Dominique Dartois  wrote:
> 
> Hello all. 
> If I use a French diacritic character in a class name, the code runs but I 
> can’t file out the package nor save it with Monticello. 
> For example, a C cedilla in the class name leads to a 
> 'ZnInvalidUTF8: Illegal byte for utf-8 encoding' error when filing out.
> 
> Is it a bug or a feature?
> Thank you
> 
> --- 
> Dominique Dartois 

Thanks for reporting this. This is most definitely a bug, I can confirm its 
occurrence.

I'm CC pharo-dev as this is quite important. This will be a long mail.


This is one manifestation of a problem that has been present for quite a while.

I'll start by describing what I did, what went well and where/how this fails, 
some generic points, and two conceptual solutions (that need further 
verification).

Like you, I created a new subclass:

Object subclass: #ClasseFrançaise
instanceVariableNames: ''
classVariableNames: ''
package: '_UnpackagedPackage'

With comment:

I am ClasseFrançaise.

Try:

ClasseFrançaise new élève.
ClasseFrançaise new euro.

And two methods (in the 'test' protocol):

élève
^ 'élève'

euro
^ '€'

I added the euro sign (because that is encoded in UTF-8 with 3 bytes, not 2 
like ç).
Like you said, the system can cope with such class and method names and seems 
to function fine.

Looking at the .changes file, the correct source code was appended:

SNAPSHOT2019-01-26T23:36:18.548555+01:00 work.image priorSource: 339848!

Object subclass: #ClasseFrançaise
instanceVariableNames: ''
classVariableNames: ''
package: '_UnpackagedPackage'!
!ClasseFrançaise commentStamp: 'SvenVanCaekenberghe 1/27/2019 12:25' prior: 0!
I am ClasseFrançaise.!
!ClasseFrançaise methodsFor: 'test' stamp: 'SvenVanCaekenberghe 1/27/2019 12:26'!
élève
^ 'élève'! !
!ClasseFrançaise commentStamp: 'SvenVanCaekenberghe 1/27/2019 12:27' prior: 33898360!
I am ClasseFrançaise.

Try:

ClasseFrançaise new élève.
ClasseFrançaise new euro.
!
!ClasseFrançaise methodsFor: 'test' stamp: 'SvenVanCaekenberghe 1/27/2019 12:27'!
euro
^ '€'! !


Doing a file out (or otherwise saving the source code) fails. The reason is an 
incorrect manipulation of this source file while looking for what is called the 
method preamble, in SourceFileArray>>#getPreambleFrom:at:.

A programmatic way to trigger the same error is to evaluate

(ClasseFrançaise>>#élève) timeStamp.
(ClasseFrançaise>>#élève) author.

Both fail with the same error.


The source code of methods is (currently) stored in a .sources or .changes 
file. CompiledMethods know their source pointer, an offset in one of these 
files. Right before the place where the source starts is a preamble that 
contains some meta information (including the author and timestamp). To access 
that preamble, the source code pointer is moved backwards to the beginning of 
the preamble (which begins and ends with a !).


The current approach fails in the presence of non-ASCII characters. More 
specifically because of a mixup between the concept of byte position and 
character position when using UTF-8, a variable length encoding (both the 
.changes and the .sources are UTF-8 encoded).

For example, consider

'à partir de 10 €' size. "16"
'à partir de 10 €' utf8Encoded size. "19"

So although the string contains 16 characters, it is encoded as 19 bytes, à 
using 2 bytes and € using 3 bytes. In general, moving backwards or forwards in 
UTF-8 encoded bytes cannot be done without understanding UTF-8 itself.
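
To make the distinction concrete, here is a minimal sketch of the same measurement, written in Python rather than Pharo so that no Zinc API beyond what is quoted above has to be assumed. The helper char_index_to_byte_offset is a hypothetical name introduced for illustration only.

```python
# The example string from the mail, written with escapes so the code points are unambiguous.
text = "\u00e0 partir de 10 \u20ac"   # 'à partir de 10 €'
encoded = text.encode("utf-8")


def char_index_to_byte_offset(s: str, char_index: int) -> int:
    """Return the byte offset at which character number char_index starts in UTF-8."""
    return len(s[:char_index].encode("utf-8"))


print(len(text))                            # number of characters
print(len(encoded))                         # number of bytes
print(char_index_to_byte_offset(text, 1))   # the character after 'à' starts at this byte
print(char_index_to_byte_offset(text, 15))  # '€' starts at this byte
```

The two offsets differ from the character indexes as soon as one non-ASCII character precedes them, which is why a character count can never be used directly as a byte offset.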

ZnUTF8Encoder can do both (moving forward is #nextFromStream: while moving 
backwards is #backOnStream:). However, ZnUTF8Encoder is also strict: it will 
signal an error when forced to operate in between encoded characters, which is 
what happens here.

It is thus not possible to move to arbitrary bytes positions and assume/hope to 
always arrive on the correct character boundaries and it is also wrong to take 
the difference between two byte positions as the count of characters present 
(since their encoding is of variable length).
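
As a sketch of what "understanding UTF-8" means at the byte level (in Python, and assuming the data is valid UTF-8): continuation bytes always have the bit pattern 10xxxxxx, so a position is a character boundary exactly when the byte found there is not a continuation byte. The names is_char_boundary and align_backwards are illustrative and not part of Zinc.

```python
def is_char_boundary(data: bytes, pos: int) -> bool:
    """True if pos (0 <= pos <= len(data)) is the start of an encoded character or the end."""
    if pos == len(data):
        return True
    # Continuation bytes look like 10xxxxxx; anything else starts a character.
    return (data[pos] & 0b1100_0000) != 0b1000_0000


def align_backwards(data: bytes, pos: int) -> int:
    """Move pos backwards until it sits on a character boundary."""
    while pos > 0 and not is_char_boundary(data, pos):
        pos -= 1
    return pos


data = "\u00e0 partir de 10 \u20ac".encode("utf-8")  # 'à partir de 10 €'

# A strict decoder refuses to start in the middle of a character.
try:
    data[1:].decode("utf-8")
    strict_decoder_complained = False
except UnicodeDecodeError:
    strict_decoder_complained = True

print(is_char_boundary(data, 1), align_backwards(data, 18), strict_decoder_complained)
```

Python's strict decoder plays the role that the strict ZnUTF8Encoder plays in the mail: starting inside a character is reported as an error instead of being silently tolerated.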

SourceFileArray>>#getPreambleFrom:at: gets both of these wrong (but gets 
away with it in 99.99% of all cases, since very few people name their classes 
like that).

There are two solutions: operate mostly on the byte level or operate correctly 
on the character level. Here are two conceptual solutions (you must execute 
either solution 1 or 2, not both), with two different inputs.


src := '!ClasseFrançaise methodsFor: ''test'' stamp: ''SvenVanCaekenberghe 1/27/2019 12:27''!
euro
^ ''€''! !'.

"startPosition := 83"

str := ZnCharacterReadStream on: (src utf8Encoded readStream).
str position: 83. "at start of euro, the methods source string"
str upToEnd.

str position: (83 - 3). "before ! before euro"

"find the previous ! before position"
position := str position.
binary := str wrappedStream.
encoder := str encoder.
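
The byte-level idea can also be sketched outside Pharo. The Python fragment below is only an illustration under stated assumptions, not the code from this mail or from the image: it takes the same source chunk and start position 83, searches backwards for the two '!' delimiters on the raw bytes, and decodes only the slice in between. This is safe because '!' is ASCII and an ASCII byte never occurs inside a multi-byte UTF-8 sequence. The function name preamble_before is hypothetical, and the sketch ignores the '!!' escaping used by the chunk format.

```python
# The chunk from the mail; the preamble is a single line followed by the method source.
SRC = (
    "!ClasseFran\u00e7aise methodsFor: 'test' stamp: "
    "'SvenVanCaekenberghe 1/27/2019 12:27'!\n"
    "euro\n"
    "\t^ '\u20ac'! !"
)
DATA = SRC.encode("utf-8")
START = 83  # byte offset of the method source, as in the mail


def preamble_before(data: bytes, start: int) -> str:
    """Return the text between the two '!' bytes that precede start."""
    closing = data.rindex(b"!", 0, start)    # '!' that ends the preamble
    opening = data.rindex(b"!", 0, closing)  # '!' that begins the preamble
    # Both delimiters are ASCII, so the slice starts and ends on character boundaries.
    return data[opening + 1:closing].decode("utf-8")


print(preamble_before(DATA, START))
print(DATA[START:].decode("utf-8").splitlines()[0])  # 83 used as a byte offset
print(SRC[START:].splitlines()[0])                   # 83 used as a character index
```

The last two lines show the mix-up in miniature: used as a byte offset, 83 lands on the start of the method source, whereas the same number used as a character index lands one character too far, because ç occupies two bytes.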


Re: [Pharo-users] PetitParser question

2019-01-27 Thread Tudor Girba
Hi Konrad,

A somewhat similar issue is present in an XML grammar: the closing element must 
match the opening element. In PPXmlGrammar, you have a condition that matches 
it and throws a failure otherwise:

element
    "[39]   element ::= EmptyElemTag | STag content ETag"

    ^ $< asParser , qualified , attributes , whitespace optional , ('/>' asParser / ($> asParser , content , [ :stream | stream position ] asParser , '</' asParser , qualified , whitespace optional , $> asParser)) ==> [ :nodes |
        nodes fifth = '/>'
            ifTrue: [ Array with: nodes second with: nodes third with: #() ]
            ifFalse: [
                nodes second = nodes fifth fifth
                    ifTrue: [ Array with: nodes second with: nodes third with: nodes fifth second ]
                    ifFalse: [ PPFailure message: 'Expected </' , nodes second qualifiedName , '>' context: nil at: nodes fifth third ] ] ]
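
The same two-step pattern (accept any closing name syntactically, then fail in the action if it does not match) can be sketched independently of PetitParser. The Python fragment below is a simplified illustration, not PetitParser's API: Failure, ELEMENT and parse_element are hypothetical names, and a regular expression stands in for the real grammar.

```python
import re
from typing import NamedTuple, Union


class Failure(NamedTuple):
    message: str
    position: int


# Syntactic step: any name is accepted in the closing tag.
ELEMENT = re.compile(r"<(\w+)>(.*?)</(\w+)>", re.DOTALL)


def parse_element(text: str) -> Union[tuple, Failure]:
    match = ELEMENT.fullmatch(text)
    if match is None:
        return Failure("Expected an element", 0)
    opening, content, closing = match.groups()
    # Semantic step: turn the successful parse into a failure if the names differ.
    if opening != closing:
        # Report the position where the closing tag begins.
        return Failure(f"Expected </{opening}>", match.start(3) - 2)
    return (opening, content)


print(parse_element("<a>hi</a>"))
print(parse_element("<a>hi</b>"))
```

As in the grammar above, the position of the closing tag is captured during parsing so that the failure can point at the place where the mismatch occurs.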

Cheers,
Doru


> On Jan 27, 2019, at 4:38 PM, Konrad Hinsen  wrote:
> 
> Dear Tomo,
> 
>> This post might help you. In case of PetitParser2, it's PP2Failure instead
>> of PPFailure.
>> https://stackoverflow.com/questions/15371334/how-can-a-petitparser-parse-rule-signal-an-error
> 
> That's indeed a possible solution: parse for arbitrary operators, and then 
> add a test for equality that can make everything fail in the end. I will try 
> it out!
> 
> Thanks,
>  Konrad.
> 
> 

--
www.feenk.com

"Not knowing how to do something is not an argument for how it cannot be done."




Re: [Pharo-users] PetitParser question

2019-01-27 Thread Konrad Hinsen

Dear Tomo,


This post might help you. In case of PetitParser2, it's PP2Failure instead
of PPFailure.
https://stackoverflow.com/questions/15371334/how-can-a-petitparser-parse-rule-signal-an-error


That's indeed a possible solution: parse for arbitrary operators, and 
then add a test for equality that can make everything fail in the end. I 
will try it out!


Thanks,
  Konrad.




[Pharo-users] Class name with diacritic character and Pharo

2019-01-27 Thread Dominique Dartois


 
 
  
   
Hello all.
If I use a French diacritic character in a class name, the code runs but I can’t file out the package nor save it with Monticello.
For example, a C cedilla in the class name leads to a 'ZnInvalidUTF8: Illegal byte for utf-8 encoding' error when filing out.

Is it a bug or a feature?
Thank you

---
Dominique Dartois


Re: [Pharo-users] Falsehoods programmers believe about Smalltalk

2019-01-27 Thread Sven Van Caekenberghe



> On 27 Jan 2019, at 03:38, Richard O'Keefe  wrote:
> 
> What *is* the persistence scheme in Pharo these days?

FUEL is the standard binary serialiser (which can do blocks, execution stacks, 
etc).
STON is the standard textual serialiser (that cannot do blocks).

It has been like that for many versions.


Re: [Pharo-users] Cryptography long filename in github repo causes error in Win 10 loading

2019-01-27 Thread Sven Van Caekenberghe
Switching to the Tonel file format in git solves this.

There is an option to convert a repo in Iceberg (under extra).

> On 27 Jan 2019, at 05:01, Sanjay Minni  wrote:
> 
> Hi,
> 
> Can the following filename in the Cryptography package in the GitHub repository be
> shortened?
> It's causing a crash when loading on Win 10 (so a workaround has to be done
> just for this),
> and as this package is used by Voyage it may be fairly widely used.
> 
> filename:
> cryptUIDlgSelectCertificateFromStore.winHandle.pwszTitle.pwszDisplayString.dwDontUseColumn.flags.reserved..st
> 
> location:
> https://github.com/pharo-contributions/Cryptography/tree/master/source/Cryptography-MSCerts.package/Win32FFICertificateStore.class/instance
> 
> Probably this filename is autogenerated so a tweak could be attempted
> 
> 
> 
> This seems to work - at least loading seems to have gone thru
> 
> but is it possible to shorten the following filename in the github
> repository itself
> 
> cryptUIDlgSelectCertificateFromStore.winHandle.pwszTitle.pwszDisplayString.dwDontUseColumn.flags.reserved..st
> 
> it is in
> https://github.com/pharo-contributions/Cryptography/tree/master/source/Cryptography-MSCerts.package/Win32FFICertificateStore.class/instance
> 
> I do not know if there are any other filenames that long
> 
> 
> EstebanLM wrote
>> Hi, 
>> 
>> I’m sorry I didn’t see this before. 
>> This happens because there are many projects that are still using filetree
>> format which stores one file per method. And the problem here is that
>> windows has a path limit that you are exceeding. 
>> To workaround this problem: 
>> 
>> - Open iceberg settings (toolbar button in repositories window).
>> - Check "Share repositories between images”.
>> - In "Location for shared repositories” put something like “C:\repo” (you
>> will need to create that dir too).
>> 
>> And retry :)
>> 
>> Esteban
>> 
>>> On 4 Dec 2018, at 17:24, Sanjay Minni 
> 
>> sm@
> 
> 
> 
> 
> -
> cheers, 
> Sanjay
> --
> Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html
> 




Re: [Pharo-users] Instance of ByteArray did not understand #isByteString in ZnUTF8Encoder

2019-01-27 Thread Sven Van Caekenberghe



> On 27 Jan 2019, at 00:20, Hernán Morales Durand  
> wrote:
> 
> Hi Sven,
> 
> Yes, your examples work in Pharo 7, however I also wanted a progress bar, and 
> two exception handlers:
> 
> 1) Resume when my files are too large (ZnEntityTooLarge).
> 2) When the download fails, display an error message.
> 
> Now for 1) I am considering wrapping the whole download code, because I don't 
> like #isKindOf:, in this:
> 
> ZnMaximumEntitySize value: someMaxSize during: [ ... ].

There is also an accessor on ZnClient, #maximumEntitySize:

Prefer it, because ZnMaximumEntitySize will go away in favor of a more general mechanism.
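
A minimal sketch of how that accessor could be used, assuming it is sent to the ZnClient instance before the request is executed. The 100 MB limit is a placeholder picked for illustration, not a value from this thread:

```smalltalk
"Raise the entity size limit on this one client instead of wrapping
 the download in ZnMaximumEntitySize value:during:."
| client |
client := ZnClient new.
"Assumed limit of 100 MB; adjust to the size of your files."
client maximumEntitySize: 100 * 1024 * 1024.
client get: 'https://github.com/biosmalltalk/biopharo/raw/develop/test_files/BioSmalltalkTestFiles.zip'.
client close.
```

The limit then applies only to this client, so no exception handler for ZnEntityTooLarge should be needed as long as the files stay below it.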

> I changed my code to use #downloadTo: 
> Thank you for the suggestion!!
> 
> Cheers,
> 
> Hernán
> 
> On Sat, 26 Jan 2019 at 19:05, Sven Van Caekenberghe wrote:
> BTW, ZTimezone in ZTimestamp does something similar to what I guess you want 
> to do:
> 
> downloadFallbackZoneinfoDataset
>   "Download a fallback copy of the zoneinfo dataset and return its location.
>   See #fallbackZoneinfoDatasetURL for more info and a warning"
> 
>   | zipLocation zoneinfoLocation |
>   zipLocation := self fallbackZoneinfoDatasetURL file asFileReference.
>   zipLocation ensureDelete.
>   ZnClient new 
>     url: self fallbackZoneinfoDatasetURL; 
>     downloadTo: zipLocation;
>     close.
>   zoneinfoLocation := zipLocation parent / 'zoneinfo'.
>   zoneinfoLocation ensureDeleteAll.
>   ZipArchive new
>     readFrom: zipLocation;
>     extractAllTo: zoneinfoLocation parent;
>     close.
>   ^ zoneinfoLocation
> 
> where
> 
> fallbackZoneinfoDatasetURL
>   "The URL to the ZIP archive zoneinfo.zip which is offered as a fallback for 
>   systems (such as Windows) that do not have their own copy of this data. 
>   Note that it is highly recommended that you use such an OS-maintained dataset, 
>   as this data is changed (being added to) each year."
> 
>   ^ 'https://github.com/svenvc/ztimestamp/raw/master/rsrc/zoneinfo.zip' asUrl
> 
> > On 26 Jan 2019, at 23:00, Sven Van Caekenberghe  wrote:
> > 
> > Hmm, the following all work for me (in Pharo 7.0.1 on macOS):
> > 
> > ZnClient new
> >   url: 'https://github.com/biosmalltalk/biopharo/raw/develop/test_files/BioSmalltalkTestFiles.zip';
> >   get;
> >   yourself.
> > 
> > ZnClient new
> >   url: 'https://github.com/biosmalltalk/biopharo/raw/develop/test_files/BioSmalltalkTestFiles.zip';
> >   get;
> >   contents.
> > 
> > ZnClient new
> >   get: 'https://github.com/biosmalltalk/biopharo/raw/develop/test_files/BioSmalltalkTestFiles.zip'.
> > 
> > ZnClient new
> >   url: 'https://github.com/biosmalltalk/biopharo/raw/develop/test_files/BioSmalltalkTestFiles.zip';
> >   downloadTo: '/tmp'.
> > 
> > $ ls -lah /tmp/BioSmalltalkTestFiles.zip 
> > -rw-r--r--@ 1 sven  wheel   5.3M Jan 26 22:57 /tmp/BioSmalltalkTestFiles.zip
> > 
> > $ file /tmp/BioSmalltalkTestFiles.zip 
> > /tmp/BioSmalltalkTestFiles.zip: Zip archive data, at least v2.0 to extract
> > 
> >> On 26 Jan 2019, at 20:36, Hernán Morales Durand  
> >> wrote:
> >> 
> >> Hi there,
> >> 
> >> In Pharo 7.0 I've encountered an error related to the new stream 
> >> changes 
> >> (https://github.com/pharo-open-documentation/pharo-wiki/blob/master/Migration/MigrationToPharo7.md)
> >> while downloading a zip file. The problem is that ZnUTF8Encoder expects a 
> >> String, but my code, which worked in Pharo 6, provides a ByteArray. I've 
> >> isolated it in a reproducible way:
> >> 
> >> | webClient resp |
> >> webClient := ZnClient new.
> >> UIManager default informUserDuring: [ :bar | 
> >>   bar label: 'Downloading resources for ' , self class printString.
> >>   [ webClient
> >>       signalProgress: true;
> >>       get: 'https://github.com/biosmalltalk/biopharo/raw/develop/test_files/BioSmalltalkTestFiles.zip' ]
> >>     on: HTTPProgress , ZnEntityTooLarge
> >>     do: [ : ex | 
> >>       (ex isKindOf: ZnEntityTooLarge)
> >>         ifTrue: [ ex resume ]
> >>         ifFalse: [ 
> >>           | progress |
> >>           progress := ex.
> >>           progress isEmpty
> >>             ifFalse: [
> >>               bar current: progress percentage.
> >>               progress total ifNotNil: [ :aTotalNumber | bar label: 'Downloading ' ] ].
> >>           progress resume ] ] ].
> >> (resp := webClient response) isSuccess
> >>   ifTrue: [ 'output2.zip' asFileReference writeStreamDo: [ : stream | stream nextPutAll: resp contents ] ]
> >>   ifFalse: [ self error: 'Cannot download resource files' ].
> >> 
> >> I also tried using #writeOn: on the response, but that also wrote the 
> >> response header:
> >> 
> >> self response writeOn: 'test1.zip' asFileReference writeStream.
> >> self response writeOn: 'test2.zip'