Re: [Pharo-project] How to "fix" "broken" mcz files?

Janko Mivšek Mon, 06 May 2013 03:18:31 -0700

Hans, can you try to find the methods with non-ASCII characters in the
Aida mcz files, so that I can try to get rid of them?
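
One way to locate them, sketched in Python rather than Smalltalk so that it
runs outside the image: scan the bytes of an extracted snapshot/*.st file and
report every line that contains a byte outside the ASCII range, or a NUL byte
(which a run of UCS-4 would leave behind). The function name and the sample
data are made up for illustration; this is only a rough sketch, not a tested
tool.

```python
def find_suspect_lines(data: bytes):
    """Return (line_number, reason) pairs for lines that are not plain ASCII."""
    findings = []
    # Fileouts may use CR, LF or CRLF line ends, so normalise to LF first.
    normalised = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    for number, line in enumerate(normalised.split(b"\n"), start=1):
        if b"\x00" in line:
            # ASCII text never contains NUL; UCS-4 text is full of them.
            findings.append((number, "NUL byte (possible UCS-4 run)"))
        elif any(byte > 0x7F for byte in line):
            findings.append((number, "non-ASCII byte"))
    return findings


# Made-up sample: line 1 is clean, line 2 holds UTF-8, line 3 holds UTF-32.
sample = (
    b"title ^'Login'\r"
    + "title ^'Prijava \u0161'".encode("utf-8") + b"\r"
    + "abc".encode("utf-32-be") + b"\r"
)

for number, reason in find_suspect_lines(sample):
    print(number, reason)
```

Since an mcz is a zip archive, the bytes could be read with
zipfile.ZipFile(path).read(name), where name is the archive member holding the
fileout (assumed here to sit under snapshot/). Note that the line numbers are
only approximate once a UCS-4 run starts, because the scan splits on single
bytes.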


I suspect that the translations into other languages, which are part of
Aida's multilingual support, are the problem. Those translations are stored
in methods as UTF-8 encoded strings. See the WebSecurityManagerApp class,
category 'translations'.

If this is it, then I'll try to store them as Base64 encoded instead.
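
The idea, as a rough sketch in Python (the function names and the sample
string are made up, and the real change would of course be done in Smalltalk
inside Aida): the translation is UTF-8 encoded and then Base64 encoded, so
that the method source contains only ASCII characters, and it is decoded
again when the translation is looked up.

```python
import base64


def encode_translation(text: str) -> str:
    """UTF-8 encode, then Base64 encode, giving a pure-ASCII string."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_translation(stored: str) -> str:
    """Reverse of encode_translation: Base64 decode, then UTF-8 decode."""
    return base64.b64decode(stored).decode("utf-8")


# Made-up sample text containing three non-Latin-1 characters.
original = "Geslo ni veljavno \u010d\u0161\u017e"
stored = encode_translation(original)

print(stored.isascii())                        # the stored form is ASCII only
print(decode_translation(stored) == original)  # the round trip is lossless
```

The price is that the translations are no longer readable in the browser, so
this trades readability of the source for a fileout that stays plain ASCII.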

Best regards
Janko


On 05. 05. 2013 10:51, Norbert Hartl wrote:
> 
> On 05.05.2013 at 10:07, Holger Hans Peter Freyther
> <hol...@freyther.de> wrote:
> 
>> Hi,
>>
>> When I port a project to GNU Smalltalk, I tend to take the snapshot/*.st
>> and convert it. With some MCZ versions of Aida/Iliad this now fails
>> because the fileout is broken: at some point (without a BOM) the creator
>> started to use UCS-4 (or similar) for the strings.
>>
>> This can be seen here[1]. Whether I manually extract the source and use
>> FileStream>>#fileIn:, or use the MczInstaller on the mcz file (which does
>> not work from the snapshot of the MCDefinition), the import fails.
>>
>> Is this a known/fixed problem with Monticello/Pharo? Is there a way to
>> re-create the source.st from the snapshot of the MCDefinitions?
>>
> Yes, the problem is known. Monticello has no handling for encodings. The last
> time I looked into it, I could see that Monticello assumes a Latin-1
> encoding. As soon as you include a non-Latin-1 character in the source, it
> is turned into a WideString. When this is written to disk, either our
> UCS-4+leadingChar format is written or, even worse, every byte of the
> WideString is then encoded as Latin-1. I'm not sure which, but either way it
> isn't the right way to do it.
> I started to fix this a couple of years ago but, as so often, the problem is
> deeply embedded in the image and grows the longer you look at it. That
> massively exceeds the time I have for these things.
> 
> The problem is easier to fix for the .st file because string literals ('')
> and uses of the String class are platform independent. In the binary blob,
> platform-specific classes like WideString appear, which make it unreadable on
> other platforms. There, a canonical UTF-8 encoding would also mean treating
> platform-dependent classes in the same canonical way, i.e. using only String
> instead of the platform-dependent ones. A platform that reads a Monticello
> file would get UTF-8, decode it, and on encountering a wide character turn
> the string into its own platform-dependent class, etc.
> 
> So, in order to "fix" this, I think the only feasible way is to get rid of
> the non-Latin-1 characters in the source and save the package again. This is
> how it is done in, e.g., Seaside. If someone really needs some non-Latin-1
> characters, they should be included programmatically, meaning that at the
> right position in the code you use "Character value: …"
> 
> Norbert
> 
> 
> 

-- 
Janko Mivšek
Aida/Web
Smalltalk Web Application Server
http://www.aidaweb.si
