Sven, I see your .st file is UTF-8:

hilaire@pchome /tmp $ file Hilaire-français.st
Hilaire-français.st: UTF-8 Unicode (with BOM) text, with CR line terminators


So I guess it should be OK to file in the Mathontologies source after
converting it to UTF-8.

hilaire@pchome ~/Travaux/ $ file snapshot/source2.st
snapshot/source2.st: UTF-8 Unicode text, with very long lines

But it turns out there is a problem in the parsing, and the stream's
converter gets set to MacRoman...


Now, looking at the users of MacRomanTextConverter is funny; we have this
unfactored code:


MultiByteBinaryOrTextStream>>setConverterForCode

    | current |
    current := converter saveStateOf: self.
    self position: 0.
    self binary.
    ((self next: 3) =  #[239 187 191]) ifTrue: [
        self converter: UTF8TextConverter new
    ] ifFalse: [
        self converter: MacRomanTextConverter new.
    ].
    converter restoreStateOf: self with: current.
    self text.


MultiByteBinaryOrTextStream>>setEncoderForSourceCodeNamed: streamName

    | l |
    l := streamName asLowercase.
    ((l endsWith: 'cs') or: [
        (l endsWith: 'st') or: [
            (l endsWith: ('st.gz')) or: [
                (l endsWith: ('st.gz'))]]]) ifTrue: [
                    self converter: MacRomanTextConverter new.
                    ^ self.
    ].
    self converter: UTF8TextConverter new.


MultiByteFileStream>>setConverterForCode

    | current |
    (SourceFiles at: 2)
        ifNotNil: [self fullName = (SourceFiles at: 2) fullName ifTrue: [^ self]].
    current := self converter saveStateOf: self.
    self position: 0.
    self binary.
    ((self next: 3) = #[ 16rEF 16rBB 16rBF ]) ifTrue: [
        self converter: UTF8TextConverter new
    ] ifFalse: [
        self converter: MacRomanTextConverter new.
    ].
    converter restoreStateOf: self with: current.
    self text.


CodeImporter>>selectTextConverterForCode
    self flag: #fix.  "This should not be here probably."
    "We need to see the first three bytes in order to see the origin of the file"
    readStream binary.
    ((readStream next: 3) = #[ 16rEF 16rBB 16rBF ]) ifTrue: [
        readStream converter: UTF8TextConverter new
    ] ifFalse: [
        readStream converter: MacRomanTextConverter new.
    ].

    "we restore the position to the start of the file again"
    readStream position: 0.
   
    "We put the file in text mode for the file in"
    readStream text.


AND THE WINNER IS...

#selectTextConverterForCode, where the file stream is not detected as UTF-8
and the converter used is MacRoman...
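
A possible explanation, going by the `file` output above: Hilaire-français.st
is reported "with BOM", while source2.st is not. All four methods only test the
first three bytes for the UTF-8 BOM, so a BOM-less UTF-8 file would fall through
to MacRoman. Here is a small Playground sketch of that test; the byte arrays are
made-up sample data and the block name is only illustrative:

```smalltalk
"Sketch only: same 3-byte BOM test as in the methods above, on sample bytes."
| utf8BOM withBOM withoutBOM startsWithBOM |
utf8BOM := #[16rEF 16rBB 16rBF].
withBOM := utf8BOM , #[97 98 99].      "BOM followed by 'abc'"
withoutBOM := #[97 98 99].             "plain 'abc', no BOM"
startsWithBOM := [ :bytes |
    bytes size >= 3 and: [ (bytes first: 3) = utf8BOM ] ].
startsWithBOM value: withBOM.          "true: UTF8TextConverter would be chosen"
startsWithBOM value: withoutBOM.       "false: falls back to MacRomanTextConverter"
```

If that is the cause, a single shared check like this block could replace the
four copies, but that is only an assumption at this point.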


Forcing the converter to UTF-8 there lets the code be imported. But
there are many questions: what should the detection method for the
encoding be, and why does the original source.st, which is ISO-8859, not
get imported?

Hilaire

On 07/09/2015 17:03, Hilaire wrote:
> So there is hope :)
>
> When debugging the import, I see that at some point the
> MultiByteFileStream gets its converter changed to MacRomanTextConverter.
> It seems to happen somewhere in #parseNextDeclaration. Initially, just
> before the import and still at the start of fileIn, its converter is UTF8...
>
> Hilaire
>
> On 07/09/2015 16:30, Sven Van Caekenberghe wrote:
>> I created a class with one method:
>>
>> Hilaire>>#français
>>      "élève"
>>      
>>      ^ self résoudre
>>
>> I can file that out and back in again in Pharo 4.
>>
>> Here is the .st file:
>>


-- 
Dr. Geo
http://drgeo.eu
http://google.com/+DrgeoEu