[Pharo-dev] leadingChar and friends

Pharo4Stef Sun, 02 Feb 2014 14:09:37 -0800

During my journey into the leadingChar realm I took notes, and I am sharing 
them with you.


leadingChar: leadChar code: code

        code >= 16r400000 ifTrue: [
                self error: 'code is out of range'.
        ].
        leadChar >= 256 ifTrue: [
                self error: 'lead is out of range'.
        ].
        code < 256 ifTrue: [ ^self value: code ].
        ^self value: (leadChar bitShift: 22) + code.

charCode
        ^ (value bitAnd: 16r3FFFFF).

leadingChar
        ^ (value bitAnd: (16r3FC00000)) bitShift: -22.

characterSet
        ^ EncodedCharSet charsetAt: self leadingChar

=> a character's value encodes its character set: bits 22 to 29 hold the 
leadingChar and the low 22 bits hold the charCode.
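
To make the packing concrete, here is a small Playground sketch that reuses 
the shift and the two masks from the methods above. The values 5 and 16r4E2D 
are made-up examples chosen for illustration, not values taken from the image.

```smalltalk
| leadChar code value |
leadChar := 5.          "hypothetical leadingChar, must be < 256"
code := 16r4E2D.        "hypothetical code, must be < 16r400000"

"Same arithmetic as leadingChar:code: for code >= 256"
value := (leadChar bitShift: 22) + code.

"Same mask as charCode: answers 16r4E2D (20013)"
value bitAnd: 16r3FFFFF.

"Same mask and shift as leadingChar: answers 5"
(value bitAnd: 16r3FC00000) bitShift: -22.
```

Note that for code < 256 the method above ignores leadChar altogether, so 
such characters always answer 0 as their leadingChar.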





============================
Why are we using the same value, 0, for both of these?
        Latin1>>leadingChar
                ^ 0.
        Unicode>>leadingChar
                ^ 0     

and I do not get why all of these answer 0 as well:
        GreekEnvironment>>leadingChar 
                 ^0
        Latin2Environment>>leadingChar 
                 ^0
        Latin1Environment>>leadingChar 
                 ^0
        Latin9Environment>>leadingChar 
                 ^0
        RussianEnvironment>>leadingChar 
                 ^0
        SimplifiedChineseEnvironment>>leadingChar 
                 ^0

======================
I do not understand why Unicode is declared at index 1 and not 0.

Unicode class>>initialize

        
        EncodedCharSet declareEncodedCharSet: self atIndex: 0+1.
        EncodedCharSet declareEncodedCharSet: self atIndex: 256.



================================
I do not understand why Latin1 does not use declareEncodedCharSet

Latin1 class>>initialize
        "self initialize"
        compoundTextSequence := String streamContents: 
                [ :s | 
                s nextPut: (Character value: 27).
                s nextPut: $(.
                s nextPut: $B ].
        rightHalfSequence := String streamContents: 
                [ :s | 
                s nextPut: (Character value: 27).
                s nextPut: $-.
                s nextPut: $A ]


I started to distribute the initialization into subclasses starting from this 
method:

declareEncodedCharSet: anEncodedCharSetOrLanguageEnvironmentClass atIndex: aNumber

"this method is used to modularize the old initialize method: 
        EncodedCharSets at: 0+1 put: Unicode.
        EncodedCharSets at: 1+1 put: JISX0208.
        EncodedCharSets at: 2+1 put: GB2312.
        EncodedCharSets at: 3+1 put: KSX1001.
        EncodedCharSets at: 4+1 put: JISX0208.
        EncodedCharSets at: 5+1 put: JapaneseEnvironment.
        EncodedCharSets at: 6+1 put: SimplifiedChineseEnvironment.
        EncodedCharSets at: 7+1 put: KoreanEnvironment.
        EncodedCharSets at: 8+1 put: GB2312.
        EncodedCharSets at: 12+1 put: KSX1001.
        EncodedCharSets at: 13+1 put: GreekEnvironment.
        EncodedCharSets at: 14+1 put: Latin2Environment.
        EncodedCharSets at: 15+1 put: RussianEnvironment.
        EncodedCharSets at: 17+1 put: Latin9Environment.
        EncodedCharSets at: 256 put: Unicode."

and indeed Latin1Environment was not part of the list.

Now apparently we can remove Latin1, because the EncodedCharSets table of 
EncodedCharSet does not contain Latin1.
        

==================================
No senders
        emitSequenceToResetStateIfNeededOn: aStream forState: state
        rightDirection


Funny 
        nextPutRightHalfValue: ascii toStream: aStream withShiftSequenceIfNeededForTextConverterState: state 
        nextPutValue: ascii toStream: aStream withShiftSequenceIfNeededForTextConverterState: state 

==========================
Ideas of cleaning steps
        

step one: 
        what if Character>>leadingChar simply answered ^ 0? 

We can probably kill the encodedSet.
It looks like Latin1Environment and Latin1 are not used.

Stef
