Re: [Pharo-project] adding ISO-8859-15 and CP-1252 support

2010-08-18 Thread Stéphane Ducasse
Maybe you should contact Yoshiki.

On Aug 18, 2010, at 9:59 AM, Philippe Marschall wrote:

> On 08/17/2010 04:55 PM, Henrik Johansen wrote:
>> 
>> On Aug 16, 2010, at 9:49:30 PM, Philippe Marschall wrote:
>> 
>>> Hi
>>> 
>>> I decided to write ISO-8859-15 and CP-1252 support [1] (mostly for
>>> selfish reasons so that Seaside on Pharo would support ISO-8859-15 and
>>> CP-1252).
>> 
>> 
>> More converters are always nice :D
>> Their code seems ok to me.
>>> 
>>> A couple of notes:
>>> - the five unmapped bytes of CP-1252 (not ISO-8859-15, the comment is
>>> wrong) are mapped to the Unicode replacement character (U+FFFD)
>>> - a new Latin9 language environment is introduced
>>> - some minor clean up like removing unused class variables
>>> 
>>> I'd appreciate it if somebody knowledgeable in these areas could review
>>> the changes. I'm especially unsure about the Latin9 language
>>> environment, but reusing Latin1 or Unicode seemed wrong.
>> 
>> I'm not sure it's too wrong, according to the EncodedCharSet comment:
>> "The other confusion comes from the name of "Latin1" class.  It used to mean 
>> the Latin-1 (ISO-8859-1) character set, but now it primarily means that the 
>> "Western European languages that are covered by the characters in Latin-1 
>> character set."
>> I'd reckon the same holds true for Latin1Environment (Western),
>> Latin2Environment (Eastern), and Latin7Environment (Greek). I don't think 
>> CP1252/8859-15 warrants the same as they are basically alternative encodings 
>> to latin1 for western languages.
>> 
>> Also: 
>> - leadingChar is used in StrikeFontSet to choose different glyph sets. This 
>> allows for StrikeFonts supporting more than the default latin1 glyphs, seems 
>> to me it would be "wrong" to use the same one for two different encodings. 
>> Not sure why this approach was taken rather than allowing additional strike 
>> font sets based on unicode code point ranges, then using leadingChar only to 
>> differentiate when the visual glyphs for those code points would be 
>> different. I suspect it maybe was developed to deal with Han unification 
>> first, then reused to support multiple character sets later.
>> 
>> - LanguageEnvironment seems to have been used in conjunction with 
>> translation (note the entire old translation system was removed in Pharo and 
>> replaced by an external package), maybe to decide which encoding externally 
>> stored translation files should be read in as.
>> Then, having environments with overlapping supportedLanguages seems somewhat
>> weird as well.
>> Modifying defaultEncodingName/systemConverterClass of Latin1Environment to 
>> use CP1252 for some Windows systems (as per Latin2) may be another approach, 
>> may or may not lead to unintended consequences elsewhere though, I did not 
>> investigate all uses.
>> 
>> IMHO, for someone who wasn't involved in its development, the whole
>> multilingual package could use some cleaning, more class comments, and 
>> clearer statement of responsibilities.
>> 
>> Cheers,
>> Henry
>> 
>> TL;DR:
>> More converters: yay! 
>> More LanguageEnvironments: o_O, not sure
> 
> OK, if nobody says it's a good idea and the right thing to do I'll drop
> the LanguageEnvironment.
> 
> Cheers
> Philippe
> 
> 
> ___
> Pharo-project mailing list
> Pharo-project@lists.gforge.inria.fr
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project




Re: [Pharo-project] adding ISO-8859-15 and CP-1252 support

2010-08-18 Thread Philippe Marschall
On 08/17/2010 04:55 PM, Henrik Johansen wrote:
> 
> On Aug 16, 2010, at 9:49:30 PM, Philippe Marschall wrote:
> 
>> Hi
>>
>> I decided to write ISO-8859-15 and CP-1252 support [1] (mostly for
>> selfish reasons so that Seaside on Pharo would support ISO-8859-15 and
>> CP-1252).
> 
> 
> More converters are always nice :D
> Their code seems ok to me.
>>
>> A couple of notes:
>> - the five unmapped bytes of CP-1252 (not ISO-8859-15, the comment is
>> wrong) are mapped to the Unicode replacement character (U+FFFD)
>> - a new Latin9 language environment is introduced
>> - some minor clean up like removing unused class variables
>>
>> I'd appreciate it if somebody knowledgeable in these areas could review
>> the changes. I'm especially unsure about the Latin9 language
>> environment, but reusing Latin1 or Unicode seemed wrong.
> 
> I'm not sure it's too wrong, according to the EncodedCharSet comment:
> "The other confusion comes from the name of "Latin1" class.  It used to mean 
> the Latin-1 (ISO-8859-1) character set, but now it primarily means that the 
> "Western European languages that are covered by the characters in Latin-1 
> character set."
> I'd reckon the same holds true for Latin1Environment (Western),
> Latin2Environment (Eastern), and Latin7Environment (Greek). I don't think 
> CP1252/8859-15 warrants the same as they are basically alternative encodings 
> to latin1 for western languages.
> 
> Also: 
> - leadingChar is used in StrikeFontSet to choose different glyph sets. This 
> allows for StrikeFonts supporting more than the default latin1 glyphs, seems 
> to me it would be "wrong" to use the same one for two different encodings. 
> Not sure why this approach was taken rather than allowing additional strike 
> font sets based on unicode code point ranges, then using leadingChar only to 
> differentiate when the visual glyphs for those code points would be 
> different. I suspect it maybe was developed to deal with Han unification 
> first, then reused to support multiple character sets later.
> 
> - LanguageEnvironment seems to have been used in conjunction with translation 
> (note the entire old translation system was removed in Pharo and replaced by 
> an external package), maybe to decide which encoding externally stored 
> translation files should be read in as.
> Then, having environments with overlapping supportedLanguages seems somewhat
> weird as well.
> Modifying defaultEncodingName/systemConverterClass of Latin1Environment to 
> use CP1252 for some Windows systems (as per Latin2) may be another approach, 
> may or may not lead to unintended consequences elsewhere though, I did not 
> investigate all uses.
> 
> IMHO, for someone who wasn't involved in its development, the whole
> multilingual package could use some cleaning, more class comments, and 
> clearer statement of responsibilities.
> 
> Cheers,
> Henry
> 
> TL;DR:
> More converters: yay! 
> More LanguageEnvironments: o_O, not sure

OK, if nobody says it's a good idea and the right thing to do I'll drop
the LanguageEnvironment.

Cheers
Philippe




Re: [Pharo-project] adding ISO-8859-15 and CP-1252 support

2010-08-17 Thread Stéphane Ducasse
Henrik,

Thanks for the feedback.
Do you have any ideas for simple comments that could help?
Because this part of Pharo is just dark :)

Stef

On Aug 17, 2010, at 4:55 PM, Henrik Johansen wrote:

> 
> On Aug 16, 2010, at 9:49:30 PM, Philippe Marschall wrote:
> 
>> Hi
>> 
>> I decided to write ISO-8859-15 and CP-1252 support [1] (mostly for
>> selfish reasons so that Seaside on Pharo would support ISO-8859-15 and
>> CP-1252).
> 
> 
> More converters are always nice :D
> Their code seems ok to me.
>> 
>> A couple of notes:
>> - the five unmapped bytes of CP-1252 (not ISO-8859-15, the comment is
>> wrong) are mapped to the Unicode replacement character (U+FFFD)
>> - a new Latin9 language environment is introduced
>> - some minor clean up like removing unused class variables
>> 
>> I'd appreciate it if somebody knowledgeable in these areas could review
>> the changes. I'm especially unsure about the Latin9 language
>> environment, but reusing Latin1 or Unicode seemed wrong.
> 
> I'm not sure it's too wrong, according to the EncodedCharSet comment:
> "The other confusion comes from the name of "Latin1" class.  It used to mean 
> the Latin-1 (ISO-8859-1) character set, but now it primarily means that the 
> "Western European languages that are covered by the characters in Latin-1 
> character set."
> I'd reckon the same holds true for Latin1Environment (Western),
> Latin2Environment (Eastern), and Latin7Environment (Greek). I don't think 
> CP1252/8859-15 warrants the same as they are basically alternative encodings 
> to latin1 for western languages.
> 
> Also: 
> - leadingChar is used in StrikeFontSet to choose different glyph sets. This 
> allows for StrikeFonts supporting more than the default latin1 glyphs, seems 
> to me it would be "wrong" to use the same one for two different encodings. 
> Not sure why this approach was taken rather than allowing additional strike 
> font sets based on unicode code point ranges, then using leadingChar only to 
> differentiate when the visual glyphs for those code points would be 
> different. I suspect it maybe was developed to deal with Han unification 
> first, then reused to support multiple character sets later.
> 
> - LanguageEnvironment seems to have been used in conjunction with translation 
> (note the entire old translation system was removed in Pharo and replaced by 
> an external package), maybe to decide which encoding externally stored 
> translation files should be read in as.
> Then, having environments with overlapping supportedLanguages seems somewhat
> weird as well.
> Modifying defaultEncodingName/systemConverterClass of Latin1Environment to 
> use CP1252 for some Windows systems (as per Latin2) may be another approach, 
> may or may not lead to unintended consequences elsewhere though, I did not 
> investigate all uses.
> 
> IMHO, for someone who wasn't involved in its development, the whole
> multilingual package could use some cleaning, more class comments, and 
> clearer statement of responsibilities.
> 
> Cheers,
> Henry
> 
> TL;DR:
> More converters: yay! 
> More LanguageEnvironments: o_O, not sure




Re: [Pharo-project] adding ISO-8859-15 and CP-1252 support

2010-08-17 Thread Henrik Johansen

On Aug 16, 2010, at 9:49:30 PM, Philippe Marschall wrote:

> Hi
> 
> I decided to write ISO-8859-15 and CP-1252 support [1] (mostly for
> selfish reasons so that Seaside on Pharo would support ISO-8859-15 and
> CP-1252).


More converters are always nice :D
Their code seems ok to me.
> 
> A couple of notes:
> - the five unmapped bytes of CP-1252 (not ISO-8859-15, the comment is
> wrong) are mapped to the Unicode replacement character (U+FFFD)
> - a new Latin9 language environment is introduced
> - some minor clean up like removing unused class variables
> 
> I'd appreciate it if somebody knowledgeable in these areas could review
> the changes. I'm especially unsure about the Latin9 language
> environment, but reusing Latin1 or Unicode seemed wrong.

I'm not sure it's too wrong, according to the EncodedCharSet comment:
"The other confusion comes from the name of "Latin1" class.  It used to mean 
the Latin-1 (ISO-8859-1) character set, but now it primarily means that the 
"Western European languages that are covered by the characters in Latin-1 
character set."
I'd reckon the same holds true for Latin1Environment (Western),
Latin2Environment (Eastern), and Latin7Environment (Greek). I don't think 
CP1252/8859-15 warrants the same as they are basically alternative encodings to 
latin1 for western languages.
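
To make the "alternative encodings to latin1" point concrete, here is a small sketch (Python, not Pharo code) that lists the byte values where ISO-8859-15 and CP-1252 decode differently from ISO-8859-1. The function name differing_bytes is made up for this illustration; only Python's standard codecs are used.

```python
def differing_bytes(encoding, baseline="latin-1"):
    """Map each byte value to its character in `encoding`, but only for
    bytes that decode differently than they do in `baseline`."""
    diffs = {}
    for value in range(256):
        raw = bytes([value])
        base_char = raw.decode(baseline)
        # errors="replace" turns undefined bytes into U+FFFD instead of raising.
        other_char = raw.decode(encoding, errors="replace")
        if base_char != other_char:
            diffs[value] = other_char
    return diffs


latin9_diffs = differing_bytes("iso8859-15")
cp1252_diffs = differing_bytes("cp1252")

print("ISO-8859-15 differs from Latin-1 at:",
      [hex(b) for b in sorted(latin9_diffs)])
print("CP-1252 differs from Latin-1 at:",
      [hex(b) for b in sorted(cp1252_diffs)])
```

ISO-8859-15 only swaps a handful of positions in the upper half, while CP-1252 differs only in the 0x80-0x9F range that ISO-8859-1 reserves for C1 control characters.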

Also: 
- leadingChar is used in StrikeFontSet to choose different glyph sets. This 
allows for StrikeFonts supporting more than the default latin1 glyphs, seems to 
me it would be "wrong" to use the same one for two different encodings. 
Not sure why this approach was taken rather than allowing additional strike 
font sets based on unicode code point ranges, then using leadingChar only to 
differentiate when the visual glyphs for those code points would be different. 
I suspect it maybe was developed to deal with Han unification first, then 
reused to support multiple character sets later.
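
As an illustration of how a leadingChar can tag a character, here is a sketch in Python. It assumes a Squeak-style layout in which the low 22 bits of a character value hold the code and the bits above them hold the leadingChar; that layout is an assumption, not something stated in this thread, and the names pack, leading_char and char_code are made up for the example.

```python
CODE_BITS = 22                     # assumed width of the code field
CODE_MASK = (1 << CODE_BITS) - 1   # 0x3FFFFF


def pack(leading_char, code):
    """Combine a leadingChar tag and a code into one integer value."""
    if not 0 <= leading_char <= 0xFF:
        raise ValueError("leading_char must fit in 8 bits")
    if not 0 <= code <= CODE_MASK:
        raise ValueError("code must fit in 22 bits")
    return (leading_char << CODE_BITS) | code


def leading_char(value):
    """Extract the tag stored above the code bits."""
    return value >> CODE_BITS


def char_code(value):
    """Extract the code stored in the low bits."""
    return value & CODE_MASK


# The same code (the euro sign, U+20AC) under two different tags.
euro_untagged = pack(0, 0x20AC)
euro_tagged = pack(17, 0x20AC)   # 17 is the leading char mentioned in the thread

print(hex(euro_untagged), hex(euro_tagged))
print(leading_char(euro_tagged), hex(char_code(euro_tagged)))
```

Under this assumption the two values share a code but compare unequal, which is why a font set can select glyphs per tag, and also why a new tag for an encoding that covers the same characters looks questionable.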

- LanguageEnvironment seems to have been used in conjunction with translation 
(note the entire old translation system was removed in Pharo and replaced by an 
external package), maybe to decide which encoding externally stored translation 
files should be read in as.
Then, having environments with overlapping supportedLanguages seems somewhat
weird as well.
Modifying defaultEncodingName/systemConverterClass of Latin1Environment to use 
CP1252 for some Windows systems (as per Latin2) may be another approach, may or 
may not lead to unintended consequences elsewhere though, I did not investigate 
all uses.
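
A rough sketch of what a platform-dependent default encoding could look like, written in Python rather than Pharo. Everything here is hypothetical: the function names, the "Win32" platform string and the ISO-8859-1 fallback are assumptions for illustration, not the actual Latin1Environment code.

```python
def default_encoding_name(platform_name):
    """Hypothetical stand-in for a platform-dependent defaultEncodingName."""
    if platform_name == "Win32":   # assumed platform identifier
        return "cp1252"
    return "iso8859-1"             # assumed fallback for other platforms


def decode_external(data, platform_name):
    """Decode externally stored bytes using the platform's default encoding."""
    return data.decode(default_encoding_name(platform_name), errors="replace")


sample = b"\x80 100"
print(repr(decode_external(sample, "Win32")))
print(repr(decode_external(sample, "unix")))
```

The same bytes decode to different strings depending on the platform, which is the kind of unintended consequence that would need checking before changing the default.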

IMHO, for someone who wasn't involved in its development, the whole
multilingual package could use some cleaning, more class comments, and clearer 
statement of responsibilities.

Cheers,
Henry

TL;DR:
More converters: yay! 
More LanguageEnvironments: o_O, not sure


Re: [Pharo-project] adding ISO-8859-15 and CP-1252 support

2010-08-17 Thread Philippe Marschall
On 08/16/2010 09:49 PM, Philippe Marschall wrote:
> Hi
> 
> I decided to write ISO-8859-15 and CP-1252 support [1] (mostly for
> selfish reasons so that Seaside on Pharo would support ISO-8859-15 and
> CP-1252).
> 
> A couple of notes:
>  - the five unmapped bytes of CP-1252 (not ISO-8859-15, the comment is
> wrong) are mapped to the Unicode replacement character (U+FFFD)
>  - a new Latin9 language environment is introduced

I also snatched the first free leading char (17) for this.

Cheers
Philippe




[Pharo-project] adding ISO-8859-15 and CP-1252 support

2010-08-16 Thread Philippe Marschall
Hi

I decided to write ISO-8859-15 and CP-1252 support [1] (mostly for
selfish reasons so that Seaside on Pharo would support ISO-8859-15 and
CP-1252).

A couple of notes:
 - the five unmapped bytes of CP-1252 (not ISO-8859-15, the comment is
wrong) are mapped to the Unicode replacement character (U+FFFD)
 - a new Latin9 language environment is introduced
 - some minor clean up like removing unused class variables
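
For reference, the five unmapped CP-1252 bytes and the U+FFFD mapping can be checked with a short Python sketch. This uses Python's own cp1252 codec, not the Pharo converter from the issue, and the helper names are made up for the example.

```python
REPLACEMENT_CHARACTER = "\ufffd"


def unmapped_cp1252_bytes():
    """Return the byte values that the cp1252 codec refuses to decode."""
    unmapped = []
    for value in range(256):
        try:
            bytes([value]).decode("cp1252")
        except UnicodeDecodeError:
            unmapped.append(value)
    return unmapped


def decode_cp1252(data):
    """Decode CP-1252, mapping unmapped bytes to U+FFFD instead of failing."""
    return data.decode("cp1252", errors="replace")


print([hex(b) for b in unmapped_cp1252_bytes()])
print(repr(decode_cp1252(b"\x80uro \x81")))
```

Mapping to U+FFFD keeps decoding total: every input byte yields some character, and the bad bytes stay visible in the output instead of raising an error.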

I'd appreciate it if somebody knowledgeable in these areas could review
the changes. I'm especially unsure about the Latin9 language
environment, but reusing Latin1 or Unicode seemed wrong.

 [1] http://code.google.com/p/pharo/issues/detail?id=2812

Cheers
Philippe

