On 6/9/07, Yen-Ju Chen <[EMAIL PROTECTED]> wrote:
> On 6/9/07, Quentin Mathé <[EMAIL PROTECTED]> wrote:
> > On 9 June 07 at 22:42, Yen-Ju Chen wrote:
> >
> > > On 6/9/07, Quentin Mathé <[EMAIL PROTECTED]> wrote:
> > >>> I need to know what's the output of 'locale' command.
> > >>
> > >> LANG=
> > >> LC_COLLATE="C"
> > >> LC_CTYPE="C"
> > >> LC_MESSAGES="C"
> > >> LC_MONETARY="C"
> > >> LC_NUMERIC="C"
> > >> LC_TIME="C"
> > >> LC_ALL="C"
> > >
> > >   It is interesting that LC_CTYPE is 'C',
> > >   which means it treats all characters as 'C' encoding (ASCII?).
> > >
> > >>> And what is your default C string encoding [NSString
> > >>> defaultCStringEncoding].
> > >>
> > >> NSMacOSRomanStringEncoding
> > >
> > >   And all Cocoa (and probably Carbon) applications
> > >   treat characters as Mac Roman.
> > >   Then I wonder how Unix commands, like 'more' and 'vi', see the
> > > characters.
> >
> > Badly, like this:
> > $ more TestAccent.txt
> > <83>toil<8E>

I played with 'locale' and 'localedef' a little to reproduce the situation on the Mac.
Terminal.app is set to UTF-8 and the test file is also UTF-8 encoded.
'locale' shows LC_CTYPE="C".
'cat utf8.txt' displays the right glyphs,
because it passes the bytes through untouched and the terminal decodes them.
'more utf8.txt' shows <E9><B3> ... for bytes above 127,
because it thinks your terminal is in 'C' encoding.
'vi utf8.txt' shows invalid glyphs (mostly ?) for bytes above 127,
because it tries to interpret UTF-8 in 'C' encoding.
Therefore, a glyph that takes 2 bytes in UTF-8 is interpreted as 2 glyphs.
This is similar to your 'ls' result.
(Therefore, I suspect HFS is not really plain UTF-8,
  or 'ls' does some conversion behind the scenes.)
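
To make the byte-versus-glyph point concrete, here is a minimal Python sketch.
It is only an illustration: more_style and ls_style are hypothetical helpers that
imitate what the tools print under the 'C' locale, not their real code.
It also assumes HFS+ hands back file names in decomposed form (base letter
followed by a combining accent), which Python's NFD normalization approximates.

```python
import unicodedata

def more_style(data: bytes) -> str:
    """Imitate the 'more' output above: ASCII as-is, any other byte as <XX>."""
    return "".join(chr(b) if b < 128 else f"<{b:02X}>" for b in data)

def ls_style(data: bytes) -> str:
    """Imitate the 'ls' listing: one '?' per byte above 127."""
    return "".join(chr(b) if b < 128 else "?" for b in data)

# Precomposed e-acute (U+00E9), as a text editor would typically save it.
nfc = "\u00e9".encode("utf-8")

# Decomposed file name: 'E' + combining acute, ..., 'e' + combining acute.
# Assumption: this approximates how HFS+ stores the name.
nfd = unicodedata.normalize("NFD", "\u00c9toil\u00e9").encode("utf-8")

print(len(nfc), more_style(nfc))  # 2 <C3><A9>
print(ls_style(nfd))              # E??toile??
```

If that assumption holds, the 'ls' listing is consistent with UTF-8 file names:
the base letter is plain ASCII and only the two bytes of the combining accent
come out as '??'.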

You can change the default encoding
by executing 'export LC_CTYPE=en_US.UTF-8'.
The output of 'locale' should then show that your LC_CTYPE is UTF-8.
Now 'cat', 'more' and 'vi' should all show 'utf8.txt' correctly.
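
One caveat worth sketching: which variable wins. As far as I know, POSIX gives
LC_ALL priority over LC_CTYPE, and LC_CTYPE priority over LANG, with empty
values counting as unset. The Python function below (effective_ctype is a
hypothetical name, and falling back to "C" is an assumption about the system
default) models that lookup; it does not call the real locale machinery.

```python
def effective_ctype(env: dict) -> str:
    """Return the locale name that would govern LC_CTYPE for this environment.

    Precedence: LC_ALL, then LC_CTYPE, then LANG. Empty strings are treated
    as unset. The final fallback to "C" is an assumption for this sketch.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"

# Nothing useful set: we end up in the 'C' locale.
print(effective_ctype({"LANG": ""}))                                # C
# After 'export LC_CTYPE=en_US.UTF-8'.
print(effective_ctype({"LANG": "", "LC_CTYPE": "en_US.UTF-8"}))     # en_US.UTF-8
# If LC_ALL is exported as well, it overrides LC_CTYPE.
print(effective_ctype({"LC_ALL": "C", "LC_CTYPE": "en_US.UTF-8"}))  # C
```

So if 'locale' still reports 'C' after the export, it is worth checking whether
LC_ALL is set in the environment.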

This pretty much explains everything.
On the Mac, there is a discrepancy between the Unix environment
and Cocoa (probably also Carbon).
GNUstep checks the Unix locale to decide its default encoding,
so it should not have this discrepancy.
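
The discrepancy can be seen at the byte level with a small Python sketch.
It assumes TestAccent.txt contained the word "Étoilé" saved with the Cocoa
default (Mac Roman); more_style is again a hypothetical helper that imitates
the 'more' output, not its implementation.

```python
def more_style(data: bytes) -> str:
    """Imitate 'more' under the 'C' locale: ASCII as-is, other bytes as <XX>."""
    return "".join(chr(b) if b < 128 else f"<{b:02X}>" for b in data)

text = "\u00c9toil\u00e9"  # "Etoile" with acute accents on the first and last letter

mac_roman = text.encode("mac_roman")  # what a Cocoa app writes by default
utf8 = text.encode("utf-8")           # what a UTF-8 terminal expects

print(more_style(mac_roman))  # <83>toil<8E>
print(more_style(utf8))       # <C3><89>toil<C3><A9>
```

The first line matches the 'more TestAccent.txt' output quoted above, which
suggests that file was indeed saved as Mac Roman rather than UTF-8.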

Yen-Ju


>
>   With TermX and UTF-8 as the default encoding,
>   I got different results with 'cat', 'more' and 'vi'.
>   So I don't really know which one is the correct one.
>
>   We are dealing with two issues here:
>   1. What is the default encoding used by the system?
>       This encoding is probably the one used for the file system.
>       It should be something we can solve.
>   2. What is the encoding of a text file?
>       This one cannot be solved by the terminal emulator alone.
>       It involves the text editor and the tool you use to view the file.
>       Only when both of them use the default encoding can you display
>       the file correctly with the terminal emulator.
>       Otherwise, the viewer has to convert the encoding of the text file
>       to the default encoding.
>       So if you use vi, you have to know which
>       encoding it uses to save the text file.
>       Most Unix commands use LANG or LC_CTYPE for the encoding.
>       But surprisingly your locale is 'C' even on a French system.
>       So I don't really know which encoding these Unix commands use.
>       If vi thinks your system is in 'C' encoding,
>       you can only save the file in UTF-8 without losing information, I think.
>       (Not 100% sure about that.)
>
> >
> > That's why I always set my text encoding to UTF-8 in Save Panel.
> >
> > >   It also raises the question what is the encoding of the file system
> > >   (filename). Is it UTF8 (compatible with ASCII) or MacOSRoman ?
> >
> > iirc HFS+ uses UTF-8.
> > Here is 'ls' output example:
> > $ ls
> > E??toile??
> > Liens a?? trier
> > Ico??ne
> >
> > This looks like UTF-8 when you consider that UTF-8 is ASCII compatible and
> > only the accents are wrongly interpreted here (not the whole character as
> > with Mac Roman).
> >
> > >   A quick way to check is to change line 416 in TXTextView.m to
> > >   a different encoding and see what happens.
> > >   That is where it decides how to convert characters into NSString.
> >
> > Do you want me to try that on Mac OS X (by compiling TermX on it) or on
> > Ubuntu/GNUstep?
>
> On Mac. Thanks.
> GNUstep has font issues.
> So even if you get the encoding right,
> it might not have the right font to display it.
> Therefore, you cannot tell which one is wrong (encoding or font).
>
> Yen-Ju
>
> >
> > Quentin.
> >
> >
> > _______________________________________________
> > Etoile-discuss mailing list
> > [email protected]
> > https://mail.gna.org/listinfo/etoile-discuss
> >
>
