I'm trying to find a way to get the encoding used for user input, for
example command line arguments:

        # creating a Hebrew file name...
        touch עברית
        ./foo.py *

From my experience with Mac OS X 10.0-3, I know that foo.py will always
receive the Hebrew name encoded as UTF-8.

You can also see this when you type non-ASCII in the Terminal:

        $ touch \327\242\327\221\327\250\327\231\327\252

This creates a file named "עברית".
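
To make the connection explicit, here is a small sketch showing that the octal escapes in the touch command are exactly the UTF-8 bytes of the Hebrew name. It uses modern Python 3 syntax rather than the 2.4 of this thread, and the variable names are only illustrative:

```python
# The octal escapes typed at the shell prompt, as raw bytes.
raw = b"\327\242\327\221\327\250\327\231\327\252"

# Decoding them as UTF-8 gives back the Hebrew file name.
name = raw.decode("utf-8")
print(name)

# Decoding the same bytes as Mac OS Roman succeeds, but the
# result is unrelated Western characters, not Hebrew.
garbled = raw.decode("mac_roman")
print(garbled != name)
```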

I noticed that it does not matter what encoding you set in the Terminal
window settings; anything you type is encoded as UTF-8.

Anyway, I could not find any documentation about this issue, except
this:

        "All BSD system functions expect their string parameters to be in
        UTF-8 encoding and nothing else. Code that calls BSD system routines
        should ensure that the contents of all const *char parameters are in
        canonical UTF-8 encoding."
        <http://developer.apple.com/documentation/MacOSX/Conceptual/BPInternational/Articles/FileEncodings.html#//apple_ref/doc/uid/20002137-DontLinkElementID_4>


On Linux, people get the encoding with:

        import locale
        locale.getpreferredencoding()

But on OS X getpreferredencoding() returns useless results, at least  
for decoding command line arguments or printing readable output. For  
example:

        1. Choose "Window Settings..." in the Terminal and set the
           Character Set Encoding to Unicode (UTF-8)
        2. Try:
        >>> import locale
        >>> locale.getpreferredencoding()
        'mac-roman'

I found this code in bzrlib, which tries to correct the behavior:

        # work around egregious python 2.4 bug
        >>> import sys
        >>> sys.platform = 'posix'
        >>> import locale
        >>> locale.getpreferredencoding()
        'US-ASCII'
        >>> sys.platform = 'darwin'

Obviously this workaround does not solve the problem :-)
So my conclusion is that Mac OS X always uses UTF-8 for input to the
shell. Unless I am missing something?
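
If that conclusion holds, one cautious way to act on it is to try UTF-8 first and only fall back to the locale's answer when the bytes are not valid UTF-8. This is a sketch under that assumption, in Python 3 syntax; decode_argument is a hypothetical helper name, not something from bzrlib or the standard library:

```python
import locale

def decode_argument(raw, fallback=None):
    """Decode a byte-string argument, preferring UTF-8.

    Assumption: the shell passes arguments as UTF-8. If the bytes
    are not valid UTF-8, use the given fallback encoding, or else
    whatever the locale module reports.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        encoding = fallback or locale.getpreferredencoding(False) or "latin-1"
        # "replace" keeps the call from raising on undecodable bytes.
        return raw.decode(encoding, "replace")

print(decode_argument(b"\327\242\327\221\327\250\327\231\327\252"))
```

Trying UTF-8 first is fairly safe because text in a legacy 8-bit encoding rarely happens to be valid UTF-8, so a wrong guess usually shows up as a decode error and not as silent garbage.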

Next, how can you get the Terminal output encoding? For example, what  
if a user changed the Character Set Encoding to Western (Mac OS Roman)  
- how can you detect this setting from Python?
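
For what it is worth, here is a sketch of what can be asked from inside Python (Python 3 syntax). Note the limitation: it reports what Python derived from the stream and the locale environment, which is not necessarily what the Terminal window is set to display, so it does not really answer the question. guess_output_encoding is a hypothetical name:

```python
import locale
import sys

def guess_output_encoding(stream=None, default="utf-8"):
    """Best-effort guess of the encoding to use when writing to stream.

    Order: the stream's own encoding attribute, then the locale's
    preferred encoding, then the given default.
    """
    if stream is None:
        stream = sys.stdout
    encoding = getattr(stream, "encoding", None)
    if not encoding:
        encoding = locale.getpreferredencoding(False)
    return encoding or default

print(guess_output_encoding())
```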


Best Regards,

Nir Soffer
_______________________________________________
Pythonmac-SIG maillist  -  [email protected]
http://mail.python.org/mailman/listinfo/pythonmac-sig
