Re: WTF? Printing unicode strings
[EMAIL PROTECTED] wrote:
> Learn something every day. I take it "646" is an alias for "ascii" (or vice
> versa)?

Usage of "646" as an alias for ASCII is primarily a Sun invention. When ASCII became an international standard, its standard number became ISO/IEC 646:1968. It's not *quite* the same as ASCII, as it leaves a certain number of code points unassigned that ASCII defines (most notably, the dollar sign, and the square and curly braces). What Sun means is probably the "International Reference Version" of ISO 646, which is (now) identical to ASCII.

Regards,
Martin
--
http://mail.python.org/mailman/listinfo/python-list
Re: WTF? Printing unicode strings
In article <[EMAIL PROTECTED]>, "Serge Orlov" <[EMAIL PROTECTED]> wrote: > Serge Orlov wrote: > > Ron Garret wrote: > > > In article <[EMAIL PROTECTED]>, > > > "Serge Orlov" <[EMAIL PROTECTED]> wrote: > > > > > > > Ron Garret wrote: > > > > > > > I'm using an OS X terminal to ssh to a Linux machine. > > > > > > > > > > > > In theory it should work out of the box. OS X terminal should set > > > > > > enviromental variable LANG=en_US.utf-8, then ssh should transfer > > > > > > this > > > > > > variable to Linux and python will know that your terminal is utf-8. > > > > > > Unfortunately AFAIK OS X terminal doesn't set that variable and > > > > > > most > > > > > > (all?) ssh clients don't transfer it between machines. As a > > > > > > workaround > > > > > > you can set that variable on linux yourself . This should work in > > > > > > the > > > > > > command line right away: > > > > > > > > > > > > LANG=en_US.utf-8 python -c "print unichr(0xbd)" > > > > > > > > > > > > Or put the following line in ~/.bashrc and logout/login > > > > > > > > > > > > export LANG=en_US.utf-8 > > > > > > > > > > No joy. > > > > > > > > > > [EMAIL PROTECTED]:~$ LANG=en_US.utf-8 python -c "print unichr(0xbd)" > > > > > Traceback (most recent call last): > > > > > File "", line 1, in ? > > > > > UnicodeEncodeError: 'ascii' codec can't encode character u'\xbd' in > > > > > position 0: ordinal not in range(128) > > > > > [EMAIL PROTECTED]:~$ > > > > > > > > What version of python and what shell do you run? What the following > > > > commands print: > > > > > > > > python -V > > > > echo $SHELL > > > > $SHELL --version > > > > > > [EMAIL PROTECTED]:~$ python -V > > > Python 2.3.4 > > > [EMAIL PROTECTED]:~$ echo $SHELL > > > /bin/bash > > > [EMAIL PROTECTED]:~$ $SHELL --version > > > GNU bash, version 2.05b.0(1)-release (i386-pc-linux-gnu) > > > Copyright (C) 2002 Free Software Foundation, Inc. > > > [EMAIL PROTECTED]:~$ > > > > That's recent enough. 
> > I guess the distribution you're using set LC_*
> > variables for no good reason. Either unset all environment variables
> > starting with LC_ and set LANG variable or override LC_CTYPE variable:
> >
> > LC_CTYPE=en_US.utf-8 python -c "print unichr(0xbd)"
> >
> > Should be working now :)
>
> I've pulled myself together and installed Linux in VMware Player.
> Apparently there is another way Linux distributors can screw up. I
> chose Debian 3.1 minimal network install and after answering all
> installation questions I found that only ASCII and Latin-1 English
> locales were installed:
> $ locale -a
> C
> en_US
> en_US.iso88591
> POSIX
>
> In 2006, I would expect a UTF-8 English locale to be present even in a
> minimal install. I had to edit /etc/locale.gen and run locale-gen as
> root. After that Python started to print Unicode characters.

That's it. Thanks!

rg
Re: WTF? Printing unicode strings
In article <[EMAIL PROTECTED]>, "Serge Orlov" <[EMAIL PROTECTED]> wrote: > Ron Garret wrote: > > In article <[EMAIL PROTECTED]>, > > "Serge Orlov" <[EMAIL PROTECTED]> wrote: > > > > > Ron Garret wrote: > > > > > > I'm using an OS X terminal to ssh to a Linux machine. > > > > > > > > > > In theory it should work out of the box. OS X terminal should set > > > > > enviromental variable LANG=en_US.utf-8, then ssh should transfer this > > > > > variable to Linux and python will know that your terminal is utf-8. > > > > > Unfortunately AFAIK OS X terminal doesn't set that variable and most > > > > > (all?) ssh clients don't transfer it between machines. As a workaround > > > > > you can set that variable on linux yourself . This should work in the > > > > > command line right away: > > > > > > > > > > LANG=en_US.utf-8 python -c "print unichr(0xbd)" > > > > > > > > > > Or put the following line in ~/.bashrc and logout/login > > > > > > > > > > export LANG=en_US.utf-8 > > > > > > > > No joy. > > > > > > > > [EMAIL PROTECTED]:~$ LANG=en_US.utf-8 python -c "print unichr(0xbd)" > > > > Traceback (most recent call last): > > > > File "", line 1, in ? > > > > UnicodeEncodeError: 'ascii' codec can't encode character u'\xbd' in > > > > position 0: ordinal not in range(128) > > > > [EMAIL PROTECTED]:~$ > > > > > > What version of python and what shell do you run? What the following > > > commands print: > > > > > > python -V > > > echo $SHELL > > > $SHELL --version > > > > [EMAIL PROTECTED]:~$ python -V > > Python 2.3.4 > > [EMAIL PROTECTED]:~$ echo $SHELL > > /bin/bash > > [EMAIL PROTECTED]:~$ $SHELL --version > > GNU bash, version 2.05b.0(1)-release (i386-pc-linux-gnu) > > Copyright (C) 2002 Free Software Foundation, Inc. > > [EMAIL PROTECTED]:~$ > > That's recent enough. I guess the distribution you're using set LC_* > variables for no good reason. 
Nope:

[EMAIL PROTECTED]:~$ export | grep LC
[EMAIL PROTECTED]:~$

> Either unset all environment variables
> starting with LC_ and set LANG variable or override LC_CTYPE variable:
>
> LC_CTYPE=en_US.utf-8 python -c "print unichr(0xbd)"
>
> Should be working now :)

Nope:

[EMAIL PROTECTED]:~$ LC_CTYPE=en_US.utf-8 python -c "print unichr(0xbd)"
Traceback (most recent call last):
  File "<string>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xbd' in position 0: ordinal not in range(128)

rg
Re: WTF? Printing unicode strings
[EMAIL PROTECTED] wrote:
> Robert> Because sys.stdout.encoding isn't determined by your Python
> Robert> configuration, but your terminal's.
>
> Learn something every day. I take it "646" is an alias for "ascii" (or vice
> versa)?
>
> % python
> Python 2.4.2 (#1, Feb 23 2006, 12:48:31)
> [GCC 3.4.1] on sunos5
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import sys
> >>> sys.stdout.encoding
> '646'
> >>> import codecs
> >>> codecs.lookup("646")
> (, ,
> , encodings.ascii.StreamWriter at 0x819aa1c>)

Yes. In encodings/aliases.py in the standard library:

"""
aliases = {

    # Please keep this list sorted alphabetically by value !

    # ascii codec
    '646': 'ascii',
"""

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco
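For anyone who wants to check the alias without reading aliases.py, here is a minimal sketch. It assumes a modern Python 3 interpreter (the thread itself is about Python 2, where codecs.lookup returned a plain tuple), and canonical_codec_name is a hypothetical helper name, not something from the standard library.

```python
import codecs
from encodings.aliases import aliases


def canonical_codec_name(label):
    """Return the canonical codec name that Python resolves `label` to.

    Hypothetical helper for illustration; it only wraps codecs.lookup().
    """
    return codecs.lookup(label).name


if __name__ == "__main__":
    # "646" is the label reported by sys.stdout.encoding on the Solaris box.
    for label in ("646", "ascii", "US-ASCII", "cp1252"):
        print(label, "->", canonical_codec_name(label))
    # The alias table quoted in the message is a plain dict.
    print("aliases['646'] =", aliases["646"])
```

The lookup goes through the same alias table, so any label that maps to 'ascii' there ends up at the same codec.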
Re: WTF? Printing unicode strings
[EMAIL PROTECTED] wrote: > John> Hmm, not that this helps me any :) > > import sys > sys.stdout.encoding > John> 'cp1252' > > Sure it does. You can print Unicode objects which map to cp1252. I assume > that means you're on Windows or that for some perverse reason you have your > Mac's Terminal window set to cp1252. (Does it go there? I'm at work right > now so I can't check). > > Skip You're right, I'm on XP. I just couldn't make sense of the lookup call, although some of the names looked like .NET classes. -- http://mail.python.org/mailman/listinfo/python-list
Re: WTF? Printing unicode strings
John> Hmm, not that this helps me any :) import sys sys.stdout.encoding John> 'cp1252' Sure it does. You can print Unicode objects which map to cp1252. I assume that means you're on Windows or that for some perverse reason you have your Mac's Terminal window set to cp1252. (Does it go there? I'm at work right now so I can't check). Skip -- http://mail.python.org/mailman/listinfo/python-list
Re: WTF? Printing unicode strings
[EMAIL PROTECTED] wrote: > Robert> Because sys.stdout.encoding isn't determined by your Python > Robert> configuration, but your terminal's. > > Learn something every day. I take it "646" is an alias for "ascii" (or vice > versa)? Hmm, not that this helps me any :) >>> import sys >>> sys.stdout.encoding 'cp1252' >>> import codecs >>> codecs.lookup('cp1252') (>, >, , ) >>> -- http://mail.python.org/mailman/listinfo/python-list
Re: WTF? Printing unicode strings
Robert> Because sys.stdout.encoding isn't determined by your Python Robert> configuration, but your terminal's. Learn something every day. I take it "646" is an alias for "ascii" (or vice versa)? % python Python 2.4.2 (#1, Feb 23 2006, 12:48:31) [GCC 3.4.1] on sunos5 Type "help", "copyright", "credits" or "license" for more information. >>> import sys >>> sys.stdout.encoding '646' >>> import codecs >>> codecs.lookup("646") (, , , ) Skip -- http://mail.python.org/mailman/listinfo/python-list
Re: WTF? Printing unicode strings
John Salerno wrote: > AFAIK, I'm all ASCII (at least, I never made explicit changes to the > default Python install), so how am I able to print out the character? Because sys.stdout.encoding isn't determined by your Python configuration, but your terminal's. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco -- http://mail.python.org/mailman/listinfo/python-list
Re: WTF? Printing unicode strings
Fredrik Lundh wrote: > Ron Garret wrote: > > u'\xbd' >> u'\xbd' > print _ >> Traceback (most recent call last): >> File "", line 1, in ? >> UnicodeEncodeError: 'ascii' codec can't encode character u'\xbd' in >> position 0: ordinal not in range(128) > > so stdout on your machine is ascii, and you don't understand why you > cannot print a non-ascii unicode character to it? wtf? > > > AFAIK, I'm all ASCII (at least, I never made explicit changes to the default Python install), so how am I able to print out the character? -- http://mail.python.org/mailman/listinfo/python-list
Re: WTF? Printing unicode strings
Ron Garret wrote:
> In article <[EMAIL PROTECTED]>,
>  Fredrik Lundh <[EMAIL PROTECTED]> wrote:
>
>> Ron Garret wrote:
>>
>>> >>> u'\xbd'
>>> u'\xbd'
>>> >>> print _
>>> Traceback (most recent call last):
>>>   File "<stdin>", line 1, in ?
>>> UnicodeEncodeError: 'ascii' codec can't encode character u'\xbd' in
>>> position 0: ordinal not in range(128)
>> so stdout on your machine is ascii, and you don't understand why you
>> cannot print a non-ascii unicode character to it? wtf?
>>
>
> I forgot to mention:
>
> >>> sys.getdefaultencoding()
> 'utf-8'
> >>> print u'\xbd'
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xbd' in
> position 0: ordinal not in range(128)

That is the default encoding Python uses for implicit conversions between byte strings and unicode strings; it does not control printing to a terminal. For the output encoding, see sys.stdout.encoding.

>>> import sys
>>> sys.stdout.encoding
'cp850'
>>>

Later,

Laurent.
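To make the distinction concrete, here is a small sketch, assuming Python 3 (where the default encoding is fixed and no longer user-settable). The helper names can_encode and describe_encodings are made up for this example. It reports both settings side by side and checks whether a given character fits in a given output encoding, which is the test that fails in the traceback above.

```python
import sys


def can_encode(text, encoding):
    """Return True if `text` can be represented in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def describe_encodings():
    """Collect the two settings that the thread keeps mixing up."""
    return {
        # Used for implicit str/bytes conversions (Python 2 semantics).
        "default": sys.getdefaultencoding(),
        # Used when printing text; may be None if stdout is not a terminal.
        "stdout": getattr(sys.stdout, "encoding", None),
    }


if __name__ == "__main__":
    print(describe_encodings())
    for enc in ("ascii", "cp1252", "utf-8"):
        print(enc, can_encode("\xbd", enc))
```

An 'ascii' stdout rejects u'\xbd' no matter what the default encoding says, which matches what Ron sees.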
Re: WTF? Printing unicode strings
Serge Orlov wrote: > Ron Garret wrote: > > In article <[EMAIL PROTECTED]>, > > "Serge Orlov" <[EMAIL PROTECTED]> wrote: > > > > > Ron Garret wrote: > > > > > > I'm using an OS X terminal to ssh to a Linux machine. > > > > > > > > > > In theory it should work out of the box. OS X terminal should set > > > > > enviromental variable LANG=en_US.utf-8, then ssh should transfer this > > > > > variable to Linux and python will know that your terminal is utf-8. > > > > > Unfortunately AFAIK OS X terminal doesn't set that variable and most > > > > > (all?) ssh clients don't transfer it between machines. As a workaround > > > > > you can set that variable on linux yourself . This should work in the > > > > > command line right away: > > > > > > > > > > LANG=en_US.utf-8 python -c "print unichr(0xbd)" > > > > > > > > > > Or put the following line in ~/.bashrc and logout/login > > > > > > > > > > export LANG=en_US.utf-8 > > > > > > > > No joy. > > > > > > > > [EMAIL PROTECTED]:~$ LANG=en_US.utf-8 python -c "print unichr(0xbd)" > > > > Traceback (most recent call last): > > > > File "", line 1, in ? > > > > UnicodeEncodeError: 'ascii' codec can't encode character u'\xbd' in > > > > position 0: ordinal not in range(128) > > > > [EMAIL PROTECTED]:~$ > > > > > > What version of python and what shell do you run? What the following > > > commands print: > > > > > > python -V > > > echo $SHELL > > > $SHELL --version > > > > [EMAIL PROTECTED]:~$ python -V > > Python 2.3.4 > > [EMAIL PROTECTED]:~$ echo $SHELL > > /bin/bash > > [EMAIL PROTECTED]:~$ $SHELL --version > > GNU bash, version 2.05b.0(1)-release (i386-pc-linux-gnu) > > Copyright (C) 2002 Free Software Foundation, Inc. > > [EMAIL PROTECTED]:~$ > > That's recent enough. I guess the distribution you're using set LC_* > variables for no good reason. 
> Either unset all environment variables
> starting with LC_ and set LANG variable or override LC_CTYPE variable:
>
> LC_CTYPE=en_US.utf-8 python -c "print unichr(0xbd)"
>
> Should be working now :)

I've pulled myself together and installed Linux in VMware Player. Apparently there is another way Linux distributors can screw up. I chose Debian 3.1 minimal network install and after answering all installation questions I found that only ASCII and Latin-1 English locales were installed:

$ locale -a
C
en_US
en_US.iso88591
POSIX

In 2006, I would expect a UTF-8 English locale to be present even in a minimal install. I had to edit /etc/locale.gen and run locale-gen as root. After that Python started to print Unicode characters.
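The "locale -a" check can also be done from inside Python. The following is only a sketch, assuming Python 3 on a POSIX system; locale_is_installed is a hypothetical helper name. It tries to activate the locale and restores the previous setting afterwards. Note that some C libraries accept names they do not really support, so a True result is a hint rather than proof.

```python
import locale


def locale_is_installed(name):
    """Return True if the C library accepts `name` for LC_CTYPE.

    Hypothetical helper: the previous LC_CTYPE setting is restored
    before returning, so calling it has no lasting effect.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        return False
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    for candidate in ("C", "en_US.utf-8", "en_US.iso88591"):
        print(candidate, locale_is_installed(candidate))
```

On a machine like the minimal Debian install described above, en_US.utf-8 would be expected to report False until locale-gen has been run.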
Re: WTF? Printing unicode strings
Ron Garret wrote:
> In article <[EMAIL PROTECTED]>,
>  "Serge Orlov" <[EMAIL PROTECTED]> wrote:
>
> > Ron Garret wrote:
> > > > > I'm using an OS X terminal to ssh to a Linux machine.
> > > >
> > > > In theory it should work out of the box. OS X terminal should set
> > > > environment variable LANG=en_US.utf-8, then ssh should transfer this
> > > > variable to Linux and python will know that your terminal is utf-8.
> > > > Unfortunately AFAIK OS X terminal doesn't set that variable and most
> > > > (all?) ssh clients don't transfer it between machines. As a workaround
> > > > you can set that variable on linux yourself. This should work in the
> > > > command line right away:
> > > >
> > > > LANG=en_US.utf-8 python -c "print unichr(0xbd)"
> > > >
> > > > Or put the following line in ~/.bashrc and logout/login
> > > >
> > > > export LANG=en_US.utf-8
> > >
> > > No joy.
> > >
> > > [EMAIL PROTECTED]:~$ LANG=en_US.utf-8 python -c "print unichr(0xbd)"
> > > Traceback (most recent call last):
> > >   File "<string>", line 1, in ?
> > > UnicodeEncodeError: 'ascii' codec can't encode character u'\xbd' in
> > > position 0: ordinal not in range(128)
> > > [EMAIL PROTECTED]:~$
> >
> > What version of python and what shell do you run? What do the following
> > commands print:
> >
> > python -V
> > echo $SHELL
> > $SHELL --version
>
> [EMAIL PROTECTED]:~$ python -V
> Python 2.3.4
> [EMAIL PROTECTED]:~$ echo $SHELL
> /bin/bash
> [EMAIL PROTECTED]:~$ $SHELL --version
> GNU bash, version 2.05b.0(1)-release (i386-pc-linux-gnu)
> Copyright (C) 2002 Free Software Foundation, Inc.
> [EMAIL PROTECTED]:~$

That's recent enough. I guess the distribution you're using set LC_* variables for no good reason. Either unset all environment variables starting with LC_ and set the LANG variable, or override the LC_CTYPE variable:

LC_CTYPE=en_US.utf-8 python -c "print unichr(0xbd)"

Should be working now :)
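The reason both suggestions work is the POSIX precedence among the locale variables: LC_ALL wins over LC_CTYPE, which wins over LANG, and empty values count as unset. Here is a rough sketch of that lookup, written for Python 3. The function name effective_ctype and the "C" fallback are assumptions for illustration (the real fallback is implementation-defined), and this does not call the C library at all.

```python
import os


def effective_ctype(env):
    """Return (variable, value) for the setting that decides LC_CTYPE.

    Follows the POSIX precedence LC_ALL > LC_CTYPE > LANG; empty values
    are treated as unset. Falls back to (None, "C") as an assumed default.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return name, value
    return None, "C"


if __name__ == "__main__":
    # Inspect the current process environment.
    print(effective_ctype(os.environ))
    # A stray LC_CTYPE set by the distribution would hide LANG:
    print(effective_ctype({"LANG": "en_US.utf-8", "LC_CTYPE": "POSIX"}))
```

So setting LANG only helps when no LC_* variable overrides it, which is why the advice is to unset those or to set LC_CTYPE directly.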
Re: WTF? Printing unicode strings
In article <[EMAIL PROTECTED]>, "Serge Orlov" <[EMAIL PROTECTED]> wrote: > Ron Garret wrote: > > > > I'm using an OS X terminal to ssh to a Linux machine. > > > > > > In theory it should work out of the box. OS X terminal should set > > > enviromental variable LANG=en_US.utf-8, then ssh should transfer this > > > variable to Linux and python will know that your terminal is utf-8. > > > Unfortunately AFAIK OS X terminal doesn't set that variable and most > > > (all?) ssh clients don't transfer it between machines. As a workaround > > > you can set that variable on linux yourself . This should work in the > > > command line right away: > > > > > > LANG=en_US.utf-8 python -c "print unichr(0xbd)" > > > > > > Or put the following line in ~/.bashrc and logout/login > > > > > > export LANG=en_US.utf-8 > > > > No joy. > > > > [EMAIL PROTECTED]:~$ LANG=en_US.utf-8 python -c "print unichr(0xbd)" > > Traceback (most recent call last): > > File "", line 1, in ? > > UnicodeEncodeError: 'ascii' codec can't encode character u'\xbd' in > > position 0: ordinal not in range(128) > > [EMAIL PROTECTED]:~$ > > What version of python and what shell do you run? What the following > commands print: > > python -V > echo $SHELL > $SHELL --version [EMAIL PROTECTED]:~$ python -V Python 2.3.4 [EMAIL PROTECTED]:~$ echo $SHELL /bin/bash [EMAIL PROTECTED]:~$ $SHELL --version GNU bash, version 2.05b.0(1)-release (i386-pc-linux-gnu) Copyright (C) 2002 Free Software Foundation, Inc. [EMAIL PROTECTED]:~$ -- http://mail.python.org/mailman/listinfo/python-list
Re: WTF? Printing unicode strings
Ron Garret wrote:
> > > I'm using an OS X terminal to ssh to a Linux machine.
> >
> > In theory it should work out of the box. OS X terminal should set
> > environment variable LANG=en_US.utf-8, then ssh should transfer this
> > variable to Linux and python will know that your terminal is utf-8.
> > Unfortunately AFAIK OS X terminal doesn't set that variable and most
> > (all?) ssh clients don't transfer it between machines. As a workaround
> > you can set that variable on linux yourself. This should work in the
> > command line right away:
> >
> > LANG=en_US.utf-8 python -c "print unichr(0xbd)"
> >
> > Or put the following line in ~/.bashrc and logout/login
> >
> > export LANG=en_US.utf-8
>
> No joy.
>
> [EMAIL PROTECTED]:~$ LANG=en_US.utf-8 python -c "print unichr(0xbd)"
> Traceback (most recent call last):
>   File "<string>", line 1, in ?
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xbd' in
> position 0: ordinal not in range(128)
> [EMAIL PROTECTED]:~$

What version of Python and what shell do you run? What do the following commands print:

python -V
echo $SHELL
$SHELL --version
Re: WTF? Printing unicode strings
In article <[EMAIL PROTECTED]>, "Serge Orlov" <[EMAIL PROTECTED]> wrote: > Ron Garret wrote: > > In article <[EMAIL PROTECTED]>, > > "Serge Orlov" <[EMAIL PROTECTED]> wrote: > > > > > Ron Garret wrote: > > > > In article <[EMAIL PROTECTED]>, > > > > Robert Kern <[EMAIL PROTECTED]> wrote: > > > > > > > > > Ron Garret wrote: > > > > > > > > > > > I forgot to mention: > > > > > > > > > > > sys.getdefaultencoding() > > > > > > > > > > > > 'utf-8' > > > > > > > > > > A) You shouldn't be able to do that. > > > > > > > > What can I say? I can. > > > > > > > > > B) Don't do that. > > > > > > > > OK. What should I do instead? > > > > > > Exact answer depends on what OS and terminal you are using and what > > > your program is supposed to do, are you going to distribute the program > > > or it's just for internal use. > > > > I'm using an OS X terminal to ssh to a Linux machine. > > In theory it should work out of the box. OS X terminal should set > enviromental variable LANG=en_US.utf-8, then ssh should transfer this > variable to Linux and python will know that your terminal is utf-8. > Unfortunately AFAIK OS X terminal doesn't set that variable and most > (all?) ssh clients don't transfer it between machines. As a workaround > you can set that variable on linux yourself . This should work in the > command line right away: > > LANG=en_US.utf-8 python -c "print unichr(0xbd)" > > Or put the following line in ~/.bashrc and logout/login > > export LANG=en_US.utf-8 No joy. [EMAIL PROTECTED]:~$ LANG=en_US.utf-8 python -c "print unichr(0xbd)" Traceback (most recent call last): File "", line 1, in ? UnicodeEncodeError: 'ascii' codec can't encode character u'\xbd' in position 0: ordinal not in range(128) [EMAIL PROTECTED]:~$ rg -- http://mail.python.org/mailman/listinfo/python-list
Re: WTF? Printing unicode strings
Ron Garret wrote: > In article <[EMAIL PROTECTED]>, > Robert Kern <[EMAIL PROTECTED]> wrote: > >>Ron Garret wrote: >> >>>I'm using an OS X terminal to ssh to a Linux machine. >> >>Click on the "Terminal" menu, then "Window Settings...". Choose "Display" >>from >>the combobox. At the bottom you will see a combobox title "Character Set >>Encoding". Choose "Unicode (UTF-8)". > > It was already set to UTF-8. Then take a look at your LANG environment variable on your Linux machine. For example, I have LANG=en_US.UTF-8 on my Linux machine, and I can ssh into it from a UTF-8-configured Terminal.app and print unicode strings just fine. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco -- http://mail.python.org/mailman/listinfo/python-list
Re: WTF? Printing unicode strings
Ron Garret wrote:
> In article <[EMAIL PROTECTED]>,
>  "Serge Orlov" <[EMAIL PROTECTED]> wrote:
>
> > Ron Garret wrote:
> > > In article <[EMAIL PROTECTED]>,
> > >  Robert Kern <[EMAIL PROTECTED]> wrote:
> > >
> > > > Ron Garret wrote:
> > > >
> > > > > I forgot to mention:
> > > > >
> > > > > >>> sys.getdefaultencoding()
> > > > > 'utf-8'
> > > >
> > > > A) You shouldn't be able to do that.
> > >
> > > What can I say? I can.
> > >
> > > > B) Don't do that.
> > >
> > > OK. What should I do instead?
> >
> > Exact answer depends on what OS and terminal you are using and what
> > your program is supposed to do, are you going to distribute the program
> > or it's just for internal use.
>
> I'm using an OS X terminal to ssh to a Linux machine.

In theory it should work out of the box. The OS X terminal should set the environment variable LANG=en_US.utf-8, then ssh should transfer this variable to Linux and Python will know that your terminal is UTF-8. Unfortunately, AFAIK the OS X terminal doesn't set that variable and most (all?) ssh clients don't transfer it between machines. As a workaround you can set that variable on Linux yourself. This should work in the command line right away:

LANG=en_US.utf-8 python -c "print unichr(0xbd)"

Or put the following line in ~/.bashrc and log out and back in:

export LANG=en_US.utf-8
Re: WTF? Printing unicode strings
In article <[EMAIL PROTECTED]>, Robert Kern <[EMAIL PROTECTED]> wrote: > Ron Garret wrote: > > > I'm using an OS X terminal to ssh to a Linux machine. > > Click on the "Terminal" menu, then "Window Settings...". Choose "Display" > from > the combobox. At the bottom you will see a combobox title "Character Set > Encoding". Choose "Unicode (UTF-8)". It was already set to UTF-8. > > But what about this: > > > f2=open('foo','w') > f2.write(u'\xFF') > > > > Traceback (most recent call last): > > File "", line 1, in ? > > UnicodeEncodeError: 'ascii' codec can't encode character u'\xff' in > > position 0: ordinal not in range(128) > > > > That should have nothing to do with my terminal, right? > > Correct, that is a different problem. f.write() expects a string of bytes, > not a > unicode string. In order to convert unicode strings to byte strings without > an > explicit .encode() method call, Python uses the default encoding which is > 'ascii'. It's not easily changeable for a good reason. Your modules won't > work > on anyone else's machine if you hack that setting. OK. > > I just found http://www.amk.ca/python/howto/unicode, which seems to be > > enlightening. The answer seems to be something like: > > > > import codecs > > f = codecs.open('foo','w','utf-8') > > > > but that seems pretty awkward. > > About as clean as it gets when dealing with text encodings. OK. Thanks. rg -- http://mail.python.org/mailman/listinfo/python-list
Re: WTF? Printing unicode strings
Ron Garret wrote:
>
> But what about this:
>
> >>> f2=open('foo','w')
> >>> f2.write(u'\xFF')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xff' in
> position 0: ordinal not in range(128)
> >>>
>
> That should have nothing to do with my terminal, right?

Correct. But first try to answer this: given that you want to write the Unicode character value 255 to a file, how is that character to be represented in the file?

For example, one might think that one could just get a byte whose value is 255 and write that to a file, but what happens if one chooses a Unicode character whose value is greater than 255? One could use two bytes or three bytes or as many as one needs, but what if the lowest 8 bits of that value are all set? How would one know, if one reads a file back and gets a byte whose value is 255 whether it represents a character all by itself or is part of another character's representation? It gets complicated!

The solution is that you choose an encoding which allows you to store the characters in the file, thus answering indirectly the question above: encodings determine how the characters are represented in the file and allow you to read the file and get back the characters you put into it. One of the most common encodings suitable for the storage of Unicode character values is UTF-8, which has been designed with the above complications in mind, but as long as you remember to choose an encoding, you don't have to think about it: Python takes care of the difficult stuff on your behalf. In the above code you haven't made that choice.

So, to answer the above question, you can either...

  * Use the encode method on Unicode objects to turn them into plain strings, then write them to a file - at that point, you are writing specific byte values.
  * Use the codecs.open function and other codecs module features to write Unicode objects directly to files and streams - here, the module's infrastructure deals with byte-level issues.
  * If you're using something like an XML library, you can often pass a normal file or stream object to some function or method whilst stating the output encoding.

There is no universally correct answer to which encoding should be used when writing Unicode character values to files, contrary to some beliefs and opinions which, for example, lead to people pretending that everything is in UTF-8 in order to appease legacy applications with the minimum of tweaks necessary to stop them from breaking completely. Thus, Python doesn't make a decision for you here.

Paul
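As an illustration of the first two options, here is a minimal sketch. It is written for Python 3, where the built-in open() takes an encoding argument and plays the role codecs.open plays in the thread; the function names and the temporary file are assumptions made for the example. Both routes put the same bytes in the file, and a different encoding gives different bytes for the same character.

```python
import os
import tempfile


def write_encoded_by_hand(path, text, encoding):
    """Option 1: encode explicitly, then write the resulting bytes."""
    with open(path, "wb") as f:
        f.write(text.encode(encoding))


def write_via_text_layer(path, text, encoding):
    """Option 2: let the file object do the encoding on write."""
    with open(path, "w", encoding=encoding) as f:
        f.write(text)


def read_bytes(path):
    """Return the raw bytes stored in the file."""
    with open(path, "rb") as f:
        return f.read()


def demo():
    """Write U+00FF three ways and return the bytes seen on disk."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_encoded_by_hand(path, "\xff", "utf-8")
        by_hand = read_bytes(path)
        write_via_text_layer(path, "\xff", "utf-8")
        via_layer = read_bytes(path)
        write_via_text_layer(path, "\xff", "latin-1")
        latin1 = read_bytes(path)
    finally:
        os.remove(path)
    return by_hand, via_layer, latin1


if __name__ == "__main__":
    print(demo())
```

In UTF-8 the character takes two bytes and in Latin-1 it takes one, which is exactly the ambiguity described above: without knowing the encoding, a reader cannot tell what a byte with value 255 means.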
Re: WTF? Printing unicode strings
Ron Garret wrote:

> I'm using an OS X terminal to ssh to a Linux machine.

Click on the "Terminal" menu, then "Window Settings...". Choose "Display" from the combobox. At the bottom you will see a combobox titled "Character Set Encoding". Choose "Unicode (UTF-8)".

> But what about this:
>
> >>> f2=open('foo','w')
> >>> f2.write(u'\xFF')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xff' in
> position 0: ordinal not in range(128)
>
> That should have nothing to do with my terminal, right?

Correct, that is a different problem. f.write() expects a string of bytes, not a unicode string. In order to convert unicode strings to byte strings without an explicit .encode() method call, Python uses the default encoding, which is 'ascii'. It's not easily changeable for a good reason. Your modules won't work on anyone else's machine if you hack that setting.

> I just found http://www.amk.ca/python/howto/unicode, which seems to be
> enlightening. The answer seems to be something like:
>
> import codecs
> f = codecs.open('foo','w','utf-8')
>
> but that seems pretty awkward.

About as clean as it gets when dealing with text encodings.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco
Re: WTF? Printing unicode strings
In article <[EMAIL PROTECTED]>, "Serge Orlov" <[EMAIL PROTECTED]> wrote: > Ron Garret wrote: > > In article <[EMAIL PROTECTED]>, > > Robert Kern <[EMAIL PROTECTED]> wrote: > > > > > Ron Garret wrote: > > > > > > > I forgot to mention: > > > > > > > sys.getdefaultencoding() > > > > > > > > 'utf-8' > > > > > > A) You shouldn't be able to do that. > > > > What can I say? I can. > > > > > B) Don't do that. > > > > OK. What should I do instead? > > Exact answer depends on what OS and terminal you are using and what > your program is supposed to do, are you going to distribute the program > or it's just for internal use. I'm using an OS X terminal to ssh to a Linux machine. But what about this: >>> f2=open('foo','w') >>> f2.write(u'\xFF') Traceback (most recent call last): File "", line 1, in ? UnicodeEncodeError: 'ascii' codec can't encode character u'\xff' in position 0: ordinal not in range(128) >>> That should have nothing to do with my terminal, right? I just found http://www.amk.ca/python/howto/unicode, which seems to be enlightening. The answer seems to be something like: import codecs f = codecs.open('foo','w','utf-8') but that seems pretty awkward. rg -- http://mail.python.org/mailman/listinfo/python-list
Re: WTF? Printing unicode strings
Ron Garret wrote: > In article <[EMAIL PROTECTED]>, > Robert Kern <[EMAIL PROTECTED]> wrote: > > > Ron Garret wrote: > > > > > I forgot to mention: > > > > > sys.getdefaultencoding() > > > > > > 'utf-8' > > > > A) You shouldn't be able to do that. > > What can I say? I can. > > > B) Don't do that. > > OK. What should I do instead? Exact answer depends on what OS and terminal you are using and what your program is supposed to do, are you going to distribute the program or it's just for internal use. -- http://mail.python.org/mailman/listinfo/python-list
Re: WTF? Printing unicode strings
Ron Garret wrote:
> In article <[EMAIL PROTECTED]>,
>  Robert Kern <[EMAIL PROTECTED]> wrote:
>
>>Ron Garret wrote:
>>
>>>I forgot to mention:
>>>
>>> >>> sys.getdefaultencoding()
>>>'utf-8'
>>
>>A) You shouldn't be able to do that.
>
> What can I say? I can.

See B).

>>B) Don't do that.
>
> OK. What should I do instead?

See below.

>>C) It's not relevant to the encoding of stdout which determines how unicode
>>strings get converted to bytes when printing them:
>>
>> >>> import sys
>> >>> sys.stdout.encoding
>>'UTF-8'
>> >>> sys.getdefaultencoding()
>>'ascii'
>> >>> print u'\xbd'
>>½
>
> OK, so how am I supposed to change the encoding of sys.stdout? It comes
> up as US-ASCII on my system. Simply setting it doesn't work:

You will have to use a terminal that accepts UTF-8.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco
Re: WTF? Printing unicode strings
In article <[EMAIL PROTECTED]>,
 Robert Kern <[EMAIL PROTECTED]> wrote:

> Ron Garret wrote:
>
> > I forgot to mention:
> >
> > >>> sys.getdefaultencoding()
> > 'utf-8'
>
> A) You shouldn't be able to do that.

What can I say? I can.

> B) Don't do that.

OK. What should I do instead?

> C) It's not relevant to the encoding of stdout which determines how unicode
> strings get converted to bytes when printing them:
>
> >>> import sys
> >>> sys.stdout.encoding
> 'UTF-8'
> >>> sys.getdefaultencoding()
> 'ascii'
> >>> print u'\xbd'
> ½

OK, so how am I supposed to change the encoding of sys.stdout? It comes up as US-ASCII on my system. Simply setting it doesn't work:

>>> import sys
>>> sys.stdout.encoding='utf-8'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: readonly attribute
>>>

rg
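Since the attribute is read-only, one option sometimes used instead of assigning to it is to wrap the underlying byte stream in an encoding writer. The sketch below is only an illustration, assuming Python 3 and using an in-memory buffer in place of a real terminal; utf8_writer is a hypothetical helper name. It does not make a non-UTF-8 terminal display the character correctly; the terminal still has to interpret the bytes as UTF-8.

```python
import codecs
import io


def utf8_writer(byte_stream):
    """Wrap a byte stream so that text written to it is encoded as UTF-8.

    codecs.getwriter() returns the StreamWriter class for the codec;
    calling it with a stream gives a file-like object with write().
    """
    return codecs.getwriter("utf-8")(byte_stream)


if __name__ == "__main__":
    # io.BytesIO stands in for the terminal's byte stream here.
    buf = io.BytesIO()
    out = utf8_writer(buf)
    out.write("\xbd\n")
    print(repr(buf.getvalue()))
```

On Python 3.7 and later, sys.stdout.reconfigure(encoding="utf-8") is another way to get the same effect on the real stdout.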
Re: WTF? Printing unicode strings
Ron Garret wrote:

> I forgot to mention:
>
> >>> sys.getdefaultencoding()
> 'utf-8'

A) You shouldn't be able to do that.

B) Don't do that.

C) It's not relevant to the encoding of stdout which determines how unicode strings get converted to bytes when printing them:

>>> import sys
>>> sys.stdout.encoding
'UTF-8'
>>> sys.getdefaultencoding()
'ascii'
>>> print u'\xbd'
½

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco
Re: WTF? Printing unicode strings
In article <[EMAIL PROTECTED]>,
 Fredrik Lundh <[EMAIL PROTECTED]> wrote:

> Ron Garret wrote:
>
> > >>> u'\xbd'
> > u'\xbd'
> > >>> print _
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in ?
> > UnicodeEncodeError: 'ascii' codec can't encode character u'\xbd' in
> > position 0: ordinal not in range(128)
>
> so stdout on your machine is ascii, and you don't understand why you
> cannot print a non-ascii unicode character to it? wtf?

I forgot to mention:

>>> sys.getdefaultencoding()
'utf-8'
>>> print u'\xbd'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xbd' in position 0: ordinal not in range(128)
>>>
Re: WTF? Printing unicode strings
Ron Garret wrote:

> >>> u'\xbd'
> u'\xbd'
> >>> print _
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xbd' in
> position 0: ordinal not in range(128)

so stdout on your machine is ascii, and you don't understand why you cannot print a non-ascii unicode character to it? wtf?
Re: WTF? Printing unicode strings
Ron Garret wrote:

> >>> u'\xbd'
> u'\xbd'
> >>> print _
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xbd' in
> position 0: ordinal not in range(128)

Not sure if this really helps you, but:

>>> u'\xbd'
u'\xbd'
>>> print _
½
>>>
WTF? Printing unicode strings
>>> u'\xbd'
u'\xbd'
>>> print _
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xbd' in position 0: ordinal not in range(128)
>>>