On Tue, 15 Dec 2009 19:12:01 -0300, Emmanuel <manou...@gmail.com> wrote:

Then my problem is different!

In fact I'm reading a csv file saved from openoffice oocalc using
UTF-8 encoding. I get a list of lists (let's call it tab) with the csv
data.
If I do:

print tab[2][4]
In ipython, I get:
equação de Toricelli. Tarefa exercícios PVR 1 e 2 ; PVP 1

If I only do:
tab[2][4]

In ipython, I get:
'equa\xc3\xa7\xc3\xa3o de Toricelli. Tarefa exerc\xc3\xadcios PVR 1 e 2 ; PVP 1'

Does that mean that my problem is not the one I think it is?

Yes. You have a real problem, but not this one. When you say `print something`, you get a nice view of `something`, basically the result of doing `str(something)`. When you say `something` alone in the interpreter, you get a more formal representation, the result of calling `repr(something)`:

py> x = "ecuação"
py> print x
ecuação
py> x
'ecua\x87\xc6o'
py> print repr(x)
'ecua\x87\xc6o'

Those '' around the text and the \xNN notation allow for an unambiguous representation. Two strings may look the same but be different, and repr shows that. ('ecua\x87\xc6o' is encoded in cp850; you should see 'equa\xc3\xa7\xc3\xa3o' in utf-8)
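
To check which encoding produced a given repr, you can encode the same text both ways and compare. This is only a small sketch, not part of the session above; the variable names are illustrative, and the text is written with \x escapes so the result does not depend on the encoding of the source file or the console:

```python
# Illustrative sketch: the same text encoded with two different codecs.
text = u"equa\xe7\xe3o"              # u"equação", written with escapes

utf8_bytes = text.encode("utf-8")    # the bytes a UTF-8 csv file contains
cp850_bytes = text.encode("cp850")   # the bytes a cp850 console produces

# The two reprs differ even though both "look like" the same word.
print(repr(utf8_bytes))
print(repr(cp850_bytes))

# Decoding each one with its matching codec gives back the same unicode text.
assert utf8_bytes.decode("utf-8") == cp850_bytes.decode("cp850") == text
```

Decoding with the wrong codec either fails or gives garbage, which is why the encoding of the input has to be known.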

My real problem is when I use that kind of UTF-8 encoded (?) string with
selenium here.
If I just switch the following line:
self.sel.type("q", "equação")

by:
self.sel.type("q", u"equação")


It works fine!

Yes: you should work with unicode most of the time. The "recipe" for having as few unicode problems as possible says:

- convert the input data (read from external sources, like a file) from bytes to unicode, using the (known) encoding of those bytes

- handle unicode internally everywhere in your program

- and convert from unicode to bytes as late as possible, when writing output (to screen, other files, etc) using the encoding expected by each destination.

See the Unicode How To: http://docs.python.org/howto/unicode.html
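
The three steps of the recipe can be sketched as follows. This is a minimal illustration, assuming the input is known to be UTF-8; the file names and the `shouted` variable are made up for the example, and a temporary file stands in for the real external source:

```python
import io
import os
import tempfile

# Hypothetical input file, created here only so the sketch is self-contained.
in_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with io.open(in_path, "wb") as f:
    f.write(b"equa\xc3\xa7\xc3\xa3o de Toricelli\n")

# 1. Input: read bytes and decode them at once, using the known encoding.
with io.open(in_path, "rb") as f:
    raw = f.read()
text = raw.decode("utf-8")

# 2. Inside the program: work with unicode only.
shouted = text.strip().upper()

# 3. Output: encode as late as possible, with the encoding the target expects.
out_path = in_path + ".out"
with io.open(out_path, "wb") as f:
    f.write(shouted.encode("utf-8"))
```

Note that `upper()` handles ç and ã correctly only because it runs on unicode text; on the raw UTF-8 bytes the accented letters would not be recognized as letters.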

The problem is that csv.reader gives me "equação" and not
u"equação"

The csv module cannot handle unicode text directly, but see the last example in the csv documentation for a simple workaround: http://docs.python.org/library/csv.html
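
As a rough sketch of the idea (this is not the code from the csv documentation), you can let csv.reader do its job and decode every cell right after reading. The helper names `open_csv`, `decode_row` and `read_unicode_rows` are invented for this example, the file is assumed to be UTF-8, and the version check is there only so the same code also runs on Python 3, where csv.reader already returns text:

```python
import csv
import io
import os
import sys
import tempfile

PY2 = sys.version_info[0] == 2

def open_csv(path, encoding="utf-8"):
    """Open a csv file the way the csv module of the running Python expects."""
    if PY2:
        return open(path, "rb")   # Python 2: csv.reader works on bytes
    return io.open(path, "r", encoding=encoding, newline="")  # Python 3: text

def decode_row(row, encoding="utf-8"):
    """Return the row with every byte-string cell decoded to unicode."""
    return [cell.decode(encoding) if isinstance(cell, bytes) else cell
            for cell in row]

def read_unicode_rows(path, encoding="utf-8", **kwargs):
    """Read a whole csv file and return its rows as lists of unicode strings."""
    with open_csv(path, encoding) as f:
        return [decode_row(row, encoding) for row in csv.reader(f, **kwargs)]

# Demo with a throwaway UTF-8 file standing in for the oocalc export.
path = os.path.join(tempfile.mkdtemp(), "tab.csv")
with io.open(path, "wb") as f:
    f.write(b"week,topic\n3,equa\xc3\xa7\xc3\xa3o de Toricelli\n")

tab = read_unicode_rows(path)
print(repr(tab[1][1]))
```

A cell such as tab[1][1] is then a unicode string, so passing it to self.sel.type("q", ...) should behave like the u"equação" literal that already works for you.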

--
Gabriel Genellina

--
http://mail.python.org/mailman/listinfo/python-list
