On Mon, 26 Jan 2009 00:23:30 -0200, John Machin <sjmac...@lexicon.net>
wrote:
On Jan 26, 1:03 pm, "Gabriel Genellina" <gagsl-...@yahoo.com.ar>
wrote:
It's so easy that not doing that is just inexcusable laziness :)
Your own example, written using the csv module:
import csv
f = csv.reader(open('customer_x.txt','rb'), delimiter='\t')
headers = f.next()
for line in f:
    field1, field2, field3 = line
    do_stuff()
And where in all of that do you recommend that .decode(some_encoding)
be inserted?
For encodings that don't use embedded NUL bytes (latin1, utf8) I'd decode
the fields right when extracting them:
field1, field2, field3 = (field.decode('utf8') for field in line)
For encodings that allow NUL bytes, I'd use any of the recipes in the csv
module documentation.
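A rough sketch of the NUL-byte case follows. It is not one of the documented recipes. It assumes Python 3, where csv.reader consumes text rather than bytes (unlike the Python 2 code above), so the byte stream is decoded as a whole before csv ever sees it. The name read_rows and the sample data are made up for illustration.

```python
import csv
import io

def read_rows(raw_bytes, encoding):
    """Decode the whole byte stream first, then let csv parse the text.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # TextIOWrapper decodes incrementally; newline='' leaves line endings
    # untouched so the csv module can handle them itself.
    text = io.TextIOWrapper(io.BytesIO(raw_bytes), encoding=encoding, newline='')
    reader = csv.reader(text, delimiter='\t')
    headers = next(reader)
    return headers, list(reader)

# Made-up sample: UTF-16 stores ASCII characters with embedded NUL bytes,
# so decoding field by field after a byte-level split would not be safe.
sample = 'name\tcity\nJosé\tCórdoba\n'.encode('utf-16')
headers, rows = read_rows(sample, 'utf-16')
```

The point of the design is that the csv parser never sees the raw NUL bytes; it only ever sees already-decoded text.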
(That is, if I care about the encoding at all. Perhaps the file contains
only numbers. Perhaps it contains only ASCII characters. Perhaps I'm only
interested in some fields for which the encoding is irrelevant. Perhaps it
is an internally generated file and it doesn't matter as long as I use the
same encoding on output)
But I admit that in general, "decode input early when reading, work in
unicode, encode output late when writing" is the best practice.
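To make that concrete, here is a minimal sketch of the pattern, again assuming Python 3. The function name convert, the three-column layout, and the sample data are assumptions for illustration only, and upper-casing the first field just stands in for whatever do_stuff() would be.

```python
import csv
import io

def convert(raw_in, in_encoding, out_encoding):
    """Hypothetical helper: re-encode a tab-separated file, editing it as text."""
    # Decode early: bytes become text right at the input boundary.
    text_in = io.StringIO(raw_in.decode(in_encoding), newline='')
    text_out = io.StringIO(newline='')
    reader = csv.reader(text_in, delimiter='\t')
    writer = csv.writer(text_out, delimiter='\t', lineterminator='\n')
    writer.writerow(next(reader))  # copy the header row unchanged
    for field1, field2, field3 in reader:
        # Work in unicode: str methods behave correctly on non-ASCII text.
        writer.writerow([field1.upper(), field2, field3])
    # Encode late: text becomes bytes only at the output boundary.
    return text_out.getvalue().encode(out_encoding)

# Made-up sample data, latin-1 in, utf-8 out.
latin1_input = 'name\tcity\tage\nJosé\tCórdoba\t41\n'.encode('latin-1')
utf8_output = convert(latin1_input, 'latin-1', 'utf-8')
```

Everything between the decode and the encode handles text only, so the input and output encodings can differ without the processing code knowing about either.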
--
Gabriel Genellina