On Mon, 26 Jan 2009 00:23:30 -0200, John Machin <sjmac...@lexicon.net> wrote:
On Jan 26, 1:03 pm, "Gabriel Genellina" <gagsl-...@yahoo.com.ar>
wrote:

It's so easy that not doing it is just inexcusable laziness :)
Your own example, written using the csv module:

import csv

f = csv.reader(open('customer_x.txt','rb'), delimiter='\t')
headers = f.next()
for line in f:
     field1, field2, field3 = line
     do_stuff()

And where in all of that do you recommend that .decode(some_encoding)
be inserted?

For encodings that don't use embedded NUL bytes (latin1, utf8) I'd decode the fields right when extracting them:

    field1, field2, field3 = (field.decode('utf8') for field in line)

For encodings that allow NUL bytes, I'd use any of the recipes in the csv module documentation.
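To illustrate why I draw the line at NUL bytes, here is a small sketch. The helper name has_nul and the sample text are made up for illustration, and UTF-16 is only my example of an encoding that produces embedded NULs:

```python
# Hypothetical helper: report whether text, once encoded, contains NUL bytes.
def has_nul(text, encoding):
    return b'\x00' in text.encode(encoding)

sample = u'id\tname'  # made-up, ASCII-only sample line

# Single-byte and UTF-8 encodings leave ASCII text free of NUL bytes.
nul_in_utf8 = has_nul(sample, 'utf8')
nul_in_latin1 = has_nul(sample, 'latin1')

# UTF-16 pads every ASCII character with a zero byte.
nul_in_utf16 = has_nul(sample, 'utf-16-le')
```

If a check like this comes back true for the file's encoding, decoding field by field after the reader has already split the bytes is probably not an option, and the documentation recipes are the safer route.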

(That is, if I care about the encoding at all. Perhaps the file contains only numbers. Perhaps it contains only ASCII characters. Perhaps I'm only interested in some fields for which the encoding is irrelevant. Perhaps it is an internally generated file and it doesn't matter as long as I use the same encoding on output.) But I admit that, in general, "decode input early when reading, work in unicode, encode output late when writing" is the best practice.
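A rough sketch of that "decode early, work in unicode, encode late" pattern, assuming Python 3, where the csv module works on text rather than byte strings (so the decode happens before the reader sees the data). The function name convert, the encodings, and the upper-casing step are placeholders of mine, not part of the original example:

```python
import csv
import io

def convert(raw_bytes, in_encoding, out_encoding):
    """Hypothetical helper: re-encode tab-separated data, upper-casing each field."""
    # Decode early: turn the raw bytes into text before any parsing.
    text_in = io.StringIO(raw_bytes.decode(in_encoding), newline='')
    reader = csv.reader(text_in, delimiter='\t')

    text_out = io.StringIO(newline='')
    writer = csv.writer(text_out, delimiter='\t', lineterminator='\n')

    headers = next(reader)
    writer.writerow(headers)

    # Work in unicode: upper() handles non-ASCII letters correctly on text.
    for row in reader:
        writer.writerow([field.upper() for field in row])

    # Encode late: only the final result goes back to bytes.
    return text_out.getvalue().encode(out_encoding)

# Made-up sample: latin1 input, utf8 output.
raw = u'name\tcity\njos\xe9\tm\xe1laga\n'.encode('latin1')
result = convert(raw, 'latin1', 'utf8')
```

The upper-casing is there only to show a step that would go wrong on undecoded byte strings; any real per-row processing would take its place.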

--
Gabriel Genellina

--
http://mail.python.org/mailman/listinfo/python-list
