I have a bunch of CSV files that have the following characteristics:

- field delimiter is a comma
- all fields quoted with double quotes
- lines terminated by a *space* followed by a newline

What surprised me was that the csv reader included the trailing space in 
the final field value returned, even though it is outside of the quotes. 


I've produced a test program (see below) that demonstrates this. There 
is a workaround, which is to not pass the csv reader the file iterator, 
but rather a generator that returns lines from the file with the 
trailing space stripped.

Interestingly, the same behaviour is seen if there are spaces before the 
field separator. They are also included in the preceding field value, 
even though they are outside the quotes. My workaround wouldn't help 
here, because it only strips whitespace at the ends of each line.
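
For what it's worth, one way to cover both cases might be to strip the fields after the reader has parsed them, rather than stripping the lines beforehand. The following is only a minimal sketch in Python 3 syntax: `rstrip_fields` is a made-up helper name, and the in-memory `lines` list stands in for the test file. Note the trade-off: it also removes trailing whitespace that sits inside the quotes, which may or may not be acceptable.

```python
import csv

def rstrip_fields(rows):
    """Yield each parsed row with trailing whitespace removed from every field."""
    for row in rows:
        yield [field.rstrip() for field in row]

# In-memory lines mirroring the test file: a space before one separator
# and a space before every newline.
lines = ['"a","b","c" \n', '"d" ,"e","f" \n']

# Plain reader: the spaces outside the quotes end up in the field values.
raw_rows = list(csv.reader(lines))

# Same reader, with the fields cleaned up after parsing.
clean_rows = list(rstrip_fields(csv.reader(lines)))

print(raw_rows)
print(clean_rows)
```

Because the stripping happens per field, it does not matter whether the stray space was before a separator or before the newline.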

Anyway, is this a bug or a feature? If it is a feature, then I'm curious 
as to why it is considered desirable behaviour.

- Andrew



import csv
filename = "test_data.csv"

# Generate a test file - note the spaces before the newlines
fout = open(filename, "wb")
fout.write('"Field1","Field2","Field3" \n')
fout.write('"a","b","c" \n')
fout.write('"d" ,"e","f" \n')
fout.close()

# Function to test a reader
def read_and_print(reader):
     for line in reader:
         print ",".join(['"%s"' % field for field in line])

# Read the test file - and print the output
reader = csv.reader(open(filename, "rb"))
read_and_print(reader)

# Now the workaround: a generator to strip the strings before the reader
# decodes them
def stripped(lines):
     for line in lines:
         # Remove the trailing space (and the newline) from each line
         yield line.rstrip()
reader = csv.reader(stripped(open(filename, "rb")))
read_and_print(reader)

# Try using lineterminator instead - it doesn't work
reader = csv.reader(open(filename, "rb"), lineterminator=" \n")
read_and_print(reader)
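
On the bug-or-feature question, one data point is the reader's `strict` option. As far as I can tell, the default (non-strict) reader is lenient about characters that follow a closing quote and simply appends them to the field, whereas `strict=True` rejects the same input with `csv.Error`. The snippet below is a small sketch of that difference in Python 3 syntax; the single in-memory line is an assumption standing in for a line of the test file.

```python
import csv

# One line with a space after a closing quote, as in the test file.
line = ['"d" ,"e","f" \n']

# Default behaviour: the stray spaces are kept as part of the fields.
lenient = list(csv.reader(line))

# strict=True: the reader refuses input that has text after a closing quote.
try:
    list(csv.reader(line, strict=True))
    strict_error = None
except csv.Error as exc:
    strict_error = exc

print(lenient)
print(strict_error)
```

If that reading is right, the trailing spaces are being treated as malformed input that the reader tolerates by default, rather than as whitespace it is expected to trim.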