I have a bunch of CSV files with the following characteristics:

- the field delimiter is a comma
- all fields are quoted with double quotes
- lines are terminated by a *space* followed by a newline
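For readers on Python 3, here is a minimal sketch that builds data in this format and parses it with the default csv.reader dialect. It uses an in-memory io.StringIO buffer instead of a real file. That is an assumption made only to keep the example self-contained. The output comments reflect the default, non-strict parsing mode.

```python
import csv
import io

# Each record ends with a space before the newline, as described above.
# The third record also has a space between a closing quote and a comma.
data = '"Field1","Field2","Field3" \n"a","b","c" \n"d" ,"e","f" \n'

rows = list(csv.reader(io.StringIO(data)))
for row in rows:
    print(row)
# ['Field1', 'Field2', 'Field3 ']
# ['a', 'b', 'c ']
# ['d ', 'e', 'f ']
```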
What surprised me was that the csv reader included the trailing space in the value of the final field, even though the space is outside the quotes. I've written a test program (see below) that demonstrates this.

There is a workaround: instead of passing the csv reader the file iterator, pass it a generator that returns lines from the file with the trailing space stripped.

Interestingly, the same behaviour occurs if there are spaces before the field separator: they are also included in the preceding field's value, even though they are outside the quotes. My workaround wouldn't help here.

Anyway, is this a bug or a feature? If it is a feature, I'm curious why it is considered desirable behaviour.

- Andrew

    import csv

    filename = "test_data.csv"

    # Generate a test file - note the spaces before the newlines
    fout = open(filename, "wb")
    fout.write('"Field1","Field2","Field3" \n')
    fout.write('"a","b","c" \n')
    fout.write('"d" ,"e","f" \n')
    fout.close()

    # Function to test a reader
    def read_and_print(reader):
        for line in reader:
            print ",".join(['"%s"' % field for field in line])

    # Read the test file and print the output
    reader = csv.reader(open(filename, "rb"))
    read_and_print(reader)

    # Now the workaround: a generator to strip the lines before the reader parses them
    def stripped(lines):
        for line in lines:
            yield line.strip()

    reader = csv.reader(stripped(open(filename, "rb")))
    read_and_print(reader)

    # Try using lineterminator instead - it doesn't work
    reader = csv.reader(open(filename, "rb"), lineterminator=" \r\n")
    read_and_print(reader)
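One possible Python 3 sketch that also covers spaces before the separator: pre-clean each line with a regular expression that drops blanks between a closing quote and the following comma or line end, then parse with strict=True so that any remaining malformed input raises csv.Error instead of being silently absorbed into a field. The pattern and the helper name cleaned are illustrative assumptions, not part of the csv module. The approach relies on every field being quoted, as in these files. A quoted value that itself begins with blanks followed by a comma, or that contains an escaped quote followed by blanks and a comma, would be altered. Separately, the csv documentation states that the reader recognises only '\r' or '\n' as end-of-line and ignores lineterminator, which is consistent with the lineterminator test having no effect.

```python
import csv
import io
import re

# Hypothetical helper pattern: blanks between a double quote and the next
# comma or the end of the line. Assumes every field is quoted, so such a
# quote is taken to be a closing quote.
GAP_AFTER_QUOTE = re.compile(r'"[ \t]+(?=,|\r?$)')


def cleaned(lines):
    """Yield each line with blanks after closing quotes removed."""
    for line in lines:
        yield GAP_AFTER_QUOTE.sub('"', line)


data = '"Field1","Field2","Field3" \n"a","b","c" \n"d" ,"e","f" \n'

# With strict=True the reader raises csv.Error on the raw input
# instead of keeping the stray spaces in the field values.
strict_rejected = False
try:
    list(csv.reader(io.StringIO(data), strict=True))
except csv.Error as exc:
    strict_rejected = True
    print("strict reader rejected the raw input:", exc)

# After cleaning, the same strict reader accepts every line.
rows = list(csv.reader(cleaned(io.StringIO(data)), strict=True))
for row in rows:
    print(row)
# ['Field1', 'Field2', 'Field3']
# ['a', 'b', 'c']
# ['d', 'e', 'f']
```

Here strict=True acts as a guard rather than a fix: on the raw data it rejects the very first line, so malformed records are reported instead of quietly producing field values with stray spaces.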