On Thu, May 03, 2007 at 10:28:34AM -0700, [EMAIL PROTECTED] wrote:
> On May 3, 10:12 am, [EMAIL PROTECTED] wrote:
> > On Thu, May 03, 2007 at 09:57:38AM -0700, fscked wrote:
> > > > As Larry said, this most likely means there are null bytes in the CSV
> > > > file.
> > > >
> > > > Ciao,
> > > >     Marc 'BlackJack' Rintsch
> > >
> > > How would I go about identifying where it is?
> >
> > A hex editor might be easiest.
> >
> > You could also use Python:
> >
> >   print open("filewithnuls").read().replace("\0", ">>>NUL<<<")
> >
> > Dustin
>
> Hmm, interesting if I run:
>
>   print open("test.csv").read().replace("\0", ">>>NUL<<<")
>
> every single character gets a >>>NUL<<< between them...
>
> What the heck does that mean?
>
> Example, here is the first field in the csv
>
>   89114608511,
>
> the above code produces:
>
>   >>>NUL<<<8>>>NUL<<<9>>>NUL<<<1>>>NUL<<<1>>>NUL<<<4>>>NUL<<<6>>>NUL<<<0>>>NUL<<<8>>>NUL<<<5>>>NUL<<<1>>>NUL<<<1>>>NUL<<<,
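
I'm guessing that your file is in UTF-16, then -- Windows seems to do
that a lot. UTF-16 stores each ASCII character as two bytes, one of
which is a NUL, which is why you see a NUL next to every character. It
kind of makes it *not* a CSV file, but oh well. Try

  print open("test.csv", "rb").read().decode('utf-16').replace("\0", ">>>NUL<<<")

(The decode has to be applied to the string returned by read(), not to
the file object, and the file should be opened in binary mode so the
bytes arrive untouched.) If the guess is right, the NUL markers should
disappear from the output.

I'm not terribly unicode-savvy, so I'll leave it to others to suggest a
way to get the CSV reader to handle such encoding without reading in the
whole file, decoding it, and setting up a StringIO file.

Dustin

--
http://mail.python.org/mailman/listinfo/python-list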
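
P.S. One possible way to avoid the read-everything-then-StringIO approach is sketched below. It assumes a Python 3 interpreter, where the csv module accepts decoded text; the Python 2 csv module used in this thread does not. It also assumes the file really is UTF-16 with a byte order mark. The function name read_utf16_csv and the second field of the sample data are made up for illustration; only the first field comes from the thread.

```python
import csv
import io
import os
import tempfile


def read_utf16_csv(path):
    """Yield rows from a UTF-16 CSV file, decoding it as it is read."""
    # io.open decodes incrementally, so the whole file is never held in memory.
    # newline="" leaves line-ending handling to the csv module.
    with io.open(path, "r", encoding="utf-16", newline="") as handle:
        for row in csv.reader(handle):
            yield row


# Build a small UTF-16 sample file to try the reader on (hypothetical data).
sample_path = os.path.join(tempfile.mkdtemp(), "test.csv")
with io.open(sample_path, "w", encoding="utf-16", newline="") as handle:
    handle.write(u"89114608511,second field\r\n")

rows = list(read_utf16_csv(sample_path))
print(rows)
```

The file object does the decoding, so the csv reader only ever sees text and never the raw NUL bytes.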