On May 4, 3:40 am, [EMAIL PROTECTED] wrote:
> On Thu, May 03, 2007 at 10:28:34AM -0700, [EMAIL PROTECTED] wrote:
> > On May 3, 10:12 am, [EMAIL PROTECTED] wrote:
> > > On Thu, May 03, 2007 at 09:57:38AM -0700, fscked wrote:
> > > > > As Larry said, this most likely means there are null bytes in the CSV
> > > > > file.
> > > > >
> > > > > Ciao,
> > > > > Marc 'BlackJack' Rintsch
> > > >
> > > > How would I go about identifying where it is?
> > >
> > > A hex editor might be easiest.
> > >
> > > You could also use Python:
> > >
> > > print open("filewithnuls").read().replace("\0", ">>>NUL<<<")
> > >
> > > Dustin
> >
> > Hmm, interesting if I run:
> >
> > print open("test.csv").read().replace("\0", ">>>NUL<<<")
> >
> > every single character gets a >>>NUL<<< between them...
> >
> > What the heck does that mean?
> >
> > Example, here is the first field in the csv
> >
> > 89114608511,
> >
> > the above code produces:
> > >>>NUL<<<8>>>NUL<<<9>>>NUL<<<1>>>NUL<<<1>>>NUL<<<4>>>NUL<<<6>>>NUL<<<0>>>NUL<<<8>>>NUL<<<5>>>NUL<<<1>>>NUL<<<1>>>NUL<<<,
>
> I'm guessing that your file is in UTF-16, then -- Windows seems to do
> that a lot.

Do what a lot? Encode data in UTF-16xE without putting in a BOM or
telling the world in some other fashion what x is? Humans seem to do
that occasionally. When they use Windows software, the result is highly
likely to be encoded in UTF-16LE -- unless of course the human
deliberately chooses otherwise (e.g. the "Unicode big endian" option in
Notepad's "Save As" dialogue). Further, the data is likely to have a BOM
prepended.

The above is consistent with BOM-free UTF-16BE: each NUL comes *before*
its character, so the zero high byte is first, and nothing precedes the
first NUL.
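
For anyone who would rather not eyeball the NULs, here is a rough sketch of the same reasoning in code. It is only a heuristic and rests on an assumption: the data is mostly ASCII-range text, so one byte of each 16-bit unit is zero. The function name guess_utf16 is made up for illustration, and the sample bytes are just the first field quoted above re-encoded as UTF-16BE, not the poster's actual file.

```python
import codecs


def guess_utf16(raw):
    """Guess a codec name for bytes suspected to be UTF-16, or None.

    Heuristic only: assumes mostly ASCII-range text.
    """
    # A BOM, when present, settles it; the generic "utf-16" codec
    # reads the BOM, picks the byte order, and strips the BOM.
    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"

    # No BOM: see which half of each 2-byte unit holds the NULs.
    even_nuls = raw[0::2].count(b"\x00")  # offsets 0, 2, 4, ...
    odd_nuls = raw[1::2].count(b"\x00")   # offsets 1, 3, 5, ...

    if even_nuls > odd_nuls:
        return "utf-16-be"  # zero high byte comes first
    if odd_nuls > even_nuls:
        return "utf-16-le"  # zero high byte comes second
    return None  # no evidence either way; probably not UTF-16


# Stand-in for the first field of the file discussed above.
sample = "89114608511,".encode("utf-16-be")
codec = guess_utf16(sample)
print(codec, sample.decode(codec))
```

With real data you would read the file in binary mode, e.g. open("test.csv", "rb").read(), decode it with the guessed codec, and only then hand the text to the csv module. Note that the BOM check does not try to tell UTF-16LE apart from UTF-32LE, whose BOM starts with the same two bytes.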