Re: _csv.Error: string with NUL bytes
On May 4, 3:40 am, [EMAIL PROTECTED] wrote:
> On Thu, May 03, 2007 at 10:28:34AM -0700, [EMAIL PROTECTED] wrote:
> > On May 3, 10:12 am, [EMAIL PROTECTED] wrote:
> > > On Thu, May 03, 2007 at 09:57:38AM -0700, fscked wrote:
> > > > > As Larry said, this most likely means there are null bytes in
> > > > > the CSV file.
> > > > >
> > > > > Ciao,
> > > > > Marc 'BlackJack' Rintsch
> > > >
> > > > How would I go about identifying where it is?
> > >
> > > A hex editor might be easiest.
> > >
> > > You could also use Python:
> > >
> > >     print open("filewithnuls").read().replace("\0", ">>>NUL<<<")
> > >
> > > Dustin
> >
> > Hmm, interesting if I run:
> >
> >     print open("test.csv").read().replace("\0", ">>>NUL<<<")
> >
> > every single character gets a >>>NUL<<< between them...
> >
> > What the heck does that mean?
> >
> > Example, here is the first field in the csv
> >
> > 89114608511,
> >
> > the above code produces:
> > >>>NUL<<<8>>>NUL<<<9>>>NUL<<<1>>>NUL<<<1>>>NUL<<<4>>>NUL<<<6>>>NUL<<<0>>>NUL<<<8>>>NUL<<<5>>>NUL<<<1>>>NUL<<<1>>>NUL<<<,
>
> I'm guessing that your file is in UTF-16, then -- Windows seems to do
> that a lot.

Do what a lot? Encode data in UTF-16xE without putting in a BOM or
telling the world in some other fashion what x is? Humans seem to do
that occasionally. When they use Windows software, the result is highly
likely to be encoded in UTF-16LE -- unless of course the human
deliberately chooses otherwise (e.g. the "Unicode bigendian" option in
NotePad's "Save As" dialogue). Further, the data is likely to have a BOM
prepended.

The above is consistent with BOM-free UTF-16BE: each NUL comes *before*
its character, and there is no BOM at the start.

--
http://mail.python.org/mailman/listinfo/python-list
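The reasoning above (look for a BOM first, then see on which side of each character the NUL falls) can be sketched in code. This is only a heuristic for mostly-ASCII data, written for Python 3 rather than the Python 2 used elsewhere in this thread; the function name `guess_utf16_flavor` and its return strings are made up for illustration.

```python
import codecs


def guess_utf16_flavor(data: bytes) -> str:
    """Guess the UTF-16 byte order from a BOM or from NUL positions.

    Hypothetical helper: a heuristic for mostly-ASCII text, not a
    general encoding detector.
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le with BOM"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be with BOM"
    # No BOM. For ASCII characters the high byte of each 16-bit unit is
    # NUL: it sits at even offsets in big-endian data and at odd offsets
    # in little-endian data.
    even_nuls = data[0::2].count(b"\x00")
    odd_nuls = data[1::2].count(b"\x00")
    if even_nuls > odd_nuls:
        return "utf-16-be without BOM"
    if odd_nuls > even_nuls:
        return "utf-16-le without BOM"
    return "unknown"
```

Fed the first few hundred bytes of a file opened in binary mode, a result of "utf-16-be without BOM" would match the NUL-before-every-character pattern shown in the quoted output.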
Re: _csv.Error: string with NUL bytes
[EMAIL PROTECTED] wrote:
> I'm guessing that your file is in UTF-16, then -- Windows seems to do
> that a lot. It kind of makes it *not* a CSV file, but oh well. Try
>
>     print open("test.csv").read().decode('utf-16').replace("\0", ">>>NUL<<<")
>
> I'm not terribly unicode-savvy, so I'll leave it to others to suggest a
> way to get the CSV reader to handle such encoding without reading in the
> whole file, decoding it, and setting up a StringIO file.

Not pretty, but seems to work:

    from __future__ import with_statement

    import csv
    import codecs

    def recoding_reader(stream, from_encoding, args=(), kw={}):
        intermediate_encoding = "utf8"
        efrom = codecs.lookup(from_encoding)
        einter = codecs.lookup(intermediate_encoding)
        # Reads from rstream decode from_encoding and re-encode as UTF-8.
        rstream = codecs.StreamRecoder(stream,
            einter.encode, efrom.decode,
            efrom.streamreader, einter.streamwriter)

        for row in csv.reader(rstream, *args, **kw):
            yield [unicode(column, intermediate_encoding) for column in row]

    def main():
        file_encoding = "utf16"

        # generate sample data:
        data = u"\xe4hnlich,\xfcblich\r\nalpha,beta\r\ngamma,delta\r\n"
        with open("tmp.txt", "wb") as f:
            f.write(data.encode(file_encoding))

        # read it
        with open("tmp.txt", "rb") as f:
            for row in recoding_reader(f, file_encoding):
                print u" | ".join(row)

    if __name__ == "__main__":
        main()

Data from the file is recoded to UTF-8, then passed to a csv.reader()
whose output is decoded to unicode.

Peter
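For comparison, here is a minimal sketch of the same idea in Python 3, where the recoding step is unnecessary because `csv.reader` accepts text streams directly. This is an assumption about a newer interpreter than the one used in this thread; `read_encoded_csv` and `demo` are illustrative names only, and the sample data mirrors the example above.

```python
import csv
import os
import tempfile


def read_encoded_csv(path, encoding, **reader_kwargs):
    """Yield rows (lists of str) from a CSV file stored in `encoding`."""
    # newline="" leaves line-ending handling to the csv module.
    with open(path, "r", encoding=encoding, newline="") as f:
        for row in csv.reader(f, **reader_kwargs):
            yield row


def demo():
    """Write a small UTF-16 sample file, read it back, and return the rows."""
    data = "\xe4hnlich,\xfcblich\r\nalpha,beta\r\ngamma,delta\r\n"
    fd, path = tempfile.mkstemp(suffix=".csv")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data.encode("utf-16"))
        return list(read_encoded_csv(path, "utf-16"))
    finally:
        os.remove(path)
```

The file is decoded incrementally as the reader pulls lines, so nothing has to be read into memory in one piece. For a BOM-free big-endian file like the one discussed in this thread, the encoding name would presumably need to be "utf-16-be" rather than "utf-16".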
Re: _csv.Error: string with NUL bytes
On Thu, May 03, 2007 at 10:28:34AM -0700, [EMAIL PROTECTED] wrote:
> On May 3, 10:12 am, [EMAIL PROTECTED] wrote:
> > On Thu, May 03, 2007 at 09:57:38AM -0700, fscked wrote:
> > > > As Larry said, this most likely means there are null bytes in
> > > > the CSV file.
> > > >
> > > > Ciao,
> > > > Marc 'BlackJack' Rintsch
> > >
> > > How would I go about identifying where it is?
> >
> > A hex editor might be easiest.
> >
> > You could also use Python:
> >
> >     print open("filewithnuls").read().replace("\0", ">>>NUL<<<")
> >
> > Dustin
>
> Hmm, interesting if I run:
>
>     print open("test.csv").read().replace("\0", ">>>NUL<<<")
>
> every single character gets a >>>NUL<<< between them...
>
> What the heck does that mean?
>
> Example, here is the first field in the csv
>
> 89114608511,
>
> the above code produces:
> >>>NUL<<<8>>>NUL<<<9>>>NUL<<<1>>>NUL<<<1>>>NUL<<<4>>>NUL<<<6>>>NUL<<<0>>>NUL<<<8>>>NUL<<<5>>>NUL<<<1>>>NUL<<<1>>>NUL<<<,

I'm guessing that your file is in UTF-16, then -- Windows seems to do
that a lot. It kind of makes it *not* a CSV file, but oh well. Try

    print open("test.csv").read().decode('utf-16').replace("\0", ">>>NUL<<<")

(decode the string returned by read(); file objects have no decode()
method). If the guess is right, no >>>NUL<<< markers should remain.

I'm not terribly unicode-savvy, so I'll leave it to others to suggest a
way to get the CSV reader to handle such encoding without reading in the
whole file, decoding it, and setting up a StringIO file.

Dustin
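The whole-file approach mentioned at the end of the message (read everything, decode it, wrap it in a StringIO) might look roughly like this in Python 3. It is a sketch under that assumption, fine for small files; the function name `rows_from_bytes` is hypothetical.

```python
import csv
import io


def rows_from_bytes(raw: bytes, encoding: str):
    """Decode an entire CSV payload and parse it from an in-memory buffer."""
    text = raw.decode(encoding)
    # newline="" keeps the original line endings for the csv module to handle.
    return list(csv.reader(io.StringIO(text, newline="")))
```

The caller would open the file in binary mode and pass `f.read()` along with the encoding it believes the file uses, for example "utf-16-be" for data with a NUL before every character and no BOM.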
Re: _csv.Error: string with NUL bytes
On May 3, 10:12 am, [EMAIL PROTECTED] wrote:
> On Thu, May 03, 2007 at 09:57:38AM -0700, fscked wrote:
> > > As Larry said, this most likely means there are null bytes in
> > > the CSV file.
> > >
> > > Ciao,
> > > Marc 'BlackJack' Rintsch
> >
> > How would I go about identifying where it is?
>
> A hex editor might be easiest.
>
> You could also use Python:
>
>     print open("filewithnuls").read().replace("\0", ">>>NUL<<<")
>
> Dustin

Hmm, interesting if I run:

    print open("test.csv").read().replace("\0", ">>>NUL<<<")

every single character gets a >>>NUL<<< between them...

What the heck does that mean?

Example, here is the first field in the csv

89114608511,

the above code produces:
>>>NUL<<<8>>>NUL<<<9>>>NUL<<<1>>>NUL<<<1>>>NUL<<<4>>>NUL<<<6>>>NUL<<<0>>>NUL<<<8>>>NUL<<<5>>>NUL<<<1>>>NUL<<<1>>>NUL<<<,
Re: _csv.Error: string with NUL bytes
On Thu, May 03, 2007 at 09:57:38AM -0700, fscked wrote:
> > As Larry said, this most likely means there are null bytes in the
> > CSV file.
> >
> > Ciao,
> > Marc 'BlackJack' Rintsch
>
> How would I go about identifying where it is?

A hex editor might be easiest.

You could also use Python:

    print open("filewithnuls").read().replace("\0", ">>>NUL<<<")

Dustin
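If the file is too large to eyeball, a variation on the one-liner above is to report byte offsets instead of printing the whole contents. A minimal Python 3 sketch; `nul_offsets` and its `limit` parameter are hypothetical names chosen for this example.

```python
def nul_offsets(data: bytes, limit: int = 10):
    """Return the byte offsets of the first `limit` NUL bytes in `data`."""
    offsets = []
    start = 0
    while len(offsets) < limit:
        pos = data.find(b"\x00", start)
        if pos == -1:
            break  # no more NUL bytes
        offsets.append(pos)
        start = pos + 1
    return offsets
```

Read the file in binary mode ("rb") and pass the bytes in. A handful of isolated offsets suggests stray NULs in otherwise normal data, while a NUL at every other offset suggests a 16-bit encoding.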
Re: _csv.Error: string with NUL bytes
On May 3, 9:29 am, Marc 'BlackJack' Rintsch <[EMAIL PROTECTED]> wrote:
> In <[EMAIL PROTECTED]>, fscked wrote:
> > The traceback is as follows:
> >
> > Traceback (most recent call last):
> >   File "createXMLPackage.py", line 35, in ?
> >     for boxid, mac, activated, hw_ver, sw_ver, heartbeat, name,
> > address, phone, country, city, in csvreader:
> > _csv.Error: string with NUL bytes
> > Exit code: 1 , 0001h
>
> As Larry said, this most likely means there are null bytes in the CSV
> file.
>
> Ciao,
> Marc 'BlackJack' Rintsch

How would I go about identifying where it is?
Re: _csv.Error: string with NUL bytes
In <[EMAIL PROTECTED]>, fscked wrote:
> The traceback is as follows:
>
> Traceback (most recent call last):
>   File "createXMLPackage.py", line 35, in ?
>     for boxid, mac, activated, hw_ver, sw_ver, heartbeat, name,
> address, phone, country, city, in csvreader:
> _csv.Error: string with NUL bytes
> Exit code: 1 , 0001h

As Larry said, this most likely means there are null bytes in the CSV
file.

Ciao,
Marc 'BlackJack' Rintsch
Re: _csv.Error: string with NUL bytes
On May 3, 9:11 am, Larry Bates <[EMAIL PROTECTED]> wrote:
> fscked wrote:
> > Anyone have an idea of what I might do to fix this? I have googled and
> > can only find some random conversations about it that don't make
> > sense to me.
> >
> > I am basically reading in a csv file to create an xml and get this
> > error.
> >
> > I don't see any empty values in any fields or anything...
>
> You really should post some code and the actual traceback error you
> get for us to help. I suspect that you have an ill-formed record in
> your CSV file. If you can't control that, you may have to write your
> own CSV dialect parser.
>
> -Larry

Certainly, here is the code:

    import os,sys
    import csv
    from elementtree.ElementTree import Element, SubElement, ElementTree

    def indent(elem, level=0):
        i = "\n" + level*" "
        if len(elem):
            if not elem.text or not elem.text.strip():
                elem.text = i + " "
            for elem in elem:
                indent(elem, level+1)
            if not elem.tail or not elem.tail.strip():
                elem.tail = i
        else:
            if level and (not elem.tail or not elem.tail.strip()):
                elem.tail = i

    root = Element("{Boxes}boxes")
    myfile = open('test.csv', 'rb')
    csvreader = csv.reader(myfile)

    for boxid, mac, activated, hw_ver, sw_ver, heartbeat, name, address, phone, country, city, in csvreader:
        mainbox = SubElement(root, "{Boxes}box")
        mainbox.attrib["city"] = city
        mainbox.attrib["country"] = country
        mainbox.attrib["phone"] = phone
        mainbox.attrib["address"] = address
        mainbox.attrib["name"] = name
        mainbox.attrib["pl_heartbeat"] = heartbeat
        mainbox.attrib["sw_ver"] = sw_ver
        mainbox.attrib["hw_ver"] = hw_ver
        mainbox.attrib["date_activated"] = activated
        mainbox.attrib["mac_address"] = mac
        mainbox.attrib["boxid"] = boxid

    indent(root)
    ElementTree(root).write('test.xml', encoding='UTF-8')

The traceback is as follows:

Traceback (most recent call last):
  File "createXMLPackage.py", line 35, in ?
    for boxid, mac, activated, hw_ver, sw_ver, heartbeat, name, address, phone, country, city, in csvreader:
_csv.Error: string with NUL bytes
Exit code: 1 , 0001h
Re: _csv.Error: string with NUL bytes
fscked wrote:
> Anyone have an idea of what I might do to fix this? I have googled and
> can only find some random conversations about it that don't make
> sense to me.
>
> I am basically reading in a csv file to create an xml and get this
> error.
>
> I don't see any empty values in any fields or anything...

You really should post some code and the actual traceback error you get
for us to help. I suspect that you have an ill-formed record in your CSV
file. If you can't control that, you may have to write your own CSV
dialect parser.

-Larry
_csv.Error: string with NUL bytes
Anyone have an idea of what I might do to fix this? I have googled and
can only find some random conversations about it that don't make sense
to me.

I am basically reading in a csv file to create an xml and get this
error.

I don't see any empty values in any fields or anything...