Re: _csv.Error: string with NUL bytes

2007-05-03 Thread John Machin
On May 4, 3:40 am, [EMAIL PROTECTED] wrote:
> On Thu, May 03, 2007 at 10:28:34AM -0700, [EMAIL PROTECTED] wrote:
> > On May 3, 10:12 am, [EMAIL PROTECTED] wrote:
> > > On Thu, May 03, 2007 at 09:57:38AM -0700, fscked wrote:
> > > > > As Larry said, this most likely means there are null bytes in the CSV 
> > > > > file.
>
> > > > > Ciao,
> > > > > Marc 'BlackJack' Rintsch
>
> > > > How would I go about identifying where it is?
>
> > > A hex editor might be easiest.
>
> > > You could also use Python:
>
> > >   print open("filewithnuls").read().replace("\0", ">>>NUL<<<")
>
> > > Dustin
>
> > Hmm, interesting. If I run:
>
> > print open("test.csv").read().replace("\0", ">>>NUL<<<")
>
> > every single character gets a >>>NUL<<< in front of it...
>
> > What the heck does that mean?
>
> > Example, here is the first field in the csv
>
> > 89114608511,
>
> > the above code produces:
> > >>>NUL<<<8>>>NUL<<<9>>>NUL<<<1>>>NUL<<<1>>>NUL<<<4>>>NUL<<<6>>>NUL<<<0>>>NUL<<<8>>>NUL<<<5>>>NUL<<<1>>>NUL<<<1>>>NUL<<<,
>
> I'm guessing that your file is in UTF-16, then -- Windows seems to do
> that a lot.

Do what a lot? Encode data in UTF-16xE without putting in a BOM or
telling the world in some other fashion what x is? Humans seem to do
that occasionally. When they use Windows software, the result is
highly likely to be encoded in UTF-16LE -- unless of course the human
deliberately chooses otherwise (e.g. the "Unicode bigendian" option in
NotePad's "Save As" dialogue). Further, the data is likely to have a
BOM prepended.

The above is consistent with BOM-free UTF-16BE.
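
A rough way to check this from Python rather than by eye is sketched below. It assumes Python 3 and mostly-ASCII data, and the function name guess_utf16_codec is made up for illustration. It looks for a BOM first, then falls back to counting which byte of each pair is NUL.

```python
import codecs

def guess_utf16_codec(raw):
    """Guess a codec name for mostly-ASCII UTF-16 data from its raw bytes."""
    # A BOM settles the question; the "utf-16" codec consumes it on decode.
    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"
    sample = raw[:1024]
    # ASCII text in UTF-16BE has the NUL first in each byte pair,
    # in UTF-16LE it has the NUL second.
    nuls_first = sample[0::2].count(b"\0")
    nuls_second = sample[1::2].count(b"\0")
    if nuls_first > nuls_second:
        return "utf-16-be"
    if nuls_second > nuls_first:
        return "utf-16-le"
    return None  # no clear pattern, probably not UTF-16 at all
```

For the sample field quoted above, the NUL comes before each digit, so this heuristic would pick "utf-16-be". It is only a guess: data with few ASCII characters can fool it.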

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: _csv.Error: string with NUL bytes

2007-05-03 Thread Peter Otten
[EMAIL PROTECTED] wrote:

> I'm guessing that your file is in UTF-16, then -- Windows seems to do
> that a lot.  It kind of makes it *not* a CSV file, but oh well.  Try
> 
>   print open("test.csv").read().decode('utf-16').replace("\0",
>   ">>>NUL<<<")
> 
> I'm not terribly unicode-savvy, so I'll leave it to others to suggest a
> way to get the CSV reader to handle such encoding without reading in the
> whole file, decoding it, and setting up a StringIO file.

Not pretty, but seems to work:

from __future__ import with_statement

import csv
import codecs

def recoding_reader(stream, from_encoding, args=(), kw={}):
    intermediate_encoding = "utf8"
    efrom = codecs.lookup(from_encoding)
    einter = codecs.lookup(intermediate_encoding)
    rstream = codecs.StreamRecoder(stream, einter.encode, efrom.decode,
                                   efrom.streamreader, einter.streamwriter)

    for row in csv.reader(rstream, *args, **kw):
        yield [unicode(column, intermediate_encoding) for column in row]

def main():
    file_encoding = "utf16"

    # generate sample data:
    data = u"\xe4hnlich,\xfcblich\r\nalpha,beta\r\ngamma,delta\r\n"
    with open("tmp.txt", "wb") as f:
        f.write(data.encode(file_encoding))

    # read it
    with open("tmp.txt", "rb") as f:
        for row in recoding_reader(f, file_encoding):
            print u" | ".join(row)

if __name__ == "__main__":
    main()

Data from the file is recoded to UTF-8, then passed to a csv.reader() whose
output is decoded to unicode.
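
For comparison, here is a minimal sketch of the same round trip assuming Python 3, where csv.reader takes text and open() can do the decoding, so no intermediate UTF-8 step is needed. The names read_utf16_csv and demo are made up for illustration, and the sample data is the same as above.

```python
import csv
import os
import tempfile

def read_utf16_csv(path, encoding="utf-16"):
    """Yield rows of str from a UTF-16 encoded CSV file, one at a time."""
    # newline="" lets the csv module handle line endings itself.
    with open(path, "r", encoding=encoding, newline="") as f:
        for row in csv.reader(f):
            yield row

def demo():
    """Write the sample data as UTF-16, then read it back as rows."""
    data = "\xe4hnlich,\xfcblich\r\nalpha,beta\r\ngamma,delta\r\n"
    fd, path = tempfile.mkstemp(suffix=".csv")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data.encode("utf-16"))
        return list(read_utf16_csv(path))
    finally:
        os.remove(path)
```

The file is still read incrementally, which was the point of the recoding reader.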

Peter



Re: _csv.Error: string with NUL bytes

2007-05-03 Thread dustin
On Thu, May 03, 2007 at 10:28:34AM -0700, [EMAIL PROTECTED] wrote:
> On May 3, 10:12 am, [EMAIL PROTECTED] wrote:
> > On Thu, May 03, 2007 at 09:57:38AM -0700, fscked wrote:
> > > > As Larry said, this most likely means there are null bytes in the CSV 
> > > > file.
> >
> > > > Ciao,
> > > > Marc 'BlackJack' Rintsch
> >
> > > How would I go about identifying where it is?
> >
> > A hex editor might be easiest.
> >
> > You could also use Python:
> >
> >   print open("filewithnuls").read().replace("\0", ">>>NUL<<<")
> >
> > Dustin
> 
> Hmm, interesting. If I run:
> 
> print open("test.csv").read().replace("\0", ">>>NUL<<<")
> 
> every single character gets a >>>NUL<<< in front of it...
> 
> What the heck does that mean?
> 
> Example, here is the first field in the csv
> 
> 89114608511,
> 
> the above code produces:
> >>>NUL<<<8>>>NUL<<<9>>>NUL<<<1>>>NUL<<<1>>>NUL<<<4>>>NUL<<<6>>>NUL<<<0>>>NUL<<<8>>>NUL<<<5>>>NUL<<<1>>>NUL<<<1>>>NUL<<<,

I'm guessing that your file is in UTF-16, then -- Windows seems to do
that a lot.  It kind of makes it *not* a CSV file, but oh well.  Try 

  print open("test.csv").read().decode('utf-16').replace("\0", ">>>NUL<<<")

I'm not terribly unicode-savvy, so I'll leave it to others to suggest a
way to get the CSV reader to handle such encoding without reading in the
whole file, decoding it, and setting up a StringIO file.
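
For reference, the whole-file approach mentioned here might look like the sketch below. It assumes Python 3 and a file small enough to hold in memory, and the function name rows_from_utf16_bytes is made up for illustration. Note that the plain "utf-16" codec relies on a BOM; for BOM-free data pass "utf-16-be" or "utf-16-le" explicitly.

```python
import csv
import io

def rows_from_utf16_bytes(raw, encoding="utf-16"):
    """Decode the whole byte string, then parse the text as CSV."""
    text = raw.decode(encoding)
    # newline="" keeps line endings untouched so csv can interpret them.
    return list(csv.reader(io.StringIO(text, newline="")))
```

Usage would be along the lines of rows_from_utf16_bytes(open("test.csv", "rb").read(), "utf-16-be").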

Dustin


Re: _csv.Error: string with NUL bytes

2007-05-03 Thread IAmStarsky
On May 3, 10:12 am, [EMAIL PROTECTED] wrote:
> On Thu, May 03, 2007 at 09:57:38AM -0700, fscked wrote:
> > > As Larry said, this most likely means there are null bytes in the CSV 
> > > file.
>
> > > Ciao,
> > > Marc 'BlackJack' Rintsch
>
> > How would I go about identifying where it is?
>
> A hex editor might be easiest.
>
> You could also use Python:
>
>   print open("filewithnuls").read().replace("\0", ">>>NUL<<<")
>
> Dustin

Hmm, interesting. If I run:

print open("test.csv").read().replace("\0", ">>>NUL<<<")

every single character gets a >>>NUL<<< in front of it...

What the heck does that mean?

Example, here is the first field in the csv

89114608511,

the above code produces:
>>>NUL<<<8>>>NUL<<<9>>>NUL<<<1>>>NUL<<<1>>>NUL<<<4>>>NUL<<<6>>>NUL<<<0>>>NUL<<<8>>>NUL<<<5>>>NUL<<<1>>>NUL<<<1>>>NUL<<<,



Re: _csv.Error: string with NUL bytes

2007-05-03 Thread dustin
On Thu, May 03, 2007 at 09:57:38AM -0700, fscked wrote:
> > As Larry said, this most likely means there are null bytes in the CSV file.
> >
> > Ciao,
> > Marc 'BlackJack' Rintsch
> 
> How would I go about identifying where it is?

A hex editor might be easiest.

You could also use Python:

  print open("filewithnuls").read().replace("\0", ">>>NUL<<<")
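
If the file is large, printing all of it is unwieldy. A variant that only reports how many NUL bytes there are and where the first few sit is sketched below. It assumes Python 3, and the names nul_offsets and nul_report are made up for illustration.

```python
def nul_offsets(raw, limit=10):
    """Return the byte offsets of the first `limit` NUL bytes in raw."""
    offsets = []
    pos = raw.find(b"\0")
    while pos != -1 and len(offsets) < limit:
        offsets.append(pos)
        pos = raw.find(b"\0", pos + 1)
    return offsets

def nul_report(path, limit=10):
    """Return (total NUL count, first few offsets) for the file at path."""
    with open(path, "rb") as f:
        raw = f.read()
    return raw.count(b"\0"), nul_offsets(raw, limit)
```

A handful of scattered offsets points at stray bytes in the data; a NUL at every other offset suggests the file is not in a single-byte encoding.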

Dustin


Re: _csv.Error: string with NUL bytes

2007-05-03 Thread fscked
On May 3, 9:29 am, Marc 'BlackJack' Rintsch <[EMAIL PROTECTED]> wrote:
> In <[EMAIL PROTECTED]>, fscked wrote:
> > The traceback is as follows:
>
> > Traceback (most recent call last):
> >   File "createXMLPackage.py", line 35, in ?
> > for boxid, mac, activated, hw_ver, sw_ver, heartbeat, name,
> > address, phone, country, city, in csvreader:
> > _csv.Error: string with NUL bytes
> > Exit code: 1 , 0001h
>
> As Larry said, this most likely means there are null bytes in the CSV file.
>
> Ciao,
> Marc 'BlackJack' Rintsch

How would I go about identifying where it is?



Re: _csv.Error: string with NUL bytes

2007-05-03 Thread Marc 'BlackJack' Rintsch
In <[EMAIL PROTECTED]>, fscked wrote:

> The traceback is as follows:
> 
> Traceback (most recent call last):
>   File "createXMLPackage.py", line 35, in ?
> for boxid, mac, activated, hw_ver, sw_ver, heartbeat, name,
> address, phone, country, city, in csvreader:
> _csv.Error: string with NUL bytes
> Exit code: 1 , 0001h

As Larry said, this most likely means there are null bytes in the CSV file.

Ciao,
Marc 'BlackJack' Rintsch


Re: _csv.Error: string with NUL bytes

2007-05-03 Thread fscked
On May 3, 9:11 am, Larry Bates <[EMAIL PROTECTED]> wrote:
> fscked wrote:
> > Anyone have an idea of what I might do to fix this? I have googled and
> > can only find some random conversations about it that don't make
> > sense to me.
>
> > I am basically reading in a csv file to create an xml and get this
> > error.
>
> > I don't see any empty values in any fields or anything...
>
> You really should post some code and the actual traceback you
> get so we can help.  I suspect that you have an ill-formed record in
> your CSV file.  If you can't control that, you may have to write your
> own CSV dialect parser.
>
> -Larry

Certainly, here is the code:

import os, sys
import csv
from elementtree.ElementTree import Element, SubElement, ElementTree

def indent(elem, level=0):
    # Pretty-print helper: adds newlines and indentation in place.
    i = "\n" + level*"  "
    if len(elem):
        if not elem.text or not elem.text.strip():
            elem.text = i + "  "
        for elem in elem:
            indent(elem, level+1)
        # elem is now the last child; its tail closes the parent element.
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = i

root = Element("{Boxes}boxes")
myfile = open('test.csv', 'rb')
csvreader = csv.reader(myfile)

# Each row must have exactly eleven fields, in this order.
for (boxid, mac, activated, hw_ver, sw_ver, heartbeat, name, address,
     phone, country, city) in csvreader:
    mainbox = SubElement(root, "{Boxes}box")
    mainbox.attrib["city"] = city
    mainbox.attrib["country"] = country
    mainbox.attrib["phone"] = phone
    mainbox.attrib["address"] = address
    mainbox.attrib["name"] = name
    mainbox.attrib["pl_heartbeat"] = heartbeat
    mainbox.attrib["sw_ver"] = sw_ver
    mainbox.attrib["hw_ver"] = hw_ver
    mainbox.attrib["date_activated"] = activated
    mainbox.attrib["mac_address"] = mac
    mainbox.attrib["boxid"] = boxid

indent(root)

ElementTree(root).write('test.xml', encoding='UTF-8')

The traceback is as follows:

Traceback (most recent call last):
  File "createXMLPackage.py", line 35, in ?
    for boxid, mac, activated, hw_ver, sw_ver, heartbeat, name,
address, phone, country, city, in csvreader:
_csv.Error: string with NUL bytes
Exit code: 1 , 0001h



Re: _csv.Error: string with NUL bytes

2007-05-03 Thread Larry Bates
fscked wrote:
> Anyone have an idea of what I might do to fix this? I have googled and
> can only find some random conversations about it that don't make
> sense to me.
> 
> I am basically reading in a csv file to create an xml and get this
> error.
> 
> I don't see any empty values in any fields or anything...
> 

You really should post some code and the actual traceback you
get so we can help.  I suspect that you have an ill-formed record in
your CSV file.  If you can't control that, you may have to write your
own CSV dialect parser.

-Larry


_csv.Error: string with NUL bytes

2007-05-03 Thread fscked
Anyone have an idea of what I might do to fix this? I have googled and
can only find some random conversations about it that don't make
sense to me.

I am basically reading in a csv file to create an xml and get this
error.

I don't see any empty values in any fields or anything...
