Re: help I'm getting delimited
On 18 dec, 00:06, John Machin sjmac...@lexicon.net wrote:
> On Dec 18, 3:15 am, aka alexoploca...@gmail.com wrote:
>
> Do you mean that this file was created by whatever.UnicodeWriter? If so, did you just now discover this information? How do you know that the UnicodeWriter is "functioning perfectly"? What does "functioning perfectly" mean to you? In particular, what encoding is it using?
>
> Which do you mean:
> (a) you typed those lines into Notepad yourself
> (b) you took a copy of a file created by whatever.UnicodeWriter, opened it with Notepad, trimmed off some rows and columns, and saved it again
> ?
>
> Here's a likely hypothesis: the file was written in utf16. In that case, either:
> (i) you really want utf16 (why?), so:
>   (1) the csv module will not cope with it, and is not expected to cope with it
>   (2) the whatever.UnicodeReader should (in order of preference):
>     (a) be allowed to find out for itself that 'utf16' is the go
>     (b) be told explicitly that 'utf16' is the go
>     (c) be served with a bug report
> OR
> (ii) you really want utf8, so:
>   (1) the csv module should be happy
>   (2) the whatever.UnicodeWriter should be told to use 'utf8'
>   (3) the whatever.UnicodeReader should (in order of preference): [as above but s/16/8/]

The csv file was originally created by the UnicodeWriter class and used for a mail merge in Microsoft Word, and all of that worked perfectly. The reverse did not: reading the output file back failed, so in the end I edited it in Notepad and cut off some columns. I didn't realize the encoding would be preserved even after that, which is why it still caused problems. After testing from the Python command line with a csv file generated by Excel, I got it working, so it had to be the encoding.

Because the write side of my code, which uses the UnicodeWriter, was OK, I didn't pay attention to the fact that I had changed the UW class from UTF-8 to UTF-16 because of difficulties with Dutch characters like ë and ö. When I finally changed it back to UTF-8, both output and input worked, including those special characters. So it was my unjustified conclusion, that I couldn't handle these characters on the write side without UTF-16, that ultimately got me into trouble on the read side.

With your help I got it straight. Once again, reducing the problem to its bare basics and avoiding big steps is the key. Thanks a lot for your help, John.

BTW, the TurboGears code is not very different from plain Python; it just uses some extra identifiers.

-- 
http://mail.python.org/mailman/listinfo/python-list
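A minimal Python 3 sketch of the conclusion reached above: UTF-8 can encode ë and ö, so UTF-16 is not needed on the write side, and the same encoding on both sides round-trips cleanly. It uses the standard csv module with an explicit encoding passed to open(), not the thread's UnicodeWriter/UnicodeReader recipe (which is Python 2 code); the sample rows and the file name roles.csv are hypothetical.

```python
import csv
import os
import tempfile

# Hypothetical sample rows in the thread's id;company;department layout,
# including accented letters of the kind mentioned above.
rows = [
    ["id", "company", "department"],
    ["13", "Citroën", "Sales"],
    ["14", "Zöller", "Research"],
]

def write_rows(path, rows, encoding="utf-8"):
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding=encoding, newline="") as fp:
        csv.writer(fp, delimiter=";").writerows(rows)

def read_rows(path, encoding="utf-8"):
    # Read back with the SAME encoding that was used for writing.
    with open(path, encoding=encoding, newline="") as fp:
        return list(csv.reader(fp, delimiter=";"))

path = os.path.join(tempfile.mkdtemp(), "roles.csv")
write_rows(path, rows)
with open(path, "rb") as fp:
    raw = fp.read()

print(read_rows(path) == rows)  # True
print(b"\x00" in raw)           # False: no NUL bytes for a csv reader to trip over
```

The design point is simply that writer and reader agree on one encoding; which one you pick matters less than keeping them consistent.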
Re: help I'm getting delimited
On Wed, 2008-12-17 at 06:28 -0800, aka wrote:
> [...]
> csv.reader results in:
>     for r in reader:
> Error: line contains NULL byte
>
> Use of UnicodeReader results in:
> UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0: unexpected code byte

This looks like the problem might be in your choice of codec. A UTF-8 file will never contain the byte 0xff, and is unlikely to contain 0x00 either. My guess is that you will need to decode your input from UTF-16 (and then use the UnicodeReader).

> I will post only complete code from now on, thanks.
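A sketch of the suggested fix in Python 3, where decoding happens in the file object before csv.reader sees any text. The bytes below are a hypothetical stand-in for C:/temp/test.csv saved as UTF-16 little-endian with a byte-order mark; with a real file, open(path, encoding="utf-16", newline="") plays the role of the TextIOWrapper.

```python
import codecs
import csv
import io

# Hypothetical stand-in for the thread's test.csv, saved as UTF-16-LE with a BOM.
text = "id;company;department\n12;Cadillac;Research\n"
raw = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# Decoding these bytes as UTF-8 fails at the very first byte (0xff).
utf8_error_at = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    utf8_error_at = exc.start
print("utf-8 fails at position", utf8_error_at)  # utf-8 fails at position 0

# The "utf-16" codec reads the BOM, chooses the byte order and drops the BOM,
# so csv.reader only ever sees decoded text.
with io.TextIOWrapper(io.BytesIO(raw), encoding="utf-16", newline="") as fp:
    rows = list(csv.reader(fp, delimiter=";"))

print(rows)  # [['id', 'company', 'department'], ['12', 'Cadillac', 'Research']]
```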
Re: help I'm getting delimited
On Dec 17, 9:39 pm, aka alexoploca...@gmail.com wrote:
> Due to being in a hurry I didn't paste correctly (sorry). The intention is to put the values of column 1 (id) in the roles list, hence the append in the loop, to fill a session var. The complete code is:

It's *not* complete. It's missing "import csv".

> roles = []
> inp = 'C:/temp/test.csv'
> try:
>     fp = open(inp, 'rb')
>     reader = csv.reader(fp, dialect='excel', delimiter=';')
>     for r in reader:
>         roles.append(r)  ## ultimately should be something like r.id or r[0]
>         ## first row of csv file should be skipped because of column names
> except:
>     msg = "Something's wrong with the csv.reader"

But you don't print the message! In any case, using the try/except like that *hides* any useful diagnostic information; it gives only an indication that something is wrong, but not what is wrong and where it is wrong. If you throw away the try/except, you will get a more meaningful message -- possibly that csv is not defined!! -- and the traceback will tell you in which line the error occurred.

> return dict(file=inp,roles=str(roles))

Why do you think that you need (a) that complicated expression (b) the str() call? Assuming you are intending to make a function out of all that, what's wrong with returning a (simple) tuple:

    return inp, roles

?

The above 'return' statement is not inside a function/method. You would have got this message:

SyntaxError: 'return' outside function

People will very soon lose patience with you if you persist in not posting the actual code that you ran.

> The roles list isn't populated at all :(

This could mean (if the code that was posted is moderately similar to that which was run) that the error happened before the first time that roles.append(r) was executed ;-)

Please divulge the contents of test.csv -- but not if it's huge!

Consider trying to get your code to work first with a data file of close-to-minimal size and complexity, like this:

8<---
id,other_info
tom,1
dick,2
harry,3
8<---

By the way, you mentioned the UnicodeReader class in your original post, but you don't seem to use it ...
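John's three suggestions, sketched together in Python 3: no blanket try/except (so a failure surfaces as a traceback naming the exception and the line), a plain (filename, rows) tuple as the return value, and his close-to-minimal test file. The name import_roles is borrowed from the thread, but this standalone function is hypothetical and is not the poster's TurboGears method.

```python
import csv
import os
import tempfile

def import_roles(inp):
    """Read every row of a csv file and return (filename, rows)."""
    roles = []
    # No try/except: any error propagates with a full traceback.
    with open(inp, encoding="utf-8", newline="") as fp:
        for r in csv.reader(fp):
            roles.append(r)
    return inp, roles

# The close-to-minimal data file suggested above, written to a temp directory.
inp = os.path.join(tempfile.mkdtemp(), "test.csv")
with open(inp, "w", encoding="utf-8", newline="") as fp:
    fp.write("id,other_info\ntom,1\ndick,2\nharry,3\n")

name, roles = import_roles(inp)
print(roles)
# [['id', 'other_info'], ['tom', '1'], ['dick', '2'], ['harry', '3']]
```

Once this works, swapping in the real file (and its real delimiter and encoding) changes one thing at a time.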
Re: help I'm getting delimited
Hi John, thanks.

You're right, I didn't paste the method header because I thought it didn't matter when the input filename is hardcoded. The try/except isn't very helpful indeed, so I commented it out. You're also right that I referred to the UnicodeReader class in my first post without using it; that's ultimately where I want to go, so I left it in here as a commented-out line for you to see. The fact is that neither csv.reader nor the UnicodeReader will read the file, while writing with the UnicodeWriter works like a charm. That's why I put str() around roles: to see any content at all.

I simplified the csv file by cutting off columns, without result. The file now looks like:

id;company;department
12;Cadillac;Research
11;Ford;Accounting
10;Chrysler;Sales

The dictionary in the return statement is there because this code is part of my TurboGears application. The entire method is:

import csv
from utilities.urw import UnicodeWriter, UnicodeReader

    @expose(allow_json=True)
    def import_roles(self, input=None, *args, **kwargs):
        inp = 'C:/temp/test.csv'
        roles = []
        msg = ''
##        try:
        fp = open(inp, 'rb')
        reader = csv.reader(fp, dialect='excel', delimiter=';')
##        reader = UnicodeReader(fp, dialect='excel', delimiter=';')
        for r in reader:
            roles.append(r[0])
        fp.close()
##        except:
##            msg = "Something's wrong with the csv.reader"
        return dict(filepath=inp, roles=str(roles), msg=msg)

csv.reader results in:

    for r in reader:
Error: line contains NULL byte

Use of UnicodeReader results in:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0: unexpected code byte

I will post only complete code from now on, thanks.
Re: help I'm getting delimited
Due to being in a hurry I didn't paste correctly (sorry). The intention is to parse a csv file and (ultimately) put the values of column 1 (id) in a list (so I need to append in the loop) that will be used to fill a session var. The complete code is:

roles = []
inp = 'C:/temp/test.csv'
try:
    fp = open(inp, 'rb')
    reader = csv.reader(fp, dialect='excel', delimiter=';')
    for r in reader:
        roles.append(r)  ## ultimately should be something like r.id or r[0]
        ## the first row of the csv file should be skipped because of the column names
except:
    msg = "Something's wrong with the csv.reader"
return dict(file=inp,roles=str(roles))

The roles list isn't populated at all :(
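For the two goals in the code comments above (skip the header row, keep only the id column), the standard csv module offers two routes, sketched here in Python 3 with the thread's four-line file held in memory. csv.DictReader takes the header row as field names, so row["id"] is the closest match to the hoped-for r.id; with a plain csv.reader, next() skips the header and r[0] is the id.

```python
import csv
import io

# The four-line file from the thread, kept in memory for the sketch.
DATA = "id;company;department\n12;Cadillac;Research\n11;Ford;Accounting\n10;Chrysler;Sales\n"

# Option 1: DictReader consumes the header row and keys each row by column name.
roles = [row["id"] for row in csv.DictReader(io.StringIO(DATA), delimiter=";")]
print(roles)  # ['12', '11', '10']

# Option 2: a plain reader; skip the header explicitly, then take column 1.
reader = csv.reader(io.StringIO(DATA), delimiter=";")
next(reader)  # discard the "id;company;department" row
roles_by_index = [r[0] for r in reader]
print(roles_by_index)  # ['12', '11', '10']
```

DictReader is the more robust choice if the column order might change; the index version is shorter when it will not.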
Re: help I'm getting delimited
John, this is the actual code I ran in TurboGears, which is a Python framework. I should have left out the import statements. Trust me, the problem isn't in there, because the UnicodeWriter is functioning perfectly. I have already sanitized the csv file down to these four lines in Notepad, so there isn't anything more than this:

id;company;department
12;Cadillac;Research
11;Ford;Accounting
10;Chrysler;Sales

The only possibly problematic lines are marked # here:

    def import_roles(self, input=None, *args, **kwargs):
        inp = 'C:/temp/test.csv'
        roles = []
        msg = ''
##        try:
        fp = open(inp, 'rb')  #
        reader = csv.reader(fp, dialect='excel', delimiter=';')  #
##        reader = UnicodeReader(fp, dialect='excel', delimiter=';')  #
        for r in reader:
            roles.append(r[0])  #
        fp.close()
##        except:
##            msg = "Something's wrong with the csv.reader"
        return dict(filepath=inp, roles=str(roles), msg=msg)

Yeah rdmur, I'll have a look at the Python command line.
Re: help I'm getting delimited
Quoth John Machin sjmac...@lexicon.net:
> On Dec 18, 1:28 am, aka alexoploca...@gmail.com wrote:
>>     @expose(allow_json=True)
>
> Means what? Does what? Does the problem still happen without that?

Means what he's posting is not a standalone script :) He says it's part of his TurboGears app. @expose says that this method is callable by name from a URL, and allow_json means it can be called with a parameter requesting a JSON-formatted response instead of HTML.

> Funny, the indentation changed there --- for the very last time, is that the actual code of a standalone script that reproduces the problem?

Alex, I would strongly suggest that you move your code out into a standalone script and debug it there (you'll get more help from this group if you do, for one thing!). After you get it working standalone, you can incorporate it back into your TurboGears app.

--RDM
Re: help I'm getting delimited
On Dec 18, 1:28 am, aka alexoploca...@gmail.com wrote:
> Hi John, thanks.
> [...]
> The dictionary in the return statement is there because this code is part of my TurboGears application. The entire method is:
>
> import csv
> from utilities.urw import UnicodeWriter, UnicodeReader

Pardon my ignorance, but what is utilities.urw??

>     @expose(allow_json=True)

Means what? Does what? Does the problem still happen without that?

Funny, the indentation changed there --- for the very last time, is that the actual code of a standalone script that reproduces the problem?

>     def import_roles(self, input=None, *args, **kwargs):
>         inp = 'C:/temp/test.csv'
>         roles = []
>         msg = ''
> ##        try:
>         fp = open(inp, 'rb')
>         reader = csv.reader(fp, dialect='excel', delimiter=';')
> ##        reader = UnicodeReader(fp, dialect='excel', delimiter=';')
>         for r in reader:
>             roles.append(r[0])
>         fp.close()
> ##        except:
> ##            msg = "Something's wrong with the csv.reader"
>         return dict(filepath=inp, roles=str(roles), msg=msg)
>
> csv.reader results in:
>     for r in reader:
> Error: line contains NULL byte

Looks like the file is stuffed. Have you tried inspecting it with a tool that would actually show a '\x00' or a '\xff' unambiguously? If you don't have a fancy one, use the Python interactive prompt:

    open('your_file.csv', 'rb').read()

> Use of UnicodeReader results in:
> UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0: unexpected code byte
>
> I will post only complete code from now on, thanks.

Just make sure it's runnable and it's what you actually ran, thanks.
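What the byte-level check suggested above would reveal, sketched in Python 3 (where the prompt shows raw file contents as a b'...' literal). The file here is a hypothetical recreation of test.csv saved as UTF-16 little-endian with a byte-order mark, which is the hypothesis John puts forward later in the thread; peek is an illustrative helper name.

```python
import codecs
import os
import tempfile

def peek(path, n=16):
    """Return the first n raw bytes of a file, without decoding them."""
    with open(path, "rb") as fp:
        return fp.read(n)

# Hypothetical recreation of test.csv saved as UTF-16-LE with a BOM.
path = os.path.join(tempfile.mkdtemp(), "test.csv")
with open(path, "wb") as fp:
    fp.write(codecs.BOM_UTF16_LE + "id;company;department\r\n".encode("utf-16-le"))

head = peek(path)
print(head)  # b'\xff\xfei\x00d\x00;\x00c\x00o\x00m\x00p\x00'
# \xff\xfe is the UTF-16-LE byte-order mark (the 0xff at position 0 that the
# UTF-8 decoder rejected), and each \x00 is the high byte of an ASCII
# character: the NULL byte that csv.reader complained about.
print(head.startswith(codecs.BOM_UTF16_LE), b"\x00" in head)  # True True
```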
Re: help I'm getting delimited
On Dec 18, 3:15 am, aka alexoploca...@gmail.com wrote:
> John, this is the actual code I ran in TurboGears, which is a Python framework.

It's not complete -- the change in indentation would have caused a SyntaxError.

If (as you appear to assert) the problem is in the csv module, then create a small stand-alone no-TurboGears Python script and a test file which together demonstrate the problem reproducibly, so that the problem can be investigated by anyone with a standard TurboGears-free Python installation. If you can't reproduce the problem in that manner, then you may need to seek assistance in a TurboGears-specific forum.

> I should have left out the import statements. Trust me, the problem isn't in there, because the UnicodeWriter is functioning perfectly.

Do you mean that this file was created by whatever.UnicodeWriter? If so, did you just now discover this information? How do you know that the UnicodeWriter is "functioning perfectly"? What does "functioning perfectly" mean to you? In particular, what encoding is it using?

> I have already sanitized the csv file down to these four lines in Notepad, so there isn't anything more than this:
> id;company;department
> 12;Cadillac;Research
> 11;Ford;Accounting
> 10;Chrysler;Sales

Which do you mean:
(a) you typed those lines into Notepad yourself
(b) you took a copy of a file created by whatever.UnicodeWriter, opened it with Notepad, trimmed off some rows and columns, and saved it again
?

You said earlier:
> csv.reader results in:
>     for r in reader:
> Error: line contains NULL byte
> Use of UnicodeReader results in:
> UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0: unexpected code byte

Those results are consistent with your file being encoded in utf16_le, with the utf16_le BOM ('\xff\xfe') at the start of the file. Have you, as I asked, looked at the file with some better-than-Notepad diagnostic apparatus?

Here's a likely hypothesis: the file was written in utf16. In that case, either:
(i) you really want utf16 (why?), so:
  (1) the csv module will not cope with it, and is not expected to cope with it
  (2) the whatever.UnicodeReader should (in order of preference):
    (a) be allowed to find out for itself that 'utf16' is the go
    (b) be told explicitly that 'utf16' is the go
    (c) be served with a bug report
OR
(ii) you really want utf8, so:
  (1) the csv module should be happy
  (2) the whatever.UnicodeWriter should be told to use 'utf8'
  (3) the whatever.UnicodeReader should (in order of preference): [as above but s/16/8/]

HTH,
John
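John's preferred option (a), letting the reader find out the encoding for itself, can be approximated by sniffing a byte-order mark before parsing. This is a hypothetical Python 3 helper (sniff_encoding and read_csv_bytes are illustrative names), not part of the UnicodeReader recipe or of the csv module. It assumes a file without a BOM is UTF-8, so it cannot detect BOM-less UTF-16, and it does not handle UTF-32.

```python
import codecs
import csv
import io

def sniff_encoding(raw, default="utf-8"):
    """Guess a codec from a leading byte-order mark; UTF-32 is not handled."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # decodes UTF-8 and drops the BOM
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"     # reads the BOM to pick the byte order, then drops it
    return default          # no BOM: fall back to the default encoding

def read_csv_bytes(raw, delimiter=";"):
    """Decode raw bytes with the sniffed codec, then parse them as csv."""
    text = raw.decode(sniff_encoding(raw))
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))

sample = "id;company\n12;Cadillac\n"
for enc in ("utf-8", "utf-8-sig", "utf-16"):
    # Each line ends with [['id', 'company'], ['12', 'Cadillac']].
    print(enc, sniff_encoding(sample.encode(enc)), read_csv_bytes(sample.encode(enc)))
```

Sniffing only helps when the writer emits a BOM; if you control both sides, option (b), telling the reader the encoding explicitly, is simpler.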
Re: help I'm getting delimited
On Tue, 2008-12-16 at 08:26 -0800, aka wrote:
> Hi,
> I'm going nuts over the csv.reader and UnicodeReader classes. Somehow I can't get this method working. It is supposed to read a csv file whose name is passed in, but for now it is hardcoded. What I need for now is to output the string version of the list as a check. Later on I will only need to read the first column (id) of the csv file, to fill in a session var with a list of all ids.
>
> inp = "c:/temp/test.csv"
> roles = []
> try:
>     fp = open(inp, 'rb')
>     reader = csv.reader(fp)
>     for r in reader:
>         rollen.append(r)
> except:
>     msg = "Something is wrong with the UnicodeReader"
> return dict(file=in,roles=str(roles))
>
> Any help greatly appreciated!
> Cheers

Did you intend inside the loop to write:

    roles.append(r)
Re: help I'm getting delimited
Paul Watson wrote:
> On Tue, 2008-12-16 at 08:26 -0800, aka wrote:
>> [...]
>> try:
>>     fp = open(inp, 'rb')
>>     reader = csv.reader(fp)
>>     for r in reader:
>>         rollen.append(r)
>> except:
>>     msg = "Something is wrong with the UnicodeReader"
>> return dict(file=in,roles=str(roles))
>> [...]
>
> Did you intend inside the loop to write:
>     roles.append(r)

Also, the bare except will catch _all_ exceptions. You should catch only those you expect. In this case, it's catching your use of rollen instead of roles (probably unintentional) and then complaining about UnicodeReader, even though that's (probably, again!) not the problem.
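A sketch of catching only the expected exception, in Python 3. csv.Error is the csv module's own exception class, so parse problems become a message while anything else (a NameError from a typo such as rollen, or an OSError from a missing file) still escapes with a traceback. The function name read_roles and the message text are hypothetical; strict=True is passed in the second usage line only to provoke a csv.Error on purpose.

```python
import csv
import io

def read_roles(fp, **fmtparams):
    """Return (rows, message); only csv parse errors become a message."""
    roles = []
    try:
        for r in csv.reader(fp, **fmtparams):
            roles.append(r)
    except csv.Error as exc:
        # Any other exception type is deliberately not caught here.
        return roles, "csv problem: %s" % exc
    return roles, ""

good_rows, good_msg = read_roles(io.StringIO("id,name\n1,tom\n"))
# In strict mode a character right after a closing quote is a parse error.
bad_rows, bad_msg = read_roles(io.StringIO('id,name\n2,"dick"x\n'), strict=True)
print(good_rows, repr(good_msg))                    # [['id', 'name'], ['1', 'tom']] ''
print(bad_rows, bad_msg.startswith("csv problem"))  # [['id', 'name']] True
```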