Re: [Tutor] Urgent: unicode problems writing CSV file
On Wednesday 08 June 2016 07:24 PM, Alex Hall wrote:
> All,
> I'm working on a project that writes CSV files, and I have to get it done
> very soon. I've done this before, but I'm suddenly hitting a problem with
> unicode conversions. I'm trying to write data, but getting the standard
> cannot encode character: ordinal not in range(128)
>
> I've tried
> str(info).encode("utf8")
> str(info).decode("utf8")
> unicode(info, "utf8")
> csvFile = open("myFile.csv", "wb", encoding="utf-8") #invalid keyword argument
>
> What else can I do? As I said, I really have to get this working soon, but
> I'm stuck on this stupid unicode thing. Any ideas will be great. Thanks.

Hi,

How about opening the file with the encoding set to utf8? That solved most of my problems.

George
___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
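For comparison, George's suggestion is what works in Python 3, where open() takes an encoding argument and the csv module writes text. Under Python 2 the same call fails with the "invalid keyword argument" error from Alex's list. A minimal sketch, assuming Python 3; write_rows_utf8, read_rows_utf8 and the sample data are illustrative names, not anything from the thread.

```python
import csv


def write_rows_utf8(path, rows):
    """Write rows to a UTF-8 encoded CSV file (Python 3 only)."""
    # newline="" hands line-ending control to the csv module, as its docs recommend.
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        # Items that are not strings (ints, floats, ...) are passed through str().
        writer.writerows(rows)


def read_rows_utf8(path):
    """Read the file back as lists of text fields, decoding it as UTF-8."""
    with open(path, "r", encoding="utf-8", newline="") as f:
        return list(csv.reader(f))
```

For example, write_rows_utf8("myFile.csv", [[u"café", 1, 2]]) should leave a file whose only line is café,1,2, stored as UTF-8 bytes.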
Re: [Tutor] Urgent: unicode problems writing CSV file
Thanks for all the responses, everyone; what you all said makes sense. I also understand what you mean about the tone of an "urgent" message coming across as demanding.

On Wed, Jun 8, 2016 at 1:19 PM, Tim Golden wrote:
> On 08/06/2016 14:54, Alex Hall wrote:
> > All,
> > I'm working on a project that writes CSV files, and I have to get it done
> > very soon. I've done this before, but I'm suddenly hitting a problem with
> > unicode conversions. I'm trying to write data, but getting the standard
> > cannot encode character: ordinal not in range(128)
> >
> > I've tried
> > str(info).encode("utf8")
> > str(info).decode("utf8")
> > unicode(info, "utf8")
> > csvFile = open("myFile.csv", "wb", encoding="utf-8") #invalid keyword argument
> >
> > What else can I do? As I said, I really have to get this working soon, but
> > I'm stuck on this stupid unicode thing. Any ideas will be great. Thanks.
> >
>
> This is a little tricky. I assume that you're on Python 2.x (since
> open() isn't taking an encoding). Deep in the bowels of the CSV module's
> C implementation is code which converts every item in the row it's
> receiving to a string. (Essentially it does: [str(x) for x in row].) That
> will assume ASCII: there's no opportunity to specify an encoding.
>
> For things whose __str__ returns something ASCII-ish, that's fine. But
> if your data contains, or is likely to contain, non-ASCII data, you'll need
> to preprocess it. How you do it, and how general-purpose that approach is,
> will depend on your data. For the purposes of discussion, let's assume
> your data looks like this:
>
> unicode, int, int
>
> Then your encoder could do this:
>
> def encoder_of_rows(row):
>     return [row[0].encode("utf-8"), str(row[1]), str(row[2])]
>
> and your csv processor could do this:
>
> import csv
>
> rows = [...]
> with open("filename.csv", "wb") as f:
>     writer = csv.writer(f)
>     writer.writerows([encoder_of_rows(row) for row in rows])
>
> but it could be more (or less) complex than that depending on your data
> and how much you know about it.
>
> TJG

--
Alex Hall
Automatic Distributors, IT department
ah...@autodist.com
Re: [Tutor] Urgent: unicode problems writing CSV file
> Date: Thu, 9 Jun 2016 03:57:21 +1000
> From: st...@pearwood.info
> To: tutor@python.org
> Subject: Re: [Tutor] Urgent: unicode problems writing CSV file
>
> On Wed, Jun 08, 2016 at 01:18:11PM -0400, Alex Hall wrote:
>>
>> csvWriter.writerow([info.encode("utf8") if type(info) is unicode else info
>> for info in resultInfo])
>
> Let's break it up into pieces. You build a list:
>
> [blah blah blah for info in resultInfo]
>
> then write it to the csv file with writerow. That is straightforward.
>
> What's in the list?
>
> info.encode("utf8") if type(info) is unicode else info
>
> So Python looks at each item, `info`, decides if it is Unicode or not,
> and if it is Unicode it converts it to a byte str using encode,
> otherwise leaves it be.
>
> If it makes you feel better, this is almost exactly the solution I would
> have come up with in your shoes, except I'd probably write a helper
> function to make it a bit less mysterious:
>
> def to_str(obj):
>     if isinstance(obj, unicode):
>         return obj.encode('utf-8')
>     elif isinstance(obj, str):
>         return obj
>     else:
>         # Or maybe an error?
>         return repr(obj)
>
> csvWriter.writerow([to_str(info) for info in resultInfo])

The docs also offer some code for working with Unicode in CSV files; see the bottom of this page: https://docs.python.org/2/library/csv.html
Re: [Tutor] Urgent: unicode problems writing CSV file
Alex Hall wrote:

> The type of the 'info' variable can vary, as I'm pulling it from a
> database with Pyodbc. I eventually found something that works, though I'm
> not fully sure why or how.

As Tim says, the csv.writer in Python 2 applies str() to every value. If that value is a unicode instance, this results in

value.encode(sys.getdefaultencoding())

In every sanely configured Python 2 installation the default encoding is "ascii", hence the UnicodeEncodeError. If you manually encode unicode objects (with an encoding that can encode all their codepoints)

> csvWriter.writerow([info.encode("utf8") if type(info) is unicode else info
> for info in resultInfo])

the str(value) that follows sees a byte string, becomes a no-op, and will never fail.

> where resultInfo is an array holding the values from a row of database
> query results, in the order I want them in.
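Peter's explanation can be modelled in a few lines of Python 3. This is a rough sketch, not Python 2's real implementation: py2_str is a hypothetical stand-in for what Python 2's str() does to each value (here bytes and str play the roles of Python 2's str and unicode), and pre_encode is Alex's comprehension pulled out into a hypothetical helper.

```python
def py2_str(value, default_encoding="ascii"):
    """Rough Python 3 model of Python 2's str() as the csv writer applies it.

    Python 2's str() returns byte strings unchanged, encodes unicode objects
    with sys.getdefaultencoding() (normally "ascii"), and formats anything
    else, such as ints, as ASCII text.
    """
    if isinstance(value, bytes):
        return value                              # already bytes: a no-op
    if isinstance(value, str):                    # plays the role of Python 2's unicode
        return value.encode(default_encoding)     # may raise UnicodeEncodeError
    return str(value).encode(default_encoding)


def pre_encode(value):
    """Alex's fix: encode text to UTF-8 up front, leave other values alone."""
    return value.encode("utf-8") if isinstance(value, str) else value


row = [u"abc\u00b5", 42]

try:
    [py2_str(info) for info in row]               # what the writer did before the fix
except UnicodeEncodeError as exc:
    print("fails:", exc)                          # ... ordinal not in range(128)

print([py2_str(pre_encode(info)) for info in row])  # [b'abc\xc2\xb5', b'42']
```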
Re: [Tutor] Urgent: unicode problems writing CSV file
On Wed, Jun 08, 2016 at 01:18:11PM -0400, Alex Hall wrote:
> I never knew that specifying that a question is related to a time-sensitive
> project is considered impolite. My apologies.

Your urgency is not our urgency. We're volunteers offering our time and experience for free. Whether you intended it or not, labelling the post "urgent" can come across as badgering us: "Hey, answer my question! For free! And do it now, not when it is convenient to you!" I'm sure that wasn't your intention, but that's how it can come across.

> The type of the 'info' variable can vary, as I'm pulling it from a
> database with Pyodbc.

/face-palm

Oy vey, that's terrible! Not your fault, but still terrible.

> I eventually found something that works, though I'm
> not fully sure why or how.
>
> csvWriter.writerow([info.encode("utf8") if type(info) is unicode else info
> for info in resultInfo])

Let's break it up into pieces. You build a list:

[blah blah blah for info in resultInfo]

then write it to the csv file with writerow. That is straightforward.

What's in the list?

info.encode("utf8") if type(info) is unicode else info

So Python looks at each item, `info`, decides if it is Unicode or not, and if it is Unicode it converts it to a byte str using encode, otherwise leaves it be.

If it makes you feel better, this is almost exactly the solution I would have come up with in your shoes, except I'd probably write a helper function to make it a bit less mysterious:

def to_str(obj):
    if isinstance(obj, unicode):
        return obj.encode('utf-8')
    elif isinstance(obj, str):
        return obj
    else:
        # Or maybe an error?
        return repr(obj)

csvWriter.writerow([to_str(info) for info in resultInfo])

--
Steve
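In Python 3 the direction flips: text is the native type and csv.writer wants it, so the awkward values are byte strings, which csv would otherwise write as their repr (b'...'). A sketch of a Python 3 counterpart to Steve's helper, assuming (hypothetically) that a database driver hands back some columns as UTF-8 bytes; to_text and row_to_csv_line are illustrative names. Unlike Steve's version it falls back to str() rather than repr(), which is a design choice made here, not something from the thread.

```python
import csv
import io


def to_text(obj, encoding="utf-8"):
    """Python 3 counterpart of Steve's to_str(): turn one value into text."""
    if isinstance(obj, str):
        return obj
    elif isinstance(obj, bytes):
        # Left alone, csv would write str(obj), i.e. the literal "b'...'".
        return obj.decode(encoding)
    elif obj is None:
        return ""
    else:
        # str() rather than repr(), so e.g. Decimal("1.5") becomes "1.5".
        return str(obj)


def row_to_csv_line(row):
    """Format one row as a CSV line in memory (hypothetical helper)."""
    buf = io.StringIO()
    csv.writer(buf).writerow([to_text(info) for info in row])
    return buf.getvalue()
```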
Re: [Tutor] Urgent: unicode problems writing CSV file
On Wed, Jun 8, 2016 at 12:53 PM Alex Hall wrote:
> All,
> I'm working on a project that writes CSV files, and I have to get it done
> very soon. I've done this before, but I'm suddenly hitting a problem with
> unicode conversions. I'm trying to write data, but getting the standard
> cannot encode character: ordinal not in range(128)

Have you tried ignoring invalid characters?

>>> data = b'\x30\x40\xff\x50'
>>> text = data.decode('utf-8')
Traceback ...
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff
>>> text = data.decode('utf-8', 'ignore')
>>> print(text)
0@P

BTW, most programming volunteers don't like responding to things marked "Urgent". It's probably not really urgent unless someone's life is in danger.
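'ignore' is not the only error handler; 'replace' keeps a visible marker where data was lost. A small Python 3 comparison using the same bytes. Note that Alex's error was raised while encoding, not decoding, and the same handlers can be passed to encode(); either way, some characters are lost.

```python
data = b"\x30\x40\xff\x50"                  # the bytes from the example above

dropped = data.decode("utf-8", "ignore")    # invalid byte silently removed: '0@P'
marked = data.decode("utf-8", "replace")    # invalid byte becomes U+FFFD

# The same handlers work when encoding, which is the direction of the
# "cannot encode character" error in the original question.
ascii_dropped = u"abc\u00b5".encode("ascii", "ignore")    # b'abc'
ascii_marked = u"abc\u00b5".encode("ascii", "replace")    # b'abc?'

print(dropped, ascii_dropped, ascii_marked)
```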
Re: [Tutor] Urgent: unicode problems writing CSV file
I never knew that specifying that a question is related to a time-sensitive project is considered impolite. My apologies.

The type of the 'info' variable can vary, as I'm pulling it from a database with Pyodbc. I eventually found something that works, though I'm not fully sure why or how.

csvWriter.writerow([info.encode("utf8") if type(info) is unicode else info for info in resultInfo])

where resultInfo is an array holding the values from a row of database query results, in the order I want them in.

On Wed, Jun 8, 2016 at 1:08 PM, Peter Otten <__pete...@web.de> wrote:
> Alex Hall wrote:
>
> Marking your posts as "urgent" is generally considered impolite and will not help you
> get answers sooner than without it.
>
> > I'm working on a project that writes CSV files, and I have to get it done
> > very soon. I've done this before, but I'm suddenly hitting a problem with
> > unicode conversions. I'm trying to write data, but getting the standard
> > cannot encode character: ordinal not in range(128)
> >
> > I've tried
> > str(info).encode("utf8")
> > str(info).decode("utf8")
> > unicode(info, "utf8")
> > csvFile = open("myFile.csv", "wb", encoding="utf-8") #invalid keyword argument
> >
> > What else can I do? As I said, I really have to get this working soon, but
> > I'm stuck on this stupid unicode thing. Any ideas will be great. Thanks.
>
> What's the type of "info"?

--
Alex Hall
Automatic Distributors, IT department
ah...@autodist.com
Re: [Tutor] Urgent: unicode problems writing CSV file
On Wed, Jun 08, 2016 at 09:54:23AM -0400, Alex Hall wrote:
> All,
> I'm working on a project that writes CSV files, and I have to get it done
> very soon. I've done this before, but I'm suddenly hitting a problem with
> unicode conversions. I'm trying to write data, but getting the standard
> cannot encode character: ordinal not in range(128)

I infer from your error that you are using Python 2. Is that right? You should say so, *especially* for Unicode problems, because Python 3 uses a very different (and much better) system for handling text strings.

Also, there is no such thing as a "standard" error. All error messages are different, and they usually show lots of debugging information that you haven't yet learned to read. But we have, so please show us the full traceback!

> I've tried
> str(info).encode("utf8")
> str(info).decode("utf8")

One of the problems with Python 2 is that it allows two nonsense operations: str.encode and unicode.decode. The whole string handling thing in Python 2 is a bit of a mess. It's over 20 years old, and dates back to before Unicode even existed, so you'll have to excuse a bit of confusion.

In Python 2:

(1) str means *byte string*, NOT text string, and is limited to "chars" with ordinal values 0 to 255;

(2) unicode means "text string";

(3) In an attempt to be helpful, Python 2 will try to automatically convert to and from byte strings as needed. This works so long as all your characters are ASCII, but leads to chaos, confusion and error as soon as you have non-ASCII characters involved.

Python 3 fixes these confusing features.

Remember two facts:

(1) To go from TEXT to BYTES (i.e. unicode -> str) use ENCODE;

(2) To go from BYTES to TEXT (i.e. str -> unicode) use DECODE.

But you must be careful to prevent Python doing those automatic conversions first.

Looking at your code:

str(info).encode("utf8")

that's wrong, because it tries to go from str->unicode using encode. But using decode also gives the same error.
That hints that the error is happening in the call to str() first.

Firstly, we need to know what info is. Run this:

print type(info)
print repr(info)
print str(info)

and report any errors and output.

I'm going to assume that info is a unicode object. Why? Because that will give the error you experience:

py> info = u'abcµ'
py> str(info)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb5' in position 3: ordinal not in range(128)

The right way to convert unicode text to a byte str is with the encode method. Unless you have good reason to use another encoding, always use UTF-8 (which I see you are doing, great).

py> info.encode('utf-8')
'abc\xc2\xb5'

If all your Unicode text strings are valid and correct, that should be all you need, but if you are paranoid and fear "invalid" Unicode strings, which can theoretically happen (ask me how if you care), you can take a belt-and-braces approach and preemptively deal with errors by converting them to question marks.

NOTE THAT THIS THROWS AWAY INFORMATION FROM YOUR UNICODE TEXT.

If your paranoia exceeds your fear of losing information, you can instruct Python to use a ? any time there is an encoding error:

info.encode('utf-8', errors='replace')

So to recap:

- you have a variable `info`, which I am guessing is unicode
- you can convert it to a byte str with:

    info.encode('utf-8')

  or for the paranoid:

    info.encode('utf-8', errors='replace')

Now that you have a byte string, you can just write it out to the CSV file.

To read it back in, you read the CSV file, which returns a byte str, and then convert back to Unicode with:

info = data.decode('utf-8')

> unicode(info, "utf8")

When you run this, what exception do you get?
My guess is that you get the following TypeError:

py> unicode(u'abc', 'utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: decoding Unicode is not supported

> csvFile = open("myFile.csv", "wb", encoding="utf-8") #invalid keyword
> argument

Python 3 allows you to set the encoding of files, Python 2 doesn't. In Python 2 you can use the io module, but note that this won't help you, as (1) the csv module doesn't support Unicode, and (2) your problem lies elsewhere.

P.S. Don't feel bad if the whole Unicode thing is confusing you. Most people go through a period of confusion, because you have to unlearn nearly everything you thought you knew about text in computers before you can really get Unicode.

--
Steve
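Steve's recap translates almost directly to Python 3, where the type names change (str is text, bytes is bytes) but the encode/decode directions do not. A sketch for checking the rules interactively; the lone surrogate is just one example of the "invalid" Unicode he alludes to, chosen here as an assumption.

```python
# Steve's recap, in Python 3, where str is text and bytes is bytes.
info = u"abc\u00b5"                         # text: 'abcµ'

data = info.encode("utf-8")                 # TEXT -> BYTES: b'abc\xc2\xb5'
restored = data.decode("utf-8")             # BYTES -> TEXT: back to 'abcµ'

# The "paranoid" variant. A lone surrogate is one kind of invalid text;
# errors="replace" writes '?' in its place, discarding that character.
bad = u"abc\ud800"
safe = bad.encode("utf-8", errors="replace")    # b'abc?'

# Python 3 drops the nonsense operations Python 2 allowed.
nonsense_allowed = hasattr(info, "decode") or hasattr(data, "encode")   # False

# Decoding something that is already text fails, as with Python 2's
# unicode(u'abc', 'utf-8').
try:
    str(info, "utf-8")
except TypeError as exc:
    print("TypeError:", exc)
```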
Re: [Tutor] Urgent: unicode problems writing CSV file
On 08/06/2016 14:54, Alex Hall wrote:
> All,
> I'm working on a project that writes CSV files, and I have to get it done
> very soon. I've done this before, but I'm suddenly hitting a problem with
> unicode conversions. I'm trying to write data, but getting the standard
> cannot encode character: ordinal not in range(128)
>
> I've tried
> str(info).encode("utf8")
> str(info).decode("utf8")
> unicode(info, "utf8")
> csvFile = open("myFile.csv", "wb", encoding="utf-8") #invalid keyword argument
>
> What else can I do? As I said, I really have to get this working soon, but
> I'm stuck on this stupid unicode thing. Any ideas will be great. Thanks.
>

This is a little tricky. I assume that you're on Python 2.x (since open() isn't taking an encoding). Deep in the bowels of the CSV module's C implementation is code which converts every item in the row it's receiving to a string. (Essentially it does: [str(x) for x in row].) That will assume ASCII: there's no opportunity to specify an encoding.

For things whose __str__ returns something ASCII-ish, that's fine. But if your data contains, or is likely to contain, non-ASCII data, you'll need to preprocess it. How you do it, and how general-purpose that approach is, will depend on your data. For the purposes of discussion, let's assume your data looks like this:

unicode, int, int

Then your encoder could do this:

def encoder_of_rows(row):
    return [row[0].encode("utf-8"), str(row[1]), str(row[2])]

and your csv processor could do this:

import csv

rows = [...]
with open("filename.csv", "wb") as f:
    writer = csv.writer(f)
    writer.writerows([encoder_of_rows(row) for row in rows])

but it could be more (or less) complex than that depending on your data and how much you know about it.

TJG
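Tim's encoder hard-codes the column layout. Since, as he says, the right approach depends on your data, one way to make it reusable is to describe the layout as a list of per-column converters. A sketch under that assumption; make_row_encoder, utf8 and ascii_number are hypothetical names. It runs under Python 3 as a model: the byte strings it produces are what Python 2's csv.writer expects, whereas Python 3's csv.writer wants text and would write bytes as b'...'.

```python
def make_row_encoder(converters):
    """Generalise Tim's encoder_of_rows: apply one converter per column.

    Each converter turns a value into a byte string, the form that
    Python 2's csv.writer can pass through str() without failing.
    """
    def encode_row(row):
        if len(row) != len(converters):
            raise ValueError("expected %d columns, got %d" % (len(converters), len(row)))
        return [convert(value) for convert, value in zip(converters, row)]
    return encode_row


def utf8(value):
    """Column holding text: encode it explicitly."""
    return value.encode("utf-8")


def ascii_number(value):
    """Column holding a number: its str() is plain ASCII, so this cannot fail."""
    return str(value).encode("ascii")


# The "unicode, int, int" layout from Tim's message.
encoder_of_rows = make_row_encoder([utf8, ascii_number, ascii_number])
print(encoder_of_rows([u"caf\u00e9", 1, 2]))   # [b'caf\xc3\xa9', b'1', b'2']
```

Checking the column count up front turns a silently misaligned row into an immediate ValueError.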
Re: [Tutor] Urgent: unicode problems writing CSV file
On Wed, Jun 8, 2016 at 1:08 PM, Peter Otten <__pete...@web.de> wrote:
> Alex Hall wrote:
>
> Marking your posts as "urgent" is generally considered impolite and will not help you
> get answers sooner than without it.
>
>> I'm working on a project that writes CSV files, and I have to get it done
>> very soon. I've done this before, but I'm suddenly hitting a problem with
>> unicode conversions. I'm trying to write data, but getting the standard
>> cannot encode character: ordinal not in range(128)
>>
>> I've tried
>> str(info).encode("utf8")
>> str(info).decode("utf8")
>> unicode(info, "utf8")
>> csvFile = open("myFile.csv", "wb", encoding="utf-8") #invalid keyword argument
>>

Can you show a small piece of code that you run, along with the full traceback? Also show what the data you are trying to write looks like.

>> What else can I do? As I said, I really have to get this working soon, but
>> I'm stuck on this stupid unicode thing. Any ideas will be great. Thanks.
>
> What's the type of "info"?

--
Joel Goldstick
http://joelgoldstick.com/blog
http://cc-baseballstats.info/stats/birthdays
Re: [Tutor] Urgent: unicode problems writing CSV file
Alex Hall wrote:

Marking your posts as "urgent" is generally considered impolite and will not help you get answers sooner than without it.

> I'm working on a project that writes CSV files, and I have to get it done
> very soon. I've done this before, but I'm suddenly hitting a problem with
> unicode conversions. I'm trying to write data, but getting the standard
> cannot encode character: ordinal not in range(128)
>
> I've tried
> str(info).encode("utf8")
> str(info).decode("utf8")
> unicode(info, "utf8")
> csvFile = open("myFile.csv", "wb", encoding="utf-8") #invalid keyword argument
>
> What else can I do? As I said, I really have to get this working soon, but
> I'm stuck on this stupid unicode thing. Any ideas will be great. Thanks.

What's the type of "info"?