Re: [Tutor] Urgent: unicode problems writing CSV file

2016-06-09 Thread CMG Thrissur




On Wednesday 08 June 2016 07:24 PM, Alex Hall wrote:

All,
I'm working on a project that writes CSV files, and I have to get it done
very soon. I've done this before, but I'm suddenly hitting a problem with
unicode conversions. I'm trying to write data, but getting the standard
cannot encode character: ordinal not in range(128)

I've tried
str(info).encode("utf8")
str(info).decode("utf8")
unicode(info, "utf8")
csvFile = open("myFile.csv", "wb", encoding="utf-8") #invalid keyword argument

What else can I do? As I said, I really have to get this working soon, but
I'm stuck on this stupid unicode thing. Any ideas will be great. Thanks.


Hi,

How about opening the file with the encoding set to utf8?

This solved most of my problems.

George
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor

Re: [Tutor] Urgent: unicode problems writing CSV file

2016-06-08 Thread Alex Hall

Thanks for all the responses, everyone, what you all said makes sense. I
also understand what you mean by the tone of an "urgent" message coming
across as demanding.

On Wed, Jun 8, 2016 at 1:19 PM, Tim Golden  wrote:

> On 08/06/2016 14:54, Alex Hall wrote:
> > All,
> > I'm working on a project that writes CSV files, and I have to get it done
> > very soon. I've done this before, but I'm suddenly hitting a problem with
> > unicode conversions. I'm trying to write data, but getting the standard
> > cannot encode character: ordinal not in range(128)
> >
> > I've tried
> > str(info).encode("utf8")
> > str(info).decode("utf8")
> > unicode(info, "utf8")
> > csvFile = open("myFile.csv", "wb", encoding="utf-8") #invalid keyword argument
> >
> > What else can I do? As I said, I really have to get this working soon, but
> > I'm stuck on this stupid unicode thing. Any ideas will be great. Thanks.
> >
>
> This is a little tricky. I assume that you're on Python 2.x (since
> open() isn't taking an encoding). Deep in the bowels of the CSV module's
> C implementation is code which converts every item in the row it's
> receiving to a string. (Essentially does: [str(x) for x in row]). Which
> will assume ascii: there's no opportunity to specify an encoding.
>
> For things whose __str__ returns something ascii-ish, that's fine. But
> if your data does or is likely to contain non-ascii data, you'll need to
> preprocess it. How you do it, and how general-purpose that approach is
> will depend on your data. For the purposes of discussion, let's assume
> your data looks like this:
>
> unicode, int, int
>
> Then your encoder could do this:
>
> def encoder_of_rows(row):
>   return [row[0].encode("utf-8"), str(row[1]), str(row[2])]
>
> and your csv processor could do this:
>
> import csv
>
> rows = [...]
> with open("filename.csv", "wb") as f:
>   writer = csv.writer(f)
>   writer.writerows([encoder_of_rows(row) for row in rows])
>
>
> but it could be more (or less) complex than that depending on your data
> and how much you know about it.
>
> TJG



-- 
Alex Hall
Automatic Distributors, IT department
ah...@autodist.com

Re: [Tutor] Urgent: unicode problems writing CSV file

2016-06-08 Thread Albert-Jan Roskam


> Date: Thu, 9 Jun 2016 03:57:21 +1000
> From: st...@pearwood.info
> To: tutor@python.org
> Subject: Re: [Tutor] Urgent: unicode problems writing CSV file
> 
> On Wed, Jun 08, 2016 at 01:18:11PM -0400, Alex Hall wrote:



>> 
>> csvWriter.writerow([info.encode("utf8") if type(info)is unicode else info
>> for info in resultInfo])
> 
> Let's break it up into pieces. You build a list:
> 
> [blah blah blah for info in resultInfo]
> 
> then write it to the csv file with writerow. That is straightforward.
> 
> What's in the list?
> 
> info.encode("utf8") if type(info)is unicode else info
> 
> So Python looks at each item, `info`, decides if it is Unicode or not, 
> and if it is Unicode it converts it to a byte str using encode, 
> otherwise leaves it be.
> 
> If it makes you feel better, this is almost exactly the solution I would 
> have come up with in your shoes, except I'd probably write a helper 
> function to make it a bit less mysterious:
> 
> def to_str(obj):
>     if isinstance(obj, unicode):
>         return obj.encode('utf-8')
>     elif isinstance(obj, str):
>         return obj
>     else:
>         # Or maybe an error?
>         return repr(obj)
> 
> csvWriter.writerow([to_str(info) for info in resultInfo])
>

The docs also offer some code for working with Unicode/CSV, see the bottom of 
this page: 
https://docs.python.org/2/library/csv.html


  

Re: [Tutor] Urgent: unicode problems writing CSV file

2016-06-08 Thread Peter Otten

Alex Hall wrote:

> The type  of the 'info' variable can vary, as I'm pulling it from a
> database with Pyodbc. I eventually found something that works, though I'm
> not fully sure why or how.

As Tim says, the csv.writer in Python 2 applies str() to every value.

If that value is a unicode instance this results in

value.encode(sys.getdefaultencoding())

In every sanely configured Python installation the default encoding is 
"ascii", hence the UnicodeEncodeError. If you manually encode for unicode 
objects (with an encoding that can encode all its codepoints)

> csvWriter.writerow([info.encode("utf8") if type(info)is unicode else info
> for info in resultInfo])

the str(value) that follows sees a byte string, becomes a noop and will 
never fail.

> where resultInfo is an array holding the values from a row of database
> query results, in the order I want them in.
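
The mechanism described above can be reproduced without the csv module. A minimal sketch, assuming Python 3 (where str is already text), that imitates the implicit ASCII encode Python 2 performs inside str(); the sample value is made up for illustration:

```python
# Imitate what Python 2's str(unicode_value) does under the hood:
# an implicit encode with the default codec, which is ASCII there.
value = u"abc\u00b5"  # made-up sample: 'abc' followed by a micro sign

try:
    value.encode("ascii")
    implicit_ok = True
except UnicodeEncodeError:
    # This is the "ordinal not in range(128)" failure from the thread.
    implicit_ok = False

# Encoding explicitly with a codec that covers the character succeeds.
explicit = value.encode("utf-8")
```

Once the value is already a byte string, Python 2's str() has nothing left to convert, which is why encoding first avoids the error.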


Re: [Tutor] Urgent: unicode problems writing CSV file

2016-06-08 Thread Steven D'Aprano

On Wed, Jun 08, 2016 at 01:18:11PM -0400, Alex Hall wrote:
> I never knew that specifying a question is related to a time-sensitive
> project is considered impolite. My apologies.

Your urgency is not our urgency. We're volunteers offering our time and 
experience for free. Whether you intended it or not, labelling the post 
"urgent" can come across as badgering us:

"Hey, answer my question! For free! And do it now, not when it is 
convenient to you!"

I'm sure that wasn't your intention, but that's how it can come across.

> The type  of the 'info' variable can vary, as I'm pulling it from a
> database with Pyodbc.

/face-palm

Oy vey, that's terrible! Not your fault, but still terrible.

> I eventually found something that works, though I'm
> not fully sure why or how.
> 
>   csvWriter.writerow([info.encode("utf8") if type(info)is unicode else info
> for info in resultInfo])

Let's break it up into pieces. You build a list:

[blah blah blah for info in resultInfo]

then write it to the csv file with writerow. That is straightforward.

What's in the list?

info.encode("utf8") if type(info)is unicode else info

So Python looks at each item, `info`, decides if it is Unicode or not, 
and if it is Unicode it converts it to a byte str using encode, 
otherwise leaves it be.

If it makes you feel better, this is almost exactly the solution I would 
have come up with in your shoes, except I'd probably write a helper 
function to make it a bit less mysterious:

def to_str(obj):
    if isinstance(obj, unicode):
        return obj.encode('utf-8')
    elif isinstance(obj, str):
        return obj
    else:
        # Or maybe an error?
        return repr(obj)

csvWriter.writerow([to_str(info) for info in resultInfo])
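
For comparison, here is a rough Python 3 counterpart of that helper. This is only a sketch under stated assumptions: the name to_text, the sample row, and the choice to treat byte strings as UTF-8 and None as an empty cell are all illustrative, not something from the original code. In Python 3 the csv module accepts text directly, but it would write a bytes value as its repr (b'...'), so the helper decodes bytes rather than encoding text.

```python
import csv
import io


def to_text(obj):
    """Return a text value that csv.writer can write without surprises."""
    if obj is None:
        return ""  # assumption: NULL database values become empty cells
    if isinstance(obj, bytes):
        # Assumption: any byte strings in the data are UTF-8 encoded.
        return obj.decode("utf-8")
    return str(obj)


buffer = io.StringIO()  # in-memory stand-in for a real file
writer = csv.writer(buffer, lineterminator="\n")

# Hypothetical row mixing text, bytes, an int and a NULL.
sample_row = ["abc\u00b5", b"caf\xc3\xa9", 42, None]
writer.writerow([to_text(item) for item in sample_row])
```

Whether unexpected types should be converted with str() or rejected with an error is the same judgment call raised in the comment above.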

-- 
Steve

Re: [Tutor] Urgent: unicode problems writing CSV file

2016-06-08 Thread Michael Selik

On Wed, Jun 8, 2016 at 12:53 PM Alex Hall  wrote:

> All,
> I'm working on a project that writes CSV files, and I have to get it done
> very soon. I've done this before, but I'm suddenly hitting a problem with
> unicode conversions. I'm trying to write data, but getting the standard
> cannot encode character: ordinal not in range(128)
>

Have you tried ignoring invalid characters?

>>> data = b'\x30\x40\xff\x50'
>>> text = data.decode('utf-8')
Traceback ... UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff
>>> text = data.decode('utf-8', 'ignore')
>>> print(text)
0@P
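
Note that 'ignore' silently discards the offending byte. A small sketch, assuming Python 3 and reusing the same sample bytes, that contrasts it with the 'replace' handler, which leaves a visible marker instead:

```python
data = b"\x30\x40\xff\x50"  # same bytes as above; 0xff is never valid in UTF-8

ignored = data.decode("utf-8", "ignore")    # drops the bad byte silently
replaced = data.decode("utf-8", "replace")  # substitutes U+FFFD for it
```

With 'replace' the damage stays visible in the output, which can make bad input easier to trace later.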

BTW, most programming volunteers don't like responding to things marked
"Urgent". It's probably not really urgent unless someone's life is in
danger.

Re: [Tutor] Urgent: unicode problems writing CSV file

2016-06-08 Thread Alex Hall

I never knew that specifying a question is related to a time-sensitive
project is considered impolite. My apologies.

The type  of the 'info' variable can vary, as I'm pulling it from a
database with Pyodbc. I eventually found something that works, though I'm
not fully sure why or how.

  csvWriter.writerow([info.encode("utf8") if type(info)is unicode else info
for info in resultInfo])

where resultInfo is an array holding the values from a row of database
query results, in the order I want them in.


On Wed, Jun 8, 2016 at 1:08 PM, Peter Otten <__pete...@web.de> wrote:

> Alex Hall wrote:
>
> Marking your posts "urgent" is generally considered impolite and will not help you
> get answers sooner than without it.
>

> > I'm working on a project that writes CSV files, and I have to get it done
> > very soon. I've done this before, but I'm suddenly hitting a problem with
> > unicode conversions. I'm trying to write data, but getting the standard
> > cannot encode character: ordinal not in range(128)
> >
> > I've tried
> > str(info).encode("utf8")
> > str(info).decode("utf8")
> > unicode(info, "utf8")
> > csvFile = open("myFile.csv", "wb", encoding="utf-8") #invalid keyword argument
> >
> > What else can I do? As I said, I really have to get this working soon, but
> > I'm stuck on this stupid unicode thing. Any ideas will be great. Thanks.
>
> What's the type of "info"?
>
>
>



-- 
Alex Hall
Automatic Distributors, IT department
ah...@autodist.com

Re: [Tutor] Urgent: unicode problems writing CSV file

2016-06-08 Thread Steven D'Aprano

On Wed, Jun 08, 2016 at 09:54:23AM -0400, Alex Hall wrote:
> All,
> I'm working on a project that writes CSV files, and I have to get it done
> very soon. I've done this before, but I'm suddenly hitting a problem with
> unicode conversions. I'm trying to write data, but getting the standard
> cannot encode character: ordinal not in range(128)

I infer from your error that you are using Python 2. Is that right? You 
should say so, *especially* for Unicode problems, because Python 3 uses 
a very different (and much better) system for handling text strings.

Also, there is no such thing as a "standard" error. All error messages 
are different, and they usually show lots of debugging information that 
you haven't yet learned to read. But we have, so please show us the full 
traceback!

> I've tried
> str(info).encode("utf8")
> str(info).decode("utf8")

One of the problems with Python 2 is that it allows two nonsense 
operations: str.encode and unicode.decode. The whole string handling 
thing in Python 2 is a bit of a mess. It's over 20 years old, and dates 
back to before Unicode even existed, so you'll have to excuse a bit of 
confusion. In Python 2:

(1) str means *byte string*, NOT text string, and is limited 
to "chars" with ordinal values 0 to 255;

(2) unicode means "text string";

(3) In an attempt to be helpful, Python 2 will try to automatically
convert to and from bytes strings as needed. This works so long
as all your characters are ASCII, but leads to chaos, confusion
and error as soon as you have non-ASCII characters involved.

Python 3 fixes these confusing features.

Remember two facts:

(1) To go from TEXT to BYTES (i.e. unicode -> str) use ENCODE;

(2) To go from BYTES to TEXT (i.e. str -> unicode) use DECODE.

but you must be careful to prevent Python doing those automatic 
conversions first.
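
The two facts can be checked with a short round trip. This is a minimal sketch with a made-up sample string; the u"" prefix is used so that the same lines should behave the same way on Python 2.7 and Python 3:

```python
text = u"abc\u00b5"                # TEXT: four characters, the last one non-ASCII
data = text.encode("utf-8")        # TEXT -> BYTES with ENCODE
round_trip = data.decode("utf-8")  # BYTES -> TEXT with DECODE

# The micro sign takes two bytes in UTF-8, so the byte string is longer.
text_length = len(text)
byte_length = len(data)
```

The differing lengths are a handy reminder that characters and bytes are not the same thing.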

Looking at your code:

str(info).encode("utf8")

that's wrong, because it tries to go from str->unicode using encode. But 
using decode also gives the same error. That hints that the error is 
happening in the call to str() first.

Firstly, we need to know what info is. Run this:

print type(info)
print repr(info)
print str(info)

and report any errors and output. I'm going to assume that info is a 
unicode object. Why? Because that will give the error you experience:

py> info = u'abcµ'
py> str(info)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb5' in 
position 3: ordinal not in range(128)

The right way to convert unicode text to a byte str is with the encode 
method. Unless you have good reason to use another encoding, always use 
UTF-8 (which I see you are doing, great).

py> info.encode('utf-8')
'abc\xc2\xb5'

If all your Unicode text strings are valid and correct, that should be 
all you need, but if you are paranoid and fear "invalid" Unicode 
strings, which can theoretically happen (ask me how if you care), you 
can take a belt-and-braces approach and preemptively deal with errors by 
converting them to question marks.

NOTE THAT THIS THROWS AWAY INFORMATION FROM YOUR UNICODE TEXT.

If your paranoia exceeds your fear of losing information, you can 
instruct Python to use a ? any time there is an encoding error:

info.encode('utf-8', errors='replace')
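
To see what the 'replace' handler actually does, it helps to compare two codecs. A small sketch with a made-up sample string: UTF-8 can represent the micro sign, so nothing is replaced there, while ASCII cannot, so the question mark appears.

```python
text = u"abc\u00b5"  # made-up sample: 'abc' followed by a micro sign

# UTF-8 can represent every character in this string, so nothing is replaced.
utf8_bytes = text.encode("utf-8", errors="replace")

# ASCII cannot represent the micro sign, so it becomes "?".
ascii_bytes = text.encode("ascii", errors="replace")
```

In other words, with UTF-8 the handler only matters for the rare invalid strings mentioned above.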

So to recap:

- you have a variable `info`, which I am guessing is unicode

- you can convert it to a byte str with:

info.encode('utf-8')

  or for the paranoid:

info.encode('utf-8', errors='replace')

Now that you have a byte string, you can just write it out to the CSV 
file.

To read it back in, you read the CSV file, which returns a byte str, and 
then convert back to Unicode with:

info = data.decode('utf-8')

> unicode(info, "utf8")

When you run this, what exception do you get? My guess is that you get 
the following TypeError:

py> unicode(u'abc', 'utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: decoding Unicode is not supported

> csvFile = open("myFile.csv", "wb", encoding="utf-8") #invalid keyword argument

Python 3 allows you to set the encoding of files, Python 2 doesn't. In 
Python 2 you can use the io module, but note that this won't help you as 
(1) the csv module doesn't support Unicode, and (2) your problem lies 
elsewhere.
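
For comparison only, this is roughly what the same task looks like in Python 3, where open() takes an encoding and the csv module works with text. It is a sketch under assumptions: the sample rows and the temporary file name are made up, and it does not apply to the Python 2 setup discussed in this thread.

```python
import csv
import os
import tempfile

rows = [["name", "unit"], ["micro", "\u00b5"]]  # made-up sample data

# A temporary path is used so the sketch does not overwrite anything.
path = os.path.join(tempfile.mkdtemp(), "example.csv")

# Text mode with an explicit encoding; newline="" is what the csv
# documentation recommends when opening files for the csv module.
with open(path, "w", encoding="utf-8", newline="") as f:
    csv.writer(f).writerows(rows)

# Read the raw bytes back to confirm the file really is UTF-8.
with open(path, "rb") as f:
    raw = f.read()
```

No per-item encoding step is needed here because the file object does the encoding when the text is written.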

P.S. don't feel bad if the whole Unicode thing is confusing you. Most 
people go through a period of confusion, because you have to unlearn 
nearly everything you thought you knew about text in computers before 
you can really get Unicode.

-- 
Steve

Re: [Tutor] Urgent: unicode problems writing CSV file

2016-06-08 Thread Tim Golden

On 08/06/2016 14:54, Alex Hall wrote:
> All,
> I'm working on a project that writes CSV files, and I have to get it done
> very soon. I've done this before, but I'm suddenly hitting a problem with
> unicode conversions. I'm trying to write data, but getting the standard
> cannot encode character: ordinal not in range(128)
> 
> I've tried
> str(info).encode("utf8")
> str(info).decode("utf8")
> unicode(info, "utf8")
> csvFile = open("myFile.csv", "wb", encoding="utf-8") #invalid keyword argument
> 
> What else can I do? As I said, I really have to get this working soon, but
> I'm stuck on this stupid unicode thing. Any ideas will be great. Thanks.
> 

This is a little tricky. I assume that you're on Python 2.x (since
open() isn't taking an encoding). Deep in the bowels of the CSV module's
C implementation is code which converts every item in the row it's
receiving to a string. (Essentially does: [str(x) for x in row]). Which
will assume ascii: there's no opportunity to specify an encoding.

For things whose __str__ returns something ascii-ish, that's fine. But
if your data does or is likely to contain non-ascii data, you'll need to
preprocess it. How you do it, and how general-purpose that approach is
will depend on your data. For the purposes of discussion, let's assume
your data looks like this:

unicode, int, int

Then your encoder could do this:

def encoder_of_rows(row):
  return [row[0].encode("utf-8"), str(row[1]), str(row[2])]

and your csv processor could do this:

import csv

rows = [...]
with open("filename.csv", "wb") as f:
  writer = csv.writer(f)
  writer.writerows([encoder_of_rows(row) for row in rows])

but it could be more (or less) complex than that depending on your data
and how much you know about it.

TJG

Re: [Tutor] Urgent: unicode problems writing CSV file

2016-06-08 Thread Joel Goldstick

On Wed, Jun 8, 2016 at 1:08 PM, Peter Otten <__pete...@web.de> wrote:
> Alex Hall wrote:
>
> Marking your posts "urgent" is generally considered impolite and will not help you
> get answers sooner than without it.
>
>> I'm working on a project that writes CSV files, and I have to get it done
>> very soon. I've done this before, but I'm suddenly hitting a problem with
>> unicode conversions. I'm trying to write data, but getting the standard
>> cannot encode character: ordinal not in range(128)
>>
>> I've tried
>> str(info).encode("utf8")
>> str(info).decode("utf8")
>> unicode(info, "utf8")
>> csvFile = open("myFile.csv", "wb", encoding="utf-8") #invalid keyword argument
>>
Can you show a small piece of code that you run with the full
traceback.  Show what your data to be printed looks like

>> What else can I do? As I said, I really have to get this working soon, but
>> I'm stuck on this stupid unicode thing. Any ideas will be great. Thanks.
>
> What's the type of "info"?
>
>



-- 
Joel Goldstick
http://joelgoldstick.com/blog
http://cc-baseballstats.info/stats/birthdays

Re: [Tutor] Urgent: unicode problems writing CSV file

2016-06-08 Thread Peter Otten

Alex Hall wrote:

Marking your posts "urgent" is generally considered impolite and will not help you 
get answers sooner than without it.

> I'm working on a project that writes CSV files, and I have to get it done
> very soon. I've done this before, but I'm suddenly hitting a problem with
> unicode conversions. I'm trying to write data, but getting the standard
> cannot encode character: ordinal not in range(128)
> 
> I've tried
> str(info).encode("utf8")
> str(info).decode("utf8")
> unicode(info, "utf8")
> csvFile = open("myFile.csv", "wb", encoding="utf-8") #invalid keyword argument
> 
> What else can I do? As I said, I really have to get this working soon, but
> I'm stuck on this stupid unicode thing. Any ideas will be great. Thanks.

What's the type of "info"?


