I had a similar problem, but I can't write the bytes of an uploaded file to a file without damaging the data. If I use utf-8 to encode the file, its size doubles. I tried changing the codec to raw_unicode_escape, which gives me almost the correct size but still damages the file. I am using Python 3 and I have to encode the file again.

On Oct 9, 2010 11:39pm, Chris Rebert <creb...@ucsd.edu> wrote:
On Sat, Oct 9, 2010 at 4:59 PM, Brian Blais <bbl...@bryant.edu> wrote:

> This may be stemming from my complete ignorance of unicode, but when I do this (Python 2.6):
>
> s='\xc2\xa9 2008 \r\n'
>
> and I want the ascii version of it, ignoring any non-ascii chars, I thought I could do:
>
> s.encode('ascii','ignore')
>
> but it gives the error:
>
> In [20]:s.encode('ascii','ignore')
> ----------------------------------------------------------------------------
> UnicodeDecodeError Traceback (most recent call last)
>
> /Users/bblais/python/doit100810a.py in ()
> ----> 1
> 2
> 3
> 4
> 5
>
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)
>
> am I doing something stupid here?



In addition to Benjamin's explanation:

Unicode strings in Python 2.x are of type `unicode` and written with a
leading "u"; e.g. u"A unicode string for ¥500". Byte strings lack the
leading "u"; e.g. "A plain byte string". Note that "Unicode string"
does not refer to strings which have been encoded using a Unicode
encoding (e.g. UTF-8); such strings are still byte strings, for
encodings emit bytes.
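
To make the distinction concrete, here is a minimal sketch. It is written
in Python 3 spelling so it runs on a current interpreter, which is my
choice and not part of the original post: Python 3's `str` plays the role
of Python 2's `unicode`, and Python 3's `bytes` plays the role of
Python 2's plain `str`. The variable names are only illustrative.

```python
# Python 3 spelling. In Python 2.x the same two types are
# `unicode` (written u"...") and `str` (written "...").
text = "A unicode string for \u00a5500"   # text: a sequence of code points
data = text.encode("utf-8")               # bytes: the UTF-8 encoding of that text

# The encoded result is a byte string, not a "Unicode string".
print(type(text).__name__, len(text))
print(type(data).__name__, len(data))

# Decoding with the same codec recovers the original text.
round_trip = data.decode("utf-8")
print(round_trip == text)
```

The yen sign is one character in `text` but two bytes in `data`, which is
why the two lengths differ.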



As to why you got the /exact/ error you did:
As a backward compatibility hack, in order to satisfy your nonsensical
encoding request, Python implicitly tried to decode the byte string
`s` using ASCII as a default (the choice of ASCII here has nothing to
do with the fact that you specified ASCII in your encoding request),
so that it could then try to encode the resulting unicode string.
That is why you got a Unicode*De*codeError as opposed to a
Unicode*En*codeError, despite the fact you called *en*code().
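
The hidden decode step can be spelled out explicitly. The sketch below is
in Python 3 spelling (so `s` is a `bytes` literal), and it assumes the
bytes in `s` are UTF-8, which the original post does not state; the
sequence \xc2\xa9 is the UTF-8 encoding of the copyright sign, so that
seems a reasonable guess. The names `error`, `text` and `ascii_only` are
only illustrative.

```python
s = b'\xc2\xa9 2008 \r\n'   # the bytes from the original post

# Step Python 2 performed implicitly before encoding:
# decode the byte string with the default ASCII codec.
error = None
try:
    s.decode('ascii')
except UnicodeDecodeError as exc:
    error = exc
print(error)

# Doing it explicitly: decode with the real encoding (assumed UTF-8),
# then encode to ASCII, dropping anything that does not fit.
text = s.decode('utf-8')
ascii_only = text.encode('ascii', 'ignore')
print(repr(ascii_only))
```

In Python 2.x the last two lines would be written the same way on the
original `s`, i.e. s.decode('utf-8').encode('ascii', 'ignore').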



Highly suggested further reading:
"The Absolute Minimum Every Software Developer Absolutely, Positively
Must Know About Unicode and Character Sets (No Excuses!)"
http://www.joelonsoftware.com/articles/Unicode.html

Cheers,
Chris

--

http://mail.python.org/mailman/listinfo/python-list
