I had a similar problem: I can't encode the bytes of an uploaded file
without damaging the data. If I use utf-8 to encode the file, its size
doubles. I tried changing the codec to raw_unicode_escape, which gives
me almost the correct size but still damages the file. I am using
Python 3 and I have to encode the file again.
On Oct 9, 2010 11:39pm, Chris Rebert <creb...@ucsd.edu> wrote:
On Sat, Oct 9, 2010 at 4:59 PM, Brian Blais <bbl...@bryant.edu> wrote:
> This may be stemming from my complete ignorance of unicode, but when
> I do this (Python 2.6):
>
> s='\xc2\xa9 2008 \r\n'
>
> and I want the ascii version of it, ignoring any non-ascii chars, I
> thought I could do:
>
> s.encode('ascii','ignore')
>
> but it gives the error:
>
> In [20]: s.encode('ascii','ignore')
> ---------------------------------------------------------------------------
> UnicodeDecodeError Traceback (most recent call last)
>
> /Users/bblais/python/doit100810a.py in ()
> ----> 1
> 2
> 3
> 4
> 5
>
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0:
> ordinal not in range(128)
>
> am I doing something stupid here?
In addition to Benjamin's explanation:
Unicode strings in Python are of type `unicode` and written with a
leading "u"; eg u"A unicode string for ¥500". Byte strings lack the
leading "u"; eg "A plain byte string". Note that "Unicode string"
does not refer to strings which have been encoded using a Unicode
encoding (eg UTF-8); such strings are still byte strings, for
encodings emit bytes.
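For anyone reading this on Python 3, here is a minimal sketch of the same
distinction. It assumes Python 3 naming, where the 2.x `unicode` type is
called `str` and the 2.x byte string is called `bytes`; the variable names
are only illustrative.

```python
# Python 3 sketch: str holds text (Unicode), bytes holds raw encoded data.
text = "A unicode string for \u00a5500"   # \u00a5 is the yen sign
data = text.encode("utf-8")               # encoding text always yields bytes

print(type(text).__name__)   # the text type
print(type(data).__name__)   # the byte-string type

# The yen sign is one character but two bytes in UTF-8,
# so the encoded form is one byte longer than the text.
print(len(text), len(data))

# Decoding with the same codec recovers the original text.
roundtrip = data.decode("utf-8")
```

The encoded value is a byte string even though UTF-8 is a Unicode encoding,
which is the point made above.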
As to why you got the /exact/ error you did:
As a backward compatibility hack, in order to satisfy your nonsensical
encoding request, Python implicitly tried to decode the byte string
`s` using ASCII as a default (the choice of ASCII here has nothing to
do with the fact that you specified ASCII in your encoding request),
so that it could then try and encode the resulting unicode string;
hence why you got a Unicode*De*codeError as opposed to a
Unicode*En*codeError, despite the fact you called *en*code().
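The two steps can be spelled out explicitly. This is only a sketch written
for Python 3, where the implicit decode no longer happens, and it assumes
the bytes in `s` are UTF-8 (0xc2 0xa9 is the UTF-8 form of the copyright
sign); the names `error`, `text` and `ascii_only` are illustrative.

```python
s = b'\xc2\xa9 2008 \r\n'

# Step Python 2 performed behind the scenes: decode the bytes as ASCII.
# Byte 0xc2 is outside the ASCII range, so this is where the error arises.
try:
    s.decode('ascii')
except UnicodeDecodeError as exc:
    error = exc
    print(error)

# Explicit version: decode with the real encoding (assumed UTF-8 here),
# then encode to ASCII and drop whatever does not fit.
text = s.decode('utf-8')
ascii_only = text.encode('ascii', 'ignore')
print(repr(ascii_only))   # b' 2008 \r\n'
```

Decoding first with the correct codec is what makes the later
encode('ascii', 'ignore') call meaningful.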
Highly suggested further reading:
"The Absolute Minimum Every Software Developer Absolutely, Positively
Must Know About Unicode and Character Sets (No Excuses!)"
http://www.joelonsoftware.com/articles/Unicode.html
Cheers,
Chris
--
http://mail.python.org/mailman/listinfo/python-list