Re: [Tutor] Assistance with UnicodeDecodeError

2015-02-04 Thread James Chapman
Actually, it's more likely that the character you are grabbing was encoded
as UTF-16, not UTF-8, which takes it into double-byte territory...
* An assumption based on the following output:

>>> u = u'\u2014'
>>> s = u.encode('utf-16')
>>> print(s)
 ■¶
>>> s = u.encode('utf-32')
>>> print(s)
 ■  ¶
>>> s = u.encode('utf-16LE')
>>> print(s)
¶
>>> s = u.encode('utf-16BE')
>>> print(s)
 ¶

See https://en.wikipedia.org/wiki/Character_encoding to help with the
understanding of character encoding, code pages and why they are important.
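
The console output further up is hard to read because the raw bytes are being drawn through the terminal's code page. As a rough sketch (Python 3 syntax assumed, whereas the session above is Python 2.7; `encoded_forms` is a made-up helper name), printing the bytes as hex shows how many bytes each codec uses for U+2014:

```python
# Sketch: show the raw bytes each codec produces for U+2014 (EM DASH).
import binascii


def encoded_forms(ch):
    """Map codec name -> hex string of the bytes that codec produces for ch."""
    codecs_to_try = ("utf-8", "utf-16", "utf-16-le", "utf-16-be", "utf-32")
    return {
        name: binascii.hexlify(ch.encode(name)).decode("ascii")
        for name in codecs_to_try
    }


if __name__ == "__main__":
    for name, hexed in encoded_forms(u"\u2014").items():
        # Two hex digits per byte, so len(hexed) // 2 is the byte count.
        print("%-10s %d bytes  %s" % (name, len(hexed) // 2, hexed))
```

The plain utf-16 and utf-32 codecs prepend a byte order mark, which is why they come out longer than the explicit -le and -be variants.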





James
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Assistance with UnicodeDecodeError

2015-02-04 Thread James Chapman

 I am trying to scrape text from a website using Python 2.7 on Windows 8 and
 I am getting this error: UnicodeDecodeError: 'charmap' codec can't encode
 character u'\u2014' in position 11231: character maps to <undefined>


For starters, move away from Python 2 unless you have a good reason to use
it. Unicode is built into Python 3, whereas it's an afterthought in Python
2.

What's happening is that Python can't map the character into the character
set in use, so it throws the exception. You need to tell Python what
encoding to use (not all websites are UTF-8):


Code example (using python 2.7):

>>> u = u'\u2014'
>>> print(u)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\Python27\lib\encodings\cp850.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u2014' in
position 0: character maps to <undefined>
>>> s = u.encode('utf-8')
>>> print(s)
ÔÇö
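
If the goal is to keep the scraped text rather than print it to a console that can't display it, one option is to write it to a file with an explicit encoding. This is only a sketch under assumptions: Python 3 syntax, and `save_text`, `load_text` and the file name `scraped.txt` are made-up names, not anything from the original poster's code.

```python
# Sketch: write text out with an explicit encoding instead of relying on
# whatever the console or platform default happens to be.
import io
import os
import tempfile


def save_text(text, path, encoding="utf-8"):
    """Write text to path, encoding it explicitly."""
    with io.open(path, "w", encoding=encoding) as fh:
        fh.write(text)


def load_text(path, encoding="utf-8"):
    """Read the file back, decoding with the same explicit encoding."""
    with io.open(path, "r", encoding=encoding) as fh:
        return fh.read()


if __name__ == "__main__":
    text = u"before \u2014 after"
    try:
        text.encode("cp850")  # the code page from the traceback above
    except UnicodeEncodeError as exc:
        print("cp850 cannot encode it: %s" % exc.reason)

    path = os.path.join(tempfile.mkdtemp(), "scraped.txt")  # hypothetical name
    save_text(text, path)
    print(load_text(path) == text)
```

UTF-8 can represent every Unicode character, so the write itself cannot hit the "character maps to undefined" failure.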



I also strongly suggest you read:
https://docs.python.org/2/howto/unicode.html

There is much cursing to come. Unicode and especially multi-byte character
string processing is a nightmare!
Good luck ;-)

James


Re: [Tutor] Assistance with UnicodeDecodeError

2015-02-02 Thread Cristian Di Stefano

Hi Dave,

you should set the correct encoding (maybe UTF-8) in order to handle
data from the web. You cannot handle Unicode data with a simple string; you
should encode to ASCII or manage the data with the unicode type.
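
One way to read this advice is to decode the downloaded bytes once, as soon as they arrive, and work with text from then on. A minimal sketch, assuming Python 3 and that the page's charset is known from somewhere such as an HTTP header; `decode_page` and its parameters are hypothetical names:

```python
# Sketch: turn raw bytes from the web into text at the boundary.
def decode_page(raw, declared_charset=None, fallback="utf-8"):
    """Decode raw bytes using the declared charset, falling back if needed."""
    charset = declared_charset or fallback
    try:
        return raw.decode(charset)
    except (UnicodeDecodeError, LookupError):
        # Wrong or unknown charset: keep going, but mark each undecodable
        # byte sequence with U+FFFD so the damage stays visible.
        return raw.decode(fallback, errors="replace")


if __name__ == "__main__":
    raw = u"price \u2014 10".encode("utf-8")  # stand-in for a downloaded page
    text = decode_page(raw, "utf-8")
    print(len(raw), len(text))  # byte count versus character count
```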


Best
Cristian

On 31/01/2015 23:44, Dave Angel wrote:

On 01/31/2015 08:37 AM, J Mberia wrote:

Hi,



Welcome to Python tutor.  Thanks for posting using text email, and for 
specifying both your Python version and Operating system.



I am teaching myself programming in Python and need assistance with a
UnicodeDecodeError.

I am trying to scrape text from a website using Python 2.7 on Windows 8 and
I am getting this error: UnicodeDecodeError: 'charmap' codec can't encode
character u'\u2014' in position 11231: character maps to <undefined>

*How do i resolve? Pls assist.*



You can start by posting the whole error message, including the stack 
trace.  Then you probably should include an appropriate segment of 
your code.


The message means that you've got some invalid characters that you're 
trying to convert.  That can either be that the data is invalid, or 
that you're specifying the wrong encoding, directly or implicitly.







Re: [Tutor] Assistance with UnicodeDecodeError

2015-02-02 Thread Dave Angel

On 02/02/2015 02:52 AM, Cristian Di Stefano wrote:

Hi Dave,

you should set the correct encoding (maybe UTF-8) in order to handle
data from the web. You cannot handle Unicode data with a simple string; you
should encode to ASCII or manage the data with the unicode type.

Best
Cristian



Please don't top-post, as it confuses who wrote what part and in what 
sequence.  But I can see you're already confused, as you're addressing 
me when replying to J Mberia.


In any case, one cannot encode to ASCII, so you have to be much more 
explicit in what you're trying to say.  Or just wait till the OP 
clarifies his own code.
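
To illustrate, assuming the point is that U+2014 has no ASCII representation: a strict ASCII encode fails, and the standard error handlers only substitute or drop the character. This is a Python 3 sketch and `ascii_attempts` is a made-up helper name.

```python
# Sketch: what happens when a non-ASCII character is encoded to ASCII.
def ascii_attempts(text):
    """Return the result of encoding text to ASCII under each error handler."""
    results = {}
    try:
        results["strict"] = text.encode("ascii")
    except UnicodeEncodeError as exc:
        results["strict"] = "error: %s" % exc.reason
    for handler in ("replace", "ignore", "xmlcharrefreplace", "backslashreplace"):
        # None of these preserve the character; they swap in a stand-in.
        results[handler] = text.encode("ascii", errors=handler)
    return results


if __name__ == "__main__":
    for handler, outcome in ascii_attempts(u"a\u2014b").items():
        print("%-18s %r" % (handler, outcome))
```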




On 31/01/2015 23:44, Dave Angel wrote:

On 01/31/2015 08:37 AM, J Mberia wrote:

Hi,





Out of sequence quote elided


--
DaveA


[Tutor] Assistance with UnicodeDecodeError

2015-01-31 Thread J Mberia
Hi,

I am teaching myself programming in Python and need assistance with a
UnicodeDecodeError.

I am trying to scrape text from a website using Python 2.7 on Windows 8 and
I am getting this error: UnicodeDecodeError: 'charmap' codec can't encode
character u'\u2014' in position 11231: character maps to <undefined>

*How do i resolve? Pls assist.*

*Jerry*


Re: [Tutor] Assistance with UnicodeDecodeError

2015-01-31 Thread Dave Angel

On 01/31/2015 08:37 AM, J Mberia wrote:

Hi,



Welcome to Python tutor.  Thanks for posting using text email, and for 
specifying both your Python version and Operating system.



I am teaching myself programming in Python and need assistance with a
UnicodeDecodeError.

I am trying to scrape text from a website using Python 2.7 on Windows 8 and
I am getting this error: UnicodeDecodeError: 'charmap' codec can't encode
character u'\u2014' in position 11231: character maps to <undefined>

*How do i resolve? Pls assist.*



You can start by posting the whole error message, including the stack 
trace.  Then you probably should include an appropriate segment of your 
code.


The message means that you've got some invalid characters that you're 
trying to convert.  That can either be that the data is invalid, or that 
you're specifying the wrong encoding, directly or implicitly.
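
The two causes can be told apart with a small experiment. This sketch covers the decoding direction only, assumes Python 3, and uses a made-up helper name `diagnose`; the byte strings are constructed for illustration and are not the original poster's data.

```python
# Sketch: damaged data versus the wrong codec, when decoding bytes.
def diagnose(raw, codec):
    """Try to decode raw with codec; return ("ok", text) or ("error", why)."""
    try:
        return ("ok", raw.decode(codec))
    except UnicodeDecodeError as exc:
        return ("error", "%s at byte %d" % (exc.reason, exc.start))


good = u"\u2014".encode("utf-8")  # three bytes: e2 80 94
truncated = good[:2]              # simulate data cut off mid-character

if __name__ == "__main__":
    print(diagnose(good, "utf-8"))       # right codec, intact data
    print(diagnose(truncated, "utf-8"))  # invalid data
    print(diagnose(good, "cp850"))       # wrong codec, silently wrong text
    print(diagnose(good, "ascii"))       # wrong codec, explicit failure
```

Note that a wrong codec does not always raise: decoding the UTF-8 bytes as cp850 "succeeds" and yields three unrelated characters, so a clean decode is not proof that the encoding was right.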


--
DaveA