Re: [Tutor] Error with incorrect encoding

2008-04-17 Thread linuxian iandsd
Kent was right:

>>> print u'\xae'.encode('utf-8')
®

But I think you are using the wrong source file. Don't copy and paste it
from your browser's 'View Source' window; use Python's built-in urllib to
fetch the file.
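
A rough sketch of that suggestion, not taken from the original post: fetch
the raw bytes with urllib and decode them explicitly rather than pasting
text from the browser. The names fetch_page and decode_page are made up for
illustration, the UTF-8 default is an assumption, and the import fallback is
only there so the sketch runs on both Python 2 and Python 3.

```python
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2


def decode_page(raw, charset=None, default='utf-8'):
    """Turn raw bytes into a unicode string (hypothetical helper).

    Falls back to `default` (an assumption) when no charset is declared,
    and replaces undecodable bytes instead of raising.
    """
    return raw.decode(charset or default, 'replace')


def fetch_page(url):
    """Download `url` and return its body as unicode (hypothetical helper)."""
    response = urlopen(url)
    try:
        raw = response.read()
        headers = response.headers
        # Python 3 header objects offer get_content_charset();
        # Python 2 header objects offer getparam() instead.
        if hasattr(headers, 'get_content_charset'):
            charset = headers.get_content_charset()
        else:
            charset = headers.getparam('charset')
    finally:
        response.close()
    return decode_page(raw, charset)
```

Calling fetch_page(url) would return a unicode string, so it still has to be
encoded before printing to a terminal that does not default to a suitable
encoding.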
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Error with incorrect encoding

2008-04-17 Thread Alan Gauld
I don't know the cause of the error here but I will say that
parsing HTML with regular expressions is fraught with difficulty
unless you know that the HTML will be suitably formatted
in advance.

You may be better off using one of the HTML parsing
modules such as HTMLParser or even the more powerful
BeautifulSoup.
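
As a minimal sketch of the HTMLParser route (an illustration only, not code
from the original poster): a small subclass that collects the links carrying
class "al4". The class name LinkCollector and the sample markup are invented
here, and the import fallback is only there so the sketch runs on both
Python 2 and Python 3.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2


class LinkCollector(HTMLParser):
    """Collect href and text of every <a class="al4"> element (hypothetical)."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []
        self._current = None  # the link being read, or None outside a match

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'a' and attrs.get('class') == 'al4':
            self._current = {'href': attrs.get('href'), 'text': u''}

    def handle_data(self, data):
        # Only keep text that appears inside a matching link
        if self._current is not None:
            self._current['text'] += data

    def handle_endtag(self, tag):
        if tag == 'a' and self._current is not None:
            self.links.append(self._current)
            self._current = None


# Made-up sample markup; the real page source would come from the test itself
sample = u'<p><a class="al4" href="/one">First\xae</a> <a href="/two">skip</a></p>'
collector = LinkCollector()
collector.feed(sample)
collector.close()
```

After this runs, collector.links holds one entry for the "al4" link and
ignores the other anchor. The text is still unicode, so it needs encoding
before it is printed.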

-- 
Alan Gauld
Author of the Learn to Program web site
http://www.freenetpages.co.uk/hp/alan.gauld



Oleg Oltar [EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]
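> I am trying to parse an HTML page and get the following error while
> doing that:
>
>     src = sel.get_html_source()
>     links = re.findall(r'<a class="al4"[^<]*</a>', src)
>     for link in links:
>         print link
>
> ======================================================================
> ERROR: test_new (__main__.NewTest)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "<stdin>", line 19, in test_new
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xae' in
> position 90: ordinal not in range(128)
>
> ----------------------------------------------------------------------
> Ran 1 test in 6.345s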








[Tutor] Error with incorrect encoding

2008-04-15 Thread Oleg Oltar
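I am trying to parse an HTML page and get the following error while doing
that:

    src = sel.get_html_source()
    links = re.findall(r'<a class="al4"[^<]*</a>', src)
    for link in links:
        print link

======================================================================
ERROR: test_new (__main__.NewTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "<stdin>", line 19, in test_new
UnicodeEncodeError: 'ascii' codec can't encode character u'\xae' in
position 90: ordinal not in range(128)

----------------------------------------------------------------------
Ran 1 test in 6.345s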


Re: [Tutor] Error with incorrect encoding

2008-04-15 Thread Kent Johnson
Oleg Oltar wrote:
> I am trying to parse an HTML page and get the following error while
> doing that:
>
>     src = sel.get_html_source()
>     links = re.findall(r'<a class="al4"[^<]*</a>', src)
>     for link in links:
>         print link

Presumably get_html_source() is returning unicode? So link is a unicode 
string. To print, unicode must be encoded somehow. By default Python 
will try to encode as ascii, which causes the failure you are seeing.

Try
   print link.encode('xxx')
where 'xxx' is the value of sys.stdout.encoding, most likely either 
'utf-8' or 'windows-1252' depending on your platform.
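
To illustrate the point above, here is a small sketch under stated
assumptions: the sample string is made up, encode_for_stdout is a
hypothetical helper name, and the UTF-8 fallback is an assumption for the
case where sys.stdout.encoding is not set. It is written so that it runs on
both Python 2 and Python 3.

```python
import sys

link = u'Acme\xae'  # made-up sample containing the character from the traceback

# Encoding to ASCII fails the same way the implicit encoding in print did
try:
    link.encode('ascii')
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = str(exc)

# The same character encodes differently under the two likely encodings
utf8_bytes = link.encode('utf-8')           # two bytes for U+00AE
cp1252_bytes = link.encode('windows-1252')  # one byte for U+00AE


def encode_for_stdout(text, fallback='utf-8'):
    """Encode `text` for the current stdout (hypothetical helper)."""
    # sys.stdout.encoding can be None when output is piped or redirected
    encoding = getattr(sys.stdout, 'encoding', None) or fallback
    # 'replace' substitutes '?' for anything the encoding cannot represent
    return text.encode(encoding, 'replace')
```

In the Python 2 loop from the question this would be used as
print encode_for_stdout(link). Using 'replace' is a design choice that
trades exact output for never raising; drop it if a hard failure is
preferable.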

Kent
