Re: [Tutor] Error with incorrect encoding
Kent was right:

    print u'\xae'.encode('utf-8')
    ®

But I think you are using the wrong source file. Don't copy and paste it from your browser's 'View Source' button; use Python's native urllib to fetch the file.

___
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] Error with incorrect encoding
I don't know the cause of the error here, but I will say that parsing HTML with regular expressions is fraught with difficulty unless you know in advance that the HTML will be suitably formatted. You may be better off using one of the HTML parsing modules, such as HTMLParser or the more powerful BeautifulSoup.

-- 
Alan Gauld
Author of the Learn to Program web site
http://www.freenetpages.co.uk/hp/alan.gauld

Oleg Oltar wrote:
> I am trying to parse an html page. Have following error while doing that
>
>     src = sel.get_html_source()
>     links = re.findall(r'a class=al4[^]*/a', src)
>     for link in links:
>         print link
>
> ======================================================================
> ERROR: test_new (__main__.NewTest)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "<stdin>", line 19, in test_new
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xae' in position 90: ordinal not in range(128)
> ----------------------------------------------------------------------
> Ran 1 test in 6.345s
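Alan's suggestion of a real HTML parser could look roughly like the sketch below. It is only an illustration, not code from the thread: it assumes Python 3 (where the module lives at html.parser rather than HTMLParser), the LinkCollector class and the sample markup are made up, and it assumes the links of interest are a tags with class al4, as the regular expression in the original post suggests.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for <a> tags with a given class (hypothetical helper)."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.links = []
        self._in_link = False
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # Only start collecting for anchors whose class matches exactly.
        if tag == "a" and attr_map.get("class") == self.css_class:
            self._in_link = True
            self._href = attr_map.get("href")
            self._text = []

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_link:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self.links.append((self._href, "".join(self._text)))
            self._in_link = False


# Made-up sample page; in the thread the source came from sel.get_html_source().
sample = u'<p><a class=al4 href="/one">Acme\xae</a> <a href="/two">skip</a></p>'

collector = LinkCollector("al4")
collector.feed(sample)
collector.close()
links = collector.links
```

Unlike the regular expression, the parser copes with attribute order, quoting, and line breaks inside the tag. The collected text is still unicode, so the encoding question raised elsewhere in the thread applies when printing it.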
[Tutor] Error with incorrect encoding
I am trying to parse an HTML page and get the following error while doing so:

    src = sel.get_html_source()
    links = re.findall(r'a class=al4[^]*/a', src)
    for link in links:
        print link

    ======================================================================
    ERROR: test_new (__main__.NewTest)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "<stdin>", line 19, in test_new
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xae' in position 90: ordinal not in range(128)
    ----------------------------------------------------------------------
    Ran 1 test in 6.345s
Re: [Tutor] Error with incorrect encoding
Oleg Oltar wrote:
> I am trying to parse an html page. Have following error while doing that
>
>     src = sel.get_html_source()
>     links = re.findall(r'a class=al4[^]*/a', src)
>     for link in links:
>         print link

Presumably get_html_source() is returning unicode? So link is a unicode string. To print, unicode must be encoded somehow. By default Python will try to encode as ascii, which causes the failure you are seeing.

Try

    print link.encode('xxx')

where 'xxx' is the value of sys.stdout.encoding, most likely either 'utf-8' or 'windows-1252' depending on your platform.

Kent
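To make Kent's point concrete, here is a small sketch of the failure and the explicit encode. It is written to run under Python 3, which is an assumption (the thread itself uses Python 2), and the encode_for_output helper and the sample string are invented for illustration.

```python
import sys


def encode_for_output(text, encoding=None):
    """Encode text to bytes for an output stream (hypothetical helper)."""
    # Use the caller's encoding, else what stdout reports, else UTF-8
    # (stdout may report no encoding, for example when output is piped).
    target = encoding or getattr(sys.stdout, "encoding", None) or "utf-8"
    # "replace" substitutes "?" for unencodable characters instead of raising.
    return text.encode(target, "replace")


link = u"Acme\xae"  # made-up link text containing the registered sign

# The ascii codec cannot represent u'\xae', which is the error in the traceback.
try:
    link.encode("ascii")
except UnicodeEncodeError as exc:
    ascii_error = str(exc)

# Encoding explicitly to the terminal's encoding avoids the error.
utf8_bytes = encode_for_output(link, "utf-8")
cp1252_bytes = encode_for_output(link, "windows-1252")
ascii_fallback = encode_for_output(link, "ascii")
```

The same character becomes two bytes in UTF-8 and one byte in windows-1252, which is why the right value of 'xxx' depends on the platform. In Python 2 the resulting byte string can be passed straight to print; in Python 3, print encodes text itself and bytes would instead go to sys.stdout.buffer.write().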