Replacing utf-8 characters

Hi, I am using Python to scrape web pages, and I do not have any
problems unless I run into a site that is encoded in utf-8. On those
sites, & seems to be changed to &amp;.


If I try to fix it with .replace('&amp;','&'), for some reason the
&amp; is not replaced.

For example: http://today.reuters.co.uk/news/default.aspx

The URL in the page looks like this:

http://today.reuters.co.uk/news/NewsArticle.aspx?type=topNews&storyID=2005-10-05T140937Z_01_MCC423599_RTRUKOC_0_UK-BRITAIN-CONSERVATIVES.xml

However, when I pull it into Python, the URL ends up looking like this
(notice the &amp; instead of just & in the URL):

http://today.reuters.co.uk/news/newsArticle.aspx?type=businessNews&amp;storyID=2005-10-05T094354Z_01_MOL530411_RTRUKOC_0_UK-CONSTRUCTION-BPB-STGOBAIN.xml
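
To make the symptom concrete, here is a minimal sketch. It rests on two
assumptions, since the scraping code is not shown: that the &amp; is
ordinary HTML entity escaping in the page source rather than anything
specific to utf-8, and that the result of .replace() may not have been
assigned to anything. The variable names are made up for illustration,
and html.unescape requires Python 3.4 or later.

```python
from html import unescape

# URL as it appears in the raw HTML source (copied from the example above)
raw_url = ("http://today.reuters.co.uk/news/newsArticle.aspx"
           "?type=businessNews&amp;storyID=2005-10-05T094354Z_01_MOL530411"
           "_RTRUKOC_0_UK-CONSTRUCTION-BPB-STGOBAIN.xml")

# Assumption: the return value was discarded. Strings are immutable,
# so str.replace returns a new string and leaves raw_url unchanged.
raw_url.replace("&amp;", "&")
still_escaped = "&amp;" in raw_url

# Keeping the return value gives the URL with a plain "&"
fixed_by_replace = raw_url.replace("&amp;", "&")

# html.unescape decodes HTML entities in general, not only &amp;
fixed_by_unescape = unescape(raw_url)
```

If the return value of .replace() is already being kept, then this is
not the cause and something else is going on.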

Any ideas?
-- 
http://mail.python.org/mailman/listinfo/python-list
