Hi, I am using Python to scrape web pages, and I have no problems unless I run into a site that is UTF-8 encoded. It seems that & is changed to &amp; when the site is UTF-8.
If I try to replace it with .replace('&amp;','&'), for some reason it does not get replaced.

For example, take http://today.reuters.co.uk/news/default.aspx

The URL in the page looks like this:
http://today.reuters.co.uk/news/NewsArticle.aspx?type=topNews&storyID=2005-10-05T140937Z_01_MCC423599_RTRUKOC_0_UK-BRITAIN-CONSERVATIVES.xml

However, when I pull it into Python, the URL ends up looking like this (notice the &amp; instead of just & in the URL):
http://today.reuters.co.uk/news/newsArticle.aspx?type=businessNews&amp;storyID=2005-10-05T094354Z_01_MOL530411_RTRUKOC_0_UK-CONSTRUCTION-BPB-STGOBAIN.xml

Any ideas?
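To make the problem concrete, here is a minimal sketch of the result I am after. It assumes the link text comes out of the raw HTML with the ampersand still entity-encoded as &amp;. The helper name clean_href and the sample string are made up for illustration, and html.unescape assumes Python 3.4 or later.

```python
import html


def clean_href(raw_href):
    """Decode HTML character references (such as &amp;) in a scraped attribute value."""
    return html.unescape(raw_href)


# Hypothetical href value, written the way it might appear in the page source.
raw = "http://today.reuters.co.uk/news/NewsArticle.aspx?type=topNews&amp;storyID=EXAMPLE.xml"

print(clean_href(raw))

# str.replace returns a new string and leaves the original untouched,
# so the result has to be assigned to a name to have any effect.
fixed = raw.replace("&amp;", "&")
print(fixed)
```

One thing I may be getting wrong: strings are immutable, so calling raw.replace(...) without keeping the return value would leave raw looking unchanged.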