> The last time I needed to do something like this I tried Search first, but
> ended up using the A-Z on /programmes as the results were much more what I
> was after. The HTML on /programmes is also easy to parse. I don't call using
> an XML parser and XPath screen scraping :)
It's screen scraping if the output wasn't designed to be read by a machine. Change the format and you've got a broken screen scraper.

If the output were XML, any change to it would either be non-destructive to the existing format or would explicitly use a different version of the API, on a different URL or with different arguments (like the difference between RSS and Atom).

You could use a parser like Beautiful Soup to turn whatever rubbish you're looking at into a perfectly traversable tree, but that doesn't change the fact that the entire thing would break if the page author decided to juggle the layout around a bit.

That's my rule of thumb about what constitutes screen scraping, anyway.

Iain

-
Sent via the backstage.bbc.co.uk discussion group. To unsubscribe, please visit http://backstage.bbc.co.uk/archives/2005/01/mailing_list.html.
Unofficial list archive: http://www.mail-archive.com/backstage@lists.bbc.co.uk/
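A minimal sketch of the rule of thumb above. It assumes made-up markup throughout: the page and the feed are hypothetical and are not the real /programmes HTML or any actual BBC feed. It uses Python's standard `xml.etree.ElementTree` and its limited XPath support in place of Beautiful Soup, which is enough here only because the sample page is well-formed. The scraper locates titles by their position in the layout, so a cosmetic redesign silently breaks it. The feed reader depends only on element names, so an additive change to the feed leaves it working.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: programme titles sit in the second <div> of <body>.
PAGE_V1 = """<html><body>
<div id="nav"><a href="/">Home</a></div>
<div><ul>
<li><a href="/p1">Programme One</a></li>
<li><a href="/p2">Programme Two</a></li>
</ul></div>
</body></html>"""

# Same page after a layout tweak: a promo box is inserted before the list.
PAGE_V2 = """<html><body>
<div id="nav"><a href="/">Home</a></div>
<div class="promo"><p>New this week</p></div>
<div><ul>
<li><a href="/p1">Programme One</a></li>
<li><a href="/p2">Programme Two</a></li>
</ul></div>
</body></html>"""

# Hypothetical machine-readable feed; v2 adds a new element (non-destructive).
FEED_V1 = """<programmes>
<programme><title>Programme One</title></programme>
<programme><title>Programme Two</title></programme>
</programmes>"""

FEED_V2 = """<programmes>
<programme><title>Programme One</title><duration>30</duration></programme>
<programme><title>Programme Two</title><duration>60</duration></programme>
</programmes>"""


def scrape_titles(html: str) -> list[str]:
    """Screen scraper: relies on the list being in the second <div> of <body>."""
    root = ET.fromstring(html)
    # div[2] means "the second <div> child of its parent" (1-based).
    return [a.text for a in root.findall("./body/div[2]/ul/li/a")]


def feed_titles(xml: str) -> list[str]:
    """Feed reader: relies only on element names, not on page layout."""
    root = ET.fromstring(xml)
    return [t.text for t in root.findall("./programme/title")]


if __name__ == "__main__":
    print(scrape_titles(PAGE_V1))  # ['Programme One', 'Programme Two']
    print(scrape_titles(PAGE_V2))  # [] because div[2] is now the promo box
    print(feed_titles(FEED_V1))    # ['Programme One', 'Programme Two']
    print(feed_titles(FEED_V2))    # unchanged despite the new <duration> element
```

Real-world HTML is rarely well-formed XML, which is where a tolerant parser such as Beautiful Soup comes in. As the post says, though, a more forgiving parser doesn't remove the dependency on layout.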