Re: [backstage] Programatic searching of /programmes

Iain Wallace Wed, 18 Feb 2009 06:06:25 -0800

> The last time I needed to do something like this I tried Search first, but
> ended up using the A-Z on /programmes as the results were much more what I
> was after. The HTML on /programmes is also easy to parse. I don't call using
> an XML parser and XPath screen scraping :)


It's screen scraping if the output wasn't designed to be read by a
machine. Change the format and you've got a broken screen scraper. If
the output were XML intended as an API, any changes would either be
non-destructive to the existing format or would explicitly use a
different version of the API, on a different URL or with different
arguments (like the difference between RSS and Atom).
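
As a rough sketch of what "non-destructive" means here: the feed format
below is invented purely for illustration (it is not a real /programmes
format, and the pid is made up). A consumer that looks elements up by
name keeps working when a new element is added alongside the old ones.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed format, invented for this example.
FEED_ORIGINAL = """<programmes version="1">
  <programme pid="b0000001"><title>Example Programme One</title></programme>
</programmes>"""

# A later revision adds a <synopsis> element but leaves existing ones alone.
FEED_EXTENDED = """<programmes version="1">
  <programme pid="b0000001">
    <title>Example Programme One</title>
    <synopsis>Added in a later revision.</synopsis>
  </programme>
</programmes>"""


def read_titles(feed):
    """Return (pid, title) pairs, looking elements up by name only."""
    root = ET.fromstring(feed)
    return [(p.get("pid"), p.findtext("title"))
            for p in root.findall("programme")]


old = read_titles(FEED_ORIGINAL)
new = read_titles(FEED_EXTENDED)
print(old == new)  # the consumer does not notice the added element
```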

You could use a parser like Beautiful Soup to turn whatever rubbish
you're looking at into perfectly traversable XML but it doesn't change
the fact that the entire thing would break if the page author decided
to juggle the layout around a bit.
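
To make that concrete, here is a minimal sketch. The markup, the
programme IDs and the path are all made up for the example, and it uses
the standard library's ElementTree on well-formed markup rather than
Beautiful Soup, but the failure mode is the same: the data is still on
the page, and the layout-dependent path no longer finds it.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, not the real /programmes HTML.
PAGE_ORIGINAL = """<html><body>
<div id="content"><ul>
<li><a href="/programmes/b0000001">Example Programme One</a></li>
<li><a href="/programmes/b0000002">Example Programme Two</a></li>
</ul></div>
</body></html>"""

# The same data after the page author wraps the list in a table.
PAGE_REDESIGNED = """<html><body>
<div id="content"><table><tr><td><ul>
<li><a href="/programmes/b0000001">Example Programme One</a></li>
<li><a href="/programmes/b0000002">Example Programme Two</a></li>
</ul></td></tr></table></div>
</body></html>"""

# A path tied to the page layout rather than to the meaning of the data.
LAYOUT_PATH = "./body/div/ul/li/a"


def scrape_titles(page, path=LAYOUT_PATH):
    """Return the link text of every element matched by the path."""
    root = ET.fromstring(page)
    return [a.text for a in root.findall(path)]


before = scrape_titles(PAGE_ORIGINAL)
after = scrape_titles(PAGE_REDESIGNED)
print(before)  # both titles
print(after)   # empty list: the scraper broke without raising an error
```

Note that nothing raises an exception; the scraper just quietly returns
nothing, which is arguably worse.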

That's my rule of thumb about what constitutes screen scraping anyway.

Iain
-
Sent via the backstage.bbc.co.uk discussion group.  To unsubscribe, please 
visit http://backstage.bbc.co.uk/archives/2005/01/mailing_list.html.  
Unofficial list archive: http://www.mail-archive.com/backstage@lists.bbc.co.uk/
