On 01/11/2014 01:09, Rob Dixon wrote:
> I would use the BBC server to do the search for me, after which there is
> little work to be done. For instance, if I look for all Book at Bedtime
> episodes with this URL
>
>      http://www.bbc.co.uk/radio/programmes/a-z/by/book%20at%20bedtime/player
>
> then I am taken to a page with a link to the series at
>
>      http://www.bbc.co.uk/programmes/b006qtlx/episodes/player?page=1
>
> through to `page=6`. That amounts to 52 programmes which, even on my
> meagre 13 megabit connection, takes less than ten seconds to fetch, and
> the results could be cached for a practically instantaneous response to a
> similar request in the future. There is also the possibility of writing
> a batch solution that makes a query only every minute or so and could be
> run continuously or overnight.

That's a neat idea! (I'd also been concerned with trying to recreate the RSS feeds for programme categories, so I'd focused on pulling everything.)

The search isn't perfect (e.g. try searching for "BBC News"), but you could use it to narrow the results and cut down the amount of scraping needed, then do better matching against title or synopsis in get_iplayer.
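
Something along these lines is roughly what I have in mind. It's a throwaway sketch rather than get_iplayer code: the series pid and page range come from your example above, and the pid regex is a guess at the markup, so treat it as illustration only.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::UserAgent;

    # Walk the paginated episode list for one series and collect programme
    # pids. Series pid and page range are taken from the example URLs above;
    # the pid pattern is an assumption about the page markup.
    my $series = 'b006qtlx';    # Book at Bedtime
    my $ua     = LWP::UserAgent->new( agent => 'episode-list-poc/0.1' );

    my %pid;
    for my $page ( 1 .. 6 ) {
        my $url  = "http://www.bbc.co.uk/programmes/$series/episodes/player?page=$page";
        my $resp = $ua->get($url);
        unless ( $resp->is_success ) {
            warn "$url: " . $resp->status_line . "\n";
            next;
        }
        my $html = $resp->decoded_content;
        # Each episode appears as a link to /programmes/<pid>
        $pid{$1}++ while $html =~ m{/programmes/([a-z0-9]{8})\b}g;
    }
    delete $pid{$series};    # drop the series' own pid if it shows up
    print "$_\n" for sort keys %pid;

The caching and the per-title or per-synopsis matching would then sit on top of a list like that.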

> I'm more than happy to write a proof of concept if you're interested. I
> have it half-written already just to get that timing information.
>
> The one thing that bothers me is the terms and conditions of the web
> site. I scanned through them quickly and couldn't find anything about
> robotic access, but it would be a first if there weren't anything there.
> If it's just a matter of obeying the /robots.txt then I'm more than
> happy to go ahead.


At a glance, robots.txt doesn't seem to disallow accessing the sections needed. In the terms of use, there is this, though:

"(d) You agree to use BBC Online Services and access, download, view and/or listen to BBC Content as supplied to you by the BBC and you may not, and you may not assist anyone to, or attempt to, reverse engineer, decompile, disassemble, adapt, modify, copy, reproduce, lend, hire, rent, perform, sub-license, make available to the public, create derivative works from, broadcast, distribute, commercially exploit, transmit or otherwise use in any way BBC Online Services and/or BBC Content in whole or in part except to the extent permitted in these Terms of Use, any relevant Additional Terms and at law."

If I'm downloading pages automatically and programmatically reading certain sections of the HTML, is that viewing it as supplied to me by the BBC?
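
On the robots.txt and rate-limiting side, LWP::RobotUA might do most of the polite work for us: it fetches and honours /robots.txt itself and, by default, waits a minute between requests to the same host, which also covers the "query only every minute or so" batch idea. A minimal sketch (agent name and contact address are placeholders):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::RobotUA;

    # A user agent that checks /robots.txt before each request and
    # rate-limits itself per host. Name and address are placeholders.
    my $ua = LWP::RobotUA->new(
        agent => 'episode-list-poc/0.1',
        from  => 'someone@example.org',
    );
    $ua->delay(1);    # wait at least one minute between requests to a host

    my $resp = $ua->get(
        'http://www.bbc.co.uk/programmes/b006qtlx/episodes/player?page=1');
    print $resp->is_success
        ? "fetched OK\n"
        : "refused or failed: " . $resp->status_line . "\n";

It doesn't answer the terms-of-use question, of course, but it would at least keep us on the right side of robots.txt.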
