Re: [CODE4LIB] screen scraping
On Oct 3, 2011 9:19 AM, Ed Summers e...@pobox.com wrote: On Sun, Oct 2, 2011 at 10:32 PM, Ken Irwin kir...@wittenberg.edu wrote: 1. respect robots.txt Disclaimer: I am not a lawyer. Remember that robots.txt applies only to recursive web crawlers, and not to screen-scraping per se. In cases where it does apply, it has limited legal effect, but ignoring it is not cricket. Important considerations are: is access to the site governed by a license that prohibits the activity; is the content being scraped subject to copyright, and if so, is the screen scraping covered by one of the exceptions to exclusive rights of the copyright holder; is the screen-scraping activity disruptive and damaging to the site being used (trespass to chattels, etc.)? A bit of reflection on the Golden Rule is probably more important than pondering the legality of what you are doing. Ed invoking philosophy? With citation? (wikipedia still counts) :-p The usual objection to the golden rule applies here: just because you have no objection to having a screen scraper used on your own site doesn't automatically imply that others are willing to have their sites scraped. Simon
Re: [CODE4LIB] screen scraping
On Sun, Oct 2, 2011 at 9:35 PM, Reese, Terry terry.re...@oregonstate.edu wrote: In Canada, the BC Supreme Court ruled that screen scraping real estate listings from one site and using them on another indeed infringed on copyright. Not sure if this would cover your use -- but if you are coming from Canada, it might be something to consider. Decision URL: http://www.canlii.org/en/bc/bcsc/doc/2011/2011bcsc1196/2011bcsc1196.html If you read the decision, it looks as though the content found to be infringing was the property's description and photograph, which are creative works. Indexing factual data about a property *only* (asking price, address, square footage, etc) may have been on stronger legal footing. Regards, -Nate
Re: [CODE4LIB] screen scraping
Another reason to check with the webmaster, all legalities aside, is that their top ten list might actually be built from an RSS feed, but for whatever reason they don't offer it directly as a feed (or they do, but it wasn't obvious to you where that feed was to be found). They might prefer you grab the feed rather than scrape the screen. I don't actually have any feed-based pages on our site that aren't also available as feeds -- but some people might. Also, for usage statistics reasons, I'd rather have bots hitting the feeds instead of the pages. Genny Engel Sonoma County Library gen...@sonoma.lib.ca.us 707 545-0831 x581 www.sonomalibrary.org -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Nate Hill Sent: Sunday, October 02, 2011 7:23 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] screen scraping A question: what are the 'rules' around screen scraping? If one site doesn't offer an RSS feed and you want to grab (for example) their weekly top ten list with a script and then redisplay it on another site, is that bad form? Or even illegal? Thanks- Nate -- Nate Hill nathanielh...@gmail.com http://www.natehill.net
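If the webmaster does point you at a feed, reading it is usually simpler than scraping the page. Here is a minimal sketch, assuming the feed is plain RSS 2.0; the sample feed, its titles and URLs, and the function name feed_items are all made up for illustration, and a real script would first download the feed text from whatever URL the site gives you.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed content, standing in for text downloaded from the site.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Weekly Top Ten (hypothetical)</title>
    <item><title>First title</title><link>http://example.org/1</link></item>
    <item><title>Second title</title><link>http://example.org/2</link></item>
  </channel>
</rss>"""


def feed_items(rss_text, limit=10):
    """Return up to `limit` (title, link) pairs from an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    items = []
    # In RSS 2.0 the <item> elements sit directly under <channel>.
    for item in root.findall("./channel/item")[:limit]:
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        items.append((title, link))
    return items


for title, link in feed_items(SAMPLE_FEED):
    print(title, link)
```

Keeping the link next to each title also makes it easy to cite and link back to the source site when you redisplay the list.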
Re: [CODE4LIB] screen scraping
I don't know that there are too many rules about this, but here's what comes to mind for me: 1. respect robots.txt 2. cache content so you don't hit their site more often than is reasonable. (i'd say that once a day is pretty reasonable) 3. also cache or mockup or something when you're writing your code, so you're not pounding them with live hits while you're working out the bugs. as far as legality goes, i'm gonna leave that to someone else. citation is, of course, a really good start. Ken On Sun, Oct 2, 2011 at 22:23, Nate Hill nathanielh...@gmail.com wrote: A question: what are the 'rules' around screen scraping? If one site doesn't offer an RSS feed and you want to grab (for example) their weekly top ten list with a script and then redisplay it on another site, is that bad form? Or even illegal? Thanks- Nate -- Nate Hill nathanielh...@gmail.com http://www.natehill.net
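The three rules above could be sketched roughly as follows, using only the Python standard library. This is an illustration under stated assumptions, not anyone's actual script: the names allowed_by_robots and DailyCache, the "example-bot" user agent, and the example.org URLs are hypothetical. The robots.txt text is passed in as a string and the fetch function is injected, so you can develop against a mock page instead of live hits (rule 3).

```python
import time
from urllib import robotparser

ONE_DAY = 24 * 60 * 60  # seconds; the "once a day" rule of thumb


def allowed_by_robots(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class DailyCache:
    """Keep one copy of each page and refetch only after max_age seconds."""

    def __init__(self, fetch, max_age=ONE_DAY, clock=time.time):
        self._fetch = fetch      # callable: url -> page text (live or mocked)
        self._max_age = max_age
        self._clock = clock      # injectable so tests need not wait a day
        self._store = {}         # url -> (fetched_at, text)

    def get(self, url):
        now = self._clock()
        cached = self._store.get(url)
        if cached is not None and now - cached[0] < self._max_age:
            return cached[1]     # still fresh: no hit on the remote site
        text = self._fetch(url)
        self._store[url] = (now, text)
        return text


# Hypothetical robots.txt and a mock fetcher used while working out the bugs.
ROBOTS = "User-agent: *\nDisallow: /private/\n"
URL = "http://example.org/top-ten.html"

if allowed_by_robots(ROBOTS, "example-bot", URL):
    cache = DailyCache(fetch=lambda url: "<html>mock top ten</html>")
    page = cache.get(URL)   # calls the fetcher
    page = cache.get(URL)   # served from the cache
```

This cache lives in memory only; a script run from cron once a day would need to write the cached copy to disk instead.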
Re: [CODE4LIB] screen scraping
On 10/2/2011 10:23 PM, Nate Hill wrote: A question: what are the 'rules' around screen scraping? If one site doesn't offer an RSS feed and you want to grab (for example) their weekly top ten list with a script and then redisplay it on another site, is that bad form? Or even illegal? If the site in question depends on advertising, what you are suggesting would be seriously uncool. If you don't get their approval and it's not for personal use, it may be a copyright violation also. r.
Re: [CODE4LIB] screen scraping
I think what I'm hearing here is that it would be a good idea to ask a webmaster on the other end if it's OK. Advertising... Roberto, good point; I hadn't thought of that. Thanks. On Sun, Oct 2, 2011 at 7:46 PM, Roberto Hoyle rjho...@gmail.com wrote: On 10/2/2011 10:23 PM, Nate Hill wrote: A question: what are the 'rules' around screen scraping? If one site doesn't offer an RSS feed and you want to grab (for example) their weekly top ten list with a script and then redisplay it on another site, is that bad form? Or even illegal? If the site in question depends on advertising, what you are suggesting would be seriously uncool. If you don't get their approval and it's not for personal use, it may be a copyright violation also. r. -- Nate Hill nathanielh...@gmail.com http://www.natehill.net
Re: [CODE4LIB] screen scraping
I don’t know how well this applies to your specific use of screen-scraping, but for libraries’ broader use of crawlers to build archives, the Section 108 Study Group Recommendations are a good source of guidance (though not law). They propose specific copyright exceptions for libraries in regard to collecting and archiving “publicly accessible online content”. Their recommendations are clear and sensible; they run from pages 80-87 of the report. http://www.section108.gov/docs/Sec108StudyGroupReport.pdf Tracy Seneca California Digital Library From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] on behalf of Nate Hill [nathanielh...@gmail.com] Sent: Sunday, October 02, 2011 7:23 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] screen scraping A question: what are the 'rules' around screen scraping? If one site doesn't offer an RSS feed and you want to grab (for example) their weekly top ten list with a script and then redisplay it on another site, is that bad form? Or even illegal? Thanks- Nate -- Nate Hill nathanielh...@gmail.com http://www.natehill.net