It shouldn't be a problem if you're honouring the robots.txt.

The legal issue could be stealing copyrighted material, but that's only if you're reproducing it. If you're just analysing the content and links and keeping to the robots.txt rules, I doubt you'll have a problem, unless your crawler is hitting the site every 10 minutes.
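
As a rough illustration of "keeping to the robots.txt rules", here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content, the example.com URLs and the "example-crawler" agent name are all made up for the example; a real crawler would fetch the file from the site it is about to crawl.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 10
"""

# Hypothetical user agent name for the crawler.
AGENT = "example-crawler"


def build_parser(robots_text):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def may_fetch(parser, user_agent, url):
    """Return True if the rules allow this agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)
print(may_fetch(parser, AGENT, "http://example.com/stories/1"))
print(may_fetch(parser, AGENT, "http://example.com/private/x"))
# Seconds to wait between requests, or None if the file does not say.
print(parser.crawl_delay(AGENT))
```

Checking can_fetch before every request and sleeping for the crawl delay between requests covers both points above: staying inside the site's rules and not hitting it too often.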

Wouldn't grabbing the RSS feed be better?
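
If the feed route works for you, pulling the posted links out of it is fairly simple. A minimal sketch, assuming an RSS 2.0 feed and Python's standard xml.etree.ElementTree; the sample feed and its example.com URLs are invented for illustration, and a real feed may use a different layout (Atom, namespaces and so on).

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, inlined so the sketch runs offline.
SAMPLE_FEED = """\
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>http://example.com/</link>
    <item>
      <title>First story</title>
      <link>http://example.com/story-1</link>
    </item>
    <item>
      <title>Second story</title>
      <link>http://example.com/story-2</link>
    </item>
  </channel>
</rss>
"""


def extract_links(feed_xml):
    """Return the <link> of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    links = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link:
            links.append(link.strip())
    return links


for url in extract_links(SAMPLE_FEED):
    print(url)
```

Only the item links are collected; the channel's own link is skipped because it sits outside any item element.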

Would http://diggdot.us be a good example of what you're trying to do? Or have I got the wrong idea entirely?

Does anyone else have any thoughts?

_gk

----- Original Message -----
From: "Berlin Brown" <[EMAIL PROTECTED]>
To: <nutch-user@lucene.apache.org>
Sent: Thursday, March 30, 2006 8:13 AM
Subject: Legal issues


What are, say, the legal issues of crawling a site like reddit, digg or
slashdot? Assume that you are just collecting the links that users post
through that service and then regathering those links.  I
can't see an issue there.

The other extreme would be crawling google and requerying or something
along those lines.
