Shouldn't be a problem if you're honouring the robots.txt.
The legal issue could be copyright infringement, but that's only if you're
reproducing the material. If you're just analysing the content and links and
keeping to the robots.txt rules, I doubt you'll have a problem, unless it's
crawling every 10 minutes.
Wouldn't grabbing the RSS feed be better?
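For what it's worth, here is a rough sketch of what "keeping to the robots.txt rules" could look like, using Python's standard urllib.robotparser. The robots.txt content, the user agent name "example-crawler" and the example.com URLs are all made up for illustration, and this is not how Nutch itself does it. A real crawler would fetch the site's actual robots.txt instead of the inline sample.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, inlined so the sketch runs without network access.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 10
"""

# Hypothetical crawler name; use whatever your crawler really identifies as.
USER_AGENT = "example-crawler"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, url: str) -> bool:
    """Return True if the rules allow USER_AGENT to fetch the URL."""
    return parser.can_fetch(USER_AGENT, url)


parser = build_parser(SAMPLE_ROBOTS_TXT)

# Seconds the site asks crawlers to wait between requests (None if not set).
delay = parser.crawl_delay(USER_AGENT)

allowed_story = may_fetch(parser, "http://example.com/story/1")
allowed_private = may_fetch(parser, "http://example.com/private/page")
```

Skipping anything may_fetch rejects and sleeping for the crawl delay between requests would cover both the robots.txt rules and the "every 10 minutes" worry.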
Would http://diggdot.us be a good example of what you're trying to do? Or have
I got the wrong idea entirely?
Anyone else have any thoughts?
_gk
----- Original Message -----
From: "Berlin Brown" <[EMAIL PROTECTED]>
To: <nutch-user@lucene.apache.org>
Sent: Thursday, March 30, 2006 8:13 AM
Subject: Legal issues
What are, say, the legal issues of crawling a site like reddit, digg or
slashdot? Assume that you are just collecting the links that users post
through that service and then regathering those links. I
can't see an issue there.
The other extreme would be crawling google and requerying or something
along those lines.