cairo.ee.ucla.edu - - [19/Jan/2006:13:10:26 -0800] "GET /archives/best/index.html HTTP/1.0" 200 5096 "-" "NutchCVS/0.8-dev (Nutch; http://lucene.apache.org/nutch/bot.html; nutch-agent@lucene.apache.org)"
cairo.ee.ucla.edu - - [19/Jan/2006:13:10:27 -0800] "GET /ftpfiles.html HTTP/1.0" 200 5353 "-" "NutchCVS/0.8-dev (Nutch; http://lucene.apache.org/nutch/bot.html; nutch-agent@lucene.apache.org)"
cairo.ee.ucla.edu - - [19/Jan/2006:13:10:27 -0800] "GET /faqs/medi-cont.html HTTP/1.0" 200 25734 "-" "NutchCVS/0.8-dev (Nutch; http://lucene.apache.org/nutch/bot.html; nutch-agent@lucene.apache.org)"
cairo.ee.ucla.edu - - [19/Jan/2006:13:10:27 -0800] "GET /eclectic/felter/index.html HTTP/1.0" 200 37998 "-" "NutchCVS/0.8-dev (Nutch; http://lucene.apache.org/nutch/bot.html; nutch-agent@lucene.apache.org)"
cairo.ee.ucla.edu - - [19/Jan/2006:13:10:27 -0800] "GET /eclectic/kings/index.html HTTP/1.0" 200 61955 "-" "NutchCVS/0.8-dev (Nutch; http://lucene.apache.org/nutch/bot.html; nutch-agent@lucene.apache.org)"
cairo.ee.ucla.edu - - [19/Jan/2006:13:10:28 -0800] "GET /index.html HTTP/1.0" 200 4542 "-" "NutchCVS/0.8-dev (Nutch; http://lucene.apache.org/nutch/bot.html; nutch-agent@lucene.apache.org)"
cairo.ee.ucla.edu - - [19/Jan/2006:13:10:28 -0800] "GET /gbx.php HTTP/1.0" 200 48551 "-" "NutchCVS/0.8-dev (Nutch; http://lucene.apache.org/nutch/bot.html; nutch-agent@lucene.apache.org)"
cairo.ee.ucla.edu - - [19/Jan/2006:13:10:29 -0800] "GET /eclectic/ellingwood/index.html HTTP/1.0" 200 30049 "-" "NutchCVS/0.8-dev (Nutch; http://lucene.apache.org/nutch/bot.html; nutch-agent@lucene.apache.org)"

That gbx.php is my guestbook, which I've blocked in robots.txt.
http://www.henriettesherbal.com/robots.txt
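
For anyone who wants to check their own logs for the same thing, here is a rough sketch in Python that flags requests from a given crawler for paths that robots.txt disallows. It is only an illustration under some assumptions: the robots.txt rules in it are a guess at what a rule blocking gbx.php might look like (the real file is at the URL above and is not quoted here), the function name disallowed_hits is made up, and the regex only handles lines in Apache's combined log format with each entry on a single line.

```python
import re
from urllib.robotparser import RobotFileParser

# ASSUMPTION: illustrative rules only; the real robots.txt is not quoted in this message.
ASSUMED_ROBOTS_TXT = """\
User-agent: *
Disallow: /gbx.php
"""

# Combined log format: host ident user [time] "request" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

AGENT = ('"NutchCVS/0.8-dev (Nutch; http://lucene.apache.org/nutch/bot.html; '
         'nutch-agent@lucene.apache.org)"')

# Two entries from the log above, each on one line.
SAMPLE_LINES = [
    'cairo.ee.ucla.edu - - [19/Jan/2006:13:10:28 -0800] '
    '"GET /index.html HTTP/1.0" 200 4542 "-" ' + AGENT,
    'cairo.ee.ucla.edu - - [19/Jan/2006:13:10:28 -0800] '
    '"GET /gbx.php HTTP/1.0" 200 48551 "-" ' + AGENT,
]


def disallowed_hits(log_lines, robots_txt, agent_token):
    """Return (time, path, status) for requests by agent_token that robots_txt disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from text, no network access
    hits = []
    for line in log_lines:
        m = LOG_LINE.match(line.strip())
        if m is None:
            continue  # skip lines that are not in combined log format
        if agent_token.lower() not in m["agent"].lower():
            continue  # request came from a different client
        if not parser.can_fetch(agent_token, m["path"]):
            hits.append((m["time"], m["path"], m["status"]))
    return hits


if __name__ == "__main__":
    for hit in disallowed_hits(SAMPLE_LINES, ASSUMED_ROBOTS_TXT, "Nutch"):
        print(hit)
```

With the assumed rules, only the gbx.php request is reported. To test against the real rules, the contents of the actual robots.txt would need to be passed in instead of ASSUMED_ROBOTS_TXT.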

They hit a bot trap later on and got blocked, but Nutch only picked up 3 more 
files after it got the first 403.

Thanks,
Henriette

-- 
Henriette Kress, AHG                       Helsinki, Finland
Henriette's herbal homepage: http://www.henriettesherbal.com
