A Nutch bot is crawling a 'submit' page on my site, and it shouldn't.
It's the only bot that hits that page, and unfortunately each hit
generates a blank email.

Needless to say, I now know I need to change my software so that it
doesn't generate an email on a false hit, but the bot shouldn't be
spidering that page anyway.  The only reference to it is a form's
'action' attribute.  No href links to it.
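
For what it's worth, the software-side fix could look roughly like the
sketch below. It is written in Python only for illustration; the function
name and the shape of the form data are hypothetical and are not taken
from my actual submit.php. The idea is just to send mail only when the
request is a POST that carries at least one non-blank field, so a bare
GET from a crawler does nothing.

```python
def should_send_booking_email(method, form):
    """Decide whether a request to the submit page should trigger an email.

    `method` is the HTTP request method and `form` is a dict of submitted
    field names to values. Both names are hypothetical.
    """
    # A crawler following the form's action URL issues a plain GET.
    if method.upper() != "POST":
        return False
    # Reject submissions where every field is empty or whitespace.
    return any(str(value).strip() for value in form.values())
```

With that check in place, the two GET requests in the log below would
return a page but would not produce a blank email.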

I've also just added a robots.txt entry, so if the software works as
advertised, I'm not likely to see any more of these.
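
As a sanity check on the robots.txt side, here is a small sketch using
Python's standard urllib.robotparser. The rule text shown is an
assumption about what such an entry might look like (a blanket Disallow
on the submit URL), not a copy of my actual file, and the helper name is
made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Assumed rule: block every crawler from the submit page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /booking/submit.php
"""

def is_allowed(user_agent, path, robots_txt=ROBOTS_TXT):
    """Return True if `user_agent` may fetch `path` under `robots_txt`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network fetch is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)
```

Calling is_allowed("NutchCVS/0.7.1", "/booking/submit.php") should come
back False under that rule, while other paths stay allowed. Of course
this only shows what a compliant parser concludes; whether the bot
actually obeys it is a separate question.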

A couple of log entries showing the issue:
131.112.16.140 - - [16/Feb/2006:18:51:01 -0800] "GET /booking/submit.php
HTTP/1.0" 200 3728 "-" "NutchCVS/0.7.1 (Nutch;
http://lucene.apache.org/nutch/bot.html; nutch-agent@lucene.apache.org)"
131.112.16.220 - - [12/Mar/2006:22:50:49 -0800] "GET /booking/submit.php
HTTP/1.0" 200 3761 "-" "NutchCVS/0.7.1 (Nutch;
http://lucene.apache.org/nutch/bot.html; nutch-agent@lucene.apache.org)"
