Dear Nutch Project Gurus,

I'm the webmaster of http://swisspig.net/, and I have noticed periodic access by the Nutch crawler at the University of Washington. Today's access was strange, however: the crawler requested only a *portion* of a URL, which of course is not a link in itself. This might be a bug in the crawler, or a bug in a modification made by the UW folks. The relevant log snippets are:

128.208.6.200 - - [11/Jun/2006:18:27:27 -0400] "GET /robots.txt HTTP/1.0" 200 262 "" "NutchCVS/0.8-dev (Nutch running at UW; http://www.nutch.org/docs/en/bot.html; [EMAIL PROTECTED])"
128.208.6.200 - - [11/Jun/2006:18:27:28 -0400] "GET /post.php HTTP/1.0" 200 25000 "" "NutchCVS/0.8-dev (Nutch running at UW; http://www.nutch.org/docs/en/bot.html; [EMAIL PROTECTED])"
128.208.6.200 - - [11/Jun/2006:18:27:33 -0400] "GET / HTTP/1.0" 200 25000 "" "NutchCVS/0.8-dev (Nutch running at UW; http://www.nutch.org/docs/en/bot.html; [EMAIL PROTECTED])"
128.208.6.200 - - [11/Jun/2006:18:27:38 -0400] "GET /r/post/ HTTP/1.0" 200 25000 "" "NutchCVS/0.8-dev (Nutch running at UW; http://www.nutch.org/docs/en/bot.html; [EMAIL PROTECTED])"

Please note that http://swisspig.net/post.php and http://swisspig.net/r/post/ are scripts (the same script actually -- I recently migrated from the format "/post.php?id=foo" to "/r/post/foo") that are not meant to be accessed directly. There are of course no links from http://swisspig.net/ to these URLs.
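To illustrate what I mean by a portion of a URL, here is a minimal Python sketch. It is only a guess at how such requests could arise, not a claim about what Nutch actually does: the helper names strip_query and parent_path are my own, and "foo" is a placeholder post ID. Truncating the real link formats in these two ways yields exactly the paths seen in the log.

```python
from urllib.parse import urlsplit
import posixpath


def strip_query(url):
    """Return only the path of a URL, dropping any ?query part (hypothetical helper)."""
    return urlsplit(url).path


def parent_path(url):
    """Return the path with its last segment removed (hypothetical helper)."""
    path = urlsplit(url).path
    return posixpath.dirname(path.rstrip("/")) + "/"


# The two link formats used on the site; "foo" stands in for a real post ID.
old_style = "http://swisspig.net/post.php?id=foo"
new_style = "http://swisspig.net/r/post/foo"

print(strip_query(old_style))   # /post.php
print(parent_path(new_style))   # /r/post/
```

If the crawler (or a local modification) truncates discovered links in either way, it would end up requesting the bare script URLs even though no page links to them.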


Regards,
Brian Ziman
webmaster, swisspig.net
