Hi, it appears that nutch doesn't obey the "Crawl-Delay:" robots.txt directive. Our robots.txt defines a crawl-delay of 30, and most robots seem to obey it, unlike this nutch instance from tonight:
209.235.6.4 wikipedia.7val.com - - [30/May/2006:06:34:40 +0200] "GET /w/index.php?title=Category:178_births&from=R/7val-fit-sid=ecb6f5bd55541ca2a7be6c12ff597620 HTTP/1.0" 200 8537 "-" "Nokia6620/2.0 (4.22.1) SymbianOS/7.0s Series60/2.1 Profile/MIDP-2.0 Configuration/CLDC-1.0/0.7.2 (Nutch; http://lucene.apache.org/nutch/bot.html; nutch-agent@lucene.apache.org)" pid:21925
209.235.6.4 wikipedia.7val.com - - [30/May/2006:06:34:40 +0200] "GET /w/index.php?title=Category:1681_births&from=A/7val-fit-sid=4600da9ec83ea98b83f398b550c73720 HTTP/1.0" 200 12338 "-" "Nokia6620/2.0 (4.22.1) SymbianOS/7.0s Series60/2.1 Profile/MIDP-2.0 Configuration/CLDC-1.0/0.7.2 (Nutch; http://lucene.apache.org/nutch/bot.html; nutch-agent@lucene.apache.org)" pid:21926
209.235.6.4 wikipedia.7val.com - - [30/May/2006:06:34:40 +0200] "GET /w/index.php?title=Category:1654_deaths&from=S/7val-fit-sid=7ef0086dd404bda46cae6effe8cee010 HTTP/1.0" 200 10457 "-" "Nokia6620/2.0 (4.22.1) SymbianOS/7.0s Series60/2.1 Profile/MIDP-2.0 Configuration/CLDC-1.0/0.7.2 (Nutch; http://lucene.apache.org/nutch/bot.html; nutch-agent@lucene.apache.org)" pid:21927
209.235.6.4 wikipedia.7val.com - - [30/May/2006:06:34:40 +0200] "GET /w/index.php?title=Category:1702_births&from=W/7val-fit-sid=bdb93afe3a197c01d6472a54fcc6c220 HTTP/1.0" 200 8851 "-" "Nokia6620/2.0 (4.22.1) SymbianOS/7.0s Series60/2.1 Profile/MIDP-2.0 Configuration/CLDC-1.0/0.7.2 (Nutch; http://lucene.apache.org/nutch/bot.html; nutch-agent@lucene.apache.org)" pid:21921
209.235.6.4 wikipedia.7val.com - - [30/May/2006:06:34:40 +0200] "GET /w/index.php?title=Category:1674_births&from=K/7val-fit-sid=86f70bf8064640d3a02bda22f2827610 HTTP/1.0" 200 12428 "-" "Nokia6620/2.0 (4.22.1) SymbianOS/7.0s Series60/2.1 Profile/MIDP-2.0 Configuration/CLDC-1.0/0.7.2 (Nutch; http://lucene.apache.org/nutch/bot.html; nutch-agent@lucene.apache.org)" pid:21920

Do current versions of nutch support crawl-delay, or could you add this to future versions?
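
To make the expected behaviour concrete, here is a minimal sketch in Python. It is not nutch code and says nothing about how nutch works internally. The robots.txt text is an assumption: only the delay of 30 is taken from our setup as described above, and the `User-agent: *` line is a guess. The helper names `crawl_delay_for` and `seconds_to_wait` are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content. Only the value 30 comes from the report;
# the "User-agent: *" line is a guess for illustration.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 30
"""


def crawl_delay_for(robots_txt, agent):
    """Return the Crawl-delay (in seconds) that applies to `agent`, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(agent)


def seconds_to_wait(last_fetch, now, delay):
    """Seconds a polite crawler should still wait before hitting the host again.

    `last_fetch` and `now` are timestamps in seconds; `delay` may be None
    when robots.txt specifies no Crawl-delay.
    """
    return max(0.0, last_fetch + (delay or 0) - now)


if __name__ == "__main__":
    delay = crawl_delay_for(ROBOTS_TXT, "Nutch")
    # Time left to wait when a second request is attempted in the same
    # second as the first one, as in the log lines above.
    print(delay, seconds_to_wait(last_fetch=0.0, now=0.0, delay=delay))
```

With a check like this, the five requests logged at 06:34:40 would have been spread out at 30-second intervals instead of arriving within one second.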
regards,
rainer canavan

-- 
Rainer Canavan
Head Of System Administration
Sevenval AG
Bahnhofsvorplatz 1
50667 Köln
Phone +49 221 6500789
Fax +49 221 6500788
Mobile +49 162 2048089