Hi all,
I am using the crawl tool in Nutch 0.8.1 under Cygwin, trying to retrieve
pages from about 2,000 websites, and the crawl has now been running for
nearly 20 hours.
However, for the past 10 hours the fetch status has stayed exactly the
same, as shown below:
TOTAL urls: 165212
retry 0: 164110
retry 1: 814
retry 2: 288
min score: 0.0
avg score: 0.029228665
max score: 2.333
status 1 (DB_unfetched): 134960
status 2 (DB_fetched): 27812
status 3 (DB_gone): 2440
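(For reference, these numbers appear to be crawldb statistics; they can be
re-checked at any time with Nutch's readdb tool. The crawldb path below is
an assumption — adjust it to wherever your crawl directory actually lives.)

```shell
# Print crawldb statistics (URL counts by status, scores, retry counts).
# "crawl/crawldb" is a hypothetical path; substitute your own crawl dir.
bin/nutch readdb crawl/crawldb -stats
```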
All of the numbers in the status output remain the same; DB_fetched is
stuck at 27812. From the console output and hadoop.log I can see that the
page-fetching process is running without any errors.
The size of the crawldb also has not changed; it stays at 328 MB.
I have spent all of last week trying to solve this problem. Any hints
would be appreciated. Thanks and bow~~~
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers