Hi,
I have two servers running Nutch 0.9 and Hadoop.
I generate a segment by running:
bin/nutch generate crawlbooks/crawldb crawlbooks/segments -topN 15000
My mapred-default.xml contains:
<property>
  <name>mapred.map.tasks</name>
  <value>2</value>
</property>
<property>
  <name>mapred.reduce.tasks</name>
  <value>2</value>
</property>
Then I run the fetch:
bin/nutch fetch $segment
After fetching, I list the segment:
$ bin/nutch readseg -list $segment
NAME            GENERATED  FETCHER START        FETCHER END          FETCHED  PARSED
20070509104954  7500       2007-05-09T10:54:33  2007-05-09T10:56:26  2470     2464
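
To compare runs more easily, a small helper along these lines could turn the readseg listing into a fetched/generated ratio. This is only a sketch: fetch_ratio is a hypothetical name, not part of Nutch, and it assumes the six-column layout shown above with each segment on a single line.

```shell
# Hypothetical helper (not part of Nutch): summarize `readseg -list` rows from stdin.
# Assumes columns: NAME GENERATED FETCHER-START FETCHER-END FETCHED PARSED.
fetch_ratio() {
  # Only rows whose second field is a positive number are treated as segment rows,
  # so the header line is skipped automatically.
  awk '$2 ~ /^[0-9]+$/ && $2 > 0 && NF >= 6 {
    printf "%s generated=%d fetched=%d ratio=%.1f%%\n", $1, $2, $5, 100 * $5 / $2
  }'
}
```

It would be used as: bin/nutch readseg -list $segment | fetch_ratio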
I have also tried -topN 50000 and 100000, but the FETCHED count always
stays around 2400 to 2600.
I am fetching from my own host; the injected links are of the form:
http://myhost.com/arc/*.txt
Thanks.
--
View this message in context:
http://www.nabble.com/fetch-problem-tf3717200.html#a10399082
Sent from the Nutch - User mailing list archive at Nabble.com.
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general