Hi,

I have two servers running Nutch 0.9 and Hadoop.

I generate a segment with:

bin/nutch generate crawlbooks/crawldb crawlbooks/segments -topN 15000

In mapred-default.xml I have:

        <property>
                <name>mapred.map.tasks</name>
                <value>2</value>
        </property>

        <property>
                <name>mapred.reduce.tasks</name>
                <value>2</value>
        </property>


Then I fetch the segment:

bin/nutch fetch $segment

After fetching, I list the segment:

$ bin/nutch readseg -list $segment

NAME            GENERATED  FETCHER START        FETCHER END          FETCHED  PARSED
20070509104954  7500       2007-05-09T10:54:33  2007-05-09T10:56:26  2470     2464
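
To put the numbers in that row in perspective, here is a minimal Python sketch that parses the data row and works out what fraction of the generated URLs was fetched and how long the fetcher ran. It is not part of Nutch; the function name parse_readseg_row is hypothetical, and it assumes the row has exactly the six whitespace-separated columns shown above.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"  # timestamp layout seen in the readseg output


def parse_readseg_row(row):
    """Split one data row of `readseg -list` output into named fields.

    Hypothetical helper: assumes six whitespace-separated columns
    (NAME, GENERATED, FETCHER START, FETCHER END, FETCHED, PARSED).
    """
    name, generated, start, end, fetched, parsed = row.split()
    return {
        "name": name,
        "generated": int(generated),
        "fetcher_start": datetime.strptime(start, TIME_FORMAT),
        "fetcher_end": datetime.strptime(end, TIME_FORMAT),
        "fetched": int(fetched),
        "parsed": int(parsed),
    }


# The data row from the listing above, joined onto one line.
row = "20070509104954  7500  2007-05-09T10:54:33  2007-05-09T10:56:26  2470  2464"

stats = parse_readseg_row(row)
fetched_ratio = stats["fetched"] / stats["generated"]
duration_s = (stats["fetcher_end"] - stats["fetcher_start"]).total_seconds()

print(f"fetched {stats['fetched']} of {stats['generated']} generated URLs "
      f"({fetched_ratio:.1%}) in {duration_s:.0f} s")
```

For this segment that is 2470 of 7500 generated URLs, roughly a third, fetched in just under two minutes.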


I also tried -topN 50000 and -topN 100000, but the FETCHED count always stays
at around 2400 to 2600.

I am fetching from my own host; the injected links are of the form

http://myhost.com/arc/*.txt



Thanks.


 



