derevo wrote:
> Hi,
> (2 servers, Hadoop, Nutch)
> 
> I am trying to fetch my host, which serves txt files
> ( http://site.net/file_1.txt ), more than 150000 of them.
> When I start the fetch and look at the access.log file on the target
> host, I see that only one slave host does the fetching (SLAVE_1).
> When I restart the fetch, the slave host is now SLAVE_2.
> 
> In the Task Tracker Status I see the same result.

By default the fetchlist is partitioned so that all URLs for the same
host end up being fetched by a single node; see PartitionUrlByHost.
Since all of your URLs are on one host, only one node fetches at a time.

To override this you would need to change the partitioner or stop using
it (both would require source code changes).
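
To illustrate the effect, here is a minimal standalone sketch in plain
Java. It is not the Nutch source: the class HostPartitionSketch and the
methods partitionByHost and partitionByUrl are hypothetical names made up
for this example, and the hashing is only an assumption about how such a
partitioner might work. It shows why keying on the host sends every URL
of a single-host crawl to one partition, while keying on the full URL
spreads them out.

```java
import java.net.URI;
import java.util.Map;
import java.util.TreeMap;

public class HostPartitionSketch {

    // Hypothetical helper: choose a partition from the host name only,
    // so every URL on the same host gets the same partition number.
    static int partitionByHost(String url, int numPartitions) {
        String host = URI.create(url).getHost();
        String key = (host == null) ? url : host.toLowerCase();
        // Mask the sign bit so the modulo result is never negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Hypothetical alternative: hash the whole URL, which lets URLs
    // from one host land on different partitions.
    static int partitionByUrl(String url, int numPartitions) {
        return (url.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 2; // two fetching nodes, as in the question
        Map<Integer, Integer> byHost = new TreeMap<>();
        Map<Integer, Integer> byUrl = new TreeMap<>();

        // Count how many URLs each scheme assigns to each partition.
        for (int i = 1; i <= 1000; i++) {
            String url = "http://site.net/file_" + i + ".txt";
            byHost.merge(partitionByHost(url, numPartitions), 1, Integer::sum);
            byUrl.merge(partitionByUrl(url, numPartitions), 1, Integer::sum);
        }

        System.out.println("partitioned by host: " + byHost);
        System.out.println("partitioned by URL:  " + byUrl);
    }
}
```

With host-based keys the first map has a single entry, which matches
what you see in access.log. Keep in mind that spreading one host across
nodes means all of those nodes will hit that host at the same time.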

-- 
 Sami Siren

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general