derevo wrote:
> Hi,
> (2 servers, Hadoop + Nutch)
>
> I am trying to fetch my host, which serves more than 150,000 txt files
> (e.g. http://site.net/file_1.txt).
> When I start the fetch and look at access.log on the target host, I see that
> only one slave host does the fetching (SLAVE_1).
> When I restart the fetch, the fetching slave is now SLAVE_2.
>
> The TaskTracker status page shows the same result.
By default, the fetchlist is partitioned so that all URLs for the same host are fetched by a single node (see PartitionUrlByHost). Because all of your files are on one host, they all land in the same partition, which is why only one slave appears in access.log. To override this, you would need to change the partitioner or stop using it; both require source code changes.

--
Sami Siren

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
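To illustrate why a single host ends up on a single node, here is a minimal Python sketch of host-based partitioning compared with partitioning by full URL. It assumes partitioning works roughly as "hash of the key modulo the number of partitions". The helpers `partition_by_host` and `partition_by_url` are hypothetical illustrations, not Nutch's actual PartitionUrlByHost code, which is a Java class used by Hadoop.

```python
import zlib
from collections import Counter
from urllib.parse import urlsplit


def partition_by_host(url: str, num_partitions: int) -> int:
    """Hypothetical helper: pick a partition using only the URL's host name."""
    host = urlsplit(url).hostname or ""
    # crc32 is stable across runs, unlike Python's per-process salted hash()
    return zlib.crc32(host.encode("utf-8")) % num_partitions


def partition_by_url(url: str, num_partitions: int) -> int:
    """Hypothetical alternative: hash the whole URL, so one host's URLs spread out."""
    return zlib.crc32(url.encode("utf-8")) % num_partitions


if __name__ == "__main__":
    # Many files on one host, similar to the situation in the question
    urls = [f"http://site.net/file_{i}.txt" for i in range(1, 1001)]
    nodes = 2
    # Every URL shares the host "site.net", so all of them map to one partition
    print("by host:", dict(Counter(partition_by_host(u, nodes) for u in urls)))
    # Hashing the full URL distributes the same URLs across both partitions
    print("by url: ", dict(Counter(partition_by_url(u, nodes) for u in urls)))
```

With host-based partitioning every URL lands in the same bucket, so only one node fetches. Hashing the full URL spreads the work, but one plausible reason for the host-based default is politeness: if all of a host's URLs stay on one node, that node can throttle its request rate to that host.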
