On 6/12/07, patrik <[EMAIL PROTECTED]> wrote:
> > When generator runs in distributed mode, it partitions urls to separate
> > map tasks according to their hosts. This way, urls under the same host
> > end up in the same map task (which is necessary for politeness). So, in
> > your case, you either have very few hosts (of which one has almost 100K
> > urls) or there is a problem with partitioning.
>
> Got it. Yup, all the urls are from one host. I understand it's not
> polite, but is there any configuration setting that'll change that?
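For concreteness, the host-based assignment described in the quote above boils
down to hashing the host and taking it modulo the number of partitions. A
minimal, illustrative sketch (the real PartitionUrlByHost runs inside a Hadoop
job on Nutch's own key types, so this shows only the idea, not the actual code):

  // Illustration only: urls are assigned to partitions by hashing the host,
  // so every url from the same host lands in the same partition (and
  // therefore in the same fetch list).
  import java.net.URL;

  public class HostPartitionDemo {

    static int partitionForHost(String url, int numPartitions) throws Exception {
      String host = new URL(url).getHost();
      return (host.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) throws Exception {
      // Both urls share a host, so they always map to the same partition.
      System.out.println(partitionForHost("http://example.com/a", 4));
      System.out.println(partitionForHost("http://example.com/b", 4));
    }
  }

With ~100K urls all on one host, every url hashes to the same partition, which
is exactly the single oversized fetch list described above.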
There is no configuration option. You may change PartitionUrlByHost's code so
that it no longer partitions urls by host :) I think you may also run a
segment merge: if you run segmerge on a single segment (with the number of
reduce tasks set to the desired number of fetchers), segmerge will put an
equal number of urls in every part. Then set fetcher.max.threads.per.host to
a value greater than 1 and you have a very impolite fetcher. Please don't run
this to fetch a site you don't control :)

--
Doğacan Güney
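If you do go the "change PartitionUrlByHost" route, a replacement partitioner
could look roughly like the sketch below. This is a hypothetical illustration,
not Nutch's actual class: it assumes Hadoop's old mapred Partitioner interface
with plain Text keys, while the real job uses Nutch's own key/value types, so
adjust the signatures to whatever your version expects.

  // Hypothetical sketch: spread urls evenly across reduce tasks by hashing
  // the whole url instead of just its host. A single host no longer
  // collapses into one fetch list, and per-host politeness is lost.
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.io.Writable;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.Partitioner;

  public class PartitionUrlEvenly implements Partitioner<Text, Writable> {

    public void configure(JobConf job) {
      // nothing to configure for this sketch
    }

    public int getPartition(Text url, Writable value, int numReduceTasks) {
      // Hash the full url so urls from one host spread over all partitions.
      return (url.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
  }

On the threads setting: the exact property name differs between Nutch versions
(some releases call it fetcher.threads.per.host), so check conf/nutch-default.xml
in your release for the key and override it in nutch-site.xml.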
