> When generator runs in distributed mode, it partitions urls to separate
> map tasks according to their hosts. This way, urls under the same host
> end up in the same map task (which is necessary for politeness). So, in
> your case, you either have very few hosts (of which one has almost 100K
> urls) or there is a problem with partitioning.
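For reference, this kind of host-based partitioning in Hadoop comes down to a
partitioner that hashes on the host part of the URL, so every URL of a host
lands in the same map/fetch task. A minimal sketch, written against the newer
Hadoop MapReduce API for brevity; the class name and key type are illustrative
assumptions, not the actual Nutch Generator code:

import java.net.MalformedURLException;
import java.net.URL;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Partitioner;

// Illustrative only: routes URLs to partitions by hashing their host, so all
// URLs of one host end up in a single partition (and thus one fetch task).
public class HostPartitioner extends Partitioner<Text, Writable> {

  @Override
  public int getPartition(Text urlKey, Writable value, int numPartitions) {
    String host;
    try {
      host = new URL(urlKey.toString()).getHost();
    } catch (MalformedURLException e) {
      host = urlKey.toString();  // fall back to the raw key for malformed urls
    }
    // Same host -> same hash -> same partition.
    return (host.hashCode() & Integer.MAX_VALUE) % numPartitions;
  }
}

With a partitioner like this, a crawl where nearly all 100K urls share one
host funnels almost everything into a single task, which matches the behavior
described above.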
Got it. Yup, all the urls are from one host. I understand it's not polite,
but is there any configuration setting that'll change that?

patrik

-----Original Message-----
From: Doğacan Guney [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, June 12, 2007 12:05 AM
To: [EMAIL PROTECTED]
Subject: Re: Nutch/Hadoop Fetcher confusion

Hi,

On 6/12/07, patrik <[EMAIL PROTECTED]> wrote:
>
> I'm running Nutch 0.8.1 on 3 servers. Everything works fine, but I'm
> confused about some Fetcher behavior. I'll generate a list of 100k
> urls to fetch, that works fine. However, only 1 server in the cluster
> actually fetches a reasonable number. 2 out of three go get at most 20
> pages. I've gotta believe I'm just missing some important
> configuration settings.

When generator runs in distributed mode, it partitions urls to separate
map tasks according to their hosts. This way, urls under the same host
end up in the same map task (which is necessary for politeness). So, in
your case, you either have very few hosts (of which one has almost 100K
urls) or there is a problem with partitioning.

>
> Patrik
>
[...snip...]

--
Doğacan Güney
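The per-host politeness settings, at least, are ordinary Nutch properties
that can be overridden in conf/nutch-site.xml. A sketch, assuming the stock
property names from nutch-default.xml (fetcher.threads.per.host,
fetcher.server.delay); the values are examples only, and this tunes how fast
one host is fetched rather than changing how the generator partitions by
host:

<!-- Sketch of a conf/nutch-site.xml override. Property names are assumed to
     match the stock nutch-default.xml; values are examples only. -->
<configuration>
  <property>
    <name>fetcher.threads.per.host</name>
    <value>2</value>
    <description>Max number of fetcher threads hitting one host at a time.</description>
  </property>
  <property>
    <name>fetcher.server.delay</name>
    <value>1.0</value>
    <description>Seconds the fetcher waits between requests to the same server.</description>
  </property>
</configuration>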
