>When the generator runs in distributed mode, it partitions urls into separate
>map tasks according to their hosts. This way, urls under the same host
>end up in the same map task (which is necessary for politeness). So, in
>your case, you either have very few hosts (of which one has almost 100K
>urls) or there is a problem with partitioning.

Got it. Yup, all the urls are from one host. I understand it's not
polite, but is there any configuration setting that'll change that?

patrik

-----Original Message-----
From: Doğacan Güney [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, June 12, 2007 12:05 AM
To: [EMAIL PROTECTED]
Subject: Re: Nutch/Hadoop Fetcher confusion


Hi,

On 6/12/07, patrik <[EMAIL PROTECTED]> wrote:
>
> I'm running Nutch 0.8.1 on 3 servers. Everything works fine, but I'm 
> confused about some Fetcher behavior. I'll generate a list of 100k 
> urls to fetch, that works fine. However, only 1 server in the cluster 
> actually fetches a reasonable number. The other two fetch at most 20
> pages. I've gotta believe I'm just missing some important
> configuration settings.

When the generator runs in distributed mode, it partitions urls into separate
map tasks according to their hosts. This way, urls under the same host
end up in the same map task (which is necessary for politeness). So, in
your case, you either have very few hosts (of which one has almost 100K
urls) or there is a problem with partitioning.
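To make the host-based partitioning concrete, here is a minimal Java sketch, assuming a simple "hash of the host name modulo the number of partitions" scheme. It is not Nutch's own partitioner class; the names `HostPartitionSketch`, `partitionFor` and `countPerPartition` are hypothetical and only illustrate why every url from one host ends up in the same task.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch of host-based url partitioning.
 * Assumption: partition = hash(host) mod numPartitions. Not Nutch's actual class.
 */
public class HostPartitionSketch {

    /** Picks one of numPartitions buckets using only the url's host name. */
    public static int partitionFor(String url, int numPartitions) {
        String host = URI.create(url).getHost().toLowerCase(Locale.ROOT);
        // Mask the sign bit so the result of % is never negative.
        return (host.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    /** Counts how many urls land in each partition. */
    public static Map<Integer, Integer> countPerPartition(List<String> urls, int numPartitions) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String u : urls) {
            counts.merge(partitionFor(u, numPartitions), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> urls = new ArrayList<>();
        // Almost all urls share a single host.
        for (int i = 0; i < 100_000; i++) {
            urls.add("http://example.com/page" + i);
        }
        // A handful of urls on other hosts.
        for (int i = 0; i < 20; i++) {
            urls.add("http://host" + i + ".example.org/");
        }
        // Three partitions, e.g. one fetch task per server in a 3-node cluster.
        System.out.println(countPerPartition(urls, 3));
    }
}
```

Because the partition depends only on the host, all 100,000 example.com urls fall into one partition, and the other partitions receive only some of the 20 remaining urls: the same lopsided pattern of one busy fetcher and two nearly idle ones.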

>
> Patrik
>
[...snip...]


-- 
Doğacan Güney


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
