Duplicate Input and duplicate result

zehua Mon, 08 Dec 2008 14:58:31 -0800

We use Hadoop and Nutch to crawl the website. We fetch the URL list from
a SQL server and split it across the cluster. As we increase the number
of mappers, the number of duplicate results increases with it. For
example, with 2 mappers each record may appear twice; with 8 instances,
each result appears 8 times. Any idea what causes this? Where could the
problem be?
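
To illustrate the symptom, here is a toy sketch in plain Python. It is not
our actual Hadoop/Nutch job; the function names and example URLs are made
up, and the cause it shows is only a guess. If every mapper reads the whole
URL list instead of its own slice, the output is repeated once per mapper:

```python
def full_read(urls, num_mappers):
    """Hypothetical faulty setup: every mapper loads the entire URL list."""
    output = []
    for _ in range(num_mappers):
        output.extend(urls)  # each mapper emits every URL
    return output


def partitioned_read(urls, num_mappers):
    """Hypothetical intended setup: mapper i takes positions i, i+n, i+2n, ..."""
    output = []
    for mapper_id in range(num_mappers):
        output.extend(urls[mapper_id::num_mappers])  # disjoint slices
    return output


# Placeholder URLs, for illustration only.
urls = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]

for n in (2, 8):
    print(n, len(full_read(urls, n)), len(partitioned_read(urls, n)))
```

With the full read, the record count grows with the number of mappers; with
disjoint slices it stays equal to the size of the input list.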
-- 
View this message in context: 
http://www.nabble.com/Duplicate-Input-and-duplicate-result-tp20905297p20905297.html
Sent from the Hadoop core-dev mailing list archive at Nabble.com.
