We use Hadoop and Nutch to crawl websites. We fetch the URL list from a
SQL server and split it among the nodes of the cluster. When we increase
the number of mappers, the number of duplicate results increases in
proportion: with 2 mappers each record may appear twice, and with 8
mappers each result is duplicated 8 times. Any idea what could cause
this, or where the problem might be?
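Duplication that grows exactly with the mapper count is the pattern you would expect if each map task read the whole URL list instead of its own share. That is an assumption: the post does not show how the list is split. The minimal Python sketch below contrasts a disjoint round-robin split with a split where every mapper gets the full list. The function names `split_urls` and `buggy_split` and the example.com URLs are hypothetical placeholders, not Hadoop or Nutch APIs.

```python
from typing import List


def split_urls(urls: List[str], num_mappers: int) -> List[List[str]]:
    """Disjoint round-robin split: URL i goes only to mapper i % num_mappers."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be >= 1")
    splits: List[List[str]] = [[] for _ in range(num_mappers)]
    for i, url in enumerate(urls):
        splits[i % num_mappers].append(url)
    return splits


def buggy_split(urls: List[str], num_mappers: int) -> List[List[str]]:
    """Hypothetical faulty split: every mapper receives the whole list."""
    return [list(urls) for _ in range(num_mappers)]


if __name__ == "__main__":
    urls = [f"http://example.com/page{i}" for i in range(10)]
    for n in (2, 8):
        good = [u for part in split_urls(urls, n) for u in part]
        bad = [u for part in buggy_split(urls, n) for u in part]
        # Total URLs processed: disjoint split vs. full list per mapper.
        print(n, len(good), len(bad))
```

With 10 URLs, the disjoint split processes 10 URLs for any mapper count, while the faulty split processes 20 with 2 mappers and 80 with 8. That is the same N-fold growth described above, so checking whether each map task receives only its own slice of the query result may be a reasonable first step.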
-- 
View this message in context: 
http://www.nabble.com/Duplicate-Input-and-duplicate-result-tp20905297p20905297.html
Sent from the Hadoop core-dev mailing list archive at Nabble.com.
