Gal Nitzan wrote:
> I noticed all tasktrackers are participating in the fetch.
> I have only one site in the injected seed file, and of my
> 5 tasktrackers, all except one access the same site.
I just fixed a bug related to this. Please try updating.

The problem was that MapReduce recently started supporting speculative
execution: if some tasks appear to be executing slowly and there are
idle nodes, those tasks are automatically run in parallel on another
node, and the results of whichever copy finishes first are used. But
this is not appropriate for fetching, since speculative copies of a
fetch task would hit the same sites in parallel. So I just added a
mechanism to Hadoop to disable speculative execution per-job, and then
disabled it in the Fetcher.
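If you can't update yet, a stopgap sketch (assuming the property name
follows Hadoop's standard convention, mapred.speculative.execution) is
to turn speculative execution off cluster-wide in your site
configuration:

```xml
<!-- hadoop-site.xml: disable speculative execution cluster-wide.
     The updated Fetcher now does this per-job, so this override is
     only needed on older builds. The property name is an assumption
     based on Hadoop's configuration naming convention. -->
<property>
  <name>mapred.speculative.execution</name>
  <value>false</value>
</property>
```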
Note also that the slaves file is now located in the conf/ directory,
as is a new file named hadoop-env.sh. The latter contains all relevant
environment variables, so we no longer have to rely on ssh's
environment-passing feature.
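For example, conf/hadoop-env.sh might look something like this (a
minimal sketch; the exact variable names beyond JAVA_HOME are
assumptions, and the paths are placeholders for your own install):

```sh
# conf/hadoop-env.sh -- sourced by the Hadoop start-up scripts on each
# node, so remote daemons get their environment from this file rather
# than from ssh's environment passing.

# Where your JVM is installed; adjust to your own path.
export JAVA_HOME=/usr/lib/jvm/java

# Optional knobs (names are assumptions, shown here commented out):
# export HADOOP_LOG_DIR=/var/log/hadoop   # where daemon logs go
# export HADOOP_SSH_OPTS="-o ConnectTimeout=5"
```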
Doug
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers