Gal Nitzan wrote:
> I noticed all tasktrackers are participating in the fetch.
> I have only one site in the injected seed file, and of my
> 5 tasktrackers, all except one access the same site.
I just fixed a bug related to this. Please try updating.

The problem was that MapReduce recently started supporting speculative
execution: if some tasks appear to be executing slowly and there are
idle nodes, those tasks are automatically run in parallel on another
node, and the results of whichever copy finishes first are used. But
this is not appropriate for fetching, since speculative copies of a
fetch task would hit the same sites in parallel. So I just added a
mechanism to Hadoop to disable speculative execution per-job, and then
disabled it in the Fetcher.
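If you can't update yet, a stopgap sketch (assuming the property name
follows Hadoop's standard convention, mapred.speculative.execution) is
to turn speculative execution off cluster-wide in your site
configuration:

```xml
<!-- hadoop-site.xml: disable speculative execution cluster-wide.
     The updated Fetcher now does this per-job, so this override is
     only needed on older builds. The property name is an assumption
     based on Hadoop's configuration naming convention. -->
<property>
  <name>mapred.speculative.execution</name>
  <value>false</value>
</property>
```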
Note also that the slaves file is now located in the conf/ directory,
as is a new file named hadoop-env.sh. The latter contains all relevant
environment variables, so we no longer have to rely on ssh's
environment-passing feature.
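For example, conf/hadoop-env.sh might look something like this (a
minimal sketch; the exact variable names beyond JAVA_HOME are
assumptions, and the paths are placeholders for your own install):

```sh
# conf/hadoop-env.sh -- sourced by the Hadoop start-up scripts on each
# node, so remote daemons get their environment from this file rather
# than from ssh's environment passing.

# Where your JVM is installed; adjust to your own path.
export JAVA_HOME=/usr/lib/jvm/java

# Optional knobs (names are assumptions, shown here commented out):
# export HADOOP_LOG_DIR=/var/log/hadoop   # where daemon logs go
# export HADOOP_SSH_OPTS="-o ConnectTimeout=5"
```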
Doug
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers