I tried to launch mapred on 2 machines: 192.168.0.250 and 192.168.0.111.
In nutch-site.xml I specified these parameters:
1) On both machines:
<property>
  <name>fs.default.name</name>
  <value>192.168.0.250:9009</value>
  <description>The name of the default file system. Either the
  literal string "local" or a host:port for NDFS.</description>
</property>
<property>
  <name>mapred.job.tracker</name>
  <value>192.168.0.250:9010</value>
  <description>The host and port that the MapReduce job tracker runs
  at. If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>
<property>
  <name>mapred.map.tasks</name>
  <value>2</value>
  <description>The default number of map tasks per job. Typically set
  to a prime several times greater than number of available hosts.
  Ignored when mapred.job.tracker is "local".
  </description>
</property>
<property>
  <name>mapred.tasktracker.tasks.maximum</name>
  <value>2</value>
  <description>The maximum number of tasks that will be run
  simultaneously by a task tracker.
  </description>
</property>
<property>
  <name>mapred.reduce.tasks</name>
  <value>2</value>
  <description>The default number of reduce tasks per job. Typically set
  to a prime close to the number of available hosts. Ignored when
  mapred.job.tracker is "local".
  </description>
</property>
On 192.168.0.250 I started:
2) bin/nutch-daemon.sh start datanode
3) bin/nutch-daemon.sh start namenode
4) bin/nutch-daemon.sh start jobtracker
5) bin/nutch-daemon.sh start tasktracker
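To make the start sequence above repeatable, it could be wrapped in a small script. This is only a sketch: the names start_daemons and DRY_RUN are made up for illustration and are not part of Nutch; only the bin/nutch-daemon.sh start commands come from the steps above. By default it just prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: start the four daemons in the order listed above.
# start_daemons and DRY_RUN are hypothetical names, not part of Nutch.
# With DRY_RUN=1 (the default here) the commands are only printed.
DRY_RUN="${DRY_RUN:-1}"

start_daemons() {
    for daemon in datanode namenode jobtracker tasktracker; do
        cmd="bin/nutch-daemon.sh start $daemon"
        if [ "$DRY_RUN" = "1" ]; then
            # Dry run: show what would be executed.
            echo "$cmd"
        else
            # Real run: stop at the first daemon that fails to start.
            $cmd || return 1
        fi
    done
}
```

Calling start_daemons from the Nutch directory with DRY_RUN=0 would actually run the four commands.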
I created a directory seeds with a file urls in it. The urls file contained 2 links.
Then I added that directory to NDFS (bin/nutch ndfs -put ./seeds seeds).
The directory was added successfully.
Then I launched the command:
bin/nutch crawl seeds -depth 2
As a result I got this log written by the jobtracker:
051123 053118 Adding task 'task_m_z66npx' to set for tracker 'tracker_53845'
051123 053118 Adding task 'task_m_xaynqo' to set for tracker 'tracker_11518'
051123 053130 Task 'task_m_z66npx' has finished successfully.
Log written by tasktracker on 192.168.0.111:
..
051110 142607 task_m_z66npx 0.0% /user/root/seeds/urls:0+31
051110 142607 task_m_z66npx 1.0% /user/root/seeds/urls:0+31
051110 142607 Task task_m_z66npx is done.
Log written by tasktracker on 192.168.0.250:
051123 053125 task_m_xaynqo 0.12903225% /user/root/seeds/urls:31+31
051123 053126 task_m_xaynqo -683.9677% /user/root/seeds/urls:31+31
051123 053127 task_m_xaynqo -2129.9678% /user/root/seeds/urls:31+31
051123 053128 task_m_xaynqo -3483.0322% /user/root/seeds/urls:31+31
051123 053129 task_m_xaynqo -4976.2256% /user/root/seeds/urls:31+31
051123 053130 task_m_xaynqo -6449.1934% /user/root/seeds/urls:31+31
051123 053131 task_m_xaynqo -7898.258% /user/root/seeds/urls:31+31
051123 053132 task_m_xaynqo -9232.193% /user/root/seeds/urls:31+31
051123 053133 task_m_xaynqo -10694.3545% /user/root/seeds/urls:31+31
051123 053134 task_m_xaynqo -12139.226% /user/root/seeds/urls:31+31
051123 053135 task_m_xaynqo -13416.677% /user/root/seeds/urls:31+31
051123 053136 task_m_xaynqo -14885.741% /user/root/seeds/urls:31+31
... and so on, i.e. the log kept recording ever-decreasing percentages.
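To spot this in a longer tasktracker log, something like the following could be used. It is only a sketch: it assumes every progress line has the same five fields as the excerpts above (date, time, task id, progress with a trailing %, split), and find_negative_progress is a made-up helper name, not a Nutch tool.

```shell
#!/bin/sh
# Sketch: report each task the first time its progress goes negative.
# Assumes log lines look like the tasktracker excerpts above:
#   <date> <time> <task id> <progress>% <split>
# find_negative_progress is a hypothetical helper, not part of Nutch.
find_negative_progress() {
    awk '$4 ~ /^-[0-9.]+%$/ {
             # Print only the first negative reading per task id.
             if (!($3 in seen)) {
                 seen[$3] = 1
                 print $3, "first negative at", $1, $2, $4
             }
         }' "$@"
}
```

It reads the files given as arguments, or standard input if none are given, so it can be fed whatever file the tasktracker wrote its log to.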
I concluded that this was an attempt to split the inject across the 2 machines,
i.e. there were 2 tasks: 'task_m_z66npx' and 'task_m_xaynqo'. 'task_m_z66npx'
finished successfully, while 'task_m_xaynqo' caused problems (negative
progress).
But if I change the parameter mapred.reduce.tasks to 4, all tasks finish
successfully and everything works correctly.
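For reference, the split labels in the logs (urls:0+31 and urls:31+31) look like offset+length, which would fit a 62-byte urls file cut into two equal map inputs. The arithmetic can be sketched as below. This is an assumption on my side, not the actual Nutch code: even_splits is a made-up helper, and giving the remainder to the last split is just one possible choice.

```shell
#!/bin/sh
# Sketch: cut a file of SIZE bytes into N contiguous splits, printed as
# offset+length like the split labels in the logs above.
# even_splits is a hypothetical helper; giving the remainder to the last
# split is an assumption, not necessarily what Nutch does.
even_splits() {
    size=$1
    n=$2
    len=$((size / n))
    i=0
    while [ "$i" -lt "$n" ]; do
        off=$((i * len))
        # The last split takes whatever is left of the file.
        if [ "$i" -eq $((n - 1)) ]; then
            len=$((size - off))
        fi
        echo "$off+$len"
        i=$((i + 1))
    done
}
```

Here even_splits 62 2 prints 0+31 and 31+31, the same two ranges that appear in the tasktracker logs for the two map tasks.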
-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 22, 2005 2:10 AM
To: nutch-dev@lucene.apache.org
Subject: Re: mapred.map.tasks
[EMAIL PROTECTED] wrote:
> Why we need parameter mapred.map.tasks greater than number of available
> host? If we set it equal to number of host, we got negative progress
> percentages problem.
Can you please post a simple example that demonstrates the negative
progress problem? E.g., the minimal changes to your conf/ directory
required to illustrate this, how you start your daemons, etc.
Thanks,
Doug