There is a mapred.tasktracker.map.tasks.maximum configuration variable,
which defaults to 2 for Hadoop in distributed mode. It sets the maximum
number of map tasks that run concurrently on each tasktracker. Even so,
you should have at least 1 task running per machine/tasktracker, and it
seems like you only have one running in total? Can you give any more
information about your setup or what you are seeing?
To answer your previous question, fetcher.threads.fetch defines the
number of fetcher threads inside each map task. So say you have 2 map
tasks on each tasktracker, 10 tasktrackers, and 20 threads per task:
2 * 10 * 20 = 400 concurrent fetchers.
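
As a rough sketch, the arithmetic above can be written out like this. The
class and method names are made up for illustration and are not part of
Hadoop or Nutch; the three inputs stand for
mapred.tasktracker.map.tasks.maximum, the number of tasktrackers, and
fetcher.threads.fetch, and the result is an upper bound that assumes every
map slot is actually busy.

```java
// Hypothetical helper for illustration only; not a Hadoop or Nutch class.
class FetcherCapacity {

    // Upper bound on concurrent fetcher threads across the cluster,
    // assuming every map slot on every tasktracker is running a fetch task.
    static int concurrentFetchers(int mapTasksPerTracker,
                                  int taskTrackers,
                                  int threadsPerTask) {
        return mapTasksPerTracker * taskTrackers * threadsPerTask;
    }

    public static void main(String[] args) {
        // 2 map tasks per tasktracker, 10 tasktrackers, 20 threads per task.
        System.out.println(concurrentFetchers(2, 10, 20)); // prints 400

        // If only one map task runs at a time in the whole cluster,
        // only that task's threads are fetching.
        System.out.println(concurrentFetchers(1, 1, 20)); // prints 20
    }
}
```

The second call shows why tasks running one by one hurts so much: the
thread count per task stays the same, but the other map slots sit idle.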
Dennis
caezar wrote:
From many domains. It generates 15 tasks, as configured. Each task is
executed, but one by one; no tasks run simultaneously.
Dennis Kubes-2 wrote:
Are you fetching URLs from a random set or all from a single domain? If
all from a single domain (including subdomains) then the partitioner for
the fetcher will put them all into a single map task.
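
To illustrate the idea, here is a minimal sketch of partitioning by
domain. It is not the actual Nutch partitioner; the class and method names
are hypothetical, and taking the last two host labels as the "domain" is a
crude assumption that breaks for suffixes like co.uk. It only shows why
URLs that share a domain hash to the same map task.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not the real Nutch partitioner.
class DomainPartitionSketch {

    // Crude domain key: the last two labels of the host name.
    // Assumes the URL is well formed and has a host.
    static String domainKey(String url) {
        String host = URI.create(url).getHost();
        String[] labels = host.split("\\.");
        int n = labels.length;
        return n <= 2 ? host : labels[n - 2] + "." + labels[n - 1];
    }

    // Map a URL to one of numTasks partitions by hashing its domain key.
    // The mask keeps the hash non-negative before taking the modulus.
    static int partition(String url, int numTasks) {
        return (domainKey(url).hashCode() & Integer.MAX_VALUE) % numTasks;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList(
                "http://www.example.com/a",
                "http://news.example.com/b",
                "http://example.com/c",
                "http://www.example.org/d");
        for (String url : urls) {
            // The three example.com URLs print the same partition number.
            System.out.println(partition(url, 15) + "  " + url);
        }
    }
}
```

With 15 tasks configured, a crawl confined to one domain would send every
URL to the same partition under this scheme, leaving the other 14 tasks
with nothing to do.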
Dennis