[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867205#action_12867205
 ] 

Hemanth Yamijala commented on MAPREDUCE-1781:
---------------------------------------------

bq. Is it possible to specify that I want 4 mappers/processors, or am I 
limited to a static value at the startup of Hadoop?

The configuration per tasktracker can be different for each node, in general. 
However, that makes managing configurations much harder. Does that work for you 
now, though?
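
For example, to cap a particular node at one concurrent map task, you would set 
something like this in that node's conf/mapred-site.xml and then restart the 
tasktracker there (just a sketch, assuming a 0.20-style layout under 
/opt/hadoop as in your command below):

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>1</value>
  </property>

  # restart the tasktracker on that node so it re-reads mapred-site.xml
  /opt/hadoop/bin/hadoop-daemon.sh stop tasktracker
  /opt/hadoop/bin/hadoop-daemon.sh start tasktracker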

bq. which parameters are set at startup and which at job runtime.

OK. Possibly you should file a JIRA asking for this to be documented. But the 
general rule of thumb is that configurations whose names contain the name of a 
daemon, like 'tasktracker', are start-up-only parameters, while configurations 
whose names contain 'job' or 'task' can be overridden per job.
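
To make that concrete with the options from the command in this issue (same 
values, just annotated):

  # honoured per job: names contain 'map'/'task' but no daemon name
  -D mapred.map.tasks=30
  -D mapred.reduce.tasks=0

  # has no effect when passed with -D: 'tasktracker' in the name means the
  # tasktracker reads it once at start-up from its own mapred-site.xml
  -D mapred.tasktracker.map.tasks.maximum=1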

> option "-D mapred.tasktracker.map.tasks.maximum=1" does not work when no of 
> mappers is bigger than no of nodes - always spawns 2 mappers/node
> --------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1781
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1781
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 0.20.2
>         Environment: Debian Lenny x64, and Hadoop 0.20.2, 2GB RAM
>            Reporter: Tudor Vlad
>
> Hello
> I am a new user of Hadoop and I have some trouble using Hadoop Streaming and 
> the "-D mapred.tasktracker.map.tasks.maximum" option. 
> I'm experimenting with an unmanaged application (C++) which I want to run 
> over several nodes in 2 scenarios:
> 1) the number of maps (input splits) is equal to the number of nodes
> 2) the number of maps is a multiple of the number of nodes (5, 10, 20, ...)
> Initially, when running the tests in scenario 1, I would sometimes get 2 
> processes/node on half the nodes. However, I fixed this by adding the option 
> "-D mapred.tasktracker.map.tasks.maximum=1", so everything worked fine.
> In the case of scenario 2 (more maps than nodes) this directive no longer 
> works; I always get 2 processes/node. I even tested with maximum=5 and I 
> still get 2 processes/node.
> The entire command I use is:
> /usr/bin/time --format="-duration:\t%e |\t-MFaults:\t%F 
> |\t-ContxtSwitch:\t%w" \
>  /opt/hadoop/bin/hadoop jar 
> /opt/hadoop/contrib/streaming/hadoop-0.20.2-streaming.jar \
>  -D mapred.tasktracker.map.tasks.maximum=1 \
>  -D mapred.map.tasks=30 \
>  -D mapred.reduce.tasks=0 \
>  -D io.file.buffer.size=5242880 \
>  -libjars "/opt/hadoop/contrib/streaming/hadoop-7debug.jar" \
>  -input input/test \
>  -output out1 \
>  -mapper "/opt/jobdata/script_1k" \
>  -inputformat "me.MyInputFormat"
> Why is this happening, and how can I make it work properly (i.e. be able to 
> limit exactly how many mappers run at one time per node)?
> Thank you in advance

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
