Re: Overriding mapred.tasktracker.map.tasks.maximum with -jobconf
Yes. The configuration is read only when the TaskTracker starts. See JIRA HADOOP-5170 (http://issues.apache.org/jira/browse/HADOOP-5170) for discussion of making it settable per job.

-Amareshwari

jason hadoop wrote:
> I certainly hope it changes, but I am unaware that it is in the todo queue
> at present.
Re: Overriding mapred.tasktracker.map.tasks.maximum with -jobconf
I see, John. I also use 0.19. Just to note: the -D option should come first, since it is one of the generic options. I use it without any errors.

Cheers,
Rasit

2009/2/18 S D
> Thanks for your response Rasit. You may have missed a portion of my post.
> ...
> I'm using Hadoop 0.19.0 and -D is not working. Are you using version 0.19.0
> as well?
>
> John

--
M. Raşit ÖZDAŞ
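To make the ordering concrete, here is a sketch of a 0.19 streaming invocation with the generic option placed first. The jar path, input/output paths, and mapper are placeholder examples, not taken from the thread:

```shell
# Generic options (-D, -conf, -fs, -jt, -files, -libjars, -archives) must
# appear BEFORE any streaming options; putting -D after -input/-mapper is
# what triggers the usage message instead of running the job.
bin/hadoop jar contrib/streaming/hadoop-streaming.jar \
    -D mapred.reduce.tasks=0 \
    -input  input/data.txt \
    -output output \
    -mapper /bin/cat
```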
Re: Overriding mapred.tasktracker.map.tasks.maximum with -jobconf
I certainly hope it changes, but I am unaware that it is in the todo queue at present.

2009/2/18 S D
> Thanks Jason. That's useful information. Are you aware of plans to change
> this so that the maximum values can be changed without restarting the
> server?
>
> John
Re: Overriding mapred.tasktracker.map.tasks.maximum with -jobconf
Thanks Jason. That's useful information. Are you aware of plans to change this so that the maximum values can be changed without restarting the server?

John

2009/2/18 jason hadoop
> The .maximum values are only loaded by the TaskTrackers at server start
> time at present, and any changes you make will be ignored.
Re: Overriding mapred.tasktracker.map.tasks.maximum with -jobconf
The .maximum values are only loaded by the TaskTrackers at server start time at present, and any changes you make will be ignored.

2009/2/18 S D
> Thanks for your response Rasit. You may have missed a portion of my post.
> ...
> I'm using Hadoop 0.19.0 and -D is not working. Are you using version 0.19.0
> as well?
>
> John
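Since the values are only read at TaskTracker startup, applying a change means restarting the MapReduce daemons. With the stock 0.19 control scripts that would look roughly like this (a sketch, run from $HADOOP_HOME on the master node):

```shell
# Restart the JobTracker and all TaskTrackers so they re-read
# mapred.tasktracker.map.tasks.maximum from hadoop-site.xml.
bin/stop-mapred.sh     # stops the JobTracker and the TaskTrackers it manages
bin/start-mapred.sh    # starts them again with the new configuration
```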
Re: Overriding mapred.tasktracker.map.tasks.maximum with -jobconf
Thanks for your response Rasit. You may have missed a portion of my post:

> On a different note, when I attempt to pass params via -D I get a usage
> message; when I use -jobconf the command goes through (and works in the
> case of mapred.reduce.tasks=0, for example) but I get a deprecation
> warning.

I'm using Hadoop 0.19.0 and -D is not working. Are you using version 0.19.0 as well?

John

On Wed, Feb 18, 2009 at 9:14 AM, Rasit OZDAS wrote:
> John, did you try the -D option instead of -jobconf? I had the -D option in
> my code; I changed it to -jobconf, and this is what I get:
> ...
> I think -jobconf is not used in v.0.19.
>
> --
> M. Raşit ÖZDAŞ
Re: Overriding mapred.tasktracker.map.tasks.maximum with -jobconf
John, did you try the -D option instead of -jobconf? I had the -D option in my code; I changed it to -jobconf, and this is what I get:

...
...
Options:
  -input    <path>     DFS input file(s) for the Map step
  -output   <path>     DFS output directory for the Reduce step
  -mapper   <cmd|JavaClassName>      The streaming command to run
  -combiner <JavaClassName> Combiner has to be a Java class
  -reducer  <cmd|JavaClassName>      The streaming command to run
  -file     <file>     File/dir to be shipped in the Job jar file
  -inputformat TextInputFormat(default)|SequenceFileAsTextInputFormat|JavaClassName  Optional.
  -outputformat TextOutputFormat(default)|JavaClassName  Optional.
  -partitioner JavaClassName  Optional.
  -numReduceTasks <num>  Optional.
  -inputreader <spec>  Optional.
  -cmdenv   <n>=<v>    Optional. Pass env.var to streaming commands
  -mapdebug <path>     Optional. To run this script when a map task fails
  -reducedebug <path>  Optional. To run this script when a reduce task fails
  -verbose

Generic options supported are
  -conf <configuration file>     specify an application configuration file
  -D <property=value>            use value for given property
  -fs <local|namenode:port>      specify a namenode
  -jt <local|jobtracker:port>    specify a job tracker
  -files <comma separated list of files>    specify comma separated files to
    be copied to the map reduce cluster
  -libjars <comma separated list of jars>    specify comma separated jar
    files to include in the classpath.
  -archives <comma separated list of archives>    specify comma separated
    archives to be unarchived on the compute machines.

The general command line syntax is
  bin/hadoop command [genericOptions] [commandOptions]

For more details about these options:
  Use $HADOOP_HOME/bin/hadoop jar build/hadoop-streaming.jar -info

I think -jobconf is not used in v.0.19.

2009/2/18 S D
> I'm having trouble overriding the maximum number of map tasks that run on a
> given machine in my cluster. ...
> This is unsatisfactory since I'll often be changing the maximum on a
> per-job basis. Any hints?

--
M. Raşit ÖZDAŞ
Overriding mapred.tasktracker.map.tasks.maximum with -jobconf
I'm having trouble overriding the maximum number of map tasks that run on a given machine in my cluster. The default value of mapred.tasktracker.map.tasks.maximum is set to 2 in hadoop-default.xml. When running my job I passed

  -jobconf mapred.tasktracker.map.tasks.maximum=1

to limit map tasks to one per machine, but each machine was still allocated 2 map tasks (simultaneously). The only way I was able to guarantee a maximum of one map task per machine was to change the value of the property in hadoop-site.xml. This is unsatisfactory since I'll often be changing the maximum on a per-job basis. Any hints?

On a different note, when I attempt to pass params via -D I get a usage message; when I use -jobconf the command goes through (and works in the case of mapred.reduce.tasks=0, for example) but I get a deprecation warning.

Thanks,
John
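For reference, the hadoop-site.xml override that did take effect would look like this. This is a config sketch; as the thread goes on to establish, the property is per TaskTracker and is only read at TaskTracker startup, not per job:

```xml
<!-- hadoop-site.xml: cap simultaneous map tasks at 1 per TaskTracker.
     Read only when the TaskTracker starts, so it cannot be changed
     on a per-job basis. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>1</value>
</property>
```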