[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14334598#comment-14334598
 ] 

Ratandeep Ratti commented on MAPREDUCE-5653:
--------------------------------------------

Hi [~jlowe], [~mithun]
We have also been hit by this recently, and I spent some time investigating it.

DistCp has two modes of execution: one from the command line, and the other 
programmatic.

The patch will work correctly for programmatic usage if the settings from 
mapred-site.xml have already been applied to the input *Configuration* 
parameter: the properties set by distcp-default.xml will then not be 
overridden again, since mapred-site (and also 
mapred-default/yarn-default/yarn-site) is loaded as a *default resource* 
before job submission.

For command-line usage, DistCp adds distcp-default.xml as a *resource* (not as 
a default resource), which takes higher precedence than the default/site files 
mentioned above, because those are loaded as *default resources*. Even if 
DistCp added distcp-default.xml as a default resource, the code would be 
brittle and dependent on the order in which default resources are loaded, 
since mapred-site/mapred-default/yarn-site/yarn-default are all loaded in 
static blocks in the classes org.apache.hadoop.mapreduce.{Job, Cluster}.
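A minimal sketch of the precedence problem described above, assuming a 
simplified model of Hadoop's Configuration load order (default resources 
first, then resources added to the instance, with a later file overwriting an 
earlier one). LayeredConf and the property values are hypothetical; final 
properties, deprecated-key aliasing and values set via set() are ignored, and 
both files are assumed to use the same key name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, simplified model of Hadoop Configuration load order; not Hadoop's code.
 * Ignores "final" properties, deprecated-key aliasing and values set via set().
 * In Hadoop the default-resource list is static and shared by all instances;
 * it is per-instance here only to keep the demo self-contained.
 */
public class LayeredConf {
    private final List<Map<String, String>> defaultResources = new ArrayList<>();
    private final List<Map<String, String>> resources = new ArrayList<>();

    void addDefaultResource(Map<String, String> file) { defaultResources.add(file); }

    void addResource(Map<String, String> file) { resources.add(file); }

    /** Default resources load first, then added resources; a later file overwrites an earlier one. */
    String get(String key) {
        Map<String, String> merged = new HashMap<>();
        defaultResources.forEach(merged::putAll);
        resources.forEach(merged::putAll);
        return merged.get(key);
    }

    public static void main(String[] args) {
        String key = "mapreduce.map.memory.mb";
        Map<String, String> mapredSite = Map.of(key, "from-mapred-site");
        Map<String, String> distcpDefault = Map.of(key, "from-distcp-default");

        // Command line: site file is a default resource, distcp-default.xml a plain resource.
        LayeredConf cli = new LayeredConf();
        cli.addDefaultResource(mapredSite);
        cli.addResource(distcpDefault);
        System.out.println(cli.get(key)); // from-distcp-default

        // Both as default resources: the winner depends on registration order.
        LayeredConf distcpFirst = new LayeredConf();
        distcpFirst.addDefaultResource(distcpDefault);
        distcpFirst.addDefaultResource(mapredSite);
        LayeredConf siteFirst = new LayeredConf();
        siteFirst.addDefaultResource(mapredSite);
        siteFirst.addDefaultResource(distcpDefault);
        // prints "from-mapred-site vs from-distcp-default"
        System.out.println(distcpFirst.get(key) + " vs " + siteFirst.get(key));
    }
}
```

The second case is why switching distcp-default.xml to a default resource 
would only move the problem: the outcome hinges on which class's static block 
registers its files first.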

Since DistCp is just like any other MR job, I think the best approach would be 
to remove the unneeded configuration from distcp-default.xml.
Below are the properties set in distcp-default.xml:
{noformat}
distcp.dynamic.strategy.impl
distcp.static.strategy.impl
mapred.job.map.memory.mb
mapred.job.reduce.memory.mb
mapred.reducer.new-api
mapreduce.reduce.class
{noformat}

It seems that removing {noformat}mapred.job.{map|reduce}.memory.mb{noformat} 
is all we need, as the rest are required by DistCp.
Any other configuration the user wants to pass to DistCp can just as well be 
specified as JVM options for command-line usage, and as plain properties on 
the input *Configuration* for programmatic usage.
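A rough sketch of the proposed change, reusing the property names listed above 
but with placeholder values. TrimDistCpDefaults, withoutMemoryKeys and resolve 
are hypothetical helpers, and the site file is assumed to set the same legacy 
key name, since deprecated-key aliasing is not modelled.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch: strip the memory keys from the DistCp defaults so that
 * whatever the site files supply wins. Values are placeholders; deprecated-key
 * aliasing (mapred.job.map.memory.mb -> mapreduce.map.memory.mb) is not modelled.
 */
public class TrimDistCpDefaults {
    static final Set<String> MEMORY_KEYS =
        Set.of("mapred.job.map.memory.mb", "mapred.job.reduce.memory.mb");

    /** Returns a copy of the defaults without the memory-related keys. */
    static Map<String, String> withoutMemoryKeys(Map<String, String> defaults) {
        Map<String, String> trimmed = new LinkedHashMap<>(defaults);
        trimmed.keySet().removeAll(MEMORY_KEYS);
        return trimmed;
    }

    /** Resolves a key across layers given in load order; the last layer defining it wins. */
    static String resolve(List<Map<String, String>> layersInLoadOrder, String key) {
        String value = null;
        for (Map<String, String> layer : layersInLoadOrder) {
            if (layer.containsKey(key)) {
                value = layer.get(key);
            }
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> distcpDefaults = new LinkedHashMap<>();
        for (String name : List.of("distcp.dynamic.strategy.impl", "distcp.static.strategy.impl",
                "mapred.job.map.memory.mb", "mapred.job.reduce.memory.mb",
                "mapred.reducer.new-api", "mapreduce.reduce.class")) {
            distcpDefaults.put(name, "from-distcp-default");
        }
        Map<String, String> site = Map.of("mapred.job.map.memory.mb", "from-mapred-site");
        String key = "mapred.job.map.memory.mb";

        // distcp-default.xml is loaded after the site files, so it is the later layer.
        System.out.println(resolve(List.of(site, distcpDefaults), key)); // from-distcp-default
        System.out.println(resolve(List.of(site, withoutMemoryKeys(distcpDefaults)), key)); // from-mapred-site
        System.out.println(withoutMemoryKeys(distcpDefaults).keySet());
    }
}
```

With the memory keys gone, the later DistCp layer no longer shadows the site 
value, while the four DistCp-specific keys stay in place.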

Please share your thoughts/concerns.

> DistCp does not honour config-overrides for mapreduce.[map,reduce].memory.mb
> ----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5653
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5653
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: distcp
>    Affects Versions: trunk, 0.23.9, 2.2.0
>            Reporter: Mithun Radhakrishnan
>            Assignee: Mithun Radhakrishnan
>         Attachments: MAPREDUCE-5653.branch-0.23.patch, MAPREDUCE-5653.branch-2.patch, MAPREDUCE-5653.trunk.patch
>
>
> When a DistCp job is run through Oozie (through a Java action that launches 
> DistCp), one sees that mapred.child.java.opts as set from the caller is 
> honoured by DistCp. However, DistCp doesn't seem to honour any overrides for 
> the configs mapreduce.[map,reduce].memory.mb.
> Problem has been identified. I'll post a patch shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
