[jira] [Commented] (MAPREDUCE-5785) Derive task attempt JVM max heap size and io.sort.mb automatically from mapreduce.*.memory.mb

2014-07-14 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060919#comment-14060919
 ] 

Rohini Palaniswamy commented on MAPREDUCE-5785:
---

I was taking a look at https://issues.apache.org/jira/browse/MAPREDUCE-5785 and 
https://issues.apache.org/jira/browse/TEZ-699 . 
 
MR:
{code}
public static final float DEFAULT_MEMORY_MB_HEAP_RATIO = 1.33f;

float heapRatio = conf.getFloat(MRJobConfig.MEMORY_MB_HEAP_RATIO,
    MRJobConfig.DEFAULT_MEMORY_MB_HEAP_RATIO);
int taskHeapSize = (int) Math.ceil(taskContainerMb / heapRatio);

public static final float DEFAULT_IO_SORT_MB_HEAP_RATIO = 0.5f;

ioSortMbPer = JobContext.DEFAULT_IO_SORT_MB_HEAP_RATIO;
sortmb = (int) (maxHeapMb * ioSortMbPer);
{code}
Tez:
{code}
public static final String TEZ_CONTAINER_MAX_JAVA_HEAP_FRACTION =
    TEZ_PREFIX + "container.max.java.heap.fraction";
public static final double TEZ_CONTAINER_MAX_JAVA_HEAP_FRACTION_DEFAULT = 0.8;

int maxMemory = (int) (resource.getMemory() * maxHeapFactor);
{code}

A few issues and inconsistencies that I see:
  - The MR one is really confusing. For the heap it is a division, while for io.sort.mb it 
is a multiplication. I think it would be easier to keep both the same to avoid 
confusion. I had to apply more of my grey cells to do the division; I would 
prefer multiplication to determine the percentage, as it is easier to compute 
mentally than division (a small sketch after this list illustrates both formulas).
  - io.sort.mb at 50% of the heap seems too high for a default value. Most 
Pig jobs that have huge bags would start failing.
  - Another issue: taking the defaults as they are now, for a 
4G container Tez Xmx = 3.2G and MR Xmx = 3.0G, while for an 
8G container Tez Xmx = 6.2G and MR Xmx = 6G. 
Though the defaults work well for 1 or 2G of memory, for larger containers they 
actually waste a lot of memory, considering that usually no more than 500M is 
needed for native memory even as Xmx keeps increasing. We should 
account for that factor in the calculation instead of computing Xmx as just a direct 
percentage of resource.mb.
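
As a rough illustration of the two formulas quoted above (standalone arithmetic only, not the actual MR or Tez code; class and method names are made up), the following sketch prints the Xmx that each default would yield for a few container sizes:
{code}
// Illustrative sketch: mirrors the divide-by-ratio (MR) and multiply-by-fraction (Tez)
// defaults quoted above.
public class HeapSizingSketch {
  static final float MR_MEMORY_MB_HEAP_RATIO = 1.33f;    // MR default: container / ratio
  static final double TEZ_MAX_JAVA_HEAP_FRACTION = 0.8;  // Tez default: container * fraction

  static int mrHeapMb(int containerMb) {
    return (int) Math.ceil(containerMb / MR_MEMORY_MB_HEAP_RATIO);
  }

  static int tezHeapMb(int containerMb) {
    return (int) (containerMb * TEZ_MAX_JAVA_HEAP_FRACTION);
  }

  public static void main(String[] args) {
    for (int mb : new int[] {1024, 2048, 4096, 8192}) {
      System.out.printf("container=%dMB  MR Xmx=%dMB  Tez Xmx=%dMB%n",
          mb, mrHeapMb(mb), tezHeapMb(mb));
    }
  }
}
{code}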

Tez settings are usually equivalents of the MR settings, with an internal mapping that 
picks up the MR setting if it is specified, so that it is easier for 
users to switch between frameworks. This is one thing I am seeing handled inconsistently 
in terms of how the value is specified, and it would be good to reconcile the two to 
have the same behavior.

 Derive task attempt JVM max heap size and io.sort.mb automatically from 
 mapreduce.*.memory.mb
 -

 Key: MAPREDUCE-5785
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5785
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: mr-am, task
Reporter: Gera Shegalov
Assignee: Gera Shegalov
 Attachments: MAPREDUCE-5785.v01.patch, MAPREDUCE-5785.v02.patch, 
 MAPREDUCE-5785.v03.patch


 Currently users have to set 2 memory-related configs per Job / per task type. 
 One first chooses some container size mapreduce.\*.memory.mb and then a 
 corresponding maximum Java heap size Xmx < mapreduce.\*.memory.mb. This 
 makes sure that the JVM's C-heap (native memory + Java heap) does not exceed 
 this mapreduce.*.memory.mb. If one forgets to tune Xmx, MR-AM might be 
 - allocating big containers whereas the JVM will only use the default 
 -Xmx200m.
 - allocating small containers that will OOM because Xmx is too high.
 With this JIRA, we propose to set Xmx automatically based on an empirical 
 ratio that can be adjusted. Xmx is not changed automatically if provided by 
 the user.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5956) MapReduce AM should not use maxAttempts to determine if this is the last retry

2014-07-14 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060941#comment-14060941
 ] 

Jason Lowe commented on MAPREDUCE-5956:
---

This is marked as a Blocker for 2.5.0 but fixed in 2.6.0.  Should this be 
committed to branch-2.5 as well?

 MapReduce AM should not use maxAttempts to determine if this is the last retry
 --

 Key: MAPREDUCE-5956
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5956
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: applicationmaster, mrv2
Affects Versions: 2.4.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Wangda Tan
Priority: Blocker
 Fix For: 2.6.0

 Attachments: MR-5956.patch, MR-5956.patch


 Found this while reviewing YARN-2074. The problem is that after YARN-2074, we 
 don't count AM preemption towards AM failures on RM side, but MapReduce AM 
 itself checks the attempt id against the max-attempt count to determine if 
 this is the last attempt.
 {code}
 public void computeIsLastAMRetry() {
   isLastAMRetry = appAttemptID.getAttemptId() >= maxAppAttempts;
 }
 {code}
 This causes issues w.r.t. deletion of the staging directory, etc.
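
 For illustration (made-up numbers, not code from the patch), here is why the raw attempt-id check can misfire once preempted attempts no longer count as failures:
 {code}
 // Hypothetical scenario: maxAppAttempts = 2, and the first attempt was preempted,
 // which the RM (after YARN-2074) does not count as a failure.
 public class LastRetrySketch {
   public static void main(String[] args) {
     int maxAppAttempts = 2;
     int attemptId = 2;          // id of the currently running attempt
     int preemptedAttempts = 1;  // earlier attempts that were preempted, not failed

     boolean amThinksLast = attemptId >= maxAppAttempts;                      // current AM check
     boolean rmWouldRetry = (attemptId - preemptedAttempts) < maxAppAttempts; // effective RM view

     // Prints: AM thinks last retry: true, RM would still retry: true
     System.out.println("AM thinks last retry: " + amThinksLast
         + ", RM would still retry: " + rmWouldRetry);
   }
 }
 {code}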



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5957) AM throws ClassNotFoundException with job classloader enabled if custom output format/committer is used

2014-07-14 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061245#comment-14061245
 ] 

Sangjin Lee commented on MAPREDUCE-5957:


I am working on a unit test that can confirm the bug and the fix. I'll submit a 
new patch with the unit test soon.

In the meantime, I would greatly appreciate your feedback on this. The main 
code changes are not affected by the unit test change.

 AM throws ClassNotFoundException with job classloader enabled if custom 
 output format/committer is used
 ---

 Key: MAPREDUCE-5957
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5957
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Sangjin Lee
Assignee: Sangjin Lee
 Attachments: MAPREDUCE-5957.patch


 With the job classloader enabled, the MR AM throws ClassNotFoundException if 
 a custom output format class is specified.
 {noformat}
 org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
 java.lang.RuntimeException: java.lang.ClassNotFoundException: Class 
 com.foo.test.TestOutputFormat not found
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:473)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:374)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1459)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1456)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1389)
 Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: 
 Class com.foo.test.TestOutputFormat not found
   at 
 org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1895)
   at 
 org.apache.hadoop.mapreduce.task.JobContextImpl.getOutputFormatClass(JobContextImpl.java:222)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:469)
   ... 8 more
 Caused by: java.lang.ClassNotFoundException: Class 
 com.foo.test.TestOutputFormat not found
   at 
 org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1801)
   at 
 org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1893)
   ... 10 more
 {noformat}
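
 For context, a minimal sketch of the kind of job setup that hits this path (com.foo.test.TestOutputFormat is the placeholder class from the stack trace and would have to be packaged in the job jar; input/output and mapper configuration are omitted):
 {code}
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.mapreduce.Job;

 public class ClassloaderCnfeRepro {
   public static void main(String[] args) throws Exception {
     Configuration conf = new Configuration();
     // Isolate the job's classes from the system classpath in the AM and tasks.
     conf.setBoolean("mapreduce.job.classloader", true);

     Job job = Job.getInstance(conf, "classloader-cnfe-repro");
     // The output format lives only in the job jar, not on the AM's system classpath.
     job.setJarByClass(com.foo.test.TestOutputFormat.class);
     job.setOutputFormatClass(com.foo.test.TestOutputFormat.class);

     // On submission the AM's createOutputCommitter() resolves the output format class,
     // which is where the ClassNotFoundException above is thrown.
     job.waitForCompletion(true);
   }
 }
 {code}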



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5951) Add support for the YARN Shared Cache

2014-07-14 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated MAPREDUCE-5951:


Attachment: MAPREDUCE-5951-trunk-v2.patch

Attached is a v2 patch. This fixes a bug around uploading the jobjar.

 Add support for the YARN Shared Cache
 -

 Key: MAPREDUCE-5951
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5951
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Attachments: MAPREDUCE-5951-trunk-v1.patch, 
 MAPREDUCE-5951-trunk-v2.patch


 Implement the necessary changes so that the MapReduce application can 
 leverage the new YARN shared cache (i.e. YARN-1492).
 Specifically, allow per-job configuration so that MapReduce jobs can specify 
 which set of resources they would like to cache (i.e. jobjar, libjars, 
 archives, files).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5952) LocalContainerLauncher#renameMapOutputForReduce incorrectly assumes a single dir for mapOutIndex

2014-07-14 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061398#comment-14061398
 ] 

Karthik Kambatla commented on MAPREDUCE-5952:
-

Looks like the new patch moves the method to a different line number in the 
file. Can we keep it in place for better history tracking? Sorry for the 
additional inconvenience. 

 LocalContainerLauncher#renameMapOutputForReduce incorrectly assumes a single 
 dir for mapOutIndex
 

 Key: MAPREDUCE-5952
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5952
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am, mrv2
Affects Versions: 2.3.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov
Priority: Blocker
 Attachments: MAPREDUCE-5952.v01.patch, MAPREDUCE-5952.v02.patch, 
 MAPREDUCE-5952.v03.patch


 The javadoc comment for {{renameMapOutputForReduce}} incorrectly refers to a 
 single map output directory, whereas this depends on LOCAL_DIRS.
 mapOutIndex should be set to subMapOutputFile.getOutputIndexFile()
 {code}
 2014-06-30 14:48:35,574 WARN [uber-SubtaskRunner] 
 org.apache.hadoop.mapred.LocalContainerLauncher: Exception running local 
 (uberized) 'child' : java.io.FileNotFoundException: File 
 /Users/gshegalov/workspace/hadoop-common/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/org.apache.hadoop.mapreduce.v2.TestMRJobs/org.apache.hadoop.mapreduce.v2.
   
 TestMRJobs-localDir-nm-2_3/usercache/gshegalov/appcache/application_1404164272885_0001/output/file.out.index
  does not exist
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:517)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:726)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:507)
   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
   
   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289)
   
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.rename(RawLocalFileSystem.java:334)   
  
   at 
 org.apache.hadoop.fs.ChecksumFileSystem.rename(ChecksumFileSystem.java:504)
   at 
 org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.renameMapOutputForReduce(LocalContainerLauncher.java:471)
   at 
 org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370)
   at 
 org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:292)
   at 
 org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:178)
   at 
 org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:221)
   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)  
   
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)   
   
   at java.util.concurrent.FutureTask.run(FutureTask.java:138) 
   
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
   at java.lang.Thread.run(Thread.java:695) 
 {code}
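
 A minimal sketch of the change suggested above (not the exact patch): take both paths from the task's {{MapOutputFile}} so the index file is resolved against LOCAL_DIRS on its own, rather than being assumed to sit under a single directory:
 {code}
 import java.io.IOException;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.mapred.MapOutputFile;

 class MapOutputPathsSketch {
   // subMapOutputFile is the MapOutputFile of the just-finished sub-map task.
   static Path[] resolveMapOutputPaths(MapOutputFile subMapOutputFile) throws IOException {
     Path mapOut = subMapOutputFile.getOutputFile();            // .../output/file.out
     Path mapOutIndex = subMapOutputFile.getOutputIndexFile();  // .../output/file.out.index
     return new Path[] { mapOut, mapOutIndex };
   }
 }
 {code}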



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5957) AM throws ClassNotFoundException with job classloader enabled if custom output format/committer is used

2014-07-14 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated MAPREDUCE-5957:
---

Status: Open  (was: Patch Available)

 AM throws ClassNotFoundException with job classloader enabled if custom 
 output format/committer is used
 ---

 Key: MAPREDUCE-5957
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5957
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Sangjin Lee
Assignee: Sangjin Lee
 Attachments: MAPREDUCE-5957.patch, MAPREDUCE-5957.patch


 With the job classloader enabled, the MR AM throws ClassNotFoundException if 
 a custom output format class is specified.
 {noformat}
 org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
 java.lang.RuntimeException: java.lang.ClassNotFoundException: Class 
 com.foo.test.TestOutputFormat not found
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:473)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:374)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1459)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1456)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1389)
 Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: 
 Class com.foo.test.TestOutputFormat not found
   at 
 org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1895)
   at 
 org.apache.hadoop.mapreduce.task.JobContextImpl.getOutputFormatClass(JobContextImpl.java:222)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:469)
   ... 8 more
 Caused by: java.lang.ClassNotFoundException: Class 
 com.foo.test.TestOutputFormat not found
   at 
 org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1801)
   at 
 org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1893)
   ... 10 more
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5957) AM throws ClassNotFoundException with job classloader enabled if custom output format/committer is used

2014-07-14 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated MAPREDUCE-5957:
---

Attachment: MAPREDUCE-5957.patch

 AM throws ClassNotFoundException with job classloader enabled if custom 
 output format/committer is used
 ---

 Key: MAPREDUCE-5957
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5957
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Sangjin Lee
Assignee: Sangjin Lee
 Attachments: MAPREDUCE-5957.patch, MAPREDUCE-5957.patch


 With the job classloader enabled, the MR AM throws ClassNotFoundException if 
 a custom output format class is specified.
 {noformat}
 org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
 java.lang.RuntimeException: java.lang.ClassNotFoundException: Class 
 com.foo.test.TestOutputFormat not found
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:473)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:374)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1459)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1456)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1389)
 Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: 
 Class com.foo.test.TestOutputFormat not found
   at 
 org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1895)
   at 
 org.apache.hadoop.mapreduce.task.JobContextImpl.getOutputFormatClass(JobContextImpl.java:222)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:469)
   ... 8 more
 Caused by: java.lang.ClassNotFoundException: Class 
 com.foo.test.TestOutputFormat not found
   at 
 org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1801)
   at 
 org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1893)
   ... 10 more
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5957) AM throws ClassNotFoundException with job classloader enabled if custom output format/committer is used

2014-07-14 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061424#comment-14061424
 ] 

Sangjin Lee commented on MAPREDUCE-5957:


Updated the patch with the unit test.

 AM throws ClassNotFoundException with job classloader enabled if custom 
 output format/committer is used
 ---

 Key: MAPREDUCE-5957
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5957
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Sangjin Lee
Assignee: Sangjin Lee
 Attachments: MAPREDUCE-5957.patch, MAPREDUCE-5957.patch


 With the job classloader enabled, the MR AM throws ClassNotFoundException if 
 a custom output format class is specified.
 {noformat}
 org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
 java.lang.RuntimeException: java.lang.ClassNotFoundException: Class 
 com.foo.test.TestOutputFormat not found
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:473)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:374)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1459)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1456)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1389)
 Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: 
 Class com.foo.test.TestOutputFormat not found
   at 
 org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1895)
   at 
 org.apache.hadoop.mapreduce.task.JobContextImpl.getOutputFormatClass(JobContextImpl.java:222)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:469)
   ... 8 more
 Caused by: java.lang.ClassNotFoundException: Class 
 com.foo.test.TestOutputFormat not found
   at 
 org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1801)
   at 
 org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1893)
   ... 10 more
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5957) AM throws ClassNotFoundException with job classloader enabled if custom output format/committer is used

2014-07-14 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated MAPREDUCE-5957:
---

Status: Patch Available  (was: Open)

 AM throws ClassNotFoundException with job classloader enabled if custom 
 output format/committer is used
 ---

 Key: MAPREDUCE-5957
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5957
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Sangjin Lee
Assignee: Sangjin Lee
 Attachments: MAPREDUCE-5957.patch, MAPREDUCE-5957.patch


 With the job classloader enabled, the MR AM throws ClassNotFoundException if 
 a custom output format class is specified.
 {noformat}
 org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
 java.lang.RuntimeException: java.lang.ClassNotFoundException: Class 
 com.foo.test.TestOutputFormat not found
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:473)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:374)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1459)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1456)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1389)
 Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: 
 Class com.foo.test.TestOutputFormat not found
   at 
 org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1895)
   at 
 org.apache.hadoop.mapreduce.task.JobContextImpl.getOutputFormatClass(JobContextImpl.java:222)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:469)
   ... 8 more
 Caused by: java.lang.ClassNotFoundException: Class 
 com.foo.test.TestOutputFormat not found
   at 
 org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1801)
   at 
 org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1893)
   ... 10 more
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5957) AM throws ClassNotFoundException with job classloader enabled if custom output format/committer is used

2014-07-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061542#comment-14061542
 ] 

Hadoop QA commented on MAPREDUCE-5957:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12655644/MAPREDUCE-5957.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

  
org.apache.hadoop.mapreduce.v2.app.webapp.TestAMWebServicesAttempts
  
org.apache.hadoop.mapreduce.v2.app.webapp.TestAMWebServicesJobs
  
org.apache.hadoop.mapreduce.v2.app.webapp.TestAMWebServicesTasks
  
org.apache.hadoop.mapreduce.v2.app.webapp.TestAMWebServicesJobConf
  org.apache.hadoop.mapreduce.v2.app.webapp.TestAMWebServices

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4731//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4731//console

This message is automatically generated.

 AM throws ClassNotFoundException with job classloader enabled if custom 
 output format/committer is used
 ---

 Key: MAPREDUCE-5957
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5957
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Sangjin Lee
Assignee: Sangjin Lee
 Attachments: MAPREDUCE-5957.patch, MAPREDUCE-5957.patch


 With the job classloader enabled, the MR AM throws ClassNotFoundException if 
 a custom output format class is specified.
 {noformat}
 org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
 java.lang.RuntimeException: java.lang.ClassNotFoundException: Class 
 com.foo.test.TestOutputFormat not found
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:473)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:374)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1459)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1456)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1389)
 Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: 
 Class com.foo.test.TestOutputFormat not found
   at 
 org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1895)
   at 
 org.apache.hadoop.mapreduce.task.JobContextImpl.getOutputFormatClass(JobContextImpl.java:222)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:469)
   ... 8 more
 Caused by: java.lang.ClassNotFoundException: Class 
 com.foo.test.TestOutputFormat not found
   at 
 org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1801)
   at 
 org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1893)
   ... 10 more
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.

2014-07-14 Thread zhihai xu (JIRA)
zhihai xu created MAPREDUCE-5969:


 Summary: Private non-Archive Files' size add twice in Distributed 
Cache directory size calculation.
 Key: MAPREDUCE-5969
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5969
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Reporter: zhihai xu
Assignee: zhihai xu


Private non-Archive Files' sizes are added twice in the Distributed Cache directory size 
calculation. The Private non-Archive Files list is passed in via the -files command 
line option. The Distributed Cache directory size is used to check whether the 
total size of the cached files exceeds the cache size limit; the default cache 
size limit is 10G.
I added logging in addCacheInfoUpdate and setSize in 
TrackerDistributedCacheManager.java.
I used the following command to test:
hadoop jar ./wordcount.jar org.apache.hadoop.examples.WordCount -files 
hdfs://host:8022/tmp/zxu/WordCount.java,hdfs://host:8022/tmp/zxu/wordcount.jar 
/tmp/zxu/test_in/ /tmp/zxu/test_out
to add two files into the distributed cache: WordCount.java and wordcount.jar.
The WordCount.java file size is 2395 bytes and the wordcount.jar file size is 3865 
bytes. The total should be 6260.
The log shows these file sizes are added twice:
once before download to the local node and a second time after download 
to the local node, so the total file count becomes 4 instead of 2:
addCacheInfoUpdate size: 6260 num: 2 baseDir: /mapred/local
addCacheInfoUpdate size: 8683 num: 3 baseDir: /mapred/local
addCacheInfoUpdate size: 12588 num: 4 baseDir: /mapred/local
In the code, for a Private non-Archive File, the first time we add the file size is in 
getLocalCache:
if (!isArchive) {
  //for private archives, the lengths come over RPC from the 
  //JobLocalizer since the JobLocalizer is the one who expands
  //archives and gets the total length
  lcacheStatus.size = fileStatus.getLen();

  LOG.info("getLocalCache: " + localizedPath + " size = "
      + lcacheStatus.size);
  // Increase the size and sub directory count of the cache
  // from baseDirSize and baseDirNumberSubDir.
  baseDirManager.addCacheInfoUpdate(lcacheStatus);
}
The second time we add the file size is in 
setSize:
  synchronized (status) {
    status.size = size;
    baseDirManager.addCacheInfoUpdate(status);
  }
The fix is to not add the file size for a Private non-Archive File after 
download (downloadCacheObject).
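
A minimal sketch of that fix (illustrative only, assuming an isArchive flag is available at this point; the actual patch may differ): skip the second accounting in the post-download path for non-archive files, since they were already counted in getLocalCache():
{code}
synchronized (status) {
  status.size = size;
  if (isArchive) {
    // Archives only get their expanded size after download, so count them here.
    baseDirManager.addCacheInfoUpdate(status);
  }
  // Private non-archive files were already counted in getLocalCache(),
  // so their size is not added a second time.
}
{code}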




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.

2014-07-14 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-5969:
-

Status: Patch Available  (was: Open)

 Private non-Archive Files' size add twice in Distributed Cache directory size 
 calculation.
 --

 Key: MAPREDUCE-5969
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5969
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: MAPREDUCE-5969.branch1.patch


 Private non-Archive Files' size add twice in Distributed Cache directory size 
 calculation. Private non-Archive Files list is passed in by -files command 
 line option. The Distributed Cache directory size is used to check whether 
 the total cache files size exceed the cache size limitation,  the default 
 cache size limitation is 10G.
 I add log in addCacheInfoUpdate and setSize in 
 TrackerDistributedCacheManager.java.
 I use the following command to test:
 hadoop jar ./wordcount.jar org.apache.hadoop.examples.WordCount -files 
 hdfs://host:8022/tmp/zxu/WordCount.java,hdfs://host:8022/tmp/zxu/wordcount.jar
  /tmp/zxu/test_in/ /tmp/zxu/test_out
 to add two files into distributed cache:WordCount.java and wordcount.jar.
 WordCount.java file size is 2395 byes and wordcount.jar file size is 3865 
 bytes. The total should be 6260.
 The log show these files size added twice:
 add one time before download to local node and add second time after download 
 to local node, so total file number becomes 4 instead of 2:
 addCacheInfoUpdate size: 6260 num: 2 baseDir: /mapred/local
 addCacheInfoUpdate size: 8683 num: 3 baseDir: /mapred/local
 addCacheInfoUpdate size: 12588 num: 4 baseDir: /mapred/local
 In the code, for Private non-Archive File, the first time we add file size is 
 at 
 getLocalCache:
 if (!isArchive) {
   //for private archives, the lengths come over RPC from the 
   //JobLocalizer since the JobLocalizer is the one who expands
   //archives and gets the total length
   lcacheStatus.size = fileStatus.getLen();
   LOG.info(getLocalCache: + localizedPath +  size = 
   + lcacheStatus.size);
   // Increase the size and sub directory count of the cache
   // from baseDirSize and baseDirNumberSubDir.
   baseDirManager.addCacheInfoUpdate(lcacheStatus);
 }
 The second time we add file size is at 
 setSize:
   synchronized (status) {
 status.size = size;
 baseDirManager.addCacheInfoUpdate(status);
   }
 The fix is not to add the file size for for Private non-Archive File after 
 download(downloadCacheObject).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.

2014-07-14 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-5969:
-

Attachment: MAPREDUCE-5969.branch1.patch

 Private non-Archive Files' size add twice in Distributed Cache directory size 
 calculation.
 --

 Key: MAPREDUCE-5969
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5969
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: MAPREDUCE-5969.branch1.patch


 Private non-Archive Files' size add twice in Distributed Cache directory size 
 calculation. Private non-Archive Files list is passed in by -files command 
 line option. The Distributed Cache directory size is used to check whether 
 the total cache files size exceed the cache size limitation,  the default 
 cache size limitation is 10G.
 I add log in addCacheInfoUpdate and setSize in 
 TrackerDistributedCacheManager.java.
 I use the following command to test:
 hadoop jar ./wordcount.jar org.apache.hadoop.examples.WordCount -files 
 hdfs://host:8022/tmp/zxu/WordCount.java,hdfs://host:8022/tmp/zxu/wordcount.jar
  /tmp/zxu/test_in/ /tmp/zxu/test_out
 to add two files into distributed cache:WordCount.java and wordcount.jar.
 WordCount.java file size is 2395 byes and wordcount.jar file size is 3865 
 bytes. The total should be 6260.
 The log show these files size added twice:
 add one time before download to local node and add second time after download 
 to local node, so total file number becomes 4 instead of 2:
 addCacheInfoUpdate size: 6260 num: 2 baseDir: /mapred/local
 addCacheInfoUpdate size: 8683 num: 3 baseDir: /mapred/local
 addCacheInfoUpdate size: 12588 num: 4 baseDir: /mapred/local
 In the code, for Private non-Archive File, the first time we add file size is 
 at 
 getLocalCache:
 if (!isArchive) {
   //for private archives, the lengths come over RPC from the 
   //JobLocalizer since the JobLocalizer is the one who expands
   //archives and gets the total length
   lcacheStatus.size = fileStatus.getLen();
   LOG.info(getLocalCache: + localizedPath +  size = 
   + lcacheStatus.size);
   // Increase the size and sub directory count of the cache
   // from baseDirSize and baseDirNumberSubDir.
   baseDirManager.addCacheInfoUpdate(lcacheStatus);
 }
 The second time we add file size is at 
 setSize:
   synchronized (status) {
 status.size = size;
 baseDirManager.addCacheInfoUpdate(status);
   }
 The fix is not to add the file size for for Private non-Archive File after 
 download(downloadCacheObject).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.

2014-07-14 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061554#comment-14061554
 ] 

zhihai xu commented on MAPREDUCE-5969:
--

I submitted the patch for review.
I tested this patch. The following log shows that the total file size in the cache is 
6260 and the total number of files in the cache is 2, which is correct. Without this 
patch, the total file size in the cache is 12588 and the total number of files in the 
cache is 4.
addCacheInfoUpdate size: 2395 num: 1 
addCacheInfoUpdate size: 6260 num: 2 

 Private non-Archive Files' size add twice in Distributed Cache directory size 
 calculation.
 --

 Key: MAPREDUCE-5969
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5969
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: MAPREDUCE-5969.branch1.patch


 Private non-Archive Files' size add twice in Distributed Cache directory size 
 calculation. Private non-Archive Files list is passed in by -files command 
 line option. The Distributed Cache directory size is used to check whether 
 the total cache files size exceed the cache size limitation,  the default 
 cache size limitation is 10G.
 I add log in addCacheInfoUpdate and setSize in 
 TrackerDistributedCacheManager.java.
 I use the following command to test:
 hadoop jar ./wordcount.jar org.apache.hadoop.examples.WordCount -files 
 hdfs://host:8022/tmp/zxu/WordCount.java,hdfs://host:8022/tmp/zxu/wordcount.jar
  /tmp/zxu/test_in/ /tmp/zxu/test_out
 to add two files into distributed cache:WordCount.java and wordcount.jar.
 WordCount.java file size is 2395 byes and wordcount.jar file size is 3865 
 bytes. The total should be 6260.
 The log show these files size added twice:
 add one time before download to local node and add second time after download 
 to local node, so total file number becomes 4 instead of 2:
 addCacheInfoUpdate size: 6260 num: 2 baseDir: /mapred/local
 addCacheInfoUpdate size: 8683 num: 3 baseDir: /mapred/local
 addCacheInfoUpdate size: 12588 num: 4 baseDir: /mapred/local
 In the code, for Private non-Archive File, the first time we add file size is 
 at 
 getLocalCache:
 if (!isArchive) {
   //for private archives, the lengths come over RPC from the 
   //JobLocalizer since the JobLocalizer is the one who expands
   //archives and gets the total length
   lcacheStatus.size = fileStatus.getLen();
   LOG.info(getLocalCache: + localizedPath +  size = 
   + lcacheStatus.size);
   // Increase the size and sub directory count of the cache
   // from baseDirSize and baseDirNumberSubDir.
   baseDirManager.addCacheInfoUpdate(lcacheStatus);
 }
 The second time we add file size is at 
 setSize:
   synchronized (status) {
 status.size = size;
 baseDirManager.addCacheInfoUpdate(status);
   }
 The fix is not to add the file size for for Private non-Archive File after 
 download(downloadCacheObject).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.

2014-07-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061555#comment-14061555
 ] 

Hadoop QA commented on MAPREDUCE-5969:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12655668/MAPREDUCE-5969.branch1.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4732//console

This message is automatically generated.

 Private non-Archive Files' size add twice in Distributed Cache directory size 
 calculation.
 --

 Key: MAPREDUCE-5969
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5969
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: MAPREDUCE-5969.branch1.patch


 Private non-Archive Files' size add twice in Distributed Cache directory size 
 calculation. Private non-Archive Files list is passed in by -files command 
 line option. The Distributed Cache directory size is used to check whether 
 the total cache files size exceed the cache size limitation,  the default 
 cache size limitation is 10G.
 I add log in addCacheInfoUpdate and setSize in 
 TrackerDistributedCacheManager.java.
 I use the following command to test:
 hadoop jar ./wordcount.jar org.apache.hadoop.examples.WordCount -files 
 hdfs://host:8022/tmp/zxu/WordCount.java,hdfs://host:8022/tmp/zxu/wordcount.jar
  /tmp/zxu/test_in/ /tmp/zxu/test_out
 to add two files into distributed cache:WordCount.java and wordcount.jar.
 WordCount.java file size is 2395 byes and wordcount.jar file size is 3865 
 bytes. The total should be 6260.
 The log show these files size added twice:
 add one time before download to local node and add second time after download 
 to local node, so total file number becomes 4 instead of 2:
 addCacheInfoUpdate size: 6260 num: 2 baseDir: /mapred/local
 addCacheInfoUpdate size: 8683 num: 3 baseDir: /mapred/local
 addCacheInfoUpdate size: 12588 num: 4 baseDir: /mapred/local
 In the code, for Private non-Archive File, the first time we add file size is 
 at 
 getLocalCache:
 if (!isArchive) {
   //for private archives, the lengths come over RPC from the 
   //JobLocalizer since the JobLocalizer is the one who expands
   //archives and gets the total length
   lcacheStatus.size = fileStatus.getLen();
   LOG.info(getLocalCache: + localizedPath +  size = 
   + lcacheStatus.size);
   // Increase the size and sub directory count of the cache
   // from baseDirSize and baseDirNumberSubDir.
   baseDirManager.addCacheInfoUpdate(lcacheStatus);
 }
 The second time we add file size is at 
 setSize:
   synchronized (status) {
 status.size = size;
 baseDirManager.addCacheInfoUpdate(status);
   }
 The fix is not to add the file size for for Private non-Archive File after 
 download(downloadCacheObject).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5952) LocalContainerLauncher#renameMapOutputForReduce incorrectly assumes a single dir for mapOutIndex

2014-07-14 Thread Gera Shegalov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gera Shegalov updated MAPREDUCE-5952:
-

Attachment: MAPREDUCE-5952.v04.patch

[~kasha], the reason the method was moved is that it is static in nature (it does 
not need any instance fields) and I needed to make it static for easier 
testability. {{EventHandler}} is not a static inner class, so making it static 
would also move other methods around. Here is v04, which avoids moving 
{{renameMapOutputForReduce}}; it remains an instance method.

 LocalContainerLauncher#renameMapOutputForReduce incorrectly assumes a single 
 dir for mapOutIndex
 

 Key: MAPREDUCE-5952
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5952
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am, mrv2
Affects Versions: 2.3.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov
Priority: Blocker
 Attachments: MAPREDUCE-5952.v01.patch, MAPREDUCE-5952.v02.patch, 
 MAPREDUCE-5952.v03.patch, MAPREDUCE-5952.v04.patch


 The javadoc comment for {{renameMapOutputForReduce}} incorrectly refers to a 
 single map output directory, whereas this depends on LOCAL_DIRS.
 mapOutIndex should be set to subMapOutputFile.getOutputIndexFile()
 {code}
 2014-06-30 14:48:35,574 WARN [uber-SubtaskRunner] 
 org.apache.hadoop.mapred.LocalContainerLauncher: Exception running local 
 (uberized) 'child' : java.io.FileNotFoundException: File 
 /Users/gshegalov/workspace/hadoop-common/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/org.apache.hadoop.mapreduce.v2.TestMRJobs/org.apache.hadoop.mapreduce.v2.
   
 TestMRJobs-localDir-nm-2_3/usercache/gshegalov/appcache/application_1404164272885_0001/output/file.out.index
  does not exist
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:517)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:726)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:507)
   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
   
   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289)
   
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.rename(RawLocalFileSystem.java:334)   
  
   at 
 org.apache.hadoop.fs.ChecksumFileSystem.rename(ChecksumFileSystem.java:504)
   at 
 org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.renameMapOutputForReduce(LocalContainerLauncher.java:471)
   at 
 org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370)
   at 
 org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:292)
   at 
 org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:178)
   at 
 org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:221)
   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)  
   
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)   
   
   at java.util.concurrent.FutureTask.run(FutureTask.java:138) 
   
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
   at java.lang.Thread.run(Thread.java:695) 
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5952) LocalContainerLauncher#renameMapOutputForReduce incorrectly assumes a single dir for mapOutIndex

2014-07-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061615#comment-14061615
 ] 

Hadoop QA commented on MAPREDUCE-5952:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12655673/MAPREDUCE-5952.v04.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app:

  
org.apache.hadoop.mapreduce.v2.app.webapp.TestAMWebServicesAttempts
  
org.apache.hadoop.mapreduce.v2.app.webapp.TestAMWebServicesJobs
  
org.apache.hadoop.mapreduce.v2.app.webapp.TestAMWebServicesTasks
  
org.apache.hadoop.mapreduce.v2.app.webapp.TestAMWebServicesJobConf
  org.apache.hadoop.mapreduce.v2.app.webapp.TestAMWebServices

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4733//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4733//console

This message is automatically generated.

 LocalContainerLauncher#renameMapOutputForReduce incorrectly assumes a single 
 dir for mapOutIndex
 

 Key: MAPREDUCE-5952
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5952
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am, mrv2
Affects Versions: 2.3.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov
Priority: Blocker
 Attachments: MAPREDUCE-5952.v01.patch, MAPREDUCE-5952.v02.patch, 
 MAPREDUCE-5952.v03.patch, MAPREDUCE-5952.v04.patch


 The javadoc comment for {{renameMapOutputForReduce}} incorrectly refers to a 
 single map output directory, whereas this depends on LOCAL_DIRS.
 mapOutIndex should be set to subMapOutputFile.getOutputIndexFile()
 {code}
 2014-06-30 14:48:35,574 WARN [uber-SubtaskRunner] 
 org.apache.hadoop.mapred.LocalContainerLauncher: Exception running local 
 (uberized) 'child' : java.io.FileNotFoundException: File 
 /Users/gshegalov/workspace/hadoop-common/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/org.apache.hadoop.mapreduce.v2.TestMRJobs/org.apache.hadoop.mapreduce.v2.
   
 TestMRJobs-localDir-nm-2_3/usercache/gshegalov/appcache/application_1404164272885_0001/output/file.out.index
  does not exist
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:517)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:726)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:507)
   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
   
   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289)
   
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.rename(RawLocalFileSystem.java:334)   
  
   at 
 org.apache.hadoop.fs.ChecksumFileSystem.rename(ChecksumFileSystem.java:504)
   at 
 org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.renameMapOutputForReduce(LocalContainerLauncher.java:471)
   at 
 org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370)
   at 
 org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:292)
   at 
 org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:178)
   at 
 org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:221)
   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)  
   
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)   
   
   at java.util.concurrent.FutureTask.run(FutureTask.java:138) 
   
   at 
 

[jira] [Commented] (MAPREDUCE-5790) Default map hprof profile options do not work

2014-07-14 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061657#comment-14061657
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-5790:


Looks good, +1. Checking this in..

 Default map hprof profile options do not work
 -

 Key: MAPREDUCE-5790
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5790
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.3.0
 Environment: java version 1.6.0_31
 Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
 Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)
Reporter: Andrew Wang
Assignee: Gera Shegalov
Priority: Blocker
 Attachments: MAPREDUCE-5790.v01.patch, MAPREDUCE-5790.v02.patch


 I have an MR job doing the following:
 {code}
 Job job = Job.getInstance(conf);
 // Enable profiling
 job.setProfileEnabled(true);
 job.setProfileTaskRange(true, "0");
 job.setProfileTaskRange(false, "0");
 {code}
 When I run this job, some of my map tasks fail with this error message:
 {noformat}
 org.apache.hadoop.util.Shell$ExitCodeException: 
 /data/5/yarn/nm/usercache/hdfs/appcache/application_1394482121761_0012/container_1394482121761_0012_01_41/launch_container.sh:
  line 32: $JAVA_HOME/bin/java -Djava.net.preferIPv4Stack=true 
 -Dhadoop.metrics.log.level=WARN   -Xmx825955249 -Djava.io.tmpdir=$PWD/tmp 
 -Dlog4j.configuration=container-log4j.properties 
 -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1394482121761_0012/container_1394482121761_0012_01_41
  -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA 
 ${mapreduce.task.profile.params} org.apache.hadoop.mapred.YarnChild 
 10.20.212.12 43135 attempt_1394482121761_0012_r_00_0 41 
 1/var/log/hadoop-yarn/container/application_1394482121761_0012/container_1394482121761_0012_01_41/stdout
  
 2/var/log/hadoop-yarn/container/application_1394482121761_0012/container_1394482121761_0012_01_41/stderr
  : bad substitution
 {noformat}
 It looks like ${mapreduce.task.profile.params} is not getting subbed in 
 correctly.
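
 One possible workaround (my reading, not something stated in this JIRA): give mapreduce.task.profile.params an explicit value so the task command line does not end up with an unexpanded placeholder. The option string below is a typical hprof setting, where %s is replaced with the per-task profile output file:
 {code}
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.mapreduce.Job;

 public class ProfileParamsWorkaround {
   public static void main(String[] args) throws Exception {
     Job job = Job.getInstance(new Configuration());
     job.setProfileEnabled(true);
     job.setProfileTaskRange(true, "0");
     job.setProfileTaskRange(false, "0");
     // Give the property referenced by the placeholder an explicit value.
     job.getConfiguration().set("mapreduce.task.profile.params",
         "-agentlib:hprof=cpu=samples,heap=sites,force=n,thread=y,verbose=n,file=%s");
     // ...remaining job setup and submission as usual.
   }
 }
 {code}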



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5790) Default map hprof profile options do not work

2014-07-14 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-5790:
---

   Resolution: Fixed
Fix Version/s: 2.5.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed this to trunk, branch-2 and branch-2.5. Thanks Gera!

 Default map hprof profile options do not work
 -

 Key: MAPREDUCE-5790
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5790
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.3.0
 Environment: java version 1.6.0_31
 Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
 Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)
Reporter: Andrew Wang
Assignee: Gera Shegalov
Priority: Blocker
 Fix For: 2.5.0

 Attachments: MAPREDUCE-5790.v01.patch, MAPREDUCE-5790.v02.patch


 I have an MR job doing the following:
 {code}
 Job job = Job.getInstance(conf);
 // Enable profiling
 job.setProfileEnabled(true);
 job.setProfileTaskRange(true, 0);
 job.setProfileTaskRange(false, 0);
 {code}
 When I run this job, some of my map tasks fail with this error message:
 {noformat}
 org.apache.hadoop.util.Shell$ExitCodeException: 
 /data/5/yarn/nm/usercache/hdfs/appcache/application_1394482121761_0012/container_1394482121761_0012_01_41/launch_container.sh:
  line 32: $JAVA_HOME/bin/java -Djava.net.preferIPv4Stack=true 
 -Dhadoop.metrics.log.level=WARN   -Xmx825955249 -Djava.io.tmpdir=$PWD/tmp 
 -Dlog4j.configuration=container-log4j.properties 
 -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1394482121761_0012/container_1394482121761_0012_01_41
  -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA 
 ${mapreduce.task.profile.params} org.apache.hadoop.mapred.YarnChild 
 10.20.212.12 43135 attempt_1394482121761_0012_r_00_0 41 
 1/var/log/hadoop-yarn/container/application_1394482121761_0012/container_1394482121761_0012_01_41/stdout
  
 2/var/log/hadoop-yarn/container/application_1394482121761_0012/container_1394482121761_0012_01_41/stderr
  : bad substitution
 {noformat}
 It looks like ${mapreduce.task.profile.params} is not getting subbed in 
 correctly.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5790) Default map hprof profile options do not work

2014-07-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061664#comment-14061664
 ] 

Hudson commented on MAPREDUCE-5790:
---

FAILURE: Integrated in Hadoop-trunk-Commit #5882 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5882/])
MAPREDUCE-5790. Made it easier to enable hprof profile options by default. 
Contributed by Gera Shegalov. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1610578)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/conf/TestJobConf.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/v2/TestMRJobsWithProfiler.java


 Default map hprof profile options do not work
 -

 Key: MAPREDUCE-5790
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5790
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.3.0
 Environment: java version 1.6.0_31
 Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
 Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)
Reporter: Andrew Wang
Assignee: Gera Shegalov
Priority: Blocker
 Fix For: 2.5.0

 Attachments: MAPREDUCE-5790.v01.patch, MAPREDUCE-5790.v02.patch


 I have an MR job doing the following:
 {code}
 Job job = Job.getInstance(conf);
 // Enable profiling
 job.setProfileEnabled(true);
 job.setProfileTaskRange(true, 0);
 job.setProfileTaskRange(false, 0);
 {code}
 When I run this job, some of my map tasks fail with this error message:
 {noformat}
 org.apache.hadoop.util.Shell$ExitCodeException: 
 /data/5/yarn/nm/usercache/hdfs/appcache/application_1394482121761_0012/container_1394482121761_0012_01_41/launch_container.sh:
  line 32: $JAVA_HOME/bin/java -Djava.net.preferIPv4Stack=true 
 -Dhadoop.metrics.log.level=WARN   -Xmx825955249 -Djava.io.tmpdir=$PWD/tmp 
 -Dlog4j.configuration=container-log4j.properties 
 -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1394482121761_0012/container_1394482121761_0012_01_41
  -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA 
 ${mapreduce.task.profile.params} org.apache.hadoop.mapred.YarnChild 
 10.20.212.12 43135 attempt_1394482121761_0012_r_00_0 41 
 1/var/log/hadoop-yarn/container/application_1394482121761_0012/container_1394482121761_0012_01_41/stdout
  
 2/var/log/hadoop-yarn/container/application_1394482121761_0012/container_1394482121761_0012_01_41/stderr
  : bad substitution
 {noformat}
 It looks like ${mapreduce.task.profile.params} is not getting subbed in 
 correctly.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5918) LineRecordReader can return the same decompressor to CodecPool multiple times

2014-07-14 Thread Sergey Murylev (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061704#comment-14061704
 ] 

Sergey Murylev commented on MAPREDUCE-5918:
---

Could one of the committers review this bug fix?

 LineRecordReader can return the same decompressor to CodecPool multiple times
 -

 Key: MAPREDUCE-5918
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5918
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Sergey Murylev
Assignee: Sergey Murylev
Priority: Minor
 Fix For: trunk

 Attachments: MAPREDUCE-5918.1.patch


 LineRecordReader can return the same decompressor to CodecPool multiple times 
 if its close() method is called multiple times. In that case CodecPool no longer 
 guarantees that it always returns different decompressors. This issue can cause 
 difficult-to-reproduce and difficult-to-diagnose bugs in Hadoop-based 
 programs. 
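
 A minimal sketch of an idempotent close() (not necessarily the attached patch), assuming the reader keeps its input stream and decompressor in fields; clearing the field after returning it prevents a second close() from handing the same instance back to CodecPool:
 {code}
 @Override
 public synchronized void close() throws IOException {
   try {
     if (in != null) {
       in.close();
       in = null;
     }
   } finally {
     if (decompressor != null) {
       CodecPool.returnDecompressor(decompressor);
       // Clear the reference so a repeated close() cannot return it again.
       decompressor = null;
     }
   }
 }
 {code}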



--
This message was sent by Atlassian JIRA
(v6.2#6252)