[jira] [Assigned] (MAPREDUCE-6304) Specifying node labels when submitting MR jobs

2015-04-02 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R reassigned MAPREDUCE-6304:


Assignee: Naganarasimha G R

> Specifying node labels when submitting MR jobs
> --
>
> Key: MAPREDUCE-6304
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6304
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Jian Fang
>Assignee: Naganarasimha G R
>
> Per the discussion on YARN-796, we need a mechanism in MAPREDUCE to specify 
> node labels when submitting MR jobs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-5465) Container killed before hprof dumps profile.out

2015-04-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394020#comment-14394020
 ] 

Hadoop QA commented on MAPREDUCE-5465:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12709076/MAPREDUCE-5465-9.patch
  against trunk revision 6a6a59d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1152 javac 
compiler warnings (more than the trunk's current 1151 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

  org.apache.hadoop.mapred.pipes.TestPipeApplication
  org.apache.hadoop.mapred.TestMRTimelineEventHandling
  org.apache.hadoop.mapreduce.v2.TestMRJobsWithProfiler
  org.apache.hadoop.mapreduce.v2.TestMRJobsWithHistoryService
  org.apache.hadoop.mapred.TestClusterMRNotification

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5370//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5370//artifact/patchprocess/diffJavacWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5370//console

This message is automatically generated.

> Container killed before hprof dumps profile.out
> ---
>
> Key: MAPREDUCE-5465
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5465
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mr-am, mrv2
>Reporter: Radim Kolar
>Assignee: Ming Ma
> Attachments: MAPREDUCE-5465-2.patch, MAPREDUCE-5465-3.patch, 
> MAPREDUCE-5465-4.patch, MAPREDUCE-5465-5.patch, MAPREDUCE-5465-6.patch, 
> MAPREDUCE-5465-7.patch, MAPREDUCE-5465-8.patch, MAPREDUCE-5465-9.patch, 
> MAPREDUCE-5465.patch
>
>
> If profiling is enabled for a mapper or reducer, hprof dumps profile.out at 
> process exit. The dump is written after the task has signaled to the AM that 
> its work is finished.
> The AM kills a container whose work is finished without waiting for hprof to 
> finish its dumps. If hprof is writing a larger output (such as with depth=4, 
> while depth=3 works), it cannot finish the dump before being killed, making 
> the entire dump unusable because the cpu and heap stats are missing.
> There needs to be a delay before the container is killed when profiling is 
> enabled.
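The "delay before kill" idea from the description can be sketched as a kill-side grace period. This is a hypothetical illustration only, not code from any attached patch; the constant name and the 5-second value are assumptions:

```java
// Hypothetical sketch of the fix direction for MAPREDUCE-5465.
// Not code from the actual patch; the constant name and value are invented.
public class KillGrace {
    // Assumed tunable: how long to let an exiting JVM finish its hprof dump.
    static final long PROFILE_KILL_GRACE_MS = 5000L;

    // How long the AM should wait before killing a finished container.
    // Unprofiled containers can still be killed immediately.
    static long killDelayMs(boolean profilingEnabled) {
        return profilingEnabled ? PROFILE_KILL_GRACE_MS : 0L;
    }

    public static void main(String[] args) {
        System.out.println(killDelayMs(true));   // 5000
        System.out.println(killDelayMs(false));  // 0
    }
}
```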





[jira] [Commented] (MAPREDUCE-5465) Container killed before hprof dumps profile.out

2015-04-02 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393533#comment-14393533
 ] 

Ray Chiang commented on MAPREDUCE-5465:
---

Great!  Thanks!

> Container killed before hprof dumps profile.out
> ---
>
> Key: MAPREDUCE-5465
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5465
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mr-am, mrv2
>Reporter: Radim Kolar
>Assignee: Ming Ma
> Attachments: MAPREDUCE-5465-2.patch, MAPREDUCE-5465-3.patch, 
> MAPREDUCE-5465-4.patch, MAPREDUCE-5465-5.patch, MAPREDUCE-5465-6.patch, 
> MAPREDUCE-5465-7.patch, MAPREDUCE-5465-8.patch, MAPREDUCE-5465-9.patch, 
> MAPREDUCE-5465.patch
>
>
> If profiling is enabled for a mapper or reducer, hprof dumps profile.out at 
> process exit. The dump is written after the task has signaled to the AM that 
> its work is finished.
> The AM kills a container whose work is finished without waiting for hprof to 
> finish its dumps. If hprof is writing a larger output (such as with depth=4, 
> while depth=3 works), it cannot finish the dump before being killed, making 
> the entire dump unusable because the cpu and heap stats are missing.
> There needs to be a delay before the container is killed when profiling is 
> enabled.





[jira] [Commented] (MAPREDUCE-6266) Job#getTrackingURL should consistently return a proper URL

2015-04-02 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393523#comment-14393523
 ] 

Robert Kanter commented on MAPREDUCE-6266:
--

The 003 patch looks good to me.  +1

[~djp], any additional comments?

> Job#getTrackingURL should consistently return a proper URL
> --
>
> Key: MAPREDUCE-6266
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6266
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Minor
>  Labels: supportability
> Attachments: MAPREDUCE-6266.001.patch, MAPREDUCE-6266.002.patch, 
> MAPREDUCE-6266.003.patch
>
>
> When a job is running, Job#getTrackingURL returns a proper URL like:
> http://:8088/proxy/application_1424910897258_0004/
> Once a job is finished and the job has moved to the JHS, then 
> Job#getTrackingURL returns a URL without the protocol like:
> :19888/jobhistory/job/job_1424910897258_0004
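A minimal sketch of the kind of normalization needed (a hypothetical helper, not the code in the attached patches): when the history server's address comes back without a protocol, prepend one so callers always receive a proper URL.

```java
// Hypothetical helper illustrating the fix direction for MAPREDUCE-6266;
// not the actual change in MAPREDUCE-6266.001-003.patch.
public class TrackingUrl {
    static String normalize(String url) {
        if (url == null || url.isEmpty() || url.contains("://")) {
            return url;  // nothing to fix, or already a full URL
        }
        // Assume plain HTTP for an address-only tracking URL.
        return "http://" + url;
    }

    public static void main(String[] args) {
        System.out.println(normalize("jhs-host:19888/jobhistory/job/job_1424910897258_0004"));
    }
}
```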





[jira] [Updated] (MAPREDUCE-5465) Container killed before hprof dumps profile.out

2015-04-02 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated MAPREDUCE-5465:
---
Attachment: MAPREDUCE-5465-9.patch

Ray, here is the rebased patch.

> Container killed before hprof dumps profile.out
> ---
>
> Key: MAPREDUCE-5465
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5465
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mr-am, mrv2
>Reporter: Radim Kolar
>Assignee: Ming Ma
> Attachments: MAPREDUCE-5465-2.patch, MAPREDUCE-5465-3.patch, 
> MAPREDUCE-5465-4.patch, MAPREDUCE-5465-5.patch, MAPREDUCE-5465-6.patch, 
> MAPREDUCE-5465-7.patch, MAPREDUCE-5465-8.patch, MAPREDUCE-5465-9.patch, 
> MAPREDUCE-5465.patch
>
>
> If profiling is enabled for a mapper or reducer, hprof dumps profile.out at 
> process exit. The dump is written after the task has signaled to the AM that 
> its work is finished.
> The AM kills a container whose work is finished without waiting for hprof to 
> finish its dumps. If hprof is writing a larger output (such as with depth=4, 
> while depth=3 works), it cannot finish the dump before being killed, making 
> the entire dump unusable because the cpu and heap stats are missing.
> There needs to be a delay before the container is killed when profiling is 
> enabled.





[jira] [Commented] (MAPREDUCE-5799) add default value of MR_AM_ADMIN_USER_ENV

2015-04-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393404#comment-14393404
 ] 

Hadoop QA commented on MAPREDUCE-5799:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12635312/MAPREDUCE-5799.diff
  against trunk revision 6a6a59d.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5369//console

This message is automatically generated.

> add default value of MR_AM_ADMIN_USER_ENV
> -
>
> Key: MAPREDUCE-5799
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5799
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.3.0
>Reporter: Liyin Liang
>Assignee: Liyin Liang
> Attachments: MAPREDUCE-5799.diff
>
>
> Submit a 1 map + 1 reduce sleep job with the following config:
> {code}
> <property>
>   <name>mapreduce.map.output.compress</name>
>   <value>true</value>
> </property>
> <property>
>   <name>mapreduce.map.output.compress.codec</name>
>   <value>org.apache.hadoop.io.compress.SnappyCodec</value>
> </property>
> <property>
>   <name>mapreduce.job.ubertask.enable</name>
>   <value>true</value>
> </property>
> {code}
> And the LinuxContainerExecutor is enabled on the NodeManager.
> This job will fail with the following error:
> {code}
> 2014-03-18 21:28:20,153 FATAL [uber-SubtaskRunner] 
> org.apache.hadoop.mapred.LocalContainerLauncher: Error running local 
> (uberized) 'child' : java.lang.UnsatisfiedLinkError: 
> org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
> at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native 
> Method)
> at 
> org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63)
> at 
> org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:132)
> at 
> org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:148)
> at 
> org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:163)
> at org.apache.hadoop.mapred.IFile$Writer.<init>(IFile.java:115)
> at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1583)
> at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1462)
> at 
> org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:700)
> at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1990)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:774)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
> at 
> org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.runSubtask(LocalContainerLauncher.java:317)
> at 
> org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.run(LocalContainerLauncher.java:232)
> at java.lang.Thread.run(Thread.java:662)
> {code}
> When creating a ContainerLaunchContext for a task in 
> TaskAttemptImpl.createCommonContainerLaunchContext(), 
> DEFAULT_MAPRED_ADMIN_USER_ENV, which is 
> "LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native", is added to the environment. 
> However, when creating the ContainerLaunchContext for the MR AppMaster in 
> YARNRunner.createApplicationSubmissionContext(), no default environment is 
> set, so the uber-mode job fails to find the native libraries.
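The fix direction implied by the description can be sketched as follows. DEFAULT_MAPRED_ADMIN_USER_ENV and its value come from the description above; the helper itself is hypothetical, not the attached diff:

```java
// Hypothetical sketch for MAPREDUCE-5799: give the MR AM's launch context
// the same default admin environment tasks already get, so an uberized AM
// can locate libhadoop/libsnappy. Not the code in MAPREDUCE-5799.diff.
public class AmAdminEnv {
    // Constant value as quoted in the issue description.
    static final String DEFAULT_MAPRED_ADMIN_USER_ENV =
        "LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native";

    // Fall back to the task-side default instead of an empty environment.
    static String effectiveAdminEnv(String configured) {
        return (configured == null || configured.isEmpty())
            ? DEFAULT_MAPRED_ADMIN_USER_ENV
            : configured;
    }

    public static void main(String[] args) {
        System.out.println(effectiveAdminEnv(null));
    }
}
```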





[jira] [Commented] (MAPREDUCE-5799) add default value of MR_AM_ADMIN_USER_ENV

2015-04-02 Thread Ruslan Dautkhanov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393297#comment-14393297
 ] 

Ruslan Dautkhanov commented on MAPREDUCE-5799:
--

I have this problem in non-uber mode too:

15/04/02 14:07:30 INFO mapreduce.Job: Task Id : 
attempt_1426201417905_0002_m_00_1, Status : FAILED
Error: java.lang.UnsatisfiedLinkError: 
org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native 
Method)
at 
org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63)
at 
org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:132)
at 
org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:148)
at 
org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:163)
at org.apache.hadoop.mapred.IFile$Writer.<init>(IFile.java:114)
at org.apache.hadoop.mapred.IFile$Writer.<init>(IFile.java:97)
at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1602)
at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$900(MapTask.java:873)
at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1525)


> add default value of MR_AM_ADMIN_USER_ENV
> -
>
> Key: MAPREDUCE-5799
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5799
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.3.0
>Reporter: Liyin Liang
>Assignee: Liyin Liang
> Attachments: MAPREDUCE-5799.diff
>
>
> Submit a 1 map + 1 reduce sleep job with the following config:
> {code}
> <property>
>   <name>mapreduce.map.output.compress</name>
>   <value>true</value>
> </property>
> <property>
>   <name>mapreduce.map.output.compress.codec</name>
>   <value>org.apache.hadoop.io.compress.SnappyCodec</value>
> </property>
> <property>
>   <name>mapreduce.job.ubertask.enable</name>
>   <value>true</value>
> </property>
> {code}
> And the LinuxContainerExecutor is enabled on the NodeManager.
> This job will fail with the following error:
> {code}
> 2014-03-18 21:28:20,153 FATAL [uber-SubtaskRunner] 
> org.apache.hadoop.mapred.LocalContainerLauncher: Error running local 
> (uberized) 'child' : java.lang.UnsatisfiedLinkError: 
> org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
> at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native 
> Method)
> at 
> org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63)
> at 
> org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:132)
> at 
> org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:148)
> at 
> org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:163)
> at org.apache.hadoop.mapred.IFile$Writer.<init>(IFile.java:115)
> at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1583)
> at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1462)
> at 
> org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:700)
> at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1990)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:774)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
> at 
> org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.runSubtask(LocalContainerLauncher.java:317)
> at 
> org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.run(LocalContainerLauncher.java:232)
> at java.lang.Thread.run(Thread.java:662)
> {code}
> When creating a ContainerLaunchContext for a task in 
> TaskAttemptImpl.createCommonContainerLaunchContext(), 
> DEFAULT_MAPRED_ADMIN_USER_ENV, which is 
> "LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native", is added to the environment. 
> However, when creating the ContainerLaunchContext for the MR AppMaster in 
> YARNRunner.createApplicationSubmissionContext(), no default environment is 
> set, so the uber-mode job fails to find the native libraries.





[jira] [Commented] (MAPREDUCE-5799) add default value of MR_AM_ADMIN_USER_ENV

2015-04-02 Thread Ruslan Dautkhanov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393295#comment-14393295
 ] 

Ruslan Dautkhanov commented on MAPREDUCE-5799:
--

I have this problem in non-uber mode too:

15/04/02 14:07:30 INFO mapreduce.Job: Task Id : 
attempt_1426201417905_0002_m_00_1, Status : FAILED
Error: java.lang.UnsatisfiedLinkError: 
org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native 
Method)
at 
org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63)
at 
org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:132)
at 
org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:148)
at 
org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:163)
at org.apache.hadoop.mapred.IFile$Writer.<init>(IFile.java:114)
at org.apache.hadoop.mapred.IFile$Writer.<init>(IFile.java:97)
at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1602)
at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$900(MapTask.java:873)
at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1525)


> add default value of MR_AM_ADMIN_USER_ENV
> -
>
> Key: MAPREDUCE-5799
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5799
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.3.0
>Reporter: Liyin Liang
>Assignee: Liyin Liang
> Attachments: MAPREDUCE-5799.diff
>
>
> Submit a 1 map + 1 reduce sleep job with the following config:
> {code}
> <property>
>   <name>mapreduce.map.output.compress</name>
>   <value>true</value>
> </property>
> <property>
>   <name>mapreduce.map.output.compress.codec</name>
>   <value>org.apache.hadoop.io.compress.SnappyCodec</value>
> </property>
> <property>
>   <name>mapreduce.job.ubertask.enable</name>
>   <value>true</value>
> </property>
> {code}
> And the LinuxContainerExecutor is enabled on the NodeManager.
> This job will fail with the following error:
> {code}
> 2014-03-18 21:28:20,153 FATAL [uber-SubtaskRunner] 
> org.apache.hadoop.mapred.LocalContainerLauncher: Error running local 
> (uberized) 'child' : java.lang.UnsatisfiedLinkError: 
> org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
> at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native 
> Method)
> at 
> org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63)
> at 
> org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:132)
> at 
> org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:148)
> at 
> org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:163)
> at org.apache.hadoop.mapred.IFile$Writer.<init>(IFile.java:115)
> at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1583)
> at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1462)
> at 
> org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:700)
> at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1990)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:774)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
> at 
> org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.runSubtask(LocalContainerLauncher.java:317)
> at 
> org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.run(LocalContainerLauncher.java:232)
> at java.lang.Thread.run(Thread.java:662)
> {code}
> When creating a ContainerLaunchContext for a task in 
> TaskAttemptImpl.createCommonContainerLaunchContext(), 
> DEFAULT_MAPRED_ADMIN_USER_ENV, which is 
> "LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native", is added to the environment. 
> However, when creating the ContainerLaunchContext for the MR AppMaster in 
> YARNRunner.createApplicationSubmissionContext(), no default environment is 
> set, so the uber-mode job fails to find the native libraries.





[jira] [Commented] (MAPREDUCE-6305) AM/Task log page should be able to link back to the job

2015-04-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393201#comment-14393201
 ] 

Hadoop QA commented on MAPREDUCE-6305:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12709031/MAPREDUCE-6305.v1.patch
  against trunk revision 96649c3.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 eclipse:eclipse{color}.  The patch failed to build with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5368//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5368//console

This message is automatically generated.

> AM/Task log page should be able to link back to the job
> ---
>
> Key: MAPREDUCE-6305
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6305
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Siqi Li
>Assignee: Siqi Li
> Attachments: MAPREDUCE-6305.v1.patch
>
>






[jira] [Commented] (MAPREDUCE-6303) Read timeout when retrying a fetch error can be fatal to a reducer

2015-04-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393202#comment-14393202
 ] 

Hudson commented on MAPREDUCE-6303:
---

FAILURE: Integrated in Hadoop-trunk-Commit #7496 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7496/])
MAPREDUCE-6303. Read timeout when retrying a fetch error can be fatal to a 
reducer. Contributed by Jason Lowe. (junping_du: rev 
eccb7d46efbf07abcc6a01bd5e7d682f6815b824)
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestFetcher.java
* hadoop-mapreduce-project/CHANGES.txt
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java


> Read timeout when retrying a fetch error can be fatal to a reducer
> --
>
> Key: MAPREDUCE-6303
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6303
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Blocker
> Fix For: 2.7.0
>
> Attachments: MAPREDUCE-6303.001.patch
>
>
> If a reducer encounters an error while trying to fetch from a node and then 
> hits a read timeout while re-establishing the connection, the reducer can 
> fail.  The read timeout exception can leak to the top of the Fetcher thread, 
> which causes the reduce task to tear down.  This type of error can repeat 
> across reducer attempts, causing jobs to fail due to a single bad node.
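The containment the fix needs can be sketched like this. A hypothetical classifier, not the actual Fetcher change: a read timeout seen during a retried fetch should be reported as a recoverable fetch failure for that host rather than escaping the Fetcher thread.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

// Hypothetical sketch for MAPREDUCE-6303, not the attached patch: decide
// whether a fetch-time exception should be contained (reported as failed
// copies for that host) or allowed to propagate.
public class FetchErrorPolicy {
    // true  -> mark this host's map outputs as failed copies and move on
    // false -> rethrow; not a timeout we know how to contain
    static boolean containAsFetchFailure(IOException e) {
        return e instanceof SocketTimeoutException;
    }

    public static void main(String[] args) {
        System.out.println(containAsFetchFailure(new SocketTimeoutException()));
    }
}
```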





[jira] [Updated] (MAPREDUCE-6303) Read timeout when retrying a fetch error can be fatal to a reducer

2015-04-02 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated MAPREDUCE-6303:
--
   Resolution: Fixed
Fix Version/s: 2.7.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I have committed this patch to trunk, branch-2 and branch-2.7. Thanks [~jlowe] for 
contributing the patch!

> Read timeout when retrying a fetch error can be fatal to a reducer
> --
>
> Key: MAPREDUCE-6303
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6303
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Blocker
> Fix For: 2.7.0
>
> Attachments: MAPREDUCE-6303.001.patch
>
>
> If a reducer encounters an error while trying to fetch from a node and then 
> hits a read timeout while re-establishing the connection, the reducer can 
> fail.  The read timeout exception can leak to the top of the Fetcher thread, 
> which causes the reduce task to tear down.  This type of error can repeat 
> across reducer attempts, causing jobs to fail due to a single bad node.





[jira] [Commented] (MAPREDUCE-6297) Task Id of the failed task in diagnostics should link to the task page

2015-04-02 Thread Siqi Li (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393152#comment-14393152
 ] 

Siqi Li commented on MAPREDUCE-6297:


[~jira.shegalov], I have modified TaskID.forName to use the same regex in patch 
v4. 
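As an illustration of recognizing task ids with one shared pattern (the actual regex used in patch v4 is not shown here; the pattern below is an assumption based on the standard task id layout):

```java
import java.util.regex.Pattern;

// Hypothetical illustration for MAPREDUCE-6297: recognize a task id such
// as task_1424910897258_0004_m_000003 with a single regex. This is NOT
// the regex from the v4 patch.
public class TaskIdPattern {
    static final Pattern TASK_ID =
        Pattern.compile("task_(\\d+)_(\\d+)_([mr])_(\\d+)");

    static boolean isTaskId(String s) {
        return s != null && TASK_ID.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isTaskId("task_1424910897258_0004_m_000003"));
    }
}
```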

> Task Id of the failed task in diagnostics should link to the task page
> --
>
> Key: MAPREDUCE-6297
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6297
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobhistoryserver
>Affects Versions: 2.6.0
>Reporter: Siqi Li
>Assignee: Siqi Li
>Priority: Minor
> Attachments: 58CCA024-7455-4A87-BCFD-C88054FF841B.png, 
> MAPREDUCE-6297.v1.patch, MAPREDUCE-6297.v2.patch, MAPREDUCE-6297.v3.patch, 
> MAPREDUCE-6297.v4.patch
>
>
> Currently we have to copy it and search in the task list.





[jira] [Updated] (MAPREDUCE-6305) AM/Task log page should be able to link back to the job

2015-04-02 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-6305:
---
Status: Patch Available  (was: Open)

> AM/Task log page should be able to link back to the job
> ---
>
> Key: MAPREDUCE-6305
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6305
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Siqi Li
>Assignee: Siqi Li
> Attachments: MAPREDUCE-6305.v1.patch
>
>






[jira] [Updated] (MAPREDUCE-6305) AM/Task log page should be able to link back to the job

2015-04-02 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-6305:
---
Attachment: MAPREDUCE-6305.v1.patch

> AM/Task log page should be able to link back to the job
> ---
>
> Key: MAPREDUCE-6305
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6305
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Siqi Li
>Assignee: Siqi Li
> Attachments: MAPREDUCE-6305.v1.patch
>
>






[jira] [Commented] (MAPREDUCE-6297) Task Id of the failed task in diagnostics should link to the task page

2015-04-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393145#comment-14393145
 ] 

Hadoop QA commented on MAPREDUCE-6297:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12709018/MAPREDUCE-6297.v4.patch
  against trunk revision 9ed43f2.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5367//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5367//console

This message is automatically generated.

> Task Id of the failed task in diagnostics should link to the task page
> --
>
> Key: MAPREDUCE-6297
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6297
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobhistoryserver
>Affects Versions: 2.6.0
>Reporter: Siqi Li
>Assignee: Siqi Li
>Priority: Minor
> Attachments: 58CCA024-7455-4A87-BCFD-C88054FF841B.png, 
> MAPREDUCE-6297.v1.patch, MAPREDUCE-6297.v2.patch, MAPREDUCE-6297.v3.patch, 
> MAPREDUCE-6297.v4.patch
>
>
> Currently we have to copy it and search in the task list.





[jira] [Assigned] (MAPREDUCE-6305) AM/Task log page should be able to link back to the job

2015-04-02 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li reassigned MAPREDUCE-6305:
--

Assignee: Siqi Li

> AM/Task log page should be able to link back to the job
> ---
>
> Key: MAPREDUCE-6305
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6305
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Siqi Li
>Assignee: Siqi Li
>






[jira] [Created] (MAPREDUCE-6305) AM/Task log page should be able to link back to the job

2015-04-02 Thread Siqi Li (JIRA)
Siqi Li created MAPREDUCE-6305:
--

 Summary: AM/Task log page should be able to link back to the job
 Key: MAPREDUCE-6305
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6305
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Siqi Li








[jira] [Updated] (MAPREDUCE-6297) Task Id of the failed task in diagnostics should link to the task page

2015-04-02 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-6297:
---
Attachment: MAPREDUCE-6297.v4.patch

> Task Id of the failed task in diagnostics should link to the task page
> --
>
> Key: MAPREDUCE-6297
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6297
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobhistoryserver
>Affects Versions: 2.6.0
>Reporter: Siqi Li
>Assignee: Siqi Li
>Priority: Minor
> Attachments: 58CCA024-7455-4A87-BCFD-C88054FF841B.png, 
> MAPREDUCE-6297.v1.patch, MAPREDUCE-6297.v2.patch, MAPREDUCE-6297.v3.patch, 
> MAPREDUCE-6297.v4.patch
>
>
> Currently we have to copy it and search in the task list.





[jira] [Commented] (MAPREDUCE-6303) Read timeout when retrying a fetch error can be fatal to a reducer

2015-04-02 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14392987#comment-14392987
 ] 

Junping Du commented on MAPREDUCE-6303:
---

+1. Patch LGTM. Will commit it shortly.

> Read timeout when retrying a fetch error can be fatal to a reducer
> --
>
> Key: MAPREDUCE-6303
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6303
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Blocker
> Attachments: MAPREDUCE-6303.001.patch
>
>
> If a reducer encounters an error trying to fetch from a node and then hits a 
> read timeout while trying to re-establish the connection, the reducer can 
> fail.  The read timeout exception can leak to the top of the Fetcher thread, 
> which causes the reduce task to tear down.  This type of error can repeat 
> across reducer attempts, causing jobs to fail due to a single bad node.





[jira] [Commented] (MAPREDUCE-6303) Read timeout when retrying a fetch error can be fatal to a reducer

2015-04-02 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14392805#comment-14392805
 ] 

Junping Du commented on MAPREDUCE-6303:
---

bq. Junping Du, I would appreciate it if you could take a look. Thanks!
Sure. I am looking at it now. Thanks!

> Read timeout when retrying a fetch error can be fatal to a reducer
> --
>
> Key: MAPREDUCE-6303
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6303
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Blocker
> Attachments: MAPREDUCE-6303.001.patch
>
>
> If a reducer encounters an error trying to fetch from a node and then hits a 
> read timeout while trying to re-establish the connection, the reducer can 
> fail.  The read timeout exception can leak to the top of the Fetcher thread, 
> which causes the reduce task to tear down.  This type of error can repeat 
> across reducer attempts, causing jobs to fail due to a single bad node.





[jira] [Commented] (MAPREDUCE-6303) Read timeout when retrying a fetch error can be fatal to a reducer

2015-04-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14392757#comment-14392757
 ] 

Hadoop QA commented on MAPREDUCE-6303:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12708981/MAPREDUCE-6303.001.patch
  against trunk revision 867d5d2.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5366//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5366//console

This message is automatically generated.

> Read timeout when retrying a fetch error can be fatal to a reducer
> --
>
> Key: MAPREDUCE-6303
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6303
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Blocker
> Attachments: MAPREDUCE-6303.001.patch
>
>
> If a reducer encounters an error trying to fetch from a node and then hits a 
> read timeout while trying to re-establish the connection, the reducer can 
> fail.  The read timeout exception can leak to the top of the Fetcher thread, 
> which causes the reduce task to tear down.  This type of error can repeat 
> across reducer attempts, causing jobs to fail due to a single bad node.





[jira] [Commented] (MAPREDUCE-6302) deadlock in a job between map and reduce cores allocation

2015-04-02 Thread mai shurong (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14392761#comment-14392761
 ] 

mai shurong commented on MAPREDUCE-6302:


Sorry, I made a clerical mistake in the comment: the value of 
mapreduce.job.reduce.slowstart.completedmaps was 1.0, not 0.5.
When I set mapreduce.job.reduce.slowstart.completedmaps to 1.0, jobs 
always ran successfully and the deadlock did not happen any more.
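For reference, the workaround described above corresponds to a setting along these lines in mapred-site.xml (or as a per-job override); the property name is the standard one, the value is the one reported in this comment:

```xml
<!-- Fraction of maps that must complete before reducers are scheduled.
     A value of 1.0 means no reducer starts until every map has finished,
     which avoids the map/reduce core deadlock described in this issue,
     at the cost of losing map/shuffle overlap. -->
<property>
  <name>mapreduce.job.reduce.slowstart.completedmaps</name>
  <value>1.0</value>
</property>
```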



> deadlock in a job between map and reduce cores allocation 
> --
>
> Key: MAPREDUCE-6302
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6302
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: mai shurong
>Priority: Critical
> Attachments: AM_log_head10.txt.gz, AM_log_tail10.txt.gz, 
> queue_with_max163cores.png, queue_with_max263cores.png, 
> queue_with_max333cores.png
>
>
> I submitted a big job, which has 500 maps and 350 reduces, to a 
> fair-scheduler queue with a 300-core maximum. When the job had finished 
> 100% of its maps, its 300 reduces occupied all 300 cores in the queue. 
> Then a map failed and was retried, waiting for a core, while the 300 
> reduces were waiting for the failed map to finish, so a deadlock occurred. 
> As a result, the job was blocked, and later jobs in the queue could not 
> run because no cores were available in the queue.
> I think there is a similar issue for the memory of a queue.





[jira] [Updated] (MAPREDUCE-6303) Read timeout when retrying a fetch error can be fatal to a reducer

2015-04-02 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-6303:
--
Status: Patch Available  (was: Open)

> Read timeout when retrying a fetch error can be fatal to a reducer
> --
>
> Key: MAPREDUCE-6303
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6303
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Blocker
> Attachments: MAPREDUCE-6303.001.patch
>
>
> If a reducer encounters an error trying to fetch from a node and then hits a 
> read timeout while trying to re-establish the connection, the reducer can 
> fail.  The read timeout exception can leak to the top of the Fetcher thread, 
> which causes the reduce task to tear down.  This type of error can repeat 
> across reducer attempts, causing jobs to fail due to a single bad node.





[jira] [Updated] (MAPREDUCE-6303) Read timeout when retrying a fetch error can be fatal to a reducer

2015-04-02 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-6303:
--
Attachment: MAPREDUCE-6303.001.patch

The patch treats errors from setupConnectionsWithRetry when re-establishing 
the connection the same way we handle errors when establishing the initial 
connection.  Added a unit test that fails without the fix and passes with it.
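The approach described above might be sketched as follows. This is a hypothetical, self-contained illustration, not the actual Hadoop Fetcher code; `Connector` and `fetchWithRetry` are made-up names standing in for the shuffle connection logic:

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

public class FetchRetrySketch {

    /** Stand-in for opening a shuffle connection to a mapper host. */
    interface Connector {
        void connect() throws IOException;
    }

    /**
     * Attempt to connect up to maxAttempts times. An IOException thrown on
     * a retry (e.g. a SocketTimeoutException) is handled the same way as
     * one thrown on the initial attempt: it is recorded and swallowed, so
     * it never propagates to the top of the thread and kills the reduce.
     */
    static boolean fetchWithRetry(Connector c, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                c.connect();
                return true;                    // connected successfully
            } catch (IOException e) {           // includes read timeouts
                System.out.println("attempt " + attempt + " failed: "
                        + e.getMessage());
            }
        }
        return false;  // give up; the caller re-schedules this map output
    }

    public static void main(String[] args) {
        // A host whose connection attempts always time out on read.
        Connector badHost = () -> {
            throw new SocketTimeoutException("Read timed out");
        };
        boolean ok = fetchWithRetry(badHost, 2);
        System.out.println("fetched: " + ok);   // prints "fetched: false"
    }
}
```

The point of the sketch is only the catch placement: both the initial attempt and each retry sit inside the same try/catch, so a timeout on a retry is reported to the shuffle scheduler rather than thrown out of the fetcher thread.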

[~djp], I would appreciate it if you could take a look.  Thanks!


> Read timeout when retrying a fetch error can be fatal to a reducer
> --
>
> Key: MAPREDUCE-6303
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6303
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Blocker
> Attachments: MAPREDUCE-6303.001.patch
>
>
> If a reducer encounters an error trying to fetch from a node and then hits a 
> read timeout while trying to re-establish the connection, the reducer can 
> fail.  The read timeout exception can leak to the top of the Fetcher thread, 
> which causes the reduce task to tear down.  This type of error can repeat 
> across reducer attempts, causing jobs to fail due to a single bad node.





[jira] [Updated] (MAPREDUCE-4857) Fix 126 error during map/reduce phase

2015-04-02 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated MAPREDUCE-4857:
---
  Resolution: Not a Problem
   Fix Version/s: (was: 1.0.4)
Target Version/s:   (was: 1.0.4)
  Status: Resolved  (was: Patch Available)

It does not appear that we're planning any more 1.0.x (vs. 1.1.x or 1.2.x) 
releases at this point, so I am closing this out. Feel free to reopen if I 
have missed something.

> Fix 126 error during map/reduce phase
> -
>
> Key: MAPREDUCE-4857
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4857
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 1.0.4
>Reporter: Fengdong Yu
> Attachments: MAPREDUCE-4857.patch
>
>
> These failures happen rarely during the map or reduce phase, but mostly in 
> the map phase, with the exception message: 
> java.lang.Throwable: Child Error
>   at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
> Caused by: java.io.IOException: Task process exit with nonzero status of 126.
>   at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
> The error logs are cleaned up, so it is very hard to debug.
> I compared DefaultTaskController.java with 0.22: 0.22 uses "bash command" 
> to start the job script, but 1.0.4 uses "bash", "-c", "command".
> After I removed "-c", everything was OK and the 126 error code never 
> happened again.
> I read the bash man page; it indicates that when a new process is forked 
> while the script is still being written, another process running "bash -c" 
> also holds a writable fd on it, so it can occasionally return status 126.
> So there is only a one-line fix for this issue.
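The two launch styles the reporter contrasts can be illustrated with a toy launcher script. This is not the actual DefaultTaskController code, and the script path is made up for the demonstration:

```shell
#!/bin/sh
# Write a trivial stand-in for a task launch script.
cat > /tmp/task_launch_demo.sh <<'EOF'
echo "task started"
EOF

# 0.22-style: pass the script file to bash directly.
bash /tmp/task_launch_demo.sh
echo "direct invocation exit code: $?"

# 1.0.4-style: wrap the invocation in a "bash -c" command string.
bash -c "bash /tmp/task_launch_demo.sh"
echo "bash -c invocation exit code: $?"
```

Both forms succeed here; the issue reported above concerns a race where the script file is still held open for writing when bash tries to execute it, in which case bash returns status 126 ("command found but not executable").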





[jira] [Commented] (MAPREDUCE-6302) deadlock in a job between map and reduce cores allocation

2015-04-02 Thread mai shurong (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14392266#comment-14392266
 ] 

mai shurong commented on MAPREDUCE-6302:


When I set the parameter mapreduce.job.reduce.slowstart.completedmaps to 0.5, 
jobs always ran successfully and the deadlock did not happen any more. 

> deadlock in a job between map and reduce cores allocation 
> --
>
> Key: MAPREDUCE-6302
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6302
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: mai shurong
>Priority: Critical
> Attachments: AM_log_head10.txt.gz, AM_log_tail10.txt.gz, 
> queue_with_max163cores.png, queue_with_max263cores.png, 
> queue_with_max333cores.png
>
>
> I submitted a big job, which has 500 maps and 350 reduces, to a 
> fair-scheduler queue with a 300-core maximum. When the job had finished 
> 100% of its maps, its 300 reduces occupied all 300 cores in the queue. 
> Then a map failed and was retried, waiting for a core, while the 300 
> reduces were waiting for the failed map to finish, so a deadlock occurred. 
> As a result, the job was blocked, and later jobs in the queue could not 
> run because no cores were available in the queue.
> I think there is a similar issue for the memory of a queue.


