[jira] [Commented] (MAPREDUCE-6495) Docs for archive-logs tool

2015-10-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966256#comment-14966256
 ] 

Hudson commented on MAPREDUCE-6495:
---

FAILURE: Integrated in Hadoop-Hdfs-trunk #2457 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2457/])
MAPREDUCE-6495. Docs for archive-logs tool (rkanter) (rkanter: rev 
0c4af0f99811a7138954391df3761aef9ff09155)
* 
hadoop-tools/hadoop-archive-logs/src/main/java/org/apache/hadoop/tools/HadoopArchiveLogs.java
* hadoop-project/src/site/site.xml
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/markdown/MapredCommands.md
* hadoop-tools/hadoop-archive-logs/src/site/markdown/HadoopArchiveLogs.md
* hadoop-tools/hadoop-archive-logs/src/site/resources/css/site.css
* hadoop-mapreduce-project/CHANGES.txt


> Docs for archive-logs tool
> --
>
> Key: MAPREDUCE-6495
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6495
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.8.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
> Fix For: 2.8.0
>
> Attachments: MAPREDUCE-6495.001.patch, MAPREDUCE-6495.002.patch
>
>
> Write documentation for the 'mapred archive-logs' tool added in 
> MAPREDUCE-6415.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6495) Docs for archive-logs tool

2015-10-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966235#comment-14966235
 ] 

Hudson commented on MAPREDUCE-6495:
---

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #520 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/520/])
MAPREDUCE-6495. Docs for archive-logs tool (rkanter) (rkanter: rev 
0c4af0f99811a7138954391df3761aef9ff09155)
* 
hadoop-tools/hadoop-archive-logs/src/main/java/org/apache/hadoop/tools/HadoopArchiveLogs.java
* hadoop-mapreduce-project/CHANGES.txt
* hadoop-tools/hadoop-archive-logs/src/site/markdown/HadoopArchiveLogs.md
* hadoop-tools/hadoop-archive-logs/src/site/resources/css/site.css
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/markdown/MapredCommands.md
* hadoop-project/src/site/site.xml




[jira] [Commented] (MAPREDUCE-6495) Docs for archive-logs tool

2015-10-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966229#comment-14966229
 ] 

Hudson commented on MAPREDUCE-6495:
---

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #561 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/561/])
MAPREDUCE-6495. Docs for archive-logs tool (rkanter) (rkanter: rev 
0c4af0f99811a7138954391df3761aef9ff09155)
* hadoop-tools/hadoop-archive-logs/src/site/resources/css/site.css
* hadoop-mapreduce-project/CHANGES.txt
* hadoop-tools/hadoop-archive-logs/src/site/markdown/HadoopArchiveLogs.md
* hadoop-project/src/site/site.xml
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/markdown/MapredCommands.md
* 
hadoop-tools/hadoop-archive-logs/src/main/java/org/apache/hadoop/tools/HadoopArchiveLogs.java




[jira] [Commented] (MAPREDUCE-6495) Docs for archive-logs tool

2015-10-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966170#comment-14966170
 ] 

Hudson commented on MAPREDUCE-6495:
---

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2508 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2508/])
MAPREDUCE-6495. Docs for archive-logs tool (rkanter) (rkanter: rev 
0c4af0f99811a7138954391df3761aef9ff09155)
* hadoop-tools/hadoop-archive-logs/src/site/markdown/HadoopArchiveLogs.md
* hadoop-tools/hadoop-archive-logs/src/site/resources/css/site.css
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/markdown/MapredCommands.md
* 
hadoop-tools/hadoop-archive-logs/src/main/java/org/apache/hadoop/tools/HadoopArchiveLogs.java
* hadoop-project/src/site/site.xml
* hadoop-mapreduce-project/CHANGES.txt




[jira] [Commented] (MAPREDUCE-6495) Docs for archive-logs tool

2015-10-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966156#comment-14966156
 ] 

Hudson commented on MAPREDUCE-6495:
---

FAILURE: Integrated in Hadoop-Yarn-trunk #1296 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1296/])
MAPREDUCE-6495. Docs for archive-logs tool (rkanter) (rkanter: rev 
0c4af0f99811a7138954391df3761aef9ff09155)
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/markdown/MapredCommands.md
* hadoop-mapreduce-project/CHANGES.txt
* hadoop-tools/hadoop-archive-logs/src/site/markdown/HadoopArchiveLogs.md
* 
hadoop-tools/hadoop-archive-logs/src/main/java/org/apache/hadoop/tools/HadoopArchiveLogs.java
* hadoop-tools/hadoop-archive-logs/src/site/resources/css/site.css
* hadoop-project/src/site/site.xml




[jira] [Commented] (MAPREDUCE-6512) FileOutputCommitter tasks unconditionally create parent directories

2015-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966138#comment-14966138
 ] 

Hadoop QA commented on MAPREDUCE-6512:
--

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  19m 38s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 55s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 36s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 25s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 49s | The applied patch generated 1 
new checkstyle issue (total was 140, now 140). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 35s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | common tests |   8m 16s | Tests failed in 
hadoop-common. |
| {color:red}-1{color} | mapreduce tests |   1m 54s | Tests failed in 
hadoop-mapreduce-client-core. |
| {color:red}-1{color} | mapreduce tests | 148m 19s | Tests failed in 
hadoop-mapreduce-client-jobclient. |
| | | 206m  7s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.fs.TestFilterFileSystem |
|   | hadoop.fs.TestHarFileSystem |
|   | hadoop.mapred.TestFileOutputCommitter |
|   | hadoop.mapred.TestMRIntermediateDataEncryption |
|   | hadoop.mapred.TestReduceFetch |
|   | hadoop.mapred.join.TestDatamerge |
|   | hadoop.mapred.jobcontrol.TestLocalJobControl |
|   | hadoop.mapred.TestMultipleTextOutputFormat |
|   | hadoop.mapred.lib.TestKeyFieldBasedComparator |
|   | hadoop.conf.TestNoDefaultsJobConf |
|   | hadoop.mapred.TestMRTimelineEventHandling |
|   | hadoop.mapred.TestLazyOutput |
|   | hadoop.mapred.TestJobCleanup |
|   | hadoop.mapred.TestJobCounters |
|   | hadoop.mapred.TestJobName |
|   | hadoop.mapred.TestReporter |
|   | hadoop.mapred.TestCollect |
|   | hadoop.fs.slive.TestSlive |
|   | hadoop.fs.TestDFSIO |
|   | hadoop.mapred.lib.TestChainMapReduce |
|   | hadoop.mapred.pipes.TestPipeApplication |
|   | hadoop.mapred.TestMerge |
|   | hadoop.mapred.TestUserDefinedCounters |
|   | hadoop.mapred.lib.TestMultithreadedMapRunner |
|   | hadoop.mapred.TestReduceFetchFromPartialMem |
|   | hadoop.mapred.lib.TestMultipleOutputs |
|   | hadoop.mapred.lib.aggregate.TestAggregates |
|   | hadoop.mapred.TestClusterMapReduceTestCase |
|   | hadoop.mapred.TestMRCJCFileOutputCommitter |
|   | hadoop.mapred.TestFieldSelection |
|   | hadoop.mapred.TestMapRed |
|   | hadoop.mapred.TestClusterMRNotification |
|   | hadoop.mapred.TestOldCombinerGrouping |
|   | hadoop.mapred.TestTaskCommit |
|   | hadoop.mapred.TestJavaSerialization |
|   | hadoop.mapred.TestNetworkedJob |
|   | hadoop.mapred.TestLineRecordReaderJobs |
|   | hadoop.mapred.TestLocalMRNotification |
|   | hadoop.fs.TestFileSystem |
|   | hadoop.mapred.TestSpecialCharactersInOutputPath |
|   | hadoop.mapred.TestMiniMRChildTask |
| Timed out tests | org.apache.hadoop.mapreduce.TestLargeSort |
|   | org.apache.hadoop.mapred.TestJobSysDirWithDFS |
|   | org.apache.hadoop.mapred.TestMiniMRClasspath |
|   | org.apache.hadoop.mapred.TestMiniMRWithDFSWithDistinctUsers |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12767680/MAPREDUCE-6512.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 6c8b6f3 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6073/artifact/patchprocess/diffcheckstylehadoop-common.txt
 |
| hadoop-common test log | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6073/artifact/patchprocess/testrun_hadoop-common.txt
 |
| hadoop-mapreduce-client-core test log | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6073/artifact/patchprocess/testrun_hadoop-mapreduce-client-core.txt
 |
| hadoop-mapreduce-client-jobclient test log | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6073/artifact/patchprocess/testrun_hadoop-mapreduce-client-jobclient.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6073/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console

[jira] [Commented] (MAPREDUCE-6495) Docs for archive-logs tool

2015-10-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966132#comment-14966132
 ] 

Hudson commented on MAPREDUCE-6495:
---

SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #575 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/575/])
MAPREDUCE-6495. Docs for archive-logs tool (rkanter) (rkanter: rev 
0c4af0f99811a7138954391df3761aef9ff09155)
* hadoop-tools/hadoop-archive-logs/src/site/resources/css/site.css
* 
hadoop-tools/hadoop-archive-logs/src/main/java/org/apache/hadoop/tools/HadoopArchiveLogs.java
* hadoop-tools/hadoop-archive-logs/src/site/markdown/HadoopArchiveLogs.md
* hadoop-project/src/site/site.xml
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/markdown/MapredCommands.md
* hadoop-mapreduce-project/CHANGES.txt




[jira] [Commented] (MAPREDUCE-6489) Fail fast rogue tasks that write too much to local disk

2015-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966088#comment-14966088
 ] 

Hadoop QA commented on MAPREDUCE-6489:
--

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 36s | Pre-patch branch-2 compilation 
is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   6m 39s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 33s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 48s | The applied patch generated 2 
new checkstyle issues (total was 726, now 724). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 18s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 25s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | mapreduce tests |   1m 50s | Tests passed in 
hadoop-mapreduce-client-core. |
| | |  41m 11s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12767701/MAPREDUCE-6489-branch-2.003.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | branch-2 / 4921420 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6074/artifact/patchprocess/diffcheckstylehadoop-mapreduce-client-core.txt
 |
| hadoop-mapreduce-client-core test log | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6074/artifact/patchprocess/testrun_hadoop-mapreduce-client-core.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6074/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6074/console |


This message was automatically generated.

> Fail fast rogue tasks that write too much to local disk
> ---
>
> Key: MAPREDUCE-6489
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6489
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
>Affects Versions: 2.7.1
>Reporter: Maysam Yabandeh
>Assignee: Maysam Yabandeh
> Attachments: MAPREDUCE-6489-branch-2.003.patch, 
> MAPREDUCE-6489.001.patch, MAPREDUCE-6489.002.patch, MAPREDUCE-6489.003.patch
>
>
> Tasks of rogue jobs can write too much to local disk, negatively affecting 
> the jobs running in collocated containers. Ideally, YARN will be able to 
> limit the amount of local disk used by each task: YARN-4011. Until then, the 
> MapReduce task can fail fast if it is writing too much (above a configured 
> threshold) to local disk.
> As we discussed 
> [here|https://issues.apache.org/jira/browse/YARN-4011?focusedCommentId=14902750&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14902750]
>  the suggested approach is for the MapReduce task to check the BYTES_WRITTEN 
> counter for the local disk and throw an exception when it goes beyond a 
> configured value.  The number of bytes written is larger than the actual disk 
> space used, but the exact value is not required to detect a rogue task; a 
> very large number of bytes written to local disk is a good indication that 
> the task is misbehaving.
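The check described above can be sketched in isolation. This is a simplified, Hadoop-free illustration of the fail-fast idea, not the actual patch: the class, method, and exception names are invented for the example, and in the real task the bytes-written figure would come from the local file system's statistics counter rather than a parameter.

```java
// Simplified sketch of the proposed fail-fast check (names are hypothetical).
// In a real task, bytesWritten would come from the local FileSystem's
// BYTES_WRITTEN statistics, and limitBytes from a job configuration property.
class LocalDiskLimitCheck {

    /** Thrown when a task writes more to local disk than the configured cap. */
    static class DiskLimitExceededException extends RuntimeException {
        DiskLimitExceededException(String msg) { super(msg); }
    }

    /**
     * Fail fast if bytesWritten exceeds the configured limit.
     * A negative limit (assumed default) disables the check entirely.
     */
    static void checkLocalDiskWrites(long bytesWritten, long limitBytes) {
        if (limitBytes >= 0 && bytesWritten > limitBytes) {
            throw new DiskLimitExceededException(
                "Task wrote " + bytesWritten + " bytes to local disk, above the"
                + " limit of " + limitBytes
                + " bytes; failing fast as a suspected rogue task");
        }
    }
}
```

As the description notes, bytes written overcounts actual disk usage, so the threshold would be set very high; the check only needs to catch gross misbehavior.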





[jira] [Commented] (MAPREDUCE-6495) Docs for archive-logs tool

2015-10-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966050#comment-14966050
 ] 

Hudson commented on MAPREDUCE-6495:
---

FAILURE: Integrated in Hadoop-trunk-Commit #8674 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8674/])
MAPREDUCE-6495. Docs for archive-logs tool (rkanter) (rkanter: rev 
0c4af0f99811a7138954391df3761aef9ff09155)
* hadoop-tools/hadoop-archive-logs/src/site/markdown/HadoopArchiveLogs.md
* hadoop-tools/hadoop-archive-logs/src/site/resources/css/site.css
* 
hadoop-tools/hadoop-archive-logs/src/main/java/org/apache/hadoop/tools/HadoopArchiveLogs.java
* hadoop-project/src/site/site.xml
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/markdown/MapredCommands.md
* hadoop-mapreduce-project/CHANGES.txt




[jira] [Updated] (MAPREDUCE-6495) Docs for archive-logs tool

2015-10-20 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated MAPREDUCE-6495:
-
Fix Version/s: 2.8.0



[jira] [Updated] (MAPREDUCE-6495) Docs for archive-logs tool

2015-10-20 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated MAPREDUCE-6495:
-
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Thanks for the review, Anubhav.  I made the change to the committed patch.  
Committed to trunk and branch-2!

> Docs for archive-logs tool
> --
>
> Key: MAPREDUCE-6495
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6495
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.8.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
> Attachments: MAPREDUCE-6495.001.patch, MAPREDUCE-6495.002.patch
>
>
> Write documentation for the 'mapred archive-logs' tool added in 
> MAPREDUCE-6415.





[jira] [Updated] (MAPREDUCE-6495) Docs for archive-logs tool

2015-10-20 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated MAPREDUCE-6495:
-
Attachment: MAPREDUCE-6495.002.patch

The 002 patch has the super minor change, for reference.



[jira] [Updated] (MAPREDUCE-6489) Fail fast rogue tasks that write too much to local disk

2015-10-20 Thread Maysam Yabandeh (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maysam Yabandeh updated MAPREDUCE-6489:
---
Attachment: MAPREDUCE-6489-branch-2.003.patch

Thanks [~jlowe] for the review. I am attaching 
MAPREDUCE-6489-branch-2.003.patch for branch-2.



[jira] [Commented] (MAPREDUCE-6495) Docs for archive-logs tool

2015-10-20 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966018#comment-14966018
 ] 

Anubhav Dhoot commented on MAPREDUCE-6495:
--

Looks good, other than one reference in handleOpts that still says "yarn 
archive-logs" instead of "mapred archive-logs".
+1 pending that



> Docs for archive-logs tool
> --
>
> Key: MAPREDUCE-6495
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6495
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.8.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
> Attachments: MAPREDUCE-6495.001.patch
>
>
> Write documentation for the 'mapred archive-logs' tool added in 
> MAPREDUCE-6415.





[jira] [Commented] (MAPREDUCE-6489) Fail fast rogue tasks that write too much to local disk

2015-10-20 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965902#comment-14965902
 ] 

Jason Lowe commented on MAPREDUCE-6489:
---

+1, the latest patch looks good to me.  However, the patch does not apply 
cleanly to branch-2.  [~maysamyabandeh] could you provide a branch-2 patch as well?

> Fail fast rogue tasks that write too much to local disk
> ---
>
> Key: MAPREDUCE-6489
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6489
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
>Affects Versions: 2.7.1
>Reporter: Maysam Yabandeh
>Assignee: Maysam Yabandeh
> Attachments: MAPREDUCE-6489.001.patch, MAPREDUCE-6489.002.patch, 
> MAPREDUCE-6489.003.patch
>
>
> Tasks of rogue jobs can write too much to local disk, negatively affecting 
> the jobs running in collocated containers. Ideally, YARN will be able to 
> limit the amount of local disk used by each task: YARN-4011. Until then, the 
> MapReduce task can fail fast if it is writing too much (above a configured 
> threshold) to local disk.
> As we discussed 
> [here|https://issues.apache.org/jira/browse/YARN-4011?focusedCommentId=14902750&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14902750]
>  the suggested approach is for the MapReduce task to check the BYTES_WRITTEN 
> counter for the local disk and throw an exception when it goes beyond a 
> configured value.  The number of bytes written is larger than the actual disk 
> space used, but the exact value is not required to detect a rogue task; a 
> very large number of bytes written to local disk is a good indication that 
> the task is misbehaving.





[jira] [Updated] (MAPREDUCE-6512) FileOutputCommitter tasks unconditionally create parent directories

2015-10-20 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated MAPREDUCE-6512:

Attachment: (was: YARN-4269.2.patch)

> FileOutputCommitter tasks unconditionally create parent directories
> ---
>
> Key: MAPREDUCE-6512
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6512
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: MAPREDUCE-6512.patch, MAPREDUCE-6512.patch
>
>
> If the output directory is deleted, then subsequent tasks should fail. 
> Instead, they blindly recreate the missing parent directories, leading the 
> job to appear "successful" despite potentially missing almost all of its 
> output. Task attempts should fail if the parent app attempt directory is 
> missing when they go to create their task attempt directory.
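The behavior the report asks for can be illustrated without Hadoop. Below is a hypothetical sketch using `java.nio.file` (the real FileOutputCommitter works against Hadoop's `FileSystem` API, and the class and method names here are invented for the example): refuse to recreate a missing parent, and use a non-recursive create so parents are never made implicitly.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class TaskAttemptDirs {
    /**
     * Create the task attempt directory, but only if the parent app attempt
     * directory still exists. If the output tree was deleted out from under
     * the job, fail loudly instead of silently recreating the parents.
     */
    static Path createTaskAttemptDir(Path appAttemptDir, String taskAttemptId)
            throws IOException {
        if (!Files.isDirectory(appAttemptDir)) {
            throw new IOException("App attempt directory " + appAttemptDir
                + " is missing; refusing to recreate it for task "
                + taskAttemptId);
        }
        // createDirectory (not createDirectories) never recreates missing
        // parents, so a concurrent delete still surfaces as an error.
        return Files.createDirectory(appAttemptDir.resolve(taskAttemptId));
    }
}
```

The key design point is the existence check plus the non-recursive create: the unconditional `mkdirs`-style call the issue describes is exactly what masks a deleted output directory.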





[jira] [Updated] (MAPREDUCE-6512) FileOutputCommitter tasks unconditionally create parent directories

2015-10-20 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated MAPREDUCE-6512:

Attachment: YARN-4269.2.patch



[jira] [Updated] (MAPREDUCE-6512) FileOutputCommitter tasks unconditionally create parent directories

2015-10-20 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated MAPREDUCE-6512:

Attachment: MAPREDUCE-6512.patch



[jira] [Updated] (MAPREDUCE-6516) JVM reuse in Yarn

2015-10-20 Thread Yingqi Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yingqi Lu updated MAPREDUCE-6516:
-
Description: 
Dear All,

Recently, we identified an issue inside Yarn with MapReduce: a significant 
amount of time is spent in libjvm.so, most of it in compilation.

Attached is a flame graph (visual call graph) of a query running for about 8 
minutes. Most of the yellow bars represent 'libjvm.so' functions, while the 
Java functions are colored red. The data show that more than 40% of overall 
execution time is spent in compilation itself, yet inspecting the JVMs shows a 
lot of code still running in interpreted mode. Ideally, everything would run 
as compiled code over and over again; in reality, mappers and reducers are 
long dead before the compilation benefits kick in. In other words, we take the 
performance hit of both the compiler and the interpreter. The JVM reuse 
feature in MapReduce 1.0 addressed this issue, but it was removed in Yarn. We 
are currently tuning JVM parameters to minimize the performance impact, but we 
still think it would be good to open a discussion here to seek a more 
permanent solution, since the problem ties into how Yarn works.

We are wondering if any of you have seen this issue before, or if there is an 
ongoing project to address it.

Data for this graph was collected across the entire system with multiple JVMs 
running. The workload we used is the BigBench workload 
(https://github.com/intel-hadoop/Big-Data-Benchmark-for-Big-Bench).

Thanks,
Yingqi Lu

  was:
Dear All,

Recently, we identified an issue inside Yarn with MapReduce. There is a 
significant amount of time spent in libjvm.so and most of which is compilation. 

Attached is a flame graph (visual call graph) of a query running for about 8 
mins. Most of the yellow bars represent ‘libjvm.so’ functions while the java 
functions are colored in red. Data show that more than 40% of overall execution 
time is spent in compilation itself, but still a lot of code ran in the 
interpreter mode by looking inside the JVM themselves. In the ideal case, we 
want everything runs with compiled code over and over again. However in 
reality, mappers and reducers are long died before the compilation benefits 
kick in. In other word, we take the performance hit from both compilation and 
interpreter. JVM reuse feature in MapReduce 1.0 addressed this issue, but it 
was removed in Yarn. We are right now working on a bunch of JVM parameters to 
minimize the impact of the performance, but still think it would be good to 
open a discussion here to seek for more permanent solutions since it ties to 
the nature of how Yarn works. 

We are wondering if any of you have seen this issue before or if there is any 
on-going project already happening to address this? 

Data for this graph was collected across the entire system with multiple JVMs 
running. The workload we use is BigBench workload 
(https://github.com/intel-hadoop/Big-Data-Benchmark-for-Big-Bench).

Thanks,
Yingqi Lu

1. Software and workloads used in performance tests may have been optimized for 
performance only on Intel microprocessors. Performance tests, such as SYSmark 
and MobileMark, are measured using specific computer systems, components, 
software, operations and functions. Any change to any of those factors may 
cause the results to vary. You should consult other information and performance 
tests to assist you in fully evaluating your contemplated purchases, including 
the performance of that product when combined with other products.


> JVM reuse in Yarn
> -
>
> Key: MAPREDUCE-6516
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6516
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Yingqi Lu
>  Labels: performance
> Attachments: flamegraph.png
>
>
> Dear All,
> Recently, we identified an issue inside Yarn with MapReduce. There is a 
> significant amount of time spent in libjvm.so and most of which is 
> compilation. 
> Attached is a flame graph (visual call graph) of a query running for about 8 
> mins. Most of the yellow bars represent ‘libjvm.so’ functions while the java 
> functions are colored in red. Data show that more than 40% of overall 
> execution time is spent in compilation itself, but still a lot of code ran in 
> the interpreter mode by looking inside the JVM themselves. In the ideal case, 
> we want everything runs with compiled code over and over again. However in 
> reality, mappers and reducers are long died before the compilation benefits 
> kick in. In other word, we take the performance hit from both compilation and 
> interpreter. JVM reuse feature in MapReduce 1.0 addressed this issue, but it 
> was removed in Yarn. We are right now working o

[jira] [Commented] (MAPREDUCE-6516) JVM reuse in Yarn

2015-10-20 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965842#comment-14965842
 ] 

Jason Lowe commented on MAPREDUCE-6516:
---

Moved this to MAPREDUCE.  JVM reuse is not something YARN does but rather 
something the application framework running on YARN chooses to do.

This is essentially a duplicate of MAPREDUCE-3902, whose effort seems to have 
stalled.  Some of the main contributors to that effort changed their focus to 
Tez which can run many MapReduce jobs as-is and does support container reuse in 
YARN (along with a number of other optimizations).
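
For reference, the MRv1 JVM-reuse behavior discussed in this issue was 
controlled by the `mapred.job.reuse.jvm.num.tasks` property. A minimal sketch of 
the reuse decision (a simplification, not the actual TaskTracker code):

```python
def can_reuse_jvm(tasks_run_so_far, reuse_limit):
    """mapred.job.reuse.jvm.num.tasks semantics (MRv1):
    1  = no reuse (the default), N > 1 = up to N tasks per JVM,
    -1 = unlimited reuse within the same job."""
    if reuse_limit == -1:
        return True
    return tasks_run_so_far < reuse_limit
```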


> JVM reuse in Yarn
> -
>
> Key: MAPREDUCE-6516
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6516
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Yingqi Lu
>  Labels: performance
> Attachments: flamegraph.png
>
>
> Dear All,
> Recently, we identified an issue inside Yarn with MapReduce. There is a 
> significant amount of time spent in libjvm.so and most of which is 
> compilation. 
> Attached is a flame graph (visual call graph) of a query running for about 8 
> mins. Most of the yellow bars represent ‘libjvm.so’ functions while the java 
> functions are colored in red. Data show that more than 40% of overall 
> execution time is spent in compilation itself, but still a lot of code ran in 
> the interpreter mode by looking inside the JVM themselves. In the ideal case, 
> we want everything runs with compiled code over and over again. However in 
> reality, mappers and reducers are long died before the compilation benefits 
> kick in. In other word, we take the performance hit from both compilation and 
> interpreter. JVM reuse feature in MapReduce 1.0 addressed this issue, but it 
> was removed in Yarn. We are right now working on a bunch of JVM parameters to 
> minimize the impact of the performance, but still think it would be good to 
> open a discussion here to seek for more permanent solutions since it ties to 
> the nature of how Yarn works. 
> We are wondering if any of you have seen this issue before or if there is any 
> on-going project already happening to address this? 
> Data for this graph was collected across the entire system with multiple JVMs 
> running. The workload we use is BigBench workload 
> (https://github.com/intel-hadoop/Big-Data-Benchmark-for-Big-Bench).
> Thanks,
> Yingqi Lu
> 1. Software and workloads used in performance tests may have been optimized 
> for performance only on Intel microprocessors. Performance tests, such as 
> SYSmark and MobileMark, are measured using specific computer systems, 
> components, software, operations and functions. Any change to any of those 
> factors may cause the results to vary. You should consult other information 
> and performance tests to assist you in fully evaluating your contemplated 
> purchases, including the performance of that product when combined with other 
> products.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Moved] (MAPREDUCE-6516) JVM reuse in Yarn

2015-10-20 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe moved YARN-4282 to MAPREDUCE-6516:
-

Key: MAPREDUCE-6516  (was: YARN-4282)
Project: Hadoop Map/Reduce  (was: Hadoop YARN)

> JVM reuse in Yarn
> -
>
> Key: MAPREDUCE-6516
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6516
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Yingqi Lu
>  Labels: performance
> Attachments: flamegraph.png
>
>
> Dear All,
> Recently, we identified an issue inside Yarn with MapReduce. There is a 
> significant amount of time spent in libjvm.so and most of which is 
> compilation. 
> Attached is a flame graph (visual call graph) of a query running for about 8 
> mins. Most of the yellow bars represent ‘libjvm.so’ functions while the java 
> functions are colored in red. Data show that more than 40% of overall 
> execution time is spent in compilation itself, but still a lot of code ran in 
> the interpreter mode by looking inside the JVM themselves. In the ideal case, 
> we want everything runs with compiled code over and over again. However in 
> reality, mappers and reducers are long died before the compilation benefits 
> kick in. In other word, we take the performance hit from both compilation and 
> interpreter. JVM reuse feature in MapReduce 1.0 addressed this issue, but it 
> was removed in Yarn. We are right now working on a bunch of JVM parameters to 
> minimize the impact of the performance, but still think it would be good to 
> open a discussion here to seek for more permanent solutions since it ties to 
> the nature of how Yarn works. 
> We are wondering if any of you have seen this issue before or if there is any 
> on-going project already happening to address this? 
> Data for this graph was collected across the entire system with multiple JVMs 
> running. The workload we use is BigBench workload 
> (https://github.com/intel-hadoop/Big-Data-Benchmark-for-Big-Bench).
> Thanks,
> Yingqi Lu
> 1. Software and workloads used in performance tests may have been optimized 
> for performance only on Intel microprocessors. Performance tests, such as 
> SYSmark and MobileMark, are measured using specific computer systems, 
> components, software, operations and functions. Any change to any of those 
> factors may cause the results to vary. You should consult other information 
> and performance tests to assist you in fully evaluating your contemplated 
> purchases, including the performance of that product when combined with other 
> products.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6513) MR job got hanged forever when one NM unstable for some time

2015-10-20 Thread chong chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965589#comment-14965589
 ] 

chong chen commented on MAPREDUCE-6513:
---

How to account for a task failure and how to re-schedule tasks are two different 
things; I don't understand why we have to tie the two together. This seems to be 
a design limitation. Clearly, for this case, raising the priority is the optimal 
solution. Since the AM has already ramped up the reducers once (651 reducers), 
repeating that process means ramping the whole thing down and then gradually 
ramping up again, which generates another round of communication overhead 
between the AM and the RM/scheduler.

> MR job got hanged forever when one NM unstable for some time
> 
>
> Key: MAPREDUCE-6513
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6513
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, resourcemanager
>Affects Versions: 2.7.0
>Reporter: Bob
>Assignee: Varun Saxena
>Priority: Critical
>
> when job is in-progress which is having more tasks,one node became unstable 
> due to some OS issue.After the node became unstable, the map on this node 
> status changed to KILLED state. 
> Currently maps which were running on unstable node are rescheduled, and all 
> are in scheduled state and wait for RM assign container.Seen ask requests for 
> map till Node is good (all those failed), there are no ask request after 
> this. But AM keeps on preempting the reducers (it's recycling).
> Finally reducers are waiting for complete mappers and mappers did n't get 
> container..
> My Question Is:
> 
> why map requests did not sent AM ,once after node recovery.?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6513) MR job got hanged forever when one NM unstable for some time

2015-10-20 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965309#comment-14965309
 ] 

Sunil G commented on MAPREDUCE-6513:


In my opinion, the failure of a node is not an issue caused by the job (or the 
AM). It is a case where a node went down due to some other problem (an OS bug or 
maintenance work). I feel it is better not to count such cases as task-attempt 
failures, because they can result in an ultimate job failure (a problem in the 
cluster/YARN need not count towards any job/application failure counts).
So handling the bug along the lines of "do not ramp up reducers while there is a 
hung map" seems the better approach here. Thoughts?

> MR job got hanged forever when one NM unstable for some time
> 
>
> Key: MAPREDUCE-6513
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6513
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, resourcemanager
>Affects Versions: 2.7.0
>Reporter: Bob
>Assignee: Varun Saxena
>Priority: Critical
>
> when job is in-progress which is having more tasks,one node became unstable 
> due to some OS issue.After the node became unstable, the map on this node 
> status changed to KILLED state. 
> Currently maps which were running on unstable node are rescheduled, and all 
> are in scheduled state and wait for RM assign container.Seen ask requests for 
> map till Node is good (all those failed), there are no ask request after 
> this. But AM keeps on preempting the reducers (it's recycling).
> Finally reducers are waiting for complete mappers and mappers did n't get 
> container..
> My Question Is:
> 
> why map requests did not sent AM ,once after node recovery.?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6499) Add More Informations Of Jobs In JobHistory Main Page

2015-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965277#comment-14965277
 ] 

Hadoop QA commented on MAPREDUCE-6499:
--

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 37s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 58s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 20s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 28s | The applied patch generated  
26 new checkstyle issues (total was 19, now 38). |
| {color:red}-1{color} | whitespace |   0m  1s | The patch has 3  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 30s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 51s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | mapreduce tests |   5m 52s | Tests failed in 
hadoop-mapreduce-client-hs. |
| | |  44m 43s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.mapreduce.v2.hs.TestJobHistoryEntities |
|   | hadoop.mapreduce.v2.hs.TestJobHistoryParsing |
|   | hadoop.mapreduce.v2.hs.webapp.dao.TestJobInfo |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12767602/HADOOP-MAPREDUCE-6499.2.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 9cb5d35 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6072/artifact/patchprocess/diffcheckstylehadoop-mapreduce-client-hs.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6072/artifact/patchprocess/whitespace.txt
 |
| hadoop-mapreduce-client-hs test log | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6072/artifact/patchprocess/testrun_hadoop-mapreduce-client-hs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6072/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6072/console |


This message was automatically generated.

> Add More Informations Of Jobs In JobHistory Main Page
> -
>
> Key: MAPREDUCE-6499
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6499
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
>Priority: Trivial
> Fix For: 2.7.1
>
> Attachments: HADOOP-MAPREDUCE-6499.2.patch, 
> HADOOP-MAPREDUCE-6499.patch
>
>
> Now in  JobHistory Main Page show too little information about finished 
> jobs,even don't have the job's running total time,only 
> startTime,finishedTime.So,in jobHistory main page,we can add more 
> informations about jobs, that we can better analyze jobs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6499) Add More Informations Of Jobs In JobHistory Main Page

2015-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965263#comment-14965263
 ] 

Hadoop QA commented on MAPREDUCE-6499:
--

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 43s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   8m  0s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 34s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 29s | The applied patch generated  
56 new checkstyle issues (total was 19, now 68). |
| {color:red}-1{color} | whitespace |   0m  0s | The patch has 2  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 37s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 35s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 53s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | mapreduce tests |   5m 58s | Tests failed in 
hadoop-mapreduce-client-hs. |
| | |  45m 18s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.mapreduce.v2.hs.webapp.TestBlocks |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12764404/HADOOP-MAPREDUCE-6499.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 9cb5d35 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6071/artifact/patchprocess/diffcheckstylehadoop-mapreduce-client-hs.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6071/artifact/patchprocess/whitespace.txt
 |
| hadoop-mapreduce-client-hs test log | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6071/artifact/patchprocess/testrun_hadoop-mapreduce-client-hs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6071/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6071/console |


This message was automatically generated.

> Add More Informations Of Jobs In JobHistory Main Page
> -
>
> Key: MAPREDUCE-6499
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6499
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
>Priority: Trivial
> Fix For: 2.7.1
>
> Attachments: HADOOP-MAPREDUCE-6499.2.patch, 
> HADOOP-MAPREDUCE-6499.patch
>
>
> Now in  JobHistory Main Page show too little information about finished 
> jobs,even don't have the job's running total time,only 
> startTime,finishedTime.So,in jobHistory main page,we can add more 
> informations about jobs, that we can better analyze jobs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6515) Update Application priority in AM side from AM-RM heartbeat

2015-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965254#comment-14965254
 ] 

Hadoop QA commented on MAPREDUCE-6515:
--

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  19m 59s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 4 new or modified test files. |
| {color:green}+1{color} | javac |   8m  8s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 43s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   2m  3s | There were no new checkstyle 
issues. |
| {color:red}-1{color} | whitespace |   0m  5s | The patch has 2  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   4m 44s | The patch appears to introduce 3 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | mapreduce tests |   9m 41s | Tests passed in 
hadoop-mapreduce-client-app. |
| {color:red}-1{color} | mapreduce tests |   0m 45s | Tests failed in 
hadoop-mapreduce-client-common. |
| {color:green}+1{color} | mapreduce tests |   1m 52s | Tests passed in 
hadoop-mapreduce-client-core. |
| {color:green}+1{color} | mapreduce tests |   6m  4s | Tests passed in 
hadoop-mapreduce-client-hs. |
| | |  66m 41s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-mapreduce-client-common |
| Failed unit tests | hadoop.mapreduce.TestTypeConverter |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12767279/0001-MAPREDUCE-6515.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 9cb5d35 |
| whitespace | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6070/artifact/patchprocess/whitespace.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6070/artifact/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-common.html
 |
| hadoop-mapreduce-client-app test log | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6070/artifact/patchprocess/testrun_hadoop-mapreduce-client-app.txt
 |
| hadoop-mapreduce-client-common test log | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6070/artifact/patchprocess/testrun_hadoop-mapreduce-client-common.txt
 |
| hadoop-mapreduce-client-core test log | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6070/artifact/patchprocess/testrun_hadoop-mapreduce-client-core.txt
 |
| hadoop-mapreduce-client-hs test log | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6070/artifact/patchprocess/testrun_hadoop-mapreduce-client-hs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6070/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6070/console |


This message was automatically generated.

> Update Application priority in AM side from AM-RM heartbeat
> ---
>
> Key: MAPREDUCE-6515
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6515
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-MAPREDUCE-6515.patch
>
>
> After YARN-4170, Application Priority is available via heartbeat call. Update 
> this information in AM sothat client can fetch this information via JobStatus 
> (JobReport) call.
> This is as per the discussion happened in MAPREDUCE-5870.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6513) MR job got hanged forever when one NM unstable for some time

2015-10-20 Thread chong chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965222#comment-14965222
 ] 

chong chen commented on MAPREDUCE-6513:
---

Another way to think about this: the current reducer ramp-up and ramp-down are 
designed to handle the normal case, like this one where the 
slowstart.completedmaps config was 0.05. Once all mapper tasks are scheduled and 
allocated, the AM has already submitted all reducers to the system. At that 
stage, it is natural to handle a failed mapper as an abnormal case rather than 
resetting the whole thing and going through ramp-up/ramp-down again. 
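
The slowstart behavior referenced here can be sketched as follows (a 
simplification of the AM's actual ramp-up logic):

```python
def should_schedule_reducers(completed_maps, total_maps, slowstart=0.05):
    """Reducers start ramping up once the fraction of completed maps
    reaches mapreduce.job.reduce.slowstart.completedmaps (0.05 in the
    case discussed, so reducers start very early)."""
    if total_maps == 0:
        return True
    return completed_maps / total_maps >= slowstart
```

With slowstart at 0.05, all reducers may already be requested while almost all 
maps are still pending, which is why a late map failure collides with 
fully-ramped reducers.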

> MR job got hanged forever when one NM unstable for some time
> 
>
> Key: MAPREDUCE-6513
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6513
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, resourcemanager
>Affects Versions: 2.7.0
>Reporter: Bob
>Assignee: Varun Saxena
>Priority: Critical
>
> when job is in-progress which is having more tasks,one node became unstable 
> due to some OS issue.After the node became unstable, the map on this node 
> status changed to KILLED state. 
> Currently maps which were running on unstable node are rescheduled, and all 
> are in scheduled state and wait for RM assign container.Seen ask requests for 
> map till Node is good (all those failed), there are no ask request after 
> this. But AM keeps on preempting the reducers (it's recycling).
> Finally reducers are waiting for complete mappers and mappers did n't get 
> container..
> My Question Is:
> 
> why map requests did not sent AM ,once after node recovery.?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6499) Add More Informations Of Jobs In JobHistory Main Page

2015-10-20 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated MAPREDUCE-6499:
-
Attachment: HADOOP-MAPREDUCE-6499.2.patch

Attaching the latest patch.

> Add More Informations Of Jobs In JobHistory Main Page
> -
>
> Key: MAPREDUCE-6499
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6499
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
>Priority: Trivial
> Fix For: 2.7.1
>
> Attachments: HADOOP-MAPREDUCE-6499.2.patch, 
> HADOOP-MAPREDUCE-6499.patch
>
>
> Now in  JobHistory Main Page show too little information about finished 
> jobs,even don't have the job's running total time,only 
> startTime,finishedTime.So,in jobHistory main page,we can add more 
> informations about jobs, that we can better analyze jobs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6499) Add More Informations Of Jobs In JobHistory Main Page

2015-10-20 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated MAPREDUCE-6499:
-
Attachment: (was: HADOOP-MAPREDUCE-6499.2.patch)

> Add More Informations Of Jobs In JobHistory Main Page
> -
>
> Key: MAPREDUCE-6499
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6499
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
>Priority: Trivial
> Fix For: 2.7.1
>
> Attachments: HADOOP-MAPREDUCE-6499.patch
>
>
> Now in  JobHistory Main Page show too little information about finished 
> jobs,even don't have the job's running total time,only 
> startTime,finishedTime.So,in jobHistory main page,we can add more 
> informations about jobs, that we can better analyze jobs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6513) MR job got hanged forever when one NM unstable for some time

2015-10-20 Thread chong chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965197#comment-14965197
 ] 

chong chen commented on MAPREDUCE-6513:
---

How to re-schedule failed/killed tasks and how to account for a task's exit 
reason are two different things. 

In your case, the node is unhealthy, which is a typical abnormal cause of task 
failure and a low-probability event in a healthy cluster. For a small set of 
map-task reruns, we should be smart enough to let them complete quickly rather 
than go through the heavy reducer ramp-down/ramp-up flow, because that flow not 
only slows down overall job-scheduling throughput but also adds unnecessary load 
on the YARN core scheduler. Workload requests (over 600 reducers) have already 
been submitted to the system; for a small set of map tasks, having the AM ramp 
down all reducers, put those few mappers at the front of the queue so they get 
scheduled, and then gradually re-submit the reducers is not an efficient way to 
handle things. It generates unnecessary load on the core scheduler. Since YARN is 
the central brain of a big-data system, managing large-scale multi-tenant 
clusters, the design philosophy should always keep that in mind and try to 
reduce unnecessary load on the core. 

I think what you discovered later is a real problem, and we need to correct it. 
But for this particular case, I still prefer treating these as abnormal failures 
and bumping up the task priority. 
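
The priority-bump approach argued for here could look roughly like this (a 
hypothetical sketch, not the actual AM code; the priority constants mirror the 
MR AM's convention that a lower number means higher priority, but the exact 
values are illustrative):

```python
# Hypothetical AM decision on a lost node: re-request the killed maps at an
# elevated "fast-fail" priority instead of preempting the ramped-up reducers.
PRIORITY_FAST_FAIL_MAP = 5   # illustrative: lower number = higher priority
PRIORITY_REDUCE = 10
PRIORITY_MAP = 20

def reschedule_killed_maps(killed_maps):
    """Build new resource requests for maps killed on an unhealthy node,
    bumped ahead of normal maps and reducers so they are placed first."""
    return [{"task": m, "priority": PRIORITY_FAST_FAIL_MAP} for m in killed_maps]

asks = reschedule_killed_maps(["m_000012", "m_000047"])
```

Because the re-run maps outrank both normal maps and reducers, the scheduler can 
place them first without the AM tearing down the reducers it already holds.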


> MR job got hanged forever when one NM unstable for some time
> 
>
> Key: MAPREDUCE-6513
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6513
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, resourcemanager
>Affects Versions: 2.7.0
>Reporter: Bob
>Assignee: Varun Saxena
>Priority: Critical
>
> when job is in-progress which is having more tasks,one node became unstable 
> due to some OS issue.After the node became unstable, the map on this node 
> status changed to KILLED state. 
> Currently maps which were running on unstable node are rescheduled, and all 
> are in scheduled state and wait for RM assign container.Seen ask requests for 
> map till Node is good (all those failed), there are no ask request after 
> this. But AM keeps on preempting the reducers (it's recycling).
> Finally reducers are waiting for complete mappers and mappers did n't get 
> container..
> My Question Is:
> 
> why map requests did not sent AM ,once after node recovery.?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6499) Add More Informations Of Jobs In JobHistory Main Page

2015-10-20 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated MAPREDUCE-6499:
-
Attachment: HADOOP-MAPREDUCE-6499.2.patch

Update the TestHsJobsBlock test case and fix the wrong name hint in the search 
input at the bottom of the JobHistory app page table.

> Add More Informations Of Jobs In JobHistory Main Page
> -
>
> Key: MAPREDUCE-6499
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6499
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
>Priority: Trivial
> Fix For: 2.7.1
>
> Attachments: HADOOP-MAPREDUCE-6499.2.patch, 
> HADOOP-MAPREDUCE-6499.patch
>
>
> The JobHistory main page currently shows too little information about 
> finished jobs; it doesn't even have the job's total running time, only 
> startTime and finishedTime. So we can add more information about jobs to the 
> JobHistory main page, so that we can analyze jobs better.





[jira] [Updated] (MAPREDUCE-6515) Update Application priority in AM side from AM-RM heartbeat

2015-10-20 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated MAPREDUCE-6515:
---
Status: Patch Available  (was: Open)

Marking as Patch Available to check the Jenkins result.

> Update Application priority in AM side from AM-RM heartbeat
> ---
>
> Key: MAPREDUCE-6515
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6515
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-MAPREDUCE-6515.patch
>
>
> After YARN-4170, the application priority is available via the AM-RM 
> heartbeat call. Update this information in the AM so that the client can 
> fetch it via the JobStatus (JobReport) call.
> This is as per the discussion in MAPREDUCE-5870.





[jira] [Commented] (MAPREDUCE-6513) MR job got hanged forever when one NM unstable for some time

2015-10-20 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965160#comment-14965160
 ] 

Varun Saxena commented on MAPREDUCE-6513:
-

Sorry, I meant: "As pointed out, there was a loop of reducers being preempted 
and ramped up again and again."






[jira] [Commented] (MAPREDUCE-6513) MR job got hanged forever when one NM unstable for some time

2015-10-20 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965097#comment-14965097
 ] 

Varun Saxena commented on MAPREDUCE-6513:
-

Thanks [~cchen1257] and [~devaraj.k] for sharing your thoughts on this.

The obvious solution we considered when we hit this issue was to mark the map 
task as failed, so that its priority becomes 5, which would mean the scheduler 
assigns resources to it before the reducers. But after a long internal 
discussion, we decided against it. The main reason is the question of whether 
we should mark a mapper as failed when it is perfectly fine and has already 
been marked succeeded. It would also be counted towards task-attempt failures. 
Whether to kill it or fail it is frankly a debatable topic, and there was a 
long discussion about it in the JIRA where this code was added (refer to 
MAPREDUCE-3921).
cc [~bikassaha], [~vinodkv] so that they can also share their thoughts on this.

Moreover, once the map task has been killed, it is as good as an original task 
attempt in the SCHEDULED stage (with a new task attempt scheduled). So if 
resources could be assigned to the original attempt, they should be assignable 
to this new attempt as well (if headroom is available). This made me think 
there must be some other problem too. Kindly note that the 
slowstart.completedmaps config here was 0.05.

Assuming the headroom coming from the RM was correct, and digging into the 
logs, we found a couple of issues. As pointed out, there was a loop of 
reducers being preempted and ramped up again and again.
Firstly, we noticed that the AM was always ramping up and never ramping down 
reducers. So we thought we could have a configuration that decides when the 
maps are starved, and not ramp up reducers while maps are starving. This would 
ensure that maps get more of a chance to be assigned in the above scenario.
Secondly, when we ramp down all the scheduled reducers, we were not updating 
the ask, and hence the RM kept allocating resources for reducers (which the AM 
later rejected) even though it could have assigned those resources to mappers 
straight away.
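The second issue can be sketched as follows. This is not the actual 
RMContainerAllocator code, just a minimal illustration (class and method names 
are hypothetical) of the idea that ramping down scheduled reducers must also 
shrink the outstanding ask, so the RM stops allocating reducer containers the 
AM would only reject:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ReducerRampDown {
    // Reducers scheduled locally but not yet running.
    private final Deque<String> scheduledReduces = new ArrayDeque<>();
    // What the RM believes we still want (sent on the next heartbeat).
    private int pendingReduceAsk;

    public void scheduleReduce(String id) {
        scheduledReduces.add(id);
        pendingReduceAsk++;
    }

    /** Ramp down all scheduled reducers and update the ask accordingly. */
    public int rampDownAll() {
        int removed = scheduledReduces.size();
        scheduledReduces.clear();
        // The missing step described above: without this, the RM keeps
        // handing out reducer containers that the AM then rejects.
        pendingReduceAsk -= removed;
        return removed;
    }

    public int pendingReduceAsk() {
        return pendingReduceAsk;
    }
}
```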






[jira] [Commented] (MAPREDUCE-6513) MR job got hanged forever when one NM unstable for some time

2015-10-20 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965034#comment-14965034
 ] 

Devaraj K commented on MAPREDUCE-6513:
--

I agree with [~chen317]: failed maps have a higher priority 
(PRIORITY_FAST_FAIL_MAP) than the reducers (PRIORITY_REDUCE), so the MR AM 
should get a container for a failed map before a reducer here, if resources 
are available for the map.

[~Jobo]/[~varun_saxena], what is the map memory requested for this job? And do 
you have a chance to share the complete log of the MR App Master?
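The priority ordering above can be illustrated with a small sketch. In YARN, a 
numerically lower priority value is served first; the constant values below 
are illustrative (fast-fail map at 5 matches the discussion in this thread, 
the others are assumed), not a quote of the real RMContainerAllocator:

```java
public class RequestPriority {
    // Lower value = served first by the scheduler.
    static final int PRIORITY_FAST_FAIL_MAP = 5;  // failed maps, retried fast
    static final int PRIORITY_REDUCE = 10;        // assumed value
    static final int PRIORITY_MAP = 20;           // assumed value

    /** True when a request at priority a is scheduled before one at b. */
    static boolean servedBefore(int a, int b) {
        return a < b;
    }
}
```

Under this ordering a failed map outranks any reducer request, which is why 
killed (rather than failed) maps, which go back to ordinary map priority, do 
not get the same preferential treatment.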




