[jira] [Commented] (MAPREDUCE-6513) MR job got hanged forever when one NM unstable for some time

2016-04-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15247263#comment-15247263
 ] 

Hadoop QA commented on MAPREDUCE-6513:
--

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 8m 37s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 3 new or modified test files. |
| +1 | mvninstall | 8m 5s | branch-2.7 passed |
| +1 | compile | 0m 16s | branch-2.7 passed with JDK v1.8.0_77 |
| +1 | compile | 0m 21s | branch-2.7 passed with JDK v1.7.0_95 |
| +1 | checkstyle | 0m 41s | branch-2.7 passed |
| +1 | mvnsite | 0m 29s | branch-2.7 passed |
| +1 | mvneclipse | 0m 19s | branch-2.7 passed |
| +1 | findbugs | 0m 50s | branch-2.7 passed |
| +1 | javadoc | 0m 16s | branch-2.7 passed with JDK v1.8.0_77 |
| +1 | javadoc | 0m 16s | branch-2.7 passed with JDK v1.7.0_95 |
| +1 | mvninstall | 0m 21s | the patch passed |
| +1 | compile | 0m 15s | the patch passed with JDK v1.8.0_77 |
| -1 | javac | 2m 24s | hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-app-jdk1.8.0_77 with JDK v1.8.0_77 generated 1 new + 84 unchanged - 0 fixed = 85 total (was 84) |
| +1 | javac | 0m 15s | the patch passed |
| +1 | compile | 0m 18s | the patch passed with JDK v1.7.0_95 |
| -1 | javac | 2m 42s | hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-app-jdk1.7.0_95 with JDK v1.7.0_95 generated 1 new + 85 unchanged - 0 fixed = 86 total (was 85) |
| +1 | javac | 0m 18s | the patch passed |
| -1 | checkstyle | 0m 35s | hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app: patch generated 33 new + 1673 unchanged - 2 fixed = 1706 total (was 1675) |
| +1 | mvnsite | 0m 23s | the patch passed |
| +1 | mvneclipse | 0m 10s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 2871 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| -1 | whitespace | 1m 11s | The patch has 303 line(s) with tabs. |
| +1 | findbugs | 0m 50s | the patch passed |
| +1 | javadoc | 0m 11s | the patch passed with JDK v1.8.0_77 |
| +1 | javadoc | 0m 14s | the patch passed with JDK v1.7.0_95 |
| -1 | unit | 8m 5s | hadoop-mapreduce-client-app in the patch failed with JDK v1.8.0_77. |
| +1 | unit | 8m 46s | hadoop-mapreduce-client-app in the patch passed with JDK v1.7.0_95. |

[jira] [Updated] (MAPREDUCE-6513) MR job got hanged forever when one NM unstable for some time

2016-04-18 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated MAPREDUCE-6513:
--
Attachment: MAPREDUCE-6513.3_1.branch-2.7.patch

Rebased branch-2.7 patch.

MAPREDUCE-6513 is built on top of MAPREDUCE-5465, but the scope of 
MAPREDUCE-5465 seems too big to pull into branch-2.7, so I just manually 
resolved a couple of conflicts. I ran the related unit tests, and all passed.

[~varun_saxena], [~vinodkv], could you take a final look at the attached patch?

Thanks,

> MR job got hanged forever when one NM unstable for some time
> 
>
> Key: MAPREDUCE-6513
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6513
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, resourcemanager
>Affects Versions: 2.7.0
>Reporter: Bob.zhao
>Assignee: Varun Saxena
>Priority: Critical
> Attachments: MAPREDUCE-6513.01.patch, MAPREDUCE-6513.02.patch, 
> MAPREDUCE-6513.03.patch, MAPREDUCE-6513.3.branch-2.8.patch, 
> MAPREDUCE-6513.3_1.branch-2.7.patch, MAPREDUCE-6513.3_1.branch-2.8.patch
>
>
> While a job with many tasks was in progress, one node became unstable due to 
> an OS issue. After the node became unstable, the maps on this node changed 
> to the KILLED state. 
> The maps that were running on the unstable node are rescheduled; all of them 
> are in the scheduled state, waiting for the RM to assign containers. Ask 
> requests for maps were seen until the node was good again (all of those 
> failed), and there are no ask requests after this. But the AM keeps 
> preempting the reducers (it keeps recycling them).
> In the end, the reducers are waiting for the mappers to complete, and the 
> mappers never get containers.
> My question is:
> 
> Why did the AM not send map requests once the node had recovered?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-5817) Mappers get rescheduled on node transition even after all reducers are completed

2016-04-18 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15247181#comment-15247181
 ] 

Wangda Tan commented on MAPREDUCE-5817:
---

Thanks [~sjlee0], committing now.

> Mappers get rescheduled on node transition even after all reducers are 
> completed
> 
>
> Key: MAPREDUCE-5817
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5817
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster
>Affects Versions: 2.3.0
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Fix For: 2.8.0
>
> Attachments: MAPREDUCE-5817.001.patch, MAPREDUCE-5817.002.patch, 
> mapreduce-5817.patch
>
>
> We're seeing a behavior where a job runs long after all reducers were already 
> finished. We found that the job was rescheduling and running a number of 
> mappers beyond the point of reducer completion. In one situation, the job ran 
> for some 9 more hours after all reducers completed!
> This happens because whenever a node transition (to an unusable state) comes 
> into the app master, it just reschedules all mappers that already ran on the 
> node in all cases.
> Therefore, any node transition has the potential to extend the job period. 
> Once this window opens, another node transition can prolong it, and this can 
> happen indefinitely in theory.
> If there is some instability in the pool (unhealthy, etc.) for a duration, 
> then any big job is severely vulnerable to this problem.
> If all reducers have been completed, JobImpl.actOnUnusableNode() should not 
> reschedule mapper tasks. If all reducers are completed, the mapper outputs 
> are no longer needed, and there is no need to reschedule mapper tasks as they 
> would not be consumed anyway.
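
A minimal sketch of the guard proposed in the description above. Everything here except the rule itself is an assumption: the class name UnusableNodeSketch, the method mapsToReschedule and its parameters are hypothetical and are not taken from the Hadoop source or from the attached patches. The sketch also treats a job with zero reducers as having all reducers complete, which is a choice made for this illustration only.

```java
import java.util.ArrayList;
import java.util.List;

public class UnusableNodeSketch {

    /**
     * Hypothetical stand-in for the decision made when a node becomes
     * unusable: returns the completed map tasks that should be rerun.
     */
    public static List<String> mapsToReschedule(List<String> completedMapsOnNode,
                                                int totalReducers,
                                                int completedReducers) {
        // Once every reducer has completed, nothing will fetch map output
        // again, so rerunning maps would only prolong the job.
        if (completedReducers >= totalReducers) {
            return new ArrayList<>();
        }
        // Otherwise some reducer may still need the output that was lost
        // with the node, so all completed maps on that node are rerun.
        return new ArrayList<>(completedMapsOnNode);
    }
}
```

With four reducers, a call with completedReducers = 3 returns every completed map on the node, while completedReducers = 4 returns an empty list, which closes the window described above.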





[jira] [Commented] (MAPREDUCE-5044) Have AM trigger jstack on task attempts that timeout before killing them

2016-04-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246947#comment-15246947
 ] 

Hadoop QA commented on MAPREDUCE-5044:
--

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 13m 50s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 8 new or modified test files. |
| 0 | mvndep | 0m 49s | Maven dependency ordering for branch |
| +1 | mvninstall | 8m 36s | trunk passed |
| +1 | compile | 9m 13s | trunk passed with JDK v1.8.0_77 |
| +1 | compile | 8m 49s | trunk passed with JDK v1.7.0_95 |
| +1 | checkstyle | 1m 23s | trunk passed |
| +1 | mvnsite | 3m 1s | trunk passed |
| +1 | mvneclipse | 1m 21s | trunk passed |
| +1 | findbugs | 6m 10s | trunk passed |
| +1 | javadoc | 2m 46s | trunk passed with JDK v1.8.0_77 |
| +1 | javadoc | 5m 26s | trunk passed with JDK v1.7.0_95 |
| 0 | mvndep | 0m 17s | Maven dependency ordering for patch |
| +1 | mvninstall | 2m 41s | the patch passed |
| +1 | compile | 9m 20s | the patch passed with JDK v1.8.0_77 |
| +1 | cc | 9m 20s | the patch passed |
| +1 | javac | 9m 20s | the patch passed |
| +1 | compile | 8m 52s | the patch passed with JDK v1.7.0_95 |
| +1 | cc | 8m 52s | the patch passed |
| +1 | javac | 8m 52s | the patch passed |
| -1 | checkstyle | 1m 22s | root: patch generated 1 new + 324 unchanged - 0 fixed = 325 total (was 324) |
| +1 | mvnsite | 3m 4s | the patch passed |
| +1 | mvneclipse | 1m 19s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | findbugs | 7m 25s | the patch passed |
| +1 | javadoc | 2m 47s | the patch passed with JDK v1.8.0_77 |
| +1 | javadoc | 5m 27s | the patch passed with JDK v1.7.0_95 |
| +1 | unit | 0m 31s | hadoop-yarn-api in the patch passed with JDK v1.8.0_77. |
| +1 | unit | 2m 33s | hadoop-yarn-common in the patch passed with JDK v1.8.0_77. |
| +1 | unit | 10m 23s | hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_77. |
| -1 | unit | 81m 11s | hadoop-yarn-server-resourcemanager in the pa

[jira] [Commented] (MAPREDUCE-6681) TestUberAM fails intermittently

2016-04-18 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246932#comment-15246932
 ] 

Haibo Chen commented on MAPREDUCE-6681:
---

https://issues.apache.org/jira/browse/MAPREDUCE-6647 expects 
TaskAttempt.container.Resource to be not null, but LocalContainerAllocator in 
uber mode doesn't set the resource associated with the containers. 
https://issues.apache.org/jira/browse/MAPREDUCE-6677 is created to fix the test 
failures here.

> TestUberAM  fails intermittently 
> -
>
> Key: MAPREDUCE-6681
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6681
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: Brahma Reddy Battula
>Assignee: Haibo Chen
>
> {noformat}
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.mapreduce.v2.TestMRJobs.verifySleepJobCounters(TestMRJobs.java:474)
>   at 
> org.apache.hadoop.mapreduce.v2.TestUberAM.verifySleepJobCounters(TestUberAM.java:71)
> {noformat}
> *PreCommit Build* 
> https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6434/testReport/





[jira] [Assigned] (MAPREDUCE-6681) TestUberAM fails intermittently

2016-04-18 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen reassigned MAPREDUCE-6681:
-

Assignee: Haibo Chen



[jira] [Commented] (MAPREDUCE-6676) [ NNBench ] Throw IOException when rename,delete fails

2016-04-18 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246916#comment-15246916
 ] 

Brahma Reddy Battula commented on MAPREDUCE-6676:
-

The test case failures are unrelated. Raised MAPREDUCE-6681 and MAPREDUCE-6682 
to track them.

> [ NNBench ] Throw IOException when rename,delete fails
> --
>
> Key: MAPREDUCE-6676
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6676
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: MAPREDUCE-6676.patch
>
>
> Throw an IOException when rename or delete fails; currently the user is not 
> told when a rename or delete fails.
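
A minimal sketch of the idea in the description above: turn a boolean "it failed" result into an IOException so the caller cannot miss it. To stay self-contained it uses java.io.File, whose renameTo() and delete() also report failure through a boolean return value, rather than the Hadoop FileSystem API that NNBench calls. The class and method names (CheckedFileOps, renameOrThrow, deleteOrThrow) are hypothetical and are not taken from the attached patch.

```java
import java.io.File;
import java.io.IOException;

public class CheckedFileOps {

    /** Renames src to dst and throws if the underlying call reports failure. */
    public static void renameOrThrow(File src, File dst) throws IOException {
        if (!src.renameTo(dst)) {
            throw new IOException("Rename failed: " + src + " -> " + dst);
        }
    }

    /** Deletes f and throws if the underlying call reports failure. */
    public static void deleteOrThrow(File f) throws IOException {
        if (!f.delete()) {
            throw new IOException("Delete failed: " + f);
        }
    }
}
```

Calling deleteOrThrow() on a path that does not exist raises an IOException instead of silently returning false, which is the visibility the issue asks for.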





[jira] [Created] (MAPREDUCE-6682) TestMRCJCFileOutputCommitter fais intermittently

2016-04-18 Thread Brahma Reddy Battula (JIRA)
Brahma Reddy Battula created MAPREDUCE-6682:
---

 Summary: TestMRCJCFileOutputCommitter fais intermittently
 Key: MAPREDUCE-6682
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6682
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Reporter: Brahma Reddy Battula


{noformat}
java.lang.AssertionError: Output directory not empty expected:<0> but was:<4>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at 
org.apache.hadoop.mapred.TestMRCJCFileOutputCommitter.testAbort(TestMRCJCFileOutputCommitter.java:153)
{noformat}

*PreCommit Report* 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6434/testReport/





[jira] [Created] (MAPREDUCE-6681) TestUberAM fails intermittently

2016-04-18 Thread Brahma Reddy Battula (JIRA)
Brahma Reddy Battula created MAPREDUCE-6681:
---

 Summary: TestUberAM  fails intermittently 
 Key: MAPREDUCE-6681
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6681
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Reporter: Brahma Reddy Battula


{noformat}
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at 
org.apache.hadoop.mapreduce.v2.TestMRJobs.verifySleepJobCounters(TestMRJobs.java:474)
at 
org.apache.hadoop.mapreduce.v2.TestUberAM.verifySleepJobCounters(TestUberAM.java:71)
{noformat}

*PreCommit Build* 

https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6434/testReport/





[jira] [Commented] (MAPREDUCE-6657) job history server can fail on startup when NameNode is in start phase

2016-04-18 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246860#comment-15246860
 ] 

Haibo Chen commented on MAPREDUCE-6657:
---

Thanks a lot for your comments, [~templedf]. I have added a brief javadoc and 
set the timeout to 500. Let me know if 500 looks reasonable to you. 
Also, the test method is now using the existing dfs cluster instead of a new 
local one. The only method in TestHistoryManager that uses both dfs clusters 
is testCreateDirsWithAdditionalFileSystem(), so maybe it makes more sense to 
move that method out? 
The behavior of JHS when the name node is in safe mode is that it throws a 
YarnRuntimeException with a timeout message. I think the assert message is 
actually in line with the expected behavior.

> job history server can fail on startup when NameNode is in start phase
> --
>
> Key: MAPREDUCE-6657
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6657
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: mapreduce6657.001.patch, mapreduce6657.002.patch
>
>
> Job history server will try to create a history directory in HDFS on startup. 
> When NameNode is in safe mode, it will keep retrying for a configurable time 
> period. However, it should also keep retrying if the name node is in the 
> start state. Safe mode does not happen until the NN is out of the startup phase. 
> A RetriableException with the text "NameNode still not started" is thrown 
> when the NN is in its internal service startup phase. We should add the check 
> for this specific exception in isBecauseSafeMode() to account for that.
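
A minimal sketch of the check proposed in the description above. The two exception classes are stand-ins declared locally so that the example compiles without Hadoop on the classpath; they only mirror the names of the Hadoop classes. The class StartupRetrySketch and the method shouldRetry are hypothetical and do not reproduce the real isBecauseSafeMode() or the attached patches.

```java
import java.io.IOException;

public class StartupRetrySketch {

    /** Local stand-in for Hadoop's SafeModeException (assumption of this sketch). */
    public static class SafeModeException extends IOException {
        public SafeModeException(String msg) { super(msg); }
    }

    /** Local stand-in for Hadoop's RetriableException (assumption of this sketch). */
    public static class RetriableException extends IOException {
        public RetriableException(String msg) { super(msg); }
    }

    /** Message text quoted in the issue description. */
    static final String NN_NOT_STARTED = "NameNode still not started";

    /**
     * Returns true if the failure, or any of its causes, means the NameNode
     * is in safe mode or is still in its startup phase.
     */
    public static boolean shouldRetry(Throwable ex) {
        for (Throwable t = ex; t != null; t = t.getCause()) {
            if (t instanceof SafeModeException) {
                return true;
            }
            if (t instanceof RetriableException
                    && t.getMessage() != null
                    && t.getMessage().contains(NN_NOT_STARTED)) {
                return true;
            }
        }
        return false;
    }
}
```

Matching on the message text is brittle, but the description names that specific text, so the sketch follows it.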





[jira] [Commented] (MAPREDUCE-6680) JHS UserLogDir scan algorithm sometime could skip directory with update in CloudFS (Azure FileSystem, S3, etc.)

2016-04-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246787#comment-15246787
 ] 

Hadoop QA commented on MAPREDUCE-6680:
--

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 12s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| +1 | mvninstall | 6m 58s | trunk passed |
| +1 | compile | 0m 15s | trunk passed with JDK v1.8.0_77 |
| +1 | compile | 0m 18s | trunk passed with JDK v1.7.0_95 |
| +1 | checkstyle | 0m 14s | trunk passed |
| +1 | mvnsite | 0m 23s | trunk passed |
| +1 | mvneclipse | 0m 13s | trunk passed |
| +1 | findbugs | 0m 34s | trunk passed |
| +1 | javadoc | 0m 12s | trunk passed with JDK v1.8.0_77 |
| +1 | javadoc | 0m 16s | trunk passed with JDK v1.7.0_95 |
| +1 | mvninstall | 0m 18s | the patch passed |
| +1 | compile | 0m 12s | the patch passed with JDK v1.8.0_77 |
| +1 | javac | 0m 12s | the patch passed |
| +1 | compile | 0m 15s | the patch passed with JDK v1.7.0_95 |
| +1 | javac | 0m 15s | the patch passed |
| +1 | checkstyle | 0m 11s | the patch passed |
| +1 | mvnsite | 0m 21s | the patch passed |
| +1 | mvneclipse | 0m 10s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | findbugs | 0m 42s | the patch passed |
| +1 | javadoc | 0m 10s | the patch passed with JDK v1.8.0_77 |
| +1 | javadoc | 0m 13s | the patch passed with JDK v1.7.0_95 |
| +1 | unit | 5m 37s | hadoop-mapreduce-client-hs in the patch passed with JDK v1.8.0_77. |
| +1 | unit | 5m 59s | hadoop-mapreduce-client-hs in the patch passed with JDK v1.7.0_95. |
| +1 | asflicense | 0m 19s | Patch does not generate ASF License warnings. |
| | | 24m 58s | |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:fbe3e86 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12799378/MAPREDUCE-6680-v3.patch |
| JIRA Issue | MAPREDUCE-6680 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 2114dac365e7 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /te

[jira] [Commented] (MAPREDUCE-6680) JHS UserLogDir scan algorithm sometime could skip directory with update in CloudFS (Azure FileSystem, S3, etc.)

2016-04-18 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246735#comment-15246735
 ] 

Junping Du commented on MAPREDUCE-6680:
---

bq.  (T1 != T3) or (T1 == T3 but T1.toSeconds >= T2.toSeconds) 
I mean (T1 != T3) or (T1.toSeconds == T2.toSeconds) or (T1.toSeconds == 
(T2.toSeconds +1))

> JHS UserLogDir scan algorithm sometime could skip directory with update in 
> CloudFS (Azure FileSystem, S3, etc.)
> ---
>
> Key: MAPREDUCE-6680
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6680
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: MAPREDUCE-6680-v2.patch, MAPREDUCE-6680-v3.patch, 
> MAPREDUCE-6680.patch
>
>
> In our cluster based on a cloud file system, we noticed that JHS can 
> sometimes skip a directory containing a .jhist file during scanning.
> The behavior is as follows:
> The first round of scanning does not find the .jhist file:
> {noformat}
> 16/04/13 11:14:34 DEBUG azure.NativeAzureFileSystem: Found path as a 
> directory with 6 files in it.
> 16/04/13 11:14:34 DEBUG hs.HistoryFileManager: Found 0 files
> ...
> {noformat}
> Then we see "Scan not needed of ..." for the same directory every 3 minutes 
> until the application fails with a timeout.
> From our analysis, the root cause is that most cloud file systems 
> (Azure FS, S3, etc.) truncate file/directory modification times to 
> seconds instead of milliseconds, which could be due to a limitation of the 
> HTTP protocol (from the discussion at: 
> https://forums.aws.amazon.com/thread.jspa?messageID=476615). 
> Suppose the time sequence happens to be: the latest non-.jhist file 
> modification on the directory happens at T1, the directory scan happens at 
> T2, and the .jhist file is added to the directory at T3. If we have 
> {{T1 < T2 < T3}} and T1 is equal to T3 after truncating to seconds, this 
> issue can appear.
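
A minimal sketch of the race described above, together with the rescan condition quoted in the comment at the top of this message: (T1 != T3) or (T1.toSeconds == T2.toSeconds) or (T1.toSeconds == (T2.toSeconds + 1)). The class and method names are hypothetical, and the sketch does not show how the attached patches implement the check. T1 and T3 are passed as modification times already truncated to seconds; T2 is the scan time in milliseconds.

```java
public class ModTimeTruncationSketch {

    /** Truncates a millisecond timestamp to whole seconds. */
    public static long toSeconds(long millis) {
        return millis / 1000L;
    }

    /** Naive check: rescan only when the reported modification time changed. */
    public static boolean naiveNeedsScan(long t1Sec, long t3Sec) {
        return t1Sec != t3Sec;
    }

    /**
     * Condition from the comment: also rescan when the recorded modification
     * second equals the second of the last scan, or that second plus one.
     */
    public static boolean needsScan(long t1Sec, long t2Millis, long t3Sec) {
        long t2Sec = toSeconds(t2Millis);
        return t1Sec != t3Sec || t1Sec == t2Sec || t1Sec == t2Sec + 1;
    }
}
```

For example, with T1 = 10200 ms, T2 = 10500 ms and T3 = 10800 ms, both T1 and T3 truncate to second 10, so the naive check sees no change and skips the directory. The extended check rescans because T1 and T2 fall in the same second.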





[jira] [Updated] (MAPREDUCE-6680) JHS UserLogDir scan algorithm sometime could skip directory with update in CloudFS (Azure FileSystem, S3, etc.)

2016-04-18 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated MAPREDUCE-6680:
--
Attachment: MAPREDUCE-6680-v3.patch



[jira] [Commented] (MAPREDUCE-6677) LocalContainerAllocator doesn't specify resource of the containers allocated.

2016-04-18 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246727#comment-15246727
 ] 

Haibo Chen commented on MAPREDUCE-6677:
---

Timeout again.

> LocalContainerAllocator doesn't specify resource of the containers allocated.
> -
>
> Key: MAPREDUCE-6677
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6677
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mr-am
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: mapreduce6677.001.patch, mapreduce6677.002.patch, 
> mapreduce6677.003.patch
>
>






[jira] [Commented] (MAPREDUCE-6677) LocalContainerAllocator doesn't specify resource of the containers allocated.

2016-04-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246722#comment-15246722
 ] 

Hadoop QA commented on MAPREDUCE-6677:
--

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 16s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| 0 | mvndep | 0m 11s | Maven dependency ordering for branch |
| +1 | mvninstall | 7m 56s | trunk passed |
| +1 | compile | 2m 15s | trunk passed with JDK v1.8.0_77 |
| +1 | compile | 2m 5s | trunk passed with JDK v1.7.0_95 |
| +1 | checkstyle | 0m 24s | trunk passed |
| +1 | mvnsite | 1m 1s | trunk passed |
| +1 | mvneclipse | 0m 29s | trunk passed |
| +1 | findbugs | 1m 20s | trunk passed |
| +1 | javadoc | 0m 36s | trunk passed with JDK v1.8.0_77 |
| +1 | javadoc | 0m 34s | trunk passed with JDK v1.7.0_95 |
| 0 | mvndep | 0m 10s | Maven dependency ordering for patch |
| +1 | mvninstall | 0m 53s | the patch passed |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 12s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 3m 15s 
{color} | {color:green} 
hadoop-mapreduce-project_hadoop-mapreduce-client-jdk1.8.0_77 with JDK v1.8.0_77 
generated 0 new + 356 unchanged - 6 fixed = 356 total (was 362) {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 12s 
{color} | {color:green} hadoop-mapreduce-client in the patch passed with JDK 
v1.8.0_77. {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 59s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 14s 
{color} | {color:green} 
hadoop-mapreduce-project_hadoop-mapreduce-client-jdk1.7.0_95 with JDK v1.7.0_95 
generated 0 new + 361 unchanged - 6 fixed = 361 total (was 367) {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 59s 
{color} | {color:green} hadoop-mapreduce-client in the patch passed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
41s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 45s 
{color} | {color:green} hadoop-mapreduce-client-app in the patch passed with 
JDK v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 130m 3s {color} 
| {color:red} hadoop-mapreduce-client-jobclient in the patch failed with JDK 
v1.8.0_77. {color

[jira] [Updated] (MAPREDUCE-6680) JHS UserLogDir scan algorithm sometime could skip directory with update in CloudFS (Azure FileSystem, S3, etc.)

2016-04-18 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated MAPREDUCE-6680:
--
Attachment: (was: MAPREDUCE-6680-v3.patch)

> JHS UserLogDir scan algorithm sometime could skip directory with update in 
> CloudFS (Azure FileSystem, S3, etc.)
> ---
>
> Key: MAPREDUCE-6680
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6680
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: MAPREDUCE-6680-v2.patch, MAPREDUCE-6680.patch
>
>
> In our cluster, which is based on a cloud file system, we noticed that the JHS 
> sometimes skips a directory containing a .jhist file during scanning.
> The behavior looks like this:
> The first scan round doesn't find the .jhist file:
> {noformat}
> 16/04/13 11:14:34 DEBUG azure.NativeAzureFileSystem: Found path as a 
> directory with 6 files in it.
> 16/04/13 11:14:34 DEBUG hs.HistoryFileManager: Found 0 files
> ...
> {noformat}
> Then we see "Scan not needed of ..." for the same directory every 3 minutes 
> until the application fails with a timeout.
> From our analysis, the root cause is that most cloud file systems 
> (Azure FS, S3, etc.) truncate file/directory modification times to 
> seconds instead of milliseconds, which could be due to a limitation of the 
> HTTP protocol (from the discussion at: 
> https://forums.aws.amazon.com/thread.jspa?messageID=476615). 
> Suppose the time sequence happens to be: the latest non-.jhist file 
> modification in the directory happens at T1, the directory scan happens at 
> T2, and the .jhist file is added to the directory at T3. If {{T1 < T2 < T3}} 
> and T1 equals T3 after truncating to seconds, this issue can appear.
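
The collision described in the issue can be reproduced with plain arithmetic. 
The following is only a minimal sketch: the class name, the helper 
truncateToSeconds, and the three timestamps are hypothetical values chosen for 
illustration, not taken from the JHS code or from any of the attached patches.

```java
import java.util.concurrent.TimeUnit;

public class TruncationCollision {

    // Drops the millisecond part of an epoch timestamp, mimicking a file
    // system that only reports modification times with second granularity.
    public static long truncateToSeconds(long epochMillis) {
        return TimeUnit.MILLISECONDS.toSeconds(epochMillis) * 1000L;
    }

    public static void main(String[] args) {
        // Hypothetical timestamps that all fall within the same second.
        long t1 = 1460546074100L; // last non-.jhist modification
        long t2 = 1460546074400L; // directory scan
        long t3 = 1460546074700L; // .jhist file added

        // The real ordering is T1 < T2 < T3 ...
        System.out.println("T1 < T2 < T3: " + (t1 < t2 && t2 < t3));
        // ... but the reported modification times are indistinguishable.
        System.out.println("truncated T1 == truncated T3: "
            + (truncateToSeconds(t1) == truncateToSeconds(t3)));
    }
}
```

A scanner that only compares the reported modification time against the value 
it cached at the previous scan cannot tell these two states apart, which 
matches the "Scan not needed of ..." behavior reported above.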





[jira] [Updated] (MAPREDUCE-6680) JHS UserLogDir scan algorithm sometime could skip directory with update in CloudFS (Azure FileSystem, S3, etc.)

2016-04-18 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated MAPREDUCE-6680:
--
Attachment: MAPREDUCE-6680-v3.patch

> JHS UserLogDir scan algorithm sometime could skip directory with update in 
> CloudFS (Azure FileSystem, S3, etc.)
> ---
>
> Key: MAPREDUCE-6680
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6680
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: MAPREDUCE-6680-v2.patch, MAPREDUCE-6680-v3.patch, 
> MAPREDUCE-6680.patch
>
>
> In our cluster, which is based on a cloud file system, we noticed that the JHS 
> sometimes skips a directory containing a .jhist file during scanning.
> The behavior looks like this:
> The first scan round doesn't find the .jhist file:
> {noformat}
> 16/04/13 11:14:34 DEBUG azure.NativeAzureFileSystem: Found path as a 
> directory with 6 files in it.
> 16/04/13 11:14:34 DEBUG hs.HistoryFileManager: Found 0 files
> ...
> {noformat}
> Then we see "Scan not needed of ..." for the same directory every 3 minutes 
> until the application fails with a timeout.
> From our analysis, the root cause is that most cloud file systems 
> (Azure FS, S3, etc.) truncate file/directory modification times to 
> seconds instead of milliseconds, which could be due to a limitation of the 
> HTTP protocol (from the discussion at: 
> https://forums.aws.amazon.com/thread.jspa?messageID=476615). 
> Suppose the time sequence happens to be: the latest non-.jhist file 
> modification in the directory happens at T1, the directory scan happens at 
> T2, and the .jhist file is added to the directory at T3. If {{T1 < T2 < T3}} 
> and T1 equals T3 after truncating to seconds, this issue can appear.





[jira] [Commented] (MAPREDUCE-6680) JHS UserLogDir scan algorithm sometime could skip directory with update in CloudFS (Azure FileSystem, S3, etc.)

2016-04-18 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246699#comment-15246699
 ] 

Junping Du commented on MAPREDUCE-6680:
---

bq.  we can record the latest scan time on userDir as T2, and scan the 
directory list when: (T1 != T3) or (T1 == T3 but T1.toSeconds == T2.toSeconds). 
That gets rid of the skip problem without adding many unnecessary scans, given 
that the chance of a file update happening in the same second as a scan is 
very low (the scan interval defaults to 3 minutes).
Another possibility is that T1 (X seconds, Y milliseconds) and T3 (X seconds, Z 
milliseconds) are both rounded up to second (X+1) by the FS. To address this, 
we need to check (T1 != T3) or (T1 == T3 but T1.toSeconds >= T2.toSeconds). I 
updated this in the v3 patch.
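
For readers following along, the check described in this comment can be 
sketched roughly as below. This is not the code from the v3 patch; the class 
name, method name, and parameter names are hypothetical, and it assumes that 
T1 is the modification time cached at the previous scan, T3 is the 
modification time the file system reports now, and T2 is the time of the last 
scan, all in milliseconds since the epoch.

```java
import java.util.concurrent.TimeUnit;

public class ScanDecision {

    // Returns true when the directory should be scanned again.
    // cachedModTimeMs  ~ T1, currentModTimeMs ~ T3, lastScanTimeMs ~ T2.
    public static boolean needsScan(long cachedModTimeMs,
                                    long currentModTimeMs,
                                    long lastScanTimeMs) {
        // Case 1: the reported modification time changed (T1 != T3).
        if (cachedModTimeMs != currentModTimeMs) {
            return true;
        }
        // Case 2: T1 == T3, but the last scan ran in the same second as the
        // reported modification time, or earlier if the FS rounded up. A
        // later update in that second would be invisible, so scan again.
        long modSeconds = TimeUnit.MILLISECONDS.toSeconds(cachedModTimeMs);
        long scanSeconds = TimeUnit.MILLISECONDS.toSeconds(lastScanTimeMs);
        return modSeconds >= scanSeconds;
    }

    public static void main(String[] args) {
        // Modification time changed since the last scan.
        System.out.println(needsScan(1000L, 2000L, 5000L));
        // Unchanged, but the last scan ran within the same second.
        System.out.println(needsScan(5000L, 5000L, 5400L));
        // Unchanged, and the last scan ran roughly 3 minutes later.
        System.out.println(needsScan(5000L, 5000L, 185400L));
        // Unchanged, modification time rounded up past the last scan.
        System.out.println(needsScan(6000L, 6000L, 5700L));
    }
}
```

With a 3-minute scan interval, the second branch should trigger at most one 
extra scan per directory update, since the next recorded scan time will fall 
in a later second than the reported modification time.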

> JHS UserLogDir scan algorithm sometime could skip directory with update in 
> CloudFS (Azure FileSystem, S3, etc.)
> ---
>
> Key: MAPREDUCE-6680
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6680
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: MAPREDUCE-6680-v2.patch, MAPREDUCE-6680-v3.patch, 
> MAPREDUCE-6680.patch
>
>
> In our cluster, which is based on a cloud file system, we noticed that the JHS 
> sometimes skips a directory containing a .jhist file during scanning.
> The behavior looks like this:
> The first scan round doesn't find the .jhist file:
> {noformat}
> 16/04/13 11:14:34 DEBUG azure.NativeAzureFileSystem: Found path as a 
> directory with 6 files in it.
> 16/04/13 11:14:34 DEBUG hs.HistoryFileManager: Found 0 files
> ...
> {noformat}
> Then we see "Scan not needed of ..." for the same directory every 3 minutes 
> until the application fails with a timeout.
> From our analysis, the root cause is that most cloud file systems 
> (Azure FS, S3, etc.) truncate file/directory modification times to 
> seconds instead of milliseconds, which could be due to a limitation of the 
> HTTP protocol (from the discussion at: 
> https://forums.aws.amazon.com/thread.jspa?messageID=476615). 
> Suppose the time sequence happens to be: the latest non-.jhist file 
> modification in the directory happens at T1, the directory scan happens at 
> T2, and the .jhist file is added to the directory at T3. If {{T1 < T2 < T3}} 
> and T1 equals T3 after truncating to seconds, this issue can appear.





[jira] [Updated] (MAPREDUCE-5044) Have AM trigger jstack on task attempts that timeout before killing them

2016-04-18 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated MAPREDUCE-5044:
--
Attachment: MAPREDUCE-5044.008.patch

[~jira.shegalov], [~mingma], [~xgong], [~jlowe], 
Upmerged patch and attaching MAPREDUCE-5044.008.patch.



> Have AM trigger jstack on task attempts that timeout before killing them
> 
>
> Key: MAPREDUCE-5044
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5044
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mr-am
>Affects Versions: 2.1.0-beta
>Reporter: Jason Lowe
>Assignee: Gera Shegalov
> Attachments: MAPREDUCE-5044.008.patch, MAPREDUCE-5044.v01.patch, 
> MAPREDUCE-5044.v02.patch, MAPREDUCE-5044.v03.patch, MAPREDUCE-5044.v04.patch, 
> MAPREDUCE-5044.v05.patch, MAPREDUCE-5044.v06.patch, 
> MAPREDUCE-5044.v07.local.patch, Screen Shot 2013-11-12 at 1.05.32 PM.png, 
> Screen Shot 2013-11-12 at 1.06.04 PM.png
>
>
> When an AM expires a task attempt it would be nice if it triggered a jstack 
> output via SIGQUIT before killing the task attempt.  This would be invaluable 
> for helping users debug their hung tasks, especially if they do not have 
> shell access to the nodes.





[jira] [Commented] (MAPREDUCE-6657) job history server can fail on startup when NameNode is in start phase

2016-04-18 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246282#comment-15246282
 ] 

Daniel Templeton commented on MAPREDUCE-6657:
-

Thanks for the patch, [~haibochen].

I hate that HDFS expects you to parse the text of their exceptions to figure 
out what's going on.  Wanna look into whether the API would allow you to throw 
a properly typed exception?  Maybe just file a followup JIRA?

In your test code, it would be nice to add a javadoc header that explains what 
you're testing.

I don't love that you're running two mini-clusters and ignoring one of them.  
Is there any way to do the test with the existing mini-cluster without 
disrupting the other tests?  If not, I'd consider creating a new test class so 
that you don't have two mini-clusters running.

Is 2000ms the shortest reasonable duration for the timeout?  Seems long to me...

{code}
  Assert.assertEquals("Job History Server is expected to time out.",
{code}

Your assert message is misleading.  It should instead say that it didn't get 
the expected error message.

> job history server can fail on startup when NameNode is in start phase
> --
>
> Key: MAPREDUCE-6657
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6657
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: mapreduce6657.001.patch, mapreduce6657.002.patch
>
>
> The job history server tries to create a history directory in HDFS on startup. 
> When the NameNode is in safe mode, it keeps retrying for a configurable time 
> period.  However, it should also keep retrying if the NameNode is in its 
> startup state. Safe mode does not happen until the NN is out of the startup 
> phase. A RetriableException with the text "NameNode still not started" is 
> thrown when the NN is in its internal service startup phase. We should add a 
> check for this specific exception in isBecauseSafeMode() to account for that.
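
As a rough illustration of the kind of check the description asks for, the 
sketch below matches on the exception text. It is not the attached patch and 
does not use any Hadoop classes: the class name, method name, and constants 
are hypothetical, and matching on the string "SafeModeException" is an 
assumption about how the existing safe-mode check works.

```java
import java.io.IOException;

public class StartupRetryCheck {

    // Text quoted in the issue description for the NN startup phase.
    static final String NN_NOT_STARTED = "NameNode still not started";
    // Assumed marker for the safe-mode case (not confirmed by the source).
    static final String SAFE_MODE = "SafeModeException";

    // Returns true when the failure looks transient and the caller should
    // keep retrying instead of failing startup.
    public static boolean shouldRetry(Throwable ex) {
        // Throwable.toString() yields "<class name>: <message>", so this
        // covers both the exception type and its message text.
        String text = String.valueOf(ex);
        return text.contains(SAFE_MODE) || text.contains(NN_NOT_STARTED);
    }

    public static void main(String[] args) {
        System.out.println(shouldRetry(
            new IOException("NameNode still not started")));
        System.out.println(shouldRetry(new IOException("disk full")));
    }
}
```

Matching on message text is brittle, which is the concern raised in the 
comment above about HDFS not exposing a properly typed exception for this 
case.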





[jira] [Commented] (MAPREDUCE-6680) JHS UserLogDir scan algorithm sometime could skip directory with update in CloudFS (Azure FileSystem, S3, etc.)

2016-04-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246081#comment-15246081
 ] 

Hadoop QA commented on MAPREDUCE-6680:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 9s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
52s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
33s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 12s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 12s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
41s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 10s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 5m 30s 
{color} | {color:green} hadoop-mapreduce-client-hs in the patch passed with JDK 
v1.8.0_77. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 5m 52s 
{color} | {color:green} hadoop-mapreduce-client-hs in the patch passed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 24m 32s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:fbe3e86 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12799287/MAPREDUCE-6680-v2.patch
 |
| JIRA Issue | MAPREDUCE-6680 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 9fa1feedf7fd 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /tes

[jira] [Updated] (MAPREDUCE-6677) LocalContainerAllocator doesn't specify resource of the containers allocated.

2016-04-18 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-6677:
--
Attachment: mapreduce6677.003.patch

Updated the patch to address the newly generated javadoc warning. The unit 
test failure is again unrelated. 

> LocalContainerAllocator doesn't specify resource of the containers allocated.
> -
>
> Key: MAPREDUCE-6677
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6677
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mr-am
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: mapreduce6677.001.patch, mapreduce6677.002.patch, 
> mapreduce6677.003.patch
>
>






[jira] [Commented] (MAPREDUCE-6680) JHS UserLogDir scan algorithm sometime could skip directory with update in CloudFS (Azure FileSystem, S3, etc.)

2016-04-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246070#comment-15246070
 ] 

Hadoop QA commented on MAPREDUCE-6680:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
52s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 18s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
33s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 13s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 13s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
41s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 10s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 5m 33s 
{color} | {color:green} hadoop-mapreduce-client-hs in the patch passed with JDK 
v1.8.0_77. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 5m 54s 
{color} | {color:green} hadoop-mapreduce-client-hs in the patch passed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 24m 44s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:fbe3e86 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12799287/MAPREDUCE-6680-v2.patch
 |
| JIRA Issue | MAPREDUCE-6680 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 0541f6ebbf45 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /te

[jira] [Commented] (MAPREDUCE-5397) AM crashes because Webapp failed to start on multi node cluster

2016-04-18 Thread Yi Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246036#comment-15246036
 ] 

Yi Yao commented on MAPREDUCE-5397:
---

Thanks for your quick response, Jonathan. 

> AM crashes because Webapp failed to start on multi node cluster
> ---
>
> Key: MAPREDUCE-5397
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5397
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Jian He
> Attachments: MRAppMasterlog.txt, log.txt
>
>
> I set up a 12-node cluster and tried submitting jobs, but got this exception.
> However, the job is able to succeed after the AM crashes and retries a few times (2 or 3).
> {code}
> 2013-07-12 18:56:28,438 INFO [main] org.mortbay.log: Extract 
> jar:file:/grid/0/dev/jhe/hadoop-2.1.0-beta/share/hadoop/yarn/hadoop-yarn-common-2.1.0-beta.jar!/webapps/mapreduce
>  to /tmp/Jetty_0_0_0_0_43554_mapreduceljbmlg/webapp
> 2013-07-12 18:56:28,528 WARN [main] org.mortbay.log: Failed startup of 
> context 
> org.mortbay.jetty.webapp.WebAppContext@2726b2{/,jar:file:/grid/0/dev/jhe/hadoop-2.1.0-beta/share/hadoop/yarn/hadoop-yarn-common-2.1.0-beta.jar!/webapps/mapreduce}
> java.io.FileNotFoundException: 
> /tmp/Jetty_0_0_0_0_43554_mapreduceljbmlg/webapp/webapps/mapreduce/.keep 
> (No such file or directory)
>   at java.io.FileOutputStream.open(Native Method)
>   at java.io.FileOutputStream.<init>(FileOutputStream.java:194)
>   at java.io.FileOutputStream.<init>(FileOutputStream.java:145)
>   at org.mortbay.resource.JarResource.extract(JarResource.java:215)
>   at 
> org.mortbay.jetty.webapp.WebAppContext.resolveWebApp(WebAppContext.java:974)
>   at 
> org.mortbay.jetty.webapp.WebAppContext.getWebInf(WebAppContext.java:832)
>   at 
> org.mortbay.jetty.webapp.WebInfConfiguration.configureClassLoader(WebInfConfiguration.java:62)
>   at 
> org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:489)
>   at 
> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
>   at 
> org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
>   at 
> org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
>   at 
> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
>   at 
> org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
>   at org.mortbay.jetty.Server.doStart(Server.java:224)
>   at 
> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
>   at org.apache.hadoop.http.HttpServer.start(HttpServer.java:684)
>   at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:211)
>   at 
> org.apache.hadoop.mapreduce.v2.app.client.MRClientService.serviceStart(MRClientService.java:134)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:101)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1019)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1394)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1477)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1390)
> {code}





[jira] [Commented] (MAPREDUCE-5397) AM crashes because Webapp failed to start on multi node cluster

2016-04-18 Thread Joshua Snyder (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246031#comment-15246031
 ] 

Joshua Snyder commented on MAPREDUCE-5397:
--

The instance that I am running is 2.7.1, which is prior to the above fix.  I 
won't be able to test on a 2.7.2+ system for a while, but will apply the 
-Djava.io.tmpdir option to yarn.app.mapreduce.am.admin-command-opts .  Thanks, 
Jonathan.

> AM crashes because Webapp failed to start on multi node cluster
> ---
>
> Key: MAPREDUCE-5397
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5397
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Jian He
> Attachments: MRAppMasterlog.txt, log.txt
>
>
> I set up a 12-node cluster and tried submitting jobs, but got this exception.
> However, the job is able to succeed after the AM crashes and retries a few times (2 or 3).
> {code}
> 2013-07-12 18:56:28,438 INFO [main] org.mortbay.log: Extract 
> jar:file:/grid/0/dev/jhe/hadoop-2.1.0-beta/share/hadoop/yarn/hadoop-yarn-common-2.1.0-beta.jar!/webapps/mapreduce
>  to /tmp/Jetty_0_0_0_0_43554_mapreduceljbmlg/webapp
> 2013-07-12 18:56:28,528 WARN [main] org.mortbay.log: Failed startup of 
> context 
> org.mortbay.jetty.webapp.WebAppContext@2726b2{/,jar:file:/grid/0/dev/jhe/hadoop-2.1.0-beta/share/hadoop/yarn/hadoop-yarn-common-2.1.0-beta.jar!/webapps/mapreduce}
> java.io.FileNotFoundException: 
> /tmp/Jetty_0_0_0_0_43554_mapreduceljbmlg/webapp/webapps/mapreduce/.keep 
> (No such file or directory)
>   at java.io.FileOutputStream.open(Native Method)
>   at java.io.FileOutputStream.<init>(FileOutputStream.java:194)
>   at java.io.FileOutputStream.<init>(FileOutputStream.java:145)
>   at org.mortbay.resource.JarResource.extract(JarResource.java:215)
>   at 
> org.mortbay.jetty.webapp.WebAppContext.resolveWebApp(WebAppContext.java:974)
>   at 
> org.mortbay.jetty.webapp.WebAppContext.getWebInf(WebAppContext.java:832)
>   at 
> org.mortbay.jetty.webapp.WebInfConfiguration.configureClassLoader(WebInfConfiguration.java:62)
>   at 
> org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:489)
>   at 
> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
>   at 
> org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
>   at 
> org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
>   at 
> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
>   at 
> org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
>   at org.mortbay.jetty.Server.doStart(Server.java:224)
>   at 
> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
>   at org.apache.hadoop.http.HttpServer.start(HttpServer.java:684)
>   at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:211)
>   at 
> org.apache.hadoop.mapreduce.v2.app.client.MRClientService.serviceStart(MRClientService.java:134)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:101)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1019)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1394)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1477)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1390)
> {code}





[jira] [Commented] (MAPREDUCE-5397) AM crashes because Webapp failed to start on multi node cluster

2016-04-18 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246012#comment-15246012
 ] 

Jonathan Eagles commented on MAPREDUCE-5397:


One other thing regarding this failure: in some cases, node manager boxes run 
a tmp directory cleaner process, which can inadvertently delete old entries in 
the /tmp directory. Depending on the cleaner's settings, this could affect 
long-running jobs.



[jira] [Updated] (MAPREDUCE-6680) JHS UserLogDir scan algorithm sometime could skip directory with update in CloudFS (Azure FileSystem, S3, etc.)

2016-04-18 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated MAPREDUCE-6680:
--
Attachment: MAPREDUCE-6680-v2.patch

The v2 patch fixes the checkstyle issue.

> JHS UserLogDir scan algorithm sometime could skip directory with update in 
> CloudFS (Azure FileSystem, S3, etc.)
> ---
>
> Key: MAPREDUCE-6680
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6680
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: MAPREDUCE-6680-v2.patch, MAPREDUCE-6680.patch
>
>
> In our cluster, which is based on a cloud file system, we noticed that the JHS 
> sometimes skips a directory containing a .jhist file when scanning.
> The behavior looks like this:
> On the first scan round, the .jhist file is not found:
> {noformat}
> 16/04/13 11:14:34 DEBUG azure.NativeAzureFileSystem: Found path as a directory with 6 files in it.
> 16/04/13 11:14:34 DEBUG hs.HistoryFileManager: Found 0 files
> ...
> {noformat}
> Then we see "Scan not needed of ..." for the same directory every 3 minutes 
> until the application fails with a timeout.
> From our analysis, the root cause is that most cloud file systems 
> (Azure FS, S3, etc.) truncate file/directory modification times to 
> seconds instead of milliseconds, which could be due to a limit of the HTTP 
> protocol (from the discussion at: 
> https://forums.aws.amazon.com/thread.jspa?messageID=476615). 
> Suppose the time sequence is: the latest non-.jhist file modification in the 
> directory happens at T1, the directory scan happens at T2, and the .jhist file 
> is added to the directory at T3. If {{T1 < T2 < T3}} and T1 equals T3 
> after truncating to seconds, this issue can appear.
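
The T1/T2/T3 sequence in the description can be reproduced with plain
arithmetic. The following is only an illustrative sketch, not JHS code: the
class name, the helper methods, and the concrete timestamps are made up for
the example, and the scanner is assumed to rescan only when the reported
modification time differs from the one it cached.

```java
import java.util.concurrent.TimeUnit;

/** Illustrative only: shows how second-granularity mod times can hide an update. */
final class TruncatedModTimeDemo {

    /** Mimics a store that keeps modification times only to whole seconds. */
    static long truncateToSeconds(long millis) {
        return TimeUnit.MILLISECONDS.toSeconds(millis) * 1000L;
    }

    /** Assumed scanner rule: rescan only when the reported mod time changed. */
    static boolean scanNeeded(long cachedModTime, long reportedModTime) {
        return cachedModTime != reportedModTime;
    }

    public static void main(String[] args) {
        long t1 = 1_460_546_074_100L; // T1: last non-.jhist file written
        long t2 = 1_460_546_074_400L; // T2: directory scanned, no .jhist file yet
        long t3 = 1_460_546_074_900L; // T3: .jhist file added

        long cached = truncateToSeconds(t1);   // mod time remembered by the scan at T2
        long reported = truncateToSeconds(t3); // mod time reported after T3

        System.out.println("T1 < T2 < T3: " + (t1 < t2 && t2 < t3));
        System.out.println("scan needed:  " + scanNeeded(cached, reported));
    }
}
```

Because all three timestamps fall within the same second, the cached and
reported values are identical and the update at T3 goes unnoticed.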





[jira] [Commented] (MAPREDUCE-5817) Mappers get rescheduled on node transition even after all reducers are completed

2016-04-18 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246018#comment-15246018
 ] 

Sangjin Lee commented on MAPREDUCE-5817:


I have no objections to backporting this to branch-2.7 or branch-2.6. Thanks!

> Mappers get rescheduled on node transition even after all reducers are 
> completed
> 
>
> Key: MAPREDUCE-5817
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5817
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster
>Affects Versions: 2.3.0
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Fix For: 2.8.0
>
> Attachments: MAPREDUCE-5817.001.patch, MAPREDUCE-5817.002.patch, 
> mapreduce-5817.patch
>
>
> We're seeing a behavior where a job runs long after all reducers were already 
> finished. We found that the job was rescheduling and running a number of 
> mappers beyond the point of reducer completion. In one situation, the job ran 
> for some 9 more hours after all reducers completed!
> This happens because whenever a node transition (to an unusable state) comes 
> into the app master, it just reschedules all mappers that already ran on the 
> node in all cases.
> Therefore, any node transition has the potential to extend the job period. 
> Once this window opens, another node transition can prolong it, and this can 
> happen indefinitely in theory.
> If there is some instability in the pool (unhealthy, etc.) for a duration, 
> then any big job is severely vulnerable to this problem.
> If all reducers have been completed, JobImpl.actOnUnusableNode() should not 
> reschedule mapper tasks. If all reducers are completed, the mapper outputs 
> are no longer needed, and there is no need to reschedule mapper tasks as they 
> would not be consumed anyway.
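
The proposed behavior can be sketched roughly as follows. This is a
hypothetical stand-in, not the code in JobImpl.actOnUnusableNode(): the class,
the method names, and the task ID strings are invented for illustration, and
treating a map-only job (zero reducers) as "all reducers complete" is an
assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for the decision described for JobImpl.actOnUnusableNode(). */
final class UnusableNodeSketch {

    /** Assumption: a job with zero reducers counts as having all reducers complete. */
    static boolean allReducersComplete(int totalReducers, int completedReducers) {
        return totalReducers == 0 || completedReducers == totalReducers;
    }

    /** Returns the completed mappers on the unusable node that should be rerun. */
    static List<String> mappersToReschedule(List<String> completedMappersOnNode,
                                            int totalReducers,
                                            int completedReducers) {
        if (allReducersComplete(totalReducers, completedReducers)) {
            // No reducer will fetch these map outputs again, so rerunning is wasted work.
            return Collections.emptyList();
        }
        return new ArrayList<>(completedMappersOnNode);
    }

    public static void main(String[] args) {
        List<String> onNode = new ArrayList<>();
        onNode.add("m_000001");
        onNode.add("m_000007");
        System.out.println("reducers pending:  " + mappersToReschedule(onNode, 4, 3));
        System.out.println("reducers all done: " + mappersToReschedule(onNode, 4, 4));
    }
}
```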





[jira] [Commented] (MAPREDUCE-5397) AM crashes because Webapp failed to start on multi node cluster

2016-04-18 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15245998#comment-15245998
 ] 

Jonathan Eagles commented on MAPREDUCE-5397:


A couple of leads on a workaround.

MAPREDUCE-6472 (fixed in 2.7.2) sets java.io.tmpdir to the container's 
working directory.

Reference Documentation
http://www.eclipse.org/jetty/documentation/current/ref-temporary-directories.html

In versions before 2.7.2, you can always supplement the job-specific 
yarn.app.mapreduce.am.command-opts by adding -Djava.io.tmpdir=./tmp, or add it 
to the cluster-wide admin setting yarn.app.mapreduce.am.admin-command-opts.

This has the added benefit of getting cleaned up when a job is finished. If a 
job dies with data written to /tmp, that data will not be cleaned up and can 
pollute the tmp file system, and other jobs then suffer the effects (failed 
jobs, slow jobs, etc.).
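
As a rough sketch of the job-specific variant, the option string could be
built like this before it is stored under yarn.app.mapreduce.am.command-opts.
The class and method names are hypothetical and the -Xmx1024m value is only a
placeholder; with Hadoop on the classpath the result would presumably be
passed to Configuration.set(String, String), which is left out here so the
example stays self-contained.

```java
/** Hypothetical helper: appends the tmpdir flag to an existing AM options string. */
final class AmCommandOptsSketch {

    static final String TMPDIR_FLAG = "-Djava.io.tmpdir=";

    /** Adds -Djava.io.tmpdir=./tmp unless the options already set java.io.tmpdir. */
    static String withContainerLocalTmpDir(String existingOpts) {
        String opts = existingOpts == null ? "" : existingOpts.trim();
        if (opts.contains(TMPDIR_FLAG)) {
            return opts; // respect an explicit setting
        }
        String flag = TMPDIR_FLAG + "./tmp";
        return opts.isEmpty() ? flag : opts + " " + flag;
    }

    public static void main(String[] args) {
        // Placeholder heap setting; the real value depends on the cluster.
        System.out.println(withContainerLocalTmpDir("-Xmx1024m"));
    }
}
```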





[jira] [Commented] (MAPREDUCE-6633) AM should retry map attempts if the reduce task encounters commpression related errors.

2016-04-18 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15245966#comment-15245966
 ] 

Eric Payne commented on MAPREDUCE-6633:
---

Thanks [~shahrs87]. I cherry picked this back to 2.7.

> AM should retry map attempts if the reduce task encounters commpression 
> related errors.
> ---
>
> Key: MAPREDUCE-6633
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6633
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
> Fix For: 2.7.3
>
> Attachments: MAPREDUCE-6633.patch
>
>
> When a reduce task encounters compression-related errors, the AM doesn't retry the 
> corresponding map task.
> In one of the cases we encountered, here is the stack trace.
> {noformat}
> 2016-01-27 13:44:28,915 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#29
>   at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.ArrayIndexOutOfBoundsException
>   at com.hadoop.compression.lzo.LzoDecompressor.setInput(LzoDecompressor.java:196)
>   at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:104)
>   at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
>   at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
>   at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.shuffle(InMemoryMapOutput.java:97)
>   at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:537)
>   at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:336)
>   at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)
> {noformat}
> In this case, the node on which the map task ran had a bad drive.
> If the AM had retried running that map task somewhere else, the job 
> definitely would have succeeded.
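
One way to picture the requested behavior is the sketch below. It is not the
attached patch and does not use Hadoop's Fetcher API: the interface, the
class, and the boolean return convention are all assumptions made for
illustration. The idea shown is only that an unchecked exception thrown while
decompressing a map output is handled like an I/O failure for that output, so
it can be reported as a fetch failure instead of failing the reduce task.

```java
import java.io.IOException;

/** Hypothetical sketch: treat decompression errors like fetch failures. */
final class ShuffleFailureSketch {

    /** Stand-in for the step that reads and decompresses one map output. */
    interface MapOutputReader {
        void shuffle() throws IOException;
    }

    /**
     * Returns true if the map output was copied. Returns false when the copy
     * failed, in which case a real fetcher would report the map output as
     * failed so that the AM can decide to rerun the map task.
     */
    static boolean copyMapOutput(MapOutputReader reader) {
        try {
            reader.shuffle();
            return true;
        } catch (IOException | RuntimeException e) {
            // Covers ArrayIndexOutOfBoundsException from a decompressor as well.
            System.err.println("Failed to shuffle map output: " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        boolean ok = copyMapOutput(() -> {
            throw new ArrayIndexOutOfBoundsException("corrupt compressed block");
        });
        System.out.println("copied: " + ok);
    }
}
```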





[jira] [Updated] (MAPREDUCE-6633) AM should retry map attempts if the reduce task encounters commpression related errors.

2016-04-18 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated MAPREDUCE-6633:
--
Fix Version/s: (was: 2.8.0)
   2.7.3



[jira] [Commented] (MAPREDUCE-5397) AM crashes because Webapp failed to start on multi node cluster

2016-04-18 Thread Yi Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15245962#comment-15245962
 ] 

Yi Yao commented on MAPREDUCE-5397:
---

I got the same error. Is there any workaround or fix for it?



[jira] [Commented] (MAPREDUCE-6680) JHS UserLogDir scan algorithm sometime could skip directory with update in CloudFS (Azure FileSystem, S3, etc.)

2016-04-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15245834#comment-15245834
 ] 

Hadoop QA commented on MAPREDUCE-6680:
--

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 11m 38s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| +1 | mvninstall | 6m 55s | trunk passed |
| +1 | compile | 0m 15s | trunk passed with JDK v1.8.0_77 |
| +1 | compile | 0m 18s | trunk passed with JDK v1.7.0_95 |
| +1 | checkstyle | 0m 14s | trunk passed |
| +1 | mvnsite | 0m 24s | trunk passed |
| +1 | mvneclipse | 0m 13s | trunk passed |
| +1 | findbugs | 0m 34s | trunk passed |
| +1 | javadoc | 0m 12s | trunk passed with JDK v1.8.0_77 |
| +1 | javadoc | 0m 16s | trunk passed with JDK v1.7.0_95 |
| +1 | mvninstall | 0m 18s | the patch passed |
| +1 | compile | 0m 13s | the patch passed with JDK v1.8.0_77 |
| +1 | javac | 0m 13s | the patch passed |
| +1 | compile | 0m 16s | the patch passed with JDK v1.7.0_95 |
| +1 | javac | 0m 16s | the patch passed |
| -1 | checkstyle | 0m 12s | hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs: patch generated 1 new + 16 unchanged - 0 fixed = 17 total (was 16) |
| +1 | mvnsite | 0m 20s | the patch passed |
| +1 | mvneclipse | 0m 11s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | findbugs | 0m 41s | the patch passed |
| +1 | javadoc | 0m 11s | the patch passed with JDK v1.8.0_77 |
| +1 | javadoc | 0m 13s | the patch passed with JDK v1.7.0_95 |
| +1 | unit | 5m 34s | hadoop-mapreduce-client-hs in the patch passed with JDK v1.8.0_77. |
| +1 | unit | 5m 57s | hadoop-mapreduce-client-hs in the patch passed with JDK v1.7.0_95. |
| +1 | asflicense | 0m 20s | Patch does not generate ASF License warnings. |
| | | 36m 18s | |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:fbe3e86 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12799262/MAPREDUCE-6680.patch |
| JIRA Issue | MAPREDUCE-6680 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux fb28783d1eb4 3.13.0-36-lowlatency #63-Ubunt

[jira] [Updated] (MAPREDUCE-6680) JHS UserLogDir scan algorithm sometime could skip directory with update in CloudFS (Azure FileSystem, S3, etc.)

2016-04-18 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated MAPREDUCE-6680:
--
Status: Patch Available  (was: Open)



[jira] [Updated] (MAPREDUCE-6680) JHS UserLogDir scan algorithm sometime could skip directory with update in CloudFS (Azure FileSystem, S3, etc.)

2016-04-18 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated MAPREDUCE-6680:
--
Attachment: MAPREDUCE-6680.patch

Uploaded a patch with a proper fix. A unit test would be very hard to write, 
given that the issue appears to be platform-related and intermittent. I think 
the fix is straightforward enough.

> JHS UserLogDir scan algorithm sometime could skip directory with update in 
> CloudFS (Azure FileSystem, S3, etc.)
> ---
>
> Key: MAPREDUCE-6680
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6680
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: MAPREDUCE-6680.patch
>
>
> In our cluster, which is based on a cloud file system, we noticed that the JHS 
> sometimes skips a directory containing a .jhist file during scanning.
> The behavior is as follows:
> The first scan round does not find the .jhist file:
> {noformat}
> 16/04/13 11:14:34 DEBUG azure.NativeAzureFileSystem: Found path as a directory with 6 files in it.
> 16/04/13 11:14:34 DEBUG hs.HistoryFileManager: Found 0 files
> ...
> {noformat}
> Then we see "Scan not needed of ..." for the same directory every 3 minutes 
> until the application fails with a timeout.
> From our analysis, the root cause is that most cloud file systems (Azure FS, 
> S3, etc.) truncate file/directory modification times to seconds instead of 
> milliseconds, which could be due to a limitation of the HTTP protocol (from 
> the discussion at: 
> https://forums.aws.amazon.com/thread.jspa?messageID=476615).
> Suppose the sequence of events is: the latest non-.jhist file modification in 
> the directory happens at T1, the directory scan happens at T2, and the .jhist 
> file is added to the directory at T3. If {{T1 < T2 < T3}} and T1 equals T3 
> after truncation to seconds, this issue can appear.





[jira] [Commented] (MAPREDUCE-6680) JHS UserLogDir scan algorithm sometime could skip directory with update in CloudFS (Azure FileSystem, S3, etc.)

2016-04-18 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15245731#comment-15245731
 ] 

Junping Du commented on MAPREDUCE-6680:
---

Given the analysis in the description, the solution is: we record the latest 
scan time of the userDir as T2, and we scan the directory listing when 
(T1 != T3) or (T1 = T3 but T1.toSeconds = T2.toSeconds). That gets rid of the 
skip problem without adding many unnecessary scans, since the chance that a 
file update happens in the same second as a scan is very low (the scan 
interval defaults to 3 minutes).
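
The condition above can be sketched in Java as follows. This is only an 
illustration of the rule as stated in this comment, not the attached patch: 
the class name UserDirScanPolicy, the method shouldScan, and its parameter 
names are hypothetical. Here the recorded modification time plays the role of 
T1, the newly reported modification time plays the role of T3, and the last 
scan time is T2, all in milliseconds.

```java
/** Hypothetical helper; the names are illustrative and are not taken from Hadoop. */
final class UserDirScanPolicy {
    private UserDirScanPolicy() {}

    /**
     * Decides whether the directory listing should be scanned again.
     *
     * @param recordedModTimeMs modification time remembered from the last scan (T1)
     * @param reportedModTimeMs modification time reported by the file system now (T3)
     * @param lastScanTimeMs    wall-clock time of the last scan (T2)
     */
    static boolean shouldScan(long recordedModTimeMs, long reportedModTimeMs, long lastScanTimeMs) {
        // Case 1: the modification time changed, so something was written.
        if (recordedModTimeMs != reportedModTimeMs) {
            return true;
        }
        // Case 2: the times look equal, but the last scan ran in the same second
        // as the recorded modification, so a later write in that second could be
        // hidden by truncation to seconds. Scan again to be safe.
        return recordedModTimeMs / 1000L == lastScanTimeMs / 1000L;
    }
}
```

With this rule, an unchanged modification time triggers a rescan only when the 
previous scan fell in the same second as that modification, which matches the 
argument that extra scans should be rare.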

> JHS UserLogDir scan algorithm sometime could skip directory with update in 
> CloudFS (Azure FileSystem, S3, etc.)
> ---
>
> Key: MAPREDUCE-6680
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6680
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Reporter: Junping Du
>Assignee: Junping Du
>
> In our cluster, which is based on a cloud file system, we noticed that the JHS 
> sometimes skips a directory containing a .jhist file during scanning.
> The behavior is as follows:
> The first scan round does not find the .jhist file:
> {noformat}
> 16/04/13 11:14:34 DEBUG azure.NativeAzureFileSystem: Found path as a directory with 6 files in it.
> 16/04/13 11:14:34 DEBUG hs.HistoryFileManager: Found 0 files
> ...
> {noformat}
> Then we see "Scan not needed of ..." for the same directory every 3 minutes 
> until the application fails with a timeout.
> From our analysis, the root cause is that most cloud file systems (Azure FS, 
> S3, etc.) truncate file/directory modification times to seconds instead of 
> milliseconds, which could be due to a limitation of the HTTP protocol (from 
> the discussion at: 
> https://forums.aws.amazon.com/thread.jspa?messageID=476615).
> Suppose the sequence of events is: the latest non-.jhist file modification in 
> the directory happens at T1, the directory scan happens at T2, and the .jhist 
> file is added to the directory at T3. If {{T1 < T2 < T3}} and T1 equals T3 
> after truncation to seconds, this issue can appear.





[jira] [Updated] (MAPREDUCE-6680) JHS UserLogDir scan algorithm sometime could skip directory with update in CloudFS (Azure FileSystem, S3, etc.)

2016-04-18 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated MAPREDUCE-6680:
--
Description: 
In our cluster, which is based on a cloud file system, we noticed that the JHS 
sometimes skips a directory containing a .jhist file during scanning.
The behavior is as follows:
The first scan round does not find the .jhist file:
{noformat}
16/04/13 11:14:34 DEBUG azure.NativeAzureFileSystem: Found path as a directory with 6 files in it.
16/04/13 11:14:34 DEBUG hs.HistoryFileManager: Found 0 files
...
{noformat}

Then we see "Scan not needed of ..." for the same directory every 3 minutes 
until the application fails with a timeout.

From our analysis, the root cause is that most cloud file systems (Azure FS, 
S3, etc.) truncate file/directory modification times to seconds instead of 
milliseconds, which could be due to a limitation of the HTTP protocol (from 
the discussion at: https://forums.aws.amazon.com/thread.jspa?messageID=476615).

Suppose the sequence of events is: the latest non-.jhist file modification in 
the directory happens at T1, the directory scan happens at T2, and the .jhist 
file is added to the directory at T3. If {{T1 < T2 < T3}} and T1 equals T3 
after truncation to seconds, this issue can appear.

  was:
In our cluster, which is based on a cloud file system, we noticed that the JHS 
sometimes skips a directory containing a .jhist file during scanning.
The behavior is as follows:
The first scan round does not find the .jhist file:
{noformat}
16/04/13 11:14:34 DEBUG azure.NativeAzureFileSystem: Found path as a directory with 6 files in it.
16/04/13 11:14:34 DEBUG hs.HistoryFileManager: Found 0 files
...
{noformat}

Then we see "Scan not needed of ..." for the same directory every 3 minutes 
until the application fails with a timeout.

From our analysis, the root cause is that most cloud file systems (Azure, S3, 
etc.) truncate file/directory modification times to seconds instead of 
milliseconds, which could be due to a limitation of the HTTP protocol (from 
the discussion at: https://forums.aws.amazon.com/thread.jspa?messageID=476615).

Suppose the sequence of events is: the latest non-.jhist file modification in 
the directory happens at T1, the directory scan happens at T2, and the .jhist 
file is added to the directory at T3. If {{T1 < T2 < T3}} and T1 equals T3 
after truncation to seconds, this issue can appear.


> JHS UserLogDir scan algorithm sometime could skip directory with update in 
> CloudFS (Azure FileSystem, S3, etc.)
> ---
>
> Key: MAPREDUCE-6680
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6680
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Reporter: Junping Du
>Assignee: Junping Du
>
> In our cluster, which is based on a cloud file system, we noticed that the JHS 
> sometimes skips a directory containing a .jhist file during scanning.
> The behavior is as follows:
> The first scan round does not find the .jhist file:
> {noformat}
> 16/04/13 11:14:34 DEBUG azure.NativeAzureFileSystem: Found path as a directory with 6 files in it.
> 16/04/13 11:14:34 DEBUG hs.HistoryFileManager: Found 0 files
> ...
> {noformat}
> Then we see "Scan not needed of ..." for the same directory every 3 minutes 
> until the application fails with a timeout.
> From our analysis, the root cause is that most cloud file systems (Azure FS, 
> S3, etc.) truncate file/directory modification times to seconds instead of 
> milliseconds, which could be due to a limitation of the HTTP protocol (from 
> the discussion at: 
> https://forums.aws.amazon.com/thread.jspa?messageID=476615).
> Suppose the sequence of events is: the latest non-.jhist file modification in 
> the directory happens at T1, the directory scan happens at T2, and the .jhist 
> file is added to the directory at T3. If {{T1 < T2 < T3}} and T1 equals T3 
> after truncation to seconds, this issue can appear.





[jira] [Updated] (MAPREDUCE-6633) AM should retry map attempts if the reduce task encounters commpression related errors.

2016-04-18 Thread Rushabh S Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rushabh S Shah updated MAPREDUCE-6633:
--
Target Version/s: 3.0.0, 2.8.0, 2.7.3

> AM should retry map attempts if the reduce task encounters commpression 
> related errors.
> ---
>
> Key: MAPREDUCE-6633
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6633
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
> Fix For: 2.8.0
>
> Attachments: MAPREDUCE-6633.patch
>
>
> When a reduce task encounters compression-related errors, the AM doesn't 
> retry the corresponding map task.
> Here is the stack trace from one case we encountered:
> {noformat}
> 2016-01-27 13:44:28,915 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#29
>   at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.ArrayIndexOutOfBoundsException
>   at com.hadoop.compression.lzo.LzoDecompressor.setInput(LzoDecompressor.java:196)
>   at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:104)
>   at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
>   at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
>   at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.shuffle(InMemoryMapOutput.java:97)
>   at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:537)
>   at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:336)
>   at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)
> {noformat}
> In this case, the node on which the map task ran had a bad drive.
> If the AM had retried that map task on another node, the job would 
> definitely have succeeded.
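
One way to picture the behavior requested here: the reducer treats a 
decompression failure while copying a map output as a failed fetch that can be 
reported, rather than as a fatal error that kills the reduce attempt. The 
sketch below is a self-contained illustration under that assumption and is not 
the attached patch. FetchFailureSketch, MapOutputCopier, and Outcome are 
hypothetical names that do not exist in Hadoop.

```java
/** Hypothetical illustration; none of these types are part of Hadoop. */
final class FetchFailureSketch {
    private FetchFailureSketch() {}

    /** What the caller should do after attempting to copy one map output. */
    enum Outcome { SUCCESS, REPORT_FETCH_FAILURE }

    /** Stand-in for the code that reads and decompresses one map output. */
    interface MapOutputCopier {
        void copy() throws java.io.IOException;
    }

    /**
     * Runs the copy and converts both I/O errors and unchecked errors raised by
     * a decompressor (such as ArrayIndexOutOfBoundsException on corrupt data)
     * into a reportable fetch failure instead of letting them propagate.
     */
    static Outcome copyMapOutput(MapOutputCopier copier) {
        try {
            copier.copy();
            return Outcome.SUCCESS;
        } catch (java.io.IOException | RuntimeException e) {
            // The map output is unreadable from this reducer's point of view.
            // Reporting it gives the AM a chance to rerun the map elsewhere.
            return Outcome.REPORT_FETCH_FAILURE;
        }
    }
}
```

The design point is only where the exception is caught: once a corrupt output 
surfaces as a fetch failure, the decision to rerun the map stays with the AM.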





[jira] [Created] (MAPREDUCE-6680) JHS UserLogDir scan algorithm sometime could skip directory with update in CloudFS (Azure FileSystem, S3, etc.)

2016-04-18 Thread Junping Du (JIRA)
Junping Du created MAPREDUCE-6680:
-

 Summary: JHS UserLogDir scan algorithm sometime could skip 
directory with update in CloudFS (Azure FileSystem, S3, etc.)
 Key: MAPREDUCE-6680
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6680
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Reporter: Junping Du
Assignee: Junping Du


In our cluster, which is based on a cloud file system, we noticed that the JHS 
sometimes skips a directory containing a .jhist file during scanning.
The behavior is as follows:
The first scan round does not find the .jhist file:
{noformat}
16/04/13 11:14:34 DEBUG azure.NativeAzureFileSystem: Found path as a directory with 6 files in it.
16/04/13 11:14:34 DEBUG hs.HistoryFileManager: Found 0 files
...
{noformat}

Then we see "Scan not needed of ..." for the same directory every 3 minutes 
until the application fails with a timeout.

From our analysis, the root cause is that most cloud file systems (Azure, S3, 
etc.) truncate file/directory modification times to seconds instead of 
milliseconds, which could be due to a limitation of the HTTP protocol (from 
the discussion at: https://forums.aws.amazon.com/thread.jspa?messageID=476615).

Suppose the sequence of events is: the latest non-.jhist file modification in 
the directory happens at T1, the directory scan happens at T2, and the .jhist 
file is added to the directory at T3. If {{T1 < T2 < T3}} and T1 equals T3 
after truncation to seconds, this issue can appear.
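
The timing problem can be reproduced with plain arithmetic. The following is a 
minimal sketch, assuming a scanner that rescans a directory only when the 
reported modification time differs from the one it recorded earlier. The class 
TruncatedModTimeDemo and its methods are hypothetical, and the timestamps are 
made-up example values in milliseconds.

```java
/** Hypothetical, simplified model; the names are illustrative and not Hadoop APIs. */
final class TruncatedModTimeDemo {
    private TruncatedModTimeDemo() {}

    /** Models a store that reports modification times with one-second granularity. */
    static long truncateToSeconds(long millis) {
        return (millis / 1000L) * 1000L;
    }

    /** A scan is considered necessary only when the modification time changed. */
    static boolean scanNeeded(long recordedModTime, long reportedModTime) {
        return recordedModTime != reportedModTime;
    }

    public static void main(String[] args) {
        long t1 = 1460546074100L; // last non-.jhist modification
        long t3 = 1460546074900L; // .jhist file added, same second as t1

        // With second granularity the two writes are indistinguishable.
        System.out.println(scanNeeded(truncateToSeconds(t1), truncateToSeconds(t3))); // false
        // With millisecond granularity the second write is detected.
        System.out.println(scanNeeded(t1, t3)); // true
    }
}
```

A scan that ran between t1 and t3 would have recorded the truncated value of 
t1, so every later check sees an unchanged time and skips the directory.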





[jira] [Updated] (MAPREDUCE-6649) getFailureInfo not returning any failure info

2016-04-18 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated MAPREDUCE-6649:
---
Fix Version/s: 2.8.0

> getFailureInfo not returning any failure info
> -
>
> Key: MAPREDUCE-6649
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6649
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Fix For: 2.8.0
>
> Attachments: MAPREDUCE-6649.001.patch, MAPREDUCE-6649.002.patch
>
>
> The following command does not produce any failure info as to why the job 
> failed. 
> {noformat}
> $HADOOP_PREFIX/bin/hadoop jar $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-${HADOOP_VERSION}-tests.jar sleep -Dmapreduce.jobtracker.split.metainfo.maxsize=10 -Dmapreduce.job.queuename=default -m 1 -r 1 -mt 1 -rt 1
> {noformat}
> {noformat}
> 2016-03-07 10:34:58,112 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1431)) - Job job_1457364518683_0004 failed with state FAILED due to: 
> {noformat}
> For contrast, here is a command and its command-line output for a failed job 
> that gives the correct failure info.
> {noformat}
> $HADOOP_PREFIX/bin/hadoop jar $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-${HADOOP_VERSION}-tests.jar sleep -Dyarn.app.mapreduce.am.command-opts=-goober -Dmapreduce.job.queuename=default -m 20 -r 0 -mt 3
> {noformat}
> {noformat}
> 2016-03-07 10:30:13,103 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1431)) - Job job_1457364518683_0003 failed with state FAILED due to: Application application_1457364518683_0003 failed 3 times due to AM Container for appattempt_1457364518683_0003_03 exited with  exitCode: 1
> Failing this attempt.Diagnostics: Exception from container-launch.
> Container id: container_1457364518683_0003_03_01
> Exit code: 1
> Stack trace: ExitCodeException exitCode=1: 
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:927)
>   at org.apache.hadoop.util.Shell.run(Shell.java:838)
>   at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117)
>   at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:227)
>   at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:319)
>   at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:88)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}





[jira] [Updated] (MAPREDUCE-6649) getFailureInfo not returning any failure info

2016-04-18 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated MAPREDUCE-6649:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks, [~eepayne] for reviewing and committing this!

> getFailureInfo not returning any failure info
> -
>
> Key: MAPREDUCE-6649
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6649
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: MAPREDUCE-6649.001.patch, MAPREDUCE-6649.002.patch
>
>
> The following command does not produce any failure info as to why the job 
> failed. 
> {noformat}
> $HADOOP_PREFIX/bin/hadoop jar $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-${HADOOP_VERSION}-tests.jar sleep -Dmapreduce.jobtracker.split.metainfo.maxsize=10 -Dmapreduce.job.queuename=default -m 1 -r 1 -mt 1 -rt 1
> {noformat}
> {noformat}
> 2016-03-07 10:34:58,112 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1431)) - Job job_1457364518683_0004 failed with state FAILED due to: 
> {noformat}
> For contrast, here is a command and its command-line output for a failed job 
> that gives the correct failure info.
> {noformat}
> $HADOOP_PREFIX/bin/hadoop jar $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-${HADOOP_VERSION}-tests.jar sleep -Dyarn.app.mapreduce.am.command-opts=-goober -Dmapreduce.job.queuename=default -m 20 -r 0 -mt 3
> {noformat}
> {noformat}
> 2016-03-07 10:30:13,103 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1431)) - Job job_1457364518683_0003 failed with state FAILED due to: Application application_1457364518683_0003 failed 3 times due to AM Container for appattempt_1457364518683_0003_03 exited with  exitCode: 1
> Failing this attempt.Diagnostics: Exception from container-launch.
> Container id: container_1457364518683_0003_03_01
> Exit code: 1
> Stack trace: ExitCodeException exitCode=1: 
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:927)
>   at org.apache.hadoop.util.Shell.run(Shell.java:838)
>   at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117)
>   at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:227)
>   at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:319)
>   at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:88)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}





[jira] [Commented] (MAPREDUCE-6240) Hadoop client displays confusing error message

2016-04-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15245559#comment-15245559
 ] 

Hadoop QA commented on MAPREDUCE-6240:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s {color} | {color:red} MAPREDUCE-6240 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12742446/MAPREDUCE-6240.003.patch |
| JIRA Issue | MAPREDUCE-6240 |
| Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6441/console |
| Powered by | Apache Yetus 0.2.0   http://yetus.apache.org |


This message was automatically generated.



> Hadoop client displays confusing error message
> --
>
> Key: MAPREDUCE-6240
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6240
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.7.0
>Reporter: Mohammad Kamrul Islam
>Assignee: Gera Shegalov
> Attachments: MAPREDUCE-6240-gera.001.patch, 
> MAPREDUCE-6240-gera.001.patch, MAPREDUCE-6240-gera.002.patch, 
> MAPREDUCE-6240.003.patch, MAPREDUCE-6240.1.patch
>
>
> The Hadoop client often throws an exception with "java.io.IOException: Cannot 
> initialize Cluster. Please check your configuration for 
> mapreduce.framework.name and the correspond server addresses".
> This is a misleading and generic message for any cluster initialization 
> problem. It takes many hours of debugging to identify the root cause. A more 
> accurate error message could resolve such problems quickly.
> In one such instance, the Oozie log showed the following exception, while the 
> root cause was a CNF that the Hadoop client didn't return in the exception.
> {noformat}
>  JA009: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
>   at org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:412)
>   at org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:392)
>   at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:979)
>   at org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1134)
>   at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:228)
>   at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63)
>   at org.apache.oozie.command.XCommand.call(XCommand.java:281)
>   at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:323)
>   at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:252)
>   at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:174)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:744)
> Caused by: java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
>   at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
>   at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
>   at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
>   at org.apache.hadoop.mapred.JobClient.init(JobClient.java:470)
>   at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:449)
>   at org.apache.oozie.service.HadoopAccessorService$1.run(HadoopAccessorService.java:372)
>   at org.apache.oozie.service.HadoopAccessorService$1.run(HadoopAccessorService.java:370)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>   at org.apache.oozie.service.HadoopAccessorService.createJobClient(HadoopAccessorService.java:379)
>   at org.apache.oozie.action.hadoop.JavaActionExecutor.createJobClient(JavaActionExecutor.java:1185)
>   at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:927)
>   ... 10 more
> {noformat}
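
The description above asks for the underlying failure to reach the caller. A 
minimal sketch of one general Java technique for that, assuming initialization 
tries several candidate factories in turn: keep each underlying exception 
attached to the final IOException with Throwable.addSuppressed. This is not 
the attached patch and not Hadoop's actual code. ClusterInitSketch and 
ClientFactory are hypothetical names, and the message text is made up for the 
example.

```java
/** Hypothetical illustration; these types are not part of Hadoop. */
final class ClusterInitSketch {
    private ClusterInitSketch() {}

    /** Stand-in for one way of creating a cluster client; may fail for any reason. */
    interface ClientFactory {
        Object create() throws Exception;
    }

    /**
     * Tries each factory in order and returns the first client created.
     * If all fail, the thrown IOException carries every underlying failure
     * as a suppressed exception, so the stack trace shows the real causes.
     */
    static Object initialize(Iterable<ClientFactory> factories) throws java.io.IOException {
        java.io.IOException failure = new java.io.IOException(
            "Cannot initialize cluster client; see suppressed exceptions for details");
        for (ClientFactory factory : factories) {
            try {
                return factory.create();
            } catch (Exception e) {
                // Keep the underlying failure instead of discarding it.
                failure.addSuppressed(e);
            }
        }
        throw failure;
    }
}
```

With this shape, a ClassNotFoundException raised by one factory would be 
printed under the generic message as a "Suppressed:" entry in the stack trace.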



[jira] [Commented] (MAPREDUCE-6240) Hadoop client displays confusing error message

2016-04-18 Thread Bolke de Bruin (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15245548#comment-15245548
 ] 

Bolke de Bruin commented on MAPREDUCE-6240:
---

Where are we on this one? We have the issue now when compiling bigtop's sqoop2 
and we don't even know where to start debugging.

> Hadoop client displays confusing error message
> --
>
> Key: MAPREDUCE-6240
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6240
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.7.0
>Reporter: Mohammad Kamrul Islam
>Assignee: Gera Shegalov
> Attachments: MAPREDUCE-6240-gera.001.patch, 
> MAPREDUCE-6240-gera.001.patch, MAPREDUCE-6240-gera.002.patch, 
> MAPREDUCE-6240.003.patch, MAPREDUCE-6240.1.patch
>
>
> The Hadoop client often throws an exception with "java.io.IOException: Cannot 
> initialize Cluster. Please check your configuration for 
> mapreduce.framework.name and the correspond server addresses".
> This is a misleading and generic message for any cluster initialization 
> problem. It takes many hours of debugging to identify the root cause. A more 
> accurate error message could resolve such problems quickly.
> In one such instance, the Oozie log showed the following exception, while the 
> root cause was a CNF that the Hadoop client didn't return in the exception.
> {noformat}
>  JA009: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
>   at org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:412)
>   at org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:392)
>   at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:979)
>   at org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1134)
>   at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:228)
>   at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63)
>   at org.apache.oozie.command.XCommand.call(XCommand.java:281)
>   at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:323)
>   at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:252)
>   at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:174)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:744)
> Caused by: java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
>   at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
>   at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
>   at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
>   at org.apache.hadoop.mapred.JobClient.init(JobClient.java:470)
>   at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:449)
>   at org.apache.oozie.service.HadoopAccessorService$1.run(HadoopAccessorService.java:372)
>   at org.apache.oozie.service.HadoopAccessorService$1.run(HadoopAccessorService.java:370)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>   at org.apache.oozie.service.HadoopAccessorService.createJobClient(HadoopAccessorService.java:379)
>   at org.apache.oozie.action.hadoop.JavaActionExecutor.createJobClient(JavaActionExecutor.java:1185)
>   at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:927)
>   ... 10 more
> {noformat}


