[jira] [Commented] (MAPREDUCE-4088) Task stuck in JobLocalizer prevented other tasks on the same node from committing

2012-05-01 Robert Joseph Evans (JIRA)

[ https://issues.apache.org/jira/browse/MAPREDUCE-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265883#comment-13265883 ]

Robert Joseph Evans commented on MAPREDUCE-4088:


No, I am mostly looking to be sure that the code is working as expected.
Thanks, Ravi. +1

 Task stuck in JobLocalizer prevented other tasks on the same node from committing

 Key: MAPREDUCE-4088
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4088
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Affects Versions: 0.20.205.0
Reporter: Ravi Prakash
Assignee: Ravi Prakash
Priority: Critical
 Attachments: MAPREDUCE-4088.branch-1.patch, 
 MAPREDUCE-4088.branch-1.patch, MAPREDUCE-4088.patch, MAPREDUCE-4088.patch


 We saw that, as a result of HADOOP-6963, one task was stuck in this thread:
 Thread 23668: (state = IN_NATIVE)
  - java.io.UnixFileSystem.getBooleanAttributes0(java.io.File) @bci=0 (Compiled frame; information may be imprecise)
  - java.io.UnixFileSystem.getBooleanAttributes(java.io.File) @bci=2, line=228 (Compiled frame)
  - java.io.File.exists() @bci=20, line=733 (Compiled frame)
  - org.apache.hadoop.fs.FileUtil.getDU(java.io.File) @bci=3, line=446 (Compiled frame)
  - org.apache.hadoop.fs.FileUtil.getDU(java.io.File) @bci=52, line=455 (Compiled frame)
  - org.apache.hadoop.fs.FileUtil.getDU(java.io.File) @bci=52, line=455 (Compiled frame)
 
  TONS MORE OF THIS SAME LINE
  - org.apache.hadoop.fs.FileUtil.getDU(java.io.File) @bci=52, line=455 (Compiled frame)
 .
 .
  - org.apache.hadoop.fs.FileUtil.getDU(java.io.File) @bci=52, line=455 (Compiled frame)
  - org.apache.hadoop.fs.FileUtil.getDU(java.io.File) @bci=52, line=455 (Interpreted frame)
 ne=451 (Interpreted frame)
  - org.apache.hadoop.mapred.JobLocalizer.downloadPrivateCacheObjects(org.apache.hadoop.conf.Configuration, java.net.URI[], org.apache.hadoop.fs.Path[], long[], boolean[], boolean) @bci=150, line=324 (Interpreted frame)
  - org.apache.hadoop.mapred.JobLocalizer.downloadPrivateCache(org.apache.hadoop.conf.Configuration) @bci=40, line=349 (Interpreted frame) 51, line=383 (Interpreted frame)
  - org.apache.hadoop.mapred.JobLocalizer.runSetup(java.lang.String, java.lang.String, org.apache.hadoop.fs.Path, org.apache.hadoop.mapred.TaskUmbilicalProtocol) @bci=46, line=477 (Interpreted frame)
  - org.apache.hadoop.mapred.JobLocalizer$3.run() @bci=20, line=534 (Interpreted frame)
  - org.apache.hadoop.mapred.JobLocalizer$3.run() @bci=1, line=531 (Interpreted frame)
  - java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction, java.security.AccessControlContext) @bci=0 (Interpreted frame)
  - javax.security.auth.Subject.doAs(javax.security.auth.Subject, java.security.PrivilegedExceptionAction) @bci=42, line=396 (Interpreted frame)
  - org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction) @bci=14, line=1082 (Interpreted frame)
  - org.apache.hadoop.mapred.JobLocalizer.main(java.lang.String[]) @bci=266, line=530 (Interpreted frame)
 Meanwhile, all other tasks on the same node were stuck in threads like this one:
 Thread 32141: (state = BLOCKED)
  - java.lang.Thread.sleep(long) @bci=0 (Interpreted frame)
  - org.apache.hadoop.mapred.Task.commit(org.apache.hadoop.mapred.TaskUmbilicalProtocol, org.apache.hadoop.mapred.Task$TaskReporter, org.apache.hadoop.mapreduce.OutputCommitter) @bci=24, line=980 (Compiled frame)
  - org.apache.hadoop.mapred.Task.done(org.apache.hadoop.mapred.TaskUmbilicalProtocol, org.apache.hadoop.mapred.Task$TaskReporter) @bci=146, line=871 (Interpreted frame)
  - org.apache.hadoop.mapred.ReduceTask.run(org.apache.hadoop.mapred.JobConf, org.apache.hadoop.mapred.TaskUmbilicalProtocol) @bci=470, line=423 (Interpreted frame)
  - org.apache.hadoop.mapred.Child$4.run() @bci=29, line=255 (Interpreted frame)
  - java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction, java.security.AccessControlContext) @bci=0 (Interpreted frame)
  - javax.security.auth.Subject.doAs(javax.security.auth.Subject, java.security.PrivilegedExceptionAction) @bci=42, line=396 (Interpreted frame)
  - org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction) @bci=14, line=1082 (Interpreted frame)
  - org.apache.hadoop.mapred.Child.main(java.lang.String[]) @bci=738, line=249 (Interpreted frame)
 This should never happen. A stuck task should never prevent other tasks from 
 different jobs on the same node from committing.
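The repeated `FileUtil.getDU(java.io.File)` frames at line 455 in the first dump are the shape of a depth-first recursive walk: each nested directory adds one more frame, and the innermost `getDU` frame is in `File.exists()`, the native call the thread was sitting in. The sketch below only illustrates that shape; it is not Hadoop's `FileUtil.getDU`, and the class name `DiskUsageSketch` is hypothetical. Skipping symbolic links is an assumption of this sketch (one way to keep a link that points back up the tree from making the walk loop), checked with `java.nio.file.Files.isSymbolicLink` (Java 7 or later).

```java
import java.io.File;
import java.nio.file.Files;

// Hypothetical sketch of a recursive disk-usage walk; NOT Hadoop's FileUtil.getDU.
public class DiskUsageSketch {

    // Returns the total size in bytes of regular files under 'path'.
    // Each nested directory adds one stack frame, which is why a deep tree
    // shows many repeated getDU frames in a thread dump.
    public static long getDU(File path) {
        if (!path.exists()) {
            return 0L; // missing or dangling entries count as empty
        }
        // Assumption of this sketch: never follow symbolic links, so a link
        // pointing back up the tree cannot make the recursion loop.
        if (Files.isSymbolicLink(path.toPath())) {
            return 0L;
        }
        if (!path.isDirectory()) {
            return path.length(); // regular file: its size in bytes
        }
        long size = 0L;
        File[] children = path.listFiles();
        if (children != null) { // null if the directory cannot be read
            for (File child : children) {
                size += getDU(child); // one extra frame per level of nesting
            }
        }
        return size;
    }
}
```

For example, `DiskUsageSketch.getDU(new File("/tmp/job-cache"))` returns the byte total of regular files under that (hypothetical) directory; directories and links themselves contribute nothing.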

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 

[jira] [Commented] (MAPREDUCE-4088) Task stuck in JobLocalizer prevented other tasks on the same node from committing

2012-04-30 Robert Joseph Evans (JIRA)

[ https://issues.apache.org/jira/browse/MAPREDUCE-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264958#comment-13264958 ]

Robert Joseph Evans commented on MAPREDUCE-4088:


The patch looks good, but I am not an expert on this part of the code. Have you 
run it on a cluster at all? If so, what tests have you run with it?

[jira] [Commented] (MAPREDUCE-4088) Task stuck in JobLocalizer prevented other tasks on the same node from committing

2012-04-30 Ravi Prakash (JIRA)

[ https://issues.apache.org/jira/browse/MAPREDUCE-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265108#comment-13265108 ]

Ravi Prakash commented on MAPREDUCE-4088:

I ran the patched branch-1 build on my local dev box. The tasks were cleaned up 
successfully as usual after a simple wordcount job. Are you looking for 
specific tests?

[jira] [Commented] (MAPREDUCE-4088) Task stuck in JobLocalizer prevented other tasks on the same node from committing

2012-04-27 Hadoop QA (JIRA)

[ https://issues.apache.org/jira/browse/MAPREDUCE-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13263714#comment-13263714 ]

Hadoop QA commented on MAPREDUCE-4088:

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12524865/MAPREDUCE-4088.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 1 new or modified test 
files.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2319//console

This message is automatically generated.

[jira] [Commented] (MAPREDUCE-4088) Task stuck in JobLocalizer prevented other tasks on the same node from committing

2012-04-27 Ravi Prakash (JIRA)

[ https://issues.apache.org/jira/browse/MAPREDUCE-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13263723#comment-13263723 ]

Ravi Prakash commented on MAPREDUCE-4088:

0.23 and later are not afflicted with this problem because of their different 
architecture (TaskCleanerImpl tries to abortTask).

[jira] [Commented] (MAPREDUCE-4088) Task stuck in JobLocalizer prevented other tasks on the same node from committing

2012-04-27 Hadoop QA (JIRA)

[ https://issues.apache.org/jira/browse/MAPREDUCE-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13263731#comment-13263731 ]

Hadoop QA commented on MAPREDUCE-4088:

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12524866/MAPREDUCE-4088.branch-1.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 1 new or modified test 
files.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2320//console

This message is automatically generated.

[jira] [Commented] (MAPREDUCE-4088) Task stuck in JobLocalizer prevented other tasks on the same node from committing

2012-04-27 Robert Joseph Evans (JIRA)

[ https://issues.apache.org/jira/browse/MAPREDUCE-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13263802#comment-13263802 ]

Robert Joseph Evans commented on MAPREDUCE-4088:


Jenkins is not smart enough to apply the patch to branch-1 instead of trunk 
and run the tests.  You need to run test-patch manually yourself.

I looked at the patch, and it looks OK for the most part.

Inside taskCleanupThread you are building a StringBuffer and appending things 
to it, but that StringBuffer is never used anywhere.  Did you miss a log 
statement there?  I would like a log statement saying what is happening and 
why the cleanup was suspended.

Have you run the tests?
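A minimal sketch of what the review asks for, assuming nothing about the actual patch: the diagnostic message is built and then actually logged. The class `CleanupSuspensionLog`, the method `describeSuspension`, and its parameters are hypothetical, and `java.util.logging` stands in for whatever logger the TaskTracker uses. It uses `StringBuilder`, the unsynchronized counterpart of `StringBuffer`, because the message is built and consumed by a single thread.

```java
import java.util.List;
import java.util.logging.Logger;

// Hypothetical helper: builds a diagnostic message for a suspended cleanup
// and actually logs it, instead of leaving the buffer unused.
public class CleanupSuspensionLog {
    private static final Logger LOG =
        Logger.getLogger(CleanupSuspensionLog.class.getName());

    // Describes which task is holding up cleanup, why, and which tasks wait on it.
    public static String describeSuspension(String blockingTask, String reason,
                                            List<String> waitingTasks) {
        StringBuilder msg = new StringBuilder();
        msg.append("Cleanup suspended: task ").append(blockingTask)
           .append(" is ").append(reason)
           .append("; waiting tasks: ")
           .append(waitingTasks.isEmpty() ? "none" : String.join(", ", waitingTasks));
        String text = msg.toString();
        LOG.warning(text); // the step the review found missing: emit the message
        return text;       // returned so callers and tests can inspect it
    }
}
```

Returning the text as well as logging it keeps the message easy to check in a unit test.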

 Task stuck in JobLocalizer prevented other tasks on the same node from 
 committing
 -

 Key: MAPREDUCE-4088
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4088
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Affects Versions: 0.20.205.0
Reporter: Ravi Prakash
Assignee: Ravi Prakash
Priority: Critical
 Attachments: MAPREDUCE-4088.branch-1.patch, MAPREDUCE-4088.patch, 
 MAPREDUCE-4088.patch



[jira] [Commented] (MAPREDUCE-4088) Task stuck in JobLocalizer prevented other tasks on the same node from committing

2012-04-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264012#comment-13264012
 ] 

Hadoop QA commented on MAPREDUCE-4088:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12524929/MAPREDUCE-4088.branch-1.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 1 new or modified test 
files.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2325//console

This message is automatically generated.

 Task stuck in JobLocalizer prevented other tasks on the same node from 
 committing
 -

 Key: MAPREDUCE-4088
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4088
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Affects Versions: 0.20.205.0
Reporter: Ravi Prakash
Assignee: Ravi Prakash
Priority: Critical
 Attachments: MAPREDUCE-4088.branch-1.patch, 
 MAPREDUCE-4088.branch-1.patch, MAPREDUCE-4088.patch, MAPREDUCE-4088.patch



[jira] [Commented] (MAPREDUCE-4088) Task stuck in JobLocalizer prevented other tasks on the same node from committing

2012-04-26 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13263197#comment-13263197
 ] 

Ravi Prakash commented on MAPREDUCE-4088:
-

Courtesy Koji

{quote}
Each TaskTracker has a single thread for taskCleanup taking work from
tasksToCleanup queue.

For each task to clean up, it first calls 
  checkJobStatusAndWait(action);

And inside
 {noformat}
  private void checkJobStatusAndWait(TaskTrackerAction action)
  ...
    synchronized (runningJobs) {
      rjob = runningJobs.get(jobId);
    }
    if (rjob != null) {
      synchronized (rjob) {
        while (rjob.localizing) {
          rjob.wait();
        }
      }
    }
 {noformat}

So this thread would wait while the task is being localized.
Even if one task is hung on localization, the entire cleanup is stopped.
{quote}

East or west! Koji is the best!
Soda lemon ginger pop! Koji is on the top!
Yyaayyy yaa yaa for Koji!
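To make the failure mode in the quote concrete, here is a small, self-contained simulation. It is a sketch, not the TaskTracker code: `RunningJob`, the job ids and `run()` are hypothetical stand-ins, and the cleanup queue holds plain job ids instead of `TaskTrackerAction` objects. A single cleanup thread drains the queue and, like the quoted `checkJobStatusAndWait`, waits with no timeout while a job is localizing, so one job whose localization never finishes stalls cleanup for a different job queued behind it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class CleanupStallDemo {
    // Hypothetical stand-in for the TaskTracker's per-job state.
    static final class RunningJob {
        boolean localizing;
        RunningJob(boolean localizing) { this.localizing = localizing; }
    }

    // Same shape as the quoted loop: waits with no timeout while the job is localizing.
    static void checkJobStatusAndWait(Map<String, RunningJob> runningJobs, String jobId)
            throws InterruptedException {
        RunningJob rjob;
        synchronized (runningJobs) {
            rjob = runningJobs.get(jobId);
        }
        if (rjob != null) {
            synchronized (rjob) {
                while (rjob.localizing) {
                    rjob.wait();
                }
            }
        }
    }

    // Runs one cleanup thread over two queued jobs; returns what it cleaned within 500 ms.
    static List<String> run() {
        Map<String, RunningJob> runningJobs = new HashMap<>();
        runningJobs.put("job_stuck", new RunningJob(true)); // localization never finishes
        runningJobs.put("job_ok", new RunningJob(false));   // a different, healthy job
        BlockingQueue<String> tasksToCleanup = new LinkedBlockingQueue<>();
        tasksToCleanup.add("job_stuck");
        tasksToCleanup.add("job_ok");                       // queued behind the stuck job
        List<String> cleaned = Collections.synchronizedList(new ArrayList<>());

        // The single cleanup thread: takes work in queue order, one item at a time.
        Thread cleanupThread = new Thread(() -> {
            try {
                while (true) {
                    String jobId = tasksToCleanup.take();
                    checkJobStatusAndWait(runningJobs, jobId); // blocks forever on job_stuck
                    cleaned.add(jobId);
                }
            } catch (InterruptedException e) {
                // Interrupted by run(): stop the simulated cleanup thread.
            }
        });
        cleanupThread.setDaemon(true);
        cleanupThread.start();
        try {
            cleanupThread.join(500); // observe the thread for half a second
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        List<String> snapshot;
        synchronized (cleaned) {
            snapshot = new ArrayList<>(cleaned);
        }
        cleanupThread.interrupt();
        return snapshot;
    }

    public static void main(String[] args) {
        System.out.println("cleaned after 500 ms: " + run()); // prints "cleaned after 500 ms: []"
    }
}
```

`job_ok` is never cleaned up even though it is not localizing: the only cleanup thread is parked in `wait()` on `job_stuck`.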

 Task stuck in JobLocalizer prevented other tasks on the same node from 
 committing
 -

 Key: MAPREDUCE-4088
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4088
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Affects Versions: 0.20.205.0
Reporter: Ravi Prakash
Assignee: Ravi Prakash
Priority: Critical


[jira] [Commented] (MAPREDUCE-4088) Task stuck in JobLocalizer prevented other tasks on the same node from committing

2012-04-26 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13263246#comment-13263246
 ] 

Ravi Prakash commented on MAPREDUCE-4088:
-

Does anyone have any suggestions on how to fix this? I'm thinking of this: 
 * We could add a timeout to the wait in checkJobStatusAndWait. 
 * If we time out, we simply put the action back into the queue (hoping it 
succeeds next time around).
 * This might make the isIdle method more complicated :(
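A minimal sketch of the timeout-and-requeue idea, assuming a simplified model rather than the real TaskTracker: `RunningJob`, `run()`, the job ids and the 100 ms timeout are hypothetical, the queue holds job ids instead of `TaskTrackerAction` objects, and `java.util.logging` stands in for whatever logger the TaskTracker actually uses. `checkJobStatusAndWait` returns `false` on timeout, and the caller logs why the cleanup was deferred (the log statement requested in review) and puts the item back at the end of the queue.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.logging.Logger;

public class TimedCleanupSketch {
    private static final Logger LOG = Logger.getLogger(TimedCleanupSketch.class.getName());

    // Hypothetical stand-in for the TaskTracker's per-job state.
    static final class RunningJob {
        boolean localizing;
        RunningJob(boolean localizing) { this.localizing = localizing; }
    }

    // Like the quoted loop, but gives up after timeoutMs.
    // Returns true if the job is unknown or done localizing, false on timeout.
    static boolean checkJobStatusAndWait(Map<String, RunningJob> runningJobs,
                                         String jobId, long timeoutMs)
            throws InterruptedException {
        RunningJob rjob;
        synchronized (runningJobs) {
            rjob = runningJobs.get(jobId);
        }
        if (rjob == null) {
            return true;
        }
        long deadline = System.currentTimeMillis() + timeoutMs;
        synchronized (rjob) {
            while (rjob.localizing) {           // loop also guards against spurious wakeups
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    return false;               // still localizing: let the caller decide
                }
                rjob.wait(remaining);
            }
        }
        return true;
    }

    // Simulates two queued cleanups; returns the job ids cleaned within 500 ms.
    static List<String> run() {
        Map<String, RunningJob> runningJobs = new HashMap<>();
        runningJobs.put("job_stuck", new RunningJob(true)); // localization never finishes
        runningJobs.put("job_ok", new RunningJob(false));
        BlockingQueue<String> tasksToCleanup = new LinkedBlockingQueue<>();
        tasksToCleanup.add("job_stuck");
        tasksToCleanup.add("job_ok");
        List<String> cleaned = Collections.synchronizedList(new ArrayList<>());

        Thread cleanupThread = new Thread(() -> {
            try {
                while (true) {
                    String jobId = tasksToCleanup.take();
                    if (checkJobStatusAndWait(runningJobs, jobId, 100)) {
                        cleaned.add(jobId);
                    } else {
                        // Say what happened and why, then retry later instead of blocking.
                        LOG.info("Deferring cleanup for " + jobId
                                + ": job is still localizing; requeueing");
                        tasksToCleanup.put(jobId);
                    }
                }
            } catch (InterruptedException e) {
                // Interrupted by run(): stop the simulated cleanup thread.
            }
        });
        cleanupThread.setDaemon(true);
        cleanupThread.start();
        try {
            cleanupThread.join(500); // observe the thread for half a second
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        List<String> snapshot;
        synchronized (cleaned) {
            snapshot = new ArrayList<>(cleaned);
        }
        cleanupThread.interrupt();
        return snapshot;
    }

    public static void main(String[] args) {
        System.out.println("cleaned after 500 ms: " + run());
    }
}
```

Here `job_ok` is cleaned up shortly after the first timeout, while `job_stuck` keeps cycling through `tasksToCleanup`. In this sketch the queue therefore never drains while a job stays stuck; if `isIdle` treats an empty cleanup queue as idle (an assumption here), that is one way the requeue could complicate it.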


 Task stuck in JobLocalizer prevented other tasks on the same node from 
 committing
 -

 Key: MAPREDUCE-4088
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4088
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Affects Versions: 0.20.205.0
Reporter: Ravi Prakash
Assignee: Ravi Prakash
Priority: Critical

 We saw that, as a result of HADOOP-6963, one task was stuck here:
 Thread 23668: (state = IN_NATIVE)
  - java.io.UnixFileSystem.getBooleanAttributes0(java.io.File) @bci=0 
 (Compiled frame; information may be imprecise)
  - java.io.UnixFileSystem.getBooleanAttributes(java.io.File) @bci=2, line=228 
 (Compiled frame)
  - java.io.File.exists() @bci=20, line=733 (Compiled frame)
  - org.apache.hadoop.fs.FileUtil.getDU(java.io.File) @bci=3, line=446 
 (Compiled frame)
  - org.apache.hadoop.fs.FileUtil.getDU(java.io.File) @bci=52, line=455 
 (Compiled frame)
  - org.apache.hadoop.fs.FileUtil.getDU(java.io.File) @bci=52, line=455 
 (Compiled frame)
 
  TONS MORE OF THIS SAME LINE
  - org.apache.hadoop.fs.FileUtil.getDU(java.io.File) @bci=52, line=455 
 (Compiled frame)
 .
 .
  - org.apache.hadoop.fs.FileUtil.getDU(java.io.File) @bci=52, line=455 
 (Compiled frame)
  - org.apache.hadoop.fs.FileUtil.getDU(java.io.File) @bci=52, line=455 
 (Interpreted frame)
 ne=451 (Interpreted frame)
  - 
 org.apache.hadoop.mapred.JobLocalizer.downloadPrivateCacheObjects(org.apache.hadoop.conf.Configuration,
  java.net.URI[], org.apache.hadoop.fs.Path[], long[], boolean[], boolean) 
 @bci=150, line=324 (Interpreted frame)
  - 
 org.apache.hadoop.mapred.JobLocalizer.downloadPrivateCache(org.apache.hadoop.conf.Configuration)
  @bci=40, line=349 (Interpreted frame) 51, line=383 (Interpreted frame)
  - org.apache.hadoop.mapred.JobLocalizer.runSetup(java.lang.String, 
 java.lang.String, org.apache.hadoop.fs.Path, 
 org.apache.hadoop.mapred.TaskUmbilicalProtocol) @bci=46, line=477 
 (Interpreted frame)
  - org.apache.hadoop.mapred.JobLocalizer$3.run() @bci=20, line=534 
 (Interpreted frame)
  - org.apache.hadoop.mapred.JobLocalizer$3.run() @bci=1, line=531 
 (Interpreted frame)
  - 
 java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction,
  java.security.AccessControlContext) @bci=0 (Interpreted frame)
  - javax.security.auth.Subject.doAs(javax.security.auth.Subject, 
 java.security.PrivilegedExceptionAction) @bci=42, line=396 (Interpreted frame)
  - 
 org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction)
  @bci=14, line=1082 (Interpreted frame)
  - org.apache.hadoop.mapred.JobLocalizer.main(java.lang.String[]) @bci=266, 
 line=530 (Interpreted frame)
 While all other tasks on the same node were stuck here:
 Thread 32141: (state = BLOCKED)
  - java.lang.Thread.sleep(long) @bci=0 (Interpreted frame)
  - 
 org.apache.hadoop.mapred.Task.commit(org.apache.hadoop.mapred.TaskUmbilicalProtocol,
  org.apache.hadoop.mapred.Task$TaskReporter, 
 org.apache.hadoop.mapreduce.OutputCommitter) @bci=24, line=980 (Compiled 
 frame)
  - 
 org.apache.hadoop.mapred.Task.done(org.apache.hadoop.mapred.TaskUmbilicalProtocol,
  org.apache.hadoop.mapred.Task$TaskReporter) @bci=146, line=871 (Interpreted 
 frame)
  - org.apache.hadoop.mapred.ReduceTask.run(org.apache.hadoop.mapred.JobConf, 
 org.apache.hadoop.mapred.TaskUmbilicalProtocol) @bci=470, line=423 
 (Interpreted frame)
  - org.apache.hadoop.mapred.Child$4.run() @bci=29, line=255 (Interpreted 
 frame)
  - 
 java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction,
  java.security.AccessControlContext) @bci=0 (Interpreted frame)
  - javax.security.auth.Subject.doAs(javax.security.auth.Subject, 
 java.security.PrivilegedExceptionAction) @bci=42, line=396 (Interpreted frame)
  - 
 org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction)
  @bci=14, line=1082 (Interpreted frame)
  - org.apache.hadoop.mapred.Child.main(java.lang.String[]) @bci=738, line=249 
 (Interpreted frame)
 This should never happen. A stuck task should never prevent other tasks from 
 different jobs on the same node from committing.

--
This message is automatically generated by JIRA.
If you think it was sent