[jira] [Commented] (MAPREDUCE-4088) Task stuck in JobLocalizer prevented other tasks on the same node from committing
[ https://issues.apache.org/jira/browse/MAPREDUCE-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13265883#comment-13265883 ]

Robert Joseph Evans commented on MAPREDUCE-4088:

No I am looking mostly at being sure that the code is working as expected. Thanks Ravi, +1

Task stuck in JobLocalizer prevented other tasks on the same node from committing

Key: MAPREDUCE-4088
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4088
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv1
Affects Versions: 0.20.205.0
Reporter: Ravi Prakash
Assignee: Ravi Prakash
Priority: Critical
Attachments: MAPREDUCE-4088.branch-1.patch, MAPREDUCE-4088.branch-1.patch, MAPREDUCE-4088.patch, MAPREDUCE-4088.patch

We saw that as a result of HADOOP-6963, one task was stuck in this

Thread 23668: (state = IN_NATIVE)
 - java.io.UnixFileSystem.getBooleanAttributes0(java.io.File) @bci=0 (Compiled frame; information may be imprecise)
 - java.io.UnixFileSystem.getBooleanAttributes(java.io.File) @bci=2, line=228 (Compiled frame)
 - java.io.File.exists() @bci=20, line=733 (Compiled frame)
 - org.apache.hadoop.fs.FileUtil.getDU(java.io.File) @bci=3, line=446 (Compiled frame)
 - org.apache.hadoop.fs.FileUtil.getDU(java.io.File) @bci=52, line=455 (Compiled frame)
 - org.apache.hadoop.fs.FileUtil.getDU(java.io.File) @bci=52, line=455 (Compiled frame)
 TONS MORE OF THIS SAME LINE
 - org.apache.hadoop.fs.FileUtil.getDU(java.io.File) @bci=52, line=455 (Compiled frame)
 .
 .
 - org.apache.hadoop.fs.FileUtil.getDU(java.io.File) @bci=52, line=455 (Compiled frame)
 - org.apache.hadoop.fs.FileUtil.getDU(java.io.File) @bci=52, line=455 (Interpreted frame)
 ne=451 (Interpreted frame)
 - org.apache.hadoop.mapred.JobLocalizer.downloadPrivateCacheObjects(org.apache.hadoop.conf.Configuration, java.net.URI[], org.apache.hadoop.fs.Path[], long[], boolean[], boolean) @bci=150, line=324 (Interpreted frame)
 - org.apache.hadoop.mapred.JobLocalizer.downloadPrivateCache(org.apache.hadoop.conf.Configuration) @bci=40, line=349 (Interpreted frame)
 51, line=383 (Interpreted frame)
 - org.apache.hadoop.mapred.JobLocalizer.runSetup(java.lang.String, java.lang.String, org.apache.hadoop.fs.Path, org.apache.hadoop.mapred.TaskUmbilicalProtocol) @bci=46, line=477 (Interpreted frame)
 - org.apache.hadoop.mapred.JobLocalizer$3.run() @bci=20, line=534 (Interpreted frame)
 - org.apache.hadoop.mapred.JobLocalizer$3.run() @bci=1, line=531 (Interpreted frame)
 - java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction, java.security.AccessControlContext) @bci=0 (Interpreted frame)
 - javax.security.auth.Subject.doAs(javax.security.auth.Subject, java.security.PrivilegedExceptionAction) @bci=42, line=396 (Interpreted frame)
 - org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction) @bci=14, line=1082 (Interpreted frame)
 - org.apache.hadoop.mapred.JobLocalizer.main(java.lang.String[]) @bci=266, line=530 (Interpreted frame)

While all other tasks on the same node were stuck in

Thread 32141: (state = BLOCKED)
 - java.lang.Thread.sleep(long) @bci=0 (Interpreted frame)
 - org.apache.hadoop.mapred.Task.commit(org.apache.hadoop.mapred.TaskUmbilicalProtocol, org.apache.hadoop.mapred.Task$TaskReporter, org.apache.hadoop.mapreduce.OutputCommitter) @bci=24, line=980 (Compiled frame)
 - org.apache.hadoop.mapred.Task.done(org.apache.hadoop.mapred.TaskUmbilicalProtocol, org.apache.hadoop.mapred.Task$TaskReporter) @bci=146, line=871 (Interpreted frame)
 - org.apache.hadoop.mapred.ReduceTask.run(org.apache.hadoop.mapred.JobConf, org.apache.hadoop.mapred.TaskUmbilicalProtocol) @bci=470, line=423 (Interpreted frame)
 - org.apache.hadoop.mapred.Child$4.run() @bci=29, line=255 (Interpreted frame)
 - java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction, java.security.AccessControlContext) @bci=0 (Interpreted frame)
 - javax.security.auth.Subject.doAs(javax.security.auth.Subject, java.security.PrivilegedExceptionAction) @bci=42, line=396 (Interpreted frame)
 - org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction) @bci=14, line=1082 (Interpreted frame)
 - org.apache.hadoop.mapred.Child.main(java.lang.String[]) @bci=738, line=249 (Interpreted frame)

This should never happen. A stuck task should never prevent other tasks from different jobs on the same node from committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
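
The first trace in the issue description shows FileUtil.getDU calling itself many times (line=455) before the thread ends up in File.exists(). Purely as an illustration, and not the code from the attached patches or from HADOOP-6963, here is a minimal sketch of a disk-usage walk that keeps its work list on the heap instead of the call stack and never follows symbolic links, so a link pointing back up the tree cannot make the walk loop. The names DiskUsageSketch, diskUsage and makeDemoTree are hypothetical, and counting only regular-file sizes is an assumption of this sketch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration; not Hadoop's FileUtil.
class DiskUsageSketch {

    /** Sums the sizes of regular files under root, skipping symbolic links. */
    static long diskUsage(Path root) {
        long total = 0L;
        Deque<Path> pending = new ArrayDeque<>();  // explicit stack instead of recursion
        pending.push(root);
        while (!pending.isEmpty()) {
            Path current = pending.pop();
            if (Files.isSymbolicLink(current)) {
                continue;  // never follow links, so cycles cannot occur
            }
            try {
                if (Files.isRegularFile(current, LinkOption.NOFOLLOW_LINKS)) {
                    total += Files.size(current);
                } else if (Files.isDirectory(current, LinkOption.NOFOLLOW_LINKS)) {
                    try (DirectoryStream<Path> children = Files.newDirectoryStream(current)) {
                        for (Path child : children) {
                            pending.push(child);
                        }
                    }
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return total;
    }

    /** Builds a small temp tree: a 10-byte file, a 5-byte file, and a link back to the root. */
    static Path makeDemoTree() {
        try {
            Path root = Files.createTempDirectory("du-sketch");
            Files.write(root.resolve("a.txt"), new byte[10]);
            Path sub = Files.createDirectory(root.resolve("sub"));
            Files.write(sub.resolve("b.txt"), new byte[5]);
            try {
                Files.createSymbolicLink(sub.resolve("loop"), root);
            } catch (UnsupportedOperationException | IOException e) {
                // Symbolic links are unavailable here; the demo still works without the cycle.
            }
            return root;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(diskUsage(makeDemoTree()));
    }
}
```

The sketch does nothing about a file system call that itself blocks (the IN_NATIVE state in the trace); it only bounds the walk.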
[jira] [Commented] (MAPREDUCE-4088) Task stuck in JobLocalizer prevented other tasks on the same node from committing
[ https://issues.apache.org/jira/browse/MAPREDUCE-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264958#comment-13264958 ]

Robert Joseph Evans commented on MAPREDUCE-4088:

The patch looks good, but I am not an expert on this part of the code. Have you run it on a cluster at all? If so, what tests have you run with it?
[jira] [Commented] (MAPREDUCE-4088) Task stuck in JobLocalizer prevented other tasks on the same node from committing
[ https://issues.apache.org/jira/browse/MAPREDUCE-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13265108#comment-13265108 ]

Ravi Prakash commented on MAPREDUCE-4088:

I ran the patched branch-1 build on my local dev box. The tasks were cleaned up successfully as usual after a simple wordcount job. Are you looking for specific tests?
[jira] [Commented] (MAPREDUCE-4088) Task stuck in JobLocalizer prevented other tasks on the same node from committing
[ https://issues.apache.org/jira/browse/MAPREDUCE-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13263714#comment-13263714 ]

Hadoop QA commented on MAPREDUCE-4088:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12524865/MAPREDUCE-4088.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
-1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2319//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-4088) Task stuck in JobLocalizer prevented other tasks on the same node from committing
[ https://issues.apache.org/jira/browse/MAPREDUCE-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13263723#comment-13263723 ]

Ravi Prakash commented on MAPREDUCE-4088:

0.23 and later are not afflicted with this problem because of the different architecture (TaskCleanerImpl tries to abortTask).
[jira] [Commented] (MAPREDUCE-4088) Task stuck in JobLocalizer prevented other tasks on the same node from committing
[ https://issues.apache.org/jira/browse/MAPREDUCE-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13263731#comment-13263731 ]

Hadoop QA commented on MAPREDUCE-4088:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12524866/MAPREDUCE-4088.branch-1.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
-1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2320//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-4088) Task stuck in JobLocalizer prevented other tasks on the same node from committing
[ https://issues.apache.org/jira/browse/MAPREDUCE-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13263802#comment-13263802 ]

Robert Joseph Evans commented on MAPREDUCE-4088:

Jenkins is not smart enough to apply the patch to branch-1 instead of trunk and run the tests, so you need to run test-patch manually yourself.

I looked at the patch, and it looks OK for the most part. Inside taskCleanupThread you are creating a StringBuffer and appending things to it, but that StringBuffer is never used anywhere. Did you miss a log statement there? I would like a log statement saying what is happening and why the cleanup was suspended.

Have you run the tests?
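
The review point about the unused StringBuffer can be illustrated with a small sketch: build the message, then actually hand it to a logger. This is not the code from the patch, which is not shown in this thread. The class name, method names, parameters and message wording are all hypothetical, and java.util.logging is used only to keep the example self-contained rather than to mirror the logging setup in the Hadoop source.

```java
import java.util.logging.Logger;

// Hypothetical illustration of "build the message and actually log it".
class CleanupLogSketch {
    private static final Logger LOG = Logger.getLogger(CleanupLogSketch.class.getName());

    /** Builds the message separately so that it can be checked in a test. */
    static String describeSuspension(String jobId, String reason, long waitedMillis) {
        StringBuilder sb = new StringBuilder();
        sb.append("Suspending cleanup for job ").append(jobId)
          .append(" (reason: ").append(reason).append(")")
          .append(" after waiting ").append(waitedMillis).append(" ms");
        return sb.toString();
    }

    /** The step the review asks about: the built string is passed to the logger. */
    static void suspendCleanup(String jobId, String reason, long waitedMillis) {
        LOG.info(describeSuspension(jobId, reason, waitedMillis));
    }

    public static void main(String[] args) {
        suspendCleanup("job_1", "job is still localizing", 5000L);
    }
}
```

Keeping the message construction in its own method makes it easy to assert on the text without capturing log output.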
[jira] [Commented] (MAPREDUCE-4088) Task stuck in JobLocalizer prevented other tasks on the same node from committing
[ https://issues.apache.org/jira/browse/MAPREDUCE-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264012#comment-13264012 ] Hadoop QA commented on MAPREDUCE-4088: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12524929/MAPREDUCE-4088.branch-1.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 1 new or modified test files. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2325//console This message is automatically generated.
[jira] [Commented] (MAPREDUCE-4088) Task stuck in JobLocalizer prevented other tasks on the same node from committing
[ https://issues.apache.org/jira/browse/MAPREDUCE-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13263197#comment-13263197 ] Ravi Prakash commented on MAPREDUCE-4088:
Courtesy Koji
{quote}
Each TaskTracker has a single thread for taskCleanup, taking work from the tasksToCleanup queue. For each task to clean up, it first calls checkJobStatusAndWait(action); and inside
{noformat}
private void checkJobStatusAndWait(TaskTrackerAction action)
  ...
  synchronized (runningJobs) {
    rjob = runningJobs.get(jobId);
  }
  if (rjob != null) {
    synchronized (rjob) {
      while (rjob.localizing) {
        rjob.wait();
      }
    }
  }
{noformat}
So this thread waits while the task is being localized. Even if only one task is hung on localization, the entire cleanup is stopped.
{quote}
East or west! Koji is the best! Soda lemon ginger pop! Koji is on the top! Yyaayyy yaa yaa for Koji!
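The stall Koji describes in the comment above can be reproduced in isolation. What follows is a minimal sketch, not the TaskTracker source: the class CleanupStallSketch, its helper methods, and the use of plain job-ID strings in place of TaskTrackerAction are all assumptions made for illustration. Only the shape of checkJobStatusAndWait follows the quoted snippet.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for the TaskTracker cleanup path; not the Hadoop source.
class CleanupStallSketch {
    /** Per-job state; only the flag used in the quoted snippet. */
    static class RunningJob {
        boolean localizing; // guarded by synchronized (this)
    }

    private final Map<String, RunningJob> runningJobs = new HashMap<>();
    private final BlockingQueue<String> tasksToCleanup = new LinkedBlockingQueue<>();
    final List<String> cleaned = new CopyOnWriteArrayList<>();

    /** Registers a job and queues one cleanup action for it (a simplification). */
    void addJob(String jobId, boolean localizing) {
        RunningJob rjob = new RunningJob();
        rjob.localizing = localizing;
        synchronized (runningJobs) {
            runningJobs.put(jobId, rjob);
        }
        tasksToCleanup.add(jobId);
    }

    /** Marks localization as finished and wakes any thread waiting on the job. */
    void finishLocalizing(String jobId) {
        RunningJob rjob;
        synchronized (runningJobs) {
            rjob = runningJobs.get(jobId);
        }
        synchronized (rjob) {
            rjob.localizing = false;
            rjob.notifyAll();
        }
    }

    /** Same shape as the quoted checkJobStatusAndWait: an unbounded wait. */
    private void checkJobStatusAndWait(String jobId) throws InterruptedException {
        RunningJob rjob;
        synchronized (runningJobs) {
            rjob = runningJobs.get(jobId);
        }
        if (rjob != null) {
            synchronized (rjob) {
                while (rjob.localizing) {
                    rjob.wait();
                }
            }
        }
    }

    /** Single cleanup thread: handles `count` queued actions strictly in order. */
    Thread startCleanupThread(int count) {
        Thread t = new Thread(() -> {
            try {
                for (int i = 0; i < count; i++) {
                    String jobId = tasksToCleanup.take();
                    checkJobStatusAndWait(jobId);
                    cleaned.add(jobId);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "taskCleanup");
        t.setDaemon(true);
        t.start();
        return t;
    }

    /** Returns the cleaned list while jobA is still localizing, then after it finishes. */
    static String[] demo() {
        CleanupStallSketch tt = new CleanupStallSketch();
        tt.addJob("jobA", true);   // localization is hung
        tt.addJob("jobB", false);  // unrelated job, ready for cleanup
        Thread cleanup = tt.startCleanupThread(2);
        try {
            cleanup.join(200); // the thread cannot get past jobA in this time
            String whileHung = tt.cleaned.toString();
            tt.finishLocalizing("jobA");
            cleanup.join();
            return new String[] { whileHung, tt.cleaned.toString() };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] r = demo();
        System.out.println("while jobA localizes: " + r[0]);
        System.out.println("after jobA finishes:  " + r[1]);
    }
}
```

In this sketch jobB is never cleaned up ahead of jobA, even though jobB is not localizing, because the single thread handles queued actions strictly in order and the wait on jobA has no bound.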
[jira] [Commented] (MAPREDUCE-4088) Task stuck in JobLocalizer prevented other tasks on the same node from committing
[ https://issues.apache.org/jira/browse/MAPREDUCE-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13263246#comment-13263246 ] Ravi Prakash commented on MAPREDUCE-4088: - Does anyone have any suggestions on how to fix this? I'm thinking of this:
* We could have a timeout in the wait for checkJobStatusAndWait.
* If we time out, we simply put the action back into the queue (hoping next time around it succeeds)
* This might make the isIdle method more complicated :(
Task stuck in JobLocalizer prevented other tasks on the same node from committing - Key: MAPREDUCE-4088 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4088 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1 Affects Versions: 0.20.205.0 Reporter: Ravi Prakash Assignee: Ravi Prakash Priority: Critical We saw that as a result of HADOOP-6963, one task was stuck in this Thread 23668: (state = IN_NATIVE) - java.io.UnixFileSystem.getBooleanAttributes0(java.io.File) @bci=0 (Compiled frame; information may be imprecise) - java.io.UnixFileSystem.getBooleanAttributes(java.io.File) @bci=2, line=228 (Compiled frame) - java.io.File.exists() @bci=20, line=733 (Compiled frame) - org.apache.hadoop.fs.FileUtil.getDU(java.io.File) @bci=3, line=446 (Compiled frame) - org.apache.hadoop.fs.FileUtil.getDU(java.io.File) @bci=52, line=455 (Compiled frame) - org.apache.hadoop.fs.FileUtil.getDU(java.io.File) @bci=52, line=455 (Compiled frame) TONS MORE OF THIS SAME LINE - org.apache.hadoop.fs.FileUtil.getDU(java.io.File) @bci=52, line=455 (Compiled frame) . . 
- org.apache.hadoop.fs.FileUtil.getDU(java.io.File) @bci=52, line=455 (Compiled frame) - org.apache.hadoop.fs.FileUtil.getDU(java.io.File) @bci=52, line=455 (Interpreted frame) ne=451 (Interpreted frame) - org.apache.hadoop.mapred.JobLocalizer.downloadPrivateCacheObjects(org.apache.hadoop.conf.Configuration, java.net.URI[], org.apache.hadoop.fs.Path[], long[], boolean[], boolean) @bci=150, line=324 (Interpreted frame) - org.apache.hadoop.mapred.JobLocalizer.downloadPrivateCache(org.apache.hadoop.conf.Configuration) @bci=40, line=349 (Interpreted frame) 51, line=383 (Interpreted frame) - org.apache.hadoop.mapred.JobLocalizer.runSetup(java.lang.String, java.lang.String, org.apache.hadoop.fs.Path, org.apache.hadoop.mapred.TaskUmbilicalProtocol) @bci=46, line=477 (Interpreted frame) - org.apache.hadoop.mapred.JobLocalizer$3.run() @bci=20, line=534 (Interpreted frame) - org.apache.hadoop.mapred.JobLocalizer$3.run() @bci=1, line=531 (Interpreted frame) - java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction, java.security.AccessControlContext) @bci=0 (Interpreted frame) - javax.security.auth.Subject.doAs(javax.security.auth.Subject, java.security.PrivilegedExceptionAction) @bci=42, line=396 (Interpreted frame) - org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction) @bci=14, line=1082 (Interpreted frame) - org.apache.hadoop.mapred.JobLocalizer.main(java.lang.String[]) @bci=266, line=530 (Interpreted frame) While all other tasks on the same node were stuck in Thread 32141: (state = BLOCKED) - java.lang.Thread.sleep(long) @bci=0 (Interpreted frame) - org.apache.hadoop.mapred.Task.commit(org.apache.hadoop.mapred.TaskUmbilicalProtocol, org.apache.hadoop.mapred.Task$TaskReporter, org.apache.hadoop.mapreduce.OutputCommitter) @bci=24, line=980 (Compiled frame) - org.apache.hadoop.mapred.Task.done(org.apache.hadoop.mapred.TaskUmbilicalProtocol, org.apache.hadoop.mapred.Task$TaskReporter) @bci=146, line=871 
(Interpreted frame) - org.apache.hadoop.mapred.ReduceTask.run(org.apache.hadoop.mapred.JobConf, org.apache.hadoop.mapred.TaskUmbilicalProtocol) @bci=470, line=423 (Interpreted frame) - org.apache.hadoop.mapred.Child$4.run() @bci=29, line=255 (Interpreted frame) - java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction, java.security.AccessControlContext) @bci=0 (Interpreted frame) - javax.security.auth.Subject.doAs(javax.security.auth.Subject, java.security.PrivilegedExceptionAction) @bci=42, line=396 (Interpreted frame) - org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction) @bci=14, line=1082 (Interpreted frame) - org.apache.hadoop.mapred.Child.main(java.lang.String[]) @bci=738, line=249 (Interpreted frame) This should never happen. A stuck task should never prevent other tasks from different jobs on the same node from committing. -- This message is automatically generated by JIRA. If you think it was sent