[jira] [Commented] (MAPREDUCE-5465) Container killed before hprof dumps profile.out
[ https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538604#comment-14538604 ]

Hadoop QA commented on MAPREDUCE-5465:
--------------------------------------

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12732001/MAPREDUCE-5465-branch-2.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / ea11590 |
| Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5704/console |

This message was automatically generated.

> Container killed before hprof dumps profile.out
> -----------------------------------------------
>
>                 Key: MAPREDUCE-5465
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5465
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mr-am, mrv2
>            Reporter: Radim Kolar
>            Assignee: Ming Ma
>         Attachments: MAPREDUCE-5465-2.patch, MAPREDUCE-5465-3.patch,
> MAPREDUCE-5465-4.patch, MAPREDUCE-5465-5.patch, MAPREDUCE-5465-6.patch,
> MAPREDUCE-5465-7.patch, MAPREDUCE-5465-8.patch, MAPREDUCE-5465-9.patch,
> MAPREDUCE-5465-branch-2.patch, MAPREDUCE-5465.patch
>
>
> If there is profiling enabled for mapper or reducer then hprof dumps
> profile.out at process exit. It is dumped after task signaled to AM that work
> is finished.
> AM kills container with finished work without waiting for hprof to finish
> dumps. If hprof is dumping larger outputs (such as with depth=4 while depth=3
> works), it could not finish dump in time before being killed making entire
> dump unusable because cpu and heap stats are missing.
> There needs to be better delay before container is killed if profiling is
> enabled.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5465) Container killed before hprof dumps profile.out
[ https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535390#comment-14535390 ]

Jason Lowe commented on MAPREDUCE-5465:
---------------------------------------

_Finally_ carved out some time to take a look at this again. My sincere apologies for the long delay.

Patch looks great overall, but I noticed that we're ignoring the kill event in the SUCCESS_CONTAINER_CLEANUP. That seems wrong, since we do something significantly different just before or just after that state, in SUCCESS_FINISHING_CONTAINER and SUCCEEDED, respectively. However that issue existed before this patch, so we could address it as a separate JIRA.

The patch doesn't apply to branch-2. [~mingma] would you mind providing a patch for branch-2 as well?
[jira] [Commented] (MAPREDUCE-5465) Container killed before hprof dumps profile.out
[ https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508179#comment-14508179 ]

Ray Chiang commented on MAPREDUCE-5465:
---------------------------------------

Just as a quick follow up, I also did some testing on the mapreduce.task.exit.timeout property. On a machine without extraneous load, it took about 20 seconds to write out the largest profile.out that I could generate. So, having a default timeout of 60 seconds seems like it will be sufficient for all but the most loaded nodes.
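The grace period described in the comment above can be illustrated with a small, self-contained Java sketch. It is not code from the patch: the class `ExitTimeoutSketch`, the helper `exitsWithinTimeout`, and the constant name are hypothetical, and a sleeping thread stands in for the task JVM writing profile.out. The only facts taken from the thread are the property name mapreduce.task.exit.timeout and the 60-second default under discussion.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class ExitTimeoutSketch {
    // Assumption: mirrors the 60-second default discussed for
    // mapreduce.task.exit.timeout; the constant itself is illustrative only.
    static final long DEFAULT_EXIT_TIMEOUT_MS = 60_000L;

    // Hypothetical helper: simulates a task that needs workMs to finish its
    // exit-time work (for example an hprof dump). Returns true if the task
    // exited on its own within timeoutMs, false if it had to be killed.
    public static boolean exitsWithinTimeout(long workMs, long timeoutMs) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<Object> task = pool.submit(() -> {
            Thread.sleep(workMs); // stands in for the dump being written
            return null;
        });
        try {
            task.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;  // exited by itself, no kill needed
        } catch (TimeoutException e) {
            task.cancel(true); // grace period expired: force the kill
            return false;
        } catch (ExecutionException e) {
            return true;  // the task ended on its own, albeit with an error
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Scaled-down stand-ins for "20 s dump, 60 s timeout" and the reverse.
        System.out.println(exitsWithinTimeout(100, 1000));
        System.out.println(exitsWithinTimeout(1000, 100));
    }
}
```

With the measurements reported above (about 20 seconds for the largest dump), a 60-second limit leaves roughly a threefold margin on an unloaded machine; the sketch only shows the wait-then-kill shape, not how the AM schedules the check.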
[jira] [Commented] (MAPREDUCE-5465) Container killed before hprof dumps profile.out
[ https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505221#comment-14505221 ]

Ray Chiang commented on MAPREDUCE-5465:
---------------------------------------

No problem. Thanks for getting back to me.
[jira] [Commented] (MAPREDUCE-5465) Container killed before hprof dumps profile.out
[ https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504966#comment-14504966 ]

Jason Lowe commented on MAPREDUCE-5465:
---------------------------------------

Sorry, I have been very busy as of late and haven't had time to get back to this patch. I'll try to get some review comments on the new patch posted by next week.
[jira] [Commented] (MAPREDUCE-5465) Container killed before hprof dumps profile.out
[ https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503970#comment-14503970 ]

Ray Chiang commented on MAPREDUCE-5465:
---------------------------------------

[~jlowe], I know it's been a while since this has been reviewed by you, but any thoughts? I figured I'd check with you while this was still fresh in my mind.
[jira] [Commented] (MAPREDUCE-5465) Container killed before hprof dumps profile.out
[ https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503966#comment-14503966 ]

Ray Chiang commented on MAPREDUCE-5465:
---------------------------------------

+1 (nonbinding) for version 9 patch.

- All failed Jenkins unit tests pass in my tree.
- Version 9 generates a clean state diagram. Version 8 had a state with no transitions to it, while the latest has clean transitions from SUCCESS_FINISHING_CONTAINER to either SUCCESS_CONTAINER_CLEANUP or KILL_CONTAINER_CLEANUP.
- Running with a modified WordCount job, I get the following results:
  + The non-patched version (running with depth=6) generates a profile.out of 69818 bytes and does not contain the SITES or CPU SAMPLES information.
  + The patched version (running with depth=6) generates a profile.out of 11348778 bytes and contains both the SITES and CPU SAMPLES information.
  + The patched version (running with depth=100) generates a profile.out of 379301524 bytes and contains both the SITES and CPU SAMPLES information. Similar result with depth=1.
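The completeness check used in these test runs (does profile.out contain the SITES and CPU SAMPLES information?) could be automated along the following lines. This is a sketch under assumptions: the class `ProfileCheckSketch` and its methods are hypothetical, and the marker strings `SITES BEGIN` and `CPU SAMPLES BEGIN` are assumed to be the section headers of hprof's text output; adjust them if your output differs.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class ProfileCheckSketch {
    // Assumed section headers of hprof text output (not taken from the thread).
    static final String SITES_MARKER = "SITES BEGIN";
    static final String CPU_MARKER = "CPU SAMPLES BEGIN";

    // Streams the dump line by line so that very large files
    // (hundreds of megabytes) are not held in memory at once.
    public static boolean looksComplete(BufferedReader reader) throws IOException {
        boolean sawSites = false;
        boolean sawCpu = false;
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.startsWith(SITES_MARKER)) {
                sawSites = true;
            }
            if (line.startsWith(CPU_MARKER)) {
                sawCpu = true;
            }
        }
        return sawSites && sawCpu;
    }

    // Convenience overload for in-memory text, mainly for quick experiments.
    public static boolean looksComplete(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return looksComplete(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            // Usage: java ProfileCheckSketch.java /path/to/profile.out
            try (BufferedReader reader = Files.newBufferedReader(
                    Paths.get(args[0]), StandardCharsets.ISO_8859_1)) {
                System.out.println(looksComplete(reader));
            }
        } else {
            // Tiny made-up stand-in for a truncated dump with no sections.
            System.out.println(looksComplete("THREAD START\nTRACE 1:\n"));
        }
    }
}
```

A truncated dump, such as the 69818-byte file from the non-patched run, would be expected to fail this check because the sections in question are written at process exit.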
[jira] [Commented] (MAPREDUCE-5465) Container killed before hprof dumps profile.out
[ https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394020#comment-14394020 ]

Hadoop QA commented on MAPREDUCE-5465:
--------------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12709076/MAPREDUCE-5465-9.patch
against trunk revision 6a6a59d.

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 8 new or modified test files.

{color:red}-1 javac{color}. The applied patch generated 1152 javac compiler warnings (more than the trunk's current 1151 warnings).

{color:green}+1 javadoc{color}. There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

    org.apache.hadoop.mapred.pipes.TestPipeApplication
    org.apache.hadoop.mapred.TestMRTimelineEventHandling
    org.apache.hadoop.mapreduce.v2.TestMRJobsWithProfiler
    org.apache.hadoop.mapreduce.v2.TestMRJobsWithHistoryService
    org.apache.hadoop.mapred.TestClusterMRNotification

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5370//testReport/
Javac warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5370//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5370//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-5465) Container killed before hprof dumps profile.out
[ https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393533#comment-14393533 ]

Ray Chiang commented on MAPREDUCE-5465:
---------------------------------------

Great! Thanks!
[jira] [Commented] (MAPREDUCE-5465) Container killed before hprof dumps profile.out
[ https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387119#comment-14387119 ]

Ray Chiang commented on MAPREDUCE-5465:
---------------------------------------

[~mingma], can you give me an idea of when you can get a rebased patch uploaded? Thanks.
[jira] [Commented] (MAPREDUCE-5465) Container killed before hprof dumps profile.out
[ https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14378906#comment-14378906 ]

Ray Chiang commented on MAPREDUCE-5465:
---------------------------------------

Thanks for the info. And I'd be interested in looking at the latest version that you've got rebased against trunk.
[jira] [Commented] (MAPREDUCE-5465) Container killed before hprof dumps profile.out
[ https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14378161#comment-14378161 ]

Ming Ma commented on MAPREDUCE-5465:
------------------------------------

[~rchiang], thanks for looking into this. SUCCESS_CONTAINER_CLEANUP can be transitioned to from SUCCESS_FINISHING_CONTAINER. For ExitFinishingOnTimeoutTransition, you can search for FINISHING_ON_TIMEOUT_TRANSITION.

We have been running a slightly different version of this patch in our production clusters for a while. I can rebase the patch for trunk if people are interested in it.
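For readers following the state-machine discussion, here is a minimal Java sketch of a transition table together with a check for states that nothing transitions into. It is not the MR AM implementation: the state names come from this thread, while the event names (`CONTAINER_EXITED`, `TIMED_OUT`, `KILL`), the mapping of events to target states, and the class `FinishingStateSketch` are assumptions made for illustration.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

class FinishingStateSketch {
    // State names as mentioned in the thread.
    enum State {
        SUCCESS_FINISHING_CONTAINER,
        SUCCESS_CONTAINER_CLEANUP,
        KILL_CONTAINER_CLEANUP,
        SUCCEEDED
    }

    // Hypothetical event names; the real event types are not given in the thread.
    enum Event { CONTAINER_EXITED, TIMED_OUT, KILL }

    static final Map<State, Map<Event, State>> TABLE = new EnumMap<>(State.class);
    static {
        Map<Event, State> finishing = new EnumMap<>(Event.class);
        // Assumption: the container exits by itself within the grace period.
        finishing.put(Event.CONTAINER_EXITED, State.SUCCEEDED);
        // Assumption: the grace period expires, so the container is cleaned up.
        finishing.put(Event.TIMED_OUT, State.SUCCESS_CONTAINER_CLEANUP);
        // Assumption: a kill request arrives while the task is finishing.
        finishing.put(Event.KILL, State.KILL_CONTAINER_CLEANUP);
        TABLE.put(State.SUCCESS_FINISHING_CONTAINER, finishing);
    }

    // Returns the next state; events without a registered transition are ignored.
    public static State next(State current, Event event) {
        Map<Event, State> row = TABLE.get(current);
        if (row == null) {
            return current;
        }
        return row.getOrDefault(event, current);
    }

    // Lists states (other than the starting one) that no transition leads to.
    public static Set<State> statesWithoutInbound(State initial) {
        Set<State> orphans = EnumSet.allOf(State.class);
        orphans.remove(initial);
        for (Map<Event, State> row : TABLE.values()) {
            orphans.removeAll(row.values());
        }
        return orphans;
    }

    public static void main(String[] args) {
        System.out.println(next(State.SUCCESS_FINISHING_CONTAINER, Event.TIMED_OUT));
        System.out.println(statesWithoutInbound(State.SUCCESS_FINISHING_CONTAINER));
    }
}
```

A check like `statesWithoutInbound` is one way to catch a state that has become unreachable after a refactoring of the transition table.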
[jira] [Commented] (MAPREDUCE-5465) Container killed before hprof dumps profile.out
[ https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14378075#comment-14378075 ]

Ray Chiang commented on MAPREDUCE-5465:
---------------------------------------

[~mingma], [~jlowe] what are your thoughts on the current status of this patch? I'd appreciate any information before I dig too much deeper. Thanks.
[jira] [Commented] (MAPREDUCE-5465) Container killed before hprof dumps profile.out
[ https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369991#comment-14369991 ]

Hadoop QA commented on MAPREDUCE-5465:
--------------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12666262/MAPREDUCE-5465-8.patch
against trunk revision 61a4c7f.

{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5314//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-5465) Container killed before hprof dumps profile.out
[ https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369984#comment-14369984 ]

Ray Chiang commented on MAPREDUCE-5465:
---------------------------------------

I'm still catching up on this JIRA for reviewing, but I do have some questions.

1) I see that SUCCESS_CONTAINER_CLEANUP still exists as a state, but I'm not seeing any transitions to that state. Is it still needed? Or is it a hidden transition now?

2) I see the instantiation of the ExitFinishingOnTimeoutTransition class, but I'm not seeing it used anywhere. Along similar lines, I'm not seeing any exit transitions from the SUCCESS_FINISHING_CONTAINER and FAIL_FINISHING_CONTAINER states. I'm also not sure if that is deliberate or not.
[jira] [Commented] (MAPREDUCE-5465) Container killed before hprof dumps profile.out
[ https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14120291#comment-14120291 ]

Hadoop QA commented on MAPREDUCE-5465:
--------------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12666262/MAPREDUCE-5465-8.patch
against trunk revision ce04621.

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 8 new or modified test files.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4843//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4843//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-5465) Container killed before hprof dumps profile.out
[ https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119488#comment-14119488 ] Hadoop QA commented on MAPREDUCE-5465: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12666142/MAPREDUCE-5465-8.patch against trunk revision 08a9ac7. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 8 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. 
The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient: org.apache.hadoop.mapreduce.v2.app.TestRecovery org.apache.hadoop.mapred.TestLineRecordReader org.apache.hadoop.mapreduce.lib.input.TestLineRecordReader org.apache.hadoop.mapreduce.v2.hs.webapp.dao.TestJobInfo org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryEntities org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing org.apache.hadoop.mapreduce.v2.TestMRJobsWithProfiler org.apache.hadoop.mapreduce.lib.output.TestJobOutputCommitter The following test timeouts occurred in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient: org.apache.hadoop.mapred.TestMRIntermediateDataRunning org.apache.hadoop.mapreduce.v2.TestMRJobsWithProfiler {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4842//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4842//console This message is automatically generated. 
[jira] [Commented] (MAPREDUCE-5465) Container killed before hprof dumps profile.out
[ https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114874#comment-14114874 ] Hadoop QA commented on MAPREDUCE-5465: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12665193/MAPREDUCE-5465-7.patch against trunk revision d8774cc. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 8 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1303 javac compiler warnings (more than the trunk's current 1275 warnings). {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 9 warning messages. See https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4835//artifact/trunk/patchprocess/diffJavadocWarnings.txt for details. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. 
The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient: org.apache.hadoop.mapreduce.v2.app.webapp.TestAMWebServicesAttempts org.apache.hadoop.mapreduce.v2.app.webapp.TestAMWebServicesJobs org.apache.hadoop.mapreduce.v2.app.webapp.TestAMWebServicesTasks org.apache.hadoop.mapreduce.v2.app.webapp.TestAMWebServicesJobConf org.apache.hadoop.mapreduce.v2.hs.webapp.TestHsWebServicesJobs org.apache.hadoop.mapreduce.v2.hs.webapp.TestHsWebServicesJobConf org.apache.hadoop.mapreduce.v2.hs.webapp.TestHsWebServicesAttempts org.apache.hadoop.mapreduce.v2.hs.webapp.TestHsWebServicesTasks org.apache.hadoop.mapreduce.v2.hs.webapp.TestHsWebServicesJobsQuery org.apache.hadoop.mapred.TestMiniMRClientCluster org.apache.hadoop.mapred.TestMiniMRChildTask org.apache.hadoop.mapreduce.security.TestJHSSecurity org.apache.hadoop.mapred.TestJobCounters The following test timeouts occurred in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient: org.apache.hadoopTests org.apache.hadoop.mapreducTests org.apache.hadoop.mapreTests org.apache.haTests org.apache.hadoop.maTests org.apache.hadoop.mapreduce.v2.appTests org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler org.apache.hadoop.mapreduce.v2.hs.TestHistoryServerFileSystemStateStoreService org.apache.hadoop.mapred.TestMultipleLevelCaching org.apache.hadoop.mapred.TestReporter org.apache.hadoop.mapreduce.TestLargeSort {color:green}+1 contrib tests{color}. 
The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4835//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4835//artifact/trunk/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4835//console This message is automatically generated.
[jira] [Commented] (MAPREDUCE-5465) Container killed before hprof dumps profile.out
[ https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14113915#comment-14113915 ] Hadoop QA commented on MAPREDUCE-5465: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12664996/MAPREDUCE-5465-7.patch against trunk revision c4c9a78. {color:red}-1 @author{color}. The patch appears to contain @author tags which the Hadoop community has agreed to not allow in code contributions. {color:green}+1 tests included{color}. The patch appears to include new or modified test files. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4831//console This message is automatically generated. > Container killed before hprof dumps profile.out > --- > > Key: MAPREDUCE-5465 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5465 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: mr-am, mrv2 >Affects Versions: trunk, 2.0.3-alpha >Reporter: Radim Kolar >Assignee: Ming Ma > Attachments: MAPREDUCE-5465-2.patch, MAPREDUCE-5465-3.patch, > MAPREDUCE-5465-4.patch, MAPREDUCE-5465-5.patch, MAPREDUCE-5465-6.patch, > MAPREDUCE-5465-7.patch, MAPREDUCE-5465.patch > > > If there is profiling enabled for mapper or reducer then hprof dumps > profile.out at process exit. It is dumped after task signaled to AM that work > is finished. > AM kills container with finished work without waiting for hprof to finish > dumps. If hprof is dumping larger outputs (such as with depth=4 while depth=3 > works) , it could not finish dump in time before being killed making entire > dump unusable because cpu and heap stats are missing. > There needs to be better delay before container is killed if profiling is > enabled. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5465) Container killed before hprof dumps profile.out
[ https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14113897#comment-14113897 ] Hadoop QA commented on MAPREDUCE-5465: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org against trunk revision c4c9a78. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4830//console This message is automatically generated.
[jira] [Commented] (MAPREDUCE-5465) Container killed before hprof dumps profile.out
[ https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993652#comment-13993652 ] Jason Lowe commented on MAPREDUCE-5465: --- The release audit warnings are unrelated; filed MAPREDUCE-5885. The TestPipeApplication timeout is also unrelated, see MAPREDUCE-5868.

Thanks for updating the patch, Ming! Sorry for the long delay in getting back to this.

I've been thinking about the performance implications of this change. I'm wondering if we should treat the finishing states as if they were the corresponding completed states as far as external entities (i.e., the task and job) are concerned. We would send T_ATTEMPT_SUCCEEDED or T_ATTEMPT_FAILED and set task finish times to the time the attempt said it succeeded or failed rather than the time the container completed. Similarly, we would map the internal finishing states to their respective external SUCCEEDED/FAILED states rather than RUNNING. The task and job are not particularly interested in when the attempt exits; they only care about when the task says its output is available.

This would allow the task and job to react to success/failure transitions in the same timeframe as they do today, so there should be minimal performance impact. The only impact would be when the container needs to complete to free up enough space for the next task's container to be allocated, and in most cases the task will complete quickly enough that the AM will receive the new container in the same heartbeat as it did before this change. Actually this may end up being slightly faster than today, since today the AM connects to the NM and sends the kill command before it considers the task completed. Under this proposal the task would complete as soon as the task attempt reports completion via the umbilical.
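Editor's sketch of the state mapping proposed in this comment. It is only an illustration: the class, the enum names, and the two finishing states (SUCCESS_FINISHING_CONTAINER, FAIL_FINISHING_CONTAINER) are hypothetical and are not taken from the patch or from the Hadoop code base. It shows the idea that an attempt which has reported its result, but whose container has not yet exited, is presented to the task and job as already SUCCEEDED or FAILED rather than RUNNING.

```java
/** Hypothetical illustration; names are assumptions, not Hadoop classes. */
final class ExternalStateSketch {
  /** Internal attempt states, including the proposed "finishing" states. */
  enum Internal {
    RUNNING,
    SUCCESS_FINISHING_CONTAINER, // task reported success, container still alive
    FAIL_FINISHING_CONTAINER,    // task reported failure, container still alive
    SUCCEEDED,
    FAILED
  }

  /** States visible to the task and job. */
  enum External { RUNNING, SUCCEEDED, FAILED }

  /** Maps finishing states to their completed counterparts, not to RUNNING. */
  static External externalState(Internal state) {
    switch (state) {
      case SUCCESS_FINISHING_CONTAINER:
      case SUCCEEDED:
        return External.SUCCEEDED;
      case FAIL_FINISHING_CONTAINER:
      case FAILED:
        return External.FAILED;
      default:
        return External.RUNNING;
    }
  }
}
```

With a mapping like this, the task and job can react at the moment the attempt reports its result over the umbilical, while the container is still given time to exit on its own.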
Other comments on the latest patch:
- Rather than having the finishing states call the cleanup container transition, and having that transition special-case being called from finishing states, it'd be cleaner to factor out the common code they're trying to leverage from the cleanup container transition and call that instead. Transitions doing state or event checks usually mean something's a bit off, since the transition should already know what event triggered it and what state(s) it applies to.
- Similarly, the timeout transitions should have dedicated transition code that not only warns in the AM log but also sets an attempt diagnostic message. It can reuse some or all of the cleanup container transition so it isn't duplicating code. With the diagnostic, the user is much more likely to become aware of the timeout issue and fix their task code. Tasks that time out during finishing can still succeed, so users probably won't even know something went wrong unless they bother to examine the AM log and happen to notice it.
- This change looks like some accidental reformatting:
{noformat}
--- a/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/LocalContainerLauncher.java
+++ b/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/LocalContainerLauncher.java
@@ -222,7 +222,7 @@ public void run() {
 // remember the current attempt
 futures.put(event.getTaskAttemptID(), future);
-} else if (event.getType() == EventType.CONTAINER_REMOTE_CLEANUP) {
+ } else if (event.getType() == EventType.CONTAINER_REMOTE_CLEANUP) {
 // cancel (and interrupt) the current running task associated with the
 // event
{noformat}
- Nit: a sendContainerCompleted utility method to send the CONTAINER_COMPLETED event would be nice.
- Nit: code should be formatted to 80 columns, comments for the state transitions in particular.
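Editor's sketch of the first two review points above, under stated assumptions: the class, method names, and diagnostic wording are hypothetical and do not come from the patch. The only identifier borrowed from the thread is the CONTAINER_REMOTE_CLEANUP event name, used here as a plain string. The sketch shows a shared helper factored out of the cleanup transition, and a dedicated timeout transition that records an attempt diagnostic before reusing that helper, so neither transition needs to check which state or event invoked it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration; not the transition code from the patch. */
final class FinishingTimeoutSketch {
  /** Diagnostics attached to the attempt, visible to the user. */
  final List<String> diagnostics = new ArrayList<>();
  /** Events "sent" by the transitions, recorded as strings for illustration. */
  final List<String> sentEvents = new ArrayList<>();

  /** Common code factored out of the cleanup container transition. */
  private void requestContainerCleanup(String attemptId) {
    sentEvents.add("CONTAINER_REMOTE_CLEANUP:" + attemptId);
  }

  /** Regular cleanup transition: no state or event checks needed. */
  void onCleanupContainer(String attemptId) {
    requestContainerCleanup(attemptId);
  }

  /** Dedicated timeout transition: set a diagnostic, then reuse the helper. */
  void onFinishingTimeout(String attemptId, long timeoutMs) {
    diagnostics.add("Attempt " + attemptId + " reported completion but its"
        + " container did not exit within " + timeoutMs + " ms; killing it.");
    requestContainerCleanup(attemptId);
  }
}
```

Because the diagnostic travels with the attempt rather than living only in the AM log, a user whose task still succeeds would have a chance to notice the slow exit.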
[jira] [Commented] (MAPREDUCE-5465) Container killed before hprof dumps profile.out
[ https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995149#comment-13995149 ] Jason Lowe commented on MAPREDUCE-5465: --- bq. 5. (t3-t1) can also impact job latency. Notifying task/job earlier won't help to improve (t3-t1).

It can if we're assuming sufficient capacity in the cluster. t3 depends on when the AM asks for the containers, and the sooner the AM knows a task completed, the sooner it can ask for new containers (e.g., map tasks completing leading to launching reduce tasks). The other scenario where job completion time is reduced is when reduce tasks that are already running are waiting on the final map task. In that case we should notify the reduce tasks of the map task completion event as soon as the completion message arrives across the umbilical from the map task, and not wait until we receive the container completion from the RM. That delay leads directly to longer job times.

Regarding the out-of-band heartbeat, agreed that we should consider sending OOB heartbeats on container completion rather than on kill. Filed YARN-2046 to track that issue.
[jira] [Commented] (MAPREDUCE-5465) Container killed before hprof dumps profile.out
[ https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13994140#comment-13994140 ] Ming Ma commented on MAPREDUCE-5465: Thanks, Jason! We discussed the performance implications in https://issues.apache.org/jira/browse/YARN-221. It is good to revisit the issue.
1. I assume job latency is the metric we want to use. The question is how much this change impacts job latency.
2. Say the umbilical notification arrives at t1, the task receives T_ATTEMPT_SUCCEEDED or T_ATTEMPT_FAILED at t2, and MRAppMaster acquires new containers from the RM for the next set of tasks at t3.
3. How much does (t2-t1) impact job latency? It depends on the job characteristics: mapper output can be available sooner, reducer containers can be scheduled sooner, etc. But the impact isn't going to be linear in the number of tasks, given that tasks run in parallel, so it should be much smaller. I don't have a formula. It would be useful to compare the performance difference using actual jobs.
4. Your suggestion of notifying the task/job right after t1 is a good idea to improve (t2-t1). I assume it doesn't change the state transitions of the task attempt. We need to confirm correctness from the state machine point of view, given there might be some assumptions between the task attempt and task state machines.
5. (t3-t1) can also impact job latency. Notifying task/job earlier won't help to improve (t3-t1).
6. To improve (t3-t1), perhaps an OutofBandHeartBeat should be sent when a container exits. Currently OutofBandHeartBeat is sent only when stopContainer is called. This may be useful when the NM->RM heartbeat interval is large.
7. There appears to be an issue with how the current stopContainer triggers NodeStatusUpdaterImpl's OutofBandHeartBeat processing. stopContainer first enqueues the "kill" container event before calling NodeStatusUpdaterImpl's OutofBandHeartBeat. So it is possible that the NodeStatusUpdaterImpl heartbeat thread sends the heartbeat to the RM before the main Dispatcher thread processes the event and marks the container as completed. In that case the OutofBandHeartBeat doesn't include that container in the completed container list. Does stopContainer really need to call NodeStatusUpdaterImpl's OutofBandHeartBeat? It seems better to call it only when a container exits.
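Editor's sketch of the ordering problem described in point 7 of Ming Ma's comment. This is a single-threaded model with hypothetical class and method names; it does not use or reproduce any NodeManager code. It only makes the interleaving explicit: if the heartbeat snapshot is taken after the kill is enqueued but before the dispatcher has processed it, the container is missing from the completed list.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical model of the interleaving; not NodeManager code. */
final class OobHeartbeatOrderingSketch {
  private final Queue<String> pendingKillEvents = new ArrayDeque<>();
  private final List<String> completedContainers = new ArrayList<>();

  /** Models stopContainer: the kill is only enqueued, not yet processed. */
  void enqueueKill(String containerId) {
    pendingKillEvents.add(containerId);
  }

  /** Models the dispatcher: process queued kills, mark containers completed. */
  void dispatchPending() {
    String id;
    while ((id = pendingKillEvents.poll()) != null) {
      completedContainers.add(id);
    }
  }

  /** Models a heartbeat: reports a snapshot of what is completed right now. */
  List<String> heartbeat() {
    return new ArrayList<>(completedContainers);
  }
}
```

Calling enqueueKill, then heartbeat, then dispatchPending reproduces the case in the comment: the first heartbeat reports nothing, and the container only shows up in a later one. Triggering the out-of-band heartbeat from the point where the container is marked completed avoids that ordering.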
[jira] [Commented] (MAPREDUCE-5465) Container killed before hprof dumps profile.out
[ https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13990365#comment-13990365 ] Hadoop QA commented on MAPREDUCE-5465: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12643435/MAPREDUCE-5465-6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 8 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 3 release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient: org.apache.hadoop.mapred.pipes.TestPipeApplication {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4584//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4584//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4584//console This message is automatically generated. 
[jira] [Commented] (MAPREDUCE-5465) Container killed before hprof dumps profile.out
[ https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977483#comment-13977483 ] Hadoop QA commented on MAPREDUCE-5465: -- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12641300/MAPREDUCE-5465-5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 8 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4545//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4545//console This message is automatically generated. 
[jira] [Commented] (MAPREDUCE-5465) Container killed before hprof dumps profile.out
[ https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974786#comment-13974786 ] Hadoop QA commented on MAPREDUCE-5465: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12640939/MAPREDUCE-5465-4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 8 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1288 javac compiler warnings (more than the trunk's current 1287 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. 
Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4539//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4539//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-app.html Javac warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4539//artifact/trunk/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4539//console This message is automatically generated.
[jira] [Commented] (MAPREDUCE-5465) Container killed before hprof dumps profile.out
[ https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964753#comment-13964753 ] Ming Ma commented on MAPREDUCE-5465: Thanks, Jason, for the review. I will upload the updated patch soon. I want to comment on a couple of the points you mentioned.
1. Yes, putting finishTaskMonitor under TaskAttemptListenerImpl isn't clean, given that TaskAttemptListenerImpl should only deal with TaskUmbilicalProtocol-related work. I will move it out to the AppContext layer.
2. Handling of the TA_FAILMSG event. TA_FAILMSG can be triggered by the task JVM as well as by the user via the "hadoop job -fail-task" command. When the task JVM reports the failure, yes, the AM can wait for the container to exit. When an end user sends the command, the container needs to be cleaned up right away. I skipped that for simplicity. If we want to support it, it seems we will need a new event like TA_FAILMSG_BY_USER.
3. Why are we transitioning from FINISHING_CONTAINER to SUCCESS_CONTAINER_CLEANUP rather than to SUCCEEDED when we receive a container completed event? It was done for simplicity, so that all successful states go to SUCCESS_CONTAINER_CLEANUP first. But I agree it can go directly to SUCCEEDED when we receive a container completed event.
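Editor's sketch of the transitions discussed in points 2 and 3 of Ming Ma's reply. It is a toy transition function, not the attempt state machine from the patch: the class, the enum layout, and the states FAIL_FINISHING_CONTAINER and FAIL_CONTAINER_CLEANUP as used here are assumptions made for illustration. It shows a finishing attempt going directly to SUCCEEDED when the container completes, falling back to cleanup only on timeout, and a user-initiated failure skipping the wait entirely.

```java
/** Hypothetical illustration; not the state machine from the patch. */
final class AttemptTransitionsSketch {
  enum State {
    RUNNING, FINISHING_CONTAINER, FAIL_FINISHING_CONTAINER,
    SUCCESS_CONTAINER_CLEANUP, FAIL_CONTAINER_CLEANUP, SUCCEEDED
  }

  enum Event {
    DONE, FAILMSG, FAILMSG_BY_USER,
    CONTAINER_COMPLETED, FINISHING_TIMEOUT, CONTAINER_CLEANED
  }

  /** Returns the next state, or throws if the event is invalid in that state. */
  static State next(State state, Event event) {
    switch (state) {
      case RUNNING:
        if (event == Event.DONE) return State.FINISHING_CONTAINER;
        // Failure reported by the task JVM: wait for the container to exit.
        if (event == Event.FAILMSG) return State.FAIL_FINISHING_CONTAINER;
        // Failure requested by the user: clean up the container right away.
        if (event == Event.FAILMSG_BY_USER) return State.FAIL_CONTAINER_CLEANUP;
        break;
      case FINISHING_CONTAINER:
        // The container exited on its own: nothing left to clean up.
        if (event == Event.CONTAINER_COMPLETED) return State.SUCCEEDED;
        // The container overstayed: fall back to the kill-and-wait path.
        if (event == Event.FINISHING_TIMEOUT) return State.SUCCESS_CONTAINER_CLEANUP;
        break;
      case SUCCESS_CONTAINER_CLEANUP:
        if (event == Event.CONTAINER_CLEANED) return State.SUCCEEDED;
        break;
      default:
        break;
    }
    throw new IllegalStateException(event + " is not valid in state " + state);
  }
}
```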
[jira] [Commented] (MAPREDUCE-5465) Container killed before hprof dumps profile.out
[ https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13963278#comment-13963278 ]

Jason Lowe commented on MAPREDUCE-5465:

Thanks for picking this up, Ming. Some comments on the patch:

- The new interface for TaskAttemptListener seems more complicated than necessary. IIUC it's not valid to call registerFinishingTask without having called unregister essentially at the same time, so I'm wondering if we should just add a single method, something like setTaskFinishing(TaskAttemptId, WrappedJvmID), to denote when the task JVM has finished and we're simply waiting for the container to complete. A typical lifecycle could pass a null JVM to the unregister method if the JVM already finished, and we can still skip the finishing method and go straight from registerLaunchedTask -> unregister if the task container exits unexpectedly. Either that, or it seems the TaskAttemptExitHandler should not be part of TaskAttemptListener and should instead be conveyed separately to the TaskAttemptImpl (possibly via AppContext along with other things), since it seems tacked on and not related to the other workings of TaskAttemptListener.
- It looks like we're only handling the successful task case. We don't want to proactively kill tasks that have reported failure, just like we don't want to proactively kill tasks that have reported success.
- Why are we transitioning from FINISHING_CONTAINER to SUCCESS_CONTAINER_CLEANUP rather than to SUCCEEDED when we receive a container completed event? The SUCCESS_CONTAINER_CLEANUP state is for waiting for a container completed event to arrive, but we are leaving the FINISHING_CONTAINER state due to the arrival of that very event.
- The new properties should be added to mapred-default.xml for documentation purposes.
- Suggestion: TaskAttemptFinishingMonitor may be a more accurate name than TaskAttemptExitHandler
- Nit: The default values for the new properties should be named in MRJobConfig
- Nit: comments should be formatted for 80 columns
- Nit: In the comment for the new FINISHING_CONTAINER state: "That will a chance" should be "That will give a chance"
- Nit: testFinshingAttemptTimedout -> testFinishingAttemptTimeout
- Nit: testTaskAttemptDiognosticEventOnFinishing -> testTaskAttemptDiagnosticEventOnFinishing

> Container killed before hprof dumps profile.out
>
> Key: MAPREDUCE-5465
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5465
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: mr-am, mrv2
> Affects Versions: trunk, 2.0.3-alpha
> Reporter: Radim Kolar
> Assignee: Ming Ma
> Attachments: MAPREDUCE-5465-2.patch, MAPREDUCE-5465-3.patch, MAPREDUCE-5465.patch
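To make the lifecycle in the first review point concrete, here is a minimal sketch of a tracker with a single setTaskFinishing method. It is not the patch's code: AttemptTrackerSketch, the String stand-ins for TaskAttemptId and WrappedJvmID, and the isFinishing/isTracked helpers are all hypothetical, and the bookkeeping is only one plausible reading of the suggestion.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical tracker; Strings stand in for TaskAttemptId and WrappedJvmID. */
final class AttemptTrackerSketch {
    private final Map<String, String> launchedJvms = new HashMap<>();
    private final Set<String> finishing = new HashSet<>();

    /** The task JVM has been launched for this attempt. */
    void registerLaunchedTask(String attemptId, String jvmId) {
        launchedJvms.put(attemptId, jvmId);
    }

    /** The task JVM reported it is done; only the container exit is pending. */
    void setTaskFinishing(String attemptId, String jvmId) {
        if (jvmId == null || !jvmId.equals(launchedJvms.get(attemptId))) {
            throw new IllegalStateException("unknown JVM for " + attemptId);
        }
        launchedJvms.remove(attemptId);
        finishing.add(attemptId);
    }

    /** Drops all tracking; jvmId may be null if the JVM already finished. */
    void unregister(String attemptId, String jvmId) {
        launchedJvms.remove(attemptId);
        finishing.remove(attemptId);
    }

    boolean isFinishing(String attemptId) {
        return finishing.contains(attemptId);
    }

    boolean isTracked(String attemptId) {
        return launchedJvms.containsKey(attemptId) || finishing.contains(attemptId);
    }
}
```

The normal path is registerLaunchedTask -> setTaskFinishing -> unregister(attemptId, null); an unexpected container exit goes straight from registerLaunchedTask to unregister.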
[jira] [Commented] (MAPREDUCE-5465) Container killed before hprof dumps profile.out
[ https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960447#comment-13960447 ]

Hadoop QA commented on MAPREDUCE-5465:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12638749/MAPREDUCE-5465-3.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files.

{color:red}-1 javac{color}. The applied patch generated 1484 javac compiler warnings (more than the trunk's current 1483 warnings).

{color:green}+1 javadoc{color}. There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

org.apache.hadoop.mapreduce.v2.app.TestMRAppMaster

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4486//testReport/
Javac warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4486//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4486//console

This message is automatically generated.

> Container killed before hprof dumps profile.out
>
> Key: MAPREDUCE-5465
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5465
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: mr-am, mrv2
> Affects Versions: trunk, 2.0.3-alpha
> Reporter: Radim Kolar
> Assignee: Ming Ma
> Attachments: MAPREDUCE-5465-2.patch, MAPREDUCE-5465-3.patch, MAPREDUCE-5465.patch
[jira] [Commented] (MAPREDUCE-5465) Container killed before hprof dumps profile.out
[ https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13958178#comment-13958178 ]

Hadoop QA commented on MAPREDUCE-5465:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12632292/MAPREDUCE-5465-2.patch
against trunk revision .

{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4481//console

This message is automatically generated.

> Container killed before hprof dumps profile.out
>
> Key: MAPREDUCE-5465
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5465
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: mr-am, mrv2
> Affects Versions: trunk, 2.0.3-alpha
> Reporter: Radim Kolar
> Assignee: Ming Ma
> Attachments: MAPREDUCE-5465-2.patch, MAPREDUCE-5465.patch
[jira] [Commented] (MAPREDUCE-5465) Container killed before hprof dumps profile.out
[ https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916964#comment-13916964 ]

Ming Ma commented on MAPREDUCE-5465:

I discussed with Ravi offline and will provide the patch for review soon. The basic approach is to define a new state called FINISHING_CONTAINER for TaskAttemptStateInternal. TaskAttempt will transition to this new state after it receives TaskUmbilicalProtocol's done notification from the task JVM. This will give a chance for the container to exit by itself. Normally the attempt will receive container exit notification via NM -> RM -> AM route; if it doesn't get the notification in time, it will time out and clean up the container via stopContainer.

> Container killed before hprof dumps profile.out
>
> Key: MAPREDUCE-5465
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5465
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: mr-am, mrv2
> Affects Versions: 2.0.3-alpha
> Reporter: Radim Kolar
> Assignee: Ming Ma
> Attachments: MAPREDUCE-5465.patch
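The approach Ming describes can be sketched as a small state machine. This is only an illustration, not the patch: the enum and class names below are hypothetical, and only FINISHING_CONTAINER, SUCCESS_CONTAINER_CLEANUP and stopContainer are taken from the thread. Going straight to SUCCEEDED when the container completes during FINISHING_CONTAINER follows the review feedback elsewhere in this thread and is an assumption of the sketch.

```java
/** Hypothetical, simplified attempt states. */
enum AttemptState { RUNNING, FINISHING_CONTAINER, SUCCESS_CONTAINER_CLEANUP, SUCCEEDED }

/** Hypothetical events delivered to an attempt. */
enum AttemptEvent { TASK_DONE, CONTAINER_COMPLETED, FINISHING_TIMEOUT }

final class FinishingAttemptSketch {
    private AttemptState state = AttemptState.RUNNING;
    private boolean stopContainerRequested = false;

    AttemptState state() {
        return state;
    }

    boolean stopContainerRequested() {
        return stopContainerRequested;
    }

    void handle(AttemptEvent event) {
        switch (state) {
            case RUNNING:
                // The task JVM said "done"; give the container time to exit
                // on its own instead of killing it right away.
                if (event == AttemptEvent.TASK_DONE) {
                    state = AttemptState.FINISHING_CONTAINER;
                }
                break;
            case FINISHING_CONTAINER:
                if (event == AttemptEvent.CONTAINER_COMPLETED) {
                    // The container exited by itself, so no kill is needed.
                    state = AttemptState.SUCCEEDED;
                } else if (event == AttemptEvent.FINISHING_TIMEOUT) {
                    // No exit notification in time: fall back to an explicit
                    // stopContainer and wait for its completion event.
                    stopContainerRequested = true;
                    state = AttemptState.SUCCESS_CONTAINER_CLEANUP;
                }
                break;
            case SUCCESS_CONTAINER_CLEANUP:
                if (event == AttemptEvent.CONTAINER_COMPLETED) {
                    state = AttemptState.SUCCEEDED;
                }
                break;
            default:
                break;
        }
    }
}
```

In the profiling case, hprof would write profile.out while the attempt sits in FINISHING_CONTAINER, and the timeout path only runs if the JVM never exits.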
[jira] [Commented] (MAPREDUCE-5465) Container killed before hprof dumps profile.out
[ https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13806907#comment-13806907 ]

Ravi Prakash commented on MAPREDUCE-5465:

And I totally forgot to answer Hitesh's question! Sorry about that. :( NM_SLEEP_DELAY_BEFORE_SIGKILL_MS handles the timeout before a SIGKILL. However at that point in time, a SIGTERM has already been sent once. This can be disruptive too.

> Container killed before hprof dumps profile.out
>
> Key: MAPREDUCE-5465
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5465
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: mr-am, mrv2
> Affects Versions: 2.0.3-alpha
> Reporter: Radim Kolar
> Assignee: Ravi Prakash
> Attachments: MAPREDUCE-5465.patch
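To illustrate why a longer delay before SIGKILL does not help here, the sketch below shows the general "terminate, wait, then force-kill" pattern using the standard java.lang.Process API. It is not the NodeManager's code. GracefulStopSketch and its parameters are hypothetical, and the mapping of destroy() to SIGTERM and destroyForcibly() to SIGKILL holds on typical Unix JVMs but is platform-dependent.

```java
import java.util.concurrent.TimeUnit;

final class GracefulStopSketch {
    /**
     * Requests termination, waits up to delayMs, then force-kills.
     *
     * @return true if the forced kill was issued
     */
    static boolean stop(Process process, long delayMs) {
        // The polite signal goes out first, before any waiting happens.
        // The delay below only postpones the forced kill.
        process.destroy();
        try {
            if (process.waitFor(delayMs, TimeUnit.MILLISECONDS)) {
                return false; // exited within the grace period
            }
        } catch (InterruptedException e) {
            // Restore the flag and fall through to the forced kill.
            Thread.currentThread().interrupt();
        }
        process.destroyForcibly();
        return true;
    }
}
```

Whatever delayMs is, the process has already been asked to terminate by the time the wait starts, which is the disruption Ravi points out.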
[jira] [Commented] (MAPREDUCE-5465) Container killed before hprof dumps profile.out
[ https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13763638#comment-13763638 ]

Jason Lowe commented on MAPREDUCE-5465:

Another way to solve it is to add a separate task state for finishing tasks, similar to the FINISHING state that was added for apps, where the AM is simply waiting for the container completed event or the task to expire.

> Container killed before hprof dumps profile.out
>
> Key: MAPREDUCE-5465
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5465
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: mr-am, mrv2
> Affects Versions: 2.0.3-alpha
> Reporter: Radim Kolar
> Assignee: Ravi Prakash
> Attachments: MAPREDUCE-5465.patch
[jira] [Commented] (MAPREDUCE-5465) Container killed before hprof dumps profile.out
[ https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13763633#comment-13763633 ]

Ravi Prakash commented on MAPREDUCE-5465:

Actually this patch will probably tie up the dispatcher thread for 250ms for every task. Not a good idea for big jobs. I guess one way might be to queue in a Future to submit the CONTAINER_REMOTE_CLEANUP. Then I'd have to wait for all those events to drain before exit. Considerably more involved :(

> Container killed before hprof dumps profile.out
>
> Key: MAPREDUCE-5465
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5465
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: mr-am, mrv2
> Affects Versions: 2.0.3-alpha
> Reporter: Radim Kolar
> Assignee: Ravi Prakash
> Attachments: MAPREDUCE-5465.patch
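A rough sketch of the alternative Ravi mentions: instead of sleeping on the dispatcher thread, schedule the cleanup event on a timer and drain the outstanding futures before exit. This is an illustration under stated assumptions, not code from any patch. DelayedCleanupSketch, its String event queue, and the method names are hypothetical; only the CONTAINER_REMOTE_CLEANUP event name and the 250 ms figure come from the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

final class DelayedCleanupSketch {
    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor();
    private final List<ScheduledFuture<?>> pending = new ArrayList<>();

    /** Stand-in for the AM's event queue; events are plain strings here. */
    final BlockingQueue<String> eventQueue = new LinkedBlockingQueue<>();

    /** Called on the dispatcher thread; returns without sleeping. */
    synchronized void scheduleCleanup(String attemptId, long delayMs) {
        pending.add(timer.schedule(
            () -> { eventQueue.add("CONTAINER_REMOTE_CLEANUP:" + attemptId); },
            delayMs, TimeUnit.MILLISECONDS));
    }

    /** Waits for every scheduled cleanup to fire, then stops the timer. */
    synchronized void drainAndShutdown() {
        for (ScheduledFuture<?> future : pending) {
            try {
                future.get();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            } catch (ExecutionException e) {
                throw new IllegalStateException(e.getCause());
            }
        }
        pending.clear();
        timer.shutdown();
    }
}
```

With a 250 ms delay per task, the dispatcher only pays the cost of enqueueing a timer entry. The price is the extra drain step at shutdown, which is the "considerably more involved" part.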
[jira] [Commented] (MAPREDUCE-5465) Container killed before hprof dumps profile.out
[ https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13762341#comment-13762341 ]

Ravi Prakash commented on MAPREDUCE-5465:

Has anyone tried fixing this? If not, I may take a crack at it.

> Container killed before hprof dumps profile.out
>
> Key: MAPREDUCE-5465
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5465
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: mr-am, mrv2
> Affects Versions: 2.0.3-alpha
> Reporter: Radim Kolar
[jira] [Commented] (MAPREDUCE-5465) Container killed before hprof dumps profile.out
[ https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742554#comment-13742554 ]

Jason Lowe commented on MAPREDUCE-5465:

Moved to MAPREDUCE since this is a problem with the MR ApplicationMaster and not YARN.

> Container killed before hprof dumps profile.out
>
> Key: MAPREDUCE-5465
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5465
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: mr-am, mrv2
> Affects Versions: 2.0.3-alpha
> Reporter: Radim Kolar