[jira] [Commented] (MAPREDUCE-4400) Fix performance regression for small jobs/workflows
[ https://issues.apache.org/jira/browse/MAPREDUCE-4400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13421423#comment-13421423 ]

Tom White commented on MAPREDUCE-4400:
--

+1

> Fix performance regression for small jobs/workflows
> ---
>
> Key: MAPREDUCE-4400
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4400
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: performance, task
> Affects Versions: 0.20.203.0, 1.0.3
> Reporter: Luke Lu
> Assignee: Luke Lu
> Attachments: mapreduce-4400-branch-1.patch
>
>
> There is a significant performance regression for small jobs/workflows (vs
> 0.20.2) in the Hadoop 1.x series. Most noticeable with Hive and Pig jobs.
> PigMix has an average 40% regression against 0.20.2.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4400) Fix performance regression for small jobs/workflows
[ https://issues.apache.org/jira/browse/MAPREDUCE-4400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418913#comment-13418913 ]

Shrinivas Joshi commented on MAPREDUCE-4400:

As I said earlier, I verified that this patch works as expected. It does minimize synchronization compared with MAPREDUCE-3809. The patch looks good to me to be committed.
[jira] [Commented] (MAPREDUCE-4400) Fix performance regression for small jobs/workflows
[ https://issues.apache.org/jira/browse/MAPREDUCE-4400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418699#comment-13418699 ]

Tom White commented on MAPREDUCE-4400:

Luke - yes, please do.
[jira] [Commented] (MAPREDUCE-4400) Fix performance regression for small jobs/workflows
[ https://issues.apache.org/jira/browse/MAPREDUCE-4400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418539#comment-13418539 ]

Luke Lu commented on MAPREDUCE-4400:

Yes. The speedup is more pronounced with the out-of-band heartbeat, which has a similar effect to MAPREDUCE-1906 (which is not in branch-1). MRv2 doesn't need this patch, as the issue was addressed there by MAPREDUCE-3809. Tom, can we file a separate JIRA to improve the change in trunk? Shrinivas, you're encouraged to review and +1 the patch :)
[jira] [Commented] (MAPREDUCE-4400) Fix performance regression for small jobs/workflows
[ https://issues.apache.org/jira/browse/MAPREDUCE-4400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418476#comment-13418476 ]

Shrinivas Joshi commented on MAPREDUCE-4400:

Can I request a code review and commit of this patch so that it gets integrated into the MRv1 branch while it is being ported to MRv2? Thanks.
[jira] [Commented] (MAPREDUCE-4400) Fix performance regression for small jobs/workflows
[ https://issues.apache.org/jira/browse/MAPREDUCE-4400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418473#comment-13418473 ]

Shrinivas Joshi commented on MAPREDUCE-4400:

@Luke: I have not tried the outofband heartbeat property. Do you expect it to show further performance gains in combination with your patch?
[jira] [Commented] (MAPREDUCE-4400) Fix performance regression for small jobs/workflows
[ https://issues.apache.org/jira/browse/MAPREDUCE-4400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13417774#comment-13417774 ]

Luke Lu commented on MAPREDUCE-4400:

@Shrinivas: have you tried this with mapreduce.tasktracker.outofband.heartbeat=true? (It needs a cluster restart, of course.)
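
For readers following along, the property named in this comment would typically be set in the cluster's MapReduce site configuration. The fragment below is only a sketch: the property name and value come from the comment, while the choice of mapred-site.xml as the file is an assumption based on the usual Hadoop 1.x layout and is not stated in this thread.

```xml
<!-- Sketch only: assumed to live in mapred-site.xml (Hadoop 1.x); the file
     location is an assumption. Per the comment, the cluster must be
     restarted for the setting to take effect. -->
<property>
  <name>mapreduce.tasktracker.outofband.heartbeat</name>
  <value>true</value>
</property>
```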
[jira] [Commented] (MAPREDUCE-4400) Fix performance regression for small jobs/workflows
[ https://issues.apache.org/jira/browse/MAPREDUCE-4400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13415838#comment-13415838 ]

Shrinivas Joshi commented on MAPREDUCE-4400:

Hi Luke - In our experiments, your patch achieved the same performance effect that MAPREDUCE-4381 was aiming for. We noticed good performance gains on the Mahout KMeans clustering workload (~4%). It would be nice to get the branch-1 version of your change reviewed and checked in in the meantime. Thanks.
[jira] [Commented] (MAPREDUCE-4400) Fix performance regression for small jobs/workflows
[ https://issues.apache.org/jira/browse/MAPREDUCE-4400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409752#comment-13409752 ]

Tom White commented on MAPREDUCE-4400:

Maybe have a patch for trunk/branch-2 to bring the two into line then? I think it's good to minimize the number of differences where possible.
[jira] [Commented] (MAPREDUCE-4400) Fix performance regression for small jobs/workflows
[ https://issues.apache.org/jira/browse/MAPREDUCE-4400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409740#comment-13409740 ]

Luke Lu commented on MAPREDUCE-4400:

Thanks for the pointer to MAPREDUCE-3809, Tom. IMO, this patch is slightly better as it minimizes synchronization.
[jira] [Commented] (MAPREDUCE-4400) Fix performance regression for small jobs/workflows
[ https://issues.apache.org/jira/browse/MAPREDUCE-4400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409627#comment-13409627 ]

Tom White commented on MAPREDUCE-4400:

The same issue was fixed in trunk and branch-2 in MAPREDUCE-3809 in much the same way. How about backporting that code to branch-1?
[jira] [Commented] (MAPREDUCE-4400) Fix performance regression for small jobs/workflows
[ https://issues.apache.org/jira/browse/MAPREDUCE-4400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406969#comment-13406969 ]

Luke Lu commented on MAPREDUCE-4400:

Thanks to John Poelman and Shreyas Subramanya of IBM BigInsights performance QA for noticing the issue and verifying my fix.