[jira] [Commented] (HIVE-7063) Optimize for the Top N within a Group use case
[ https://issues.apache.org/jira/browse/HIVE-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207019#comment-14207019 ]

Lefty Leverenz commented on HIVE-7063:

Pinging [~rhbutani]: No doc needed for this optimization?

Optimize for the Top N within a Group use case

Key: HIVE-7063
URL: https://issues.apache.org/jira/browse/HIVE-7063
Project: Hive
Issue Type: Bug
Reporter: Harish Butani
Assignee: Harish Butani
Fix For: 0.14.0
Attachments: HIVE-7063.1.patch, HIVE-7063.2.patch, HIVE-7063.3.patch

It is common to rank within a Group/Partition and then only return the Top N entries within each Group. With Streaming mode for Windowing, we should push the post filter on the rank into the Windowing processing as a Limit expression.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
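The idea in the issue description (ranking in streaming mode and applying the rank filter as a limit inside the windowing step) can be illustrated with a small sketch. This is not Hive's implementation: the function name `top_n_per_group`, its parameters, and the tuple row layout are hypothetical, and the input is assumed to arrive already sorted by partition key and order key.

```python
from itertools import groupby
from operator import itemgetter


def top_n_per_group(rows, partition_key, order_key, n):
    """Yield (row, rank) for rows whose RANK() within their partition is <= n.

    Hypothetical sketch: `rows` must already be sorted by
    (partition_key, order_key), as a windowing operator would receive them.
    """
    for _, group in groupby(rows, key=itemgetter(partition_key)):
        rank = 0           # rank of the current row; tied rows share a rank
        prev = object()    # sentinel that never equals a real order value
        for position, row in enumerate(group, start=1):
            if row[order_key] != prev:
                rank = position      # standard RANK(): gaps after ties
                prev = row[order_key]
            if rank > n:
                break                # limit reached: stop ranking this partition
            yield row, rank
```

Because rank never decreases within a partition, the loop can stop at the first row whose rank exceeds `n` instead of ranking every row and filtering afterwards.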
[jira] [Commented] (HIVE-7063) Optimize for the Top N within a Group use case
[ https://issues.apache.org/jira/browse/HIVE-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14048451#comment-14048451 ]

Lefty Leverenz commented on HIVE-7063:

No user doc for this?
[jira] [Commented] (HIVE-7063) Optimize for the Top N within a Group use case
[ https://issues.apache.org/jira/browse/HIVE-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047304#comment-14047304 ]

Harish Butani commented on HIVE-7063:

Thanks [~ashutoshc]. I have uploaded a patch addressing the issues you raised.
[jira] [Commented] (HIVE-7063) Optimize for the Top N within a Group use case
[ https://issues.apache.org/jira/browse/HIVE-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047368#comment-14047368 ]

Ashutosh Chauhan commented on HIVE-7063:

+1
[jira] [Commented] (HIVE-7063) Optimize for the Top N within a Group use case
[ https://issues.apache.org/jira/browse/HIVE-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047385#comment-14047385 ]

Hive QA commented on HIVE-7063:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12653080/HIVE-7063.3.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5671 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/630/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/630/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-630/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12653080
[jira] [Commented] (HIVE-7063) Optimize for the Top N within a Group use case
[ https://issues.apache.org/jira/browse/HIVE-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046696#comment-14046696 ]

Ashutosh Chauhan commented on HIVE-7063:

This is not going to optimize a limit applied after rank, as in the following:
{code}
select * from (
  select p_mfgr, rank() over(..) from part
) a limit 4;
{code}
Rather, this optimization targets rank with filter predicates. Given the semantics of rank, users are likely to write the query with a filter predicate, so this may not be an issue, but I think it's good to note it here so expectations are clear.
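The difference between the two query shapes in the comment above can be seen in a small sketch. It is only an illustration, not Hive code: the helper names `with_rank`, `global_limit`, and `rank_filter` are hypothetical, rows are assumed to be `(partition, order_value)` tuples already sorted by both fields, and the rank is the standard RANK() with gaps after ties.

```python
from itertools import groupby, islice


def with_rank(rows):
    """Append RANK() over (partition by row[0] order by row[1]) to each row."""
    for _, group in groupby(rows, key=lambda r: r[0]):
        rank, prev = 0, object()   # sentinel never equals a real value
        for position, row in enumerate(group, start=1):
            if row[1] != prev:
                rank, prev = position, row[1]
            yield row + (rank,)


def global_limit(rows, n):
    # Shape of the query in the comment: LIMIT applies to the whole result,
    # so only the first n ranked rows overall are returned.
    return list(islice(with_rank(rows), n))


def rank_filter(rows, n):
    # Shape the optimization targets: a predicate on the rank column,
    # which keeps the top n ranks of every partition.
    return [r for r in with_rank(rows) if r[2] <= n]
```

With two partitions of three rows each and `n = 2`, `global_limit` returns two rows in total, while `rank_filter` returns two rows per partition.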
[jira] [Commented] (HIVE-7063) Optimize for the Top N within a Group use case
[ https://issues.apache.org/jira/browse/HIVE-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046699#comment-14046699 ]

Harish Butani commented on HIVE-7063:

Yes, in your case we can optimize as though 'rank < 5' was specified. However, I cannot see a valid use case for writing a limit after a windowing expression; as you point out, the more common case is a predicate on rank.
[jira] [Commented] (HIVE-7063) Optimize for the Top N within a Group use case
[ https://issues.apache.org/jira/browse/HIVE-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046720#comment-14046720 ]

Ashutosh Chauhan commented on HIVE-7063:

Makes sense. I left a few comments on RB.
[jira] [Commented] (HIVE-7063) Optimize for the Top N within a Group use case
[ https://issues.apache.org/jira/browse/HIVE-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14038234#comment-14038234 ]

Hive QA commented on HIVE-7063:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12651239/HIVE-7063.2.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 5656 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_implicit_cast1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/519/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/519/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-519/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12651239
[jira] [Commented] (HIVE-7063) Optimize for the Top N within a Group use case
[ https://issues.apache.org/jira/browse/HIVE-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998389#comment-13998389 ]

Gopal V commented on HIVE-7063:

This would be exceptionally useful - I have seen at least two implementations of TOPN UDAFs for this.