[jira] [Commented] (HIVE-7063) Optimize for the Top N within a Group use case

2014-11-11 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207019#comment-14207019
 ] 

Lefty Leverenz commented on HIVE-7063:
--

Pinging [~rhbutani]:  No doc needed for this optimization?

 Optimize for the Top N within a Group use case
 --

 Key: HIVE-7063
 URL: https://issues.apache.org/jira/browse/HIVE-7063
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani
Assignee: Harish Butani
 Fix For: 0.14.0

 Attachments: HIVE-7063.1.patch, HIVE-7063.2.patch, HIVE-7063.3.patch


 It is common to rank within a Group/Partition and then only return the Top N 
 entries within each Group.
 With Streaming mode for Windowing, we should push the post filter on the rank 
 into the Windowing processing as a Limit expression.
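
The idea in the description can be illustrated with a minimal sketch. This is not the code from the attached patches; the function name `top_n_per_group` and the sample rows are hypothetical, and the sketch assumes the rows arrive already sorted by partition key and order value, as they would when a windowing operator processes them in streaming mode. Once the rank within a partition exceeds N, the remaining rows of that partition are skipped instead of being ranked and filtered afterwards.

```python
from itertools import groupby
from operator import itemgetter


def top_n_per_group(rows, key, order, n):
    """Yield (row, rank) for rows whose rank within their group is <= n.

    Assumption: `rows` is already sorted by (key, order).
    """
    for _, group in groupby(rows, key=key):
        rank = 0
        prev = object()  # sentinel that never equals a real order value
        for position, row in enumerate(group, start=1):
            value = order(row)
            if value != prev:
                rank = position  # rank() jumps ahead after ties
                prev = value
            if rank > n:
                break  # the pushed-down limit: stop processing this group
            yield row, rank


# Hypothetical sample data: (group, value), sorted by both columns.
sample = [("a", 10), ("a", 10), ("a", 20), ("a", 30),
          ("b", 5), ("b", 7), ("b", 9)]
top2 = list(top_n_per_group(sample, itemgetter(0), itemgetter(1), 2))
```

Because of ties, a group can contribute more or fewer than N rows; the sketch follows rank() semantics rather than row_number().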



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7063) Optimize for the Top N within a Group use case

2014-06-30 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14048451#comment-14048451
 ] 

Lefty Leverenz commented on HIVE-7063:
--

No user doc for this?



[jira] [Commented] (HIVE-7063) Optimize for the Top N within a Group use case

2014-06-29 Thread Harish Butani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047304#comment-14047304
 ] 

Harish Butani commented on HIVE-7063:
-

Thanks, [~ashutoshc]. I have uploaded a patch addressing the issues you raised.



[jira] [Commented] (HIVE-7063) Optimize for the Top N within a Group use case

2014-06-29 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047368#comment-14047368
 ] 

Ashutosh Chauhan commented on HIVE-7063:


+1



[jira] [Commented] (HIVE-7063) Optimize for the Top N within a Group use case

2014-06-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047385#comment-14047385
 ] 

Hive QA commented on HIVE-7063:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12653080/HIVE-7063.3.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5671 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/630/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/630/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-630/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12653080



[jira] [Commented] (HIVE-7063) Optimize for the Top N within a Group use case

2014-06-27 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046696#comment-14046696
 ] 

Ashutosh Chauhan commented on HIVE-7063:


This is not going to optimize a limit applied over rank, as in the following query:
{code}
select * from ( select p_mfgr, rank() over(..) from part) a limit 4;
{code}
Rather, this optimization targets rank combined with a filter predicate. Given 
the semantics of rank, users are likely to write such queries with a filter 
predicate, so this may not be an issue, but I think it's good to note it here 
so that expectations are clear.
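
To make the distinction concrete, here is a small sketch contrasting the two query shapes on in-memory data. It is only an illustration, not Hive code: the helper `ranked` and the sample rows are hypothetical, and the rows are assumed to be sorted by group and value. An outer limit is satisfied by any rows at all, while a filter on rank keeps the leading rows of every group.

```python
from itertools import groupby, islice
from operator import itemgetter


def ranked(rows):
    """Return (group, value, rank) triples; rows must be sorted by both columns."""
    out = []
    for _, group in groupby(rows, key=itemgetter(0)):
        rank, prev = 0, object()  # sentinel never equals a real value
        for position, row in enumerate(group, start=1):
            if row[1] != prev:
                rank, prev = position, row[1]
            out.append((row[0], row[1], rank))
    return out


# Hypothetical sample data: two groups of three rows each.
rows = [("m1", 1), ("m1", 2), ("m1", 3), ("m2", 1), ("m2", 2), ("m2", 3)]

# Shape 1: outer limit. Any 2 rows will do; here both come from group m1.
limited = list(islice(ranked(rows), 2))

# Shape 2: filter predicate on rank. Up to 2 rows from every group.
filtered = [r for r in ranked(rows) if r[2] <= 2]
```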



[jira] [Commented] (HIVE-7063) Optimize for the Top N within a Group use case

2014-06-27 Thread Harish Butani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046699#comment-14046699
 ] 

Harish Butani commented on HIVE-7063:
-

Yes, in your case we can optimize as though 'rank < 5' had been specified. That 
said, I cannot see a valid use case for writing a limit after a windowing 
expression; as you point out, the more common case is a predicate on rank.



[jira] [Commented] (HIVE-7063) Optimize for the Top N within a Group use case

2014-06-27 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046720#comment-14046720
 ] 

Ashutosh Chauhan commented on HIVE-7063:


Makes sense. I left a few comments on RB.



[jira] [Commented] (HIVE-7063) Optimize for the Top N within a Group use case

2014-06-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14038234#comment-14038234
 ] 

Hive QA commented on HIVE-7063:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12651239/HIVE-7063.2.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 5656 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_implicit_cast1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/519/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/519/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-519/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12651239



[jira] [Commented] (HIVE-7063) Optimize for the Top N within a Group use case

2014-05-15 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998389#comment-13998389
 ] 

Gopal V commented on HIVE-7063:
---

This would be exceptionally useful - I have seen at least two implementations 
of TOPN UDAFs for this.
