[jira] [Commented] (HIVE-7659) Unnecessary sort in query plan

2014-08-14 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096774#comment-14096774
 ] 

Rui Li commented on HIVE-7659:
--

After some research, I found that the unnecessary sort is mainly introduced when we 
generate the GBY operator. This patch ignores the sort order in the RS if the partition 
keys, sorting keys and grouping keys are the same. Otherwise, e.g. in the case of 
DISTINCT or data skew, we apply the sort shuffle according to the sort order so 
that the query produces correct results.
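
The rule described in this comment could be sketched roughly as follows. This is a minimal illustration, not the code in the attached patch: the class ShuffleChoice, the enum ShuffleKind and the method choose are hypothetical names, keys are modelled as plain lists of column names, and treating an empty sort order as "no sort needed" is an assumption taken from the issue description.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names come from the Hive code base.
public class ShuffleChoice {
    enum ShuffleKind { GROUP, SORT }

    // Returns SORT only when the sort order recorded in the RS has to be honoured.
    static ShuffleKind choose(List<String> partitionKeys,
                              List<String> sortKeys,
                              List<String> groupingKeys,
                              String sortOrder) {
        // Assumption: an empty sort order means no sort was requested at all.
        if (sortOrder == null || sortOrder.isEmpty()) {
            return ShuffleKind.GROUP;
        }
        // Plain GROUP BY: all three key lists match, so the order can be ignored.
        boolean sameKeys = partitionKeys.equals(sortKeys)
                && sortKeys.equals(groupingKeys);
        return sameKeys ? ShuffleKind.GROUP : ShuffleKind.SORT;
    }

    public static void main(String[] args) {
        List<String> key = Arrays.asList("key");
        // select key from table group by key
        System.out.println(choose(key, key, key, "+"));
        // A DISTINCT-like case where the sort keys carry an extra column
        System.out.println(choose(key, Arrays.asList("key", "value"), key, "++"));
    }
}
```

In this sketch the simple group-by query falls into the GROUP branch even though its sort order is '+', while the second call keeps the sort shuffle because the sorting keys differ from the partition and grouping keys.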

 Unnecessary sort in query plan
 --

 Key: HIVE-7659
 URL: https://issues.apache.org/jira/browse/HIVE-7659
 Project: Hive
  Issue Type: Improvement
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-7659-spark.patch


 For Hive on Spark.
 Currently we rely on the sort order in the RS to decide whether we need a 
 sortByKey transformation. However, a simple GROUP BY query will also have the 
 sort order set to '+'.
 Consider the query: select key from table group by key. The RS in the map 
 work will have its sort order set to '+', thus requiring a sortByKey shuffle.
 To avoid the unnecessary sort, we should either find another way to decide 
 whether a sort shuffle is needed, or set the sort order only when a 
 sort is really needed.
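
To illustrate why the sort is unnecessary for the example query, here is a small sketch in plain Java, without Spark, that computes the result of "select key from table group by key" in two ways: once by sorting first and once by hashing only. The class and method names are hypothetical and the input rows are made-up sample values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; it only mimics the two strategies on an in-memory list.
public class GroupByWithoutSort {
    // Sort-based grouping: order the keys, then emit one row per run of equal keys.
    static List<String> groupBySorting(List<String> rows) {
        List<String> sorted = new ArrayList<>(rows);
        Collections.sort(sorted);
        List<String> out = new ArrayList<>();
        for (String k : sorted) {
            if (out.isEmpty() || !out.get(out.size() - 1).equals(k)) {
                out.add(k);
            }
        }
        return out;
    }

    // Hash-based grouping: equal keys collapse into one entry, no ordering involved.
    static Set<String> groupByHashing(List<String> rows) {
        return new HashSet<>(rows);
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("b", "a", "b", "c", "a");
        Set<String> viaSort = new HashSet<>(groupBySorting(rows));
        Set<String> viaHash = groupByHashing(rows);
        // Both strategies yield the same set of groups.
        System.out.println(viaSort.equals(viaHash));
    }
}
```

Both strategies return the same groups; the sorted variant only adds an ordering that a plain group-by does not ask for.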



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7659) Unnecessary sort in query plan

2014-08-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096897#comment-14096897
 ] 

Hive QA commented on HIVE-7659:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12661682/HIVE-7659-spark.patch

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 5844 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
org.apache.hive.minikdc.TestJdbcWithMiniKdc.testConnection
org.apache.hive.minikdc.TestJdbcWithMiniKdc.testNegativeTokenAuth
org.apache.hive.minikdc.TestJdbcWithMiniKdc.testProxyAuth
org.apache.hive.minikdc.TestJdbcWithMiniKdc.testTokenAuth
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/41/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/41/console
Test logs: 
http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-41/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12661682



[jira] [Commented] (HIVE-7659) Unnecessary sort in query plan [Spark Branch]

2014-08-14 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098077#comment-14098077
 ] 

Brock Noland commented on HIVE-7659:


+1 pending tests

 Unnecessary sort in query plan [Spark Branch]
 -

 Key: HIVE-7659
 URL: https://issues.apache.org/jira/browse/HIVE-7659
 Project: Hive
  Issue Type: Improvement
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-7659-spark.patch, HIVE-7659.2-spark.patch




[jira] [Commented] (HIVE-7659) Unnecessary sort in query plan [Spark Branch]

2014-08-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098146#comment-14098146
 ] 

Hive QA commented on HIVE-7659:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12661974/HIVE-7659.2-spark.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 5894 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_join_hash
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/43/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/43/console
Test logs: 
http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-43/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12661974
