[jira] [Commented] (HIVE-17868) Make queries in spark_local_queries.q have deterministic output

Andrew Sherman (JIRA) Fri, 20 Oct 2017 15:22:52 -0700

    [ https://issues.apache.org/jira/browse/HIVE-17868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213374#comment-16213374 ]


Andrew Sherman commented on HIVE-17868:
---------------------------------------

Thanks [~xuefuz] for the suggestion. I looked at --SORT_QUERY_RESULTS, and it 
seems to sort the output only after the query has run. 
So with a query like
{noformat}
select key, count(*) from src group by key limit 10
{noformat}
--SORT_QUERY_RESULTS will sort the output, but the rows are not sorted before 
the limit is applied, so which 10 rows survive the limit can still vary from 
run to run. That is not enough to make the query deterministic. But using 
{noformat}
select key, count(*) from src group by key order by key limit 10
{noformat}
is, I think, deterministic.
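
To illustrate the difference, here is a small Python sketch. It is not Hive code and only simulates the behaviour under stated assumptions: the helper names (group_count, limit_then_sort, sort_then_limit) are hypothetical, and the seeded shuffle merely stands in for whatever order the engine happens to emit the groups in.

```python
import random


def group_count(rows, seed):
    """Count rows per key and return the groups in an arbitrary order.

    The seeded shuffle is an assumption standing in for the
    engine-dependent ordering of 'group by' output.
    """
    counts = {}
    for key in rows:
        counts[key] = counts.get(key, 0) + 1
    groups = list(counts.items())
    random.Random(seed).shuffle(groups)
    return groups


def limit_then_sort(groups, n):
    """Apply the limit first, then sort only the surviving rows."""
    return sorted(groups[:n])


def sort_then_limit(groups, n):
    """Sort by key first, then apply the limit (like 'order by key limit n')."""
    return sorted(groups)[:n]


# 20 distinct keys, each appearing 5 times.
rows = [f"k{i % 20:02d}" for i in range(100)]

for seed in range(3):
    groups = group_count(rows, seed)
    print("limit then sort:", [k for k, _ in limit_then_sort(groups, 10)])
    print("sort then limit:", [k for k, _ in sort_then_limit(groups, 10)])
```

Sorting after the limit only tidies up whichever rows happened to get through, while sorting before the limit fixes which rows are chosen in the first place.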

The queries in spark_local_queries.q are very small and adding the 'order by' 
does not seem to make a significant difference to elapsed time.


> Make queries in spark_local_queries.q have deterministic output
> ---------------------------------------------------------------
>
>                 Key: HIVE-17868
>                 URL: https://issues.apache.org/jira/browse/HIVE-17868
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Andrew Sherman
>            Assignee: Andrew Sherman
>
> Add 'order by' to queries so that output is always the same



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
