[ https://issues.apache.org/jira/browse/HIVE-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16046080#comment-16046080 ]

Vineet Garg commented on HIVE-6348:
-----------------------------------

[~ashutoshc] The plan generated after the subquery remove rule/de-correlation 
doesn't contain a HiveSortLimit stacked on another HiveSortLimit. For example, 
for the query {code:sql} select * from part where p_size IN (select p_size from 
part p where p.p_type <> part.p_name order by p_size) {code} the plan just after 
decorrelation looks like this:
{code:sql}
HiveProject(p_partkey=[$0], p_name=[$1], p_mfgr=[$2], p_brand=[$3], p_type=[$4], p_size=[$5], p_container=[$6], p_retailprice=[$7], p_comment=[$8])
  HiveProject(p_partkey=[$0], p_name=[$1], p_mfgr=[$2], p_brand=[$3], p_type=[$4], p_size=[$5], p_container=[$6], p_retailprice=[$7], p_comment=[$8], BLOCK__OFFSET__INSIDE__FILE=[$9], INPUT__FILE__NAME=[$10], ROW__ID=[$11])
    LogicalJoin(condition=[AND(<>($1, $13), =($5, $12))], joinType=[inner])
      HiveTableScan(table=[[default.part]], table:alias=[part])
      HiveAggregate(group=[{0, 1}])
        HiveProject(p_size=[$0], p_type0=[$1])
          HiveProject(p_size=[$0], p_type0=[$13])
            HiveSortLimit(sort0=[$0], dir0=[ASC-nulls-first])
              HiveProject(p_size=[$5], p_partkey=[$0], p_name=[$1], p_mfgr=[$2], p_brand=[$3], p_type=[$4], p_size1=[$5], p_container=[$6], p_retailprice=[$7], p_comment=[$8], block__offset__inside__file=[$9], input__file__name=[$10], row__id=[$11], p_type0=[$4])
                LogicalFilter(condition=[IS NOT NULL($4)])
                  HiveTableScan(table=[[default.part]], table:alias=[p])
{code}
So there is a single sort limit, on the right side of the join, and it carries 
only a sort key, no limit. One possible rule: if the top project doesn't project 
any column/expression from the right side, remove the HiveSortLimit from the 
right side of the join.
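The proposed rule could be sketched roughly as follows. This is a minimal Python sketch on a hypothetical toy plan model. The classes `Scan`, `Sort`, `Aggregate`, `Join`, `Project` and the functions `width`, `strip_unlimited_sorts`, `remove_right_sort` are illustrative only and are not Hive or Calcite APIs. The join condition is omitted. The sketch also adds one assumption of its own, taken from the issue's "unless you use limit": a sort that carries a limit (`fetch`) is never removed, and neither is anything beneath it.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical toy plan nodes for illustration; NOT Hive/Calcite classes.
@dataclass
class Scan:
    table: str
    width: int  # number of output columns

@dataclass
class Sort:
    child: object
    keys: List[int]
    fetch: Optional[int] = None  # LIMIT; None means "sort only"

@dataclass
class Aggregate:
    child: object
    group: List[int]  # grouping column indices; also the output columns

@dataclass
class Join:  # join condition omitted in this toy model
    left: object
    right: object

@dataclass
class Project:
    child: object
    refs: List[int]  # one input column index per output column


def width(node) -> int:
    """Number of output columns of a toy plan node."""
    if isinstance(node, Scan):
        return node.width
    if isinstance(node, Sort):
        return width(node.child)
    if isinstance(node, Aggregate):
        return len(node.group)
    if isinstance(node, Join):
        return width(node.left) + width(node.right)
    if isinstance(node, Project):
        return len(node.refs)
    raise TypeError(f"unknown node: {node!r}")


def strip_unlimited_sorts(node):
    """Remove sorts without a LIMIT. A limited sort and its subtree are kept,
    since which rows survive the LIMIT may depend on its input order."""
    if isinstance(node, Sort):
        return node if node.fetch is not None else strip_unlimited_sorts(node.child)
    if isinstance(node, Aggregate):
        return Aggregate(strip_unlimited_sorts(node.child), node.group)
    if isinstance(node, Project):
        return Project(strip_unlimited_sorts(node.child), node.refs)
    if isinstance(node, Join):
        return Join(strip_unlimited_sorts(node.left), strip_unlimited_sorts(node.right))
    return node


def remove_right_sort(plan):
    """Sketch of the proposed rule: for Project over Join, if the project uses
    no right-side column, drop limit-free sorts from the join's right input."""
    if isinstance(plan, Project) and isinstance(plan.child, Join):
        join = plan.child
        left_width = width(join.left)
        if all(ref < left_width for ref in plan.refs):
            return Project(Join(join.left, strip_unlimited_sorts(join.right)), plan.refs)
    return plan


# Simplified shape of the plan above: 12 left columns, top project uses $0..$8.
right = Aggregate(Sort(Scan("part", 12), keys=[5]), group=[5, 4])
plan = Project(Join(Scan("part", 12), right), refs=list(range(9)))
print(remove_right_sort(plan).child.right)
# Aggregate(child=Scan(table='part', width=12), group=[5, 4])
```

If the top project referenced any column at index 12 or above (a right-side column), this sketch would leave the plan unchanged.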

> Order by/Sort by in subquery
> ----------------------------
>
>                 Key: HIVE-6348
>                 URL: https://issues.apache.org/jira/browse/HIVE-6348
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Gunther Hagleitner
>            Assignee: Rui Li
>            Priority: Minor
>              Labels: sub-query
>         Attachments: HIVE-6348.1.patch, HIVE-6348.2.patch, HIVE-6348.3.patch
>
>
> select * from (select * from foo order by c asc) bar order by c desc;
> in Hive sorts the data set twice. The optimizer should probably remove any 
> order by/sort by in the subquery unless you use 'limit'. It could even go so 
> far as barring it at the semantic level.
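The optimization suggested in the description could be sketched like this. It is a minimal Python sketch on a hypothetical toy plan whose classes `Scan`, `Sort`, `Subquery` and functions `_strip`, `drop_subquery_sorts` are illustrative, not Hive operators. It keeps the outermost ORDER BY, drops limit-free sorts nested in subqueries, and leaves alone any sort that has a limit, along with its input. As a conservative assumption of this sketch, a plan whose outermost sort carries a limit is returned unchanged.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical toy plan nodes for illustration; NOT Hive's operators.
@dataclass
class Scan:
    table: str

@dataclass
class Sort:
    child: object
    keys: List[Tuple[str, str]]  # (column, "asc" | "desc")
    fetch: Optional[int] = None  # LIMIT; None means "sort only"

@dataclass
class Subquery:
    child: object
    alias: str


def _strip(node):
    """Drop limit-free sorts from here down. A sort with a LIMIT (and its
    input) is kept, because the LIMIT depends on that order."""
    if isinstance(node, Sort):
        return node if node.fetch is not None else _strip(node.child)
    if isinstance(node, Subquery):
        return Subquery(_strip(node.child), node.alias)
    return node


def drop_subquery_sorts(root):
    """Keep the outermost ORDER BY and remove limit-free sorts nested below it.
    If the outermost sort has a LIMIT, the plan is returned unchanged."""
    if isinstance(root, Sort) and root.fetch is None:
        return Sort(_strip(root.child), root.keys)
    return _strip(root)


# select * from (select * from foo order by c asc) bar order by c desc;
query = Sort(Subquery(Sort(Scan("foo"), [("c", "asc")]), "bar"), [("c", "desc")])
print(drop_subquery_sorts(query))
# Sort(child=Subquery(child=Scan(table='foo'), alias='bar'), keys=[('c', 'desc')], fetch=None)
```

With this rewrite, the example query sorts the data set once instead of twice.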



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
