[jira] [Commented] (HIVE-11634) Support partition pruning for IN(STRUCT(partcol, nonpartcol..)...)

Jesus Camacho Rodriguez (JIRA) Wed, 30 Sep 2015 03:10:36 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-11634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14936666#comment-14936666
 ]


Jesus Camacho Rodriguez commented on HIVE-11634:
------------------------------------------------

[~hsubramaniyan], the latest version of the patch does not seem to work 
properly in some cases.

Some remarks:
* The changes to groupby_cube1.q do not seem to be part of this patch?
* In pcs.q.out, the query at line 666:
{noformat}
explain extended select a.ds, b.key from pcs_t1 a, pcs_t1 b where struct(a.ds, 
a.key, b.ds) in (struct('2000-04-08',1, '2000-04-09'), struct('2000-04-09',2, 
'2000-04-08'))
{noformat}
The additional predicate is not derived, and thus partition pruning does not 
happen: we read partitions '2000-04-08', '2000-04-09', and '2000-04-10'. Any 
idea why this is happening? Could you check that case?
* We still do not seem to be properly removing the predicates used for 
partition pruning from the Filter predicates, e.g. in pointlookup2.q.out 
or pointlookup3.q.out. I think this patch should take care of that too?
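
To make the expected derivation for the pcs.q.out query concrete, here is a minimal Python sketch (not the patch's Java code). It projects each tuple of the IN list onto the partition columns of each table alias. The function names `derive_partition_predicates` and `render` are hypothetical, and treating a.ds and b.ds as the only partition columns is an assumption based on the query above.

```python
def derive_partition_predicates(columns, tuples, partition_columns):
    """Project IN-list tuples onto the partition columns of each table alias.

    columns: column references inside struct(...), e.g. ["a.ds", "a.key", "b.ds"]
    tuples: constant tuples of the IN list, in the same column order
    partition_columns: set of column references that are partition columns
    Returns {alias: (partition column refs, distinct projected tuples)}.
    """
    # Group the positions of partition columns by table alias.
    positions = {}
    for idx, col in enumerate(columns):
        if col in partition_columns:
            alias = col.split(".", 1)[0]
            positions.setdefault(alias, []).append(idx)

    derived = {}
    for alias, idxs in positions.items():
        projected = []
        for t in tuples:
            p = tuple(t[i] for i in idxs)
            if p not in projected:  # keep first-seen order, drop duplicates
                projected.append(p)
        derived[alias] = ([columns[i] for i in idxs], projected)
    return derived


def render(derived):
    """Render the derived predicates as a conjunction of struct IN clauses."""
    clauses = []
    for cols, values in derived.values():
        lhs = "struct(" + ", ".join(cols) + ")"
        rhs = ", ".join(
            "struct(" + ", ".join(repr(v) for v in vals) + ")" for vals in values
        )
        clauses.append(lhs + " IN (" + rhs + ")")
    return " AND ".join(clauses)


derived = derive_partition_predicates(
    ["a.ds", "a.key", "b.ds"],
    [("2000-04-08", 1, "2000-04-09"), ("2000-04-09", 2, "2000-04-08")],
    {"a.ds", "b.ds"},
)
print(render(derived))
```

Under these assumptions, neither derived predicate mentions '2000-04-10', so that partition would not need to be read for either alias.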

In addition, there is a case that was added to PointLookupOptimizer that this 
patch does not seem to cover. Observe the change at line 179 of 
pointlookup.q.out: we were prepending a new conjunction to the original 
predicate for non-partition columns if doing so reduced the NDV (number of 
distinct values) in the IN clause. Do you think it would be easy to extend 
your patch to cover this case too?
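
As a rough illustration of that case, the Python sketch below shows one possible reading of "reducing the NDV": a non-partition column qualifies when it has fewer distinct values than the IN list has tuples. Both this criterion and the function name `extract_column_predicates` are assumptions for illustration, not the actual PointLookupOptimizer logic.

```python
def extract_column_predicates(columns, tuples, partition_columns):
    """Return (column, distinct values) pairs for non-partition columns
    whose distinct-value count is smaller than the number of IN-list tuples.

    Assumption: "reducing the NDV" means len(distinct) < len(tuples).
    """
    extracted = []
    for idx, col in enumerate(columns):
        if col in partition_columns:
            continue  # partition columns are handled by partition pruning
        distinct = []
        for t in tuples:
            if t[idx] not in distinct:  # keep first-seen order
                distinct.append(t[idx])
        if len(distinct) < len(tuples):
            extracted.append((col, distinct))
    return extracted


preds = extract_column_predicates(
    ["ds", "key", "value"],
    [("2000-04-08", 1, "x"), ("2000-04-09", 1, "y"), ("2000-04-10", 2, "z")],
    {"ds"},
)
print(preds)
```

Each returned pair would correspond to a conjunct such as `key IN (1, 2)` prepended to the original struct predicate.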

Thanks

> Support partition pruning for IN(STRUCT(partcol, nonpartcol..)...)
> ------------------------------------------------------------------
>
>                 Key: HIVE-11634
>                 URL: https://issues.apache.org/jira/browse/HIVE-11634
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO
>            Reporter: Hari Sankar Sivarama Subramaniyan
>            Assignee: Hari Sankar Sivarama Subramaniyan
>         Attachments: HIVE-11634.1.patch, HIVE-11634.2.patch, 
> HIVE-11634.3.patch, HIVE-11634.4.patch, HIVE-11634.5.patch, 
> HIVE-11634.6.patch, HIVE-11634.7.patch, HIVE-11634.8.patch, 
> HIVE-11634.9.patch, HIVE-11634.91.patch, HIVE-11634.92.patch, 
> HIVE-11634.93.patch, HIVE-11634.94.patch, HIVE-11634.95.patch, 
> HIVE-11634.96.patch
>
>
> Currently, we do not support partition pruning for the following scenario:
> {code}
> create table pcr_t1 (key int, value string) partitioned by (ds string);
> insert overwrite table pcr_t1 partition (ds='2000-04-08') select * from src 
> where key < 20 order by key;
> insert overwrite table pcr_t1 partition (ds='2000-04-09') select * from src 
> where key < 20 order by key;
> insert overwrite table pcr_t1 partition (ds='2000-04-10') select * from src 
> where key < 20 order by key;
> explain extended select ds from pcr_t1 where struct(ds, key) in 
> (struct('2000-04-08',1), struct('2000-04-09',2));
> {code}
> If we run the above query, we see that all the partitions of table pcr_t1 are 
> present in the filter predicate, whereas we could prune partition 
> (ds='2000-04-10'). 
> The optimization is to rewrite the above query into the following:
> {code}
> explain extended select ds from pcr_t1 where  (struct(ds)) IN 
> (struct('2000-04-08'), struct('2000-04-09')) and  struct(ds, key) in 
> (struct('2000-04-08',1), struct('2000-04-09',2));
> {code}
> The predicate (struct(ds)) IN (struct('2000-04-08'), struct('2000-04-09')) 
> is used by the partition pruner to prune the partitions that would otherwise 
> not be pruned.
> This is an extension of the idea presented in HIVE-11573.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
