[ 
https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010001#comment-13010001
 ] 

Ning Zhang commented on HIVE-2050:
----------------------------------

Note that this patch implements a simple API that takes a list of partition 
names rather than a range of partition names. My performance testing indicates 
that the bottleneck is not in the JDO query itself. The JDO query that gets 
the list of all MPartitions takes about 5 secs for a list of 20k partitions. 
However, converting these 20k MPartitions to Partitions took about 3 mins, and 
committing the transaction took another 3 mins. 

Note that converting MPartitions to Partitions and committing transactions are 
common operations. Even if we use JDO pushdown (HIVE-2048) or range queries, 
these costs remain. We need to optimize them away in the next step. 

> batch processing partition pruning process
> ------------------------------------------
>
>                 Key: HIVE-2050
>                 URL: https://issues.apache.org/jira/browse/HIVE-2050
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>         Attachments: HIVE-2050.patch
>
>
> For partition predicates that cannot be pushed down to JDO filtering 
> (HIVE-2049), we should fall back to the old approach of listing all partition 
> names first and using Hive's expression evaluation engine to select the 
> correct partitions. The partition pruner should then hand Hive a list of 
> partition names and get back a list of Partition objects (this should be 
> added to the Hive API). 
> A possible optimization is for the partition pruner to give Hive a set of 
> ranges of partition names (say [ts=01, ts=11], [ts=20, ts=24]), with the JDO 
> query formulated as range queries. Range queries are possible because the 
> first step lists all partition names in sorted order. It is easy to derive 
> the ranges, and the JDO range query results are guaranteed to be equivalent 
> to those of the query with a list of partition names. 
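
The range idea in the description above can be sketched roughly as follows. 
This is only an illustration, not code from the attached patch: the class 
PartitionRanges, the Range type, and the toRanges method are hypothetical 
names, and the sketch assumes the full list of partition names is already 
sorted and that the selected names come from the expression evaluation step. 

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Hive: collapses the partition names chosen
// by the pruner into inclusive ranges over the sorted list of all names.
final class PartitionRanges {

    // Inclusive range [low, high] of partition names (illustrative type).
    static final class Range {
        final String low;
        final String high;

        Range(String low, String high) {
            this.low = low;
            this.high = high;
        }

        @Override
        public String toString() {
            return "[" + low + ", " + high + "]";
        }
    }

    // sortedAll: every partition name of the table, in sorted order.
    // selected:  the names that passed the partition predicate.
    // Returns maximal runs of selected names that are adjacent in sortedAll,
    // so no unselected name falls inside any returned range.
    static List<Range> toRanges(List<String> sortedAll, Set<String> selected) {
        List<Range> ranges = new ArrayList<>();
        String low = null;
        String high = null;
        for (String name : sortedAll) {
            if (selected.contains(name)) {
                if (low == null) {
                    low = name;      // a new run starts here
                }
                high = name;         // extend the current run
            } else if (low != null) {
                ranges.add(new Range(low, high));  // an unselected name ends the run
                low = null;
            }
        }
        if (low != null) {
            ranges.add(new Range(low, high));      // close the trailing run
        }
        return ranges;
    }
}
```

Because each run is broken whenever an unselected name appears, a range query 
over [low, high] should match exactly the selected names in that run, provided 
the backing store compares names in the same order as the sorted list. That 
ordering requirement is an assumption of this sketch. 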
