[jira] [Commented] (HIVE-2050) batch processing partition pruning process

2011-03-29 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012635#comment-13012635
 ] 

Namit Jain commented on HIVE-2050:
--

+1

> batch processing partition pruning process
> --
>
> Key: HIVE-2050
> URL: https://issues.apache.org/jira/browse/HIVE-2050
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-2050.2.patch, HIVE-2050.3.patch, HIVE-2050.4.patch, 
> HIVE-2050.patch
>
>
> For partition predicates that cannot be pushed down to JDO filtering 
> (HIVE-2049), we should fall back to the old approach of listing all partition 
> names first and using Hive's expression evaluation engine to select the correct 
> partitions. Then the partition pruner should hand Hive a list of partition 
> names and get back a list of Partition objects (this should be added to the Hive 
> API). 
> A possible optimization is for the partition pruner to give Hive a 
> set of ranges of partition names (say [ts=01, ts=11], [ts=20, ts=24]), and 
> for the JDO query to be formulated as range queries. Range queries are 
> possible because the first step lists all partition names in sorted order. 
> It is easy to come up with a range, and the JDO range query results are 
> guaranteed to be equivalent to those of the query with a list of partition 
> names. 
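
The range optimization described above can be illustrated with a minimal sketch. This is not code from the patch: the function name `to_ranges` is hypothetical, and it assumes the full name list is already sorted in the same order the metastore would use for range comparisons. It collapses the selected partition names into inclusive ranges that only span names adjacent in the full list, so each range covers exactly the selected names.

```python
from typing import List, Sequence, Tuple


def to_ranges(all_names: Sequence[str],
              selected: Sequence[str]) -> List[Tuple[str, str]]:
    """Collapse selected partition names into inclusive (low, high) ranges.

    all_names is assumed to be sorted. A range is only extended across
    names that are consecutive in all_names, so no unselected partition
    falls inside any returned range.
    """
    wanted = set(selected)
    ranges: List[Tuple[str, str]] = []
    start = prev = None
    for name in all_names:
        if name in wanted:
            if start is None:
                start = name      # open a new range
            prev = name           # extend the current range
        elif start is not None:
            ranges.append((start, prev))  # an unselected name closes the range
            start = prev = None
    if start is not None:
        ranges.append((start, prev))      # close a range that runs to the end
    return ranges


# Hypothetical partition names ts=01 .. ts=30, with two selected runs.
all_names = [f"ts={i:02d}" for i in range(1, 31)]
selected = all_names[0:11] + all_names[19:24]
ranges = to_ranges(all_names, selected)
```

With these inputs the selected names collapse into the two ranges used as the example in the description, [ts=01, ts=11] and [ts=20, ts=24].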

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2050) batch processing partition pruning process

2011-03-29 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012543#comment-13012543
 ] 

Ning Zhang commented on HIVE-2050:
--

Updated the review board. My tests also passed.



[jira] [Commented] (HIVE-2050) batch processing partition pruning process

2011-03-28 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012339#comment-13012339
 ] 

Namit Jain commented on HIVE-2050:
--

Can you update the review board as well?



[jira] [Commented] (HIVE-2050) batch processing partition pruning process

2011-03-28 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012283#comment-13012283
 ] 

Namit Jain commented on HIVE-2050:
--

The test pcr.q is failing. Can you take a look?
The results look wrong.

> batch processing partition pruning process
> --
>
> Key: HIVE-2050
> URL: https://issues.apache.org/jira/browse/HIVE-2050
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-2050.2.patch, HIVE-2050.3.patch, HIVE-2050.patch


[jira] [Commented] (HIVE-2050) batch processing partition pruning process

2011-03-28 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012167#comment-13012167
 ] 

Namit Jain commented on HIVE-2050:
--

+1



[jira] [Commented] (HIVE-2050) batch processing partition pruning process

2011-03-28 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012149#comment-13012149
 ] 

Namit Jain commented on HIVE-2050:
--

Comments posted on the review board.

> batch processing partition pruning process
> --
>
> Key: HIVE-2050
> URL: https://issues.apache.org/jira/browse/HIVE-2050
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-2050.2.patch, HIVE-2050.patch


[jira] [Commented] (HIVE-2050) batch processing partition pruning process

2011-03-25 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011518#comment-13011518
 ] 

Namit Jain commented on HIVE-2050:
--

Based on an offline review, this may increase memory usage. We need to return
the partition names periodically to keep memory bounded.
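
One reading of the comment above is that names should be handed over in fixed-size batches, so that only one batch of partition objects is held in memory at a time. The sketch below illustrates that idea only; `fetch_in_batches`, the `fetch` callable, and `batch_size` are hypothetical names and not part of the Hive API or the patch.

```python
from typing import Callable, Iterator, List, Sequence


def fetch_in_batches(names: Sequence[str],
                     fetch: Callable[[List[str]], List[object]],
                     batch_size: int = 1000) -> Iterator[List[object]]:
    """Yield partition objects one batch of names at a time.

    fetch stands in for whatever call turns a list of partition names
    into partition objects. Because this is a generator, at most one
    batch of results is materialized by this function at any moment.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for i in range(0, len(names), batch_size):
        yield fetch(list(names[i:i + batch_size]))


# Usage with a stand-in fetch function that records the batch sizes it saw.
seen_batch_sizes: List[int] = []


def fake_fetch(batch: List[str]) -> List[object]:
    seen_batch_sizes.append(len(batch))
    return [name.upper() for name in batch]


names = [f"ts={i:02d}" for i in range(5)]
results = [p for batch in fetch_in_batches(names, fake_fetch, batch_size=2)
           for p in batch]
```

The trade-off is more round trips in exchange for a memory ceiling proportional to `batch_size` rather than to the total number of partitions.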

> batch processing partition pruning process
> --
>
> Key: HIVE-2050
> URL: https://issues.apache.org/jira/browse/HIVE-2050
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-2050.patch


[jira] [Commented] (HIVE-2050) batch processing partition pruning process

2011-03-22 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010001#comment-13010001
 ] 

Ning Zhang commented on HIVE-2050:
--

Note that this patch implements a simple API that passes a list of partition 
names rather than a range of partition names. My performance testing indicates 
that the bottleneck is not in the JDO query itself. The JDO query that gets 
the list of all MPartitions takes about 5 secs for a list of 20k partitions. 
However, converting these 20k MPartitions to Partitions took about 3 mins. 
Committing the transaction took another 3 mins. 

Note that converting MPartitions to Partitions and committing transactions are 
common operations. Even if we use JDO pushdown (HIVE-2048) or range queries, 
these costs are still there. We need to optimize these costs away in the next 
step. 
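
To put the figures above in proportion, here is a back-of-the-envelope calculation. It uses only the approximate numbers quoted in the comment (5 secs, 3 mins, 3 mins, 20k partitions), so the results are rough estimates rather than measurements.

```python
# Approximate timings quoted in the comment, for 20k partitions.
partitions = 20_000
jdo_query_s = 5        # JDO query listing all MPartitions
convert_s = 3 * 60     # converting MPartitions to Partitions
commit_s = 3 * 60      # committing the transaction

total_s = jdo_query_s + convert_s + commit_s

# Fraction of the total spent in each phase.
jdo_share = jdo_query_s / total_s
convert_share = convert_s / total_s
commit_share = commit_s / total_s

# Average cost per partition, in milliseconds.
convert_ms_per_partition = convert_s / partitions * 1000
jdo_ms_per_partition = jdo_query_s / partitions * 1000

print(f"total: {total_s} s")
print(f"JDO query share: {jdo_share:.1%}")
print(f"conversion share: {convert_share:.1%}")
print(f"conversion cost: about {convert_ms_per_partition:.0f} ms per partition")
```

On these numbers the JDO query accounts for under 2% of the total, while conversion and commit each take close to half, which is consistent with the conclusion that pushdown or range queries alone would not remove the dominant costs.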



[jira] [Commented] (HIVE-2050) batch processing partition pruning process

2011-03-22 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009993#comment-13009993
 ] 

Ning Zhang commented on HIVE-2050:
--

Passed all unit tests.
