[ 
https://issues.apache.org/jira/browse/DRILL-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600757#comment-14600757
 ] 

Steven Phillips commented on DRILL-3333:
----------------------------------------

When I was working on this, I had originally planned to simply include 
DRILL-1950 to handle the partition pruning. After discussing it with a couple 
of others, we decided instead to use the Partition Pruning rules, exposing 
the column information. The main reason for this choice is that it allows 
pruning when the filter expression contains any arbitrary Drill function on 
the data. This works because we use the Drill function interpreter to 
evaluate whether the partition can be pruned.
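
To make the idea concrete, here is a minimal sketch in Python (not Drill 
code). It assumes each file carries exactly one value per partition column, 
and it uses an ordinary Python callable as a stand-in for the Drill function 
interpreter. The file names, the `region` column and `prune_files` are all 
hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical metadata: file path -> the single value of each partition column.
PartitionValues = Dict[str, object]


def prune_files(
    files: Dict[str, PartitionValues],
    predicate: Callable[[PartitionValues], bool],
) -> List[str]:
    """Return the files whose partition values satisfy the filter.

    The predicate is evaluated once per file, not once per row, which is
    only valid because every row in a file shares the same partition values.
    """
    return [path for path, values in sorted(files.items()) if predicate(values)]


files = {
    "0_0_1.parquet": {"region": "emea"},
    "0_0_2.parquet": {"region": "apac"},
    "0_0_3.parquet": {"region": "amer"},
}

# The filter applies arbitrary functions to the partition column; the callable
# stands in for interpreting the filter expression against the constant value.
selected = prune_files(files, lambda v: str(v["region"]).upper().startswith("A"))
```

Because the predicate is evaluated rather than pattern-matched, any function 
that can be interpreted on a constant value can take part in pruning.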

The filter pushdown code in DRILL-1950 will of course still be useful for 
pruning more general data distributions. The pruning here was really designed 
specifically to work alongside the CTAS-partitioning feature, which uses 
single values for the partition columns.

> Add support for auto-partitioning in parquet writer
> ---------------------------------------------------
>
>                 Key: DRILL-3333
>                 URL: https://issues.apache.org/jira/browse/DRILL-3333
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Steven Phillips
>            Assignee: Steven Phillips
>         Attachments: DRILL-3333.patch, DRILL-3333.patch, 
> DRILL-3333_2015-06-22_15:22:11.patch, DRILL-3333_2015-06-23_17:38:32.patch
>
>
> When a table is created with a PARTITION BY clause, the parquet writer will 
> create separate files for the different partition values. The data will first 
> be sorted by the partition keys, and the parquet writer will create a new file 
> when it encounters a new value for the partition columns.
> When queries are run against data that was created this way, partition 
> pruning will work if the filter contains a partition column. And unlike 
> directory-based partitioning, no view is required, nor is it necessary to 
> reference the dir* column names.
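
The writer behaviour described in the issue can be sketched roughly as 
follows. This is an illustrative Python model, not the actual parquet writer: 
rows are plain dicts, a "file" is just a list of rows, and 
`write_partitioned` is a hypothetical name.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, List, Sequence

Row = Dict[str, object]


def write_partitioned(rows: Sequence[Row], partition_keys: List[str]) -> List[List[Row]]:
    """Sort rows by the partition keys, then start a new 'file' whenever
    the partition key values change."""
    key = itemgetter(*partition_keys)
    ordered = sorted(rows, key=key)  # stable sort keeps input order within a key
    # groupby splits the sorted stream at each change in the key value,
    # which mirrors "create a new file on a new partition value".
    return [list(group) for _, group in groupby(ordered, key=key)]


rows = [
    {"year": 2015, "amount": 10},
    {"year": 2014, "amount": 5},
    {"year": 2015, "amount": 7},
]
files = write_partitioned(rows, ["year"])
```

Each resulting file holds a single value for every partition column, which is 
what lets a filter on that column be checked once per file.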



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
