[ 
https://issues.apache.org/jira/browse/DRILL-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172040#comment-16172040
 ] 

ASF GitHub Bot commented on DRILL-5795:
---------------------------------------

GitHub user dprofeta opened a pull request:

    https://github.com/apache/drill/pull/949

    DRILL-5795: Parquet Filter push down at rowgroup level

    Before this commit, the filter could only prune complete files. When a
    file was composed of multiple rowgroups, the filter could not prune a
    single rowgroup from it. Now, when the filter finds that a rowgroup
    doesn't match, that rowgroup is removed from the scan.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dprofeta/drill drill-5795

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/949.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #949
    
----
commit eed3395647b10d06edf86ba4378995e9fd8da83d
Author: Damien Profeta <damien.prof...@amadeus.com>
Date:   2017-09-15T18:01:58Z

    Parquet Filter push down now works at rowgroup level
    
    Before this commit, the filter could only prune complete files. When a
    file was composed of multiple rowgroups, the filter could not prune a
    single rowgroup from it. Now, when the filter finds that a rowgroup
    doesn't match, that rowgroup is removed from the scan.

----


> Filter pushdown for parquet handles multi rowgroup file
> -------------------------------------------------------
>
>                 Key: DRILL-5795
>                 URL: https://issues.apache.org/jira/browse/DRILL-5795
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Parquet
>            Reporter: Damien Profeta
>
> DRILL-1950 implemented filter pushdown for parquet files, but only for the 
> case of one rowgroup per parquet file. When a file contains multiple 
> rowgroups, it detects that a rowgroup can be pruned but then tells the 
> drillbit to read the whole file, which leads to performance issues.
> Having multiple rowgroups per file helps to handle partitioned datasets and 
> still read only the relevant subset of data without ending up with more files 
> than really needed.
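
To make the difference between file-level and rowgroup-level pruning concrete, here is a minimal sketch in Python. It is not Drill's implementation: the RowGroup class, the min/max statistics fields, and both prune functions are hypothetical names introduced only for illustration, and the filter is assumed to be a simple inclusive range on one integer column.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowGroup:
    """Hypothetical stand-in for a Parquet rowgroup with min/max column statistics."""
    file: str
    index: int
    min_value: int
    max_value: int


def may_match(rg, low, high):
    """True if the rowgroup's [min, max] range overlaps the filter range [low, high]."""
    return rg.max_value >= low and rg.min_value <= high


def prune_by_file(rowgroups, low, high):
    """File-level pruning: a file is read in full if any of its rowgroups may match."""
    files_to_read = {rg.file for rg in rowgroups if may_match(rg, low, high)}
    return [rg for rg in rowgroups if rg.file in files_to_read]


def prune_by_rowgroup(rowgroups, low, high):
    """Rowgroup-level pruning: only rowgroups that may match stay in the scan."""
    return [rg for rg in rowgroups if may_match(rg, low, high)]


# Illustrative data only: two files, one of them with two rowgroups.
rowgroups = [
    RowGroup("a.parquet", 0, 0, 99),
    RowGroup("a.parquet", 1, 100, 199),
    RowGroup("b.parquet", 0, 200, 299),
]

# Filter assumed to be: value BETWEEN 120 AND 150
file_level = prune_by_file(rowgroups, 120, 150)
rowgroup_level = prune_by_rowgroup(rowgroups, 120, 150)
print([(rg.file, rg.index) for rg in file_level])
print([(rg.file, rg.index) for rg in rowgroup_level])
```

With file-level pruning, both rowgroups of a.parquet remain in the scan because one of them may match. With rowgroup-level pruning, only the second rowgroup of a.parquet is read.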



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
