GitHub user ppadma opened a pull request:
https://github.com/apache/drill/pull/597
DRILL-4905: Push down the LIMIT to the parquet reader scan.
For a LIMIT N query, where N is less than the current default record batch
size (256K rows when all columns are fixed-length, 32K rows otherwise), we
still end up reading all 256K/32K rows from disk if the row group has that
many rows. This causes performance degradation, especially when there is a
large number of columns.
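For example, a query of the following shape hits this path; the table
path here is hypothetical:

    SELECT * FROM dfs.`/data/wide_table.parquet` LIMIT 10;

Only 10 rows are needed, but the reader still fills a full default-sized
batch from disk.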
This fix addresses the problem by changing the record batch size the
parquet record reader uses, so that we don't read more rows than needed.
It also adds a system option (store.parquet.record_batch_size) for setting
the record batch size explicitly.
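As a sketch of how the option could be used (the option name is the one
added by this patch; the value shown is illustrative, assuming it is a
row count):

    ALTER SESSION SET `store.parquet.record_batch_size` = 4096;

The LIMIT pushdown itself caps the batch size automatically, as described
above; the option is for setting it by hand.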
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ppadma/drill DRILL-4905
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/drill/pull/597.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #597
----
commit cd665ebdba11f8685ba446f5ec535c81ddd6edc7
Author: Padma Penumarthy <[email protected]>
Date: 2016-09-26T17:51:07Z
DRILL-4905: Push down the LIMIT to the parquet reader scan to limit the
number of records read
----