[
https://issues.apache.org/jira/browse/DRILL-7763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17161021#comment-17161021
]
ASF GitHub Bot commented on DRILL-7763:
---------------------------------------
vvysotskyi commented on pull request #2092:
URL: https://github.com/apache/drill/pull/2092#issuecomment-660886165
@cgivre, similar functionality already exists for Parquet:
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ParquetRecordReader.java#L53.
Please take a look at the
[`AbstractGroupScanWithMetadata.applyLimit()`](https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScanWithMetadata.java#L453)
method: it contains the logic required for pruning files. Your changes
override it, which breaks this functionality. To coexist with this
feature, please look at the implementation of this method for the
Parquet group scan, move the common logic to AbstractGroupScanWithMetadata, and
use it in the easy group scan.
With the metastore in use, the behavior is as follows: if, for example, we
have 10 files with 100 records each and a query with limit 5 is submitted, only
a single file is left in the group scan, since it contains all the required
records.
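As a rough illustration of the pruning described above, here is a minimal
standalone sketch. It is not Drill's actual `applyLimit()` implementation:
`LimitPruningSketch`, `FileInfo`, and `pruneForLimit` are hypothetical names,
and it assumes exact per-file row counts are available from metadata.

```java
import java.util.ArrayList;
import java.util.List;

public class LimitPruningSketch {

  /** Hypothetical stand-in for per-file metadata; not a Drill class. */
  public static final class FileInfo {
    public final String path;
    public final long rowCount; // assumed to be an exact count taken from metadata

    public FileInfo(String path, long rowCount) {
      this.path = path;
      this.rowCount = rowCount;
    }
  }

  /**
   * Keeps files, in order, until their cumulative row count reaches the limit.
   * If all files together hold fewer rows than the limit, every file is kept.
   */
  public static List<FileInfo> pruneForLimit(List<FileInfo> files, long limit) {
    List<FileInfo> kept = new ArrayList<>();
    long rows = 0;
    for (FileInfo file : files) {
      kept.add(file);
      rows += file.rowCount;
      if (rows >= limit) {
        break; // the files kept so far already cover the limit
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    // The example from the comment: 10 files, 100 records each, LIMIT 5.
    List<FileInfo> files = new ArrayList<>();
    for (int i = 0; i < 10; i++) {
      files.add(new FileInfo("file_" + i + ".csv", 100));
    }
    List<FileInfo> kept = pruneForLimit(files, 5);
    System.out.println(kept.size() + " of " + files.size() + " files kept");
    // prints "1 of 10 files kept"
  }
}
```

The sketch only shows the file-level pruning; limiting rows inside the file
that remains would still be up to the reader.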
> Add Limit Pushdown to File Based Storage Plugins
> ------------------------------------------------
>
> Key: DRILL-7763
> URL: https://issues.apache.org/jira/browse/DRILL-7763
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.17.0
> Reporter: Charles Givre
> Assignee: Charles Givre
> Priority: Major
> Fix For: 1.18.0
>
>
> As currently implemented, when querying a file, Drill reads the entire
> file even if a limit is specified in the query. This PR does two things:
> # Refactors EasyGroupScan, EasySubScan, and EasyFormatConfig to allow
> the option of pushing down limits.
> # Applies this to all the EVF-based format plugins: LogRegex,
> PCAP, SPSS, Esri, Excel, and Text (CSV).
> Due to JSON's fluid schema, it would be unwise to adopt limit pushdown for
> it, as it could result in very inconsistent schemata.