[ https://issues.apache.org/jira/browse/SPARK-37933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-37933:
------------------------------------

    Assignee: Jackey Lee  (was: Apache Spark)

> Limit push down for parquet datasource v2
> -----------------------------------------
>
>                 Key: SPARK-37933
>                 URL: https://issues.apache.org/jira/browse/SPARK-37933
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Jackey Lee
>            Assignee: Jackey Lee
>            Priority: Major
>             Fix For: 3.3.0
>
>
> Based on SPARK-37020, we can support limit push down to the parquet datasource v2 reader. This lets the scan stop reading parquet early, reducing network and disk IO.
> Current plan for a limit query over parquet:
> {code:java}
> == Parsed Logical Plan ==
> GlobalLimit 10
> +- LocalLimit 10
>    +- RelationV2[a#0, b#1] parquet file:/datasources.db/test_push_down
>
> == Analyzed Logical Plan ==
> a: int, b: int
> GlobalLimit 10
> +- LocalLimit 10
>    +- RelationV2[a#0, b#1] parquet file:/datasources.db/test_push_down
>
> == Optimized Logical Plan ==
> GlobalLimit 10
> +- LocalLimit 10
>    +- RelationV2[a#0, b#1] parquet file:/datasources.db/test_push_down
>
> == Physical Plan ==
> CollectLimit 10
> +- *(1) ColumnarToRow
>    +- BatchScan[a#0, b#1] ParquetScan DataFilters: [], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/datasources.db/test_push_down/par..., PartitionFilters: [], PushedAggregation: [], PushedFilters: [], PushedGroupBy: [], ReadSchema: struct<a:int,b:int> RuntimeFilters: []
> {code}

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
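For context, the effect the ticket describes (stopping the scan early instead of reading everything and trimming afterwards) can be sketched in plain Python. This is not Spark code; `row_groups` is an illustrative stand-in for parquet row groups, and the function names are hypothetical:

```python
def scan_without_pushdown(row_groups, limit):
    """No pushdown: read every row group, then apply the limit afterwards."""
    rows = [row for group in row_groups for row in group]  # full scan
    return rows[:limit]

def scan_with_pushdown(row_groups, limit):
    """Pushdown: stop reading as soon as `limit` rows have been produced."""
    rows = []
    for group in row_groups:
        for row in group:
            rows.append(row)
            if len(rows) == limit:
                return rows  # remaining row groups are never opened
    return rows

row_groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
# Both return the same answer; only the amount of data read differs.
print(scan_without_pushdown(row_groups, 4))  # reads 9 rows
print(scan_with_pushdown(row_groups, 4))     # reads 4 rows
```

In the physical plan above, the `CollectLimit 10` sits on top of a `BatchScan` that knows nothing about the limit, which corresponds to the "without pushdown" case; the improvement is to hand the limit to the scan itself.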