[ https://issues.apache.org/jira/browse/SPARK-40775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan reassigned SPARK-40775:
-----------------------------------

    Assignee: Adam Binford

> V2 file scans have duplicative descriptions
> -------------------------------------------
>
>                 Key: SPARK-40775
>                 URL: https://issues.apache.org/jira/browse/SPARK-40775
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Adam Binford
>            Assignee: Adam Binford
>            Priority: Major
>
> V2 file scans have duplication in the description. This is because FileScan
> uses the metadata to create the description, but each file type overrides
> both the metadata and the description, adding the same metadata a second time.
> Example from a Parquet aggregate pushdown explain:
> {{ *+- BatchScan parquet file:/...[min(_3)#814, max(_3)#815, min(_1)#816,
> max(_1)#817, count(*)#818L, count(_1)#819L, count(_2)#820L, count(_3)#821L]
> ParquetScan DataFilters: [], Format: parquet, Location: InMemoryFileIndex(1
> paths)[file:/..., PartitionFilters: [], PushedAggregation: [MIN(_3), MAX(_3),
> MIN(_1), MAX(_1), COUNT(*), COUNT(_1), COUNT(_2), COUNT(_3)], PushedFilters:
> [], PushedGroupBy: [], ReadSchema:
> struct<min(_3):int,max(_3):int,min(_1):int,max(_1):int,count(*):bigint,count(_1):bigint,count(_2)...,
> PushedFilters: [], PushedAggregation: [MIN(_3), MAX(_3), MIN(_1), MAX(_1),
> COUNT(*), COUNT(_1), COUNT(_2), COUNT(_3)], PushedGroupBy: [] RuntimeFilters:
> []*}}
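
The duplication described in the issue can be illustrated with a small model. This is a simplified sketch in Python, not Spark's Scala code: the class names `FileScan` and `ParquetScan` mirror the ones mentioned above, but the method bodies, the metadata keys chosen, and `FixedParquetScan` are hypothetical stand-ins. The last class shows one possible way to avoid the repetition, which is not necessarily the fix adopted for this issue.

```python
# Simplified model of the duplication. These classes are hypothetical
# stand-ins and do not reproduce Spark's actual implementation.

class FileScan:
    def metadata(self):
        # Metadata common to every file-based scan (illustrative keys only).
        return {"Format": "parquet", "PartitionFilters": "[]"}

    def description(self):
        # The base description is already built from metadata().
        pairs = ", ".join(f"{k}: {v}" for k, v in sorted(self.metadata().items()))
        return f"{type(self).__name__} {pairs}"


class ParquetScan(FileScan):
    def metadata(self):
        # The file-type subclass adds its own entries to the metadata...
        extra = {"PushedFilters": "[]", "PushedGroupBy": "[]"}
        return {**super().metadata(), **extra}

    def description(self):
        # ...and also appends the same entries to the description,
        # so each of them is printed twice.
        return super().description() + ", PushedFilters: [], PushedGroupBy: []"


class FixedParquetScan(FileScan):
    def metadata(self):
        extra = {"PushedFilters": "[]", "PushedGroupBy": "[]"}
        return {**super().metadata(), **extra}
    # No description() override: the base class already renders metadata().


desc = ParquetScan().description()
fixed = FixedParquetScan().description()
print(desc)
print(fixed)
```

In this model, overriding only `metadata()` is enough for the extra entries to appear once, because the base `description()` reads whatever `metadata()` returns.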