gengliangwang commented on issue #24327: [SPARK-27418][SQL] Migrate Parquet to File Data Source V2
URL: https://github.com/apache/spark/pull/24327#issuecomment-497913250

@dongjoon-hyun I think Spark needs to read the actual physical schema to get the exact names and data types for pushing down filters. If the names or data types do not match when the filters are pushed down, it might cause a regression. @rdblue has explained this in https://github.com/apache/spark/pull/21696#discussion_r199979463. With the current DSV2 design, I think we have to implement Parquet V2 this way. Suggestions are welcome.
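
To illustrate the concern, here is a minimal sketch in plain Python, not Spark code. `Filter`, `resolve_filters`, and the dict-based `physical_schema` are hypothetical names invented for this example, and Python types stand in for Parquet types. It assumes a filter is only safe to push down when its column resolves to exactly one physical field and the literal's type matches that field's type.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Filter:
    """Hypothetical stand-in for a pushed-down predicate."""
    column: str    # name as written in the query or catalog schema
    op: str        # comparison operator, e.g. ">" or "="
    value: object  # literal to compare against


def resolve_filters(
    filters: List[Filter],
    physical_schema: Dict[str, type],
    case_sensitive: bool = False,
) -> List[Filter]:
    """Return only the filters that are safe to push down,
    rewritten to use the exact physical field names."""
    # Index physical field names by their lookup key so that
    # case-insensitive duplicates (e.g. "a" and "A") can be detected.
    by_key: Dict[str, List[str]] = {}
    for name in physical_schema:
        key = name if case_sensitive else name.lower()
        by_key.setdefault(key, []).append(name)

    pushed = []
    for f in filters:
        key = f.column if case_sensitive else f.column.lower()
        matches = by_key.get(key, [])
        if len(matches) != 1:
            # Field is missing from the file or the name is ambiguous:
            # skip pushdown and leave the filter to be evaluated later.
            continue
        physical_name = matches[0]
        if type(f.value) is not physical_schema[physical_name]:
            # Literal type differs from the physical type: skip pushdown.
            continue
        pushed.append(Filter(physical_name, f.op, f.value))
    return pushed


# Example: the file stores "ID" (int) and "name" (str).
physical = {"ID": int, "name": str}
requested = [Filter("id", ">", 5), Filter("name", "=", 3), Filter("age", "<", 30)]
safe = resolve_filters(requested, physical)
```

In this sketch only the `id` filter survives, rewritten to the physical name `ID`. The `name` filter is dropped because of the type mismatch and the `age` filter because the field is absent. None of this resolution is possible without first reading the physical schema of the file.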