[GitHub] [spark] peter-toth opened a new pull request, #38885: [SPARK-41367][SQL] Enable V2 file tables in read paths in session catalog

GitBox Fri, 02 Dec 2022 05:27:42 -0800


peter-toth opened a new pull request, #38885:
URL: https://github.com/apache/spark/pull/38885


   ### What changes were proposed in this pull request?
   Currently the config `spark.sql.sources.useV1SourceList` has no effect on 
V2 file tables in the session catalog: the V1 path is always used. This PR 
enables V2 file tables in read paths via the session catalog and fixes a few 
issues where V2 behaves differently from V1.
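   
   As a rough illustration of what the config is meant to control, here is a 
toy model in plain Python. It is not Spark code: the function name 
`falls_back_to_v1` is hypothetical, and the sketch assumes the config value is 
a comma-separated, case-insensitive list of data source short names.

```python
def falls_back_to_v1(source_name: str, use_v1_source_list: str) -> bool:
    """Toy model (assumption, not Spark's implementation): return True when
    the named source is listed in the V1 source list and so should use V1."""
    # Split the comma-separated value, ignoring blanks and letter case.
    v1_sources = {
        name.strip().lower()
        for name in use_v1_source_list.split(",")
        if name.strip()
    }
    return source_name.strip().lower() in v1_sources


# Hypothetical config values, chosen only for illustration.
print(falls_back_to_v1("parquet", "avro,csv"))      # False: V2 path is eligible
print(falls_back_to_v1("Parquet", "orc, parquet"))  # True: stays on V1
print(falls_back_to_v1("parquet", ""))              # False: empty list
```

   Before this PR, session catalog file tables always took the V1 path, 
whatever such a check would return.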
   
   ### Why are the changes needed?
   It would be good if we could use the already available V2 file source 
implementations with the session catalog. We ran into a few problems with V2 
optimization paths that we want to fix in the future. But currently Spark 
doesn't have built-in catalog support for any of the V2 file table 
implementations. As a first step, this PR enables V2 for select query plans 
only. All commands and `InsertIntoStatement` continue to use the V1 
implementations.
   
   The PR also contains some test changes:
   - `SQLQuerySuite` is split into V1 and V2 versions.
   - V2 versions of `OrcPartitionDiscoverySuite` and 
`ParquetPartitionDiscoverySuite` are modified to behave like the V1 versions 
do. Basically the order of output columns changed in the edge case when 
partitioning and data columns overlap.
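   
   The PR text does not spell out the two column orders, so the following is 
only a sketch under an assumption: the V1 path keeps overlapping columns at 
their data-schema position and appends the remaining partition columns, while 
the V2 path previously dropped overlapping columns from the data part and 
appended all partition columns. The function names and the example schema are 
hypothetical.

```python
def v1_style_order(data_cols, partition_cols):
    """Assumed V1 ordering: data columns keep their positions and
    partition-only columns are appended at the end."""
    data_set = set(data_cols)
    return list(data_cols) + [c for c in partition_cols if c not in data_set]


def v2_style_order(data_cols, partition_cols):
    """Assumed earlier V2 ordering: overlapping columns are removed from the
    data part and every partition column is appended at the end."""
    partition_set = set(partition_cols)
    return [c for c in data_cols if c not in partition_set] + list(partition_cols)


# Hypothetical schema in which column "p" is both a data and a partition column.
data_cols = ["id", "p", "value"]
partition_cols = ["p", "q"]

print(v1_style_order(data_cols, partition_cols))  # ['id', 'p', 'value', 'q']
print(v2_style_order(data_cols, partition_cols))  # ['id', 'value', 'p', 'q']
```

   The two orders differ only when the data and partition schemas share a 
column; without overlap both functions return the same list.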
   
   ### Does this PR introduce _any_ user-facing change?
   Yes: the order of output columns changes when partitioning and data 
columns overlap.
   
   ### How was this patch tested?
   Existing and new UTs.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

