cloud-fan commented on PR #38640:
URL: https://github.com/apache/spark/pull/38640#issuecomment-1325947062

   > Actually, I'm happy to work on making parquet v2 tables available in a 
separate ticket/PR if you can give me some guidance.
   
   I tried to do this a long time ago but failed because of some design issues. 
We need to fully understand the use cases of `CREATE TABLE ... USING v1Source` 
and see how to make them work for v2 sources:
   1. Just a name mapping, so that people can use a table name instead of 
providing all the data source information every time. The JDBC data source is 
a good example.
   2. Schema cache. Inferring the data schema may be expensive for the data 
source, so it needs the Spark catalog to cache the schema. The file source is 
a good example here.
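
To make the two use cases concrete, here is a minimal pure-Python sketch. It is a toy model, not Spark code: `ToyCatalog`, `TableEntry`, `infer_from_files`, the table names, and the option values are all hypothetical and only illustrate the difference between a plain name mapping and a cached schema.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class TableEntry:
    """Hypothetical catalog entry; this is not a Spark class."""
    provider: str
    options: Dict[str, str]
    # None means the catalog is only a name mapping (use case 1).
    cached_schema: Optional[List[str]] = None


class ToyCatalog:
    """Hypothetical stand-in for the Spark catalog."""

    def __init__(self) -> None:
        self._tables: Dict[str, TableEntry] = {}

    def create_table(
        self,
        name: str,
        provider: str,
        options: Dict[str, str],
        infer_schema: Optional[Callable[[Dict[str, str]], List[str]]] = None,
    ) -> None:
        entry = TableEntry(provider, dict(options))
        if infer_schema is not None:
            # Use case 2: pay the inference cost once and store the result.
            entry.cached_schema = infer_schema(entry.options)
        self._tables[name] = entry

    def lookup(self, name: str) -> TableEntry:
        # Lookups never trigger inference; they return what was stored.
        return self._tables[name]


inference_calls: List[str] = []


def infer_from_files(options: Dict[str, str]) -> List[str]:
    # Stand-in for an expensive scan of the files under the path.
    inference_calls.append(options["path"])
    return ["id", "ts"]


catalog = ToyCatalog()
# Use case 1: the name "orders" just stands for the connection options.
catalog.create_table(
    "orders", "jdbc",
    {"url": "jdbc:postgresql://db.example/shop", "dbtable": "orders"},
)
# Use case 2: the schema is inferred once at creation time and cached.
catalog.create_table(
    "events", "parquet", {"path": "/data/events"},
    infer_schema=infer_from_files,
)
catalog.lookup("events")
catalog.lookup("events")
```

In this toy model the JDBC-style entry stores no schema at all, while the file-style entry runs inference once at creation time and serves every later lookup from the cache.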
   
   We also need to think about the semantics of `ALTER TABLE`, `REFRESH TABLE`, 
`df.write.mode("overwrite").saveAsTable`, etc.
   
   Some code references: for v1 sources, we have a rule `FindDataSourceTable` 
that resolves tables backed by a v1 source. For v2 sources, we should probably 
have a similar rule that resolves a v2 source to a `TableProvider`.
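
As a rough illustration of the dispatch such a rule would perform, here is a toy sketch in pure Python. It does not reflect how `FindDataSourceTable` is implemented; `V2_PROVIDERS`, `resolve_table`, and the provider name `parquet_v2` are hypothetical, and the returned strings merely stand in for resolved relations.

```python
from typing import Callable, Dict

# Hypothetical registry of providers that have a v2 implementation.
V2_PROVIDERS: Dict[str, Callable[[Dict[str, str]], str]] = {
    "parquet_v2": lambda options: f"v2 table at {options['path']}",
}


def resolve_table(provider: str, options: Dict[str, str]) -> str:
    """Pick the v2 path when the provider is registered, else fall back to v1."""
    factory = V2_PROVIDERS.get(provider)
    if factory is not None:
        # The v2 branch: the provider builds the table from the stored options.
        return factory(options)
    # The v1 branch: stands in for the existing v1 resolution.
    return f"v1 relation for {provider}"
```

For example, `resolve_table("parquet_v2", {"path": "/data/events"})` takes the v2 branch, while `resolve_table("jdbc", {})` falls back to the v1 branch.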


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

