[ https://issues.apache.org/jira/browse/SPARK-25530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16717830#comment-16717830 ]
ASF GitHub Bot commented on SPARK-25530:
----------------------------------------

rdblue commented on a change in pull request #23208: [SPARK-25530][SQL] data source v2 API refactor (batch write)
URL: https://github.com/apache/spark/pull/23208#discussion_r240756984

 ##########
 File path: sql/core/src/main/java/org/apache/spark/sql/sources/v2/TableProvider.java
 ##########
 @@ -25,7 +25,10 @@
   * The base interface for v2 data sources which don't have a real catalog. Implementations must
   * have a public, 0-arg constructor.
   * <p>
 - * The major responsibility of this interface is to return a {@link Table} for read/write.
 + * The major responsibility of this interface is to return a {@link Table} for read/write. If you
 + * want to allow end-users to write data to non-existing tables via write APIs in `DataFrameWriter`
 + * with `SaveMode`, you must return a {@link Table} instance even if the table doesn't exist. The
 + * table schema can be empty in this case.

 Review comment:
 I suggest we use the v1 file source as the basis for the behavior for v2. You can see an implementation of that behavior in my other comment. If the table exists, overwrite is a dynamic partition overwrite, append is an append, and ignore does nothing. If the table doesn't exist, then the operation is a CTAS. (Note that we can also check properties to correctly mirror the behavior for static overwrite.)

 Your concern is addressed by not using the `Append` plan when the file source would have needed to create the table.

 The critical difference is that this behavior is all implemented in Spark instead of passing `SaveMode` to the source. If you pass `SaveMode` to the source, Spark can't guarantee that it is consistent across sources. We are trying to fix inconsistent behavior in v2.
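The behavior proposed in the review comment amounts to a small decision table that Spark, rather than the source, would evaluate. Below is a minimal Java sketch of that table under stated assumptions: `WritePlanner`, its `Mode` enum (a local stand-in mirroring the constants of `org.apache.spark.sql.SaveMode`), the `Operation` values, and the `dynamicOverwrite` flag (standing in for the properties check mentioned for static overwrite) are all hypothetical names, not Spark classes. The `ErrorIfExists` branch is filled in from v1 file source behavior, which fails when the output already exists. The comment itself does not spell that case out.

```java
import java.util.Objects;

/**
 * Hypothetical sketch: Spark (not the source) maps a save mode plus table
 * existence to a logical write operation, modeled on v1 file source behavior.
 */
public final class WritePlanner {

  /** Hypothetical local stand-in mirroring org.apache.spark.sql.SaveMode. */
  public enum Mode { APPEND, OVERWRITE, ERROR_IF_EXISTS, IGNORE }

  /** Hypothetical logical operations; these are not real Spark plan classes. */
  public enum Operation {
    CREATE_TABLE_AS_SELECT,
    APPEND,
    DYNAMIC_PARTITION_OVERWRITE,
    STATIC_OVERWRITE,
    NO_OP
  }

  private WritePlanner() {}

  /**
   * @param mode             the save mode requested through DataFrameWriter
   * @param tableExists      whether the target table already exists
   * @param dynamicOverwrite assumed flag (e.g. read from properties) choosing
   *                         dynamic vs. static partition overwrite
   */
  public static Operation plan(Mode mode, boolean tableExists, boolean dynamicOverwrite) {
    Objects.requireNonNull(mode, "mode");
    if (!tableExists) {
      // Missing table: every mode becomes CTAS, so the Append plan is never
      // used when the table would have to be created.
      return Operation.CREATE_TABLE_AS_SELECT;
    }
    switch (mode) {
      case APPEND:
        return Operation.APPEND;
      case OVERWRITE:
        return dynamicOverwrite
            ? Operation.DYNAMIC_PARTITION_OVERWRITE
            : Operation.STATIC_OVERWRITE;
      case IGNORE:
        // Table exists and the mode is Ignore: the write is skipped.
        return Operation.NO_OP;
      case ERROR_IF_EXISTS:
        // v1 file sources fail when the output already exists.
        throw new IllegalStateException("table already exists");
      default:
        throw new AssertionError("unknown mode: " + mode);
    }
  }

  public static void main(String[] args) {
    // Print the planned operation for every mode against a missing table.
    for (Mode m : Mode.values()) {
      System.out.println(m + " (missing table) -> " + plan(m, false, true));
    }
    System.out.println("OVERWRITE (existing table) -> " + plan(Mode.OVERWRITE, true, true));
  }
}
```

Because the mapping lives in one place on the Spark side, every source sees the same operation for a given mode, which is the consistency argument the comment makes against passing `SaveMode` through to the source.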
> data source v2 API refactor (batch write)
> -----------------------------------------
>
>                 Key: SPARK-25530
>                 URL: https://issues.apache.org/jira/browse/SPARK-25530
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Wenchen Fan
>            Priority: Major
>
> Adjust the batch write API to match the read API after refactor