[jira] [Commented] (SPARK-25530) data source v2 API refactor (batch write)

ASF GitHub Bot (JIRA) Tue, 11 Dec 2018 11:25:16 -0800


    [ https://issues.apache.org/jira/browse/SPARK-25530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16717830#comment-16717830 ]


ASF GitHub Bot commented on SPARK-25530:
----------------------------------------

rdblue commented on a change in pull request #23208: [SPARK-25530][SQL] data source v2 API refactor (batch write)
URL: https://github.com/apache/spark/pull/23208#discussion_r240756984
 
 

 ##########
 File path: sql/core/src/main/java/org/apache/spark/sql/sources/v2/TableProvider.java
 ##########
 @@ -25,7 +25,10 @@
  * The base interface for v2 data sources which don't have a real catalog. Implementations must
  * have a public, 0-arg constructor.
  * <p>
- * The major responsibility of this interface is to return a {@link Table} for read/write.
+ * The major responsibility of this interface is to return a {@link Table} for read/write. If you
+ * want to allow end-users to write data to non-existing tables via write APIs in `DataFrameWriter`
+ * with `SaveMode`, you must return a {@link Table} instance even if the table doesn't exist. The
+ * table schema can be empty in this case.
 
 Review comment:
   I suggest we use the v1 file source as the basis for the v2 behavior. You can see an implementation of that behavior in my other comment. If the table exists, overwrite is a dynamic partition overwrite, append is an append, and ignore does nothing. If the table doesn't exist, then the operation is a CTAS. (Note that we can also check properties to correctly mirror the behavior for static overwrite.)
   
   Your concern is addressed by not using the `Append` plan when the file source would have needed to create the table.
   
   The critical difference is that this behavior is all implemented in Spark instead of passing `SaveMode` to the source. If you pass `SaveMode` to the source, Spark can't guarantee that the behavior is consistent across sources. We are trying to fix inconsistent behavior in v2.
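
   The mode-to-operation mapping described in this comment could be sketched roughly as follows. This is only an illustration, not the implementation referenced in the other comment: `SaveModePlanner`, `WriteOp`, and `plan` are hypothetical names, and the local `SaveMode` enum is a stand-in for Spark's `SaveMode` so the sketch compiles without Spark on the classpath. The comment does not cover the error-if-exists mode for an existing table; failing in that case is an assumption based on how that mode behaves elsewhere in Spark.

```java
import java.util.Objects;

// Hypothetical sketch: Spark, not the source, decides which operation a SaveMode means.
class SaveModePlanner {

    /** Local stand-in for Spark's SaveMode (assumption: same four modes). */
    enum SaveMode { APPEND, OVERWRITE, ERROR_IF_EXISTS, IGNORE }

    /** Hypothetical labels for the operation Spark would plan. */
    enum WriteOp { APPEND, DYNAMIC_PARTITION_OVERWRITE, NO_OP, CREATE_TABLE_AS_SELECT }

    /**
     * Chooses the operation from the mode and whether the table already exists.
     * The source never sees the SaveMode; it only sees the resulting operation.
     */
    static WriteOp plan(SaveMode mode, boolean tableExists) {
        Objects.requireNonNull(mode, "mode");
        if (!tableExists) {
            // Missing table: every mode becomes a CTAS, so the Append plan is never used here.
            return WriteOp.CREATE_TABLE_AS_SELECT;
        }
        switch (mode) {
            case APPEND:
                return WriteOp.APPEND;
            case OVERWRITE:
                return WriteOp.DYNAMIC_PARTITION_OVERWRITE;
            case IGNORE:
                return WriteOp.NO_OP;
            case ERROR_IF_EXISTS:
                // Assumption: not stated in the comment; mirrors the usual meaning of this mode.
                throw new IllegalStateException("table already exists");
            default:
                throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        System.out.println(plan(SaveMode.OVERWRITE, true));
        System.out.println(plan(SaveMode.APPEND, false));
    }
}
```

   Because the decision is made in one place, every source would get the same behavior for a given mode, which is the consistency argument made above. The static-overwrite property check mentioned in the note is left out of this sketch.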



> data source v2 API refactor (batch write)
> -----------------------------------------
>
>                 Key: SPARK-25530
>                 URL: https://issues.apache.org/jira/browse/SPARK-25530
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Wenchen Fan
>            Priority: Major
>
> Adjust the batch write API to match the read API after refactor



