[jira] [Commented] (SPARK-25530) data source v2 API refactor (batch write)

ASF GitHub Bot (JIRA) Mon, 10 Dec 2018 13:51:26 -0800


    [ 
https://issues.apache.org/jira/browse/SPARK-25530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16715617#comment-16715617
 ]


ASF GitHub Bot commented on SPARK-25530:
----------------------------------------

rdblue commented on a change in pull request #23208: [SPARK-25530][SQL] data 
source v2 API refactor (batch write)
URL: https://github.com/apache/spark/pull/23208#discussion_r240394058
 
 

 ##########
 File path: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/TableProvider.java
 ##########
 @@ -25,7 +25,10 @@
  * The base interface for v2 data sources which don't have a real catalog. Implementations must
  * have a public, 0-arg constructor.
  * <p>
- * The major responsibility of this interface is to return a {@link Table} for read/write.
+ * The major responsibility of this interface is to return a {@link Table} for read/write. If you
+ * want to allow end-users to write data to non-existing tables via write APIs in `DataFrameWriter`
+ * with `SaveMode`, you must return a {@link Table} instance even if the table doesn't exist. The
+ * table schema can be empty in this case.
 
 Review comment:
   `SaveMode` is incompatible with the SPIP to standardize behavior that was 
voted on and accepted. The save mode in `DataFrameWriter` must be used to 
create v2 plans that have well-defined behavior, and it cannot be passed to 
implementations in the final version of the v2 read/write API.
   
   I see no reason to put off removing `SaveMode` from the API. If we remove it 
now, we will avoid having more versions of this API that are **fundamentally 
broken**. We will also avoid more implementations that rely on it without 
knowing that it will be removed.
   
   To your point about whether it is safe: the only modes actually used are 
`SaveMode.Overwrite` and `SaveMode.Append`. To replace those, all that needs 
to happen is to define what kind of overwrite should happen here (dynamic or 
truncate).
   
   I can supply the logical plan and physical implementation in a follow-up PR 
because I already have all this written and waiting to go in. Or, I can add a 
PR to merge first if you'd like to have these changes depend on that 
implementation.
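
To illustrate the idea of turning the `DataFrameWriter` save mode into explicit plans instead of handing it to implementations, here is a minimal sketch. Every name in it (`WritePlanSketch`, `Mode`, `OverwriteKind`, `WriteOp`, `plan`) is hypothetical and is not part of Spark or of this PR. It assumes only the two modes mentioned above need a mapping and that the remaining modes are rejected, which is an assumption of the sketch rather than a decision made in this thread.

```java
// Hypothetical sketch: none of these types exist in Spark.
final class WritePlanSketch {

    /** Stand-in for the user-facing save mode set on the writer. */
    enum Mode { APPEND, OVERWRITE, ERROR_IF_EXISTS, IGNORE }

    /** The overwrite semantics that must be chosen explicitly. */
    enum OverwriteKind { TRUNCATE, DYNAMIC }

    /** Explicit write operations, each with a single well-defined behavior. */
    enum WriteOp { APPEND_DATA, TRUNCATE_AND_APPEND, OVERWRITE_PARTITIONS_DYNAMIC }

    /**
     * Resolves the save mode on the planner side, so that a source
     * implementation only ever sees a WriteOp and never the mode itself.
     */
    static WriteOp plan(Mode mode, OverwriteKind overwriteKind) {
        switch (mode) {
            case APPEND:
                return WriteOp.APPEND_DATA;
            case OVERWRITE:
                return overwriteKind == OverwriteKind.DYNAMIC
                        ? WriteOp.OVERWRITE_PARTITIONS_DYNAMIC
                        : WriteOp.TRUNCATE_AND_APPEND;
            default:
                // Assumption of this sketch: other modes have no mapping here.
                throw new UnsupportedOperationException(
                        "No well-defined plan for mode " + mode);
        }
    }

    public static void main(String[] args) {
        System.out.println(plan(Mode.APPEND, OverwriteKind.TRUNCATE));
        System.out.println(plan(Mode.OVERWRITE, OverwriteKind.TRUNCATE));
        System.out.println(plan(Mode.OVERWRITE, OverwriteKind.DYNAMIC));
    }
}
```

The point of the shape above is that the choice between truncate and dynamic overwrite is made once, before the source is involved, rather than being left to each implementation's reading of the mode.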



> data source v2 API refactor (batch write)
> -----------------------------------------
>
>                 Key: SPARK-25530
>                 URL: https://issues.apache.org/jira/browse/SPARK-25530
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Wenchen Fan
>            Priority: Major
>
> Adjust the batch write API to match the read API after refactor



