Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23208#discussion_r239578059
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/TableProvider.java ---
    @@ -25,7 +25,10 @@
      * The base interface for v2 data sources which don't have a real catalog. Implementations must
      * have a public, 0-arg constructor.
      * <p>
    - * The major responsibility of this interface is to return a {@link Table} for read/write.
    + * The major responsibility of this interface is to return a {@link Table} for read/write. If you
    + * want to allow end-users to write data to non-existing tables via write APIs in `DataFrameWriter`
    + * with `SaveMode`, you must return a {@link Table} instance even if the table doesn't exist. The
    + * table schema can be empty in this case.
    --- End diff --
    
    @jose-torres, create-on-write is handled by CTAS. It should not be left up to the source whether to fail or to create the table.
    
    I think the confusion here is that this is a degenerate case where Spark has no ability to interact with the table's metadata. Spark must assume that the table exists because the caller is writing to it.
    
    The caller is indicating that a table exists, that it is identified by some configuration, and that a specific implementation can be used to write to it. That's what happens today when source implementations are specified directly.


---
