For context: changing the default mode was confusing, because the same user
code behaved differently after moving from v1 to v2. Burak pointed this out,
and I agree it's weird that when your dependency changes from v1 to v2, your
compiled Spark job starts appending instead of erroring out when the table
already exists.
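
A toy model helps make the behavior change concrete. The Java below does not
use Spark. The class, the Mode enum (a stand-in for SaveMode), and the
in-memory table map are all hypothetical. It only mimics the documented
save-mode semantics for a table that already exists.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model, not Spark code: what each save mode does when the target exists.
public class SaveModeModel {
    // Hypothetical stand-in for org.apache.spark.sql.SaveMode.
    public enum Mode { APPEND, OVERWRITE, IGNORE, ERROR_IF_EXISTS }

    // Hypothetical in-memory "catalog": table name -> rows.
    public static final Map<String, List<String>> tables = new HashMap<>();

    public static void write(String table, List<String> rows, Mode mode) {
        List<String> existing = tables.get(table);
        if (existing == null) {
            // Target does not exist yet: every mode simply creates it.
            tables.put(table, new ArrayList<>(rows));
            return;
        }
        switch (mode) {
            case APPEND:
                existing.addAll(rows);
                break;
            case OVERWRITE:
                existing.clear();
                existing.addAll(rows);
                break;
            case IGNORE:
                break; // no-op
            case ERROR_IF_EXISTS:
                throw new IllegalStateException("table already exists: " + table);
        }
    }

    public static void main(String[] args) {
        write("events", List.of("a"), Mode.APPEND);
        // The same second write, under the two documented defaults:
        try {
            write("events", List.of("b"), Mode.ERROR_IF_EXISTS); // v1 default
        } catch (IllegalStateException e) {
            System.out.println("v1 default: " + e.getMessage());
        }
        write("events", List.of("b"), Mode.APPEND); // documented v2 default
        System.out.println("v2 default: " + tables.get("events"));
    }
}
```

Running main prints "v1 default: table already exists: events" and then
"v2 default: [a, b]". Under the v1 default the job fails loudly. Under the v2
default it silently appends, which is the surprise described above.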

The workaround is to implement a new interface, SupportsCatalogOptions, which
lets you extract a table identifier and catalog name from the options passed
to DataFrameReader or DataFrameWriter. That way, you can re-route the write to
your catalog, so Spark correctly uses a CreateTableAsSelect plan for
ErrorIfExists mode.
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsCatalogOptions.java
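
The real interface extends TableProvider. It declares
extractIdentifier(CaseInsensitiveStringMap) and a default extractCatalog that
returns the session catalog name. The sketch below models that routing in
plain Java so it runs without Spark on the classpath. The option keys
"catalog", "database", and "table", all method names, and the returned plan
strings are hypothetical descriptions, not Spark's API or plan objects.

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model (no Spark dependency) of what SupportsCatalogOptions enables.
public class CatalogOptionsSketch {
    // Name of the session catalog, which Spark's default extractCatalog returns.
    public static final String SESSION_CATALOG = "spark_catalog";

    // Case-insensitive option keys, in the spirit of CaseInsensitiveStringMap.
    public static Map<String, String> options(String... kv) {
        Map<String, String> m = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (int i = 0; i + 1 < kv.length; i += 2) {
            m.put(kv[i], kv[i + 1]);
        }
        return m;
    }

    // Models extractIdentifier: namespace plus table name, taken from options.
    public static String extractIdentifier(Map<String, String> opts) {
        String table = opts.get("table");
        if (table == null) {
            throw new IllegalArgumentException("missing option: table");
        }
        return opts.getOrDefault("database", "default") + "." + table;
    }

    // Models the default extractCatalog: fall back to the session catalog.
    public static String extractCatalog(Map<String, String> opts) {
        return opts.getOrDefault("catalog", SESSION_CATALOG);
    }

    // Describes what save() in the given mode turns into.
    public static String plan(String mode, Map<String, String> opts,
                              boolean supportsCatalogOptions) {
        switch (mode) {
            case "Append":
            case "Overwrite":
                // Both modes write into a table that is expected to exist.
                return mode + " into existing table";
            case "ErrorIfExists":
            case "Ignore":
                if (!supportsCatalogOptions) {
                    // No catalog means no way to check for, or create, the table.
                    throw new UnsupportedOperationException(mode + " needs a catalog");
                }
                String target = extractCatalog(opts) + "." + extractIdentifier(opts);
                return "CreateTableAsSelect " + target
                        + (mode.equals("Ignore") ? " (ignoreIfExists)" : "");
            default:
                throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        Map<String, String> opts = options("Table", "events", "database", "db");
        // Prints: CreateTableAsSelect spark_catalog.db.events
        System.out.println(plan("ErrorIfExists", opts, true));
    }
}
```

Without catalog options, the model rejects ErrorIfExists outright. With them,
the identifier and catalog give Spark a concrete table to check and create.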

On Wed, May 20, 2020 at 2:50 PM Russell Spitzer <russell.spit...@gmail.com>
wrote:

>
> The Scaladoc for DataFrameWriter says:
>
> /**
>  * Specifies the behavior when data or table already exists. Options include:
>  * <ul>
>  * <li>`SaveMode.Overwrite`: overwrite the existing data.</li>
>  * <li>`SaveMode.Append`: append the data.</li>
>  * <li>`SaveMode.Ignore`: ignore the operation (i.e. no-op).</li>
>  * <li>`SaveMode.ErrorIfExists`: throw an exception at runtime.</li>
>  * </ul>
>  * <p>
>  * When writing to data source v1, the default option is `ErrorIfExists`. When writing to data
>  * source v2, the default option is `Append`.
>  *
>  * @since 1.4.0
>  */
>
>
> As far as I can tell, using DataFrameWriter with a TableProvider-based
> DataSource V2 still defaults to ErrorIfExists, which breaks existing code
> because DSV2 cannot support ErrorIfExists mode. I noticed that earlier
> versions of DataFrameWriter differentiated between DSV2 and DSV1 and set the
> mode accordingly, but this no longer seems to be the case. Was this
> intentional? If the default were based on the source, upgrading code from
> DSV1 to DSV2 would be much easier for users.
>
> I'm currently testing this on RC2.
>
>
> Any thoughts?
>
> Thanks for your time as usual,
> Russ
>


-- 
Ryan Blue
Software Engineer
Netflix
