Re: [DatasourceV2] Default Mode for DataFrameWriter not Dependent on DataSource Version

2020-05-21 Thread Russell Spitzer
Another related issue for backwards compatibility: in DataSource.scala, https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L415-L416 will get triggered even when the class is a valid DataSourceV2 but being used in a

Re: [DatasourceV2] Default Mode for DataFrameWriter not Dependent on DataSource Version

2020-05-20 Thread Russell Spitzer
I think those are fair concerns. I was mostly just updating tests for RC2 and adding in "append" everywhere. Code like

spark.sql(s"SELECT a, b from $ks.test1")
  .write
  .format("org.apache.spark.sql.cassandra")
  .options(Map("table" -> "test_insert1", "keyspace" -> ks))
  .save()

now fails at

Re: [DatasourceV2] Default Mode for DataFrameWriter not Dependent on DataSource Version

2020-05-20 Thread Ryan Blue
The context on this is that it was confusing that the mode changed, which introduced different behaviors for the same user code when moving from v1 to v2. Burak pointed this out and I agree that it's weird that if your dependency changes from v1 to v2, your compiled Spark job starts appending

Re: [DatasourceV2] Default Mode for DataFrameWriter not Dependent on DataSource Version

2020-05-20 Thread Burak Yavuz
Hey Russell, Great catch on the documentation. It seems out of date. I am honestly against different DataSources having different default SaveModes. Users will have no clue whether a DataSource implementation is V1 or V2. It seems weird that the default value can change for something that I

[DatasourceV2] Default Mode for DataFrameWriter not Dependent on DataSource Version

2020-05-20 Thread Russell Spitzer
While the ScalaDocs for DataFrameWriter say

/**
 * Specifies the behavior when data or table already exists. Options include:
 *   - `SaveMode.Overwrite`: overwrite the existing data.
 *   - `SaveMode.Append`: append the data.
 *   - `SaveMode.Ignore`: ignore the operation (i.e. no-op).
 *
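For readers following the thread, the disagreement can be captured in a toy sketch (hypothetical names, not Spark's actual implementation): the documented V1 default for DataFrameWriter is `SaveMode.ErrorIfExists`, while the V2 path was effectively defaulting to append, so the same compiled user code behaves differently depending on which version the underlying source implements.

```scala
// Hypothetical model of the behavior discussed in the thread -- NOT Spark code.
object DefaultModeSketch {
  sealed trait SaveMode
  case object ErrorIfExists extends SaveMode // DataFrameWriter's documented V1 default
  case object Append extends SaveMode        // what the V2 path effectively defaulted to

  // The objection: the default a user gets depends on whether the
  // underlying source happens to be V1 or V2, which the user cannot see.
  def defaultMode(sourceIsV2: Boolean): SaveMode =
    if (sourceIsV2) Append else ErrorIfExists
}
```

Under this sketch, upgrading a dependency from a V1 to a V2 implementation silently flips the default from error-if-exists to append, which is exactly the surprise Burak and Ryan describe.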