Github user rdblue commented on a diff in the pull request:

https://github.com/apache/spark/pull/22009#discussion_r209698964

--- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/BatchWriteSupportProvider.java ---
@@ -21,33 +21,39 @@
 import org.apache.spark.annotation.InterfaceStability;
 import org.apache.spark.sql.SaveMode;
-import org.apache.spark.sql.sources.v2.writer.DataSourceWriter;
+import org.apache.spark.sql.sources.v2.writer.BatchWriteSupport;
 import org.apache.spark.sql.types.StructType;

 /**
  * A mix-in interface for {@link DataSourceV2}. Data sources can implement this interface to
- * provide data writing ability and save the data to the data source.
+ * provide data writing ability for batch processing.
+ *
+ * This interface is used when end users want to use a data source implementation directly, e.g.
+ * {@code Dataset.write.format(...).option(...).save()}.
  */
 @InterfaceStability.Evolving
-public interface WriteSupport extends DataSourceV2 {
+public interface BatchWriteSupportProvider extends DataSourceV2 {

   /**
-   * Creates an optional {@link DataSourceWriter} to save the data to this data source. Data
+   * Creates an optional {@link BatchWriteSupport} to save the data to this data source. Data
    * sources can return None if there is no writing needed to be done according to the save mode.
    *
    * If this method fails (by throwing an exception), the action will fail and no Spark job will be
    * submitted.
    *
-   * @param writeUUID A unique string for the writing job. It's possible that there are many writing
-   *                  jobs running at the same time, and the returned {@link DataSourceWriter} can
-   *                  use this job id to distinguish itself from other jobs.
+   * @param queryId A unique string for the writing query. It's possible that there are many
+   *                writing queries running at the same time, and the returned
+   *                {@link BatchWriteSupport} can use this id to distinguish itself from others.
    * @param schema the schema of the data to be written.
    * @param mode the save mode which determines what to do when the data are already in this data
    *             source, please refer to {@link SaveMode} for more details.
    * @param options the options for the returned data source writer, which is an immutable
    *                case-insensitive string-to-string map.
-   * @return a writer to append data to this data source
+   * @return a write support to write data to this data source.
    */
-  Optional<DataSourceWriter> createWriter(
-      String writeUUID, StructType schema, SaveMode mode, DataSourceOptions options);
+  Optional<BatchWriteSupport> createBatchWriteSupport(
+      String queryId,
+      StructType schema,
+      SaveMode mode,
--- End diff ---

@cloud-fan: this is a bad idea **because** it enables save modes other than append for DataSourceV2 without using the new logical plans. This leads to undefined behavior and is why we proposed standard logical plans in the first place. Using a data source implementation directly should only support appending and scanning; anything more complex must require `DeleteSupport` (see #21308) or `TableCatalog` (see #21306) and the new logical plans. Otherwise, this will allow sources to expose behavior that we are trying to fix. I'm -1 on this.
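To make the objection concrete, here is a minimal sketch (not from the PR; the source name, output path, and class name are hypothetical) of the user-facing path this provider enables. Because `createBatchWriteSupport` accepts a `SaveMode`, a plain `save()` call can hand `Overwrite` straight to the source, which then decides for itself what overwrite means rather than Spark expressing it through a standard logical plan:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class OverwriteExample {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("v2-save-mode-example")
        .master("local[*]")
        .getOrCreate();

    Dataset<Row> df = spark.range(10).toDF();

    // Overwrite reaches the source through the SaveMode parameter of
    // createBatchWriteSupport. Each source decides what Overwrite means
    // (truncate? drop files? delete matching partitions?) -- the undefined
    // behavior the comment above objects to.
    df.write()
        .format("com.example.MyV2Source") // hypothetical DataSourceV2 implementation
        .option("path", "/tmp/v2-out")
        .mode(SaveMode.Overwrite)         // anything other than Append is source-defined
        .save();

    spark.stop();
  }
}
```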