GitHub user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21571#discussion_r195875712
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala ---
    @@ -322,6 +338,45 @@ final class DataStreamWriter[T] private[sql](ds: Dataset[T]) {
         this
       }
     
    +  /**
    +   * :: Experimental ::
    +   *
    +   * (Scala-specific) Sets the output of the streaming query to be processed using the provided
    +   * function. This is supported only in the micro-batch execution modes (that is, when the
    +   * trigger is not continuous). In every micro-batch, the provided function will be called with
    +   * (i) the output rows as a Dataset and (ii) the batch identifier. The batchId can be used to
    +   * deduplicate and transactionally write the output (that is, the provided Dataset) to
    +   * external systems. The output Dataset is guaranteed to be exactly the same for the same
    +   * batchId (assuming all operations are deterministic in the query).
    +   *
    +   * @since 2.4.0
    +   */
    +  @InterfaceStability.Evolving
    +  def foreachBatch(function: (Dataset[T], Long) => Unit): DataStreamWriter[T] = {
    --- End diff --
    
    It's unclear that only one of `foreachBatch` and `foreach` can be set. Reading the doc, a user may think they can set both of them. Maybe we should disallow this case?
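    
    For context, a minimal sketch of how `foreachBatch` might be used, based on the signature in the diff above. The source format, output path, and checkpoint location are hypothetical placeholders, not part of this PR:
    
    ```scala
    import org.apache.spark.sql.{Dataset, Row, SparkSession}
    
    val spark = SparkSession.builder().appName("ForeachBatchSketch").getOrCreate()
    
    // Hypothetical streaming source; any streaming Dataset would do.
    val stream = spark.readStream.format("rate").load()
    
    val query = stream.writeStream
      .foreachBatch { (batchDF: Dataset[Row], batchId: Long) =>
        // Per the doc above, the same batchId is guaranteed to carry the same
        // rows (for a deterministic query), so an idempotent write keyed on
        // batchId can deduplicate output after a failure and retry.
        batchDF.write
          .mode("overwrite")
          .parquet(s"/tmp/output/batch-$batchId") // hypothetical path
      }
      .option("checkpointLocation", "/tmp/checkpoints") // hypothetical path
      .start()
    ```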

