zsxwing commented on a change in pull request #25407: [SPARK-28650][SS][DOC] Correct explanation of guarantee for ForeachWriter
URL: https://github.com/apache/spark/pull/25407#discussion_r315442383
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/ForeachWriter.scala
 ##########
 @@ -50,14 +50,10 @@ import org.apache.spark.annotation.Evolving
  *
  * Important points to note:
  * <ul>
- * <li>The `partitionId` and `epochId` can be used to deduplicate generated data when failures
- *     cause reprocessing of some input data. This depends on the execution mode of the query. If
- *     the streaming query is being executed in the micro-batch mode, then every partition
- *     represented by a unique tuple (partitionId, epochId) is guaranteed to have the same data.
- *     Hence, (partitionId, epochId) can be used to deduplicate and/or transactionally commit data
- *     and achieve exactly-once guarantees. However, if the streaming query is being executed in the
- *     continuous mode, then this guarantee does not hold and therefore should not be used for
- *     deduplication.
+ * <li>Spark doesn't guarantee same output for (partitionId, epochId) on failure, so deduplication
 
 Review comment:
   ditto

----------------------------------------------------------------
This is an automated message from the Apache Git Service.