GitHub user c-horn opened a pull request: https://github.com/apache/spark/pull/21676
[SPARK-24699][SQL][WIP] Watermark / Append mode should work with Trigger.Once

## What changes were proposed in this pull request?

https://issues.apache.org/jira/browse/SPARK-24699

Structured Streaming with `Trigger.Once` does not persist watermark state between batches, so streams never yield output. I will attach scripts that reproduce the problem to the Jira issue.

It seems that the microbatcher calculates the watermark only from the previous batch's input and emits new aggregates based on that timestamp. I believe the problem is that the previous batch's state is not persisted to the checkpoint, and therefore cannot be used when the stream is started again with `Trigger.Once`.

I will investigate ways of fixing this, but I am definitely interested in input from anyone who has worked on Structured Streaming.

## How was this patch tested?

Failing unit test provided.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/c-horn/spark SPARK-24699

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21676.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21676

----
commit 1b42cc4a449248da65402a6ea2112c55a3bb8501
Author: Chris Horn <chorn4033@...>
Date:   2018-06-29T22:54:45Z

    a failing test case

----
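
The watermark behaviour described in the pull request text can be illustrated with a small toy model. This is only a sketch in plain Python, not Spark code and not the failing test from this patch: the `Checkpoint` class, the `run_once` and `simulate` functions, the `persist_watermark` flag, and the `WINDOW` and `DELAY` values are all hypothetical names and numbers chosen for illustration. It assumes tumbling-window counts in append mode, where each run processes exactly one batch (as with `Trigger.Once`) and the watermark used by a batch is derived from the previous batch's input.

```python
from dataclasses import dataclass, field

WINDOW = 10  # tumbling window length in seconds (hypothetical value)
DELAY = 5    # watermark delay in seconds (hypothetical value)


@dataclass
class Checkpoint:
    """Toy stand-in for a streaming checkpoint; not Spark's real format."""
    watermark: int = 0
    counts: dict = field(default_factory=dict)  # window start -> event count


def run_once(batch, checkpoint, persist_watermark):
    """Process one batch, then 'stop the query' (one Trigger.Once-style run).

    Returns the (window_start, count) rows emitted in append mode.
    """
    # The watermark for this batch was derived from the previous batch.
    watermark = checkpoint.watermark
    for event_time in batch:
        start = event_time - event_time % WINDOW
        if start + WINDOW <= watermark:
            continue  # late event: its window has already been finalized
        checkpoint.counts[start] = checkpoint.counts.get(start, 0) + 1
    # Append mode: emit only windows that end at or before the watermark.
    closed = sorted(s for s in checkpoint.counts if s + WINDOW <= watermark)
    emitted = [(s, checkpoint.counts.pop(s)) for s in closed]
    # The next batch's watermark comes from this batch's input. If it is
    # not written to the checkpoint, the next run starts from the old value.
    if batch and persist_watermark:
        checkpoint.watermark = max(watermark, max(batch) - DELAY)
    return emitted


def simulate(batches, persist_watermark):
    """Run one 'query start' per batch against a shared checkpoint."""
    checkpoint = Checkpoint()
    return [run_once(b, checkpoint, persist_watermark) for b in batches]


batches = [[1, 4, 8], [12, 17, 21], [25, 33]]
lost = simulate(batches, persist_watermark=False)
kept = simulate(batches, persist_watermark=True)
print(lost)  # [[], [], []]
print(kept)  # [[], [], [(0, 3)]]
```

In this model, when the watermark is not carried across runs it stays at its initial value, so no window ever closes and nothing is emitted, which matches the symptom reported above. When it is carried across runs, the first window is emitted on the third run.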