GitHub user c-horn opened a pull request:

    https://github.com/apache/spark/pull/21676

    [SPARK-24699][SQL][WIP] Watermark / Append mode should work with Trigger.Once

    ## What changes were proposed in this pull request?
    
    https://issues.apache.org/jira/browse/SPARK-24699
    
    Structured Streaming queries run with `Trigger.Once` do not persist watermark state between batches, so watermarked queries in Append mode never yield output. I will attach scripts that reproduce the problem to the Jira issue.
    
    It seems that the micro-batch engine computes the watermark only from the previous batch's input and emits new aggregates based on that timestamp. I believe the problem is that the previous batch's state is not persisted to the checkpoint, so it cannot be used when the stream is started again with `Trigger.Once`.
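
To make the suspected behavior concrete, here is a toy model in plain Python. It is only a sketch of the mechanism described above, not Spark's implementation: `ToyStream`, `run_once`, `persist_watermark`, and the window and delay values are all hypothetical names and numbers chosen for illustration. Each call to `run_once` stands in for one start/stop cycle with `Trigger.Once`.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStream:
    """Toy windowed count in append mode (illustrative only, not Spark code)."""
    window: int = 10                 # tumbling window length, in seconds
    delay: int = 5                   # watermark delay, in seconds
    persist_watermark: bool = True   # hypothetical switch: is the watermark checkpointed?
    # Stand-ins for what survives a restart.
    checkpoint_counts: dict = field(default_factory=dict)
    checkpoint_watermark: int = 0

    def run_once(self, event_times):
        """Simulate one run: restore state, process one batch, stop."""
        # The watermark is restored only if it was written to the checkpoint.
        watermark = self.checkpoint_watermark if self.persist_watermark else 0
        counts = dict(self.checkpoint_counts)

        # Aggregate events whose window has not already closed.
        for t in event_times:
            start = t - t % self.window
            if start + self.window > watermark:
                counts[start] = counts.get(start, 0) + 1

        # Append mode: emit only windows that ended at or before the watermark.
        emitted = sorted(
            (s, c) for s, c in counts.items() if s + self.window <= watermark
        )
        for s, _ in emitted:
            del counts[s]

        # The watermark for the next batch comes from this batch's input.
        if event_times:
            watermark = max(watermark, max(event_times) - self.delay)

        self.checkpoint_counts = counts
        if self.persist_watermark:
            self.checkpoint_watermark = watermark
        return emitted


batches = ([1, 2, 3], [21, 22], [41])

persisted = ToyStream(persist_watermark=True)
out_persisted = [persisted.run_once(b) for b in batches]

lost = ToyStream(persist_watermark=False)
out_lost = [lost.run_once(b) for b in batches]

print(out_persisted)
print(out_lost)
```

In this model, the run that keeps its watermark emits the first window on the third run, once the watermark derived from the second batch has passed the window's end. The run that loses its watermark on every restart always starts from zero and emits nothing, which matches the symptom reported here.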
    
    I will investigate ways of fixing this, but I would welcome input from anyone who has worked on Structured Streaming.
    
    ## How was this patch tested?
    
    Failing unit test provided.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/c-horn/spark SPARK-24699

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21676.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21676
    
----
commit 1b42cc4a449248da65402a6ea2112c55a3bb8501
Author: Chris Horn <chorn4033@...>
Date:   2018-06-29T22:54:45Z

    a failing test case

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
