[ 
https://issues.apache.org/jira/browse/SPARK-38078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17709925#comment-17709925
 ] 

Jungtaek Lim commented on SPARK-38078:
--------------------------------------

If you use a time window, the boundary Spark holds state up to is the end of 
the window, not the event time of the individual event. It's unclear whether 
the reporter is describing this behavior or something else.
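
To make the difference concrete, here is a minimal sketch in plain Python (no 
Spark involved) of when a windowed row becomes eligible for output in append 
mode. It assumes tumbling windows aligned to the epoch and a watermark computed 
as the maximum event time seen minus the delay. The function names and the 
10-minute window / 5-minute delay are made up for illustration; they are not 
taken from the reporter's query.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def window_end(event_time, window):
    """End of the epoch-aligned tumbling window that contains event_time."""
    offset = (event_time - EPOCH) % window
    return event_time - offset + window

def is_window_finalized(event_time, max_event_time_seen, window, delay):
    """True once the watermark has reached the end of the event's window.

    Assumption: watermark = max event time seen so far - delay.
    """
    watermark = max_event_time_seen - delay
    return watermark >= window_end(event_time, window)

window = timedelta(minutes=10)
delay = timedelta(minutes=5)
event = datetime(2022, 1, 1, 12, 3)   # falls into the window [12:00, 12:10)

# Newest event seen is 12:09: event time + delay (12:08) has already passed,
# but the watermark (12:04) is still before the window end (12:10).
print(is_window_finalized(event, datetime(2022, 1, 1, 12, 9), window, delay))

# Newest event seen is 12:15: the watermark (12:10) reaches the window end.
print(is_window_finalized(event, datetime(2022, 1, 1, 12, 15), window, delay))
```

Note that in this model the watermark only moves forward when newer events 
arrive, so the most recent event's window cannot be finalized until later data 
pushes the watermark past that window's end.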

> Aggregation with Watermark in AppendMode is holding data beyond watermark 
> boundary.
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-38078
>                 URL: https://issues.apache.org/jira/browse/SPARK-38078
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 3.2.0
>            Reporter: krishna
>            Priority: Major
>
>  I am struggling with an unusual issue. I am not sure whether my 
> understanding is wrong or this is a bug in Spark.
>  
>  #  I am reading a stream from Event Hubs/Kafka (Extract).
>  #  Pivoting and aggregating the above DataFrame (Transformation). This is a 
> watermarked aggregation.
>  #  Writing the aggregation to the console/Delta table in Append mode with a 
> trigger.
> However, the most recently published message to the event hub is not written 
> to the console/Delta table even after it falls outside the watermark.
>  
>  My understanding is that the event should be inserted into the Delta table 
> after event time + watermark.
>  
> Moreover, all the events held in memory should be flushed to the sink, 
> irrespective of the watermark, before stopping, so that the shutdown is 
> graceful.
>  
> Please advise.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

