[ https://issues.apache.org/jira/browse/SPARK-38078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

krishna updated SPARK-38078:
----------------------------
    Description: 
I am struggling with a strange issue, and I am not sure whether my
understanding is wrong or this is a bug in Spark.
 
# I am reading a stream from Event Hubs/Kafka (Extract).
# I am pivoting and aggregating the resulting DataFrame (Transformation).
This is a WATERMARKED aggregation.
# I am writing the aggregation to the console/a Delta table in APPEND mode
with a trigger.

However, the most recently published message to Event Hubs is not written to
the console/Delta table, even after it has fallen outside the watermark.
 
My understanding is that the event should be inserted into the Delta table
after Eventtime + Watermark.
 

Moreover, all events stored in memory must be flushed to the sink,
irrespective of the watermark, before the query stops, so that the shutdown
is graceful.
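The following is a simplified, hypothetical Python model that may help frame the two expectations above. It is not Spark code. The function `simulate`, the count aggregation and the numbers are illustrative assumptions. It assumes the rule described in the Structured Streaming guide: the watermark is the maximum event time seen so far minus the delay threshold, and in append mode a window is emitted only once the watermark reaches the window's end. Because the watermark advances only when newer event times arrive, the window holding the last events can stay in state indefinitely if no later data comes.

```python
# Simplified, hypothetical model of append-mode output for a watermarked
# tumbling-window count. NOT Spark code: names and numbers are illustrative.
# Rule modelled: watermark = max event time seen - delay threshold, and a
# window [start, start + size) is emitted and evicted once its end <= watermark.
# Simplifications: late-data dropping is not modelled, and Spark's
# "update watermark, then run a follow-up batch" is collapsed into one step.
from collections import defaultdict


def simulate(batches, window_size, delay):
    """Replay micro-batches of event times (seconds).

    Returns (emitted, pending): emitted maps batch index -> rows
    (window_start, count) written to the sink in that batch; pending holds
    the windows still sitting in state after the last batch.
    """
    state = defaultdict(int)  # window_start -> running count
    max_event_time = None
    watermark = float("-inf")
    emitted = {}
    for i, events in enumerate(batches):
        for t in events:
            state[t - t % window_size] += 1
            max_event_time = t if max_event_time is None else max(max_event_time, t)
        # The watermark only moves when newer event times arrive; an empty
        # batch leaves it unchanged however much wall-clock time has passed.
        if max_event_time is not None:
            watermark = max(watermark, max_event_time - delay)
        # Append mode: only windows closed by the watermark are output.
        ready = sorted(s for s in state if s + window_size <= watermark)
        emitted[i] = [(s, state.pop(s)) for s in ready]
    return emitted, dict(state)


# Last events arrive in batch 2; batches 3 and 4 carry no new data.
emitted, pending = simulate([[1, 3], [12], [16], [], []], window_size=10, delay=5)
print(emitted)  # {0: [], 1: [], 2: [(0, 2)], 3: [], 4: []}
print(pending)  # {10: 2}
```

Suppose the final batch is `[25]` instead of `[]`. The watermark then moves to 20, window 10 is emitted with count 2, and the new event's window (start 20) becomes the pending one. The model does not cover stopping the query. Whatever sits in `pending` at that point is the data the second expectation above asks to have flushed.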

 

Please advise.

  was:
I am struggling with a strange issue, and I am not sure whether my
understanding is wrong or this is a bug in Spark.
 
# I am reading a stream from Event Hubs/Kafka (Extract).
# I am pivoting and aggregating the resulting DataFrame (Transformation).
This is a WATERMARKED aggregation.
# I am writing the aggregation to the console/a Delta table in APPEND mode
with a trigger.

However, the most recently published message to Event Hubs is not written to
the console/Delta table, even after it has fallen outside the watermark.
 
My understanding is that the event should be inserted into the Delta table
after Eventtime + Watermark.
 
Please advise.


> Aggregation with Watermark in AppendMode is holding data beyond watermark
> boundary.
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-38078
>                 URL: https://issues.apache.org/jira/browse/SPARK-38078
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 3.2.0
>            Reporter: krishna
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
