[ https://issues.apache.org/jira/browse/SPARK-38078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
krishna updated SPARK-38078:
----------------------------
    Description:
I am struggling with an unusual issue. I am not sure whether my understanding is wrong or this is a bug in Spark.

# I am reading a stream from Event Hubs/Kafka (Extract).
# I am pivoting and aggregating the resulting DataFrame (Transformation). This is a WATERMARKED aggregation.
# I am writing the aggregation to the console/a Delta table in APPEND mode with a Trigger.

However, the most recently published message to the event hub is not written to the console/Delta table, even after it has fallen outside the watermark time.

My understanding is that the event should be inserted into the Delta table after Eventtime+Watermark.

Moreover, all events held in memory should be flushed to the sink, irrespective of the watermark, before the query stops, so that the shutdown is graceful.

Please advise.

  was:
I am struggling with an unusual issue. I am not sure whether my understanding is wrong or this is a bug in Spark.

# I am reading a stream from Event Hubs/Kafka (Extract).
# I am pivoting and aggregating the resulting DataFrame (Transformation). This is a WATERMARKED aggregation.
# I am writing the aggregation to the console/a Delta table in APPEND mode with a Trigger.

However, the most recently published message to the event hub is not written to the console/Delta table, even after it has fallen outside the watermark time.

My understanding is that the event should be inserted into the Delta table after Eventtime+Watermark.

Please advise.

> Aggregation with Watermark in AppendMode is holding data beyond watermark boundary.
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-38078
>                 URL: https://issues.apache.org/jira/browse/SPARK-38078
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 3.2.0
>            Reporter: krishna
>            Priority: Major
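The behavior reported in this issue can be illustrated with a small, simplified model of a watermarked aggregation in append mode. This is a sketch under stated assumptions, not Spark's implementation: it assumes the watermark is derived from the maximum event time seen so far minus a fixed delay, that it advances only when new events arrive (not with wall-clock time), and that a window is emitted once its end is at or below the watermark. The names `WINDOW`, `DELAY`, and `run` are hypothetical, and late-event dropping is left out.

```python
from collections import defaultdict

WINDOW = 10  # window length in seconds (hypothetical value)
DELAY = 5    # watermark delay in seconds (hypothetical value)


def run(batches):
    """Simplified model of a per-window count with a watermark in append mode.

    `batches` is a list of micro-batches, each a list of event times (seconds).
    Returns the rows emitted per batch and the state left buffered at the end.
    """
    state = defaultdict(int)   # window start -> number of events in that window
    max_event_time = None      # largest event time seen so far
    emitted = []               # one list of (window_start, count) per batch

    for batch in batches:
        for t in batch:
            state[(t // WINDOW) * WINDOW] += 1
            max_event_time = t if max_event_time is None else max(max_event_time, t)

        out = []
        if max_event_time is not None:
            # Assumption: the watermark depends only on event times that were seen.
            watermark = max_event_time - DELAY
            for start in sorted(state):
                # A window is final, and therefore appended, once it has closed
                # relative to the watermark.
                if start + WINDOW <= watermark:
                    out.append((start, state.pop(start)))
        emitted.append(out)

    return emitted, dict(state)


if __name__ == "__main__":
    # The third trigger fires with no new events, so nothing is released.
    emitted, remaining = run([[1, 3], [12], [], [27]])
    print(emitted)
    print(remaining)
```

In this model the windows starting at 0 and 10 are released only when the event at time 27 arrives, and the window containing that last event stays buffered because no later event moves the watermark past its end. If Spark behaves the same way in this respect, that would be consistent with the most recently published message not reaching the sink until newer events are published.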