[ 
https://issues.apache.org/jira/browse/SPARK-23503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16419136#comment-16419136
 ] 

Efim Poberezkin commented on SPARK-23503:
-----------------------------------------

[~joseph.torres] Good day Jose. From what I've figured about Continuous 
Execution implementation an epoch coordinator is created per streaming query 
run and is able to store state. I've added tracking of last committed epoch and 
of waiting epochs to it to enforce epoch sequencing. Could you please correct 
me if my understanding of your implementation is not correct and take a look at 
my approach?

> continuous execution should sequence committed epochs
> -----------------------------------------------------
>
>                 Key: SPARK-23503
>                 URL: https://issues.apache.org/jira/browse/SPARK-23503
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Structured Streaming
>    Affects Versions: 2.4.0
>            Reporter: Jose Torres
>            Priority: Major
>
> Currently, the EpochCoordinator doesn't enforce a commit order. If a message 
> for epoch n gets lost in the ether, and epoch n + 1 happens to be ready for 
> commit earlier, epoch n + 1 will be committed.
>  
> This is either incorrect or needlessly confusing, because it's not safe to 
> start from the end offset of epoch n + 1 until epoch n is committed. 
> EpochCoordinator should enforce this sequencing.
>  
> Note that this is not actually a problem right now, because the commit 
> messages go through the same RPC channel from the same place. But we 
> shouldn't implicitly bake this assumption in.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to