[ https://issues.apache.org/jira/browse/SPARK-23503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16419136#comment-16419136 ]
Efim Poberezkin commented on SPARK-23503:
-----------------------------------------

[~joseph.torres] Good day, Jose. From what I can tell about the Continuous Execution implementation, an epoch coordinator is created per streaming query run and is able to store state. I've added tracking of the last committed epoch and of waiting epochs to it in order to enforce epoch sequencing. Could you please correct me if my understanding of your implementation is wrong, and take a look at my approach?

> continuous execution should sequence committed epochs
> -----------------------------------------------------
>
>                 Key: SPARK-23503
>                 URL: https://issues.apache.org/jira/browse/SPARK-23503
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Structured Streaming
>    Affects Versions: 2.4.0
>            Reporter: Jose Torres
>            Priority: Major
>
> Currently, the EpochCoordinator doesn't enforce a commit order. If a message
> for epoch n gets lost in the ether, and epoch n + 1 happens to be ready for
> commit earlier, epoch n + 1 will be committed.
>
> This is either incorrect or needlessly confusing, because it's not safe to
> start from the end offset of epoch n + 1 until epoch n is committed.
> EpochCoordinator should enforce this sequencing.
>
> Note that this is not actually a problem right now, because the commit
> messages go through the same RPC channel from the same place. But we
> shouldn't implicitly bake this assumption in.
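
For illustration, the sequencing idea from the comment above (track the last committed epoch, and hold back any epoch that becomes ready too early) could be sketched roughly as below. This is a standalone sketch, not the actual EpochCoordinator code or the proposed patch: the class name EpochSequencer and the members epochReady, lastCommitted and waitingEpochs are hypothetical, and it assumes epochs are consecutive Long values starting at a known startEpoch.

```scala
import scala.collection.mutable

// Hypothetical helper illustrating ordered epoch commits; it is NOT part of Spark.
final class EpochSequencer(startEpoch: Long) {
  // Highest epoch that has been committed so far (startEpoch - 1 means "none yet").
  private var lastCommittedEpoch: Long = startEpoch - 1
  // Epochs that are ready to commit but are blocked behind an earlier epoch.
  private val waitingEpochs = mutable.TreeSet.empty[Long]

  /** Records that `epoch` is ready and returns the epochs that may now be committed, in order. */
  def epochReady(epoch: Long): Seq[Long] = {
    // Ignore epochs that were already committed (e.g. duplicate messages).
    if (epoch > lastCommittedEpoch) waitingEpochs += epoch
    val committed = mutable.ArrayBuffer.empty[Long]
    // Drain the waiting set only while the smallest waiting epoch is the next one in sequence.
    while (waitingEpochs.headOption.contains(lastCommittedEpoch + 1)) {
      val next = waitingEpochs.head
      waitingEpochs -= next
      lastCommittedEpoch = next
      committed += next
    }
    committed.toSeq
  }

  def lastCommitted: Long = lastCommittedEpoch
}
```

With this sketch, if epoch 1 becomes ready before epoch 0, epochReady(1) returns nothing to commit, and a later epochReady(0) releases both epochs in order. A real coordinator would also need to decide how long a missing epoch may block later ones, which this sketch does not address.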