[
https://issues.apache.org/jira/browse/FLINK-38845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18060619#comment-18060619
]
Yi Zhang commented on FLINK-38845:
----------------------------------
Thanks [~martijnvisser]
I've looked into the logic of
testDispatcherRecoversAfterLosingAndRegainingLeadership and don't believe it's
necessary to set SHUTDOWN_ON_APPLICATION_FINISH to false in this case. The test
scenario, particularly around the stack trace you referenced, doesn't actually
trigger an application finish or cluster shutdown, so the default behavior
shouldn't interfere with the leadership recovery flow being tested.
That said, I haven't been able to reproduce the failure you're seeing yet. I’ll
continue investigating on my end and also review the recent changes in my
related PR to check whether anything might have inadvertently caused this issue.
> Add ArchivedApplicationStore to manage terminated applications
> --------------------------------------------------------------
>
> Key: FLINK-38845
> URL: https://issues.apache.org/jira/browse/FLINK-38845
> Project: Flink
> Issue Type: Sub-task
> Reporter: Yi Zhang
> Assignee: Yi Zhang
> Priority: Major
> Labels: pull-request-available
> Fix For: 2.3.0
>
>
> Replace ExecutionGraphInfoStore with ArchivedApplicationStore to manage
> terminated applications (rather than individual jobs) and handle their
> expiration.
> With the introduction of applications, every job is now explicitly associated
> with an application. Previously, the {{ExecutionGraphInfoStore}} was used to
> manage and expire completed jobs individually. However, this approach no
> longer works well in the application-centric model.
> If we continue using {{ExecutionGraphInfoStore}} to expire individual
> completed jobs, it’s possible that only some jobs within an application get
> expired and removed, while others remain. This leads to an incomplete view of
> the application’s state, because parts of its job history become unavailable.
> To preserve application-level consistency and completeness, we introduce the
> {{ArchivedApplicationStore}}. Instead of expiring jobs independently,
> this new store manages entire applications (including all their jobs) as a
> whole, ensuring complete, consistent, and queryable application state until
> explicitly discarded.
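
As a rough illustration of the application-level expiration described above, here is a minimal sketch. It is not Flink's actual ArchivedApplicationStore: the class name ArchivedApplicationStoreSketch, its methods, the string IDs, and the time-based expiration policy are all assumptions made for this example. The point it shows is that an application and all of its jobs are kept or removed together, so a partially expired job history cannot occur.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch only; names and behavior are assumptions, not Flink's API.
final class ArchivedApplicationStoreSketch {

    /** A terminated application together with all of its jobs. */
    static final class ArchivedApplication {
        final String applicationId;
        final List<String> jobIds;
        final Instant archivedAt;

        ArchivedApplication(String applicationId, List<String> jobIds, Instant archivedAt) {
            this.applicationId = applicationId;
            this.jobIds = List.copyOf(jobIds); // defensive, immutable copy
            this.archivedAt = archivedAt;
        }
    }

    private final Duration expirationTime;
    private final Map<String, ArchivedApplication> applications = new LinkedHashMap<>();

    ArchivedApplicationStoreSketch(Duration expirationTime) {
        this.expirationTime = expirationTime;
    }

    /** Archives an application and all of its jobs as a single entry. */
    void put(String applicationId, List<String> jobIds, Instant archivedAt) {
        applications.put(
                applicationId, new ArchivedApplication(applicationId, jobIds, archivedAt));
    }

    Optional<ArchivedApplication> get(String applicationId) {
        return Optional.ofNullable(applications.get(applicationId));
    }

    /** Returns true if any archived application still contains the given job. */
    boolean containsJob(String jobId) {
        return applications.values().stream().anyMatch(app -> app.jobIds.contains(jobId));
    }

    /**
     * Removes every application whose expiration time has elapsed at {@code now}.
     * Jobs are never removed individually; they disappear only with their application.
     * Returns the IDs of the removed applications.
     */
    List<String> expire(Instant now) {
        List<String> removed = new ArrayList<>();
        applications.values().removeIf(app -> {
            boolean expired = !app.archivedAt.plus(expirationTime).isAfter(now);
            if (expired) {
                removed.add(app.applicationId);
            }
            return expired;
        });
        return removed;
    }
}
```

With a 10-minute expiration, an application archived at time T and holding two jobs stays fully queryable until T plus 10 minutes. After that, a single expire call drops the application and both jobs at once.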
--
This message was sent by Atlassian Jira
(v8.20.10#820010)