[ https://issues.apache.org/jira/browse/FLINK-23925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403746#comment-17403746 ]
Robert Metzger commented on FLINK-23925:
----------------------------------------

It seems that "Runtime / Coordination" and "Runtime / Web Frontend" are the two main components where tickets containing the string "history server" are located:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20FLINK%20AND%20text%20~%20%22history%20server%22%20ORDER%20BY%20component%20ASC
(with "Runtime / Coordination" actually being the more popular one).

I would personally not create a new component, because this is a small sub-component, closely connected to the rest of the web frontend infrastructure. But I'm happy to create a new component if you have a different opinion.

> HistoryServer: Archiving job with more than one attempt fails
> -------------------------------------------------------------
>
>                 Key: FLINK-23925
>                 URL: https://issues.apache.org/jira/browse/FLINK-23925
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.14.0, 1.13.2
>            Reporter: Robert Metzger
>            Priority: Major
>
> Error:
> {code}
> 2021-08-23 16:26:01,953 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Disconnect job manager 00000000000000000000000000000...@akka.tcp://flink@localhost:6123/user/rpc/jobmanager_2 for job ca9f6a073d311d60f457a1c4243e7dc3 from the resource manager.
> 2021-08-23 16:26:02,137 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Could not archive completed job CarTopSpeedWindowingExample(ca9f6a073d311d60f457a1c4243e7dc3) to the history server.
> java.util.concurrent.CompletionException: java.lang.IllegalArgumentException: attempt does not exist
>     at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273) ~[?:1.8.0_252]
>     at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280) [?:1.8.0_252]
>     at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1643) [?:1.8.0_252]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_252]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_252]
>     at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
> Caused by: java.lang.IllegalArgumentException: attempt does not exist
>     at org.apache.flink.runtime.executiongraph.ArchivedExecutionVertex.getPriorExecutionAttempt(ArchivedExecutionVertex.java:109) ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
>     at org.apache.flink.runtime.executiongraph.ArchivedExecutionVertex.getPriorExecutionAttempt(ArchivedExecutionVertex.java:31) ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
>     at org.apache.flink.runtime.rest.handler.job.SubtaskExecutionAttemptDetailsHandler.archiveJsonWithPath(SubtaskExecutionAttemptDetailsHandler.java:140) ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
>     at org.apache.flink.runtime.webmonitor.history.OnlyExecutionGraphJsonArchivist.archiveJsonWithPath(OnlyExecutionGraphJsonArchivist.java:51) ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
>     at org.apache.flink.runtime.webmonitor.WebMonitorEndpoint.archiveJsonWithPath(WebMonitorEndpoint.java:1031) ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
>     at org.apache.flink.runtime.dispatcher.JsonResponseHistoryServerArchivist.lambda$archiveExecutionGraph$0(JsonResponseHistoryServerArchivist.java:61) ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
>     at org.apache.flink.util.function.ThrowingRunnable.lambda$unchecked$0(ThrowingRunnable.java:49) ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
>     at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640) ~[?:1.8.0_252]
>     ... 3 more
> {code}
> Steps to reproduce:
> - start a Flink reactive mode job manager:
> {code}
> mkdir usrlib
> cp ./examples/streaming/TopSpeedWindowing.jar usrlib/
> # Submit Job in Reactive Mode
> ./bin/standalone-job.sh start -Dscheduler-mode=reactive -Dexecution.checkpointing.interval="10s" -j org.apache.flink.streaming.examples.windowing.TopSpeedWindowing
> # Start first TaskManager
> ./bin/taskmanager.sh start
> {code}
> - Add another taskmanager to trigger a restart
> - Cancel the job
> See the failure in the jobmanager logs.
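
The stack trace shows the exception being thrown from `ArchivedExecutionVertex.getPriorExecutionAttempt`, called from `SubtaskExecutionAttemptDetailsHandler.archiveJsonWithPath`. One possible explanation, offered here as an assumption and not as a confirmed root cause, is that the archiving code asks for prior attempts up to the current attempt number while the archived vertex holds fewer prior attempts than that number implies (for example after a restart). The sketch below illustrates only that pattern. `AttemptLookupSketch`, `ArchivedVertexSketch`, and their methods are hypothetical names and are not Flink's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class AttemptLookupSketch {

    /** Hypothetical stand-in for an archived vertex; NOT Flink's ArchivedExecutionVertex. */
    static final class ArchivedVertexSketch {
        final int currentAttemptNumber;
        final List<String> priorAttempts;

        ArchivedVertexSketch(int currentAttemptNumber, List<String> priorAttempts) {
            this.currentAttemptNumber = currentAttemptNumber;
            this.priorAttempts = priorAttempts;
        }

        /** Bounds-checked lookup that fails the same way as the reported exception. */
        String getPriorAttempt(int attemptNumber) {
            if (attemptNumber >= 0 && attemptNumber < priorAttempts.size()) {
                return priorAttempts.get(attemptNumber);
            }
            throw new IllegalArgumentException("attempt does not exist");
        }
    }

    /** Assumed archiving loop: visits every attempt below the current attempt number. */
    static List<String> archiveAttempts(ArchivedVertexSketch vertex) {
        List<String> archived = new ArrayList<>();
        for (int i = 0; i < vertex.currentAttemptNumber; i++) {
            archived.add(vertex.getPriorAttempt(i));
        }
        archived.add("current-attempt-" + vertex.currentAttemptNumber);
        return archived;
    }

    public static void main(String[] args) {
        // Consistent state: attempt number 1 with one recorded prior attempt.
        ArchivedVertexSketch consistent =
                new ArchivedVertexSketch(1, Arrays.asList("attempt-0"));
        System.out.println(archiveAttempts(consistent));

        // Inconsistent state (assumed): attempt number 1 but no recorded prior attempts.
        ArchivedVertexSketch inconsistent =
                new ArchivedVertexSketch(1, Collections.<String>emptyList());
        try {
            archiveAttempts(inconsistent);
        } catch (IllegalArgumentException e) {
            System.out.println("archiving failed: " + e.getMessage());
        }
    }
}
```

If the actual cause is along these lines, the failure would only show up for jobs whose current attempt number is greater than zero, which would be consistent with the issue title.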