[jira] [Closed] (FLINK-24053) stop with savepoint timeout

Chesnay Schepler (Jira) Mon, 30 Aug 2021 05:00:06 -0700


     [ https://issues.apache.org/jira/browse/FLINK-24053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Chesnay Schepler closed FLINK-24053.
------------------------------------
    Resolution: Not A Problem

This is not a bug.

When a stop-with-savepoint operation is triggered, the cluster delays its 
shutdown until the result of the savepoint operation (i.e., the final 
path the savepoint was written to) has been consumed through the [REST 
API|https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/ops/rest_api/#jobs-jobid-savepoints-triggerid].
This ensures that users have a time frame in which they are guaranteed 
to be able to consume that result.
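
To illustrate what "consuming the result" can look like from the client side, 
here is a minimal sketch that polls the savepoint status endpoint linked above 
until the operation reports completion. It is not taken from the Flink code 
base. The class name SavepointResultPoller, its method names, the retry 
parameters and the string-based status check are all hypothetical, and it 
assumes Java 11+ for java.net.http and a response body whose status id is 
"COMPLETED" once the savepoint has finished.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper, not part of Flink: polls the savepoint status endpoint.
final class SavepointResultPoller {

    // Builds <restBase>/jobs/<jobId>/savepoints/<triggerId>, the path from the linked REST docs.
    public static URI statusUri(String restBase, String jobId, String triggerId) {
        String base = restBase.endsWith("/")
                ? restBase.substring(0, restBase.length() - 1)
                : restBase;
        return URI.create(base + "/jobs/" + jobId + "/savepoints/" + triggerId);
    }

    // Crude check on the raw JSON body (assumed shape: {"status":{"id":"COMPLETED"}, ...}).
    // A real client should use a JSON parser instead of string matching.
    public static boolean isCompleted(String responseBody) {
        return responseBody.replaceAll("\\s", "").contains("\"id\":\"COMPLETED\"");
    }

    // Polls until the operation is reported as completed and returns the raw response body.
    // maxAttempts and pauseMillis are illustrative knobs, not Flink settings.
    public static String awaitResult(HttpClient client, URI uri, int maxAttempts, long pauseMillis)
            throws Exception {
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() == 200 && isCompleted(response.body())) {
                return response.body();
            }
            Thread.sleep(pauseMillis);
        }
        throw new IllegalStateException(
                "savepoint result not available after " + maxAttempts + " attempts");
    }
}
```

A caller would pass the job ID and the trigger ID returned by the 
stop-with-savepoint request, for example 
awaitResult(HttpClient.newHttpClient(), statusUri("http://localhost:8081", jobId, triggerId), 60, 1000), 
where the address is an assumption about the local setup. The returned body 
should still be inspected, since a completed operation may report a failure 
rather than a savepoint location.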

> stop with savepoint timeout
> ---------------------------
>
>                 Key: FLINK-24053
>                 URL: https://issues.apache.org/jira/browse/FLINK-24053
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing, Runtime / REST
>    Affects Versions: 1.11.0, 1.12.0, 1.13.0
>            Reporter: 刘方奇
>            Priority: Major
>
> Hello, whenever we use the "stop with savepoint" feature, we run into a bug.
> The application always takes 5 minutes to shut down, and then 
> throws a timeout exception.
>  
> {code:java}
> java.util.concurrent.TimeoutException: null
>     at org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:1036) ~[classes/:?]
>     at org.apache.flink.runtime.concurrent.DirectExecutorService.execute(DirectExecutorService.java:211) ~[classes/:?]
>     at org.apache.flink.runtime.concurrent.FutureUtils.lambda$orTimeout$14(FutureUtils.java:445) ~[classes/:?]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_251]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_251]
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[?:1.8.0_251]
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[?:1.8.0_251]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_251]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_251]
>     at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_251]
> {code}
> We found that the call to 
> org.apache.flink.runtime.rest.handler.job.savepoints.SavepointHandlers.SavepointStatusHandler.closeHandlerAsync()
> always times out, and its timeout is set to 5 minutes.
> Our question is whether closing this handler matters at all, because 
> it only serves another handler, 
> org.apache.flink.runtime.rest.handler.job.savepoints.SavepointHandlers.StopWithSavepointHandler,
> which has already been closed. Should we skip this close?
> PS: We saw no problems when we tested code that skips the handler's 
> close.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
