[ https://issues.apache.org/jira/browse/FLINK-10309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Till Rohrmann updated FLINK-10309: ---------------------------------- Summary: Cancel with savepoint fails with java.net.ConnectException when using the per job-mode (was: Cancel flink job occurs java.net.ConnectException) > Cancel with savepoint fails with java.net.ConnectException when using the per > job-mode > -------------------------------------------------------------------------------------- > > Key: FLINK-10309 > URL: https://issues.apache.org/jira/browse/FLINK-10309 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination, REST > Affects Versions: 1.5.3, 1.6.0, 1.7.0 > Reporter: vinoyang > Assignee: Gary Yao > Priority: Critical > Labels: pull-request-available > Fix For: 1.5.6, 1.6.3, 1.7.0 > > > The problem occurs when using the Yarn per-job detached mode. Trying to > cancel with savepoint fails with the following exception before being able to > retrieve the savepoint path: > exception stack trace : > {code:java} > org.apache.flink.util.FlinkException: Could not cancel job xxxx. > at > org.apache.flink.client.cli.CliFrontend.lambda$cancel$4(CliFrontend.java:585) > at > org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:960) > at > org.apache.flink.client.cli.CliFrontend.cancel(CliFrontend.java:577) > at > org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1034) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.util.concurrent.ExecutionException: > org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not > complete the operation. Number of retries has been exhausted. > at > java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) > at > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) > at > org.apache.flink.client.program.rest.RestClusterClient.cancelWithSavepoint(RestClusterClient.java:398) > at > org.apache.flink.client.cli.CliFrontend.lambda$cancel$4(CliFrontend.java:583) > ... 6 more > Caused by: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: > Could not complete the operation. Number of retries has been exhausted. > at > org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:213) > at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) > at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) > at > java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977) > at > org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$1(RestClient.java:274) > at > org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680) > at > org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603) > at > org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563) > at > org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424) > at > org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:268) > at > org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:284) > at > org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528) > at > org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) > at > org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) > at > org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) > at > org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) > at > org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137) > ... 1 more > Caused by: java.util.concurrent.CompletionException: > java.net.ConnectException: Connect refuse: xxx/xxx.xxx.xxx.xxx:xxx > at > java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) > at > java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) > at > java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:943) > at > java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926) > ... 16 more > Caused by: java.net.ConnectException: Connect refuse: xxx/xxx.xxx.xxx.xxx:xxx > at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) > at > sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) > at > org.apache.flink.shaded.netty4.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224) > at > org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:281) > ... 7 more > {code} > some discussion in mailing list : > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Cancel-flink-job-occur-exception-td24056.html -- This message was sent by Atlassian JIRA (v7.6.3#76005)