[jira] [Commented] (FLINK-19426) Streaming File Sink end-to-end test sometimes fails with "Could not assign resource ... to current execution ..."
[ https://issues.apache.org/jira/browse/FLINK-19426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205509#comment-17205509 ]

Dian Fu commented on FLINK-19426:
---------------------------------

[~azagrebin] Thanks a lot for the investigation!

> Streaming File Sink end-to-end test sometimes fails with "Could not assign resource ... to current execution ..."
> ---------------------------------
>
> Key: FLINK-19426
> URL: https://issues.apache.org/jira/browse/FLINK-19426
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Network, Tests
> Affects Versions: 1.12.0
> Reporter: Dian Fu
> Assignee: Robert Metzger
> Priority: Major
> Labels: pull-request-available, test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=6983&view=logs&j=68a897ab-3047-5660-245a-cce8f83859f6&t=16ca2cca-2f63-5cce-12d2-d519b930a729
> {code}
> 2020-09-26T22:16:26.9856525Z org.apache.flink.runtime.io.network.partition.consumer.PartitionConnectionException: Connection for partition 619775973ed0f282e20f9d55d13913ab#0@bc764cd8ddf7a0cff126f51c16239658_0_1 not reachable.
> 2020-09-26T22:16:26.9857848Z at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.requestSubpartition(RemoteInputChannel.java:159) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-09-26T22:16:26.9859168Z at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.internalRequestPartitions(SingleInputGate.java:336) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-09-26T22:16:26.9860449Z at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.requestPartitions(SingleInputGate.java:308) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-09-26T22:16:26.9861677Z at org.apache.flink.runtime.taskmanager.InputGateWithMetrics.requestPartitions(InputGateWithMetrics.java:95) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-09-26T22:16:26.9862861Z at org.apache.flink.streaming.runtime.tasks.StreamTask.requestPartitions(StreamTask.java:542) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-09-26T22:16:26.9864018Z at org.apache.flink.streaming.runtime.tasks.StreamTask.readRecoveredChannelState(StreamTask.java:507) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-09-26T22:16:26.9865284Z at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$beforeInvoke$0(StreamTask.java:498) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-09-26T22:16:26.9866415Z at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:47) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-09-26T22:16:26.9867500Z at org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:492) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-09-26T22:16:26.9868514Z at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:550) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-09-26T22:16:26.9869450Z at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:722) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-09-26T22:16:26.9870339Z at org.apache.flink.runtime.taskmanager.Task.run(Task.java:547) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-09-26T22:16:26.9870869Z at java.lang.Thread.run(Thread.java:748) [?:1.8.0_265]
> 2020-09-26T22:16:26.9872060Z Caused by: java.io.IOException: java.util.concurrent.ExecutionException: org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Connecting to remote task manager '/10.1.0.4:38905' has failed. This might indicate that the remote task manager has been lost.
> 2020-09-26T22:16:26.9873511Z at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.createPartitionRequestClient(PartitionRequestClientFactory.java:85) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-09-26T22:16:26.9874788Z at org.apache.flink.runtime.io.network.netty.NettyConnectionManager.createPartitionRequestClient(NettyConnectionManager.java:67) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-09-26T22:16:26.9876084Z at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.requestSubpartition(RemoteInputChannel.java:156) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-09-26T22:16:26.9876567Z ... 12 more
> 2020-09-26T22:16:26.9877477Z Caused by: java.util.concurrent.ExecutionException: org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Connecting to remote task manager '/10.1.0.4:38905' has failed. This might indicate that the re
[jira] [Commented] (FLINK-19426) Streaming File Sink end-to-end test sometimes fails with "Could not assign resource ... to current execution ..."
[ https://issues.apache.org/jira/browse/FLINK-19426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205370#comment-17205370 ]

Andrey Zagrebin commented on FLINK-19426:
-----------------------------------------

The error was reproduced in the CI run [without|https://dev.azure.com/rmetzger/Flink/_build/results?buildId=8428&view=results] the FLINK-19388 fix. The CI run [with the fix|https://dev.azure.com/rmetzger/Flink/_build/results?buildId=8429&view=results] failed only because it exceeded the maximum allowed run time; the reported failure did not occur. Hence, I think this issue is a duplicate of FLINK-19388.
[jira] [Commented] (FLINK-19426) Streaming File Sink end-to-end test sometimes fails with "Could not assign resource ... to current execution ..."
[ https://issues.apache.org/jira/browse/FLINK-19426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17204738#comment-17204738 ]

Robert Metzger commented on FLINK-19426:
----------------------------------------

Thanks a lot for looking into this. FLINK-19388 really sounds similar to this problem.