[jira] [Commented] (FLINK-19426) Streaming File Sink end-to-end test sometimes fails with "Could not assign resource ... to current execution ..."

2020-10-01 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205509#comment-17205509
 ] 

Dian Fu commented on FLINK-19426:
-

[~azagrebin] Thanks a lot for the investigation!

> Streaming File Sink end-to-end test sometimes fails with "Could not assign 
> resource ... to current execution ..."
> -
>
> Key: FLINK-19426
> URL: https://issues.apache.org/jira/browse/FLINK-19426
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Network, Tests
>Affects Versions: 1.12.0
>Reporter: Dian Fu
>Assignee: Robert Metzger
>Priority: Major
>  Labels: pull-request-available, test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=6983&view=logs&j=68a897ab-3047-5660-245a-cce8f83859f6&t=16ca2cca-2f63-5cce-12d2-d519b930a729
> {code}
> 2020-09-26T22:16:26.9856525Z org.apache.flink.runtime.io.network.partition.consumer.PartitionConnectionException: Connection for partition 619775973ed0f282e20f9d55d13913ab#0@bc764cd8ddf7a0cff126f51c16239658_0_1 not reachable.
> 2020-09-26T22:16:26.9857848Z  at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.requestSubpartition(RemoteInputChannel.java:159) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-09-26T22:16:26.9859168Z  at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.internalRequestPartitions(SingleInputGate.java:336) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-09-26T22:16:26.9860449Z  at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.requestPartitions(SingleInputGate.java:308) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-09-26T22:16:26.9861677Z  at org.apache.flink.runtime.taskmanager.InputGateWithMetrics.requestPartitions(InputGateWithMetrics.java:95) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-09-26T22:16:26.9862861Z  at org.apache.flink.streaming.runtime.tasks.StreamTask.requestPartitions(StreamTask.java:542) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-09-26T22:16:26.9864018Z  at org.apache.flink.streaming.runtime.tasks.StreamTask.readRecoveredChannelState(StreamTask.java:507) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-09-26T22:16:26.9865284Z  at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$beforeInvoke$0(StreamTask.java:498) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-09-26T22:16:26.9866415Z  at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:47) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-09-26T22:16:26.9867500Z  at org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:492) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-09-26T22:16:26.9868514Z  at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:550) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-09-26T22:16:26.9869450Z  at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:722) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-09-26T22:16:26.9870339Z  at org.apache.flink.runtime.taskmanager.Task.run(Task.java:547) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-09-26T22:16:26.9870869Z  at java.lang.Thread.run(Thread.java:748) [?:1.8.0_265]
> 2020-09-26T22:16:26.9872060Z Caused by: java.io.IOException: java.util.concurrent.ExecutionException: org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Connecting to remote task manager '/10.1.0.4:38905' has failed. This might indicate that the remote task manager has been lost.
> 2020-09-26T22:16:26.9873511Z  at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.createPartitionRequestClient(PartitionRequestClientFactory.java:85) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-09-26T22:16:26.9874788Z  at org.apache.flink.runtime.io.network.netty.NettyConnectionManager.createPartitionRequestClient(NettyConnectionManager.java:67) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-09-26T22:16:26.9876084Z  at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.requestSubpartition(RemoteInputChannel.java:156) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-09-26T22:16:26.9876567Z  ... 12 more
> 2020-09-26T22:16:26.9877477Z Caused by: java.util.concurrent.ExecutionException: org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Connecting to remote task manager '/10.1.0.4:38905' has failed. This might indicate that the re
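
The quoted trace nests its failure several layers deep: a PartitionConnectionException caused by an IOException, which wraps an ExecutionException, which in turn wraps the RemoteTransportException reporting that the remote task manager at '/10.1.0.4:38905' could not be reached. When triaging a test-stability failure like this it can help to print that chain outermost-first and go straight to the innermost cause. The following is a minimal sketch using only JDK classes; the class name CauseChain and the plain JDK exceptions that stand in for the Flink exception types are illustrative assumptions, not Flink code.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;

public class CauseChain {

    /** Collects the cause chain, outermost first, stopping at null or at a cycle. */
    static List<Throwable> chain(Throwable t) {
        List<Throwable> result = new ArrayList<>();
        Throwable current = t;
        // Throwable.equals is identity, so contains() detects a cycle in the chain.
        while (current != null && !result.contains(current)) {
            result.add(current);
            current = current.getCause();
        }
        return result;
    }

    /** Returns the innermost cause, i.e. the last element of the chain (t must be non-null). */
    static Throwable rootCause(Throwable t) {
        List<Throwable> c = chain(t);
        return c.get(c.size() - 1);
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins mirroring the nesting in the log:
        // PartitionConnectionException -> IOException -> ExecutionException -> RemoteTransportException
        Exception transport = new RuntimeException(
                "Connecting to remote task manager '/10.1.0.4:38905' has failed.");
        Exception execution = new ExecutionException(transport);
        Exception io = new IOException(execution);
        Exception partition = new IOException(
                "Connection for partition 619775973ed0f282e20f9d55d13913ab#0@bc764cd8ddf7a0cff126f51c16239658_0_1 not reachable.",
                io);

        for (Throwable t : chain(partition)) {
            System.out.println(t.getClass().getSimpleName() + ": " + t.getMessage());
        }
        System.out.println("Root cause: " + rootCause(partition).getMessage());
    }
}
```

Running main prints one line per layer; the last one is the transport failure, which is the detail that points at a lost or unreachable task manager rather than at the partition request itself.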

[jira] [Commented] (FLINK-19426) Streaming File Sink end-to-end test sometimes fails with "Could not assign resource ... to current execution ..."

2020-10-01 Thread Andrey Zagrebin (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205370#comment-17205370
 ] 

Andrey Zagrebin commented on FLINK-19426:
-

The error was reproduced in the CI run 
[without|https://dev.azure.com/rmetzger/Flink/_build/results?buildId=8428&view=results]
 the FLINK-19388 fix, whereas the CI run [with the 
fix|https://dev.azure.com/rmetzger/Flink/_build/results?buildId=8429&view=results]
 did not show the reported failure; it only failed because the maximum allowed run time was exceeded.

Hence, I think this issue is a duplicate of FLINK-19388.

[jira] [Commented] (FLINK-19426) Streaming File Sink end-to-end test sometimes fails with "Could not assign resource ... to current execution ..."

2020-09-30 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17204738#comment-17204738
 ] 

Robert Metzger commented on FLINK-19426:


Thanks a lot for looking into this. FLINK-19388 really sounds similar to this 
problem.
