[ 
https://issues.apache.org/jira/browse/FLINK-30727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17682801#comment-17682801
 ] 

Yunhong Zheng commented on FLINK-30727:
---------------------------------------

Hi [~mapohl], the root cause of this error may be that I didn't set a 
parallelism for the TableEnvironment in this ITCase, so it fell back to the 
default parallelism, which equals the number of CPU cores (32 on Azure CI). 
With a parallelism of 32 and the complex job graph in this case, network 
memory may be insufficient.

The fix for this error is to set the parallelism explicitly. I will verify it 
on a machine with a large number of CPU cores.
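
As a rough illustration of why a parallelism of 32 can exhaust the pool, the 
sketch below estimates the exclusive buffers that a single all-to-all 
(e.g. hash shuffle) edge requests. It is a back-of-the-envelope estimate, not 
Flink code: it assumes the default of 2 exclusive buffers per input channel 
('taskmanager.network.memory.buffers-per-channel') and that all subtasks run 
in one TaskManager, as in a local test cluster. The class and method names are 
hypothetical; the pool size of 2048 is taken from the error message in this 
ticket.

```java
public class NetworkBufferEstimate {

    // Assumption: Flink's default number of exclusive buffers per input channel
    // (taskmanager.network.memory.buffers-per-channel).
    public static final int ASSUMED_BUFFERS_PER_CHANNEL = 2;

    /**
     * Exclusive buffers requested by one all-to-all edge when every consumer
     * and producer subtask runs in the same TaskManager.
     */
    public static long exclusiveBuffersPerAllToAllEdge(int parallelism, int buffersPerChannel) {
        // Each of the `parallelism` consumer subtasks opens one input channel
        // per producer subtask, so the edge has parallelism^2 channels.
        long channels = (long) parallelism * parallelism;
        return channels * buffersPerChannel;
    }

    public static void main(String[] args) {
        int totalBuffers = 2048; // pool size reported in the error message
        for (int p : new int[] {4, 32}) {
            long needed = exclusiveBuffersPerAllToAllEdge(p, ASSUMED_BUFFERS_PER_CHANNEL);
            System.out.printf(
                    "parallelism=%d -> %d exclusive buffers per all-to-all edge (pool: %d)%n",
                    p, needed, totalBuffers);
        }
    }
}
```

Under these assumptions, one such edge at parallelism 32 already asks for the 
whole pool, while at parallelism 4 it needs only a small fraction of it. In a 
Table API test, lowering the parallelism would typically mean setting 
'table.exec.resource.default-parallelism' on the TableEnvironment's config; 
the exact value to use is not stated in this comment.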

> JoinReorderITCase.testBushyTreeJoinReorder failed due to IOException
> --------------------------------------------------------------------
>
>                 Key: FLINK-30727
>                 URL: https://issues.apache.org/jira/browse/FLINK-30727
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Network, Table SQL / Planner
>    Affects Versions: 1.17.0
>            Reporter: Matthias Pohl
>            Assignee: Yunhong Zheng
>            Priority: Critical
>              Labels: pull-request-available, test-stability
>
> IOException due to timeout occurring while requesting exclusive NetworkBuffer 
> caused JoinReorderITCase.testBushyTreeJoinReorder to fail:
> {code}
> [...]
> Jan 18 01:11:27 Caused by: java.io.IOException: Timeout triggered when 
> requesting exclusive buffers: The total number of network buffers is 
> currently set to 2048 of 32768 bytes each. You can increase this number by 
> setting the configuration keys 'taskmanager.memory.network.fraction', 
> 'taskmanager.memory.network.min', and 'taskmanager.memory.network.max',  or 
> you may increase the timeout which is 30000ms by setting the key 
> 'taskmanager.network.memory.exclusive-buffers-request-timeout-ms'.
> Jan 18 01:11:27       at 
> org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.internalRequestMemorySegments(NetworkBufferPool.java:256)
> Jan 18 01:11:27       at 
> org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.requestPooledMemorySegmentsBlocking(NetworkBufferPool.java:179)
> Jan 18 01:11:27       at 
> org.apache.flink.runtime.io.network.buffer.LocalBufferPool.reserveSegments(LocalBufferPool.java:262)
> Jan 18 01:11:27       at 
> org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.setupChannels(SingleInputGate.java:517)
> Jan 18 01:11:27       at 
> org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.setup(SingleInputGate.java:277)
> Jan 18 01:11:27       at 
> org.apache.flink.runtime.taskmanager.InputGateWithMetrics.setup(InputGateWithMetrics.java:105)
> Jan 18 01:11:27       at 
> org.apache.flink.runtime.taskmanager.Task.setupPartitionsAndGates(Task.java:962)
> Jan 18 01:11:27       at 
> org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:648)
> Jan 18 01:11:27       at 
> org.apache.flink.runtime.taskmanager.Task.run(Task.java:556)
> Jan 18 01:11:27       at java.lang.Thread.run(Thread.java:748)
> {code}
> Same build, 2 failures:
> * 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=44987&view=logs&j=0c940707-2659-5648-cbe6-a1ad63045f0a&t=075c2716-8010-5565-fe08-3c4bb45824a4&l=14300
> * 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=44987&view=logs&j=ce3801ad-3bd5-5f06-d165-34d37e757d90&t=5e4d9387-1dcc-5885-a901-90469b7e6d2f&l=14362



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
