Hi Luke, Yes, it is really not very friendly for users to set the number of network buffers. And it may cause OOM exception if not set the proper amount for large scale job. The proper amount should be calculated automatically by framework based on the number of input and output channels for each task, and @Nico Kruber is working on this feature now. You can check this jira for some details "https://issues.apache.org/jira/browse/FLINK-4545". Cheers Zhijiang ------------------------------------------------------------------发件人:Luke Hutchison (JIRA) <j...@apache.org>发送时间:2017年3月15日(星期三) 17:01收件人:dev <dev@flink.apache.org>主 题:[jira] [Created] (FLINK-6057) Better default needed for num network buffers Luke Hutchison created FLINK-6057: -------------------------------------
Summary: Better default needed for num network buffers Key: FLINK-6057 URL: https://issues.apache.org/jira/browse/FLINK-6057 Project: Flink Issue Type: Bug Components: Core Affects Versions: 1.2.0 Reporter: Luke Hutchison Using the default environment, {code} ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); {code} my code will sometimes fail with an error that Flink ran out of network buffers. To fix this, I have to do: {code} int numTasks = Runtime.getRuntime().availableProcessors(); config.setInteger(ConfigConstants.DEFAULT_PARALLELISM_KEY, numTasks); config.setInteger(ConfigConstants.TASK_MANAGER_NUM_TASK_SLOTS, numTasks); config.setInteger(ConfigConstants.TASK_MANAGER_NETWORK_NUM_BUFFERS_KEY, numTasks * 2048); {code} The default value of 2048 fails when I increase the degree of parallelism for a large Flink pipeline (hence the fix to set the number of buffers to numTasks * 2048). This is particularly problematic because a pipeline can work fine on one machine, and when you start the pipeline on a machine with more cores, it can fail. The default execution environment should pick a saner default based on the level of parallelism (or whatever is needed to ensure that the number of network buffers is not going to be exceeded for a given execution environment). -- This message was sent by Atlassian JIRA (v6.3.15#6346)