Github user zhijiangW commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4509#discussion_r141794108

    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/consumer/RemoteInputChannel.java ---
    @@ -390,7 +390,63 @@ public BufferProvider getBufferProvider() throws IOException {
     		return inputGate.getBufferProvider();
     	}
     
    -	public void onBuffer(Buffer buffer, int sequenceNumber) {
    +	/**
    +	 * Requests a buffer from the input channel directly for receiving network data.
    +	 * It should always return an available buffer in credit-based mode.
    +	 *
    +	 * @return The available buffer.
    +	 */
    +	public Buffer requestBuffer() {
    +		synchronized (availableBuffers) {
    +			return availableBuffers.poll();
    +		}
    +	}
    +
    +	/**
    +	 * Receives the backlog from the producer's buffer response. If the number of available
    +	 * buffers is less than the backlog length, it requests floating buffers from the buffer
    +	 * pool and then notifies the producer of the unannounced credits.
    +	 *
    +	 * @param backlog The number of unsent buffers in the producer's sub-partition.
    +	 */
    +	private void onSenderBacklog(int backlog) {
    +		int numRequestedBuffers = 0;
    +
    +		synchronized (availableBuffers) {
    +			// Important: the isReleased check should be inside the synchronized block.
    +			if (!isReleased.get()) {
    +				senderBacklog.set(backlog);
    +
    +				while (senderBacklog.get() > availableBuffers.size() && !isWaitingForFloatingBuffers.get()) {
    --- End diff --

Actually I implemented this strategy in two different ways in our production environment.

On the `LocalBufferPool` side, the pool can assign available buffers among all the listeners in a round-robin, fair way, because it can gather all the listeners within some time window. But triggering the assignment on the `LocalBufferPool` side may introduce delay.

On the `RemoteInputChannel` side, we currently implement another, more involved way to request buffers relatively fairly. That is:

1. Define a parameter `numBuffersPerAllocation` that indicates at most how many buffers to request from the `LocalBufferPool` each time.
2. `min(numBuffersPerAllocation, backlog)` is the actual value to request from the `LocalBufferPool`, so one channel will not occupy all the floating buffers even if its backlog is very large.
3. In general, `numBuffersPerAllocation` should be larger than 1 to avoid a throughput decline. For example, if the floating buffers in the `LocalBufferPool` can satisfy all the requirements of the `RemoteInputChannel`, it is better to notify the producer of a batch of credits at once than to send one credit at a time over many round trips.
4. On the `LocalBufferPool` side, the `RemoteInputChannel` may still register as a listener after having requested `numBuffersPerAllocation` buffers, when the number of available buffers plus `numBuffersPerAllocation` is less than `backlog`. It then has to wait for `LocalBufferPool#recycle()` to trigger distribution of the remaining available buffers among all the listeners.

BTW, I did not clearly understand the formula you mentioned above, `backlog + initialCredit - currentCredit`. I think the initial credit should not be considered in the subsequent interactions: `backlog - currentCredit` reflects, in real time, the number of extra buffers needed for each interaction. I know `backlog - currentCredit` is not entirely accurate, because some credit notifications may still be in flight, but that balances out in the long run. What do you think of this way?
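To make the capped-request rule in steps 1 and 2 concrete, here is a minimal sketch. All names (`buffersToRequest` and its parameters) are hypothetical illustrations, not the actual Flink API:

```java
// Sketch of the capped floating-buffer request strategy described above.
// Hypothetical names; not the actual Flink implementation.
class FloatingBufferRequestSketch {

	// numBuffersPerAllocation: upper bound on how many buffers one allocation
	// may take from the LocalBufferPool (step 1).
	static int buffersToRequest(int numBuffersPerAllocation, int backlog, int numAvailableBuffers) {
		int deficit = backlog - numAvailableBuffers;
		if (deficit <= 0) {
			return 0; // already enough buffers for the announced backlog
		}
		// Step 2: cap the request so a single channel with a huge backlog
		// cannot occupy all the floating buffers in the pool.
		return Math.min(numBuffersPerAllocation, deficit);
	}
}
```

With a cap of 4, a backlog of 10 yields a request of only 4 buffers; the channel then registers as a listener for the remainder, as described in step 4.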
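To illustrate the difference between the two formulas being discussed, a tiny hypothetical helper (both method names are made up for this sketch):

```java
// Sketch contrasting the two credit formulas under discussion.
// Hypothetical helper; not the actual Flink code.
class CreditFormulaSketch {

	// Formula from the earlier review comment: include the initial credit.
	static int withInitialCredit(int backlog, int initialCredit, int currentCredit) {
		return Math.max(0, backlog + initialCredit - currentCredit);
	}

	// Formula suggested here: the initial credit was already granted once,
	// so only the real-time deficit backlog - currentCredit matters.
	static int withoutInitialCredit(int backlog, int currentCredit) {
		return Math.max(0, backlog - currentCredit);
	}
}
```

For a backlog of 5 with 2 unused credits, the second formula requests 3 extra buffers, while the first would additionally re-count the initial credit on every interaction.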
---