Github user zhijiangW commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4509#discussion_r141794108
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/consumer/RemoteInputChannel.java ---
    @@ -390,7 +390,63 @@ public BufferProvider getBufferProvider() throws IOException {
                return inputGate.getBufferProvider();
        }
     
    -   public void onBuffer(Buffer buffer, int sequenceNumber) {
    +   /**
    +    * Requests buffer from input channel directly for receiving network data.
    +    * It should always return an available buffer in credit-based mode.
    +    *
    +    * @return The available buffer.
    +    */
    +   public Buffer requestBuffer() {
    +           synchronized (availableBuffers) {
    +                   return availableBuffers.poll();
    +           }
    +   }
    +
    +   /**
    +    * Receives the backlog from producer's buffer response. If the number of available
    +    * buffers is less than the backlog length, it will request floating buffers from buffer
    +    * pool, and then notify unannounced credits to the producer.
    +    *
    +    * @param backlog The number of unsent buffers in the producer's sub partition.
    +    */
    +   private void onSenderBacklog(int backlog) {
    +           int numRequestedBuffers = 0;
    +
    +           synchronized (availableBuffers) {
    +                   // Important: the isReleased check should be inside the synchronized block.
    +                   if (!isReleased.get()) {
    +                           senderBacklog.set(backlog);
    +
    +                           while (senderBacklog.get() > availableBuffers.size() && !isWaitingForFloatingBuffers.get()) {
    --- End diff --
    
    Actually, I have implemented this strategy in two different ways in our production environment.
    
    On the `LocalBufferPool` side, the pool can assign available buffers among all the listeners in a fair, round-robin way, because it can gather all the listeners that register within some period of time. However, triggering the assignment on the `LocalBufferPool` side may introduce delay.
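    
    A minimal sketch of the round-robin idea, not the actual `LocalBufferPool` code. The class name, the `assign` method, and the `demands` array (how many buffers each gathered listener still wants) are hypothetical names used only for illustration.

```java
// Hypothetical sketch: hands out buffers one at a time, cycling over the listeners.
final class RoundRobinAssignmentSketch {

    /**
     * @param demands          assumed per-listener demand (number of buffers still wanted)
     * @param availableBuffers number of buffers the pool can currently hand out
     * @return number of buffers granted to each listener, in the same order as demands
     */
    static int[] assign(int[] demands, int availableBuffers) {
        int[] granted = new int[demands.length];
        boolean progress = true;

        // Keep cycling until the buffers run out or every listener is satisfied.
        while (availableBuffers > 0 && progress) {
            progress = false;
            for (int i = 0; i < demands.length && availableBuffers > 0; i++) {
                if (granted[i] < demands[i]) {
                    granted[i]++;
                    availableBuffers--;
                    progress = true;
                }
            }
        }
        return granted;
    }
}
```

    With demands of 5, 1 and 3 and only 6 buffers available, no single listener can take everything: each one gets a buffer per cycle until it is satisfied or the pool is empty.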
    
    On the `RemoteInputChannel` side, we currently use another, more complicated way to request buffers in a relatively fair manner. That is:
    
    1. Define a parameter `numBuffersPerAllocation` that indicates the maximum number of buffers to request from `LocalBufferPool` each time.
    2. `min(numBuffersPerAllocation, backlog)` is the actual number of buffers to request from `LocalBufferPool`, so one channel will not occupy all the floating buffers, even if its backlog is very large.
    3. In general, `numBuffersPerAllocation` should be larger than 1 to avoid a decline in throughput. For example, if the floating buffers in `LocalBufferPool` can satisfy all the requirements of the `RemoteInputChannel`, it is better to notify the producer of a batch of credits at once than to notify it of one credit at a time, many times over.
    4. On the `LocalBufferPool` side, the `RemoteInputChannel` may still register as a listener after it has already requested `numBuffersPerAllocation` buffers, if the number of available buffers plus `numBuffersPerAllocation` is still less than `backlog`. It then has to wait for `LocalBufferPool#recycle()` to trigger distribution of the remaining available buffers among all the listeners.
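    
    The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the production code: `numBuffersPerAllocation` and `backlog` come from the description, while the class and method names are hypothetical.

```java
// Hypothetical sketch of the per-channel request sizing described in steps 1 to 4.
final class FloatingBufferRequestSketch {

    /** Step 2: request at most numBuffersPerAllocation buffers, never more than the backlog. */
    static int buffersToRequest(int numBuffersPerAllocation, int backlog) {
        return Math.min(numBuffersPerAllocation, backlog);
    }

    /**
     * Step 4: the channel still registers as a listener (and waits for a recycle)
     * when its available buffers plus one allocation do not cover the backlog.
     */
    static boolean mustWaitForRecycle(int availableBuffers, int numBuffersPerAllocation, int backlog) {
        return availableBuffers + numBuffersPerAllocation < backlog;
    }
}
```

    For example, with `numBuffersPerAllocation = 8` a channel with a backlog of 100 requests only 8 floating buffers per round, while a channel with a backlog of 3 requests 3.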
    
    BTW, I did not clearly understand the formula you mentioned above, `backlog + initialCredit - currentCredit`. I think the initial credit should not be considered in the subsequent interactions. `backlog - currentCredit` reflects the number of extra buffers needed in real time for each interaction. I know `backlog - currentCredit` is not very accurate, because some credit notifications may already be in flight, but it balances out in the long run.
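    
    To make the difference concrete, here is a small sketch of the two formulas side by side. The class and method names are hypothetical, and clamping the result at zero is my own assumption, since a negative buffer request would not make sense.

```java
// Hypothetical sketch comparing the two formulas discussed above.
final class CreditFormulaSketch {

    /** backlog - currentCredit, clamped at zero (the clamp is an assumption). */
    static int extraBuffersNeeded(int backlog, int currentCredit) {
        return Math.max(0, backlog - currentCredit);
    }

    /** backlog + initialCredit - currentCredit, clamped at zero (the clamp is an assumption). */
    static int extraBuffersWithInitialCredit(int backlog, int initialCredit, int currentCredit) {
        return Math.max(0, backlog + initialCredit - currentCredit);
    }
}
```

    With `backlog = 10`, `initialCredit = 2` and `currentCredit = 4`, the first formula asks for 6 extra buffers and the second for 8, so the second keeps requesting `initialCredit` more buffers in every interaction.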
    
    What do you think of this approach?

