[ https://issues.apache.org/jira/browse/FLINK-7406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16185389#comment-16185389 ]

ASF GitHub Bot commented on FLINK-7406:
---------------------------------------

Github user zhijiangW commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4509#discussion_r141794108
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/consumer/RemoteInputChannel.java ---
    @@ -390,7 +390,63 @@ public BufferProvider getBufferProvider() throws IOException {
                return inputGate.getBufferProvider();
        }
     
    -   public void onBuffer(Buffer buffer, int sequenceNumber) {
    +   /**
    +    * Requests a buffer from this input channel directly for receiving network data.
    +    * It should always return an available buffer in credit-based mode.
    +    *
    +    * @return The available buffer.
    +    */
    +   public Buffer requestBuffer() {
    +           synchronized (availableBuffers) {
    +                   return availableBuffers.poll();
    +           }
    +   }
    +
    +   /**
    +    * Receives the backlog from the producer's buffer response. If the number of available
    +    * buffers is less than the backlog length, it will request floating buffers from the
    +    * buffer pool and then notify unannounced credits to the producer.
    +    *
    +    * @param backlog The number of unsent buffers in the producer's sub-partition.
    +    */
    +   private void onSenderBacklog(int backlog) {
    +           int numRequestedBuffers = 0;
    +
    +           synchronized (availableBuffers) {
    +                   // Important: the isReleased check should be inside the synchronized block.
    +                   if (!isReleased.get()) {
    +                           senderBacklog.set(backlog);
    +
    +                           while (senderBacklog.get() > availableBuffers.size() && !isWaitingForFloatingBuffers.get()) {
    --- End diff --
    
    Actually, I have implemented this strategy in two different ways in our production environment.
    
    On the `LocalBufferPool` side, the pool can assign available buffers among all the listeners in a fair, round-robin way, because it can gather all the listeners within some time window. But triggering the assignment on the `LocalBufferPool` side may introduce extra delay.
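
    For what it is worth, here is a minimal, standalone sketch of that pool-side round-robin idea. It is not Flink's actual `LocalBufferPool`; `RoundRobinBufferDistributor`, `BufferListener` and `recycle` are hypothetical names used only to illustrate the fairness argument:
    
    ```java
    import java.util.ArrayDeque;
    import java.util.List;
    
    /** Placeholder for a network buffer; the real type lives elsewhere. */
    interface Buffer {}
    
    /** Hands recycled buffers to waiting listeners in round-robin order. */
    final class RoundRobinBufferDistributor {
    
        /** A waiting channel; returns true if it still needs more buffers. */
        interface BufferListener {
            boolean notifyBufferAvailable(Buffer buffer);
        }
    
        private final ArrayDeque<BufferListener> waitingListeners = new ArrayDeque<>();
    
        synchronized void registerListener(BufferListener listener) {
            waitingListeners.add(listener);
        }
    
        /** Called when buffers come back to the pool. */
        synchronized void recycle(List<Buffer> recycledBuffers) {
            for (Buffer buffer : recycledBuffers) {
                BufferListener listener = waitingListeners.poll();
                if (listener == null) {
                    return; // nobody is waiting; the real pool would keep the buffer available
                }
                // Re-append only if the listener still needs buffers, so the next
                // buffer in this round goes to a different listener first.
                if (listener.notifyBufferAvailable(buffer)) {
                    waitingListeners.add(listener);
                }
            }
        }
    }
    ```
    
    The re-append on a `true` return is what keeps a single hungry listener from monopolizing a whole batch of recycled buffers.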
    
    On the `RemoteInputChannel` side, we currently implement another, more involved way to request buffers relatively fairly. That is (a small sketch follows the list):
    
    1. Define a parameter `numBuffersPerAllocation` that bounds how many buffers may be requested from the `LocalBufferPool` at a time.
    2. `min(numBuffersPerAllocation, backlog)` is the actual number requested from the `LocalBufferPool`, so one channel will not occupy all the floating buffers even if its backlog is very large.
    3. In general `numBuffersPerAllocation` should be larger than 1 to avoid a throughput decline. For example, if the floating buffers in the `LocalBufferPool` can satisfy all the requirements of the `RemoteInputChannel`, it is better to notify the producer of a batch of credits at once than to send one credit at a time many times over.
    4. On the `LocalBufferPool` side, the `RemoteInputChannel` may still register as a listener after having requested `numBuffersPerAllocation` buffers, namely when the number of available buffers plus `numBuffersPerAllocation` is still less than the backlog. It then has to wait for `LocalBufferPool#recycle()` to trigger distribution of the remaining available buffers among all the listeners.
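
    As a rough sketch of steps 2 and 4 above, assuming the channel knows how many buffers it currently holds (`FloatingBufferRequestPolicy`, `buffersToRequest` and `shouldRegisterAsListener` are made-up names, not actual Flink API):
    
    ```java
    /** Caps how many floating buffers one channel may request per backlog announcement. */
    final class FloatingBufferRequestPolicy {
    
        private final int numBuffersPerAllocation;
    
        FloatingBufferRequestPolicy(int numBuffersPerAllocation) {
            this.numBuffersPerAllocation = numBuffersPerAllocation;
        }
    
        /** How many floating buffers to request from the pool for one backlog announcement. */
        int buffersToRequest(int backlog, int availableBuffers) {
            // Step 2: never request more than numBuffersPerAllocation at once, so one
            // channel with a huge backlog cannot occupy all the floating buffers.
            int capped = Math.min(numBuffersPerAllocation, backlog);
            // Do not re-request buffers the channel already holds.
            return Math.max(0, Math.min(capped, backlog - availableBuffers));
        }
    
        /** Step 4: still short after the capped request, so register as a listener and wait. */
        boolean shouldRegisterAsListener(int backlog, int availableBuffers) {
            return availableBuffers + numBuffersPerAllocation < backlog;
        }
    }
    ```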
    
    BTW, I did not fully understand the formula you mentioned above, `backlog + initialCredit - currentCredit`. I think the initial credit should not be considered in the subsequent interactions; `backlog - currentCredit` reflects the number of extra buffers needed in real time for each interaction. I know `backlog - currentCredit` is not perfectly accurate because some credit notifications may still be in flight, but it should balance out in the long run.
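
    Just to make the comparison concrete, a tiny worked example with made-up numbers (they do not come from any real run):
    
    ```java
    /** Numeric illustration of the two credit formulas discussed above. */
    final class CreditFormulaExample {
        public static void main(String[] args) {
            int backlog = 5;        // unsent buffers announced by the producer
            int currentCredit = 2;  // credit the receiver has already announced
            int initialCredit = 2;  // exclusive buffers granted at channel setup
    
            // Extra buffers needed if the initial credit is left out of the count.
            int withoutInitial = backlog - currentCredit;              // = 3
            // Extra buffers needed if the initial credit is counted on top.
            int withInitial = backlog + initialCredit - currentCredit; // = 5
    
            System.out.println("backlog - currentCredit                 = " + withoutInitial);
            System.out.println("backlog + initialCredit - currentCredit = " + withInitial);
        }
    }
    ```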
    
    What do you think of this way? 


> Implement Netty receiver incoming pipeline for credit-based
> -----------------------------------------------------------
>
>                 Key: FLINK-7406
>                 URL: https://issues.apache.org/jira/browse/FLINK-7406
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Network
>            Reporter: zhijiang
>            Assignee: zhijiang
>             Fix For: 1.4.0
>
>
> This is part of the work on credit-based network flow control.
> Currently {{PartitionRequestClientHandler}} receives and reads {{BufferResponse}} from the producer. It requests a buffer from the {{BufferPool}} to hold the message. If no buffer is available, the message is staged temporarily and {{autoread}} for the channel is set to false.
> In credit-based mode, {{PartitionRequestClientHandler}} can always get a buffer from the {{RemoteInputChannel}} for reading messages from the producer.
> The related works are:
> * Add the producer's backlog to the {{BufferResponse}} message structure
> * {{PartitionRequestClientHandler}} requests buffers from the {{RemoteInputChannel}} directly
> * {{PartitionRequestClientHandler}} updates the backlog for the {{RemoteInputChannel}}, which may trigger requests for floating buffers from the {{BufferPool}}
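
For readers of this thread, here is a rough, simplified sketch of the receive path described in the issue above. The types and method names are stand-ins for illustration only, not the actual {{PartitionRequestClientHandler}} / {{RemoteInputChannel}} classes:

```java
/** Simplified credit-based receive path: ask the channel for a buffer, then update the backlog. */
final class CreditBasedReceiveSketch {

    interface Buffer {
        void writeBytes(byte[] data);
    }

    interface InputChannel {
        Buffer requestBuffer();            // always non-null in credit-based mode
        void onBuffer(Buffer buffer, int sequenceNumber);
        void onSenderBacklog(int backlog); // may trigger floating buffer requests
    }

    /** Called for each BufferResponse decoded from the producer. */
    static void onBufferResponse(InputChannel channel, byte[] data, int sequenceNumber, int backlog) {
        // 1. Request a buffer from the channel itself instead of the shared BufferPool,
        //    so the message never needs to be staged and autoread stays enabled.
        Buffer buffer = channel.requestBuffer();
        buffer.writeBytes(data);

        // 2. Hand the filled buffer to the channel.
        channel.onBuffer(buffer, sequenceNumber);

        // 3. Update the sender's backlog; the channel may request floating buffers
        //    and announce new credit to the producer.
        channel.onSenderBacklog(backlog);
    }
}
```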



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
