[ 
https://issues.apache.org/jira/browse/FLINK-26762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524228#comment-17524228
 ] 

Anton Kalashnikov commented on FLINK-26762:
-------------------------------------------

Thanks for the fast work. I will take a look at the PR. 
Regarding the overdraft-memory-size idea: I thought it could be better because 
the user doesn't need to calculate anything. For example, you know that your 
flatmap can produce 1 MB of data at a time, so you configure a 1 MB overdraft 
and you don't need to think about the current segment-memory-size, parallelism, 
or other configuration. It looks transparent to the user. With 
overdraft-buffers, on the other hand, we need to take a lot of other 
parameters into account (current segment-memory-size, parallelism, 
buffer-debloating), and if any of these parameters changes, overdraft-buffers 
needs to be reconfigured. But again, it is just an idea; I am not fully sure 
that it would be better, so I will try to discuss it more widely.
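
To make the coupling concrete, here is a minimal sketch of the arithmetic a 
user would have to redo for a buffer-count setting. The class and method names 
are hypothetical and not part of Flink; 32 KB is used because it is the default 
of `taskmanager.memory.segment-size`, the other sizes are only examples.

```java
// Hypothetical helper, not a Flink API: converts a byte-based overdraft
// budget into the number of memory segments needed to hold it.
final class OverdraftSizing {

    /** Number of whole segments needed for overdraftBytes, rounded up. */
    static int buffersFor(long overdraftBytes, int segmentSizeBytes) {
        if (overdraftBytes < 0 || segmentSizeBytes <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative / positive");
        }
        // Ceiling division without floating point.
        return (int) ((overdraftBytes + segmentSizeBytes - 1) / segmentSizeBytes);
    }

    public static void main(String[] args) {
        long overdraftBytes = 1L << 20; // the 1 MB flatmap burst from the example
        for (int segmentKb : new int[] {8, 32, 64}) {
            int buffers = buffersFor(overdraftBytes, segmentKb * 1024);
            System.out.println(segmentKb + " KB segments: " + buffers + " overdraft buffers");
        }
    }
}
```

The same 1 MB budget maps to a different buffer count for every segment size, 
which is why a buffer-based option has to be revisited whenever the segment 
size changes.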

About the benchmark: you can take a look at 
`CheckpointingTimeBenchmark` (https://github.com/apache/flink-benchmarks); 
perhaps you can reuse the same idea or extend it with your scenario. 

> Add the overdraft buffer in BufferPool to reduce unaligned checkpoint being 
> blocked
> -----------------------------------------------------------------------------------
>
>                 Key: FLINK-26762
>                 URL: https://issues.apache.org/jira/browse/FLINK-26762
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Checkpointing, Runtime / Network
>    Affects Versions: 1.13.0, 1.14.0, 1.15.0
>            Reporter: fanrui
>            Assignee: fanrui
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.16.0
>
>         Attachments: image-2022-04-18-11-45-14-700.png, 
> image-2022-04-18-11-46-03-895.png
>
>
> In some past JIRAs on Unaligned Checkpoint, the community added 
> recordWriter.isAvailable() to reduce blocking when writing a single record. 
> But large records, flatmap, or broadcast watermarks may need more than one 
> buffer. Can we add an overdraft buffer to the BufferPool to reduce the 
> blocking of unaligned checkpoints? 
> h2. Overdraft Buffer mechanism
> Add the configuration option 
> 'taskmanager.network.memory.overdraft-buffers-per-gate=5'. 
> When requestMemory is called and the bufferPool has no free buffers, the 
> bufferPool allows the Task to overdraw up to 5 MemorySegments. The 
> bufferPool then stays unavailable until all overdrawn buffers have been 
> consumed by downstream tasks, and the Task waits for the bufferPool to 
> become available again.
> This gives the following benefits:
>  * For scenarios that require multiple buffers, the Task releases the 
> Checkpoint lock, so the Unaligned Checkpoint can be completed quickly.
>  * We can bound the memory usage, which prevents memory leaks.
>  * It needs only a little memory and can improve the stability of the Task 
> under back pressure.
>  * Users can increase overdraft-buffers to accommodate scenarios that 
> require more buffers.
>  
> Masters, please correct me if I'm wrong, thanks a lot.
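
The overdraft mechanism described in the quoted issue can be sketched as a toy 
model. This is only an illustration under simplifying assumptions (a single 
counter, no threading, no real MemorySegments); `ToyBufferPool` and its 
methods are hypothetical names and are not the Flink BufferPool interface.

```java
// Toy model of a buffer pool with an overdraft allowance. Hypothetical,
// not Flink code: it only tracks how many buffers are currently handed out.
final class ToyBufferPool {
    private final int poolSize;     // regular buffers owned by the pool
    private final int maxOverdraft; // extra buffers a task may overdraw
    private int inUse;              // regular + overdraft buffers handed out

    ToyBufferPool(int poolSize, int maxOverdraft) {
        this.poolSize = poolSize;
        this.maxOverdraft = maxOverdraft;
    }

    /** Grants a buffer if a regular or overdraft slot is left; false means "wait". */
    boolean request() {
        if (inUse < poolSize + maxOverdraft) {
            inUse++;
            return true;
        }
        return false;
    }

    /** Called when a downstream task has consumed a buffer. */
    void recycle() {
        if (inUse == 0) {
            throw new IllegalStateException("nothing to recycle");
        }
        inUse--;
    }

    /** Number of buffers currently taken beyond the regular pool size. */
    int overdrawn() {
        return Math.max(0, inUse - poolSize);
    }

    /** Available only once the overdraft is repaid and a regular buffer is free. */
    boolean isAvailable() {
        return inUse < poolSize;
    }

    public static void main(String[] args) {
        ToyBufferPool pool = new ToyBufferPool(2, 5);
        // A record that needs four buffers: two regular, two overdrawn.
        for (int i = 0; i < 4; i++) {
            System.out.println("request granted: " + pool.request());
        }
        System.out.println("overdrawn: " + pool.overdrawn()
                + ", available: " + pool.isAvailable());
    }
}
```

In this model the task can finish emitting a multi-buffer record without 
blocking in the middle of it, and it only starts the next record once 
`isAvailable()` reports true again.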



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
