[ 
https://issues.apache.org/jira/browse/FLINK-26762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524253#comment-17524253
 ] 

fanrui commented on FLINK-26762:
--------------------------------

[~akalashnikov] Thanks for your feedback.

I think I understand why you recommend overdraft-memory-size, but I still have 
some questions about the code implementation.
 * Assuming a flatmap produces 1 MB of data at one time and the records belong to 
different channels, do we request overdraft buffers according to the actual 
data size?
 * If a record is 100 bytes, do we request a memory segment of 100 bytes?
 * If the same subpartition has 5 records, we would request five 100-byte memory 
segments. Wouldn't they perform poorly and take up more credits downstream?

I have an idea, please help me check whether it works. We still use 
max-overdraft-buffers-per-gate (a rough sketch of the resolution logic follows 
this list):
 * If = 0, the overdraft buffer is disabled.
 * If > 0, the user configuration takes effect.
 * If = -1, Flink infers the value automatically and uses the number of 
subpartitions as the number of overdraft buffers. This covers not only the 
flatmap, window, and join scenarios, but also the scenario where a watermark is 
broadcast to all subpartitions. It makes the feature more convenient for users.
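
To make the 0 / >0 / -1 semantics concrete, here is a minimal sketch of how the 
configured value could be resolved. The class, method, and parameter names are 
purely illustrative assumptions, not the actual Flink configuration code:

{code:java}
// Hypothetical sketch of resolving the proposed max-overdraft-buffers-per-gate
// value; all names here are illustrative, not the real Flink API.
public class OverdraftConfigSketch {

    static int resolveOverdraftBuffers(int configuredValue, int numberOfSubpartitions) {
        if (configuredValue == 0) {
            return 0;                    // 0: the overdraft buffer is disabled
        } else if (configuredValue > 0) {
            return configuredValue;      // > 0: the user configuration takes effect
        } else {
            // -1: infer automatically, e.g. enough buffers to broadcast a
            // watermark to every subpartition at once
            return numberOfSubpartitions;
        }
    }

    public static void main(String[] args) {
        System.out.println(resolveOverdraftBuffers(-1, 128)); // prints 128
    }
}
{code}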

 

Thanks a lot for sharing the benchmark, I will run it ASAP.

> Add the overdraft buffer in BufferPool to reduce unaligned checkpoint being 
> blocked
> -----------------------------------------------------------------------------------
>
>                 Key: FLINK-26762
>                 URL: https://issues.apache.org/jira/browse/FLINK-26762
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Checkpointing, Runtime / Network
>    Affects Versions: 1.13.0, 1.14.0, 1.15.0
>            Reporter: fanrui
>            Assignee: fanrui
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.16.0
>
>         Attachments: image-2022-04-18-11-45-14-700.png, 
> image-2022-04-18-11-46-03-895.png
>
>
> In some past JIRAs of Unaligned Checkpoint, the community added 
> recordWriter.isAvailable() to reduce blocking for a single record write. But a 
> large record, a flatmap, or a broadcast watermark may need more buffers.
> Can we add an overdraft buffer to the BufferPool to reduce how often unaligned 
> checkpoints are blocked?
> h2. Overdraft Buffer mechanism
> Add the configuration 
> 'taskmanager.network.memory.overdraft-buffers-per-gate=5'. 
> When requestMemory is called and the bufferPool is insufficient, the 
> bufferPool will allow the Task to overdraw up to 5 MemorySegments. The 
> bufferPool will then stay unavailable until all overdrawn buffers have been 
> consumed by downstream tasks, and the task will wait for the bufferPool to 
> become available again.
> From the above, we have the following benefits:
>  * For scenarios that require multiple buffers, the Task releases the 
> Checkpoint lock, so the Unaligned Checkpoint can be completed quickly.
>  * We can control the memory usage to prevent memory leaks.
>  * It needs only a little memory, and it can improve the stability of the Task 
> under back pressure.
>  * Users can increase the overdraft-buffers to adapt to scenarios that 
> require more buffers.
>  
> Masters, please correct me if I'm wrong, thanks a lot.
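
As a rough illustration of the overdraft mechanism described in the quoted issue 
above, here is a minimal, hypothetical sketch of a buffer pool with an overdraft 
allowance. It is not the actual LocalBufferPool implementation; the class and 
method names are assumptions made only for illustration:

{code:java}
// Minimal sketch: a pool with a regular budget plus an overdraft allowance.
// Assumptions: fixed-size segments, one shared lock; not the real Flink code.
class OverdraftBufferPoolSketch {
    private final int maxBuffers;          // regular budget of the pool
    private final int maxOverdraftBuffers; // e.g. overdraft-buffers-per-gate = 5
    private int requestedBuffers;          // buffers currently handed out

    OverdraftBufferPoolSketch(int maxBuffers, int maxOverdraftBuffers) {
        this.maxBuffers = maxBuffers;
        this.maxOverdraftBuffers = maxOverdraftBuffers;
    }

    // A task that has used up the regular budget may still overdraw a few
    // segments instead of blocking in the middle of writing a record.
    synchronized boolean tryRequestBuffer() {
        if (requestedBuffers < maxBuffers + maxOverdraftBuffers) {
            requestedBuffers++;
            return true;
        }
        return false; // overdraft exhausted as well, the task has to block
    }

    // The pool only reports itself available again once the overdraft has been
    // paid back, i.e. enough buffers were recycled by downstream consumption.
    synchronized boolean isAvailable() {
        return requestedBuffers < maxBuffers;
    }

    synchronized void recycle() {
        requestedBuffers--;
    }
}
{code}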



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
