[ https://issues.apache.org/jira/browse/FLINK-26762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524253#comment-17524253 ]
fanrui commented on FLINK-26762:
--------------------------------

[~akalashnikov] thanks for your feedback. I think I understand why you recommend overdraft-memory-size, but I still have some questions about the code implementation.
* Assuming that a flatmap can produce 1 MB of data at one time and the records belong to different channels, do we apply for overdraft buffers according to the actual data size?
* If the data is 100 bytes, do we apply for a memory segment of 100 bytes?
* If the same Subpartition has 5 records, we will apply for five 100-byte memory segments. Maybe they won't perform well and will take up more credit downstream?

I have an idea; please help check whether it works. We still use max-overdraft-buffers-per-gate.
* If = 0, the overdraft buffer is disabled.
* If > 0, the user configuration takes effect.
* If = -1, Flink automatically infers the value and uses the number of Subpartitions as overdraft-buffers.

This covers not only the flatmap, window and join scenarios but also the Watermark broadcast scenario, and it is more convenient for users.

Thanks a lot for sharing the benchmark; I will run it asap.

> Add the overdraft buffer in BufferPool to reduce unaligned checkpoint being blocked
> -----------------------------------------------------------------------------------
>
>                 Key: FLINK-26762
>                 URL: https://issues.apache.org/jira/browse/FLINK-26762
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Checkpointing, Runtime / Network
>    Affects Versions: 1.13.0, 1.14.0, 1.15.0
>            Reporter: fanrui
>            Assignee: fanrui
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.16.0
>
>         Attachments: image-2022-04-18-11-45-14-700.png, image-2022-04-18-11-46-03-895.png
>
>
> In some past JIRAs on Unaligned Checkpoint, the community added
> recordWriter.isAvailable() to reduce blocking when writing a single record. But
> large records, flatmap or broadcast watermarks may need more buffers.
> Can we add the overdraft buffer in BufferPool to reduce unaligned checkpoint
> being blocked?
> h2. Overdraft Buffer mechanism
> Add the configuration of
> 'taskmanager.network.memory.overdraft-buffers-per-gate=5'.
> When requestMemory is called and the bufferPool is insufficient, the
> bufferPool will allow the Task to overdraw up to 5 MemorySegments. The
> bufferPool will then be unavailable until all overdrawn buffers are consumed by
> downstream tasks, and the Task will wait for the bufferPool to become available.
> From the above, we have the following benefits:
> * For scenarios that require multiple buffers, the Task releases the
> Checkpoint lock, so the Unaligned Checkpoint can be completed quickly.
> * We can control the memory usage to prevent memory leak.
> * It needs just a little memory, and can improve the stability of the Task
> under back pressure.
> * Users can increase the overdraft-buffers to adapt to scenarios that
> require more buffers.
>
> Masters, please correct me if I'm wrong, thanks a lot.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
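
To make the proposal in this message easier to check, here is a minimal sketch of the two ideas: how the configured value could be resolved (0 disables, a positive value is used as is, -1 infers the number of Subpartitions), and how a pool could hand out overdraft segments and stay unavailable while they are in use. All names (`OverdraftSketch`, `resolveMaxOverdraft`, `Pool`) are hypothetical and exist only for illustration; this is not Flink's LocalBufferPool code, and it counts segments instead of managing real memory.

```java
/** Illustrative sketch only; class and method names are assumptions, not Flink APIs. */
final class OverdraftSketch {

    /**
     * Resolves the configured overdraft limit as proposed in the comment:
     * 0 disables overdraft, a positive value is used as is, and -1 means
     * "infer", which here falls back to the number of subpartitions.
     */
    static int resolveMaxOverdraft(int configured, int numberOfSubpartitions) {
        if (configured < -1) {
            throw new IllegalArgumentException("overdraft limit must be >= -1: " + configured);
        }
        return configured == -1 ? numberOfSubpartitions : configured;
    }

    /** Counts segments only; a real pool would manage MemorySegments and notify waiters. */
    static final class Pool {
        private final int poolSize;     // ordinary segments owned by the pool
        private final int maxOverdraft; // extra segments the task may overdraw
        private int inUse;              // ordinary plus overdrawn segments currently handed out

        Pool(int poolSize, int maxOverdraft) {
            this.poolSize = poolSize;
            this.maxOverdraft = maxOverdraft;
        }

        /** Grants a segment while the ordinary pool or the overdraft allowance has room. */
        boolean request() {
            if (inUse < poolSize + maxOverdraft) {
                inUse++;
                return true;
            }
            return false;
        }

        /** Returns one segment, for example after a downstream task consumed the buffer. */
        void recycle() {
            if (inUse > 0) {
                inUse--;
            }
        }

        /** The pool is available only when an ordinary segment is free, so never during overdraft. */
        boolean isAvailable() {
            return inUse < poolSize;
        }
    }
}
```

With `new OverdraftSketch.Pool(2, 1)`, the third `request()` still succeeds through overdraft, a fourth is refused, and `isAvailable()` stays false until enough segments are recycled that an ordinary one is free again. Whether the inferred limit should really equal the Subpartition count is exactly the open question raised above.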