Hi Piotr,

As I said in the original email, we cannot delete folders recursively for incremental checkpoints, since files in an incremental checkpoint can still be referenced by other checkpoints. And if you take a closer look at the original email, I shared experimental results there that showed a ~29x improvement: "A simple experiment shows that deleting 1000 objects of 5 MB each costs 39494 ms with for-loop single-delete operations, and drops to 1347 ms when using the multi-delete API on Tencent Cloud."
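To make the comparison concrete, here is a minimal sketch of the two deletion strategies. I use the AWS S3 SDK v2 purely as an illustration (the numbers above were measured against Tencent Cloud COS, which offers an analogous batch call); the bucket and key names are placeholders:

    import java.util.List;
    import java.util.stream.Collectors;
    import software.amazon.awssdk.services.s3.S3Client;
    import software.amazon.awssdk.services.s3.model.Delete;
    import software.amazon.awssdk.services.s3.model.DeleteObjectRequest;
    import software.amazon.awssdk.services.s3.model.DeleteObjectsRequest;
    import software.amazon.awssdk.services.s3.model.ObjectIdentifier;

    public class DeleteComparison {

        // For-loop single deletes: one HTTP round trip per object.
        static void deleteOneByOne(S3Client s3, String bucket, List<String> keys) {
            for (String key : keys) {
                s3.deleteObject(
                        DeleteObjectRequest.builder().bucket(bucket).key(key).build());
            }
        }

        // Multi-delete: one round trip for up to 1000 objects
        // (the S3 per-request limit for DeleteObjects).
        static void deleteInOneBatch(S3Client s3, String bucket, List<String> keys) {
            List<ObjectIdentifier> ids = keys.stream()
                    .map(k -> ObjectIdentifier.builder().key(k).build())
                    .collect(Collectors.toList());
            s3.deleteObjects(
                    DeleteObjectsRequest.builder()
                            .bucket(bucket)
                            .delete(Delete.builder().objects(ids).build())
                            .build());
        }
    }

The gap comes almost entirely from the per-object round trips, which is why the batch call wins by such a large factor.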
I think I can leverage some ideas from Dawid's work. And as I said, I would introduce the multi-delete API on the original FileSystem class instead of introducing a separate BulkDeletingFileSystem, which brings the file system abstraction closer to modern cloud-based environments (a rough sketch of the fallback shape appears at the end of this thread).

Best
Yun Tang
________________________________
From: Piotr Nowojski <pnowoj...@apache.org>
Sent: Thursday, June 30, 2022 18:25
To: dev <dev@flink.apache.org>; Dawid Wysakowicz <dwysakow...@apache.org>
Subject: Re: [DISCUSS] Introduce multi delete API to Flink's FileSystem class

Hi,

I presume this would mostly supersede the recursive deletes [1]? I remember an argument that the recursive deletes were not obviously better, even if the underlying FS supported them. I'm not saying that this would have been a counter-argument against this effort, since every FileSystem could decide on its own whether to use the multi-delete call or not. But I think at the very least it should be benchmarked/compared whether implementing it for a particular FS makes sense or not.

Also, there seems to be a similar (abandoned?) effort from Dawid, namely bulk deletes with a "BulkDeletingFileSystem" [2]. Isn't this basically the same thing that you are proposing, Yun Tang?

Best,
Piotrek

[1] https://issues.apache.org/jira/browse/FLINK-13856
[2] https://issues.apache.org/jira/browse/FLINK-13856?focusedCommentId=17481712&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17481712

On Thu, 30 Jun 2022 at 11:45, Zakelly Lan <zakelly....@gmail.com> wrote:

> Hi Yun,
>
> Thanks for bringing this up for discussion.
> I'm +1 to this idea.
> And IIUC, Flink implements the OSS and S3 file systems on top of the Hadoop
> filesystem interface, which does not provide a multi-delete API, so it may
> take some effort to implement this.
>
> Best,
> Zakelly
>
> On Thu, Jun 30, 2022 at 5:36 PM Martijn Visser <martijnvis...@apache.org>
> wrote:
>
> > Hi Yun Tang,
> >
> > +1 for addressing this problem and your approach.
> >
> > Best regards,
> >
> > Martijn
> >
> > On Thu, 30 Jun 2022 at 11:12, Feifan Wang <zoltar9...@163.com> wrote:
> >
> > > Thanks a lot for the proposal @Yun Tang! It sounds great and I can't
> > > find any reason not to make this improvement.
> > >
> > > ——————————————
> > > Name: Feifan Wang
> > > Email: zoltar9...@163.com
> > >
> > > ---- Replied Message ----
> > > | From | Yun Tang <myas...@live.com> |
> > > | Date | 06/30/2022 16:56 |
> > > | To | dev@flink.apache.org <dev@flink.apache.org> |
> > > | Subject | [DISCUSS] Introduce multi delete API to Flink's FileSystem class |
> > >
> > > Hi guys,
> > >
> > > As more and more teams move to cloud-based environments, cloud object
> > > storage has become the de facto technical standard for big data ecosystems.
> > > From our experience, the performance of writing/deleting objects in object
> > > storage can vary from call to call; the FLIP for the changelog state backend
> > > ran experiments that wrote the same data multiple times [1], showing that
> > > the p999 latency can be 8x the p50 latency. This is also true for delete
> > > operations.
> > >
> > > Currently, after the introduction of the checkpoint backpressure
> > > mechanism [2], newly triggered checkpoints can be delayed because older
> > > checkpoints are not cleaned up fast enough [3].
> > > Moreover, with incremental checkpoints, Flink's checkpoint cleanup
> > > mechanism cannot leverage a folder-deletion API to speed up the procedure [4].
> > > This is especially noticeable in cloud object storage, and almost all
> > > object storage SDKs provide a multi-delete API to improve performance, e.g.
> > > AWS S3 [5], Aliyun OSS [6], and Tencent Cloud COS [7].
> > > A simple experiment shows that deleting 1000 objects of 5 MB each costs
> > > 39494 ms with for-loop single-delete operations, and drops to 1347 ms when
> > > using the multi-delete API on Tencent Cloud.
> > >
> > > However, Flink's FileSystem API follows HDFS's FileSystem API and lacks
> > > such a multi-delete API, which is somewhat outdated in today's cloud-based
> > > environments.
> > > Thus I suggest adding a multi-delete API to Flink's FileSystem class [8];
> > > file systems that do not support multi-delete would fall back to a for-loop
> > > of single deletes.
> > > By doing so, we can at least accelerate discarding checkpoints in cloud
> > > environments.
> > >
> > > WDYT?
> > >
> > >
> > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-158%3A+Generalized+incremental+checkpoints#FLIP158:Generalizedincrementalcheckpoints-DFSwritelatency
> > > [2] https://issues.apache.org/jira/browse/FLINK-17073
> > > [3] https://issues.apache.org/jira/browse/FLINK-26590
> > > [4] https://github.com/apache/flink/blob/1486fee1acd9cd1e340f6d2007f723abd20294e5/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CompletedCheckpoint.java#L315
> > > [5] https://docs.aws.amazon.com/AmazonS3/latest/userguide/delete-multiple-objects.html
> > > [6] https://www.alibabacloud.com/help/en/object-storage-service/latest/delete-objects-8#section-v6n-zym-tax
> > > [7] https://intl.cloud.tencent.com/document/product/436/44018#delete-objects-in-batch
> > > [8] https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/core/fs/FileSystem.java
> > >
> > >
> > > Best
> > > Yun Tang
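For illustration, a minimal sketch of the fallback shape referenced above. The deleteAll method name and signature are hypothetical, not a settled design; only the single-path delete shown already exists on Flink's FileSystem:

    import java.io.IOException;
    import java.util.Collection;
    import org.apache.flink.core.fs.Path;

    // Sketch of a possible addition to org.apache.flink.core.fs.FileSystem
    // (excerpt; the deleteAll name and signature are illustrative only).
    public abstract class FileSystem {

        // Existing single-path delete, already part of Flink's FileSystem.
        public abstract boolean delete(Path f, boolean recursive) throws IOException;

        // Proposed multi-delete entry point. The default implementation falls
        // back to a for-loop of single deletes, so existing file systems keep
        // working unchanged; object-store implementations (s3, oss, cos, ...)
        // would override it with one batch call to their SDK's multi-delete API.
        public void deleteAll(Collection<Path> paths, boolean recursive) throws IOException {
            for (Path path : paths) {
                delete(path, recursive);
            }
        }
    }

With a default like this in place, checkpoint disposal (e.g. in CompletedCheckpoint [4]) could hand all state handles' paths to a single call, and only the cloud file systems would need to do anything special.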