[ https://issues.apache.org/jira/browse/HUDI-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17380016#comment-17380016 ]
ASF GitHub Bot commented on HUDI-1483:
--------------------------------------

codope commented on pull request #3142:
URL: https://github.com/apache/hudi/pull/3142#issuecomment-879223675

> Hi @codope, I would like to know whether this async clustering function can handle the following scenario without losing data.
>
> There are three small file groups, fg1, fg2 and fg3, containing file slice1, file slice2 and file slice3 respectively.
>
> While the async scheduler **has started building a clustering plan but has not finished**, there is an inflight or requested commit on fg1 that will create file slice11 based on file slice1. In other words, **file slice11 is being created but is not yet committed**. I believe this situation is similar to multiple writers.
>
> What will async clustering do in this case?
> Will the clustering plan contain file slice1? If it does, I think the new data in file slice11 will be lost.
>
> Looking forward to your reply, thanks a lot.

@zhangyue19921010 It depends on the point during clustering planning at which file slice11 is created. If it is created before `ClusteringPlanStrategy#getFileSlicesEligibleForClustering` is invoked, the clustering plan will not contain file slice1. So, just as with multiple writers, there is a race condition here. However, when clustering actually executes, the default (and currently only) strategy is to reject updates, so it will throw an exception on seeing that a file group (fg1 in this case) has an update. This should then get picked up in the next clustering run.
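To make the two outcomes in the reply concrete, here is a minimal Java sketch of the race. It is a toy model written for illustration, not Hudi code: `ClusteringRaceSketch`, `UpdateRejectedException`, and every method on them are hypothetical names, and the model assumes only what the comment states (planning skips a file group that already has a pending write, and execution rejects a planned file group that has since received one).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model of the race described above. None of these types are Hudi classes.
class ClusteringRaceSketch {

    /** Hypothetical exception standing in for the "reject updates" failure. */
    static class UpdateRejectedException extends RuntimeException {
        UpdateRejectedException(String message) {
            super(message);
        }
    }

    private final Set<String> fileGroups = new LinkedHashSet<>();
    // File groups that currently have a requested or inflight write.
    private final Set<String> pendingWrites = new LinkedHashSet<>();

    ClusteringRaceSketch(List<String> groups) {
        fileGroups.addAll(groups);
    }

    void startWrite(String fileGroup) {
        pendingWrites.add(fileGroup);
    }

    void commitWrite(String fileGroup) {
        pendingWrites.remove(fileGroup);
    }

    /** Planning: select every file group that has no pending write right now. */
    List<String> planClustering() {
        List<String> plan = new ArrayList<>();
        for (String fileGroup : fileGroups) {
            if (!pendingWrites.contains(fileGroup)) {
                plan.add(fileGroup);
            }
        }
        return plan;
    }

    /** Execution: reject the run if a planned file group has received an update. */
    void executeClustering(List<String> plan) {
        for (String fileGroup : plan) {
            if (pendingWrites.contains(fileGroup)) {
                throw new UpdateRejectedException("pending update on " + fileGroup);
            }
        }
    }

    public static void main(String[] args) {
        // Case 1: the write on fg1 starts before planning, so fg1 is left out.
        ClusteringRaceSketch early =
                new ClusteringRaceSketch(Arrays.asList("fg1", "fg2", "fg3"));
        early.startWrite("fg1");
        System.out.println("write before planning: plan = " + early.planClustering());

        // Case 2: the write starts after planning, so execution is rejected.
        ClusteringRaceSketch late =
                new ClusteringRaceSketch(Arrays.asList("fg1", "fg2", "fg3"));
        List<String> plan = late.planClustering();
        late.startWrite("fg1");
        try {
            late.executeClustering(plan);
        } catch (UpdateRejectedException e) {
            System.out.println("clustering rejected: " + e.getMessage());
        }

        // Once the write commits, a fresh plan includes fg1 and the run succeeds.
        late.commitWrite("fg1");
        late.executeClustering(late.planClustering());
        System.out.println("next run clustered fg1 after the write committed");
    }
}
```

In neither case is the new data in file slice11 silently dropped: either fg1 is excluded from the plan, or the clustering run fails and fg1 is retried later.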
> async clustering for deltastreamer
> ----------------------------------
>
> Key: HUDI-1483
> URL: https://issues.apache.org/jira/browse/HUDI-1483
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: liwei
> Assignee: liwei
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.9.0