[ https://issues.apache.org/jira/browse/HUDI-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378970#comment-17378970 ]

ASF GitHub Bot commented on HUDI-1483:
--------------------------------------

zhangyue19921010 edited a comment on pull request #3142:
URL: https://github.com/apache/hudi/pull/3142#issuecomment-878004113


   Hi @codope, I just want to know: can this async clustering feature handle the following scenario without losing any data?
   
   There are three small file groups, fg1, fg2, and fg3, containing file slice1, file slice2, and file slice3 respectively.
   
   Suppose the async scheduler **has started building a clustering plan but has not finished**, and meanwhile there is an inflight or requested commit on fg1 that will create file slice11 based on file slice1. In other words, **file slice11 is being written but is not yet committed**. I believe this scenario is similar to the multi-writer case.
   
   What will async clustering do in this case? Will the clustering plan include file slice1? If it does, I think the new data in file slice11 will be lost.
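   To make the scenario concrete, here is a minimal sketch in Python of one conservative planning policy: skip any file group that still has a requested or inflight commit, so file slice1 is never rewritten while file slice11 is pending. `FileSlice`, `plan_clustering`, and the instant times are hypothetical names for illustration only, not Hudi's API, and the sketch does not claim this is how Hudi's async clustering actually behaves; that is exactly the question above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileSlice:
    # Hypothetical model of a committed file slice.
    file_group_id: str
    base_instant: str  # instant time of the commit that produced this slice


def plan_clustering(latest_committed_slices, pending_file_groups):
    """Hypothetical planner: select committed slices for clustering, but skip
    any file group that still has a requested or inflight write on it."""
    plan = []
    for fs in latest_committed_slices:
        if fs.file_group_id in pending_file_groups:
            # A writer is producing a newer slice (e.g. slice11) for this group;
            # rewriting the older slice now could drop that writer's data.
            continue
        plan.append(fs)
    return plan


# fg1/fg2/fg3 hold slice1/slice2/slice3; a pending commit on fg1 is creating slice11.
committed = [FileSlice("fg1", "001"), FileSlice("fg2", "002"), FileSlice("fg3", "003")]
pending = {"fg1"}
plan = plan_clustering(committed, pending)
print([fs.file_group_id for fs in plan])  # ['fg2', 'fg3']
```

   Under this policy fg1 is simply left for a later clustering round. An alternative design would let the plan include fg1 and detect the conflict when the clustering or the concurrent write commits; which approach applies here is what the question asks.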
   
   Looking forward to your reply, thanks a lot.
   




> async clustering for deltastreamer
> ----------------------------------
>
>                 Key: HUDI-1483
>                 URL: https://issues.apache.org/jira/browse/HUDI-1483
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: liwei
>            Assignee: liwei
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.9.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
