[jira] [Commented] (HUDI-1045) Support updates during clustering
[ https://issues.apache.org/jira/browse/HUDI-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842466#comment-17842466 ]

Vinoth Chandar commented on HUDI-1045:

[WIP] Approach 2: Introduce pointer data blocks into storage format

> Support updates during clustering
>
> Key: HUDI-1045
> URL: https://issues.apache.org/jira/browse/HUDI-1045
> Project: Apache Hudi
> Issue Type: Task
> Components: clustering, table-service
> Reporter: leesf
> Assignee: Vinoth Chandar
> Priority: Blocker
> Fix For: 1.0.0
>
> We need to allow a writer W to write to file groups f1, f2, f3 concurrently
> while a clustering service C reclusters them into f4, f5.
> * Writes can be updates, deletes, or inserts.
> * Either clustering C or writer W can finish first.
> * Both W and C need to be able to complete their actions without redoing much work.
> * The number of output file groups for C can be higher or lower than the number of input file groups.
> * Needs to work with writers in either OCC or NBCC mode, without depending on which mode they use.
> * Needs to interplay well with the cleaning and compaction services.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
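To make the scenario in the issue description concrete, here is a minimal Python sketch that models writer W's target file groups and clustering action C, then computes which file groups both of them touch. `ClusteringPlan` and `conflicting_file_groups` are hypothetical names used only for illustration. They are not Hudi classes or APIs, and the sketch does not reflect how Hudi itself detects conflicts.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ClusteringPlan:
    """Hypothetical model of a clustering action C (not a Hudi class)."""
    inputs: frozenset[str]   # file groups C rewrites
    outputs: frozenset[str]  # file groups C produces; count may differ from inputs


def conflicting_file_groups(writer_targets: set[str], plan: ClusteringPlan) -> set[str]:
    """Return the file groups that writer W writes to and clustering C replaces.

    Writes aimed at these groups are the ones that would need reconciling,
    whichever of W or C completes first.
    """
    return set(writer_targets) & set(plan.inputs)


# Scenario from the issue: W writes to f1, f2, f3 while C reclusters them into f4, f5.
plan = ClusteringPlan(inputs=frozenset({"f1", "f2", "f3"}), outputs=frozenset({"f4", "f5"}))
print(sorted(conflicting_file_groups({"f1", "f2", "f3"}, plan)))  # ['f1', 'f2', 'f3']

# A writer touching only groups outside the plan has nothing to reconcile.
print(sorted(conflicting_file_groups({"f6"}, plan)))  # []
```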
[ https://issues.apache.org/jira/browse/HUDI-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842465#comment-17842465 ]

Vinoth Chandar commented on HUDI-1045:

h3. [WIP] Approach 1: Redistribute records from the conflicting file groups
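The comment gives only the title of Approach 1, so what follows is one possible reading of it, not the proposal itself: when writer W's records target file groups that clustering C has replaced, route each record to whichever output group now owns its key. The key-range layout of C's output, the `Record` type, and the `redistribute` function are all hypothetical assumptions made for illustration; Hudi's clustering plans and record routing are not modeled here.

```python
from __future__ import annotations

import bisect
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Record:
    key: str
    file_group: str  # file group the writer originally routed the record to
    payload: str


def redistribute(
    records: list[Record],
    replaced: set[str],
    boundaries: list[str],
    outputs: list[str],
) -> list[Record]:
    """Re-route records whose target file group was replaced by clustering.

    Hypothetical assumption: clustering laid out its output by sorted key
    ranges, where outputs[i] owns keys in [boundaries[i-1], boundaries[i]),
    so len(outputs) == len(boundaries) + 1. Records for untouched groups
    pass through unchanged.
    """
    if len(outputs) != len(boundaries) + 1:
        raise ValueError("need exactly one more output group than boundaries")
    routed = []
    for rec in records:
        if rec.file_group in replaced:
            # bisect_right sends a key equal to a boundary to the right-hand range.
            idx = bisect.bisect_right(boundaries, rec.key)
            rec = replace(rec, file_group=outputs[idx])
        routed.append(rec)
    return routed


# C replaced f1, f2, f3 with f4 (keys < "m") and f5 (keys >= "m").
writes = [
    Record("apple", "f1", "update-1"),
    Record("zebra", "f2", "delete-2"),
    Record("kiwi", "f7", "update-3"),  # f7 was not part of the clustering plan
]
for rec in redistribute(writes, {"f1", "f2", "f3"}, ["m"], ["f4", "f5"]):
    print(rec.key, rec.file_group)
```

Range partitioning is used here only because it makes ownership of a key easy to compute after C changes the number of file groups; a hash-based layout or an index lookup could play the same role in this sketch.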