[jira] [Commented] (HUDI-1045) Support updates during clustering

2024-04-30 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842466#comment-17842466
 ] 

Vinoth Chandar commented on HUDI-1045:
--

[WIP] Approach 2: Introduce pointer data blocks into the storage format

> Support updates during clustering
> -
>
> Key: HUDI-1045
> URL: https://issues.apache.org/jira/browse/HUDI-1045
> Project: Apache Hudi
>  Issue Type: Task
>  Components: clustering, table-service
>Reporter: leesf
>Assignee: Vinoth Chandar
>Priority: Blocker
> Fix For: 1.0.0
>
>
> We need to allow a writer W to write to file groups f1, f2, f3 concurrently
> while a clustering service C reclusters them into f4, f5.
>  * Writes can be updates, deletes, or inserts.
>  * Either the clustering service C or the writer W can finish first.
>  * Both W and C need to be able to complete their actions without redoing
> much work.
>  * The number of output file groups for C can be higher or lower than the
> number of input file groups.
>  * The solution needs to work regardless of whether the writers are operating
> in OCC or NBCC mode.
>  * The solution needs to interplay well with the cleaning and compaction
> services.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HUDI-1045) Support updates during clustering

2024-04-30 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842465#comment-17842465
 ] 

Vinoth Chandar commented on HUDI-1045:
--

[WIP] Approach 1: Redistribute records from the conflicting file groups
