[ https://issues.apache.org/jira/browse/HUDI-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinoth Chandar updated HUDI-1045:
---------------------------------
    Description: 
We need to allow a writer W to write to file groups f1, f2, f3 concurrently while a clustering service C reclusters them into f4, f5.

h4. Goals
 * Writes can be updates, deletes, or inserts.
 * Either the clustering service C or the writer W can finish first.
 * Both W and C must be able to complete their actions without redoing much work.
 * The number of output file groups produced by C can be higher or lower than the number of input file groups.
 * Must work with, and be agnostic to, whether the writers are operating in OCC (optimistic concurrency control) or NBCC (non-blocking concurrency control) mode.
 * Must interplay well with the cleaning and compaction services.
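To make the goals above concrete, here is a toy, in-memory model in Python. It is not Hudi code, and `cluster`, `replay_writes`, and `insert_group` are hypothetical names. Clustering C reads f1, f2, f3 and rewrites their records into f4, f5, while writer W's updates, deletes, and inserts land after C's read. Re-applying W's writes by record key onto C's output is only one illustrative way to avoid redoing either side's work regardless of which finishes first; it is an assumption of this sketch, not the design the ticket settles on, and it ignores timelines, locking, OCC/NBCC, cleaning, and compaction.

```python
# Toy, in-memory model of the HUDI-1045 scenario. Hypothetical sketch only:
# it is not Hudi code and ignores timelines, locking, OCC/NBCC, cleaning,
# and compaction. A "file group" is modeled as a dict of record key -> value.
from typing import Dict, List, Optional, Tuple

FileGroups = Dict[str, Dict[str, int]]
Write = Tuple[str, Optional[int]]  # (record key, new value or None for delete)


def cluster(snapshot: FileGroups, inputs: List[str], outputs: List[str]) -> FileGroups:
    """Clustering C: rewrite every record read from `inputs` into `outputs`,
    sorted by key and split into roughly equal chunks."""
    records = sorted(
        (key, value) for fg in inputs for key, value in snapshot[fg].items()
    )
    chunk = -(-len(records) // len(outputs))  # ceiling division
    return {
        fg: dict(records[i * chunk:(i + 1) * chunk])
        for i, fg in enumerate(outputs)
    }


def replay_writes(groups: FileGroups, writes: List[Write], insert_group: str) -> FileGroups:
    """Re-apply writer W's writes by record key onto C's output file groups.

    Updates and deletes go to whichever group now holds the key; inserts of
    unknown keys go to `insert_group` (a hypothetical routing choice).
    """
    owner = {key: fg for fg, recs in groups.items() for key in recs}
    merged = {fg: dict(recs) for fg, recs in groups.items()}
    for key, value in writes:
        fg = owner.get(key, insert_group)
        if value is None:
            merged[fg].pop(key, None)  # delete (no-op if the key is absent)
        else:
            merged[fg][key] = value  # update or insert
            owner[key] = fg
    return merged


if __name__ == "__main__":
    table = {"f1": {"a": 1, "d": 4}, "f2": {"b": 2, "e": 5}, "f3": {"c": 3, "f": 6}}
    # C reads f1, f2, f3 before W's writes land, and produces f4, f5.
    clustered = cluster(table, ["f1", "f2", "f3"], ["f4", "f5"])
    # W concurrently updates "e" (in f2), deletes "a" (in f1), inserts "g".
    w_writes: List[Write] = [("e", 50), ("a", None), ("g", 7)]
    final = replay_writes(clustered, w_writes, insert_group="f5")
    print(final)  # {'f4': {'b': 2, 'c': 3}, 'f5': {'d': 4, 'e': 50, 'f': 6, 'g': 7}}
```

Passing a longer `outputs` list (for example four groups) exercises the case where C produces more file groups than it read.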

h4. Non-goals 
 * Strictly preserving the sort order achieved by clustering in the face of updates (e.g., updates that change clustering field values can leave the output file groups of clustering not fully sorted by those fields).
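The non-goal can be illustrated with a small Python sketch. The helpers `is_sorted_by` and `update_in_place` are hypothetical, not Hudi APIs, and the sketch assumes an update rewrites a record inside the file group that already holds it, which the ticket does not specify.

```python
# Hypothetical illustration of the non-goal: clustering sorts a file group by a
# clustering field, but an update that changes that field's value (applied in
# place here, which is an assumption) can leave the group out of order.
from typing import Any, Dict, List

Record = Dict[str, Any]


def is_sorted_by(records: List[Record], field: str) -> bool:
    """True if `records` are in non-decreasing order of `field`."""
    values = [r[field] for r in records]
    return all(a <= b for a, b in zip(values, values[1:]))


def update_in_place(records: List[Record], key: str, field: str, value: Any) -> List[Record]:
    """Return a copy of `records` with `field` of the record `key` set to `value`."""
    return [dict(r, **{field: value}) if r["key"] == key else r for r in records]


if __name__ == "__main__":
    # File group f4 as written by clustering C, sorted by "city".
    f4 = [
        {"key": "k1", "city": "Austin"},
        {"key": "k2", "city": "Boston"},
        {"key": "k3", "city": "Chicago"},
    ]
    print(is_sorted_by(f4, "city"))  # True
    # Writer W moves k1 to "Denver"; the record stays in its slot in f4.
    f4_after = update_in_place(f4, "k1", "city", "Denver")
    print(is_sorted_by(f4_after, "city"))  # False
```

Updates that keep the clustering value in order (for example moving k3 to "Denver") leave the group sorted; only value changes that cross neighbouring records break the order, which is why strictly preserving it is excluded.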

  was:
We need to allow a writer W to write to file groups f1, f2, f3 concurrently while a clustering service C reclusters them into f4, f5.
 * Writes can be updates, deletes, or inserts.
 * Either the clustering service C or the writer W can finish first.
 * Both W and C must be able to complete their actions without redoing much work.
 * The number of output file groups produced by C can be higher or lower than the number of input file groups.
 * Must work with, and be agnostic to, whether the writers are operating in OCC (optimistic concurrency control) or NBCC (non-blocking concurrency control) mode.
 * Must interplay well with the cleaning and compaction services.


> Support updates during clustering
> ---------------------------------
>
>                 Key: HUDI-1045
>                 URL: https://issues.apache.org/jira/browse/HUDI-1045
>             Project: Apache Hudi
>          Issue Type: Task
>          Components: clustering, table-service
>            Reporter: leesf
>            Assignee: Vinoth Chandar
>            Priority: Blocker
>             Fix For: 1.0.0
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)