[
https://issues.apache.org/jira/browse/HUDI-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17900182#comment-17900182
]
Y Ethan Guo commented on HUDI-1045:
-----------------------------------
Related ticket: HUDI-8464.
When we support NBCC for concurrent updates and clustering (HUDI-1045), we also
need to keep this in mind for clustering, since there is a time gap between
generating the clustering request time and adding the clustering requested file
to the timeline after clustering planning. That is, clustering planning needs
to consider the requested time of the clustering too when generating the list
of files to cluster.
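
To make the time gap concrete, here is a minimal sketch of one possible reading
of the point above: the planner only picks file slices written before the
clustering request time, so anything a concurrent writer adds during the gap is
left out of the plan. The names `FileSlice` and `plan_clustering` and the short
instant strings are hypothetical and are not Hudi APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileSlice:
    """Hypothetical stand-in for a file slice; not a Hudi class."""
    file_group: str
    instant_time: str  # instant that produced the slice (fixed-width string)


def plan_clustering(slices, clustering_requested_time):
    """Keep only slices written strictly before the clustering request time.

    Assumption: instant times are fixed-width strings, so plain string
    comparison orders them chronologically.
    """
    return [s for s in slices if s.instant_time < clustering_requested_time]


slices = [
    FileSlice("f1", "001"),
    FileSlice("f2", "002"),
    # Written by a concurrent writer after the request time "003" was
    # generated, but before the requested file reached the timeline.
    FileSlice("f3", "005"),
]

plan = plan_clustering(slices, "003")
print([s.file_group for s in plan])
```

In this sketch the slice for f3 is excluded because its instant is later than
the request time, even though it already exists when planning runs.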
> Support updates during clustering
> ---------------------------------
>
> Key: HUDI-1045
> URL: https://issues.apache.org/jira/browse/HUDI-1045
> Project: Apache Hudi
> Issue Type: Task
> Components: clustering, table-service
> Reporter: leesf
> Assignee: Vinoth Chandar
> Priority: Blocker
> Fix For: 1.1.0
>
>
> We need to allow a writer W to write to file groups f1, f2, f3
> concurrently while a clustering service C reclusters them into f4, f5.
> h4. Goals
> * Writes can be updates, deletes, or inserts.
> * Either the clustering C or the writer W can finish first.
> * Both W and C need to be able to complete their actions without redoing much
> work.
> * The number of output file groups for C can be higher or lower than the
> number of input file groups.
> * Needs to work regardless of whether the writers are operating in OCC or
> NBCC mode.
> * Needs to interplay well with cleaning and compaction services.
> h4. Non-goals
> * Strictly preserving the sort order achieved by clustering in the face of
> updates (e.g., updates that change clustering field values, causing the output
> clustering file groups to not be fully sorted by those fields).
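
The non-goal above can be illustrated with a small sketch, using made-up
records and a hypothetical clustering field `city`: once a file group has been
sorted by clustering, an in-place update to the clustering field can leave it
no longer fully sorted.

```python
def is_sorted_by(records, field):
    """Return True if records are in non-decreasing order of `field`."""
    values = [r[field] for r in records]
    return all(a <= b for a, b in zip(values, values[1:]))


# Hypothetical output of clustering: records sorted by the field "city".
clustered = sorted(
    [
        {"id": 2, "city": "boston"},
        {"id": 1, "city": "austin"},
        {"id": 3, "city": "denver"},
    ],
    key=lambda r: r["city"],
)

# A concurrent update changes the clustering field of record 1 in place,
# without moving the record to a new position.
updated = [dict(r, city="zurich") if r["id"] == 1 else r for r in clustered]

print(is_sorted_by(clustered, "city"), is_sorted_by(updated, "city"))
```

Restoring the order would require another clustering pass, which is why strict
sort order is listed as a non-goal.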
--
This message was sent by Atlassian Jira
(v8.20.10#820010)