[ 
https://issues.apache.org/jira/browse/HUDI-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17402001#comment-17402001
 ] 

ASF GitHub Bot commented on HUDI-2338:
--------------------------------------

hudi-bot edited a comment on pull request #3509:
URL: https://github.com/apache/hudi/pull/3509#issuecomment-902380160


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "2511b6a222b821a06925f8e82257779529193fbc",
       "status" : "CANCELED",
       "url" : 
"https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1825",
       "triggerID" : "2511b6a222b821a06925f8e82257779529193fbc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e1ddc232a3cf45c731293cb5947f11c7eb8122ce",
       "status" : "PENDING",
       "url" : 
"https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1828",
       "triggerID" : "e1ddc232a3cf45c731293cb5947f11c7eb8122ce",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 2511b6a222b821a06925f8e82257779529193fbc Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1825)
 
   * e1ddc232a3cf45c731293cb5947f11c7eb8122ce Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1828)
 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Hoodie data update reject clustering using SparkRejectClusteringStrategy
> ------------------------------------------------------------------------
>
>                 Key: HUDI-2338
>                 URL: https://issues.apache.org/jira/browse/HUDI-2338
>             Project: Apache Hudi
>          Issue Type: Task
>            Reporter: Yue Zhang
>            Priority: Major
>              Labels: pull-request-available
>
> Hudi now supports async clustering in HoodieDeltaStreamer and Structured
> Streaming, and supports offline clustering through HoodieClusteringJob.
> Data updates that conflict with clustering are one of the more common
> scenarios. Currently, Hudi can only reject the update using
> SparkRejectUpdateStrategy, which fails the ingestion.
> However, clustering can be viewed as an optimization service that runs in
> the background, so data ingestion should have a higher priority than it.
> This ticket therefore adds a new UpdateStrategy named
> SparkRejectClusteringStrategy.
> SparkRejectClusteringStrategy rejects and fails the clustering job and lets
> the data update succeed.
> When an update happens after the clustering plan is created and before
> clustering is executed:
>      1. There will be a requested replace commit.
>      2. SparkRejectClusteringStrategy will create a clustering reject file
> under the .tmp dir, named xxx.replacement.request.reject.
>      3. Before performing the clustering job, Hudi can check for this reject
> file using the SparkRejectClusteringStrategy.validateClustering() function.
>           3.1 If the reject file exists, abort this clustering plan and
> remove the reject file.
> When an update happens after clustering has started executing but before it
> has finished:
>      1. There will be an inflight replace commit.
>      2. SparkRejectClusteringStrategy will create a clustering reject file
> under the .tmp dir, named xxx.replacement.inflight.reject.
>      3. Before the clustering job finishes and commits, Hudi can check for
> this reject file using the SparkRejectClusteringStrategy.validateClustering()
> function.
>           3.1 If the reject file exists, fail this clustering execution and
> remove the reject file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
