codope edited a comment on issue #3218: URL: https://github.com/apache/hudi/issues/3218#issuecomment-874166812
@Tandoy The `HoodieClusteringJob` has two steps:

1. **Schedule clustering** (`spark-submit` with the `--schedule` option). As seen in the logs screenshot, this prints the instant for which clustering is scheduled.
2. **Run clustering** (`spark-submit` with the `--instant-time` option), passing the instant obtained in step 1. This is the step that actually executes the clustering.

More details on running `HoodieClusteringJob` are in this [RFC](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+19+Clustering+data+for+freshness+and+query+performance#RFC19Clusteringdataforfreshnessandqueryperformance-SetupforAsyncclusteringJob). That said, we are going to enhance async clustering support in Hudi so that, when enabled, both clustering scheduling and execution keep running in the background automatically.
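As a rough sketch, the two steps could look like the commands below. The jar path, base path, table name, properties file, and the instant timestamp are placeholders, and the exact flag set may differ across Hudi versions, so check the docs for your release:

```
# Step 1: schedule clustering -- the job logs the instant time it scheduled
spark-submit \
  --class org.apache.hudi.utilities.HoodieClusteringJob \
  /path/to/hudi-utilities-bundle.jar \
  --schedule \
  --base-path /path/to/hudi_table \
  --table-name hudi_table \
  --props /path/to/clusteringjob.properties \
  --spark-memory 1g

# Step 2: execute clustering, passing the instant obtained in step 1
spark-submit \
  --class org.apache.hudi.utilities.HoodieClusteringJob \
  /path/to/hudi-utilities-bundle.jar \
  --base-path /path/to/hudi_table \
  --table-name hudi_table \
  --instant-time 20210701180000 \
  --props /path/to/clusteringjob.properties \
  --spark-memory 1g
```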