codope commented on issue #3218:
URL: https://github.com/apache/hudi/issues/3218#issuecomment-874166812


   @Tandoy The `HoodieClusteringJob` has two steps:
   
   1. Schedule clustering (which is the `spark-submit` with `--schedule` 
option). As seen in the logs screenshot, this gives the instant for which 
clustering is scheduled.
   2. Run clustering (`spark-submit` with the `--instant-time` option), where you 
pass the instant obtained in step 1. This is the step that actually executes 
clustering.
   More details on running `HoodieClusteringJob` are in this 
[RFC](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+19+Clustering+data+for+freshness+and+query+performance#RFC19Clusteringdataforfreshnessandqueryperformance-SetupforAsyncclusteringJob).
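   The two steps above can be sketched as the following `spark-submit` invocations. This is a hedged example: the bundle jar path, table path, table name, and properties file are placeholders you would replace with your own, and `<instant>` is the instant time printed by the scheduling run in step 1.
   
   ```shell
   # Step 1: schedule clustering; the job logs the instant time it created
   spark-submit \
     --class org.apache.hudi.utilities.HoodieClusteringJob \
     /path/to/hudi-utilities-bundle.jar \
     --schedule \
     --base-path /path/to/hudi_table \
     --table-name my_hudi_table \
     --props /path/to/clusteringjob.properties
   
   # Step 2: execute the clustering plan scheduled above,
   # passing the instant obtained from the step 1 logs
   spark-submit \
     --class org.apache.hudi.utilities.HoodieClusteringJob \
     /path/to/hudi-utilities-bundle.jar \
     --instant-time <instant> \
     --base-path /path/to/hudi_table \
     --table-name my_hudi_table \
     --props /path/to/clusteringjob.properties
   ```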
   
   That said, we are going to enhance async clustering support in Hudi so that, 
when enabled, both scheduling and execution of clustering keep running 
automatically in the background.

