codope edited a comment on issue #3218: URL: https://github.com/apache/hudi/issues/3218#issuecomment-874166812
@Tandoy The `HoodieClusteringJob` has two steps:

1. **Schedule clustering** (`spark-submit` with the `--schedule` option). As seen in the logs screenshot, this prints the instant for which clustering is scheduled.
2. **Run clustering** (`spark-submit` with the `--instant-time` option), passing the instant obtained in step 1. This is the step that actually executes the clustering.

More details on running `HoodieClusteringJob` are in this [RFC](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+19+Clustering+data+for+freshness+and+query+performance#RFC19Clusteringdataforfreshnessandqueryperformance-SetupforAsyncclusteringJob). That said, we are going to enhance async clustering support in Hudi so that, when enabled, both clustering scheduling and execution keep running in the background automatically.
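As a rough sketch, the two steps could look like the commands below. The jar path, base path, table name, properties file, and the instant timestamp are placeholders, and the exact flag set may differ across Hudi versions, so check the docs for your release:

```
# Step 1: schedule clustering -- the job logs the instant time it scheduled
spark-submit \
  --class org.apache.hudi.utilities.HoodieClusteringJob \
  /path/to/hudi-utilities-bundle.jar \
  --schedule \
  --base-path /path/to/hudi_table \
  --table-name hudi_table \
  --props /path/to/clusteringjob.properties \
  --spark-memory 1g

# Step 2: execute clustering, passing the instant obtained in step 1
spark-submit \
  --class org.apache.hudi.utilities.HoodieClusteringJob \
  /path/to/hudi-utilities-bundle.jar \
  --base-path /path/to/hudi_table \
  --table-name hudi_table \
  --instant-time 20210701180000 \
  --props /path/to/clusteringjob.properties \
  --spark-memory 1g
```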