danny0405 commented on code in PR #8568: URL: https://github.com/apache/hudi/pull/8568#discussion_r1194859993
##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/clustering/ClusteringCommitSink.java:
##########
@@ -179,7 +179,7 @@ private void doCommit(String instant, HoodieClusteringPlan clusteringPlan, List<
        TableServiceType.CLUSTER, writeMetadata.getCommitMetadata().get(), table, instant);

    // whether to clean up the input base parquet files used for clustering
-    if (!conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED)) {
+    if (!conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED) && !isCleaning) {
      LOG.info("Running inline clean");