[ https://issues.apache.org/jira/browse/HUDI-5289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
zouxxyy closed HUDI-5289.
-------------------------
    Resolution: Fixed

> WriteStatus RDD is recalculated in cluster
> ------------------------------------------
>
>                 Key: HUDI-5289
>                 URL: https://issues.apache.org/jira/browse/HUDI-5289
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: spark
>            Reporter: zouxxyy
>            Priority: Major
>         Attachments: image-2022-11-29-10-24-08-853.png, image-2022-11-29-10-25-29-546.png, image-2022-11-29-10-26-22-050.png
>
>
> Steps to reproduce:
> {code:java}
> spark-submit \
>   --class org.apache.hudi.utilities.HoodieClusteringJob \
>   --conf spark.driver.memory=40G \
>   --conf spark.executor.instances=20 \
>   --conf spark.executor.memory=40G \
>   --conf spark.executor.cores=4 \
>   hudi-utilities-bundle_2.11-0.12.0.jar \
>   --props clusteringjob.properties \
>   --mode scheduleAndExecute \
>   --base-path xxx \
>   --table-name xxx \
>   --spark-memory 40g
> {code}
> The job contains the following two stages, both of which compute WriteStatus. Some tasks in stage 96 were recomputed, which took more than ten minutes.
> !image-2022-11-29-10-24-08-853.png|width=1560,height=57!
> Here is stage 65:
> !image-2022-11-29-10-25-29-546.png|width=640,height=515!
> Here is stage 96:
> !image-2022-11-29-10-26-22-050.png|width=643,height=435!

--
This message was sent by Atlassian Jira (v8.20.10#820010)