[ https://issues.apache.org/jira/browse/HUDI-5289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zouxxyy closed HUDI-5289.
-------------------------
    Resolution: Fixed

> WriteStatus RDD is recalculated in cluster
> ------------------------------------------
>
>                 Key: HUDI-5289
>                 URL: https://issues.apache.org/jira/browse/HUDI-5289
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: spark
>            Reporter: zouxxyy
>            Priority: Major
>         Attachments: image-2022-11-29-10-24-08-853.png, image-2022-11-29-10-25-29-546.png, image-2022-11-29-10-26-22-050.png
>
>
> Steps to reproduce:
> {code:bash}
> spark-submit \
> --class org.apache.hudi.utilities.HoodieClusteringJob \
> --conf spark.driver.memory=40G \
> --conf spark.executor.instances=20 \
> --conf spark.executor.memory=40G \
> --conf spark.executor.cores=4 \
> hudi-utilities-bundle_2.11-0.12.0.jar \
> --props clusteringjob.properties \
> --mode scheduleAndExecute \
> --base-path xxx \
> --table-name xxx \
> --spark-memory 40g
> {code}
> The following are the two stages of this job. Both are related to computing
> the WriteStatus, but some tasks in stage 96 were recalculated, which took
> more than ten minutes.
> !image-2022-11-29-10-24-08-853.png|width=1560,height=57!
> Stage 65:
> !image-2022-11-29-10-25-29-546.png|width=640,height=515!
> Stage 96:
> !image-2022-11-29-10-26-22-050.png|width=643,height=435!
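
Spark evaluates RDDs lazily: if an RDD that was never persisted is consumed by more than one action, each action re-executes its lineage, which is one common way the same WriteStatus work can show up again in a later stage. The sketch below is a toy model in plain Java, not Spark or Hudi code; `RecomputeDemo`, `LazyDataset` and `writeInvocations` are hypothetical names, and mapping its two actions onto stages 65 and 96 is an assumption. In real Spark code the comparable step is calling `persist(...)` (for example with `StorageLevel.MEMORY_AND_DISK()`) or `cache()` on the RDD before reusing it; whether that is the change that resolved this issue is not stated here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/**
 * Toy model (not Spark code) of a lazily evaluated dataset: every action
 * re-runs the lineage unless the dataset was marked for caching first.
 */
public class RecomputeDemo {

    /** Hypothetical stand-in for an RDD: a deferred computation plus an optional cache. */
    static final class LazyDataset<T> {
        private final Supplier<List<T>> lineage;
        private List<T> cached;          // stays null until a cached dataset is first materialised
        private boolean cacheRequested;  // set by cache(); like persist(), marking is lazy

        LazyDataset(Supplier<List<T>> lineage) {
            this.lineage = lineage;
        }

        LazyDataset<T> cache() {
            cacheRequested = true;
            return this;
        }

        /** Every action goes through here: reuse cached data or recompute the lineage. */
        private List<T> materialise() {
            if (cached != null) {
                return cached;
            }
            List<T> result = lineage.get();
            if (cacheRequested) {
                cached = result;
            }
            return result;
        }

        int count() {
            return materialise().size();
        }

        List<T> collect() {
            return new ArrayList<>(materialise());
        }
    }

    /** Runs two actions on the same dataset and returns how often the expensive step ran. */
    static int writeInvocations(boolean useCache) {
        AtomicInteger writes = new AtomicInteger();
        LazyDataset<String> writeStatuses = new LazyDataset<>(() -> {
            writes.incrementAndGet(); // stands in for the costly clustering write
            return List.of("status-1", "status-2", "status-3");
        });
        if (useCache) {
            writeStatuses.cache();
        }
        writeStatuses.count();   // hypothetical first consumer of the write statuses
        writeStatuses.collect(); // hypothetical second consumer of the write statuses
        return writes.get();
    }

    public static void main(String[] args) {
        System.out.println("without cache: " + writeInvocations(false)); // prints "without cache: 2"
        System.out.println("with cache: " + writeInvocations(true));     // prints "with cache: 1"
    }
}
```

Without caching, the expensive step runs once per action; with caching it runs once and the second action reuses the stored result. In Spark, persisted blocks can still be recomputed if they are evicted or an executor is lost, so persisting reduces recomputation rather than ruling it out.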



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
