xushiyan commented on issue #3697: URL: https://github.com/apache/hudi/issues/3697#issuecomment-927248813
From the parameters you shared, I see the problem is mostly caused by not using the machines' resources efficiently. Let's do some math: say you use 4 m5.4xlarge machines, each with 16 cores and 64g of memory. Setting the confs below should let you run 19 executors plus 1 driver for the Spark job. Double-check the Spark UI to confirm how many executors you're actually getting:

```
spark.driver.cores=3
spark.driver.memory=6g
spark.driver.memoryOverhead=2g
spark.executor.cores=3
spark.executor.memory=6g
spark.executor.memoryOverhead=2g
spark.executor.instances=19
spark.sql.shuffle.partitions=200
spark.default.parallelism=200
spark.task.cpus=1
```

Also set these Hudi props in your Spark writer options:

```
"hoodie.upsert.shuffle.parallelism" = 200,
"hoodie.insert.shuffle.parallelism" = 200,
"hoodie.finalize.write.parallelism" = 200,
"hoodie.bulkinsert.shuffle.parallelism" = 200,
```

Also, you don't need to construct `hoodie_key` and `hoodie_partition` yourself; instead, set the Hudi key generator class options properly in the Spark options. Refer to [this blog](https://hudi.incubator.apache.org/blog/2021/02/13/hudi-key-generators/).
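To illustrate the key-generator advice above, here is a minimal PySpark write sketch. This is an assumption-laden example, not your actual job: the table name, the `id`/`dt` column names, and the S3 path are placeholders, and `df` stands for your existing DataFrame. With these options set, Hudi derives `_hoodie_record_key` and `_hoodie_partition_path` itself, so you don't build those columns by hand:

```python
# Sketch only: "my_table", "id", "dt", and the save path are placeholders;
# df is assumed to be an existing Spark DataFrame.
hudi_options = {
    "hoodie.table.name": "my_table",
    # Let Hudi build the record key and partition path for you:
    "hoodie.datasource.write.recordkey.field": "id",
    "hoodie.datasource.write.partitionpath.field": "dt",
    "hoodie.datasource.write.keygenerator.class":
        "org.apache.hudi.keygen.SimpleKeyGenerator",
    # Parallelism settings from the comment above:
    "hoodie.upsert.shuffle.parallelism": "200",
    "hoodie.insert.shuffle.parallelism": "200",
}

(df.write.format("hudi")
   .options(**hudi_options)
   .mode("append")
   .save("s3://your-bucket/path/my_table"))
```

If your key or partition path spans multiple columns, swap `SimpleKeyGenerator` for `org.apache.hudi.keygen.ComplexKeyGenerator` and pass comma-separated field names.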
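The "19 executors" figure above can be derived with a quick sketch of the arithmetic (assuming cores are the binding constraint; memory also fits, since 6g + 2g overhead = 8g per executor and 64g / 8g = 8 slots > 5):

```python
# Executor-sizing arithmetic for 4 x m5.4xlarge (16 cores, 64g each).
nodes = 4
cores_per_node = 16
cores_per_executor = 3  # spark.executor.cores

# Executor slots per node, rounding down.
slots_per_node = cores_per_node // cores_per_executor  # 5

# Total slots across the cluster, minus one reserved for the driver.
total_slots = nodes * slots_per_node   # 20
executors = total_slots - 1            # spark.executor.instances
print(executors)                       # 19
```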