Rap70r commented on issue #3697:
URL: https://github.com/apache/hudi/issues/3697#issuecomment-926186766


   Hi @xushiyan,
   
   We ran some tests using a different instance type (20 machines of type m5.2xlarge) and fewer partitions.
   Here's the job flow for an upsert of 130K records (330 MB) against a Hudi collection with 230 partitions and 60 million records (6.2 GB) stored on S3:
   
![image](https://user-images.githubusercontent.com/22181358/134587957-e66771bd-5072-4fbb-977b-a2f1e4e90048.png)
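A quick breakdown of the workload figures above helps show how the data is distributed. This is a back-of-the-envelope sketch only. It treats MB/GB as decimal units, and the comment does not say what format the 330 MB batch is in, so its per-record size is not directly comparable to the on-disk size of the table on S3.

```python
def workload_profile(incoming_records, incoming_bytes, table_records, table_bytes, partitions):
    """Derive average sizes from the totals quoted in the comment (decimal units assumed)."""
    return {
        # Average size of one record in the incoming upsert batch
        "bytes_per_incoming_record": incoming_bytes / incoming_records,
        # Average on-disk size of one record already in the table on S3
        "bytes_per_stored_record": table_bytes / table_records,
        # Average number of records per partition, assuming an even spread
        "records_per_partition": table_records / partitions,
        # Average partition size, assuming an even spread
        "bytes_per_partition": table_bytes / partitions,
    }


# 130K records (330 MB) upserted into 60M records (6.2 GB) across 230 partitions
profile = workload_profile(130_000, 330e6, 60_000_000, 6.2e9, 230)
for name, value in profile.items():
    print(f"{name}: {value:,.0f}")
```

This works out to roughly 2.5 KB per incoming record versus about 103 bytes per stored record, and about 261K records (about 27 MB) per partition if the data were spread evenly, which real partitions rarely are.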
   
   The job took ~6.3 minutes to finish, and we would like to bring that time down further. 6.3 minutes seems like a lot for 130K records on 20 m5.2xlarge instances, and most of that time appears to be spent in the UpsertPartitioner step.
   Do you recommend any further modifications or configurations we could test to reduce the time?
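To put the 6.3 minutes in perspective, here is a rough throughput estimate. It is a sketch under two assumptions: each m5.2xlarge provides 8 vCPUs, and the whole wall-clock time counts as processing time (Spark startup, S3 I/O, and the individual stages are not broken out in the comment).

```python
M5_2XLARGE_VCPUS = 8  # vCPUs per AWS m5.2xlarge instance


def upsert_throughput(records, minutes, machines, vcpus_per_machine=M5_2XLARGE_VCPUS):
    """Return (records/s overall, records/s per vCPU) for one job run."""
    seconds = minutes * 60
    total_vcpus = machines * vcpus_per_machine
    per_second = records / seconds
    return per_second, per_second / total_vcpus


# 130K records upserted in ~6.3 minutes on 20 m5.2xlarge machines
per_second, per_vcpu = upsert_throughput(130_000, 6.3, 20)
print(f"{per_second:.0f} records/s overall, {per_vcpu:.2f} records/s per vCPU")
```

That comes to roughly 344 records/s across the cluster, or about 2 records/s per vCPU, which is why the run feels slow for a batch of this size.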
   
   Thank you 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
