zuyanton commented on issue #1829:
URL: https://github.com/apache/hudi/issues/1829#issuecomment-659836676


   @bvaradar we don't see a similar issue with regular (non-Hudi) tables saved 
to S3 in Parquet format. For regular tables the "overhead" is the same and stays 
under one minute regardless of the number of partitions: tables with 20k 
partitions take the same time to "load" before Spark starts running its jobs as 
tables with 100 partitions, whereas a Hudi table on S3 becomes slow at 5k+ 
partitions. Also, we use EMR 5.28, which ships with the EMRFS S3-optimized 
committer enabled in Spark by default, so I assume whatever write bottlenecks 
S3 has are addressed by the committer.
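For reference, and assuming I'm reading the EMR docs correctly, the EMRFS S3-optimized committer is controlled by a single Spark setting, so it's easy to confirm it's actually on for a given cluster (it defaults to `true` on recent EMR 5.x releases):

```properties
# Spark config key that toggles the EMRFS S3-optimized committer on EMR;
# check its value via spark.conf.get(...) in a session to verify it's enabled.
spark.sql.parquet.fs.optimized.committer.optimization-enabled=true
```

Note this committer only affects Parquet *writes*, so it wouldn't explain the slow partition listing we see before jobs start.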


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org