zuyanton commented on issue #1829: URL: https://github.com/apache/hudi/issues/1829#issuecomment-659836676
@bvaradar We don't see a similar issue with regular (non-Hudi) tables saved to S3 in Parquet format. For regular tables, the "overhead" is the same and under one minute regardless of the number of partitions: a regular table with 20k partitions takes the same time to "load" as one with 100 partitions before Spark starts running its jobs, whereas a Hudi table on S3 becomes slow at 5k+ partitions. We use EMR 5.28, which ships with the EMRFS S3-optimized committer enabled in Spark by default, so I assume whatever bottlenecks S3 has are addressed by the committer.