bvaradar commented on issue #2269: URL: https://github.com/apache/hudi/issues/2269#issuecomment-731656591
@AakashPradeep : One thing I can tell immediately is that the number of partitions is very high relative to the data size in each partition: each partition holds very few records (Parquet size ~400 KB). S3 listing becomes a major bottleneck in this case; I have seen S3 listing take a very long time for ~100K partitions. With 0.7.0 (the next release), we are adding support for zero-listing writes, which will avoid this bottleneck. More generally, though, you have too many partitions relative to your dataset size. If possible, partitioning on a lower-cardinality column would help.
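As a rough illustration of the sizing argument above, here is a small heuristic sketch (not a Hudi API) that flags over-partitioning when the average data per partition is far below one well-sized Parquet file. The 120 MB target mirrors Hudi's default `hoodie.parquet.max.file.size`; the threshold fraction and the sample numbers are illustrative assumptions, not from this issue.

```python
# Heuristic sketch (not a Hudi API): flag a table as over-partitioned
# when the average bytes per partition fall well below one target-sized
# Parquet file. 120 MB mirrors Hudi's default hoodie.parquet.max.file.size.

TARGET_FILE_BYTES = 120 * 1024 * 1024

def is_over_partitioned(total_bytes: int, num_partitions: int,
                        target_bytes: int = TARGET_FILE_BYTES) -> bool:
    """True when the average partition holds less than 10% of the
    target Parquet file size (threshold chosen for illustration)."""
    avg_bytes = total_bytes / num_partitions
    return avg_bytes < 0.1 * target_bytes

# ~400 KB per partition across ~100K partitions, as described above:
print(is_over_partitioned(400 * 1024 * 100_000, 100_000))  # True
```

In a case like this, a lower-cardinality partition column (e.g. a coarser date granularity) would raise the average partition size and cut the number of S3 list calls proportionally.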