bvaradar commented on issue #2269:
URL: https://github.com/apache/hudi/issues/2269#issuecomment-731656591


   @AakashPradeep : I can quickly tell that the number of partitions is really 
high relative to the file size in each partition. It looks like each partition 
holds very few records (Parquet file size ~400 KB). S3 listing becomes a huge 
bottleneck in this case; I have seen S3 listing take a very long time for 
~100K partitions. With 0.7.0 (the next release), we are going to support 
zero-listing writes, which will avoid this bottleneck.
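   
   A minimal sketch of what that would look like from Spark, assuming the 
0.7.0 metadata table is enabled via the `hoodie.metadata.enable` key (the 
table name, path, and column names below are placeholders, not taken from 
your setup):
   
   ```scala
   import org.apache.spark.sql.SaveMode
   
   // `df` is assumed to be an existing DataFrame with columns
   // `uuid`, `ts`, and `event_date`.
   df.write
     .format("hudi")
     .option("hoodie.table.name", "events")
     .option("hoodie.datasource.write.recordkey.field", "uuid")
     .option("hoodie.datasource.write.precombine.field", "ts")
     .option("hoodie.datasource.write.partitionpath.field", "event_date")
     // 0.7.0+: maintain an internal metadata table so writes can
     // resolve file listings without issuing S3 LIST calls.
     .option("hoodie.metadata.enable", "true")
     .mode(SaveMode.Append)
     .save("s3a://bucket/warehouse/events")
   ```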
   
   But generally, you have too many partitions relative to your dataset size. 
If possible, partitioning on a lower-cardinality column would help, as in the 
sketch below.
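   
   For illustration (hypothetical column names again), one way to do that is 
to derive a coarse, low-cardinality partition column, e.g. a date truncated 
from a fine-grained timestamp, instead of partitioning by a high-cardinality 
field:
   
   ```scala
   import org.apache.spark.sql.SaveMode
   import org.apache.spark.sql.functions.{col, to_date}
   
   // `df` is an existing DataFrame as above. Derive `event_date`
   // (at most ~365 distinct values per year) from a fine-grained
   // `ts` column, then partition on it.
   val byDate = df.withColumn("event_date", to_date(col("ts")))
   
   byDate.write
     .format("hudi")
     .option("hoodie.table.name", "events")
     .option("hoodie.datasource.write.recordkey.field", "uuid")
     .option("hoodie.datasource.write.precombine.field", "ts")
     .option("hoodie.datasource.write.partitionpath.field", "event_date")
     .mode(SaveMode.Append)
     .save("s3a://bucket/warehouse/events")
   ```
   
   Fewer, larger partitions mean fewer S3 LIST calls and better-sized 
Parquet files.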
   
   
   

