[GitHub] [incubator-hudi] Antauri commented on issue #1394: [HUDI-656][Performance] Return a dummy Spark relation after writing the DataFrame

GitBox Wed, 08 Apr 2020 08:40:47 -0700

Antauri commented on issue #1394: [HUDI-656][Performance] Return a dummy Spark 
relation after writing the DataFrame
URL: https://github.com/apache/incubator-hudi/pull/1394#issuecomment-611032718
 
 
   This is also present in 0.5.2-incubating, which we're using. We're 
developing a framework that does S3-to-S3 ingestion with Hudi through the 
Spark SQL writers (not RDDs). Our data is partitioned as 
year=x/month=y/day=z/bin=q. For 3 days with 575 paths each, it takes 3+ 
minutes between repeated "listing leaf files and directories" messages, 
some 9 minutes in total for just 3 days.
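
   To make the scale concrete, here is a minimal Python sketch that 
enumerates a layout of this shape. The start date, the integer bin ids and 
the unpadded month/day values are assumptions for illustration only, as is 
reading "575 paths each" as 575 bin directories per day; this is not our 
actual ingestion code.

```python
from datetime import date, timedelta


def partition_paths(start, days, bins_per_day):
    """Build year=/month=/day=/bin= partition paths for a run of days.

    The naming (unpadded month/day, integer bin ids) is hypothetical.
    """
    paths = []
    for offset in range(days):
        d = start + timedelta(days=offset)
        for b in range(bins_per_day):
            paths.append(f"year={d.year}/month={d.month}/day={d.day}/bin={b}")
    return paths


# 3 days with 575 bin directories each (start date chosen arbitrarily)
paths = partition_paths(date(2020, 4, 1), days=3, bins_per_day=575)
print(len(paths))  # 1725 leaf directories in total
print(paths[0])
```

   Every one of these leaf directories is a candidate for a separate 
listing whenever a step lists leaf files and directories again.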
   
   Any idea when 0.6.0 will be released? And does adding Hive as the 
metastore help reduce this listing, or does it not matter?
   
   Thank you kindly!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
