bvaradar commented on issue #1829: URL: https://github.com/apache/hudi/issues/1829#issuecomment-658902047
@zuyanton: HoodieParquetInputFormat relies on the hadoop-mapreduce FileInputFormat implementation to list files. The base FileInputFormat has a knob for tuning listing parallelism:

"mapreduce.input.fileinputformat.list-status.num-threads"

This config is set to 1 by default. Can you try increasing it to see whether listing speeds up?

We are also working on RFC-15 https://cwiki.apache.org/confluence/display/HUDI/RFC+-+15%3A+HUDI+File+Listing+and+Query+Planning+Improvements to holistically eliminate file listing and improve query performance.

cc @umehrot2 for any other suggestions.
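For anyone trying the suggestion, here is a minimal sketch of how the property might be passed to a query engine. Only the property name comes from the comment. The helper names (`hive_set_statement`, `spark_conf_flag`) and the value 8 are made up for illustration, and the thread does not say which engine is in use, so both forms are assumptions. Hive accepts session-level `SET key=value;` statements, and Spark copies `spark.hadoop.*` properties into its Hadoop configuration, but whether a given deployment honors the setting should be verified.

```python
# Property named in the comment; it is described there as defaulting to 1.
LIST_STATUS_THREADS_KEY = "mapreduce.input.fileinputformat.list-status.num-threads"


def _check(num_threads: int) -> int:
    """Reject thread counts that make no sense for a listing pool."""
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    return num_threads


def hive_set_statement(num_threads: int) -> str:
    """Render a Hive session-level SET statement (hypothetical helper)."""
    return f"SET {LIST_STATUS_THREADS_KEY}={_check(num_threads)};"


def spark_conf_flag(num_threads: int) -> str:
    """Render a spark-submit/spark-shell flag (hypothetical helper).

    Uses the spark.hadoop. prefix so Spark forwards the property
    to the Hadoop configuration.
    """
    return f"--conf spark.hadoop.{LIST_STATUS_THREADS_KEY}={_check(num_threads)}"


if __name__ == "__main__":
    # 8 is an arbitrary example value, not a recommendation.
    print(hive_set_statement(8))
    print(spark_conf_flag(8))
```

The right thread count likely depends on the number of partitions and the storage backend, so it is worth measuring a few values rather than picking one blindly.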