Balaji Varadarajan created HUDI-637:
---------------------------------------
             Summary: Investigate slower hudi queries in S3 vs HDFS
                 Key: HUDI-637
                 URL: https://issues.apache.org/jira/browse/HUDI-637
             Project: Apache Hudi (incubating)
          Issue Type: Task
          Components: Performance
            Reporter: Balaji Varadarajan
             Fix For: 0.5.2


Hudi queries on S3 take abnormally longer than the same queries on HDFS (about 17 times longer in the example below). S3 listing by itself does not take that long.

PERFORMANCE BUG: the performance of the metadata listing is the likely cause of the slow Hudi queries.

Table on S3:

scala> stopwatch({ sql("SELECT * FROM ap_invoices_all_compacted_s3").count})
Elapsed time: 1m 55.078473113s
res2: Long = xxxxxxxxxxxx

The exact same table on HDFS:

scala> stopwatch({ sql("SELECT * FROM ap_invoices_all_compacted").count})
Elapsed time: 6.581217052s
res3: Long = xxxxxxxxxxx

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
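The `stopwatch` helper used in the spark-shell timings is not defined in the issue. For anyone reproducing the comparison, a minimal stand-in might look like the sketch below. The object name `StopwatchSketch` is hypothetical, and this is an assumed equivalent, not the reporter's actual helper: it evaluates the block once, prints the wall-clock time in a format resembling the one in the session, and returns the block's result.

```scala
// Hypothetical stand-in for the `stopwatch` helper used in the spark-shell session;
// the reporter's actual definition is not part of the issue.
object StopwatchSketch {
  private val NanosPerMinute = 60000000000L

  /** Evaluates `block` exactly once, prints the elapsed wall-clock time, and returns its result. */
  def stopwatch[T](block: => T): T = {
    val start = System.nanoTime()
    val result = block // by-name parameter: the expression is evaluated here, once
    val elapsedNanos = System.nanoTime() - start
    val minutes = elapsedNanos / NanosPerMinute
    val seconds = (elapsedNanos % NanosPerMinute) / 1e9
    // Show a minutes component only for runs of one minute or longer, e.g. "1m 55.078473113s" or "6.581217052s"
    val formatted = if (minutes > 0) f"${minutes}m $seconds%.9fs" else f"$seconds%.9fs"
    println(s"Elapsed time: $formatted")
    result
  }
}
```

In spark-shell, after `import StopwatchSketch.stopwatch`, a call such as `stopwatch { sql("SELECT * FROM ap_invoices_all_compacted_s3").count }` times the full query. Wrapping only a Hadoop `FileSystem.listStatus` call on the table's base path in the same helper is one way to check the claim that S3 listing alone is not the bottleneck. A single run also includes any warm-up cost, so repeating each measurement gives a fairer comparison.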