Balaji Varadarajan created HUDI-637:
---------------------------------------

             Summary: Investigate slower hudi queries in S3 vs HDFS
                 Key: HUDI-637
                 URL: https://issues.apache.org/jira/browse/HUDI-637
             Project: Apache Hudi (incubating)
          Issue Type: Task
          Components: Performance
            Reporter: Balaji Varadarajan
             Fix For: 0.5.2


Hudi queries on S3 take abnormally longer than the same queries on HDFS.

S3 listing by itself does not take that long.

PERFORMANCE BUG:

Metadata listing performance is likely what causes the slow Hudi queries.

 

{{scala> stopwatch(\{  sql("SELECT * FROM ap_invoices_all_compacted_s3").count})}}

{{Elapsed time: 1m 55.078473113s
res2: Long = xxxxxxxxxxxx}}


{{scala> stopwatch(\{  sql("SELECT * FROM ap_invoices_all_compacted").count})  -- this is the exact same table in hdfs}}

{{Elapsed time: 6.581217052s
res3: Long = xxxxxxxxxxx}}
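
The {{stopwatch}} helper used in the two measurements above is not defined in this report. A minimal sketch of what such a helper could look like is given here so the timings can be reproduced. The names {{formatElapsed}} and the body of {{stopwatch}} are assumptions, written only to match the "Elapsed time" output format shown above, and are not taken from the reporter's session.

```scala
// Hypothetical helpers: the original `stopwatch` definition is not shown
// in this issue, so this is only an illustrative reconstruction.

// Formats a nanosecond duration as "<m>m <s>.<nanos>s", omitting the
// minutes part when the duration is under one minute.
def formatElapsed(nanos: Long): String = {
  val nanosPerSecond = 1000000000L
  val nanosPerMinute = 60L * nanosPerSecond
  val minutes = nanos / nanosPerMinute
  val rest = nanos % nanosPerMinute
  val seconds = rest / nanosPerSecond
  val fraction = rest % nanosPerSecond
  val secondsPart = "%d.%09ds".format(seconds, fraction)
  if (minutes > 0) s"${minutes}m $secondsPart" else secondsPart
}

// Evaluates `block` once, prints the wall-clock time it took,
// and returns the block's result unchanged.
def stopwatch[T](block: => T): T = {
  val start = System.nanoTime()
  val result = block
  val elapsed = System.nanoTime() - start
  println(s"Elapsed time: ${formatElapsed(elapsed)}")
  result
}
```

Because {{stopwatch}} accepts any block, the same helper could also wrap a plain recursive file listing of the table path (for example through Hadoop's {{FileSystem.listFiles}}) to check how much of the S3 time is spent in listing as opposed to the rest of the query.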



