rubenssoto commented on issue #1981: URL: https://github.com/apache/hudi/issues/1981#issuecomment-678550642
Hi guys, AWS Support answered me. It's the same topic that we debated here.

Hello,

Thank you for your patience. I have heard back from the Service team, and here's why such behavior has been observed when querying Apache Hudi tables:

When running 'SELECT COUNT(1)' queries on Hudi tables using HoodieParquetInputFormat, Athena has to bypass its own implementation of S3 file listing. Thus Hudi tables can be much less efficient in a query where the bottleneck is the speed at which files are listed. The Apache Hudi community is already aware of a performance impact caused by their S3 listing logic [1], as has also been rightly suggested on the thread you created.

Further, 'SELECT COUNT(1)' queries over either format are nearly instantaneous for the query engine to process, so they effectively measure how quickly the S3 listing completes. If you instead compare performance on more complex queries (that require meaningful work on both sides), you should see a less pronounced difference in the results.

I hope this information helps. Feel free to reach out to me with any additional queries you may have on this topic. I will be glad to assist you!

References:
[1] S3 slow file listing (Hudi): https://github.com/apache/hudi/issues/1829

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org