HEPBO3AH commented on issue #9612: URL: https://github.com/apache/hudi/issues/9612#issuecomment-1709232835
> May I know which Hudi version you're using?

We are on version 0.11.

> Also, can you confirm whether multiple GET requests for the same object are due to different byte ranges. Each byte range request counts as a separate GET request.

I'll get back to you on the range question, but I can understand that part for the parquet files. What I'm more interested in is the large number of calls made to files/objects in the `/.hoodie` folder **and** on the `/.hoodie` object itself, which are completely unnecessary in an S3-type store that has no concept of folders.

I created the example above to demonstrate the pattern, but in production we had an issue where queries were throttled on S3 calls. The number of HEAD requests made to `/.hoodie` was in the **several hundred thousands**, 1000x more than the partition-level `hoodie_partition_metadata` calls:

```
|my_table/.hoodie/ |HEAD |701832|
|my_table/.hoodie  |HEAD |60334 |
```

> I am assuming, based on objects listed above, that the metadata table is disabled. Did you also try with metadata enabled?

The metadata table is enabled. However, we use it in a very limited capacity, mostly for partition discovery.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org