HEPBO3AH commented on issue #9612:
URL: https://github.com/apache/hudi/issues/9612#issuecomment-1709232835

   >May I know which Hudi version you're using?
   
   We are on version 0.11.
   
   > Also, can you confirm whether multiple GET requests for the same object are due to different byte ranges. Each byte range request counts as a separate GET request.
   
   I'll get back to you on the range question, though I can see byte ranges explaining the repeated GETs on the parquet files.  
   What I'm more interested in is the large number of calls made to files/objects in the `/.hoodie` folder **and** to the `/.hoodie` object itself. Those calls are completely unnecessary on an S3-type object store, which has no concept of folders.
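To make the folder point concrete, here is a toy Python model of a flat key namespace. It is a sketch assuming S3-like semantics: `FlatKeyStore` and the keys below are hypothetical and not part of S3, Hudi, or any SDK. A HEAD on `my_table/.hoodie` or `my_table/.hoodie/` only succeeds if an object is stored under that exact key (for example, a zero-byte marker written by some tool). Otherwise it misses, while a prefix listing still finds the timeline files.

```python
# Toy model of a flat object-key namespace (assumed S3-like semantics):
# keys are plain strings, HEAD matches only an exact key, and "folders"
# exist only as shared key prefixes. Illustration only, not S3's API.


class FlatKeyStore:
    def __init__(self, keys):
        self._keys = set(keys)

    def head(self, key):
        """Return True only if an object with exactly this key exists."""
        return key in self._keys

    def list_prefix(self, prefix):
        """Return all keys starting with the given prefix, sorted."""
        return sorted(k for k in self._keys if k.startswith(prefix))


# Hypothetical keys standing in for files under the table's .hoodie prefix.
store = FlatKeyStore([
    "my_table/.hoodie/hoodie.properties",
    "my_table/.hoodie/20230901000000000.commit",
])

# Neither "directory" key exists as an object, so both HEADs miss...
print(store.head("my_table/.hoodie"))    # False
print(store.head("my_table/.hoodie/"))   # False
# ...while a prefix listing still finds both files.
print(len(store.list_prefix("my_table/.hoodie/")))  # 2
```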
   I created the example above to demonstrate the pattern, but in production we had an issue where queries were throttled on S3 calls. The number of HEAD requests made to `/.hoodie` ran to **several hundred thousand**, roughly 1000x more than the partition-level `hoodie_partition_metadata` calls:
   
   ```
   |object key             |operation |count |
   |-----------------------|----------|------|
   |my_table/.hoodie/      |HEAD      |701832|
   |my_table/.hoodie       |HEAD      |60334 |
   ```
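A table like the one above can be produced by tallying request-log records per object key and HTTP method. The sketch below assumes the records have already been parsed (for example, out of S3 server access logs) into `(key, method)` pairs. The parsing step is not shown, and `tally_requests`, `format_table`, and the sample records are hypothetical and purely illustrative, not the production data.

```python
from collections import Counter


def tally_requests(records):
    """Count requests per (object key, HTTP method).

    `records` is assumed to be an iterable of (key, method) pairs that
    were already parsed out of access logs.
    """
    return Counter((key, method) for key, method in records)


def format_table(counts):
    """Render counts as pipe-separated rows, most frequent first."""
    rows = []
    for (key, method), n in counts.most_common():
        rows.append(f"|{key:<40}|{method:<10}|{n:>7}|")
    return "\n".join(rows)


# Illustrative records only; these are NOT the production numbers above.
sample = [
    ("my_table/.hoodie/", "HEAD"),
    ("my_table/.hoodie/", "HEAD"),
    ("my_table/.hoodie", "HEAD"),
    ("my_table/p1/.hoodie_partition_metadata", "GET"),
]
print(format_table(tally_requests(sample)))
```

Grouping on the exact key string keeps `my_table/.hoodie/` and `my_table/.hoodie` as separate rows, which matches how the two HEAD patterns appear in the table above.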
   
   > I am assuming, based on objects listed above, that the metadata table is disabled. Did you also try with metadata enabled?  
   
   The metadata table is enabled. However, we use it in a very limited capacity, mostly for partition discovery.

