Re: [I] too many s3 list when hoodie.metadata.enable=true [hudi]

2024-04-08 Thread via GitHub


nsivabalan commented on issue #9751:
URL: https://github.com/apache/hudi/issues/9751#issuecomment-2044036786

   Hey @njalan @BruceKellan, any follow-ups on this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] too many s3 list when hoodie.metadata.enable=true [hudi]

2024-01-31 Thread via GitHub


ad1happy2go commented on issue #9751:
URL: https://github.com/apache/hudi/issues/9751#issuecomment-1919303669

   @ad1happy2go I did internal benchmarks with different versions of Hudi here. 
With metadata enabled across the various versions, I didn't see a significant 
increase in S3 calls.
   
   @njalan @BruceKellan Did you try the 0.14.x release? Do you still see a high 
number of S3 calls only when metadata is enabled?





Re: [I] too many s3 list when hoodie.metadata.enable=true [hudi]

2024-01-10 Thread via GitHub


BruceKellan commented on issue #9751:
URL: https://github.com/apache/hudi/issues/9751#issuecomment-1884566335

   Any updates?





Re: [I] too many s3 list when hoodie.metadata.enable=true [hudi]

2023-10-24 Thread via GitHub


ad1happy2go commented on issue #9751:
URL: https://github.com/apache/hudi/issues/9751#issuecomment-1776827378

   @njalan I haven't had much time to look into this yet. I will prioritize 
it this week and post an update. Thanks.





Re: [I] too many s3 list when hoodie.metadata.enable=true [hudi]

2023-10-20 Thread via GitHub


njalan commented on issue #9751:
URL: https://github.com/apache/hudi/issues/9751#issuecomment-1773018752

   @ad1happy2go Are there any updates? If the object LIST calls can't be reduced, 
can we cache this metadata on the driver?





Re: [I] too many s3 list when hoodie.metadata.enable=true [hudi]

2023-10-13 Thread via GitHub


njalan commented on issue #9751:
URL: https://github.com/apache/hudi/issues/9751#issuecomment-1761879043

   @ad1happy2go  Thanks a lot for your help. Just let me know if you want any 
other information from me.  





Re: [I] too many s3 list when hoodie.metadata.enable=true [hudi]

2023-10-13 Thread via GitHub


ad1happy2go commented on issue #9751:
URL: https://github.com/apache/hudi/issues/9751#issuecomment-1761827180

   @njalan Thanks a lot for your effort here, I really appreciate it. It looks 
like in your case the metadata table is issuing more list calls.
   I will work on this. Thanks.





Re: [I] too many s3 list when hoodie.metadata.enable=true [hudi]

2023-10-12 Thread via GitHub


njalan commented on issue #9751:
URL: https://github.com/apache/hudi/issues/9751#issuecomment-1759783515

   @ad1happy2go Below are the list counts for one Spark streaming micro-batch.
   These are the top list operations (the **first column is the list count**) for a table with Hudi 0.13.1 and metadata enabled:
   329 (hive/warehouse/ods_xxx.db/testing_hudi13/.hoodie/metadata/.hoodie/),
   229 (hive/warehouse/ods_xxx.db/testing_hudi13/.hoodie/),
   50 (hive/warehouse/ods_xxx.db/testing_hudi13/.hoodie/metadata/files/),
   42 (hive/warehouse/ods_xxx.db/testing_hudi13/.hoodie/.aux/.bootstrap/.partitions/-----0_1-0-1_01.hfile/),
   33 (hive/warehouse/ods_xxx.db/testing_hudi13/),
   14 (hive/warehouse/ods_xxx.db/testing_hudi13/.hoodie/metadata/.hoodie/.temp/),
   10 (hive/warehouse/ods_xxx.db/testing_hudi13/.hoodie/.temp/20231010140342361/),
   9 (hive/warehouse/ods_xxx.db/testing_hudi13/.hoodie/.temp/20231010140158325/),
   7 (hive/warehouse/ods_xxx.db/testing_hudi13/.hoodie/metadata/.hoodie/.temp/20231010140509929/),
   7 (hive/warehouse/ods_xxx.db/testing_hudi13/.hoodie/metadata/.hoodie/.temp/20231010140342361/),
   
   These are the top list operations (the **first column is the list count**) for a table with Hudi 0.9 and metadata disabled:
   274 (hive/warehouse/ods_.db/testing_hudi09/.hoodie/),
   188 (hive/warehouse/ods_.db/testing_hudi09/.hoodie/.aux/.bootstrap/.partitions/-----0_1-0-1_01.hfile/),
   48 (hive/warehouse/ods_.db/testing_hudi09/),
   9 (hive/warehouse/ods_.db/testing_hudi09/.hoodie/.temp/20231010140501/),
   9 (hive/warehouse/ods_.db/testing_hudi09/.hoodie/.temp/20231010140401/),
   9 (hive/warehouse/ods_.db/testing_hudi09/.hoodie/.temp/20231010140301/),
   9 (hive/warehouse/ods_.db/testing_hudi09/.hoodie/.temp/20231010140201/),
   9 (hive/warehouse/ods_.db/testing_hudi09/.hoodie/.temp/20231010140101/),
   5 (hive/warehouse/ods_.db/testing_hudi09/.hoodie/.temp/),
   5 (hive/warehouse/ods_.db/testing_hudi09/.hoodie/.heartbeat/),
   
   
   Is there any way to reduce the list operations? If list operations could be 
cut by 50% per table, it would significantly reduce the workload on our locally 
deployed object storage cluster, which serves thousands of tables.
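As a rough sketch, the top entries posted above can be tallied to see how much of the LIST traffic in this micro-batch goes to the metadata table (paths under `.hoodie/metadata/`). This assumes the prefixes are shortened to be relative to each table's base path; the dictionary names and the `summarize` helper are hypothetical, not Hudi APIs. Because only the top entries were posted, the sums are lower bounds, not full totals.

```python
# LIST counts per prefix for one micro-batch, copied from the lists above.
# Prefixes are shortened to be relative to each table's base path ("" = base path).
# Only the top entries were posted, so the sums below are lower bounds.
HUDI_0_13_1_METADATA_ON = {
    ".hoodie/metadata/.hoodie/": 329,
    ".hoodie/": 229,
    ".hoodie/metadata/files/": 50,
    ".hoodie/.aux/.bootstrap/.partitions/-----0_1-0-1_01.hfile/": 42,
    "": 33,
    ".hoodie/metadata/.hoodie/.temp/": 14,
    ".hoodie/.temp/20231010140342361/": 10,
    ".hoodie/.temp/20231010140158325/": 9,
    ".hoodie/metadata/.hoodie/.temp/20231010140509929/": 7,
    ".hoodie/metadata/.hoodie/.temp/20231010140342361/": 7,
}

HUDI_0_9_METADATA_OFF = {
    ".hoodie/": 274,
    ".hoodie/.aux/.bootstrap/.partitions/-----0_1-0-1_01.hfile/": 188,
    "": 48,
    ".hoodie/.temp/20231010140501/": 9,
    ".hoodie/.temp/20231010140401/": 9,
    ".hoodie/.temp/20231010140301/": 9,
    ".hoodie/.temp/20231010140201/": 9,
    ".hoodie/.temp/20231010140101/": 9,
    ".hoodie/.temp/": 5,
    ".hoodie/.heartbeat/": 5,
}

# Everything under this prefix belongs to the metadata table.
METADATA_PREFIX = ".hoodie/metadata/"


def summarize(counts: dict[str, int]) -> tuple[int, int]:
    """Return (total LIST calls, LIST calls under the metadata table)."""
    total = sum(counts.values())
    metadata = sum(n for prefix, n in counts.items() if prefix.startswith(METADATA_PREFIX))
    return total, metadata


if __name__ == "__main__":
    for name, counts in [
        ("0.13.1, metadata on", HUDI_0_13_1_METADATA_ON),
        ("0.9, metadata off", HUDI_0_9_METADATA_OFF),
    ]:
        total, metadata = summarize(counts)
        print(f"{name}: {total} LIST calls in top entries, {metadata} under {METADATA_PREFIX}")
```

With these numbers the 0.13.1 table shows 730 listed calls, 407 of them under `.hoodie/metadata/`, versus 565 and 0 for the 0.9 table.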
   





Re: [I] too many s3 list when hoodie.metadata.enable=true [hudi]

2023-10-03 Thread via GitHub


ad1happy2go commented on issue #9751:
URL: https://github.com/apache/hudi/issues/9751#issuecomment-1744541419

   @njalan Do you also see similar behaviour for tables that were written only 
with a later Hudi version (0.13), and never with 0.9?

