Re: [I] too many s3 list when hoodie.metadata.enable=true [hudi]
nsivabalan commented on issue #9751: URL: https://github.com/apache/hudi/issues/9751#issuecomment-2044036786 Hey @njalan @BruceKellan, any follow-ups on this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
ad1happy2go commented on issue #9751: URL: https://github.com/apache/hudi/issues/9751#issuecomment-1919303669 @ad1happy2go I did internal benchmarks with different versions of Hudi here. With metadata enabled across various versions, I didn't see a significant increase in S3 calls. @njalan @BruceKellan Did you try the 0.14.x release? Do you still see high S3 calls only with metadata enabled?
BruceKellan commented on issue #9751: URL: https://github.com/apache/hudi/issues/9751#issuecomment-1884566335 Any updates?
ad1happy2go commented on issue #9751: URL: https://github.com/apache/hudi/issues/9751#issuecomment-1776827378 @njalan I haven't had much time to look into this yet. I will prioritize it this week. Thanks, will update.
njalan commented on issue #9751: URL: https://github.com/apache/hudi/issues/9751#issuecomment-1773018752 @ad1happy2go Are there any updates on your side? If the object list calls can't be reduced, could this metadata be cached on the driver?
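The driver-side caching idea can be sketched as a time-bounded memo around whatever call issues the LIST request. This is a minimal illustration under stated assumptions. `list_fn`, `ttl_seconds` and the `CachedLister` class are hypothetical names. They are not Hudi's API, and the sketch does not describe how Hudi's file-system view actually caches listings.

```python
import time
from typing import Callable, Dict, List, Tuple


class CachedLister:
    """Memoizes directory listings on the driver for a fixed time-to-live.

    `list_fn` is a hypothetical stand-in for the call that actually hits
    object storage (e.g. an S3 LIST request); it is not a Hudi API.
    """

    def __init__(self, list_fn: Callable[[str], List[str]], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._list_fn = list_fn
        self._ttl = ttl_seconds
        self._clock = clock
        # path -> (time the listing was fetched, listing)
        self._cache: Dict[str, Tuple[float, List[str]]] = {}

    def list(self, path: str) -> List[str]:
        now = self._clock()
        entry = self._cache.get(path)
        if entry is not None and now - entry[0] < self._ttl:
            return list(entry[1])  # cache hit: no storage call
        listing = self._list_fn(path)  # miss or expired: one real LIST call
        self._cache[path] = (now, list(listing))
        return list(listing)

    def invalidate(self, path: str) -> None:
        # Must be called after a commit changes the directory; otherwise
        # later callers would see a stale listing until the TTL expires.
        self._cache.pop(path, None)
```

The hard part is invalidation, not the cache itself. A table's `.hoodie/` directory changes on every commit. In a streaming job that commits every micro-batch, a TTL-only cache could therefore serve a stale timeline unless the writer invalidates the affected paths after each commit.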
njalan commented on issue #9751: URL: https://github.com/apache/hudi/issues/9751#issuecomment-1761879043 @ad1happy2go Thanks a lot for your help. Just let me know if you want any other information from me.
ad1happy2go commented on issue #9751: URL: https://github.com/apache/hudi/issues/9751#issuecomment-1761827180 Thanks a lot for your effort here, @njalan. Really appreciate it. It looks like in your case the metadata table generated more list calls. I will work on this. Thanks.
njalan commented on issue #9751: URL: https://github.com/apache/hudi/issues/9751#issuecomment-1759783515 @ad1happy2go Below are the list counts for one Spark Streaming micro-batch.

Top list operations for the table with Hudi 0.13.1 and metadata enabled:

| List count | Path |
|---|---|
| 329 | hive/warehouse/ods_xxx.db/testing_hudi13/.hoodie/metadata/.hoodie/ |
| 229 | hive/warehouse/ods_xxx.db/testing_hudi13/.hoodie/ |
| 50 | hive/warehouse/ods_xxx.db/testing_hudi13/.hoodie/metadata/files/ |
| 42 | hive/warehouse/ods_xxx.db/testing_hudi13/.hoodie/.aux/.bootstrap/.partitions/-----0_1-0-1_01.hfile/ |
| 33 | hive/warehouse/ods_xxx.db/testing_hudi13/ |
| 14 | hive/warehouse/ods_xxx.db/testing_hudi13/.hoodie/metadata/.hoodie/.temp/ |
| 10 | hive/warehouse/ods_xxx.db/testing_hudi13/.hoodie/.temp/20231010140342361/ |
| 9 | hive/warehouse/ods_xxx.db/testing_hudi13/.hoodie/.temp/20231010140158325/ |
| 7 | hive/warehouse/ods_xxx.db/testing_hudi13/.hoodie/metadata/.hoodie/.temp/20231010140509929/ |
| 7 | hive/warehouse/ods_xxx.db/testing_hudi13/.hoodie/metadata/.hoodie/.temp/20231010140342361/ |

Top list operations for the table with Hudi 0.9 and metadata disabled:

| List count | Path |
|---|---|
| 274 | hive/warehouse/ods_.db/testing_hudi09/.hoodie/ |
| 188 | hive/warehouse/ods_.db/testing_hudi09/.hoodie/.aux/.bootstrap/.partitions/-----0_1-0-1_01.hfile/ |
| 48 | hive/warehouse/ods_.db/testing_hudi09/ |
| 9 | hive/warehouse/ods_.db/testing_hudi09/.hoodie/.temp/20231010140501/ |
| 9 | hive/warehouse/ods_.db/testing_hudi09/.hoodie/.temp/20231010140401/ |
| 9 | hive/warehouse/ods_.db/testing_hudi09/.hoodie/.temp/20231010140301/ |
| 9 | hive/warehouse/ods_.db/testing_hudi09/.hoodie/.temp/20231010140201/ |
| 9 | hive/warehouse/ods_.db/testing_hudi09/.hoodie/.temp/20231010140101/ |
| 5 | hive/warehouse/ods_.db/testing_hudi09/.hoodie/.temp/ |
| 5 | hive/warehouse/ods_.db/testing_hudi09/.hoodie/.heartbeat/ |

Is there any way to reduce the list operations?
If each table's list operations could be reduced by 50%, that would significantly cut the workload, since we have thousands of tables on a locally deployed object storage cluster.
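The per-path counts reported above can be grouped to see where the extra listings come from. The script below is a small sketch that splits them into paths under `.hoodie/metadata/` (the metadata table) and everything else.

Assumptions in the sketch:
- Paths are rewritten relative to each table's base path, with `""` standing for the base path itself.
- The grouping rule, the variable names and the `summarize` helper are illustrative and not part of Hudi.
- Only the top-10 rows were posted, so the totals are lower bounds.

```python
from typing import Dict, List, Tuple

# (list count, path relative to the table base path) as reported for one
# micro-batch; "" is the base path itself. Only the top-10 rows were posted,
# so totals computed from them are lower bounds.
HUDI_0131_MDT_ON: List[Tuple[int, str]] = [
    (329, ".hoodie/metadata/.hoodie/"),
    (229, ".hoodie/"),
    (50, ".hoodie/metadata/files/"),
    (42, ".hoodie/.aux/.bootstrap/.partitions/-----0_1-0-1_01.hfile/"),
    (33, ""),
    (14, ".hoodie/metadata/.hoodie/.temp/"),
    (10, ".hoodie/.temp/20231010140342361/"),
    (9, ".hoodie/.temp/20231010140158325/"),
    (7, ".hoodie/metadata/.hoodie/.temp/20231010140509929/"),
    (7, ".hoodie/metadata/.hoodie/.temp/20231010140342361/"),
]

HUDI_09_MDT_OFF: List[Tuple[int, str]] = [
    (274, ".hoodie/"),
    (188, ".hoodie/.aux/.bootstrap/.partitions/-----0_1-0-1_01.hfile/"),
    (48, ""),
    (9, ".hoodie/.temp/20231010140501/"),
    (9, ".hoodie/.temp/20231010140401/"),
    (9, ".hoodie/.temp/20231010140301/"),
    (9, ".hoodie/.temp/20231010140201/"),
    (9, ".hoodie/.temp/20231010140101/"),
    (5, ".hoodie/.temp/"),
    (5, ".hoodie/.heartbeat/"),
]

# Illustrative grouping rule: anything under this prefix counts as metadata table.
METADATA_PREFIX = ".hoodie/metadata/"


def summarize(rows: List[Tuple[int, str]]) -> Dict[str, int]:
    """Split list counts into metadata-table paths vs. everything else."""
    totals = {"metadata table": 0, "data table": 0}
    for count, path in rows:
        key = "metadata table" if path.startswith(METADATA_PREFIX) else "data table"
        totals[key] += count
    return totals


if __name__ == "__main__":
    for name, rows in [("0.13.1, metadata on", HUDI_0131_MDT_ON),
                       ("0.9, metadata off", HUDI_09_MDT_OFF)]:
        totals = summarize(rows)
        print(name, totals, "total:", sum(totals.values()))
```

What the top-10 rows show:
- **0.13.1:** 407 metadata-table and 323 data-table list calls (730 total).
- **0.9:** 565 data-table list calls.

On this partial sample the data-table portion is lower on 0.13.1. The increase appears to come from the metadata table's own timeline and files, which is consistent with ad1happy2go's observation that the metadata table generated more list calls.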
ad1happy2go commented on issue #9751: URL: https://github.com/apache/hudi/issues/9751#issuecomment-1744541419 @njalan Do you also see similar behaviour for tables that were written only with later versions of Hudi (0.13) and never with 0.9?