Re: [PR] [HUDI-7156] Abstract an independent hoodie table filesystem view lock [hudi]
zhuanshenbsj1 closed pull request #10197: [HUDI-7156] Abstract an independent hoodie table filesystem view lock URL: https://github.com/apache/hudi/pull/10197 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [SUPPORT]The number of tasks in each distinct stage of building workload profile is always 60 [hudi]
ad1happy2go commented on issue #10972: URL: https://github.com/apache/hudi/issues/10972#issuecomment-2041903783 @MrAladdin Can you provide the writer configurations you are using?
Re: [PR] [HUDI-7559] Fix RecordLevelIndexSupport::filterQueryWithRecordKey [hudi]
hudi-bot commented on PR #10947: URL: https://github.com/apache/hudi/pull/10947#issuecomment-2041902574 ## CI report: * 0c84a761b5f5378bcd51d987c9f29b1f649cf820 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23121) * eb6439ab3e1f95e90411c64afd1e5ef636dbeacc Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23146) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [I] [SUPPORT] Could not sync using the meta sync class org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool [hudi]
ad1happy2go commented on issue #10968: URL: https://github.com/apache/hudi/issues/10968#issuecomment-2041901636 @mattssll It looks like the Hudi bundle jar is not on the classpath. Can you share the details of the Hudi jars configured on the EKS cluster?
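One quick way to confirm whether the sync tool class is visible at runtime is a reflective lookup. The following is a minimal sketch, assuming it is run with the same classpath as the failing job; `ClasspathCheck` and `isLoadable` are hypothetical names for illustration and are not part of Hudi. Only the class name in `main` comes from the issue title.

```java
// Hypothetical helper, not part of Hudi: reports whether a class can be
// located by the class loader that the current thread would use.
final class ClasspathCheck {

    static boolean isLoadable(String className) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            // Fall back to the loader that loaded this helper.
            loader = ClasspathCheck.class.getClassLoader();
        }
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Covers a missing class as well as linkage failures while loading it.
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the issue title; pass another name as the first argument.
        String name = args.length > 0 ? args[0] : "org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool";
        System.out.println(name + " loadable: " + isLoadable(name));
    }
}
```

If the lookup reports that the class is not loadable, the bundle jar that ships it is probably missing from the classpath the job actually uses, which is what the question about the configured jars is trying to establish.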
Re: [PR] [HUDI-7559] Fix RecordLevelIndexSupport::filterQueryWithRecordKey [hudi]
hudi-bot commented on PR #10947: URL: https://github.com/apache/hudi/pull/10947#issuecomment-2041895291 ## CI report: * 0c84a761b5f5378bcd51d987c9f29b1f649cf820 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23121) * eb6439ab3e1f95e90411c64afd1e5ef636dbeacc UNKNOWN
Re: [PR] [HUDI-7230] stream read supports skipping insert overwrite instant [hudi]
hudi-bot commented on PR #10328: URL: https://github.com/apache/hudi/pull/10328#issuecomment-2041886839 ## CI report: * cc1e51661db71367549360a1e89b6bd22ea24d8a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23143)
Re: [PR] [HUDI-7576] add partitionPath as an instance variable to HoodieBaseFile and HoodieLogFile [hudi]
hudi-bot commented on PR #10975: URL: https://github.com/apache/hudi/pull/10975#issuecomment-2041848631 ## CI report: * 422c9c52dbae120f9eb7498a7ae682003b8dcc22 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23142) * 7dd1a19f792e3c0b8708de63e0b83810af709ce3 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23145)
Re: [PR] [HUDI-7576] add partitionPath as an instance variable to HoodieBaseFile and HoodieLogFile [hudi]
the-other-tim-brown commented on PR #10975:
URL: https://github.com/apache/hudi/pull/10975#issuecomment-2041848520

> Just from the `File` notion, don't think we should take any partition related variables into it. Is there any solution for the optimization without modifying these two abstractions?

Can you explain why? There is other metadata already associated with these objects like file group and commit.
Re: [PR] [HUDI-7576] add partitionPath as an instance variable to HoodieBaseFile and HoodieLogFile [hudi]
hudi-bot commented on PR #10975: URL: https://github.com/apache/hudi/pull/10975#issuecomment-2041842959 ## CI report: * 422c9c52dbae120f9eb7498a7ae682003b8dcc22 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23142) * 7dd1a19f792e3c0b8708de63e0b83810af709ce3 UNKNOWN
Re: [PR] [MINOR] Hudi CLI 'version' command output empty string [hudi]
danny0405 commented on PR #10973: URL: https://github.com/apache/hudi/pull/10973#issuecomment-2041837537 Can you check the CI failures?
Re: [PR] [HUDI-7575] avoid repeated fetching of pending replace instants [hudi]
hudi-bot commented on PR #10976: URL: https://github.com/apache/hudi/pull/10976#issuecomment-2041837370 ## CI report: * db99bbcc7ede1bb1372a7996c25cfb54c1069a49 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23144)
Re: [PR] [HUDI-7576] add partitionPath as an instance variable to HoodieBaseFile and HoodieLogFile [hudi]
hudi-bot commented on PR #10975: URL: https://github.com/apache/hudi/pull/10975#issuecomment-2041837346 ## CI report: * 422c9c52dbae120f9eb7498a7ae682003b8dcc22 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23142)
Re: [PR] [HUDI-7230] stream read supports skipping insert overwrite instant [hudi]
hudi-bot commented on PR #10328: URL: https://github.com/apache/hudi/pull/10328#issuecomment-2041836494 ## CI report: * c1d0db68c7e167de36d6090348bee9864191ac1d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21514) * cc1e51661db71367549360a1e89b6bd22ea24d8a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23143)
[jira] [Updated] (HUDI-6330) Update user document to introduce this feature
[ https://issues.apache.org/jira/browse/HUDI-6330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-6330:
---------------------------------
    Labels: pull-request-available  (was: )

> Update user document to introduce this feature
> ----------------------------------------------
>
> Key: HUDI-6330
> URL: https://issues.apache.org/jira/browse/HUDI-6330
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: docs, flink
> Reporter: Jing Zhang
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.15.0

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[PR] [HUDI-6330][DOCS] Update user doc to show how to use consistent bucket index for Flink engine [hudi]
beyond1920 opened a new pull request, #10977:
URL: https://github.com/apache/hudi/pull/10977

### Change Logs

Update user doc to show how to use consistent bucket index for Flink engine

### Impact

None

### Risk level (write none, low medium or high below)

None

### Documentation Update

None

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
Re: [PR] [HUDI-7575] avoid repeated fetching of pending replace instants [hudi]
hudi-bot commented on PR #10976: URL: https://github.com/apache/hudi/pull/10976#issuecomment-2041804155 ## CI report: * db99bbcc7ede1bb1372a7996c25cfb54c1069a49 UNKNOWN
Re: [PR] [HUDI-7230] stream read supports skipping insert overwrite instant [hudi]
hudi-bot commented on PR #10328: URL: https://github.com/apache/hudi/pull/10328#issuecomment-2041803428 ## CI report: * c1d0db68c7e167de36d6090348bee9864191ac1d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21514) * cc1e51661db71367549360a1e89b6bd22ea24d8a UNKNOWN
Re: [PR] [HUDI-7576] add partitionPath as an instance variable to HoodieBaseFile and HoodieLogFile [hudi]
hudi-bot commented on PR #10975: URL: https://github.com/apache/hudi/pull/10975#issuecomment-2041804102 ## CI report: * 51e070c635f7207a2b77ba1896ecd29694f54eae Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23141) * 422c9c52dbae120f9eb7498a7ae682003b8dcc22 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23142)
Re: [PR] [HUDI-7572] Avoid to schedule empty compaction plan without log files [hudi]
hudi-bot commented on PR #10974: URL: https://github.com/apache/hudi/pull/10974#issuecomment-2041804074 ## CI report: * 451064a8002cf5544e764624a31efe7d64671406 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23140)
Re: [PR] [HUDI-7576] add partitionPath as an instance variable to HoodieBaseFile and HoodieLogFile [hudi]
hudi-bot commented on PR #10975: URL: https://github.com/apache/hudi/pull/10975#issuecomment-2041798819 ## CI report: * 51e070c635f7207a2b77ba1896ecd29694f54eae Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23141) * 422c9c52dbae120f9eb7498a7ae682003b8dcc22 UNKNOWN
Re: [PR] [HUDI-7230] stream read supports skipping insert overwrite instant [hudi]
zhuanshenbsj1 commented on PR #10328: URL: https://github.com/apache/hudi/pull/10328#issuecomment-2041798064 cc @danny0405
[jira] [Updated] (HUDI-7575) Avoid recomputing list of pending replacecommits in FSView code
[ https://issues.apache.org/jira/browse/HUDI-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-7575:
---------------------------------
    Labels: pull-request-available  (was: )

> Avoid recomputing list of pending replacecommits in FSView code
> ---------------------------------------------------------------
>
> Key: HUDI-7575
> URL: https://issues.apache.org/jira/browse/HUDI-7575
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Timothy Brown
> Assignee: Timothy Brown
> Priority: Major
> Labels: pull-request-available
>
> When checking if a base file is part of a pending clustering, the code will
> construct the same list repeatedly leading to unnecessary overhead. The class
> should gather this list once and persist it.
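The "gather this list once and persist it" idea from the issue can be sketched in plain Java. This is an illustration under assumptions, not the change made in Hudi: `PendingClusteringCache`, its loader argument, and the string file group IDs are all hypothetical stand-ins for the real FSView types.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical stand-in for a file-system view; not Hudi's actual class.
final class PendingClusteringCache {

    private final Set<String> pendingFileGroupIds;

    // The loader is invoked exactly once, here. Every later check reuses the
    // stored set instead of rebuilding the list of pending instants.
    PendingClusteringCache(Supplier<List<String>> pendingFileGroupLoader) {
        this.pendingFileGroupIds = new HashSet<>(pendingFileGroupLoader.get());
    }

    // Constant-time membership check against the cached set.
    boolean isPendingClustering(String fileGroupId) {
        return pendingFileGroupIds.contains(fileGroupId);
    }
}
```

Storing the result in a `HashSet` is a choice made for the example; it turns each per-file check into a lookup rather than a scan, on the assumption that the set of pending instants does not change during the lifetime of the view object.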
Re: [PR] [MINOR] Hudi CLI 'version' command output empty string [hudi]
pt657407064 commented on code in PR #10973:
URL: https://github.com/apache/hudi/pull/10973#discussion_r1555183250

## hudi-cli/src/main/resources/application.yml: ##
@@ -20,4 +20,7 @@ spring:
 shell:
 history:
 enabled: true
- name: hoodie-cmd.log
\ No newline at end of file
+ name: hoodie-cmd.log
+command:
+ version:
+template: "classpath:version.txt"

Review Comment:
It was working until this file was removed as of the Hudi 0.11.1 release. https://github.com/apache/hudi/blob/release-0.11.0/hudi-cli/src/main/java/org/apache/hudi/cli/HoodieSplashScreen.java#L60
[PR] [HUDI-7575] avoid repeated fetching of pending replace instants [hudi]
the-other-tim-brown opened a new pull request, #10976:
URL: https://github.com/apache/hudi/pull/10976

### Change Logs

Avoids repeatedly creating a timeline with pending replace instants when creating the FSView.

### Impact

- Lowers the overhead of creating the FSView in terms of objects created

### Risk level (write none, low medium or high below)

Low

### Documentation Update

_Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none"._

- _The config description must be updated if new configs are added or the default value of the configs are changed_
- _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
Re: [PR] [MINOR] Hudi CLI 'version' command output empty string [hudi]
pt657407064 commented on code in PR #10973:
URL: https://github.com/apache/hudi/pull/10973#discussion_r1555183250

## hudi-cli/src/main/resources/application.yml: ##
@@ -20,4 +20,7 @@ spring:
 shell:
 history:
 enabled: true
- name: hoodie-cmd.log
\ No newline at end of file
+ name: hoodie-cmd.log
+command:
+ version:
+template: "classpath:version.txt"

Review Comment:
It was working until this file was removed from the Hudi release. https://github.com/apache/hudi/blob/release-0.11.0/hudi-cli/src/main/java/org/apache/hudi/cli/HoodieSplashScreen.java#L60
Re: [PR] [HUDI-7576] add partitionPath as an instance variable to HoodieBaseFile and HoodieLogFile [hudi]
hudi-bot commented on PR #10975: URL: https://github.com/apache/hudi/pull/10975#issuecomment-2041793570 ## CI report: * 51e070c635f7207a2b77ba1896ecd29694f54eae UNKNOWN
Re: [PR] [MINOR] Hudi CLI 'version' command output empty string [hudi]
pt657407064 commented on code in PR #10973:
URL: https://github.com/apache/hudi/pull/10973#discussion_r1555181705

## hudi-cli/src/main/resources/application.yml: ##
@@ -20,4 +20,7 @@ spring:
 shell:
 history:
 enabled: true
- name: hoodie-cmd.log
\ No newline at end of file
+ name: hoodie-cmd.log
+command:
+ version:
+template: "classpath:version.txt"

Review Comment:
Yes, the `version` command in the Hudi CLI is not showing the current Hudi version.
Re: [PR] [MINOR] Hudi CLI 'version' command output empty string [hudi]
pt657407064 commented on code in PR #10973:
URL: https://github.com/apache/hudi/pull/10973#discussion_r1555181705

## hudi-cli/src/main/resources/application.yml: ##
@@ -20,4 +20,7 @@ spring:
 shell:
 history:
 enabled: true
- name: hoodie-cmd.log
\ No newline at end of file
+ name: hoodie-cmd.log
+command:
+ version:
+template: "classpath:version.txt"

Review Comment:
The `version` command in the CLI is not showing the current Hudi version.
[jira] [Updated] (HUDI-7576) Add partitionPath to the HoodieBaseFile and HoodieLogFile objects
[ https://issues.apache.org/jira/browse/HUDI-7576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-7576:
---------------------------------
    Labels: pull-request-available  (was: )

> Add partitionPath to the HoodieBaseFile and HoodieLogFile objects
> -----------------------------------------------------------------
>
> Key: HUDI-7576
> URL: https://issues.apache.org/jira/browse/HUDI-7576
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Timothy Brown
> Assignee: Timothy Brown
> Priority: Major
> Labels: pull-request-available
>
> Adding this field to the classes will allow us to avoid repeatedly computing
> the partition path per file in other parts of the code. This can cut down on
> the CPU overhead associated with creating the FS View.
[PR] [HUDI-7576] add partitionPath as an instance variable to HoodieBaseFile and HoodieLogFile [hudi]
the-other-tim-brown opened a new pull request, #10975:
URL: https://github.com/apache/hudi/pull/10975

### Change Logs

- Adds partitionPath string to the HoodieBaseFile and HoodieLogFile to avoid computing it multiple times for a single instance of these files.
- Minor optimization on partition path computation in case where CachingPath is not used

### Impact

- Reduces overhead of constructing FSViews. We see a non-negligible amount of CPU time spent on computing the partition path for each file when it can actually simply be taken in as an input in some cases.

### Risk level (write none, low medium or high below)

Low

### Documentation Update

_Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none"._

- _The config description must be updated if new configs are added or the default value of the configs are changed_
- _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
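The optimization described in the PR (carry the partition path on the file object instead of recomputing it) can be sketched roughly as follows. This is a hedged illustration only: `FileWithPartition`, `fromBasePath`, and `relativePartition` are hypothetical names, paths are plain strings, and none of this mirrors the actual `HoodieBaseFile` or `HoodieLogFile` code.

```java
// Hypothetical file descriptor, loosely modelled on the idea in the PR.
final class FileWithPartition {

    private final String fullPath;
    private final String partitionPath;

    // The caller already knows the partition (for example, it listed files
    // per partition), so the value is taken as an input and stored as-is.
    FileWithPartition(String fullPath, String partitionPath) {
        this.fullPath = fullPath;
        this.partitionPath = partitionPath;
    }

    // Fallback: derive the partition once, at construction time.
    static FileWithPartition fromBasePath(String basePath, String fullPath) {
        return new FileWithPartition(fullPath, relativePartition(basePath, fullPath));
    }

    // Strips the table base path and the file name, leaving the partition part.
    static String relativePartition(String basePath, String fullPath) {
        String prefix = basePath.endsWith("/") ? basePath : basePath + "/";
        if (!fullPath.startsWith(prefix)) {
            throw new IllegalArgumentException(fullPath + " is not under " + basePath);
        }
        String relative = fullPath.substring(prefix.length());
        int lastSlash = relative.lastIndexOf('/');
        // A file directly under the base path belongs to the empty partition.
        return lastSlash < 0 ? "" : relative.substring(0, lastSlash);
    }

    String getFullPath() {
        return fullPath;
    }

    // No string manipulation here: repeated calls return the stored value.
    String getPartitionPath() {
        return partitionPath;
    }
}
```

Either construction path does the string work at most once per object, so code that asks for the partition of the same file many times pays only for a field read.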
Re: [PR] [HUDI-7572] Avoid to schedule empty compaction plan without log files [hudi]
hudi-bot commented on PR #10974: URL: https://github.com/apache/hudi/pull/10974#issuecomment-2041758073 ## CI report: * 451064a8002cf5544e764624a31efe7d64671406 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23140)
Re: [PR] [MINOR] Hudi CLI 'version' command output empty string [hudi]
hudi-bot commented on PR #10973: URL: https://github.com/apache/hudi/pull/10973#issuecomment-2041758041 ## CI report: * 6827a922b5eae447c97a294cc9f5f9520761bb10 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23139)
[jira] [Resolved] (HUDI-6854) Change default payload type to HOODIE_AVRO_DEFAULT
[ https://issues.apache.org/jira/browse/HUDI-6854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Danny Chen resolved HUDI-6854.
------------------------------

> Change default payload type to HOODIE_AVRO_DEFAULT
> --------------------------------------------------
>
> Key: HUDI-6854
> URL: https://issues.apache.org/jira/browse/HUDI-6854
> Project: Apache Hudi
> Issue Type: Task
> Reporter: Sagar Sumit
> Assignee: Vova Kolmakov
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.0.0
>
> Current default is OVERWRITE_LATEST which instantiates
> OverwriteWithLatestAvroPayload but it's not intuitive when latest gets
> written and user sets some precombine field and expects to merge records
> based on that field.
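The two behaviours contrasted in the issue can be illustrated with plain Java. This is a sketch under assumptions: `SketchRecord` and both merge methods are hypothetical stand-ins rather than Hudi's payload classes, and the tie-breaking rule is a choice made for the example.

```java
// Hypothetical record: a key, an ordering ("precombine") value, and a payload.
final class SketchRecord {
    final String key;
    final long orderingValue;
    final String value;

    SketchRecord(String key, long orderingValue, String value) {
        this.key = key;
        this.orderingValue = orderingValue;
        this.value = value;
    }
}

final class MergeSketch {

    // "Latest write wins": the incoming record replaces the stored one
    // unconditionally, whatever its ordering value is.
    static SketchRecord overwriteWithLatest(SketchRecord stored, SketchRecord incoming) {
        return incoming;
    }

    // Ordering-aware merge: keep the record with the larger ordering value.
    // Ties go to the incoming record (an assumption of this sketch).
    static SketchRecord mergeByOrderingValue(SketchRecord stored, SketchRecord incoming) {
        return incoming.orderingValue >= stored.orderingValue ? incoming : stored;
    }
}
```

With a stored record whose ordering value is 10 and a late-arriving record whose ordering value is 5, the first method returns the late arrival and the second keeps the stored record. The second outcome is what the issue says a user expects after setting a precombine field.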
[jira] [Closed] (HUDI-6854) Change default payload type to HOODIE_AVRO_DEFAULT
[ https://issues.apache.org/jira/browse/HUDI-6854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Danny Chen closed HUDI-6854.
----------------------------
    Resolution: Fixed

Fixed via master branch: 5519e9c13b3563760e44712112f9bf93faa4b40e

> Change default payload type to HOODIE_AVRO_DEFAULT
> --------------------------------------------------
>
> Key: HUDI-6854
> URL: https://issues.apache.org/jira/browse/HUDI-6854
> Project: Apache Hudi
> Issue Type: Task
> Reporter: Sagar Sumit
> Assignee: Vova Kolmakov
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.0.0
>
> Current default is OVERWRITE_LATEST which instantiates
> OverwriteWithLatestAvroPayload but it's not intuitive when latest gets
> written and user sets some precombine field and expects to merge records
> based on that field.
(hudi) branch master updated (b487e9826d0 -> 5519e9c13b3)
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git

    from b487e9826d0 [MINOR] Removed FSUtils.makeBaseFileName without fileExt param (#10967)
     add 5519e9c13b3 [HUDI-6854] Change default payload type to HOODIE_AVRO_DEFAULT (#10949)

No new revisions were added by this update.

Summary of changes:
 .../src/main/java/org/apache/hudi/config/HoodiePayloadConfig.java | 4 ++--
 .../src/main/java/org/apache/hudi/config/HoodieWriteConfig.java | 6 +++---
 .../org/apache/hudi/common/model/DefaultHoodieRecordPayload.java | 4 +++-
 .../apache/hudi/common/model/OverwriteWithLatestAvroPayload.java | 2 --
 .../main/java/org/apache/hudi/common/model/RecordPayloadType.java | 2 +-
 .../main/java/org/apache/hudi/common/table/HoodieTableConfig.java | 6 +++---
 .../scala/org/apache/spark/sql/hudi/ProvidesHoodieConfig.scala | 8
 .../org/apache/hudi/functional/TestHiveTableSchemaEvolution.java | 3 ++-
 .../org/apache/hudi/functional/TestBasicSchemaEvolution.scala | 6 --
 .../org/apache/spark/sql/hudi/common/TestHoodieOptionConfig.scala | 4 ++--
 .../test/scala/org/apache/spark/sql/hudi/ddl/TestSpark3DDL.scala | 7 ++-
 11 files changed, 30 insertions(+), 22 deletions(-)
Re: [PR] [HUDI-7572] Avoid to schedule empty compaction plan without log files [hudi]
hudi-bot commented on PR #10974: URL: https://github.com/apache/hudi/pull/10974#issuecomment-2041752216 ## CI report: * 451064a8002cf5544e764624a31efe7d64671406 UNKNOWN
Re: [PR] [HUDI-6854] Change default payload type to HOODIE_AVRO_DEFAULT [hudi]
danny0405 merged PR #10949: URL: https://github.com/apache/hudi/pull/10949
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
danny0405 commented on code in PR #10860:
URL: https://github.com/apache/hudi/pull/10860#discussion_r1555159066

## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java: ##
@@ -991,9 +1001,9 @@ private void updateFunctionalIndexIfPresent(HoodieCommitMetadata commitMetadata,
 private HoodieData getFunctionalIndexUpdates(HoodieCommitMetadata commitMetadata, String indexPartition, String instantTime) throws Exception {
 HoodieFunctionalIndexDefinition indexDefinition = getFunctionalIndexDefinition(indexPartition);
 List> partitionFileSlicePairs = new ArrayList<>();
-HoodieTableFileSystemView fsView = HoodieTableMetadataUtil.getFileSystemView(metadataMetaClient);
+HoodieTableFileSystemView fsView = HoodieTableMetadataUtil.getFileSystemView(dataMetaClient);
 commitMetadata.getPartitionToWriteStats().forEach((dataPartition, value) -> {
- List fileSlices = getPartitionLatestFileSlicesIncludingInflight(metadataMetaClient, Option.ofNullable(fsView), dataPartition);
+ List fileSlices = getPartitionLatestFileSlicesIncludingInflight(dataMetaClient, Option.ofNullable(fsView), dataPartition);

Review Comment:
> Not following you here. In multi-writer scenario, there is a new latest file slice due to instant t1

What if this client triggers cleaning for t0 with a very aggressive strategy while we are doing this loading check?
Re: [PR] [MINOR] upgrade to maven-surefire-plugin 3.2.5 [hudi]
danny0405 commented on PR #10969: URL: https://github.com/apache/hudi/pull/10969#issuecomment-2041747438 It looks like the 3.2.5 surefire plugin has some validation check: ```java Caused by: org.apache.maven.plugin.MojoFailureException: No tests matching pattern "skipJavaTests" were executed! (Set -Dsurefire.failIfNoSpecifiedTests=false to ignore this error.) at org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute (AbstractSurefireMojo.java:902) ```
(hudi) branch master updated (56142a0ff61 -> b487e9826d0)
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from 56142a0ff61 [MINOR] use Temurin jdk (#10948) add b487e9826d0 [MINOR] Removed FSUtils.makeBaseFileName without fileExt param (#10967) No new revisions were added by this update. Summary of changes: .../org/apache/hudi/client/BaseHoodieClient.java | 28 ++- .../hudi/client/BaseHoodieTableServiceClient.java | 57 ++ .../apache/hudi/client/BaseHoodieWriteClient.java | 24 - .../apache/hudi/client/HoodieJavaWriteClient.java | 22 - 4 files changed, 43 insertions(+), 88 deletions(-)
Re: [PR] [MINOR] Removed code duplicates in HoodieClients [hudi]
danny0405 merged PR #10967: URL: https://github.com/apache/hudi/pull/10967
Re: [PR] [MINOR] Hudi CLI 'version' command output empty string [hudi]
danny0405 commented on code in PR #10973: URL: https://github.com/apache/hudi/pull/10973#discussion_r1555154882 ## hudi-cli/src/main/resources/application.yml: ## @@ -20,4 +20,7 @@ spring: shell: history: enabled: true - name: hoodie-cmd.log \ No newline at end of file + name: hoodie-cmd.log +command: + version: +template: "classpath:version.txt" Review Comment: Not very familiar with the Spring stuff, are we fixing the spring cmd?
[jira] [Created] (HUDI-7577) Avoid MDT compaction instant time conflicts
Danny Chen created HUDI-7577: Summary: Avoid MDT compaction instant time conflicts Key: HUDI-7577 URL: https://issues.apache.org/jira/browse/HUDI-7577 Project: Apache Hudi Issue Type: Improvement Components: core Reporter: Danny Chen Assignee: Danny Chen Fix For: 1.0.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7577) Avoid MDT compaction instant time conflicts
[ https://issues.apache.org/jira/browse/HUDI-7577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-7577: - Sprint: Sprint 2024-03-25 > Avoid MDT compaction instant time conflicts > --- > > Key: HUDI-7577 > URL: https://issues.apache.org/jira/browse/HUDI-7577 > Project: Apache Hudi > Issue Type: Improvement > Components: core >Reporter: Danny Chen >Assignee: Danny Chen >Priority: Major > Fix For: 1.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7572) Avoid to schedule empty compaction plan without log files
[ https://issues.apache.org/jira/browse/HUDI-7572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-7572: - Status: Patch Available (was: In Progress) > Avoid to schedule empty compaction plan without log files > - > > Key: HUDI-7572 > URL: https://issues.apache.org/jira/browse/HUDI-7572 > Project: Apache Hudi > Issue Type: Improvement > Components: table-service >Reporter: Danny Chen >Assignee: Danny Chen >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > > After the change to [loosen the compaction for > MDT|https://issues.apache.org/jira/browse/HUDI-7572], there is a rare case where the > same compaction instant time gets used for scheduling multiple times; we > had better optimize the compactor to avoid generating empty compaction plans. > Note: although we have an active timeline check to avoid the repetitive > scheduling, there is still a small chance that the compaction has already been archived. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7572) Avoid to schedule empty compaction plan without log files
[ https://issues.apache.org/jira/browse/HUDI-7572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7572: - Labels: pull-request-available (was: ) > Avoid to schedule empty compaction plan without log files > - > > Key: HUDI-7572 > URL: https://issues.apache.org/jira/browse/HUDI-7572 > Project: Apache Hudi > Issue Type: Improvement > Components: table-service >Reporter: Danny Chen >Assignee: Danny Chen >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > > After the change to [loosen the compaction for > MDT|https://issues.apache.org/jira/browse/HUDI-7572], there is a rare case where the > same compaction instant time gets used for scheduling multiple times; we > had better optimize the compactor to avoid generating empty compaction plans. > Note: although we have an active timeline check to avoid the repetitive > scheduling, there is still a small chance that the compaction has already been archived. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[PR] [HUDI-7572] Avoid to schedule empty compaction plan without log files [hudi]
danny0405 opened a new pull request, #10974: URL: https://github.com/apache/hudi/pull/10974 ### Change Logs If there are no log files in the compaction plan, skip the compaction. ### Impact none ### Risk level (write none, low medium or high below) none ### Documentation Update none ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
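The check described in the change log above could look roughly like the following. This is only a minimal sketch, not the code from PR #10974: `CompactionPlanCheck`, `Operation`, and `hasWorkToDo` are hypothetical names invented for illustration, and the real Hudi compaction plan types are not used here.

```java
import java.util.List;

final class CompactionPlanCheck {
  /** Hypothetical stand-in for one compaction operation: a base file plus its delta log files. */
  static final class Operation {
    final String baseFile;
    final List<String> logFiles;

    Operation(String baseFile, List<String> logFiles) {
      this.baseFile = baseFile;
      this.logFiles = logFiles;
    }
  }

  /** Returns true only if at least one operation carries log files to merge. */
  static boolean hasWorkToDo(List<Operation> operations) {
    return operations.stream().anyMatch(op -> !op.logFiles.isEmpty());
  }
}
```

Under these assumptions, a scheduler would call `hasWorkToDo` on the candidate operations and skip the plan when it returns false.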
Re: [PR] [MINOR] Hudi CLI 'version' command output empty string [hudi]
hudi-bot commented on PR #10973: URL: https://github.com/apache/hudi/pull/10973#issuecomment-2041711400 ## CI report: * 6827a922b5eae447c97a294cc9f5f9520761bb10 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23139) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [MINOR] Hudi CLI 'version' command output empty string [hudi]
hudi-bot commented on PR #10973: URL: https://github.com/apache/hudi/pull/10973#issuecomment-2041705454 ## CI report: * 6827a922b5eae447c97a294cc9f5f9520761bb10 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[jira] [Updated] (HUDI-7572) Avoid to schedule empty compaction plan without log files
[ https://issues.apache.org/jira/browse/HUDI-7572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-7572: - Status: In Progress (was: Open) > Avoid to schedule empty compaction plan without log files > - > > Key: HUDI-7572 > URL: https://issues.apache.org/jira/browse/HUDI-7572 > Project: Apache Hudi > Issue Type: Improvement > Components: table-service >Reporter: Danny Chen >Assignee: Danny Chen >Priority: Major > Fix For: 1.0.0 > > > After the change to [loosen the compaction for > MDT|https://issues.apache.org/jira/browse/HUDI-7572], there is a rare case where the > same compaction instant time gets used for scheduling multiple times; we > had better optimize the compactor to avoid generating empty compaction plans. > Note: although we have an active timeline check to avoid the repetitive > scheduling, there is still a small chance that the compaction has already been archived. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[PR] [MINOR] Hudi CLI version command output empty string [hudi]
pt657407064 opened a new pull request, #10973: URL: https://github.com/apache/hudi/pull/10973 ### Change Logs The Hudi CLI version command outputs an empty string. This change adds property files so that the command outputs the version number matching the Hudi parent project version. ### Impact The empty output confuses users, who cannot tell which version they are running. ### Risk level (write none, low medium or high below) _If medium or high, explain what verification was done to mitigate the risks._ ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none"._ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
[I] [SUPPORT] [hudi]
MrAladdin opened a new issue, #10972: URL: https://github.com/apache/hudi/issues/10972 **Describe the problem you faced** Spark Structured Streaming upsert into Hudi (MOR, RECORD_INDEX) is very time consuming: 1. The number of tasks in each distinct stage of building the workload profile is always 60, and there is severe data skew. I want to know why it is always 60, how to adjust it, the reasons for the data skew, and how to optimize it. I have done my best. **Environment Description** * Hudi version: 0.14.1 * Spark version: 3.4.1 * Hive version: 3.1.2 * Hadoop version: 3.1.3 * Storage (HDFS/S3/GCS..): hdfs * Running on Docker? (yes/no): no
Re: [PR] [DOCS] Update blogs [hudi]
bhasudha commented on PR #10971: URL: https://github.com/apache/hudi/pull/10971#issuecomment-2041639002 Tested locally. The images won't be loaded until the site is published. https://github.com/apache/hudi/assets/2179254/c15c0646-cea5-47dd-bc20-b3d4224a9845 https://github.com/apache/hudi/assets/2179254/78cef6c8-1075-49d8-8282-d0d0273ce87d
[PR] [DOCS] Update blogs [hudi]
bhasudha opened a new pull request, #10971: URL: https://github.com/apache/hudi/pull/10971 ### Change Logs added new blogs to site ### Impact low. site updates ### Risk level (write none, low medium or high below) none. site updates ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none"._ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
[jira] [Created] (HUDI-7576) Add partitionPath to the HoodieBaseFile and HoodieLogFile objects
Timothy Brown created HUDI-7576: --- Summary: Add partitionPath to the HoodieBaseFile and HoodieLogFile objects Key: HUDI-7576 URL: https://issues.apache.org/jira/browse/HUDI-7576 Project: Apache Hudi Issue Type: Improvement Reporter: Timothy Brown Adding this field to the classes will allow us to avoid repeatedly computing the partition path per file in other parts of the code. This can cut down on the CPU overhead associated with creating the FS View. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-7575) Avoid recomputing list of pending replacecommits in FSView code
Timothy Brown created HUDI-7575: --- Summary: Avoid recomputing list of pending replacecommits in FSView code Key: HUDI-7575 URL: https://issues.apache.org/jira/browse/HUDI-7575 Project: Apache Hudi Issue Type: Improvement Reporter: Timothy Brown When checking if a base file is part of a pending clustering, the code will construct the same list repeatedly leading to unnecessary overhead. The class should gather this list once and persist it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
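The "gather this list once and persist it" idea in HUDI-7575 amounts to lazy memoization. Below is a minimal sketch under stated assumptions: `PendingClusteringLookup` and its `loader` are hypothetical names, file groups are represented as plain strings, and nothing here reflects the actual FSView classes or the eventual fix.

```java
import java.util.Set;
import java.util.function.Supplier;

final class PendingClusteringLookup {
  // Hypothetical loader that scans the timeline for pending replacecommits (expensive).
  private final Supplier<Set<String>> loader;
  // Cached result; null until the first lookup.
  private Set<String> pendingFileIds;

  PendingClusteringLookup(Supplier<Set<String>> loader) {
    this.loader = loader;
  }

  /** Loads the pending set on first use, then answers every later lookup from the cache. */
  synchronized boolean isPendingClustering(String fileId) {
    if (pendingFileIds == null) {
      pendingFileIds = loader.get();
    }
    return pendingFileIds.contains(fileId);
  }
}
```

The trade-off of such a cache is staleness: a real implementation would presumably need to invalidate it whenever the timeline is reloaded.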
[jira] [Assigned] (HUDI-7576) Add partitionPath to the HoodieBaseFile and HoodieLogFile objects
[ https://issues.apache.org/jira/browse/HUDI-7576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Brown reassigned HUDI-7576: --- Assignee: Timothy Brown > Add partitionPath to the HoodieBaseFile and HoodieLogFile objects > - > > Key: HUDI-7576 > URL: https://issues.apache.org/jira/browse/HUDI-7576 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Timothy Brown >Assignee: Timothy Brown >Priority: Major > > Adding this field to the classes will allow us to avoid repeatedly computing > the partition path per file in other parts of the code. This can cut down on > the CPU overhead associated with creating the FS View. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HUDI-7575) Avoid recomputing list of pending replacecommits in FSView code
[ https://issues.apache.org/jira/browse/HUDI-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Brown reassigned HUDI-7575: --- Assignee: Timothy Brown > Avoid recomputing list of pending replacecommits in FSView code > --- > > Key: HUDI-7575 > URL: https://issues.apache.org/jira/browse/HUDI-7575 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Timothy Brown >Assignee: Timothy Brown >Priority: Major > > When checking if a base file is part of a pending clustering, the code will > construct the same list repeatedly leading to unnecessary overhead. The class > should gather this list once and persist it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [HUDI-6441] Passing custom Headers with Hudi Callback URL [hudi]
hudi-bot commented on PR #10970: URL: https://github.com/apache/hudi/pull/10970#issuecomment-2041536022 ## CI report: * 1b65081255315b4c5129b2d5ccea4c097ca15649 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23137) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7391] HoodieMetadataMetrics should use Metrics instance for metrics registry [hudi]
hudi-bot commented on PR #10635: URL: https://github.com/apache/hudi/pull/10635#issuecomment-2041535838 ## CI report: * a6b4e7f80ed04f25241504c833f9b85b4331f1fd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23138) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7391] HoodieMetadataMetrics should use Metrics instance for metrics registry [hudi]
hudi-bot commented on PR #10635: URL: https://github.com/apache/hudi/pull/10635#issuecomment-2041523639 ## CI report: * 52eacd02a772c9a06d92784c1b325e6ac0f66da9 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22412) * a6b4e7f80ed04f25241504c833f9b85b4331f1fd Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23138) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7391] HoodieMetadataMetrics should use Metrics instance for metrics registry [hudi]
hudi-bot commented on PR #10635: URL: https://github.com/apache/hudi/pull/10635#issuecomment-2041521846 ## CI report: * 52eacd02a772c9a06d92784c1b325e6ac0f66da9 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22412) * a6b4e7f80ed04f25241504c833f9b85b4331f1fd UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-6441] Passing custom Headers with Hudi Callback URL [hudi]
hudi-bot commented on PR #10970: URL: https://github.com/apache/hudi/pull/10970#issuecomment-2041508051 ## CI report: * 1b65081255315b4c5129b2d5ccea4c097ca15649 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23137) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-6441] Passing custom Headers with Hudi Callback URL [hudi]
hudi-bot commented on PR #10970: URL: https://github.com/apache/hudi/pull/10970#issuecomment-2041505684 ## CI report: * 1b65081255315b4c5129b2d5ccea4c097ca15649 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[PR] [HUDI-6441] Passing custom Headers with Hudi Callback URL [hudi]
wombatu-kun opened a new pull request, #10970: URL: https://github.com/apache/hudi/pull/10970 ### Change Logs Hudi callback URLs don't support passing custom headers as of now. Implemented a way to pass them and use them for the callback: - added config param `hoodie.write.commit.callback.http.custom.headers` to HoodieWriteConfig (HoodieWriteCommitCallbackConfig); - in this config param the user can set all custom headers in the form: `header_name1:value 1;header_name2:value2`; - this string is parsed and sent as HTTP headers with the callback request. ### Impact none ### Risk level (write none, low medium or high below) none ### Documentation Update Documentation update: add config property `hoodie.write.commit.callback.http.custom.headers` - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
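To make the `header_name1:value 1;header_name2:value2` format concrete, here is a minimal parsing sketch. It is an assumption about how such a string might be handled, not the parser from PR #10970: `CallbackHeaderParser` is a hypothetical name, and choices such as splitting on the first colon and silently skipping malformed entries are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CallbackHeaderParser {
  /** Parses "name1:value 1;name2:value2" into an ordered header map. */
  static Map<String, String> parse(String raw) {
    Map<String, String> headers = new LinkedHashMap<>();
    if (raw == null || raw.trim().isEmpty()) {
      return headers;
    }
    for (String entry : raw.split(";")) {
      // Split on the first colon only, so values may themselves contain colons.
      int sep = entry.indexOf(':');
      if (sep <= 0) {
        continue; // no name or no separator: skip the malformed entry
      }
      String name = entry.substring(0, sep).trim();
      String value = entry.substring(sep + 1).trim();
      if (!name.isEmpty()) {
        headers.put(name, value);
      }
    }
    return headers;
  }
}
```

Note that with this format a header value cannot contain a semicolon, since `;` is the entry delimiter.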
[jira] [Updated] (HUDI-6441) Passing custom Headers with Hudi Callback URL
[ https://issues.apache.org/jira/browse/HUDI-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-6441: - Labels: pull-request-available (was: ) > Passing custom Headers with Hudi Callback URL > - > > Key: HUDI-6441 > URL: https://issues.apache.org/jira/browse/HUDI-6441 > Project: Apache Hudi > Issue Type: Improvement > Components: writer-core >Reporter: Aditya Goenka >Assignee: Vova Kolmakov >Priority: Major > Labels: pull-request-available > Fix For: 1.1.0, 0.15.0 > > > Hudi callback URLs don't support passing custom headers as of now. > Implement a way to pass them and use them for the callback. > Github Issue - [https://github.com/apache/hudi/issues/8834] -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [HUDI-7395] Fix computation for metrics in HoodieMetadataMetrics [hudi]
nsivabalan commented on PR #10641: URL: https://github.com/apache/hudi/pull/10641#issuecomment-2041490421 reviewed last 2 commits. LGTM
Re: [PR] [HUDI-7391] HoodieMetadataMetrics should use Metrics instance for metrics registry [hudi]
nsivabalan commented on code in PR #10635: URL: https://github.com/apache/hudi/pull/10635#discussion_r1554986358 ## hudi-common/src/main/java/org/apache/hudi/metadata/BaseTableMetadata.java: ## @@ -97,7 +97,7 @@ protected BaseTableMetadata(HoodieEngineContext engineContext, HoodieMetadataCon this.isMetadataTableInitialized = dataMetaClient.getTableConfig().isMetadataTableAvailable(); if (metadataConfig.enableMetrics()) { - this.metrics = Option.of(new HoodieMetadataMetrics(Registry.getRegistry("HoodieMetadata"))); + this.metrics = Option.of(new HoodieMetadataMetrics(HoodieMetricsConfig.newBuilder().fromProperties(metadataConfig.getProps()).build())); Review Comment: metadataConfig is not going to contain any metrics-related props, since this is on the reader side. What we fixed in HoodieMetadataWriteUtils applies to the metadata writer, not the reader. We need some fixes here; otherwise the metrics-related props may not be carried over to this code snippet. ## hudi-common/src/main/java/org/apache/hudi/metadata/BaseTableMetadata.java: ## @@ -97,7 +97,7 @@ protected BaseTableMetadata(HoodieEngineContext engineContext, HoodieMetadataCon this.isMetadataTableInitialized = dataMetaClient.getTableConfig().isMetadataTableAvailable(); if (metadataConfig.enableMetrics()) { - this.metrics = Option.of(new HoodieMetadataMetrics(Registry.getRegistry("HoodieMetadata"))); + this.metrics = Option.of(new HoodieMetadataMetrics(HoodieMetricsConfig.newBuilder().fromProperties(metadataConfig.getProps()).build())); Review Comment: But I am not sure we can even get that, because the query engine is not going to set any writer props (e.g. metrics-related ones).
So it's not feasible for us to instantiate this properly on the reader side :( ## hudi-common/src/main/java/org/apache/hudi/metrics/Metrics.java: ## @@ -166,6 +169,17 @@ public void registerGauge(String metricName, final long value) { } } + public HoodieGauge registerGauge(String metricName) { +try { Review Comment: Why can't we call the other method here? ``` registerGauge(metricName, 0L); ``` ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataWriteUtils.java: ## @@ -200,6 +200,11 @@ public static HoodieWriteConfig createMetadataWriteConfig( builder.withProperties(datadogConfig.build().getProps()); break; case PROMETHEUS: + HoodieMetricsPrometheusConfig prometheusConfig = HoodieMetricsPrometheusConfig.newBuilder() + .withPushgatewayLabels(writeConfig.getPushGatewayLabels()) + .withPrometheusPortNum(writeConfig.getPrometheusPort()).build(); Review Comment: Why are we not setting other props like host, job name, etc.? ## hudi-common/src/main/java/org/apache/hudi/metrics/Metrics.java: ## @@ -176,4 +190,16 @@ public static boolean isInitialized(String basePath) { } return false; } + + /** + * Use the same base path as the hudi table so that Metrics instance is shared. + */ + private static String getBasePath(HoodieMetricsConfig metricsConfig) { +String basePath = metricsConfig.getBasePath(); +if (basePath.endsWith(HoodieTableMetaClient.METADATA_TABLE_FOLDER_PATH)) { Review Comment: Can we introduce a utility for deducing whether a path is the metadata table? Btw, we should check that the entire dir name matches "metadata" and not just that the path ends with it. We could possibly have a table named "customer_metadata" or something of that sort, and the above check could actually match that table's path.
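The last review comment (match the whole directory name rather than a suffix) could be sketched as a small utility like the one below. This is an illustration only: `MetadataPathCheck` is a hypothetical name, and the folder names `.hoodie` and `metadata` are assumptions about the table layout rather than values read from `HoodieTableMetaClient`.

```java
final class MetadataPathCheck {
  /**
   * Returns true only when the last two path components are exactly
   * ".hoodie" and "metadata" (both names are assumptions for this sketch).
   */
  static boolean isMetadataTablePath(String basePath) {
    // Drop trailing slashes so ".../metadata/" is treated like ".../metadata".
    String trimmed = basePath.replaceAll("/+$", "");
    String[] parts = trimmed.split("/");
    int n = parts.length;
    return n >= 2
        && "metadata".equals(parts[n - 1])
        && ".hoodie".equals(parts[n - 2]);
  }
}
```

Comparing whole components means a data table at `/data/customer_metadata` is not mistaken for a metadata table, which a bare `endsWith("metadata")` check would do.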
Re: [PR] [HUDI-7395] Fix computation for metrics in HoodieMetadataMetrics [hudi]
nsivabalan commented on PR #10641: URL: https://github.com/apache/hudi/pull/10641#issuecomment-2041483567 I guess this is stacked on top of 10635. Can you add to the PR description a link to the actual diff to review for this patch (ignoring the stacked PR's changes)? If I am not wrong, https://github.com/apache/hudi/pull/10641/files/adc183a351b8f15d671c0c6eefd1f999bed54774..fc072259ead8a9870a1b26b5aceb7882aabebb32 is the right link (last 2 commits).
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
hudi-bot commented on PR #10860: URL: https://github.com/apache/hudi/pull/10860#issuecomment-2041474216 ## CI report: * bbfbe38b86b5bd11972591a346ac9b847a7daa6a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23136) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
hudi-bot commented on PR #10860: URL: https://github.com/apache/hudi/pull/10860#issuecomment-2041454934 ## CI report: * dbda44942240ecdf008df975aa15d58eaaa45a33 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23098) * bbfbe38b86b5bd11972591a346ac9b847a7daa6a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23136) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
hudi-bot commented on PR #10860: URL: https://github.com/apache/hudi/pull/10860#issuecomment-2041453013 ## CI report: * dbda44942240ecdf008df975aa15d58eaaa45a33 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23098) * bbfbe38b86b5bd11972591a346ac9b847a7daa6a UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
codope commented on code in PR #10860: URL: https://github.com/apache/hudi/pull/10860#discussion_r1554955230

## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java: ##

@@ -991,9 +1001,9 @@ private void updateFunctionalIndexIfPresent(HoodieCommitMetadata commitMetadata,
   private HoodieData getFunctionalIndexUpdates(HoodieCommitMetadata commitMetadata, String indexPartition, String instantTime) throws Exception {
     HoodieFunctionalIndexDefinition indexDefinition = getFunctionalIndexDefinition(indexPartition);
     List> partitionFileSlicePairs = new ArrayList<>();
-    HoodieTableFileSystemView fsView = HoodieTableMetadataUtil.getFileSystemView(metadataMetaClient);
+    HoodieTableFileSystemView fsView = HoodieTableMetadataUtil.getFileSystemView(dataMetaClient);
     commitMetadata.getPartitionToWriteStats().forEach((dataPartition, value) -> {
-      List fileSlices = getPartitionLatestFileSlicesIncludingInflight(metadataMetaClient, Option.ofNullable(fsView), dataPartition);
+      List fileSlices = getPartitionLatestFileSlicesIncludingInflight(dataMetaClient, Option.ofNullable(fsView), dataPartition);

Review Comment:
I'm not following you here. In a multi-writer scenario, suppose a new latest file slice is created by instant t1, but `dataMetaClient` was already initialized before t1 (say, up to instant t0). The index will then only be updated up to t0, because `getPartitionLatestFileSlicesIncludingInflight` will only return file slices up to t0. The fsView API returns file slices from a consistent snapshot.
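The snapshot behavior described in the review comment above can be illustrated with a minimal sketch. None of this is Hudi code: `Timeline` and `View` are hypothetical stand-ins for a table timeline and a file system view built from a meta client. The only assumption is that the view captures the timeline when it is constructed and is not refreshed afterwards.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; these classes are not part of Hudi.
final class SnapshotViewSketch {

    /** Stand-in for a table timeline that grows as writers commit instants. */
    static final class Timeline {
        private final List<String> instants = new ArrayList<>();

        synchronized void commit(String instant) {
            instants.add(instant);
        }

        /** Returns an immutable copy of the instants committed so far. */
        synchronized List<String> snapshot() {
            return Collections.unmodifiableList(new ArrayList<>(instants));
        }
    }

    /** Stand-in for a view that copies the timeline once, at construction. */
    static final class View {
        private final List<String> visible;

        View(Timeline timeline) {
            this.visible = timeline.snapshot();
        }

        /** Latest instant this view can see, or null if it saw none. */
        String latestVisibleInstant() {
            return visible.isEmpty() ? null : visible.get(visible.size() - 1);
        }
    }

    public static void main(String[] args) {
        Timeline timeline = new Timeline();
        timeline.commit("t0");

        View staleView = new View(timeline); // initialized up to t0
        timeline.commit("t1");               // another writer commits afterwards

        // The existing view still answers from its t0 snapshot.
        System.out.println(staleView.latestVisibleInstant());
        // A view built after t1 sees the newer instant.
        System.out.println(new View(timeline).latestVisibleInstant());
    }
}
```

In this sketch, anything derived from `staleView` stops at t0 even though t1 exists, which is the consistency property the comment relies on.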
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
codope commented on code in PR #10860: URL: https://github.com/apache/hudi/pull/10860#discussion_r1554954192

## hudi-common/src/main/java/org/apache/hudi/common/table/view/TableFileSystemView.java: ##

@@ -107,6 +107,14 @@ interface SliceViewWithLatestSlice {
      */
     Stream getLatestFileSlices(String partitionPath);

+    /**
+     * Get the latest file slices for a given partition including the inflight ones.
+     *
+     * @param partitionPath The partition path of interest
+     * @return Stream of latest {@link FileSlice} in the partition path.
+     */
+    Stream getLatestFileSlicesIncludingInflight(String partitionPath);
+

Review Comment:
No, we don't need it there. This is an uplevel of an existing API.
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
codope commented on code in PR #10860: URL: https://github.com/apache/hudi/pull/10860#discussion_r1554953956

## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java: ##

@@ -434,7 +433,12 @@ private boolean initializeFromFilesystem(String initializationTime, List functionalIndexPartitionsToInit = getFunctionalIndexPartitionsToInit();
+if (functionalIndexPartitionsToInit.isEmpty()) {
+  continue;

Review Comment:
Adding to what Vinay said: going forward we will have more indexes, such as the secondary index, where the index type and the index name (MDT partition name) differ. I think we should get rid of `MetadataPartitionType`. That would also let us remove the deprecated `MetadataRecordsGenerationParams` POJO.
[jira] [Assigned] (HUDI-6441) Passing custom Headers with Hudi Callback URL
[ https://issues.apache.org/jira/browse/HUDI-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vova Kolmakov reassigned HUDI-6441:
-----------------------------------
    Assignee: Vova Kolmakov

> Passing custom Headers with Hudi Callback URL
> ---------------------------------------------
>
>                 Key: HUDI-6441
>                 URL: https://issues.apache.org/jira/browse/HUDI-6441
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: writer-core
>            Reporter: Aditya Goenka
>            Assignee: Vova Kolmakov
>            Priority: Major
>             Fix For: 1.1.0, 0.15.0
>
>
> Hudi callback URLs don't currently support passing custom headers.
> Implement a way to pass them and use them for the callback.
> Github Issue - [https://github.com/apache/hudi/issues/8834]

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-6441) Passing custom Headers with Hudi Callback URL
[ https://issues.apache.org/jira/browse/HUDI-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vova Kolmakov updated HUDI-6441:
--------------------------------
    Status: In Progress  (was: Open)

> Passing custom Headers with Hudi Callback URL
> ---------------------------------------------
>
>                 Key: HUDI-6441
>                 URL: https://issues.apache.org/jira/browse/HUDI-6441
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: writer-core
>            Reporter: Aditya Goenka
>            Assignee: Vova Kolmakov
>            Priority: Major
>             Fix For: 1.1.0, 0.15.0
>
>
> Hudi callback URLs don't currently support passing custom headers.
> Implement a way to pass them and use them for the callback.
> Github Issue - [https://github.com/apache/hudi/issues/8834]
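For context on what HUDI-6441 asks for, here is a rough sketch of attaching user-supplied headers to an HTTP callback request. It is not the Hudi implementation and does not reflect any actual Hudi configuration key. The `name:value;name:value` header format, the class name `CallbackHeaderSketch`, and both helper methods are assumptions made up for illustration. It uses only the JDK 11+ `java.net.http` API and builds the request without sending it.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not Hudi's callback client.
final class CallbackHeaderSketch {

    /**
     * Parses "name1:value1;name2:value2" into an ordered map.
     * The format is an assumption for this sketch. Each entry is split on its
     * first ':' so values may contain colons, but not semicolons.
     */
    static Map<String, String> parseHeaders(String spec) {
        Map<String, String> headers = new LinkedHashMap<>();
        if (spec == null || spec.trim().isEmpty()) {
            return headers;
        }
        for (String pair : spec.split(";")) {
            int idx = pair.indexOf(':');
            if (idx <= 0) {
                throw new IllegalArgumentException("Malformed header entry: " + pair);
            }
            headers.put(pair.substring(0, idx).trim(), pair.substring(idx + 1).trim());
        }
        return headers;
    }

    /** Builds (but does not send) a JSON POST carrying the extra headers. */
    static HttpRequest buildCallbackRequest(String url, String jsonBody, Map<String, String> headers) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(url))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(jsonBody));
        // HttpRequest rejects restricted names such as "Host" or "Content-Length".
        headers.forEach(builder::header);
        return builder.build();
    }

    public static void main(String[] args) {
        Map<String, String> headers = parseHeaders("X-Api-Key:abc; X-Trace:1:2");
        HttpRequest request = buildCallbackRequest("http://localhost:8080/callback", "{}", headers);
        System.out.println(request.headers().map());
    }
}
```

Keeping the parsing separate from the request construction means malformed header configuration can be rejected before any callback is attempted.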