Re: [PR] [HUDI-7156] Abstract an independent hoodie table filesystem view lock [hudi]

2024-04-07 Thread via GitHub


zhuanshenbsj1 closed pull request #10197: [HUDI-7156] Abstract an independent 
hoodie table filesystem view lock
URL: https://github.com/apache/hudi/pull/10197


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT]The number of tasks in each distinct stage of building workload profile is always 60 [hudi]

2024-04-07 Thread via GitHub


ad1happy2go commented on issue #10972:
URL: https://github.com/apache/hudi/issues/10972#issuecomment-2041903783

   @MrAladdin Can you provide the writer configurations you are using?





Re: [PR] [HUDI-7559] Fix RecordLevelIndexSupport::filterQueryWithRecordKey [hudi]

2024-04-07 Thread via GitHub


hudi-bot commented on PR #10947:
URL: https://github.com/apache/hudi/pull/10947#issuecomment-2041902574

   
   ## CI report:
   
   * 0c84a761b5f5378bcd51d987c9f29b1f649cf820 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23121)
 
   * eb6439ab3e1f95e90411c64afd1e5ef636dbeacc Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23146)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [I] [SUPPORT] Could not sync using the meta sync class org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool [hudi]

2024-04-07 Thread via GitHub


ad1happy2go commented on issue #10968:
URL: https://github.com/apache/hudi/issues/10968#issuecomment-2041901636

   @mattssll It looks like the Hudi bundle jar is not on the classpath. Can you 
share the details of the Hudi jars configured on the EKS cluster?





Re: [PR] [HUDI-7559] Fix RecordLevelIndexSupport::filterQueryWithRecordKey [hudi]

2024-04-07 Thread via GitHub


hudi-bot commented on PR #10947:
URL: https://github.com/apache/hudi/pull/10947#issuecomment-2041895291

   
   ## CI report:
   
   * 0c84a761b5f5378bcd51d987c9f29b1f649cf820 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23121)
 
   * eb6439ab3e1f95e90411c64afd1e5ef636dbeacc UNKNOWN
   
   



Re: [PR] [HUDI-7230] stream read supports skipping insert overwrite instant [hudi]

2024-04-07 Thread via GitHub


hudi-bot commented on PR #10328:
URL: https://github.com/apache/hudi/pull/10328#issuecomment-2041886839

   
   ## CI report:
   
   * cc1e51661db71367549360a1e89b6bd22ea24d8a Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23143)
 
   
   



Re: [PR] [HUDI-7576] add partitionPath as an instance variable to HoodieBaseFile and HoodieLogFile [hudi]

2024-04-07 Thread via GitHub


hudi-bot commented on PR #10975:
URL: https://github.com/apache/hudi/pull/10975#issuecomment-2041848631

   
   ## CI report:
   
   * 422c9c52dbae120f9eb7498a7ae682003b8dcc22 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23142)
 
   * 7dd1a19f792e3c0b8708de63e0b83810af709ce3 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23145)
 
   
   



Re: [PR] [HUDI-7576] add partitionPath as an instance variable to HoodieBaseFile and HoodieLogFile [hudi]

2024-04-07 Thread via GitHub


the-other-tim-brown commented on PR #10975:
URL: https://github.com/apache/hudi/pull/10975#issuecomment-2041848520

   > Just from the `File` notion, don't think we should take any partition 
related variables into it. Is there any solution for the optimization without 
modifying these two abstractions?
   
   Can you explain why? There is other metadata already associated with these 
objects like file group and commit. 





Re: [PR] [HUDI-7576] add partitionPath as an instance variable to HoodieBaseFile and HoodieLogFile [hudi]

2024-04-07 Thread via GitHub


hudi-bot commented on PR #10975:
URL: https://github.com/apache/hudi/pull/10975#issuecomment-2041842959

   
   ## CI report:
   
   * 422c9c52dbae120f9eb7498a7ae682003b8dcc22 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23142)
 
   * 7dd1a19f792e3c0b8708de63e0b83810af709ce3 UNKNOWN
   
   



Re: [PR] [MINOR] Hudi CLI 'version' command output empty string [hudi]

2024-04-07 Thread via GitHub


danny0405 commented on PR #10973:
URL: https://github.com/apache/hudi/pull/10973#issuecomment-2041837537

   Can you check the CI failures?





Re: [PR] [HUDI-7575] avoid repeated fetching of pending replace instants [hudi]

2024-04-07 Thread via GitHub


hudi-bot commented on PR #10976:
URL: https://github.com/apache/hudi/pull/10976#issuecomment-2041837370

   
   ## CI report:
   
   * db99bbcc7ede1bb1372a7996c25cfb54c1069a49 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23144)
 
   
   



Re: [PR] [HUDI-7576] add partitionPath as an instance variable to HoodieBaseFile and HoodieLogFile [hudi]

2024-04-07 Thread via GitHub


hudi-bot commented on PR #10975:
URL: https://github.com/apache/hudi/pull/10975#issuecomment-2041837346

   
   ## CI report:
   
   * 422c9c52dbae120f9eb7498a7ae682003b8dcc22 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23142)
 
   
   



Re: [PR] [HUDI-7230] stream read supports skipping insert overwrite instant [hudi]

2024-04-07 Thread via GitHub


hudi-bot commented on PR #10328:
URL: https://github.com/apache/hudi/pull/10328#issuecomment-2041836494

   
   ## CI report:
   
   * c1d0db68c7e167de36d6090348bee9864191ac1d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21514)
 
   * cc1e51661db71367549360a1e89b6bd22ea24d8a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23143)
 
   
   



[jira] [Updated] (HUDI-6330) Update user document to introduce this feature

2024-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-6330:
-
Labels: pull-request-available  (was: )

> Update user document to introduce this feature
> --
>
> Key: HUDI-6330
> URL: https://issues.apache.org/jira/browse/HUDI-6330
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: docs, flink
>Reporter: Jing Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [HUDI-6330][DOCS] Update user doc to show how to use consistent bucket index for Flink engine [hudi]

2024-04-07 Thread via GitHub


beyond1920 opened a new pull request, #10977:
URL: https://github.com/apache/hudi/pull/10977

   ### Change Logs
   
   Update user doc to show how to use consistent bucket index for Flink engine
   
   ### Impact
   
   None
   
   ### Risk level (write none, low medium or high below)
   
   None
   
   ### Documentation Update
   
   None
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





Re: [PR] [HUDI-7575] avoid repeated fetching of pending replace instants [hudi]

2024-04-07 Thread via GitHub


hudi-bot commented on PR #10976:
URL: https://github.com/apache/hudi/pull/10976#issuecomment-2041804155

   
   ## CI report:
   
   * db99bbcc7ede1bb1372a7996c25cfb54c1069a49 UNKNOWN
   
   



Re: [PR] [HUDI-7230] stream read supports skipping insert overwrite instant [hudi]

2024-04-07 Thread via GitHub


hudi-bot commented on PR #10328:
URL: https://github.com/apache/hudi/pull/10328#issuecomment-2041803428

   
   ## CI report:
   
   * c1d0db68c7e167de36d6090348bee9864191ac1d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21514)
 
   * cc1e51661db71367549360a1e89b6bd22ea24d8a UNKNOWN
   
   



Re: [PR] [HUDI-7576] add partitionPath as an instance variable to HoodieBaseFile and HoodieLogFile [hudi]

2024-04-07 Thread via GitHub


hudi-bot commented on PR #10975:
URL: https://github.com/apache/hudi/pull/10975#issuecomment-2041804102

   
   ## CI report:
   
   * 51e070c635f7207a2b77ba1896ecd29694f54eae Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23141)
 
   * 422c9c52dbae120f9eb7498a7ae682003b8dcc22 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23142)
 
   
   



Re: [PR] [HUDI-7572] Avoid to schedule empty compaction plan without log files [hudi]

2024-04-07 Thread via GitHub


hudi-bot commented on PR #10974:
URL: https://github.com/apache/hudi/pull/10974#issuecomment-2041804074

   
   ## CI report:
   
   * 451064a8002cf5544e764624a31efe7d64671406 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23140)
 
   
   



Re: [PR] [HUDI-7576] add partitionPath as an instance variable to HoodieBaseFile and HoodieLogFile [hudi]

2024-04-07 Thread via GitHub


hudi-bot commented on PR #10975:
URL: https://github.com/apache/hudi/pull/10975#issuecomment-2041798819

   
   ## CI report:
   
   * 51e070c635f7207a2b77ba1896ecd29694f54eae Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23141)
 
   * 422c9c52dbae120f9eb7498a7ae682003b8dcc22 UNKNOWN
   
   



Re: [PR] [HUDI-7230] stream read supports skipping insert overwrite instant [hudi]

2024-04-07 Thread via GitHub


zhuanshenbsj1 commented on PR #10328:
URL: https://github.com/apache/hudi/pull/10328#issuecomment-2041798064

   cc @danny0405





[jira] [Updated] (HUDI-7575) Avoid recomputing list of pending replacecommits in FSView code

2024-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7575:
-
Labels: pull-request-available  (was: )

> Avoid recomputing list of pending replacecommits in FSView code
> ---
>
> Key: HUDI-7575
> URL: https://issues.apache.org/jira/browse/HUDI-7575
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Major
>  Labels: pull-request-available
>
> When checking if a base file is part of a pending clustering, the code will 
> construct the same list repeatedly leading to unnecessary overhead. The class 
> should gather this list once and persist it.
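The idea in the description above, gathering the list once and reusing it, can be sketched as follows. This is a minimal illustration only, not Hudi's file system view code: the class `PendingClusteringCache`, its `loader` supplier, and the use of plain file-id strings are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical stand-in for a file system view; NOT Hudi's actual class.
final class PendingClusteringCache {
    // Expensive lookup, e.g. a scan of the timeline (assumed for illustration).
    private final Supplier<List<String>> loader;
    // Filled on first use, then reused for every later check.
    private Set<String> pendingFileIds;
    // Counts how often the expensive lookup actually ran.
    private int loads = 0;

    PendingClusteringCache(Supplier<List<String>> loader) {
        this.loader = loader;
    }

    // Returns true if the file id belongs to a pending clustering plan.
    synchronized boolean isPendingClustering(String fileId) {
        if (pendingFileIds == null) {
            pendingFileIds = new HashSet<>(loader.get());
            loads++;
        }
        return pendingFileIds.contains(fileId);
    }

    synchronized int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        PendingClusteringCache cache =
            new PendingClusteringCache(() -> Arrays.asList("f1", "f2"));
        System.out.println(cache.isPendingClustering("f1"));
        System.out.println(cache.isPendingClustering("f3"));
        System.out.println(cache.loadCount());
    }
}
```

However many files are checked, the loader runs once per cache instance. A real view would also need to drop the cached set when the timeline is refreshed, which this sketch leaves out.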





Re: [PR] [MINOR] Hudi CLI 'version' command output empty string [hudi]

2024-04-07 Thread via GitHub


pt657407064 commented on code in PR #10973:
URL: https://github.com/apache/hudi/pull/10973#discussion_r1555183250


##
hudi-cli/src/main/resources/application.yml:
##
@@ -20,4 +20,7 @@ spring:
   shell:
     history:
       enabled: true
-      name: hoodie-cmd.log
\ No newline at end of file
+      name: hoodie-cmd.log
+    command:
+      version:
+        template: "classpath:version.txt"

Review Comment:
   It worked until this file was removed, starting with the Hudi 0.11.1 release. 
https://github.com/apache/hudi/blob/release-0.11.0/hudi-cli/src/main/java/org/apache/hudi/cli/HoodieSplashScreen.java#L60






[PR] [HUDI-7575] avoid repeated fetching of pending replace instants [hudi]

2024-04-07 Thread via GitHub


the-other-tim-brown opened a new pull request, #10976:
URL: https://github.com/apache/hudi/pull/10976

   ### Change Logs
   
   Avoids repeatedly creating a timeline with pending replace instants when 
creating the FSView.
   
   ### Impact
   
   - Lowers the overhead of creating the FSView in terms of objects created
   
   ### Risk level (write none, low medium or high below)
   
   Low
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change. If not, put "none"._
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





Re: [PR] [MINOR] Hudi CLI 'version' command output empty string [hudi]

2024-04-07 Thread via GitHub


pt657407064 commented on code in PR #10973:
URL: https://github.com/apache/hudi/pull/10973#discussion_r1555183250


##
hudi-cli/src/main/resources/application.yml:
##
@@ -20,4 +20,7 @@ spring:
   shell:
     history:
       enabled: true
-      name: hoodie-cmd.log
\ No newline at end of file
+      name: hoodie-cmd.log
+    command:
+      version:
+        template: "classpath:version.txt"

Review Comment:
   It worked until this file was removed from the Hudi release. 
https://github.com/apache/hudi/blob/release-0.11.0/hudi-cli/src/main/java/org/apache/hudi/cli/HoodieSplashScreen.java#L60






Re: [PR] [HUDI-7576] add partitionPath as an instance variable to HoodieBaseFile and HoodieLogFile [hudi]

2024-04-07 Thread via GitHub


hudi-bot commented on PR #10975:
URL: https://github.com/apache/hudi/pull/10975#issuecomment-2041793570

   
   ## CI report:
   
   * 51e070c635f7207a2b77ba1896ecd29694f54eae UNKNOWN
   
   



Re: [PR] [MINOR] Hudi CLI 'version' command output empty string [hudi]

2024-04-07 Thread via GitHub


pt657407064 commented on code in PR #10973:
URL: https://github.com/apache/hudi/pull/10973#discussion_r1555181705


##
hudi-cli/src/main/resources/application.yml:
##
@@ -20,4 +20,7 @@ spring:
   shell:
     history:
       enabled: true
-      name: hoodie-cmd.log
\ No newline at end of file
+      name: hoodie-cmd.log
+    command:
+      version:
+        template: "classpath:version.txt"

Review Comment:
   Yes, the `version` command in the Hudi CLI is not showing the current Hudi 
version. 






Re: [PR] [MINOR] Hudi CLI 'version' command output empty string [hudi]

2024-04-07 Thread via GitHub


pt657407064 commented on code in PR #10973:
URL: https://github.com/apache/hudi/pull/10973#discussion_r1555181705


##
hudi-cli/src/main/resources/application.yml:
##
@@ -20,4 +20,7 @@ spring:
   shell:
     history:
       enabled: true
-      name: hoodie-cmd.log
\ No newline at end of file
+      name: hoodie-cmd.log
+    command:
+      version:
+        template: "classpath:version.txt"

Review Comment:
   The `version` command in the CLI is not showing the current Hudi version. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-7576) Add partitionPath to the HoodieBaseFile and HoodieLogFile objects

2024-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7576:
-
Labels: pull-request-available  (was: )

> Add partitionPath to the HoodieBaseFile and HoodieLogFile objects
> -
>
> Key: HUDI-7576
> URL: https://issues.apache.org/jira/browse/HUDI-7576
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Major
>  Labels: pull-request-available
>
> Adding this field to the classes will allow us to avoid repeatedly computing 
> the partition path per file in other parts of the code. This can cut down on 
> the CPU overhead associated with creating the FS View.
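A rough sketch of the optimization described above: store the partition path on the file object so it is computed at most once, or passed in by a caller that already knows it. The class `FileHandleSketch` and its methods are hypothetical and much simpler than Hudi's `HoodieBaseFile` and `HoodieLogFile`. The sketch also assumes string paths with `/` separators and a base path without a trailing slash.

```java
// Hypothetical simplified file handle; NOT Hudi's HoodieBaseFile or HoodieLogFile.
final class FileHandleSketch {
    private final String fullPath;
    // Stored once at construction so later callers never recompute it.
    private final String partitionPath;

    // Use when the caller already knows the partition, for example because
    // it obtained the file by listing that partition's directory.
    FileHandleSketch(String fullPath, String partitionPath) {
        this.fullPath = fullPath;
        this.partitionPath = partitionPath;
    }

    // Fallback: derive the relative partition path from the table base path.
    // Assumes basePath has no trailing slash and fullPath starts with basePath + "/".
    static FileHandleSketch fromBasePath(String basePath, String fullPath) {
        String relative = fullPath.substring(basePath.length() + 1);
        int lastSlash = relative.lastIndexOf('/');
        // A file directly under the base path belongs to the empty partition.
        String partition = lastSlash < 0 ? "" : relative.substring(0, lastSlash);
        return new FileHandleSketch(fullPath, partition);
    }

    String getFullPath() {
        return fullPath;
    }

    String getPartitionPath() {
        return partitionPath;
    }

    public static void main(String[] args) {
        FileHandleSketch f =
            FileHandleSketch.fromBasePath("/tbl", "/tbl/2024/04/07/file.parquet");
        System.out.println(f.getPartitionPath());
    }
}
```

The trade-off is one extra reference per file object in exchange for avoiding repeated string work whenever the partition path is requested.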





[PR] [HUDI-7576] add partitionPath as an instance variable to HoodieBaseFile and HoodieLogFile [hudi]

2024-04-07 Thread via GitHub


the-other-tim-brown opened a new pull request, #10975:
URL: https://github.com/apache/hudi/pull/10975

   ### Change Logs
   
   - Adds partitionPath string to the HoodieBaseFile and HoodieLogFile to avoid 
computing it multiple times for a single instance of these files.
   - Minor optimization on partition path computation in case where CachingPath 
is not used
   
   ### Impact
   
   - Reduces overhead of constructing FSViews. We see a non-negligible amount 
of CPU time spent on computing the partition path for each file when it can 
actually simply be taken in as an input in some cases. 
   
   ### Risk level (write none, low medium or high below)
   
   Low
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change. If not, put "none"._
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





Re: [PR] [HUDI-7572] Avoid to schedule empty compaction plan without log files [hudi]

2024-04-07 Thread via GitHub


hudi-bot commented on PR #10974:
URL: https://github.com/apache/hudi/pull/10974#issuecomment-2041758073

   
   ## CI report:
   
   * 451064a8002cf5544e764624a31efe7d64671406 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23140)
 
   
   



Re: [PR] [MINOR] Hudi CLI 'version' command output empty string [hudi]

2024-04-07 Thread via GitHub


hudi-bot commented on PR #10973:
URL: https://github.com/apache/hudi/pull/10973#issuecomment-2041758041

   
   ## CI report:
   
   * 6827a922b5eae447c97a294cc9f5f9520761bb10 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23139)
 
   
   



[jira] [Resolved] (HUDI-6854) Change default payload type to HOODIE_AVRO_DEFAULT

2024-04-07 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen resolved HUDI-6854.
--

> Change default payload type to HOODIE_AVRO_DEFAULT
> --
>
> Key: HUDI-6854
> URL: https://issues.apache.org/jira/browse/HUDI-6854
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Vova Kolmakov
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> The current default is OVERWRITE_LATEST, which instantiates 
> OverwriteWithLatestAvroPayload. This is not intuitive: the latest write wins 
> even when the user sets a precombine field and expects records to be merged 
> based on that field.
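The difference the description points at can be illustrated with a simplified model of two merge policies. This is only a sketch: `MergePolicySketch`, `Rec`, and both method names are hypothetical and are not Hudi's payload classes. The tie-breaking rule, where the incoming record wins on equal ordering values, is an assumption of this sketch.

```java
// Simplified model of two merge policies; NOT Hudi's payload classes.
final class MergePolicySketch {
    static final class Rec {
        final String key;
        // Plays the role of a precombine (ordering) field, e.g. an event timestamp.
        final long orderingValue;
        final String value;

        Rec(String key, long orderingValue, String value) {
            this.key = key;
            this.orderingValue = orderingValue;
            this.value = value;
        }
    }

    // "Latest write wins": the incoming record always replaces the stored one,
    // regardless of the ordering value.
    static Rec overwriteWithLatest(Rec stored, Rec incoming) {
        return incoming;
    }

    // Ordering-field merge: keep the record with the larger ordering value.
    // Ties go to the incoming record (an assumption of this sketch).
    static Rec mergeByOrderingField(Rec stored, Rec incoming) {
        return incoming.orderingValue >= stored.orderingValue ? incoming : stored;
    }

    public static void main(String[] args) {
        Rec stored = new Rec("k1", 10L, "newer");
        Rec lateArrival = new Rec("k1", 5L, "older");
        System.out.println(overwriteWithLatest(stored, lateArrival).value);  // prints "older"
        System.out.println(mergeByOrderingField(stored, lateArrival).value); // prints "newer"
    }
}
```

With a late-arriving record that carries a smaller ordering value, the first policy replaces the stored data with the older record, while the second keeps the stored one. The second outcome is what a user who configured a precombine field would likely expect.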





[jira] [Closed] (HUDI-6854) Change default payload type to HOODIE_AVRO_DEFAULT

2024-04-07 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-6854.

Resolution: Fixed

Fixed via master branch: 5519e9c13b3563760e44712112f9bf93faa4b40e

> Change default payload type to HOODIE_AVRO_DEFAULT
> --
>
> Key: HUDI-6854
> URL: https://issues.apache.org/jira/browse/HUDI-6854
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Vova Kolmakov
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> The current default is OVERWRITE_LATEST, which instantiates 
> OverwriteWithLatestAvroPayload. This is not intuitive: the latest write wins 
> even when the user sets a precombine field and expects records to be merged 
> based on that field.





(hudi) branch master updated (b487e9826d0 -> 5519e9c13b3)

2024-04-07 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from b487e9826d0 [MINOR] Removed FSUtils.makeBaseFileName without fileExt 
param (#10967)
 add 5519e9c13b3 [HUDI-6854] Change default payload type to 
HOODIE_AVRO_DEFAULT (#10949)

No new revisions were added by this update.

Summary of changes:
 .../src/main/java/org/apache/hudi/config/HoodiePayloadConfig.java | 4 ++--
 .../src/main/java/org/apache/hudi/config/HoodieWriteConfig.java   | 6 +++---
 .../org/apache/hudi/common/model/DefaultHoodieRecordPayload.java  | 4 +++-
 .../apache/hudi/common/model/OverwriteWithLatestAvroPayload.java  | 2 --
 .../main/java/org/apache/hudi/common/model/RecordPayloadType.java | 2 +-
 .../main/java/org/apache/hudi/common/table/HoodieTableConfig.java | 6 +++---
 .../scala/org/apache/spark/sql/hudi/ProvidesHoodieConfig.scala| 8 
 .../org/apache/hudi/functional/TestHiveTableSchemaEvolution.java  | 3 ++-
 .../org/apache/hudi/functional/TestBasicSchemaEvolution.scala | 6 --
 .../org/apache/spark/sql/hudi/common/TestHoodieOptionConfig.scala | 4 ++--
 .../test/scala/org/apache/spark/sql/hudi/ddl/TestSpark3DDL.scala  | 7 ++-
 11 files changed, 30 insertions(+), 22 deletions(-)



Re: [PR] [HUDI-7572] Avoid to schedule empty compaction plan without log files [hudi]

2024-04-07 Thread via GitHub


hudi-bot commented on PR #10974:
URL: https://github.com/apache/hudi/pull/10974#issuecomment-2041752216

   
   ## CI report:
   
   * 451064a8002cf5544e764624a31efe7d64671406 UNKNOWN
   
   



Re: [PR] [HUDI-6854] Change default payload type to HOODIE_AVRO_DEFAULT [hudi]

2024-04-07 Thread via GitHub


danny0405 merged PR #10949:
URL: https://github.com/apache/hudi/pull/10949





Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-04-07 Thread via GitHub


danny0405 commented on code in PR #10860:
URL: https://github.com/apache/hudi/pull/10860#discussion_r1555159066


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -991,9 +1001,9 @@ private void 
updateFunctionalIndexIfPresent(HoodieCommitMetadata commitMetadata,
   private HoodieData 
getFunctionalIndexUpdates(HoodieCommitMetadata commitMetadata, String 
indexPartition, String instantTime) throws Exception {
 HoodieFunctionalIndexDefinition indexDefinition = 
getFunctionalIndexDefinition(indexPartition);
 List> partitionFileSlicePairs = new ArrayList<>();
-HoodieTableFileSystemView fsView = 
HoodieTableMetadataUtil.getFileSystemView(metadataMetaClient);
+HoodieTableFileSystemView fsView = 
HoodieTableMetadataUtil.getFileSystemView(dataMetaClient);
 commitMetadata.getPartitionToWriteStats().forEach((dataPartition, value) 
-> {
-  List fileSlices = 
getPartitionLatestFileSlicesIncludingInflight(metadataMetaClient, 
Option.ofNullable(fsView), dataPartition);
+  List fileSlices = 
getPartitionLatestFileSlicesIncludingInflight(dataMetaClient, 
Option.ofNullable(fsView), dataPartition);

Review Comment:
   > Not following you here. In multi-writer scenario, there is a new latest 
file slice due to instant t1
   
   What if this client triggers cleaning for t0 with a very aggressive strategy 
while we are doing this loading check?






Re: [PR] [MINOR] upgrade to maven-surefire-plugin 3.2.5 [hudi]

2024-04-07 Thread via GitHub


danny0405 commented on PR #10969:
URL: https://github.com/apache/hudi/pull/10969#issuecomment-2041747438

   It looks like the 3.2.5 surefire plugin has some validation check:
   
   ```java
   Caused by: org.apache.maven.plugin.MojoFailureException: No tests matching 
pattern "skipJavaTests" were executed! (Set 
-Dsurefire.failIfNoSpecifiedTests=false to ignore this error.)
   at org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute 
(AbstractSurefireMojo.java:902)
   
   ```





(hudi) branch master updated (56142a0ff61 -> b487e9826d0)

2024-04-07 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 56142a0ff61 [MINOR] use Temurin jdk (#10948)
 add b487e9826d0 [MINOR] Removed FSUtils.makeBaseFileName without fileExt 
param (#10967)

No new revisions were added by this update.

Summary of changes:
 .../org/apache/hudi/client/BaseHoodieClient.java   | 28 ++-
 .../hudi/client/BaseHoodieTableServiceClient.java  | 57 ++
 .../apache/hudi/client/BaseHoodieWriteClient.java  | 24 -
 .../apache/hudi/client/HoodieJavaWriteClient.java  | 22 -
 4 files changed, 43 insertions(+), 88 deletions(-)



Re: [PR] [MINOR] Removed code duplicates in HoodieClients [hudi]

2024-04-07 Thread via GitHub


danny0405 merged PR #10967:
URL: https://github.com/apache/hudi/pull/10967





Re: [PR] [MINOR] Hudi CLI 'version' command output empty string [hudi]

2024-04-07 Thread via GitHub


danny0405 commented on code in PR #10973:
URL: https://github.com/apache/hudi/pull/10973#discussion_r1555154882


##
hudi-cli/src/main/resources/application.yml:
##
@@ -20,4 +20,7 @@ spring:
   shell:
 history:
   enabled: true
-  name: hoodie-cmd.log
\ No newline at end of file
+  name: hoodie-cmd.log
+command:
+  version:
+template: "classpath:version.txt"

Review Comment:
   I'm not very familiar with Spring. Are we fixing the Spring Shell command here?






[jira] [Created] (HUDI-7577) Avoid MDT compaction instant time conflicts

2024-04-07 Thread Danny Chen (Jira)
Danny Chen created HUDI-7577:


 Summary: Avoid MDT compaction instant time conflicts
 Key: HUDI-7577
 URL: https://issues.apache.org/jira/browse/HUDI-7577
 Project: Apache Hudi
  Issue Type: Improvement
  Components: core
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 1.0.0








[jira] [Updated] (HUDI-7577) Avoid MDT compaction instant time conflicts

2024-04-07 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-7577:
-
Sprint: Sprint 2024-03-25

> Avoid MDT compaction instant time conflicts
> ---
>
> Key: HUDI-7577
> URL: https://issues.apache.org/jira/browse/HUDI-7577
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: core
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
> Fix For: 1.0.0
>
>






[jira] [Updated] (HUDI-7572) Avoid to schedule empty compaction plan without log files

2024-04-07 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-7572:
-
Status: Patch Available  (was: In Progress)

> Avoid to schedule empty compaction plan without log files
> -
>
> Key: HUDI-7572
> URL: https://issues.apache.org/jira/browse/HUDI-7572
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: table-service
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> After the change to [loosen the compaction for 
> MDT|https://issues.apache.org/jira/browse/HUDI-7572], there is a rare case where 
> the same compaction instant time is used for scheduling multiple times, so we 
> should optimize the compactor to avoid generating empty compaction plans.
> Note: although we have an active timeline check to avoid repeated 
> scheduling, there is still a small chance that the compaction has already been archived.





[jira] [Updated] (HUDI-7572) Avoid to schedule empty compaction plan without log files

2024-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7572:
-
Labels: pull-request-available  (was: )

> Avoid to schedule empty compaction plan without log files
> -
>
> Key: HUDI-7572
> URL: https://issues.apache.org/jira/browse/HUDI-7572
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: table-service
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> After the change to [loosen the compaction for 
> MDT|https://issues.apache.org/jira/browse/HUDI-7572], there is a rare case where 
> the same compaction instant time is used for scheduling multiple times, so we 
> should optimize the compactor to avoid generating empty compaction plans.
> Note: although we have an active timeline check to avoid repeated 
> scheduling, there is still a small chance that the compaction has already been archived.





[PR] [HUDI-7572] Avoid to schedule empty compaction plan without log files [hudi]

2024-04-07 Thread via GitHub


danny0405 opened a new pull request, #10974:
URL: https://github.com/apache/hudi/pull/10974

   ### Change Logs
   
   If there are no log files in the compaction plan, skip the compaction.
   
   ### Impact
   
   none
   
   ### Risk level (write none, low medium or high below)
   
   none
   
   ### Documentation Update
   
   none
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
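
The check described under Change Logs above could be sketched roughly as follows. This is plain Java with hypothetical names (`CompactionPlanSketch`, `Operation`, `shouldCompact`), not the code from the PR; it only assumes that a plan is a list of operations, each with a base file and zero or more log files.

```java
import java.util.List;

// Hypothetical illustration only; these are not Hudi classes.
final class CompactionPlanSketch {

  // Stand-in for one compaction operation: a base file plus its log files.
  static final class Operation {
    final String baseFile;
    final List<String> logFiles;

    Operation(String baseFile, List<String> logFiles) {
      this.baseFile = baseFile;
      this.logFiles = logFiles;
    }
  }

  // Returns true only when at least one operation has log files to merge.
  // An empty plan, or a plan made up solely of base files, is skipped.
  static boolean shouldCompact(List<Operation> plan) {
    return plan.stream().anyMatch(op -> !op.logFiles.isEmpty());
  }
}
```

A caller would evaluate `shouldCompact(plan)` before scheduling and return early when it is false, so no empty compaction plan is written to the timeline.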
   





Re: [PR] [MINOR] Hudi CLI 'version' command output empty string [hudi]

2024-04-07 Thread via GitHub


hudi-bot commented on PR #10973:
URL: https://github.com/apache/hudi/pull/10973#issuecomment-2041711400

   
   ## CI report:
   
   * 6827a922b5eae447c97a294cc9f5f9520761bb10 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23139)
 
   
   



Re: [PR] [MINOR] Hudi CLI 'version' command output empty string [hudi]

2024-04-07 Thread via GitHub


hudi-bot commented on PR #10973:
URL: https://github.com/apache/hudi/pull/10973#issuecomment-2041705454

   
   ## CI report:
   
   * 6827a922b5eae447c97a294cc9f5f9520761bb10 UNKNOWN
   
   



[jira] [Updated] (HUDI-7572) Avoid to schedule empty compaction plan without log files

2024-04-07 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-7572:
-
Status: In Progress  (was: Open)

> Avoid to schedule empty compaction plan without log files
> -
>
> Key: HUDI-7572
> URL: https://issues.apache.org/jira/browse/HUDI-7572
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: table-service
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
> Fix For: 1.0.0
>
>
> After the change to [loosen the compaction for 
> MDT|https://issues.apache.org/jira/browse/HUDI-7572], there is a rare case where 
> the same compaction instant time is used for scheduling multiple times, so we 
> should optimize the compactor to avoid generating empty compaction plans.
> Note: although we have an active timeline check to avoid repeated 
> scheduling, there is still a small chance that the compaction has already been archived.





[PR] [MINOR] Hudi CLI version command output empty string [hudi]

2024-04-07 Thread via GitHub


pt657407064 opened a new pull request, #10973:
URL: https://github.com/apache/hudi/pull/10973

   ### Change Logs
   The Hudi CLI `version` command outputs an empty string. This change adds a 
property file so that the command prints the version number of the Hudi parent project. 
   
   ### Impact
   An empty version string confuses users, who cannot tell which version they are running.
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change. If not, put "none"._
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[I] [SUPPORT] [hudi]

2024-04-07 Thread via GitHub


MrAladdin opened a new issue, #10972:
URL: https://github.com/apache/hudi/issues/10972

   **Describe the problem you faced**
   
   Spark Structured Streaming upserts into Hudi (MOR, RECORD_INDEX) are very time 
consuming:
   1. The number of tasks in each distinct stage of "building workload profile" is 
always 60, and there is severe data skew.
   
   I want to know why it is always 60, how to adjust it, what causes the data skew, 
and how to optimize it.
   I have done my best.
   
   
   **Environment Description**
   
   * Hudi version :0.14.1
   
   * Spark version :3.4.1
   
   * Hive version :3.1.2
   
   * Hadoop version :3.1.3
   
   * Storage (HDFS/S3/GCS..) :hdfs
   
   * Running on Docker? (yes/no) :no
   





Re: [PR] [DOCS] Update blogs [hudi]

2024-04-07 Thread via GitHub


bhasudha commented on PR #10971:
URL: https://github.com/apache/hudi/pull/10971#issuecomment-2041639002

   Tested locally. The images won't be loaded until the site is published. 
   
   https://github.com/apache/hudi/assets/2179254/c15c0646-cea5-47dd-bc20-b3d4224a9845
   https://github.com/apache/hudi/assets/2179254/78cef6c8-1075-49d8-8282-d0d0273ce87d
   
   





[PR] [DOCS] Update blogs [hudi]

2024-04-07 Thread via GitHub


bhasudha opened a new pull request, #10971:
URL: https://github.com/apache/hudi/pull/10971

   ### Change Logs
   
   added new blogs to site
   
   ### Impact
   
   low. site updates
   
   ### Risk level (write none, low medium or high below)
   
   none. site updates
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change. If not, put "none"._
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Created] (HUDI-7576) Add partitionPath to the HoodieBaseFile and HoodieLogFile objects

2024-04-07 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-7576:
---

 Summary: Add partitionPath to the HoodieBaseFile and HoodieLogFile 
objects
 Key: HUDI-7576
 URL: https://issues.apache.org/jira/browse/HUDI-7576
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Timothy Brown


Adding this field to the classes will allow us to avoid repeatedly computing 
the partition path per file in other parts of the code. This can cut down on 
the CPU overhead associated with creating the FS View.
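
As a rough illustration of the idea, the sketch below contrasts deriving the partition path from a full file path with storing it on the object at construction time. It is plain Java under stated assumptions: `PartitionPathSketch`, `FileStub`, and `relativePartitionPath` are hypothetical names, and paths are assumed to be `/`-separated strings under a common base path.

```java
// Hypothetical illustration only; these are not Hudi classes.
final class PartitionPathSketch {

  // Derives the partition path from the full file path. Doing this on every
  // access repeats the same string work for each file.
  static String relativePartitionPath(String basePath, String filePath) {
    String parent = filePath.substring(0, filePath.lastIndexOf('/'));
    if (parent.equals(basePath)) {
      return ""; // non-partitioned table: file sits directly under the base path
    }
    return parent.substring(basePath.length() + 1);
  }

  // Stand-in for a base file or log file that carries its partition path,
  // so callers read the field instead of recomputing it.
  static final class FileStub {
    final String path;
    final String partitionPath;

    FileStub(String path, String partitionPath) {
      this.path = path;
      this.partitionPath = partitionPath;
    }
  }
}
```

Code that already knows the partition while listing files can pass it to the constructor once, and later lookups become a field read.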





[jira] [Created] (HUDI-7575) Avoid recomputing list of pending replacecommits in FSView code

2024-04-07 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-7575:
---

 Summary: Avoid recomputing list of pending replacecommits in 
FSView code
 Key: HUDI-7575
 URL: https://issues.apache.org/jira/browse/HUDI-7575
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Timothy Brown


When checking if a base file is part of a pending clustering, the code will 
construct the same list repeatedly leading to unnecessary overhead. The class 
should gather this list once and persist it.
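
One way to picture "gather this list once and persist it" is a lazily initialized set. The sketch below is plain Java with hypothetical names (`PendingClusteringSketch`, `isPendingClustering`, and the `loader` supplier standing in for whatever currently rebuilds the list); it is not the FSView code.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical illustration only; this is not a Hudi class.
final class PendingClusteringSketch {
  private final Supplier<Set<String>> loader; // assumed expensive lookup
  private Set<String> pendingFileIds;         // cached after the first call

  PendingClusteringSketch(Supplier<Set<String>> loader) {
    this.loader = loader;
  }

  // Loads the pending file ids on the first call only; later calls reuse
  // the cached set instead of reconstructing it per base file.
  synchronized boolean isPendingClustering(String fileId) {
    if (pendingFileIds == null) {
      pendingFileIds = new HashSet<>(loader.get());
    }
    return pendingFileIds.contains(fileId);
  }
}
```

A real implementation would also need to decide when the cached set becomes stale, for example when the timeline is reloaded; the sketch leaves that out.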





[jira] [Assigned] (HUDI-7576) Add partitionPath to the HoodieBaseFile and HoodieLogFile objects

2024-04-07 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown reassigned HUDI-7576:
---

Assignee: Timothy Brown

> Add partitionPath to the HoodieBaseFile and HoodieLogFile objects
> -
>
> Key: HUDI-7576
> URL: https://issues.apache.org/jira/browse/HUDI-7576
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Major
>
> Adding this field to the classes will allow us to avoid repeatedly computing 
> the partition path per file in other parts of the code. This can cut down on 
> the CPU overhead associated with creating the FS View.





[jira] [Assigned] (HUDI-7575) Avoid recomputing list of pending replacecommits in FSView code

2024-04-07 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown reassigned HUDI-7575:
---

Assignee: Timothy Brown

> Avoid recomputing list of pending replacecommits in FSView code
> ---
>
> Key: HUDI-7575
> URL: https://issues.apache.org/jira/browse/HUDI-7575
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Major
>
> When checking if a base file is part of a pending clustering, the code will 
> construct the same list repeatedly leading to unnecessary overhead. The class 
> should gather this list once and persist it.





Re: [PR] [HUDI-6441] Passing custom Headers with Hudi Callback URL [hudi]

2024-04-07 Thread via GitHub


hudi-bot commented on PR #10970:
URL: https://github.com/apache/hudi/pull/10970#issuecomment-2041536022

   
   ## CI report:
   
   * 1b65081255315b4c5129b2d5ccea4c097ca15649 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23137)
 
   
   



Re: [PR] [HUDI-7391] HoodieMetadataMetrics should use Metrics instance for metrics registry [hudi]

2024-04-07 Thread via GitHub


hudi-bot commented on PR #10635:
URL: https://github.com/apache/hudi/pull/10635#issuecomment-2041535838

   
   ## CI report:
   
   * a6b4e7f80ed04f25241504c833f9b85b4331f1fd Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23138)
 
   
   



Re: [PR] [HUDI-7391] HoodieMetadataMetrics should use Metrics instance for metrics registry [hudi]

2024-04-07 Thread via GitHub


hudi-bot commented on PR #10635:
URL: https://github.com/apache/hudi/pull/10635#issuecomment-2041523639

   
   ## CI report:
   
   * 52eacd02a772c9a06d92784c1b325e6ac0f66da9 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22412)
 
   * a6b4e7f80ed04f25241504c833f9b85b4331f1fd Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23138)
 
   
   



Re: [PR] [HUDI-7391] HoodieMetadataMetrics should use Metrics instance for metrics registry [hudi]

2024-04-07 Thread via GitHub


hudi-bot commented on PR #10635:
URL: https://github.com/apache/hudi/pull/10635#issuecomment-2041521846

   
   ## CI report:
   
   * 52eacd02a772c9a06d92784c1b325e6ac0f66da9 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22412)
 
   * a6b4e7f80ed04f25241504c833f9b85b4331f1fd UNKNOWN
   
   



Re: [PR] [HUDI-6441] Passing custom Headers with Hudi Callback URL [hudi]

2024-04-07 Thread via GitHub


hudi-bot commented on PR #10970:
URL: https://github.com/apache/hudi/pull/10970#issuecomment-2041508051

   
   ## CI report:
   
   * 1b65081255315b4c5129b2d5ccea4c097ca15649 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23137)
 
   
   



Re: [PR] [HUDI-6441] Passing custom Headers with Hudi Callback URL [hudi]

2024-04-07 Thread via GitHub


hudi-bot commented on PR #10970:
URL: https://github.com/apache/hudi/pull/10970#issuecomment-2041505684

   
   ## CI report:
   
   * 1b65081255315b4c5129b2d5ccea4c097ca15649 UNKNOWN
   
   



[PR] [HUDI-6441] Passing custom Headers with Hudi Callback URL [hudi]

2024-04-07 Thread via GitHub


wombatu-kun opened a new pull request, #10970:
URL: https://github.com/apache/hudi/pull/10970

   ### Change Logs
   
   Hudi callback URLs do not currently support passing custom headers.  
   
   Implemented a way to pass them and use them in the callback request:  
   - added config param `hoodie.write.commit.callback.http.custom.headers` to 
HoodieWriteConfig (HoodieWriteCommitCallbackConfig);
   - in this config param the user can set all custom headers in the form: 
`header_name1:value 1;header_name2:value2`;  
   - this string is parsed and sent as HTTP headers with the callback request.
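
The parsing step in the last bullet could look roughly like the sketch below. It is not the parser from this PR: `CallbackHeaderSketch` and `parseHeaders` are hypothetical names, and the handling of whitespace and malformed entries is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; this is not the PR's implementation.
final class CallbackHeaderSketch {

  // Parses "name1:value1;name2:value2" into an insertion-ordered map.
  // Each entry is split on its first ':' so values may contain colons.
  static Map<String, String> parseHeaders(String raw) {
    Map<String, String> headers = new LinkedHashMap<>();
    if (raw == null || raw.trim().isEmpty()) {
      return headers;
    }
    for (String pair : raw.split(";")) {
      int idx = pair.indexOf(':');
      if (idx <= 0) {
        continue; // assumed policy: skip entries without a header name
      }
      String name = pair.substring(0, idx).trim();
      String value = pair.substring(idx + 1).trim();
      if (!name.isEmpty()) {
        headers.put(name, value);
      }
    }
    return headers;
  }
}
```

For the example string in the description, this yields two entries, `header_name1` mapped to `value 1` and `header_name2` mapped to `value2`. A value that itself contains `;` cannot be expressed in this format, which may be worth noting in the config documentation.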
   
   ### Impact
   
   none
   
   ### Risk level (write none, low medium or high below)
   
   none
   
   ### Documentation Update
   
   Documentation update: add config property 
`hoodie.write.commit.callback.http.custom.headers`
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Updated] (HUDI-6441) Passing custom Headers with Hudi Callback URL

2024-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-6441:
-
Labels: pull-request-available  (was: )

> Passing custom Headers with Hudi Callback URL
> -
>
> Key: HUDI-6441
> URL: https://issues.apache.org/jira/browse/HUDI-6441
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: writer-core
>Reporter: Aditya Goenka
>Assignee: Vova Kolmakov
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.1.0, 0.15.0
>
>
> Hudi callback URLs do not currently support passing custom headers. 
> Implement a way to pass them and use them for the callback.
> Github Issue - [https://github.com/apache/hudi/issues/8834]





Re: [PR] [HUDI-7395] Fix computation for metrics in HoodieMetadataMetrics [hudi]

2024-04-07 Thread via GitHub


nsivabalan commented on PR #10641:
URL: https://github.com/apache/hudi/pull/10641#issuecomment-2041490421

   Reviewed the last 2 commits. LGTM.





Re: [PR] [HUDI-7391] HoodieMetadataMetrics should use Metrics instance for metrics registry [hudi]

2024-04-07 Thread via GitHub


nsivabalan commented on code in PR #10635:
URL: https://github.com/apache/hudi/pull/10635#discussion_r1554986358


##
hudi-common/src/main/java/org/apache/hudi/metadata/BaseTableMetadata.java:
##
@@ -97,7 +97,7 @@ protected BaseTableMetadata(HoodieEngineContext 
engineContext, HoodieMetadataCon
 this.isMetadataTableInitialized = 
dataMetaClient.getTableConfig().isMetadataTableAvailable();
 
 if (metadataConfig.enableMetrics()) {
-  this.metrics = Option.of(new 
HoodieMetadataMetrics(Registry.getRegistry("HoodieMetadata")));
+  this.metrics = Option.of(new 
HoodieMetadataMetrics(HoodieMetricsConfig.newBuilder().fromProperties(metadataConfig.getProps()).build()));

Review Comment:
   metadataConfig is not going to contain any metrics-related props, since this is 
on the reader side. 
   What we fixed in HoodieMetadataWriteUtils applies to the metadata writer, not 
the reader. 
   We need some fixes here; otherwise the metrics-related props may not be 
carried over to this code snippet.



##
hudi-common/src/main/java/org/apache/hudi/metadata/BaseTableMetadata.java:
##
@@ -97,7 +97,7 @@ protected BaseTableMetadata(HoodieEngineContext 
engineContext, HoodieMetadataCon
 this.isMetadataTableInitialized = 
dataMetaClient.getTableConfig().isMetadataTableAvailable();
 
 if (metadataConfig.enableMetrics()) {
-  this.metrics = Option.of(new 
HoodieMetadataMetrics(Registry.getRegistry("HoodieMetadata")));
+  this.metrics = Option.of(new 
HoodieMetadataMetrics(HoodieMetricsConfig.newBuilder().fromProperties(metadataConfig.getProps()).build()));

Review Comment:
   But I am not sure we can even get that, because the query engine is not 
going to set any writer props (e.g. metrics-related ones). So it's not 
feasible for us to instantiate this properly on the reader side :( 
   



##
hudi-common/src/main/java/org/apache/hudi/metrics/Metrics.java:
##
@@ -166,6 +169,17 @@ public void registerGauge(String metricName, final long 
value) {
 }
   }
 
+  public HoodieGauge registerGauge(String metricName) {
+try {

Review Comment:
   Why can't we call the other method here?
   
   ```
   registerGauge(metricName, 0L);
   ```
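
As a rough illustration of the delegation suggested above, the sketch below has the single-argument overload forward to the two-argument one with a default of 0. `GaugeRegistrySketch` and `SimpleGauge` are hypothetical stand-ins, not Hudi's `Metrics` or `HoodieGauge`, and the sketch assumes the two-argument overload can return the gauge (in the hunk above it returns `void`).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a mutable gauge; not Hudi's HoodieGauge.
final class SimpleGauge {
  private final AtomicLong value;

  SimpleGauge(long initial) {
    this.value = new AtomicLong(initial);
  }

  void setValue(long newValue) {
    value.set(newValue);
  }

  long getValue() {
    return value.get();
  }
}

// Hypothetical registry showing one overload delegating to the other.
final class GaugeRegistrySketch {
  private final Map<String, SimpleGauge> gauges = new ConcurrentHashMap<>();

  // Registers the gauge if absent, then sets it to the given value.
  SimpleGauge registerGauge(String metricName, long value) {
    SimpleGauge gauge = gauges.computeIfAbsent(metricName, k -> new SimpleGauge(value));
    gauge.setValue(value);
    return gauge;
  }

  // Single-argument overload: delegate with a default value of 0.
  SimpleGauge registerGauge(String metricName) {
    return registerGauge(metricName, 0L);
  }
}
```

One caveat with this shape: calling the single-argument overload for a gauge that is already registered resets it to 0, which may or may not be the desired behavior.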
   



##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataWriteUtils.java:
##
@@ -200,6 +200,11 @@ public static HoodieWriteConfig createMetadataWriteConfig(
   builder.withProperties(datadogConfig.build().getProps());
   break;
 case PROMETHEUS:
+  HoodieMetricsPrometheusConfig prometheusConfig = 
HoodieMetricsPrometheusConfig.newBuilder()
+  .withPushgatewayLabels(writeConfig.getPushGatewayLabels())
+  .withPrometheusPortNum(writeConfig.getPrometheusPort()).build();

Review Comment:
   Why are we not setting other props like host, job name, etc.? 



##
hudi-common/src/main/java/org/apache/hudi/metrics/Metrics.java:
##
@@ -176,4 +190,16 @@ public static boolean isInitialized(String basePath) {
 }
 return false;
   }
+
+  /**
+   * Use the same base path as the hudi table so that Metrics instance is 
shared.
+   */
+  private static String getBasePath(HoodieMetricsConfig metricsConfig) {
+String basePath = metricsConfig.getBasePath();
+if (basePath.endsWith(HoodieTableMetaClient.METADATA_TABLE_FOLDER_PATH)) {

Review Comment:
   Can we introduce a utility for deducing whether a path is the metadata table? 
   Btw, we should check that the entire dir name matches "metadata", not just 
that the path ends with it. We could possibly have a table named 
"customer_metadata" or something of that sort, and the above check could 
actually match that table's path. 
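
A minimal sketch of such a utility, comparing whole path segments instead of using `endsWith`. The class and method names are hypothetical, and the folder names `.hoodie` and `metadata` are assumptions about the table layout rather than values read from `HoodieTableMetaClient`.

```java
// Hypothetical helper; the folder names below are assumptions for illustration.
final class MetadataPathSketch {
  private static final String META_FOLDER = ".hoodie";      // assumed meta folder name
  private static final String METADATA_FOLDER = "metadata"; // assumed metadata table folder name

  // True only when the last two path segments are exactly ".hoodie" and "metadata".
  static boolean isMetadataTablePath(String basePath) {
    // Drop trailing slashes so "/a/.hoodie/metadata/" is treated like "/a/.hoodie/metadata".
    String trimmed = basePath.replaceAll("/+$", "");
    String[] segments = trimmed.split("/");
    int n = segments.length;
    return n >= 2
        && METADATA_FOLDER.equals(segments[n - 1])
        && META_FOLDER.equals(segments[n - 2]);
  }

  public static void main(String[] args) {
    System.out.println(isMetadataTablePath("/data/orders/.hoodie/metadata"));
    System.out.println(isMetadataTablePath("/data/customer_metadata"));
    System.out.println(isMetadataTablePath("/data/tbl/my.hoodie/metadata"));
  }
}
```

Matching on segments means a directory whose name merely ends with the expected suffix, such as `my.hoodie/metadata` or `customer_metadata`, is not mistaken for the metadata table.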






Re: [PR] [HUDI-7395] Fix computation for metrics in HoodieMetadataMetrics [hudi]

2024-04-07 Thread via GitHub


nsivabalan commented on PR #10641:
URL: https://github.com/apache/hudi/pull/10641#issuecomment-2041483567

   I guess this is stacked on top of #10635. Can you add a link in the PR 
description to the actual diff to review for this patch (ignoring the stacked 
PR changes)? 
   If I am not wrong, 
https://github.com/apache/hudi/pull/10641/files/adc183a351b8f15d671c0c6eefd1f999bed54774..fc072259ead8a9870a1b26b5aceb7882aabebb32
 
   is the right link (last 2 commits). 
   
   





Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-04-07 Thread via GitHub


hudi-bot commented on PR #10860:
URL: https://github.com/apache/hudi/pull/10860#issuecomment-2041474216

   
   ## CI report:
   
   * bbfbe38b86b5bd11972591a346ac9b847a7daa6a Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23136)
 
   
   
   





Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-04-07 Thread via GitHub


hudi-bot commented on PR #10860:
URL: https://github.com/apache/hudi/pull/10860#issuecomment-2041454934

   
   ## CI report:
   
   * dbda44942240ecdf008df975aa15d58eaaa45a33 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23098)
 
   * bbfbe38b86b5bd11972591a346ac9b847a7daa6a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23136)
 
   
   
   





Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-04-07 Thread via GitHub


hudi-bot commented on PR #10860:
URL: https://github.com/apache/hudi/pull/10860#issuecomment-2041453013

   
   ## CI report:
   
   * dbda44942240ecdf008df975aa15d58eaaa45a33 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23098)
 
   * bbfbe38b86b5bd11972591a346ac9b847a7daa6a UNKNOWN
   
   
   





Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-04-07 Thread via GitHub


codope commented on code in PR #10860:
URL: https://github.com/apache/hudi/pull/10860#discussion_r1554955230


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -991,9 +1001,9 @@ private void 
updateFunctionalIndexIfPresent(HoodieCommitMetadata commitMetadata,
   private HoodieData 
getFunctionalIndexUpdates(HoodieCommitMetadata commitMetadata, String 
indexPartition, String instantTime) throws Exception {
 HoodieFunctionalIndexDefinition indexDefinition = 
getFunctionalIndexDefinition(indexPartition);
 List> partitionFileSlicePairs = new ArrayList<>();
-HoodieTableFileSystemView fsView = 
HoodieTableMetadataUtil.getFileSystemView(metadataMetaClient);
+HoodieTableFileSystemView fsView = 
HoodieTableMetadataUtil.getFileSystemView(dataMetaClient);
 commitMetadata.getPartitionToWriteStats().forEach((dataPartition, value) 
-> {
-  List fileSlices = 
getPartitionLatestFileSlicesIncludingInflight(metadataMetaClient, 
Option.ofNullable(fsView), dataPartition);
+  List fileSlices = 
getPartitionLatestFileSlicesIncludingInflight(dataMetaClient, 
Option.ofNullable(fsView), dataPartition);

Review Comment:
   Not following you here. In a multi-writer scenario, suppose there is a new 
latest file slice due to instant t1, but `dataMetaClient` had already been 
initialized before t1 (say up to instant t0). Then the index will only be 
updated up to t0. In this case, 
`getPartitionLatestFileSlicesIncludingInflight` will only return file slices 
up to t0. The fsView API will return file slices from a consistent snapshot.
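
The t0/t1 scenario can be modeled with a small, simplified sketch. `TimelineSketch` and `SnapshotViewSketch` are hypothetical classes written for illustration; they are not Hudi's timeline, meta client, or file system view, and they only show the general idea that a view built at t0 does not observe a commit made at t1.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model of a table timeline; not Hudi's API.
final class TimelineSketch {
  private final List<String> completedInstants = new ArrayList<>();

  synchronized void commit(String instant) {
    completedInstants.add(instant);
  }

  // Returns an immutable copy of the instants completed so far.
  synchronized List<String> snapshot() {
    return Collections.unmodifiableList(new ArrayList<>(completedInstants));
  }
}

// Hypothetical view that sees only what existed when it was created.
final class SnapshotViewSketch {
  private final List<String> visibleInstants;

  SnapshotViewSketch(TimelineSketch timeline) {
    this.visibleInstants = timeline.snapshot();
  }

  // Latest instant visible to this view, or null if none were completed.
  String latestVisibleInstant() {
    return visibleInstants.isEmpty() ? null : visibleInstants.get(visibleInstants.size() - 1);
  }

  public static void main(String[] args) {
    TimelineSketch timeline = new TimelineSketch();
    timeline.commit("t0");
    SnapshotViewSketch view = new SnapshotViewSketch(timeline); // initialized up to t0
    timeline.commit("t1");                                      // another writer commits
    System.out.println(view.latestVisibleInstant());
    System.out.println(new SnapshotViewSketch(timeline).latestVisibleInstant());
  }
}
```

In this model, anything derived from the first view stays consistent with t0 until a new view is created after t1.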






Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-04-07 Thread via GitHub


codope commented on code in PR #10860:
URL: https://github.com/apache/hudi/pull/10860#discussion_r1554954192


##
hudi-common/src/main/java/org/apache/hudi/common/table/view/TableFileSystemView.java:
##
@@ -107,6 +107,14 @@ interface SliceViewWithLatestSlice {
  */
 Stream getLatestFileSlices(String partitionPath);
 
+/**
+ * Get the latest file slices for a given partition including the inflight 
ones.
+ *
+ * @param partitionPath The partition path of interest
+ * @return Stream of latest {@link FileSlice} in the partition path.
+ */
+Stream getLatestFileSlicesIncludingInflight(String 
partitionPath);
+

Review Comment:
   No, we don't need it there. This is an uplevel of an existing API. 






Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-04-07 Thread via GitHub


codope commented on code in PR #10860:
URL: https://github.com/apache/hudi/pull/10860#discussion_r1554953956


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -434,7 +433,12 @@ private boolean initializeFromFilesystem(String 
initializationTime, List functionalIndexPartitionsToInit = 
getFunctionalIndexPartitionsToInit();
+if (functionalIndexPartitionsToInit.isEmpty()) {
+  continue;

Review Comment:
   Adding to what Vinay said, going forward we will have more indexes where 
the index type and index name (MDT partition name) differ, such as the 
secondary index. I think we should get rid of `MetadataPartitionType`. This 
also enables removing the `MetadataRecordsGenerationParams` POJO, which is 
deprecated.






[jira] [Assigned] (HUDI-6441) Passing custom Headers with Hudi Callback URL

2024-04-07 Thread Vova Kolmakov (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vova Kolmakov reassigned HUDI-6441:
---

Assignee: Vova Kolmakov

> Passing custom Headers with Hudi Callback URL
> -
>
> Key: HUDI-6441
> URL: https://issues.apache.org/jira/browse/HUDI-6441
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: writer-core
>Reporter: Aditya Goenka
>Assignee: Vova Kolmakov
>Priority: Major
> Fix For: 1.1.0, 0.15.0
>
>
> Hudi callback URLs don't support passing custom headers as of now. 
> Implement a way to pass them and use them for the callback.
> Github Issue - [https://github.com/apache/hudi/issues/8834]





[jira] [Updated] (HUDI-6441) Passing custom Headers with Hudi Callback URL

2024-04-07 Thread Vova Kolmakov (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vova Kolmakov updated HUDI-6441:

Status: In Progress  (was: Open)

> Passing custom Headers with Hudi Callback URL
> -
>
> Key: HUDI-6441
> URL: https://issues.apache.org/jira/browse/HUDI-6441
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: writer-core
>Reporter: Aditya Goenka
>Assignee: Vova Kolmakov
>Priority: Major
> Fix For: 1.1.0, 0.15.0
>
>
> Hudi callback URLs don't support passing custom headers as of now. 
> Implement a way to pass them and use them for the callback.
> Github Issue - [https://github.com/apache/hudi/issues/8834]


