Re: [PR] [HUDI-7720] Fix HoodieTableFileSystemView NPE in fetchAllStoredFileGroups [hudi]

2024-05-06 Thread via GitHub


hudi-bot commented on PR #11161:
URL: https://github.com/apache/hudi/pull/11161#issuecomment-2097554582

   
   ## CI report:
   
   * 920a8f421b2d6650c8b2451af2038346a2906343 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23725)
 
   * 9b7057d990f1d6de45a27c3f6b47b467c9007c5b Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23732)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7704] Unify test client storage classes with duplicate code [hudi]

2024-05-06 Thread via GitHub


hudi-bot commented on PR #11152:
URL: https://github.com/apache/hudi/pull/11152#issuecomment-2097554439

   
   ## CI report:
   
   * e96cd9ce1f546e881806dbce71ff178ee89bc0f3 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23722)
 
   * 7aee0f69c8f6bbdb8f3c070ea6549d646d50fc51 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23731)
 
   
   



Re: [PR] [HUDI-7146] Implement secondary index write path [hudi]

2024-05-06 Thread via GitHub


hudi-bot commented on PR #11146:
URL: https://github.com/apache/hudi/pull/11146#issuecomment-2097554344

   
   ## CI report:
   
   * f232b46fcd23d960efc587a624c2e9d69d3d7e9e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23728)
 
   
   



Re: [PR] [HUDI-7522] Support find out the conflict instants in bucket partition when bucket id multiple [hudi]

2024-05-06 Thread via GitHub


hudi-bot commented on PR #10898:
URL: https://github.com/apache/hudi/pull/10898#issuecomment-2097553714

   
   ## CI report:
   
   * e9fc630d3a8999c7ef0db7bd94da910b1f77df7d UNKNOWN
   * b7011691a07deb288ce0341dcd55bb6feeb4101d UNKNOWN
   * a387882bde246df2a81e8aca30c63835180791c9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23718)
 
   * 1b9d39facd9186697fb56e1e2e79ab3b3ded4ce5 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23729)
 
   
   



Re: [PR] [HUDI-7720] Fix HoodieTableFileSystemView NPE in fetchAllStoredFileGroups [hudi]

2024-05-06 Thread via GitHub


hudi-bot commented on PR #11161:
URL: https://github.com/apache/hudi/pull/11161#issuecomment-2097544407

   
   ## CI report:
   
   * 920a8f421b2d6650c8b2451af2038346a2906343 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23725)
 
   * 9b7057d990f1d6de45a27c3f6b47b467c9007c5b UNKNOWN
   
   



Re: [PR] [HUDI-7146] Integrate secondary index on reader path [hudi]

2024-05-06 Thread via GitHub


hudi-bot commented on PR #11162:
URL: https://github.com/apache/hudi/pull/11162#issuecomment-2097544471

   
   ## CI report:
   
   * 7952ad87456d56a737eb43813807d64551ddc02b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23712)
 
   * 8e50a635f9dad16057b0c35121716712856a8511 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23730)
 
   
   



Re: [PR] [HUDI-7704] Unify test client storage classes with duplicate code [hudi]

2024-05-06 Thread via GitHub


hudi-bot commented on PR #11152:
URL: https://github.com/apache/hudi/pull/11152#issuecomment-2097544272

   
   ## CI report:
   
   * e96cd9ce1f546e881806dbce71ff178ee89bc0f3 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23722)
 
   * 7aee0f69c8f6bbdb8f3c070ea6549d646d50fc51 UNKNOWN
   
   



Re: [PR] [HUDI-7522] Support find out the conflict instants in bucket partition when bucket id multiple [hudi]

2024-05-06 Thread via GitHub


hudi-bot commented on PR #10898:
URL: https://github.com/apache/hudi/pull/10898#issuecomment-2097543745

   
   ## CI report:
   
   * e9fc630d3a8999c7ef0db7bd94da910b1f77df7d UNKNOWN
   * b7011691a07deb288ce0341dcd55bb6feeb4101d UNKNOWN
   * a387882bde246df2a81e8aca30c63835180791c9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23718)
 
   * 1b9d39facd9186697fb56e1e2e79ab3b3ded4ce5 UNKNOWN
   
   



Re: [PR] [HUDI-7146] Integrate secondary index on reader path [hudi]

2024-05-06 Thread via GitHub


hudi-bot commented on PR #11162:
URL: https://github.com/apache/hudi/pull/11162#issuecomment-2097535475

   
   ## CI report:
   
   * 7952ad87456d56a737eb43813807d64551ddc02b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23712)
 
   * 8e50a635f9dad16057b0c35121716712856a8511 UNKNOWN
   
   



Re: [PR] [HUDI-7146] Implement secondary index write path [hudi]

2024-05-06 Thread via GitHub


hudi-bot commented on PR #11146:
URL: https://github.com/apache/hudi/pull/11146#issuecomment-2097485263

   
   ## CI report:
   
   * 1468a9a72c2cfdda7dae6bb62ac14551f600d8df Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23711)
 
   * f232b46fcd23d960efc587a624c2e9d69d3d7e9e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23728)
 
   
   



Re: [PR] [HUDI-7652] Add new `HoodieMergeKey` API to support simple and composite keys [hudi]

2024-05-06 Thread via GitHub


hudi-bot commented on PR #11077:
URL: https://github.com/apache/hudi/pull/11077#issuecomment-2097485135

   
   ## CI report:
   
   * 0eff97cd517ed728a93eea9e8aaca05e6eb72650 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23708)
 
   * cc6fbe0c42aa3052994c8b19f5a60e8b2dab83ef Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23727)
 
   
   



Re: [PR] [HUDI-7652] Add new `HoodieMergeKey` API to support simple and composite keys [hudi]

2024-05-06 Thread via GitHub


hudi-bot commented on PR #11077:
URL: https://github.com/apache/hudi/pull/11077#issuecomment-2097476481

   
   ## CI report:
   
   * 0eff97cd517ed728a93eea9e8aaca05e6eb72650 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23708)
 
   * cc6fbe0c42aa3052994c8b19f5a60e8b2dab83ef UNKNOWN
   
   



Re: [PR] [HUDI-7146] Implement secondary index write path [hudi]

2024-05-06 Thread via GitHub


hudi-bot commented on PR #11146:
URL: https://github.com/apache/hudi/pull/11146#issuecomment-2097476674

   
   ## CI report:
   
   * 1468a9a72c2cfdda7dae6bb62ac14551f600d8df Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23711)
 
   * f232b46fcd23d960efc587a624c2e9d69d3d7e9e UNKNOWN
   
   



Re: [PR] [HUDI-7587] Make bundle dependencies for storage abstraction in correct order [hudi]

2024-05-06 Thread via GitHub


hudi-bot commented on PR #11131:
URL: https://github.com/apache/hudi/pull/11131#issuecomment-2097467747

   
   ## CI report:
   
   * 7c72471a1b9b5ad43ca63ab60da0f3d260f67cea Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23721)
 
   
   



Re: [PR] [HUDI-7720] Fix HoodieTableFileSystemView NPE in fetchAllStoredFileGroups [hudi]

2024-05-06 Thread via GitHub


hudi-bot commented on PR #11161:
URL: https://github.com/apache/hudi/pull/11161#issuecomment-2097467988

   
   ## CI report:
   
   * 920a8f421b2d6650c8b2451af2038346a2906343 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23725)
 
   
   



Re: [PR] [HUDI-7704] Unify test client storage classes with duplicate code [hudi]

2024-05-06 Thread via GitHub


hudi-bot commented on PR #11152:
URL: https://github.com/apache/hudi/pull/11152#issuecomment-2097467849

   
   ## CI report:
   
   * e96cd9ce1f546e881806dbce71ff178ee89bc0f3 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23722)
 
   
   



[jira] [Closed] (HUDI-7721) Fix broken build on master

2024-05-06 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit closed HUDI-7721.
-
Fix Version/s: 1.0.0
   Resolution: Fixed

> Fix broken build on master
> --
>
> Key: HUDI-7721
> URL: https://issues.apache.org/jira/browse/HUDI-7721
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> TestHoodieDeltaStreamer is invalid due to 
> https://github.com/apache/hudi/pull/11099.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-7721] Fix broken build on master [hudi]

2024-05-06 Thread via GitHub


danny0405 commented on PR #11164:
URL: https://github.com/apache/hudi/pull/11164#issuecomment-2097430511

   cc @codope to land it first, because several Travis builds have succeeded, to unblock 
the master branch as fast as possible.





(hudi) branch master updated (fdb94192508 -> c359ecc971a)

2024-05-06 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from fdb94192508 [HUDI-7715] Partition TTL for Flink (#11156)
 add c359ecc971a [HUDI-7721] Fix broken build on master (#11164)

No new revisions were added by this update.

Summary of changes:
 .../apache/hudi/utilities/deltastreamer/TestHoodieDeltaStreamer.java| 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)



Re: [PR] [HUDI-7720] Fix HoodieTableFileSystemView NPE in fetchAllStoredFileGroups [hudi]

2024-05-06 Thread via GitHub


hudi-bot commented on PR #11161:
URL: https://github.com/apache/hudi/pull/11161#issuecomment-2097430015

   
   ## CI report:
   
   * 50008938bd209ca6dedc42f7fa616d7df952df4b Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23703)
 
   * 920a8f421b2d6650c8b2451af2038346a2906343 UNKNOWN
   
   



Re: [PR] [HUDI-7721] Fix broken build on master [hudi]

2024-05-06 Thread via GitHub


danny0405 merged PR #11164:
URL: https://github.com/apache/hudi/pull/11164





Re: [PR] [HUDI-7720] Fix HoodieTableFileSystemView NPE in fetchAllStoredFileGroups [hudi]

2024-05-06 Thread via GitHub


danny0405 commented on code in PR #11161:
URL: https://github.com/apache/hudi/pull/11161#discussion_r1591700013


##
hudi-common/src/main/java/org/apache/hudi/common/table/view/HoodieTableFileSystemView.java:
##
@@ -307,6 +307,10 @@ void 
removeFileGroupsInPendingClustering(Stream fetchAllStoredFileGroups(String partition) {
+if (!isPartitionAvailableInStore(partition)) {

Review Comment:
   We can call `partitionToFileGroupsMap.get(partition)` first and then check whether the 
result is null or empty, which eliminates an additional lookup in the cache.
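
   The suggestion above can be sketched roughly as follows. This is a minimal, 
self-contained illustration and not the code in the PR: the class 
`FileGroupStoreSketch`, the `put` helper, the `String` value type, and the choice to 
return an empty stream for a missing partition are all assumptions made for the example. 
Only the field name `partitionToFileGroupsMap` and the method name 
`fetchAllStoredFileGroups` come from the review thread.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

// Hypothetical stand-in for the file-system view; not Hudi's actual class.
final class FileGroupStoreSketch {
  // Assumed shape: partition path -> file group ids (plain strings for illustration).
  private final Map<String, List<String>> partitionToFileGroupsMap = new HashMap<>();

  // Hypothetical helper used only to populate the sketch.
  void put(String partition, List<String> fileGroups) {
    partitionToFileGroupsMap.put(partition, fileGroups);
  }

  // Single map lookup: fetch first, then treat null or empty as "nothing stored".
  Stream<String> fetchAllStoredFileGroups(String partition) {
    List<String> fileGroups = partitionToFileGroupsMap.get(partition);
    if (fileGroups == null || fileGroups.isEmpty()) {
      return Stream.empty();
    }
    return fileGroups.stream();
  }
}
```

   Compared with calling an availability check and then `get`, this touches the map once 
per call, which is the saving the comment refers to.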






Re: [PR] [HUDI-7404] Bloom execution improvements [hudi]

2024-05-06 Thread via GitHub


hudi-bot commented on PR #10578:
URL: https://github.com/apache/hudi/pull/10578#issuecomment-2097429242

   
   ## CI report:
   
   * 86a6e24f202a76c316086b59fc69308c57631b4e UNKNOWN
   * 68bf61a85db16d50aa0663be7652874baf30489c Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23720)
 
   
   



Re: [I] [SUPPORT]Flink Streaming Read hudi table which is in clustering,encounterd file not exists. [hudi]

2024-05-06 Thread via GitHub


danny0405 commented on issue #11090:
URL: https://github.com/apache/hudi/issues/11090#issuecomment-2097428203

   Are you enabling the clustering then? The clustering would rewrite all the 
partitions.
   
   > I think increasing the parameters of retention cleanup will probably 
generate more files
   
   The small files are not affected by the cleaning strategy.





[jira] [Updated] (HUDI-7722) Add a GH CI check on the PR branch age

2024-05-06 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7722:

Description: 
We can add a CI check for the branch and make the merging protected based on 
the CI check, if the branch is older than a week, to prevent the build from 
breaking if the branch is not up to date.
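
The age test described above could look roughly like the sketch below. The ticket does 
not say how branch age is measured or how the check is wired into CI, so everything here 
is an assumption: the class `BranchAgeCheckSketch`, the method `isBranchTooOld`, and the 
idea of comparing the timestamp of the branch's base commit with the current time are 
hypothetical. Only the one-week threshold comes from the description.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper; not part of Hudi or of any CI tooling.
final class BranchAgeCheckSketch {
  // Threshold from the ticket text: "older than a week".
  static final Duration MAX_BRANCH_AGE = Duration.ofDays(7);

  // Returns true when the base commit is strictly older than the threshold.
  static boolean isBranchTooOld(Instant baseCommitTime, Instant now) {
    return Duration.between(baseCommitTime, now).compareTo(MAX_BRANCH_AGE) > 0;
  }
}
```

A CI job could fail the check when `isBranchTooOld` returns true, and merging could 
then be blocked on that check.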
 
 

> Add a GH CI check on the PR branch age
> --
>
> Key: HUDI-7722
> URL: https://issues.apache.org/jira/browse/HUDI-7722
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>
> We can add a CI check for the branch and make the merging protected based on 
> the CI check, if the branch is older than a week, to prevent the build from 
> breaking if the branch is not up to date.
>  
>  





[jira] [Updated] (HUDI-7722) Add a GH CI check on the PR branch age

2024-05-06 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7722:

Fix Version/s: 1.0.0

> Add a GH CI check on the PR branch age
> --
>
> Key: HUDI-7722
> URL: https://issues.apache.org/jira/browse/HUDI-7722
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>






[jira] [Assigned] (HUDI-7722) Add a GH CI check on the PR branch age

2024-05-06 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo reassigned HUDI-7722:
---

Assignee: Ethan Guo

> Add a GH CI check on the PR branch age
> --
>
> Key: HUDI-7722
> URL: https://issues.apache.org/jira/browse/HUDI-7722
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>






[jira] [Created] (HUDI-7722) Add a GH CI check on the PR branch age

2024-05-06 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-7722:
---

 Summary: Add a GH CI check on the PR branch age
 Key: HUDI-7722
 URL: https://issues.apache.org/jira/browse/HUDI-7722
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Ethan Guo








Re: [PR] [HUDI-7721] Fix broken build on master [hudi]

2024-05-06 Thread via GitHub


hudi-bot commented on PR #11164:
URL: https://github.com/apache/hudi/pull/11164#issuecomment-2097423449

   
   ## CI report:
   
   * 10292370450b75b41cd16f93072b3357e468ae85 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23723)
 
   * 9721fce3395b53fede64e56fa806e54ca55b75ed Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23724)
 
   
   



Re: [PR] [HUDI-7350] Create hudi io factory [hudi]

2024-05-06 Thread via GitHub


hudi-bot commented on PR #11163:
URL: https://github.com/apache/hudi/pull/11163#issuecomment-2097423416

   
   ## CI report:
   
   * 7376b451044473ce16aad09a1d356a9140442f9c UNKNOWN
   * d491f7ed864af5c291d365dcfe9392a5bbc8dd2d UNKNOWN
   * b7f31f230c12e285cd073bd619bd13f40df75c73 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23719)
 
   
   



Re: [PR] [HUDI-7652] Add new `HoodieMergeKey` API to support simple and composite keys [hudi]

2024-05-06 Thread via GitHub


hudi-bot commented on PR #11077:
URL: https://github.com/apache/hudi/pull/11077#issuecomment-2097423226

   
   ## CI report:
   
   * 0eff97cd517ed728a93eea9e8aaca05e6eb72650 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23708)
 
   
   



Re: [PR] [HUDI-7720] Fix HoodieTableFileSystemView NPE in fetchAllStoredFileGroups [hudi]

2024-05-06 Thread via GitHub


xuzifu666 commented on code in PR #11161:
URL: https://github.com/apache/hudi/pull/11161#discussion_r1591794126


##
hudi-common/src/main/java/org/apache/hudi/common/table/view/HoodieTableFileSystemView.java:
##
@@ -307,6 +307,10 @@ void 
removeFileGroupsInPendingClustering(Stream fetchAllStoredFileGroups(String partition) {
+if (!isPartitionAvailableInStore(partition)) {

Review Comment:
   I have changed it. Do you mean modifying it the way it is now? @danny0405 






Re: [PR] [HUDI-7721] Fix broken build on master [hudi]

2024-05-06 Thread via GitHub


hudi-bot commented on PR #11164:
URL: https://github.com/apache/hudi/pull/11164#issuecomment-2097417960

   
   ## CI report:
   
   * 10292370450b75b41cd16f93072b3357e468ae85 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23723)
 
   * 9721fce3395b53fede64e56fa806e54ca55b75ed UNKNOWN
   
   



Re: [PR] [HUDI-7652] Add new `HoodieMergeKey` API to support simple and composite keys [hudi]

2024-05-06 Thread via GitHub


hudi-bot commented on PR #11077:
URL: https://github.com/apache/hudi/pull/11077#issuecomment-2097417772

   
   ## CI report:
   
   * 0eff97cd517ed728a93eea9e8aaca05e6eb72650 UNKNOWN
   
   



Re: [PR] [MINOR] Fix CI ERROR in TestHoodieDeltaStreamer [hudi]

2024-05-06 Thread via GitHub


xuzifu666 closed pull request #11165: [MINOR] Fix CI ERROR in 
TestHoodieDeltaStreamer
URL: https://github.com/apache/hudi/pull/11165





[PR] [MINOR] Fix CI ERROR in TestHoodieDeltaStreamer [hudi]

2024-05-06 Thread via GitHub


xuzifu666 opened a new pull request, #11165:
URL: https://github.com/apache/hudi/pull/11165

   ### Change Logs
   
   CI fix
   ### Impact
   
   none
   ### Risk level (write none, low medium or high below)
   
   none
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change. If not, put "none"._
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





Re: [PR] [HUDI-7721] Fix broken build on master [hudi]

2024-05-06 Thread via GitHub


jonvex commented on code in PR #11164:
URL: https://github.com/apache/hudi/pull/11164#discussion_r1591775499


##
hudi-utilities/src/test/java/org/apache/hudi/utilities/deltastreamer/TestHoodieDeltaStreamer.java:
##
@@ -2827,7 +2828,7 @@ public void testAutoGenerateRecordKeys() throws Exception {
 deltaStreamer.sync();
 assertRecordCount(parquetRecordsCount, tableBasePath, sqlContext);
 // validate that auto record keys are enabled.
-HoodieTableMetaClient metaClient = HoodieTableMetaClient.builder().setBasePath(tableBasePath).setConf(jsc.hadoopConfiguration()).build();
+HoodieTableMetaClient metaClient = HoodieTableMetaClient.builder().setBasePath(tableBasePath).setConf(new HadoopStorageConfiguration(jsc.hadoopConfiguration())).build();

Review Comment:
   sure






Re: [PR] [HUDI-7721] Fix broken build on master [hudi]

2024-05-06 Thread via GitHub


codope commented on code in PR #11164:
URL: https://github.com/apache/hudi/pull/11164#discussion_r1591770204


##
hudi-utilities/src/test/java/org/apache/hudi/utilities/deltastreamer/TestHoodieDeltaStreamer.java:
##
@@ -2827,7 +2828,7 @@ public void testAutoGenerateRecordKeys() throws Exception 
{
 deltaStreamer.sync();
 assertRecordCount(parquetRecordsCount, tableBasePath, sqlContext);
 // validate that auto record keys are enabled.
-HoodieTableMetaClient metaClient = 
HoodieTableMetaClient.builder().setBasePath(tableBasePath).setConf(jsc.hadoopConfiguration()).build();
+HoodieTableMetaClient metaClient = 
HoodieTableMetaClient.builder().setBasePath(tableBasePath).setConf(new 
HadoopStorageConfiguration(jsc.hadoopConfiguration())).build();

Review Comment:
   Can't we use the `HoodieTestUtils.getDefaultStorageConf`?






Re: [PR] [HUDI-7721] Fix broken build on master [hudi]

2024-05-06 Thread via GitHub


hudi-bot commented on PR #11164:
URL: https://github.com/apache/hudi/pull/11164#issuecomment-2097379386

   
   ## CI report:
   
   * 10292370450b75b41cd16f93072b3357e468ae85 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23723)
 
   
   



Re: [PR] [HUDI-7704] Unify test client storage classes with duplicate code [hudi]

2024-05-06 Thread via GitHub


hudi-bot commented on PR #11152:
URL: https://github.com/apache/hudi/pull/11152#issuecomment-2097379271

   
   ## CI report:
   
   * dca94d809e6f517e82e7b4b41582995465c80676 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23701)
 
   * e96cd9ce1f546e881806dbce71ff178ee89bc0f3 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23722)
 
   
   



Re: [PR] [HUDI-7350] Create hudi io factory [hudi]

2024-05-06 Thread via GitHub


hudi-bot commented on PR #11163:
URL: https://github.com/apache/hudi/pull/11163#issuecomment-2097379359

   
   ## CI report:
   
   * 7376b451044473ce16aad09a1d356a9140442f9c UNKNOWN
   * d491f7ed864af5c291d365dcfe9392a5bbc8dd2d UNKNOWN
   * ea258fe4883b5612f52ce68a0c2c33ec2c0ef089 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23715)
 
   * b7f31f230c12e285cd073bd619bd13f40df75c73 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23719)
 
   
   



Re: [PR] [HUDI-7587] Make bundle dependencies for storage abstraction in correct order [hudi]

2024-05-06 Thread via GitHub


hudi-bot commented on PR #11131:
URL: https://github.com/apache/hudi/pull/11131#issuecomment-2097379202

   
   ## CI report:
   
   * 1a5e0d9b0b3bd73b7034ef290fbeaf6aa7a66441 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23705)
 
   * 7c72471a1b9b5ad43ca63ab60da0f3d260f67cea Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23721)
 
   
   



Re: [PR] [HUDI-7404] Bloom execution improvements [hudi]

2024-05-06 Thread via GitHub


hudi-bot commented on PR #10578:
URL: https://github.com/apache/hudi/pull/10578#issuecomment-2097378545

   
   ## CI report:
   
   * 86a6e24f202a76c316086b59fc69308c57631b4e UNKNOWN
   * 7f76ebca55ef148c1497786bb45bbf9c50ecdee6 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22416)
 
   * 68bf61a85db16d50aa0663be7652874baf30489c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23720)
 
   
   



Re: [PR] [HUDI-7721] Fix broken build on master [hudi]

2024-05-06 Thread via GitHub


hudi-bot commented on PR #11164:
URL: https://github.com/apache/hudi/pull/11164#issuecomment-2097373679

   
   ## CI report:
   
   * 10292370450b75b41cd16f93072b3357e468ae85 UNKNOWN
   
   



Re: [PR] [HUDI-7350] Create hudi io factory [hudi]

2024-05-06 Thread via GitHub


hudi-bot commented on PR #11163:
URL: https://github.com/apache/hudi/pull/11163#issuecomment-2097373636

   
   ## CI report:
   
   * 7376b451044473ce16aad09a1d356a9140442f9c UNKNOWN
   * d491f7ed864af5c291d365dcfe9392a5bbc8dd2d UNKNOWN
   * ea258fe4883b5612f52ce68a0c2c33ec2c0ef089 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23715)
 
   * b7f31f230c12e285cd073bd619bd13f40df75c73 UNKNOWN
   
   



Re: [PR] [HUDI-7704] Unify test client storage classes with duplicate code [hudi]

2024-05-06 Thread via GitHub


hudi-bot commented on PR #11152:
URL: https://github.com/apache/hudi/pull/11152#issuecomment-2097373536

   
   ## CI report:
   
   * dca94d809e6f517e82e7b4b41582995465c80676 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23701)
 
   * e96cd9ce1f546e881806dbce71ff178ee89bc0f3 UNKNOWN
   
   



Re: [PR] [HUDI-7587] Make bundle dependencies for storage abstraction in correct order [hudi]

2024-05-06 Thread via GitHub


hudi-bot commented on PR #11131:
URL: https://github.com/apache/hudi/pull/11131#issuecomment-2097373459

   
   ## CI report:
   
   * 1a5e0d9b0b3bd73b7034ef290fbeaf6aa7a66441 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23705)
 
   * 7c72471a1b9b5ad43ca63ab60da0f3d260f67cea UNKNOWN
   
   



Re: [PR] [HUDI-7404] Bloom execution improvements [hudi]

2024-05-06 Thread via GitHub


hudi-bot commented on PR #10578:
URL: https://github.com/apache/hudi/pull/10578#issuecomment-2097372846

   
   ## CI report:
   
   * 86a6e24f202a76c316086b59fc69308c57631b4e UNKNOWN
   * 7f76ebca55ef148c1497786bb45bbf9c50ecdee6 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22416)
 
   * 68bf61a85db16d50aa0663be7652874baf30489c UNKNOWN
   
   



Re: [PR] [HUDI-7713] Enforce ordering of fields during schema reconciliation [hudi]

2024-05-06 Thread via GitHub


hudi-bot commented on PR #11154:
URL: https://github.com/apache/hudi/pull/11154#issuecomment-2097367499

   
   ## CI report:
   
   * 12038dbde068e26f733a7b1c9cc7217019c31f25 UNKNOWN
   * fbb9dd5d64652ddec923dc7948f77adc61e823b3 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23717)
 
   
   



Re: [I] [SUPPORT]Flink Streaming Read hudi table which is in clustering,encounterd file not exists. [hudi]

2024-05-06 Thread via GitHub


weitianpei commented on issue #11090:
URL: https://github.com/apache/hudi/issues/11090#issuecomment-2097364669

   Then why are small files in partitions from a few days ago still being 
constantly rebuilt and deleted? No more data is being written to these 
partitions.
   I think increasing the cleanup retention parameter will probably 
generate more files.
   
   
   





Re: [I] [SUPPORT]Flink Streaming Read hudi table which is in clustering,encounterd file not exists. [hudi]

2024-05-06 Thread via GitHub


danny0405 commented on issue #11090:
URL: https://github.com/apache/hudi/issues/11090#issuecomment-2097357620

   >  clean.retain_commits was 1
   
   That means each time a new version of a file is generated, the old one is 
deleted. For a "COW" table, there is a very high chance you will encounter a 
file-missing exception, because files are deleted on every new commit. 
Can you keep `clean.retain_commits` at its default to give the streaming 
reader some buffer time to read the new files?
   
   > By the way, how should I configure clean.async.enabled
   
   Should be true if you do not want redundant legacy files on the filesystem.
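
   As a rough illustration of the two options named above, here is a minimal Java sketch that collects them into a map and renders a Flink SQL `WITH` clause. The class name, the helper names, and the value `30` are assumptions made for this example only; just the keys `clean.retain_commits` and `clean.async.enabled` come from this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Hudi: gathers the options discussed above.
public class StreamingReadOptionsSketch {

    // retainCommits is an illustrative value chosen by the caller.
    static Map<String, String> streamingReadOptions(int retainCommits) {
        Map<String, String> options = new LinkedHashMap<>();
        // Keep more than one commit so a streaming reader has time to read files.
        options.put("clean.retain_commits", String.valueOf(retainCommits));
        // Async cleaning stays on so legacy files are still removed eventually.
        options.put("clean.async.enabled", "true");
        return options;
    }

    // Renders the options as the body of a Flink SQL WITH (...) clause.
    static String toWithClause(Map<String, String> options) {
        return options.entrySet().stream()
            .map(e -> "'" + e.getKey() + "' = '" + e.getValue() + "'")
            .collect(Collectors.joining(",\n  ", "WITH (\n  ", "\n)"));
    }

    public static void main(String[] args) {
        System.out.println(toWithClause(streamingReadOptions(30)));
    }
}
```

   The rendered clause would be appended to the table definition used by the streaming job; whether the chosen retention is large enough depends on how far the reader lags behind the writer.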





[jira] [Updated] (HUDI-7721) Fix broken build on master

2024-05-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7721:
-
Labels: pull-request-available  (was: )

> Fix broken build on master
> --
>
> Key: HUDI-7721
> URL: https://issues.apache.org/jira/browse/HUDI-7721
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Critical
>  Labels: pull-request-available
>
> TestHoodieDeltaStreamer is invalid due to 
> [https://github.com/apache/hudi/pull/11099.] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [HUDI-7721] Fix broken build on master [hudi]

2024-05-06 Thread via GitHub


jonvex opened a new pull request, #11164:
URL: https://github.com/apache/hudi/pull/11164

   ### Change Logs
   
   Due to the big changes from de-hadooping, a PR was merged that breaks the build 
on master.
   
   ### Impact
   
   Fixes master so it can build.
   
   ### Risk level (write none, low medium or high below)
   
   none
   
   ### Documentation Update
   
   N/A
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Created] (HUDI-7721) Fix broken build on master

2024-05-06 Thread Jonathan Vexler (Jira)
Jonathan Vexler created HUDI-7721:
-

 Summary: Fix broken build on master
 Key: HUDI-7721
 URL: https://issues.apache.org/jira/browse/HUDI-7721
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Jonathan Vexler
Assignee: Jonathan Vexler


TestHoodieDeltaStreamer is invalid due to 
[https://github.com/apache/hudi/pull/11099.] 





Re: [PR] [HUDI-7522] Support find out the conflict instants in bucket partition when bucket id multiple [hudi]

2024-05-06 Thread via GitHub


hudi-bot commented on PR #10898:
URL: https://github.com/apache/hudi/pull/10898#issuecomment-2097317204

   
   ## CI report:
   
   * e9fc630d3a8999c7ef0db7bd94da910b1f77df7d UNKNOWN
   * b7011691a07deb288ce0341dcd55bb6feeb4101d UNKNOWN
   * a387882bde246df2a81e8aca30c63835180791c9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23718)
 
   
   



Re: [I] [SUPPORT]Flink Streaming Read hudi table which is in clustering,encounterd file not exists. [hudi]

2024-05-06 Thread via GitHub


weitianpei commented on issue #11090:
URL: https://github.com/apache/hudi/issues/11090#issuecomment-2097307624

   Small files are continuously merged in the background until they reach 600M. 
Are you sure that doing this will not cause the downstream program to miss 
data or read it repeatedly?
   
   For example, my program reads a newly generated file of only 30M; after a 
while this file is merged with other files into a 600M file. When my 
program then reads this big file, doesn't it duplicate the data?





Re: [PR] [HUDI-7404] Bloom execution improvements [hudi]

2024-05-06 Thread via GitHub


the-other-tim-brown commented on code in PR #10578:
URL: https://github.com/apache/hudi/pull/10578#discussion_r1591737966


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/simple/HoodieSimpleIndex.java:
##
@@ -143,19 +141,17 @@ protected  HoodieData> 
tagLocationInternal(
   protected HoodiePairData 
fetchRecordLocationsForAffectedPartitions(
   HoodieData hoodieKeys, HoodieEngineContext context, 
HoodieTable hoodieTable,
   int parallelism) {
-List affectedPartitionPathList =
-hoodieKeys.map(HoodieKey::getPartitionPath).distinct().collectAsList();
-List> latestBaseFiles =
+HoodieData affectedPartitionPathList =
+hoodieKeys.map(HoodieKey::getPartitionPath).distinct();
+HoodieData> latestBaseFiles =
 getLatestBaseFilesForAllPartitions(affectedPartitionPathList, context, 
hoodieTable);
-return fetchRecordLocations(context, hoodieTable, parallelism, 
latestBaseFiles);
+return fetchRecordLocations(hoodieTable, parallelism, latestBaseFiles);
   }
 
   protected HoodiePairData 
fetchRecordLocations(
-  HoodieEngineContext context, HoodieTable hoodieTable, int parallelism,
-  List> baseFiles) {
-int fetchParallelism = Math.max(1, Math.min(baseFiles.size(), 
parallelism));
-
-return context.parallelize(baseFiles, fetchParallelism)
+  HoodieTable hoodieTable, int parallelism,
+  HoodieData> baseFiles) {
+return baseFiles.repartition(Math.max(1, 
Math.min(baseFiles.getNumPartitions(), parallelism)))

Review Comment:
   There is no coalesce option on HoodieData
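
   The bound passed to `repartition` in the hunk above is plain clamping arithmetic. A minimal standalone sketch of it follows; the class and method names are hypothetical, and only the `Math.max(1, Math.min(..., parallelism))` expression is taken from the diff.

```java
// Hypothetical illustration of the clamp used in the diff above; not Hudi code.
public class FetchParallelismSketch {

    // Never below 1, never above the requested parallelism,
    // and never above the number of partitions currently available.
    static int boundedParallelism(int availablePartitions, int requestedParallelism) {
        return Math.max(1, Math.min(availablePartitions, requestedParallelism));
    }

    public static void main(String[] args) {
        System.out.println(boundedParallelism(0, 8));   // prints 1
        System.out.println(boundedParallelism(4, 8));   // prints 4
        System.out.println(boundedParallelism(100, 8)); // prints 8
    }
}
```

   The lower bound of 1 guards the empty case, where the available partition count is 0.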






Re: [PR] [HUDI-7522] Support find out the conflict instants in bucket partition when bucket id multiple [hudi]

2024-05-06 Thread via GitHub


hudi-bot commented on PR #10898:
URL: https://github.com/apache/hudi/pull/10898#issuecomment-2097304476

   
   ## CI report:
   
   * e9fc630d3a8999c7ef0db7bd94da910b1f77df7d UNKNOWN
   * b7011691a07deb288ce0341dcd55bb6feeb4101d UNKNOWN
   * 3f6b855b6cdace0a26751cc48eeadfe4bd183564 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23692)
 
   * a387882bde246df2a81e8aca30c63835180791c9 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23718)
 
   
   



Re: [PR] [HUDI-7713] Enforce ordering of fields during schema reconciliation [hudi]

2024-05-06 Thread via GitHub


hudi-bot commented on PR #11154:
URL: https://github.com/apache/hudi/pull/11154#issuecomment-2097305369

   
   ## CI report:
   
   * 12038dbde068e26f733a7b1c9cc7217019c31f25 UNKNOWN
   * 5f0a670935552be77adb223d64469b2419b97dc8 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23709)
 
   * fbb9dd5d64652ddec923dc7948f77adc61e823b3 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23717)
 
   
   



Re: [I] [SUPPORT]Flink Streaming Read hudi table which is in clustering,encounterd file not exists. [hudi]

2024-05-06 Thread via GitHub


weitianpei commented on issue #11090:
URL: https://github.com/apache/hudi/issues/11090#issuecomment-2097301076

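   Small files are continuously merged in the background until they reach 600M. Are you sure that doing this will not cause the downstream program to miss data or read it repeatedly?
   
   
   > On May 7, 2024, at 10:14, Danny Chan ***@***.***> wrote: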
   > 
   > 
   > We did have the tests already in the repo for clustering and compaction 
skipping read, can you ensure the option takes effect and increase the numbers 
of retained commits before cleaning with option clean.retain_commits.
   > 
   
   





Re: [I] [SUPPORT]Flink Streaming Read hudi table which is in clustering,encounterd file not exists. [hudi]

2024-05-06 Thread via GitHub


weitianpei commented on issue #11090:
URL: https://github.com/apache/hudi/issues/11090#issuecomment-2097295654

   I did not increase clean.retain_commits; clean.retain_commits was 1.
   
   By the way, how should clean.async.enabled be configured, true or false?
   
   > On May 7, 2024, at 10:14, Danny Chan ***@***.***> wrote:
   > 
   > 
   > We did have the tests already in the repo for clustering and compaction 
skipping read, can you ensure the option takes effect and increase the numbers 
of retained commits before cleaning with option clean.retain_commits.
   > 
   
   





Re: [PR] [HUDI-7713] Enforce ordering of fields during schema reconciliation [hudi]

2024-05-06 Thread via GitHub


hudi-bot commented on PR #11154:
URL: https://github.com/apache/hudi/pull/11154#issuecomment-2097291256

   
   ## CI report:
   
   * 12038dbde068e26f733a7b1c9cc7217019c31f25 UNKNOWN
   * 5f0a670935552be77adb223d64469b2419b97dc8 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23709)
 
   * fbb9dd5d64652ddec923dc7948f77adc61e823b3 UNKNOWN
   
   



Re: [PR] [HUDI-7522] Support find out the conflict instants in bucket partition when bucket id multiple [hudi]

2024-05-06 Thread via GitHub


hudi-bot commented on PR #10898:
URL: https://github.com/apache/hudi/pull/10898#issuecomment-2097290412

   
   ## CI report:
   
   * e9fc630d3a8999c7ef0db7bd94da910b1f77df7d UNKNOWN
   * b7011691a07deb288ce0341dcd55bb6feeb4101d UNKNOWN
   * 3f6b855b6cdace0a26751cc48eeadfe4bd183564 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23692)
 
   * a387882bde246df2a81e8aca30c63835180791c9 UNKNOWN
   
   



Re: [I] [SUPPORT]Flink Streaming Read hudi table which is in clustering,encounterd file not exists. [hudi]

2024-05-06 Thread via GitHub


danny0405 commented on issue #11090:
URL: https://github.com/apache/hudi/issues/11090#issuecomment-2097276769

   We already have tests in the repo for skipping clustering and compaction 
during reads. Can you make sure the option takes effect, and increase the number 
of commits retained before cleaning with the option `clean.retain_commits`?





[jira] [Closed] (HUDI-7663) Cannot discover new partitons when i using stream reading by flink1.1.6.1-hudi13.1

2024-05-06 Thread weitianpei (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weitianpei closed HUDI-7663.

Resolution: Fixed

> Cannot discover new partitons when i using stream reading by 
> flink1.1.6.1-hudi13.1
> --
>
> Key: HUDI-7663
> URL: https://issues.apache.org/jira/browse/HUDI-7663
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink-sql
>Affects Versions: 0.13.1
> Environment: flink1.16.1
> hudi13.1
>Reporter: weitianpei
>Priority: Major
> Attachments: image-2024-04-25-09-53-22-731.png, 
> image-2024-04-25-09-57-42-345.png
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> !image-2024-04-25-09-53-22-731.png!
> I am reading a Hudi multiple-stage table with the streaming read feature of 
> flink1.16.1-hudi1.13.1.
> Today is 2024-04-25, but my program cannot read any new parquet files in partition 
> 20240425.
> It only reads the data from partition 20240424, since I started my program at 
> 2 PM yesterday.
> The new partitions are shown in the picture below.
> !image-2024-04-25-09-57-42-345.png!
>  
>  





[jira] [Commented] (HUDI-7234) Handle both inserts and updates in log blocks for partial updates

2024-05-06 Thread weitianpei (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-7234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844086#comment-17844086
 ] 

weitianpei commented on HUDI-7234:
--

hi

> Handle both inserts and updates in log blocks for partial updates
> -
>
> Key: HUDI-7234
> URL: https://issues.apache.org/jira/browse/HUDI-7234
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Vinoth Chandar
>Priority: Blocker
> Fix For: 1.1.0
>
>
> Inserts can be written to log blocks, e.g., by Flink. We need to handle this 
> case for partial updates, i.e., a mix of inserts and partial updates to the same 
> data block. 





[jira] (HUDI-7234) Handle both inserts and updates in log blocks for partial updates

2024-05-06 Thread weitianpei (Jira)


[ https://issues.apache.org/jira/browse/HUDI-7234 ]


weitianpei deleted comment on HUDI-7234:
--

was (Author: weitianpei):
hi

> Handle both inserts and updates in log blocks for partial updates
> -
>
> Key: HUDI-7234
> URL: https://issues.apache.org/jira/browse/HUDI-7234
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Vinoth Chandar
>Priority: Blocker
> Fix For: 1.1.0
>
>
> Inserts can be written to log blocks, e.g., by Flink. We need to handle this 
> case for partial updates, i.e., a mix of inserts and partial updates to the same 
> data block. 





Re: [I] [SUPPORT]Flink Streaming Read hudi table which is in clustering,encounterd file not exists. [hudi]

2024-05-06 Thread via GitHub


weitianpei commented on issue #11090:
URL: https://github.com/apache/hudi/issues/11090#issuecomment-2097265211

   Would you mind adding a test to solve this problem?





Re: [I] [SUPPORT]Flink Streaming Read hudi table which is in clustering,encounterd file not exists. [hudi]

2024-05-06 Thread via GitHub


weitianpei commented on issue #11090:
URL: https://github.com/apache/hudi/issues/11090#issuecomment-2097263209

   I added the skip parameter in my downstream Flink program, but the same 
problem happened again.





Re: [I] [SUPPORT]Flink Streaming Read hudi table which is in clustering,encounterd file not exists. [hudi]

2024-05-06 Thread via GitHub


danny0405 commented on issue #11090:
URL: https://github.com/apache/hudi/issues/11090#issuecomment-2097257583

   > clustered still.
   > And the downstream Flink program that reads these files would hit a file-not-exists 
exception.
   
   Either clustering or compaction can be skipped in Flink streaming read.





Re: [I] [SUPPORT]Flink Streaming Read hudi table which is in clustering,encounterd file not exists. [hudi]

2024-05-06 Thread via GitHub


weitianpei commented on issue #11090:
URL: https://github.com/apache/hudi/issues/11090#issuecomment-2097253674

   @codope  when will we solve this problem?





Re: [I] [SUPPORT]Flink Streaming Read hudi table which is in clustering,encounterd file not exists. [hudi]

2024-05-06 Thread via GitHub


weitianpei commented on issue #11090:
URL: https://github.com/apache/hudi/issues/11090#issuecomment-2097252047

   
![7057026C-8A58-428C-BFD3-E2F75085E25D](https://github.com/apache/hudi/assets/30386282/7402d156-0506-47d8-8f38-700c74afcaac)
   Please look at this picture. On April 17th, we found that the small files in partition 
20240411 were still being clustered.
   And the downstream Flink program that reads these files would hit a file-not-exists 
exception.
   The upstream program has the asynchronous compression configuration enabled. 





[jira] [Updated] (HUDI-6778) Track schema in metadata table

2024-05-06 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-6778:
-
Status: In Progress  (was: Open)

> Track schema in metadata table
> --
>
> Key: HUDI-6778
> URL: https://issues.apache.org/jira/browse/HUDI-6778
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Major
> Fix For: 1.0.0
>
>






[jira] [Updated] (HUDI-7146) Implement secondary index

2024-05-06 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-7146:
--
Reviewers: Danny Chen

> Implement secondary index
> -
>
> Key: HUDI-7146
> URL: https://issues.apache.org/jira/browse/HUDI-7146
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Major
>  Labels: hudi-1.0.0-beta2, pull-request-available
> Fix For: 1.0.0
>
>
> # The secondary index schema should be flexible enough to accommodate various 
> kinds of secondary indexes. 
>  # Reuse the existing indexing framework as much as possible.
>  # Merge with the existing index config and introduce as few configs as possible.





Re: [I] [SUPPORT]Flink Streaming Read hudi table which is in clustering,encounterd file not exists. [hudi]

2024-05-06 Thread via GitHub


danny0405 commented on issue #11090:
URL: https://github.com/apache/hudi/issues/11090#issuecomment-2097236100

   If the job is not repeatedly executing rollbacks, these files should just be 
the result of "COW" file replacement: for "COW", we create a new base file to 
replace the old one, and the old one is cleaned based on the cleaning 
configurations.





Re: [I] [SUPPORT] Error using the property hoodie.datasource.write.drop.partition.columns [hudi]

2024-05-06 Thread via GitHub


danny0405 commented on issue #11144:
URL: https://github.com/apache/hudi/issues/11144#issuecomment-2097231302

   The contract here is: the partition **field** should be in the table schema 
anyway.





[jira] [Updated] (HUDI-7639) Refactor HoodieFileIndex so that different indexes can be used via optimizer rules

2024-05-06 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-7639:
--
Story Points: 5

> Refactor HoodieFileIndex so that different indexes can be used via optimizer 
> rules
> --
>
> Key: HUDI-7639
> URL: https://issues.apache.org/jira/browse/HUDI-7639
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> Currently, `HoodieFileIndex` is responsible for partition pruning as well as 
> file skipping. All indexes are being used in 
> [lookupCandidateFilesInMetadataTable|https://github.com/apache/hudi/blob/b5b14f7d4fa6224a6674b021664b510c6ae8afb9/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieFileIndex.scala#L333]
>  method through if-else branches. This is not only hard to maintain as we add 
> more indexes, but also induces a static hierarchy. Instead, we need more 
> flexibility so that we can alter logical plan based on availability of 
> indexes. For partition pruning in Spark, we already have 
> [HoodiePruneFileSourcePartitions|https://github.com/apache/hudi/blob/b5b14f7d4fa6224a6674b021664b510c6ae8afb9/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodiePruneFileSourcePartitions.scala#L40]
>  rule but it is injected during the operator optimization batch and it does 
> not modify the result of the LogicalPlan. To be fully extensible, we should 
> be able to rewrite the LogicalPlan. We should be able to inject rules after 
> partition pruning, that is, after the operator optimization batch and before 
> any CBO rules that depend on stats. Spark provides 
> [injectPreCBORules|https://github.com/apache/spark/blob/6232085227ee2cc4e831996a1ac84c27868a1595/sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala#L304]
>  API to do so; however, it is only available from Spark 3.1.0 onwards.
> The goal of this ticket is to refactor the index hierarchy and create new rules 
> such that Spark versions < 3.1.0 still go via the old path, while later 
> versions can modify the plan using an appropriate index and inject it as a 
> pre-CBO rule.
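
The version split described in the ticket could be sketched roughly as follows. This is only an illustration of the gating decision, not code from the Hudi repository: the class name `PreCboRuleGateSketch`, the method names, and the returned path labels are all hypothetical, and the only fact taken from the ticket is the 3.1.0 threshold.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: decides which code path a given Spark version would take.
final class PreCboRuleGateSketch {
  // Matches the leading "major.minor" part, so "3.5.1-SNAPSHOT" also parses.
  private static final Pattern MAJOR_MINOR = Pattern.compile("^(\\d+)\\.(\\d+)");

  /** Returns true when the given Spark version is 3.1.0 or later. */
  static boolean supportsPreCboRules(String sparkVersion) {
    Matcher m = MAJOR_MINOR.matcher(sparkVersion);
    if (!m.find()) {
      throw new IllegalArgumentException("Unrecognized Spark version: " + sparkVersion);
    }
    int major = Integer.parseInt(m.group(1));
    int minor = Integer.parseInt(m.group(2));
    return major > 3 || (major == 3 && minor >= 1);
  }

  /** Hypothetical labels for the two paths named in the ticket. */
  static String choosePath(String sparkVersion) {
    return supportsPreCboRules(sparkVersion) ? "PRE_CBO_RULE" : "LEGACY_FILE_INDEX";
  }
}
```

Keeping the check in one place would let older Spark versions keep using the existing `HoodieFileIndex` path untouched while the rule-based path is developed separately.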





Re: [I] [SUPPORT]Flink Streaming Read hudi table which is in clustering,encounterd file not exists. [hudi]

2024-05-06 Thread via GitHub


weitianpei commented on issue #11090:
URL: https://github.com/apache/hudi/issues/11090#issuecomment-2097217302

   It has nothing to do with this parameter. Files are continuously created in place, and the old files are deleted.
   
   > On April 26, 2024, at 11:11, Danny Chan ***@***.***> wrote:
   > 
   > 
   > There are some logs that report the reader progress in the monitor 
operator; you can check them to see whether the reader lags too far behind the 
producer.
   
   





Re: [PR] [HUDI-7710] Use compaction.requested during conflict resolution [hudi]

2024-05-06 Thread via GitHub


danny0405 commented on code in PR #11151:
URL: https://github.com/apache/hudi/pull/11151#discussion_r1591712468


##
hudi-client/hudi-client-common/src/test/java/org/apache/hudi/client/transaction/TestSimpleConcurrentFileWritesConflictResolutionStrategyWithMORTable.java:
##
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.client.transaction;
+
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieTableType;
+import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.testutils.HoodieCommonTestHarness;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.exception.HoodieWriteConflictException;
+
+import org.junit.jupiter.api.Assertions;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.client.transaction.TestConflictResolutionStrategyUtil.createCommit;
+import static org.apache.hudi.client.transaction.TestConflictResolutionStrategyUtil.createCommitMetadata;
+import static org.apache.hudi.client.transaction.TestConflictResolutionStrategyUtil.createInflightCommit;
+import static org.apache.hudi.client.transaction.TestConflictResolutionStrategyUtil.createPendingCompaction;
+
+public class TestSimpleConcurrentFileWritesConflictResolutionStrategyWithMORTable extends HoodieCommonTestHarness {
+  @Override
+  protected HoodieTableType getTableType() {
+    return HoodieTableType.MERGE_ON_READ;
+  }
+
+  @BeforeEach
+  public void init() throws IOException {
+    initMetaClient();
+  }
+
+  @Test
+  public void testConcurrentWritesWithInterleavingInflightCompaction() throws Exception {
+    createCommit(metaClient.createNewInstantTime(), metaClient);
+    HoodieActiveTimeline timeline = metaClient.getActiveTimeline();
+    // Consider commits before this are all successful.
+    Option<HoodieInstant> lastSuccessfulInstant = timeline.getCommitsTimeline().filterCompletedInstants().lastInstant();
+
+    // Writer 1 starts.
+    String currentWriterInstant = metaClient.createNewInstantTime();
+    createInflightCommit(currentWriterInstant, metaClient);
+
+    // Compaction 1 gets scheduled and becomes inflight.
+    String newInstantTime = metaClient.createNewInstantTime();
+    createPendingCompaction(newInstantTime, metaClient);
+
+    // Writer 1 tries to commit.
+    Option<HoodieInstant> currentInstant = Option.of(
+        new HoodieInstant(HoodieInstant.State.INFLIGHT, HoodieTimeline.DELTA_COMMIT_ACTION, currentWriterInstant));
+    HoodieCommitMetadata currentMetadata = createCommitMetadata(currentWriterInstant);
+    metaClient.reloadActiveTimeline();
+
+    // Do conflict resolution.
+    SimpleConcurrentFileWritesConflictResolutionStrategy strategy =
+        new SimpleConcurrentFileWritesConflictResolutionStrategy();
+    List<HoodieInstant> candidateInstants = strategy.getCandidateInstants(
+        metaClient, currentInstant.get(), lastSuccessfulInstant).collect(Collectors.toList());
+    Assertions.assertEquals(1, candidateInstants.size());
+    ConcurrentOperation thatCommitOperation = new ConcurrentOperation(candidateInstants.get(0), metaClient);
+    ConcurrentOperation thisCommitOperation = new ConcurrentOperation(currentInstant.get(), currentMetadata);
+    Assertions.assertTrue(strategy.hasConflict(thisCommitOperation, thatCommitOperation));
+    Assertions.assertThrows(
+        HoodieWriteConflictException.class,
Review Comment:
   I left some comments on Lin's last fix; I cannot think of a case where 
the requested compaction and the writer conflict with each other.






Re: [PR] [HUDI-7522] Support find out the conflict instants in bucket partition when bucket id multiple [hudi]

2024-05-06 Thread via GitHub


danny0405 commented on PR #10898:
URL: https://github.com/apache/hudi/pull/10898#issuecomment-2097199850

   Thanks for the work. I have reviewed it and applied a patch here: 
   
[7522.patch.zip](https://github.com/apache/hudi/files/15228302/7522.patch.zip)
   
   Please add the tests when you have spare time.





Re: [PR] [HUDI-7720] Fix HoodieTableFileSystemView NPE in fetchAllStoredFileGroups [hudi]

2024-05-06 Thread via GitHub


danny0405 commented on code in PR #11161:
URL: https://github.com/apache/hudi/pull/11161#discussion_r1591700013


##
hudi-common/src/main/java/org/apache/hudi/common/table/view/HoodieTableFileSystemView.java:
##
@@ -307,6 +307,10 @@ void 
removeFileGroupsInPendingClustering(Stream fetchAllStoredFileGroups(String partition) {
+if (!isPartitionAvailableInStore(partition)) {

Review Comment:
   We can call `partitionToFileGroupsMap.get(partition)` first and then check 
whether the result is null, which eliminates one lookup in the cache.
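
The suggestion above can be sketched as follows. This is a minimal illustration rather than the actual `HoodieTableFileSystemView` code: the class `SingleLookupSketch` is hypothetical, file groups are modelled as plain strings, and returning an empty stream for a missing partition is an assumption, since the rest of the patch is not shown here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

// Hypothetical stand-in for a view that caches file groups per partition.
final class SingleLookupSketch {
  private final Map<String, List<String>> partitionToFileGroupsMap = new HashMap<>();

  void put(String partition, List<String> fileGroups) {
    partitionToFileGroupsMap.put(partition, fileGroups);
  }

  // Two lookups: one for the presence check, one for the actual read.
  Stream<String> fetchWithTwoLookups(String partition) {
    if (!partitionToFileGroupsMap.containsKey(partition)) {
      return Stream.empty();
    }
    return partitionToFileGroupsMap.get(partition).stream();
  }

  // One lookup: read first, then decide based on whether the result is null.
  Stream<String> fetchWithOneLookup(String partition) {
    List<String> fileGroups = partitionToFileGroupsMap.get(partition);
    return fileGroups == null ? Stream.empty() : fileGroups.stream();
  }
}
```

Both methods return the same result as long as the map never stores null values; the second simply avoids touching the map twice, which matters more when the backing store is not an in-memory map.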






[jira] [Closed] (HUDI-7715) Partition TTL for Flink

2024-05-06 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-7715.

Resolution: Fixed

Fixed via master branch: fdb94192508a3d76fdba63429d9b0df718316a7e

> Partition TTL for Flink
> ---
>
> Key: HUDI-7715
> URL: https://issues.apache.org/jira/browse/HUDI-7715
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: xi chaomin
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Assigned] (HUDI-7715) Partition TTL for Flink

2024-05-06 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen reassigned HUDI-7715:


Assignee: Danny Chen

> Partition TTL for Flink
> ---
>
> Key: HUDI-7715
> URL: https://issues.apache.org/jira/browse/HUDI-7715
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: xi chaomin
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
>






(hudi) branch master updated: [HUDI-7715] Partition TTL for Flink (#11156)

2024-05-06 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new fdb94192508 [HUDI-7715] Partition TTL for Flink (#11156)
fdb94192508 is described below

commit fdb94192508a3d76fdba63429d9b0df718316a7e
Author: Manu <36392121+x...@users.noreply.github.com>
AuthorDate: Tue May 7 09:00:41 2024 +0800

[HUDI-7715] Partition TTL for Flink (#11156)
---
 .../hudi/table/HoodieFlinkCopyOnWriteTable.java|  3 +-
 .../commit/FlinkPartitionTTLActionExecutor.java| 73 ++
 .../hudi/sink/TestWriterWithPartitionTTl.java  | 89 ++
 .../test/java/org/apache/hudi/utils/TestData.java  |  8 ++
 .../TestHoodieSparkSqlWriterPartitionTTL.scala |  4 +-
 5 files changed, 173 insertions(+), 4 deletions(-)

diff --git a/hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/table/HoodieFlinkCopyOnWriteTable.java b/hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/table/HoodieFlinkCopyOnWriteTable.java
index 1ea69d3a109..4fd217ce4bd 100644
--- a/hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/table/HoodieFlinkCopyOnWriteTable.java
+++ b/hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/table/HoodieFlinkCopyOnWriteTable.java
@@ -61,6 +61,7 @@ import org.apache.hudi.table.action.commit.FlinkInsertCommitActionExecutor;
 import org.apache.hudi.table.action.commit.FlinkInsertOverwriteCommitActionExecutor;
 import org.apache.hudi.table.action.commit.FlinkInsertOverwriteTableCommitActionExecutor;
 import org.apache.hudi.table.action.commit.FlinkInsertPreppedCommitActionExecutor;
+import org.apache.hudi.table.action.commit.FlinkPartitionTTLActionExecutor;
 import org.apache.hudi.table.action.commit.FlinkUpsertCommitActionExecutor;
 import org.apache.hudi.table.action.commit.FlinkUpsertPreppedCommitActionExecutor;
 import org.apache.hudi.table.action.rollback.BaseRollbackPlanActionExecutor;
@@ -398,7 +399,7 @@ public class HoodieFlinkCopyOnWriteTable
 
   @Override
   public HoodieWriteMetadata<List<WriteStatus>> managePartitionTTL(HoodieEngineContext context, String instantTime) {
-    throw new HoodieNotSupportedException("Manage partition ttl is not supported yet");
+    return new FlinkPartitionTTLActionExecutor(context, config, this, instantTime).execute();
   }
 
   @Override
diff --git a/hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/table/action/commit/FlinkPartitionTTLActionExecutor.java b/hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/table/action/commit/FlinkPartitionTTLActionExecutor.java
new file mode 100644
index 000..f167fb5a916
--- /dev/null
+++ b/hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/table/action/commit/FlinkPartitionTTLActionExecutor.java
@@ -0,0 +1,73 @@
+/*
+ *
+ *  * Licensed to the Apache Software Foundation (ASF) under one
+ *  * or more contributor license agreements.  See the NOTICE file
+ *  * distributed with this work for additional information
+ *  * regarding copyright ownership.  The ASF licenses this file
+ *  * to you under the Apache License, Version 2.0 (the
+ *  * "License"); you may not use this file except in compliance
+ *  * with the License.  You may obtain a copy of the License at
+ *  *
+ *  *  http://www.apache.org/licenses/LICENSE-2.0
+ *  *
+ *  * Unless required by applicable law or agreed to in writing, software
+ *  * distributed under the License is distributed on an "AS IS" BASIS,
+ *  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  * See the License for the specific language governing permissions and
+ *  * limitations under the License.
+ *
+ */
+
+package org.apache.hudi.table.action.commit;
+
+import org.apache.hudi.client.WriteStatus;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.WriteOperationType;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieDeletePartitionPendingTableServiceException;
+import org.apache.hudi.exception.HoodieIOException;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.HoodieWriteMetadata;
+import org.apache.hudi.table.action.ttl.strategy.HoodiePartitionTTLStrategyFactory;
+import org.apache.hudi.table.action.ttl.strategy.PartitionTTLStrategy;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.util.Collections;
+import java.util.List;
+
+public class FlinkPartitionTTLActionExecutor extends BaseFlinkCommitActionExecutor {
+
+  private static final Logger LOG = LoggerFactory.getLogger(FlinkPartitionTTLActionExecutor.class);
+
+  public FlinkPartitionTTLActionExecutor(HoodieEngineContext context,
+ HoodieWriteConfig config,
+ HoodieTable table,
+  

Re: [PR] [HUDI-7715] Partition TTL for Flink [hudi]

2024-05-06 Thread via GitHub


danny0405 merged PR #11156:
URL: https://github.com/apache/hudi/pull/11156





[jira] [Commented] (HUDI-7234) Handle both inserts and updates in log blocks for partial updates

2024-05-06 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-7234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844069#comment-17844069
 ] 

Vinoth Chandar commented on HUDI-7234:
--

This is to be handled in 1.1.0, along with partial update encoding across records.

> Handle both inserts and updates in log blocks for partial updates
> -
>
> Key: HUDI-7234
> URL: https://issues.apache.org/jira/browse/HUDI-7234
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Vinoth Chandar
>Priority: Blocker
> Fix For: 1.0.0
>
>
> Inserts can be written to log blocks, e.g., Flink.  We need to handle such 
> case for partial updates i.e mix of inserts and partial updates to the same 
> data block. 





[jira] [Updated] (HUDI-7234) Handle both inserts and updates in log blocks for partial updates

2024-05-06 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-7234:
-
Status: Open  (was: In Progress)

> Handle both inserts and updates in log blocks for partial updates
> -
>
> Key: HUDI-7234
> URL: https://issues.apache.org/jira/browse/HUDI-7234
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Vinoth Chandar
>Priority: Blocker
> Fix For: 1.0.0
>
>
> Inserts can be written to log blocks, e.g., Flink.  We need to handle such 
> case for partial updates i.e mix of inserts and partial updates to the same 
> data block. 





[jira] [Updated] (HUDI-7234) Handle both inserts and updates in log blocks for partial updates

2024-05-06 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-7234:
-
Fix Version/s: 1.1.0
   (was: 1.0.0)

> Handle both inserts and updates in log blocks for partial updates
> -
>
> Key: HUDI-7234
> URL: https://issues.apache.org/jira/browse/HUDI-7234
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Vinoth Chandar
>Priority: Blocker
> Fix For: 1.1.0
>
>
> Inserts can be written to log blocks, e.g., Flink.  We need to handle such 
> case for partial updates i.e mix of inserts and partial updates to the same 
> data block. 





[jira] [Updated] (HUDI-7541) Ensure extensibility to new indexes - vectors, search and other formats (CLP, unstructured data)

2024-05-06 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-7541:
-
Fix Version/s: 1.1.0
   (was: 1.0.0)

> Ensure extensibility to new indexes - vectors, search and other formats (CLP, 
> unstructured data)
> 
>
> Key: HUDI-7541
> URL: https://issues.apache.org/jira/browse/HUDI-7541
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: storage-management
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
> Fix For: 1.1.0
>
>






[jira] [Commented] (HUDI-7541) Ensure extensibility to new indexes - vectors, search and other formats (CLP, unstructured data)

2024-05-06 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-7541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844068#comment-17844068
 ] 

Vinoth Chandar commented on HUDI-7541:
--

Punting this to 1.1

> Ensure extensibility to new indexes - vectors, search and other formats (CLP, 
> unstructured data)
> 
>
> Key: HUDI-7541
> URL: https://issues.apache.org/jira/browse/HUDI-7541
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: storage-management
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
> Fix For: 1.1.0
>
>






[jira] [Closed] (HUDI-7679) Ensure extensibility to unstructured data, logs (CLP), vectors, other index types

2024-05-06 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar closed HUDI-7679.

Resolution: Duplicate

> Ensure extensibility to unstructured data, logs (CLP), vectors, other index 
> types
> -
>
> Key: HUDI-7679
> URL: https://issues.apache.org/jira/browse/HUDI-7679
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
> Fix For: 1.0.0
>
>






[jira] [Updated] (HUDI-7538) Consolidate the CDC Formats (changelog format, RFC-51)

2024-05-06 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-7538:
-
Fix Version/s: (was: 1.1.0)

> Consolidate the CDC Formats (changelog format, RFC-51)
> --
>
> Key: HUDI-7538
> URL: https://issues.apache.org/jira/browse/HUDI-7538
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: storage-management
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>  Labels: hudi-1.0.0-beta2
> Fix For: 1.0.0
>
>
> For the sake of consistency, we need to consolidate the changelog mode 
> (currently supported for Flink MoR) and the RFC-51 based CDC feature, which is a 
> Debezium-style change log (currently supported for CoW for Spark/Flink).
>  
> |Format Name|CDC Source Required|Resource Cost (writer)|Resource Cost (reader)|Friendly to Streaming|
> |CDC|*No*|low/high|low/high (based on the logging modes we choose)|No (the Debezium-style output is not what Flink needs, for example)|
> |Changelog|Yes|low|low|Yes|
> This proposal is to converge onto "CDC" as the path going forward, with the 
> following changes to be incorporated to support existing users/usage of 
> changelog. The CDC format is more generalized in the database world. It offers 
> advantages like not requiring further downstream processing to, say, stitch 
> together +U and -U to update a downstream table. For example, if a field that 
> changed is a key in a downstream table, we need both +U and -U to compute 
> the updates. 
>  
> (A) Introduce a new "changelog" output mode for CDC queries, which generates 
> the I,+U,-U,D format that changelog needs (this can be constructed easily by 
> processing the output of the CDC query as follows):
>  * when before is `null`, emit I
>  * when after is `null`, emit D
>  * when both are non-null, emit two records, +U and -U
> (B) New writes in 1.0 will *ONLY* produce the .cdc changelog format, and stop 
> publishing to the _hoodie_operation field. 
>  # This means anyone querying this field using a snapshot query will break.
>  # We will bring this back in 1.1 etc., based on user feedback, as a 
> hidden/field in the FlinkCatalog.
> (C) To support backwards compatibility, we fall back to reading 
> `_hoodie_operation` in 0.X tables. 
> For CDC reads, we first use the CDC log if it is available for that file 
> slice. If not, and the base file schema already has {{_hoodie_operation}}, we 
> fall back to reading {{_hoodie_operation}} from the base file if 
> mode=OP_KEY_ONLY. Throw an error for other modes. 
> (D) Snapshot queries from Spark, Presto, Trino, etc. all work with tables that 
> have `_hoodie_operation` published. 
>  This is already completed for Spark, so the others should be easy to do. 
>  
> (E) We need to complete a review of the CDC schema.
> ts - should it be completion time or instant time?
>  
>  
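
The conversion rules listed under (A) could be sketched as below. This is a minimal illustration, not Hudi code: the class `CdcToChangelogSketch` is hypothetical, row images are modelled as plain strings, and two details are assumptions the ticket does not state, namely that -U carries the before image and is emitted ahead of +U, and that a record with both images null is rejected.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical converter from a CDC record (before/after images) to changelog rows.
final class CdcToChangelogSketch {
  /** Returns rows formatted as "<op> <image>", following the rules in (A). */
  static List<String> toChangelog(String before, String after) {
    if (before == null && after == null) {
      // Not covered by the ticket; rejecting it is an assumption of this sketch.
      throw new IllegalArgumentException("CDC record has neither a before nor an after image");
    }
    List<String> rows = new ArrayList<>();
    if (before == null) {
      rows.add("I " + after);       // insert: only the after image exists
    } else if (after == null) {
      rows.add("D " + before);      // delete: only the before image exists
    } else {
      rows.add("-U " + before);     // update: old value first (assumed order)
      rows.add("+U " + after);      // then the new value
    }
    return rows;
  }
}
```

For instance, `toChangelog("a=1", "a=2")` would yield one -U row for `a=1` and one +U row for `a=2`, which is the pair a downstream table keyed on the changed field needs.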





[jira] [Resolved] (HUDI-7679) Ensure extensibility to unstructured data, logs (CLP), vectors, other index types

2024-05-06 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar resolved HUDI-7679.
--

> Ensure extensibility to unstructured data, logs (CLP), vectors, other index 
> types
> -
>
> Key: HUDI-7679
> URL: https://issues.apache.org/jira/browse/HUDI-7679
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
> Fix For: 1.0.0
>
>






[jira] [Updated] (HUDI-7234) Handle both inserts and updates in log blocks for partial updates

2024-05-06 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-7234:
-
Status: In Progress  (was: Open)

> Handle both inserts and updates in log blocks for partial updates
> -
>
> Key: HUDI-7234
> URL: https://issues.apache.org/jira/browse/HUDI-7234
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Vinoth Chandar
>Priority: Blocker
> Fix For: 1.0.0
>
>
> Inserts can be written to log blocks, e.g., Flink.  We need to handle such 
> case for partial updates i.e mix of inserts and partial updates to the same 
> data block. 





[jira] [Updated] (HUDI-7541) Ensure extensibility to new indexes - vectors, search and other formats (CLP, unstructured data)

2024-05-06 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-7541:
-
Status: In Progress  (was: Open)

> Ensure extensibility to new indexes - vectors, search and other formats (CLP, 
> unstructured data)
> 
>
> Key: HUDI-7541
> URL: https://issues.apache.org/jira/browse/HUDI-7541
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: storage-management
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
> Fix For: 1.0.0
>
>






[jira] [Updated] (HUDI-7679) Ensure extensibility to unstructured data, logs (CLP), vectors, other index types

2024-05-06 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-7679:
-
Status: In Progress  (was: Open)

> Ensure extensibility to unstructured data, logs (CLP), vectors, other index 
> types
> -
>
> Key: HUDI-7679
> URL: https://issues.apache.org/jira/browse/HUDI-7679
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
> Fix For: 1.0.0
>
>






[jira] [Updated] (HUDI-7538) Consolidate the CDC Formats (changelog format, RFC-51)

2024-05-06 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-7538:
-
Fix Version/s: 1.1.0

> Consolidate the CDC Formats (changelog format, RFC-51)
> --
>
> Key: HUDI-7538
> URL: https://issues.apache.org/jira/browse/HUDI-7538
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: storage-management
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>  Labels: hudi-1.0.0-beta2
> Fix For: 1.1.0, 1.0.0
>
>
> For the sake of consistency, we need to consolidate the changelog mode 
> (currently supported for Flink MoR) and the RFC-51 based CDC feature, which is a 
> Debezium-style change log (currently supported for CoW for Spark/Flink).
>  
> |Format Name|CDC Source Required|Resource Cost (writer)|Resource Cost (reader)|Friendly to Streaming|
> |CDC|*No*|low/high|low/high (based on logging modes we choose)|No (the Debezium-style output is not what Flink needs, for example)|
> |Changelog|Yes|low|low|Yes|
> This proposal is to converge onto "CDC" as the path going forward, with the 
> following changes to be incorporated to support existing users/usage of 
> changelog. The CDC format is more generalized in the database world. It offers 
> advantages like not requiring further downstream processing to, say, stitch 
> together +U and -U to update a downstream table. For example, if a field that 
> changed is a key in a downstream table, we need both +U and -U to compute 
> the updates. 
>  
> (A) Introduce a new "changelog" output mode for CDC queries, which generates 
> I,+U,-U,D format that changelog needs (this can be constructed easily by 
> processing the output of CDC query as follows)
>  * when before is `null`, emit I
>  * when after is `null`, emit D
>  * when both are non-null, emit two records +U and -U
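
A minimal sketch of the conversion described in (A), written in Python purely for illustration. The function name `cdc_to_changelog`, the dict-based record shape, the choice to emit -U before +U, and the image attached to each row are assumptions, not Hudi APIs or behavior confirmed by this ticket.

```python
from typing import Iterator, Optional, Tuple


def cdc_to_changelog(before: Optional[dict],
                     after: Optional[dict]) -> Iterator[Tuple[str, dict]]:
    """Map one CDC record (before/after images) to changelog rows."""
    if before is None and after is None:
        # Not covered by the description; treated as invalid input here.
        raise ValueError("CDC record has neither a before nor an after image")
    if before is None:
        yield ("I", after)       # no before image: insert
    elif after is None:
        yield ("D", before)      # no after image: delete
    else:
        yield ("-U", before)     # update: old image (ordering is an assumption)
        yield ("+U", after)      # update: new image


# Example: an update expands into two changelog rows.
rows = list(cdc_to_changelog({"id": 1, "v": 1}, {"id": 1, "v": 2}))
```

Because each CDC record already carries both images, the changelog rows can be derived record by record without any extra state.
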
> (B) New writes in 1.0 will *ONLY* produce the .cdc changelog format and stop 
> publishing to the _hoodie_operation field. 
>  # This means anyone querying this field using a snapshot query will break.
>  # We will bring this back in 1.1 etc., based on user feedback, as a 
> hidden/field in the FlinkCatalog.
> (C) To support backwards compatibility, we fall back to reading 
> `_hoodie_operation` in 0.X tables. 
> For CDC reads, we first use the CDC log if it's available for that file 
> slice. If not, and the base file schema already has {{_hoodie_operation}}, we 
> fall back to reading {{_hoodie_operation}} from the base file if 
> mode=OP_KEY_ONLY. Throw an error for other modes. 
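
The read path in (C) might be sketched as follows. `choose_cdc_source` and its return labels are hypothetical names for illustration only. Raising when neither a CDC log nor `_hoodie_operation` exists is also an assumption, since the description only covers the case where the field is present.

```python
from typing import Set


def choose_cdc_source(has_cdc_log: bool,
                      base_schema_fields: Set[str],
                      mode: str) -> str:
    """Pick where a CDC read gets its change information for one file slice."""
    if has_cdc_log:
        return "cdc_log"                     # preferred: CDC log of the file slice
    if "_hoodie_operation" in base_schema_fields:
        if mode == "OP_KEY_ONLY":
            return "base_file_operation"     # fall back to the base file column
        raise ValueError(f"mode {mode} is not supported without a CDC log")
    # Assumption: nothing to read from, so fail loudly.
    raise ValueError("no CDC log and no _hoodie_operation field in base file")


# Example: a 0.X file slice without a CDC log, read in OP_KEY_ONLY mode.
source = choose_cdc_source(False, {"_hoodie_operation", "id"}, "OP_KEY_ONLY")
```
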
> (D) Snapshot queries from Spark, Presto, Trino, etc. all work with tables that 
> have `_hoodie_operation` published. 
>  This is already completed for Spark, so the others should be easy to do. 
>  
> (E) We need to complete a review of the CDC schema
> ts - should be completion time or instant time?
>  
>  





[jira] [Updated] (HUDI-7507) ongoing concurrent writers with smaller timestamp can cause issues with table services

2024-05-06 Thread Krishen Bhan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krishen Bhan updated HUDI-7507:
---
Description: 
*Scenarios:*

Although HUDI operations hold a table lock when creating a .requested instant, 
because HUDI writers do not generate a timestamp and create a .requested plan in 
the same transaction, there can be a scenario where 
 # Job 1 starts and chooses timestamp (x); Job 2 starts and chooses timestamp (x-1)
 # Job 1 schedules and creates requested file with instant timestamp (x)
 # Job 2 schedules and creates requested file with instant timestamp (x-1)
 # Both jobs continue running

If one job is writing a commit and the other is a table service, this can cause 
issues:
 * If Job 2 is an ingestion commit and Job 1 is a compaction/log compaction, then 
Job 1 can run before Job 2 and create a compaction plan for all instant 
times (up to (x)) that doesn't include instant time (x-1). Later, Job 2 will 
create instant time (x-1), but the timeline will be in a corrupted state since 
the compaction plan was supposed to include (x-1).
 * There is a similar issue with clean. If Job 2 is a long-running commit (that 
was stuck/delayed for a while before creating its .requested plan) and Job 1 is 
a clean, then Job 1 can perform a clean that updates the 
earliest-commit-to-retain without waiting for the inflight instant by Job 2 at 
(x-1) to complete. This causes Job 2 to be "skipped" by clean.

[Edit] I added a diagram to visualize the issue, specifically the second 
scenario with clean

!Flowchart (1).png!

*Proposed approach:*

One way this can be resolved is by combining the operations of generating 
instant time and creating a requested file in the same HUDI table transaction. 
Specifically, executing the following steps whenever any instant (commit, table 
service, etc) is scheduled

Approach A
 # Acquire table lock
 # Look at the latest instant C on the active timeline (completed or not). 
Generate a timestamp after C
 # Create the plan and requested file using this new timestamp (that is 
greater than C)
 # Release table lock
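
The four steps of Approach A can be sketched with a toy in-memory timeline. `ToyTimeline`, `schedule`, the integer timestamps, and the injected `clock` are all hypothetical and only illustrate the locking order; they are not Hudi classes or APIs.

```python
import threading
from typing import Callable, List, Tuple


class ToyTimeline:
    """In-memory stand-in for a table timeline (hypothetical, not a Hudi class)."""

    def __init__(self) -> None:
        self._table_lock = threading.Lock()
        self.instants: List[Tuple[int, str, str]] = []  # (timestamp, action, state)

    def schedule(self, action: str,
                 make_plan: Callable[[int], object],
                 clock: Callable[[], int]) -> Tuple[int, object]:
        with self._table_lock:                           # 1. acquire table lock
            latest = max((ts for ts, _, _ in self.instants), default=-1)
            ts = max(clock(), latest + 1)                # 2. timestamp after latest instant C
            plan = make_plan(ts)                         # 3. plan built while the lock is held
            self.instants.append((ts, action, "REQUESTED"))
            return ts, plan                              # 4. lock released on exit


timeline = ToyTimeline()
# Job 1 reads clock value 10; Job 2 reads a smaller clock value 9 but schedules later.
ts1, _ = timeline.schedule("compaction", lambda ts: f"plan@{ts}", clock=lambda: 10)
ts2, _ = timeline.schedule("commit", lambda ts: f"plan@{ts}", clock=lambda: 9)
```

In this sketch the second job ends up with a timestamp after the first one even though its clock reading was smaller. The plan is computed inside the lock, and the caller cannot supply its own instant time.
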

Unfortunately (A) has the following drawbacks
 * Every operation must now hold the table lock when computing its plan even if 
it's an expensive operation and will take a while
 * Users of HUDI cannot easily set their own instant time of an operation, and 
this restriction would break any public APIs that allow this and would require 
deprecating those APIs.

 

An alternate approach is to have every operation abort creating a .requested 
file unless it has the latest timestamp. Specifically, for any instant type, 
whenever an operation is about to create a .requested plan on timeline, it 
should take the table lock and assert that there are no other instants on 
timeline that are greater than it that could cause a conflict. If that 
assertion fails, then throw a retry-able conflict resolution exception.

Specifically, the following steps should be followed whenever any instant 
(commit, table service, etc) is scheduled

Approach B
 # Acquire table lock. Assume that the desired instant time C and requested 
file plan metadata have already been created, regardless of whether it was 
before this step or right after acquiring the table lock.
 # Get the set of all instants on the timeline that are greater than C 
(regardless of their operation type or state/status). 
 ## If the current operation is an "ingestion" type 
(commit/deltacommit/insert_overwrite replacecommit) then assert the set is 
empty. This is because another "ingestion" operation with a later instant time 
might schedule and execute a compaction at said instant time in MDT, leaving 
the table in the aforementioned situation where a compaction on MDT is scheduled 
after an inflight ingestion commit.
 ## If the current operation is a "table service" (clean/compaction/cluster) 
then assert that the set doesn't contain any table service instant types 
(clean/compaction/cluster).
 # Create requested plan on timeline (as usual)
 # Release table lock
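
Step 2 of Approach B can be sketched as a standalone check. The names `assert_can_schedule` and `RetryableConflictError`, the action labels, and the `(timestamp, action)` tuples are hypothetical; the caller is assumed to already hold the table lock, and MDT timelines are not modeled.

```python
from typing import Iterable, Tuple

# Action labels follow the wording of the description; they are not Hudi constants.
INGESTION = {"commit", "deltacommit", "insert_overwrite"}
TABLE_SERVICE = {"clean", "compaction", "cluster"}


class RetryableConflictError(Exception):
    """Hypothetical stand-in for a retry-able conflict resolution exception."""


def assert_can_schedule(timeline: Iterable[Tuple[int, str]],
                        ts: int, action: str) -> None:
    """Raise if an instant later than `ts` blocks scheduling `action` at `ts`."""
    later = [(t, a) for t, a in timeline if t > ts]
    if action in INGESTION:
        conflicts = later                      # ingestion: no later instant allowed
    elif action in TABLE_SERVICE:
        # table service: only later table service instants are a conflict
        conflicts = [(t, a) for t, a in later if a in TABLE_SERVICE]
    else:
        raise ValueError(f"unknown action type: {action}")
    if conflicts:
        raise RetryableConflictError(
            f"{action} at {ts} conflicts with later instants {conflicts}")


# A clean at instant 9 may proceed even though a commit exists at instant 10.
assert_can_schedule([(10, "commit")], 9, "clean")
```

If the check passes, the caller would create the requested plan and release the lock. On `RetryableConflictError` it would retry with a newer instant time.
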

Unlike (A), this approach (B) allows users to continue to use HUDI APIs where 
the caller can specify the instant time (avoiding the need to deprecate any 
public API). It also allows table service operations to 
compute their plan without holding a lock. Despite this, (B) has the 
following drawbacks
 * It is not immediately clear how MDT vs base table operations should be 
handled here. Do we need to update (2) to build its set from both base table 
and MDT timelines (rather than just MDT)?
 * This error will still be thrown even for scenarios of concurrent operations 
where it would be safe to continue. For example, assume two ingestion writers 
begin executing on a dataset, with each only performing an insert commit on the 
dataset (with no table service being scheduled on MDT). If the writer that 
started scheduling later ends up having an earlier timestamp, i

Re: [PR] [HUDI-7350] Create hudi io factory [hudi]

2024-05-06 Thread via GitHub


hudi-bot commented on PR #11163:
URL: https://github.com/apache/hudi/pull/11163#issuecomment-2097080044

   
   ## CI report:
   
   * 7376b451044473ce16aad09a1d356a9140442f9c UNKNOWN
   * d491f7ed864af5c291d365dcfe9392a5bbc8dd2d UNKNOWN
   * ea258fe4883b5612f52ce68a0c2c33ec2c0ef089 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23715)
 
   
   
   





[jira] [Updated] (HUDI-7350) Introduce HoodieIOFactory to abstract the reader and writer implementation

2024-05-06 Thread Jonathan Vexler (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Vexler updated HUDI-7350:
--
Status: Patch Available  (was: In Progress)

> Introduce HoodieIOFactory to abstract the reader and writer implementation
> --
>
> Key: HUDI-7350
> URL: https://issues.apache.org/jira/browse/HUDI-7350
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Jonathan Vexler
>Priority: Blocker
>  Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>





