Re: [PR] [HUDI-7710] Use compaction.requested during conflict resolution [hudi]

2024-05-04 Thread via GitHub


hudi-bot commented on PR #11151:
URL: https://github.com/apache/hudi/pull/11151#issuecomment-2094649256

   
   ## CI report:
   
   * c68b630ed8b878dffc4df1f1074d6fa3987899d9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23663)
 
   * 2985ea2ec2f8a0b62086a6ac9933654051a65738 UNKNOWN
   * bd09a1b36becfbcdc75195427148eb948e384ac5 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23664)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7710] Use compaction.requested during conflict resolution [hudi]

2024-05-04 Thread via GitHub


hudi-bot commented on PR #11151:
URL: https://github.com/apache/hudi/pull/11151#issuecomment-2094647826

   
   ## CI report:
   
   * c68b630ed8b878dffc4df1f1074d6fa3987899d9 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23663)
 
   * 2985ea2ec2f8a0b62086a6ac9933654051a65738 UNKNOWN
   * bd09a1b36becfbcdc75195427148eb948e384ac5 UNKNOWN
   
   



Re: [PR] [HUDI-7710] Use compaction.requested during conflict resolution [hudi]

2024-05-04 Thread via GitHub


hudi-bot commented on PR #11151:
URL: https://github.com/apache/hudi/pull/11151#issuecomment-2094646391

   
   ## CI report:
   
   * c68b630ed8b878dffc4df1f1074d6fa3987899d9 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23663)
 
   * 2985ea2ec2f8a0b62086a6ac9933654051a65738 UNKNOWN
   
   



Re: [PR] [HUDI-7710] Use compaction.requested during conflict resolution [hudi]

2024-05-04 Thread via GitHub


hudi-bot commented on PR #11151:
URL: https://github.com/apache/hudi/pull/11151#issuecomment-2094637684

   
   ## CI report:
   
   * c68b630ed8b878dffc4df1f1074d6fa3987899d9 UNKNOWN
   
   



[PR] [HUDI-7710] Replace compaction.inflight during conflict resolution [hudi]

2024-05-04 Thread via GitHub


linliu-code opened a new pull request, #11151:
URL: https://github.com/apache/hudi/pull/11151

   ### Change Logs
   
   During conflict resolution between an ingestion writer and a compaction, if 
the compaction is in the `inflight` state, the original logic tries to extract the 
compaction plan from the inflight file. That plan is null, which causes an NPE.
   
   Therefore, we return the `requested` instant instead.
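
A minimal sketch of the failure mode described above, not the actual Hudi code. It assumes that only the `requested` meta file carries a serialized compaction plan; the class `CompactionPlanLookup` and its methods are hypothetical stand-ins for the timeline.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a timeline that stores compaction plans.
final class CompactionPlanLookup {
    // Assumption: a plan exists only under the "<instant>.compaction.requested" key.
    private final Map<String, String> planByMetaFile = new HashMap<>();

    void addRequested(String instantTime, String plan) {
        planByMetaFile.put(instantTime + ".compaction.requested", plan);
    }

    // Looks the plan up under the instant's own state; returns null for "inflight".
    String planFor(String instantTime, String state) {
        return planByMetaFile.get(instantTime + ".compaction." + state);
    }

    // Resolves a pending compaction through its requested file, whatever its state.
    String planViaRequested(String instantTime) {
        return planFor(instantTime, "requested");
    }

    public static void main(String[] args) {
        CompactionPlanLookup timeline = new CompactionPlanLookup();
        timeline.addRequested("001", "plan-for-001");
        System.out.println(timeline.planFor("001", "inflight"));
        System.out.println(timeline.planViaRequested("001"));
    }
}
```

In this model, any caller that dereferences the result of `planFor("001", "inflight")` would hit a null, while `planViaRequested` always has a plan to return.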
   
   ### Impact
   
   Fixes an NPE during conflict resolution when a compaction is `inflight`.
   
   ### Risk level (write none, low medium or high below)
   
   Low.
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change. If not, put "none"._
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





Re: [PR] [HUDI-7701] Metadata table initailization with pending instants [hudi]

2024-05-04 Thread via GitHub


hudi-bot commented on PR #11137:
URL: https://github.com/apache/hudi/pull/11137#issuecomment-2094573804

   
   ## CI report:
   
   * adc1380cb496881fd2f1c8b30aa059759c7c5c9c Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23662)
 
   
   



Re: [PR] [HUDI-7701] Metadata table initailization with pending instants [hudi]

2024-05-04 Thread via GitHub


hudi-bot commented on PR #11137:
URL: https://github.com/apache/hudi/pull/11137#issuecomment-2094535016

   
   ## CI report:
   
   * a668de4b47df64e2d09b8c1bd0a172271c41a7e3 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23644)
 
   * adc1380cb496881fd2f1c8b30aa059759c7c5c9c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23662)
 
   
   



Re: [PR] [HUDI-7701] Metadata table initailization with pending instants [hudi]

2024-05-04 Thread via GitHub


hudi-bot commented on PR #11137:
URL: https://github.com/apache/hudi/pull/11137#issuecomment-2094533717

   
   ## CI report:
   
   * a668de4b47df64e2d09b8c1bd0a172271c41a7e3 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23644)
 
   * adc1380cb496881fd2f1c8b30aa059759c7c5c9c UNKNOWN
   
   



Re: [PR] [MINOR] Use parent as the glob path when full file path specified [hudi]

2024-05-04 Thread via GitHub


the-other-tim-brown commented on code in PR #11150:
URL: https://github.com/apache/hudi/pull/11150#discussion_r1590187272


##
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java:
##
@@ -457,9 +457,10 @@ private Dataset<Row> readRecordsForGroupAsRow(JavaSparkContext jsc,
 
 String readPathString =
 String.join(",", Arrays.stream(paths).map(StoragePath::toString).toArray(String[]::new));
+String globPathString = String.join(",", Arrays.stream(paths).map(StoragePath::getParent).map(StoragePath::toString).distinct().toArray(String[]::new));
 params.put("hoodie.datasource.read.paths", readPathString);
 // Building HoodieFileIndex needs this param to decide query path
-params.put("glob.paths", readPathString);
+params.put("glob.paths", globPathString);
 

Review Comment:
   I can't find a test class matching this class name. Is there a clustering 
test suite I should look in?






Re: [PR] [MINOR] Use parent as the glob path when full file path specified [hudi]

2024-05-04 Thread via GitHub


danny0405 commented on code in PR #11150:
URL: https://github.com/apache/hudi/pull/11150#discussion_r1590186996


##
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java:
##
@@ -457,9 +457,10 @@ private Dataset<Row> readRecordsForGroupAsRow(JavaSparkContext jsc,
 
 String readPathString =
 String.join(",", Arrays.stream(paths).map(StoragePath::toString).toArray(String[]::new));
+String globPathString = String.join(",", Arrays.stream(paths).map(StoragePath::getParent).map(StoragePath::toString).distinct().toArray(String[]::new));
 params.put("hoodie.datasource.read.paths", readPathString);
 // Building HoodieFileIndex needs this param to decide query path
-params.put("glob.paths", readPathString);
+params.put("glob.paths", globPathString);
 

Review Comment:
   do we have any test cases?






Re: [PR] [HUDI-7710] Remove compaction.inflight from conflict resolution [hudi]

2024-05-04 Thread via GitHub


danny0405 commented on code in PR #11148:
URL: https://github.com/apache/hudi/pull/11148#discussion_r1590186936


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/transaction/SimpleConcurrentFileWritesConflictResolutionStrategy.java:
##
@@ -68,6 +69,7 @@ public Stream<HoodieInstant> getCandidateInstants(HoodieTableMetaClient metaClie
 .getTimelineOfActions(CollectionUtils.createSet(REPLACE_COMMIT_ACTION, COMPACTION_ACTION))
 .findInstantsAfter(currentInstant.getTimestamp())
 .filterInflightsAndRequested()
+.filter(i -> (!i.getAction().equals(COMPACTION_ACTION)) || i.getState().equals(REQUESTED))
 .getInstantsAsStream();

Review Comment:
   I guess if the compaction has not actually executed yet, there is no need 
to resolve the conflicts, because the log files would be sliced based on their 
specific completion time. If there are no conflicts for the same file group from 
multiple writers, then we are good. @linliu-code, we can add some test cases 
to illustrate this.






Re: [PR] update fork count [hudi]

2024-05-04 Thread via GitHub


hudi-bot commented on PR #11107:
URL: https://github.com/apache/hudi/pull/11107#issuecomment-2094397899

   
   ## CI report:
   
   * 48122188bc0ee8f85d1d14aee3d5c320f2fb7b29 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23661)
 
   
   



Re: [PR] [MINOR] Use parent as the glob path when full file path specified [hudi]

2024-05-04 Thread via GitHub


hudi-bot commented on PR #11150:
URL: https://github.com/apache/hudi/pull/11150#issuecomment-2094365713

   
   ## CI report:
   
   * 353708c54b454bf3749596f74267970f1c332b7b Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23660)
 
   
   



Re: [PR] update fork count [hudi]

2024-05-04 Thread via GitHub


hudi-bot commented on PR #11107:
URL: https://github.com/apache/hudi/pull/11107#issuecomment-2094354296

   
   ## CI report:
   
   * 9757330d1ed3ff1afb3bc1b08b0f3ece78917045 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23539)
 
   * 48122188bc0ee8f85d1d14aee3d5c320f2fb7b29 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23661)
 
   
   



Re: [PR] update fork count [hudi]

2024-05-04 Thread via GitHub


hudi-bot commented on PR #11107:
URL: https://github.com/apache/hudi/pull/11107#issuecomment-2094352389

   
   ## CI report:
   
   * 9757330d1ed3ff1afb3bc1b08b0f3ece78917045 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23539)
 
   * 48122188bc0ee8f85d1d14aee3d5c320f2fb7b29 UNKNOWN
   
   



Re: [PR] [HUDI-7707] Enable bundle validation on Java 8 and 11 [hudi]

2024-05-04 Thread via GitHub


hudi-bot commented on PR #11142:
URL: https://github.com/apache/hudi/pull/11142#issuecomment-2094350107

   
   ## CI report:
   
   * 4d3fc1c3ff0254f545803f93ed361e448245ffaa Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23659)
 
   
   



Re: [PR] [HUDI-7710] Remove compaction.inflight from conflict resolution [hudi]

2024-05-04 Thread via GitHub


yihua commented on code in PR #11148:
URL: https://github.com/apache/hudi/pull/11148#discussion_r1590072865


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/transaction/SimpleConcurrentFileWritesConflictResolutionStrategy.java:
##
@@ -68,6 +69,7 @@ public Stream<HoodieInstant> getCandidateInstants(HoodieTableMetaClient metaClie
 .getTimelineOfActions(CollectionUtils.createSet(REPLACE_COMMIT_ACTION, COMPACTION_ACTION))
 .findInstantsAfter(currentInstant.getTimestamp())
 .filterInflightsAndRequested()
+.filter(i -> (!i.getAction().equals(COMPACTION_ACTION)) || i.getState().equals(REQUESTED))
 .getInstantsAsStream();

Review Comment:
   @linliu-code We still need to check the compaction for conflicts, correct? So 
instead of filtering out `compaction.inflight`, we should convert 
`instant.compaction.inflight` to `instant.compaction.requested` for checking?  
Can you write a test case for this?
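
The difference between the two approaches in this comment can be sketched as follows. This is an illustration only: `PendingInstantSketch` and its nested `Instant` class are hypothetical stand-ins, not Hudi's `HoodieInstant`, and the action and state strings are assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class PendingInstantSketch {
    // Hypothetical, simplified instant: timestamp, action, and state.
    static final class Instant {
        final String time;
        final String action;
        final String state;

        Instant(String time, String action, String state) {
            this.time = time;
            this.action = action;
            this.state = state;
        }

        Instant asRequested() {
            return new Instant(time, action, "REQUESTED");
        }

        @Override
        public String toString() {
            return time + "." + action + "." + state;
        }
    }

    // Approach in the diff: inflight compactions are dropped from the candidates.
    static List<String> filterOutInflightCompactions(List<Instant> pending) {
        return pending.stream()
            .filter(i -> !i.action.equals("compaction") || i.state.equals("REQUESTED"))
            .map(Instant::toString)
            .collect(Collectors.toList());
    }

    // Suggested approach: inflight compactions stay, viewed as requested instants.
    static List<String> convertInflightCompactions(List<Instant> pending) {
        return pending.stream()
            .map(i -> i.action.equals("compaction") && i.state.equals("INFLIGHT") ? i.asRequested() : i)
            .map(Instant::toString)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Instant> pending = Arrays.asList(
            new Instant("001", "compaction", "INFLIGHT"),
            new Instant("002", "replacecommit", "INFLIGHT"));
        System.out.println(filterOutInflightCompactions(pending));
        System.out.println(convertInflightCompactions(pending));
    }
}
```

With filtering, the inflight compaction never reaches the conflict check; with conversion, it remains a candidate and can be checked through its requested form.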






Re: [PR] [MINOR] Use parent as the glob path when full file path specified [hudi]

2024-05-04 Thread via GitHub


hudi-bot commented on PR #11150:
URL: https://github.com/apache/hudi/pull/11150#issuecomment-2094339138

   
   ## CI report:
   
   * 11abd3eb1b9418d9013f820e3779f56c50810dfd Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23658)
 
   * 353708c54b454bf3749596f74267970f1c332b7b Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23660)
 
   
   



Re: [PR] [MINOR] Use parent as the glob path when full file path specified [hudi]

2024-05-04 Thread via GitHub


hudi-bot commented on PR #11150:
URL: https://github.com/apache/hudi/pull/11150#issuecomment-2094337137

   
   ## CI report:
   
   * 11abd3eb1b9418d9013f820e3779f56c50810dfd Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23658)
 
   * 353708c54b454bf3749596f74267970f1c332b7b UNKNOWN
   
   



Re: [PR] [MINOR] Use parent as the glob path when full file path specified [hudi]

2024-05-04 Thread via GitHub


hudi-bot commented on PR #11150:
URL: https://github.com/apache/hudi/pull/11150#issuecomment-2094335218

   
   ## CI report:
   
   * 11abd3eb1b9418d9013f820e3779f56c50810dfd Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23658)
 
   
   



Re: [PR] [HUDI-7711] Fix MultiTableStreamer can deal with path of properties files [hudi]

2024-05-04 Thread via GitHub


hudi-bot commented on PR #11149:
URL: https://github.com/apache/hudi/pull/11149#issuecomment-2094335208

   
   ## CI report:
   
   * bb03750d5e951785c6205f501d463614cc3315cf Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23657)
 
   
   



Re: [PR] [HUDI-7707] Enable bundle validation on Java 8 and 11 [hudi]

2024-05-04 Thread via GitHub


hudi-bot commented on PR #11142:
URL: https://github.com/apache/hudi/pull/11142#issuecomment-2094335199

   
   ## CI report:
   
   * 7bb8334732cdbdd3cba9868ea66c2c0817559981 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23656)
 
   * 4d3fc1c3ff0254f545803f93ed361e448245ffaa Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23659)
 
   
   



Re: [PR] [MINOR] Use parent as the glob path when full file path specified [hudi]

2024-05-04 Thread via GitHub


hudi-bot commented on PR #11150:
URL: https://github.com/apache/hudi/pull/11150#issuecomment-2094313377

   
   ## CI report:
   
   * 11abd3eb1b9418d9013f820e3779f56c50810dfd Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23658)
 
   
   



Re: [PR] [HUDI-7707] Enable bundle validation on Java 8 and 11 [hudi]

2024-05-04 Thread via GitHub


hudi-bot commented on PR #11142:
URL: https://github.com/apache/hudi/pull/11142#issuecomment-2094313307

   
   ## CI report:
   
   * 7bb8334732cdbdd3cba9868ea66c2c0817559981 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23656)
 
   * 4d3fc1c3ff0254f545803f93ed361e448245ffaa UNKNOWN
   
   



Re: [PR] [MINOR] Use parent as the glob path when full file path specified [hudi]

2024-05-04 Thread via GitHub


hudi-bot commented on PR #11150:
URL: https://github.com/apache/hudi/pull/11150#issuecomment-2094309158

   
   ## CI report:
   
   * 11abd3eb1b9418d9013f820e3779f56c50810dfd UNKNOWN
   
   



Re: [PR] [HUDI-7711] Fix MultiTableStreamer can deal with path of properties files [hudi]

2024-05-04 Thread via GitHub


hudi-bot commented on PR #11149:
URL: https://github.com/apache/hudi/pull/11149#issuecomment-2094305741

   
   ## CI report:
   
   * bb03750d5e951785c6205f501d463614cc3315cf Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23657)
 
   
   



[PR] [MINOR] Use parent as the glob path when full file path specified [hudi]

2024-05-04 Thread via GitHub


the-other-tim-brown opened a new pull request, #11150:
URL: https://github.com/apache/hudi/pull/11150

   ### Change Logs
   
   - Fix usages of the glob paths to take in partition level paths instead of 
file level paths in clustering and metadata writing
   
   ### Impact
   
   - Fixes a bug where we see listing calls per file instead of per partition
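
As a rough illustration of the change, the sketch below collapses file-level paths into distinct partition-level paths. It uses plain strings instead of Hudi's `StoragePath`; the class `GlobPathSketch`, its methods, and the sample paths are hypothetical.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

final class GlobPathSketch {
    // Returns everything before the last '/', i.e. the directory holding the file.
    static String parentOf(String filePath) {
        int slash = filePath.lastIndexOf('/');
        return slash > 0 ? filePath.substring(0, slash) : filePath;
    }

    // Joins the distinct parent directories, so each partition appears only once.
    static String toGlobPaths(String... filePaths) {
        return Arrays.stream(filePaths)
            .map(GlobPathSketch::parentOf)
            .distinct()
            .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(toGlobPaths(
            "/tbl/p1/a.parquet", "/tbl/p1/b.parquet", "/tbl/p2/c.parquet"));
    }
}
```

Three file paths in two partitions yield two glob entries, which is why listing happens per partition rather than per file.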
   
   ### Risk level (write none, low medium or high below)
   
   low
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change. If not, put "none"._
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





Re: [PR] [HUDI-7711] Fix MultiTableStreamer can deal with path of properties files [hudi]

2024-05-04 Thread via GitHub


hudi-bot commented on PR #11149:
URL: https://github.com/apache/hudi/pull/11149#issuecomment-2094288851

   
   ## CI report:
   
   * bb03750d5e951785c6205f501d463614cc3315cf UNKNOWN
   
   



[jira] [Updated] (HUDI-7711) Fix MultiTableStreamer can deal with path of properties file for each streamer

2024-05-04 Thread Jihwan Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jihwan Lee updated HUDI-7711:
-
Description: 
HoodieMultiTableStreamer initializes the common configs, then deep-copies the 
related fields into each streamer.

Because _propsFilePath_ is not set on each streamer, every streamer falls back 
to the default value, which is the path of a test file.

 

Also, when MultiTableStreamer is run with {_}--hoodie-conf{_}, each streamer 
should receive those configs as well (similar to inheritance).

 

MultiTable configs (kafka-source.properties):

 
{code:java}
...
hoodie.streamer.ingestion.tablesToBeIngested=db.tbl1,db.tbl2
hoodie.streamer.ingestion.db.tbl1.configFile=hdfs:///tmp/config_1.properties
hoodie.streamer.ingestion.db.tbl2.configFile=hdfs:///tmp/config_2.properties
... {code}
 

 

/tmp/config_1.properties:

 
{code:java}
...
hoodie.datasource.write.recordkey.field=id
hoodie.streamer.source.kafka.topic=topic1
... {code}
 

 

/tmp/config_2.properties:
{code:java}
...
hoodie.datasource.write.recordkey.field=id
hoodie.streamer.source.kafka.topic=topic2
... {code}
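
The inheritance-like behaviour requested above can be sketched with `java.util.Properties`, whose constructor accepts a set of defaults. This is only an illustration of the idea, not the HoodieMultiTableStreamer implementation; `StreamerPropsSketch` and `forTable` are hypothetical names.

```java
import java.util.Properties;

final class StreamerPropsSketch {
    // Builds per-table properties that fall back to the common ones when a key is absent.
    static Properties forTable(Properties common, Properties tableOverrides) {
        Properties merged = new Properties(common); // common configs act as defaults
        for (String name : tableOverrides.stringPropertyNames()) {
            merged.setProperty(name, tableOverrides.getProperty(name));
        }
        return merged;
    }

    public static void main(String[] args) {
        Properties common = new Properties();
        common.setProperty("hoodie.datasource.write.recordkey.field", "id");

        Properties table1 = new Properties();
        table1.setProperty("hoodie.streamer.source.kafka.topic", "topic1");

        Properties merged = forTable(common, table1);
        System.out.println(merged.getProperty("hoodie.datasource.write.recordkey.field"));
        System.out.println(merged.getProperty("hoodie.streamer.source.kafka.topic"));
    }
}
```

Each table sees the shared keys unless its own file overrides them, which matches the "inheritance" wording in the description.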
 

Error log (the workspace path is replaced with \{RUNNING_PATH}):

 
{code:java}
24/05/04 21:41:01 ERROR config.DFSPropertiesConfiguration: Error reading in 
properties from dfs from file 
file:{RUNNING_PATH}/src/test/resources/streamer-config/dfs-source.properties
24/05/04 21:41:01 INFO streamer.StreamSync: Shutting down embedded timeline 
server
24/05/04 21:41:01 ERROR streamer.HoodieMultiTableStreamer: error while running 
MultiTableDeltaStreamer for table: {TABLE}
org.apache.hudi.exception.HoodieIOException: Cannot read properties from dfs 
from file 
file:{RUNNING_PATH}/src/test/resources/streamer-config/dfs-source.properties
        at 
org.apache.hudi.common.config.DFSPropertiesConfiguration.addPropsFromFile(DFSPropertiesConfiguration.java:168)
        at 
org.apache.hudi.common.config.DFSPropertiesConfiguration.(DFSPropertiesConfiguration.java:87)
        at 
org.apache.hudi.utilities.UtilHelpers.readConfig(UtilHelpers.java:258)
        at 
org.apache.hudi.utilities.streamer.HoodieStreamer$Config.getProps(HoodieStreamer.java:453)
        at 
org.apache.hudi.utilities.streamer.StreamSync.getDeducedSchemaProvider(StreamSync.java:714)
        at 
org.apache.hudi.utilities.streamer.StreamSync.fetchNextBatchFromSource(StreamSync.java:676)
        at 
org.apache.hudi.utilities.streamer.StreamSync.fetchFromSourceAndPrepareRecords(StreamSync.java:568)
        at 
org.apache.hudi.utilities.streamer.StreamSync.readFromSource(StreamSync.java:540)
        at 
org.apache.hudi.utilities.streamer.StreamSync.syncOnce(StreamSync.java:444)
        at 
org.apache.hudi.utilities.streamer.HoodieStreamer$StreamSyncService.ingestOnce(HoodieStreamer.java:874)
        at 
org.apache.hudi.utilities.ingestion.HoodieIngestionService.startIngestion(HoodieIngestionService.java:72)
        at org.apache.hudi.common.util.Option.ifPresent(Option.java:101)
        at 
org.apache.hudi.utilities.streamer.HoodieStreamer.sync(HoodieStreamer.java:216)
        at 
org.apache.hudi.utilities.streamer.HoodieMultiTableStreamer.sync(HoodieMultiTableStreamer.java:457)
        at 
org.apache.hudi.utilities.streamer.HoodieMultiTableStreamer.main(HoodieMultiTableStreamer.java:282)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
        at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.FileNotFoundException: File 
file:{RUNNING_PATH}/src/test/resources/streamer-config/dfs-source.properties 
does not exist
        at 
org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:641)
        at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:930)
        at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:631)
        at 
org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:454)
        at 
org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:146)
        at 
org.apache.hadoop.fs.Chec

[PR] [HUDI-7711] Fix MultiTableStreamer can deal with path of properties files [hudi]

2024-05-04 Thread via GitHub


hwani3142 opened a new pull request, #11149:
URL: https://github.com/apache/hudi/pull/11149

   ### Change Logs
   fix copy logic on MultiTableStreamer
   
   ### Impact
   
   HoodieMultiTableStreamer 
   
   ### Risk level (write none, low medium or high below)
   
   low
   
   ### Documentation Update
   
   none
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-7711) Fix MultiTableStreamer can deal with path of properties file for each streamer

2024-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7711:
-
Labels: pull-request-available  (was: )

> Fix MultiTableStreamer can deal with path of properties file for each streamer
> --
>
> Key: HUDI-7711
> URL: https://issues.apache.org/jira/browse/HUDI-7711
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: hudi-utilities
> Environment: hudi0.14.1, Spark3.2
>Reporter: Jihwan Lee
>Priority: Major
>  Labels: pull-request-available
>
> HoodieMultiTableStreamer initializes the common configs, then deep-copies the 
> related fields into each streamer.
> Because _propsFilePath_ is not handled for each streamer, every streamer 
> falls back to the default value, which is the path of a test file.
>  
> Also, when MultiTableStreamer is run with {_}--hoodie-conf{_}, each streamer 
> should receive these configs as well (similar to inheritance).
>  
> MultiTable configs (kafka-source.properties):
>  
> {code:java}
> ...
> hoodie.streamer.ingestion.tablesToBeIngested=db.tbl1,db.tbl2
> hoodie.streamer.ingestion.db.tbl1.configFile=hdfs:///tmp/config_1.properties
> hoodie.streamer.ingestion.db.tbl2.configFile=hdfs:///tmp/config_2.properties
> ... {code}
>  
>  
> /tmp/config_1.properties:
>  
> {code:java}
> ...
> hoodie.datasource.write.recordkey.field=id
> hoodie.streamer.source.kafka.topic=topic1
> ... {code}
>  
>  
> /tmp/config_2.properties:
> {code:java}
> ...
> hoodie.datasource.write.recordkey.field=id
> hoodie.streamer.source.kafka.topic=topic2
> ... {code}
>  
> Error log (the workspace path is replaced with \{RUNNING_PATH}):
>  
> {code:java}
> 24/05/04 21:41:01 ERROR config.DFSPropertiesConfiguration: Error reading in 
> properties from dfs from file 
> file:{RUNNING_PATH}/src/test/resources/streamer-config/dfs-source.properties
> 24/05/04 21:41:01 INFO streamer.StreamSync: Shutting down embedded timeline 
> server
> 24/05/04 21:41:01 ERROR streamer.HoodieMultiTableStreamer: error while 
> running MultiTableDeltaStreamer for table: review_processed_data
> org.apache.hudi.exception.HoodieIOException: Cannot read properties from dfs 
> from file 
> file:{RUNNING_PATH}/src/test/resources/streamer-config/dfs-source.properties
>         at 
> org.apache.hudi.common.config.DFSPropertiesConfiguration.addPropsFromFile(DFSPropertiesConfiguration.java:168)
>         at 
> org.apache.hudi.common.config.DFSPropertiesConfiguration.<init>(DFSPropertiesConfiguration.java:87)
>         at 
> org.apache.hudi.utilities.UtilHelpers.readConfig(UtilHelpers.java:258)
>         at 
> org.apache.hudi.utilities.streamer.HoodieStreamer$Config.getProps(HoodieStreamer.java:453)
>         at 
> org.apache.hudi.utilities.streamer.StreamSync.getDeducedSchemaProvider(StreamSync.java:714)
>         at 
> org.apache.hudi.utilities.streamer.StreamSync.fetchNextBatchFromSource(StreamSync.java:676)
>         at 
> org.apache.hudi.utilities.streamer.StreamSync.fetchFromSourceAndPrepareRecords(StreamSync.java:568)
>         at 
> org.apache.hudi.utilities.streamer.StreamSync.readFromSource(StreamSync.java:540)
>         at 
> org.apache.hudi.utilities.streamer.StreamSync.syncOnce(StreamSync.java:444)
>         at 
> org.apache.hudi.utilities.streamer.HoodieStreamer$StreamSyncService.ingestOnce(HoodieStreamer.java:874)
>         at 
> org.apache.hudi.utilities.ingestion.HoodieIngestionService.startIngestion(HoodieIngestionService.java:72)
>         at org.apache.hudi.common.util.Option.ifPresent(Option.java:101)
>         at 
> org.apache.hudi.utilities.streamer.HoodieStreamer.sync(HoodieStreamer.java:216)
>         at 
> org.apache.hudi.utilities.streamer.HoodieMultiTableStreamer.sync(HoodieMultiTableStreamer.java:457)
>         at 
> org.apache.hudi.utilities.streamer.HoodieMultiTableStreamer.main(HoodieMultiTableStreamer.java:282)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>         at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
>         at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>         at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>         at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>         at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
>      

[jira] [Created] (HUDI-7711) Fix MultiTableStreamer can deal with path of properties file for each streamer

2024-05-04 Thread Jihwan Lee (Jira)
Jihwan Lee created HUDI-7711:


 Summary: Fix MultiTableStreamer can deal with path of properties 
file for each streamer
 Key: HUDI-7711
 URL: https://issues.apache.org/jira/browse/HUDI-7711
 Project: Apache Hudi
  Issue Type: Bug
  Components: hudi-utilities
 Environment: hudi0.14.1, Spark3.2
Reporter: Jihwan Lee


HoodieMultiTableStreamer initializes the common configs, then deep-copies the 
related fields into each streamer.

Because _propsFilePath_ is not handled for each streamer, every streamer 
falls back to the default value, which is the path of a test file.

 

Also, when MultiTableStreamer is run with {_}--hoodie-conf{_}, each streamer 
should receive these configs as well (similar to inheritance).

 

MultiTable configs (kafka-source.properties):

 
{code:java}
...
hoodie.streamer.ingestion.tablesToBeIngested=db.tbl1,db.tbl2
hoodie.streamer.ingestion.db.tbl1.configFile=hdfs:///tmp/config_1.properties
hoodie.streamer.ingestion.db.tbl2.configFile=hdfs:///tmp/config_2.properties
... {code}
 

 

/tmp/config_1.properties:

 
{code:java}
...
hoodie.datasource.write.recordkey.field=id
hoodie.streamer.source.kafka.topic=topic1
... {code}
 

 

/tmp/config_2.properties:
{code:java}
...
hoodie.datasource.write.recordkey.field=id
hoodie.streamer.source.kafka.topic=topic2
... {code}
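
The intended behaviour can be sketched roughly as follows. This is a minimal illustration and not Hudi's actual code: `StreamerConfigSketch`, its fields, and `forTable` are hypothetical names, and the default path is only modelled on the one that shows up in the error log. The sketch assumes each per-table config is derived from the common config, and shows the per-table configFile replacing the default props path while the --hoodie-conf values are carried over.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a per-streamer config; this is not Hudi's HoodieStreamer.Config.
class StreamerConfigSketch {
  // Assumed default, modelled on the test-resources path seen in the error log.
  static final String DEFAULT_PROPS_PATH =
      "src/test/resources/streamer-config/dfs-source.properties";

  String propsFilePath = DEFAULT_PROPS_PATH;
  List<String> configs = new ArrayList<>(); // values passed via --hoodie-conf

  // Build the config for one table from the common config.
  static StreamerConfigSketch forTable(StreamerConfigSketch common, String tableConfigFile) {
    StreamerConfigSketch perTable = new StreamerConfigSketch();
    // Inherit the --hoodie-conf values; copy the list so tables stay independent.
    perTable.configs = new ArrayList<>(common.configs);
    // Point at the table's own properties file rather than keeping the default.
    perTable.propsFilePath = tableConfigFile;
    return perTable;
  }

  public static void main(String[] args) {
    StreamerConfigSketch common = new StreamerConfigSketch();
    common.configs.add("hoodie.datasource.write.recordkey.field=id");
    StreamerConfigSketch tbl1 = forTable(common, "hdfs:///tmp/config_1.properties");
    System.out.println(tbl1.propsFilePath + " " + tbl1.configs);
  }
}
```

If the path assignment is skipped, every per-table config keeps `DEFAULT_PROPS_PATH`, which matches the symptom in the error log below.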
 

Error log (the workspace path is replaced with \{RUNNING_PATH}):

 
{code:java}
24/05/04 21:41:01 ERROR config.DFSPropertiesConfiguration: Error reading in 
properties from dfs from file 
file:{RUNNING_PATH}/src/test/resources/streamer-config/dfs-source.properties
24/05/04 21:41:01 INFO streamer.StreamSync: Shutting down embedded timeline 
server
24/05/04 21:41:01 ERROR streamer.HoodieMultiTableStreamer: error while running 
MultiTableDeltaStreamer for table: review_processed_data
org.apache.hudi.exception.HoodieIOException: Cannot read properties from dfs 
from file 
file:{RUNNING_PATH}/src/test/resources/streamer-config/dfs-source.properties
        at 
org.apache.hudi.common.config.DFSPropertiesConfiguration.addPropsFromFile(DFSPropertiesConfiguration.java:168)
        at 
org.apache.hudi.common.config.DFSPropertiesConfiguration.<init>(DFSPropertiesConfiguration.java:87)
        at 
org.apache.hudi.utilities.UtilHelpers.readConfig(UtilHelpers.java:258)
        at 
org.apache.hudi.utilities.streamer.HoodieStreamer$Config.getProps(HoodieStreamer.java:453)
        at 
org.apache.hudi.utilities.streamer.StreamSync.getDeducedSchemaProvider(StreamSync.java:714)
        at 
org.apache.hudi.utilities.streamer.StreamSync.fetchNextBatchFromSource(StreamSync.java:676)
        at 
org.apache.hudi.utilities.streamer.StreamSync.fetchFromSourceAndPrepareRecords(StreamSync.java:568)
        at 
org.apache.hudi.utilities.streamer.StreamSync.readFromSource(StreamSync.java:540)
        at 
org.apache.hudi.utilities.streamer.StreamSync.syncOnce(StreamSync.java:444)
        at 
org.apache.hudi.utilities.streamer.HoodieStreamer$StreamSyncService.ingestOnce(HoodieStreamer.java:874)
        at 
org.apache.hudi.utilities.ingestion.HoodieIngestionService.startIngestion(HoodieIngestionService.java:72)
        at org.apache.hudi.common.util.Option.ifPresent(Option.java:101)
        at 
org.apache.hudi.utilities.streamer.HoodieStreamer.sync(HoodieStreamer.java:216)
        at 
org.apache.hudi.utilities.streamer.HoodieMultiTableStreamer.sync(HoodieMultiTableStreamer.java:457)
        at 
org.apache.hudi.utilities.streamer.HoodieMultiTableStreamer.main(HoodieMultiTableStreamer.java:282)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
        at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.FileNotFoundException: File 
file:/home1/irteam/user/jihwan/hudi-util/multi_review/src/test/resources/streamer-config/dfs-source.properties
 does not exist
        at 
org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:641)
        at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:930)
        at 
org.apache.hadoop.fs.RawLocalFi

Re: [PR] [HUDI-7707] Enable bundle validation on Java 8 and 11 [hudi]

2024-05-04 Thread via GitHub


hudi-bot commented on PR #11142:
URL: https://github.com/apache/hudi/pull/11142#issuecomment-2094105336

   
   ## CI report:
   
   * 7bb8334732cdbdd3cba9868ea66c2c0817559981 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23656)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7707] Enable bundle validation on Java 8 and 11 [hudi]

2024-05-04 Thread via GitHub


hudi-bot commented on PR #11142:
URL: https://github.com/apache/hudi/pull/11142#issuecomment-2094089132

   
   ## CI report:
   
   * fd5383cabb77ad3afc075ee1545e65c7e0613855 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23638)
 
   * 7bb8334732cdbdd3cba9868ea66c2c0817559981 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23656)
 
   
   
   





[jira] [Closed] (HUDI-7710) BugFix: Remove compaction.inflight from conflict resolution

2024-05-04 Thread Lin Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Liu closed HUDI-7710.
-
Resolution: Fixed

> BugFix: Remove compaction.inflight from conflict resolution
> ---
>
> Key: HUDI-7710
> URL: https://issues.apache.org/jira/browse/HUDI-7710
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: compaction
>Reporter: Lin Liu
>Assignee: Lin Liu
>Priority: Critical
>  Labels: pull-request-available
>
> During conflict resolution, compaction.inflight instants are picked up; since they don't 
> contain any plan information, this can cause an NPE.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-7652] Add new `HoodieMergeKey` API to support simple and composite keys [hudi]

2024-05-04 Thread via GitHub


danny0405 commented on code in PR #11077:
URL: https://github.com/apache/hudi/pull/11077#discussion_r1589929342


##
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieSimpleMergeKey.java:
##
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import java.io.Serializable;
+import java.util.Objects;
+
+/**
+ * Wraps {@link HoodieKey} and implements the {@link HoodieMergeKey} interface for simple scenarios where the key is a string.
+ */
+public class HoodieSimpleMergeKey implements HoodieMergeKey {

Review Comment:
   I'm talking about the notion instead of the physical impl.






Re: [PR] [HUDI-7652] Add new `HoodieMergeKey` API to support simple and composite keys [hudi]

2024-05-04 Thread via GitHub


danny0405 commented on code in PR #11077:
URL: https://github.com/apache/hudi/pull/11077#discussion_r1589929165


##
hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieMergedLogRecordScanner.java:
##
@@ -81,7 +83,7 @@ public class HoodieMergedLogRecordScanner extends AbstractHoodieLogRecordReader
   // A timer for calculating elapsed time in millis
   public final HoodieTimer timer = HoodieTimer.create();
   // Map of compacted/merged records
-  private final ExternalSpillableMap records;
+  private final ExternalSpillableMap records;

Review Comment:
   > we do a separate class hierarchy and not overload HoodieKey
   
   Not sure about the specific background here, but making the 
`ExternalSpillableMap` key `Serializable` does not overload anything?
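
   To illustrate the point under discussion, here is a minimal sketch, assuming a plain `HashMap` as a stand-in for `ExternalSpillableMap`. `SerializableKeySketch` and `CompositeKey` are hypothetical names and are not part of the PR. It only shows that a map keyed by `Serializable` can hold both plain string keys and a composite key type side by side.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class SerializableKeySketch {
  // Hypothetical composite key; equals/hashCode are required for map lookups.
  static final class CompositeKey implements Serializable {
    private static final long serialVersionUID = 1L;
    final String recordKey;
    final String partitionPath;

    CompositeKey(String recordKey, String partitionPath) {
      this.recordKey = recordKey;
      this.partitionPath = partitionPath;
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      if (!(o instanceof CompositeKey)) {
        return false;
      }
      CompositeKey other = (CompositeKey) o;
      return recordKey.equals(other.recordKey) && partitionPath.equals(other.partitionPath);
    }

    @Override
    public int hashCode() {
      return Objects.hash(recordKey, partitionPath);
    }
  }

  // A HashMap stands in for the spillable map; values are plain strings here.
  static Map<Serializable, String> buildRecords() {
    Map<Serializable, String> records = new HashMap<>();
    records.put("key-1", "record keyed by a plain string");
    records.put(new CompositeKey("key-1", "2024/05/04"), "record keyed by a composite key");
    return records;
  }

  public static void main(String[] args) {
    System.out.println(buildRecords().size());
  }
}
```

   A `String` and a `CompositeKey` never compare equal, so the two entries coexist without any change to the string-keyed path.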






Re: [PR] [HUDI-7707] Enable bundle validation on Java 8 and 11 [hudi]

2024-05-04 Thread via GitHub


hudi-bot commented on PR #11142:
URL: https://github.com/apache/hudi/pull/11142#issuecomment-2094076123

   
   ## CI report:
   
   * fd5383cabb77ad3afc075ee1545e65c7e0613855 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23638)
 
   * 7bb8334732cdbdd3cba9868ea66c2c0817559981 UNKNOWN
   
   
   





(hudi) branch master updated (8911aa2d3c7 -> 1c7f8376ade)

2024-05-04 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 8911aa2d3c7 [HUDI-7576] Improve efficiency of 
getRelativePartitionPath, reduce computation of partitionPath in 
AbstractTableFileSystemView (#11001)
 add 1c7f8376ade [HUDI-7710] Remove compaction.inflight from conflict 
resolution (#11148)

No new revisions were added by this update.

Summary of changes:
 .../SimpleConcurrentFileWritesConflictResolutionStrategy.java   | 2 ++
 1 file changed, 2 insertions(+)



Re: [PR] [HUDI-7710] Remove `compaction.inflight` from conflict resolution [hudi]

2024-05-04 Thread via GitHub


danny0405 merged PR #11148:
URL: https://github.com/apache/hudi/pull/11148





Re: [PR] [HUDI-7710] Remove `compaction.inflight` from conflict resolution [hudi]

2024-05-04 Thread via GitHub


danny0405 commented on code in PR #11148:
URL: https://github.com/apache/hudi/pull/11148#discussion_r1589927822


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/transaction/SimpleConcurrentFileWritesConflictResolutionStrategy.java:
##
@@ -68,6 +69,7 @@ public Stream<HoodieInstant> getCandidateInstants(HoodieTableMetaClient metaClie
 .getTimelineOfActions(CollectionUtils.createSet(REPLACE_COMMIT_ACTION, COMPACTION_ACTION))
 .findInstantsAfter(currentInstant.getTimestamp())
 .filterInflightsAndRequested()
+.filter(i -> (!i.getAction().equals(COMPACTION_ACTION)) || i.getState().equals(REQUESTED))
 .getInstantsAsStream();

Review Comment:
   Looks reasonable.






Re: [PR] [HUDI-7710] Remove `compaction.inflight` from conflict resolution [hudi]

2024-05-04 Thread via GitHub


hudi-bot commented on PR #11148:
URL: https://github.com/apache/hudi/pull/11148#issuecomment-2094072154

   
   ## CI report:
   
   * 82ace4ec10ccae4108bed6f67674390f905eee7f Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23654)
 
   
   
   





(hudi-rs) branch main updated: ci: fix failing check and test case (#10)

2024-05-04 Thread xushiyan
This is an automated email from the ASF dual-hosted git repository.

xushiyan pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/hudi-rs.git


The following commit(s) were added to refs/heads/main by this push:
 new 82c1dce  ci: fix failing check and test case (#10)
82c1dce is described below

commit 82c1dce7848b117ec95107e08dfeded6f34e0b37
Author: Shiyan Xu <2701446+xushi...@users.noreply.github.com>
AuthorDate: Sat May 4 02:24:54 2024 -0500

ci: fix failing check and test case (#10)

fixes #4
---
 .licenserc.yaml  | 1 +
 crates/core/src/table/meta_client.rs | 5 +++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/.licenserc.yaml b/.licenserc.yaml
index 8fb45ba..2ec3964 100644
--- a/.licenserc.yaml
+++ b/.licenserc.yaml
@@ -23,6 +23,7 @@ header:
   paths-ignore:
 - 'LICENSE'
 - 'NOTICE'
+- '**/fixtures/**'
 
   comment: on-failure
 
diff --git a/crates/core/src/table/meta_client.rs b/crates/core/src/table/meta_client.rs
index f8c8e41..27f0cf9 100644
--- a/crates/core/src/table/meta_client.rs
+++ b/crates/core/src/table/meta_client.rs
@@ -120,9 +120,10 @@ fn meta_client_get_partition_paths() {
 let target_table_path = extract_test_table(fixture_path);
 let meta_client = MetaClient::new(&target_table_path);
 let partition_paths = meta_client.get_partition_paths().unwrap();
+let partition_path_set: HashSet<&str> = HashSet::from_iter(partition_paths.iter().map(|p| p.as_str()));
 assert_eq!(
-partition_paths,
-vec!["chennai", "sao_paulo", "san_francisco"]
+partition_path_set,
+HashSet::from_iter(vec!["chennai", "sao_paulo", "san_francisco"])
 )
 }
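
The change above makes the assertion independent of listing order by comparing sets instead of vectors. The same idea is sketched here in Java to stay consistent with the other examples in this thread; `UnorderedCompareSketch` and `sameElements` are hypothetical names and are not part of hudi-rs.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

class UnorderedCompareSketch {
  // Compare two listings by their distinct elements, ignoring order.
  static boolean sameElements(List<String> actual, List<String> expected) {
    return new HashSet<>(actual).equals(new HashSet<>(expected));
  }

  public static void main(String[] args) {
    List<String> listed = Arrays.asList("sao_paulo", "san_francisco", "chennai");
    List<String> expected = Arrays.asList("chennai", "sao_paulo", "san_francisco");
    System.out.println(sameElements(listed, expected));
  }
}
```

One trade-off of the set-based comparison is that duplicates are ignored as well, which is usually acceptable for partition paths but worth keeping in mind.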