[GitHub] [hudi] hudi-bot commented on pull request #9123: [HUDI-6478] Simplifying INSERT_INTO configs for spark-sql

2023-07-14 Thread via GitHub


hudi-bot commented on PR #9123:
URL: https://github.com/apache/hudi/pull/9123#issuecomment-1636682278

   
   ## CI report:
   
   * 92431edca469088ced64b1d92c7bbdc2e44d47a1 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18596)
 
   * 05554e2e034b32dd56599e1f62408c123888d9cb Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18609)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] amrishlal opened a new pull request, #9203: [HUDI-6315] [WIP] Feature flag for disabling prepped merge.

2023-07-14 Thread via GitHub


amrishlal opened a new pull request, #9203:
URL: https://github.com/apache/hudi/pull/9203

   ### Change Logs
   
   Add user-defined feature flag for disabling prepped merge.
   
   ### Impact
   
   New feature flag `ENABLE_OPTIMIZED_MERGE_WRITES`
   
   ### Risk level (write none, low, medium or high below)
   
   Low
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the default value of the configs is changed_
   - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._
   
   ### Contributor's checklist
   
   - [X] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [X] Change Logs and Impact were stated clearly
   - [X] Adequate tests were added if applicable
   - [X] CI passed
   





[GitHub] [hudi] hudi-bot commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction

2023-07-14 Thread via GitHub


hudi-bot commented on PR #9007:
URL: https://github.com/apache/hudi/pull/9007#issuecomment-1636680266

   
   ## CI report:
   
   * c221efd733a444258780949b698830c2cef47931 UNKNOWN
   * 3e318aa173aaa30e984554380d41c7706b1e7061 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18608)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9123: [HUDI-6478] Simplifying INSERT_INTO configs for spark-sql

2023-07-14 Thread via GitHub


hudi-bot commented on PR #9123:
URL: https://github.com/apache/hudi/pull/9123#issuecomment-1636680322

   
   ## CI report:
   
   * 92431edca469088ced64b1d92c7bbdc2e44d47a1 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18596)
 
   * 05554e2e034b32dd56599e1f62408c123888d9cb UNKNOWN
   
   



[GitHub] [hudi] xushiyan commented on a diff in pull request #9188: [HUDI-6528] Fix premature RDD unpersist during index lookup

2023-07-14 Thread via GitHub


xushiyan commented on code in PR #9188:
URL: https://github.com/apache/hudi/pull/9188#discussion_r1264317406


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bloom/HoodieBloomIndex.java:
##
@@ -103,11 +102,6 @@ record -> new ImmutablePair<>(record.getPartitionPath(), record.getRecordKey()))
 // Step 3: Tag the incoming records, as inserts or updates, by joining with existing record keys
 HoodieData> taggedRecords = tagLocationBacktoRecords(keyFilenamePairs, records, hoodieTable);
 
-if (config.getBloomIndexUseCaching()) {

Review Comment:
   The main purpose of this PR is to fix premature un-persisting, as in this example.
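
To illustrate why the timing of an un-persist matters, here is a minimal, self-contained sketch. It does not use Spark or Hudi classes: `LazyData`, `tag`, and `run` are hypothetical names introduced only for this illustration, and the counter simply records how often the input is recomputed. The assumption is that the tagged result is evaluated lazily, so dropping the cache on the input before the result is consumed forces the input to be recomputed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

public class PrematureUnpersistSketch {

  /** Hypothetical stand-in for a lazily evaluated dataset; not a Spark or Hudi class. */
  public static final class LazyData {
    private final Supplier<List<String>> source;
    private boolean persistRequested = false;
    private List<String> cached = null;
    int computeCount = 0; // how many times the source was actually evaluated

    LazyData(Supplier<List<String>> source) {
      this.source = source;
    }

    void persist() {
      persistRequested = true; // lazy: nothing is computed yet
    }

    void unpersist() {
      persistRequested = false;
      cached = null;
    }

    List<String> collect() {
      if (cached != null) {
        return cached;
      }
      computeCount++;
      List<String> result = source.get();
      if (persistRequested) {
        cached = result;
      }
      return result;
    }
  }

  /** Builds a lazy tagging step that reads the input twice, like a lookup followed by a join. */
  static LazyData tag(LazyData records) {
    return new LazyData(() -> {
      List<String> keys = records.collect();  // first pass over the input
      List<String> again = records.collect(); // second pass over the input
      List<String> tagged = new ArrayList<>();
      for (int i = 0; i < again.size(); i++) {
        tagged.add(again.get(i) + ":" + keys.get(i).length());
      }
      return tagged;
    });
  }

  /** Returns how many times the input was computed for the given un-persist timing. */
  public static int run(boolean unpersistBeforeAction) {
    LazyData records = new LazyData(() -> Arrays.asList("k1", "k2"));
    records.persist();
    LazyData tagged = tag(records);
    if (unpersistBeforeAction) {
      records.unpersist(); // premature: the lazy result has not been evaluated yet
      tagged.collect();
    } else {
      tagged.collect();    // evaluate first, while the cache is still in place
      records.unpersist();
    }
    return records.computeCount;
  }

  public static void main(String[] args) {
    System.out.println("premature un-persist: input computed " + run(true) + " time(s)");
    System.out.println("deferred un-persist:  input computed " + run(false) + " time(s)");
  }
}
```

In this toy model the premature variant computes the input twice and the deferred variant computes it once, which is the kind of wasted work that removing an early un-persist is meant to avoid.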






[GitHub] [hudi] xushiyan commented on a diff in pull request #9188: [HUDI-6528] Fix premature RDD unpersist during index lookup

2023-07-14 Thread via GitHub


xushiyan commented on code in PR #9188:
URL: https://github.com/apache/hudi/pull/9188#discussion_r1264317245


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bloom/HoodieBloomIndex.java:
##
@@ -75,11 +74,11 @@ public HoodieBloomIndex(HoodieWriteConfig config, BaseHoodieBloomIndexHelper blo
   @Override
   public  HoodieData> tagLocation(
   HoodieData> records, HoodieEngineContext context,
-  HoodieTable hoodieTable) {
+  HoodieTable hoodieTable, Option instantTime) {
 // Step 0: cache the input records if needed
-if (config.getBloomIndexUseCaching()) {
-  records.persist(new HoodieConfig(config.getProps())
-  .getString(HoodieIndexConfig.BLOOM_INDEX_INPUT_STORAGE_LEVEL_VALUE));
+if (config.getBloomIndexUseCaching() && instantTime.isPresent()) {
+  String storageLevel = config.getString(HoodieIndexConfig.BLOOM_INDEX_INPUT_STORAGE_LEVEL_VALUE);

Review Comment:
   sounds good






[GitHub] [hudi] xushiyan commented on a diff in pull request #9188: [HUDI-6528] Fix premature RDD unpersist during index lookup

2023-07-14 Thread via GitHub


xushiyan commented on code in PR #9188:
URL: https://github.com/apache/hudi/pull/9188#discussion_r1264317221


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndex.java:
##
@@ -80,7 +81,7 @@ public O updateLocation(O writeStatuses, HoodieEngineContext context,
   @PublicAPIMethod(maturity = ApiMaturityLevel.EVOLVING)
   public abstract  HoodieData> tagLocation(
   HoodieData> records, HoodieEngineContext context,
-  HoodieTable hoodieTable) throws HoodieIndexException;
+  HoodieTable hoodieTable, Option instantTime) throws HoodieIndexException;

Review Comment:
   The API is marked as "Evolving", so changes are expected in a major release.






[GitHub] [hudi] hudi-bot commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction

2023-07-14 Thread via GitHub


hudi-bot commented on PR #9007:
URL: https://github.com/apache/hudi/pull/9007#issuecomment-1636653219

   
   ## CI report:
   
   * c221efd733a444258780949b698830c2cef47931 UNKNOWN
   * de9fca47509129e13c9b3a422261e8c55978faa0 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18607)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18606)
 
   * 3e318aa173aaa30e984554380d41c7706b1e7061 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18608)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction

2023-07-14 Thread via GitHub


hudi-bot commented on PR #9007:
URL: https://github.com/apache/hudi/pull/9007#issuecomment-1636641347

   
   ## CI report:
   
   * c221efd733a444258780949b698830c2cef47931 UNKNOWN
   * de9fca47509129e13c9b3a422261e8c55978faa0 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18607)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18606)
 
   * 3e318aa173aaa30e984554380d41c7706b1e7061 UNKNOWN
   
   



[GitHub] [hudi] nsivabalan commented on a diff in pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction

2023-07-14 Thread via GitHub


nsivabalan commented on code in PR #9007:
URL: https://github.com/apache/hudi/pull/9007#discussion_r1264287885


##
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieTimeline.java:
##
@@ -170,7 +170,7 @@ public interface HoodieTimeline extends Serializable {
*
* @return
*/
-  HoodieTimeline filterCompletedInstantsOrRewriteTimeline();
+  HoodieTimeline filterCompletedAndRewriteInstants();

Review Comment:
   We should name it "filterCompletedRewriteInstants".
   
   "Rewrite" refers to commits, delta commits, and replace commits.
   "Completed" refers to the state.
   



##
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieFileGroup.java:
##
@@ -53,6 +53,7 @@ public static Comparator getReverseCommitTimeComparator() {
 
   /**
* Timeline, based on which all getter work.
+   * This should be a write timeline that contains either completed instants or pending compaction instants.

Review Comment:
   Can we rename the variable then? For example: `completedWriteAndCompactionTimeline`.



##
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/TimelineDiffHelper.java:
##
@@ -146,17 +189,22 @@ public static class TimelineDiffResult {
 
 private final List newlySeenInstants;
 private final List finishedCompactionInstants;
-private final List finishedOrRemovedLogCompactionInstants;
+// Completed instants will have true as the value where as instants removed due to rollback will have false as value.
+private final List> finishedOrRemovedLogCompactionInstants;
+// Completed instants will have true as the value where as instants removed due to rollback will have false as value.
+private final List> finishedOrRemovedReplaceCommitInstants;
 private final boolean canSyncIncrementally;
 
 public static final TimelineDiffResult UNSAFE_SYNC_RESULT =
-new TimelineDiffResult(null, null, null, false);
+new TimelineDiffResult(null, null, null, null, false);
 
 public TimelineDiffResult(List newlySeenInstants, List finishedCompactionInstants,
-  List finishedOrRemovedLogCompactionInstants, boolean canSyncIncrementally) {
+  List> finishedOrRemovedLogCompactionInstants,
+  List> finishedOrRemovedReplaceCommitInstants, boolean canSyncIncrementally) {
   this.newlySeenInstants = newlySeenInstants;
   this.finishedCompactionInstants = finishedCompactionInstants;
   this.finishedOrRemovedLogCompactionInstants = finishedOrRemovedLogCompactionInstants;
+  this.finishedOrRemovedReplaceCommitInstants = finishedOrRemovedReplaceCommitInstants;

Review Comment:
   If your answer to my previous comment is "only clustering", should we rename all these variables accordingly? Otherwise it might cause confusion down the line.



##
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/TimelineDiffHelper.java:
##
@@ -74,19 +74,62 @@ public static TimelineDiffResult getNewInstantsForIncrementalSync(HoodieTimeline
 
   newTimeline.getInstantsAsStream().filter(instant -> !oldTimelineInstants.contains(instant)).forEach(newInstants::add);
 
+  // Check for log compaction commits completed or removed.
   List> logCompactionInstants = getPendingLogCompactionTransitions(oldTimeline, newTimeline);
-  List finishedOrRemovedLogCompactionInstants = logCompactionInstants.stream()
+  List> finishedOrRemovedLogCompactionInstants = logCompactionInstants.stream()
   .filter(instantPair -> !instantPair.getKey().isCompleted()
   && (instantPair.getValue() == null || instantPair.getValue().isCompleted()))
-  .map(Pair::getKey).collect(Collectors.toList());
-  return new TimelineDiffResult(newInstants, finishedCompactionInstants, finishedOrRemovedLogCompactionInstants, true);
+  .map(instantPair -> (instantPair.getValue() == null)
+  ? Pair.of(instantPair.getKey(), false) : Pair.of(instantPair.getKey(), true))
+  .collect(Collectors.toList());
+
+  // Check for replace commits completed or removed.
+  List> replaceCommitInstants = getPendingReplaceCommitTransitions(oldTimeline, newTimeline);
+  List> finishedOrRemovedReplaceCommitInstants = replaceCommitInstants.stream()
+  .filter(instantPair -> !instantPair.getKey().isCompleted()
+  && (instantPair.getValue() == null || instantPair.getValue().isCompleted()))
+  .map(instantPair -> (instantPair.getValue() == null)
+  ? Pair.of(instantPair.getKey(), false) : Pair.of(instantPair.getKey(), true))
+  .collect(Collectors.toList());
+
+  // New instants will contains instants that are newly completed commits or newly created pending rewrite commits
+  // (i.e. compaction, logcompaciton, replacecommit)
+  // Finished or removed rewrite commits are handled independently.
+  return new TimelineDiffRe
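
The classification in the hunk above (a pending instant that has either completed or disappeared from the new timeline) can be sketched in isolation. This is only an illustration under stated assumptions: `SketchInstant` is a hypothetical, simplified stand-in for `HoodieInstant`, `Map.Entry` replaces Hudi's `Pair`, and the helper names `pair`, `flag`, `finishedOrRemoved`, and `demo` are invented here. The convention that a `null` new-timeline value means "removed" follows the diff.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PendingTransitionSketch {

  /** Hypothetical, simplified stand-in for HoodieInstant. */
  public static final class SketchInstant {
    public final String timestamp;
    public final boolean completed;

    public SketchInstant(String timestamp, boolean completed) {
      this.timestamp = timestamp;
      this.completed = completed;
    }

    public boolean isCompleted() {
      return completed;
    }
  }

  /** Pairs an instant from the old timeline with its counterpart in the new one (null if gone). */
  static Map.Entry<SketchInstant, SketchInstant> pair(SketchInstant oldInstant, SketchInstant newInstant) {
    return new SimpleEntry<>(oldInstant, newInstant);
  }

  /** Pairs an instant with a flag: true if it completed, false if it was removed. */
  static Map.Entry<SketchInstant, Boolean> flag(SketchInstant instant, boolean completed) {
    return new SimpleEntry<>(instant, completed);
  }

  /**
   * Keeps only instants that were pending in the old timeline and are now either
   * completed (flag true) or absent from the new timeline (flag false).
   */
  public static List<Map.Entry<SketchInstant, Boolean>> finishedOrRemoved(
      List<Map.Entry<SketchInstant, SketchInstant>> transitions) {
    return transitions.stream()
        .filter(t -> !t.getKey().isCompleted()
            && (t.getValue() == null || t.getValue().isCompleted()))
        .map(t -> flag(t.getKey(), t.getValue() != null))
        .collect(Collectors.toList());
  }

  /** Small example input covering the four possible cases. */
  public static List<Map.Entry<SketchInstant, Boolean>> demo() {
    List<Map.Entry<SketchInstant, SketchInstant>> transitions = Arrays.asList(
        pair(new SketchInstant("001", false), new SketchInstant("001", true)),  // pending -> completed
        pair(new SketchInstant("002", false), null),                            // pending -> removed
        pair(new SketchInstant("003", false), new SketchInstant("003", false)), // still pending
        pair(new SketchInstant("004", true), new SketchInstant("004", true)));  // already completed
    return finishedOrRemoved(transitions);
  }

  public static void main(String[] args) {
    for (Map.Entry<SketchInstant, Boolean> e : demo()) {
      System.out.println(e.getKey().timestamp + " -> " + (e.getValue() ? "completed" : "removed"));
    }
  }
}
```

Carrying the boolean alongside the instant lets a consumer treat a completed rewrite differently from a rolled-back one, which a plain list of instants could not express.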

[GitHub] [hudi] hudi-bot commented on pull request #9200: [HUDI-6537] Bump checkstyle version to 3.1.0

2023-07-14 Thread via GitHub


hudi-bot commented on PR #9200:
URL: https://github.com/apache/hudi/pull/9200#issuecomment-1636640109

   
   ## CI report:
   
   * 488f2a98894d13f55ff5f233fe47fa99e2bf420c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18591)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18599)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction

2023-07-14 Thread via GitHub


hudi-bot commented on PR #9007:
URL: https://github.com/apache/hudi/pull/9007#issuecomment-1636639990

   
   ## CI report:
   
   * c221efd733a444258780949b698830c2cef47931 UNKNOWN
   * ea6504e78fbb1c776687d3632c5875e74070cebd Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18604)
 
   * de9fca47509129e13c9b3a422261e8c55978faa0 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18606)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18607)
 
   
   



[GitHub] [hudi] suryaprasanna commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction

2023-07-14 Thread via GitHub


suryaprasanna commented on PR #9007:
URL: https://github.com/apache/hudi/pull/9007#issuecomment-1636636242

   @hudi-bot run azure





[GitHub] [hudi] hudi-bot commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction

2023-07-14 Thread via GitHub


hudi-bot commented on PR #9007:
URL: https://github.com/apache/hudi/pull/9007#issuecomment-1636630900

   
   ## CI report:
   
   * c221efd733a444258780949b698830c2cef47931 UNKNOWN
   * ea6504e78fbb1c776687d3632c5875e74070cebd Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18604)
 
   * de9fca47509129e13c9b3a422261e8c55978faa0 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18606)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction

2023-07-14 Thread via GitHub


hudi-bot commented on PR #9007:
URL: https://github.com/apache/hudi/pull/9007#issuecomment-1636629167

   
   ## CI report:
   
   * c221efd733a444258780949b698830c2cef47931 UNKNOWN
   * 578401a45b7c5ffbd9360de3bd3e18c362b4b2b5 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18592)
 
   * ea6504e78fbb1c776687d3632c5875e74070cebd Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18604)
 
   * de9fca47509129e13c9b3a422261e8c55978faa0 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8949: [DNM] Testing Java 17

2023-07-14 Thread via GitHub


hudi-bot commented on PR #8949:
URL: https://github.com/apache/hudi/pull/8949#issuecomment-1636627466

   
   ## CI report:
   
   * 4e99d55baa97cc2fda388c6d6b8246fcffd7e3d6 UNKNOWN
   * 2ae49bedda144e147341bbed7876a45f1d940ad6 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18603)
 
   
   



[hudi] branch master updated: [HUDI-6530] Fixing the correct resource path (#9202)

2023-07-14 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 37d3d8ef504 [HUDI-6530] Fixing the correct resource path (#9202)
37d3d8ef504 is described below

commit 37d3d8ef504794d64fb87c838bf58bafa8acaa16
Author: lokesh-lingarajan-0310 <84048984+lokesh-lingarajan-0...@users.noreply.github.com>
AuthorDate: Fri Jul 14 18:58:51 2023 -0700

[HUDI-6530] Fixing the correct resource path (#9202)

Co-authored-by: Lokesh Lingarajan 

---
 .../java/org/apache/hudi/utilities/sources/TestGcsEventsSource.java | 2 +-
 .../test/java/org/apache/hudi/utilities/sources/TestS3EventsSource.java | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/TestGcsEventsSource.java b/hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/TestGcsEventsSource.java
index 653cb823233..83108ee0c7e 100644
--- a/hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/TestGcsEventsSource.java
+++ b/hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/TestGcsEventsSource.java
@@ -63,7 +63,7 @@ public class TestGcsEventsSource extends UtilitiesTestBase {
 
   @BeforeEach
   public void beforeEach() throws Exception {
-schemaProvider = new FilebasedSchemaProvider(Helpers.setupSchemaOnDFS("delta-streamer-config", "gcs-metadata.avsc"), jsc);
+schemaProvider = new FilebasedSchemaProvider(Helpers.setupSchemaOnDFS("streamer-config", "gcs-metadata.avsc"), jsc);
 MockitoAnnotations.initMocks(this);
 
 props = new TypedProperties();
diff --git a/hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/TestS3EventsSource.java b/hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/TestS3EventsSource.java
index 4db47c76784..5ed332a142d 100644
--- a/hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/TestS3EventsSource.java
+++ b/hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/TestS3EventsSource.java
@@ -51,7 +51,7 @@ public class TestS3EventsSource extends AbstractCloudObjectsSourceTestBase {
 this.dfsRoot = basePath + "/parquetFiles";
 this.fileSuffix = ".parquet";
 fs.mkdirs(new Path(dfsRoot));
-schemaProvider = new FilebasedSchemaProvider(Helpers.setupSchemaOnDFS("delta-streamer-config", "s3-metadata.avsc"), jsc);
+schemaProvider = new FilebasedSchemaProvider(Helpers.setupSchemaOnDFS("streamer-config", "s3-metadata.avsc"), jsc);
   }
 
   @AfterEach



[GitHub] [hudi] nsivabalan merged pull request #9202: [HUDI-6530] Fixing the testcase to reflect correct resource path

2023-07-14 Thread via GitHub


nsivabalan merged PR #9202:
URL: https://github.com/apache/hudi/pull/9202





[GitHub] [hudi] hudi-bot commented on pull request #8949: [DNM] Testing Java 17

2023-07-14 Thread via GitHub


hudi-bot commented on PR #8949:
URL: https://github.com/apache/hudi/pull/8949#issuecomment-1636613200

   
   ## CI report:
   
   * 4e99d55baa97cc2fda388c6d6b8246fcffd7e3d6 UNKNOWN
   * f4b2acb51670eebaff53504cc87ee9ebbd214360 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18602)
 
   * 2ae49bedda144e147341bbed7876a45f1d940ad6 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18603)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction

2023-07-14 Thread via GitHub


hudi-bot commented on PR #9007:
URL: https://github.com/apache/hudi/pull/9007#issuecomment-1636611094

   
   ## CI report:
   
   * c221efd733a444258780949b698830c2cef47931 UNKNOWN
   * 578401a45b7c5ffbd9360de3bd3e18c362b4b2b5 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18592)
 
   * ea6504e78fbb1c776687d3632c5875e74070cebd Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18604)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8949: [DNM] Testing Java 17

2023-07-14 Thread via GitHub


hudi-bot commented on PR #8949:
URL: https://github.com/apache/hudi/pull/8949#issuecomment-1636611049

   
   ## CI report:
   
   * 4e99d55baa97cc2fda388c6d6b8246fcffd7e3d6 UNKNOWN
   * f4b2acb51670eebaff53504cc87ee9ebbd214360 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18602)
 
   * 2ae49bedda144e147341bbed7876a45f1d940ad6 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9202: [HUDI-6530] Fixing the testcase to reflect correct resource path

2023-07-14 Thread via GitHub


hudi-bot commented on PR #9202:
URL: https://github.com/apache/hudi/pull/9202#issuecomment-1636607988

   
   ## CI report:
   
   * 531177f0d624aed45a00fb6c1778daa867b90fdb Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18597)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9123: [HUDI-6478] Simplifying INSERT_INTO configs for spark-sql

2023-07-14 Thread via GitHub


hudi-bot commented on PR #9123:
URL: https://github.com/apache/hudi/pull/9123#issuecomment-1636607910

   
   ## CI report:
   
   * 92431edca469088ced64b1d92c7bbdc2e44d47a1 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18596)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8949: [DNM] Testing Java 17

2023-07-14 Thread via GitHub


hudi-bot commented on PR #8949:
URL: https://github.com/apache/hudi/pull/8949#issuecomment-1636607791

   
   ## CI report:
   
   * 4e99d55baa97cc2fda388c6d6b8246fcffd7e3d6 UNKNOWN
   * f4b2acb51670eebaff53504cc87ee9ebbd214360 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18602)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction

2023-07-14 Thread via GitHub


hudi-bot commented on PR #9007:
URL: https://github.com/apache/hudi/pull/9007#issuecomment-1636586011

   
   ## CI report:
   
   * c221efd733a444258780949b698830c2cef47931 UNKNOWN
   * 578401a45b7c5ffbd9360de3bd3e18c362b4b2b5 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18592)
 
   * ea6504e78fbb1c776687d3632c5875e74070cebd UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8949: [DNM] Testing Java 17

2023-07-14 Thread via GitHub


hudi-bot commented on PR #8949:
URL: https://github.com/apache/hudi/pull/8949#issuecomment-1636585956

   
   ## CI report:
   
   * 4e99d55baa97cc2fda388c6d6b8246fcffd7e3d6 UNKNOWN
   * 9ab5d549da4eea0808afe9a7830ab2d4e68109ce Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18598)
 
   * e99d8065fc69b2cc354ba1688a2991a5e927eb48 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18600)
 
   * f4b2acb51670eebaff53504cc87ee9ebbd214360 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8949: [DNM] Testing Java 17

2023-07-14 Thread via GitHub


hudi-bot commented on PR #8949:
URL: https://github.com/apache/hudi/pull/8949#issuecomment-1636581729

   
   ## CI report:
   
   * 4e99d55baa97cc2fda388c6d6b8246fcffd7e3d6 UNKNOWN
   * 9ab5d549da4eea0808afe9a7830ab2d4e68109ce Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18598)
 
   * e99d8065fc69b2cc354ba1688a2991a5e927eb48 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8949: [DNM] Testing Java 17

2023-07-14 Thread via GitHub


hudi-bot commented on PR #8949:
URL: https://github.com/apache/hudi/pull/8949#issuecomment-1636578463

   
   ## CI report:
   
   * 4e99d55baa97cc2fda388c6d6b8246fcffd7e3d6 UNKNOWN
   * 9ab5d549da4eea0808afe9a7830ab2d4e68109ce Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18598)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8949: [DNM] Testing Java 17

2023-07-14 Thread via GitHub


hudi-bot commented on PR #8949:
URL: https://github.com/apache/hudi/pull/8949#issuecomment-1636556240

   
   ## CI report:
   
   * 4e99d55baa97cc2fda388c6d6b8246fcffd7e3d6 UNKNOWN
   * 4bea1208f1ba87ad8dd35e0ef55501cd4ffcee11 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18595)
 
   * 9ab5d549da4eea0808afe9a7830ab2d4e68109ce Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18598)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] soumilshah1995 commented on issue #9183: [SUPPORT] Glue 4.0 Hudi 0.12.1 PreCommit validator i.e SqlQueryEqualityPreCommitValidator is not working

2023-07-14 Thread via GitHub


soumilshah1995 commented on issue #9183:
URL: https://github.com/apache/hudi/issues/9183#issuecomment-1636551169

   Here is a video tutorial:
   https://www.youtube.com/watch?v=KNzs9dj_Btc&t=73s
   
   # Tested 
   ```
   try:
       from pyspark.sql import SparkSession
       import os
       import sys
       import uuid
       from datetime import datetime
       from faker import Faker
   except Exception as e:
       print("Error: ", e)
   
   hudi_version = '0.13.1'
   jar_file = 'hudi-spark3.3-bundle_2.12-0.14.0-SNAPSHOT.jar'
   os.environ['PYSPARK_SUBMIT_ARGS'] = f"--jars {jar_file} pyspark-shell"
   os.environ['PYSPARK_PYTHON'] = sys.executable
   os.environ['PYSPARK_DRIVER_PYTHON'] = sys.executable
   
   spark = SparkSession.builder \
       .config('spark.serializer', 'org.apache.spark.serializer.KryoSerializer') \
       .config('spark.jars', jar_file) \
       .config('spark.sql.hive.convertMetastoreParquet', 'false') \
       .getOrCreate()
   
   db_name = "hudidb"
   table_name = "pre_commit_validator"
   recordkey = 'uuid'
   precombine = 'precomb'
   method = 'upsert'
   table_type = "COPY_ON_WRITE"
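   # <TABLE_NAME> is the placeholder that Hudi's SQL pre-commit validators
   # substitute with the table under validation.
   validator_query = """SELECT COUNT(*) FROM <TABLE_NAME> WHERE message IS NULL;"""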
   path = f"file:///C:/tmp/{db_name}/{table_name}"
   
   hudi_options = {
       'hoodie.table.name': table_name,
       'hoodie.datasource.write.recordkey.field': recordkey,
       'hoodie.datasource.write.table.name': table_name,
       'hoodie.datasource.write.operation': method,
       'hoodie.datasource.write.precombine.field': precombine,
       'hoodie.upsert.shuffle.parallelism': 2,
       'hoodie.insert.shuffle.parallelism': 2,
       "hoodie.precommit.validators": "org.apache.hudi.client.validator.SqlQueryEqualityPreCommitValidator",
       "hoodie.precommit.validators.equality.sql.queries": validator_query
   }
   
   spark_df = spark.createDataFrame(data=[
       (1, "This is APPEND 1", 111, "1"),
       (2, "This is APPEND 2", 222, "2"), ],
       schema=["uuid", "message", "precomb", "partition"])
   
   spark_df.write.format("hudi").options(**hudi_options).mode("append").save(path)
   
   spark.read.format("hudi").load(path).createOrReplaceTempView("hudi_snapshots")
   spark.sql("select * from hudi_snapshots").show(truncate=False)
   
   
   spark_df = spark.createDataFrame(
       data=[
           (4, None, 444, None),
           (5, "This is APPEND 5", 555, "5"),
       ],
       schema=["uuid", "message", "precomb", "partition"])
   spark_df.show()
   
   spark_df.write.format("hudi").options(**hudi_options).mode("append").save(path)
   
   spark.read.format("hudi").load(path).createOrReplaceTempView("hudi_snapshots")
   spark.sql("select * from hudi_snapshots").show(truncate=False)
   ```
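
The idea behind an equality pre-commit check can be sketched without Spark. The snippet below is only an illustration, assuming the validator runs the configured query against the table state before and after the pending write and rejects the write when the two results differ. It uses the standard-library `sqlite3` module as a stand-in for the table; `run_query`, `equality_check`, and the table `t` are hypothetical names, not Hudi APIs.

```python
import sqlite3


def run_query(conn, query, table):
    # Substitute the table-name placeholder, then return all result rows.
    return conn.execute(query.replace("<TABLE_NAME>", table)).fetchall()


def equality_check(conn, query, table, new_rows):
    """Apply new_rows only if the query result is the same before and after."""
    before = run_query(conn, query, table)
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", new_rows)
    after = run_query(conn, query, table)
    if before != after:
        conn.rollback()  # validation failed: discard the pending write
        return False
    conn.commit()  # validation passed: keep the write
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (uuid INTEGER, message TEXT)")
conn.commit()

query = "SELECT COUNT(*) FROM <TABLE_NAME> WHERE message IS NULL"

# The first batch has no NULL messages, so the NULL count stays the same.
ok_first = equality_check(conn, query, "t", [(1, "This is APPEND 1"), (2, "This is APPEND 2")])
# The second batch adds a NULL message, so the count changes and the write is rejected.
ok_second = equality_check(conn, query, "t", [(4, None), (5, "This is APPEND 5")])
print(ok_first, ok_second)
```

Under that assumption, the second append in the script above would be expected to fail validation, because it introduces a row whose `message` is NULL.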





[GitHub] [hudi] hudi-bot commented on pull request #8949: [DNM] Testing Java 17

2023-07-14 Thread via GitHub


hudi-bot commented on PR #8949:
URL: https://github.com/apache/hudi/pull/8949#issuecomment-1636550496

   
   ## CI report:
   
   * 4e99d55baa97cc2fda388c6d6b8246fcffd7e3d6 UNKNOWN
   * 4bea1208f1ba87ad8dd35e0ef55501cd4ffcee11 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18595)
 
   * 9ab5d549da4eea0808afe9a7830ab2d4e68109ce UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9200: [HUDI-6537] Bump checkstyle version to 3.1.0

2023-07-14 Thread via GitHub


hudi-bot commented on PR #9200:
URL: https://github.com/apache/hudi/pull/9200#issuecomment-1636545849

   
   ## CI report:
   
   * 488f2a98894d13f55ff5f233fe47fa99e2bf420c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18591)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18599)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9188: [HUDI-6528] Fix premature RDD unpersist during index lookup

2023-07-14 Thread via GitHub


hudi-bot commented on PR #9188:
URL: https://github.com/apache/hudi/pull/9188#issuecomment-1636545795

   
   ## CI report:
   
   * 4c247d87685c4900a327275aa8e3b5909554ad36 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18565)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction

2023-07-14 Thread via GitHub


hudi-bot commented on PR #9007:
URL: https://github.com/apache/hudi/pull/9007#issuecomment-1636545530

   
   ## CI report:
   
   * c221efd733a444258780949b698830c2cef47931 UNKNOWN
   * 578401a45b7c5ffbd9360de3bd3e18c362b4b2b5 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18592)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8949: [DNM] Testing Java 17

2023-07-14 Thread via GitHub


hudi-bot commented on PR #8949:
URL: https://github.com/apache/hudi/pull/8949#issuecomment-1636545412

   
   ## CI report:
   
   * 4e99d55baa97cc2fda388c6d6b8246fcffd7e3d6 UNKNOWN
   * 4bea1208f1ba87ad8dd35e0ef55501cd4ffcee11 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18595)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] voonhous commented on pull request #9200: [HUDI-6537] Bump checkstyle version to 3.1.0

2023-07-14 Thread via GitHub


voonhous commented on PR #9200:
URL: https://github.com/apache/hudi/pull/9200#issuecomment-1636540345

   @hudi-bot run azure





[GitHub] [hudi] voonhous commented on pull request #9200: [HUDI-6537] Bump checkstyle version to 3.1.0

2023-07-14 Thread via GitHub


voonhous commented on PR #9200:
URL: https://github.com/apache/hudi/pull/9200#issuecomment-1636540046

   @hudu-bot run azure





[GitHub] [hudi] hudi-bot commented on pull request #9202: [HUDI-6530] Fixing the testcase to reflect correct resource path

2023-07-14 Thread via GitHub


hudi-bot commented on PR #9202:
URL: https://github.com/apache/hudi/pull/9202#issuecomment-1636513847

   
   ## CI report:
   
   * 531177f0d624aed45a00fb6c1778daa867b90fdb Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18597)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9123: [HUDI-6478] Simplifying INSERT_INTO configs for spark-sql

2023-07-14 Thread via GitHub


hudi-bot commented on PR #9123:
URL: https://github.com/apache/hudi/pull/9123#issuecomment-1636513735

   
   ## CI report:
   
   * c4b55caaa515af207aa3ba1bef87cb1568d9b38a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18589)
 
   * 92431edca469088ced64b1d92c7bbdc2e44d47a1 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18596)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8949: [DNM] Testing Java 17

2023-07-14 Thread via GitHub


hudi-bot commented on PR #8949:
URL: https://github.com/apache/hudi/pull/8949#issuecomment-1636513486

   
   ## CI report:
   
   * 4e99d55baa97cc2fda388c6d6b8246fcffd7e3d6 UNKNOWN
   * 7b79a304400af94497a6dd50cb8a3116531504c6 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18593)
 
   * ba905eb083f5de4e2a3055cc0e137ca218ec1e96 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18594)
 
   * 4bea1208f1ba87ad8dd35e0ef55501cd4ffcee11 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9202: [HUDI-6530] Fixing the testcase to reflect correct resource path

2023-07-14 Thread via GitHub


hudi-bot commented on PR #9202:
URL: https://github.com/apache/hudi/pull/9202#issuecomment-1636508263

   
   ## CI report:
   
   * 531177f0d624aed45a00fb6c1778daa867b90fdb UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9123: [HUDI-6478] Simplifying INSERT_INTO configs for spark-sql

2023-07-14 Thread via GitHub


hudi-bot commented on PR #9123:
URL: https://github.com/apache/hudi/pull/9123#issuecomment-1636508104

   
   ## CI report:
   
   * c4b55caaa515af207aa3ba1bef87cb1568d9b38a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18589)
 
   * 92431edca469088ced64b1d92c7bbdc2e44d47a1 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8949: [DNM] Testing Java 17

2023-07-14 Thread via GitHub


hudi-bot commented on PR #8949:
URL: https://github.com/apache/hudi/pull/8949#issuecomment-1636507859

   
   ## CI report:
   
   * 4e99d55baa97cc2fda388c6d6b8246fcffd7e3d6 UNKNOWN
   * 7b79a304400af94497a6dd50cb8a3116531504c6 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18593)
 
   * ba905eb083f5de4e2a3055cc0e137ca218ec1e96 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9200: [HUDI-6537] Bump checkstyle version to 3.1.0

2023-07-14 Thread via GitHub


hudi-bot commented on PR #9200:
URL: https://github.com/apache/hudi/pull/9200#issuecomment-1636503157

   
   ## CI report:
   
   * 488f2a98894d13f55ff5f233fe47fa99e2bf420c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18591)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9133: [HUDI-6474] Added support for reading tables evolved using comprehensive schema e…

2023-07-14 Thread via GitHub


hudi-bot commented on PR #9133:
URL: https://github.com/apache/hudi/pull/9133#issuecomment-1636503019

   
   ## CI report:
   
   * 71d4fd08f41e4aab163a92e82a15e35cf9c79ea0 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18590)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8949: [DNM] Testing Java 17

2023-07-14 Thread via GitHub


hudi-bot commented on PR #8949:
URL: https://github.com/apache/hudi/pull/8949#issuecomment-1636502733

   
   ## CI report:
   
   * 4e99d55baa97cc2fda388c6d6b8246fcffd7e3d6 UNKNOWN
   * 7b79a304400af94497a6dd50cb8a3116531504c6 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18593)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] lokesh-lingarajan-0310 opened a new pull request, #9202: [HUDI-6530] Fixing the testcase to reflect correct resource path

2023-07-14 Thread via GitHub


lokesh-lingarajan-0310 opened a new pull request, #9202:
URL: https://github.com/apache/hudi/pull/9202

   ### Change Logs
   
   Fix the test case so that it uses the correct resource path
   
   ### Impact
   
   None
   
   ### Risk level (write none, low medium or high below)
   
   low
   
   ### Documentation Update
   
   No
   ### Contributor's checklist
   
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Change Logs and Impact were stated clearly
   - [x] Adequate tests were added if applicable
   - [x] CI passed
   





[GitHub] [hudi] hudi-bot commented on pull request #8949: [DNM] Testing Java 17

2023-07-14 Thread via GitHub


hudi-bot commented on PR #8949:
URL: https://github.com/apache/hudi/pull/8949#issuecomment-1636464975

   
   ## CI report:
   
   * 4e99d55baa97cc2fda388c6d6b8246fcffd7e3d6 UNKNOWN
   * 9206f0ec85caee9b9e351820692affa370906291 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18581)
 
   * 7b79a304400af94497a6dd50cb8a3116531504c6 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18593)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8949: [DNM] Testing Java 17

2023-07-14 Thread via GitHub


hudi-bot commented on PR #8949:
URL: https://github.com/apache/hudi/pull/8949#issuecomment-1636457298

   
   ## CI report:
   
   * 4e99d55baa97cc2fda388c6d6b8246fcffd7e3d6 UNKNOWN
   * 9206f0ec85caee9b9e351820692affa370906291 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18581)
 
   * 7b79a304400af94497a6dd50cb8a3116531504c6 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] soumilshah1995 commented on issue #8400: [SUPPORT] Hudi Offline Compaction in EMR Serverless 6.10 for YouTube Video

2023-07-14 Thread via GitHub


soumilshah1995 commented on issue #8400:
URL: https://github.com/apache/hudi/issues/8400#issuecomment-1636454180

   @AmareshB 
   
   Sure 
   
   @AmareshB 
   
   # Step 1 : Create EMR 6.11 Cluster 
   
![image](https://github.com/apache/hudi/assets/39345855/320ac005-344f-4da6-a02c-1cdad5462226)
   
   # Step 2 : Create MOR table 
   ```
   try:
       import sys
       import os
       from pyspark.context import SparkContext
       from pyspark.sql.session import SparkSession
       from awsglue.context import GlueContext
       from awsglue.job import Job
       from awsglue.dynamicframe import DynamicFrame
       from pyspark.sql.functions import col, to_timestamp, monotonically_increasing_id, to_date, when
       from pyspark.sql.functions import *
       from awsglue.utils import getResolvedOptions
       from pyspark.sql.types import *
       from datetime import datetime, date
       import boto3
       from functools import reduce
       from pyspark.sql import Row
   
       import uuid
       from faker import Faker
   except Exception as e:
       print("Modules are missing : {} ".format(e))
   
   spark = (SparkSession.builder.config('spark.serializer', 'org.apache.spark.serializer.KryoSerializer')
            .config('spark.sql.hive.convertMetastoreParquet', 'false')
            .config('spark.sql.catalog.spark_catalog', 'org.apache.spark.sql.hudi.catalog.HoodieCatalog')
            .config('spark.sql.extensions', 'org.apache.spark.sql.hudi.HoodieSparkSessionExtension')
            .config('spark.sql.legacy.pathOptionBehavior.enabled', 'true').getOrCreate())
   
   sc = spark.sparkContext
   glueContext = GlueContext(sc)
   job = Job(glueContext)
   logger = glueContext.get_logger()
   
   # ===== INSERTING DATA =====
   global faker
   faker = Faker()
   
   
   class DataGenerator(object):
   
       @staticmethod
       def get_data():
           return [
               (
                   x,
                   faker.name(),
                   faker.random_element(elements=('IT', 'HR', 'Sales', 'Marketing')),
                   faker.random_element(elements=('CA', 'NY', 'TX', 'FL', 'IL', 'RJ')),
                   str(faker.random_int(min=1, max=15)),
                   str(faker.random_int(min=18, max=60)),
                   str(faker.random_int(min=0, max=10)),
                   str(faker.unix_time()),
                   faker.email(),
                   faker.credit_card_number(card_type='amex'),
   
               ) for x in range(5)
           ]
   
   
   # ===== Settings =====
   db_name = "hudidb"
   table_name = "employees"
   recordkey = 'emp_id'
   precombine = "ts"
   PARTITION_FIELD = 'state'
   path = "s3://soumilshah-hudi-demos/hudi/"
   method = 'upsert'
   table_type = "MERGE_ON_READ"
   # 

   
   hudi_part_write_config = {
       'className': 'org.apache.hudi',
   
       'hoodie.table.name': table_name,
       'hoodie.datasource.write.table.type': table_type,
       'hoodie.datasource.write.operation': method,
       'hoodie.datasource.write.recordkey.field': recordkey,
       'hoodie.datasource.write.precombine.field': precombine,
       "hoodie.schema.on.read.enable": "true",
       "hoodie.datasource.write.reconcile.schema": "true",
   
       'hoodie.datasource.hive_sync.mode': 'hms',
       'hoodie.datasource.hive_sync.enable': 'true',
       'hoodie.datasource.hive_sync.use_jdbc': 'false',
       'hoodie.datasource.hive_sync.support_timestamp': 'false',
       'hoodie.datasource.hive_sync.database': db_name,
       'hoodie.datasource.hive_sync.table': table_name,
   
       "hoodie.compact.inline": "false",
       'hoodie.compact.schedule.inline': 'true',
       "hoodie.metadata.index.check.timeout.seconds": "60",
       "hoodie.write.concurrency.mode": "optimistic_concurrency_control",
       "hoodie.write.lock.provider": "org.apache.hudi.client.transaction.lock.InProcessLockProvider"
   
   }
   
   
   # 
   """Create Spark Data Frame """
   # 
   data = DataGenerator.get_data()
   
   columns = ["emp_id", "employee_name", "department", "state", "salary", "age", "bonus", "ts"]
   df = spark.createDataFrame(data=data, schema=columns)
   
   df.write.format("hudi").options(**hudi_part_write_config).mode("overwrite").save(path)
   
   
   # 
   """APPEND """
   # 
   
   impleDataUpd = [
       (6, "This is APPEND", "Sales", "RJ", 81000, 30, 23000, 827307999),
       (7, "This is APPEND", "Engineering", "RJ", 79000, 53, 15000, 1627694678),
   ]
   
   columns = ["emp_id", "employee_name", "department", "state", "salary", 

[GitHub] [hudi] hudi-bot commented on pull request #9123: [HUDI-6478] Simplifying INSERT_INTO configs for spark-sql

2023-07-14 Thread via GitHub


hudi-bot commented on PR #9123:
URL: https://github.com/apache/hudi/pull/9123#issuecomment-1636450633

   
   ## CI report:
   
   * c4b55caaa515af207aa3ba1bef87cb1568d9b38a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18589)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] nsivabalan commented on a diff in pull request #9188: [HUDI-6528] Fix premature RDD unpersist during index lookup

2023-07-14 Thread via GitHub


nsivabalan commented on code in PR #9188:
URL: https://github.com/apache/hudi/pull/9188#discussion_r1264147849


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bloom/HoodieBloomIndex.java:
##
@@ -103,11 +102,6 @@ record -> new ImmutablePair<>(record.getPartitionPath(), 
record.getRecordKey()))
 // Step 3: Tag the incoming records, as inserts or updates, by joining 
with existing record keys
 HoodieData> taggedRecords = 
tagLocationBacktoRecords(keyFilenamePairs, records, hoodieTable);
 
-if (config.getBloomIndexUseCaching()) {

Review Comment:
   I guess this was intentional. After this point, taggedRecords is what gets used, and we do cache that in BaseSparkCommitActionExecutor.execute:
   ```
@Override
 public HoodieWriteMetadata> 
execute(HoodieData> inputRecords) {
   // Cache the tagged records, so we don't end up computing both
   JavaRDD> inputRDD = 
HoodieJavaRDD.getJavaRDD(inputRecords);
   if (inputRDD.getStorageLevel() == StorageLevel.NONE()) {
 
HoodieJavaRDD.of(inputRDD).persist(config.getTaggedRecordStorageLevel(),
 context, HoodieDataCacheKey.of(config.getBasePath(), instantTime));
   } else {
 LOG.info("RDD PreppedRecords was persisted at: " + 
inputRDD.getStorageLevel());
   }
   .
   .
   ```
   
   So, I am not sure we want to keep these RDDs persisted until the very end, since they may not be used again.
   
   



##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndex.java:
##
@@ -80,7 +81,7 @@ public O updateLocation(O writeStatuses, HoodieEngineContext 
context,
   @PublicAPIMethod(maturity = ApiMaturityLevel.EVOLVING)
   public abstract  HoodieData> tagLocation(
   HoodieData> records, HoodieEngineContext context,
-  HoodieTable hoodieTable) throws HoodieIndexException;
+  HoodieTable hoodieTable, Option instantTime) throws 
HoodieIndexException;

Review Comment:
   this is a public api. we might have to deprecate and add a new one if we 
wish to change the signature
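
A minimal sketch of the deprecate-and-add approach suggested above, assuming simplified stand-in types: `SketchIndex`, `SketchBloomIndex`, and the `List<String>` records are hypothetical and are not Hudi's classes. The old signature stays and delegates to the new overload, so existing callers keep compiling.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical stand-in for a public index API; names are illustrative only.
abstract class SketchIndex {

  /** Old signature, retained for existing callers and marked deprecated. */
  @Deprecated
  public List<String> tagLocation(List<String> records) {
    // Delegate to the new overload with no instant time supplied.
    return tagLocation(records, Optional.empty());
  }

  /** New signature that additionally carries an optional instant time. */
  public abstract List<String> tagLocation(List<String> records, Optional<String> instantTime);
}

// Hypothetical implementation that only needs to override the new overload.
final class SketchBloomIndex extends SketchIndex {
  @Override
  public List<String> tagLocation(List<String> records, Optional<String> instantTime) {
    String suffix = instantTime.map(t -> "@" + t).orElse("@none");
    return records.stream().map(r -> r + suffix).collect(Collectors.toList());
  }
}
```

With this shape, implementations written against the old method need no change until the deprecated overload is eventually removed.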



##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bloom/HoodieBloomIndex.java:
##
@@ -75,11 +74,11 @@ public HoodieBloomIndex(HoodieWriteConfig config, 
BaseHoodieBloomIndexHelper blo
   @Override
   public  HoodieData> tagLocation(
   HoodieData> records, HoodieEngineContext context,
-  HoodieTable hoodieTable) {
+  HoodieTable hoodieTable, Option instantTime) {
 // Step 0: cache the input records if needed
-if (config.getBloomIndexUseCaching()) {
-  records.persist(new HoodieConfig(config.getProps())
-  .getString(HoodieIndexConfig.BLOOM_INDEX_INPUT_STORAGE_LEVEL_VALUE));
+if (config.getBloomIndexUseCaching() && instantTime.isPresent()) {
+  String storageLevel = 
config.getString(HoodieIndexConfig.BLOOM_INDEX_INPUT_STORAGE_LEVEL_VALUE);

Review Comment:
   can we move this to constructor and use it everywhere instead of parsing 
multiple times?
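
One way to read this suggestion, as a hedged sketch: resolve the storage level once in the constructor and reuse the field. The class name, property keys, and default values below are assumptions made for illustration and do not come from the PR.

```java
import java.util.Properties;

// Hypothetical sketch: the config value is parsed once, at construction time.
final class CachedStorageLevelIndex {
  // Illustrative keys only; not the real Hudi config names.
  static final String USE_CACHING_KEY = "sketch.index.use.caching";
  static final String STORAGE_LEVEL_KEY = "sketch.index.input.storage.level";

  private final boolean useCaching;
  private final String inputStorageLevel;

  CachedStorageLevelIndex(Properties props) {
    this.useCaching = Boolean.parseBoolean(props.getProperty(USE_CACHING_KEY, "true"));
    this.inputStorageLevel = props.getProperty(STORAGE_LEVEL_KEY, "MEMORY_AND_DISK_SER");
  }

  /** Returns the level to persist with, or "NONE" when caching should be skipped. */
  String storageLevelFor(boolean instantTimePresent) {
    return (useCaching && instantTimePresent) ? inputStorageLevel : "NONE";
  }
}
```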






[GitHub] [hudi] Vsevolod3 opened a new issue, #9201: [SUPPORT] Flink - Async Compaction Not Triggered With time_elapsed as COMPACTION_TRIGGER_STRATEGY

2023-07-14 Thread via GitHub


Vsevolod3 opened a new issue, #9201:
URL: https://github.com/apache/hudi/issues/9201

   I am running a Flink (1.15.2) job in EMR (emr-6.9.0), reading records from 
Kafka and writing them to S3 using Hudi (0.13.0). The table type is MoR and 
the compaction properties are COMPACTION_ASYNC_ENABLED = true and 
COMPACTION_TRIGGER_STRATEGY = time_elapsed.
   
   ## To Reproduce
   
   Steps to reproduce the behavior:
   
   1. Submit Flink job to EMR cluster (set COMPACTION_ASYNC_ENABLED = true, 
COMPACTION_TRIGGER_STRATEGY = time_elapsed, and COMPACTION_DELTA_SECONDS = 600)
   2. Load data (not exceeding 3 commits per file ID)
   3. Wait for > 600 seconds.
   
   ### Full list of Hudi properties for reference
   ```sql
 'index.type' = 'FLINK_STATE',
 'compaction.schedule.enabled' = 'true',
 'hoodie.index.bucket.engine' = 'SIMPLE',
 'clustering.plan.strategy.sort.columns' = 'acct_id',
 'write.bucket_assign.tasks' = '3',
 'compaction.delta_seconds' = '300',
 'clustering.delta_commits' = '4',
 'clustering.plan.strategy.small.file.limit' = '600',
 'compaction.async.enabled' = 'true',
 'compaction.max_memory' = '1024',
 'hoodie.parquet.max.file.size' = '125829120',
 'read.streaming.enabled' = 'false',
 'path' = 's3://my_bucket/my_path/account/',
 'hoodie.logfile.max.size' = '1073741824',
 'hoodie.datasource.write.hive_style_partitioning' = 'true',
 'hoodie.parquet.compression.ratio' = '0.1',
 'hoodie.parquet.small.file.limit' = '104857600',
 'hoodie.bucket.index.hash.field' = 'acct_id',
 'compaction.tasks' = '3',
 'precombine.field' = 'update_ts',
 'write.task.max.size' = '4094',
 'hoodie.parquet.compression.codec' = 'snappy',
 'compaction.delta_commits' = '3',
 'clustering.tasks' = '3',
 'compaction.trigger.strategy' = 'time_elapsed',
 'hoodie.bucket.index.num.buckets' = '256',
 'read.tasks' = '3',
 'compaction.timeout.seconds' = '1200',
 'clustering.async.enabled' = 'true',
 'table.type' = 'MERGE_ON_READ',
 'metadata.compaction.delta_commits' = '10',
 'clustering.plan.strategy.max.num.groups' = '30',
 'write.tasks' = '3',
 'clustering.schedule.enabled' = 'false',
 'hoodie.logfile.data.block.format' = 'avro',
 'write.batch.size' = '4094.0',
 'write.sort.memory' = '4094'
   ```
   
   ## Expected behavior
   
   Compaction should be run after about 5 minutes of the job tasks being fully 
started.
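
For readers unfamiliar with the trigger strategies, the expectation above can be sketched as a small decision function. This is an illustration only, not Hudi's scheduling code: the class, the method, and the strategy names other than `time_elapsed` are assumptions, and the thresholds mirror the `compaction.delta_seconds` and `compaction.delta_commits` values listed in the properties.

```java
import java.time.Duration;

// Hypothetical sketch of how a compaction trigger could be evaluated.
final class CompactionTriggerSketch {
  // Only TIME_ELAPSED appears in the report; the other names are assumed for illustration.
  enum Strategy { NUM_COMMITS, TIME_ELAPSED, NUM_AND_TIME, NUM_OR_TIME }

  static boolean shouldSchedule(Strategy strategy, int deltaCommits, Duration sinceLastCompaction,
                                int commitThreshold, Duration timeThreshold) {
    boolean enoughCommits = deltaCommits >= commitThreshold;
    boolean enoughTime = sinceLastCompaction.compareTo(timeThreshold) >= 0;
    switch (strategy) {
      case NUM_COMMITS:
        return enoughCommits;
      case TIME_ELAPSED:
        return enoughTime;
      case NUM_AND_TIME:
        return enoughCommits && enoughTime;
      case NUM_OR_TIME:
        return enoughCommits || enoughTime;
      default:
        throw new IllegalArgumentException("Unknown strategy: " + strategy);
    }
  }
}
```

Under this reading, a `time_elapsed` trigger with a 300-second threshold would fire once 300 seconds have passed, regardless of how many delta commits exist. Note that such a check only runs when something invokes it.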
   
   ## Environment Description
   
   * Hudi version : 0.13.0
   * Spark version : N/A (using Flink 1.15.2)
   * Hive version : tbd
   * Hadoop version : emr-6.9.0
   * Storage (HDFS/S3/GCS..) : s3
   * Running on Docker? (yes/no) : no
   
   
   **Stacktrace**
   
   No errors are logged in Flink for this.
   
   





[GitHub] [hudi] hudi-bot commented on pull request #9188: [HUDI-6528] Fix premature RDD unpersist during index lookup

2023-07-14 Thread via GitHub


hudi-bot commented on PR #9188:
URL: https://github.com/apache/hudi/pull/9188#issuecomment-1636322662

   
   ## CI report:
   
   * 4c247d87685c4900a327275aa8e3b5909554ad36 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18565)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction

2023-07-14 Thread via GitHub


hudi-bot commented on PR #9007:
URL: https://github.com/apache/hudi/pull/9007#issuecomment-1636322264

   
   ## CI report:
   
   * c221efd733a444258780949b698830c2cef47931 UNKNOWN
   * 07eb1aa79162259f3ac79e61bec621f68afb5551 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18588)
 
   * 578401a45b7c5ffbd9360de3bd3e18c362b4b2b5 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18592)
 
   
   



[GitHub] [hudi] parisni commented on pull request #8716: [HUDI-6226] Support parquet native bloom filters

2023-07-14 Thread via GitHub


parisni commented on PR #8716:
URL: https://github.com/apache/hudi/pull/8716#issuecomment-1636309573

   @nsivabalan
   
   There are existing Spark benchmarks here. Basically 20% slower for writes and 
up to 4x faster for reads. 
https://github.com/apache/spark/blob/18d0a276c501a102af3e7ed251831983b9148a4f/sql/core/benchmarks/BloomFilterBenchmark-jdk11-results.txt
   
   
   As for documentation, please consider this PR: 
https://github.com/apache/hudi/pull/9056
   
   On July 14, 2023 6:02:18 PM UTC, Sivabalan Narayanan ***@***.***> wrote:
   >hey @parisni : good job on the patch. Curious to know if you have any perf 
numbers on this, on both the write and read side. What's the perf overhead we are seeing 
on the write side, and how much improvement are we seeing on the read side w/ 
the bloom filter? 
   >
   >Also, could you provide a short write-up (what this support is all about, 
how users can leverage it, and what the benefit is) that we can use 
in our release page? 
   >





[GitHub] [hudi] hudi-bot commented on pull request #9188: [HUDI-6528] Fix premature RDD unpersist during index lookup

2023-07-14 Thread via GitHub


hudi-bot commented on PR #9188:
URL: https://github.com/apache/hudi/pull/9188#issuecomment-1636308856

   
   ## CI report:
   
   * 4c247d87685c4900a327275aa8e3b5909554ad36 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction

2023-07-14 Thread via GitHub


hudi-bot commented on PR #9007:
URL: https://github.com/apache/hudi/pull/9007#issuecomment-1636308312

   
   ## CI report:
   
   * c221efd733a444258780949b698830c2cef47931 UNKNOWN
   * 26b3151e371774b3e99324bd9c305157fcde5789 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18585)
 
   * 07eb1aa79162259f3ac79e61bec621f68afb5551 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18588)
 
   * 578401a45b7c5ffbd9360de3bd3e18c362b4b2b5 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9200: [HUDI-6537] Bump checkstyle version to 3.1.0

2023-07-14 Thread via GitHub


hudi-bot commented on PR #9200:
URL: https://github.com/apache/hudi/pull/9200#issuecomment-1636231942

   
   ## CI report:
   
   * 488f2a98894d13f55ff5f233fe47fa99e2bf420c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18591)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9133: [HUDI-6474] Added support for reading tables evolved using comprehensive schema e…

2023-07-14 Thread via GitHub


hudi-bot commented on PR #9133:
URL: https://github.com/apache/hudi/pull/9133#issuecomment-1636231720

   
   ## CI report:
   
   * 76ba7ad679da5e445d7503a070f00dfb1814b1e4 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18587)
 
   * 71d4fd08f41e4aab163a92e82a15e35cf9c79ea0 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18590)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction

2023-07-14 Thread via GitHub


hudi-bot commented on PR #9007:
URL: https://github.com/apache/hudi/pull/9007#issuecomment-1636231430

   
   ## CI report:
   
   * c221efd733a444258780949b698830c2cef47931 UNKNOWN
   * 26b3151e371774b3e99324bd9c305157fcde5789 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18585)
 
   * 07eb1aa79162259f3ac79e61bec621f68afb5551 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18588)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9200: [HUDI-6537] Bump checkstyle version to 3.1.0

2023-07-14 Thread via GitHub


hudi-bot commented on PR #9200:
URL: https://github.com/apache/hudi/pull/9200#issuecomment-1636223402

   
   ## CI report:
   
   * 488f2a98894d13f55ff5f233fe47fa99e2bf420c UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9133: [HUDI-6474] Added support for reading tables evolved using comprehensive schema e…

2023-07-14 Thread via GitHub


hudi-bot commented on PR #9133:
URL: https://github.com/apache/hudi/pull/9133#issuecomment-1636223141

   
   ## CI report:
   
   * 07deb3c1400d4fc530e434f6f9b74cb7640c7e47 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18403)
 
   * 76ba7ad679da5e445d7503a070f00dfb1814b1e4 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18587)
 
   * 71d4fd08f41e4aab163a92e82a15e35cf9c79ea0 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9123: [HUDI-6478] Simplifying INSERT_INTO configs for spark-sql

2023-07-14 Thread via GitHub


hudi-bot commented on PR #9123:
URL: https://github.com/apache/hudi/pull/9123#issuecomment-1636223046

   
   ## CI report:
   
   * ca0ec686f26a2786bc350f3dfb1a83baf3bc Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18582)
 
   * c4b55caaa515af207aa3ba1bef87cb1568d9b38a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18589)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction

2023-07-14 Thread via GitHub


hudi-bot commented on PR #9007:
URL: https://github.com/apache/hudi/pull/9007#issuecomment-1636222736

   
   ## CI report:
   
   * c221efd733a444258780949b698830c2cef47931 UNKNOWN
   * 26b3151e371774b3e99324bd9c305157fcde5789 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18585)
 
   * 07eb1aa79162259f3ac79e61bec621f68afb5551 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9123: [HUDI-6478] Simplifying INSERT_INTO configs for spark-sql

2023-07-14 Thread via GitHub


hudi-bot commented on PR #9123:
URL: https://github.com/apache/hudi/pull/9123#issuecomment-1636212208

   
   ## CI report:
   
   * ca0ec686f26a2786bc350f3dfb1a83baf3bc Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18582)
 
   * c4b55caaa515af207aa3ba1bef87cb1568d9b38a UNKNOWN
   
   



[jira] [Created] (HUDI-6538) Refactor methods in TimelineDiffHelper class

2023-07-14 Thread Surya Prasanna Yalla (Jira)
Surya Prasanna Yalla created HUDI-6538:
--

 Summary: Refactor methods in TimelineDiffHelper class
 Key: HUDI-6538
 URL: https://issues.apache.org/jira/browse/HUDI-6538
 Project: Apache Hudi
  Issue Type: Task
Reporter: Surya Prasanna Yalla


Refactor methods in TimelineDiffHelper class to address the following comment in 
[PR-9007|https://github.com/apache/hudi/pull/9007]

 
{code:java}
The methods getPendingReplaceCommitTransitions and 
getPendingLogCompactionTransitions look almost the same except the action type, 
can we abstract a little to merge them altogether?{code}
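
A rough sketch of the kind of abstraction the comment asks for, under heavy assumptions: `SketchInstant`, the action strings, and the filtering logic are invented for illustration and do not reflect the real TimelineDiffHelper internals. The point is only the shape, one helper parameterized by action type with thin wrappers keeping the existing method names.

```java
import java.util.List;
import java.util.stream.Collectors;

final class TimelineDiffSketch {
  // Hypothetical, simplified instant: an action name, a timestamp, and a pending flag.
  static final class SketchInstant {
    final String action;
    final String timestamp;
    final boolean pending;

    SketchInstant(String action, String timestamp, boolean pending) {
      this.action = action;
      this.timestamp = timestamp;
      this.pending = pending;
    }
  }

  // Shared logic: the only thing that differs between callers is the action type.
  static List<SketchInstant> getPendingTransitions(List<SketchInstant> timeline, String action) {
    return timeline.stream()
        .filter(i -> i.pending && i.action.equals(action))
        .collect(Collectors.toList());
  }

  // Thin wrappers keep the original method names; the action strings are assumed.
  static List<SketchInstant> getPendingReplaceCommitTransitions(List<SketchInstant> timeline) {
    return getPendingTransitions(timeline, "replacecommit");
  }

  static List<SketchInstant> getPendingLogCompactionTransitions(List<SketchInstant> timeline) {
    return getPendingTransitions(timeline, "logcompaction");
  }
}
```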
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] nsivabalan commented on pull request #8716: [HUDI-6226] Support parquet native bloom filters

2023-07-14 Thread via GitHub


nsivabalan commented on PR #8716:
URL: https://github.com/apache/hudi/pull/8716#issuecomment-1636201917

   hey @parisni : good job on the patch. Curious to know if you have any perf 
numbers on this, on both the write and read side. What's the perf overhead we are seeing 
on the write side, and how much improvement are we seeing on the read side w/ 
the bloom filter? 
   
   Also, could you provide a short write-up (what this support is all about, 
how users can leverage it, and what the benefit is) that we can use 
in our release page? 





[GitHub] [hudi] nsivabalan commented on a diff in pull request #9121: [HUDI-6476] Improve the performance of getAllPartitionPaths

2023-07-14 Thread via GitHub


nsivabalan commented on code in PR #9121:
URL: https://github.com/apache/hudi/pull/9121#discussion_r1263994796


##
hudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.java:
##
@@ -106,42 +107,33 @@ private List 
getPartitionPathWithPathPrefix(String relativePathPrefix) t
   // TODO: Get the parallelism from HoodieWriteConfig
   int listingParallelism = Math.min(DEFAULT_LISTING_PARALLELISM, 
pathsToList.size());
 
-  // List all directories in parallel
+  // List all directories in parallel:
+  // if current dictionary contains PartitionMetadata, add it to result
+  // if current dictionary does not contain PartitionMetadata, add its 
subdirectory to queue to be processed.
   engineContext.setJobStatus(this.getClass().getSimpleName(), "Listing all 
partitions with prefix " + relativePathPrefix);
-  List dirToFileListing = engineContext.flatMap(pathsToList, 
path -> {
+  // result below holds a list of pair. first entry in the pair optionally 
holds the deduced list of partitions.
+  // and second entry holds optionally a directory path to be processed 
further.
+  List, Option>> result = 
engineContext.flatMap(pathsToList, path -> {
 FileSystem fileSystem = path.getFileSystem(hadoopConf.get());
-return Arrays.stream(fileSystem.listStatus(path));
+if (HoodiePartitionMetadata.hasPartitionMetadata(fileSystem, path)) {
+  return 
Stream.of(Pair.of(Option.of(FSUtils.getRelativePartitionPath(new 
Path(datasetBasePath), path)), Option.empty()));

Review Comment:
   partition meta file could have extensions like parquet, orc etc. did we 
consider that?
   
   this was in previous code: 
   
fileStatus.getPath().getName().startsWith(HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX)
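
To illustrate the concern, here is a small sketch using `java.nio.file` instead of the Hadoop `FileSystem` API. The prefix value and the helper names are assumptions for illustration. A prefix match finds the metafile whether or not it carries an extension, while an exact-name match would miss the extended form.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

final class PartitionMetafileSketch {
  // Assumed prefix value, used here only for illustration.
  static final String METAFILE_PREFIX = ".hoodie_partition_metadata";

  /** True if any direct child of dir has a name starting with the metafile prefix. */
  static boolean hasPartitionMetafile(Path dir) {
    try (Stream<Path> children = Files.list(dir)) {
      return children.anyMatch(p -> p.getFileName().toString().startsWith(METAFILE_PREFIX));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Builds two temp directories and returns {withExtendedMetafile, emptyDirectory}. */
  static boolean[] demo() {
    try {
      Path withMeta = Files.createTempDirectory("partition-with-meta");
      Files.createFile(withMeta.resolve(METAFILE_PREFIX + ".parquet"));
      Path empty = Files.createTempDirectory("partition-empty");
      return new boolean[] {hasPartitionMetafile(withMeta), hasPartitionMetafile(empty)};
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```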
   
   
   






[jira] [Updated] (HUDI-6537) Bump checkstyle version

2023-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-6537:
-
Labels: pull-request-available  (was: )

> Bump checkstyle version
> ---
>
> Key: HUDI-6537
> URL: https://issues.apache.org/jira/browse/HUDI-6537
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: voon
>Assignee: voon
>Priority: Major
>  Labels: pull-request-available
>
> Encountered an ambiguous checkstyle error here:
> {code:java}
> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
>     at java.lang.String.substring (String.java:1967)
>     at org.apache.maven.plugins.checkstyle.RuleUtil.getCategory (RuleUtil.java:95)
>     at org.apache.maven.plugins.checkstyle.CheckstyleViolationCheckMojo.countViolations (CheckstyleViolationCheckMojo.java:646)
>     at org.apache.maven.plugins.checkstyle.CheckstyleViolationCheckMojo.execute (CheckstyleViolationCheckMojo.java:564)
>     at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo (DefaultBuildPluginManager.java:137)
>     at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute2 (MojoExecutor.java:370)
>     at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute (MojoExecutor.java:351)
>     at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:215)
>     at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:171)
>     at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:163)
>     at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117)
>     at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:81)
>     at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:56)
>     at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:128)
>     at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:299)
>     at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:193)
>     at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:106)
>     at org.apache.maven.cli.MavenCli.execute (MavenCli.java:963)
>     at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:296)
>     at org.apache.maven.cli.MavenCli.main (MavenCli.java:199)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke (Method.java:498)
>     at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:282)
>     at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:225)
>     at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:406)
>     at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:347)
> {code}
> [https://github.com/apache/hudi/actions/runs/5556435429/jobs/10148956808?pr=9133]
>  
> Running the code in the same state with checkstyle:3.1.0 will throw the 
> error below (expected):
> {code:java}
> final CastMapConverter[] converters = IntStream.
> range(0, fromChildren.size())
> .mapToObj(i -> {
>   LogicalType fromChild = fromChildren.get(i);
>   LogicalType toChild = toChildren.get(i);
>   if (isPrimitiveTypeRootEqual(fromChild.getTypeRoot(), 
> toChild.getTypeRoot())) {
> return createNoOpConverter();
> ...
> [ERROR] 
> src/main/java/org/apache/hudi/table/format/CastMapConverters.java:[315,52] 
> (extension) SeparatorWrapDot: '.' should be on a new line.
>  {code}
> Bug describing this issue:
> https://issues.apache.org/jira/browse/MCHECKSTYLE-347





[GitHub] [hudi] voonhous opened a new pull request, #9200: [HUDI-6537] Bump checkstyle version to 3.1.0

2023-07-14 Thread via GitHub


voonhous opened a new pull request, #9200:
URL: https://github.com/apache/hudi/pull/9200

   ### Change Logs
   
   Bump checkstyle version to 3.1.0 due to an ambiguous checkstyle message that 
was thrown as shown below:
   
   ```
   Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
       at java.lang.String.substring (String.java:1967)
       at org.apache.maven.plugins.checkstyle.RuleUtil.getCategory (RuleUtil.java:95)
       at org.apache.maven.plugins.checkstyle.CheckstyleViolationCheckMojo.countViolations (CheckstyleViolationCheckMojo.java:646)
       at org.apache.maven.plugins.checkstyle.CheckstyleViolationCheckMojo.execute (CheckstyleViolationCheckMojo.java:564)
       at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo (DefaultBuildPluginManager.java:137)
       at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute2 (MojoExecutor.java:370)
       at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute (MojoExecutor.java:351)
       at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:215)
       at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:171)
       at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:163)
       at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117)
       at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:81)
       at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:56)
       at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:128)
       at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:299)
       at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:193)
       at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:106)
       at org.apache.maven.cli.MavenCli.execute (MavenCli.java:963)
       at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:296)
       at org.apache.maven.cli.MavenCli.main (MavenCli.java:199)
       at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
       at java.lang.reflect.Method.invoke (Method.java:498)
       at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:282)
       at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:225)
       at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:406)
       at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:347)
   ```
   
   
https://github.com/apache/hudi/actions/runs/5556435429/jobs/10148956808?pr=9133
   
   Bug:
   https://issues.apache.org/jira/browse/MCHECKSTYLE-347
   
   Running the code in the same state with checkstyle:3.1.0 throws the correct 
checkstyle error (expected):
   
   ```log
   final CastMapConverter[] converters = IntStream.
   range(0, fromChildren.size())
   .mapToObj(i -> {
     LogicalType fromChild = fromChildren.get(i);
     LogicalType toChild = toChildren.get(i);
     if (isPrimitiveTypeRootEqual(fromChild.getTypeRoot(), toChild.getTypeRoot())) {
       return createNoOpConverter();
   ...
   
   [ERROR] src/main/java/org/apache/hudi/table/format/CastMapConverters.java:[315,52] (extension) SeparatorWrapDot: '.' should be on a new line.
   ```
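
The SeparatorWrapDot message asks for the `.` to start the continuation line rather than end the previous one. A minimal, self-contained illustration of the accepted placement follows; `DotWrapSketch` and `doubled` are made-up names unrelated to `CastMapConverters`.

```java
import java.util.stream.IntStream;

final class DotWrapSketch {
  static int[] doubled(int n) {
    // Each '.' begins its own continuation line, which is the placement the rule expects.
    return IntStream
        .range(0, n)
        .map(i -> i * 2)
        .toArray();
  }
}
```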
   
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   None
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   None
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   



[jira] [Created] (HUDI-6537) Bump checkstyle version

2023-07-14 Thread voon (Jira)
voon created HUDI-6537:
--

 Summary: Bump checkstyle version
 Key: HUDI-6537
 URL: https://issues.apache.org/jira/browse/HUDI-6537
 Project: Apache Hudi
  Issue Type: Bug
Reporter: voon


Encountered an ambiguous checkstyle error here:
{code:java}
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
    at java.lang.String.substring (String.java:1967)
    at org.apache.maven.plugins.checkstyle.RuleUtil.getCategory (RuleUtil.java:95)
    at org.apache.maven.plugins.checkstyle.CheckstyleViolationCheckMojo.countViolations (CheckstyleViolationCheckMojo.java:646)
    at org.apache.maven.plugins.checkstyle.CheckstyleViolationCheckMojo.execute (CheckstyleViolationCheckMojo.java:564)
    at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo (DefaultBuildPluginManager.java:137)
    at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute2 (MojoExecutor.java:370)
    at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute (MojoExecutor.java:351)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:215)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:171)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:163)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:81)
    at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:56)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:128)
    at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:299)
    at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:193)
    at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:106)
    at org.apache.maven.cli.MavenCli.execute (MavenCli.java:963)
    at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:296)
    at org.apache.maven.cli.MavenCli.main (MavenCli.java:199)
    at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke (Method.java:498)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:282)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:225)
    at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:406)
    at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:347)
{code}
[https://github.com/apache/hudi/actions/runs/5556435429/jobs/10148956808?pr=9133]

 

Running the code in the same state with checkstyle:3.1.0 reports the error 
below (expected):
{code:java}
final CastMapConverter[] converters = IntStream.
range(0, fromChildren.size())
.mapToObj(i -> {
  LogicalType fromChild = fromChildren.get(i);
  LogicalType toChild = toChildren.get(i);
  if (isPrimitiveTypeRootEqual(fromChild.getTypeRoot(), 
toChild.getTypeRoot())) {
return createNoOpConverter();
...

[ERROR] 
src/main/java/org/apache/hudi/table/format/CastMapConverters.java:[315,52] 
(extension) SeparatorWrapDot: '.' should be on a new line.
 {code}
Bug describing this issue:

https://issues.apache.org/jira/browse/MCHECKSTYLE-347
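
For reference, a minimal sketch of what the SeparatorWrapDot rule reported above appears to expect: when a call chain wraps, the '.' begins the continuation line instead of ending the previous one. The class and method names below are hypothetical and unrelated to CastMapConverters.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical example class, not part of Hudi.
final class SeparatorWrapDotExample {

  // Style flagged by the rule (the '.' is left at the end of the line):
  //   IntStream.
  //       range(0, n)
  // Style the rule asks for: each wrapped '.' starts the new line.
  static int[] squares(int n) {
    return IntStream
        .range(0, n)
        .map(i -> i * i)
        .toArray();
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(squares(4)));
  }
}
```

Only the line-break placement changes between the two styles; the resulting stream pipeline is identical.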



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-6537) Bump checkstyle version

2023-07-14 Thread voon (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

voon reassigned HUDI-6537:
--

Assignee: voon

> Bump checkstyle version
> ---
>
> Key: HUDI-6537
> URL: https://issues.apache.org/jira/browse/HUDI-6537
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: voon
>Assignee: voon
>Priority: Major





[GitHub] [hudi] voonhous commented on pull request #9133: [HUDI-6474] Added support for reading tables evolved using comprehensive schema e…

2023-07-14 Thread via GitHub


voonhous commented on PR #9133:
URL: https://github.com/apache/hudi/pull/9133#issuecomment-1636170813

   We might need to update the checkstyle plugin from 3.0.0 to 3.1.0 due to 
this bug:
   
   https://issues.apache.org/jira/browse/MCHECKSTYLE-347
   
   I will submit a PR for this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #9133: [HUDI-6474] Added support for reading tables evolved using comprehensive schema e…

2023-07-14 Thread via GitHub


hudi-bot commented on PR #9133:
URL: https://github.com/apache/hudi/pull/9133#issuecomment-1636164814

   
   ## CI report:
   
   * 07deb3c1400d4fc530e434f6f9b74cb7640c7e47 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18403)
 
   * 76ba7ad679da5e445d7503a070f00dfb1814b1e4 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18587)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9133: [HUDI-6474] Added support for reading tables evolved using comprehensive schema e…

2023-07-14 Thread via GitHub


hudi-bot commented on PR #9133:
URL: https://github.com/apache/hudi/pull/9133#issuecomment-1636155755

   
   ## CI report:
   
   * 07deb3c1400d4fc530e434f6f9b74cb7640c7e47 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18403)
 
   * 76ba7ad679da5e445d7503a070f00dfb1814b1e4 UNKNOWN
   
   



[GitHub] [hudi] voonhous commented on a diff in pull request #9133: [HUDI-6474] Added support for reading tables evolved using comprehensive schema e…

2023-07-14 Thread via GitHub


voonhous commented on code in PR #9133:
URL: https://github.com/apache/hudi/pull/9133#discussion_r1263947981


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/CastMap.java:
##
@@ -165,21 +192,132 @@ void add(int pos, LogicalType fromType, LogicalType 
toType) {
 }
 break;
   }
+  case ARRAY: {
+if (from == ARRAY) {
+  LogicalType fromElementType =  fromType.getChildren().get(0);
+  LogicalType toElementType = toType.getChildren().get(0);
+  return array -> doArrayConversion((ArrayData) array, 
fromElementType, toElementType);
+}
+break;
+  }
+  case MAP: {
+if (from == MAP) {
+  return map -> doMapConversion((MapData) map, fromType, toType);
+}
+break;
+  }
+  case ROW: {
+if (from == ROW) {
+  // Assumption: InternalSchemaManager should produce a cast that is 
of the same size
+  return row -> doRowConversion((RowData) row, fromType, toType);
+}
+break;
+  }
   default:
 }
-return null;
+throw new IllegalArgumentException(String.format("Unsupported conversion 
for %s => %s", fromType, toType));
   }
 
-  private void add(int pos, Cast cast) {
-castMap.put(pos, cast);
+  /**
+   * Helper function to perform convert an arrayData from one LogicalType to 
another.
+   *
+   * @param array    Non-null array data to be converted; however 
array-elements are allowed to be null
+   * @param fromType The input LogicalType of the row data to be converted from
+   * @param toType   The output LogicalType of the row data to be converted to
+   * @return Converted array that has the structure/specifications of that 
defined by the output LogicalType
+   */
+  private static ArrayData doArrayConversion(@Nonnull ArrayData array, 
LogicalType fromType, LogicalType toType) {
+// using Object type here as primitives are not allowed to be null
+Object[] objects = new Object[array.size()];
+for (int i = 0; i < array.size(); i++) {
+  Object fromObject = 
ArrayData.createElementGetter(fromType).getElementOrNull(array, i);
+  // need to handle nulls to prevent NullPointerException in 
#getConversion()
+  Object toObject = fromObject != null ? getConversion(fromType, 
toType).apply(fromObject) : null;
+  objects[i] = toObject;
+}
+return new GenericArrayData(objects);
+  }
+
+  /**
+   * Helper function to perform convert a MapData from one LogicalType to 
another.
+   *
+   * @param map  Non-null map data to be converted; however, values are 
allowed to be null
+   * @param fromType The input LogicalType of the row data to be converted from
+   * @param toType   The output LogicalType of the row data to be converted to
+   * @return Converted map that has the structure/specifications of that 
defined by the output LogicalType
+   */
+  private static MapData doMapConversion(@Nonnull MapData map, LogicalType 
fromType, LogicalType toType) {
+// no schema evolution is allowed on the keyType, hence, we only need to 
care about the valueType
+LogicalType fromValueType = fromType.getChildren().get(1);
+LogicalType toValueType = toType.getChildren().get(1);
+LogicalType keyType = fromType.getChildren().get(0);
+
+final Map result = new HashMap<>();
+for (int i = 0; i < map.size(); i++) {
+  Object keyObject = 
ArrayData.createElementGetter(keyType).getElementOrNull(map.keyArray(), i);
+  Object fromObject = 
ArrayData.createElementGetter(fromValueType).getElementOrNull(map.valueArray(), 
i);
+  // need to handle nulls to prevent NullPointerException in 
#getConversion()
+  Object toObject = fromObject != null ? getConversion(fromValueType, 
toValueType).apply(fromObject) : null;
+  result.put(keyObject, toObject);
+}
+return new GenericMapData(result);
+  }
+
+  /**
+   * Helper function to perform convert a RowData from one LogicalType to 
another.
+   *
+   * @param row  Non-null row data to be converted; however, fields might 
contain nulls
+   * @param fromType The input LogicalType of the row data to be converted from
+   * @param toType   The output LogicalType of the row data to be converted to
+   * @return Converted row that has the structure/specifications of that 
defined by the output LogicalType
+   */
+  private static RowData doRowConversion(@Nonnull RowData row, LogicalType 
fromType, LogicalType toType) {
+// note: InternalSchema.merge guarantees that the schema to be read 
fromType is orientated in the same order as toType
+// hence, we can match types by position as it is guaranteed that it is 
referencing the same field
+List fromChildren = fromType.getChildren();
+List toChildren = toType.getChildren();
+ValidationUtils.checkArgument(fromChildren.size() == toChildren.size(),
+"fromType [" + fromType + "] size: != toType [" + toType + "] size");
+
+GenericRow
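
The null handling in doArrayConversion above can be illustrated without Flink types. The sketch below is a simplification under stated assumptions: a plain Object[] stands in for ArrayData, and a java.util.function.Function stands in for whatever getConversion() returns. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.function.Function;

// Hypothetical stand-in for the conversion helper; not the PR's code.
final class NullSafeArrayConversion {

  /** Applies the conversion to each non-null element; null elements stay null. */
  static Object[] convert(Object[] input, Function<Object, Object> conversion) {
    // Object[] is used because primitive arrays cannot hold null.
    Object[] out = new Object[input.length];
    for (int i = 0; i < input.length; i++) {
      Object from = input[i];
      // Skip the conversion for nulls so it never sees a null argument.
      out[i] = from != null ? conversion.apply(from) : null;
    }
    return out;
  }

  public static void main(String[] args) {
    Function<Object, Object> intToLong = v -> ((Integer) v).longValue();
    Object[] result = convert(new Object[] {1, null, 3}, intToLong);
    System.out.println(Arrays.toString(result));
  }
}
```

In this sketch the conversion is resolved once and passed in, so the loop body only does the null check and the call. Whether the same hoisting is worthwhile in the reviewed code is a judgment for the PR author.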

[jira] [Updated] (HUDI-6536) Mention table version change in 0.11.x release notes

2023-07-14 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-6536:

Fix Version/s: 0.14.0

> Mention table version change in 0.11.x release notes
> 
>
> Key: HUDI-6536
> URL: https://issues.apache.org/jira/browse/HUDI-6536
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Ethan Guo
>Priority: Major
> Fix For: 0.14.0
>
>






[jira] [Created] (HUDI-6536) Mention table version change in 0.11.x release notes

2023-07-14 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-6536:
---

 Summary: Mention table version change in 0.11.x release notes
 Key: HUDI-6536
 URL: https://issues.apache.org/jira/browse/HUDI-6536
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Ethan Guo








[jira] [Assigned] (HUDI-6536) Mention table version change in 0.11.x release notes

2023-07-14 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo reassigned HUDI-6536:
---

Assignee: Ethan Guo

> Mention table version change in 0.11.x release notes
> 
>
> Key: HUDI-6536
> URL: https://issues.apache.org/jira/browse/HUDI-6536
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 0.14.0
>
>






[GitHub] [hudi] yihua commented on a diff in pull request #9198: [HUDI-6532] Fix a typo in BaseFlinkCommitActionExecutor.

2023-07-14 Thread via GitHub


yihua commented on code in PR #9198:
URL: https://github.com/apache/hudi/pull/9198#discussion_r1263920351


##
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/table/action/commit/BaseFlinkCommitActionExecutor.java:
##
@@ -194,7 +194,7 @@ protected Iterator> handleUpsertPartition(
 }
   }
 } catch (Throwable t) {
-  String msg = "Error upsetting bucketType " + bucketType + " for 
partition :" + partitionPath;
+  String msg = "Error setting up bucketType " + bucketType + " for 
partition :" + partitionPath;

Review Comment:
   This should be `upserting`.






[GitHub] [hudi] hudi-bot commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction

2023-07-14 Thread via GitHub


hudi-bot commented on PR #9007:
URL: https://github.com/apache/hudi/pull/9007#issuecomment-1636084210

   
   ## CI report:
   
   * c221efd733a444258780949b698830c2cef47931 UNKNOWN
   * 26b3151e371774b3e99324bd9c305157fcde5789 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18585)
 
   
   



[hudi] branch master updated: [HUDI-6530] Applying schema during ingestion using a schema provider for s3/gcs metadata job (#9191)

2023-07-14 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 8f7877f2855 [HUDI-6530] Applying schema during ingestion using a 
schema provider for s3/gcs metadata job (#9191)
8f7877f2855 is described below

commit 8f7877f28559f49b90225a279d5a7ad50c689c0b
Author: lokesh-lingarajan-0310 
<84048984+lokesh-lingarajan-0...@users.noreply.github.com>
AuthorDate: Fri Jul 14 08:39:36 2023 -0700

[HUDI-6530] Applying schema during ingestion using a schema provider for 
s3/gcs metadata job (#9191)

Co-authored-by: Lokesh Lingarajan 

---
 .../org/apache/hudi/utilities/UtilHelpers.java |   8 +
 .../hudi/utilities/sources/GcsEventsSource.java|  11 +-
 .../hudi/utilities/sources/S3EventsSource.java |  17 +-
 .../utilities/sources/TestGcsEventsSource.java |  42 -
 .../hudi/utilities/sources/TestS3EventsSource.java |   4 +-
 .../resources/streamer-config/gcs-metadata.avsc|  60 ---
 .../resources/streamer-config/s3-metadata.avsc | 188 +
 7 files changed, 299 insertions(+), 31 deletions(-)

diff --git 
a/hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java 
b/hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java
index a0d241752c5..35a5c9fcb47 100644
--- a/hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java
+++ b/hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java
@@ -60,6 +60,7 @@ import org.apache.hudi.utilities.schema.SchemaProvider;
 import org.apache.hudi.utilities.schema.SchemaProviderWithPostProcessor;
 import org.apache.hudi.utilities.schema.SparkAvroPostProcessor;
 import 
org.apache.hudi.utilities.schema.postprocessor.ChainedSchemaPostProcessor;
+import org.apache.hudi.utilities.sources.InputBatch;
 import org.apache.hudi.utilities.sources.Source;
 import 
org.apache.hudi.utilities.sources.processor.ChainedJsonKafkaSourcePostProcessor;
 import 
org.apache.hudi.utilities.sources.processor.JsonKafkaSourcePostProcessor;
@@ -193,6 +194,13 @@ public class UtilHelpers {
 
   }
 
+  public static StructType getSourceSchema(SchemaProvider schemaProvider) {
+if (schemaProvider != null && schemaProvider.getSourceSchema() != null && 
schemaProvider.getSourceSchema() != InputBatch.NULL_SCHEMA) {
+  return 
AvroConversionUtils.convertAvroSchemaToStructType(schemaProvider.getSourceSchema());
+}
+return null;
+  }
+
   public static Option createTransformer(Option> 
classNamesOpt, Option sourceSchema,
   boolean 
isErrorTableWriterEnabled) throws IOException {
 
diff --git 
a/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/GcsEventsSource.java
 
b/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/GcsEventsSource.java
index dfc9b5b2b2e..89ce7eddf54 100644
--- 
a/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/GcsEventsSource.java
+++ 
b/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/GcsEventsSource.java
@@ -22,6 +22,7 @@ import org.apache.hudi.common.config.TypedProperties;
 import org.apache.hudi.common.util.Option;
 import org.apache.hudi.common.util.collection.Pair;
 import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.utilities.UtilHelpers;
 import org.apache.hudi.utilities.exception.HoodieReadFromSourceException;
 import org.apache.hudi.utilities.schema.SchemaProvider;
 import org.apache.hudi.utilities.sources.helpers.gcs.MessageBatch;
@@ -35,6 +36,7 @@ import org.apache.spark.sql.Dataset;
 import org.apache.spark.sql.Encoders;
 import org.apache.spark.sql.Row;
 import org.apache.spark.sql.SparkSession;
+import org.apache.spark.sql.types.StructType;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
@@ -96,6 +98,7 @@ 
absolute_path_to/hudi-utilities-bundle_2.12-0.13.0-SNAPSHOT.jar \
 public class GcsEventsSource extends RowSource {
 
   private final PubsubMessagesFetcher pubsubMessagesFetcher;
+  private final SchemaProvider schemaProvider;
   private final boolean ackMessages;
 
   private final List messagesToAck = new ArrayList<>();
@@ -121,6 +124,7 @@ public class GcsEventsSource extends RowSource {
 
 this.pubsubMessagesFetcher = pubsubMessagesFetcher;
 this.ackMessages = props.getBoolean(ACK_MESSAGES.key(), 
ACK_MESSAGES.defaultValue());
+this.schemaProvider = schemaProvider;
 
 LOG.info("Created GcsEventsSource");
   }
@@ -146,7 +150,12 @@ public class GcsEventsSource extends RowSource {
 
 LOG.info("Returning checkpoint value: " + CHECKPOINT_VALUE_ZERO);
 
-return Pair.of(Option.of(sparkSession.read().json(eventRecords)), 
CHECKPOINT_VALUE_ZERO);
+StructType sourceSchema = UtilHelpers.getSourceSchema(schemaProvider);
+if (sourceSchema != null) {
+  return 
Pair.of(Option.of(sparkSession.read().schema(
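
The GcsEventsSource hunk above is cut off, but the visible part suggests a pattern of applying an explicit schema when the schema provider supplies one and otherwise falling back to inference. A minimal sketch of that pattern using only the Java standard library follows; the class, method, and parameter names are hypothetical stand-ins and are not Hudi or Spark APIs.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical illustration of "explicit schema if present, else infer".
final class SchemaFallbackSketch {

  /**
   * Uses readWithSchema when a schema is available (non-null),
   * otherwise falls back to readInferred.
   */
  static <S, R> R readWithOptionalSchema(
      S schema, Function<S, R> readWithSchema, Supplier<R> readInferred) {
    return schema != null ? readWithSchema.apply(schema) : readInferred.get();
  }

  public static void main(String[] args) {
    // Strings stand in for a schema object and a loaded dataset.
    String explicit = readWithOptionalSchema("s3-metadata", s -> "explicit:" + s, () -> "inferred");
    String fallback = readWithOptionalSchema((String) null, s -> "explicit:" + s, () -> "inferred");
    System.out.println(explicit + " / " + fallback);
  }
}
```

Returning null from a helper to signal "no schema", as getSourceSchema does in the diff, keeps the call site to a single null check, at the cost of every caller having to remember that check.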

[GitHub] [hudi] nsivabalan commented on pull request #9191: [HUDI-6530] Applying schema during ingestion using a schema provider for s3/gcs metadata job

2023-07-14 Thread via GitHub


nsivabalan commented on PR #9191:
URL: https://github.com/apache/hudi/pull/9191#issuecomment-1636038995

   CI failed due to a flaky test. Going ahead with landing.





[GitHub] [hudi] nsivabalan merged pull request #9191: [HUDI-6530] Applying schema during ingestion using a schema provider for s3/gcs metadata job

2023-07-14 Thread via GitHub


nsivabalan merged PR #9191:
URL: https://github.com/apache/hudi/pull/9191





[jira] [Created] (HUDI-6535) Need a way to be able to schedule cleaner service inline and execute as a offline job

2023-07-14 Thread Amaresh Bingumalla (Jira)
Amaresh Bingumalla created HUDI-6535:


 Summary: Need a way to be able to schedule cleaner service inline 
and execute as a offline job
 Key: HUDI-6535
 URL: https://issues.apache.org/jira/browse/HUDI-6535
 Project: Apache Hudi
  Issue Type: New Feature
Reporter: Amaresh Bingumalla


With the current Hudi version 0.13.1, there is no way to schedule the cleaner 
service as part of the writer job. The only available options are to execute 
inline or to scheduleAndExecute in an offline job. 
Having an inline schedule-only option, similar to compaction jobs, would be 
helpful to see when the cleaner service is required. 

 

Related compactor code - 
[https://github.com/apache/hudi/blob/51ddf1affcdead2e3b5e871ba4816c71e6f4b99a/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieCompactor.java#L194]
 





[GitHub] [hudi] hudi-bot commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction

2023-07-14 Thread via GitHub


hudi-bot commented on PR #9007:
URL: https://github.com/apache/hudi/pull/9007#issuecomment-1635928412

   
   ## CI report:
   
   * c221efd733a444258780949b698830c2cef47931 UNKNOWN
   * 78b7acc447a6cdadccf1b0ca57e1cc634233c879 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18233)
 
   * 26b3151e371774b3e99324bd9c305157fcde5789 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18585)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9198: [HUDI-6532] Fix a typo in BaseFlinkCommitActionExecutor.

2023-07-14 Thread via GitHub


hudi-bot commented on PR #9198:
URL: https://github.com/apache/hudi/pull/9198#issuecomment-1635872857

   
   ## CI report:
   
   * db352e825762702d4989dabd66472029303d5026 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18584)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction

2023-07-14 Thread via GitHub


hudi-bot commented on PR #9007:
URL: https://github.com/apache/hudi/pull/9007#issuecomment-1635861329

   
   ## CI report:
   
   * c221efd733a444258780949b698830c2cef47931 UNKNOWN
   * 78b7acc447a6cdadccf1b0ca57e1cc634233c879 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18233)
 
   * 26b3151e371774b3e99324bd9c305157fcde5789 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9197: [HUDI-6531] Little adjust to avoid creating an object but no need in one case

2023-07-14 Thread via GitHub


hudi-bot commented on PR #9197:
URL: https://github.com/apache/hudi/pull/9197#issuecomment-1635850226

   
   ## CI report:
   
   * 95b59f74ed1b5608e71bb03c0933bcc239e6d497 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18583)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9198: [HUDI-6532] Fix a typo in BaseFlinkCommitActionExecutor.

2023-07-14 Thread via GitHub


hudi-bot commented on PR #9198:
URL: https://github.com/apache/hudi/pull/9198#issuecomment-1635709474

   
   ## CI report:
   
   * db352e825762702d4989dabd66472029303d5026 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18584)
 
   
   



[jira] [Updated] (HUDI-6533) Glue Catalog Sync not working with 0.12.3.

2023-07-14 Thread Aditya Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Goenka updated HUDI-6533:

Priority: Blocker  (was: Critical)

> Glue Catalog Sync not working with 0.12.3.
> --
>
> Key: HUDI-6533
> URL: https://issues.apache.org/jira/browse/HUDI-6533
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: meta-sync
>Reporter: Aditya Goenka
>Priority: Blocker
> Fix For: 0.14.0
>
>
> Glue Catalog sync is broken in minor versions 0.12.3 and 0.13.1.
> It is also not working on master. 
> Github Issue - [https://github.com/apache/hudi/issues/9134]





[jira] [Updated] (HUDI-6534) Spark Consistent Hashing row writer support

2023-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-6534:
-
Labels: pull-request-available  (was: )

> Spark Consistent Hashing row writer support
> ---
>
> Key: HUDI-6534
> URL: https://issues.apache.org/jira/browse/HUDI-6534
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: index, spark, writer-core
>Reporter: Qijun Fu
>Priority: Major
>  Labels: pull-request-available
>
> Spark Consistent Hashing row writer support





[GitHub] [hudi] stream2000 opened a new pull request, #9199: [HUDI-6534]Support consistent hashing row write

2023-07-14 Thread via GitHub


stream2000 opened a new pull request, #9199:
URL: https://github.com/apache/hudi/pull/9199

   ### Change Logs
   
   Support consistent hashing row writer 
   ### Impact
   
   Support consistent hashing row writer
   
   ### Risk level (write none, low medium or high below)
   
   medium, will be enabled by default since row writer is enabled by default
   
   ### Documentation Update
   
   will update document after landing
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Change Logs and Impact were stated clearly
   - [x] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Created] (HUDI-6534) Spark Consistent Hashing row writer support

2023-07-14 Thread Qijun Fu (Jira)
Qijun Fu created HUDI-6534:
--

 Summary: Spark Consistent Hashing row writer support
 Key: HUDI-6534
 URL: https://issues.apache.org/jira/browse/HUDI-6534
 Project: Apache Hudi
 Issue Type: New Feature
 Components: index, spark, writer-core
 Reporter: Qijun Fu


Spark Consistent Hashing row writer support





[GitHub] [hudi] ad1happy2go commented on issue #9134: [SUPPORT] Failed to sync hive metastore with Hudi 0.12.3 and AWS Glue 4.0 (Spark 3.3)

2023-07-14 Thread via GitHub


ad1happy2go commented on issue #9134:
URL: https://github.com/apache/hudi/issues/9134#issuecomment-1635689522

   @xmubeta Able to reproduce this issue. It looks like a regression in 0.12.3 and 0.13.1.
   
   Created a critical JIRA to fix it: https://issues.apache.org/jira/browse/HUDI-6533





[jira] [Created] (HUDI-6533) Glue Catalog Sync not working with 0.12.3.

2023-07-14 Thread Aditya Goenka (Jira)
Aditya Goenka created HUDI-6533:
---

 Summary: Glue Catalog Sync not working with 0.12.3.
 Key: HUDI-6533
 URL: https://issues.apache.org/jira/browse/HUDI-6533
 Project: Apache Hudi
 Issue Type: Bug
 Components: meta-sync
 Reporter: Aditya Goenka
 Fix For: 0.14.0


Glue Catalog sync is broken in minor versions 0.12.3 and 0.13.1.

It also does not work on master.

Github Issue - [https://github.com/apache/hudi/issues/9134]





[GitHub] [hudi] hudi-bot commented on pull request #9198: [HUDI-6532] Fix a typo in BaseFlinkCommitActionExecutor.

2023-07-14 Thread via GitHub


hudi-bot commented on PR #9198:
URL: https://github.com/apache/hudi/pull/9198#issuecomment-1635658437

   
   ## CI report:
   
   * db352e825762702d4989dabd66472029303d5026 UNKNOWN
   
   





[jira] [Updated] (HUDI-6532) Fix a typo in BaseFlinkCommitActionExecutor.

2023-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-6532:
-
Labels: pull-request-available  (was: )

> Fix a typo in BaseFlinkCommitActionExecutor.
> 
>
> Key: HUDI-6532
> URL: https://issues.apache.org/jira/browse/HUDI-6532
> Project: Apache Hudi
> Issue Type: Improvement
> Components: flink
> Reporter: StarBoy1005
> Priority: Minor
> Labels: pull-request-available
> Attachments: image-2023-07-14-18-06-04-273.png
>
>
> This code is creating an Iterator object; I guess the word "upsetting" in the exception message is kind of misleading.
>  !image-2023-07-14-18-06-04-273.png! 





[GitHub] [hudi] hudi-bot commented on pull request #9197: [HUDI-6531] Little adjust to avoid creating an object but no need in one case

2023-07-14 Thread via GitHub


hudi-bot commented on PR #9197:
URL: https://github.com/apache/hudi/pull/9197#issuecomment-1635645105

   
   ## CI report:
   
   * 95b59f74ed1b5608e71bb03c0933bcc239e6d497 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18583)
   
   




