[GitHub] [hudi] hudi-bot commented on pull request #9123: [HUDI-6478] Simplifying INSERT_INTO configs for spark-sql
hudi-bot commented on PR #9123:
URL: https://github.com/apache/hudi/pull/9123#issuecomment-1636682278

## CI report:

* 92431edca469088ced64b1d92c7bbdc2e44d47a1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18596)
* 05554e2e034b32dd56599e1f62408c123888d9cb Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18609)

Bot commands
@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
[GitHub] [hudi] amrishlal opened a new pull request, #9203: [HUDI-6315] [WIP] Feature flag for disabling prepped merge.
amrishlal opened a new pull request, #9203:
URL: https://github.com/apache/hudi/pull/9203

### Change Logs

Add user-defined feature flag for disabling prepped merge.

### Impact

New feature flag `ENABLE_OPTIMIZED_MERGE_WRITES`

### Risk level (write none, low medium or high below)

Low

### Documentation Update

_Describe any necessary documentation update if there is any new feature, config, or user-facing change_

- _The config description must be updated if new configs are added or the default value of the configs are changed_
- _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._

### Contributor's checklist

- [X] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [X] Change Logs and Impact were stated clearly
- [X] Adequate tests were added if applicable
- [X] CI passed
[GitHub] [hudi] hudi-bot commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction
hudi-bot commented on PR #9007:
URL: https://github.com/apache/hudi/pull/9007#issuecomment-1636680266

## CI report:

* c221efd733a444258780949b698830c2cef47931 UNKNOWN
* 3e318aa173aaa30e984554380d41c7706b1e7061 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18608)
[GitHub] [hudi] hudi-bot commented on pull request #9123: [HUDI-6478] Simplifying INSERT_INTO configs for spark-sql
hudi-bot commented on PR #9123:
URL: https://github.com/apache/hudi/pull/9123#issuecomment-1636680322

## CI report:

* 92431edca469088ced64b1d92c7bbdc2e44d47a1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18596)
* 05554e2e034b32dd56599e1f62408c123888d9cb UNKNOWN
[GitHub] [hudi] xushiyan commented on a diff in pull request #9188: [HUDI-6528] Fix premature RDD unpersist during index lookup
xushiyan commented on code in PR #9188:
URL: https://github.com/apache/hudi/pull/9188#discussion_r1264317406

## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bloom/HoodieBloomIndex.java: ##

@@ -103,11 +102,6 @@ record -> new ImmutablePair<>(record.getPartitionPath(), record.getRecordKey()))
     // Step 3: Tag the incoming records, as inserts or updates, by joining with existing record keys
     HoodieData> taggedRecords = tagLocationBacktoRecords(keyFilenamePairs, records, hoodieTable);
-    if (config.getBloomIndexUseCaching()) {

Review Comment:
   The main purpose of this PR is to fix premature un-persisting, as in this example.
[GitHub] [hudi] xushiyan commented on a diff in pull request #9188: [HUDI-6528] Fix premature RDD unpersist during index lookup
xushiyan commented on code in PR #9188:
URL: https://github.com/apache/hudi/pull/9188#discussion_r1264317245

## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bloom/HoodieBloomIndex.java: ##

@@ -75,11 +74,11 @@ public HoodieBloomIndex(HoodieWriteConfig config, BaseHoodieBloomIndexHelper blo
   @Override
   public HoodieData> tagLocation(
       HoodieData> records, HoodieEngineContext context,
-      HoodieTable hoodieTable) {
+      HoodieTable hoodieTable, Option instantTime) {
     // Step 0: cache the input records if needed
-    if (config.getBloomIndexUseCaching()) {
-      records.persist(new HoodieConfig(config.getProps())
-          .getString(HoodieIndexConfig.BLOOM_INDEX_INPUT_STORAGE_LEVEL_VALUE));
+    if (config.getBloomIndexUseCaching() && instantTime.isPresent()) {
+      String storageLevel = config.getString(HoodieIndexConfig.BLOOM_INDEX_INPUT_STORAGE_LEVEL_VALUE);

Review Comment:
   sounds good
[GitHub] [hudi] xushiyan commented on a diff in pull request #9188: [HUDI-6528] Fix premature RDD unpersist during index lookup
xushiyan commented on code in PR #9188:
URL: https://github.com/apache/hudi/pull/9188#discussion_r1264317221

## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndex.java: ##

@@ -80,7 +81,7 @@ public O updateLocation(O writeStatuses, HoodieEngineContext context,
   @PublicAPIMethod(maturity = ApiMaturityLevel.EVOLVING)
   public abstract HoodieData> tagLocation(
       HoodieData> records, HoodieEngineContext context,
-      HoodieTable hoodieTable) throws HoodieIndexException;
+      HoodieTable hoodieTable, Option instantTime) throws HoodieIndexException;

Review Comment:
   The API is marked as "Evolving", so changes are expected in a major release.
[GitHub] [hudi] hudi-bot commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction
hudi-bot commented on PR #9007:
URL: https://github.com/apache/hudi/pull/9007#issuecomment-1636653219

## CI report:

* c221efd733a444258780949b698830c2cef47931 UNKNOWN
* de9fca47509129e13c9b3a422261e8c55978faa0 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18607) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18606)
* 3e318aa173aaa30e984554380d41c7706b1e7061 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18608)
[GitHub] [hudi] hudi-bot commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction
hudi-bot commented on PR #9007:
URL: https://github.com/apache/hudi/pull/9007#issuecomment-1636641347

## CI report:

* c221efd733a444258780949b698830c2cef47931 UNKNOWN
* de9fca47509129e13c9b3a422261e8c55978faa0 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18607) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18606)
* 3e318aa173aaa30e984554380d41c7706b1e7061 UNKNOWN
[GitHub] [hudi] nsivabalan commented on a diff in pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction
nsivabalan commented on code in PR #9007:
URL: https://github.com/apache/hudi/pull/9007#discussion_r1264287885

## hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieTimeline.java: ##

@@ -170,7 +170,7 @@ public interface HoodieTimeline extends Serializable {
    *
    * @return
    */
-  HoodieTimeline filterCompletedInstantsOrRewriteTimeline();
+  HoodieTimeline filterCompletedAndRewriteInstants();

Review Comment:
   We should name it "filterCompletedRewriteInstants". "Rewrite" refers to commits, delta commits and replace commits; "completed" refers to the state.

## hudi-common/src/main/java/org/apache/hudi/common/model/HoodieFileGroup.java: ##

@@ -53,6 +53,7 @@ public static Comparator getReverseCommitTimeComparator() {
   /**
    * Timeline, based on which all getter work.
+   * This should be a write timeline that contains either completed instants or pending compaction instants.

Review Comment:
   Can we rename the variable then, e.g. to completedWriteAndCompactionTimeline?

## hudi-common/src/main/java/org/apache/hudi/common/table/timeline/TimelineDiffHelper.java: ##

@@ -146,17 +189,22 @@ public static class TimelineDiffResult {
     private final List newlySeenInstants;
     private final List finishedCompactionInstants;
-    private final List finishedOrRemovedLogCompactionInstants;
+    // Completed instants will have true as the value where as instants removed due to rollback will have false as value.
+    private final List> finishedOrRemovedLogCompactionInstants;
+    // Completed instants will have true as the value where as instants removed due to rollback will have false as value.
+    private final List> finishedOrRemovedReplaceCommitInstants;
     private final boolean canSyncIncrementally;
     public static final TimelineDiffResult UNSAFE_SYNC_RESULT =
-        new TimelineDiffResult(null, null, null, false);
+        new TimelineDiffResult(null, null, null, null, false);
     public TimelineDiffResult(List newlySeenInstants, List finishedCompactionInstants,
-        List finishedOrRemovedLogCompactionInstants, boolean canSyncIncrementally) {
+        List> finishedOrRemovedLogCompactionInstants,
+        List> finishedOrRemovedReplaceCommitInstants, boolean canSyncIncrementally) {
       this.newlySeenInstants = newlySeenInstants;
       this.finishedCompactionInstants = finishedCompactionInstants;
       this.finishedOrRemovedLogCompactionInstants = finishedOrRemovedLogCompactionInstants;
+      this.finishedOrRemovedReplaceCommitInstants = finishedOrRemovedReplaceCommitInstants;

Review Comment:
   If your answer to my previous comment is "only clustering", should we rename all these variables accordingly? Otherwise it might cause confusion down the line.

## hudi-common/src/main/java/org/apache/hudi/common/table/timeline/TimelineDiffHelper.java: ##

@@ -74,19 +74,62 @@ public static TimelineDiffResult getNewInstantsForIncrementalSync(HoodieTimeline
       newTimeline.getInstantsAsStream().filter(instant -> !oldTimelineInstants.contains(instant)).forEach(newInstants::add);
+      // Check for log compaction commits completed or removed.
       List> logCompactionInstants = getPendingLogCompactionTransitions(oldTimeline, newTimeline);
-      List finishedOrRemovedLogCompactionInstants = logCompactionInstants.stream()
+      List> finishedOrRemovedLogCompactionInstants = logCompactionInstants.stream()
           .filter(instantPair -> !instantPair.getKey().isCompleted()
               && (instantPair.getValue() == null || instantPair.getValue().isCompleted()))
-          .map(Pair::getKey).collect(Collectors.toList());
-      return new TimelineDiffResult(newInstants, finishedCompactionInstants, finishedOrRemovedLogCompactionInstants, true);
+          .map(instantPair -> (instantPair.getValue() == null)
+              ? Pair.of(instantPair.getKey(), false) : Pair.of(instantPair.getKey(), true))
+          .collect(Collectors.toList());
+
+      // Check for replace commits completed or removed.
+      List> replaceCommitInstants = getPendingReplaceCommitTransitions(oldTimeline, newTimeline);
+      List> finishedOrRemovedReplaceCommitInstants = replaceCommitInstants.stream()
+          .filter(instantPair -> !instantPair.getKey().isCompleted()
+              && (instantPair.getValue() == null || instantPair.getValue().isCompleted()))
+          .map(instantPair -> (instantPair.getValue() == null)
+              ? Pair.of(instantPair.getKey(), false) : Pair.of(instantPair.getKey(), true))
+          .collect(Collectors.toList());
+
+      // New instants will contains instants that are newly completed commits or newly created pending rewrite commits
+      // (i.e. compaction, logcompaciton, replacecommit)
+      // Finished or removed rewrite commits are handled independently.
+      return new TimelineDiffRe
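
The finished-or-removed classification in the `TimelineDiffHelper` hunk above can be illustrated in isolation. The following is a minimal sketch under stated assumptions, not Hudi's code: the class `TimelineDiffSketch`, the method `finishedOrRemoved`, and the use of plain timestamp strings with a `Map` in place of `HoodieInstant` and `HoodieTimeline` are all hypothetical stand-ins. It only mirrors the rule visible in the diff: an instant that was pending in the old timeline maps to `true` if it is completed in the new timeline, to `false` if it is absent from the new timeline (for example after a rollback), and is skipped if it is still pending.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TimelineDiffSketch {

  /**
   * Hypothetical helper mirroring the filter/map pipeline in the diff.
   *
   * @param oldPending timestamps of instants that were pending in the old timeline
   * @param newCompletedByTimestamp new timeline state: timestamp -> is completed;
   *                                a missing key means the instant was removed
   * @return (timestamp, true) for finished instants, (timestamp, false) for removed ones
   */
  public static List<Map.Entry<String, Boolean>> finishedOrRemoved(
      List<String> oldPending, Map<String, Boolean> newCompletedByTimestamp) {
    List<Map.Entry<String, Boolean>> result = new ArrayList<>();
    for (String timestamp : oldPending) {
      Boolean completed = newCompletedByTimestamp.get(timestamp);
      if (completed == null) {
        // Absent from the new timeline: treated as removed.
        result.add(new SimpleEntry<>(timestamp, false));
      } else if (completed) {
        // Present and completed in the new timeline: finished.
        result.add(new SimpleEntry<>(timestamp, true));
      }
      // Still pending in the new timeline: neither finished nor removed, so skipped.
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, Boolean> newTimeline = new HashMap<>();
    newTimeline.put("001", true);   // completed
    newTimeline.put("002", false);  // still pending
    // "003" is missing from the new timeline, i.e. removed.
    System.out.println(finishedOrRemoved(Arrays.asList("001", "002", "003"), newTimeline));
    // prints [001=true, 003=false]
  }
}
```

Carrying the boolean alongside the instant lets a consumer of the result distinguish a completed rewrite from a rolled-back one without consulting the timeline again, which is the distinction the added code comments in the diff describe.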
[GitHub] [hudi] hudi-bot commented on pull request #9200: [HUDI-6537] Bump checkstyle version to 3.1.0
hudi-bot commented on PR #9200:
URL: https://github.com/apache/hudi/pull/9200#issuecomment-1636640109

## CI report:

* 488f2a98894d13f55ff5f233fe47fa99e2bf420c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18591) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18599)
[GitHub] [hudi] hudi-bot commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction
hudi-bot commented on PR #9007:
URL: https://github.com/apache/hudi/pull/9007#issuecomment-1636639990

## CI report:

* c221efd733a444258780949b698830c2cef47931 UNKNOWN
* ea6504e78fbb1c776687d3632c5875e74070cebd Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18604)
* de9fca47509129e13c9b3a422261e8c55978faa0 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18606) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18607)
[GitHub] [hudi] suryaprasanna commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction
suryaprasanna commented on PR #9007:
URL: https://github.com/apache/hudi/pull/9007#issuecomment-1636636242

@hudi-bot run azure
[GitHub] [hudi] hudi-bot commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction
hudi-bot commented on PR #9007:
URL: https://github.com/apache/hudi/pull/9007#issuecomment-1636630900

## CI report:

* c221efd733a444258780949b698830c2cef47931 UNKNOWN
* ea6504e78fbb1c776687d3632c5875e74070cebd Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18604)
* de9fca47509129e13c9b3a422261e8c55978faa0 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18606)
[GitHub] [hudi] hudi-bot commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction
hudi-bot commented on PR #9007:
URL: https://github.com/apache/hudi/pull/9007#issuecomment-1636629167

## CI report:

* c221efd733a444258780949b698830c2cef47931 UNKNOWN
* 578401a45b7c5ffbd9360de3bd3e18c362b4b2b5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18592)
* ea6504e78fbb1c776687d3632c5875e74070cebd Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18604)
* de9fca47509129e13c9b3a422261e8c55978faa0 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8949: [DNM] Testing Java 17
hudi-bot commented on PR #8949:
URL: https://github.com/apache/hudi/pull/8949#issuecomment-1636627466

## CI report:

* 4e99d55baa97cc2fda388c6d6b8246fcffd7e3d6 UNKNOWN
* 2ae49bedda144e147341bbed7876a45f1d940ad6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18603)
[hudi] branch master updated: [HUDI-6530] Fixing the correct resource path (#9202)
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new 37d3d8ef504 [HUDI-6530] Fixing the correct resource path (#9202)
37d3d8ef504 is described below

commit 37d3d8ef504794d64fb87c838bf58bafa8acaa16
Author: lokesh-lingarajan-0310 <84048984+lokesh-lingarajan-0...@users.noreply.github.com>
AuthorDate: Fri Jul 14 18:58:51 2023 -0700

    [HUDI-6530] Fixing the correct resource path (#9202)

    Co-authored-by: Lokesh Lingarajan
---
 .../java/org/apache/hudi/utilities/sources/TestGcsEventsSource.java | 2 +-
 .../test/java/org/apache/hudi/utilities/sources/TestS3EventsSource.java | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/TestGcsEventsSource.java b/hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/TestGcsEventsSource.java
index 653cb823233..83108ee0c7e 100644
--- a/hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/TestGcsEventsSource.java
+++ b/hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/TestGcsEventsSource.java
@@ -63,7 +63,7 @@ public class TestGcsEventsSource extends UtilitiesTestBase {
   @BeforeEach
   public void beforeEach() throws Exception {
-    schemaProvider = new FilebasedSchemaProvider(Helpers.setupSchemaOnDFS("delta-streamer-config", "gcs-metadata.avsc"), jsc);
+    schemaProvider = new FilebasedSchemaProvider(Helpers.setupSchemaOnDFS("streamer-config", "gcs-metadata.avsc"), jsc);
     MockitoAnnotations.initMocks(this);
     props = new TypedProperties();
diff --git a/hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/TestS3EventsSource.java b/hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/TestS3EventsSource.java
index 4db47c76784..5ed332a142d 100644
--- a/hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/TestS3EventsSource.java
+++ b/hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/TestS3EventsSource.java
@@ -51,7 +51,7 @@ public class TestS3EventsSource extends AbstractCloudObjectsSourceTestBase {
     this.dfsRoot = basePath + "/parquetFiles";
     this.fileSuffix = ".parquet";
     fs.mkdirs(new Path(dfsRoot));
-    schemaProvider = new FilebasedSchemaProvider(Helpers.setupSchemaOnDFS("delta-streamer-config", "s3-metadata.avsc"), jsc);
+    schemaProvider = new FilebasedSchemaProvider(Helpers.setupSchemaOnDFS("streamer-config", "s3-metadata.avsc"), jsc);
   }
   @AfterEach
[GitHub] [hudi] nsivabalan merged pull request #9202: [HUDI-6530] Fixing the testcase to reflect correct resource path
nsivabalan merged PR #9202:
URL: https://github.com/apache/hudi/pull/9202
[GitHub] [hudi] hudi-bot commented on pull request #8949: [DNM] Testing Java 17
hudi-bot commented on PR #8949:
URL: https://github.com/apache/hudi/pull/8949#issuecomment-1636613200

## CI report:

* 4e99d55baa97cc2fda388c6d6b8246fcffd7e3d6 UNKNOWN
* f4b2acb51670eebaff53504cc87ee9ebbd214360 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18602)
* 2ae49bedda144e147341bbed7876a45f1d940ad6 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18603)
[GitHub] [hudi] hudi-bot commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction
hudi-bot commented on PR #9007:
URL: https://github.com/apache/hudi/pull/9007#issuecomment-1636611094

## CI report:

* c221efd733a444258780949b698830c2cef47931 UNKNOWN
* 578401a45b7c5ffbd9360de3bd3e18c362b4b2b5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18592)
* ea6504e78fbb1c776687d3632c5875e74070cebd Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18604)
[GitHub] [hudi] hudi-bot commented on pull request #8949: [DNM] Testing Java 17
hudi-bot commented on PR #8949:
URL: https://github.com/apache/hudi/pull/8949#issuecomment-1636611049

## CI report:

* 4e99d55baa97cc2fda388c6d6b8246fcffd7e3d6 UNKNOWN
* f4b2acb51670eebaff53504cc87ee9ebbd214360 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18602)
* 2ae49bedda144e147341bbed7876a45f1d940ad6 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #9202: [HUDI-6530] Fixing the testcase to reflect correct resource path
hudi-bot commented on PR #9202:
URL: https://github.com/apache/hudi/pull/9202#issuecomment-1636607988

## CI report:

* 531177f0d624aed45a00fb6c1778daa867b90fdb Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18597)
[GitHub] [hudi] hudi-bot commented on pull request #9123: [HUDI-6478] Simplifying INSERT_INTO configs for spark-sql
hudi-bot commented on PR #9123:
URL: https://github.com/apache/hudi/pull/9123#issuecomment-1636607910

## CI report:

* 92431edca469088ced64b1d92c7bbdc2e44d47a1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18596)
[GitHub] [hudi] hudi-bot commented on pull request #8949: [DNM] Testing Java 17
hudi-bot commented on PR #8949:
URL: https://github.com/apache/hudi/pull/8949#issuecomment-1636607791

## CI report:

* 4e99d55baa97cc2fda388c6d6b8246fcffd7e3d6 UNKNOWN
* f4b2acb51670eebaff53504cc87ee9ebbd214360 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18602)
[GitHub] [hudi] hudi-bot commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction
hudi-bot commented on PR #9007:
URL: https://github.com/apache/hudi/pull/9007#issuecomment-1636586011

## CI report:

* c221efd733a444258780949b698830c2cef47931 UNKNOWN
* 578401a45b7c5ffbd9360de3bd3e18c362b4b2b5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18592)
* ea6504e78fbb1c776687d3632c5875e74070cebd UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8949: [DNM] Testing Java 17
hudi-bot commented on PR #8949:
URL: https://github.com/apache/hudi/pull/8949#issuecomment-1636585956

## CI report:

* 4e99d55baa97cc2fda388c6d6b8246fcffd7e3d6 UNKNOWN
* 9ab5d549da4eea0808afe9a7830ab2d4e68109ce Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18598)
* e99d8065fc69b2cc354ba1688a2991a5e927eb48 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18600)
* f4b2acb51670eebaff53504cc87ee9ebbd214360 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8949: [DNM] Testing Java 17
hudi-bot commented on PR #8949:
URL: https://github.com/apache/hudi/pull/8949#issuecomment-1636581729

## CI report:

* 4e99d55baa97cc2fda388c6d6b8246fcffd7e3d6 UNKNOWN
* 9ab5d549da4eea0808afe9a7830ab2d4e68109ce Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18598)
* e99d8065fc69b2cc354ba1688a2991a5e927eb48 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8949: [DNM] Testing Java 17
hudi-bot commented on PR #8949:
URL: https://github.com/apache/hudi/pull/8949#issuecomment-1636578463

## CI report:

* 4e99d55baa97cc2fda388c6d6b8246fcffd7e3d6 UNKNOWN
* 9ab5d549da4eea0808afe9a7830ab2d4e68109ce Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18598)
[GitHub] [hudi] hudi-bot commented on pull request #8949: [DNM] Testing Java 17
hudi-bot commented on PR #8949: URL: https://github.com/apache/hudi/pull/8949#issuecomment-1636556240 ## CI report: * 4e99d55baa97cc2fda388c6d6b8246fcffd7e3d6 UNKNOWN * 4bea1208f1ba87ad8dd35e0ef55501cd4ffcee11 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18595) * 9ab5d549da4eea0808afe9a7830ab2d4e68109ce Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18598) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] soumilshah1995 commented on issue #9183: [SUPPORT] Glue 4.0 Hudi 0.12.1 PreCommit validator i.e SqlQueryEqualityPreCommitValidator is not working
soumilshah1995 commented on issue #9183: URL: https://github.com/apache/hudi/issues/9183#issuecomment-1636551169

Here is a video tutorial: https://www.youtube.com/watch?v=KNzs9dj_Btc&t=73s

# Tested

```
try:
    from pyspark.sql import SparkSession
    import os
    import sys
    import uuid
    from datetime import datetime
    from faker import Faker
except Exception as e:
    print("Error: ", e)

hudi_version = '0.13.1'
jar_file = 'hudi-spark3.3-bundle_2.12-0.14.0-SNAPSHOT.jar'
os.environ['PYSPARK_SUBMIT_ARGS'] = f"--jars {jar_file} pyspark-shell"
os.environ['PYSPARK_PYTHON'] = sys.executable
os.environ['PYSPARK_DRIVER_PYTHON'] = sys.executable

spark = SparkSession.builder \
    .config('spark.serializer', 'org.apache.spark.serializer.KryoSerializer') \
    .config('spark.jars', jar_file) \
    .config('spark.sql.hive.convertMetastoreParquet', 'false') \
    .getOrCreate()

db_name = "hudidb"
table_name = "pre_commit_validator"
recordkey = 'uuid'
precombine = 'precomb'
method = 'upsert'
table_type = "COPY_ON_WRITE"
# <TABLE_NAME> is the placeholder the validator substitutes with the table
# state before and after the commit.
validator_query = """SELECT COUNT(*) FROM <TABLE_NAME> WHERE message IS NULL;"""
path = f"file:///C:/tmp/{db_name}/{table_name}"

hudi_options = {
    'hoodie.table.name': table_name,
    'hoodie.datasource.write.recordkey.field': recordkey,
    'hoodie.datasource.write.table.name': table_name,
    'hoodie.datasource.write.operation': method,
    'hoodie.datasource.write.precombine.field': precombine,
    'hoodie.upsert.shuffle.parallelism': 2,
    'hoodie.insert.shuffle.parallelism': 2,
    "hoodie.precommit.validators": "org.apache.hudi.client.validator.SqlQueryEqualityPreCommitValidator",
    "hoodie.precommit.validators.equality.sql.queries": validator_query
}

# First batch: no NULL messages.
spark_df = spark.createDataFrame(
    data=[
        (1, "This is APPEND 1", 111, "1"),
        (2, "This is APPEND 2", 222, "2"),
    ],
    schema=["uuid", "message", "precomb", "partition"])
spark_df.write.format("hudi").options(**hudi_options).mode("append").save(path)
spark.read.format("hudi").load(path).createOrReplaceTempView("hudi_snapshots")
spark.sql("select * from hudi_snapshots").show(truncate=False)

# Second batch: one record has a NULL message.
spark_df = spark.createDataFrame(
    data=[
        (4, None, 444, None),
        (5, "This is APPEND 5", 555, "5"),
    ],
    schema=["uuid", "message", "precomb", "partition"])
spark_df.show()
spark_df.write.format("hudi").options(**hudi_options).mode("append").save(path)
spark.read.format("hudi").load(path).createOrReplaceTempView("hudi_snapshots")
spark.sql("select * from hudi_snapshots").show(truncate=False)
```

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
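For readers following issue #9183: as far as I understand it, an equality pre-commit validator runs the configured query against the table state before and after the pending commit, and the commit is rejected when the two results differ. The sketch below only illustrates that idea with Python's standard `sqlite3` module. It is not Hudi's implementation, and `run_equality_check`, `build_demo_connection`, the `before_commit`/`after_commit` table names, and the sample rows are all hypothetical.

```python
import sqlite3

# Placeholder token that gets swapped for a concrete table name (assumption:
# this mirrors how the query template in the comment above is meant to be used).
PLACEHOLDER = "<TABLE_NAME>"


def run_equality_check(conn, query_template, before_table, after_table):
    """Return True when the query yields the same rows on both table states."""
    before = conn.execute(query_template.replace(PLACEHOLDER, before_table)).fetchall()
    after = conn.execute(query_template.replace(PLACEHOLDER, after_table)).fetchall()
    return before == after


def build_demo_connection():
    """Create two in-memory tables standing in for pre- and post-commit state."""
    conn = sqlite3.connect(":memory:")
    for table in ("before_commit", "after_commit"):
        conn.execute(f"CREATE TABLE {table} (uuid INTEGER, message TEXT)")
    rows = [(1, "This is APPEND 1"), (2, "This is APPEND 2")]
    conn.executemany("INSERT INTO before_commit VALUES (?, ?)", rows)
    # The pending commit adds one row whose message is NULL.
    conn.executemany("INSERT INTO after_commit VALUES (?, ?)", rows + [(4, None)])
    return conn


conn = build_demo_connection()
null_count_query = "SELECT COUNT(*) FROM <TABLE_NAME> WHERE message IS NULL"
row_key_query = "SELECT COUNT(*) FROM <TABLE_NAME> WHERE uuid = 1"

# The NULL count changes from 0 to 1, so this check does not pass.
null_check_passes = run_equality_check(conn, null_count_query, "before_commit", "after_commit")
# The count for uuid = 1 is unchanged, so this check passes.
key_check_passes = run_equality_check(conn, row_key_query, "before_commit", "after_commit")
```

Under this reading, the second write in the comment above (the batch containing a NULL `message`) is the one that would be expected to fail validation.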
[GitHub] [hudi] hudi-bot commented on pull request #8949: [DNM] Testing Java 17
hudi-bot commented on PR #8949: URL: https://github.com/apache/hudi/pull/8949#issuecomment-1636550496 ## CI report: * 4e99d55baa97cc2fda388c6d6b8246fcffd7e3d6 UNKNOWN * 4bea1208f1ba87ad8dd35e0ef55501cd4ffcee11 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18595) * 9ab5d549da4eea0808afe9a7830ab2d4e68109ce UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9200: [HUDI-6537] Bump checkstyle version to 3.1.0
hudi-bot commented on PR #9200: URL: https://github.com/apache/hudi/pull/9200#issuecomment-1636545849 ## CI report: * 488f2a98894d13f55ff5f233fe47fa99e2bf420c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18591) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18599) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9188: [HUDI-6528] Fix premature RDD unpersist during index lookup
hudi-bot commented on PR #9188: URL: https://github.com/apache/hudi/pull/9188#issuecomment-1636545795 ## CI report: * 4c247d87685c4900a327275aa8e3b5909554ad36 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18565) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction
hudi-bot commented on PR #9007: URL: https://github.com/apache/hudi/pull/9007#issuecomment-1636545530 ## CI report: * c221efd733a444258780949b698830c2cef47931 UNKNOWN * 578401a45b7c5ffbd9360de3bd3e18c362b4b2b5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18592) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #8949: [DNM] Testing Java 17
hudi-bot commented on PR #8949: URL: https://github.com/apache/hudi/pull/8949#issuecomment-1636545412 ## CI report: * 4e99d55baa97cc2fda388c6d6b8246fcffd7e3d6 UNKNOWN * 4bea1208f1ba87ad8dd35e0ef55501cd4ffcee11 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18595) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] voonhous commented on pull request #9200: [HUDI-6537] Bump checkstyle version to 3.1.0
voonhous commented on PR #9200: URL: https://github.com/apache/hudi/pull/9200#issuecomment-1636540345 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] voonhous commented on pull request #9200: [HUDI-6537] Bump checkstyle version to 3.1.0
voonhous commented on PR #9200: URL: https://github.com/apache/hudi/pull/9200#issuecomment-1636540046 @hudu-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9202: [HUDI-6530] Fixing the testcase to reflect correct resource path
hudi-bot commented on PR #9202: URL: https://github.com/apache/hudi/pull/9202#issuecomment-1636513847 ## CI report: * 531177f0d624aed45a00fb6c1778daa867b90fdb Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18597) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9123: [HUDI-6478] Simplifying INSERT_INTO configs for spark-sql
hudi-bot commented on PR #9123: URL: https://github.com/apache/hudi/pull/9123#issuecomment-1636513735 ## CI report: * c4b55caaa515af207aa3ba1bef87cb1568d9b38a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18589) * 92431edca469088ced64b1d92c7bbdc2e44d47a1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18596) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #8949: [DNM] Testing Java 17
hudi-bot commented on PR #8949: URL: https://github.com/apache/hudi/pull/8949#issuecomment-1636513486 ## CI report: * 4e99d55baa97cc2fda388c6d6b8246fcffd7e3d6 UNKNOWN * 7b79a304400af94497a6dd50cb8a3116531504c6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18593) * ba905eb083f5de4e2a3055cc0e137ca218ec1e96 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18594) * 4bea1208f1ba87ad8dd35e0ef55501cd4ffcee11 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9202: [HUDI-6530] Fixing the testcase to reflect correct resource path
hudi-bot commented on PR #9202: URL: https://github.com/apache/hudi/pull/9202#issuecomment-1636508263 ## CI report: * 531177f0d624aed45a00fb6c1778daa867b90fdb UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9123: [HUDI-6478] Simplifying INSERT_INTO configs for spark-sql
hudi-bot commented on PR #9123: URL: https://github.com/apache/hudi/pull/9123#issuecomment-1636508104 ## CI report: * c4b55caaa515af207aa3ba1bef87cb1568d9b38a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18589) * 92431edca469088ced64b1d92c7bbdc2e44d47a1 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #8949: [DNM] Testing Java 17
hudi-bot commented on PR #8949: URL: https://github.com/apache/hudi/pull/8949#issuecomment-1636507859 ## CI report: * 4e99d55baa97cc2fda388c6d6b8246fcffd7e3d6 UNKNOWN * 7b79a304400af94497a6dd50cb8a3116531504c6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18593) * ba905eb083f5de4e2a3055cc0e137ca218ec1e96 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9200: [HUDI-6537] Bump checkstyle version to 3.1.0
hudi-bot commented on PR #9200: URL: https://github.com/apache/hudi/pull/9200#issuecomment-1636503157 ## CI report: * 488f2a98894d13f55ff5f233fe47fa99e2bf420c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18591) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9133: [HUDI-6474] Added support for reading tables evolved using comprehensive schema e…
hudi-bot commented on PR #9133: URL: https://github.com/apache/hudi/pull/9133#issuecomment-1636503019 ## CI report: * 71d4fd08f41e4aab163a92e82a15e35cf9c79ea0 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18590) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #8949: [DNM] Testing Java 17
hudi-bot commented on PR #8949: URL: https://github.com/apache/hudi/pull/8949#issuecomment-1636502733 ## CI report: * 4e99d55baa97cc2fda388c6d6b8246fcffd7e3d6 UNKNOWN * 7b79a304400af94497a6dd50cb8a3116531504c6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18593) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] lokesh-lingarajan-0310 opened a new pull request, #9202: [HUDI-6530] Fixing the testcase to reflect correct resource path
lokesh-lingarajan-0310 opened a new pull request, #9202: URL: https://github.com/apache/hudi/pull/9202 ### Change Logs Fixing the correct resource path ### Impact None ### Risk level (write none, low medium or high below) low ### Documentation Update No ### Contributor's checklist - [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [x] Change Logs and Impact were stated clearly - [x] Adequate tests were added if applicable - [x] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #8949: [DNM] Testing Java 17
hudi-bot commented on PR #8949: URL: https://github.com/apache/hudi/pull/8949#issuecomment-1636464975 ## CI report: * 4e99d55baa97cc2fda388c6d6b8246fcffd7e3d6 UNKNOWN * 9206f0ec85caee9b9e351820692affa370906291 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18581) * 7b79a304400af94497a6dd50cb8a3116531504c6 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18593) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #8949: [DNM] Testing Java 17
hudi-bot commented on PR #8949: URL: https://github.com/apache/hudi/pull/8949#issuecomment-1636457298 ## CI report: * 4e99d55baa97cc2fda388c6d6b8246fcffd7e3d6 UNKNOWN * 9206f0ec85caee9b9e351820692affa370906291 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18581) * 7b79a304400af94497a6dd50cb8a3116531504c6 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] soumilshah1995 commented on issue #8400: [SUPPORT] Hudi Offline Compaction in EMR Serverless 6.10 for YouTube Video
soumilshah1995 commented on issue #8400: URL: https://github.com/apache/hudi/issues/8400#issuecomment-1636454180 @AmareshB Sure @AmareshB # Step 1 : Create EMR 6.11 Cluster ![image](https://github.com/apache/hudi/assets/39345855/320ac005-344f-4da6-a02c-1cdad5462226) # Step2 : Create MOR table ``` try: import sys import os from pyspark.context import SparkContext from pyspark.sql.session import SparkSession from awsglue.context import GlueContext from awsglue.job import Job from awsglue.dynamicframe import DynamicFrame from pyspark.sql.functions import col, to_timestamp, monotonically_increasing_id, to_date, when from pyspark.sql.functions import * from awsglue.utils import getResolvedOptions from pyspark.sql.types import * from datetime import datetime, date import boto3 from functools import reduce from pyspark.sql import Row import uuid from faker import Faker except Exception as e: print("Modules are missing : {} ".format(e)) spark = (SparkSession.builder.config('spark.serializer', 'org.apache.spark.serializer.KryoSerializer') \ .config('spark.sql.hive.convertMetastoreParquet', 'false') \ .config('spark.sql.catalog.spark_catalog', 'org.apache.spark.sql.hudi.catalog.HoodieCatalog') \ .config('spark.sql.extensions', 'org.apache.spark.sql.hudi.HoodieSparkSessionExtension') \ .config('spark.sql.legacy.pathOptionBehavior.enabled', 'true').getOrCreate()) sc = spark.sparkContext glueContext = GlueContext(sc) job = Job(glueContext) logger = glueContext.get_logger() # =INSERTING DATA = global faker faker = Faker() class DataGenerator(object): @staticmethod def get_data(): return [ ( x, faker.name(), faker.random_element(elements=('IT', 'HR', 'Sales', 'Marketing')), faker.random_element(elements=('CA', 'NY', 'TX', 'FL', 'IL', 'RJ')), str(faker.random_int(min=1, max=15)), str(faker.random_int(min=18, max=60)), str(faker.random_int(min=0, max=10)), str(faker.unix_time()), faker.email(), faker.credit_card_number(card_type='amex'), ) for x in range(5) ] # == Settings === 
db_name = "hudidb" table_name = "employees" recordkey = 'emp_id' precombine = "ts" PARTITION_FIELD = 'state' path = "s3://soumilshah-hudi-demos/hudi/" method = 'upsert' table_type = "MERGE_ON_READ" # hudi_part_write_config = { 'className': 'org.apache.hudi', 'hoodie.table.name': table_name, 'hoodie.datasource.write.table.type': table_type, 'hoodie.datasource.write.operation': method, 'hoodie.datasource.write.recordkey.field': recordkey, 'hoodie.datasource.write.precombine.field': precombine, "hoodie.schema.on.read.enable": "true", "hoodie.datasource.write.reconcile.schema": "true", 'hoodie.datasource.hive_sync.mode': 'hms', 'hoodie.datasource.hive_sync.enable': 'true', 'hoodie.datasource.hive_sync.use_jdbc': 'false', 'hoodie.datasource.hive_sync.support_timestamp': 'false', 'hoodie.datasource.hive_sync.database': db_name, 'hoodie.datasource.hive_sync.table': table_name , "hoodie.compact.inline": "false" , 'hoodie.compact.schedule.inline': 'true' , "hoodie.metadata.index.check.timeout.seconds": "60" , "hoodie.write.concurrency.mode": "optimistic_concurrency_control" , "hoodie.write.lock.provider": "org.apache.hudi.client.transaction.lock.InProcessLockProvider" } # """Create Spark Data Frame """ # data = DataGenerator.get_data() columns = ["emp_id", "employee_name", "department", "state", "salary", "age", "bonus", "ts"] df = spark.createDataFrame(data=data, schema=columns) df.write.format("hudi").options(**hudi_part_write_config).mode("overwrite").save(path) # """APPEND """ # impleDataUpd = [ (6, "This is APPEND", "Sales", "RJ", 81000, 30, 23000, 827307999), (7, "This is APPEND", "Engineering", "RJ", 79000, 53, 15000, 1627694678), ] columns = ["emp_id", "employee_name", "department", "state", "salary",
[GitHub] [hudi] hudi-bot commented on pull request #9123: [HUDI-6478] Simplifying INSERT_INTO configs for spark-sql
hudi-bot commented on PR #9123: URL: https://github.com/apache/hudi/pull/9123#issuecomment-1636450633 ## CI report: * c4b55caaa515af207aa3ba1bef87cb1568d9b38a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18589) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on a diff in pull request #9188: [HUDI-6528] Fix premature RDD unpersist during index lookup
nsivabalan commented on code in PR #9188: URL: https://github.com/apache/hudi/pull/9188#discussion_r1264147849

## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bloom/HoodieBloomIndex.java: ##
@@ -103,11 +102,6 @@ record -> new ImmutablePair<>(record.getPartitionPath(), record.getRecordKey()))
 // Step 3: Tag the incoming records, as inserts or updates, by joining with existing record keys
 HoodieData> taggedRecords = tagLocationBacktoRecords(keyFilenamePairs, records, hoodieTable);
-if (config.getBloomIndexUseCaching()) {

Review Comment: I guess this was intentional. After this, taggedRecords is what gets used, and we do cache that in BaseSparkCommitActionExecutor.execute

```
@Override
public HoodieWriteMetadata> execute(HoodieData> inputRecords) {
  // Cache the tagged records, so we don't end up computing both
  JavaRDD> inputRDD = HoodieJavaRDD.getJavaRDD(inputRecords);
  if (inputRDD.getStorageLevel() == StorageLevel.NONE()) {
    HoodieJavaRDD.of(inputRDD).persist(config.getTaggedRecordStorageLevel(), context, HoodieDataCacheKey.of(config.getBasePath(), instantTime));
  } else {
    LOG.info("RDD PreppedRecords was persisted at: " + inputRDD.getStorageLevel());
  }
  . .
```

So, not sure if we want to keep the persistence until the very end for these RDDs, which may not be used.

## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndex.java: ##
@@ -80,7 +81,7 @@ public O updateLocation(O writeStatuses, HoodieEngineContext context,
 @PublicAPIMethod(maturity = ApiMaturityLevel.EVOLVING)
 public abstract HoodieData> tagLocation(
 HoodieData> records, HoodieEngineContext context,
- HoodieTable hoodieTable) throws HoodieIndexException;
+ HoodieTable hoodieTable, Option instantTime) throws HoodieIndexException;

Review Comment: This is a public API. We might have to deprecate it and add a new one if we wish to change the signature.

## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bloom/HoodieBloomIndex.java: ##
@@ -75,11 +74,11 @@ public HoodieBloomIndex(HoodieWriteConfig config, BaseHoodieBloomIndexHelper blo
 @Override
 public HoodieData> tagLocation(
 HoodieData> records, HoodieEngineContext context,
- HoodieTable hoodieTable) {
+ HoodieTable hoodieTable, Option instantTime) {
 // Step 0: cache the input records if needed
-if (config.getBloomIndexUseCaching()) {
- records.persist(new HoodieConfig(config.getProps())
- .getString(HoodieIndexConfig.BLOOM_INDEX_INPUT_STORAGE_LEVEL_VALUE));
+if (config.getBloomIndexUseCaching() && instantTime.isPresent()) {
+ String storageLevel = config.getString(HoodieIndexConfig.BLOOM_INDEX_INPUT_STORAGE_LEVEL_VALUE);

Review Comment: Can we move this to the constructor and use it everywhere instead of parsing it multiple times?

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
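The "deprecate and add a new one" suggestion for a public method such as `tagLocation` can be sketched as follows. This is a Python illustration of the general pattern only, not Hudi code: the `Index` class and the method names `tag_location` and `tag_location_with_instant` are hypothetical, and the method bodies are placeholders.

```python
import warnings
from typing import List, Optional, Tuple


class Index:
    """Hypothetical index exposing an old and a new public entry point."""

    def tag_location_with_instant(
        self, records: List[str], instant_time: Optional[str]
    ) -> List[Tuple[str, Optional[str]]]:
        # New API: callers may pass the instant time; None means "not known".
        return [(record, instant_time) for record in records]

    def tag_location(self, records: List[str]) -> List[Tuple[str, Optional[str]]]:
        # Old API keeps its signature, emits a warning, and delegates to the
        # new method so existing callers keep working.
        warnings.warn(
            "tag_location is deprecated; use tag_location_with_instant",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.tag_location_with_instant(records, None)


index = Index()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_result = index.tag_location(["key-1"])
new_result = index.tag_location_with_instant(["key-1"], "20230714180218")
```

The point of the pattern is that external implementers and callers of the old signature are not broken in the release that introduces the new one.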
[GitHub] [hudi] Vsevolod3 opened a new issue, #9201: [SUPPORT] Flink - Async Compaction Not Triggered With time_elapsed as COMPACTION_TRIGGER_STRATEGY
Vsevolod3 opened a new issue, #9201: URL: https://github.com/apache/hudi/issues/9201

I am running a Flink (1.15.2) job in EMR (emr-6.9.0), reading records from Kafka and writing them to S3 using Hudi (0.13.0). The table type is MoR and the compaction properties are COMPACTION_ASYNC_ENABLED = true and COMPACTION_TRIGGER_STRATEGY = time_elapsed.

## To Reproduce

Steps to reproduce the behavior:

1. Submit Flink job to EMR cluster (set COMPACTION_ASYNC_ENABLED = true, COMPACTION_TRIGGER_STRATEGY = time_elapsed, and COMPACTION_DELTA_SECONDS = 600)
2. Load data (not exceeding 3 commits per file ID)
3. Wait for > 600 seconds.

### Full list of Hudi properties for reference

```sql
'index.type' = 'FLINK_STATE',
'compaction.schedule.enabled' = 'true',
'hoodie.index.bucket.engine' = 'SIMPLE',
'clustering.plan.strategy.sort.columns' = 'acct_id',
'write.bucket_assign.tasks' = '3',
'compaction.delta_seconds' = '300',
'clustering.delta_commits' = '4',
'clustering.plan.strategy.small.file.limit' = '600',
'compaction.async.enabled' = 'true',
'compaction.max_memory' = '1024',
'hoodie.parquet.max.file.size' = '125829120',
'read.streaming.enabled' = 'false',
'path' = 's3://my_bucket/my_path/account/',
'hoodie.logfile.max.size' = '1073741824',
'hoodie.datasource.write.hive_style_partitioning' = 'true',
'hoodie.parquet.compression.ratio' = '0.1',
'hoodie.parquet.small.file.limit' = '104857600',
'hoodie.bucket.index.hash.field' = 'acct_id',
'compaction.tasks' = '3',
'precombine.field' = 'update_ts',
'write.task.max.size' = '4094',
'hoodie.parquet.compression.codec' = 'snappy',
'compaction.delta_commits' = '3',
'clustering.tasks' = '3',
'compaction.trigger.strategy' = 'time_elapsed',
'hoodie.bucket.index.num.buckets' = '256',
'read.tasks' = '3',
'compaction.timeout.seconds' = '1200',
'clustering.async.enabled' = 'true',
'table.type' = 'MERGE_ON_READ',
'metadata.compaction.delta_commits' = '10',
'clustering.plan.strategy.max.num.groups' = '30',
'write.tasks' = '3',
'clustering.schedule.enabled' = 'false',
'hoodie.logfile.data.block.format' = 'avro',
'write.batch.size' = '4094.0',
'write.sort.memory' = '4094'
```

## Expected behavior

Compaction should run about 5 minutes after the job tasks have fully started.

## Environment Description

* Hudi version : 0.13.0
* Spark version : N/A (using Flink 1.15.2)
* Hive version : tbd
* Hadoop version : emr-6.9.0
* Storage (HDFS/S3/GCS..) : s3
* Running on Docker? (yes/no) : no

**Stacktrace**

No errors are logged in Flink for this.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
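For context on issue #9201: the values that `compaction.trigger.strategy` accepts in Hudi's Flink options (`num_commits`, `time_elapsed`, `num_and_time`, `num_or_time`) can be read as a simple predicate over the number of delta commits and the seconds elapsed since the last compaction. The sketch below is a simplified Python model of that predicate, not Hudi's code; `should_schedule_compaction` and its parameter names are hypothetical. It also rests on an assumption worth checking against the job in question: if the predicate is only evaluated when a commit completes, a writer that receives no new data would never reach the check, however much time passes.

```python
def should_schedule_compaction(strategy, delta_commits, elapsed_seconds,
                               max_delta_commits, max_delta_seconds):
    """Simplified model of the compaction trigger decision (illustrative only)."""
    commits_reached = delta_commits >= max_delta_commits
    time_reached = elapsed_seconds >= max_delta_seconds
    if strategy == "num_commits":
        return commits_reached
    if strategy == "time_elapsed":
        return time_reached
    if strategy == "num_and_time":
        return commits_reached and time_reached
    if strategy == "num_or_time":
        return commits_reached or time_reached
    raise ValueError(f"unknown compaction trigger strategy: {strategy}")


# Thresholds taken from the property list in the issue:
# compaction.delta_commits = 3, compaction.delta_seconds = 300.
# The observed state (2 delta commits, 700 seconds elapsed) is made up.
observed = dict(delta_commits=2, elapsed_seconds=700,
                max_delta_commits=3, max_delta_seconds=300)

by_time = should_schedule_compaction("time_elapsed", **observed)
by_commits = should_schedule_compaction("num_commits", **observed)
by_both = should_schedule_compaction("num_and_time", **observed)
by_either = should_schedule_compaction("num_or_time", **observed)
```

In this model the `time_elapsed` strategy fires for the observed state even though fewer than three delta commits exist, which matches the behavior the reporter expected.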
[GitHub] [hudi] hudi-bot commented on pull request #9188: [HUDI-6528] Fix premature RDD unpersist during index lookup
hudi-bot commented on PR #9188: URL: https://github.com/apache/hudi/pull/9188#issuecomment-1636322662 ## CI report: * 4c247d87685c4900a327275aa8e3b5909554ad36 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18565) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction
hudi-bot commented on PR #9007: URL: https://github.com/apache/hudi/pull/9007#issuecomment-1636322264 ## CI report: * c221efd733a444258780949b698830c2cef47931 UNKNOWN * 07eb1aa79162259f3ac79e61bec621f68afb5551 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18588) * 578401a45b7c5ffbd9360de3bd3e18c362b4b2b5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18592) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] parisni commented on pull request #8716: [HUDI-6226] Support parquet native bloom filters
parisni commented on PR #8716: URL: https://github.com/apache/hudi/pull/8716#issuecomment-1636309573 @nsivabalan There are existing Spark benchmarks here. Basically, writes are about 20% slower and reads are up to 4x faster. https://github.com/apache/spark/blob/18d0a276c501a102af3e7ed251831983b9148a4f/sql/core/benchmarks/BloomFilterBenchmark-jdk11-results.txt As for documentation, please consider this PR: https://github.com/apache/hudi/pull/9056 On July 14, 2023 6:02:18 PM UTC, Sivabalan Narayanan ***@***.***> wrote: >hey @parisni : good job on the patch. Curious to know if you have any perf nos on this. on both write and read side. whats the perf overhead we are seeing on the write side and how much improvement we are seeing on the read side w/ the bloom filter. > >Also, would you provide a short write up(whats this support is all about, how users can leverage this and whats the benefit) on this that we can use it in our release page? > >-- >Reply to this email directly or view it on GitHub: >https://github.com/apache/hudi/pull/8716#issuecomment-1636201917 >You are receiving this because you were mentioned. > >Message ID: ***@***.***> -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9188: [HUDI-6528] Fix premature RDD unpersist during index lookup
hudi-bot commented on PR #9188: URL: https://github.com/apache/hudi/pull/9188#issuecomment-1636308856 ## CI report: * 4c247d87685c4900a327275aa8e3b5909554ad36 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction
hudi-bot commented on PR #9007: URL: https://github.com/apache/hudi/pull/9007#issuecomment-1636308312 ## CI report: * c221efd733a444258780949b698830c2cef47931 UNKNOWN * 26b3151e371774b3e99324bd9c305157fcde5789 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18585) * 07eb1aa79162259f3ac79e61bec621f68afb5551 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18588) * 578401a45b7c5ffbd9360de3bd3e18c362b4b2b5 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9200: [HUDI-6537] Bump checkstyle version to 3.1.0
hudi-bot commented on PR #9200: URL: https://github.com/apache/hudi/pull/9200#issuecomment-1636231942 ## CI report: * 488f2a98894d13f55ff5f233fe47fa99e2bf420c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18591) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9133: [HUDI-6474] Added support for reading tables evolved using comprehensive schema e…
hudi-bot commented on PR #9133: URL: https://github.com/apache/hudi/pull/9133#issuecomment-1636231720 ## CI report: * 76ba7ad679da5e445d7503a070f00dfb1814b1e4 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18587) * 71d4fd08f41e4aab163a92e82a15e35cf9c79ea0 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18590) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction
hudi-bot commented on PR #9007: URL: https://github.com/apache/hudi/pull/9007#issuecomment-1636231430 ## CI report: * c221efd733a444258780949b698830c2cef47931 UNKNOWN * 26b3151e371774b3e99324bd9c305157fcde5789 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18585) * 07eb1aa79162259f3ac79e61bec621f68afb5551 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18588) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9200: [HUDI-6537] Bump checkstyle version to 3.1.0
hudi-bot commented on PR #9200: URL: https://github.com/apache/hudi/pull/9200#issuecomment-1636223402 ## CI report: * 488f2a98894d13f55ff5f233fe47fa99e2bf420c UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9133: [HUDI-6474] Added support for reading tables evolved using comprehensive schema e…
hudi-bot commented on PR #9133: URL: https://github.com/apache/hudi/pull/9133#issuecomment-1636223141 ## CI report: * 07deb3c1400d4fc530e434f6f9b74cb7640c7e47 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18403) * 76ba7ad679da5e445d7503a070f00dfb1814b1e4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18587) * 71d4fd08f41e4aab163a92e82a15e35cf9c79ea0 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9123: [HUDI-6478] Simplifying INSERT_INTO configs for spark-sql
hudi-bot commented on PR #9123: URL: https://github.com/apache/hudi/pull/9123#issuecomment-1636223046 ## CI report: * ca0ec686f26a2786bc350f3dfb1a83baf3bc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18582) * c4b55caaa515af207aa3ba1bef87cb1568d9b38a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18589) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction
hudi-bot commented on PR #9007: URL: https://github.com/apache/hudi/pull/9007#issuecomment-1636222736 ## CI report: * c221efd733a444258780949b698830c2cef47931 UNKNOWN * 26b3151e371774b3e99324bd9c305157fcde5789 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18585) * 07eb1aa79162259f3ac79e61bec621f68afb5551 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9123: [HUDI-6478] Simplifying INSERT_INTO configs for spark-sql
hudi-bot commented on PR #9123: URL: https://github.com/apache/hudi/pull/9123#issuecomment-1636212208 ## CI report: * ca0ec686f26a2786bc350f3dfb1a83baf3bc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18582) * c4b55caaa515af207aa3ba1bef87cb1568d9b38a UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (HUDI-6538) Refactor methods in TimelineDiffHelper class
Surya Prasanna Yalla created HUDI-6538: -- Summary: Refactor methods in TimelineDiffHelper class Key: HUDI-6538 URL: https://issues.apache.org/jira/browse/HUDI-6538 Project: Apache Hudi Issue Type: Task Reporter: Surya Prasanna Yalla Refactor methods in TimelineDiffHelper class to address following comment in [PR-9007|https://github.com/apache/hudi/pull/9007] {code:java} The methods getPendingReplaceCommitTransitions and getPendingLogCompactionTransitions look almost the same except the action type, can we abstract a little to merge them altogether?{code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
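The refactoring requested in HUDI-6538 can be sketched as follows. This is a minimal illustration, not the actual `TimelineDiffHelper` code: the `Instant` class, the `getPendingTransitions` helper, the method signatures, and the "pending before, completed now" semantics are all assumptions made for the example. Only the two method names come from the review comment, and the action strings are used here as illustrative values.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class PendingTransitionsSketch {

    /** Hypothetical stand-in for a timeline instant; this is not Hudi's HoodieInstant. */
    public static final class Instant {
        final String action;
        final String timestamp;
        final boolean completed;

        public Instant(String action, String timestamp, boolean completed) {
            this.action = action;
            this.timestamp = timestamp;
            this.completed = completed;
        }
    }

    /**
     * Shared helper (hypothetical): timestamps of instants with the given action
     * that were pending in the old timeline and are completed in the new one.
     */
    public static List<String> getPendingTransitions(
            List<Instant> oldTimeline, List<Instant> newTimeline, String action) {
        Set<String> completedNow = newTimeline.stream()
            .filter(i -> i.action.equals(action) && i.completed)
            .map(i -> i.timestamp)
            .collect(Collectors.toSet());
        return oldTimeline.stream()
            .filter(i -> i.action.equals(action) && !i.completed)
            .map(i -> i.timestamp)
            .filter(completedNow::contains)
            .collect(Collectors.toList());
    }

    // The two near-identical methods shrink to one-line wrappers that differ
    // only in the action type they pass to the shared helper.
    public static List<String> getPendingReplaceCommitTransitions(
            List<Instant> oldTimeline, List<Instant> newTimeline) {
        return getPendingTransitions(oldTimeline, newTimeline, "replacecommit");
    }

    public static List<String> getPendingLogCompactionTransitions(
            List<Instant> oldTimeline, List<Instant> newTimeline) {
        return getPendingTransitions(oldTimeline, newTimeline, "logcompaction");
    }

    public static void main(String[] args) {
        List<Instant> oldTimeline = Arrays.asList(
            new Instant("replacecommit", "001", false),
            new Instant("logcompaction", "002", false),
            new Instant("replacecommit", "003", false));
        List<Instant> newTimeline = Arrays.asList(
            new Instant("replacecommit", "001", true),
            new Instant("logcompaction", "002", true),
            new Instant("replacecommit", "003", false));
        System.out.println(getPendingReplaceCommitTransitions(oldTimeline, newTimeline));
        System.out.println(getPendingLogCompactionTransitions(oldTimeline, newTimeline));
    }
}
```

Keeping the wrappers means existing call sites would not need to change, while the filtering logic lives in one place.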
[GitHub] [hudi] nsivabalan commented on pull request #8716: [HUDI-6226] Support parquet native bloom filters
nsivabalan commented on PR #8716: URL: https://github.com/apache/hudi/pull/8716#issuecomment-1636201917 Hey @parisni: good job on the patch. Curious to know if you have any perf numbers on this, for both the write and the read side. What perf overhead are we seeing on the write side, and how much improvement are we seeing on the read side with the bloom filter? Also, could you provide a short write-up (what this support is all about, how users can leverage it, and what the benefit is) that we can use on our release page? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on a diff in pull request #9121: [HUDI-6476] Improve the performance of getAllPartitionPaths
nsivabalan commented on code in PR #9121: URL: https://github.com/apache/hudi/pull/9121#discussion_r1263994796 ## hudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.java: ## @@ -106,42 +107,33 @@ private List getPartitionPathWithPathPrefix(String relativePathPrefix) t // TODO: Get the parallelism from HoodieWriteConfig int listingParallelism = Math.min(DEFAULT_LISTING_PARALLELISM, pathsToList.size()); - // List all directories in parallel + // List all directories in parallel: + // if current dictionary contains PartitionMetadata, add it to result + // if current dictionary does not contain PartitionMetadata, add its subdirectory to queue to be processed. engineContext.setJobStatus(this.getClass().getSimpleName(), "Listing all partitions with prefix " + relativePathPrefix); - List dirToFileListing = engineContext.flatMap(pathsToList, path -> { + // result below holds a list of pair. first entry in the pair optionally holds the deduced list of partitions. + // and second entry holds optionally a directory path to be processed further. + List<Pair<Option<String>, Option<Path>>> result = engineContext.flatMap(pathsToList, path -> { FileSystem fileSystem = path.getFileSystem(hadoopConf.get()); -return Arrays.stream(fileSystem.listStatus(path)); +if (HoodiePartitionMetadata.hasPartitionMetadata(fileSystem, path)) { + return Stream.of(Pair.of(Option.of(FSUtils.getRelativePartitionPath(new Path(datasetBasePath), path)), Option.empty())); Review Comment: The partition meta file could have extensions like parquet, orc, etc. Did we consider that? This was in the previous code: fileStatus.getPath().getName().startsWith(HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
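The concern in the review comment above can be illustrated with a small sketch. It compares an exact-name match against the `startsWith` prefix match quoted from the previous code. The class name, both helper methods, and the literal prefix value are assumptions made for this example (the value should be checked against `HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX` in the Hudi source). Whether `hasPartitionMetadata` already handles extensions is not visible in the quoted diff.

```java
public class PartitionMetaFileCheck {

    // Assumed value of HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX;
    // verify against the Hudi source before relying on it.
    static final String METAFILE_PREFIX = ".hoodie_partition_metadata";

    /** Exact-name match: would miss a metafile written with a format extension. */
    public static boolean isMetaFileExact(String fileName) {
        return METAFILE_PREFIX.equals(fileName);
    }

    /** Prefix match, mirroring the startsWith check quoted in the review comment. */
    public static boolean isMetaFileByPrefix(String fileName) {
        return fileName.startsWith(METAFILE_PREFIX);
    }

    public static void main(String[] args) {
        String[] names = {
            ".hoodie_partition_metadata",
            ".hoodie_partition_metadata.parquet",
            ".hoodie_partition_metadata.orc",
            "part-0001.parquet"
        };
        for (String name : names) {
            System.out.printf("%s exact=%b prefix=%b%n",
                name, isMetaFileExact(name), isMetaFileByPrefix(name));
        }
    }
}
```

Under these assumptions, only the prefix match recognizes the `.parquet` and `.orc` variants, which is the case the reviewer asks about.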
[jira] [Updated] (HUDI-6537) Bump checkstyle version
[ https://issues.apache.org/jira/browse/HUDI-6537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-6537: - Labels: pull-request-available (was: ) > Bump checkstyle version > --- > > Key: HUDI-6537 > URL: https://issues.apache.org/jira/browse/HUDI-6537 > Project: Apache Hudi > Issue Type: Bug >Reporter: voon >Assignee: voon >Priority: Major > Labels: pull-request-available -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] voonhous opened a new pull request, #9200: [HUDI-6537] Bump checkstyle version to 3.1.0
voonhous opened a new pull request, #9200: URL: https://github.com/apache/hudi/pull/9200 ### Change Logs Bump checkstyle version to 3.1.0 due to an ambiguous checkstyle message that was thrown as shown below: ``` Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -1 at java.lang.String.substring (String.java:1967) at org.apache.maven.plugins.checkstyle.RuleUtil.getCategory (RuleUtil.java:95) at org.apache.maven.plugins.checkstyle.CheckstyleViolationCheckMojo.countViolations (CheckstyleViolationCheckMojo.java:646) at org.apache.maven.plugins.checkstyle.CheckstyleViolationCheckMojo.execute (CheckstyleViolationCheckMojo.java:564) at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo (DefaultBuildPluginManager.java:137) at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute2 (MojoExecutor.java:370) at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute (MojoExecutor.java:351) at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:215) at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:171) at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:163) at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117) at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:81) at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:56) at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:128) at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:299) at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:193) at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:106) at org.apache.maven.cli.MavenCli.execute (MavenCli.java:963) at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:296) at org.apache.maven.cli.MavenCli.main 
(MavenCli.java:199) at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke (Method.java:498) at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:282) at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:225) at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:406) at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:347) ``` https://github.com/apache/hudi/actions/runs/5556435429/jobs/10148956808?pr=9133 Bug: https://issues.apache.org/jira/browse/MCHECKSTYLE-347 Running the code in the same state with checkstyle:3.1.0 reports the expected checkstyle error: ```log final CastMapConverter[] converters = IntStream. range(0, fromChildren.size()) .mapToObj(i -> { LogicalType fromChild = fromChildren.get(i); LogicalType toChild = toChildren.get(i); if (isPrimitiveTypeRootEqual(fromChild.getTypeRoot(), toChild.getTypeRoot())) { return createNoOpConverter(); ... [ERROR] src/main/java/org/apache/hudi/table/format/CastMapConverters.java:[315,52] (extension) SeparatorWrapDot: '.' should be on a new line. ``` ### Impact _Describe any public API or user-facing feature change or any performance impact._ None ### Risk level (write none, low medium or high below) _If medium or high, explain what verification was done to mitigate the risks._ None ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change_ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific com
[jira] [Created] (HUDI-6537) Bump checkstyle version
voon created HUDI-6537: -- Summary: Bump checkstyle version Key: HUDI-6537 URL: https://issues.apache.org/jira/browse/HUDI-6537 Project: Apache Hudi Issue Type: Bug Reporter: voon Encountered an ambiguous checkstyle error here: {code:java} Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -1 at java.lang.String.substring (String.java:1967) at org.apache.maven.plugins.checkstyle.RuleUtil.getCategory (RuleUtil.java:95) at org.apache.maven.plugins.checkstyle.CheckstyleViolationCheckMojo.countViolations (CheckstyleViolationCheckMojo.java:646) at org.apache.maven.plugins.checkstyle.CheckstyleViolationCheckMojo.execute (CheckstyleViolationCheckMojo.java:564) at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo (DefaultBuildPluginManager.java:137) at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute2 (MojoExecutor.java:370) at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute (MojoExecutor.java:351) at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:215) at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:171) at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:163) at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117) at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:81) at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:56) at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:128) at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:299) at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:193) at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:106) at org.apache.maven.cli.MavenCli.execute (MavenCli.java:963) at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:296) at org.apache.maven.cli.MavenCli.main (MavenCli.java:199) at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke (Method.java:498) at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:282) at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:225) at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:406) at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:347) {code} [https://github.com/apache/hudi/actions/runs/5556435429/jobs/10148956808?pr=9133] Running the code in the same state with checkstyle:3.1.0 will throw the error below (expected): {code:java} final CastMapConverter[] converters = IntStream. range(0, fromChildren.size()) .mapToObj(i -> { LogicalType fromChild = fromChildren.get(i); LogicalType toChild = toChildren.get(i); if (isPrimitiveTypeRootEqual(fromChild.getTypeRoot(), toChild.getTypeRoot())) { return createNoOpConverter(); ... [ERROR] src/main/java/org/apache/hudi/table/format/CastMapConverters.java:[315,52] (extension) SeparatorWrapDot: '.' should be on a new line. {code} Bug describing this issue: https://issues.apache.org/jira/browse/MCHECKSTYLE-347 -- This message was sent by Atlassian Jira (v8.20.10#820010)
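The `SeparatorWrapDot` message reported above ("'.' should be on a new line") concerns where the dot sits when a method chain wraps. The sketch below shows the flagged layout in a comment and the compliant layout in code. The class name, the `labels` method, and the `"child-"` strings are hypothetical and unrelated to `CastMapConverters`; only the wrapping style is the point.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class SeparatorWrapDotSketch {

    // Flagged layout, as in the snippet from the issue (the dot ends the line):
    //
    //     IntStream.
    //         range(0, n)
    //
    // Compliant layout: the dot starts the continuation line.
    public static List<String> labels(int n) {
        return IntStream
            .range(0, n)
            .mapToObj(i -> "child-" + i)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(labels(3));
    }
}
```

Both layouts compile to the same bytecode, so the change is purely stylistic.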
[jira] [Assigned] (HUDI-6537) Bump checkstyle version
[ https://issues.apache.org/jira/browse/HUDI-6537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] voon reassigned HUDI-6537: -- Assignee: voon > Bump checkstyle version > --- > > Key: HUDI-6537 > URL: https://issues.apache.org/jira/browse/HUDI-6537 > Project: Apache Hudi > Issue Type: Bug >Reporter: voon >Assignee: voon >Priority: Major -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] voonhous commented on pull request #9133: [HUDI-6474] Added support for reading tables evolved using comprehensive schema e…
voonhous commented on PR #9133: URL: https://github.com/apache/hudi/pull/9133#issuecomment-1636170813 We might need to update the checkstyle plugin from 3.0.0 to 3.1.0 due to this bug: https://issues.apache.org/jira/browse/MCHECKSTYLE-347 I will submit a PR for this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9133: [HUDI-6474] Added support for reading tables evolved using comprehensive schema e…
hudi-bot commented on PR #9133: URL: https://github.com/apache/hudi/pull/9133#issuecomment-1636164814 ## CI report: * 07deb3c1400d4fc530e434f6f9b74cb7640c7e47 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18403) * 76ba7ad679da5e445d7503a070f00dfb1814b1e4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18587) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9133: [HUDI-6474] Added support for reading tables evolved using comprehensive schema e…
hudi-bot commented on PR #9133: URL: https://github.com/apache/hudi/pull/9133#issuecomment-1636155755 ## CI report: * 07deb3c1400d4fc530e434f6f9b74cb7640c7e47 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18403) * 76ba7ad679da5e445d7503a070f00dfb1814b1e4 UNKNOWN
[GitHub] [hudi] voonhous commented on a diff in pull request #9133: [HUDI-6474] Added support for reading tables evolved using comprehensive schema e…
voonhous commented on code in PR #9133: URL: https://github.com/apache/hudi/pull/9133#discussion_r1263947981 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/CastMap.java: ## @@ -165,21 +192,132 @@ void add(int pos, LogicalType fromType, LogicalType toType) { } break; } + case ARRAY: { +if (from == ARRAY) { + LogicalType fromElementType = fromType.getChildren().get(0); + LogicalType toElementType = toType.getChildren().get(0); + return array -> doArrayConversion((ArrayData) array, fromElementType, toElementType); +} +break; + } + case MAP: { +if (from == MAP) { + return map -> doMapConversion((MapData) map, fromType, toType); +} +break; + } + case ROW: { +if (from == ROW) { + // Assumption: InternalSchemaManager should produce a cast that is of the same size + return row -> doRowConversion((RowData) row, fromType, toType); +} +break; + } default: } -return null; +throw new IllegalArgumentException(String.format("Unsupported conversion for %s => %s", fromType, toType)); } - private void add(int pos, Cast cast) { -castMap.put(pos, cast); + /** + * Helper function to perform convert an arrayData from one LogicalType to another. 
+ * + * @param array Non-null array data to be converted; however array-elements are allowed to be null + * @param fromType The input LogicalType of the row data to be converted from + * @param toType The output LogicalType of the row data to be converted to + * @return Converted array that has the structure/specifications of that defined by the output LogicalType + */ + private static ArrayData doArrayConversion(@Nonnull ArrayData array, LogicalType fromType, LogicalType toType) { +// using Object type here as primitives are not allowed to be null +Object[] objects = new Object[array.size()]; +for (int i = 0; i < array.size(); i++) { + Object fromObject = ArrayData.createElementGetter(fromType).getElementOrNull(array, i); + // need to handle nulls to prevent NullPointerException in #getConversion() + Object toObject = fromObject != null ? getConversion(fromType, toType).apply(fromObject) : null; + objects[i] = toObject; +} +return new GenericArrayData(objects); + } + + /** + * Helper function to perform convert a MapData from one LogicalType to another. 
+ * + * @param map Non-null map data to be converted; however, values are allowed to be null + * @param fromType The input LogicalType of the row data to be converted from + * @param toType The output LogicalType of the row data to be converted to + * @return Converted map that has the structure/specifications of that defined by the output LogicalType + */ + private static MapData doMapConversion(@Nonnull MapData map, LogicalType fromType, LogicalType toType) { +// no schema evolution is allowed on the keyType, hence, we only need to care about the valueType +LogicalType fromValueType = fromType.getChildren().get(1); +LogicalType toValueType = toType.getChildren().get(1); +LogicalType keyType = fromType.getChildren().get(0); + +final Map result = new HashMap<>(); +for (int i = 0; i < map.size(); i++) { + Object keyObject = ArrayData.createElementGetter(keyType).getElementOrNull(map.keyArray(), i); + Object fromObject = ArrayData.createElementGetter(fromValueType).getElementOrNull(map.valueArray(), i); + // need to handle nulls to prevent NullPointerException in #getConversion() + Object toObject = fromObject != null ? getConversion(fromValueType, toValueType).apply(fromObject) : null; + result.put(keyObject, toObject); +} +return new GenericMapData(result); + } + + /** + * Helper function to perform convert a RowData from one LogicalType to another. 
+ * + * @param row Non-null row data to be converted; however, fields might contain nulls + * @param fromType The input LogicalType of the row data to be converted from + * @param toType The output LogicalType of the row data to be converted to + * @return Converted row that has the structure/specifications of that defined by the output LogicalType + */ + private static RowData doRowConversion(@Nonnull RowData row, LogicalType fromType, LogicalType toType) { +// note: InternalSchema.merge guarantees that the schema to be read fromType is orientated in the same order as toType +// hence, we can match types by position as it is guaranteed that it is referencing the same field +List fromChildren = fromType.getChildren(); +List toChildren = toType.getChildren(); +ValidationUtils.checkArgument(fromChildren.size() == toChildren.size(), +"fromType [" + fromType + "] size: != toType [" + toType + "] size"); + +GenericRow
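Note on the diff above: the array, map and row helpers share one pattern, namely applying the conversion element by element while passing nulls through, so that the conversion function never receives a null. The following is a minimal sketch of that pattern in plain Java, without the Flink types used in the PR; `NullSafeConversionSketch` and `convertElements` are hypothetical names, and the int-to-long widening is only an illustrative stand-in for a schema evolution.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical example class; it mirrors the null handling in the diff, not its Flink API usage.
final class NullSafeConversionSketch {

  // Applies `conversion` to every non-null element and keeps nulls as nulls,
  // so the conversion function is never called with a null input.
  static <A, B> List<B> convertElements(List<A> input, Function<A, B> conversion) {
    List<B> result = new ArrayList<>(input.size());
    for (A element : input) {
      result.add(element != null ? conversion.apply(element) : null);
    }
    return result;
  }

  public static void main(String[] args) {
    // Illustrative evolution of an element type from int to long.
    List<Integer> ints = Arrays.asList(1, null, 3);
    List<Long> longs = convertElements(ints, Integer::longValue);
    System.out.println(longs); // prints [1, null, 3]
  }
}
```

The same guard would apply to map values and row fields; in the diff, map keys are left unconverted because no schema evolution is allowed on the key type.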
[jira] [Updated] (HUDI-6536) Mention table version change in 0.11.x release notes
[ https://issues.apache.org/jira/browse/HUDI-6536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6536: Fix Version/s: 0.14.0 > Mention table version change in 0.11.x release notes > > > Key: HUDI-6536 > URL: https://issues.apache.org/jira/browse/HUDI-6536 > Project: Apache Hudi > Issue Type: Bug >Reporter: Ethan Guo >Priority: Major > Fix For: 0.14.0
[jira] [Created] (HUDI-6536) Mention table version change in 0.11.x release notes
Ethan Guo created HUDI-6536: --- Summary: Mention table version change in 0.11.x release notes Key: HUDI-6536 URL: https://issues.apache.org/jira/browse/HUDI-6536 Project: Apache Hudi Issue Type: Bug Reporter: Ethan Guo
[jira] [Assigned] (HUDI-6536) Mention table version change in 0.11.x release notes
[ https://issues.apache.org/jira/browse/HUDI-6536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo reassigned HUDI-6536: --- Assignee: Ethan Guo > Mention table version change in 0.11.x release notes > > > Key: HUDI-6536 > URL: https://issues.apache.org/jira/browse/HUDI-6536 > Project: Apache Hudi > Issue Type: Bug >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Fix For: 0.14.0
[GitHub] [hudi] yihua commented on a diff in pull request #9198: [HUDI-6532] Fix a typo in BaseFlinkCommitActionExecutor.
yihua commented on code in PR #9198: URL: https://github.com/apache/hudi/pull/9198#discussion_r1263920351 ## hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/table/action/commit/BaseFlinkCommitActionExecutor.java: ## @@ -194,7 +194,7 @@ protected Iterator> handleUpsertPartition( } } } catch (Throwable t) { - String msg = "Error upsetting bucketType " + bucketType + " for partition :" + partitionPath; + String msg = "Error setting up bucketType " + bucketType + " for partition :" + partitionPath; Review Comment: This should be `upserting`.
[GitHub] [hudi] hudi-bot commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction
hudi-bot commented on PR #9007: URL: https://github.com/apache/hudi/pull/9007#issuecomment-1636084210 ## CI report: * c221efd733a444258780949b698830c2cef47931 UNKNOWN * 26b3151e371774b3e99324bd9c305157fcde5789 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18585)
[hudi] branch master updated: [HUDI-6530] Applying schema during ingestion using a schema provider for s3/gcs metadata job (#9191)
This is an automated email from the ASF dual-hosted git repository. sivabalan pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 8f7877f2855 [HUDI-6530] Applying schema during ingestion using a schema provider for s3/gcs metadata job (#9191) 8f7877f2855 is described below commit 8f7877f28559f49b90225a279d5a7ad50c689c0b Author: lokesh-lingarajan-0310 <84048984+lokesh-lingarajan-0...@users.noreply.github.com> AuthorDate: Fri Jul 14 08:39:36 2023 -0700 [HUDI-6530] Applying schema during ingestion using a schema provider for s3/gcs metadata job (#9191) Co-authored-by: Lokesh Lingarajan --- .../org/apache/hudi/utilities/UtilHelpers.java | 8 + .../hudi/utilities/sources/GcsEventsSource.java| 11 +- .../hudi/utilities/sources/S3EventsSource.java | 17 +- .../utilities/sources/TestGcsEventsSource.java | 42 - .../hudi/utilities/sources/TestS3EventsSource.java | 4 +- .../resources/streamer-config/gcs-metadata.avsc| 60 --- .../resources/streamer-config/s3-metadata.avsc | 188 + 7 files changed, 299 insertions(+), 31 deletions(-) diff --git a/hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java b/hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java index a0d241752c5..35a5c9fcb47 100644 --- a/hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java +++ b/hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java @@ -60,6 +60,7 @@ import org.apache.hudi.utilities.schema.SchemaProvider; import org.apache.hudi.utilities.schema.SchemaProviderWithPostProcessor; import org.apache.hudi.utilities.schema.SparkAvroPostProcessor; import org.apache.hudi.utilities.schema.postprocessor.ChainedSchemaPostProcessor; +import org.apache.hudi.utilities.sources.InputBatch; import org.apache.hudi.utilities.sources.Source; import org.apache.hudi.utilities.sources.processor.ChainedJsonKafkaSourcePostProcessor; import 
org.apache.hudi.utilities.sources.processor.JsonKafkaSourcePostProcessor; @@ -193,6 +194,13 @@ public class UtilHelpers { } + public static StructType getSourceSchema(SchemaProvider schemaProvider) { +if (schemaProvider != null && schemaProvider.getSourceSchema() != null && schemaProvider.getSourceSchema() != InputBatch.NULL_SCHEMA) { + return AvroConversionUtils.convertAvroSchemaToStructType(schemaProvider.getSourceSchema()); +} +return null; + } + public static Option createTransformer(Option> classNamesOpt, Option sourceSchema, boolean isErrorTableWriterEnabled) throws IOException { diff --git a/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/GcsEventsSource.java b/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/GcsEventsSource.java index dfc9b5b2b2e..89ce7eddf54 100644 --- a/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/GcsEventsSource.java +++ b/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/GcsEventsSource.java @@ -22,6 +22,7 @@ import org.apache.hudi.common.config.TypedProperties; import org.apache.hudi.common.util.Option; import org.apache.hudi.common.util.collection.Pair; import org.apache.hudi.exception.HoodieException; +import org.apache.hudi.utilities.UtilHelpers; import org.apache.hudi.utilities.exception.HoodieReadFromSourceException; import org.apache.hudi.utilities.schema.SchemaProvider; import org.apache.hudi.utilities.sources.helpers.gcs.MessageBatch; @@ -35,6 +36,7 @@ import org.apache.spark.sql.Dataset; import org.apache.spark.sql.Encoders; import org.apache.spark.sql.Row; import org.apache.spark.sql.SparkSession; +import org.apache.spark.sql.types.StructType; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -96,6 +98,7 @@ absolute_path_to/hudi-utilities-bundle_2.12-0.13.0-SNAPSHOT.jar \ public class GcsEventsSource extends RowSource { private final PubsubMessagesFetcher pubsubMessagesFetcher; + private final SchemaProvider schemaProvider; private final boolean 
ackMessages; private final List messagesToAck = new ArrayList<>(); @@ -121,6 +124,7 @@ public class GcsEventsSource extends RowSource { this.pubsubMessagesFetcher = pubsubMessagesFetcher; this.ackMessages = props.getBoolean(ACK_MESSAGES.key(), ACK_MESSAGES.defaultValue()); +this.schemaProvider = schemaProvider; LOG.info("Created GcsEventsSource"); } @@ -146,7 +150,12 @@ public class GcsEventsSource extends RowSource { LOG.info("Returning checkpoint value: " + CHECKPOINT_VALUE_ZERO); -return Pair.of(Option.of(sparkSession.read().json(eventRecords)), CHECKPOINT_VALUE_ZERO); +StructType sourceSchema = UtilHelpers.getSourceSchema(schemaProvider); +if (sourceSchema != null) { + return Pair.of(Option.of(sparkSession.read().schema(
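Note on the commit above: `UtilHelpers.getSourceSchema` returns a schema only when the provider supplies a real one, and the sources then read with that schema instead of calling `sparkSession.read().json(eventRecords)` directly. The following is a minimal sketch of that "use the explicit schema if present, otherwise fall back" decision in plain Java, assuming the fallback remains schema inference (the truncated diff does not show the else branch). `SchemaFallbackSketch`, `resolveSchema` and the string-based schemas are hypothetical stand-ins, not Hudi or Spark APIs.

```java
import java.util.function.Supplier;

// Hypothetical example class; schemas are modelled as plain strings for illustration.
final class SchemaFallbackSketch {

  // Stand-in for a "no schema" sentinel such as the InputBatch.NULL_SCHEMA check in the diff.
  // The diff compares references with !=; this sketch uses equals() because it works on strings.
  static final String NULL_SCHEMA = "null";

  // Returns the provided schema when it is usable; otherwise asks the fallback to infer one.
  static String resolveSchema(String providedSchema, Supplier<String> inferSchema) {
    if (providedSchema != null && !NULL_SCHEMA.equals(providedSchema)) {
      return providedSchema;
    }
    return inferSchema.get();
  }

  public static void main(String[] args) {
    System.out.println(resolveSchema("explicit", () -> "inferred")); // prints explicit
    System.out.println(resolveSchema(null, () -> "inferred"));       // prints inferred
  }
}
```

Taking the fallback as a `Supplier` means the inference step only runs when no usable schema was provided.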
[GitHub] [hudi] nsivabalan commented on pull request #9191: [HUDI-6530] Applying schema during ingestion using a schema provider for s3/gcs metadata job
nsivabalan commented on PR #9191: URL: https://github.com/apache/hudi/pull/9191#issuecomment-1636038995 CI failed due to a flaky test. Going ahead with landing.
[GitHub] [hudi] nsivabalan merged pull request #9191: [HUDI-6530] Applying schema during ingestion using a schema provider for s3/gcs metadata job
nsivabalan merged PR #9191: URL: https://github.com/apache/hudi/pull/9191
[jira] [Created] (HUDI-6535) Need a way to be able to schedule cleaner service inline and execute as a offline job
Amaresh Bingumalla created HUDI-6535: Summary: Need a way to be able to schedule cleaner service inline and execute as a offline job Key: HUDI-6535 URL: https://issues.apache.org/jira/browse/HUDI-6535 Project: Apache Hudi Issue Type: New Feature Reporter: Amaresh Bingumalla With the current Hudi version 0.13.1, there is no way to schedule the cleaner service as part of the writer job. The only options are to execute inline or to scheduleAndExecute as an offline job. An inline schedule-only option, similar to the one for compaction jobs, would help show when the cleaner service is required. Related compactor code - [https://github.com/apache/hudi/blob/51ddf1affcdead2e3b5e871ba4816c71e6f4b99a/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieCompactor.java#L194]
[GitHub] [hudi] hudi-bot commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction
hudi-bot commented on PR #9007: URL: https://github.com/apache/hudi/pull/9007#issuecomment-1635928412 ## CI report: * c221efd733a444258780949b698830c2cef47931 UNKNOWN * 78b7acc447a6cdadccf1b0ca57e1cc634233c879 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18233) * 26b3151e371774b3e99324bd9c305157fcde5789 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18585)
[GitHub] [hudi] hudi-bot commented on pull request #9198: [HUDI-6532] Fix a typo in BaseFlinkCommitActionExecutor.
hudi-bot commented on PR #9198: URL: https://github.com/apache/hudi/pull/9198#issuecomment-1635872857 ## CI report: * db352e825762702d4989dabd66472029303d5026 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18584)
[GitHub] [hudi] hudi-bot commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction
hudi-bot commented on PR #9007: URL: https://github.com/apache/hudi/pull/9007#issuecomment-1635861329 ## CI report: * c221efd733a444258780949b698830c2cef47931 UNKNOWN * 78b7acc447a6cdadccf1b0ca57e1cc634233c879 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18233) * 26b3151e371774b3e99324bd9c305157fcde5789 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #9197: [HUDI-6531] Little adjust to avoid creating an object but no need in one case
hudi-bot commented on PR #9197: URL: https://github.com/apache/hudi/pull/9197#issuecomment-1635850226 ## CI report: * 95b59f74ed1b5608e71bb03c0933bcc239e6d497 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18583)
[GitHub] [hudi] hudi-bot commented on pull request #9198: [HUDI-6532] Fix a typo in BaseFlinkCommitActionExecutor.
hudi-bot commented on PR #9198: URL: https://github.com/apache/hudi/pull/9198#issuecomment-1635709474 ## CI report: * db352e825762702d4989dabd66472029303d5026 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18584)
[jira] [Updated] (HUDI-6533) Glue Catalog Sync not working with 0.12.3.
[ https://issues.apache.org/jira/browse/HUDI-6533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aditya Goenka updated HUDI-6533: Priority: Blocker (was: Critical) > Glue Catalog Sync not working with 0.12.3. > -- > > Key: HUDI-6533 > URL: https://issues.apache.org/jira/browse/HUDI-6533 > Project: Apache Hudi > Issue Type: Bug > Components: meta-sync >Reporter: Aditya Goenka >Priority: Blocker > Fix For: 0.14.0 > > > Glue Catalog sync is broken with minor versions - 0.12.3 and 0.13.1 > Also not working with master. > Github Issue - [https://github.com/apache/hudi/issues/9134]
[jira] [Updated] (HUDI-6534) Spark Consistent Hashing row writer support
[ https://issues.apache.org/jira/browse/HUDI-6534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-6534: - Labels: pull-request-available (was: ) > Spark Consistent Hashing row writer support > --- > > Key: HUDI-6534 > URL: https://issues.apache.org/jira/browse/HUDI-6534 > Project: Apache Hudi > Issue Type: New Feature > Components: index, spark, writer-core >Reporter: Qijun Fu >Priority: Major > Labels: pull-request-available > > Spark Consistent Hashing row writer support
[GitHub] [hudi] stream2000 opened a new pull request, #9199: [HUDI-6534]Support consistent hashing row write
stream2000 opened a new pull request, #9199: URL: https://github.com/apache/hudi/pull/9199 ### Change Logs Support consistent hashing row writer ### Impact Support consistent hashing row writer ### Risk level (write none, low medium or high below) Medium; this will be enabled by default since the row writer is enabled by default ### Documentation Update Will update the documentation after landing ### Contributor's checklist - [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [x] Change Logs and Impact were stated clearly - [x] Adequate tests were added if applicable - [ ] CI passed
[jira] [Created] (HUDI-6534) Spark Consistent Hashing row writer support
Qijun Fu created HUDI-6534: -- Summary: Spark Consistent Hashing row writer support Key: HUDI-6534 URL: https://issues.apache.org/jira/browse/HUDI-6534 Project: Apache Hudi Issue Type: New Feature Components: index, spark, writer-core Reporter: Qijun Fu Spark Consistent Hashing row writer support
[GitHub] [hudi] ad1happy2go commented on issue #9134: [SUPPORT] Failed to sync hive metastore with Hudi 0.12.3 and AWS Glue 4.0 (Spark 3.3)
ad1happy2go commented on issue #9134: URL: https://github.com/apache/hudi/issues/9134#issuecomment-1635689522 @xmubeta I was able to reproduce this issue. It looks like a regression in 0.12.3 and 0.13.1. I created a critical JIRA to fix it - https://issues.apache.org/jira/browse/HUDI-6533
[jira] [Created] (HUDI-6533) Glue Catalog Sync not working with 0.12.3.
Aditya Goenka created HUDI-6533: --- Summary: Glue Catalog Sync not working with 0.12.3. Key: HUDI-6533 URL: https://issues.apache.org/jira/browse/HUDI-6533 Project: Apache Hudi Issue Type: Bug Components: meta-sync Reporter: Aditya Goenka Fix For: 0.14.0 Glue Catalog sync is broken with minor versions 0.12.3 and 0.13.1. It is also not working with master. Github Issue - [https://github.com/apache/hudi/issues/9134]
[GitHub] [hudi] hudi-bot commented on pull request #9198: [HUDI-6532] Fix a typo in BaseFlinkCommitActionExecutor.
hudi-bot commented on PR #9198: URL: https://github.com/apache/hudi/pull/9198#issuecomment-1635658437 ## CI report: * db352e825762702d4989dabd66472029303d5026 UNKNOWN
[jira] [Updated] (HUDI-6532) Fix a typo in BaseFlinkCommitActionExecutor.
[ https://issues.apache.org/jira/browse/HUDI-6532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-6532: - Labels: pull-request-available (was: ) > Fix a typo in BaseFlinkCommitActionExecutor. > > > Key: HUDI-6532 > URL: https://issues.apache.org/jira/browse/HUDI-6532 > Project: Apache Hudi > Issue Type: Improvement > Components: flink >Reporter: StarBoy1005 >Priority: Minor > Labels: pull-request-available > Attachments: image-2023-07-14-18-06-04-273.png > > > This code creates an Iterator object, so I think the word "upsetting" in the exception message is misleading. > !image-2023-07-14-18-06-04-273.png!
[GitHub] [hudi] hudi-bot commented on pull request #9197: [HUDI-6531] Little adjust to avoid creating an object but no need in one case
hudi-bot commented on PR #9197: URL: https://github.com/apache/hudi/pull/9197#issuecomment-1635645105 ## CI report: * 95b59f74ed1b5608e71bb03c0933bcc239e6d497 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18583)