[GitHub] [hudi] hudi-bot commented on pull request #9255: [HUDI-6503] Make TableServiceClient's txnManager consistent with Writ…
hudi-bot commented on PR #9255: URL: https://github.com/apache/hudi/pull/9255#issuecomment-1650991561

## CI report:

* 463953fa8ffd4e41dcc02c67cf931d894a12848d Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18838)

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] chandu-1101 commented on issue #9141: [BUG] Example from Hudi Quick start doesnt work!
chandu-1101 commented on issue #9141: URL: https://github.com/apache/hudi/issues/9141#issuecomment-1650981047

Wow! Wonderful. Thank you once again. I will put the flag check and get back.
[GitHub] [hudi] big-doudou commented on pull request #9182: [HUDI-6588] Fix duplicate fileId on TM partial-failover and recovery
big-doudou commented on PR #9182: URL: https://github.com/apache/hudi/pull/9182#issuecomment-1650926469

> Each failed attempt of a subtask would trigger invocation of `StreamWriteOperatorCoordinator#subtaskFailed`, the original write metadata would got cleaned,

`StreamWriteOperatorCoordinator#subtaskFailed` just sets `eventBuffer = null`. How does this affect metadata cleaning?
[GitHub] [hudi] JingFengWang opened a new issue, #9285: When compiling hudi-0.13.0, the package org.apache.http does not exist error is thrown
JingFengWang opened a new issue, #9285: URL: https://github.com/apache/hudi/issues/9285

**Steps to reproduce the behavior:**

**run command:** `mvn clean package -DskipTests -Dflink1.13 -Dscala-2.11`

**exception log:**

    [ERROR] COMPILATION ERROR :
    [ERROR] /D:/project/hudi/hudi-common/src/main/java/org/apache/hudi/common/table/view/RemoteHoodieTableFileSystemView.java:[46,23] package org.apache.http does not exist
    [ERROR] /D:/project/hudi/hudi-common/src/main/java/org/apache/hudi/common/table/view/PriorityBasedFileSystemView.java:[35,23] cannot find symbol
      symbol:   class HttpStatus
      location: package org.apache.http
    [INFO] 2 errors
    [INFO]
    [INFO] Reactor Summary for Hudi 0.14.0-SNAPSHOT:
    [INFO]
    [INFO] Hudi ............................. SUCCESS [  3.645 s]
    [INFO] hudi-tests-common ................ SUCCESS [  3.805 s]
    [INFO] hudi-common ...................... FAILURE [ 12.802 s]
    [INFO] hudi-hadoop-mr, hudi-sync-common, hudi-hive-sync, hudi-aws, hudi-timeline-service,
           hudi-client, hudi-client-common, hudi-spark-client, hudi-spark-datasource,
           hudi-spark-common_2.11, hudi-spark2_2.11, hudi-java-client, hudi-spark_2.11,
           hudi-gcp, hudi-utilities_2.11, hudi-utilities-bundle_2.11, hudi-cli,
           hudi-flink-client, hudi-datahub-sync, hudi-adb-sync, hudi-sync,
           hudi-hadoop-mr-bundle, hudi-datahub-sync-bundle, hudi-hive-sync-bundle,
           hudi-aws-bundle, hudi-gcp-bundle, hudi-spark2.4-bundle_2.11, hudi-presto-bundle,
           hudi-utilities-slim-bundle_2.11, hudi-timeline-server-bundle, hudi-trino-bundle,
           hudi-examples, hudi-examples-common, hudi-examples-spark, hudi-flink-datasource,
           hudi-flink1.13.x, hudi-flink, hudi-examples-flink, hudi-examples-java,
           hudi-flink1.14.x, hudi-flink1.15.x, hudi-flink1.16.x, hudi-flink1.17.x,
           hudi-kafka-connect, hudi-flink1.13-bundle, hudi-kafka-connect-bundle,
           hudi-cli-bundle_2.11, hudi-spark2-common .......... SKIPPED
    [INFO]
    [INFO] BUILD FAILURE
    [INFO]
    [INFO] Total time:  21.270 s
    [INFO] Finished at: 2023-07-26T10:12:26+08:00
    [INFO]
    [ERROR] Failed to execute goal
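The errors above mean `org.apache.http` classes (`HttpStatus` and friends) are missing from `hudi-common`'s compile classpath. As a hedged workaround sketch only — the actual root cause may be a profile, mirror, or shading issue, and the version numbers below are illustrative rather than taken from Hudi's build — one thing to try is declaring the Apache HttpComponents artifacts explicitly in `hudi-common/pom.xml`:

```xml
<!-- Hypothetical workaround: make org.apache.http available at compile time.
     The groupId/artifactIds are the real HttpComponents coordinates;
     the versions here are illustrative, not from Hudi's dependency management. -->
<dependency>
  <groupId>org.apache.httpcomponents</groupId>
  <artifactId>httpcore</artifactId>
  <version>4.4.13</version>
</dependency>
<dependency>
  <groupId>org.apache.httpcomponents</groupId>
  <artifactId>httpclient</artifactId>
  <version>4.5.13</version>
</dependency>
```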
[GitHub] [hudi] ad1happy2go commented on issue #9143: [SUPPORT] Failure to delete records with missing attributes from PostgresDebeziumSource
ad1happy2go commented on issue #9143: URL: https://github.com/apache/hudi/issues/9143#issuecomment-1650912511

@Sam-Serpoosh In this case we need to maintain global uniqueness, so a Global Index should be the right option. On a large dataset it might have downsides, but since the partition value is not coming through, we would need to do the delete lookup across the entire dataset anyway.
[GitHub] [hudi] Sam-Serpoosh commented on issue #9143: [SUPPORT] Failure to delete records with missing attributes from PostgresDebeziumSource
Sam-Serpoosh commented on issue #9143: URL: https://github.com/apache/hudi/issues/9143#issuecomment-1650903089

@ad1happy2go IIUC, here are the options:

- Leverage `REPLICA IDENTITY FULL`, which has some downsides as mentioned in the PG documentation and the article I shared in my earlier comment.
- Leverage `REPLICA IDENTITY USING ` as long as the field upon which we'd like to partition has a UNIQUE index in the upstream PG table.
- Leverage the `GLOBAL_BLOOM` indexing you mentioned.

Are there any downsides or trade-offs with the `GLOBAL_BLOOM` index type I should keep in mind? I'll try this approach as well on my end **without** `REPLICA IDENTITY FULL` and see how it goes. Thanks a lot!
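For reference, the `GLOBAL_BLOOM` option discussed above is selected purely through write configuration. A minimal sketch — both property keys are standard Hudi write configs, though exact defaults vary by release, and whether to flip the partition-path option depends on the behavior wanted when a record's partition value changes or is missing:

```properties
# Use a global bloom index: record keys are unique across all partitions,
# so an upsert/delete can locate a record without knowing its partition path.
hoodie.index.type=GLOBAL_BLOOM
# On a partition-value change, move the record to the incoming partition (true)
# or keep updating it under its original partition path (false).
hoodie.bloom.index.update.partition.path=true
```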
[GitHub] [hudi] danny0405 commented on pull request #9255: [HUDI-6503] Make TableServiceClient's txnManager consistent with Writ…
danny0405 commented on PR #9255: URL: https://github.com/apache/hudi/pull/9255#issuecomment-1650874179

Okay, a static lock mapping makes sense to me.
[GitHub] [hudi] Zouxxyy commented on pull request #9255: [HUDI-6503] Make TableServiceClient's txnManager consistent with Writ…
Zouxxyy commented on PR #9255: URL: https://github.com/apache/hudi/pull/9255#issuecomment-1650870862

> Can you elaborate a little more why the table service client can hold a separate lock ?

Because `InProcessLockProvider` is valid as long as it is in the same JVM process (see `static final Map LOCK_INSTANCE_PER_BASEPATH = new ConcurrentHashMap<>();`), other locks cannot be in the same JVM. Maybe I'm missing something. Of course, it is best to use the same lock manager, because it used to be like this before #6732. And CI seems to be stable now.
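To illustrate the point about `InProcessLockProvider`, here is a minimal, hypothetical sketch (not Hudi's actual class) of a JVM-scoped lock registry keyed by table base path: two clients in the same process that ask for the same base path contend on the same lock object, even if each constructs its own transaction manager.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class InProcessLockDemo {
    // One lock instance per table base path, shared by everyone in this JVM.
    static final Map<String, ReentrantReadWriteLock> LOCK_INSTANCE_PER_BASEPATH =
        new ConcurrentHashMap<>();

    static ReentrantReadWriteLock lockFor(String basePath) {
        // computeIfAbsent returns the existing instance on subsequent calls,
        // so separate txn managers for the same base path share one lock.
        return LOCK_INSTANCE_PER_BASEPATH.computeIfAbsent(
            basePath, p -> new ReentrantReadWriteLock());
    }

    public static void main(String[] args) {
        // Same base path -> identical lock object; different path -> different lock.
        System.out.println(lockFor("/tmp/tbl") == lockFor("/tmp/tbl"));   // true
        System.out.println(lockFor("/tmp/tbl") == lockFor("/tmp/other")); // false
    }
}
```

This is why, within one JVM, a table service client holding a "separate" lock provider instance can still be mutually exclusive with the write client.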
[GitHub] [hudi] hudi-bot commented on pull request #9255: [HUDI-6503] Make TableServiceClient's txnManager consistent with Writ…
hudi-bot commented on PR #9255: URL: https://github.com/apache/hudi/pull/9255#issuecomment-1650869802

## CI report:

* 4e64258913f8f19b139ab1407f0c08d812f65669 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18833)
* 463953fa8ffd4e41dcc02c67cf931d894a12848d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18838)
[GitHub] [hudi] hudi-bot commented on pull request #9255: [HUDI-6503] Make TableServiceClient's txnManager consistent with Writ…
hudi-bot commented on PR #9255: URL: https://github.com/apache/hudi/pull/9255#issuecomment-1650864450

## CI report:

* 4e64258913f8f19b139ab1407f0c08d812f65669 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18833)
* 463953fa8ffd4e41dcc02c67cf931d894a12848d UNKNOWN
[GitHub] [hudi] danny0405 commented on a diff in pull request #9229: [HUDI-6565] Spark offline compaction add failed retry mechanism
danny0405 commented on code in PR #9229: URL: https://github.com/apache/hudi/pull/9229#discussion_r1274291759

## hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieCompactor.java:

    @@ -101,6 +104,12 @@ public static class Config implements Serializable {
         public String runningMode = null;
         @Parameter(names = {"--strategy", "-st"}, description = "Strategy Class", required = false)
         public String strategyClassName = LogFileSizeBasedCompactionStrategy.class.getName();
    +    @Parameter(names = {"--job-max-processing-time-ms", "-mt"}, description = "Take effect when using --mode/-m execute or scheduleAndExecute. "
    +        + "If maxProcessingTimeMs passed but compaction job is still unfinished, hoodie would consider this job as failed and relaunch.")
    +    public long maxProcessingTimeMs = 0;
    +    @Parameter(names = {"--retry-last-failed-compaction-job", "-rc"}, description = "Take effect when using --mode/-m execute or scheduleAndExecute. "

Review Comment:

> the failed inflight compaction plan which will never been re-run

Can we fix that rollback by including the inflight compactions instead of introducing new config options?
[GitHub] [hudi] danny0405 commented on pull request #9182: [HUDI-6588] Fix duplicate fileId on TM partial-failover and recovery
danny0405 commented on PR #9182: URL: https://github.com/apache/hudi/pull/9182#issuecomment-1650862935

Each failed attempt of a subtask would trigger an invocation of `StreamWriteOperatorCoordinator#subtaskFailed`, and the original write metadata would get cleaned.
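A toy sketch of that cleanup behavior (a hypothetical class, not the actual `StreamWriteOperatorCoordinator`): the coordinator buffers per-subtask write-metadata events until commit, and discards the whole buffer when an attempt fails, so only the restarted attempt's re-sent events reach the commit.

```java
import java.util.Arrays;
import java.util.Objects;

public class CoordinatorSketch {
    // One slot per subtask, holding the write metadata reported for the
    // in-flight instant. Illustrative only; the real coordinator is richer.
    private final Object[] eventBuffer;

    public CoordinatorSketch(int parallelism) {
        this.eventBuffer = new Object[parallelism];
    }

    // A subtask reports its write metadata for the in-flight instant.
    public void handleWriteMetaEvent(int subtask, Object writeMetadata) {
        eventBuffer[subtask] = writeMetadata;
    }

    // On a failed attempt, discard everything buffered so far: the restarted
    // attempt re-sends its events, so stale metadata never reaches the commit.
    public void subtaskFailed(int subtask) {
        Arrays.fill(eventBuffer, null);
    }

    public long bufferedCount() {
        return Arrays.stream(eventBuffer).filter(Objects::nonNull).count();
    }
}
```

Under this model, "the metadata gets cleaned" simply means the buffered events are dropped before they are ever committed, which is how duplicate fileIds from a half-failed attempt are avoided.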
[GitHub] [hudi] danny0405 merged pull request #9274: [MINOR] fix millis append format error
danny0405 merged PR #9274: URL: https://github.com/apache/hudi/pull/9274
[hudi] branch master updated: [MINOR] Fix millis append format error (#9274)
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push: new 2a022393388 [MINOR] Fix millis append format error (#9274)

2a022393388 is described below

    commit 2a0223933884cb044e7aa56f205cae926358a030
    Author: KnightChess <981159...@qq.com>
    AuthorDate: Wed Jul 26 10:02:53 2023 +0800

        [MINOR] Fix millis append format error (#9274)

    ---
     .../apache/hudi/common/table/timeline/HoodieInstantTimeGenerator.java | 4 ++--
     1 file changed, 2 insertions(+), 2 deletions(-)

    diff --git a/hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieInstantTimeGenerator.java b/hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieInstantTimeGenerator.java
    index 5223227fce9..366d654bec1 100644
    --- a/hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieInstantTimeGenerator.java
    +++ b/hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieInstantTimeGenerator.java
    @@ -114,7 +114,7 @@ public class HoodieInstantTimeGenerator {
       /**
        * Creates an instant string given a valid date-time string.
    -   * @param dateString A date-time string in the format yyyy-MM-dd HH:mm:ss[:SSS]
    +   * @param dateString A date-time string in the format yyyy-MM-dd HH:mm:ss[.SSS]
        * @return A timeline instant
        * @throws ParseException If we cannot parse the date string
        */
    @@ -124,7 +124,7 @@ public class HoodieInstantTimeGenerator {
         } catch (Exception e) {
           // Attempt to add the milliseconds in order to complete parsing
           return getInstantFromTemporalAccessor(LocalDateTime.parse(
    -          String.format("%s:%s", dateString, DEFAULT_MILLIS_EXT), MILLIS_GRANULARITY_DATE_FORMATTER));
    +          String.format("%s.%s", dateString, DEFAULT_MILLIS_EXT), MILLIS_GRANULARITY_DATE_FORMATTER));
         }
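The one-character fix above matters because a millisecond-granularity formatter expects a dot, not a colon, before `SSS`. A self-contained sketch of the parse-and-fallback logic — the constants and method names here are illustrative, not copied from `HoodieInstantTimeGenerator`:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class InstantTimeDemo {
    // Formatter with mandatory millis; a seconds-only input fails to parse here.
    static final DateTimeFormatter MILLIS_FORMATTER =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");
    // Illustrative default extension appended when the input lacks millis.
    static final String DEFAULT_MILLIS_EXT = "000";

    static LocalDateTime parseInstant(String dateString) {
        try {
            return LocalDateTime.parse(dateString, MILLIS_FORMATTER);
        } catch (Exception e) {
            // The bug: "%s:%s" produced "...:000", which the pattern rejects;
            // "%s.%s" yields "....000", matching the ".SSS" suffix.
            return LocalDateTime.parse(
                String.format("%s.%s", dateString, DEFAULT_MILLIS_EXT),
                MILLIS_FORMATTER);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseInstant("2023-07-26 10:02:53"));     // falls back, appends ".000"
        System.out.println(parseInstant("2023-07-26 10:02:53.123")); // parses directly
    }
}
```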
[GitHub] [hudi] codope closed issue #8761: [SUPPORT] "Illegal Lambda Deserialization" When Leveraging PostgresDebeziumSource
codope closed issue #8761: [SUPPORT] "Illegal Lambda Deserialization" When Leveraging PostgresDebeziumSource URL: https://github.com/apache/hudi/issues/8761
[GitHub] [hudi] danny0405 commented on pull request #9255: [HUDI-6503] Make TableServiceClient's txnManager consistent with Writ…
danny0405 commented on PR #9255: URL: https://github.com/apache/hudi/pull/9255#issuecomment-1650853733

> I think about it again, and think that it’s okay to not pass the txtmanger

Can you elaborate a little more on why the table service client can hold a separate lock?
[GitHub] [hudi] danny0405 commented on a diff in pull request #9211: [HUDI-6540] Support failed writes clean policy for Flink
danny0405 commented on code in PR #9211: URL: https://github.com/apache/hudi/pull/9211#discussion_r1274284407

## hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/TestWriteCopyOnWrite.java:

    @@ -114,10 +116,28 @@ public void testCheckpointFails() throws Exception {
       }

       @Test
    -  public void testSubtaskFails() throws Exception {
    +  public void testSubtaskFailsWithEagerFailedWritesCleanPolicy() throws Exception {
    +    testSubtaskFails()
    +        // the last checkpoint instant was rolled back by subTaskFails(0, 2)
    +        // with EAGER cleaning strategy
    +        .assertNoEvent()

Review Comment: Can we add new tests instead of modifying existing one?
[GitHub] [hudi] hudi-bot commented on pull request #9277: [HUDI-6558] support SQL update for no-precombine field tables
hudi-bot commented on PR #9277: URL: https://github.com/apache/hudi/pull/9277#issuecomment-1650807747

## CI report:

* cf692aeb6c7774b01a236cf058225debb8caff53 UNKNOWN
* da9dd1fc203c01d0a000d49dcbd58a0a1d729354 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18832) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18837)
[GitHub] [hudi] hudi-bot commented on pull request #9277: [HUDI-6558] support SQL update for no-precombine field tables
hudi-bot commented on PR #9277: URL: https://github.com/apache/hudi/pull/9277#issuecomment-1650614871

## CI report:

* cf692aeb6c7774b01a236cf058225debb8caff53 UNKNOWN
* da9dd1fc203c01d0a000d49dcbd58a0a1d729354 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18832) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18837)
[GitHub] [hudi] kazdy commented on pull request #9277: [HUDI-6558] support SQL update for no-precombine field tables
kazdy commented on PR #9277: URL: https://github.com/apache/hudi/pull/9277#issuecomment-1650583696

@hudi-bot run azure
[GitHub] [hudi] hudi-bot commented on pull request #9276: Mor perf spark33
hudi-bot commented on PR #9276: URL: https://github.com/apache/hudi/pull/9276#issuecomment-1650565195

## CI report:

* 54a4e7e9aeabb42258e0d1f2b6cfa2960275c330 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18836)
[GitHub] [hudi] hudi-bot commented on pull request #9276: Mor perf spark33
hudi-bot commented on PR #9276: URL: https://github.com/apache/hudi/pull/9276#issuecomment-1650555772

## CI report:

* 37d3b9365a38e8f266c1c486e9d18c9ef34be2a0 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18808)
* 54a4e7e9aeabb42258e0d1f2b6cfa2960275c330 UNKNOWN
[GitHub] [hudi] bhasudha opened a new pull request, #9284: [DOCS] Change algolia search to leverage crawler instead of legacy do…
bhasudha opened a new pull request, #9284: URL: https://github.com/apache/hudi/pull/9284

…csearch

### Change Logs

Migrating to new crawler based search

### Impact

Can affect website search functionality

### Risk level (write none, low medium or high below)

low

### Documentation Update

_Describe any necessary documentation update if there is any new feature, config, or user-facing change_

- _The config description must be updated if new configs are added or the default value of the configs are changed_
- _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
[GitHub] [hudi] kkalanda-score closed pull request #9283: CI (Take 2)
kkalanda-score closed pull request #9283: CI (Take 2) URL: https://github.com/apache/hudi/pull/9283
[GitHub] [hudi] kkalanda-score opened a new pull request, #9283: CI (Take 2)
kkalanda-score opened a new pull request, #9283: URL: https://github.com/apache/hudi/pull/9283

(The PR description is the unfilled default template.)
[hudi] branch master updated (42799c0956f -> 03bc5549c7a)
This is an automated email from the ASF dual-hosted git repository. yihua pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git

    from 42799c0956f [HUDI-6438] Config parameter 'MAKE_NEW_COLUMNS_NULLABLE' to allow for marking a newly created column as nullable. (#9262)
     add 03bc5549c7a [HUDI-6509] Add GitHub CI for Java 17 (#9136)

No new revisions were added by this update.

Summary of changes:

     .github/workflows/bot.yml                          | 105 ++--
     .github/workflows/pr_compliance.yml                |   2 +-
     hudi-common/pom.xml                                |   7 +
     .../org/apache/hudi/avro/TestHoodieAvroUtils.java  |   8 +-
     .../common/fs/TestHoodieWrapperFileSystem.java     |  30 +++-
     .../common/functional/TestHoodieLogFormat.java     |  25 ++-
     .../TestHoodieLogFormatAppendFailure.java          |  10 +-
     .../hudi/common/testutils/HoodieTestUtils.java     |  29
     .../util/TestDFSPropertiesConfiguration.java       |  14 +-
     .../hudi/common/util/TestObjectSizeCalculator.java |  30 ++--
     .../spark/sql/hive/TestHiveClientUtils.scala       |  25 ++-
     packaging/bundle-validation/Dockerfile             |  25 +++
     packaging/bundle-validation/ci_run.sh              |   5 +-
     .../bundle-validation/conf/core-site.xml           |  14 +-
     .../bundle-validation/conf/hdfs-site.xml           |  25 ++-
     .../docker_java17/TestHiveClientUtils.scala        |  27 ++--
     .../docker_java17/docker_java17_test.sh            | 178 +
     packaging/bundle-validation/run_docker_java17.sh   | 116 ++
     packaging/bundle-validation/validate.sh            |   3 +-
     pom.xml                                            |  16 +-
     20 files changed, 605 insertions(+), 89 deletions(-)
     copy docker/hoodie/hadoop/hive_base/conf/hive-site.xml => packaging/bundle-validation/conf/core-site.xml (78%)
     copy hudi-flink-datasource/hudi-flink/src/test/resources/hive-site.xml => packaging/bundle-validation/conf/hdfs-site.xml (71%)
     copy hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/bootstrap/IndexRecord.java => packaging/bundle-validation/docker_java17/TestHiveClientUtils.scala (60%)
     create mode 100755 packaging/bundle-validation/docker_java17/docker_java17_test.sh
     create mode 100755 packaging/bundle-validation/run_docker_java17.sh
[GitHub] [hudi] yihua merged pull request #9136: [HUDI-6509] Add GitHub CI for Java 17
yihua merged PR #9136: URL: https://github.com/apache/hudi/pull/9136
[GitHub] [hudi] yihua commented on pull request #9136: [HUDI-6509] Add GitHub CI for Java 17
yihua commented on PR #9136: URL: https://github.com/apache/hudi/pull/9136#issuecomment-1650381811

CI has passed for [6b33d37](https://github.com/apache/hudi/pull/9136/commits/6b33d37bc57d2b5be3649590fee6767f34cccea3). No need to rerun CI again.
[jira] [Created] (HUDI-6591) getAllPartitionPaths perf fix need to account for parquet/orc partition path meta files
sivabalan narayanan created HUDI-6591:

    Summary: getAllPartitionPaths perf fix need to account for parquet/orc partition path meta files
    Key: HUDI-6591
    URL: https://issues.apache.org/jira/browse/HUDI-6591
    Project: Apache Hudi
    Issue Type: Bug
    Components: reader-core
    Reporter: sivabalan narayanan

[https://github.com/apache/hudi/pull/9121/files?diff=split=0#r1263994796]

We might need to follow up to ensure we don't break the parquet/orc partition meta file flows.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] yihua commented on a diff in pull request #9136: [HUDI-6509] Add GitHub CI for Java 17
yihua commented on code in PR #9136: URL: https://github.com/apache/hudi/pull/9136#discussion_r1273945242

## .github/workflows/bot.yml:

    @@ -112,6 +112,61 @@ jobs:
             run: mvn test -Pfunctional-tests -D"$SCALA_PROFILE" -D"$SPARK_PROFILE" -pl "$SPARK_COMMON_MODULES,$SPARK_MODULES" $MVN_ARGS

    +  test-spark-java17:
    +    runs-on: ubuntu-latest
    +    strategy:
    +      matrix:
    +        include:
    +          - scalaProfile: "scala-2.12"
    +            sparkProfile: "spark3.3"
    +            sparkModules: "hudi-spark-datasource/hudi-spark3.3.x"
    +          - scalaProfile: "scala-2.12"
    +            sparkProfile: "spark3.4"
    +            sparkModules: "hudi-spark-datasource/hudi-spark3.4.x"
    +
    +    steps:
    +      - uses: actions/checkout@v3
    +      - name: Set up JDK 8
    +        uses: actions/setup-java@v3
    +        with:
    +          java-version: '8'
    +          distribution: 'adopt'
    +          architecture: x64
    +      - name: Build Project
    +        env:
    +          SCALA_PROFILE: ${{ matrix.scalaProfile }}
    +          SPARK_PROFILE: ${{ matrix.sparkProfile }}
    +        run:
    +          mvn clean install -D"$SCALA_PROFILE" -D"$SPARK_PROFILE" -DskipTests=true $MVN_ARGS
    +      - name: Set up JDK 17
    +        uses: actions/setup-java@v3
    +        with:
    +          java-version: '17'
    +          distribution: 'adopt'
    +          architecture: x64
    +      - name: Quickstart Test
    +        env:
    +          SCALA_PROFILE: ${{ matrix.scalaProfile }}
    +          SPARK_PROFILE: ${{ matrix.sparkProfile }}
    +        run:
    +          mvn test -Punit-tests -Pjava17 -D"$SCALA_PROFILE" -D"$SPARK_PROFILE" -pl hudi-examples/hudi-examples-spark $MVN_ARGS
    +      - name: UT - Common & Spark
    +        env:
    +          SCALA_PROFILE: ${{ matrix.scalaProfile }}
    +          SPARK_PROFILE: ${{ matrix.sparkProfile }}
    +          SPARK_MODULES: ${{ matrix.sparkModules }}
    +        if: ${{ !endsWith(env.SPARK_PROFILE, '3.2') }} # skip test spark 3.2 as it's covered by Azure CI

Review Comment: nit: not required.

## .github/workflows/bot.yml:

    @@ -151,6 +206,34 @@ jobs:
             mvn clean install -Pintegration-tests -D"$SCALA_PROFILE" -D"$FLINK_PROFILE" -pl hudi-flink-datasource/hudi-flink -am -Davro.version=1.10.0 -DskipTests=true $MVN_ARGS
             mvn verify -Pintegration-tests -D"$SCALA_PROFILE" -D"$FLINK_PROFILE" -pl hudi-flink-datasource/hudi-flink $MVN_ARGS

    +  docker-java17-test:
    +    runs-on: ubuntu-latest
    +    strategy:
    +      matrix:
    +        include:
    +          - flinkProfile: 'flink1.17'
    +            sparkProfile: 'spark3.4'
    +            sparkRuntime: 'spark3.4.0'
    +
    +    steps:
    +      - uses: actions/checkout@v3
    +      - name: Set up JDK 8
    +        uses: actions/setup-java@v3
    +        with:
    +          java-version: '8'
    +          distribution: 'adopt'
    +          architecture: x64
    +      - name: UT/FT - Docker Test - OpenJDK 17
    +        env:
    +          FLINK_PROFILE: ${{ matrix.flinkProfile }}
    +          SPARK_PROFILE: ${{ matrix.sparkProfile }}
    +          SPARK_RUNTIME: ${{ matrix.sparkRuntime }}
    +          SCALA_PROFILE: 'scala-2.12'
    +        if: ${{ env.SPARK_PROFILE >= 'spark3.4' }} # Only support Spark 3.4 for now

Review Comment: nit: not required.

## packaging/bundle-validation/run_docker_java17.sh:

    +#!/bin/bash
    +
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements. See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership. The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License. You may obtain a copy of the License at
    +#
    +#   http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing,
    +# software distributed under the License is distributed on an
    +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    +# KIND, either express or implied. See the License for the
    +# specific language governing permissions and limitations
    +# under the License.
    +
    +echo "SPARK_RUNTIME: $SPARK_RUNTIME SPARK_PROFILE (optional): $SPARK_PROFILE"
    +echo "SCALA_PROFILE: $SCALA_PROFILE"
    +CONTAINER_NAME=hudi_docker
    +DOCKER_TEST_DIR=/opt/bundle-validation/docker-test
    +
    +# choose versions based on build profiles
    +if [[ ${SPARK_RUNTIME} == 'spark2.4.8' ]]; then
    +  HADOOP_VERSION=2.7.7
    +  HIVE_VERSION=2.3.9
    +  DERBY_VERSION=10.10.2.0
    +  FLINK_VERSION=1.13.6
    +  SPARK_VERSION=2.4.8
    +  SPARK_HADOOP_VERSION=2.7
    +  CONFLUENT_VERSION=5.5.12
    +  KAFKA_CONNECT_HDFS_VERSION=10.1.13
    +  IMAGE_TAG=flink1136hive239spark248
    +elif [[ ${SPARK_RUNTIME} == 'spark3.0.2' ]]; then
    +  HADOOP_VERSION=2.7.7
    +  HIVE_VERSION=3.1.3
    +  DERBY_VERSION=10.14.1.0
    +  FLINK_VERSION=1.14.6
    +  SPARK_VERSION=3.0.2
    +  SPARK_HADOOP_VERSION=2.7
    +  CONFLUENT_VERSION=5.5.12
    +  KAFKA_CONNECT_HDFS_VERSION=10.1.13
    +  IMAGE_TAG=flink1146hive313spark302
    +elif [[
[GitHub] [hudi] hudi-bot commented on pull request #9255: [HUDI-6503] Make TableServiceClient's txnManager consistent with Writ…
hudi-bot commented on PR #9255: URL: https://github.com/apache/hudi/pull/9255#issuecomment-1650373193

## CI report:

* 4e64258913f8f19b139ab1407f0c08d812f65669 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18833)

Bot commands

@hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] amrishlal commented on issue #7244: [SUPPORT] DBT Merge creates duplicates
amrishlal commented on issue #7244: URL: https://github.com/apache/hudi/issues/7244#issuecomment-1650363772

Verified on the latest master using the same model as @ad1happy2go above; the model ran successfully.

**DBT run**
```
amrish@Amrishs-MBP github-issue-7244 % dbt run
18:45:29  Running with dbt=1.5.3
18:45:29  [WARNING]: Deprecated functionality
The `source-paths` config has been renamed to `model-paths`. Please update your `dbt_project.yml` configuration to reflect this change.
18:45:29  [WARNING]: Deprecated functionality
The `data-paths` config has been renamed to `seed-paths`. Please update your `dbt_project.yml` configuration to reflect this change.
18:45:29  Registered adapter: spark=1.5.0
18:45:29  Found 1 model, 2 tests, 0 snapshots, 0 analyses, 357 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics, 0 groups
18:45:29
18:45:31  Concurrency: 1 threads (target='dev')
18:45:31
18:45:31  1 of 1 START sql incremental model default.issue_7244_model [RUN]
18:45:38  1 of 1 OK created sql incremental model default.issue_7244_model ... [OK in 7.93s]
18:45:39
18:45:39  Finished running 1 incremental model in 0 hours 0 minutes and 9.22 seconds (9.22s).
18:45:39
18:45:39  Completed successfully
18:45:39
18:45:39  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1
amrish@Amrishs-MBP github-issue-7244 %
```

**spark-sql verification**
```
spark-sql> show databases;
default
test_database1
Time taken: 2.562 seconds, Fetched 2 row(s)
spark-sql> use default
         > ;
23/07/25 11:47:20 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Time taken: 0.125 seconds
spark-sql> show tables;
issue_7244_model
my_first_dbt_model
my_first_dbt_model1
my_second_dbt_model
Time taken: 0.263 seconds, Fetched 4 row(s)
spark-sql> select * from issue_7244_model
         > ;
23/07/25 11:47:43 WARN DFSPropertiesConfiguration: Cannot find HUDI_CONF_DIR, please set it as the dir of hudi-defaults.conf
23/07/25 11:47:43 WARN DFSPropertiesConfiguration: Properties file file:/etc/hudi/conf/hudi-defaults.conf not found. Ignoring to load props file
20230725114531327 20230725114531327_1_4 2 fbb84dbc-e72f-4ac6-990a-d0205e2aaab3-0_1-33-0_20230725114531327.parquet 2 anyway 2023-07-25 11:45:31.367
20230725114531327 20230725114531327_2_5 3 c1d85730-7a1a-4845-bb4a-1b7128f6de3d-0_2-34-0_20230725114531327.parquet 3 bye 2023-07-25 11:45:31.367
20230725114531327 20230725114531327_0_6 1 1da126fe-eb3a-4982-ab77-f294458eefea-0_0-32-0_20230725114531327.parquet 1 yo 2023-07-25 11:45:31.367
Time taken: 4.461 seconds, Fetched 3 row(s)
```
[GitHub] [hudi] hudi-bot commented on pull request #9277: [HUDI-6558] support SQL update for no-precombine field tables
hudi-bot commented on PR #9277: URL: https://github.com/apache/hudi/pull/9277#issuecomment-1650303969

## CI report:

* cf692aeb6c7774b01a236cf058225debb8caff53 UNKNOWN
* da9dd1fc203c01d0a000d49dcbd58a0a1d729354 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18832)

Bot commands

@hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] chattarajoy commented on pull request #8795: [HUDI-6258] support olap engine query mor table in table name without ro/rt suffix
chattarajoy commented on PR #8795: URL: https://github.com/apache/hudi/pull/8795#issuecomment-1650303459

Is there a place where I can find the timeline on when this can possibly be released?
[GitHub] [hudi] xushiyan commented on a diff in pull request #9221: [HUDI-6550] Add Hadoop conf to HiveConf for HiveSyncConfig
xushiyan commented on code in PR #9221: URL: https://github.com/apache/hudi/pull/9221#discussion_r1273870474

## hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java:
```diff
@@ -98,8 +98,9 @@ public HiveSyncConfig(Properties props) {
   public HiveSyncConfig(Properties props, Configuration hadoopConf) {
     super(props, hadoopConf);
-    HiveConf hiveConf = hadoopConf instanceof HiveConf
-        ? (HiveConf) hadoopConf : new HiveConf(hadoopConf, HiveConf.class);
+    HiveConf hiveConf = new HiveConf();
+    // HiveConf needs to load Hadoop conf to allow instantiation via AWSGlueClientFactory
+    hiveConf.addResource(hadoopConf);
```

Review Comment: I think the ideal approach is to make the passed-in `hiveConf` load the Hadoop conf properly for `AWSGlueClientFactory` at the very beginning (when creating the hive sync config), so that nothing needs to be loaded at this point. cc @yihua
[GitHub] [hudi] rmnlchh commented on issue #9282: [ISSUE] Hudi 0.13.0. Spark 3.3.2 Deltastreamed table read failure
rmnlchh commented on issue #9282: URL: https://github.com/apache/hudi/issues/9282#issuecomment-1650228180

> @rmnlchh Just curious, did you set these configs
>
> ```
> sc.set("spark.sql.legacy.parquet.nanosAsLong", "false");
> sc.set("spark.sql.parquet.binaryAsString", "false");
> sc.set("spark.sql.parquet.int96AsTimestamp", "true");
> sc.set("spark.sql.caseSensitive", "false");
> ```
>
> with your deltastreamer also? I will try to reproduce this issue.

Yes, adding all the DS configs:

```
println(s"hoodieDeltaStreamerConfig=$hoodieDeltaStreamerConfig")
println(s"typedProperties=$typedProperties")
println("HERE JSC" + jsc.getConf.getAll.mkString)
val hoodieDeltaStreamer = new HoodieDeltaStreamer(hoodieDeltaStreamerConfig, jsc,
  FSUtils.getFs(hoodieDeltaStreamerConfig.targetBasePath, conf), jsc.hadoopConfiguration,
  org.apache.hudi.common.util.Option.of(typedProperties))
```

hoodieDeltaStreamerConfig=Config{targetBasePath='/XXX/cdp-datapipeline-curation/cdp-datapipeline-curation/datalake-deltastreamer/./tmp/CreativeDeltaStreamerTest/Domain=CampaignBuild/Table=published_creative/', targetTableName='published_creative', tableType='MERGE_ON_READ', baseFileFormat='PARQUET', propsFilePath='file://XXX/cdp-datapipeline-curation/cdp-datapipeline-curation/datalake-deltastreamer/src/test/resources/delta-streamer-config/dfs-source.properties', configs=[], sourceClassName='org.apache.hudi.utilities.sources.AvroKafkaSource', sourceOrderingField='AssetValue', payloadClassName='org.apache.hudi.common.model.OverwriteWithLatestAvroPayload', schemaProviderClassName='com.cardlytics.datapipeline.deltastreamer.schema.ResourceBasedSchemaProvider', transformerClassNames=[org.apache.hudi.utilities.transform.SqlQueryBasedTransformer], sourceLimit=9223372036854775807, operation=UPSERT, filterDupes=false, enableHiveSync=false, enableMetaSync=false, forceEmptyMetaSync=false, syncClientToolClassNames=org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool, maxPendingCompactions=5, maxPendingClustering=5,
continuousMode=false, minSyncIntervalSeconds=0, sparkMaster='', commitOnErrors=false, deltaSyncSchedulingWeight=1, compactSchedulingWeight=1, clusterSchedulingWeight=1, deltaSyncSchedulingMinShare=0, compactSchedulingMinShare=0, clusterSchedulingMinShare=0, forceDisableCompaction=true, checkpoint='null', initialCheckpointProvider='null', help=false} typedProperties={spark.sql.avro.compression.codec=snappy, hoodie.datasource.hive_sync.table=published_creative, hoodie.datasource.hive_sync.partition_fields=Entity, hoodie.metadata.index.column.stats.enable=false, hoodie.index.type=BLOOM, hoodie.datasource.write.reconcile.schema=true, hoodie.deltastreamer.schemaprovider.source.schema.file=domain/campaignbuild/schema/creative.avsc, bootstrap.servers=PLAINTEXT://localhost:34873, hoodie.compact.inline=false, hoodie.deltastreamer.transformer.sql= SELECT 'Creative' Entity ,o.CreativeId ,o.PreMessageImpression ,o.PostMessageImpression ,o.Assets.Type AssetType ,o.Assets.Slot AssetSlot ,o.Assets.Label AssetLabel ,o.Assets.Value AssetValue FROM (SELECT a.CreativeId, a.PreMessageImpression, a.PostMessageImpression, explode(a.Assets) Assets FROM a) o , hoodie.parquet.max.file.size=6291456, hoodie.datasource.write.recordkey.field=CreativeId,AssetSlot, hoodie.index.bloom.num_entries=6, hoodie.datasource.hive_sync.support_timestamp=true, hoodie.metadata.enable=false, schema.registry.url=http://localhost:34874, hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator, hoodie.datasource.write.table.type=MERGE_ON_READ, hoodie.deltastreamer.source.kafka.topic=CMPN-CmpnPub-AdServer-Creative, hoodie.datasource.write.hive_style_partitioning=true, hoodie.metadata.insert.parallelism=1, hoodie.deltastreamer.schemaprovider.spark_avro_post_processor.enable=false, hoodie.parquet.compression.codec=snappy, spark.io.compression.codec=snappy, hoodie.deltastreamer.schemaprovider.target.schema.file=domain/campaignbuild/schema/published_creative_table.json, 
hoodie.bloom.index.prune.by.ranges=true, hoodie.datasource.write.partitionpath.field=Entity, hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled=true, hoodie.parquet.block.size=6291456, hoodie.cleaner.fileversions.retained=2, hoodie.table.name=published_creative, hoodie.upsert.shuffle.parallelism=4, hoodie.meta.sync.client.tool.class=org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool, spark.sql.parquet.compression.codec=snappy, hoodie.datasource.write.precombine.field=AssetValue, hoodie.datasource.write.payload.class=org.apache.hudi.common.model.DefaultHoodieRecordPayload,
[GitHub] [hudi] ad1happy2go commented on issue #9282: [ISSUE] Hudi 0.13.0. Spark 3.3.2 Deltastreamed table read failure
ad1happy2go commented on issue #9282: URL: https://github.com/apache/hudi/issues/9282#issuecomment-1650223209

@rmnlchh Just curious, did you set these configs

```
sc.set("spark.sql.legacy.parquet.nanosAsLong", "false");
sc.set("spark.sql.parquet.binaryAsString", "false");
sc.set("spark.sql.parquet.int96AsTimestamp", "true");
sc.set("spark.sql.caseSensitive", "false");
```

with your deltastreamer also? I will try to reproduce this issue.
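For anyone wiring these up programmatically, the four workaround confs above can be kept in one place and rendered as repeated `--conf` arguments, so the interactive session and the Deltastreamer launch cannot drift apart. A minimal sketch; the dict and helper name are illustrative, not part of Hudi or Spark:

```python
# Hypothetical helper: keep the Parquet-compatibility workarounds in one place
# so every job (interactive reader and Deltastreamer) launches with identical settings.
PARQUET_COMPAT_CONFS = {
    "spark.sql.legacy.parquet.nanosAsLong": "false",
    "spark.sql.parquet.binaryAsString": "false",
    "spark.sql.parquet.int96AsTimestamp": "true",
    "spark.sql.caseSensitive": "false",
}

def as_submit_args(confs):
    """Render a conf dict as repeated spark-submit `--conf key=value` arguments."""
    args = []
    for key, value in sorted(confs.items()):
        args += ["--conf", f"{key}={value}"]
    return args
```

The same dict can also be applied in-process via `SparkConf.setAll` before building the session, which avoids setting the flags in two places.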
[jira] [Updated] (HUDI-6589) Upsert failing for array type if value given [null]
[ https://issues.apache.org/jira/browse/HUDI-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aditya Goenka updated HUDI-6589: Fix Version/s: (was: 0.12.1) > Upsert failing for array type if value given [null] > --- > > Key: HUDI-6589 > URL: https://issues.apache.org/jira/browse/HUDI-6589 > Project: Apache Hudi > Issue Type: Bug >Reporter: Aditya Goenka >Priority: Critical > > Hudi Upserts are failing when data in a nested field is [null], > Details in GitHub issue (see last comment) - > [https://github.com/apache/hudi/issues/9141] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (HUDI-6589) Upsert failing for array type if value given [null]
[ https://issues.apache.org/jira/browse/HUDI-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aditya Goenka closed HUDI-6589. --- Fix Version/s: 0.12.1 (was: 0.15.0) Resolution: Resolved > Upsert failing for array type if value given [null] > --- > > Key: HUDI-6589 > URL: https://issues.apache.org/jira/browse/HUDI-6589 > Project: Apache Hudi > Issue Type: Bug >Reporter: Aditya Goenka >Priority: Critical > Fix For: 0.12.1 > > > Hudi Upserts are failing when data in a nested field is [null], > Details in GitHub issue (see last comment) - > [https://github.com/apache/hudi/issues/9141]
[jira] [Commented] (HUDI-6589) Upsert failing for array type if value given [null]
[ https://issues.apache.org/jira/browse/HUDI-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747076#comment-17747076 ] Aditya Goenka commented on HUDI-6589: - Upon further investigation and debugging, it has been determined that to address the Avro-Parquet compatibility issue and allow arrays with null elements, you need to set the Spark configuration parameter spark.hadoop.parquet.avro.write-old-list-structure to false. This parameter controls how Avro arrays with null elements are written to Parquet. By default, they are written in a way that preserves their internal structure, which can cause compatibility problems with certain tools. Setting spark.hadoop.parquet.avro.write-old-list-structure to false enables support for arrays with null elements and ensures they are handled correctly during the write process. This was not a Hudi issue. I was able to insert the record you pasted by just setting this: --conf 'spark.hadoop.parquet.avro.write-old-list-structure=false' > Upsert failing for array type if value given [null] > --- > > Key: HUDI-6589 > URL: https://issues.apache.org/jira/browse/HUDI-6589 > Project: Apache Hudi > Issue Type: Bug >Reporter: Aditya Goenka >Priority: Critical > Fix For: 0.15.0 > > > Hudi Upserts are failing when data in a nested field is [null], > Details in GitHub issue (see last comment) - > [https://github.com/apache/hudi/issues/9141]
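The workaround above boils down to adding one `--conf` flag to the job launch. A small sketch of assembling such an invocation; the base command and jar name are placeholders, not from the original report:

```python
# Sketch: append the Avro/Parquet list-structure workaround described above
# to a spark-submit command line. The application jar is a placeholder.
def with_null_array_support(base_cmd, app_jar="my-hudi-upsert-job.jar"):
    """Return a new command list with the workaround conf and the app jar appended."""
    return list(base_cmd) + [
        "--conf", "spark.hadoop.parquet.avro.write-old-list-structure=false",
        app_jar,
    ]
```

Equivalently, the flag can be set on the session builder (`.config("spark.hadoop.parquet.avro.write-old-list-structure", "false")`) before the upsert runs.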
[jira] [Resolved] (HUDI-6589) Upsert failing for array type if value given [null]
[ https://issues.apache.org/jira/browse/HUDI-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aditya Goenka resolved HUDI-6589. - > Upsert failing for array type if value given [null] > --- > > Key: HUDI-6589 > URL: https://issues.apache.org/jira/browse/HUDI-6589 > Project: Apache Hudi > Issue Type: Bug >Reporter: Aditya Goenka >Priority: Critical > Fix For: 0.15.0 > > > Hudi Upserts are failing when data in a nested field is [null], > Details in GitHub issue (see last comment) - > [https://github.com/apache/hudi/issues/9141]
[GitHub] [hudi] bhasudha commented on issue #9282: [ISSUE] Hudi 0.13.0. Spark 3.3.2 Deltastreamed table read failure
bhasudha commented on issue #9282: URL: https://github.com/apache/hudi/issues/9282#issuecomment-1650203382

@yihua @ad1happy2go could you help reproduce and triage this further?
[GitHub] [hudi] hudi-bot commented on pull request #9105: [HUDI-6459] Add Rollback and multi-writer tests for Record Level Index
hudi-bot commented on PR #9105: URL: https://github.com/apache/hudi/pull/9105#issuecomment-1650200162

## CI report:

* 6bd80d5ce84b468293bc292f43dd0ca236c646d8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18830)

Bot commands

@hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] rmnlchh opened a new issue, #9282: [ISSUE] Hudi 0.13.0. Spark 3.3.2 Deltastreamed table read failure
rmnlchh opened a new issue, #9282: URL: https://github.com/apache/hudi/issues/9282 As part of our pipelines, we use tables that are being deltastreamed. Trying to upgrade to EMR 6.11 (which bring hudi 0.13.0/spark 3.3.2) we started facing issue which is discussed in https://github.com/apache/hudi/issues/8061#issuecomment-1447657892 The fix with sc.set("spark.sql.legacy.parquet.nanosAsLong", "false"); sc.set("spark.sql.parquet.binaryAsString", "false"); sc.set("spark.sql.parquet.int96AsTimestamp", "true"); sc.set("spark.sql.caseSensitive", "false"); worked for all the cases except for those where we query delta streamed tables. Steps to reproduce the behavior: 1. Use hudi 0.13.0, spark 3.3.2 2. Used spark configs: spark.shuffle.spill.compress -> true spark.serializer -> org.apache.spark.serializer.KryoSerializer spark.sql.warehouse.dir -> file:/XXX/cdp-datapipeline-curation/datalake-deltastreamer/spark-warehouse spark.sql.parquet.int96AsTimestamp -> true spark.io.compression.lz4.blockSize -> 64k spark.executor.extraJavaOptions -> -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED spark.driver.host -> 127.0.0.1 spark.sql.hive.convertMetastoreParquet -> false spark.broadcast.compress -> true spark.io.compression.codec -> snappy spark.sql.adaptive.skewJoin.enabled -> true spark.sql.parquet.binaryAsString -> 
false spark.driver.port -> 36083 spark.rdd.compress -> true spark.io.compression.zstd.level -> 1 spark.sql.caseSensitive -> false spark.shuffle.compress -> true spark.io.compression.zstd.bufferSize -> 64k spark.sql.catalog -> org.apache.spark.sql.hudi.catalog.HoodieCatalog spark.sql.parquet.int96RebaseModeInRead -> LEGACY spark.memory.storageFraction -> 0.20 spark.app.name -> CreativeDeltaStreamerTest-creative-deltastreamer-1689954313 spark.sql.parquet.datetimeRebaseModeInWrite -> LEGACY spark.sql.parquet.outputTimestampType -> TIMESTAMP_MICROS spark.sql.avro.datetimeRebaseModeInWrite -> LEGACY spark.sql.avro.compression.codec -> snappy spark.sql.legacy.parquet.nanosAsLong -> false spark.sql.extension -> org.apache.spark.sql.hudi.HoodieSparkSessionExtension spark.app.startTime -> 1689968713919 spark.executor.id -> driver spark.sql.parquet.enableVectorizedReader -> true spark.sql.legacy.timeParserPolicy -> LEGACY spark.driver.extraJavaOptions -> -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED spark.sql.parquet.datetimeRebaseModeInRead -> LEGACY spark.driver.memoryOverheadFactor -> 0.15 spark.master -> local[*] spark.sql.parquet.filterPushdown -> true spark.executor.cores -> 1 spark.memory.fraction -> 0.50 spark.sql.avro.datetimeRebaseModeInRead -> LEGACY spark.executor.memoryOverheadFactor -> 0.20 
spark.sql.parquet.compression.codec -> snappy spark.sql.parquet.recordLevelFilter.enabled -> true spark.app.id -> local-1689968714613 3. Used Delta streamer configs hoodie.datasource.hive_sync.database -> datalake_ods_local hoodie.datasource.hive_sync.support_timestamp -> true hoodie.datasource.write.precombine.field -> StartDateUtc hoodie.datasource.hive_sync.partition_fields -> CampaignId hoodie.metadata.index.column.stats.enable -> true hoodie.cleaner.fileversions.retained -> 2 hoodie.parquet.max.file.size -> 6291456 hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled -> true hoodie.bloom.index.prune.by.ranges -> true hoodie.parquet.block.size -> 6291456 hoodie.metadata.enable -> true hoodie.datasource.hive_sync.table
[GitHub] [hudi] hudi-bot commented on pull request #9246: [HUDI-6548] Two log compaction instants can be scheduled at the same time
hudi-bot commented on PR #9246: URL: https://github.com/apache/hudi/pull/9246#issuecomment-1650135501

## CI report:

* c2effa1ea1fdd82828efbf88afbf6cd6be019eb3 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18831)

Bot commands

@hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #9255: [HUDI-6503] Make TableServiceClient's txnManager consistent with Writ…
hudi-bot commented on PR #9255: URL: https://github.com/apache/hudi/pull/9255#issuecomment-1650135633

## CI report:

* 257b18bc9faffdf7d063fb153e5ee1b53d57 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18797)
* 4e64258913f8f19b139ab1407f0c08d812f65669 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18833)

Bot commands

@hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #9277: [HUDI-6558] support SQL update for no-precombine field tables
hudi-bot commented on PR #9277: URL: https://github.com/apache/hudi/pull/9277#issuecomment-1650121068

## CI report:

* d062a4c9cecf2f35a2f07a046a4139c7d0aea301 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18828)
* cf692aeb6c7774b01a236cf058225debb8caff53 UNKNOWN
* da9dd1fc203c01d0a000d49dcbd58a0a1d729354 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18832)

Bot commands

@hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #9255: [HUDI-6503] Make TableServiceClient's txnManager consistent with Writ…
hudi-bot commented on PR #9255: URL: https://github.com/apache/hudi/pull/9255#issuecomment-1650120837

## CI report:

* 257b18bc9faffdf7d063fb153e5ee1b53d57 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18797)
* 4e64258913f8f19b139ab1407f0c08d812f65669 UNKNOWN

Bot commands

@hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #9277: [HUDI-6558] support SQL update for no-precombine field tables
hudi-bot commented on PR #9277: URL: https://github.com/apache/hudi/pull/9277#issuecomment-1650105696

## CI report:

* d062a4c9cecf2f35a2f07a046a4139c7d0aea301 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18828)
* cf692aeb6c7774b01a236cf058225debb8caff53 UNKNOWN
* da9dd1fc203c01d0a000d49dcbd58a0a1d729354 UNKNOWN

Bot commands

@hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
[jira] [Updated] (HUDI-6590) Improve BigQuery Sync Schema and Partition Handling
[ https://issues.apache.org/jira/browse/HUDI-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Brown updated HUDI-6590: Summary: Improve BigQuery Sync Schema and Partition Handling (was: Improve BigQuery Sync Support) > Improve BigQuery Sync Schema and Partition Handling > --- > > Key: HUDI-6590 > URL: https://issues.apache.org/jira/browse/HUDI-6590 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Timothy Brown >Assignee: Timothy Brown >Priority: Minor > > Add features for Schema evolution and listing only required base files while > querying the table to cut down on BigQuery usage costs.
[jira] [Assigned] (HUDI-6590) Improve BigQuery Sync Support
[ https://issues.apache.org/jira/browse/HUDI-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Brown reassigned HUDI-6590: --- Assignee: Timothy Brown > Improve BigQuery Sync Support > - > > Key: HUDI-6590 > URL: https://issues.apache.org/jira/browse/HUDI-6590 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Timothy Brown >Assignee: Timothy Brown >Priority: Minor > > Add features for Schema evolution and listing only required base files while > querying the table to cut down on BigQuery usage costs.
[jira] [Created] (HUDI-6590) Improve BigQuery Sync Support
Timothy Brown created HUDI-6590: --- Summary: Improve BigQuery Sync Support Key: HUDI-6590 URL: https://issues.apache.org/jira/browse/HUDI-6590 Project: Apache Hudi Issue Type: Improvement Reporter: Timothy Brown Add features for Schema evolution and listing only required base files while querying the table to cut down on BigQuery usage costs.
[GitHub] [hudi] hudi-bot commented on pull request #9277: [HUDI-6558] support SQL update for no-precombine field tables
hudi-bot commented on PR #9277: URL: https://github.com/apache/hudi/pull/9277#issuecomment-1650029611

## CI report:

* d062a4c9cecf2f35a2f07a046a4139c7d0aea301 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18828)
* cf692aeb6c7774b01a236cf058225debb8caff53 UNKNOWN

Bot commands

@hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #9212: [HUDI-6541] Multiple writers should create new and different instant time to avoid marker conflict of same instant
hudi-bot commented on PR #9212: URL: https://github.com/apache/hudi/pull/9212#issuecomment-1650029030

## CI report:

* 32766783236e3f0b5adcc973a77ff9cf782726e5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18829)

Bot commands

@hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #9277: [HUDI-6558] support SQL update for no-precombine field tables
hudi-bot commented on PR #9277: URL: https://github.com/apache/hudi/pull/9277#issuecomment-1649902580

## CI report:

* d062a4c9cecf2f35a2f07a046a4139c7d0aea301 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18828)

Bot commands

@hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] ad1happy2go commented on issue #7244: [SUPPORT] DBT Merge creates duplicates
ad1happy2go commented on issue #7244: URL: https://github.com/apache/hudi/issues/7244#issuecomment-1649877906 @faizhasan @rshanmugam1 Apologies for the delay here. I tried to reproduce this and found that it is working fine. I tried with version 0.12.1. The model I used, exactly like the one in the ticket:

```
{{ config(
    materialized = 'incremental',
    incremental_strategy = 'merge',
    file_format = 'hudi',
    options={
        'type': 'cow',
        'primaryKey': 'id',
        'preCombineKey': 'ts',
    },
    unique_key = 'id',
    location_root='file:///tmp/dbt/issue_7244_1/'
) }}

{% if not is_incremental() %}

select cast(1 as bigint) as id, 'yo' as msg, current_timestamp() as ts
union all
select cast(2 as bigint) as id, 'anyway' as msg, current_timestamp() as ts
union all
select cast(3 as bigint) as id, 'bye' as msg, current_timestamp() as ts

{% else %}

select cast(1 as bigint) as id, 'yo_updated' as msg, current_timestamp() as ts
union all
select cast(2 as bigint) as id, 'anyway_updated' as msg, current_timestamp() as ts
union all
select cast(3 as bigint) as id, 'bye_updated' as msg, current_timestamp() as ts

{% endif %}
```

Here are the results after the first and second run: ![image](https://github.com/apache/hudi/assets/63430370/1d8b2c1e-7bee-44ff-a146-b62e45227c90)
[GitHub] [hudi] hudi-bot commented on pull request #9246: [HUDI-6548] Two log compaction instants can be scheduled at the same time
hudi-bot commented on PR #9246: URL: https://github.com/apache/hudi/pull/9246#issuecomment-1649827320 ## CI report: * 136d780d0a9ca38f88c613433f05f868be01d0d5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18734) * c2effa1ea1fdd82828efbf88afbf6cd6be019eb3 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18831)
[GitHub] [hudi] hudi-bot commented on pull request #9105: [HUDI-6459] Add Rollback and multi-writer tests for Record Level Index
hudi-bot commented on PR #9105: URL: https://github.com/apache/hudi/pull/9105#issuecomment-1649826685 ## CI report: * 5611851113d971d2f76fe2072ca87c1df0eae6ea Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18404) * 6bd80d5ce84b468293bc292f43dd0ca236c646d8 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18830)
[GitHub] [hudi] hudi-bot commented on pull request #9246: [HUDI-6548] Two log compaction instants can be scheduled at the same time
hudi-bot commented on PR #9246: URL: https://github.com/apache/hudi/pull/9246#issuecomment-1649811839 ## CI report: * 136d780d0a9ca38f88c613433f05f868be01d0d5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18734) * c2effa1ea1fdd82828efbf88afbf6cd6be019eb3 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #9105: [HUDI-6459] Add Rollback and multi-writer tests for Record Level Index
hudi-bot commented on PR #9105: URL: https://github.com/apache/hudi/pull/9105#issuecomment-1649811247 ## CI report: * 5611851113d971d2f76fe2072ca87c1df0eae6ea Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18404) * 6bd80d5ce84b468293bc292f43dd0ca236c646d8 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #9211: [HUDI-6540] Support failed writes clean policy for Flink
hudi-bot commented on PR #9211: URL: https://github.com/apache/hudi/pull/9211#issuecomment-1649795214 ## CI report: * b6afe889ca6b47f4d1d934bb552cc1c489f9d0af UNKNOWN * f8607c6bd9ecf09e8da2d6b372a80eff2221108d Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18826)
[GitHub] [hudi] lokeshj1703 commented on a diff in pull request #9246: [HUDI-6548] Two log compaction instants can be scheduled at the same time
lokeshj1703 commented on code in PR #9246: URL: https://github.com/apache/hudi/pull/9246#discussion_r1273472332

hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieDefaultTimeline.java:

```diff
@@ -277,7 +278,7 @@ public HoodieTimeline getCommitsTimeline() {
   /**
    * Get all instants (commits, delta commits, replace, compaction) that produce new data or merge file, in the active timeline.
    */
   public HoodieTimeline getCommitsAndCompactionTimeline() {
-    return getTimelineOfActions(CollectionUtils.createSet(COMMIT_ACTION, DELTA_COMMIT_ACTION, REPLACE_COMMIT_ACTION, COMPACTION_ACTION));
+    return getTimelineOfActions(CollectionUtils.createSet(COMMIT_ACTION, DELTA_COMMIT_ACTION, REPLACE_COMMIT_ACTION, COMPACTION_ACTION, LOG_COMPACTION_ACTION));
   }
```

Review Comment: @nsivabalan This was added as part of https://github.com/apache/hudi/pull/9038. Seems like this API should also consider inflight log compaction. I have removed it from this PR but if it makes sense I will create a separate PR for it.
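The behavior under review can be sketched as a small illustrative model (plain Python, not Hudi's actual API): filtering a timeline by a set of action types silently drops any instant whose action is missing from the set, which is why including the log-compaction action changes what the commits-and-compaction view returns.

```python
# Hypothetical model of filtering a timeline by action type.
# A timeline is a list of (instant_time, action) pairs.

COMMIT = "commit"
DELTA_COMMIT = "deltacommit"
REPLACE_COMMIT = "replacecommit"
COMPACTION = "compaction"
LOG_COMPACTION = "logcompaction"

def timeline_of_actions(timeline, actions):
    """Keep only instants whose action is in the requested set."""
    return [(ts, action) for ts, action in timeline if action in actions]

timeline = [
    ("001", COMMIT),
    ("002", DELTA_COMMIT),
    ("003", LOG_COMPACTION),
    ("004", COMPACTION),
]

# Without LOG_COMPACTION in the set, instant 003 is silently dropped...
without = timeline_of_actions(
    timeline, {COMMIT, DELTA_COMMIT, REPLACE_COMMIT, COMPACTION})
# ...while including it keeps the log-compaction instant visible.
with_lc = timeline_of_actions(
    timeline, {COMMIT, DELTA_COMMIT, REPLACE_COMMIT, COMPACTION, LOG_COMPACTION})
```

Whether the real API should see inflight log compactions is exactly the open question in the review above.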
[GitHub] [hudi] zaza commented on issue #6900: [SUPPORT]Hudi Failed to read MARKERS file
zaza commented on issue #6900: URL: https://github.com/apache/hudi/issues/6900#issuecomment-1649736303 This is definitely still an issue; we were hit by an error that looks identical to what @umehrot2 reported a while ago:

```
ERROR UpsertPartitioner: Error trying to compute average bytes/record
org.apache.hudi.exception.HoodieIOException: Could not read commit details from s3://tasktop-data-platform-dev-analytical-data/simulator/workstreams/.hoodie/20230714152804208.commit
  at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.readDataFromPath(HoodieActiveTimeline.java:824) ~[org.apache.hudi_hudi-spark3.3-bundle_2.12-0.13.1.jar:0.13.1]
  at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.getInstantDetails(HoodieActiveTimeline.java:310) ~[org.apache.hudi_hudi-spark3.3-bundle_2.12-0.13.1.jar:0.13.1]
  at org.apache.hudi.common.table.timeline.HoodieDefaultTimeline.getInstantDetails(HoodieDefaultTimeline.java:438) ~[org.apache.hudi_hudi-spark3.3-bundle_2.12-0.13.1.jar:0.13.1]
  at org.apache.hudi.table.action.commit.UpsertPartitioner.averageBytesPerRecord(UpsertPartitioner.java:380) ~[org.apache.hudi_hudi-spark3.3-bundle_2.12-0.13.1.jar:0.13.1]
  at org.apache.hudi.table.action.commit.UpsertPartitioner.assignInserts(UpsertPartitioner.java:169) ~[org.apache.hudi_hudi-spark3.3-bundle_2.12-0.13.1.jar:0.13.1]
  at org.apache.hudi.table.action.commit.UpsertPartitioner.<init>(UpsertPartitioner.java:98) ~[org.apache.hudi_hudi-spark3.3-bundle_2.12-0.13.1.jar:0.13.1]
  at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.getUpsertPartitioner(BaseSparkCommitActionExecutor.java:404) ~[org.apache.hudi_hudi-spark3.3-bundle_2.12-0.13.1.jar:0.13.1]
  at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.getPartitioner(BaseSparkCommitActionExecutor.java:224) ~[org.apache.hudi_hudi-spark3.3-bundle_2.12-0.13.1.jar:0.13.1]
  at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.execute(BaseSparkCommitActionExecutor.java:170) ~[org.apache.hudi_hudi-spark3.3-bundle_2.12-0.13.1.jar:0.13.1]
  at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.execute(BaseSparkCommitActionExecutor.java:83) ~[org.apache.hudi_hudi-spark3.3-bundle_2.12-0.13.1.jar:0.13.1]
  at org.apache.hudi.table.action.commit.BaseWriteHelper.write(BaseWriteHelper.java:68) ~[org.apache.hudi_hudi-spark3.3-bundle_2.12-0.13.1.jar:0.13.1]
  at org.apache.hudi.table.action.commit.SparkUpsertCommitActionExecutor.execute(SparkUpsertCommitActionExecutor.java:44) ~[org.apache.hudi_hudi-spark3.3-bundle_2.12-0.13.1.jar:0.13.1]
  at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.upsert(HoodieSparkCopyOnWriteTable.java:107) ~[org.apache.hudi_hudi-spark3.3-bundle_2.12-0.13.1.jar:0.13.1]
  at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.upsert(HoodieSparkCopyOnWriteTable.java:96) ~[org.apache.hudi_hudi-spark3.3-bundle_2.12-0.13.1.jar:0.13.1]
  at org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:140) ~[org.apache.hudi_hudi-spark3.3-bundle_2.12-0.13.1.jar:0.13.1]
  at org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:214) ~[org.apache.hudi_hudi-spark3.3-bundle_2.12-0.13.1.jar:0.13.1]
  at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:372) ~[org.apache.hudi_hudi-spark3.3-bundle_2.12-0.13.1.jar:0.13.1]
  at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:150) ~[org.apache.hudi_hudi-spark3.3-bundle_2.12-0.13.1.jar:0.13.1]
  at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:47) ~[spark-sql_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75) ~[spark-sql_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73) ~[spark-sql_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84) ~[spark-sql_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
  at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:104) ~[spark-sql_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
  at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107) ~[spark-catalyst_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
  at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:224) ~[spark-sql_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
  at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:114)
```
[GitHub] [hudi] hudi-bot commented on pull request #9280: [HUDI-6587] Handle hollow commit for time travel query
hudi-bot commented on PR #9280: URL: https://github.com/apache/hudi/pull/9280#issuecomment-1649724112 ## CI report: * db146be5542714a978e1d6fcdbd146e2aa834931 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18825)
[GitHub] [hudi] hudi-bot commented on pull request #9212: [HUDI-6541] Multiple writers should create new and different instant time to avoid marker conflict of same instant
hudi-bot commented on PR #9212: URL: https://github.com/apache/hudi/pull/9212#issuecomment-1649723729 ## CI report: * f494be8b2b8d4e9d5a6d595eea8bc907602efd35 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18823) * 32766783236e3f0b5adcc973a77ff9cf782726e5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18829)
[GitHub] [hudi] hudi-bot commented on pull request #9212: [HUDI-6541] Multiple writers should create new and different instant time to avoid marker conflict of same instant
hudi-bot commented on PR #9212: URL: https://github.com/apache/hudi/pull/9212#issuecomment-1649706487 ## CI report: * f494be8b2b8d4e9d5a6d595eea8bc907602efd35 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18823) * 32766783236e3f0b5adcc973a77ff9cf782726e5 UNKNOWN
[jira] [Updated] (HUDI-6589) Upsert failing for array type if value given [null]
[ https://issues.apache.org/jira/browse/HUDI-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aditya Goenka updated HUDI-6589:
Priority: Critical (was: Major)

> Upsert failing for array type if value given [null]
> ---------------------------------------------------
>
> Key: HUDI-6589
> URL: https://issues.apache.org/jira/browse/HUDI-6589
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Aditya Goenka
> Priority: Critical
> Fix For: 0.15.0
>
> Hudi upserts are failing when data in a nested field is [null].
> Details in the GitHub issue (see last comment): https://github.com/apache/hudi/issues/9141

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-6589) Upsert failing for array type if value given [null]
Aditya Goenka created HUDI-6589:

Summary: Upsert failing for array type if value given [null]
Key: HUDI-6589
URL: https://issues.apache.org/jira/browse/HUDI-6589
Project: Apache Hudi
Issue Type: Bug
Reporter: Aditya Goenka
Fix For: 0.15.0

Hudi upserts are failing when data in a nested field is [null].
Details in the GitHub issue (see last comment): https://github.com/apache/hudi/issues/9141
[GitHub] [hudi] hudi-bot commented on pull request #9277: [HUDI-6558] support SQL update for no-precombine field tables
hudi-bot commented on PR #9277: URL: https://github.com/apache/hudi/pull/9277#issuecomment-1649630856 ## CI report: * 4d363f192f951fb54799602270fb0ca16ce19d39 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18812) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18827) * d062a4c9cecf2f35a2f07a046a4139c7d0aea301 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18828)
[GitHub] [hudi] hudi-bot commented on pull request #9277: [HUDI-6558] support SQL update for no-precombine field tables
hudi-bot commented on PR #9277: URL: https://github.com/apache/hudi/pull/9277#issuecomment-1649620494 ## CI report: * 4d363f192f951fb54799602270fb0ca16ce19d39 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18812) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18827) * d062a4c9cecf2f35a2f07a046a4139c7d0aea301 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #9209: [HUDI-6539] New LSM tree style archived timeline
hudi-bot commented on PR #9209: URL: https://github.com/apache/hudi/pull/9209#issuecomment-1649620034 ## CI report: * 8f2dc4ec3e26f1908ae5d15f194bf70ca7dab27e UNKNOWN * a7f8558aaffdaab4850780224e1385c3e682372a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18824)
[GitHub] [hudi] hudi-bot commented on pull request #9277: [HUDI-6558] support SQL update for no-precombine field tables
hudi-bot commented on PR #9277: URL: https://github.com/apache/hudi/pull/9277#issuecomment-1649609922 ## CI report: * 4d363f192f951fb54799602270fb0ca16ce19d39 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18812) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18827)
[GitHub] [hudi] hudi-bot commented on pull request #9212: [HUDI-6541] Multiple writers should create new and different instant time to avoid marker conflict of same instant
hudi-bot commented on PR #9212: URL: https://github.com/apache/hudi/pull/9212#issuecomment-1649609463 ## CI report: * f494be8b2b8d4e9d5a6d595eea8bc907602efd35 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18823)
[GitHub] [hudi] kazdy commented on pull request #9277: [HUDI-6558] support SQL update for no-precombine field tables
kazdy commented on PR #9277: URL: https://github.com/apache/hudi/pull/9277#issuecomment-1649530790 @hudi-bot run azure
[GitHub] [hudi] SteNicholas commented on pull request #9211: [HUDI-6540] Support failed writes clean policy for Flink
SteNicholas commented on PR #9211: URL: https://github.com/apache/hudi/pull/9211#issuecomment-1649522478 @danny0405, could you take a look at this pull request?
[GitHub] [hudi] hudi-bot commented on pull request #9211: [HUDI-6540] Support failed writes clean policy for Flink
hudi-bot commented on PR #9211: URL: https://github.com/apache/hudi/pull/9211#issuecomment-1649517541 ## CI report: * b6afe889ca6b47f4d1d934bb552cc1c489f9d0af UNKNOWN * 278399029bc5dc3ab81d0366b65aaed3cf019b7c Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18821) * f8607c6bd9ecf09e8da2d6b372a80eff2221108d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18826)
[GitHub] [hudi] hudi-bot commented on pull request #9211: [HUDI-6540] Support failed writes clean policy for Flink
hudi-bot commented on PR #9211: URL: https://github.com/apache/hudi/pull/9211#issuecomment-1649500610 ## CI report: * b6afe889ca6b47f4d1d934bb552cc1c489f9d0af UNKNOWN * d8e39cb69480b8eb9014f09f6b84e741b9092a9f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18635) * 278399029bc5dc3ab81d0366b65aaed3cf019b7c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18821) * f8607c6bd9ecf09e8da2d6b372a80eff2221108d UNKNOWN
[GitHub] [hudi] Zouxxyy opened a new pull request, #9281: [WIP] Add HooideTable in BaseHoodieClient
Zouxxyy opened a new pull request, #9281: URL: https://github.com/apache/hudi/pull/9281

### Change Logs
_Describe context and summary for this change. Highlight if any code was copied._

### Impact
_Describe any public API or user-facing feature change or any performance impact._

### Risk level (write none, low medium or high below)
_If medium or high, explain what verification was done to mitigate the risks._

### Documentation Update
_Describe any necessary documentation update if there is any new feature, config, or user-facing change_
- _The config description must be updated if new configs are added or the default value of the configs are changed_
- _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._

### Contributor's checklist
- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
[GitHub] [hudi] stream2000 commented on pull request #9211: [HUDI-6540] Support failed writes clean policy for Flink
stream2000 commented on PR #9211: URL: https://github.com/apache/hudi/pull/9211#issuecomment-1649457125 > even if the failed writes clean policy could be inferred from optimistic concurrent control is enabled, this support has no conflict with the inference. Do you mean that even if we can infer the lazy clean config, we still won't add the clean operator to the pipeline, so we still need this PR?
[GitHub] [hudi] adityaverma1997 commented on issue #9257: [SUPPORT] Parquet files got cleaned up even when cleaning operation failed hence leading to subsequent failed clustering and cleaning
adityaverma1997 commented on issue #9257: URL: https://github.com/apache/hudi/issues/9257#issuecomment-1649440527 Correct me if I am wrong here: although I am running async cleaning, the cleaning frequency is controlled by the following Hudi configuration: ``` hoodie.clean.max.commits ``` which is set to 10 in my case, so the cleaner will get scheduled and executed after every 10th commit. On the other hand, the number of commits retained when cleaning is executed is controlled by the configuration below: ``` hoodie.cleaner.commits.retained ``` I have set it to 2, so it will retain the latest 2 commits and clean the remaining commits on every cleaning execution. Looking forward to your reply @danny0405 and @ad1happy2go
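An illustrative model (hypothetical Python, not Hudi code) of how the two settings described above interact: one config decides *when* a clean is triggered, the other decides *how much* survives each clean.

```python
# Hypothetical simulation of the cleaner scheduling described in the comment.
CLEAN_MAX_COMMITS = 10   # hoodie.clean.max.commits: clean after every 10th commit
COMMITS_RETAINED = 2     # hoodie.cleaner.commits.retained: keep only latest 2

def run_pipeline(total_commits):
    commits, since_clean, cleans = [], 0, 0
    for n in range(1, total_commits + 1):
        commits.append(n)           # a new commit lands on the timeline
        since_clean += 1
        if since_clean >= CLEAN_MAX_COMMITS:
            commits = commits[-COMMITS_RETAINED:]  # clean everything older
            since_clean = 0
            cleans += 1
    return commits, cleans

# After 20 commits: cleans fire at commit 10 and commit 20,
# each leaving only the latest 2 commits behind.
retained, cleans = run_pipeline(20)
```

With these values, 20 commits trigger exactly two cleans, and only commits 19 and 20 remain afterwards, which matches the schedule the commenter describes.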
[GitHub] [hudi] hudi-bot commented on pull request #9280: [HUDI-6587] Handle hollow commit for time travel query
hudi-bot commented on PR #9280: URL: https://github.com/apache/hudi/pull/9280#issuecomment-1649431922 ## CI report: * db146be5542714a978e1d6fcdbd146e2aa834931 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18825)
[GitHub] [hudi] hudi-bot commented on pull request #9274: [MINOR] fix millis append format error
hudi-bot commented on PR #9274: URL: https://github.com/apache/hudi/pull/9274#issuecomment-1649431799 ## CI report: * 94d9dbcb05d1505d4a1d5e82dca8a8ba946f47da Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18806) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18818)
[GitHub] [hudi] SteNicholas commented on a diff in pull request #9211: [HUDI-6540] Support failed writes clean policy for Flink
SteNicholas commented on code in PR #9211: URL: https://github.com/apache/hudi/pull/9211#discussion_r1273233193

hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableSink.java:

```diff
@@ -95,6 +95,9 @@ public SinkRuntimeProvider getSinkRuntimeProvider(Context context) {
   DataStream pipeline = Pipelines.append(conf, rowType, dataStream, context.isBounded());
   if (OptionsResolver.needsAsyncClustering(conf)) {
     return Pipelines.cluster(conf, rowType, pipeline);
+  } else if (OptionsResolver.isLazyFailedWritesCleanPolicy(conf)) {
```

Review Comment: @stream2000, thanks for the reminder. I have modified `HoodieFlinkStreamer`.
[GitHub] [hudi] hudi-bot commented on pull request #9280: [HUDI-6587] Handle hollow commit for time travel query
hudi-bot commented on PR #9280: URL: https://github.com/apache/hudi/pull/9280#issuecomment-1649418587 ## CI report: * db146be5542714a978e1d6fcdbd146e2aa834931 UNKNOWN
[GitHub] [hudi] xushiyan commented on a diff in pull request #9221: [HUDI-6550] Add Hadoop conf to HiveConf for HiveSyncConfig
xushiyan commented on code in PR #9221: URL: https://github.com/apache/hudi/pull/9221#discussion_r1273223195

hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java:

```diff
@@ -98,8 +98,9 @@ public HiveSyncConfig(Properties props) {
   public HiveSyncConfig(Properties props, Configuration hadoopConf) {
     super(props, hadoopConf);
-    HiveConf hiveConf = hadoopConf instanceof HiveConf
-        ? (HiveConf) hadoopConf : new HiveConf(hadoopConf, HiveConf.class);
+    HiveConf hiveConf = new HiveConf();
+    // HiveConf needs to load Hadoop conf to allow instantiation via AWSGlueClientFactory
+    hiveConf.addResource(hadoopConf);
```

Review Comment: not so sure if this is equivalent to holding the original `hadoopConf`, as this changes the order of addResources() during construction. We should be good only if we can verify the equivalence.
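The ordering concern in this review can be illustrated with a toy layered-configuration model (hypothetical Python, not the actual Hadoop/Hive `Configuration` API): when later resources override earlier ones, adding the Hadoop conf before versus after the Hive defaults can yield different effective values for overlapping keys.

```python
# Hypothetical model of a layered configuration where later resources win.
class LayeredConf:
    def __init__(self):
        self.layers = []

    def add_resource(self, mapping):
        """Append a resource; keys it defines shadow earlier layers."""
        self.layers.append(dict(mapping))

    def get(self, key):
        for layer in reversed(self.layers):  # search newest layer first
            if key in layer:
                return layer[key]
        return None

hive_defaults = {"hive.metastore.uris": "thrift://default:9083"}
hadoop_conf = {"hive.metastore.uris": "thrift://cluster:9083"}

# Order A: Hive defaults first, then the Hadoop conf added afterwards.
a = LayeredConf()
a.add_resource(hive_defaults)
a.add_resource(hadoop_conf)

# Order B: the Hadoop conf first, then Hive defaults layered on top.
b = LayeredConf()
b.add_resource(hadoop_conf)
b.add_resource(hive_defaults)
```

In this model the two orders resolve the same key to different values, which is the equivalence the reviewer wants verified before accepting the change.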
[GitHub] [hudi] xushiyan commented on pull request #9221: [HUDI-6550] Add Hadoop conf to HiveConf for HiveSyncConfig
xushiyan commented on PR #9221: URL: https://github.com/apache/hudi/pull/9221#issuecomment-1649394492 > Hi @xushiyan, I noticed the casting from hadoopConf to hiveConf was introduced by this PR from you(#6202) but I couldn't find any context. Could you help me learn why we made that change? > > ``` > HiveConf hiveConf = hadoopConf instanceof HiveConf > ? (HiveConf) hadoopConf : new HiveConf(hadoopConf, HiveConf.class); > ``` hey @CTTY it was probably meant to stay fully compatible with the original code, since that change was a refactoring.
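[Editor's note] The reviewer's concern above is about resource ordering: in Hadoop-style configuration, resources added later override earlier (non-final) properties, so `new HiveConf(hadoopConf, HiveConf.class)` and `new HiveConf(); hiveConf.addResource(hadoopConf)` may resolve conflicting keys differently. A minimal, self-contained sketch of that layering model (plain maps standing in for Hadoop `Configuration`; all values here are illustrative, not taken from a real hive-site.xml):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal model of Hadoop-style resource layering: each resource is a map of
// properties, and a resource added later overrides earlier (non-final) values.
// The two HiveConf construction paths differ in whether hive-site.xml is
// layered before or after the passed-in hadoopConf.
public class ResourceLayering {

  @SafeVarargs
  public static String resolve(String key, Map<String, String>... resources) {
    Map<String, String> merged = new LinkedHashMap<>();
    for (Map<String, String> resource : resources) {
      merged.putAll(resource); // later resource wins for conflicting keys
    }
    return merged.get(key);
  }

  public static void main(String[] args) {
    String key = "hive.metastore.client.factory.class";
    Map<String, String> hiveSite = Map.of(key, "DefaultFactory");
    Map<String, String> hadoopConf = Map.of(key, "AWSGlueClientFactory");

    // Order A: hadoopConf first, hive-site layered after -> hive-site wins.
    System.out.println(resolve(key, hadoopConf, hiveSite)); // DefaultFactory
    // Order B: hive-site first, hadoopConf layered after -> hadoopConf wins.
    System.out.println(resolve(key, hiveSite, hadoopConf)); // AWSGlueClientFactory
  }
}
```

If the two construction paths layer the same resources in a different order, any key defined in both places can resolve differently, which is why verifying equivalence (rather than assuming it) is the safe call here.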
[jira] [Updated] (HUDI-6587) Handle hollow commit for time travel query
[ https://issues.apache.org/jira/browse/HUDI-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-6587: - Labels: pull-request-available (was: ) > Handle hollow commit for time travel query > -- > > Key: HUDI-6587 > URL: https://issues.apache.org/jira/browse/HUDI-6587 > Project: Apache Hudi > Issue Type: Improvement > Components: reader-core >Reporter: Raymond Xu >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] xushiyan opened a new pull request, #9280: [HUDI-6587] Handle hollow commit for time travel query
xushiyan opened a new pull request, #9280: URL: https://github.com/apache/hudi/pull/9280 ### Change Logs Fail the time-travel query when the given timestamp covers any hollow commit. ### Impact Time travel query behavior. Time travel usually won't cover hollow commits, which mostly exist within a recent time frame. ### Risk level Low ### Documentation Update NA ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
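[Editor's note] For readers skimming the thread: a "hollow" commit is a pending instant that has completed instants after it on the timeline, so a snapshot as of a later timestamp could silently miss its data. A minimal, self-contained sketch of the check the PR description implies (class and method names are illustrative, not Hudi's actual API; instants are modeled as sortable timestamp strings):

```java
import java.util.List;
import java.util.TreeSet;

public class HollowCommitCheck {

  /**
   * Returns true if a time-travel query as of `queryTs` would cover a hollow
   * commit: a pending instant at or before queryTs that already has a
   * completed instant after it on the timeline.
   */
  public static boolean coversHollowCommit(List<String> completedInstants,
                                           List<String> pendingInstants,
                                           String queryTs) {
    TreeSet<String> completed = new TreeSet<>(completedInstants);
    for (String pending : pendingInstants) {
      // pending instant within the query window, with a later completed instant
      if (pending.compareTo(queryTs) <= 0 && completed.higher(pending) != null) {
        return true; // query result could change once `pending` completes
      }
    }
    return false;
  }

  public static void main(String[] args) {
    List<String> completed = List.of("001", "003");
    List<String> pending = List.of("002"); // hollow: 003 completed after it
    System.out.println(coversHollowCommit(completed, pending, "003")); // true -> fail the query
    System.out.println(coversHollowCommit(completed, pending, "001")); // false -> safe to serve
  }
}
```

Under this sketch, failing the query (rather than serving it) when the check returns true matches the behavior the Change Logs describe.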
[GitHub] [hudi] stream2000 commented on a diff in pull request #9211: [HUDI-6540] Support failed writes clean policy for Flink
stream2000 commented on code in PR #9211: URL: https://github.com/apache/hudi/pull/9211#discussion_r1273176064 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableSink.java: ## @@ -95,6 +95,9 @@ public SinkRuntimeProvider getSinkRuntimeProvider(Context context) { DataStream pipeline = Pipelines.append(conf, rowType, dataStream, context.isBounded()); if (OptionsResolver.needsAsyncClustering(conf)) { return Pipelines.cluster(conf, rowType, pipeline); +} else if (OptionsResolver.isLazyFailedWritesCleanPolicy(conf)) { Review Comment: Should we also modify `HoodieFlinkStreamer` here?
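[Editor's note] The diff above chooses the tail of the Flink sink pipeline based on resolver checks against the write config. A hedged sketch of that dispatch pattern (the predicates and returned labels are stand-ins, not Hudi's real `OptionsResolver`/`Pipelines` API; `hoodie.cleaner.policy.failed.writes` is the config key the LAZY policy lives under):

```java
import java.util.Map;

public class SinkPipelineChooser {

  // Illustrative stand-ins for OptionsResolver predicates on a Flink config.
  static boolean needsAsyncClustering(Map<String, String> conf) {
    return "true".equals(conf.get("clustering.async.enabled"));
  }

  static boolean isLazyFailedWritesCleanPolicy(Map<String, String> conf) {
    return "LAZY".equals(conf.get("hoodie.cleaner.policy.failed.writes"));
  }

  /** Mirrors the if / else-if shape of the diffed getSinkRuntimeProvider. */
  public static String choosePipelineTail(Map<String, String> conf) {
    if (needsAsyncClustering(conf)) {
      return "cluster";     // append the async clustering operator
    } else if (isLazyFailedWritesCleanPolicy(conf)) {
      return "lazy-clean";  // LAZY policy: defer failed-write cleaning
    }
    return "clean";         // default tail
  }

  public static void main(String[] args) {
    Map<String, String> conf = Map.of("hoodie.cleaner.policy.failed.writes", "LAZY");
    System.out.println(choosePipelineTail(conf)); // lazy-clean
  }
}
```

The review question follows naturally from this shape: any other entry point that wires the same pipeline (here, `HoodieFlinkStreamer`) needs the same branch or the policies diverge.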
[GitHub] [hudi] hudi-bot commented on pull request #9209: [HUDI-6539] New LSM tree style archived timeline
hudi-bot commented on PR #9209: URL: https://github.com/apache/hudi/pull/9209#issuecomment-1649344787 ## CI report: * 8f2dc4ec3e26f1908ae5d15f194bf70ca7dab27e UNKNOWN * c281ded6d554350dfe362cce496d6d72cfe0bbbe Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18822) * a7f8558aaffdaab4850780224e1385c3e682372a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18824)
[GitHub] [hudi] ksmou commented on a diff in pull request #9229: [HUDI-6565] Spark offline compaction add failed retry mechanism
ksmou commented on code in PR #9229: URL: https://github.com/apache/hudi/pull/9229#discussion_r1273169390 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieCompactor.java: ## @@ -101,6 +104,12 @@ public static class Config implements Serializable { public String runningMode = null; @Parameter(names = {"--strategy", "-st"}, description = "Strategy Class", required = false) public String strategyClassName = LogFileSizeBasedCompactionStrategy.class.getName(); +@Parameter(names = {"--job-max-processing-time-ms", "-mt"}, description = "Take effect when using --mode/-m execute or scheduleAndExecute. " ++ "If maxProcessingTimeMs passed but compaction job is still unfinished, hoodie would consider this job as failed and relaunch.") +public long maxProcessingTimeMs = 0; +@Parameter(names = {"--retry-last-failed-compaction-job", "-rc"}, description = "Take effect when using --mode/-m execute or scheduleAndExecute. " Review Comment: Yes, we need it to process failed inflight compaction plans, which would never be re-run by default.
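[Editor's note] The two new flags in the diff describe a timeout-and-relaunch loop around the compaction job. A minimal sketch of what such a wrapper might look like (the loop and helper names are illustrative, not the PR's actual implementation; only the flag semantics mirror the diff):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class CompactionRetryRunner {

  /**
   * Runs `job`, treating it as failed if it exceeds maxProcessingTimeMs
   * (0 means no time limit), and relaunches it up to maxRetries times --
   * the behavior sketched by --job-max-processing-time-ms and
   * --retry-last-failed-compaction-job.
   */
  public static boolean runWithRetry(Callable<Boolean> job, long maxProcessingTimeMs, int maxRetries)
      throws InterruptedException {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      for (int attempt = 0; attempt <= maxRetries; attempt++) {
        Future<Boolean> future = executor.submit(job);
        try {
          boolean ok = (maxProcessingTimeMs > 0)
              ? future.get(maxProcessingTimeMs, TimeUnit.MILLISECONDS)
              : future.get();
          if (ok) {
            return true;
          }
          // job reported failure: fall through and relaunch
        } catch (TimeoutException e) {
          future.cancel(true); // overran its budget: treat as failed, relaunch
        } catch (ExecutionException e) {
          // job threw: fall through and relaunch
        }
      }
      return false; // retries exhausted
    } finally {
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    final int[] attempts = {0};
    // Simulated job: fails on the first attempt, succeeds on the second.
    boolean ok = runWithRetry(() -> ++attempts[0] >= 2, 1000, 3);
    System.out.println(ok + " after " + attempts[0] + " attempts"); // true after 2 attempts
  }
}
```

This also explains the reviewer exchange above: without such a retry path, an inflight compaction plan left by a crashed job would sit on the timeline and never be re-executed by the default offline compactor.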
[GitHub] [hudi] big-doudou commented on issue #8892: [SUPPORT] [BUG] Duplicate fileID ??? from bucket ?? of partition found during the BucketStreamWriteFunction index bootstrap.
big-doudou commented on issue #8892: URL: https://github.com/apache/hudi/issues/8892#issuecomment-1649335027 > What if this is a new task that has not yet had a successful checkpoint?
[GitHub] [hudi] hudi-bot commented on pull request #9209: [HUDI-6539] New LSM tree style archived timeline
hudi-bot commented on PR #9209: URL: https://github.com/apache/hudi/pull/9209#issuecomment-1649331921 ## CI report: * 8f2dc4ec3e26f1908ae5d15f194bf70ca7dab27e UNKNOWN * c281ded6d554350dfe362cce496d6d72cfe0bbbe Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18822) * a7f8558aaffdaab4850780224e1385c3e682372a UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #9209: [HUDI-6539] New LSM tree style archived timeline
hudi-bot commented on PR #9209: URL: https://github.com/apache/hudi/pull/9209#issuecomment-1649319585 ## CI report: * 9889e40cdf17f6f24ddefff010a063d4dd2c58e7 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18820) * 8f2dc4ec3e26f1908ae5d15f194bf70ca7dab27e UNKNOWN * c281ded6d554350dfe362cce496d6d72cfe0bbbe Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18822) * a7f8558aaffdaab4850780224e1385c3e682372a UNKNOWN
[GitHub] [hudi] stream2000 commented on a diff in pull request #9199: [HUDI-6534]Support consistent hashing row writer
stream2000 commented on code in PR #9199: URL: https://github.com/apache/hudi/pull/9199#discussion_r1273148438 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/execution/bulkinsert/ConsistentBucketIndexBulkInsertPartitionerWithRows.java: ## @@ -0,0 +1,154 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hudi.execution.bulkinsert; + +import org.apache.hudi.common.model.ConsistentHashingNode; +import org.apache.hudi.common.model.HoodieConsistentHashingMetadata; +import org.apache.hudi.common.model.HoodieTableType; +import org.apache.hudi.common.util.Option; +import org.apache.hudi.common.util.ValidationUtils; +import org.apache.hudi.index.bucket.ConsistentBucketIdentifier; +import org.apache.hudi.index.bucket.ConsistentBucketIndexUtils; +import org.apache.hudi.index.bucket.HoodieSparkConsistentBucketIndex; +import org.apache.hudi.keygen.BuiltinKeyGenerator; +import org.apache.hudi.keygen.factory.HoodieSparkKeyGeneratorFactory; +import org.apache.hudi.table.BulkInsertPartitioner; +import org.apache.hudi.table.ConsistentHashingBucketInsertPartitioner; +import org.apache.hudi.table.HoodieTable; + +import org.apache.spark.Partitioner; +import org.apache.spark.api.java.JavaRDD; +import org.apache.spark.sql.Dataset; +import org.apache.spark.sql.Row; + +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.stream.Collectors; + +import scala.Tuple2; + +/** + * Bulk_insert partitioner of Spark row using consistent hashing bucket index. 
+ */ +public class ConsistentBucketIndexBulkInsertPartitionerWithRows +implements BulkInsertPartitioner<Dataset<Row>>, ConsistentHashingBucketInsertPartitioner { + + private final HoodieTable table; + + private final String indexKeyFields; + + private final List<String> fileIdPfxList = new ArrayList<>(); + private final Map<String, List<ConsistentHashingNode>> hashingChildrenNodes; + + private Map<String, ConsistentBucketIdentifier> partitionToIdentifier; + + private final Option<BuiltinKeyGenerator> keyGeneratorOpt; + + private Map<String, Map<String, Integer>> partitionToFileIdPfxIdxMap; + + private final RowRecordKeyExtractor extractor; + + public ConsistentBucketIndexBulkInsertPartitionerWithRows(HoodieTable table, boolean populateMetaFields) { +this.indexKeyFields = table.getConfig().getBucketIndexHashField(); +this.table = table; +this.hashingChildrenNodes = new HashMap<>(); +if (!populateMetaFields) { + this.keyGeneratorOpt = HoodieSparkKeyGeneratorFactory.getKeyGenerator(table.getConfig().getProps()); +} else { + this.keyGeneratorOpt = Option.empty(); +} +this.extractor = RowRecordKeyExtractor.getRowRecordKeyExtractor(populateMetaFields, keyGeneratorOpt); + ValidationUtils.checkArgument(table.getMetaClient().getTableType().equals(HoodieTableType.MERGE_ON_READ), Review Comment: Yes, we do a dual write during consistent hashing bucket index resizing, but CoW tables do not support writing logs. And it's a little hard to move this to a parent class, since the closest common parent of the two consistent hashing partitioners is `BulkInsertPartitioner`.
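[Editor's note] For context on the partitioner above: a consistent hashing bucket index places hash nodes on a ring, and each record key is routed to the first node clockwise from its hash, which in turn determines the file-id prefix the row is written under. A simplified, self-contained sketch of that lookup (the ring layout, hash range, and names are illustrative, not Hudi's `ConsistentBucketIdentifier`):

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class ConsistentBucketLookup {

  // Hash ring: ring position -> fileId prefix of the bucket anchored there.
  private final TreeMap<Integer, String> ring = new TreeMap<>();

  public void addNode(int ringValue, String fileIdPfx) {
    ring.put(ringValue, fileIdPfx);
  }

  /** Route a hash value to the first node clockwise from it (wrapping around). */
  public String bucketForHash(int hash) {
    SortedMap<Integer, String> tail = ring.tailMap(hash);
    return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
  }

  /** Route a record key by hashing it into an illustrative [0, 1000) range. */
  public String bucketFor(String recordKey) {
    return bucketForHash((recordKey.hashCode() & Integer.MAX_VALUE) % 1000);
  }

  public static void main(String[] args) {
    ConsistentBucketLookup idx = new ConsistentBucketLookup();
    idx.addNode(250, "fileId-a");
    idx.addNode(750, "fileId-b");
    System.out.println(idx.bucketForHash(100)); // fileId-a
    System.out.println(idx.bucketForHash(300)); // fileId-b
    System.out.println(idx.bucketForHash(800)); // fileId-a (wraps past the last node)
  }
}
```

The MERGE_ON_READ precondition in the listing fits this picture: resizing splits or merges ring nodes, and the dual write that resizing requires is only expressible as log files, which CoW tables cannot produce.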
[GitHub] [hudi] voonhous commented on issue #8892: [SUPPORT] [BUG] Duplicate fileID ??? from bucket ?? of partition found during the BucketStreamWriteFunction index bootstrap.
voonhous commented on issue #8892: URL: https://github.com/apache/hudi/issues/8892#issuecomment-1649313876 Yeap, we ensured that has happened. In our internal version, a rollback is performed to remove all the files that were written before the checkpoint. After which, a write is performed again from the last successful checkpoint. I'll do a check on this again on the community's master version later in the week. Sorry.
[jira] [Updated] (HUDI-6588) Fix duplicate fileId on TM partial-failover and recovery
[ https://issues.apache.org/jira/browse/HUDI-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-6588: - Labels: pull-request-available (was: ) > Fix duplicate fileId on TM partial-failover and recovery > > > Key: HUDI-6588 > URL: https://issues.apache.org/jira/browse/HUDI-6588 > Project: Apache Hudi > Issue Type: Bug > Components: flink >Reporter: Danny Chen >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > >
[GitHub] [hudi] weimingdiit commented on a diff in pull request #9252: [HUDI-6500] Fix bug when Using the RuntimeReplaceable function in the…
weimingdiit commented on code in PR #9252: URL: https://github.com/apache/hudi/pull/9252#discussion_r1273127379 ## hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieAnalysis.scala: ## @@ -391,63 +392,65 @@ case class ResolveImplementationsEarly() extends Rule[LogicalPlan] { case class ResolveImplementations() extends Rule[LogicalPlan] { override def apply(plan: LogicalPlan): LogicalPlan = { -plan match { - // Convert to MergeIntoHoodieTableCommand - case mit@MatchMergeIntoTable(target@ResolvesToHudiTable(_), _, _) if mit.resolved => -MergeIntoHoodieTableCommand(mit.asInstanceOf[MergeIntoTable]) - - // Convert to UpdateHoodieTableCommand - case ut@UpdateTable(plan@ResolvesToHudiTable(_), _, _) if ut.resolved => -UpdateHoodieTableCommand(ut) - - // Convert to DeleteHoodieTableCommand - case dft@DeleteFromTable(plan@ResolvesToHudiTable(_), _) if dft.resolved => -DeleteHoodieTableCommand(dft) - - // Convert to CompactionHoodieTableCommand - case ct @ CompactionTable(plan @ ResolvesToHudiTable(table), operation, options) if ct.resolved => -CompactionHoodieTableCommand(table, operation, options) - - // Convert to CompactionHoodiePathCommand - case cp @ CompactionPath(path, operation, options) if cp.resolved => -CompactionHoodiePathCommand(path, operation, options) - - // Convert to CompactionShowOnTable - case csot @ CompactionShowOnTable(plan @ ResolvesToHudiTable(table), limit) if csot.resolved => -CompactionShowHoodieTableCommand(table, limit) - - // Convert to CompactionShowHoodiePathCommand - case csop @ CompactionShowOnPath(path, limit) if csop.resolved => -CompactionShowHoodiePathCommand(path, limit) - - // Convert to HoodieCallProcedureCommand - case c @ CallCommand(_, _) => -val procedure: Option[Procedure] = loadProcedure(c.name) -val input = buildProcedureArgs(c.args) -if (procedure.nonEmpty) { - CallProcedureHoodieCommand(procedure.get, input) -} else { - c -} - - // Convert to CreateIndexCommand - case ci @ 
CreateIndex(plan @ ResolvesToHudiTable(table), indexName, indexType, ignoreIfExists, columns, options, output) => -// TODO need to resolve columns -CreateIndexCommand(table, indexName, indexType, ignoreIfExists, columns, options, output) - - // Convert to DropIndexCommand - case di @ DropIndex(plan @ ResolvesToHudiTable(table), indexName, ignoreIfNotExists, output) if di.resolved => -DropIndexCommand(table, indexName, ignoreIfNotExists, output) - - // Convert to ShowIndexesCommand - case si @ ShowIndexes(plan @ ResolvesToHudiTable(table), output) if si.resolved => -ShowIndexesCommand(table, output) - - // Covert to RefreshCommand - case ri @ RefreshIndex(plan @ ResolvesToHudiTable(table), indexName, output) if ri.resolved => -RefreshIndexCommand(table, indexName, output) - - case _ => plan +AnalysisHelper.allowInvokingTransformsInAnalyzer { + plan match { +// Convert to MergeIntoHoodieTableCommand Review Comment: > And can you also check the test failures OK, in my local env the UTs pass; I will take a closer look at the UT failures.
[jira] [Created] (HUDI-6588) Fix duplicate fileId on TM partial-failover and recovery
Danny Chen created HUDI-6588: Summary: Fix duplicate fileId on TM partial-failover and recovery Key: HUDI-6588 URL: https://issues.apache.org/jira/browse/HUDI-6588 Project: Apache Hudi Issue Type: Bug Components: flink Reporter: Danny Chen
[jira] [Updated] (HUDI-6588) Fix duplicate fileId on TM partial-failover and recovery
[ https://issues.apache.org/jira/browse/HUDI-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-6588: - Fix Version/s: 0.14.0 > Fix duplicate fileId on TM partial-failover and recovery > > > Key: HUDI-6588 > URL: https://issues.apache.org/jira/browse/HUDI-6588 > Project: Apache Hudi > Issue Type: Bug > Components: flink >Reporter: Danny Chen >Priority: Major > Fix For: 0.14.0 > >