[GitHub] [hudi] xiarixiaoyao commented on a diff in pull request #7385: [HUDI-5332] HiveSyncTool can avoid initializing all permanent custom functions of Hive
xiarixiaoyao commented on code in PR #7385: URL: https://github.com/apache/hudi/pull/7385#discussion_r1058137576

## hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/ddl/HMSDDLExecutor.java:

@@ -78,7 +79,14 @@ public class HMSDDLExecutor implements DDLExecutor {
   public HMSDDLExecutor(HiveSyncConfig syncConfig) throws HiveException, MetaException {
     this.syncConfig = syncConfig;
     this.databaseName = syncConfig.getStringOrDefault(META_SYNC_DATABASE_NAME);
-    this.client = Hive.get(syncConfig.getHiveConf()).getMSC();
+    HiveConf hiveConf = syncConfig.getHiveConf();
+    IMetaStoreClient tempMetaStoreClient;
+    try {
+      tempMetaStoreClient = ((Hive) Hive.class.getMethod("getWithoutRegisterFns", HiveConf.class).invoke(null, hiveConf)).getMSC();
+    } catch (Exception ex) {

Review Comment:
   NoSuchMethodException ?

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
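The reviewer's point — catch NoSuchMethodException specifically rather than a blanket Exception — can be sketched with plain reflection. This is a minimal demo, not Hudi's or Hive's actual API; the demo class, no-arg method lookup, and return values are hypothetical:

```java
import java.lang.reflect.Method;

class ReflectiveFallbackDemo {
    // Look up an optional static method by reflection; if the running
    // version of the class does not provide it, fall back to a default.
    // Only NoSuchMethodException is treated as "method absent"; other
    // reflective failures are real errors and are rethrown.
    static String createClient(Class<?> clazz) {
        try {
            Method m = clazz.getMethod("getWithoutRegisterFns");
            return (String) m.invoke(null);
        } catch (NoSuchMethodException e) {
            return "fallback-client"; // older version: use the standard path
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // This demo class itself has no getWithoutRegisterFns method,
        // so the lookup falls through to the fallback branch.
        System.out.println(createClient(ReflectiveFallbackDemo.class));
    }
}
```

Catching only NoSuchMethodException keeps genuine failures (security errors, exceptions thrown by the target method) visible instead of silently falling back.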
[jira] [Commented] (HUDI-4541) Flink job fails with column stats enabled in metadata table due to NotSerializableException
[ https://issues.apache.org/jira/browse/HUDI-4541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17652424#comment-17652424 ] Alexander Trushev commented on HUDI-4541: - I guess this issue has already been fixed by https://issues.apache.org/jira/browse/HUDI-4548 > Flink job fails with column stats enabled in metadata table due to > NotSerializableException > > > Key: HUDI-4541 > URL: https://issues.apache.org/jira/browse/HUDI-4541 > Project: Apache Hudi > Issue Type: Bug > Components: flink-sql >Reporter: Ethan Guo >Priority: Blocker > Fix For: 0.13.0 > > Attachments: Screen Shot 2022-08-04 at 17.10.05.png > > > Environment: EMR 6.7.0 Flink 1.14.2 > Reproducible steps: Build Hudi Flink bundle from master > {code:java} > mvn clean package -DskipTests -pl :hudi-flink1.14-bundle -am {code} > Copy to EMR master node /lib/flink/lib > Launch Flink SQL client: > {code:java} > cd /lib/flink && ./bin/yarn-session.sh --detached > ./bin/sql-client.sh {code} > Run the following from the Flink quick start guide with metadata table, > column stats, and data skipping enabled > {code:java} > CREATE TABLE t1( > uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED, > name VARCHAR(10), > age INT, > ts TIMESTAMP(3), > `partition` VARCHAR(20) > ) > PARTITIONED BY (`partition`) > WITH ( > 'connector' = 'hudi', > 'path' = 's3a://', > 'table.type' = 'MERGE_ON_READ', -- this creates a MERGE_ON_READ table, by > default is COPY_ON_WRITE > 'metadata.enabled' = 'true', -- enables multi-modal index and metadata table > 'hoodie.metadata.index.column.stats.enable' = 'true', -- enables column > stats in metadata table > 'read.data.skipping.enabled' = 'true' -- enables data skipping > ); > INSERT INTO t1 VALUES > ('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1'), > ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'), > ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'), > ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'), > 
('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'), > ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'), > ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'), > ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4'); {code} > !Screen Shot 2022-08-04 at 17.10.05.png|width=1130,height=463! > Exception: > {code:java} > 2022-08-04 17:04:41 > org.apache.flink.runtime.JobException: Recovery is suppressed by > NoRestartBackoffTimeStrategy > at > org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:138) > at > org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:82) > at > org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:228) > at > org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:218) > at > org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:209) > at > org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:679) > at > org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:79) > at > org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:444) > at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$handleRpcInvocation$1(AkkaRpcActor.java:316) > at > org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83) > at > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:314) > at > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:217) > 
at > org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:78) > at > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:163) > at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24) > at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20) > at scala.PartialFunction.applyOrElse(PartialFunction.scala:123) > at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122) > at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20) > at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) >
[jira] [Updated] (HUDI-5482) Nulls should be counted in the value count stats for mor table
[ https://issues.apache.org/jira/browse/HUDI-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5482: - Component/s: metadata > Nulls should be counted in the value count stats for mor table > -- > > Key: HUDI-5482 > URL: https://issues.apache.org/jira/browse/HUDI-5482 > Project: Apache Hudi > Issue Type: Bug > Components: core, metadata >Reporter: Danny Chen >Priority: Major > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (HUDI-5482) Nulls should be counted in the value count stats for mor table
[ https://issues.apache.org/jira/browse/HUDI-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu closed HUDI-5482. Assignee: Hui An Resolution: Fixed > Nulls should be counted in the value count stats for mor table > -- > > Key: HUDI-5482 > URL: https://issues.apache.org/jira/browse/HUDI-5482 > Project: Apache Hudi > Issue Type: Bug > Components: core, metadata >Reporter: Danny Chen >Assignee: Hui An >Priority: Major > Fix For: 0.13.0 >
[GitHub] [hudi] ZeyuQiu-Rinze commented on issue #7520: [SUPPORT] Hudi took a very long time in "Getting small files from partitions" stage
ZeyuQiu-Rinze commented on issue #7520: URL: https://github.com/apache/hudi/issues/7520#issuecomment-1366461385 Yes, as @yihua says, `users_activity_create_date` contains an actual timestamp. I changed it to a date string and it works.
[GitHub] [hudi] ZeyuQiu-Rinze closed issue #7520: [SUPPORT] Hudi took a very long time in "Getting small files from partitions" stage
ZeyuQiu-Rinze closed issue #7520: [SUPPORT] Hudi took a very long time in "Getting small files from partitions" stage URL: https://github.com/apache/hudi/issues/7520
[GitHub] [hudi] xushiyan commented on pull request #7441: [HUDI-5378] Remove minlog.Log
xushiyan commented on PR #7441: URL: https://github.com/apache/hudi/pull/7441#issuecomment-1366461848 @XuQianJin-Stars compliance issue: the merge commit message does not contain a JIRA id or [MINOR]
[GitHub] [hudi] SteNicholas commented on pull request #7568: [HUDI-5341] CleanPlanner retains earliest commits must not be later than earliest pending commit
SteNicholas commented on PR #7568: URL: https://github.com/apache/hudi/pull/7568#issuecomment-1366462432 @danny0405, @zhuanshenbsj1, I have created this pull request to fix the problem mentioned in #7405, whose implementation is a little complex. PTAL.
[GitHub] [hudi] danny0405 commented on a diff in pull request #7568: [HUDI-5341] CleanPlanner retains earliest commits must not be later than earliest pending commit
danny0405 commented on code in PR #7568: URL: https://github.com/apache/hudi/pull/7568#discussion_r1058168598

## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java:

@@ -487,7 +487,24 @@ public Option getEarliestCommitToRetain() {
   int hoursRetained = config.getCleanerHoursRetained();
   if (config.getCleanerPolicy() == HoodieCleaningPolicy.KEEP_LATEST_COMMITS && commitTimeline.countInstants() > commitsRetained) {
-    earliestCommitToRetain = commitTimeline.nthInstant(commitTimeline.countInstants() - commitsRetained); // 15 instants total, 10 commits to retain, this gives the 6th instant in the list
+    Option earliestPendingCommits = hoodieTable.getMetaClient()
+        .getActiveTimeline()
+        .getCommitsTimeline()
+        .filter(s -> !s.isCompleted()).firstInstant();
+    if (earliestPendingCommits.isPresent()) {
+      // Earliest commit to retain must not be later than the earliest pending commit
+      earliestCommitToRetain =
+          commitTimeline.nthInstant(commitTimeline.countInstants() - commitsRetained).map(nthInstant -> {
+            if (nthInstant.compareTo(earliestPendingCommits.get()) <= 0) {

Review Comment:
   Why do we need this check? Can the cleaner clean an instant that is not complete?
[GitHub] [hudi] hudi-bot commented on pull request #7550: [MINOR] Avoid running tests as part of bundle uploads
hudi-bot commented on PR #7550: URL: https://github.com/apache/hudi/pull/7550#issuecomment-1366480173

## CI report:

* 144c9ef6e5f5b1ee2f0bbdd8a7684a7cdb52a352 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13945)
* 00ea42c662445ccce81e186d9f857564c2ce5c7f UNKNOWN

Bot commands: @hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #7550: [MINOR] Avoid running tests as part of bundle uploads
hudi-bot commented on PR #7550: URL: https://github.com/apache/hudi/pull/7550#issuecomment-1366484135

## CI report:

* 144c9ef6e5f5b1ee2f0bbdd8a7684a7cdb52a352 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13945)
* 00ea42c662445ccce81e186d9f857564c2ce5c7f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14009)

Bot commands: @hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] SteNicholas commented on a diff in pull request #7568: [HUDI-5341] CleanPlanner retains earliest commits must not be later than earliest pending commit
SteNicholas commented on code in PR #7568: URL: https://github.com/apache/hudi/pull/7568#discussion_r1058184448

## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java:

@@ -487,7 +487,24 @@ public Option getEarliestCommitToRetain() {
   int hoursRetained = config.getCleanerHoursRetained();
   if (config.getCleanerPolicy() == HoodieCleaningPolicy.KEEP_LATEST_COMMITS && commitTimeline.countInstants() > commitsRetained) {
-    earliestCommitToRetain = commitTimeline.nthInstant(commitTimeline.countInstants() - commitsRetained); // 15 instants total, 10 commits to retain, this gives the 6th instant in the list
+    Option earliestPendingCommits = hoodieTable.getMetaClient()
+        .getActiveTimeline()
+        .getCommitsTimeline()
+        .filter(s -> !s.isCompleted()).firstInstant();
+    if (earliestPendingCommits.isPresent()) {
+      // Earliest commit to retain must not be later than the earliest pending commit
+      earliestCommitToRetain =
+          commitTimeline.nthInstant(commitTimeline.countInstants() - commitsRetained).map(nthInstant -> {
+            if (nthInstant.compareTo(earliestPendingCommits.get()) <= 0) {

Review Comment:
   @danny0405, this check ensures the earliest commit to retain is not later than the earliest pending commit. The cleaner cannot clean an uncompleted instant and should keep the cleaning linear for incremental clean mode.
[GitHub] [hudi] danny0405 commented on a diff in pull request #7568: [HUDI-5341] CleanPlanner retains earliest commits must not be later than earliest pending commit
danny0405 commented on code in PR #7568: URL: https://github.com/apache/hudi/pull/7568#discussion_r1058188680

## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java:

@@ -487,7 +487,24 @@ public Option getEarliestCommitToRetain() {
   int hoursRetained = config.getCleanerHoursRetained();
   if (config.getCleanerPolicy() == HoodieCleaningPolicy.KEEP_LATEST_COMMITS && commitTimeline.countInstants() > commitsRetained) {
-    earliestCommitToRetain = commitTimeline.nthInstant(commitTimeline.countInstants() - commitsRetained); // 15 instants total, 10 commits to retain, this gives the 6th instant in the list
+    Option earliestPendingCommits = hoodieTable.getMetaClient()
+        .getActiveTimeline()
+        .getCommitsTimeline()
+        .filter(s -> !s.isCompleted()).firstInstant();
+    if (earliestPendingCommits.isPresent()) {
+      // Earliest commit to retain must not be later than the earliest pending commit
+      earliestCommitToRetain =
+          commitTimeline.nthInstant(commitTimeline.countInstants() - commitsRetained).map(nthInstant -> {
+            if (nthInstant.compareTo(earliestPendingCommits.get()) <= 0) {

Review Comment:
   Oh, I see, this change takes effect for incremental cleaning. Do we need this change for normal cleaning as well?
[GitHub] [hudi] danny0405 commented on a diff in pull request #7568: [HUDI-5341] CleanPlanner retains earliest commits must not be later than earliest pending commit
danny0405 commented on code in PR #7568: URL: https://github.com/apache/hudi/pull/7568#discussion_r1058188976

## hudi-common/src/main/java/org/apache/hudi/common/util/ClusteringUtils.java:

@@ -224,4 +225,36 @@ public static List getPendingClusteringInstantTimes(HoodieTableMe
   public static boolean isPendingClusteringInstant(HoodieTableMetaClient metaClient, HoodieInstant instant) {
     return getClusteringPlan(metaClient, instant).isPresent();
   }
+
+  /**
+   * Checks whether the latest clustering instant has a subsequent cleaning action. Returns
+   * the clustering instant if there is such cleaning action or empty.
+   *
+   * @param activeTimeline The active timeline
+   * @return the oldest instant to retain for clustering
+   */
+  public static Option getOldestInstantToRetainForClustering(HoodieActiveTimeline activeTimeline)
+      throws IOException {
+    Option cleanInstantOpt =
+        activeTimeline.getCleanerTimeline().filter(instant -> !instant.isCompleted()).firstInstant();
+    if (cleanInstantOpt.isPresent()) {
+      // The first clustering instant of which timestamp is greater than or equal to the earliest commit to retain of
+      // the clean metadata.

Review Comment:
   There is no need to do this check if there is no clustering instant on the timeline at all.
[GitHub] [hudi] SteNicholas commented on a diff in pull request #7568: [HUDI-5341] CleanPlanner retains earliest commits must not be later than earliest pending commit
SteNicholas commented on code in PR #7568: URL: https://github.com/apache/hudi/pull/7568#discussion_r1058192365

## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java:

@@ -487,7 +487,24 @@ public Option getEarliestCommitToRetain() {
   int hoursRetained = config.getCleanerHoursRetained();
   if (config.getCleanerPolicy() == HoodieCleaningPolicy.KEEP_LATEST_COMMITS && commitTimeline.countInstants() > commitsRetained) {
-    earliestCommitToRetain = commitTimeline.nthInstant(commitTimeline.countInstants() - commitsRetained); // 15 instants total, 10 commits to retain, this gives the 6th instant in the list
+    Option earliestPendingCommits = hoodieTable.getMetaClient()
+        .getActiveTimeline()
+        .getCommitsTimeline()
+        .filter(s -> !s.isCompleted()).firstInstant();
+    if (earliestPendingCommits.isPresent()) {
+      // Earliest commit to retain must not be later than the earliest pending commit
+      earliestCommitToRetain =
+          commitTimeline.nthInstant(commitTimeline.countInstants() - commitsRetained).map(nthInstant -> {
+            if (nthInstant.compareTo(earliestPendingCommits.get()) <= 0) {

Review Comment:
   @danny0405, the `getPartitionPathsForFullCleaning` doesn't invoke this method.
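For readers following the thread, the retention clamp under discussion can be sketched outside Hudi's API. This is a hypothetical helper, not Hudi's code: instants are modeled as timestamp strings (which compare lexicographically, like Hudi instant times), and the truncated `+` branch in the diff is assumed to fall back to the earliest pending instant when the nth instant is later than it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class RetainClampDemo {
    // Earliest commit to retain: the nth instant from the end of the
    // completed timeline, clamped so it is never later than the earliest
    // pending commit.
    static Optional<String> earliestToRetain(List<String> completed,
                                             Optional<String> earliestPending,
                                             int commitsRetained) {
        if (completed.size() <= commitsRetained) {
            return Optional.empty(); // everything is retained
        }
        String nth = completed.get(completed.size() - commitsRetained);
        if (earliestPending.isPresent() && nth.compareTo(earliestPending.get()) > 0) {
            return earliestPending; // clamp back to the pending commit
        }
        return Optional.of(nth);
    }

    public static void main(String[] args) {
        List<String> done = Arrays.asList("001", "002", "003", "004");
        // No pending commit: plain nth-from-last retention.
        System.out.println(earliestToRetain(done, Optional.empty(), 2).get());   // 003
        // A pending commit at 002 clamps the retained instant back to 002.
        System.out.println(earliestToRetain(done, Optional.of("002"), 2).get()); // 002
    }
}
```

Without the clamp, a cleaner planned while an early commit is still pending could pick a retention boundary past that commit, which is exactly the case the review comments debate.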
[GitHub] [hudi] SteNicholas commented on a diff in pull request #7568: [HUDI-5341] CleanPlanner retains earliest commits must not be later than earliest pending commit
SteNicholas commented on code in PR #7568: URL: https://github.com/apache/hudi/pull/7568#discussion_r1058196194

## hudi-common/src/main/java/org/apache/hudi/common/util/ClusteringUtils.java:

@@ -224,4 +225,36 @@ public static List getPendingClusteringInstantTimes(HoodieTableMe
   public static boolean isPendingClusteringInstant(HoodieTableMetaClient metaClient, HoodieInstant instant) {
     return getClusteringPlan(metaClient, instant).isPresent();
   }
+
+  /**
+   * Checks whether the latest clustering instant has a subsequent cleaning action. Returns
+   * the clustering instant if there is such cleaning action or empty.
+   *
+   * @param activeTimeline The active timeline
+   * @return the oldest instant to retain for clustering
+   */
+  public static Option getOldestInstantToRetainForClustering(HoodieActiveTimeline activeTimeline)
+      throws IOException {
+    Option cleanInstantOpt =
+        activeTimeline.getCleanerTimeline().filter(instant -> !instant.isCompleted()).firstInstant();
+    if (cleanInstantOpt.isPresent()) {
+      // The first clustering instant of which timestamp is greater than or equal to the earliest commit to retain of
+      // the clean metadata.

Review Comment:
   @danny0405, +1, I have added a check for whether the completed replace timeline is empty.
[jira] [Updated] (HUDI-2508) Build GA for the dependeny diff check workflow
[ https://issues.apache.org/jira/browse/HUDI-2508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2508: - Description: Configure a GitHub actions job that, for each PR, - deploys maven artifacts (as per [release deploy script|https://github.com/apache/hudi/blob/master/scripts/release/deploy_staging_jars.sh]) to a local temporary maven repo (needs to research how to setup a local maven repo) - run scripts as per https://issues.apache.org/jira/browse/HUDI-5475 and https://github.com/xushiyan/hudi/pull/15/files to generate dependency trees in a temp dir - compare existing dependency trees with the generated ones - if there is any difference, fail the job so that the author will need to manually generate the dep trees and commit the updates in the PR itself This is to enforce dependency governance. This job should only run for PRs. was: Configure a GitHub actions job that, for each PR, - deploys maven artifacts (as per release deploy script) to a local temporary maven repo (needs to research how to setup a local maven repo) - run scripts as per https://issues.apache.org/jira/browse/HUDI-5475 and https://github.com/xushiyan/hudi/pull/15/files to generate dependency trees in a temp dir - compare existing dependency trees with the generated ones - if there is any difference, fail the job so that the author will need to manually generate the dep trees and commit the updates in the PR itself This is to enforce dependency governance. This job should only run for PRs. 
> Build GA for the dependeny diff check workflow > -- > > Key: HUDI-2508 > URL: https://issues.apache.org/jira/browse/HUDI-2508 > Project: Apache Hudi > Issue Type: Sub-task > Components: Usability >Reporter: vinoyang >Assignee: Lokesh Jain >Priority: Major > > Configure a GitHub actions job that, for each PR, > - deploys maven artifacts (as per [release deploy > script|https://github.com/apache/hudi/blob/master/scripts/release/deploy_staging_jars.sh]) > to a local temporary maven repo (needs to research how to setup a local > maven repo) > - run scripts as per https://issues.apache.org/jira/browse/HUDI-5475 and > https://github.com/xushiyan/hudi/pull/15/files to generate dependency trees > in a temp dir > - compare existing dependency trees with the generated ones - if there is any > difference, fail the job so that the author will need to manually generate > the dep trees and commit the updates in the PR itself > This is to enforce dependency governance. > This job should only run for PRs.
[jira] [Updated] (HUDI-2508) Build GA for the dependeny diff check workflow
[ https://issues.apache.org/jira/browse/HUDI-2508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2508: - Priority: Critical (was: Major) > Build GA for the dependeny diff check workflow > -- > > Key: HUDI-2508 > URL: https://issues.apache.org/jira/browse/HUDI-2508 > Project: Apache Hudi > Issue Type: Sub-task > Components: Usability >Reporter: vinoyang >Assignee: Lokesh Jain >Priority: Critical > > Configure a GitHub actions job that, for each PR, > - deploys maven artifacts (as per [release deploy > script|https://github.com/apache/hudi/blob/master/scripts/release/deploy_staging_jars.sh]) > to a local temporary maven repo (needs to research how to setup a local > maven repo) > - run scripts as per https://issues.apache.org/jira/browse/HUDI-5475 and > https://github.com/xushiyan/hudi/pull/15/files to generate dependency trees > in a temp dir > - compare existing dependency trees with the generated ones - if there is any > difference, fail the job so that the author will need to manually generate > the dep trees and commit the updates in the PR itself > This is to enforce dependency governance. > This job should only run for PRs.
[GitHub] [hudi] perfectcw opened a new issue, #7570: [SUPPORT]Sync hive lost some partitions when submit multiple commits at the same time
perfectcw opened a new issue, #7570: URL: https://github.com/apache/hudi/issues/7570

**_Tips before filing an issue_**
- Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
- Join the mailing list to engage in conversations and get faster support at dev-subscr...@hudi.apache.org.
- If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.

**Describe the problem you faced**

Issue: Lost some partitions when syncing hive

Background: We have a data ingest pipeline which ingests about 500 partitions per day. The pipeline submits multiple commits at the same time to insert different partitions, and the sync hive function is enabled for each commit. _**After all of the commits succeeded, we found that some partitions were missing in the hive table.**_

The following is the analysis of the log and hoodie files:

The hoodie files show six of the commits. We found that only two commits, _20221227042858342_ and _20221227042906103_, were synced to hive; the partitions of the other commits did not appear in the hive table.

I think the root cause is the mechanism of hive sync. When hudi syncs hive after a commit succeeds, it first gets the latest synced commit, then uses that commit's timestamp as a benchmark to check whether any later commit added a new column or partition; if so, it syncs them to hive. So if a commit A is submitted before the latest synced commit B, but succeeds after commit B, it will not be synced to hive: because commit A's timestamp < commit B's timestamp, it won't be detected.

Here is the log of commit 20221227042859357. We can see it gets the latest synced commit 20221227042906103, which was committed after 20221227042859357 itself. So the partition inserted by commit 20221227042859357 was not detected, and the number of partitions that need to be synced is 0.
`2022-12-27 04:30:16,449 INFO hive.metastore: Opened a connection to metastore, current connections: 1
2022-12-27 04:30:16,465 INFO hive.metastore: Connected to metastore.
2022-12-27 04:30:16,676 INFO hive.HiveSyncTool: Syncing target hoodie table with hive table(forecast_agg_hoover_multi_publish). Hive metastore URL :jdbc:hive2://hs2.presto.stg.aws.fwmrm.net:1/;auth=noSasl, basePath :s3a://fw1-stg-af-dip/hudi/forecast_agg_hoover_multi_publish
2022-12-27 04:30:16,676 INFO hive.HiveSyncTool: Trying to sync hoodie table forecast_agg_hoover_multi_publish with base path s3a://fw1-stg-af-dip/hudi/forecast_agg_hoover_multi_publish of type COPY_ON_WRITE
2022-12-27 04:30:16,815 INFO table.TableSchemaResolver: Reading schema from s3a://fw1-stg-af-dip/hudi/forecast_agg_hoover_multi_publish/20221227/0/20230108/9820ce59-03a8-4efa-8978-3c3cf61298d8-0_1-11-3890_20221227042906103.parquet
2022-12-27 04:30:16,904 INFO s3a.S3AInputStream: Switching to Random IO seek policy
2022-12-27 04:30:17,477 INFO hive.HiveSyncTool: No Schema difference for forecast_agg_hoover_multi_publish
2022-12-27 04:30:17,477 INFO hive.HiveSyncTool: Schema sync complete. Syncing partitions for forecast_agg_hoover_multi_publish
2022-12-27 04:30:17,525 INFO hive.HiveSyncTool: Last commit time synced was found to be 20221227042906103
2022-12-27 04:30:17,525 INFO common.AbstractSyncHoodieClient: Last commit time synced is 20221227042906103, Getting commits since then
2022-12-27 04:30:17,527 INFO hive.HiveSyncTool: Storage partitions scan complete. Found 0
2022-12-27 04:30:17,697 INFO hive.HiveSyncTool: Sync complete for forecast_agg_hoover_multi_publish`

Timeline files (ordered by time):

name | type | last modify time | partition | exists in hive
20221227042855832.commit.requested | requested | 2022-12-27 pm12:28:59 CST | 20221227/0/20230101 | no
20221227042858342.commit.requested | requested | 2022-12-27 pm12:29:00 CST | 20221227/0/20230106 | yes
20221227042858801.commit.requested | requested | 2022-12-27 pm12:29:01 CST | 20221227/0/20230107 | no
20221227042859357.commit.requested | requested | 2022-12-27 pm12:29:01 CST | 20221227/0/20221229 | no
20221227042901993.commit.requested | requested | 2022-12-27 pm12:29:04 CST | 20221227/0/20230103 | no
20221227042906103.commit.requested | requested | 2022-12-27 pm12:29:08 CST | 20221227/0/20230108 | yes
...
20221227042855832.inflight | inflight | 2022-12-27 pm12:29:16 CST | |
20221227042858342.inflight | inflight | 2022-12-27 pm12:29:16 CST | |
20221227042858801.inflight | inflight | 2022-12-27 pm12:29:17 CST | |
20221227042859357.inflight | inflight | 2022-
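The race described above can be sketched with plain timestamp strings. This is a hypothetical helper, not Hudi's actual sync code: partitions from commits whose timestamp is not greater than the last synced commit are skipped, even when those commits actually completed later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SyncRaceDemo {
    // Incremental-sync check as described in the issue: only commits with a
    // timestamp strictly greater than the last synced commit are scanned
    // for new partitions.
    static List<String> partitionsToSync(Map<String, String> commitToPartition,
                                         String lastSyncedCommit) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> e : commitToPartition.entrySet()) {
            if (e.getKey().compareTo(lastSyncedCommit) > 0) {
                result.add(e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> commits = new TreeMap<>();
        commits.put("20221227042859357", "20221227/0/20221229"); // completed last
        commits.put("20221227042906103", "20221227/0/20230108"); // synced first
        // Commit 20221227042859357 has the smaller timestamp but finished
        // after 20221227042906103 was synced, so the incremental scan
        // finds 0 partitions to sync -- matching the "Found 0" log line.
        System.out.println(partitionsToSync(commits, "20221227042906103")); // []
    }
}
```

The timestamp benchmark assumes commits complete in timestamp order, which concurrent writers violate; that assumption, not the scan itself, is what loses the partitions.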
[GitHub] [hudi] perfectcw closed issue #7570: [SUPPORT]Sync hive lost some partitions when submit multiple commits at the same time
perfectcw closed issue #7570: [SUPPORT]Sync hive lost some partitions when submit multiple commits at the same time URL: https://github.com/apache/hudi/issues/7570
[GitHub] [hudi] hudi-bot commented on pull request #7365: [HUDI-5317] Fix insert overwrite table for partitioned table
hudi-bot commented on PR #7365: URL: https://github.com/apache/hudi/pull/7365#issuecomment-1366542661

## CI report:

* e58d4db34dea4225808760126be11d3c559da896 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13711) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13712)
* a65e5950a53b6b655c385d002170c0900a6303d8 UNKNOWN

Bot commands: @hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #7568: [HUDI-5341] CleanPlanner retains earliest commits must not be later than earliest pending commit
hudi-bot commented on PR #7568: URL: https://github.com/apache/hudi/pull/7568#issuecomment-1366542976

## CI report:

* 93c16ab8c496989a928304cc2fe91e38ca678147 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14008)
* cc53c8c67d4e86b4be6dafdf25cafae87c1ed152 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #7365: [HUDI-5317] Fix insert overwrite table for partitioned table
hudi-bot commented on PR #7365: URL: https://github.com/apache/hudi/pull/7365#issuecomment-1366546216

## CI report:

* e58d4db34dea4225808760126be11d3c559da896 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13711) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13712)
* a65e5950a53b6b655c385d002170c0900a6303d8 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14010)
[GitHub] [hudi] hudi-bot commented on pull request #7568: [HUDI-5341] CleanPlanner retains earliest commits must not be later than earliest pending commit
hudi-bot commented on PR #7568: URL: https://github.com/apache/hudi/pull/7568#issuecomment-1366546555

## CI report:

* 93c16ab8c496989a928304cc2fe91e38ca678147 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14008)
* cc53c8c67d4e86b4be6dafdf25cafae87c1ed152 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14011)
[GitHub] [hudi] lokeshj1703 commented on issue #7430: [BUG] MOR Table Hard Deletes Create issue with Athena Querying RT Tables
lokeshj1703 commented on issue #7430: URL: https://github.com/apache/hudi/issues/7430#issuecomment-1366554885 @soumilshah1995 We are still trying to root-cause this w.r.t. AWS. Will update here.
[GitHub] [hudi] boneanxs opened a new pull request, #7571: [HUDI-4710]Fix flaky: TestKeyRangeLookupTree#testFileGroupLookUpManyEntriesWithSameStartValue
boneanxs opened a new pull request, #7571: URL: https://github.com/apache/hudi/pull/7571

### Change Logs

In the test `testFileGroupLookUpManyEntriesWithSameStartValue`, before the fix, the endKey could be larger than 1000. Say the endKey is 1024: KeyRangeNode uses String comparison, so "1024" could be smaller than "2xx", causing the test failure. Here we don't allow endKey to exceed 1000 to fix the issue.

### Impact

None

### Risk level (write none, low medium or high below)

none

### Documentation Update

_Describe any necessary documentation update if there is any new feature, config, or user-facing change_

- _The config description must be updated if new configs are added or the default value of the configs are changed_
- _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
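The flakiness described above comes down to comparing numeric keys as strings: lexicographic order disagrees with numeric order once the digit counts differ. A minimal illustration in plain Java — this is not the Hudi `KeyRangeNode` code, and the zero-padding shown at the end is the generic remedy, not what the PR itself does:

```java
public class LexOrderDemo {
  // Compare two keys the way a String-keyed range tree would.
  static boolean stringLess(String a, String b) {
    return a.compareTo(b) < 0;
  }

  public static void main(String[] args) {
    // Lexicographically "1024" sorts before "2xx" because '1' < '2',
    // even though 1024 > 2 numerically -- the root of the test failure.
    System.out.println(stringLess("1024", "2xx"));            // true

    // Zero-padding keys to a fixed width makes string order agree
    // with numeric order: "01024" vs "00002".
    System.out.println(stringLess(String.format("%05d", 1024),
                                  String.format("%05d", 2))); // false
  }
}
```

Capping endKey at 1000 keeps the test's generated keys inside a range where the two orderings happen to agree for the values the test produces.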
[jira] [Updated] (HUDI-4710) Fix flaky: TestKeyRangeLookupTree#testFileGroupLookUpManyEntriesWithSameStartValue
[ https://issues.apache.org/jira/browse/HUDI-4710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-4710: - Labels: pull-request-available (was: ) > Fix flaky: > TestKeyRangeLookupTree#testFileGroupLookUpManyEntriesWithSameStartValue > -- > > Key: HUDI-4710 > URL: https://issues.apache.org/jira/browse/HUDI-4710 > Project: Apache Hudi > Issue Type: Improvement >Reporter: sivabalan narayanan >Priority: Major > Labels: pull-request-available > > Instance occurance: > Aug 24th: > [https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_apis/build/builds/10923/logs/22] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] hudi-bot commented on pull request #7365: [HUDI-5317] Fix insert overwrite table for partitioned table
hudi-bot commented on PR #7365: URL: https://github.com/apache/hudi/pull/7365#issuecomment-1366590514

## CI report:

* a65e5950a53b6b655c385d002170c0900a6303d8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14010)
* f10a71c844934f9682391986f2cbe69566179341 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #7403: [HUDI-5343] HoodieFlinkStreamer supports async clustering for append mode
hudi-bot commented on PR #7403: URL: https://github.com/apache/hudi/pull/7403#issuecomment-1366590587

## CI report:

* a0317ba801314bd06d99727b2bba0c383ed95cdf Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14007)
[GitHub] [hudi] hudi-bot commented on pull request #7571: [HUDI-4710]Fix flaky: TestKeyRangeLookupTree#testFileGroupLookUpManyEntriesWithSameStartValue
hudi-bot commented on PR #7571: URL: https://github.com/apache/hudi/pull/7571#issuecomment-1366590844

## CI report:

* 2baf115f1b4d1474fe8161919797ae68f979fc80 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #7365: [HUDI-5317] Fix insert overwrite table for partitioned table
hudi-bot commented on PR #7365: URL: https://github.com/apache/hudi/pull/7365#issuecomment-1366593631

## CI report:

* a65e5950a53b6b655c385d002170c0900a6303d8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14010)
* f10a71c844934f9682391986f2cbe69566179341 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14012)
[GitHub] [hudi] hudi-bot commented on pull request #7571: [HUDI-4710]Fix flaky: TestKeyRangeLookupTree#testFileGroupLookUpManyEntriesWithSameStartValue
hudi-bot commented on PR #7571: URL: https://github.com/apache/hudi/pull/7571#issuecomment-1366593911

## CI report:

* 2baf115f1b4d1474fe8161919797ae68f979fc80 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14013)
[GitHub] [hudi] hudi-bot commented on pull request #7365: [HUDI-5317] Fix insert overwrite table for partitioned table
hudi-bot commented on PR #7365: URL: https://github.com/apache/hudi/pull/7365#issuecomment-1366597097

## CI report:

* f10a71c844934f9682391986f2cbe69566179341 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14012)
[GitHub] [hudi] cxzl25 commented on a diff in pull request #7385: [HUDI-5332] HiveSyncTool can avoid initializing all permanent custom functions of Hive
cxzl25 commented on code in PR #7385: URL: https://github.com/apache/hudi/pull/7385#discussion_r1058301005

## hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/ddl/HMSDDLExecutor.java:

```
@@ -78,7 +79,14 @@ public class HMSDDLExecutor implements DDLExecutor {
   public HMSDDLExecutor(HiveSyncConfig syncConfig) throws HiveException, MetaException {
     this.syncConfig = syncConfig;
     this.databaseName = syncConfig.getStringOrDefault(META_SYNC_DATABASE_NAME);
-    this.client = Hive.get(syncConfig.getHiveConf()).getMSC();
+    HiveConf hiveConf = syncConfig.getHiveConf();
+    IMetaStoreClient tempMetaStoreClient;
+    try {
+      tempMetaStoreClient = ((Hive) Hive.class.getMethod("getWithoutRegisterFns", HiveConf.class).invoke(null, hiveConf)).getMSC();
+    } catch (Exception ex) {
```

Review Comment: Because invoke has three checked exceptions

```java
} catch (NoSuchMethodException | IllegalAccessException | IllegalArgumentException | InvocationTargetException ex) {
```
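The pattern under discussion — resolving `Hive.getWithoutRegisterFns(HiveConf)` by name so the code still compiles and runs against older Hive releases that lack the method — can be sketched in isolation. The class and method names below are illustrative stand-ins, not Hudi's or Hive's actual code; note that `getMethod` throws the checked `NoSuchMethodException` and `invoke` throws the checked `IllegalAccessException` and `InvocationTargetException`, while `IllegalArgumentException` is unchecked:

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

public class ReflectiveLookupDemo {
  // Legacy entry point that always exists, playing the role of Hive.get(...).
  public static String legacyGreet(String who) {
    return "hello " + who;
  }

  // Try the newer entry point by name; fall back if this runtime lacks it.
  public static String resolveGreeting() {
    try {
      // "newGreet" does not exist on this class, standing in for an API
      // that is only present in newer library versions.
      Method m = ReflectiveLookupDemo.class.getMethod("newGreet", String.class);
      return (String) m.invoke(null, "world");
    } catch (NoSuchMethodException | IllegalAccessException | InvocationTargetException ex) {
      // Reflection failed, so this runtime predates the new API: fall back.
      return legacyGreet("world");
    }
  }

  public static void main(String[] args) {
    System.out.println(resolveGreeting()); // hello world
  }
}
```

Catching the specific checked exceptions (rather than a blanket `Exception`) keeps unrelated runtime failures from being silently swallowed by the fallback path, which is the point the reviewer raises.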
[GitHub] [hudi] yuzhaojing commented on a diff in pull request #6732: [HUDI-4148] Add client for hudi table service manager
yuzhaojing commented on code in PR #6732: URL: https://github.com/apache/hudi/pull/6732#discussion_r1058027049 ## hudi-common/src/main/java/org/apache/hudi/common/config/HoodieTableServiceManagerConfig.java: ## @@ -0,0 +1,187 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.common.config; + +import org.apache.hudi.common.model.ActionType; + +import javax.annotation.concurrent.Immutable; + +import java.util.Properties; + +/** + * Configurations used by the Hudi Table Service Manager. 
+ */ +@Immutable +@ConfigClassProperty(name = "Table Service Manager Configs", +groupName = ConfigGroups.Names.WRITE_CLIENT, +description = "Configurations used by the Hudi Table Service Manager.") +public class HoodieTableServiceManagerConfig extends HoodieConfig { + + public static final String TABLE_SERVICE_MANAGER_PREFIX = "hoodie.table.service.manager"; + + public static final ConfigProperty TABLE_SERVICE_MANAGER_ENABLE = ConfigProperty + .key(TABLE_SERVICE_MANAGER_PREFIX + ".enable") + .defaultValue(false) + .withDocumentation("Use table manager service to execute table service"); + + public static final ConfigProperty TABLE_SERVICE_MANAGER_URIS = ConfigProperty + .key(TABLE_SERVICE_MANAGER_PREFIX + ".uris") + .defaultValue("http://localhost:9091";) + .withDocumentation("Table service manager uris"); + + public static final ConfigProperty TABLE_SERVICE_MANAGER_ACTIONS = ConfigProperty + .key(TABLE_SERVICE_MANAGER_PREFIX + ".actions") + .defaultValue("") + .withDocumentation("Which action deploy on table service manager such as compaction:clean, default null"); + + public static final ConfigProperty TABLE_SERVICE_MANAGER_DEPLOY_USERNAME = ConfigProperty + .key(TABLE_SERVICE_MANAGER_PREFIX + ".deploy.username") + .defaultValue("default") + .withDocumentation("The user name to deploy for table service of this table"); + + public static final ConfigProperty TABLE_SERVICE_MANAGER_DEPLOY_QUEUE = ConfigProperty + .key(TABLE_SERVICE_MANAGER_PREFIX + ".deploy.queue") + .defaultValue("default") + .withDocumentation("The queue to deploy for table service of this table"); + + public static final ConfigProperty TABLE_SERVICE_MANAGER_DEPLOY_RESOURCE = ConfigProperty + .key(TABLE_SERVICE_MANAGER_PREFIX + ".deploy.resource") + .defaultValue("4g:4g") + .withDocumentation("The resource to deploy for table service of this table, default driver 4g, executor 4g"); Review Comment: Spark engine and flink engine can share this configuration, so I don't think there is any need to 
make a distinction.
[GitHub] [hudi] leesf commented on pull request #7403: [HUDI-5343] HoodieFlinkStreamer supports async clustering for append mode
leesf commented on PR #7403: URL: https://github.com/apache/hudi/pull/7403#issuecomment-1366608450 Merging as the flink module CI succeeded.
[GitHub] [hudi] leesf merged pull request #7403: [HUDI-5343] HoodieFlinkStreamer supports async clustering for append mode
leesf merged PR #7403: URL: https://github.com/apache/hudi/pull/7403
[hudi] branch master updated (bd57282f248 -> f2b2ec9539d)
This is an automated email from the ASF dual-hosted git repository. leesf pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git

from bd57282f248 [HUDI-5482] Nulls should be counted in the value count stats for mor table (#7482)
add f2b2ec9539d [HUDI-5343] HoodieFlinkStreamer supports async clustering for append mode (#7403)

No new revisions were added by this update.

Summary of changes:
 .../sink/clustering/FlinkClusteringConfig.java    | 37 +++
 .../hudi/sink/compact/FlinkCompactionConfig.java  | 30 ++-
 .../apache/hudi/streamer/FlinkStreamerConfig.java | 53 --
 .../apache/hudi/streamer/HoodieFlinkStreamer.java | 21 +++-
 4 files changed, 95 insertions(+), 46 deletions(-)
[GitHub] [hudi] yuzhaojing commented on a diff in pull request #6732: [HUDI-4148] Add client for hudi table service manager
yuzhaojing commented on code in PR #6732: URL: https://github.com/apache/hudi/pull/6732#discussion_r1058305796 ## hudi-common/src/main/java/org/apache/hudi/common/config/HoodieTableServiceManagerConfig.java: ## @@ -0,0 +1,187 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.common.config; + +import org.apache.hudi.common.model.ActionType; + +import javax.annotation.concurrent.Immutable; + +import java.util.Properties; + +/** + * Configurations used by the Hudi Table Service Manager. 
+ */ +@Immutable +@ConfigClassProperty(name = "Table Service Manager Configs", +groupName = ConfigGroups.Names.WRITE_CLIENT, +description = "Configurations used by the Hudi Table Service Manager.") +public class HoodieTableServiceManagerConfig extends HoodieConfig { + + public static final String TABLE_SERVICE_MANAGER_PREFIX = "hoodie.table.service.manager"; + + public static final ConfigProperty TABLE_SERVICE_MANAGER_ENABLE = ConfigProperty + .key(TABLE_SERVICE_MANAGER_PREFIX + ".enable") + .defaultValue(false) + .withDocumentation("Use table manager service to execute table service"); + + public static final ConfigProperty TABLE_SERVICE_MANAGER_URIS = ConfigProperty + .key(TABLE_SERVICE_MANAGER_PREFIX + ".uris") + .defaultValue("http://localhost:9091";) + .withDocumentation("Table service manager uris"); + + public static final ConfigProperty TABLE_SERVICE_MANAGER_ACTIONS = ConfigProperty + .key(TABLE_SERVICE_MANAGER_PREFIX + ".actions") + .defaultValue("") + .withDocumentation("Which action deploy on table service manager such as compaction:clean, default null"); + + public static final ConfigProperty TABLE_SERVICE_MANAGER_DEPLOY_USERNAME = ConfigProperty + .key(TABLE_SERVICE_MANAGER_PREFIX + ".deploy.username") + .defaultValue("default") + .withDocumentation("The user name to deploy for table service of this table"); + + public static final ConfigProperty TABLE_SERVICE_MANAGER_DEPLOY_QUEUE = ConfigProperty + .key(TABLE_SERVICE_MANAGER_PREFIX + ".deploy.queue") + .defaultValue("default") + .withDocumentation("The queue to deploy for table service of this table"); + + public static final ConfigProperty TABLE_SERVICE_MANAGER_DEPLOY_RESOURCE = ConfigProperty + .key(TABLE_SERVICE_MANAGER_PREFIX + ".deploy.resource") + .defaultValue("4g:4g") + .withDocumentation("The resource to deploy for table service of this table, default driver 4g, executor 4g"); + + public static final ConfigProperty TABLE_SERVICE_MANAGER_DEPLOY_PARALLELISM = ConfigProperty + 
.key(TABLE_SERVICE_MANAGER_PREFIX + ".deploy.parallelism") + .defaultValue(100) + .withDocumentation("The max parallelism to deploy for table service of this table, default 100"); + + public static final ConfigProperty TABLE_SERVICE_MANAGER_DEPLOY_EXECUTION_ENGINE = ConfigProperty + .key(TABLE_SERVICE_MANAGER_PREFIX + ".execution.engine") + .defaultValue("spark") + .withDocumentation("The execution engine to deploy for table service of this table, default spark"); + + public static final ConfigProperty TABLE_SERVICE_MANAGER_DEPLOY_EXTRA_PARAMS = ConfigProperty + .key(TABLE_SERVICE_MANAGER_PREFIX + ".deploy.extra.params") + .defaultValue("") + .withDocumentation("The extra params to deploy for table service of this table, split by ';'"); + + public static final ConfigProperty TABLE_SERVICE_MANAGER_TIMEOUT = ConfigProperty + .key(TABLE_SERVICE_MANAGER_PREFIX + ".timeout") + .defaultValue(300) + .withDocumentation("Connection timeout for client"); + + public static final ConfigProperty TABLE_SERVICE_MANAGER_RETRIES = ConfigProperty + .key(TABLE_SERVICE_MANAGER_PREFIX + ".connect.retries") + .defaultValue(3) + .withDocumentation("Number of retries while opening a connection to table service manager"); + + public static final ConfigProperty TABLE_SERVICE_MANAGER_RETRY_DELAY = ConfigProperty + .key(TABLE_SERVICE_MANAGER_PREFIX + ".connect.retry.delay") + .defaultValue(1) + .withDocumentation("Number of seconds for the client to wait between consecutive connection
[GitHub] [hudi] yuzhaojing commented on a diff in pull request #6732: [HUDI-4148] Add client for hudi table service manager
yuzhaojing commented on code in PR #6732: URL: https://github.com/apache/hudi/pull/6732#discussion_r1058305966 ## hudi-common/src/main/java/org/apache/hudi/common/config/HoodieTableServiceManagerConfig.java: ## @@ -0,0 +1,187 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.common.config; + +import org.apache.hudi.common.model.ActionType; + +import javax.annotation.concurrent.Immutable; + +import java.util.Properties; + +/** + * Configurations used by the Hudi Table Service Manager. 
+ */ +@Immutable +@ConfigClassProperty(name = "Table Service Manager Configs", +groupName = ConfigGroups.Names.WRITE_CLIENT, +description = "Configurations used by the Hudi Table Service Manager.") +public class HoodieTableServiceManagerConfig extends HoodieConfig { + + public static final String TABLE_SERVICE_MANAGER_PREFIX = "hoodie.table.service.manager"; + + public static final ConfigProperty TABLE_SERVICE_MANAGER_ENABLE = ConfigProperty + .key(TABLE_SERVICE_MANAGER_PREFIX + ".enable") + .defaultValue(false) + .withDocumentation("Use table manager service to execute table service"); + + public static final ConfigProperty TABLE_SERVICE_MANAGER_URIS = ConfigProperty + .key(TABLE_SERVICE_MANAGER_PREFIX + ".uris") + .defaultValue("http://localhost:9091";) + .withDocumentation("Table service manager uris"); + + public static final ConfigProperty TABLE_SERVICE_MANAGER_ACTIONS = ConfigProperty + .key(TABLE_SERVICE_MANAGER_PREFIX + ".actions") + .defaultValue("") + .withDocumentation("Which action deploy on table service manager such as compaction:clean, default null"); + + public static final ConfigProperty TABLE_SERVICE_MANAGER_DEPLOY_USERNAME = ConfigProperty + .key(TABLE_SERVICE_MANAGER_PREFIX + ".deploy.username") + .defaultValue("default") + .withDocumentation("The user name to deploy for table service of this table"); + + public static final ConfigProperty TABLE_SERVICE_MANAGER_DEPLOY_QUEUE = ConfigProperty + .key(TABLE_SERVICE_MANAGER_PREFIX + ".deploy.queue") + .defaultValue("default") + .withDocumentation("The queue to deploy for table service of this table"); + + public static final ConfigProperty TABLE_SERVICE_MANAGER_DEPLOY_RESOURCE = ConfigProperty + .key(TABLE_SERVICE_MANAGER_PREFIX + ".deploy.resource") + .defaultValue("4g:4g") + .withDocumentation("The resource to deploy for table service of this table, default driver 4g, executor 4g"); + + public static final ConfigProperty TABLE_SERVICE_MANAGER_DEPLOY_PARALLELISM = ConfigProperty + 
.key(TABLE_SERVICE_MANAGER_PREFIX + ".deploy.parallelism") + .defaultValue(100) + .withDocumentation("The max parallelism to deploy for table service of this table, default 100"); + + public static final ConfigProperty TABLE_SERVICE_MANAGER_DEPLOY_EXECUTION_ENGINE = ConfigProperty + .key(TABLE_SERVICE_MANAGER_PREFIX + ".execution.engine") + .defaultValue("spark") + .withDocumentation("The execution engine to deploy for table service of this table, default spark"); + + public static final ConfigProperty TABLE_SERVICE_MANAGER_DEPLOY_EXTRA_PARAMS = ConfigProperty + .key(TABLE_SERVICE_MANAGER_PREFIX + ".deploy.extra.params") + .defaultValue("") + .withDocumentation("The extra params to deploy for table service of this table, split by ';'"); + + public static final ConfigProperty TABLE_SERVICE_MANAGER_TIMEOUT = ConfigProperty + .key(TABLE_SERVICE_MANAGER_PREFIX + ".timeout") + .defaultValue(300) + .withDocumentation("Connection timeout for client"); + + public static final ConfigProperty TABLE_SERVICE_MANAGER_RETRIES = ConfigProperty + .key(TABLE_SERVICE_MANAGER_PREFIX + ".connect.retries") + .defaultValue(3) + .withDocumentation("Number of retries while opening a connection to table service manager"); + + public static final ConfigProperty TABLE_SERVICE_MANAGER_RETRY_DELAY = ConfigProperty + .key(TABLE_SERVICE_MANAGER_PREFIX + ".connect.retry.delay") + .defaultValue(1) + .withDocumentation("Number of seconds for the client to wait between consecutive connection
[GitHub] [hudi] yuzhaojing commented on a diff in pull request #6732: [HUDI-4148] Add client for hudi table service manager
yuzhaojing commented on code in PR #6732: URL: https://github.com/apache/hudi/pull/6732#discussion_r1058314660 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/HoodieTableServiceManagerClient.java: ## @@ -0,0 +1,179 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hudi.client; + +import org.apache.hudi.common.config.HoodieTableServiceManagerConfig; +import org.apache.hudi.common.table.HoodieTableMetaClient; +import org.apache.hudi.common.table.timeline.HoodieInstant; +import org.apache.hudi.common.util.ClusteringUtils; +import org.apache.hudi.common.util.Option; +import org.apache.hudi.common.util.RetryHelper; +import org.apache.hudi.common.util.StringUtils; +import org.apache.hudi.common.util.ValidationUtils; +import org.apache.hudi.exception.HoodieException; +import org.apache.hudi.exception.HoodieRemoteException; + +import org.apache.http.client.fluent.Request; +import org.apache.http.client.utils.URIBuilder; +import org.apache.logging.log4j.LogManager; +import org.apache.logging.log4j.Logger; + +import java.io.IOException; +import java.net.URI; +import java.net.URISyntaxException; +import java.util.HashMap; +import java.util.Map; + +/** + * Client which send the table service instants to the table service manager. + */ +public class HoodieTableServiceManagerClient { + + /** + * Rollback commands, that trigger a specific handling for rollback. + */ Review Comment: Fixed.
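The `connect.retries` / `connect.retry.delay` settings quoted earlier, together with the `RetryHelper` import in this client, imply that each call to the table service manager is wrapped in retry-with-delay. The sketch below is a generic version of that pattern under those assumptions — it is not Hudi's actual `RetryHelper` API, whose signature is not shown in this thread:

```java
public class RetryDemo {
  // Functional interface for a call that may fail with a checked exception.
  interface Call<T> {
    T run() throws Exception;
  }

  // Run `call` once, then retry up to `retries` more times, sleeping
  // `delayMillis` between attempts; rethrow the last failure if all fail.
  static <T> T withRetries(int retries, long delayMillis, Call<T> call) throws Exception {
    Exception last = null;
    for (int attempt = 0; attempt <= retries; attempt++) {
      try {
        return call.run();
      } catch (Exception ex) {
        last = ex;
        Thread.sleep(delayMillis); // wait before the next attempt
      }
    }
    throw last;
  }

  public static void main(String[] args) throws Exception {
    int[] failuresLeft = {2}; // simulate two connection failures, then success
    String status = withRetries(3, 0, () -> {
      if (failuresLeft[0]-- > 0) {
        throw new RuntimeException("connection refused");
      }
      return "connected";
    });
    System.out.println(status); // connected
  }
}
```

Bounding the retries and rethrowing the final exception keeps a permanently unreachable manager from hanging the writer, while the delay gives a transiently unavailable one time to recover.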
[GitHub] [hudi] hudi-bot commented on pull request #7550: [MINOR] Avoid running tests as part of bundle uploads
hudi-bot commented on PR #7550: URL: https://github.com/apache/hudi/pull/7550#issuecomment-1366639613

## CI report:

* 00ea42c662445ccce81e186d9f857564c2ce5c7f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14009)
[GitHub] [hudi] hudi-bot commented on pull request #7568: [HUDI-5341] CleanPlanner retains earliest commits must not be later than earliest pending commit
hudi-bot commented on PR #7568: URL: https://github.com/apache/hudi/pull/7568#issuecomment-1366639697

## CI report:

* cc53c8c67d4e86b4be6dafdf25cafae87c1ed152 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14011)
[GitHub] [hudi] hudi-bot commented on pull request #6732: [HUDI-4148] Add client for hudi table service manager
hudi-bot commented on PR #6732: URL: https://github.com/apache/hudi/pull/6732#issuecomment-1366643203 ## CI report: * 64ecea100e226b7fd539cab05c03bc9902e36db1 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #7385: [HUDI-5332] HiveSyncTool can avoid initializing all permanent custom functions of Hive
hudi-bot commented on PR #7385: URL: https://github.com/apache/hudi/pull/7385#issuecomment-1366643860 ## CI report: * 315fbe3897a36665ddad6b0a0d723bf7db0b9e5c Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13477) * 9882c15708236cd4b66a9c54329f055db846ade8 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6732: [HUDI-4148] Add client for hudi table service manager
hudi-bot commented on PR #6732: URL: https://github.com/apache/hudi/pull/6732#issuecomment-1366647277 ## CI report: * 64ecea100e226b7fd539cab05c03bc9902e36db1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14015)
[GitHub] [hudi] hudi-bot commented on pull request #7385: [HUDI-5332] HiveSyncTool can avoid initializing all permanent custom functions of Hive
hudi-bot commented on PR #7385: URL: https://github.com/apache/hudi/pull/7385#issuecomment-1366647737 ## CI report: * 315fbe3897a36665ddad6b0a0d723bf7db0b9e5c Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13477) * 9882c15708236cd4b66a9c54329f055db846ade8 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14016)
[GitHub] [hudi] viverlxl commented on issue #7162: [SUPPORT] Flink stream api(HoodieFlinkStreamer) write data to hudi create much rollbackfile
viverlxl commented on issue #7162: URL: https://github.com/apache/hudi/issues/7162#issuecomment-1366653860 @yihua Yes, this problem has been solved. A new HoodieClient is created in the OperatorCoordinator, StreamWriteFunction, and compact function when the schema changes, and the new schema is also synced to Hive... We run in a production environment, but writing many different tables to Hudi causes performance problems.
[GitHub] [hudi] minihippo opened a new pull request, #7572: Make retryhelper more suitable for common use.
minihippo opened a new pull request, #7572: URL: https://github.com/apache/hudi/pull/7572 ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ ### Impact _Describe any public API or user-facing feature change or any performance impact._ ### Risk level (write none, low medium or high below) _If medium or high, explain what verification was done to mitigate the risks._ ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change_ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
[jira] [Created] (HUDI-5483) Make RetryHelper more suitable for common use
XiaoyuGeng created HUDI-5483: Summary: Make RetryHelper more suitable for common use Key: HUDI-5483 URL: https://issues.apache.org/jira/browse/HUDI-5483 Project: Apache Hudi Issue Type: Improvement Components: core Reporter: XiaoyuGeng Assignee: XiaoyuGeng Fix For: 0.13.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-5484) Avoid using GenericRecord in ColumnStatMetadata
dzcxzl created HUDI-5484: Summary: Avoid using GenericRecord in ColumnStatMetadata Key: HUDI-5484 URL: https://issues.apache.org/jira/browse/HUDI-5484 Project: Apache Hudi Issue Type: Bug Reporter: dzcxzl {code:java} org.apache.hudi.com.esotericsoftware.kryo.KryoException: java.lang.UnsupportedOperationException Serialization trace: reserved (org.apache.avro.Schema$Field) fieldMap (org.apache.avro.Schema$RecordSchema) schema (org.apache.avro.generic.GenericData$Record) maxValue (org.apache.hudi.avro.model.HoodieMetadataColumnStats) columnStatMetadata (org.apache.hudi.metadata.HoodieMetadataPayload) at org.apache.hudi.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:144) at org.apache.hudi.common.model.HoodieAvroRecord.readRecordPayload(HoodieAvroRecord.java:232) at org.apache.hudi.common.model.HoodieAvroRecord.readRecordPayload(HoodieAvroRecord.java:45) at org.apache.hudi.common.model.HoodieRecord.read(HoodieRecord.java:339) at org.apache.hudi.com.esotericsoftware.kryo.serializers.DefaultSerializers$KryoSerializableSerializer.read(DefaultSerializers.java:520) at org.apache.hudi.com.esotericsoftware.kryo.serializers.DefaultSerializers$KryoSerializableSerializer.read(DefaultSerializers.java:512) at org.apache.hudi.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813) at org.apache.hudi.common.util.SerializationUtils$KryoSerializerInstance.deserialize(SerializationUtils.java:101) at org.apache.hudi.common.util.SerializationUtils.deserialize(SerializationUtils.java:75) at org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:210) at org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:203) at org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:199) at org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:68) at org.apache.hudi.common.util.collection.ExternalSpillableMap.get(ExternalSpillableMap.java:195) at 
org.apache.hudi.common.util.collection.ExternalSpillableMap.get(ExternalSpillableMap.java:54) at org.apache.hudi.io.HoodieCreateHandle.write(HoodieCreateHandle.java:188) at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.handleInsert(HoodieSparkCopyOnWriteTable.java:257) at org.apache.hudi.table.action.compact.CompactionExecutionHelper.writeFileAndGetWriteStats(CompactionExecutionHelper.java:68) at org.apache.hudi.table.action.compact.HoodieCompactor.compact(HoodieCompactor.java:231) at org.apache.hudi.table.action.compact.HoodieCompactor.lambda$compact$9cd4b1be$1(HoodieCompactor.java:129) at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)Caused by: java.lang.UnsupportedOperationException at java.util.Collections$UnmodifiableCollection.add(Collections.java:1055) at org.apache.hudi.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134) at org.apache.hudi.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40) at org.apache.hudi.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:731) at org.apache.hudi.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125) {code}
[jira] [Updated] (HUDI-5484) Avoid using GenericRecord in ColumnStatMetadata
[ https://issues.apache.org/jira/browse/HUDI-5484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dzcxzl updated HUDI-5484: - Description: {code:java} org.apache.hudi.com.esotericsoftware.kryo.KryoException: java.lang.UnsupportedOperationException Serialization trace: reserved (org.apache.avro.Schema$Field) fieldMap (org.apache.avro.Schema$RecordSchema) schema (org.apache.avro.generic.GenericData$Record) maxValue (org.apache.hudi.avro.model.HoodieMetadataColumnStats) columnStatMetadata (org.apache.hudi.metadata.HoodieMetadataPayload) at org.apache.hudi.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:144) at org.apache.hudi.common.model.HoodieAvroRecord.readRecordPayload(HoodieAvroRecord.java:232) at org.apache.hudi.common.model.HoodieAvroRecord.readRecordPayload(HoodieAvroRecord.java:45) at org.apache.hudi.common.model.HoodieRecord.read(HoodieRecord.java:339) at org.apache.hudi.com.esotericsoftware.kryo.serializers.DefaultSerializers$KryoSerializableSerializer.read(DefaultSerializers.java:520) at org.apache.hudi.com.esotericsoftware.kryo.serializers.DefaultSerializers$KryoSerializableSerializer.read(DefaultSerializers.java:512) at org.apache.hudi.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813) at org.apache.hudi.common.util.SerializationUtils$KryoSerializerInstance.deserialize(SerializationUtils.java:101) at org.apache.hudi.common.util.SerializationUtils.deserialize(SerializationUtils.java:75) at org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:210) at org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:203) at org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:199) at org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:68) at org.apache.hudi.common.util.collection.ExternalSpillableMap.get(ExternalSpillableMap.java:195) at 
org.apache.hudi.common.util.collection.ExternalSpillableMap.get(ExternalSpillableMap.java:54) at org.apache.hudi.io.HoodieCreateHandle.write(HoodieCreateHandle.java:188) at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.handleInsert(HoodieSparkCopyOnWriteTable.java:257) at org.apache.hudi.table.action.compact.CompactionExecutionHelper.writeFileAndGetWriteStats(CompactionExecutionHelper.java:68) at org.apache.hudi.table.action.compact.HoodieCompactor.compact(HoodieCompactor.java:231) at org.apache.hudi.table.action.compact.HoodieCompactor.lambda$compact$9cd4b1be$1(HoodieCompactor.java:129) at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)Caused by: java.lang.UnsupportedOperationException at java.util.Collections$UnmodifiableCollection.add(Collections.java:1055) at org.apache.hudi.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134) at org.apache.hudi.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40) at org.apache.hudi.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:731) at org.apache.hudi.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125){code} was: {code:java} org.apache.hudi.com.esotericsoftware.kryo.KryoException: java.lang.UnsupportedOperationException Serialization trace: reserved (org.apache.avro.Schema$Field) fieldMap (org.apache.avro.Schema$RecordSchema) schema (org.apache.avro.generic.GenericData$Record) maxValue (org.apache.hudi.avro.model.HoodieMetadataColumnStats) columnStatMetadata (org.apache.hudi.metadata.HoodieMetadataPayload) at org.apache.hudi.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:144) at org.apache.hudi.common.model.HoodieAvroRecord.readRecordPayload(HoodieAvroRecord.java:232) at org.apache.hudi.common.model.HoodieAvroRecord.readRecordPayload(HoodieAvroRecord.java:45) at org.apache.hudi.common.model.HoodieRecord.read(HoodieRecord.java:339) at 
org.apache.hudi.com.esotericsoftware.kryo.serializers.DefaultSerializers$KryoSerializableSerializer.read(DefaultSerializers.java:520) at org.apache.hudi.com.esotericsoftware.kryo.serializers.DefaultSerializers$KryoSerializableSerializer.read(DefaultSerializers.java:512) at org.apache.hudi.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813) at org.apache.hudi.common.util.SerializationUtils$KryoSerializerInstance.deserialize(SerializationUtils.java:101) at org.apache.hudi.common.util.SerializationUtils.deserialize(SerializationUtils.java:75) at org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:210) at org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:203) at org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:199) at org.apache.hudi.co
[GitHub] [hudi] cxzl25 opened a new pull request, #7573: [HUDI-5484] Avoid using GenericRecord in ColumnStatMetadata
cxzl25 opened a new pull request, #7573: URL: https://github.com/apache/hudi/pull/7573 ### Change Logs Avoid using GenericRecord in ColumnStatMetadata. `HoodieMetadataPayload` is constructed using `GenericRecord` with reflection, and `columnStatMetadata` stores `minValue` and `maxValue`, both of which are `GenericRecord` types. Once spill is generated, kryo deserialization fails. Write fail log ```java org.apache.hudi.com.esotericsoftware.kryo.KryoException: java.lang.UnsupportedOperationException Serialization trace: reserved (org.apache.avro.Schema$Field) fieldMap (org.apache.avro.Schema$RecordSchema) schema (org.apache.avro.generic.GenericData$Record) maxValue (org.apache.hudi.avro.model.HoodieMetadataColumnStats) columnStatMetadata (org.apache.hudi.metadata.HoodieMetadataPayload) at org.apache.hudi.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:144) at org.apache.hudi.common.model.HoodieAvroRecord.readRecordPayload(HoodieAvroRecord.java:232) at org.apache.hudi.common.model.HoodieAvroRecord.readRecordPayload(HoodieAvroRecord.java:45) at org.apache.hudi.common.model.HoodieRecord.read(HoodieRecord.java:339) at org.apache.hudi.com.esotericsoftware.kryo.serializers.DefaultSerializers$KryoSerializableSerializer.read(DefaultSerializers.java:520) at org.apache.hudi.com.esotericsoftware.kryo.serializers.DefaultSerializers$KryoSerializableSerializer.read(DefaultSerializers.java:512) at org.apache.hudi.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813) at org.apache.hudi.common.util.SerializationUtils$KryoSerializerInstance.deserialize(SerializationUtils.java:101) at org.apache.hudi.common.util.SerializationUtils.deserialize(SerializationUtils.java:75) at org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:210) at org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:203) at org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:199) at 
org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:68) at org.apache.hudi.common.util.collection.ExternalSpillableMap.get(ExternalSpillableMap.java:195) at org.apache.hudi.common.util.collection.ExternalSpillableMap.get(ExternalSpillableMap.java:54) at org.apache.hudi.io.HoodieCreateHandle.write(HoodieCreateHandle.java:188) at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.handleInsert(HoodieSparkCopyOnWriteTable.java:257) at org.apache.hudi.table.action.compact.CompactionExecutionHelper.writeFileAndGetWriteStats(CompactionExecutionHelper.java:68) at org.apache.hudi.table.action.compact.HoodieCompactor.compact(HoodieCompactor.java:231) at org.apache.hudi.table.action.compact.HoodieCompactor.lambda$compact$9cd4b1be$1(HoodieCompactor.java:129) at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070) Caused by: java.lang.UnsupportedOperationException at java.util.Collections$UnmodifiableCollection.add(Collections.java:1055) at org.apache.hudi.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134) at org.apache.hudi.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40) at org.apache.hudi.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:731) at org.apache.hudi.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125) ``` construct HoodieMetadataPayload ```java at org.apache.hudi.metadata.HoodieMetadataPayload.(HoodieMetadataPayload.java:233) at org.apache.hudi.metadata.HoodieMetadataPayload.(HoodieMetadataPayload.java:182) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at 
org.apache.hudi.common.util.HoodieRecordUtils.loadPayload(HoodieRecordUtils.java:99) at org.apache.hudi.common.util.SpillableMapUtils.convertToHoodieRecordPayload(SpillableMapUtils.java:140) at org.apache.hudi.avro.HoodieAvroUtils.createHoodieRecordFromAvro(HoodieAvroUtils.java:1078) at org.apache.hudi.common.model.HoodieAvroIndexedRecord.wrapIntoHoodieRecordPayloadWithParams(HoodieAvroIndexedRecord.java:168) at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.processDataBlock(AbstractHoodieLogRecordReader.java:644) at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.processQueuedBlocksForInstant(Abstr
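The root cause at the bottom of the trace — `java.util.Collections$UnmodifiableCollection.add` throwing — is Kryo's `CollectionSerializer` trying to repopulate an unmodifiable collection while rebuilding Avro's `Schema$Field` (whose `reserved` set is unmodifiable). A minimal JDK-only illustration of that failure mode (not Hudi or Kryo code):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Demonstrates why element-by-element deserialization into an unmodifiable
// collection view fails with UnsupportedOperationException.
public class UnmodifiableCollectionDemo {

  // Returns the simple name of the exception thrown when mutating the view.
  static String demo() {
    List<String> reserved = Collections.unmodifiableList(new ArrayList<>());
    try {
      // This is what a collection deserializer effectively attempts when it
      // rebuilds the collection one element at a time.
      reserved.add("doc");
      return "no exception";
    } catch (UnsupportedOperationException e) {
      return e.getClass().getSimpleName();
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

Avoiding `GenericRecord` (and its embedded `Schema`) in the payload, as this PR does, sidesteps the problem because the serialized form no longer contains these unmodifiable Avro internals.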
[jira] [Updated] (HUDI-5484) Avoid using GenericRecord in ColumnStatMetadata
[ https://issues.apache.org/jira/browse/HUDI-5484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-5484: - Labels: pull-request-available (was: ) > Avoid using GenericRecord in ColumnStatMetadata > --- > > Key: HUDI-5484 > URL: https://issues.apache.org/jira/browse/HUDI-5484 > Project: Apache Hudi > Issue Type: Bug >Reporter: dzcxzl >Priority: Critical > Labels: pull-request-available > > > > {code:java} > org.apache.hudi.com.esotericsoftware.kryo.KryoException: > java.lang.UnsupportedOperationException > Serialization trace: > reserved (org.apache.avro.Schema$Field) > fieldMap (org.apache.avro.Schema$RecordSchema) > schema (org.apache.avro.generic.GenericData$Record) > maxValue (org.apache.hudi.avro.model.HoodieMetadataColumnStats) > columnStatMetadata (org.apache.hudi.metadata.HoodieMetadataPayload) > at > org.apache.hudi.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:144) > > at > org.apache.hudi.common.model.HoodieAvroRecord.readRecordPayload(HoodieAvroRecord.java:232) > at > org.apache.hudi.common.model.HoodieAvroRecord.readRecordPayload(HoodieAvroRecord.java:45) > at org.apache.hudi.common.model.HoodieRecord.read(HoodieRecord.java:339) > at > org.apache.hudi.com.esotericsoftware.kryo.serializers.DefaultSerializers$KryoSerializableSerializer.read(DefaultSerializers.java:520) > at > org.apache.hudi.com.esotericsoftware.kryo.serializers.DefaultSerializers$KryoSerializableSerializer.read(DefaultSerializers.java:512) > at > org.apache.hudi.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813) > at > org.apache.hudi.common.util.SerializationUtils$KryoSerializerInstance.deserialize(SerializationUtils.java:101) > at > org.apache.hudi.common.util.SerializationUtils.deserialize(SerializationUtils.java:75) > at > org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:210) > at > org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:203) > at 
> org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:199) > at > org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:68) > at > org.apache.hudi.common.util.collection.ExternalSpillableMap.get(ExternalSpillableMap.java:195) > at > org.apache.hudi.common.util.collection.ExternalSpillableMap.get(ExternalSpillableMap.java:54) > at > org.apache.hudi.io.HoodieCreateHandle.write(HoodieCreateHandle.java:188) > at > org.apache.hudi.table.HoodieSparkCopyOnWriteTable.handleInsert(HoodieSparkCopyOnWriteTable.java:257) > at > org.apache.hudi.table.action.compact.CompactionExecutionHelper.writeFileAndGetWriteStats(CompactionExecutionHelper.java:68) > at > org.apache.hudi.table.action.compact.HoodieCompactor.compact(HoodieCompactor.java:231) > at > org.apache.hudi.table.action.compact.HoodieCompactor.lambda$compact$9cd4b1be$1(HoodieCompactor.java:129) > at > org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)Caused > by: java.lang.UnsupportedOperationException > at java.util.Collections$UnmodifiableCollection.add(Collections.java:1055) > at > org.apache.hudi.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134) > at > org.apache.hudi.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40) > at > org.apache.hudi.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:731) > at > org.apache.hudi.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125){code} >
[GitHub] [hudi] hudi-bot commented on pull request #7571: [HUDI-4710]Fix flaky: TestKeyRangeLookupTree#testFileGroupLookUpManyEntriesWithSameStartValue
hudi-bot commented on PR #7571: URL: https://github.com/apache/hudi/pull/7571#issuecomment-1366696718 ## CI report: * 2baf115f1b4d1474fe8161919797ae68f979fc80 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14013)
[GitHub] [hudi] hudi-bot commented on pull request #5064: [HUDI-3654] Add new module `hudi-metaserver`
hudi-bot commented on PR #5064: URL: https://github.com/apache/hudi/pull/5064#issuecomment-1366698891 ## CI report: * 53aa21bf23d2f8b0404743e6d016cfb2fac444f7 UNKNOWN * 07a3ea3956e5ce02a33a55eae4a0339796275f9d UNKNOWN * 810af96ee856bd94cfc82b01b67765a735f29c44 UNKNOWN * 1c1faa8f063f98023921e4f6015ac2f28adde2b1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13967) * a616f0831d6dfd7a55b5f52331e43d87374e UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #7572: [HUDI-5483]Make retryhelper more suitable for common use.
hudi-bot commented on PR #7572: URL: https://github.com/apache/hudi/pull/7572#issuecomment-1366700583 ## CI report: * 5a4fae5d3d42446c19894406dc53a4d7327a9b48 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #7573: [HUDI-5484] Avoid using GenericRecord in ColumnStatMetadata
hudi-bot commented on PR #7573: URL: https://github.com/apache/hudi/pull/7573#issuecomment-1366700614 ## CI report: * bfec3be3263c21f7533fc16e19eec3598617a5bf UNKNOWN
[jira] [Updated] (HUDI-5483) Make RetryHelper more suitable for common use
[ https://issues.apache.org/jira/browse/HUDI-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-5483: - Labels: pull-request-available (was: ) > Make RetryHelper more suitable for common use > - > > Key: HUDI-5483 > URL: https://issues.apache.org/jira/browse/HUDI-5483 > Project: Apache Hudi > Issue Type: Improvement > Components: core >Reporter: XiaoyuGeng >Assignee: XiaoyuGeng >Priority: Minor > Labels: pull-request-available > Fix For: 0.13.0 > >
[GitHub] [hudi] hudi-bot commented on pull request #5064: [HUDI-3654] Add new module `hudi-metaserver`
hudi-bot commented on PR #5064: URL: https://github.com/apache/hudi/pull/5064#issuecomment-1366702795 ## CI report: * 53aa21bf23d2f8b0404743e6d016cfb2fac444f7 UNKNOWN * 07a3ea3956e5ce02a33a55eae4a0339796275f9d UNKNOWN * 810af96ee856bd94cfc82b01b67765a735f29c44 UNKNOWN * 1c1faa8f063f98023921e4f6015ac2f28adde2b1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13967) * a616f0831d6dfd7a55b5f52331e43d87374e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14017)
[GitHub] [hudi] hudi-bot commented on pull request #7572: [HUDI-5483]Make retryhelper more suitable for common use.
hudi-bot commented on PR #7572: URL: https://github.com/apache/hudi/pull/7572#issuecomment-1366704389 ## CI report: * 5a4fae5d3d42446c19894406dc53a4d7327a9b48 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14018)
[GitHub] [hudi] hudi-bot commented on pull request #7573: [HUDI-5484] Avoid using GenericRecord in ColumnStatMetadata
hudi-bot commented on PR #7573: URL: https://github.com/apache/hudi/pull/7573#issuecomment-1366704453 ## CI report: * bfec3be3263c21f7533fc16e19eec3598617a5bf Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14019)
[GitHub] [hudi] soumilshah1995 commented on issue #7430: [BUG] MOR Table Hard Deletes Create issue with Athena Querying RT Tables
soumilshah1995 commented on issue #7430: URL: https://github.com/apache/hudi/issues/7430#issuecomment-1366712825 Roger that captain :D
[GitHub] [hudi] maddy2u commented on issue #4502: [QUESTION] Athena Hudi Time Travel Queries
maddy2u commented on issue #4502: URL: https://github.com/apache/hudi/issues/4502#issuecomment-1366718510 Hi, is this accessible today via AWS Athena? Has anyone tried it?
[GitHub] [hudi] soumilshah1995 commented on issue #7459: [SUPPORT] Glue 3.0 with HUDI marketplace Connector
soumilshah1995 commented on issue #7459: URL: https://github.com/apache/hudi/issues/7459#issuecomment-1366746652 Closing this ticket as it's resolved after speaking to support :D cheers
[GitHub] [hudi] soumilshah1995 closed issue #7459: [SUPPORT] Glue 3.0 with HUDI marketplace Connector
soumilshah1995 closed issue #7459: [SUPPORT] Glue 3.0 with HUDI marketplace Connector URL: https://github.com/apache/hudi/issues/7459
[GitHub] [hudi] hudi-bot commented on pull request #5064: [HUDI-3654] Add new module `hudi-metaserver`
hudi-bot commented on PR #5064: URL: https://github.com/apache/hudi/pull/5064#issuecomment-1366754283 ## CI report: * 53aa21bf23d2f8b0404743e6d016cfb2fac444f7 UNKNOWN * 07a3ea3956e5ce02a33a55eae4a0339796275f9d UNKNOWN * 810af96ee856bd94cfc82b01b67765a735f29c44 UNKNOWN * a616f0831d6dfd7a55b5f52331e43d87374e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14017)
[GitHub] [hudi] hudi-bot commented on pull request #6732: [HUDI-4148] Add client for hudi table service manager
hudi-bot commented on PR #6732: URL: https://github.com/apache/hudi/pull/6732#issuecomment-1366755308 ## CI report: * 64ecea100e226b7fd539cab05c03bc9902e36db1 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14015)
[GitHub] [hudi] xushiyan commented on a diff in pull request #7572: [HUDI-5483]Make retryhelper more suitable for common use.
xushiyan commented on code in PR #7572: URL: https://github.com/apache/hudi/pull/7572#discussion_r1058438675 ## hudi-common/src/main/java/org/apache/hudi/common/util/RetryHelper.java: ## @@ -36,9 +36,10 @@ * * @param <T> Type of return value for checked function. */ -public class RetryHelper implements Serializable { +public class RetryHelper implements Serializable { private static final Logger LOG = LogManager.getLogger(RetryHelper.class); - private transient CheckedFunction<T> func; + private static final List<Class<? extends Exception>> RETRY_EXCEPTION_CLASS = Arrays.asList(IOException.class, RuntimeException.class); Review Comment: better name: `DEFAULT_RETRY_EXCEPTIONS` ## hudi-common/src/main/java/org/apache/hudi/common/util/RetryHelper.java: ## @@ -120,7 +118,7 @@ private boolean checkIfExceptionInRetryList(Exception e) { // if users didn't set hoodie.filesystem.operation.retry.exceptions // we will retry all the IOException and RuntimeException -if (retryExceptionsClasses.isEmpty()) { +if (retryExceptionsClasses.equals(RETRY_EXCEPTION_CLASS)) { return true; } Review Comment: but this check being true does not mean `e` is in the list, does it? this check looks redundant now, given you've set a default list of exceptions.
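A minimal, hypothetical sketch (not Hudi's actual `RetryHelper`) of the exception-matching logic under discussion: with a non-empty default list of retryable exception classes, membership in the list alone decides whether to retry, which is why the separate emptiness/equality shortcut reads as redundant. Class and method names here are illustrative only.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: decide whether a thrown exception should trigger a
// retry by matching it, and its cause chain, against a configured list of
// exception classes. The default list retries IOException and RuntimeException.
public class RetryCheck {
  static final List<Class<? extends Exception>> DEFAULT_RETRY_EXCEPTIONS =
      Arrays.asList(IOException.class, RuntimeException.class);

  static boolean shouldRetry(Exception e, List<Class<? extends Exception>> retryOn) {
    Throwable t = e;
    // Walk the cause chain so a wrapped retryable exception still matches.
    while (t != null) {
      for (Class<? extends Exception> clazz : retryOn) {
        if (clazz.isInstance(t)) {
          return true;
        }
      }
      t = t.getCause();
    }
    return false;
  }

  public static void main(String[] args) {
    // An IOException buried in the cause chain still matches the default list.
    Exception wrapped = new Exception(new IOException("transient"));
    System.out.println(shouldRetry(wrapped, DEFAULT_RETRY_EXCEPTIONS));
    // InterruptedException is neither an IOException nor a RuntimeException.
    System.out.println(shouldRetry(new InterruptedException(), DEFAULT_RETRY_EXCEPTIONS));
  }
}
```

Because `DEFAULT_RETRY_EXCEPTIONS` is never empty, no early-return guard is needed before the membership check, matching the reviewer's observation.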
[GitHub] [hudi] xushiyan commented on a diff in pull request #6732: [HUDI-4148] Add client for hudi table service manager
xushiyan commented on code in PR #6732: URL: https://github.com/apache/hudi/pull/6732#discussion_r1058454424 ## hudi-common/src/main/java/org/apache/hudi/common/config/HoodieTableServiceManagerConfig.java: ## @@ -0,0 +1,187 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.common.config; + +import org.apache.hudi.common.model.ActionType; + +import javax.annotation.concurrent.Immutable; + +import java.util.Properties; + +/** + * Configurations used by the Hudi Table Service Manager. 
+ */ +@Immutable +@ConfigClassProperty(name = "Table Service Manager Configs", +groupName = ConfigGroups.Names.WRITE_CLIENT, +description = "Configurations used by the Hudi Table Service Manager.") +public class HoodieTableServiceManagerConfig extends HoodieConfig { + + public static final String TABLE_SERVICE_MANAGER_PREFIX = "hoodie.table.service.manager"; + + public static final ConfigProperty<Boolean> TABLE_SERVICE_MANAGER_ENABLE = ConfigProperty + .key(TABLE_SERVICE_MANAGER_PREFIX + ".enable") + .defaultValue(false) + .withDocumentation("Use table manager service to execute table service"); + + public static final ConfigProperty<String> TABLE_SERVICE_MANAGER_URIS = ConfigProperty + .key(TABLE_SERVICE_MANAGER_PREFIX + ".uris") + .defaultValue("http://localhost:9091") + .withDocumentation("Table service manager uris"); + + public static final ConfigProperty<String> TABLE_SERVICE_MANAGER_ACTIONS = ConfigProperty + .key(TABLE_SERVICE_MANAGER_PREFIX + ".actions") + .defaultValue("") + .withDocumentation("Which action deploy on table service manager such as compaction:clean, default null"); + + public static final ConfigProperty<String> TABLE_SERVICE_MANAGER_DEPLOY_USERNAME = ConfigProperty + .key(TABLE_SERVICE_MANAGER_PREFIX + ".deploy.username") + .defaultValue("default") + .withDocumentation("The user name to deploy for table service of this table"); + + public static final ConfigProperty<String> TABLE_SERVICE_MANAGER_DEPLOY_QUEUE = ConfigProperty + .key(TABLE_SERVICE_MANAGER_PREFIX + ".deploy.queue") + .defaultValue("default") + .withDocumentation("The queue to deploy for table service of this table"); + + public static final ConfigProperty<String> TABLE_SERVICE_MANAGER_DEPLOY_RESOURCE = ConfigProperty + .key(TABLE_SERVICE_MANAGER_PREFIX + ".deploy.resource") + .defaultValue("4g:4g") + .withDocumentation("The resource to deploy for table service of this table, default driver 4g, executor 4g"); Review Comment: we also support java engine. so do you agree with this pattern with engine prefix?
[GitHub] [hudi] yuzhaojing commented on a diff in pull request #6732: [HUDI-4148] Add client for hudi table service manager
yuzhaojing commented on code in PR #6732: URL: https://github.com/apache/hudi/pull/6732#discussion_r1058461044 ## hudi-common/src/main/java/org/apache/hudi/common/config/HoodieTableServiceManagerConfig.java: ## @@ -0,0 +1,187 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.common.config; + +import org.apache.hudi.common.model.ActionType; + +import javax.annotation.concurrent.Immutable; + +import java.util.Properties; + +/** + * Configurations used by the Hudi Table Service Manager. 
+ */ +@Immutable +@ConfigClassProperty(name = "Table Service Manager Configs", +groupName = ConfigGroups.Names.WRITE_CLIENT, +description = "Configurations used by the Hudi Table Service Manager.") +public class HoodieTableServiceManagerConfig extends HoodieConfig { + + public static final String TABLE_SERVICE_MANAGER_PREFIX = "hoodie.table.service.manager"; + + public static final ConfigProperty TABLE_SERVICE_MANAGER_ENABLE = ConfigProperty + .key(TABLE_SERVICE_MANAGER_PREFIX + ".enable") + .defaultValue(false) + .withDocumentation("Use table manager service to execute table service"); + + public static final ConfigProperty TABLE_SERVICE_MANAGER_URIS = ConfigProperty + .key(TABLE_SERVICE_MANAGER_PREFIX + ".uris") + .defaultValue("http://localhost:9091";) + .withDocumentation("Table service manager uris"); + + public static final ConfigProperty TABLE_SERVICE_MANAGER_ACTIONS = ConfigProperty + .key(TABLE_SERVICE_MANAGER_PREFIX + ".actions") + .defaultValue("") + .withDocumentation("Which action deploy on table service manager such as compaction:clean, default null"); + + public static final ConfigProperty TABLE_SERVICE_MANAGER_DEPLOY_USERNAME = ConfigProperty + .key(TABLE_SERVICE_MANAGER_PREFIX + ".deploy.username") + .defaultValue("default") + .withDocumentation("The user name to deploy for table service of this table"); + + public static final ConfigProperty TABLE_SERVICE_MANAGER_DEPLOY_QUEUE = ConfigProperty + .key(TABLE_SERVICE_MANAGER_PREFIX + ".deploy.queue") + .defaultValue("default") + .withDocumentation("The queue to deploy for table service of this table"); + + public static final ConfigProperty TABLE_SERVICE_MANAGER_DEPLOY_RESOURCE = ConfigProperty + .key(TABLE_SERVICE_MANAGER_PREFIX + ".deploy.resource") + .defaultValue("4g:4g") + .withDocumentation("The resource to deploy for table service of this table, default driver 4g, executor 4g"); Review Comment: Agree it. -- This is an automated message from the Apache Git Service. 
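The fluent `ConfigProperty` builder pattern quoted in the review thread above can be illustrated with a minimal, self-contained sketch. This is a hypothetical stand-in, not Hudi's actual `ConfigProperty` class; only the key/default/documentation shape is taken from the diff.

```java
// Hypothetical minimal ConfigProperty-style builder: a config key paired with
// a typed default value and a documentation string, built fluently.
public class ConfigDemo {
  static final class ConfigProperty<T> {
    final String key;
    final T defaultValue;
    final String doc;

    private ConfigProperty(String key, T defaultValue, String doc) {
      this.key = key;
      this.defaultValue = defaultValue;
      this.doc = doc;
    }

    static Builder key(String key) {
      return new Builder(key);
    }

    // Attaching a default value fixes the property's type parameter.
    static final class Builder {
      private final String key;

      Builder(String key) {
        this.key = key;
      }

      <T> ConfigProperty<T> defaultValue(T value) {
        return new ConfigProperty<>(key, value, "");
      }
    }

    ConfigProperty<T> withDocumentation(String doc) {
      return new ConfigProperty<>(key, defaultValue, doc);
    }
  }

  static final String PREFIX = "hoodie.table.service.manager";

  // Mirrors the shape of the ENABLE property in the reviewed diff.
  static final ConfigProperty<Boolean> ENABLE = ConfigProperty
      .key(PREFIX + ".enable")
      .defaultValue(false)
      .withDocumentation("Use table service manager to execute table services");

  public static void main(String[] args) {
    System.out.println(ENABLE.key + " -> " + ENABLE.defaultValue);
  }
}
```

The engine-prefix question in the review is orthogonal to this shape: a per-engine key would simply change the string passed to `key(...)`.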
[GitHub] [hudi] yuzhaojing commented on pull request #6732: [HUDI-4148] Add client for hudi table service manager
yuzhaojing commented on PR #6732: URL: https://github.com/apache/hudi/pull/6732#issuecomment-1366779990 Add BaseTableServiceClient. Move some method from BaseHoodieWriteClient to BaseTableServiceClient. - asyncClean - asyncArchive - inlineCompaction(org.apache.hudi.common.util.Option>) - inlineCompaction(org.apache.hudi.table.HoodieTable, org.apache.hudi.common.util.Option>) - logCompact(java.lang.String, boolean) - inlineLogCompact - runAnyPendingCompactions - runAnyPendingLogCompactions - inlineScheduleCompaction - scheduleCompaction - compact - commitCompaction - completeCompaction - scheduleLogCompaction - scheduleLogCompactionAtInstant - logCompact(java.lang.String) - completeLogCompaction - scheduleCompactionAtInstant - scheduleClustering - scheduleClusteringAtInstant - scheduleCleaning - scheduleCleaningAtInstant - cluster - runTableServicesInline - scheduleTableService - scheduleTableServiceInternal - inlineClustering(org.apache.hudi.common.util.Option>) - inlineClustering(org.apache.hudi.table.HoodieTable, org.apache.hudi.common.util.Option>) - inlineScheduleClustering - runAnyPendingClustering - finalizeWrite - writeTableMetadata - clean - archive - getInflightTimelineExcludeCompactionAndClustering - getPendingRollbackInfo(org.apache.hudi.common.table.HoodieTableMetaClient, java.lang.String) - getPendingRollbackInfo(org.apache.hudi.common.table.HoodieTableMetaClient, java.lang.String, boolean) - getPendingRollbackInfos(org.apache.hudi.common.table.HoodieTableMetaClient) - getPendingRollbackInfos(org.apache.hudi.common.table.HoodieTableMetaClient, boolean) - rollbackFailedWrites() - rollbackFailedWrites(boolean) - rollbackFailedWrites(java.util.Map>, boolean) - getInstantsToRollback - rollback - rollbackFailedBootstrap Move HoodieMetrics and TransactionManager from BaseHoodieWriteClient to BaseHoodieClient. -- This is an automated message from the Apache Git Service. 
[GitHub] [hudi] amitbans opened a new issue, #7574: [SUPPORT]
amitbans opened a new issue, #7574: URL: https://github.com/apache/hudi/issues/7574 **_Tips before filing an issue_** - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)? - Join the mailing list to engage in conversations and get faster support at dev-subscr...@hudi.apache.org. - If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly. **Describe the problem you faced** We are upgrading Hudi from 0.7 to 0.10.1 (part of EMR 5.33.1 to EMR 5.36.0) and facing stage failures at stage "Doing partition and writing data isEmpty at HoodieSparkSqlWriter.scala:627". We have tried increasing executor memory from 30g to 50g but the error persists. **To Reproduce** Steps to reproduce the behavior: 1. 2. 3. 4. **Expected behavior** A clear and concise description of what you expected to happen. **Environment Description** * Hudi version : 0.10.1 * Spark version : Spark 2.4.8 * Hive version : Hive 2.3.9 * Hadoop version : * Storage (HDFS/S3/GCS..) : S3 * Running on Docker? (yes/no) : no **Additional context** Add any other context about the problem here. **Stacktrace** ```Add the stacktrace of the error.```
[GitHub] [hudi] amitbans commented on issue #7574: [SUPPORT] Upsert job failing while upgrading from 0.7 to 0.10.1
amitbans commented on issue #7574: URL: https://github.com/apache/hudi/issues/7574#issuecomment-1366785404 https://user-images.githubusercontent.com/6244582/209845108-4b1ae9f9-6b88-457e-a16b-ae3ea72c6dc0.png
[GitHub] [hudi] xccui opened a new pull request, #7575: [MINOR] Set engine when creating meta writer config
xccui opened a new pull request, #7575: URL: https://github.com/apache/hudi/pull/7575 ### Change Logs Properly set engine type when creating `MetadataWriteConfig` from `HoodieWriteConfig`. ### Impact Won't get warning `Embedded timeline server is disabled, fallback to use direct marker type for spark` ### Risk level (write none, low medium or high below) none ### Documentation Update none ### Contributor's checklist - [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [x] Change Logs and Impact were stated clearly - [x] Adequate tests were added if applicable - [x] CI passed
[GitHub] [hudi] xushiyan commented on a diff in pull request #6732: [HUDI-4148] Add client for hudi table service manager
xushiyan commented on code in PR #6732: URL: https://github.com/apache/hudi/pull/6732#discussion_r1058482453 ## hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/client/HoodieFlinkWriteClient.java: ## @@ -277,7 +277,7 @@ protected void writeTableMetadata(HoodieTable table, String instantTime, String * Initialized the metadata table on start up, should only be called once on driver. */ public void initMetadataTable() { -((HoodieFlinkTableServiceClient) tableServiceClient).initMetadataTable(); +((HoodieFlinkHoodieTableServiceClient) tableServiceClient).initMetadataTable(); Review Comment: redundant `Hoodie` in the name
[GitHub] [hudi] yuzhaojing commented on a diff in pull request #6732: [HUDI-4148] Add client for hudi table service manager
yuzhaojing commented on code in PR #6732: URL: https://github.com/apache/hudi/pull/6732#discussion_r1058484066 ## hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/client/HoodieFlinkWriteClient.java: ## @@ -277,7 +277,7 @@ protected void writeTableMetadata(HoodieTable table, String instantTime, String * Initialized the metadata table on start up, should only be called once on driver. */ public void initMetadataTable() { -((HoodieFlinkTableServiceClient) tableServiceClient).initMetadataTable(); +((HoodieFlinkHoodieTableServiceClient) tableServiceClient).initMetadataTable(); Review Comment: Fixed.
[GitHub] [hudi] xushiyan commented on a diff in pull request #6732: [HUDI-4148] Add client for hudi table service manager
xushiyan commented on code in PR #6732: URL: https://github.com/apache/hudi/pull/6732#discussion_r1058485340 ## hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/client/HoodieFlinkWriteClient.java: ## @@ -277,7 +277,7 @@ protected void writeTableMetadata(HoodieTable table, String instantTime, String * Initialized the metadata table on start up, should only be called once on driver. */ public void initMetadataTable() { -((HoodieFlinkTableServiceClient) tableServiceClient).initMetadataTable(); +((HoodieFlinkHoodieTableServiceClient) tableServiceClient).initMetadataTable(); Review Comment: it has the same problem with Java and Spark table service client classes. It should be HoodieJavaTableServiceClient and HoodieSparkRDDTableServiceClient
[GitHub] [hudi] xushiyan commented on a diff in pull request #6732: [HUDI-4148] Add client for hudi table service manager
xushiyan commented on code in PR #6732: URL: https://github.com/apache/hudi/pull/6732#discussion_r1058485652 ## hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/client/HoodieFlinkWriteClient.java: ## @@ -277,7 +277,7 @@ protected void writeTableMetadata(HoodieTable table, String instantTime, String * Initialized the metadata table on start up, should only be called once on driver. */ public void initMetadataTable() { -((HoodieFlinkTableServiceClient) tableServiceClient).initMetadataTable(); +((HoodieFlinkHoodieTableServiceClient) tableServiceClient).initMetadataTable(); Review Comment: the diagram needs update too
[GitHub] [hudi] jonvex opened a new pull request, #7576: attempt at ssl implementation
jonvex opened a new pull request, #7576: URL: https://github.com/apache/hudi/pull/7576 ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ ### Impact _Describe any public API or user-facing feature change or any performance impact._ ### Risk level (write none, low medium or high below) _If medium or high, explain what verification was done to mitigate the risks._ ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change_ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
[GitHub] [hudi] hudi-bot commented on pull request #7385: [HUDI-5332] HiveSyncTool can avoid initializing all permanent custom functions of Hive
hudi-bot commented on PR #7385: URL: https://github.com/apache/hudi/pull/7385#issuecomment-1366807963 ## CI report: * 9882c15708236cd4b66a9c54329f055db846ade8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14016)
[GitHub] [hudi] hudi-bot commented on pull request #7572: [HUDI-5483]Make retryhelper more suitable for common use.
hudi-bot commented on PR #7572: URL: https://github.com/apache/hudi/pull/7572#issuecomment-1366808337 ## CI report: * 5a4fae5d3d42446c19894406dc53a4d7327a9b48 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14018)
[GitHub] [hudi] hudi-bot commented on pull request #6732: [HUDI-4148] Add client for hudi table service manager
hudi-bot commented on PR #6732: URL: https://github.com/apache/hudi/pull/6732#issuecomment-1366811068 ## CI report: * 64ecea100e226b7fd539cab05c03bc9902e36db1 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14015) * c20aa589730546c0c7bb82969c92aa6d364af101 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #7575: [MINOR] Set engine when creating meta write config
hudi-bot commented on PR #7575: URL: https://github.com/apache/hudi/pull/7575#issuecomment-1366811719 ## CI report: * a35c9c05aec17c775e39c0472fbe952178b2f60e UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #7576: attempt at ssl implementation
hudi-bot commented on PR #7576: URL: https://github.com/apache/hudi/pull/7576#issuecomment-1366811749 ## CI report: * f665a724e2450d7c27f6e3d44bb443404f114036 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6732: [HUDI-4148] Add client for hudi table service manager
hudi-bot commented on PR #6732: URL: https://github.com/apache/hudi/pull/6732#issuecomment-1366814215 ## CI report: * 64ecea100e226b7fd539cab05c03bc9902e36db1 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14015) * c20aa589730546c0c7bb82969c92aa6d364af101 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14020)
[GitHub] [hudi] hudi-bot commented on pull request #7575: [MINOR] Set engine when creating meta write config
hudi-bot commented on PR #7575: URL: https://github.com/apache/hudi/pull/7575#issuecomment-1366814849 ## CI report: * a35c9c05aec17c775e39c0472fbe952178b2f60e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14021)
[GitHub] [hudi] hudi-bot commented on pull request #7576: attempt at ssl implementation
hudi-bot commented on PR #7576: URL: https://github.com/apache/hudi/pull/7576#issuecomment-1366814866 ## CI report: * f665a724e2450d7c27f6e3d44bb443404f114036 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14022)
[jira] [Created] (HUDI-5485) Improve performance of savepoint with MDT
Ethan Guo created HUDI-5485: --- Summary: Improve performance of savepoint with MDT Key: HUDI-5485 URL: https://issues.apache.org/jira/browse/HUDI-5485 Project: Apache Hudi Issue Type: Improvement Reporter: Ethan Guo -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] hudi-bot commented on pull request #7573: [HUDI-5484] Avoid using GenericRecord in ColumnStatMetadata
hudi-bot commented on PR #7573: URL: https://github.com/apache/hudi/pull/7573#issuecomment-1366853421 ## CI report: * bfec3be3263c21f7533fc16e19eec3598617a5bf Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14019)
[GitHub] [hudi] Shagish opened a new issue, #7577: [SUPPORT]
Shagish opened a new issue, #7577: URL: https://github.com/apache/hudi/issues/7577 Hi Team We are facing an issue in our Prod environment for a Hoodie table. The application was running fine and writing to the hoodie table, and all of a sudden the application failed. When we try to bring back the application, it runs for 5-10 minutes and then throws an error while writing to the file. Below are the details What table type cow or mor - MOR What spark version - 3.2.1 What hudi version - 0.11.0 Where r u running spark jobs - In EMR 6.7.0 What is Hadoop version What were you trying to do The application is a Spark Hoodie streaming job. It reads messages from the Kafka topic, processes them, and then writes to the hoodie table. The application runs for a while and later, while writing the data to the Hoodie table, it fails with a file-not-found exception. The file which it complains is not found is a very old (12/01/2022) parquet file. What have you tried We tried changing the Hoodie properties and restarting the steps, but it still failed Below are the log details 2022-12-22 22:52:32 INFO YarnClusterScheduler:57 - Killing all running tasks in stage 497: Stage cancelled 2022-12-22 22:52:33 INFO DAGScheduler:57 - ResultStage 497 (start at Application.java:101) failed in 2.542 s due to Job aborted due to stage failure: Task 0 in stage 497.0 failed 4 times, most recent failure: Lost task 0.3 in stage 497.0 (TID 762) (ip-10-220-71-253.emr.awsw.cld.ds.dtvops.net executor 2): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :0 at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:329) at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleInsertPartition(BaseSparkCommitActionExecutor.java:335) at
org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$mapPartitionsAsRDD$a3ab3c4$1(BaseSparkCommitActionExecutor.java:246) at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102) at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102) at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:915) at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:915) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373) at org.apache.spark.rdd.RDD.iterator(RDD.scala:337) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373) at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:386) at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1498) at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1408) at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1472) at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1295) at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:384) at org.apache.spark.rdd.RDD.iterator(RDD.scala:335) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373) at org.apache.spark.rdd.RDD.iterator(RDD.scala:337) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at org.apache.spark.scheduler.Task.run(Task.scala:133) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1474) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509) at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750) Caused by: org.apache.hudi.exception.HoodieException: java.io.FileNotFoundException: No such file or directory 's3://X/up_md_info/table/df245ac4-eafb-491b-8f5f-fcbb920b30ee-0_20-1773-8703_20221201102406279.parquet' at org.apache.hudi.table.action.commit.HoodieMergeHelper.runMerge(HoodieMergeHelper.java:149) at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdateInternal(BaseSparkCommitActionExecutor.java:358) at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdate
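A FileNotFoundException on an old base parquet during `HoodieMergeHelper.runMerge` typically means the merge was planned against a file slice that was later removed (for example by the cleaner) before the merge opened it. The sketch below is plain Python, not Hudi code, and every name in it is hypothetical; it only illustrates that plan-then-delete-then-open sequence:

```python
import os
import tempfile

# Hypothetical stand-in for a Hudi base file; the file id and names are invented.
base_dir = tempfile.mkdtemp()
base_file = os.path.join(base_dir, "fileid-0_20221201.parquet")

# 1. A base file exists and the merge plan records its path up front.
open(base_file, "wb").close()
planned_merge_input = base_file  # the plan holds only the path, not the data

# 2. Another actor (e.g. the cleaner) removes the old file version.
os.remove(base_file)

# 3. When the merge finally opens its input, the path no longer exists.
try:
    with open(planned_merge_input, "rb"):
        pass
    merge_failed = False
except FileNotFoundError:
    merge_failed = True  # analogous to the "No such file or directory" above

print(merge_failed)  # → True
```

The point of the sketch is that the error surfaces at open time, long after the stale path was captured, which is why the missing file can be weeks older than the failing write.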
[jira] [Updated] (HUDI-5485) Improve performance of savepoint with MDT
[ https://issues.apache.org/jira/browse/HUDI-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5485: Fix Version/s: 0.13.0 > Improve performance of savepoint with MDT > - > > Key: HUDI-5485 > URL: https://issues.apache.org/jira/browse/HUDI-5485 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Priority: Critical > Fix For: 0.13.0 > > > [https://github.com/apache/hudi/issues/7541] > When metadata table is enabled, the savepoint operation is slow for a large > number of partitions (e.g., 75k). The root cause is that for each partition, > the metadata table is scanned, which is unnecessary. -- This message was sent by Atlassian Jira (v8.20.10#820010)
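The per-partition cost described in HUDI-5485 can be pictured as N metadata-table scans versus one bulk listing. The following is a toy Python sketch with invented class and method names, not Hudi's actual metadata table API; it only contrasts the two access patterns:

```python
class MetadataTable:
    """Toy stand-in for a metadata table; counts how many scans it performs."""

    def __init__(self, files_by_partition):
        self.files_by_partition = files_by_partition
        self.scans = 0

    def get_files_in_partition(self, partition):
        self.scans += 1  # each call pays the cost of a full table scan
        return self.files_by_partition[partition]

    def get_all_files(self):
        self.scans += 1  # one scan serves every partition's listing
        return dict(self.files_by_partition)


partitions = {f"p{i}": [f"p{i}/f.parquet"] for i in range(1000)}

# Slow path: one metadata-table scan per partition (what the issue describes).
mdt_slow = MetadataTable(partitions)
slow = {p: mdt_slow.get_files_in_partition(p) for p in partitions}

# Fast path: a single scan covering all partitions.
mdt_fast = MetadataTable(partitions)
fast = mdt_fast.get_all_files()

print(mdt_slow.scans, mdt_fast.scans)  # → 1000 1
```

With tens of thousands of partitions (75k in the linked issue), collapsing the per-partition scans into one listing removes the dominant cost of the savepoint.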
[jira] [Updated] (HUDI-5485) Improve performance of savepoint with MDT
[ https://issues.apache.org/jira/browse/HUDI-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5485: Description: [https://github.com/apache/hudi/issues/7541] When metadata table is enabled, the savepoint operation is slow for a large number of partitions (e.g., 75k). The root cause is that for each partition, the metadata table is scanned, which is unnecessary. > Improve performance of savepoint with MDT > - > > Key: HUDI-5485 > URL: https://issues.apache.org/jira/browse/HUDI-5485 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Priority: Major > > [https://github.com/apache/hudi/issues/7541] > When metadata table is enabled, the savepoint operation is slow for a large > number of partitions (e.g., 75k). The root cause is that for each partition, > the metadata table is scanned, which is unnecessary. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HUDI-5485) Improve performance of savepoint with MDT
[ https://issues.apache.org/jira/browse/HUDI-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo reassigned HUDI-5485: --- Assignee: Ethan Guo > Improve performance of savepoint with MDT > - > > Key: HUDI-5485 > URL: https://issues.apache.org/jira/browse/HUDI-5485 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Critical > Fix For: 0.13.0 > > > [https://github.com/apache/hudi/issues/7541] > When metadata table is enabled, the savepoint operation is slow for a large > number of partitions (e.g., 75k). The root cause is that for each partition, > the metadata table is scanned, which is unnecessary. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5485) Improve performance of savepoint with MDT
[ https://issues.apache.org/jira/browse/HUDI-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5485: Priority: Critical (was: Major) > Improve performance of savepoint with MDT > - > > Key: HUDI-5485 > URL: https://issues.apache.org/jira/browse/HUDI-5485 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Priority: Critical > > [https://github.com/apache/hudi/issues/7541] > When metadata table is enabled, the savepoint operation is slow for a large > number of partitions (e.g., 75k). The root cause is that for each partition, > the metadata table is scanned, which is unnecessary. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] yihua commented on a diff in pull request #7569: [DOCS] Add aws module dependency for config generation and update new configs
yihua commented on code in PR #7569: URL: https://github.com/apache/hudi/pull/7569#discussion_r1058553590 ## hudi-utils/pom.xml: ## @@ -55,6 +55,13 @@ ${hudi.version} + Review Comment: nit: remove the empty line. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-5486) Update 0.12.x release notes with Long Term Support
[ https://issues.apache.org/jira/browse/HUDI-5486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5486: Fix Version/s: 0.12.2 > Update 0.12.x release notes with Long Term Support > --- > > Key: HUDI-5486 > URL: https://issues.apache.org/jira/browse/HUDI-5486 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Priority: Major > Fix For: 0.12.2 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-5486) Update 0.12.x release notes with Long Term Support
Ethan Guo created HUDI-5486: --- Summary: Update 0.12.x release notes with Long Term Support Key: HUDI-5486 URL: https://issues.apache.org/jira/browse/HUDI-5486 Project: Apache Hudi Issue Type: Improvement Reporter: Ethan Guo -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HUDI-5486) Update 0.12.x release notes with Long Term Support
[ https://issues.apache.org/jira/browse/HUDI-5486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo reassigned HUDI-5486: --- Assignee: Ethan Guo > Update 0.12.x release notes with Long Term Support > --- > > Key: HUDI-5486 > URL: https://issues.apache.org/jira/browse/HUDI-5486 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5486) Update 0.12.x release notes with Long Term Support
[ https://issues.apache.org/jira/browse/HUDI-5486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5486: Fix Version/s: 0.13.0 (was: 0.12.2) > Update 0.12.x release notes with Long Term Support > --- > > Key: HUDI-5486 > URL: https://issues.apache.org/jira/browse/HUDI-5486 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Priority: Major > Fix For: 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)