[GitHub] [hudi] xiarixiaoyao commented on a diff in pull request #7385: [HUDI-5332] HiveSyncTool can avoid initializing all permanent custom functions of Hive

2022-12-28 Thread GitBox


xiarixiaoyao commented on code in PR #7385:
URL: https://github.com/apache/hudi/pull/7385#discussion_r1058137576


##
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/ddl/HMSDDLExecutor.java:
##
@@ -78,7 +79,14 @@ public class HMSDDLExecutor implements DDLExecutor {
   public HMSDDLExecutor(HiveSyncConfig syncConfig) throws HiveException, MetaException {
     this.syncConfig = syncConfig;
     this.databaseName = syncConfig.getStringOrDefault(META_SYNC_DATABASE_NAME);
-    this.client = Hive.get(syncConfig.getHiveConf()).getMSC();
+    HiveConf hiveConf = syncConfig.getHiveConf();
+    IMetaStoreClient tempMetaStoreClient;
+    try {
+      tempMetaStoreClient = ((Hive) Hive.class.getMethod("getWithoutRegisterFns", HiveConf.class).invoke(null, hiveConf)).getMSC();
+    } catch (Exception ex) {

Review Comment:
   NoSuchMethodException ?
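The reviewer's point — catch `NoSuchMethodException` specifically rather than a blanket `Exception` — can be illustrated with a self-contained sketch of the reflective-fallback pattern the diff uses. The `Api` class below is an illustrative stand-in for Hive's facade, not Hudi's or Hive's actual code: newer versions expose a `getWithoutRegisterFns` factory, older ones only expose `get`.

```java
import java.lang.reflect.Method;

public class ReflectiveFallback {
  // Stand-in for the Hive facade; both method names mirror the diff,
  // but this class itself is hypothetical.
  static class Api {
    public static String get() { return "client-with-registered-fns"; }
    public static String getWithoutRegisterFns() { return "client-without-fns"; }
  }

  // Prefer the non-registering factory when it exists; fall back otherwise.
  static String newClient() {
    try {
      Method m = Api.class.getMethod("getWithoutRegisterFns");
      return (String) m.invoke(null);
    } catch (NoSuchMethodException e) {
      // Older Hive: the method is absent, so use the registering variant.
      return Api.get();
    } catch (ReflectiveOperationException e) {
      // Any other reflective failure is unexpected and should surface.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(newClient()); // prints "client-without-fns"
  }
}
```

Catching the narrow `NoSuchMethodException` keeps the fallback limited to the "method is missing" case, while genuine invocation failures still propagate instead of being silently swallowed.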



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (HUDI-4541) Flink job fails with column stats enabled in metadata table due to NotSerializableException

2022-12-28 Thread Alexander Trushev (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-4541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17652424#comment-17652424
 ] 

Alexander Trushev commented on HUDI-4541:
-

I guess this issue has already been fixed by 
https://issues.apache.org/jira/browse/HUDI-4548

> Flink job fails with column stats enabled in metadata table due to 
> NotSerializableException

> 
>
> Key: HUDI-4541
> URL: https://issues.apache.org/jira/browse/HUDI-4541
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink-sql
>Reporter: Ethan Guo
>Priority: Blocker
> Fix For: 0.13.0
>
> Attachments: Screen Shot 2022-08-04 at 17.10.05.png
>
>
> Environment: EMR 6.7.0 Flink 1.14.2
> Reproducible steps: Build Hudi Flink bundle from master
> {code:java}
> mvn clean package -DskipTests  -pl :hudi-flink1.14-bundle -am {code}
> Copy to EMR master node /lib/flink/lib
> Launch Flink SQL client:
> {code:java}
> cd /lib/flink && ./bin/yarn-session.sh --detached
> ./bin/sql-client.sh {code}
> Run the following from the Flink quick start guide with metadata table, 
> column stats, and data skipping enabled
> {code:java}
> CREATE TABLE t1(
>   uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED,
>   name VARCHAR(10),
>   age INT,
>   ts TIMESTAMP(3),
>   `partition` VARCHAR(20)
> )
> PARTITIONED BY (`partition`)
> WITH (
>   'connector' = 'hudi',
>   'path' = 's3a://',
>   'table.type' = 'MERGE_ON_READ', -- this creates a MERGE_ON_READ table, by 
> default is COPY_ON_WRITE
>   'metadata.enabled' = 'true', -- enables multi-modal index and metadata table
>   'hoodie.metadata.index.column.stats.enable' = 'true', -- enables column 
> stats in metadata table
>   'read.data.skipping.enabled' = 'true' -- enables data skipping
> );
> INSERT INTO t1 VALUES
>   ('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1'),
>   ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'),
>   ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'),
>   ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'),
>   ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'),
>   ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'),
>   ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'),
>   ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4'); {code}
> !Screen Shot 2022-08-04 at 17.10.05.png|width=1130,height=463!
> Exception:
> {code:java}
> 2022-08-04 17:04:41
> org.apache.flink.runtime.JobException: Recovery is suppressed by 
> NoRestartBackoffTimeStrategy
>     at 
> org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:138)
>     at 
> org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:82)
>     at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:228)
>     at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:218)
>     at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:209)
>     at 
> org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:679)
>     at 
> org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:79)
>     at 
> org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:444)
>     at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$handleRpcInvocation$1(AkkaRpcActor.java:316)
>     at 
> org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83)
>     at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:314)
>     at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:217)
>     at 
> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:78)
>     at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:163)
>     at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24)
>     at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20)
>     at scala.PartialFunction.applyOrElse(PartialFunction.scala:123)
>     at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122)
>     at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20)
>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
>  

[jira] [Updated] (HUDI-5482) Nulls should be counted in the value count stats for mor table

2022-12-28 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-5482:
-
Component/s: metadata

> Nulls should be counted in the value count stats for mor table
> --
>
> Key: HUDI-5482
> URL: https://issues.apache.org/jira/browse/HUDI-5482
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: core, metadata
>Reporter: Danny Chen
>Priority: Major
> Fix For: 0.13.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (HUDI-5482) Nulls should be counted in the value count stats for mor table

2022-12-28 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu closed HUDI-5482.

  Assignee: Hui An
Resolution: Fixed

> Nulls should be counted in the value count stats for mor table
> --
>
> Key: HUDI-5482
> URL: https://issues.apache.org/jira/browse/HUDI-5482
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: core, metadata
>Reporter: Danny Chen
>Assignee: Hui An
>Priority: Major
> Fix For: 0.13.0
>
>






[GitHub] [hudi] ZeyuQiu-Rinze commented on issue #7520: [SUPPORT] Hudi took a very long time in "Getting small files from partitions" stage

2022-12-28 Thread GitBox


ZeyuQiu-Rinze commented on issue #7520:
URL: https://github.com/apache/hudi/issues/7520#issuecomment-1366461385

   Yes, as @yihua says, `users_activity_create_date` contains an actual 
timestamp. I changed it to a date string and it works.





[GitHub] [hudi] ZeyuQiu-Rinze closed issue #7520: [SUPPORT] Hudi took a very long time in "Getting small files from partitions" stage

2022-12-28 Thread GitBox


ZeyuQiu-Rinze closed issue #7520: [SUPPORT] Hudi took a very long time in 
"Getting small files from partitions" stage
URL: https://github.com/apache/hudi/issues/7520





[GitHub] [hudi] xushiyan commented on pull request #7441: [HUDI-5378] Remove minlog.Log

2022-12-28 Thread GitBox


xushiyan commented on PR #7441:
URL: https://github.com/apache/hudi/pull/7441#issuecomment-1366461848

   @XuQianJin-Stars compliance issue: merge commit message does not contain 
jira id or [MINOR]





[GitHub] [hudi] SteNicholas commented on pull request #7568: [HUDI-5341] CleanPlanner retains earliest commits must not be later than earliest pending commit

2022-12-28 Thread GitBox


SteNicholas commented on PR #7568:
URL: https://github.com/apache/hudi/pull/7568#issuecomment-1366462432

   @danny0405, @zhuanshenbsj1, I have created this pull request to fix the 
problem mentioned in #7405, where the proposed implementation is a little 
complex. PTAL.





[GitHub] [hudi] danny0405 commented on a diff in pull request #7568: [HUDI-5341] CleanPlanner retains earliest commits must not be later than earliest pending commit

2022-12-28 Thread GitBox


danny0405 commented on code in PR #7568:
URL: https://github.com/apache/hudi/pull/7568#discussion_r1058168598


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java:
##
@@ -487,7 +487,24 @@ public Option getEarliestCommitToRetain() {
     int hoursRetained = config.getCleanerHoursRetained();
     if (config.getCleanerPolicy() == HoodieCleaningPolicy.KEEP_LATEST_COMMITS
         && commitTimeline.countInstants() > commitsRetained) {
-      earliestCommitToRetain = commitTimeline.nthInstant(commitTimeline.countInstants() - commitsRetained); // 15 instants total, 10 commits to retain, this gives 6th instant in the list
+      Option<HoodieInstant> earliestPendingCommits = hoodieTable.getMetaClient()
+          .getActiveTimeline()
+          .getCommitsTimeline()
+          .filter(s -> !s.isCompleted()).firstInstant();
+      if (earliestPendingCommits.isPresent()) {
+        // Earliest commit to retain must not be later than the earliest pending commit
+        earliestCommitToRetain =
+            commitTimeline.nthInstant(commitTimeline.countInstants() - commitsRetained).map(nthInstant -> {
+              if (nthInstant.compareTo(earliestPendingCommits.get()) <= 0) {

Review Comment:
   Why do we need this check? Can the cleaner clean an instant that is not 
complete?






[GitHub] [hudi] hudi-bot commented on pull request #7550: [MINOR] Avoid running tests as part of bundle uploads

2022-12-28 Thread GitBox


hudi-bot commented on PR #7550:
URL: https://github.com/apache/hudi/pull/7550#issuecomment-1366480173

   
   ## CI report:
   
   * 144c9ef6e5f5b1ee2f0bbdd8a7684a7cdb52a352 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13945)
 
   * 00ea42c662445ccce81e186d9f857564c2ce5c7f UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7550: [MINOR] Avoid running tests as part of bundle uploads

2022-12-28 Thread GitBox


hudi-bot commented on PR #7550:
URL: https://github.com/apache/hudi/pull/7550#issuecomment-1366484135

   
   ## CI report:
   
   * 144c9ef6e5f5b1ee2f0bbdd8a7684a7cdb52a352 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13945)
 
   * 00ea42c662445ccce81e186d9f857564c2ce5c7f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14009)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] SteNicholas commented on a diff in pull request #7568: [HUDI-5341] CleanPlanner retains earliest commits must not be later than earliest pending commit

2022-12-28 Thread GitBox


SteNicholas commented on code in PR #7568:
URL: https://github.com/apache/hudi/pull/7568#discussion_r1058184448


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java:
##
@@ -487,7 +487,24 @@ public Option getEarliestCommitToRetain() {
     int hoursRetained = config.getCleanerHoursRetained();
     if (config.getCleanerPolicy() == HoodieCleaningPolicy.KEEP_LATEST_COMMITS
         && commitTimeline.countInstants() > commitsRetained) {
-      earliestCommitToRetain = commitTimeline.nthInstant(commitTimeline.countInstants() - commitsRetained); // 15 instants total, 10 commits to retain, this gives 6th instant in the list
+      Option<HoodieInstant> earliestPendingCommits = hoodieTable.getMetaClient()
+          .getActiveTimeline()
+          .getCommitsTimeline()
+          .filter(s -> !s.isCompleted()).firstInstant();
+      if (earliestPendingCommits.isPresent()) {
+        // Earliest commit to retain must not be later than the earliest pending commit
+        earliestCommitToRetain =
+            commitTimeline.nthInstant(commitTimeline.countInstants() - commitsRetained).map(nthInstant -> {
+              if (nthInstant.compareTo(earliestPendingCommits.get()) <= 0) {

Review Comment:
   @danny0405, this check ensures the earliest commit to retain is not later 
than the earliest pending commit. The cleaner must not clean an uncompleted 
instant, and this keeps the cleaning linear for incremental clean mode.
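The rule under discussion can be shown with a minimal sketch. Plain strings stand in for Hudi's `HoodieInstant` timestamps and `earliestToRetain` is an illustrative helper, not the actual `CleanPlanner` code: the Nth-from-last completed commit is the candidate boundary, but it is clamped back when an older pending commit exists.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class RetainClamp {
  // completed:       completed commit timestamps in ascending order
  // earliestPending: first pending (requested/inflight) commit, if any
  // commitsRetained: how many of the latest commits to keep
  static Optional<String> earliestToRetain(List<String> completed,
                                           Optional<String> earliestPending,
                                           int commitsRetained) {
    if (completed.size() <= commitsRetained) {
      return Optional.empty(); // nothing old enough to clean yet
    }
    String nth = completed.get(completed.size() - commitsRetained);
    if (earliestPending.isPresent() && nth.compareTo(earliestPending.get()) > 0) {
      // Clamp: retaining from a point later than a pending commit could let
      // the cleaner delete files the in-flight commit still depends on.
      return earliestPending;
    }
    return Optional.of(nth);
  }

  public static void main(String[] args) {
    List<String> completed = Arrays.asList("001", "002", "003", "004", "005");
    // Pending commit "003" pulls the retain boundary back from "004" to "003".
    System.out.println(earliestToRetain(completed, Optional.of("003"), 2).get());
    // With no pending commit, the plain Nth-from-last rule applies.
    System.out.println(earliestToRetain(completed, Optional.empty(), 2).get());
  }
}
```

With five completed commits and two retained, the candidate is "004"; a pending "003" clamps the boundary to "003" so files that commit still reads are never cleaned.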






[GitHub] [hudi] danny0405 commented on a diff in pull request #7568: [HUDI-5341] CleanPlanner retains earliest commits must not be later than earliest pending commit

2022-12-28 Thread GitBox


danny0405 commented on code in PR #7568:
URL: https://github.com/apache/hudi/pull/7568#discussion_r1058188680


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java:
##
@@ -487,7 +487,24 @@ public Option getEarliestCommitToRetain() {
     int hoursRetained = config.getCleanerHoursRetained();
     if (config.getCleanerPolicy() == HoodieCleaningPolicy.KEEP_LATEST_COMMITS
         && commitTimeline.countInstants() > commitsRetained) {
-      earliestCommitToRetain = commitTimeline.nthInstant(commitTimeline.countInstants() - commitsRetained); // 15 instants total, 10 commits to retain, this gives 6th instant in the list
+      Option<HoodieInstant> earliestPendingCommits = hoodieTable.getMetaClient()
+          .getActiveTimeline()
+          .getCommitsTimeline()
+          .filter(s -> !s.isCompleted()).firstInstant();
+      if (earliestPendingCommits.isPresent()) {
+        // Earliest commit to retain must not be later than the earliest pending commit
+        earliestCommitToRetain =
+            commitTimeline.nthInstant(commitTimeline.countInstants() - commitsRetained).map(nthInstant -> {
+              if (nthInstant.compareTo(earliestPendingCommits.get()) <= 0) {

Review Comment:
   Oh, I see, this change matters for incremental cleaning; do we also need it 
for normal cleaning?






[GitHub] [hudi] danny0405 commented on a diff in pull request #7568: [HUDI-5341] CleanPlanner retains earliest commits must not be later than earliest pending commit

2022-12-28 Thread GitBox


danny0405 commented on code in PR #7568:
URL: https://github.com/apache/hudi/pull/7568#discussion_r1058188976


##
hudi-common/src/main/java/org/apache/hudi/common/util/ClusteringUtils.java:
##
@@ -224,4 +225,36 @@ public static List getPendingClusteringInstantTimes(HoodieTableMe
   public static boolean isPendingClusteringInstant(HoodieTableMetaClient metaClient, HoodieInstant instant) {
     return getClusteringPlan(metaClient, instant).isPresent();
   }
+
+  /**
+   * Checks whether the latest clustering instant has a subsequent cleaning action. Returns
+   * the clustering instant if there is such a cleaning action, or empty.
+   *
+   * @param activeTimeline The active timeline
+   * @return the oldest instant to retain for clustering
+   */
+  public static Option<HoodieInstant> getOldestInstantToRetainForClustering(HoodieActiveTimeline activeTimeline)
+      throws IOException {
+    Option<HoodieInstant> cleanInstantOpt =
+        activeTimeline.getCleanerTimeline().filter(instant -> !instant.isCompleted()).firstInstant();
+    if (cleanInstantOpt.isPresent()) {
+      // The first clustering instant of which timestamp is greater than or equal to the earliest commit to retain of
+      // the clean metadata.

Review Comment:
   There is no need to do this check if there is no clustering instant on the 
timeline at all.






[GitHub] [hudi] SteNicholas commented on a diff in pull request #7568: [HUDI-5341] CleanPlanner retains earliest commits must not be later than earliest pending commit

2022-12-28 Thread GitBox


SteNicholas commented on code in PR #7568:
URL: https://github.com/apache/hudi/pull/7568#discussion_r1058192365


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java:
##
@@ -487,7 +487,24 @@ public Option getEarliestCommitToRetain() {
     int hoursRetained = config.getCleanerHoursRetained();
     if (config.getCleanerPolicy() == HoodieCleaningPolicy.KEEP_LATEST_COMMITS
         && commitTimeline.countInstants() > commitsRetained) {
-      earliestCommitToRetain = commitTimeline.nthInstant(commitTimeline.countInstants() - commitsRetained); // 15 instants total, 10 commits to retain, this gives 6th instant in the list
+      Option<HoodieInstant> earliestPendingCommits = hoodieTable.getMetaClient()
+          .getActiveTimeline()
+          .getCommitsTimeline()
+          .filter(s -> !s.isCompleted()).firstInstant();
+      if (earliestPendingCommits.isPresent()) {
+        // Earliest commit to retain must not be later than the earliest pending commit
+        earliestCommitToRetain =
+            commitTimeline.nthInstant(commitTimeline.countInstants() - commitsRetained).map(nthInstant -> {
+              if (nthInstant.compareTo(earliestPendingCommits.get()) <= 0) {

Review Comment:
   @danny0405, `getPartitionPathsForFullCleaning` doesn't invoke this method.






[GitHub] [hudi] SteNicholas commented on a diff in pull request #7568: [HUDI-5341] CleanPlanner retains earliest commits must not be later than earliest pending commit

2022-12-28 Thread GitBox


SteNicholas commented on code in PR #7568:
URL: https://github.com/apache/hudi/pull/7568#discussion_r1058196194


##
hudi-common/src/main/java/org/apache/hudi/common/util/ClusteringUtils.java:
##
@@ -224,4 +225,36 @@ public static List getPendingClusteringInstantTimes(HoodieTableMe
   public static boolean isPendingClusteringInstant(HoodieTableMetaClient metaClient, HoodieInstant instant) {
     return getClusteringPlan(metaClient, instant).isPresent();
   }
+
+  /**
+   * Checks whether the latest clustering instant has a subsequent cleaning action. Returns
+   * the clustering instant if there is such a cleaning action, or empty.
+   *
+   * @param activeTimeline The active timeline
+   * @return the oldest instant to retain for clustering
+   */
+  public static Option<HoodieInstant> getOldestInstantToRetainForClustering(HoodieActiveTimeline activeTimeline)
+      throws IOException {
+    Option<HoodieInstant> cleanInstantOpt =
+        activeTimeline.getCleanerTimeline().filter(instant -> !instant.isCompleted()).firstInstant();
+    if (cleanInstantOpt.isPresent()) {
+      // The first clustering instant of which timestamp is greater than or equal to the earliest commit to retain of
+      // the clean metadata.

Review Comment:
   @danny0405, +1. I have added a check for whether the completed replace 
timeline is empty.






[jira] [Updated] (HUDI-2508) Build GA for the dependency diff check workflow

2022-12-28 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2508:
-
Description: 
Configure a GitHub actions job that, for each PR,

- deploys maven artifacts (as per [release deploy 
script|https://github.com/apache/hudi/blob/master/scripts/release/deploy_staging_jars.sh])
 to a local temporary maven repo (needs to research how to setup a local maven 
repo)
- run scripts as per https://issues.apache.org/jira/browse/HUDI-5475 and 
https://github.com/xushiyan/hudi/pull/15/files to generate dependency trees in 
a temp dir
- compare existing dependency trees with the generated ones - if there is any 
difference, fail the job so that the author will need to manually generate the 
dep trees and commit the updates in the PR itself


This is to enforce dependency governance.

This job should only run for PRs.

  was:
Configure a GitHub actions job that, for each PR,

- deploys maven artifacts (as per release deploy script) to a local temporary 
maven repo (needs to research how to setup a local maven repo)
- run scripts as per https://issues.apache.org/jira/browse/HUDI-5475 and 
https://github.com/xushiyan/hudi/pull/15/files to generate dependency trees in 
a temp dir
- compare existing dependency trees with the generated ones - if there is any 
difference, fail the job so that the author will need to manually generate the 
dep trees and commit the updates in the PR itself


This is to enforce dependency governance.

This job should only run for PRs.


> Build GA for the dependency diff check workflow
> --
>
> Key: HUDI-2508
> URL: https://issues.apache.org/jira/browse/HUDI-2508
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Usability
>Reporter: vinoyang
>Assignee: Lokesh Jain
>Priority: Major
>
> Configure a GitHub actions job that, for each PR,
> - deploys maven artifacts (as per [release deploy 
> script|https://github.com/apache/hudi/blob/master/scripts/release/deploy_staging_jars.sh])
>  to a local temporary maven repo (needs to research how to setup a local 
> maven repo)
> - run scripts as per https://issues.apache.org/jira/browse/HUDI-5475 and 
> https://github.com/xushiyan/hudi/pull/15/files to generate dependency trees 
> in a temp dir
> - compare existing dependency trees with the generated ones - if there is any 
> difference, fail the job so that the author will need to manually generate 
> the dep trees and commit the updates in the PR itself
> This is to enforce dependency governance.
> This job should only run for PRs.





[jira] [Updated] (HUDI-2508) Build GA for the dependency diff check workflow

2022-12-28 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2508:
-
Priority: Critical  (was: Major)

> Build GA for the dependency diff check workflow
> --
>
> Key: HUDI-2508
> URL: https://issues.apache.org/jira/browse/HUDI-2508
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Usability
>Reporter: vinoyang
>Assignee: Lokesh Jain
>Priority: Critical
>
> Configure a GitHub actions job that, for each PR,
> - deploys maven artifacts (as per [release deploy 
> script|https://github.com/apache/hudi/blob/master/scripts/release/deploy_staging_jars.sh])
>  to a local temporary maven repo (needs to research how to setup a local 
> maven repo)
> - run scripts as per https://issues.apache.org/jira/browse/HUDI-5475 and 
> https://github.com/xushiyan/hudi/pull/15/files to generate dependency trees 
> in a temp dir
> - compare existing dependency trees with the generated ones - if there is any 
> difference, fail the job so that the author will need to manually generate 
> the dep trees and commit the updates in the PR itself
> This is to enforce dependency governance.
> This job should only run for PRs.





[GitHub] [hudi] perfectcw opened a new issue, #7570: [SUPPORT]Sync hive lost some partitions when submit multiple commits at the same time

2022-12-28 Thread GitBox


perfectcw opened a new issue, #7570:
URL: https://github.com/apache/hudi/issues/7570

   **_Tips before filing an issue_**
   
   - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
   
   - Join the mailing list to engage in conversations and get faster support at 
dev-subscr...@hudi.apache.org.
   
   - If you have triaged this as a bug, then file an 
[issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
   
   **Describe the problem you faced**
   
   Issue:
   Some partitions are lost when syncing to Hive.
   
   Background: 
   We have a data ingestion pipeline that ingests about 500 partitions per 
day. The pipeline submits multiple commits at the same time to insert 
different partitions, and the Hive sync function is enabled for each commit.
   
   _**After all of the commits succeed, we found that some partitions are 
missing in the Hive table.**_
   
   The following is an analysis of the logs and hoodie files:
   The hoodie files show six of the commits. Only two of them, 
_20221227042858342_ and _20221227042906103_, were synced to Hive; the 
partitions from the remaining commits did not appear in the Hive table.
   
   I think the root cause is the Hive sync mechanism. When Hudi syncs to Hive 
after a commit succeeds, it first gets the latest synced commit and then uses 
that commit's timestamp as a benchmark, checking only the commits after it 
for new columns and partitions to sync to Hive.
   So if a commit A is submitted before the latest synced commit B but 
succeeds after commit B, it will not be synced to Hive: commit A's 
timestamp < commit B's timestamp, so it is never detected.
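   Assuming the sync filter behaves as described (`commitsToSync` below is an illustrative stand-in for that step, not Hudi's actual API), the miss can be reproduced with the timestamps from this report:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SyncRaceSketch {
  // Sync considers only commits whose timestamp is strictly greater than the
  // last synced timestamp; ordering is by commit timestamp, not completion time.
  static List<String> commitsToSync(List<String> completedCommits, String lastSynced) {
    return completedCommits.stream()
        .filter(ts -> ts.compareTo(lastSynced) > 0)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Commit A (20221227042859357) started earlier but finished after commit B
    // (20221227042906103), which was already recorded as the last synced commit.
    List<String> completed = Arrays.asList("20221227042859357", "20221227042906103");
    // A's timestamp is below the benchmark, so nothing is picked up and the
    // partitions written by A are never synced.
    System.out.println(commitsToSync(completed, "20221227042906103")); // prints []
  }
}
```

   A timestamp-only benchmark cannot distinguish "already synced" from "started earlier but finished later", which is exactly the gap in this report.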
   
   Here is the log of commit 20221227042859357. We can see that the latest 
synced commit it finds is 20221227042906103, which comes after 
20221227042859357 itself. The partition inserted by commit 20221227042859357 
is therefore not detected, and the number of partitions to sync is 0.
   
   `2022-12-27 04:30:16,449 INFO hive.metastore: Opened a connection to 
metastore, current connections: 1
   2022-12-27 04:30:16,465 INFO hive.metastore: Connected to metastore.
   2022-12-27 04:30:16,676 INFO hive.HiveSyncTool: Syncing target hoodie table 
with hive table(forecast_agg_hoover_multi_publish). Hive metastore URL 
:jdbc:hive2://hs2.presto.stg.aws.fwmrm.net:1/;auth=noSasl, basePath 
:s3a://fw1-stg-af-dip/hudi/forecast_agg_hoover_multi_publish
   2022-12-27 04:30:16,676 INFO hive.HiveSyncTool: Trying to sync hoodie table 
forecast_agg_hoover_multi_publish with base path 
s3a://fw1-stg-af-dip/hudi/forecast_agg_hoover_multi_publish of type 
COPY_ON_WRITE
   2022-12-27 04:30:16,815 INFO table.TableSchemaResolver: Reading schema from 
s3a://fw1-stg-af-dip/hudi/forecast_agg_hoover_multi_publish/20221227/0/20230108/9820ce59-03a8-4efa-8978-3c3cf61298d8-0_1-11-3890_20221227042906103.parquet
   2022-12-27 04:30:16,904 INFO s3a.S3AInputStream: Switching to Random IO seek 
policy
   2022-12-27 04:30:17,477 INFO hive.HiveSyncTool: No Schema difference for 
forecast_agg_hoover_multi_publish
   2022-12-27 04:30:17,477 INFO hive.HiveSyncTool: Schema sync complete. 
Syncing partitions for forecast_agg_hoover_multi_publish
   2022-12-27 04:30:17,525 INFO hive.HiveSyncTool: Last commit time synced was 
found to be 20221227042906103
   2022-12-27 04:30:17,525 INFO common.AbstractSyncHoodieClient: Last commit 
time synced is 20221227042906103, Getting commits since then
   2022-12-27 04:30:17,527 INFO hive.HiveSyncTool: Storage partitions scan 
complete. Found 0
   2022-12-27 04:30:17,697 INFO hive.HiveSyncTool: Sync complete for 
forecast_agg_hoover_multi_publish`
   
   
   `order by time
   name                                type       last modify time           partition            exists in hive
   20221227042855832.commit.requested  requested  2022-12-27 pm12:28:59 CST  20221227/0/20230101  no
   20221227042858342.commit.requested  requested  2022-12-27 pm12:29:00 CST  20221227/0/20230106  yes
   20221227042858801.commit.requested  requested  2022-12-27 pm12:29:01 CST  20221227/0/20230107  no
   20221227042859357.commit.requested  requested  2022-12-27 pm12:29:01 CST  20221227/0/20221229  no
   20221227042901993.commit.requested  requested  2022-12-27 pm12:29:04 CST  20221227/0/20230103  no
   20221227042906103.commit.requested  requested  2022-12-27 pm12:29:08 CST  20221227/0/20230108  yes
   ...
   20221227042855832.inflight          inflight   2022-12-27 pm12:29:16 CST
   20221227042858342.inflight          inflight   2022-12-27 pm12:29:16 CST
   20221227042858801.inflight          inflight   2022-12-27 pm12:29:17 CST
   20221227042859357.inflight          inflight   2022-

[GitHub] [hudi] perfectcw closed issue #7570: [SUPPORT]Sync hive lost some partitions when submit multiple commits at the same time

2022-12-28 Thread GitBox


perfectcw closed issue #7570: [SUPPORT]Sync hive lost some partitions when 
submit multiple commits at the same time 
URL: https://github.com/apache/hudi/issues/7570


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #7365: [HUDI-5317] Fix insert overwrite table for partitioned table

2022-12-28 Thread GitBox


hudi-bot commented on PR #7365:
URL: https://github.com/apache/hudi/pull/7365#issuecomment-1366542661

   
   ## CI report:
   
   * e58d4db34dea4225808760126be11d3c559da896 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13711)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13712)
 
   * a65e5950a53b6b655c385d002170c0900a6303d8 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7568: [HUDI-5341] CleanPlanner retains earliest commits must not be later than earliest pending commit

2022-12-28 Thread GitBox


hudi-bot commented on PR #7568:
URL: https://github.com/apache/hudi/pull/7568#issuecomment-1366542976

   
   ## CI report:
   
   * 93c16ab8c496989a928304cc2fe91e38ca678147 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14008)
 
   * cc53c8c67d4e86b4be6dafdf25cafae87c1ed152 UNKNOWN
   
   





[GitHub] [hudi] hudi-bot commented on pull request #7365: [HUDI-5317] Fix insert overwrite table for partitioned table

2022-12-28 Thread GitBox


hudi-bot commented on PR #7365:
URL: https://github.com/apache/hudi/pull/7365#issuecomment-1366546216

   
   ## CI report:
   
   * e58d4db34dea4225808760126be11d3c559da896 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13711)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13712)
 
   * a65e5950a53b6b655c385d002170c0900a6303d8 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14010)
 
   
   





[GitHub] [hudi] hudi-bot commented on pull request #7568: [HUDI-5341] CleanPlanner retains earliest commits must not be later than earliest pending commit

2022-12-28 Thread GitBox


hudi-bot commented on PR #7568:
URL: https://github.com/apache/hudi/pull/7568#issuecomment-1366546555

   
   ## CI report:
   
   * 93c16ab8c496989a928304cc2fe91e38ca678147 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14008)
 
   * cc53c8c67d4e86b4be6dafdf25cafae87c1ed152 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14011)
 
   
   





[GitHub] [hudi] lokeshj1703 commented on issue #7430: [BUG] MOR Table Hard Deletes Create issue with Athena Querying RT Tables

2022-12-28 Thread GitBox


lokeshj1703 commented on issue #7430:
URL: https://github.com/apache/hudi/issues/7430#issuecomment-1366554885

   @soumilshah1995 We are still trying to root cause this w.r.t AWS. Will 
update here.





[GitHub] [hudi] boneanxs opened a new pull request, #7571: [HUDI-4710]Fix flaky: TestKeyRangeLookupTree#testFileGroupLookUpManyEntriesWithSameStartValue

2022-12-28 Thread GitBox


boneanxs opened a new pull request, #7571:
URL: https://github.com/apache/hudi/pull/7571

   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was 
copied._
   In `testFileGroupLookUpManyEntriesWithSameStartValue`, before the fix, the endKey could be larger than 1000. For example, if endKey is 1024: `KeyRangeNode` compares values as Strings, so "1024" sorts before "2xx", causing the test failure.
   
   Here we cap endKey at 1000 to fix the issue.
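
   The failure mode above hinges on lexicographic versus numeric ordering. A minimal, standalone illustration (not Hudi's actual `KeyRangeNode` code):

```java
public class LexVsNumericOrder {
    public static void main(String[] args) {
        // String comparison is digit-by-digit from the left, so "1024"
        // sorts before "200" even though 1024 > 200 numerically.
        boolean lexSmaller = "1024".compareTo("200") < 0;
        boolean numLarger = Integer.compare(1024, 200) > 0;
        System.out.println(lexSmaller + " " + numLarger); // prints "true true"
    }
}
```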
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   None
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   none
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Updated] (HUDI-4710) Fix flaky: TestKeyRangeLookupTree#testFileGroupLookUpManyEntriesWithSameStartValue

2022-12-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-4710:
-
Labels: pull-request-available  (was: )

> Fix flaky: 
> TestKeyRangeLookupTree#testFileGroupLookUpManyEntriesWithSameStartValue
> --
>
> Key: HUDI-4710
> URL: https://issues.apache.org/jira/browse/HUDI-4710
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
>
> Instance occurance:
> Aug 24th: 
> [https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_apis/build/builds/10923/logs/22]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] hudi-bot commented on pull request #7365: [HUDI-5317] Fix insert overwrite table for partitioned table

2022-12-28 Thread GitBox


hudi-bot commented on PR #7365:
URL: https://github.com/apache/hudi/pull/7365#issuecomment-1366590514

   
   ## CI report:
   
   * a65e5950a53b6b655c385d002170c0900a6303d8 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14010)
 
   * f10a71c844934f9682391986f2cbe69566179341 UNKNOWN
   
   





[GitHub] [hudi] hudi-bot commented on pull request #7403: [HUDI-5343] HoodieFlinkStreamer supports async clustering for append mode

2022-12-28 Thread GitBox


hudi-bot commented on PR #7403:
URL: https://github.com/apache/hudi/pull/7403#issuecomment-1366590587

   
   ## CI report:
   
   * a0317ba801314bd06d99727b2bba0c383ed95cdf Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14007)
 
   
   





[GitHub] [hudi] hudi-bot commented on pull request #7571: [HUDI-4710]Fix flaky: TestKeyRangeLookupTree#testFileGroupLookUpManyEntriesWithSameStartValue

2022-12-28 Thread GitBox


hudi-bot commented on PR #7571:
URL: https://github.com/apache/hudi/pull/7571#issuecomment-1366590844

   
   ## CI report:
   
   * 2baf115f1b4d1474fe8161919797ae68f979fc80 UNKNOWN
   
   





[GitHub] [hudi] hudi-bot commented on pull request #7365: [HUDI-5317] Fix insert overwrite table for partitioned table

2022-12-28 Thread GitBox


hudi-bot commented on PR #7365:
URL: https://github.com/apache/hudi/pull/7365#issuecomment-1366593631

   
   ## CI report:
   
   * a65e5950a53b6b655c385d002170c0900a6303d8 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14010)
 
   * f10a71c844934f9682391986f2cbe69566179341 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14012)
 
   
   





[GitHub] [hudi] hudi-bot commented on pull request #7571: [HUDI-4710]Fix flaky: TestKeyRangeLookupTree#testFileGroupLookUpManyEntriesWithSameStartValue

2022-12-28 Thread GitBox


hudi-bot commented on PR #7571:
URL: https://github.com/apache/hudi/pull/7571#issuecomment-1366593911

   
   ## CI report:
   
   * 2baf115f1b4d1474fe8161919797ae68f979fc80 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14013)
 
   
   





[GitHub] [hudi] hudi-bot commented on pull request #7365: [HUDI-5317] Fix insert overwrite table for partitioned table

2022-12-28 Thread GitBox


hudi-bot commented on PR #7365:
URL: https://github.com/apache/hudi/pull/7365#issuecomment-1366597097

   
   ## CI report:
   
   * f10a71c844934f9682391986f2cbe69566179341 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14012)
 
   
   





[GitHub] [hudi] cxzl25 commented on a diff in pull request #7385: [HUDI-5332] HiveSyncTool can avoid initializing all permanent custom functions of Hive

2022-12-28 Thread GitBox


cxzl25 commented on code in PR #7385:
URL: https://github.com/apache/hudi/pull/7385#discussion_r1058301005


##
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/ddl/HMSDDLExecutor.java:
##
@@ -78,7 +79,14 @@ public class HMSDDLExecutor implements DDLExecutor {
   public HMSDDLExecutor(HiveSyncConfig syncConfig) throws HiveException, 
MetaException {
 this.syncConfig = syncConfig;
 this.databaseName = syncConfig.getStringOrDefault(META_SYNC_DATABASE_NAME);
-this.client = Hive.get(syncConfig.getHiveConf()).getMSC();
+HiveConf hiveConf = syncConfig.getHiveConf();
+IMetaStoreClient tempMetaStoreClient;
+try {
+  tempMetaStoreClient = ((Hive) 
Hive.class.getMethod("getWithoutRegisterFns", HiveConf.class).invoke(null, 
hiveConf)).getMSC();
+} catch (Exception ex) {

Review Comment:
   Because `getMethod` and `invoke` together declare several checked exceptions:
   
   ```java
   } catch (NoSuchMethodException | IllegalAccessException | 
IllegalArgumentException | InvocationTargetException ex) {
   ```
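
For context, the pattern under discussion — try the newer `getWithoutRegisterFns` factory via reflection and fall back when it is absent — can be sketched with a hypothetical stand-in class (not the real Hive API):

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

public class ReflectiveFallbackSketch {
    // Hypothetical stand-in for Hive; deliberately lacks getWithoutRegisterFns,
    // mimicking an older Hive version.
    static class LegacyHive {
        static LegacyHive get() { return new LegacyHive(); }
        String client() { return "msc"; }
    }

    static LegacyHive obtain() {
        try {
            // Newer versions would expose a static getWithoutRegisterFns(...) factory.
            Method m = LegacyHive.class.getMethod("getWithoutRegisterFns");
            return (LegacyHive) m.invoke(null);
        } catch (NoSuchMethodException | IllegalAccessException | InvocationTargetException ex) {
            // Method missing (or not invocable): fall back to the classic factory.
            return LegacyHive.get();
        }
    }

    public static void main(String[] args) {
        System.out.println(obtain().client()); // prints "msc"
    }
}
```

Catching `NoSuchMethodException` alone is not enough, since `invoke` itself adds `IllegalAccessException` and `InvocationTargetException` as checked exceptions.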






[GitHub] [hudi] yuzhaojing commented on a diff in pull request #6732: [HUDI-4148] Add client for hudi table service manager

2022-12-28 Thread GitBox


yuzhaojing commented on code in PR #6732:
URL: https://github.com/apache/hudi/pull/6732#discussion_r1058027049


##
hudi-common/src/main/java/org/apache/hudi/common/config/HoodieTableServiceManagerConfig.java:
##
@@ -0,0 +1,187 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.config;
+
+import org.apache.hudi.common.model.ActionType;
+
+import javax.annotation.concurrent.Immutable;
+
+import java.util.Properties;
+
+/**
+ * Configurations used by the Hudi Table Service Manager.
+ */
+@Immutable
+@ConfigClassProperty(name = "Table Service Manager Configs",
+groupName = ConfigGroups.Names.WRITE_CLIENT,
+description = "Configurations used by the Hudi Table Service Manager.")
+public class HoodieTableServiceManagerConfig extends HoodieConfig {
+
+  public static final String TABLE_SERVICE_MANAGER_PREFIX = "hoodie.table.service.manager";
+
+  public static final ConfigProperty<Boolean> TABLE_SERVICE_MANAGER_ENABLE = ConfigProperty
+      .key(TABLE_SERVICE_MANAGER_PREFIX + ".enable")
+      .defaultValue(false)
+      .withDocumentation("Use table manager service to execute table service");
+
+  public static final ConfigProperty<String> TABLE_SERVICE_MANAGER_URIS = ConfigProperty
+      .key(TABLE_SERVICE_MANAGER_PREFIX + ".uris")
+      .defaultValue("http://localhost:9091")
+      .withDocumentation("Table service manager uris");
+
+  public static final ConfigProperty<String> TABLE_SERVICE_MANAGER_ACTIONS = ConfigProperty
+      .key(TABLE_SERVICE_MANAGER_PREFIX + ".actions")
+      .defaultValue("")
+      .withDocumentation("Which action deploy on table service manager such as compaction:clean, default null");
+
+  public static final ConfigProperty<String> TABLE_SERVICE_MANAGER_DEPLOY_USERNAME = ConfigProperty
+      .key(TABLE_SERVICE_MANAGER_PREFIX + ".deploy.username")
+      .defaultValue("default")
+      .withDocumentation("The user name to deploy for table service of this table");
+
+  public static final ConfigProperty<String> TABLE_SERVICE_MANAGER_DEPLOY_QUEUE = ConfigProperty
+      .key(TABLE_SERVICE_MANAGER_PREFIX + ".deploy.queue")
+      .defaultValue("default")
+      .withDocumentation("The queue to deploy for table service of this table");
+
+  public static final ConfigProperty<String> TABLE_SERVICE_MANAGER_DEPLOY_RESOURCE = ConfigProperty
+      .key(TABLE_SERVICE_MANAGER_PREFIX + ".deploy.resource")
+      .defaultValue("4g:4g")
+      .withDocumentation("The resource to deploy for table service of this table, default driver 4g, executor 4g");

Review Comment:
   Spark engine and flink engine can share this configuration, so I don't think 
there is any need to make a distinction.
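
   As a rough sketch of the pattern these declarations follow — a typed key with a default, resolved against user-supplied properties (assumed shape, not Hudi's actual `ConfigProperty` implementation):

```java
import java.util.Properties;

public class ConfigPropertySketch {
    // Minimal typed config-property model: key + default value.
    static final class Prop<T> {
        final String key;
        final T defaultValue;
        Prop(String key, T defaultValue) {
            this.key = key;
            this.defaultValue = defaultValue;
        }
    }

    static final String PREFIX = "hoodie.table.service.manager";
    static final Prop<Boolean> ENABLE = new Prop<>(PREFIX + ".enable", false);
    static final Prop<Integer> RETRIES = new Prop<>(PREFIX + ".connect.retries", 3);

    // Resolve a property: the user-supplied value wins, otherwise the declared default.
    static String resolve(Properties props, Prop<?> p) {
        return props.getProperty(p.key, String.valueOf(p.defaultValue));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(ENABLE.key, "true");
        System.out.println(resolve(props, ENABLE));  // prints "true"
        System.out.println(resolve(props, RETRIES)); // prints "3" (default)
    }
}
```

Because every key shares one prefix, engine-agnostic settings like the ones above can indeed be read by both Spark and Flink code paths.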






[GitHub] [hudi] leesf commented on pull request #7403: [HUDI-5343] HoodieFlinkStreamer supports async clustering for append mode

2022-12-28 Thread GitBox


leesf commented on PR #7403:
URL: https://github.com/apache/hudi/pull/7403#issuecomment-1366608450

   merging as the flink module CI success.





[GitHub] [hudi] leesf merged pull request #7403: [HUDI-5343] HoodieFlinkStreamer supports async clustering for append mode

2022-12-28 Thread GitBox


leesf merged PR #7403:
URL: https://github.com/apache/hudi/pull/7403





[hudi] branch master updated (bd57282f248 -> f2b2ec9539d)

2022-12-28 Thread leesf
This is an automated email from the ASF dual-hosted git repository.

leesf pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from bd57282f248 [HUDI-5482] Nulls should be counted in the value count 
stats for mor table (#7482)
 add f2b2ec9539d [HUDI-5343] HoodieFlinkStreamer supports async clustering 
for append mode (#7403)

No new revisions were added by this update.

Summary of changes:
 .../sink/clustering/FlinkClusteringConfig.java | 37 +++
 .../hudi/sink/compact/FlinkCompactionConfig.java   | 30 ++--
 .../apache/hudi/streamer/FlinkStreamerConfig.java  | 53 --
 .../apache/hudi/streamer/HoodieFlinkStreamer.java  | 21 +++--
 4 files changed, 95 insertions(+), 46 deletions(-)



[GitHub] [hudi] yuzhaojing commented on a diff in pull request #6732: [HUDI-4148] Add client for hudi table service manager

2022-12-28 Thread GitBox


yuzhaojing commented on code in PR #6732:
URL: https://github.com/apache/hudi/pull/6732#discussion_r1058305796


##
hudi-common/src/main/java/org/apache/hudi/common/config/HoodieTableServiceManagerConfig.java:
##
@@ -0,0 +1,187 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.config;
+
+import org.apache.hudi.common.model.ActionType;
+
+import javax.annotation.concurrent.Immutable;
+
+import java.util.Properties;
+
+/**
+ * Configurations used by the Hudi Table Service Manager.
+ */
+@Immutable
+@ConfigClassProperty(name = "Table Service Manager Configs",
+groupName = ConfigGroups.Names.WRITE_CLIENT,
+description = "Configurations used by the Hudi Table Service Manager.")
+public class HoodieTableServiceManagerConfig extends HoodieConfig {
+
+  public static final String TABLE_SERVICE_MANAGER_PREFIX = "hoodie.table.service.manager";
+
+  public static final ConfigProperty<Boolean> TABLE_SERVICE_MANAGER_ENABLE = ConfigProperty
+      .key(TABLE_SERVICE_MANAGER_PREFIX + ".enable")
+      .defaultValue(false)
+      .withDocumentation("Use table manager service to execute table service");
+
+  public static final ConfigProperty<String> TABLE_SERVICE_MANAGER_URIS = ConfigProperty
+      .key(TABLE_SERVICE_MANAGER_PREFIX + ".uris")
+      .defaultValue("http://localhost:9091")
+      .withDocumentation("Table service manager uris");
+
+  public static final ConfigProperty<String> TABLE_SERVICE_MANAGER_ACTIONS = ConfigProperty
+      .key(TABLE_SERVICE_MANAGER_PREFIX + ".actions")
+      .defaultValue("")
+      .withDocumentation("Which action deploy on table service manager such as compaction:clean, default null");
+
+  public static final ConfigProperty<String> TABLE_SERVICE_MANAGER_DEPLOY_USERNAME = ConfigProperty
+      .key(TABLE_SERVICE_MANAGER_PREFIX + ".deploy.username")
+      .defaultValue("default")
+      .withDocumentation("The user name to deploy for table service of this table");
+
+  public static final ConfigProperty<String> TABLE_SERVICE_MANAGER_DEPLOY_QUEUE = ConfigProperty
+      .key(TABLE_SERVICE_MANAGER_PREFIX + ".deploy.queue")
+      .defaultValue("default")
+      .withDocumentation("The queue to deploy for table service of this table");
+
+  public static final ConfigProperty<String> TABLE_SERVICE_MANAGER_DEPLOY_RESOURCE = ConfigProperty
+      .key(TABLE_SERVICE_MANAGER_PREFIX + ".deploy.resource")
+      .defaultValue("4g:4g")
+      .withDocumentation("The resource to deploy for table service of this table, default driver 4g, executor 4g");
+
+  public static final ConfigProperty<Integer> TABLE_SERVICE_MANAGER_DEPLOY_PARALLELISM = ConfigProperty
+      .key(TABLE_SERVICE_MANAGER_PREFIX + ".deploy.parallelism")
+      .defaultValue(100)
+      .withDocumentation("The max parallelism to deploy for table service of this table, default 100");
+
+  public static final ConfigProperty<String> TABLE_SERVICE_MANAGER_DEPLOY_EXECUTION_ENGINE = ConfigProperty
+      .key(TABLE_SERVICE_MANAGER_PREFIX + ".execution.engine")
+      .defaultValue("spark")
+      .withDocumentation("The execution engine to deploy for table service of this table, default spark");
+
+  public static final ConfigProperty<String> TABLE_SERVICE_MANAGER_DEPLOY_EXTRA_PARAMS = ConfigProperty
+      .key(TABLE_SERVICE_MANAGER_PREFIX + ".deploy.extra.params")
+      .defaultValue("")
+      .withDocumentation("The extra params to deploy for table service of this table, split by ';'");
+
+  public static final ConfigProperty<Integer> TABLE_SERVICE_MANAGER_TIMEOUT = ConfigProperty
+      .key(TABLE_SERVICE_MANAGER_PREFIX + ".timeout")
+      .defaultValue(300)
+      .withDocumentation("Connection timeout for client");
+
+  public static final ConfigProperty<Integer> TABLE_SERVICE_MANAGER_RETRIES = ConfigProperty
+      .key(TABLE_SERVICE_MANAGER_PREFIX + ".connect.retries")
+      .defaultValue(3)
+      .withDocumentation("Number of retries while opening a connection to table service manager");
+
+  public static final ConfigProperty<Integer> TABLE_SERVICE_MANAGER_RETRY_DELAY = ConfigProperty
+      .key(TABLE_SERVICE_MANAGER_PREFIX + ".connect.retry.delay")
+      .defaultValue(1)
+      .withDocumentation("Number of seconds for the client to wait between consecutive connection 

[GitHub] [hudi] yuzhaojing commented on a diff in pull request #6732: [HUDI-4148] Add client for hudi table service manager

2022-12-28 Thread GitBox


yuzhaojing commented on code in PR #6732:
URL: https://github.com/apache/hudi/pull/6732#discussion_r1058305966


##
hudi-common/src/main/java/org/apache/hudi/common/config/HoodieTableServiceManagerConfig.java:
##
@@ -0,0 +1,187 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.config;
+
+import org.apache.hudi.common.model.ActionType;
+
+import javax.annotation.concurrent.Immutable;
+
+import java.util.Properties;
+
+/**
+ * Configurations used by the Hudi Table Service Manager.
+ */
+@Immutable
+@ConfigClassProperty(name = "Table Service Manager Configs",
+groupName = ConfigGroups.Names.WRITE_CLIENT,
+description = "Configurations used by the Hudi Table Service Manager.")
+public class HoodieTableServiceManagerConfig extends HoodieConfig {
+
+  public static final String TABLE_SERVICE_MANAGER_PREFIX = "hoodie.table.service.manager";
+
+  public static final ConfigProperty<Boolean> TABLE_SERVICE_MANAGER_ENABLE = ConfigProperty
+      .key(TABLE_SERVICE_MANAGER_PREFIX + ".enable")
+      .defaultValue(false)
+      .withDocumentation("Use table manager service to execute table service");
+
+  public static final ConfigProperty<String> TABLE_SERVICE_MANAGER_URIS = ConfigProperty
+      .key(TABLE_SERVICE_MANAGER_PREFIX + ".uris")
+      .defaultValue("http://localhost:9091")
+      .withDocumentation("Table service manager uris");
+
+  public static final ConfigProperty<String> TABLE_SERVICE_MANAGER_ACTIONS = ConfigProperty
+      .key(TABLE_SERVICE_MANAGER_PREFIX + ".actions")
+      .defaultValue("")
+      .withDocumentation("Which action deploy on table service manager such as compaction:clean, default null");
+
+  public static final ConfigProperty<String> TABLE_SERVICE_MANAGER_DEPLOY_USERNAME = ConfigProperty
+      .key(TABLE_SERVICE_MANAGER_PREFIX + ".deploy.username")
+      .defaultValue("default")
+      .withDocumentation("The user name to deploy for table service of this table");
+
+  public static final ConfigProperty<String> TABLE_SERVICE_MANAGER_DEPLOY_QUEUE = ConfigProperty
+      .key(TABLE_SERVICE_MANAGER_PREFIX + ".deploy.queue")
+      .defaultValue("default")
+      .withDocumentation("The queue to deploy for table service of this table");
+
+  public static final ConfigProperty<String> TABLE_SERVICE_MANAGER_DEPLOY_RESOURCE = ConfigProperty
+      .key(TABLE_SERVICE_MANAGER_PREFIX + ".deploy.resource")
+      .defaultValue("4g:4g")
+      .withDocumentation("The resource to deploy for table service of this table, default driver 4g, executor 4g");
+
+  public static final ConfigProperty<Integer> TABLE_SERVICE_MANAGER_DEPLOY_PARALLELISM = ConfigProperty
+      .key(TABLE_SERVICE_MANAGER_PREFIX + ".deploy.parallelism")
+      .defaultValue(100)
+      .withDocumentation("The max parallelism to deploy for table service of this table, default 100");
+
+  public static final ConfigProperty<String> TABLE_SERVICE_MANAGER_DEPLOY_EXECUTION_ENGINE = ConfigProperty
+      .key(TABLE_SERVICE_MANAGER_PREFIX + ".execution.engine")
+      .defaultValue("spark")
+      .withDocumentation("The execution engine to deploy for table service of this table, default spark");
+
+  public static final ConfigProperty<String> TABLE_SERVICE_MANAGER_DEPLOY_EXTRA_PARAMS = ConfigProperty
+      .key(TABLE_SERVICE_MANAGER_PREFIX + ".deploy.extra.params")
+      .defaultValue("")
+      .withDocumentation("The extra params to deploy for table service of this table, split by ';'");
+
+  public static final ConfigProperty<Integer> TABLE_SERVICE_MANAGER_TIMEOUT = ConfigProperty
+      .key(TABLE_SERVICE_MANAGER_PREFIX + ".timeout")
+      .defaultValue(300)
+      .withDocumentation("Connection timeout for client");
+
+  public static final ConfigProperty<Integer> TABLE_SERVICE_MANAGER_RETRIES = ConfigProperty
+      .key(TABLE_SERVICE_MANAGER_PREFIX + ".connect.retries")
+      .defaultValue(3)
+      .withDocumentation("Number of retries while opening a connection to table service manager");
+
+  public static final ConfigProperty<Integer> TABLE_SERVICE_MANAGER_RETRY_DELAY = ConfigProperty
+      .key(TABLE_SERVICE_MANAGER_PREFIX + ".connect.retry.delay")
+      .defaultValue(1)
+      .withDocumentation("Number of seconds for the client to wait between consecutive connection 

[GitHub] [hudi] yuzhaojing commented on a diff in pull request #6732: [HUDI-4148] Add client for hudi table service manager

2022-12-28 Thread GitBox


yuzhaojing commented on code in PR #6732:
URL: https://github.com/apache/hudi/pull/6732#discussion_r1058314660


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/HoodieTableServiceManagerClient.java:
##
@@ -0,0 +1,179 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.client;
+
+import org.apache.hudi.common.config.HoodieTableServiceManagerConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.util.ClusteringUtils;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.RetryHelper;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.exception.HoodieRemoteException;
+
+import org.apache.http.client.fluent.Request;
+import org.apache.http.client.utils.URIBuilder;
+import org.apache.logging.log4j.LogManager;
+import org.apache.logging.log4j.Logger;
+
+import java.io.IOException;
+import java.net.URI;
+import java.net.URISyntaxException;
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * Client which sends the table service instants to the table service manager.
+ */
+public class HoodieTableServiceManagerClient {
+
+  /**
+   * Rollback commands, that trigger a specific handling for rollback.
+   */

Review Comment:
   Fixed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #7550: [MINOR] Avoid running tests as part of bundle uploads

2022-12-28 Thread GitBox


hudi-bot commented on PR #7550:
URL: https://github.com/apache/hudi/pull/7550#issuecomment-1366639613

   
   ## CI report:
   
   * 00ea42c662445ccce81e186d9f857564c2ce5c7f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14009)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7568: [HUDI-5341] CleanPlanner retains earliest commits must not be later than earliest pending commit

2022-12-28 Thread GitBox


hudi-bot commented on PR #7568:
URL: https://github.com/apache/hudi/pull/7568#issuecomment-1366639697

   
   ## CI report:
   
   * cc53c8c67d4e86b4be6dafdf25cafae87c1ed152 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14011)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6732: [HUDI-4148] Add client for hudi table service manager

2022-12-28 Thread GitBox


hudi-bot commented on PR #6732:
URL: https://github.com/apache/hudi/pull/6732#issuecomment-1366643203

   
   ## CI report:
   
   * 64ecea100e226b7fd539cab05c03bc9902e36db1 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7385: [HUDI-5332] HiveSyncTool can avoid initializing all permanent custom functions of Hive

2022-12-28 Thread GitBox


hudi-bot commented on PR #7385:
URL: https://github.com/apache/hudi/pull/7385#issuecomment-1366643860

   
   ## CI report:
   
   * 315fbe3897a36665ddad6b0a0d723bf7db0b9e5c Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13477)
 
   * 9882c15708236cd4b66a9c54329f055db846ade8 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6732: [HUDI-4148] Add client for hudi table service manager

2022-12-28 Thread GitBox


hudi-bot commented on PR #6732:
URL: https://github.com/apache/hudi/pull/6732#issuecomment-1366647277

   
   ## CI report:
   
   * 64ecea100e226b7fd539cab05c03bc9902e36db1 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14015)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7385: [HUDI-5332] HiveSyncTool can avoid initializing all permanent custom functions of Hive

2022-12-28 Thread GitBox


hudi-bot commented on PR #7385:
URL: https://github.com/apache/hudi/pull/7385#issuecomment-1366647737

   
   ## CI report:
   
   * 315fbe3897a36665ddad6b0a0d723bf7db0b9e5c Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13477)
 
   * 9882c15708236cd4b66a9c54329f055db846ade8 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14016)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] viverlxl commented on issue #7162: [SUPPORT] Flink stream api(HoodieFlinkStreamer) write data to hudi create much rollbackfile

2022-12-28 Thread GitBox


viverlxl commented on issue #7162:
URL: https://github.com/apache/hudi/issues/7162#issuecomment-1366653860

   @yihua  Yes, this problem has been solved. A new HoodieClient is created in the OperatorCoordinator, StreamWriteFunction, and the compaction function when the schema changes, and the new schema is also synced to Hive. We run this in a production environment, but writing many different tables to Hudi causes performance problems.





[GitHub] [hudi] minihippo opened a new pull request, #7572: Make retryhelper more suitable for common use.

2022-12-28 Thread GitBox


minihippo opened a new pull request, #7572:
URL: https://github.com/apache/hudi/pull/7572

   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was 
copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


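The PR above generalizes a retry helper. For context, the retry-with-rethrow pattern such a helper implements can be sketched as follows — an illustrative standalone example, not Hudi's actual RetryHelper API (class and method names here are hypothetical):

```java
import java.util.concurrent.Callable;

// Illustrative retry helper: runs a task, retrying on any exception up to
// maxRetries additional attempts, and rethrows the last failure if all fail.
public class RetrySketch {

  public static <T> T runWithRetry(Callable<T> task, int maxRetries) throws Exception {
    Exception last = null;
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        return task.call();
      } catch (Exception e) {
        last = e; // remember the failure; a real helper might also sleep/backoff here
      }
    }
    throw last;
  }

  public static void main(String[] args) throws Exception {
    int[] calls = {0};
    // Task fails twice, then succeeds on the third attempt.
    String result = runWithRetry(() -> {
      calls[0]++;
      if (calls[0] < 3) {
        throw new IllegalStateException("transient failure");
      }
      return "ok";
    }, 5);
    System.out.println(result + " after " + calls[0] + " attempts"); // ok after 3 attempts
  }
}
```

Making such a helper generic over the task's return type and the retried exception types is what lets one implementation serve metastore calls, HTTP requests, and other transient-failure-prone operations alike.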



[jira] [Created] (HUDI-5483) Make RetryHelper more suitable for common use

2022-12-28 Thread XiaoyuGeng (Jira)
XiaoyuGeng created HUDI-5483:


 Summary: Make RetryHelper more suitable for common use
 Key: HUDI-5483
 URL: https://issues.apache.org/jira/browse/HUDI-5483
 Project: Apache Hudi
  Issue Type: Improvement
  Components: core
Reporter: XiaoyuGeng
Assignee: XiaoyuGeng
 Fix For: 0.13.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-5484) Avoid using GenericRecord in ColumnStatMetadata

2022-12-28 Thread dzcxzl (Jira)
dzcxzl created HUDI-5484:


 Summary: Avoid using GenericRecord in ColumnStatMetadata
 Key: HUDI-5484
 URL: https://issues.apache.org/jira/browse/HUDI-5484
 Project: Apache Hudi
  Issue Type: Bug
Reporter: dzcxzl


 

 
{code:java}
org.apache.hudi.com.esotericsoftware.kryo.KryoException: 
java.lang.UnsupportedOperationException
Serialization trace:
reserved (org.apache.avro.Schema$Field)
fieldMap (org.apache.avro.Schema$RecordSchema)
schema (org.apache.avro.generic.GenericData$Record)
maxValue (org.apache.hudi.avro.model.HoodieMetadataColumnStats)
columnStatMetadata (org.apache.hudi.metadata.HoodieMetadataPayload)
    at 
org.apache.hudi.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:144)
    at 
org.apache.hudi.common.model.HoodieAvroRecord.readRecordPayload(HoodieAvroRecord.java:232)
    at 
org.apache.hudi.common.model.HoodieAvroRecord.readRecordPayload(HoodieAvroRecord.java:45)
    at org.apache.hudi.common.model.HoodieRecord.read(HoodieRecord.java:339)
    at 
org.apache.hudi.com.esotericsoftware.kryo.serializers.DefaultSerializers$KryoSerializableSerializer.read(DefaultSerializers.java:520)
    at 
org.apache.hudi.com.esotericsoftware.kryo.serializers.DefaultSerializers$KryoSerializableSerializer.read(DefaultSerializers.java:512)
    at 
org.apache.hudi.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813)
    at 
org.apache.hudi.common.util.SerializationUtils$KryoSerializerInstance.deserialize(SerializationUtils.java:101)
    at 
org.apache.hudi.common.util.SerializationUtils.deserialize(SerializationUtils.java:75)
    at 
org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:210)
    at 
org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:203)
    at 
org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:199)
    at 
org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:68)
    at 
org.apache.hudi.common.util.collection.ExternalSpillableMap.get(ExternalSpillableMap.java:195)
    at 
org.apache.hudi.common.util.collection.ExternalSpillableMap.get(ExternalSpillableMap.java:54)
    at org.apache.hudi.io.HoodieCreateHandle.write(HoodieCreateHandle.java:188)
    at 
org.apache.hudi.table.HoodieSparkCopyOnWriteTable.handleInsert(HoodieSparkCopyOnWriteTable.java:257)
    at 
org.apache.hudi.table.action.compact.CompactionExecutionHelper.writeFileAndGetWriteStats(CompactionExecutionHelper.java:68)
    at 
org.apache.hudi.table.action.compact.HoodieCompactor.compact(HoodieCompactor.java:231)
    at 
org.apache.hudi.table.action.compact.HoodieCompactor.lambda$compact$9cd4b1be$1(HoodieCompactor.java:129)
    at 
org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)Caused
 by: java.lang.UnsupportedOperationException
    at java.util.Collections$UnmodifiableCollection.add(Collections.java:1055)
    at 
org.apache.hudi.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134)
    at 
org.apache.hudi.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40)
    at org.apache.hudi.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:731)
    at 
org.apache.hudi.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
 {code}
 



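The root cause in the trace above is that Kryo's CollectionSerializer rebuilds a collection field element-by-element via add(), which throws UnsupportedOperationException when the field is an unmodifiable view — here Avro's Schema$Field "reserved" set reached through the GenericRecord held in the column stats. The failure mode can be reproduced with plain collections, independent of Kryo and Hudi (class and method names below are illustrative):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Demonstrates why a deserializer that repopulates a collection by calling
// add() fails on unmodifiable views with UnsupportedOperationException.
public class UnmodifiableAddSketch {

  // Returns true if add() on the target throws UnsupportedOperationException.
  public static boolean addFails(List<String> target) {
    try {
      target.add("element");
      return false;
    } catch (UnsupportedOperationException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    List<String> mutable = new ArrayList<>();
    List<String> frozen = Collections.unmodifiableList(new ArrayList<>());
    System.out.println(addFails(mutable)); // false: a plain ArrayList accepts add()
    System.out.println(addFails(frozen));  // true: the unmodifiable view rejects add()
  }
}
```

Avoiding GenericRecord in the payload — and with it Avro's internal unmodifiable schema collections — sidesteps this when records are deserialized back out of the spillable map.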


[jira] [Updated] (HUDI-5484) Avoid using GenericRecord in ColumnStatMetadata

2022-12-28 Thread dzcxzl (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated HUDI-5484:
-
Description: 
 

 
{code:java}
org.apache.hudi.com.esotericsoftware.kryo.KryoException: 
java.lang.UnsupportedOperationException
Serialization trace:
reserved (org.apache.avro.Schema$Field)
fieldMap (org.apache.avro.Schema$RecordSchema)
schema (org.apache.avro.generic.GenericData$Record)
maxValue (org.apache.hudi.avro.model.HoodieMetadataColumnStats)
columnStatMetadata (org.apache.hudi.metadata.HoodieMetadataPayload)
    at 
org.apache.hudi.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:144)
    
at 
org.apache.hudi.common.model.HoodieAvroRecord.readRecordPayload(HoodieAvroRecord.java:232)
    at 
org.apache.hudi.common.model.HoodieAvroRecord.readRecordPayload(HoodieAvroRecord.java:45)
    at org.apache.hudi.common.model.HoodieRecord.read(HoodieRecord.java:339)
    at 
org.apache.hudi.com.esotericsoftware.kryo.serializers.DefaultSerializers$KryoSerializableSerializer.read(DefaultSerializers.java:520)
    at 
org.apache.hudi.com.esotericsoftware.kryo.serializers.DefaultSerializers$KryoSerializableSerializer.read(DefaultSerializers.java:512)
    at 
org.apache.hudi.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813)
    at 
org.apache.hudi.common.util.SerializationUtils$KryoSerializerInstance.deserialize(SerializationUtils.java:101)
    at 
org.apache.hudi.common.util.SerializationUtils.deserialize(SerializationUtils.java:75)
    at 
org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:210)
    at 
org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:203)
    at 
org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:199)
    at 
org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:68)
    at 
org.apache.hudi.common.util.collection.ExternalSpillableMap.get(ExternalSpillableMap.java:195)
    at 
org.apache.hudi.common.util.collection.ExternalSpillableMap.get(ExternalSpillableMap.java:54)
    at org.apache.hudi.io.HoodieCreateHandle.write(HoodieCreateHandle.java:188)
    at 
org.apache.hudi.table.HoodieSparkCopyOnWriteTable.handleInsert(HoodieSparkCopyOnWriteTable.java:257)
    at 
org.apache.hudi.table.action.compact.CompactionExecutionHelper.writeFileAndGetWriteStats(CompactionExecutionHelper.java:68)
    at 
org.apache.hudi.table.action.compact.HoodieCompactor.compact(HoodieCompactor.java:231)
    at 
org.apache.hudi.table.action.compact.HoodieCompactor.lambda$compact$9cd4b1be$1(HoodieCompactor.java:129)
    at 
org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)Caused
 by: java.lang.UnsupportedOperationException
    at java.util.Collections$UnmodifiableCollection.add(Collections.java:1055)
    at 
org.apache.hudi.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134)
    at 
org.apache.hudi.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40)
    at org.apache.hudi.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:731)
    at 
org.apache.hudi.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125){code}
 

  was:
 

 
{code:java}
org.apache.hudi.com.esotericsoftware.kryo.KryoException: 
java.lang.UnsupportedOperationException
Serialization trace:
reserved (org.apache.avro.Schema$Field)
fieldMap (org.apache.avro.Schema$RecordSchema)
schema (org.apache.avro.generic.GenericData$Record)
maxValue (org.apache.hudi.avro.model.HoodieMetadataColumnStats)
columnStatMetadata (org.apache.hudi.metadata.HoodieMetadataPayload)
    at 
org.apache.hudi.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:144)
    at 
org.apache.hudi.common.model.HoodieAvroRecord.readRecordPayload(HoodieAvroRecord.java:232)
    at 
org.apache.hudi.common.model.HoodieAvroRecord.readRecordPayload(HoodieAvroRecord.java:45)
    at org.apache.hudi.common.model.HoodieRecord.read(HoodieRecord.java:339)
    at 
org.apache.hudi.com.esotericsoftware.kryo.serializers.DefaultSerializers$KryoSerializableSerializer.read(DefaultSerializers.java:520)
    at 
org.apache.hudi.com.esotericsoftware.kryo.serializers.DefaultSerializers$KryoSerializableSerializer.read(DefaultSerializers.java:512)
    at 
org.apache.hudi.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813)
    at 
org.apache.hudi.common.util.SerializationUtils$KryoSerializerInstance.deserialize(SerializationUtils.java:101)
    at 
org.apache.hudi.common.util.SerializationUtils.deserialize(SerializationUtils.java:75)
    at 
org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:210)
    at 
org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:203)
    at 
org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:199)
    at 
org.apache.hudi.co

[GitHub] [hudi] cxzl25 opened a new pull request, #7573: [HUDI-5484] Avoid using GenericRecord in ColumnStatMetadata

2022-12-28 Thread GitBox


cxzl25 opened a new pull request, #7573:
URL: https://github.com/apache/hudi/pull/7573

   ### Change Logs
   
   Avoid using GenericRecord in ColumnStatMetadata.
   
   `HoodieMetadataPayload` is constructed using `GenericRecord` with 
reflection, and `columnStatMetadata` stores `minValue` and `maxValue`, both of 
which are `GenericRecord` types.
   
   Once spill is generated, kryo deserialization fails.
   
    Write fail log
   
   ```java
   org.apache.hudi.com.esotericsoftware.kryo.KryoException: 
java.lang.UnsupportedOperationException
   Serialization trace:
   reserved (org.apache.avro.Schema$Field)
   fieldMap (org.apache.avro.Schema$RecordSchema)
   schema (org.apache.avro.generic.GenericData$Record)
   maxValue (org.apache.hudi.avro.model.HoodieMetadataColumnStats)
   columnStatMetadata (org.apache.hudi.metadata.HoodieMetadataPayload)
at 
org.apache.hudi.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:144)
   
at 
org.apache.hudi.common.model.HoodieAvroRecord.readRecordPayload(HoodieAvroRecord.java:232)
at 
org.apache.hudi.common.model.HoodieAvroRecord.readRecordPayload(HoodieAvroRecord.java:45)
at org.apache.hudi.common.model.HoodieRecord.read(HoodieRecord.java:339)
at 
org.apache.hudi.com.esotericsoftware.kryo.serializers.DefaultSerializers$KryoSerializableSerializer.read(DefaultSerializers.java:520)
at 
org.apache.hudi.com.esotericsoftware.kryo.serializers.DefaultSerializers$KryoSerializableSerializer.read(DefaultSerializers.java:512)
at 
org.apache.hudi.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813)
at 
org.apache.hudi.common.util.SerializationUtils$KryoSerializerInstance.deserialize(SerializationUtils.java:101)
at 
org.apache.hudi.common.util.SerializationUtils.deserialize(SerializationUtils.java:75)
at 
org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:210)
at 
org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:203)
at 
org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:199)
at 
org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:68)
at 
org.apache.hudi.common.util.collection.ExternalSpillableMap.get(ExternalSpillableMap.java:195)
at 
org.apache.hudi.common.util.collection.ExternalSpillableMap.get(ExternalSpillableMap.java:54)
at 
org.apache.hudi.io.HoodieCreateHandle.write(HoodieCreateHandle.java:188)
at 
org.apache.hudi.table.HoodieSparkCopyOnWriteTable.handleInsert(HoodieSparkCopyOnWriteTable.java:257)
at 
org.apache.hudi.table.action.compact.CompactionExecutionHelper.writeFileAndGetWriteStats(CompactionExecutionHelper.java:68)
at 
org.apache.hudi.table.action.compact.HoodieCompactor.compact(HoodieCompactor.java:231)
at 
org.apache.hudi.table.action.compact.HoodieCompactor.lambda$compact$9cd4b1be$1(HoodieCompactor.java:129)
at 
org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)
   
   Caused by: java.lang.UnsupportedOperationException
at 
java.util.Collections$UnmodifiableCollection.add(Collections.java:1055)
at 
org.apache.hudi.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134)
at 
org.apache.hudi.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40)
at 
org.apache.hudi.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:731)
at 
org.apache.hudi.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
   ```
    construct HoodieMetadataPayload 
   ```java
at 
org.apache.hudi.metadata.HoodieMetadataPayload.<init>(HoodieMetadataPayload.java:233)
at 
org.apache.hudi.metadata.HoodieMetadataPayload.<init>(HoodieMetadataPayload.java:182)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
org.apache.hudi.common.util.HoodieRecordUtils.loadPayload(HoodieRecordUtils.java:99)
at 
org.apache.hudi.common.util.SpillableMapUtils.convertToHoodieRecordPayload(SpillableMapUtils.java:140)
at 
org.apache.hudi.avro.HoodieAvroUtils.createHoodieRecordFromAvro(HoodieAvroUtils.java:1078)
at 
org.apache.hudi.common.model.HoodieAvroIndexedRecord.wrapIntoHoodieRecordPayloadWithParams(HoodieAvroIndexedRecord.java:168)
at 
org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.processDataBlock(AbstractHoodieLogRecordReader.java:644)
at 
org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.processQueuedBlocksForInstant(Abstr

[jira] [Updated] (HUDI-5484) Avoid using GenericRecord in ColumnStatMetadata

2022-12-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-5484:
-
Labels: pull-request-available  (was: )

> Avoid using GenericRecord in ColumnStatMetadata
> ---
>
> Key: HUDI-5484
> URL: https://issues.apache.org/jira/browse/HUDI-5484
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: dzcxzl
>Priority: Critical
>  Labels: pull-request-available
>
>  
>  
> {code:java}
> org.apache.hudi.com.esotericsoftware.kryo.KryoException: 
> java.lang.UnsupportedOperationException
> Serialization trace:
> reserved (org.apache.avro.Schema$Field)
> fieldMap (org.apache.avro.Schema$RecordSchema)
> schema (org.apache.avro.generic.GenericData$Record)
> maxValue (org.apache.hudi.avro.model.HoodieMetadataColumnStats)
> columnStatMetadata (org.apache.hudi.metadata.HoodieMetadataPayload)
>     at 
> org.apache.hudi.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:144)
>     
> at 
> org.apache.hudi.common.model.HoodieAvroRecord.readRecordPayload(HoodieAvroRecord.java:232)
>     at 
> org.apache.hudi.common.model.HoodieAvroRecord.readRecordPayload(HoodieAvroRecord.java:45)
>     at org.apache.hudi.common.model.HoodieRecord.read(HoodieRecord.java:339)
>     at 
> org.apache.hudi.com.esotericsoftware.kryo.serializers.DefaultSerializers$KryoSerializableSerializer.read(DefaultSerializers.java:520)
>     at 
> org.apache.hudi.com.esotericsoftware.kryo.serializers.DefaultSerializers$KryoSerializableSerializer.read(DefaultSerializers.java:512)
>     at 
> org.apache.hudi.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813)
>     at 
> org.apache.hudi.common.util.SerializationUtils$KryoSerializerInstance.deserialize(SerializationUtils.java:101)
>     at 
> org.apache.hudi.common.util.SerializationUtils.deserialize(SerializationUtils.java:75)
>     at 
> org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:210)
>     at 
> org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:203)
>     at 
> org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:199)
>     at 
> org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:68)
>     at 
> org.apache.hudi.common.util.collection.ExternalSpillableMap.get(ExternalSpillableMap.java:195)
>     at 
> org.apache.hudi.common.util.collection.ExternalSpillableMap.get(ExternalSpillableMap.java:54)
>     at 
> org.apache.hudi.io.HoodieCreateHandle.write(HoodieCreateHandle.java:188)
>     at 
> org.apache.hudi.table.HoodieSparkCopyOnWriteTable.handleInsert(HoodieSparkCopyOnWriteTable.java:257)
>     at 
> org.apache.hudi.table.action.compact.CompactionExecutionHelper.writeFileAndGetWriteStats(CompactionExecutionHelper.java:68)
>     at 
> org.apache.hudi.table.action.compact.HoodieCompactor.compact(HoodieCompactor.java:231)
>     at 
> org.apache.hudi.table.action.compact.HoodieCompactor.lambda$compact$9cd4b1be$1(HoodieCompactor.java:129)
>     at 
> org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)Caused
>  by: java.lang.UnsupportedOperationException
>     at java.util.Collections$UnmodifiableCollection.add(Collections.java:1055)
>     at 
> org.apache.hudi.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134)
>     at 
> org.apache.hudi.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40)
>     at 
> org.apache.hudi.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:731)
>     at 
> org.apache.hudi.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125){code}
>  





[GitHub] [hudi] hudi-bot commented on pull request #7571: [HUDI-4710]Fix flaky: TestKeyRangeLookupTree#testFileGroupLookUpManyEntriesWithSameStartValue

2022-12-28 Thread GitBox


hudi-bot commented on PR #7571:
URL: https://github.com/apache/hudi/pull/7571#issuecomment-1366696718

   
   ## CI report:
   
   * 2baf115f1b4d1474fe8161919797ae68f979fc80 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14013)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #5064: [HUDI-3654] Add new module `hudi-metaserver`

2022-12-28 Thread GitBox


hudi-bot commented on PR #5064:
URL: https://github.com/apache/hudi/pull/5064#issuecomment-1366698891

   
   ## CI report:
   
   * 53aa21bf23d2f8b0404743e6d016cfb2fac444f7 UNKNOWN
   * 07a3ea3956e5ce02a33a55eae4a0339796275f9d UNKNOWN
   * 810af96ee856bd94cfc82b01b67765a735f29c44 UNKNOWN
   * 1c1faa8f063f98023921e4f6015ac2f28adde2b1 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13967)
 
   * a616f0831d6dfd7a55b5f52331e43d87374e UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7572: [HUDI-5483]Make retryhelper more suitable for common use.

2022-12-28 Thread GitBox


hudi-bot commented on PR #7572:
URL: https://github.com/apache/hudi/pull/7572#issuecomment-1366700583

   
   ## CI report:
   
   * 5a4fae5d3d42446c19894406dc53a4d7327a9b48 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7573: [HUDI-5484] Avoid using GenericRecord in ColumnStatMetadata

2022-12-28 Thread GitBox


hudi-bot commented on PR #7573:
URL: https://github.com/apache/hudi/pull/7573#issuecomment-1366700614

   
   ## CI report:
   
   * bfec3be3263c21f7533fc16e19eec3598617a5bf UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Updated] (HUDI-5483) Make RetryHelper more suitable for common use

2022-12-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-5483:
-
Labels: pull-request-available  (was: )

> Make RetryHelper more suitable for common use
> -
>
> Key: HUDI-5483
> URL: https://issues.apache.org/jira/browse/HUDI-5483
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: core
>Reporter: XiaoyuGeng
>Assignee: XiaoyuGeng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] hudi-bot commented on pull request #5064: [HUDI-3654] Add new module `hudi-metaserver`

2022-12-28 Thread GitBox


hudi-bot commented on PR #5064:
URL: https://github.com/apache/hudi/pull/5064#issuecomment-1366702795

   
   ## CI report:
   
   * 53aa21bf23d2f8b0404743e6d016cfb2fac444f7 UNKNOWN
   * 07a3ea3956e5ce02a33a55eae4a0339796275f9d UNKNOWN
   * 810af96ee856bd94cfc82b01b67765a735f29c44 UNKNOWN
   * 1c1faa8f063f98023921e4f6015ac2f28adde2b1 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13967)
 
   * a616f0831d6dfd7a55b5f52331e43d87374e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14017)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7572: [HUDI-5483]Make retryhelper more suitable for common use.

2022-12-28 Thread GitBox


hudi-bot commented on PR #7572:
URL: https://github.com/apache/hudi/pull/7572#issuecomment-1366704389

   
   ## CI report:
   
   * 5a4fae5d3d42446c19894406dc53a4d7327a9b48 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14018)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7573: [HUDI-5484] Avoid using GenericRecord in ColumnStatMetadata

2022-12-28 Thread GitBox


hudi-bot commented on PR #7573:
URL: https://github.com/apache/hudi/pull/7573#issuecomment-1366704453

   
   ## CI report:
   
   * bfec3be3263c21f7533fc16e19eec3598617a5bf Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14019)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] soumilshah1995 commented on issue #7430: [BUG] MOR Table Hard Deletes Create issue with Athena Querying RT Tables

2022-12-28 Thread GitBox


soumilshah1995 commented on issue #7430:
URL: https://github.com/apache/hudi/issues/7430#issuecomment-1366712825

   Roger that captain :D 





[GitHub] [hudi] maddy2u commented on issue #4502: [QUESTION] Athena Hudi Time Travel Queries

2022-12-28 Thread GitBox


maddy2u commented on issue #4502:
URL: https://github.com/apache/hudi/issues/4502#issuecomment-1366718510

   Hi,
   
   Is this accessible today via AWS Athena? Has anyone tried it?





[GitHub] [hudi] soumilshah1995 commented on issue #7459: [SUPPORT] Glue 3.0 with HUDI marketplace Connector

2022-12-28 Thread GitBox


soumilshah1995 commented on issue #7459:
URL: https://github.com/apache/hudi/issues/7459#issuecomment-1366746652

   Closing this ticket as it's resolved after speaking to support :D Cheers





[GitHub] [hudi] soumilshah1995 closed issue #7459: [SUPPORT] Glue 3.0 with HUDI marketplace Connector

2022-12-28 Thread GitBox


soumilshah1995 closed issue #7459: [SUPPORT] Glue 3.0 with HUDI marketplace 
Connector  
URL: https://github.com/apache/hudi/issues/7459





[GitHub] [hudi] hudi-bot commented on pull request #5064: [HUDI-3654] Add new module `hudi-metaserver`

2022-12-28 Thread GitBox


hudi-bot commented on PR #5064:
URL: https://github.com/apache/hudi/pull/5064#issuecomment-1366754283

   
   ## CI report:
   
   * 53aa21bf23d2f8b0404743e6d016cfb2fac444f7 UNKNOWN
   * 07a3ea3956e5ce02a33a55eae4a0339796275f9d UNKNOWN
   * 810af96ee856bd94cfc82b01b67765a735f29c44 UNKNOWN
   * a616f0831d6dfd7a55b5f52331e43d87374e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14017)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6732: [HUDI-4148] Add client for hudi table service manager

2022-12-28 Thread GitBox


hudi-bot commented on PR #6732:
URL: https://github.com/apache/hudi/pull/6732#issuecomment-1366755308

   
   ## CI report:
   
   * 64ecea100e226b7fd539cab05c03bc9902e36db1 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14015)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] xushiyan commented on a diff in pull request #7572: [HUDI-5483]Make retryhelper more suitable for common use.

2022-12-28 Thread GitBox


xushiyan commented on code in PR #7572:
URL: https://github.com/apache/hudi/pull/7572#discussion_r1058438675


##
hudi-common/src/main/java/org/apache/hudi/common/util/RetryHelper.java:
##
@@ -36,9 +36,10 @@
  *
  * @param <T> Type of return value for checked function.
  */
-public class RetryHelper implements Serializable {
+public class RetryHelper implements Serializable {
   private static final Logger LOG = LogManager.getLogger(RetryHelper.class);
-  private transient CheckedFunction<T> func;
+  private static final List<Class<? extends Exception>> RETRY_EXCEPTION_CLASS = Arrays.asList(IOException.class, RuntimeException.class);

Review Comment:
   better name: `DEFAULT_RETRY_EXCEPTIONS`



##
hudi-common/src/main/java/org/apache/hudi/common/util/RetryHelper.java:
##
@@ -120,7 +118,7 @@ private boolean checkIfExceptionInRetryList(Exception e) {
 
 // if users didn't set hoodie.filesystem.operation.retry.exceptions
 // we will retry all the IOException and RuntimeException
-if (retryExceptionsClasses.isEmpty()) {
+if (retryExceptionsClasses.equals(RETRY_EXCEPTION_CLASS)) {
   return true;
 }

Review Comment:
   but this check being true does not mean `e` is in the list, does it? this 
check looks redundant now, given you've set a default list of exceptions.
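A minimal standalone sketch of the pattern under discussion (illustrative names only, not the actual `org.apache.hudi.common.util.RetryHelper`): with a non-empty default list of retryable exceptions, the retry decision reduces to a single instance-of check against the list, so a separate empty-list branch is indeed unnecessary.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only -- not the Hudi RetryHelper class.
public class RetrySketch {

  interface CheckedFunction<R> {
    R get() throws Exception;
  }

  static class SimpleRetryHelper<T> {
    // Default retryable exceptions, mirroring the reviewed diff's intent.
    private static final List<Class<? extends Exception>> DEFAULT_RETRY_EXCEPTIONS =
        Arrays.asList(IOException.class, RuntimeException.class);

    private final int maxRetries;
    private final List<Class<? extends Exception>> retryExceptions;

    SimpleRetryHelper(int maxRetries) {
      this.maxRetries = maxRetries;
      this.retryExceptions = DEFAULT_RETRY_EXCEPTIONS;
    }

    T start(CheckedFunction<T> func) throws Exception {
      for (int attempt = 0; ; attempt++) {
        try {
          return func.get();
        } catch (Exception e) {
          // One uniform check: retry only if e matches a listed class
          // and attempts remain. No special case for an empty list.
          boolean retryable = retryExceptions.stream().anyMatch(c -> c.isInstance(e));
          if (!retryable || attempt >= maxRetries) {
            throw e;
          }
        }
      }
    }
  }

  static String demo() throws Exception {
    int[] calls = {0};
    String result = new SimpleRetryHelper<String>(3).start(() -> {
      if (++calls[0] < 3) {
        throw new IOException("transient failure");
      }
      return "ok";
    });
    return result + " after " + calls[0] + " attempts";
  }

  public static void main(String[] args) throws Exception {
    System.out.println(demo());  // ok after 3 attempts
  }
}
```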






[GitHub] [hudi] xushiyan commented on a diff in pull request #6732: [HUDI-4148] Add client for hudi table service manager

2022-12-28 Thread GitBox


xushiyan commented on code in PR #6732:
URL: https://github.com/apache/hudi/pull/6732#discussion_r1058454424


##
hudi-common/src/main/java/org/apache/hudi/common/config/HoodieTableServiceManagerConfig.java:
##
@@ -0,0 +1,187 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.config;
+
+import org.apache.hudi.common.model.ActionType;
+
+import javax.annotation.concurrent.Immutable;
+
+import java.util.Properties;
+
+/**
+ * Configurations used by the Hudi Table Service Manager.
+ */
+@Immutable
+@ConfigClassProperty(name = "Table Service Manager Configs",
+groupName = ConfigGroups.Names.WRITE_CLIENT,
+description = "Configurations used by the Hudi Table Service Manager.")
+public class HoodieTableServiceManagerConfig extends HoodieConfig {
+
+  public static final String TABLE_SERVICE_MANAGER_PREFIX = "hoodie.table.service.manager";
+
+  public static final ConfigProperty<Boolean> TABLE_SERVICE_MANAGER_ENABLE = ConfigProperty
+      .key(TABLE_SERVICE_MANAGER_PREFIX + ".enable")
+      .defaultValue(false)
+      .withDocumentation("Use table manager service to execute table service");
+
+  public static final ConfigProperty<String> TABLE_SERVICE_MANAGER_URIS = ConfigProperty
+      .key(TABLE_SERVICE_MANAGER_PREFIX + ".uris")
+      .defaultValue("http://localhost:9091")
+      .withDocumentation("Table service manager uris");
+
+  public static final ConfigProperty<String> TABLE_SERVICE_MANAGER_ACTIONS = ConfigProperty
+      .key(TABLE_SERVICE_MANAGER_PREFIX + ".actions")
+      .defaultValue("")
+      .withDocumentation("Which action deploy on table service manager such as compaction:clean, default null");
+
+  public static final ConfigProperty<String> TABLE_SERVICE_MANAGER_DEPLOY_USERNAME = ConfigProperty
+      .key(TABLE_SERVICE_MANAGER_PREFIX + ".deploy.username")
+      .defaultValue("default")
+      .withDocumentation("The user name to deploy for table service of this table");
+
+  public static final ConfigProperty<String> TABLE_SERVICE_MANAGER_DEPLOY_QUEUE = ConfigProperty
+      .key(TABLE_SERVICE_MANAGER_PREFIX + ".deploy.queue")
+      .defaultValue("default")
+      .withDocumentation("The queue to deploy for table service of this table");
+
+  public static final ConfigProperty<String> TABLE_SERVICE_MANAGER_DEPLOY_RESOURCE = ConfigProperty
+      .key(TABLE_SERVICE_MANAGER_PREFIX + ".deploy.resource")
+      .defaultValue("4g:4g")
+      .withDocumentation("The resource to deploy for table service of this table, default driver 4g, executor 4g");

Review Comment:
   We also support the Java engine, so do you agree with this pattern with an engine prefix?
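A hypothetical sketch of the engine-prefix idea raised here (the key names below are illustrative, not actual Hudi configs): the same logical setting gets a per-engine key, so Spark, Flink, and Java deployments can carry different values while sharing one fallback.

```java
import java.util.Properties;

// Illustrative only: the key names below are hypothetical, not real Hudi configs.
public class EnginePrefixedConfig {
  static final String PREFIX = "hoodie.table.service.manager";

  // e.g. hoodie.table.service.manager.spark.deploy.resource
  static String deployResourceKey(String engine) {
    return PREFIX + "." + engine + ".deploy.resource";
  }

  // Prefer the engine-specific key; fall back to a shared default.
  static String resolveDeployResource(Properties props, String engine, String fallback) {
    return props.getProperty(deployResourceKey(engine), fallback);
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty(deployResourceKey("spark"), "4g:4g");

    System.out.println(resolveDeployResource(props, "spark", "2g:2g")); // 4g:4g
    System.out.println(resolveDeployResource(props, "flink", "2g:2g")); // 2g:2g
  }
}
```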






[GitHub] [hudi] yuzhaojing commented on a diff in pull request #6732: [HUDI-4148] Add client for hudi table service manager

2022-12-28 Thread GitBox


yuzhaojing commented on code in PR #6732:
URL: https://github.com/apache/hudi/pull/6732#discussion_r1058461044


##
hudi-common/src/main/java/org/apache/hudi/common/config/HoodieTableServiceManagerConfig.java:
##
@@ -0,0 +1,187 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.config;
+
+import org.apache.hudi.common.model.ActionType;
+
+import javax.annotation.concurrent.Immutable;
+
+import java.util.Properties;
+
+/**
+ * Configurations used by the Hudi Table Service Manager.
+ */
+@Immutable
+@ConfigClassProperty(name = "Table Service Manager Configs",
+groupName = ConfigGroups.Names.WRITE_CLIENT,
+description = "Configurations used by the Hudi Table Service Manager.")
+public class HoodieTableServiceManagerConfig extends HoodieConfig {
+
+  public static final String TABLE_SERVICE_MANAGER_PREFIX = "hoodie.table.service.manager";
+
+  public static final ConfigProperty<Boolean> TABLE_SERVICE_MANAGER_ENABLE = ConfigProperty
+      .key(TABLE_SERVICE_MANAGER_PREFIX + ".enable")
+      .defaultValue(false)
+      .withDocumentation("Use table manager service to execute table service");
+
+  public static final ConfigProperty<String> TABLE_SERVICE_MANAGER_URIS = ConfigProperty
+      .key(TABLE_SERVICE_MANAGER_PREFIX + ".uris")
+      .defaultValue("http://localhost:9091")
+      .withDocumentation("Table service manager uris");
+
+  public static final ConfigProperty<String> TABLE_SERVICE_MANAGER_ACTIONS = ConfigProperty
+      .key(TABLE_SERVICE_MANAGER_PREFIX + ".actions")
+      .defaultValue("")
+      .withDocumentation("Which action deploy on table service manager such as compaction:clean, default null");
+
+  public static final ConfigProperty<String> TABLE_SERVICE_MANAGER_DEPLOY_USERNAME = ConfigProperty
+      .key(TABLE_SERVICE_MANAGER_PREFIX + ".deploy.username")
+      .defaultValue("default")
+      .withDocumentation("The user name to deploy for table service of this table");
+
+  public static final ConfigProperty<String> TABLE_SERVICE_MANAGER_DEPLOY_QUEUE = ConfigProperty
+      .key(TABLE_SERVICE_MANAGER_PREFIX + ".deploy.queue")
+      .defaultValue("default")
+      .withDocumentation("The queue to deploy for table service of this table");
+
+  public static final ConfigProperty<String> TABLE_SERVICE_MANAGER_DEPLOY_RESOURCE = ConfigProperty
+      .key(TABLE_SERVICE_MANAGER_PREFIX + ".deploy.resource")
+      .defaultValue("4g:4g")
+      .withDocumentation("The resource to deploy for table service of this table, default driver 4g, executor 4g");

Review Comment:
   Agreed.






[GitHub] [hudi] yuzhaojing commented on pull request #6732: [HUDI-4148] Add client for hudi table service manager

2022-12-28 Thread GitBox


yuzhaojing commented on PR #6732:
URL: https://github.com/apache/hudi/pull/6732#issuecomment-1366779990

   Add BaseTableServiceClient.
   
   Move some methods from BaseHoodieWriteClient to BaseTableServiceClient.
   
   - asyncClean
   - asyncArchive
   - 
inlineCompaction(org.apache.hudi.common.util.Option>)
   - inlineCompaction(org.apache.hudi.table.HoodieTable, 
org.apache.hudi.common.util.Option>)
   - logCompact(java.lang.String, boolean)
   - inlineLogCompact
   - runAnyPendingCompactions
   - runAnyPendingLogCompactions
   - inlineScheduleCompaction
   - scheduleCompaction
   - compact
   - commitCompaction
   - completeCompaction
   - scheduleLogCompaction
   - scheduleLogCompactionAtInstant
   - logCompact(java.lang.String)
   - completeLogCompaction
   - scheduleCompactionAtInstant
   - scheduleClustering
   - scheduleClusteringAtInstant
   - scheduleCleaning
   - scheduleCleaningAtInstant
   - cluster
   - runTableServicesInline
   - scheduleTableService
   - scheduleTableServiceInternal
   - 
inlineClustering(org.apache.hudi.common.util.Option>)
   - inlineClustering(org.apache.hudi.table.HoodieTable, 
org.apache.hudi.common.util.Option>)
   - inlineScheduleClustering
   - runAnyPendingClustering
   - finalizeWrite
   - writeTableMetadata
   - clean
   - archive
   - getInflightTimelineExcludeCompactionAndClustering
   - getPendingRollbackInfo(org.apache.hudi.common.table.HoodieTableMetaClient, 
java.lang.String)
   - getPendingRollbackInfo(org.apache.hudi.common.table.HoodieTableMetaClient, 
java.lang.String, boolean)
   - getPendingRollbackInfos(org.apache.hudi.common.table.HoodieTableMetaClient)
   - 
getPendingRollbackInfos(org.apache.hudi.common.table.HoodieTableMetaClient, 
boolean)
   - rollbackFailedWrites()
   - rollbackFailedWrites(boolean)
   - 
rollbackFailedWrites(java.util.Map>,
 boolean)
   - getInstantsToRollback
   - rollback
   - rollbackFailedBootstrap
   
   Move HoodieMetrics and TransactionManager from BaseHoodieWriteClient to 
BaseHoodieClient.
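The reshuffle described above can be pictured with a minimal hierarchy sketch (class bodies are illustrative placeholders, not the actual Hudi code): shared state sits in a common base, table-service entry points live in a dedicated client, and the write client delegates to it.

```java
// Illustrative skeleton of the described refactoring; bodies are placeholders.
abstract class BaseHoodieClient {
  // Shared state such as HoodieMetrics and TransactionManager lives here.
}

abstract class BaseTableServiceClient extends BaseHoodieClient {
  // Table-service entry points (clean, archive, compaction scheduling, ...)
  // move here from BaseHoodieWriteClient.
  public String clean() {
    return "cleaned";
  }
}

class DemoTableServiceClient extends BaseTableServiceClient {
}

abstract class BaseHoodieWriteClient extends BaseHoodieClient {
  // The write client now delegates table services instead of owning them.
  protected final BaseTableServiceClient tableServiceClient;

  BaseHoodieWriteClient(BaseTableServiceClient tableServiceClient) {
    this.tableServiceClient = tableServiceClient;
  }

  public String cleanViaService() {
    return tableServiceClient.clean();
  }
}

public class ClientHierarchySketch extends BaseHoodieWriteClient {
  ClientHierarchySketch() {
    super(new DemoTableServiceClient());
  }

  public static void main(String[] args) {
    System.out.println(new ClientHierarchySketch().cleanViaService()); // cleaned
  }
}
```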





[GitHub] [hudi] amitbans opened a new issue, #7574: [SUPPORT]

2022-12-28 Thread GitBox


amitbans opened a new issue, #7574:
URL: https://github.com/apache/hudi/issues/7574

   **_Tips before filing an issue_**
   
   - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
   
   - Join the mailing list to engage in conversations and get faster support at 
dev-subscr...@hudi.apache.org.
   
   - If you have triaged this as a bug, then file an 
[issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
   
   **Describe the problem you faced**
   
   We are upgrading Hudi from 0.7 to 0.10.1 (as part of moving from EMR 5.33.1 to EMR 5.36.0) 
and are facing stage failures at "Doing partition and writing data isEmpty at 
HoodieSparkSqlWriter.scala:627". We have tried increasing executor memory from 
30g to 50g, but the error persists.
   

   
   A clear and concise description of the problem.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1.
   2.
   3.
   4.
   
   **Expected behavior**
   
   A clear and concise description of what you expected to happen.
   
   **Environment Description**
   
   * Hudi version : 0.10.1
   
   * Spark version : Spark 2.4.8
   
   * Hive version : Hive 2.3.9
   
   * Hadoop version :
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : no
   
   
   **Additional context**
   
   Add any other context about the problem here.
   
   **Stacktrace**
   
   ```Add the stacktrace of the error.```
   
   





[GitHub] [hudi] amitbans commented on issue #7574: [SUPPORT] Upsert job failing while upgrading from 0.7 to 0.10.1

2022-12-28 Thread GitBox


amitbans commented on issue #7574:
URL: https://github.com/apache/hudi/issues/7574#issuecomment-1366785404

   https://user-images.githubusercontent.com/6244582/209845108-4b1ae9f9-6b88-457e-a16b-ae3ea72c6dc0.png
   





[GitHub] [hudi] xccui opened a new pull request, #7575: [MINOR] Set engine when creating meta writer config

2022-12-28 Thread GitBox


xccui opened a new pull request, #7575:
URL: https://github.com/apache/hudi/pull/7575

   ### Change Logs
   
   Properly set engine type when creating `MetadataWriteConfig` from 
`HoodieWriteConfig`.
   
   ### Impact
   
   Won't get warning `Embedded timeline server is disabled, fallback to use 
direct marker type for spark`
   
   ### Risk level (write none, low medium or high below)
   
   none
   
   ### Documentation Update
   
   none
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Change Logs and Impact were stated clearly
   - [x] Adequate tests were added if applicable
   - [x] CI passed
   





[GitHub] [hudi] xushiyan commented on a diff in pull request #6732: [HUDI-4148] Add client for hudi table service manager

2022-12-28 Thread GitBox


xushiyan commented on code in PR #6732:
URL: https://github.com/apache/hudi/pull/6732#discussion_r1058482453


##
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/client/HoodieFlinkWriteClient.java:
##
@@ -277,7 +277,7 @@ protected void writeTableMetadata(HoodieTable table, String 
instantTime, String
* Initialized the metadata table on start up, should only be called once on 
driver.
*/
   public void initMetadataTable() {
-((HoodieFlinkTableServiceClient) 
tableServiceClient).initMetadataTable();
+((HoodieFlinkHoodieTableServiceClient) 
tableServiceClient).initMetadataTable();

Review Comment:
   redundant `Hoodie` in the name






[GitHub] [hudi] yuzhaojing commented on a diff in pull request #6732: [HUDI-4148] Add client for hudi table service manager

2022-12-28 Thread GitBox


yuzhaojing commented on code in PR #6732:
URL: https://github.com/apache/hudi/pull/6732#discussion_r1058484066


##
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/client/HoodieFlinkWriteClient.java:
##
@@ -277,7 +277,7 @@ protected void writeTableMetadata(HoodieTable table, String 
instantTime, String
* Initialized the metadata table on start up, should only be called once on 
driver.
*/
   public void initMetadataTable() {
-((HoodieFlinkTableServiceClient) 
tableServiceClient).initMetadataTable();
+((HoodieFlinkHoodieTableServiceClient) 
tableServiceClient).initMetadataTable();

Review Comment:
   Fixed.






[GitHub] [hudi] xushiyan commented on a diff in pull request #6732: [HUDI-4148] Add client for hudi table service manager

2022-12-28 Thread GitBox


xushiyan commented on code in PR #6732:
URL: https://github.com/apache/hudi/pull/6732#discussion_r1058485340


##
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/client/HoodieFlinkWriteClient.java:
##
@@ -277,7 +277,7 @@ protected void writeTableMetadata(HoodieTable table, String 
instantTime, String
* Initialized the metadata table on start up, should only be called once on 
driver.
*/
   public void initMetadataTable() {
-((HoodieFlinkTableServiceClient) 
tableServiceClient).initMetadataTable();
+((HoodieFlinkHoodieTableServiceClient) 
tableServiceClient).initMetadataTable();

Review Comment:
   it has the same problem with Java and Spark table service client classes. It 
should be HoodieJavaTableServiceClient and HoodieSparkRDDTableServiceClient






[GitHub] [hudi] xushiyan commented on a diff in pull request #6732: [HUDI-4148] Add client for hudi table service manager

2022-12-28 Thread GitBox


xushiyan commented on code in PR #6732:
URL: https://github.com/apache/hudi/pull/6732#discussion_r1058485652


##
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/client/HoodieFlinkWriteClient.java:
##
@@ -277,7 +277,7 @@ protected void writeTableMetadata(HoodieTable table, String 
instantTime, String
* Initialized the metadata table on start up, should only be called once on 
driver.
*/
   public void initMetadataTable() {
-((HoodieFlinkTableServiceClient) 
tableServiceClient).initMetadataTable();
+((HoodieFlinkHoodieTableServiceClient) 
tableServiceClient).initMetadataTable();

Review Comment:
   the diagram needs update too






[GitHub] [hudi] jonvex opened a new pull request, #7576: attempt at ssl implementation

2022-12-28 Thread GitBox


jonvex opened a new pull request, #7576:
URL: https://github.com/apache/hudi/pull/7576

   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was 
copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] hudi-bot commented on pull request #7385: [HUDI-5332] HiveSyncTool can avoid initializing all permanent custom functions of Hive

2022-12-28 Thread GitBox


hudi-bot commented on PR #7385:
URL: https://github.com/apache/hudi/pull/7385#issuecomment-1366807963

   
   ## CI report:
   
   * 9882c15708236cd4b66a9c54329f055db846ade8 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14016)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7572: [HUDI-5483]Make retryhelper more suitable for common use.

2022-12-28 Thread GitBox


hudi-bot commented on PR #7572:
URL: https://github.com/apache/hudi/pull/7572#issuecomment-1366808337

   
   ## CI report:
   
   * 5a4fae5d3d42446c19894406dc53a4d7327a9b48 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14018)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6732: [HUDI-4148] Add client for hudi table service manager

2022-12-28 Thread GitBox


hudi-bot commented on PR #6732:
URL: https://github.com/apache/hudi/pull/6732#issuecomment-1366811068

   
   ## CI report:
   
   * 64ecea100e226b7fd539cab05c03bc9902e36db1 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14015)
 
   * c20aa589730546c0c7bb82969c92aa6d364af101 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7575: [MINOR] Set engine when creating meta write config

2022-12-28 Thread GitBox


hudi-bot commented on PR #7575:
URL: https://github.com/apache/hudi/pull/7575#issuecomment-1366811719

   
   ## CI report:
   
   * a35c9c05aec17c775e39c0472fbe952178b2f60e UNKNOWN
   
   
   





[GitHub] [hudi] hudi-bot commented on pull request #7576: attempt at ssl implementation

2022-12-28 Thread GitBox


hudi-bot commented on PR #7576:
URL: https://github.com/apache/hudi/pull/7576#issuecomment-1366811749

   
   ## CI report:
   
   * f665a724e2450d7c27f6e3d44bb443404f114036 UNKNOWN
   
   
   





[GitHub] [hudi] hudi-bot commented on pull request #6732: [HUDI-4148] Add client for hudi table service manager

2022-12-28 Thread GitBox


hudi-bot commented on PR #6732:
URL: https://github.com/apache/hudi/pull/6732#issuecomment-1366814215

   
   ## CI report:
   
   * 64ecea100e226b7fd539cab05c03bc9902e36db1 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14015)
 
   * c20aa589730546c0c7bb82969c92aa6d364af101 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14020)
 
   
   
   





[GitHub] [hudi] hudi-bot commented on pull request #7575: [MINOR] Set engine when creating meta write config

2022-12-28 Thread GitBox


hudi-bot commented on PR #7575:
URL: https://github.com/apache/hudi/pull/7575#issuecomment-1366814849

   
   ## CI report:
   
   * a35c9c05aec17c775e39c0472fbe952178b2f60e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14021)
 
   
   
   





[GitHub] [hudi] hudi-bot commented on pull request #7576: attempt at ssl implementation

2022-12-28 Thread GitBox


hudi-bot commented on PR #7576:
URL: https://github.com/apache/hudi/pull/7576#issuecomment-1366814866

   
   ## CI report:
   
   * f665a724e2450d7c27f6e3d44bb443404f114036 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14022)
 
   
   
   





[jira] [Created] (HUDI-5485) Improve performance of savepoint with MDT

2022-12-28 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-5485:
---

 Summary: Improve performance of savepoint with MDT
 Key: HUDI-5485
 URL: https://issues.apache.org/jira/browse/HUDI-5485
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Ethan Guo






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] hudi-bot commented on pull request #7573: [HUDI-5484] Avoid using GenericRecord in ColumnStatMetadata

2022-12-28 Thread GitBox


hudi-bot commented on PR #7573:
URL: https://github.com/apache/hudi/pull/7573#issuecomment-1366853421

   
   ## CI report:
   
   * bfec3be3263c21f7533fc16e19eec3598617a5bf Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14019)
 
   
   
   





[GitHub] [hudi] Shagish opened a new issue, #7577: [SUPPORT]

2022-12-28 Thread GitBox


Shagish opened a new issue, #7577:
URL: https://github.com/apache/hudi/issues/7577

   Hi Team
   
   We are facing an issue in our Prod environment with a Hudi table.
   The application was running fine and writing to the Hudi table, then it suddenly failed.
   When we try to bring the application back, it runs for 5-10 minutes and then throws an error while writing the file.
   Below are the details:
   What table type (CoW or MoR): MOR
   What Spark version: 3.2.1
   What Hudi version: 0.11.0
   Where are you running Spark jobs: EMR 6.7.0
   What is the Hadoop version:
   What were you trying to do:
   The application is a Spark Hudi streaming job. It reads messages from a Kafka topic, processes them, and writes to a Hudi table. The application runs for a while and then fails with a file-not-found exception while writing data to the Hudi table.
   The file it reports as missing is a very old (12/01/2022) parquet file.
   What have you tried:
   We tried changing the Hudi properties and restarting the steps, but it failed again.
   
   Below are the log details:
   
   2022-12-22 22:52:32 INFO  YarnClusterScheduler:57 - Killing all running tasks in stage 497: Stage cancelled
   2022-12-22 22:52:33 INFO  DAGScheduler:57 - ResultStage 497 (start at Application.java:101) failed in 2.542 s due to Job aborted due to stage failure: Task 0 in stage 497.0 failed 4 times, most recent failure: Lost task 0.3 in stage 497.0 (TID 762) (ip-10-220-71-253.emr.awsw.cld.ds.dtvops.net executor 2): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :0
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:329)
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleInsertPartition(BaseSparkCommitActionExecutor.java:335)
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$mapPartitionsAsRDD$a3ab3c4$1(BaseSparkCommitActionExecutor.java:246)
	at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
	at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:915)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:915)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:386)
	at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1498)
	at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1408)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1472)
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1295)
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:384)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:335)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:133)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1474)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
   Caused by: org.apache.hudi.exception.HoodieException: java.io.FileNotFoundException: No such file or directory 's3://X/up_md_info/table/df245ac4-eafb-491b-8f5f-fcbb920b30ee-0_20-1773-8703_20221201102406279.parquet'
	at org.apache.hudi.table.action.commit.HoodieMergeHelper.runMerge(HoodieMergeHelper.java:149)
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdateInternal(BaseSparkCommitActionExecutor.java:358)
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdate

[jira] [Updated] (HUDI-5485) Improve performance of savepoint with MDT

2022-12-28 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5485:

Fix Version/s: 0.13.0

> Improve performance of savepoint with MDT
> -
>
> Key: HUDI-5485
> URL: https://issues.apache.org/jira/browse/HUDI-5485
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Priority: Critical
> Fix For: 0.13.0
>
>
> [https://github.com/apache/hudi/issues/7541]
> When metadata table is enabled, the savepoint operation is slow for a large 
> number of partitions (e.g., 75k).  The root cause is that for each partition, 
> the metadata table is scanned, which is unnecessary.





[jira] [Updated] (HUDI-5485) Improve performance of savepoint with MDT

2022-12-28 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5485:

Description: 
[https://github.com/apache/hudi/issues/7541]

When metadata table is enabled, the savepoint operation is slow for a large 
number of partitions (e.g., 75k).  The root cause is that for each partition, 
the metadata table is scanned, which is unnecessary.
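
The root cause above (one metadata-table scan per partition) can be illustrated with a small sketch. This is not Hudi's actual code; the class and method names below are hypothetical stand-ins, used only to show why a single batched lookup across all partitions scales better than 75k individual lookups.

```python
# Illustrative sketch only -- these classes and method names are NOT Hudi's
# real API. The mock counts how many times the underlying metadata table is
# scanned, to show why one batched lookup beats one lookup per partition.

class MockMetadataTable:
    """A stand-in metadata table that counts underlying scans."""

    def __init__(self, files_by_partition):
        self.files_by_partition = files_by_partition
        self.scans = 0

    def get_files_in_partition(self, partition):
        # Each call re-reads the metadata table: one scan per partition.
        self.scans += 1
        return self.files_by_partition.get(partition, [])

    def get_files_in_partitions(self, partitions):
        # One scan serves every requested partition.
        self.scans += 1
        return {p: self.files_by_partition.get(p, []) for p in partitions}


def savepoint_naive(mdt, partitions):
    """Collect files to save, scanning the metadata table per partition."""
    return {p: mdt.get_files_in_partition(p) for p in partitions}


def savepoint_batched(mdt, partitions):
    """Collect the same files with a single batched metadata lookup."""
    return mdt.get_files_in_partitions(partitions)


if __name__ == "__main__":
    data = {f"p{i}": [f"p{i}/file.parquet"] for i in range(75_000)}
    naive = MockMetadataTable(data)
    savepoint_naive(naive, list(data))
    batched = MockMetadataTable(data)
    savepoint_batched(batched, list(data))
    print(naive.scans, batched.scans)  # prints: 75000 1
```

The fix direction suggested by the ticket is the batched shape: fetch the file listings for all partitions in one metadata-table read rather than looping a per-partition read 75k times.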

> Improve performance of savepoint with MDT
> -
>
> Key: HUDI-5485
> URL: https://issues.apache.org/jira/browse/HUDI-5485
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Priority: Major
>
> [https://github.com/apache/hudi/issues/7541]
> When metadata table is enabled, the savepoint operation is slow for a large 
> number of partitions (e.g., 75k).  The root cause is that for each partition, 
> the metadata table is scanned, which is unnecessary.





[jira] [Assigned] (HUDI-5485) Improve performance of savepoint with MDT

2022-12-28 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo reassigned HUDI-5485:
---

Assignee: Ethan Guo

> Improve performance of savepoint with MDT
> -
>
> Key: HUDI-5485
> URL: https://issues.apache.org/jira/browse/HUDI-5485
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Critical
> Fix For: 0.13.0
>
>
> [https://github.com/apache/hudi/issues/7541]
> When metadata table is enabled, the savepoint operation is slow for a large 
> number of partitions (e.g., 75k).  The root cause is that for each partition, 
> the metadata table is scanned, which is unnecessary.





[jira] [Updated] (HUDI-5485) Improve performance of savepoint with MDT

2022-12-28 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5485:

Priority: Critical  (was: Major)

> Improve performance of savepoint with MDT
> -
>
> Key: HUDI-5485
> URL: https://issues.apache.org/jira/browse/HUDI-5485
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Priority: Critical
>
> [https://github.com/apache/hudi/issues/7541]
> When metadata table is enabled, the savepoint operation is slow for a large 
> number of partitions (e.g., 75k).  The root cause is that for each partition, 
> the metadata table is scanned, which is unnecessary.





[GitHub] [hudi] yihua commented on a diff in pull request #7569: [DOCS] Add aws module dependency for config generation and update new configs

2022-12-28 Thread GitBox


yihua commented on code in PR #7569:
URL: https://github.com/apache/hudi/pull/7569#discussion_r1058553590


##
hudi-utils/pom.xml:
##
@@ -55,6 +55,13 @@
 ${hudi.version}
 
 
+

Review Comment:
   nit: remove the empty line.






[jira] [Updated] (HUDI-5486) Update 0.12.x release notes with Long Term Support

2022-12-28 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5486:

Fix Version/s: 0.12.2

> Update 0.12.x release notes with Long Term Support 
> ---
>
> Key: HUDI-5486
> URL: https://issues.apache.org/jira/browse/HUDI-5486
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Priority: Major
> Fix For: 0.12.2
>
>






[jira] [Created] (HUDI-5486) Update 0.12.x release notes with Long Term Support

2022-12-28 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-5486:
---

 Summary: Update 0.12.x release notes with Long Term Support 
 Key: HUDI-5486
 URL: https://issues.apache.org/jira/browse/HUDI-5486
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Ethan Guo








[jira] [Assigned] (HUDI-5486) Update 0.12.x release notes with Long Term Support

2022-12-28 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo reassigned HUDI-5486:
---

Assignee: Ethan Guo

> Update 0.12.x release notes with Long Term Support 
> ---
>
> Key: HUDI-5486
> URL: https://issues.apache.org/jira/browse/HUDI-5486
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 0.13.0
>
>






[jira] [Updated] (HUDI-5486) Update 0.12.x release notes with Long Term Support

2022-12-28 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5486:

Fix Version/s: 0.13.0
   (was: 0.12.2)

> Update 0.12.x release notes with Long Term Support 
> ---
>
> Key: HUDI-5486
> URL: https://issues.apache.org/jira/browse/HUDI-5486
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Priority: Major
> Fix For: 0.13.0
>
>





