Re: [PR] [HUDI-7012] The BootstrapOperator reduces the memory. [hudi]

2023-11-01 Thread via GitHub


cuibo01 commented on PR #9959:
URL: https://github.com/apache/hudi/pull/9959#issuecomment-1790158935

   @hudi-bot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (HUDI-7022) RunClusteringProcedure support limit parameter

2023-11-01 Thread kwang (Jira)
kwang created HUDI-7022:
---

 Summary: RunClusteringProcedure support limit parameter
 Key: HUDI-7022
 URL: https://issues.apache.org/jira/browse/HUDI-7022
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: kwang


Since clustering plan generation is non-blocking, all pending clustering plans 
will be executed at once by default when using `call run_clustering(table => '$table', op 
=> 'execute')`. Add a limit parameter to control this.
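Here is a minimal sketch of what such a limit could do. It rests on two assumptions:

- pending plans are identified by instant-time strings of equal width, so lexical order is chronological;
- the oldest plans should run first.

`ClusteringLimitSketch` and `selectPlansToExecute` are hypothetical names used for illustration. They are not the `RunClusteringProcedure` implementation.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration of a "limit" for run_clustering(op => 'execute').
final class ClusteringLimitSketch {

  // Picks the pending clustering instants to execute in one call.
  // limit <= 0 keeps the current behaviour: every pending plan is executed.
  static List<String> selectPlansToExecute(List<String> pendingInstants, int limit) {
    // Assumes equal-width instant times, so sorting the strings orders them by time.
    List<String> oldestFirst = pendingInstants.stream().sorted().collect(Collectors.toList());
    if (limit <= 0 || limit >= oldestFirst.size()) {
      return oldestFirst;
    }
    return oldestFirst.subList(0, limit);
  }

  public static void main(String[] args) {
    List<String> pending = List.of("20231101120000000", "20231101100000000", "20231101110000000");
    // Prints [20231101100000000, 20231101110000000]: the two oldest plans.
    System.out.println(selectPlansToExecute(pending, 2));
  }
}
```

Plans that are not selected stay pending, so a later call picks them up. Running the oldest plans first is one reasonable policy, not necessarily the one the issue will adopt.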



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [I] [SUPPORT] Flink CDC to HUDI cannot handle rowKind correctly [hudi]

2023-11-01 Thread via GitHub


zdl1 commented on issue #9940:
URL: https://github.com/apache/hudi/issues/9940#issuecomment-1790147281

   Sorry, I think the ts fields for the +U and -D are not the same; they are 15000 and 17000. I got this from the Flink Web UI:
   https://github.com/apache/hudi/assets/149354640/6ce2e1fa-80eb-4d46-b0e1-2eaf52adf7fd
   It only produces 3 Hoodie records, but there are 4 RowData records.
   Do you have any ideas?





Re: [PR] [HUDI-7011] a metric to indicate whether rollback has occurred in final compaction state [hudi]

2023-11-01 Thread via GitHub


hudi-bot commented on PR #9956:
URL: https://github.com/apache/hudi/pull/9956#issuecomment-1790143831

   
   ## CI report:
   
   * fbef3b537a227e95a70d2c818c2e9ac5157fac62 UNKNOWN
   * 112d48336b49d830401aa59f25edf14537871e96 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20628)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7011] a metric to indicate whether rollback has occurred in final compaction state [hudi]

2023-11-01 Thread via GitHub


hudi-bot commented on PR #9956:
URL: https://github.com/apache/hudi/pull/9956#issuecomment-1790136471

   
   ## CI report:
   
   * fbef3b537a227e95a70d2c818c2e9ac5157fac62 UNKNOWN
   * ac92a41ec6733e537c100e715d1e743e83e50fb9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20631)
 
   
   



Re: [PR] [HUDI-7011] a metric to indicate whether rollback has occurred in final compaction state [hudi]

2023-11-01 Thread via GitHub


LXin96 commented on PR #9956:
URL: https://github.com/apache/hudi/pull/9956#issuecomment-1790118907

   @hudi-bot run azure





Re: [PR] [HUDI-7005] Fix hudi-aws-bundle relocation issue with avro [hudi]

2023-11-01 Thread via GitHub


danny0405 commented on code in PR #9946:
URL: https://github.com/apache/hudi/pull/9946#discussion_r1379622763


##
packaging/hudi-flink-bundle/pom.xml:
##
@@ -84,7 +84,6 @@
   org.apache.hudi:hudi-sync-common
   org.apache.hudi:hudi-hadoop-mr
   org.apache.hudi:hudi-timeline-service
-  org.apache.hudi:hudi-aws
 

Review Comment:
   Makes sense to me.






[jira] [Updated] (HUDI-7013) Drop table command cannot delete dir when purge is enable

2023-11-01 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-7013:
-
Fix Version/s: 1.0.0

> Drop table command cannot delete dir when purge is enable 
> --
>
> Key: HUDI-7013
> URL: https://issues.apache.org/jira/browse/HUDI-7013
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: spark-sql
>Affects Versions: 0.14.0
>Reporter: xy
>Assignee: xy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0, 0.14.1
>
>
> Drop table command cannot delete dir when purge is enabled.
> In some cases, even when purge is set to true, truncating or dropping the table 
> does not delete the directory.





[jira] [Closed] (HUDI-7013) Drop table command cannot delete dir when purge is enable

2023-11-01 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-7013.

Resolution: Fixed

Fixed via master branch: 08361606de1a8a4b1a5471c8186ea1c8aa79eb9a



Re: [PR] [HUDI-7013] Drop table command cannot delete dir even when purge is enable [hudi]

2023-11-01 Thread via GitHub


danny0405 merged PR #9960:
URL: https://github.com/apache/hudi/pull/9960





(hudi) branch master updated: [HUDI-7013] Drop table command cannot delete dir when purge is enable (#9960)

2023-11-01 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 08361606de1 [HUDI-7013] Drop table command cannot delete dir when purge is enable (#9960)
08361606de1 is described below

commit 08361606de1a8a4b1a5471c8186ea1c8aa79eb9a
Author: xuzifu666 
AuthorDate: Thu Nov 2 13:46:03 2023 +0800

[HUDI-7013] Drop table command cannot delete dir when purge is enable (#9960)

Co-authored-by: xuyu <11161...@vivo.com>
---
 hudi-common/src/main/java/org/apache/hudi/common/fs/FSUtils.java | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hudi-common/src/main/java/org/apache/hudi/common/fs/FSUtils.java b/hudi-common/src/main/java/org/apache/hudi/common/fs/FSUtils.java
index 5d9896f8f2a..954fe75a0ac 100644
--- a/hudi-common/src/main/java/org/apache/hudi/common/fs/FSUtils.java
+++ b/hudi-common/src/main/java/org/apache/hudi/common/fs/FSUtils.java
@@ -702,7 +702,7 @@ public class FSUtils {
 pairOfSubPathAndConf -> deleteSubPath(
 pairOfSubPathAndConf.getKey(), pairOfSubPathAndConf.getValue(), true)
 );
-boolean result = fs.delete(dirPath, false);
+boolean result = fs.delete(dirPath, true);
 LOG.info("Removed directory at " + dirPath);
 return result;
   }
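The fix flips the `recursive` argument of Hadoop's `FileSystem.delete(Path, boolean)`. With `recursive = false`, deleting a directory that still has children fails. So anything left under `dirPath` after the parallel sub-path deletions keeps the table directory alive.

The sketch below reproduces the same distinction with `java.nio.file` in place of Hadoop. That API is swapped in so the example runs without a cluster. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helpers contrasting non-recursive and recursive directory deletion.
final class RecursiveDeleteSketch {

  // Like fs.delete(dirPath, false): only succeeds if the directory is already empty.
  static boolean deleteNonRecursive(Path dir) throws IOException {
    try {
      Files.delete(dir);
      return true;
    } catch (DirectoryNotEmptyException e) {
      return false;
    }
  }

  // Like fs.delete(dirPath, true): removes everything under dir, then dir itself.
  static boolean deleteRecursive(Path dir) throws IOException {
    List<Path> paths;
    try (Stream<Path> walk = Files.walk(dir)) {
      // A parent path is a prefix of its children, so reverse order visits children first.
      paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    for (Path p : paths) {
      Files.delete(p);
    }
    return !Files.exists(dir);
  }

  public static void main(String[] args) throws IOException {
    Path table = Files.createTempDirectory("hudi_table");
    Files.createDirectories(table.resolve(".hoodie"));
    Files.writeString(table.resolve(".hoodie").resolve("hoodie.properties"), "leftover");
    System.out.println(deleteNonRecursive(table)); // false: directory still has children
    System.out.println(deleteRecursive(table));    // true
  }
}
```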



[jira] [Closed] (HUDI-6991) Fix parquet file size reset error in SparkSortAndSizeExecutionStrategy#performClusteringWithRecordsRDD

2023-11-01 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-6991.

Resolution: Fixed

Fixed via master branch: 7a55ad341b69df4d2e04d56687e591612103c0b4

> Fix parquet file size reset error in 
> SparkSortAndSizeExecutionStrategy#performClusteringWithRecordsRDD
> --
>
> Key: HUDI-6991
> URL: https://issues.apache.org/jira/browse/HUDI-6991
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: clustering
>Reporter: kwang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>






[jira] [Updated] (HUDI-6991) Fix parquet file size reset error in SparkSortAndSizeExecutionStrategy#performClusteringWithRecordsRDD

2023-11-01 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-6991:
-
Fix Version/s: 1.0.0



(hudi) branch master updated: [HUDI-6991] Fix hoodie.parquet.max.file.size conf reset error (#9924)

2023-11-01 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 7a55ad341b6 [HUDI-6991] Fix hoodie.parquet.max.file.size conf reset error (#9924)
7a55ad341b6 is described below

commit 7a55ad341b69df4d2e04d56687e591612103c0b4
Author: ksmou <135721692+ks...@users.noreply.github.com>
AuthorDate: Thu Nov 2 13:44:30 2023 +0800

[HUDI-6991] Fix hoodie.parquet.max.file.size conf reset error (#9924)
---
 .../SparkSortAndSizeExecutionStrategy.java |   4 +-
 .../functional/TestSparkSortAndSizeClustering.java | 167 +
 2 files changed, 169 insertions(+), 2 deletions(-)

diff --git a/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/SparkSortAndSizeExecutionStrategy.java b/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/SparkSortAndSizeExecutionStrategy.java
index 85ee7ec9d4b..843a638e4cf 100644
--- a/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/SparkSortAndSizeExecutionStrategy.java
+++ b/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/SparkSortAndSizeExecutionStrategy.java
@@ -68,7 +68,7 @@ public class SparkSortAndSizeExecutionStrategy
 .withBulkInsertParallelism(numOutputGroups)
 .withProps(getWriteConfig().getProps()).build();
 
-newConfig.setValue(HoodieStorageConfig.PARQUET_MAX_FILE_SIZE, String.valueOf(getWriteConfig().getClusteringMaxBytesInGroup()));
+newConfig.setValue(HoodieStorageConfig.PARQUET_MAX_FILE_SIZE, String.valueOf(getWriteConfig().getClusteringTargetFileMaxBytes()));
 
 BulkInsertPartitioner<Dataset<Row>> partitioner = getRowPartitioner(strategyParams, schema);
 Dataset<Row> repartitionedRecords = partitioner.repartitionRecords(inputRecords, numOutputGroups);
@@ -92,7 +92,7 @@ public class SparkSortAndSizeExecutionStrategy
 .withBulkInsertParallelism(numOutputGroups)
 .withProps(getWriteConfig().getProps()).build();
 
-newConfig.setValue(HoodieStorageConfig.PARQUET_MAX_FILE_SIZE, String.valueOf(getWriteConfig().getClusteringMaxBytesInGroup()));
+newConfig.setValue(HoodieStorageConfig.PARQUET_MAX_FILE_SIZE, String.valueOf(getWriteConfig().getClusteringTargetFileMaxBytes()));
 
 return (HoodieData<WriteStatus>) SparkBulkInsertHelper.newInstance().bulkInsert(inputRecords, instantTime, getHoodieTable(),
 newConfig, false, getRDDPartitioner(strategyParams, schema), true, numOutputGroups, new CreateHandleFactory(shouldPreserveHoodieMetadata));
diff --git a/hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/functional/TestSparkSortAndSizeClustering.java b/hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/functional/TestSparkSortAndSizeClustering.java
new file mode 100644
index 000..b1e7765fc8b
--- /dev/null
+++ b/hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/functional/TestSparkSortAndSizeClustering.java
@@ -0,0 +1,167 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.functional;
+
+import org.apache.hudi.avro.model.HoodieClusteringGroup;
+import org.apache.hudi.avro.model.HoodieClusteringPlan;
+import org.apache.hudi.client.WriteStatus;
+import org.apache.hudi.common.config.HoodieStorageConfig;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.model.HoodieTableType;
+import org.apache.hudi.common.model.HoodieWriteStat;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.view.FileSystemViewStorageConfig;
+import org.apache.hudi.common.table.view.FileSystemViewStorageType;
+import org.apache.hudi.common.testutils.HoodieTestDataGenerator;
+import org.apache.hudi.common.testutils.HoodieTestUtils;
+import org.apache.hudi.common.util.ClusteringUtils;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.collection.Pair;
+import org.apache.hudi.config.HoodieClusteringC
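The fix changes the value used for `PARQUET_MAX_FILE_SIZE`:

- before: `getClusteringMaxBytesInGroup()`, the cap on a clustering group's total size;
- after: `getClusteringTargetFileMaxBytes()`, the intended size of each output file.

The rough model below shows why the old value tends to produce a single group-sized file. It assumes the writer rolls over to a new file whenever the cap is reached, and it ignores compression and size-estimation error. The sizes are example values, not Hudi defaults, and the class is hypothetical.

```java
// Hypothetical back-of-the-envelope model of clustering output file counts.
final class ClusteringFileSizeSketch {

  // Approximate number of Parquet files written for one clustering group, assuming the
  // writer starts a new file each time maxFileBytes is reached (no compression effects).
  static long estimatedOutputFiles(long groupBytes, long maxFileBytes) {
    if (groupBytes <= 0) {
      return 0;
    }
    return (groupBytes + maxFileBytes - 1) / maxFileBytes; // ceiling division
  }

  public static void main(String[] args) {
    long gib = 1024L * 1024 * 1024;
    long groupBytes = 2 * gib;         // example: one full clustering group
    long maxBytesInGroup = 2 * gib;    // example value of getClusteringMaxBytesInGroup()
    long targetFileMaxBytes = gib;     // example value of getClusteringTargetFileMaxBytes()

    // Before the fix the file-size cap equals the group cap: one ~2 GiB file.
    System.out.println(estimatedOutputFiles(groupBytes, maxBytesInGroup));    // 1
    // After the fix the cap is the target file size: two ~1 GiB files.
    System.out.println(estimatedOutputFiles(groupBytes, targetFileMaxBytes)); // 2
  }
}
```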

Re: [PR] [HUDI-6991] Fix hoodie.parquet.max.file.size conf reset error [hudi]

2023-11-01 Thread via GitHub


danny0405 merged PR #9924:
URL: https://github.com/apache/hudi/pull/9924





Re: [PR] [HUDI-6999] Adding row writer support to HoodieStreamer [hudi]

2023-11-01 Thread via GitHub


rmahindra123 commented on code in PR #9913:
URL: https://github.com/apache/hudi/pull/9913#discussion_r1379620190


##
hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/StreamSync.java:
##
@@ -297,13 +286,20 @@ public StreamSync(HoodieStreamer.Config cfg, SparkSession sparkSession, SchemaPr
   this.errorTableWriter = ErrorTableUtils.getErrorTableWriter(cfg, sparkSession, props, hoodieSparkContext, fs);
   this.errorWriteFailureStrategy = ErrorTableUtils.getErrorWriteFailureStrategy(props);
 }
-this.formatAdapter = new SourceFormatAdapter(
-UtilHelpers.createSource(cfg.sourceClassName, props, hoodieSparkContext.jsc(), sparkSession, schemaProvider, metrics),
-this.errorTableWriter, Option.of(props));
+Source source = UtilHelpers.createSource(cfg.sourceClassName, props, hoodieSparkContext.jsc(), sparkSession, schemaProvider, metrics);
+this.formatAdapter = new SourceFormatAdapter(source, this.errorTableWriter, Option.of(props));
 
 this.transformer = UtilHelpers.createTransformer(Option.ofNullable(cfg.transformerClassNames),
 Option.ofNullable(schemaProvider).map(SchemaProvider::getSourceSchema), this.errorTableWriter.isPresent());
-
+if (this.cfg.operation == WriteOperationType.BULK_INSERT && source.getSourceType() == Source.SourceType.ROW
+&& this.props.getBoolean("hoodie.datasource.write.row.writer.enable", false)) {

Review Comment:
   Can you use the config variable instead of using the actual string?
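The review asks for the flag to be read through a shared config definition instead of the literal `"hoodie.datasource.write.row.writer.enable"`. Below is a minimal sketch of that idea. It uses a hypothetical `RowWriterConfigSketch` holder and `java.util.Properties` in place of Hudi's own config classes. Because the key and its default live in one place, a second copy of the string cannot drift or carry a typo.

```java
import java.util.Properties;

// Hypothetical stand-in for a config definition; Hudi has its own config classes for this key.
final class RowWriterConfigSketch {
  static final String ROW_WRITER_ENABLE_KEY = "hoodie.datasource.write.row.writer.enable";
  static final boolean ROW_WRITER_ENABLE_DEFAULT = false;

  // Reads the flag through the shared key and default instead of repeating the literal.
  static boolean isRowWriterEnabled(Properties props) {
    return Boolean.parseBoolean(
        props.getProperty(ROW_WRITER_ENABLE_KEY, String.valueOf(ROW_WRITER_ENABLE_DEFAULT)));
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    System.out.println(isRowWriterEnabled(props)); // false (default)
    props.setProperty(ROW_WRITER_ENABLE_KEY, "true");
    System.out.println(isRowWriterEnabled(props)); // true
  }
}
```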






Re: [PR] [HUDI-6999] Adding row writer support to HoodieStreamer [hudi]

2023-11-01 Thread via GitHub


hudi-bot commented on PR #9913:
URL: https://github.com/apache/hudi/pull/9913#issuecomment-1790091376

   
   ## CI report:
   
   * 5eb4bf14d826e60c412078762aa061f415bac51d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20630)
 
   
   



Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]

2023-11-01 Thread via GitHub


hudi-bot commented on PR #9904:
URL: https://github.com/apache/hudi/pull/9904#issuecomment-1790091327

   
   ## CI report:
   
   * 62b7696970bac4382a9b6467721de915116fb3a5 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20629)
 
   
   



Re: [PR] [HUDI-7011] a metric to indicate whether rollback has occurred in final compaction state [hudi]

2023-11-01 Thread via GitHub


LXin96 commented on PR #9956:
URL: https://github.com/apache/hudi/pull/9956#issuecomment-1790068243

   @danny0405 OK, got it.





Re: [PR] [HUDI-7005] Fix hudi-aws-bundle relocation issue with avro [hudi]

2023-11-01 Thread via GitHub


PrabhuJoseph commented on code in PR #9946:
URL: https://github.com/apache/hudi/pull/9946#discussion_r1379587122


##
packaging/hudi-flink-bundle/pom.xml:
##
@@ -84,7 +84,6 @@
   org.apache.hudi:hudi-sync-common
   org.apache.hudi:hudi-hadoop-mr
   org.apache.hudi:hudi-timeline-service
-  org.apache.hudi:hudi-aws
 

Review Comment:
   Yes, I also think reflection will be a better choice.
   
   1. If the user configures "hive_sync.mode" to "hms", then only 
hudi-flink-bundle is needed.
   2. If the user configures "hive_sync.mode" to "glue", then both 
hudi-flink-bundle and hudi-aws-bundle are needed. This needs to be documented 
[here](https://hudi.apache.org/docs/syncing_aws_glue_data_catalog).
   
   I see that ReflectionUtils and SyncUtilHelpers already provide ways to load 
the class easily. I will use them and submit a new revision if you are also fine 
with this approach.
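Here is a sketch of the reflection approach discussed above. It loads the Glue sync tool only when `hive_sync.mode` is `glue`, so hudi-flink-bundle does not have to bundle hudi-aws. Two assumptions apply:

- the class name `org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool` is assumed and should be checked against the hudi-aws module;
- plain `Class.forName` stands in for Hudi's `ReflectionUtils`/`SyncUtilHelpers`, whose signatures are not shown in this thread.

The helper class and method are hypothetical.

```java
// Hypothetical helper: load the Glue sync tool only when it is actually needed.
final class SyncToolLoaderSketch {

  // Assumed class name; verify against the hudi-aws module before relying on it.
  static final String GLUE_SYNC_TOOL_CLASS = "org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool";

  // Returns the Glue sync tool class for hive_sync.mode=glue, or null for other modes
  // (e.g. hms), so hudi-flink-bundle needs no compile-time dependency on hudi-aws.
  static Class<?> resolveGlueSyncTool(String hiveSyncMode) {
    if (!"glue".equalsIgnoreCase(hiveSyncMode)) {
      return null;
    }
    try {
      return Class.forName(GLUE_SYNC_TOOL_CLASS);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(
          "hive_sync.mode=glue requires hudi-aws-bundle on the classpath", e);
    }
  }

  public static void main(String[] args) {
    System.out.println(resolveGlueSyncTool("hms")); // null
    try {
      System.out.println(resolveGlueSyncTool("glue"));
    } catch (IllegalStateException e) {
      // Reached when hudi-aws-bundle is not on the classpath.
      System.out.println(e.getMessage());
    }
  }
}
```

Failing fast with a message that names the missing bundle makes the "glue" case in point 2 easier to diagnose for users who skip the documentation.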






[I] [SUPPORT] hudi 0.14 data processing finishes, but the job does not exit and runs out of memory (OOM) [hudi]

2023-11-01 Thread via GitHub


zyclove opened a new issue, #9973:
URL: https://github.com/apache/hudi/issues/9973

   
   
   **Describe the problem you faced**
   
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. SQL
   ```sql
   CREATE  TABLE if NOT EXISTS bi_ods_real.smart_datapoint_report_rw_clear_rt(
 id STRING COMMENT 'id',
 uuid STRING COMMENT 'log uuid',
 data_id STRING,
 dev_id STRING COMMENT '设备id',
 gw_id STRING ,
 product_id STRING,
 uid STRING  COMMENT '用户ID',
 dp_code STRING,
 dp_id STRING COMMENT 'dp点',
 gmtModified STRING,
 dp_mode STRING ,
 dp_name STRING ,
 dp_time STRING ,
 dp_type STRING ,
 dp_value STRING ,
 gmt_modified bigint COMMENT 'ct 时间',
 dt STRING COMMENT '时间分区字段'
   )
   using hudi 
   tblproperties (
 type = 'mor',
 primaryKey = 'id',
 hoodie.combine.before.upsert='false',
 hoodie.bucket.index.num.buckets=50,
 preCombineField = 'gmt_modified',
 hoodie.compact.inline='false',
 hoodie.common.spillable.diskmap.type='ROCKS_DB',
 hoodie.datasource.write.partitionpath.field='dt,dp_mode'
)
   PARTITIONED BY (dt,dp_mode)
   COMMENT '';
   
   add jar /opt/resource/sucx/hadoop-analyse-udf-0.0.100-jar-with-dependencies.jar;
   set hoodie.write.lock.zookeeper.lock_key=bi_ods_real.smart_datapoint_report_rw_clear_rt;
   set hoodie.insert.shuffle.parallelism = 400;
   set hoodie.upsert.shuffle.parallelism = 400;
   set hoodie.delete.shuffle.parallelism = 400;
   set hoodie.write.markers.type=TIMELINE_SERVER_BASED;
   
   create temporary function toJsonArray as 'com.udf.ToJsonArray';
   
   create temporary function dataPointExplode as 'com.udf.DataPointExplode';
   
   call copy_to_temp_view(table=>'bi_ods_real.ods_log_smart_datapoint_report_batch_rt',view_name=>'report_view',query_type=>'incremental',begin_instance_time=>'2023103109350',end_instance_time=>'2023103110050');
   
   set hoodie.sql.insert.mode=non-strict;
   insert into bi_ods_real.smart_datapoint_report_rw_clear_rt
   select /*+ coalesce(400) */
     md5(concat(coalesce(data_id,''),coalesce(dev_id,''),coalesce(gw_id,''),coalesce(product_id,''),coalesce(uid,''),coalesce(dp_code,''),coalesce(dp_id,''),coalesce(gmtModified,''),if(dp_mode in ('ro','rw','wr'),dp_mode,'un'),coalesce(dp_name,''),coalesce(dp_time,''),coalesce(dp_type,''),coalesce(dp_value,''),coalesce(ct,''))) as id,
     _hoodie_record_key as uuid,
     data_id,dev_id,gw_id,product_id,uid,
     dp_code,dp_id,gmtModified,if(dp_mode in ('ro','rw','wr'),dp_mode,'un') as dp_mode,dp_name,dp_time,dp_type,dp_value,
     ct as gmt_modified,
     case
       when length(ct)=10 then date_format(from_unixtime(ct),'MMddHH')
       when length(ct)=13 then date_format(from_unixtime(ct/1000),'MMddHH')
       else '1970010100' end as dt
   from report_view
   lateral view dataPointExplode(split(value,'\001')[0]) dps as ct, data_id, dev_id, gw_id, product_id, uid, dp_code, dp_id, gmtModified, dp_mode, dp_name, dp_time, dp_type, dp_value
   where _hoodie_commit_time > 2023103109350 and _hoodie_commit_time <= 2023103110050
   ``` 
   2. 
   spark-sql  -f /tmp/VOLCANO_JOB_1698198652813_020309.sql --master yarn 
--driver-memory 8g --conf spark.driver.memoryOverhead=8G --num-executors 10 
--conf spark.dynamicAllocation.maxExecutors=20 --executor-memory 8G 
--executor-cores 2 --packages 
org.apache.hudi:hudi-spark$SPARK_VERSION-bundle_2.12:0.14.0 --conf 
spark.serializer=org.apache.spark.serializer.KryoSerializer --conf 
spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension 
--conf 
spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog 
--conf spark.kryo.registrator=org.apache.spark.HoodieSparkKryoRegistrar --conf 
spark.sql.autoBroadcastJoinThreshold=2G --conf spark.memory.storageFraction=0.7 
--conf spark.memory.storageFraction=0.75 --conf 
spark.sql.broadcastTimeout=6 --conf spark.yarn.priority=5 --conf 
spark.sql.broadcastTimeout=60 --conf spark.network.timeout=60 --conf 
spark.eventLog.enable=false --conf spark.driver.maxResultSize=4g
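The `case` expression in the query derives the `dt` partition from `ct`:

- a 10-character value is treated as epoch seconds;
- a 13-character value is treated as epoch milliseconds;
- anything else falls back to '1970010100'.

The Java sketch below mirrors that logic under two loud assumptions:

- the pattern is `yyyyMMddHH`. The issue as archived shows 'MMddHH', but the ten-digit fallback value suggests a ten-digit pattern.
- times are formatted in UTC, whereas Spark's `from_unixtime`/`date_format` use the session time zone.

`DtPartitionSketch` is a hypothetical name.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper mirroring the CASE expression that derives dt from ct.
final class DtPartitionSketch {

  // Assumption: pattern yyyyMMddHH (matches the 10-digit fallback) and UTC instead of the session zone.
  private static final DateTimeFormatter DT =
      DateTimeFormatter.ofPattern("yyyyMMddHH").withZone(ZoneOffset.UTC);

  static String dtFromCt(String ct) {
    try {
      if (ct != null && ct.length() == 10) {   // epoch seconds
        return DT.format(Instant.ofEpochSecond(Long.parseLong(ct)));
      }
      if (ct != null && ct.length() == 13) {   // epoch milliseconds
        return DT.format(Instant.ofEpochMilli(Long.parseLong(ct)));
      }
    } catch (NumberFormatException e) {
      return null; // non-ANSI Spark casts a non-numeric string to NULL, so dt becomes NULL
    }
    return "1970010100"; // the ELSE branch (also taken when ct is NULL)
  }

  public static void main(String[] args) {
    System.out.println(dtFromCt("1698796800"));    // 2023110100
    System.out.println(dtFromCt("1698796800000")); // 2023110100
    System.out.println(dtFromCt("12345"));         // 1970010100
  }
}
```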
   
   
   
   **Expected behavior**
   ```
   23/11/02 02:20:16 INFO S3NativeFileSystem: Opening 
's3://big-data-us/hudi/bi/bi_ods_real/smart_datapoint_report_rw_clear_rt/.hoodie/metadata/files/files-_0-4-4_20231101184035767001.hfile'
 for reading
   23/11/02 02:21:12 INFO AsyncEventQueue: Process of event 
SparkListenerExecutorMetricsUpdate(driver,WrappedArray(),Map((-1,-1) -> 
org.apache.spark.executor.ExecutorMetrics@4de4ff12)) by listener 
SQLAppStatusListener took 3.068272976s.
   23/11/02 02:23:08 INFO AsyncEventQueue: Process of event 
SparkListenerExecutorMetricsUpdate(driver,WrappedArray(),Map((-1,-1) -> 
org.apache.spark.executor.ExecutorMetrics@141289a6)) by listener 
SQLAppStatusListener took 8.345486202s.
   23/11/02 02:24:28 INFO AsyncEventQueue: Process of event 
SparkLi

Re: [PR] [HUDI-6992] IncrementalInputSplits incorrectly set the latestCommit attr [hudi]

2023-11-01 Thread via GitHub


hudi-bot commented on PR #9923:
URL: https://github.com/apache/hudi/pull/9923#issuecomment-1790041117

   
   ## CI report:
   
   * 2f1b6536c1456fd0211740c90542bf25f53d1010 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20599)
 
   
   



Re: [PR] [HUDI-7011] a metric to indicate whether rollback has occurred in final compaction state [hudi]

2023-11-01 Thread via GitHub


hudi-bot commented on PR #9956:
URL: https://github.com/apache/hudi/pull/9956#issuecomment-1790041262

   
   ## CI report:
   
   * fbef3b537a227e95a70d2c818c2e9ac5157fac62 UNKNOWN
   * 112d48336b49d830401aa59f25edf14537871e96 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20628)
 
   * ac92a41ec6733e537c100e715d1e743e83e50fb9 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20631)
 
   
   



Re: [PR] [HUDI-7005] Fix hudi-aws-bundle relocation issue with avro [hudi]

2023-11-01 Thread via GitHub


danny0405 commented on code in PR #9946:
URL: https://github.com/apache/hudi/pull/9946#discussion_r1379565166


##
packaging/hudi-flink-bundle/pom.xml:
##
@@ -84,7 +84,6 @@
   org.apache.hudi:hudi-sync-common
   org.apache.hudi:hudi-hadoop-mr
   org.apache.hudi:hudi-timeline-service
-  org.apache.hudi:hudi-aws
 

Review Comment:
   Then what is the suggested way to load the `AwsGlueCatalogSyncTool`, maybe 
by reflection?






Re: [PR] [HUDI-2461] Support out of order commits in MDT with completion time view [hudi]

2023-11-01 Thread via GitHub


danny0405 commented on code in PR #9871:
URL: https://github.com/apache/hudi/pull/9871#discussion_r1379469734


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/timeline/HoodieTimelineArchiver.java:
##
@@ -270,13 +270,41 @@ private List<HoodieInstant> getCommitInstantsToArchive() throws IOException {
 // stop at first savepoint commit
 return !firstSavepoint.isPresent() || compareTimestamps(s.getTimestamp(), LESSER_THAN, firstSavepoint.get().getTimestamp());
   }
-}).filter(s -> earliestInstantToRetain
-.map(instant -> compareTimestamps(s.getTimestamp(), LESSER_THAN, instant.getTimestamp()))
+})
+.filter(s -> earliestInstantToRetain.map(
+instant -> compareTimestamps(s.getTimestamp(), LESSER_THAN, instant.getTimestamp())
+// for the compaction c2 in metadata table triggered by commit c1 in data table,
+// c2.getTimestamp() > c1.getTimestamp() because: c2 happens before c1 completes,
+// and we are generating new instant time for c2 after c1 has started. Effectively,

Review Comment:
   > Lines 208-222 is fine as it is trying to get the earliest candidate 
instant to retain. The main filtering happens after all such candidate instants 
have been collected. I think the fix should be where the filtering happens.
   
   I still think lines 208-222 are the right place for the fix. That code already 
does something similar by filtering the instants against the MDT compaction instant, 
so why not fix it there by using the completion time?
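The diff comment describes an MDT compaction c2, triggered by data-table commit c1. c2 gets a later instant time than c1, yet it finishes before c1 completes. Ordering by instant (requested) time and ordering by completion time therefore disagree for this pair.

The sketch below uses a hypothetical, simplified instant model, not Hudi's `HoodieInstant`. It only illustrates how the two orderings disagree. It does not show which instants Hudi should archive.

```java
// Simplified, hypothetical model of a timeline instant; not Hudi's HoodieInstant.
final class CompletionTimeOrderSketch {

  static final class Instant {
    final String requestedTime;  // instant time assigned when the action starts
    final String completionTime; // time at which the action finished
    Instant(String requestedTime, String completionTime) {
      this.requestedTime = requestedTime;
      this.completionTime = completionTime;
    }
  }

  // Times are equal-width strings, so String.compareTo gives chronological order.
  static boolean startedBefore(Instant a, Instant b) {
    return a.requestedTime.compareTo(b.requestedTime) < 0;
  }

  static boolean completedBefore(Instant a, Instant b) {
    return a.completionTime.compareTo(b.completionTime) < 0;
  }

  public static void main(String[] args) {
    // c1: data-table commit; c2: MDT compaction that starts after c1 starts and ends before c1 ends.
    Instant c1 = new Instant("20231101100000000", "20231101100500000");
    Instant c2 = new Instant("20231101100100000", "20231101100200000");
    System.out.println(startedBefore(c2, c1));   // false
    System.out.println(completedBefore(c2, c1)); // true
  }
}
```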






Re: [I] [SUPPORT] Flink CDC to HUDI cannot handle rowKind correctly [hudi]

2023-11-01 Thread via GitHub


danny0405 commented on issue #9940:
URL: https://github.com/apache/hudi/issues/9940#issuecomment-1790025978

   Your ts field is the same for that `+U` and `-D`
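Four changes becoming three Hoodie records is consistent with two changes sharing a record key and being merged on the ordering field before the write. The sketch below models that per-key precombine with hypothetical types. It assumes the larger `ts` wins and that, on a tie, the later arrival wins. With equal `ts` values the outcome rests entirely on that tie-breaking rule. Whether and where Hudi's Flink writer deduplicates depends on its configuration, which this sketch does not model.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of per-key precombine within one write batch.
final class PrecombineSketch {

  static final class Change {
    final String key;     // record key (primary key)
    final String rowKind; // Flink RowKind short string, e.g. "+I", "+U", "-D"
    final long ts;        // precombine / ordering field
    Change(String key, String rowKind, long ts) {
      this.key = key;
      this.rowKind = rowKind;
      this.ts = ts;
    }
  }

  // Keeps one change per key: the larger ts wins; on a tie the later arrival wins
  // (the tie-breaking rule is an assumption of this sketch).
  static Map<String, Change> dedupByKey(List<Change> batch) {
    Map<String, Change> latest = new LinkedHashMap<>();
    for (Change c : batch) {
      latest.merge(c.key, c, (prev, cur) -> cur.ts >= prev.ts ? cur : prev);
    }
    return latest;
  }

  public static void main(String[] args) {
    List<Change> batch = List.of(
        new Change("id1", "+I", 10000),
        new Change("id2", "+I", 12000),
        new Change("id3", "+U", 15000),
        new Change("id3", "-D", 17000));
    Map<String, Change> merged = dedupByKey(batch);
    System.out.println(merged.size());             // 3 records from 4 changes
    System.out.println(merged.get("id3").rowKind); // -D (larger ts)
  }
}
```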





Re: [I] Duplicate records when use overwrite mode [hudi]

2023-11-01 Thread via GitHub


njalan commented on issue #9967:
URL: https://github.com/apache/hudi/issues/9967#issuecomment-1790013274

   @ad1happy2go It does not happen every time; it happens roughly once every two weeks.





Re: [PR] [HUDI-1623][FOLLOW_UP] Fix test TestWaitBasedTimeGenerator & refine codes [hudi]

2023-11-01 Thread via GitHub


hudi-bot commented on PR #9972:
URL: https://github.com/apache/hudi/pull/9972#issuecomment-1790012121

   
   ## CI report:
   
   * b66c87be553dd7f7a5b362b3d808eb66f9c29d40 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20632)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7011] a metric to indicate whether rollback has occurred in final compaction state [hudi]

2023-11-01 Thread via GitHub


hudi-bot commented on PR #9956:
URL: https://github.com/apache/hudi/pull/9956#issuecomment-1790012077

   
   ## CI report:
   
   * fbef3b537a227e95a70d2c818c2e9ac5157fac62 UNKNOWN
   * ca7c98a42bab1fedf58629952f6d37574116e79e Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20612)
 
   * 112d48336b49d830401aa59f25edf14537871e96 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20628)
 
   * ac92a41ec6733e537c100e715d1e743e83e50fb9 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20631)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6992] IncrementalInputSplits incorrectly set the latestCommit attr [hudi]

2023-11-01 Thread via GitHub


hudi-bot commented on PR #9923:
URL: https://github.com/apache/hudi/pull/9923#issuecomment-1790011976

   
   ## CI report:
   
   * 2f1b6536c1456fd0211740c90542bf25f53d1010 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20599)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6992] IncrementalInputSplits incorrectly set the latestCommit attr [hudi]

2023-11-01 Thread via GitHub


zhuanshenbsj1 commented on PR #9923:
URL: https://github.com/apache/hudi/pull/9923#issuecomment-1790011726

   @hudi-bot run azure





Re: [PR] [HUDI-5591] HoodieSparkSqlWriter#getHiveTableNames needs to consider … [hudi]

2023-11-01 Thread via GitHub


hudi-bot commented on PR #7718:
URL: https://github.com/apache/hudi/pull/7718#issuecomment-1790010506

   
   ## CI report:
   
   * 4864528010313c4bcf59866ff67ef11b8f1e14d0 UNKNOWN
   * ac50108a775e47dc5151eedf6645bccdebf2cb4b Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20608)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7011] a metric to indicate whether rollback has occurred in final compaction state [hudi]

2023-11-01 Thread via GitHub


hudi-bot commented on PR #9956:
URL: https://github.com/apache/hudi/pull/9956#issuecomment-1790006788

   
   ## CI report:
   
   * fbef3b537a227e95a70d2c818c2e9ac5157fac62 UNKNOWN
   * ca7c98a42bab1fedf58629952f6d37574116e79e Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20612)
 
   * 112d48336b49d830401aa59f25edf14537871e96 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20628)
 
   * ac92a41ec6733e537c100e715d1e743e83e50fb9 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-1623][FOLLOW_UP] Fix test TestWaitBasedTimeGenerator & refine codes [hudi]

2023-11-01 Thread via GitHub


hudi-bot commented on PR #9972:
URL: https://github.com/apache/hudi/pull/9972#issuecomment-1790006848

   
   ## CI report:
   
   * b66c87be553dd7f7a5b362b3d808eb66f9c29d40 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6999] Adding row writer support to HoodieStreamer [hudi]

2023-11-01 Thread via GitHub


hudi-bot commented on PR #9913:
URL: https://github.com/apache/hudi/pull/9913#issuecomment-1790006657

   
   ## CI report:
   
   * a8dd6df5fbf92a57e4c31e8d0954e0f68c17c9a2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20619)
 
   * 5eb4bf14d826e60c412078762aa061f415bac51d Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20630)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-5210] Implement functional indexes [hudi]

2023-11-01 Thread via GitHub


hudi-bot commented on PR #9872:
URL: https://github.com/apache/hudi/pull/9872#issuecomment-1790006589

   
   ## CI report:
   
   * 74cc5ad3650edbb7aee3e98811ee680398bc0dd0 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20627)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-5591] HoodieSparkSqlWriter#getHiveTableNames needs to consider … [hudi]

2023-11-01 Thread via GitHub


hudi-bot commented on PR #7718:
URL: https://github.com/apache/hudi/pull/7718#issuecomment-1790004228

   
   ## CI report:
   
   * 4864528010313c4bcf59866ff67ef11b8f1e14d0 UNKNOWN
   * ac50108a775e47dc5151eedf6645bccdebf2cb4b Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20608)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7011] a metric to indicate whether rollback has occurred in final compaction state [hudi]

2023-11-01 Thread via GitHub


hudi-bot commented on PR #9956:
URL: https://github.com/apache/hudi/pull/9956#issuecomment-179744

   
   ## CI report:
   
   * fbef3b537a227e95a70d2c818c2e9ac5157fac62 UNKNOWN
   * ca7c98a42bab1fedf58629952f6d37574116e79e Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20612)
 
   * 112d48336b49d830401aa59f25edf14537871e96 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20628)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [I] [SUPPORT] Flink CDC to HUDI cannot handle rowKind correctly [hudi]

2023-11-01 Thread via GitHub


zdl1 commented on issue #9940:
URL: https://github.com/apache/hudi/issues/9940#issuecomment-1789990183

   > you need to set up the `precombine.field` correctly so that the event stream can sort based on the event time; the processing time sequence does not guarantee the semantics.
   
   Thanks for your help. After setting up the `precombine.field`, the record can be updated but cannot be deleted. Is that correct?
   Here is the input:
   
   Flink SQL> select * from source;
   | op |  id |    ts |    name |         description |        weight |
   | +I | 111 | 13000 | scooter | Big 2-wheel scooter | 5.17828338623 |
   | -U | 111 | 13000 | scooter | Big 2-wheel scooter | 5.17828338623 |
   | +U | 111 | 15000 | scooter | Big 2-wheel scooter | 5.17076293945 |
   | -D | 111 | 17000 | scooter | Big 2-wheel scooter | 5.17076293945 |
   Received a total of 4 rows
   
   Then I insert this source into a Hudi table, with `id` as the primary key and `ts` as the `precombine.field`.
   When I query the Hudi table, **the record with id "111" is still alive:**
   | +I | 111 | 15000 | scooter | Big 2-wheel scooter | 5.17076293945 |
   Is this the correct result? I would appreciate any help with this!
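The expected outcome can be reasoned about with a simplified, hypothetical model of precombine ordering: per key, a change only wins if its `ts` is at least the stored `ts`, and a `-D` is applied like any other change. This is not Hudi's payload or Flink writer code; the class names are illustrative, and skipping `-U` rows is an assumption of the sketch. Under this model the four input rows leave no live record for id 111, while a delete carrying an older `ts` would be ignored.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EventTimeMergeSketch {
  // Change kinds as printed by the Flink SQL client: +I, -U, +U, -D.
  enum Kind { INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE }

  record Change(Kind kind, long id, long ts, double weight) {}

  // Merged state per key: the winning ordering value and whether it was a delete (tombstone).
  record Merged(long ts, boolean deleted, double weight) {}

  static Map<Long, Merged> merge(List<Change> changes) {
    Map<Long, Merged> table = new HashMap<>();
    for (Change c : changes) {
      // Assumption for this sketch: -U retractions carry no new state and are skipped.
      if (c.kind() == Kind.UPDATE_BEFORE) {
        continue;
      }
      Merged current = table.get(c.id());
      // Event-time ordering: an older event arriving late cannot overwrite a newer one.
      if (current == null || c.ts() >= current.ts()) {
        table.put(c.id(), new Merged(c.ts(), c.kind() == Kind.DELETE, c.weight()));
      }
    }
    return table;
  }

  static boolean isLive(Map<Long, Merged> table, long id) {
    Merged m = table.get(id);
    return m != null && !m.deleted();
  }

  public static void main(String[] args) {
    List<Change> changes = List.of(
        new Change(Kind.INSERT, 111, 13000, 5.17828338623),
        new Change(Kind.UPDATE_BEFORE, 111, 13000, 5.17828338623),
        new Change(Kind.UPDATE_AFTER, 111, 15000, 5.17076293945),
        new Change(Kind.DELETE, 111, 17000, 5.17076293945));
    System.out.println(isLive(merge(changes), 111)); // false
  }
}
```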





[PR] [HUDI-1623][FOLLOW_UP] Fix test TestWaitBasedTimeGenerator & refine codes [hudi]

2023-11-01 Thread via GitHub


boneanxs opened a new pull request, #9972:
URL: https://github.com/apache/hudi/pull/9972

   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was 
copied._
   
   1. Change `TestWaitBasedTimeGenerator` to use a unique path: if another test (such as `TestHoodieActiveTimeline`) initializes the `TimeGenerator` first, `MockInProcessLockProvider` cannot be used.
   2. Rename `locked` to `skipLocking` to follow Hudi's existing naming style.
   3. Fix the usage of `DEFAULT_LOCK_ACQUIRE_WAIT_TIMEOUT_MS`.
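   The collision described in item 1 can be illustrated with a minimal, hypothetical sketch. It assumes the time generator is cached per table base path for the lifetime of the JVM, so whichever test creates it first fixes its lock provider (an assumption, not something this PR states); `Generator` and `getOrCreate` are illustrative names, not Hudi APIs.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class PerPathCacheSketch {
  // Hypothetical generator that remembers which lock provider it was built with.
  record Generator(String basePath, String lockProviderClass) {}

  // Hypothetical process-wide cache keyed by base path: the first caller wins.
  static final Map<String, Generator> CACHE = new ConcurrentHashMap<>();

  static Generator getOrCreate(String basePath, String lockProviderClass) {
    return CACHE.computeIfAbsent(basePath, p -> new Generator(p, lockProviderClass));
  }

  public static void main(String[] args) {
    // An earlier test initializes the generator for a shared path...
    getOrCreate("/tmp/hudi_table", "InProcessLockProvider");
    // ...so a later test asking for a mock lock provider on the same path gets the old one.
    System.out.println(getOrCreate("/tmp/hudi_table", "MockInProcessLockProvider").lockProviderClass());
    // A unique path per test avoids the collision.
    String uniquePath = "/tmp/hudi_table_" + UUID.randomUUID();
    System.out.println(getOrCreate(uniquePath, "MockInProcessLockProvider").lockProviderClass());
  }
}
```

   The first line printed is `InProcessLockProvider` and the second is `MockInProcessLockProvider`.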
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   None
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   None
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





Re: [PR] [HUDI-5591] HoodieSparkSqlWriter#getHiveTableNames needs to consider … [hudi]

2023-11-01 Thread via GitHub


LinMingQiang commented on PR #7718:
URL: https://github.com/apache/hudi/pull/7718#issuecomment-1789982470

   
   @hudi-bot run azure





Re: [PR] [HUDI-6999] Adding row writer support to HoodieStreamer [hudi]

2023-11-01 Thread via GitHub


hudi-bot commented on PR #9913:
URL: https://github.com/apache/hudi/pull/9913#issuecomment-1789972795

   
   ## CI report:
   
   * a8dd6df5fbf92a57e4c31e8d0954e0f68c17c9a2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20619)
 
   * 5eb4bf14d826e60c412078762aa061f415bac51d UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]

2023-11-01 Thread via GitHub


hudi-bot commented on PR #9904:
URL: https://github.com/apache/hudi/pull/9904#issuecomment-1789972737

   
   ## CI report:
   
   * 9aa65644532e1cbc3d7b13bd389dd0b5ea63b5e1 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20553)
 
   * 62b7696970bac4382a9b6467721de915116fb3a5 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20629)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6999] Adding row writer support to HoodieStreamer [hudi]

2023-11-01 Thread via GitHub


nsivabalan commented on code in PR #9913:
URL: https://github.com/apache/hudi/pull/9913#discussion_r1379523701


##
hudi-utilities/src/test/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamerTestBase.java:
##
@@ -634,7 +634,7 @@ static void waitTillCondition(Function condition, Future dsFut
 boolean ret = false;
 while (!ret && !dsFuture.isDone()) {
   try {
-Thread.sleep(3000);
+Thread.sleep(2000);

Review Comment:
   It can reduce the total test run time, so I reduced the wait time.
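Why a shorter sleep helps: a poll loop only notices the condition on its next check, so each wait can overshoot by up to one sleep interval (at most about 2 s instead of 3 s after this change). A minimal sketch of such a loop follows, assuming a `BooleanSupplier` condition and an explicit timeout; the original's generic parameters are elided in the diff, so the signature and the timeout parameter here are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

public class WaitTillConditionSketch {
  /**
   * Polls {@code condition} every {@code pollMs} until it holds, the background job
   * finishes, or {@code timeoutMs} elapses. Returns whether the condition was met.
   */
  static boolean waitTillCondition(BooleanSupplier condition, Future<?> job, long timeoutMs, long pollMs) {
    long deadline = System.currentTimeMillis() + timeoutMs;
    boolean met = condition.getAsBoolean();
    while (!met && !job.isDone() && System.currentTimeMillis() < deadline) {
      try {
        Thread.sleep(pollMs); // worst-case overshoot after the condition turns true is ~pollMs
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt and stop waiting
        return false;
      }
      met = condition.getAsBoolean();
    }
    return met;
  }

  public static void main(String[] args) {
    AtomicInteger polls = new AtomicInteger();
    // The condition becomes true on the 3rd check; the job never finishes.
    boolean ok = waitTillCondition(() -> polls.incrementAndGet() >= 3, new CompletableFuture<Void>(), 5_000, 50);
    System.out.println(ok + " after " + polls.get() + " checks"); // true after 3 checks
  }
}
```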






Re: [PR] [HUDI-6999] Adding row writer support to HoodieStreamer [hudi]

2023-11-01 Thread via GitHub


nsivabalan commented on code in PR #9913:
URL: https://github.com/apache/hudi/pull/9913#discussion_r1379523846


##
hudi-utilities/src/test/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamerTestBase.java:
##
@@ -634,7 +634,7 @@ static void waitTillCondition(Function condition, Future dsFut
 boolean ret = false;
 while (!ret && !dsFuture.isDone()) {
   try {
-Thread.sleep(3000);
+Thread.sleep(2000);

Review Comment:
   I mean, it is not specific to the tests we added; I just happened to notice it and fixed it.






Re: [PR] [HUDI-7011] a metric to indicate whether rollback has occurred in final compaction state [hudi]

2023-11-01 Thread via GitHub


hudi-bot commented on PR #9956:
URL: https://github.com/apache/hudi/pull/9956#issuecomment-1789967405

   
   ## CI report:
   
   * fbef3b537a227e95a70d2c818c2e9ac5157fac62 UNKNOWN
   * ca7c98a42bab1fedf58629952f6d37574116e79e Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20612)
 
   * 112d48336b49d830401aa59f25edf14537871e96 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]

2023-11-01 Thread via GitHub


hudi-bot commented on PR #9904:
URL: https://github.com/apache/hudi/pull/9904#issuecomment-1789967218

   
   ## CI report:
   
   * 9aa65644532e1cbc3d7b13bd389dd0b5ea63b5e1 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20553)
 
   * 62b7696970bac4382a9b6467721de915116fb3a5 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-2461] Support out of order commits in MDT with completion time view [hudi]

2023-11-01 Thread via GitHub


hudi-bot commented on PR #9871:
URL: https://github.com/apache/hudi/pull/9871#issuecomment-1789967114

   
   ## CI report:
   
   * a6b85794428dcd7a7b45f28430bfb5f6c42fc910 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20626)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] Row writer optimization for bulk insert [hudi]

2023-11-01 Thread via GitHub


nsivabalan closed pull request #9852: Row writer optimization for bulk insert
URL: https://github.com/apache/hudi/pull/9852





Re: [PR] [HUDI-6999] Adding row writer support to HoodieStreamer [hudi]

2023-11-01 Thread via GitHub


rmahindra123 commented on code in PR #9913:
URL: https://github.com/apache/hudi/pull/9913#discussion_r1379509595


##
hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/commit/HoodieStreamerDatasetBulkInsertCommitActionExecutor.java:
##
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.commit;
+
+import org.apache.hudi.HoodieDatasetBulkInsertHelper;
+import org.apache.hudi.client.SparkRDDWriteClient;
+import org.apache.hudi.client.WriteStatus;
+import org.apache.hudi.common.data.HoodieData;
+import org.apache.hudi.common.model.WriteOperationType;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.table.action.HoodieWriteMetadata;
+
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+
+/**
+ * Executor to be used by stream sync. Directly invokes HoodieDatasetBulkInsertHelper.bulkInsert so that WriteStatus is

Review Comment:
   nit: use {@link StreamSync}



##
hudi-utilities/src/test/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamerTestBase.java:
##
@@ -634,7 +634,7 @@ static void waitTillCondition(Function condition, Future dsFut
 boolean ret = false;
 while (!ret && !dsFuture.isDone()) {
   try {
-Thread.sleep(3000);
+Thread.sleep(2000);

Review Comment:
   any reason for this change?






Re: [PR] [HUDI-7014] Optimize the code of BoundedPartitionAwareCompactionStrategy [hudi]

2023-11-01 Thread via GitHub


hudi-bot commented on PR #9961:
URL: https://github.com/apache/hudi/pull/9961#issuecomment-1789958272

   
   ## CI report:
   
   * 75c27611b7a2ceacf43aa903f5b542fffc7b27cf Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20594)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7010] Build clustering group reduces redundant traversals [hudi]

2023-11-01 Thread via GitHub


hudi-bot commented on PR #9957:
URL: https://github.com/apache/hudi/pull/9957#issuecomment-1789958221

   
   ## CI report:
   
   * 0c34110238584b4ec7862d8849e5ca5353769051 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20591)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6990] Configurable clustering task parallelism [hudi]

2023-11-01 Thread via GitHub


hudi-bot commented on PR #9925:
URL: https://github.com/apache/hudi/pull/9925#issuecomment-1789958096

   
   ## CI report:
   
   * c782b5ebbab7e1f1a2b8a1e7ac1c30c6942e10c5 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20610)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]

2023-11-01 Thread via GitHub


zhuanshenbsj1 commented on code in PR #9904:
URL: https://github.com/apache/hudi/pull/9904#discussion_r1379507296


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/OptionsResolver.java:
##
@@ -272,6 +272,13 @@ public static boolean hasNoSpecificReadCommits(Configuration conf) {
 return !conf.contains(FlinkOptions.READ_START_COMMIT) && !conf.contains(FlinkOptions.READ_END_COMMIT);
   }
 
+  /**
+   * Returns whether the read commits limit is specified.
+   */
+  public static boolean isSpecificReadCommitsLimit(Configuration conf) {
+return conf.contains(FlinkOptions.READ_COMMITS_LIMIT);

Review Comment:
   Done
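How the new option is consumed is not shown in this hunk. A minimal sketch, assuming the limit caps how many pending commits one streaming-read cycle consumes and that commits are ordered oldest first; `ReadCommitsLimitSketch` and `commitsToRead` are hypothetical names, and the `OptionalInt` stands in for checking `conf.contains(FlinkOptions.READ_COMMITS_LIMIT)` and then reading the value.

```java
import java.util.List;
import java.util.OptionalInt;
import java.util.stream.Collectors;

public class ReadCommitsLimitSketch {
  /**
   * Returns the commits to consume in one read cycle: all pending commits when no limit
   * is configured, otherwise at most {@code limit} of the oldest ones.
   */
  static List<String> commitsToRead(List<String> pendingCommits, OptionalInt limit) {
    if (limit.isEmpty()) {
      return pendingCommits; // no limit configured: read everything pending
    }
    if (limit.getAsInt() <= 0) {
      throw new IllegalArgumentException("limit must be positive");
    }
    return pendingCommits.stream().limit(limit.getAsInt()).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> pending = List.of("001", "002", "003");
    System.out.println(commitsToRead(pending, OptionalInt.of(2))); // [001, 002]
    System.out.println(commitsToRead(pending, OptionalInt.empty())); // [001, 002, 003]
  }
}
```

The remaining commits stay pending and would be picked up by later cycles, which is what bounds the per-cycle read rate.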






Re: [PR] [HUDI-6990] Configurable clustering task parallelism [hudi]

2023-11-01 Thread via GitHub


hudi-bot commented on PR #9925:
URL: https://github.com/apache/hudi/pull/9925#issuecomment-1789926773

   
   ## CI report:
   
   * c782b5ebbab7e1f1a2b8a1e7ac1c30c6942e10c5 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20610)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-2461] Support out of order commits in MDT with completion time view [hudi]

2023-11-01 Thread via GitHub


hudi-bot commented on PR #9871:
URL: https://github.com/apache/hudi/pull/9871#issuecomment-1789926631

   
   ## CI report:
   
   * ea5ea6e38a4eb27a0a061031804dabc42824945c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20609)
 
   * a6b85794428dcd7a7b45f28430bfb5f6c42fc910 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20626)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-5210] Implement functional indexes [hudi]

2023-11-01 Thread via GitHub


hudi-bot commented on PR #9872:
URL: https://github.com/apache/hudi/pull/9872#issuecomment-1789926650

   
   ## CI report:
   
   * 120afe45c4eb6e5ca6fa51dc2614c90f50c3dec0 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20611)
 
   * 74cc5ad3650edbb7aee3e98811ee680398bc0dd0 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20627)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7014] Optimize the code of BoundedPartitionAwareCompactionStrategy [hudi]

2023-11-01 Thread via GitHub


hudi-bot commented on PR #9961:
URL: https://github.com/apache/hudi/pull/9961#issuecomment-1789926929

   
   ## CI report:
   
   * 75c27611b7a2ceacf43aa903f5b542fffc7b27cf Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20594)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7010] Build clustering group reduces redundant traversals [hudi]

2023-11-01 Thread via GitHub


hudi-bot commented on PR #9957:
URL: https://github.com/apache/hudi/pull/9957#issuecomment-1789926898

   
   ## CI report:
   
   * 0c34110238584b4ec7862d8849e5ca5353769051 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20591)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7010] Build clustering group reduces redundant traversals [hudi]

2023-11-01 Thread via GitHub


ksmou commented on PR #9957:
URL: https://github.com/apache/hudi/pull/9957#issuecomment-1789925627

   @hudi-bot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7014] Optimize the code of BoundedPartitionAwareCompactionStrategy [hudi]

2023-11-01 Thread via GitHub


ksmou commented on PR #9961:
URL: https://github.com/apache/hudi/pull/9961#issuecomment-1789925484

   @hudi-bot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6990] Configurable clustering task parallelism [hudi]

2023-11-01 Thread via GitHub


ksmou commented on PR #9925:
URL: https://github.com/apache/hudi/pull/9925#issuecomment-1789925201

   @hudi-bot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7013] Drop table command cannot delete dir even when purge is enable [hudi]

2023-11-01 Thread via GitHub


xuzifu666 commented on PR #9960:
URL: https://github.com/apache/hudi/pull/9960#issuecomment-1789924249

   Azure seems to have some problem; the build gets canceled soon after it starts.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7013] Drop table command cannot delete dir even when purge is enable [hudi]

2023-11-01 Thread via GitHub


hudi-bot commented on PR #9960:
URL: https://github.com/apache/hudi/pull/9960#issuecomment-1789919965

   
   ## CI report:
   
   * 0d690d42416e63d40ebd8cf88d5f640e488adf6e Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20593)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-5210] Implement functional indexes [hudi]

2023-11-01 Thread via GitHub


hudi-bot commented on PR #9872:
URL: https://github.com/apache/hudi/pull/9872#issuecomment-1789919716

   
   ## CI report:
   
   * 120afe45c4eb6e5ca6fa51dc2614c90f50c3dec0 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20611)
 
   * 74cc5ad3650edbb7aee3e98811ee680398bc0dd0 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-2461] Support out of order commits in MDT with completion time view [hudi]

2023-11-01 Thread via GitHub


hudi-bot commented on PR #9871:
URL: https://github.com/apache/hudi/pull/9871#issuecomment-1789919679

   
   ## CI report:
   
   * ea5ea6e38a4eb27a0a061031804dabc42824945c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20609)
 
   * a6b85794428dcd7a7b45f28430bfb5f6c42fc910 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6806] Support Spark 3.5.0 [hudi]

2023-11-01 Thread via GitHub


hudi-bot commented on PR #9717:
URL: https://github.com/apache/hudi/pull/9717#issuecomment-1789919512

   
   ## CI report:
   
   * 9b8fdd2d1b69da528069e364790b53af1d6150af UNKNOWN
   * b544b18820ae3fe8fbf1c50a34e561ad36bfbaba Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20624)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7013] Drop table command cannot delete dir even when purge is enable [hudi]

2023-11-01 Thread via GitHub


hudi-bot commented on PR #9960:
URL: https://github.com/apache/hudi/pull/9960#issuecomment-1789914381

   
   ## CI report:
   
   * 0d690d42416e63d40ebd8cf88d5f640e488adf6e Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20593)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6806] Support Spark 3.5.0 [hudi]

2023-11-01 Thread via GitHub


hudi-bot commented on PR #9717:
URL: https://github.com/apache/hudi/pull/9717#issuecomment-1789914037

   
   ## CI report:
   
   * 9b8fdd2d1b69da528069e364790b53af1d6150af UNKNOWN
   * dbb4fd378f88372271167c2b89dc709b87c16e57 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20622)
 
   * b544b18820ae3fe8fbf1c50a34e561ad36bfbaba UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6992] IncrementalInputSplits incorrectly set the latestCommit attr [hudi]

2023-11-01 Thread via GitHub


danny0405 commented on code in PR #9923:
URL: https://github.com/apache/hudi/pull/9923#discussion_r1379481212


##
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/source/TestStreamReadMonitoringFunction.java:
##
@@ -270,8 +270,8 @@ public void testConsumingHollowInstants() throws Exception {
   "All instants should have range limit");
   assertTrue(sourceContext.splits.stream().allMatch(split -> isPointInstantRange(split.getInstantRange().get(), c2)),
   "All the splits should have point instant range");
-  assertTrue(sourceContext.splits.stream().allMatch(split -> split.getLatestCommit().equals(c2)),
-  "All the splits should be with specified instant time");
+  assertTrue(sourceContext.splits.stream().anyMatch(split -> split.getLatestCommit().equals(c2)),
+  "At least one input split's latest commit time should be equal to the specified instant time.");

Review Comment:
   Why this change?
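The reviewer's question depends on how much weaker the new assertion is. Below is a minimal plain-Java sketch of the two `Stream` predicates. `Split` and `MatchSemantics` are hypothetical stand-ins, not the Hudi test classes.

```java
import java.util.List;

// Hypothetical stand-in for an input split; only the latest commit time matters here.
final class Split {
  private final String latestCommit;

  Split(String latestCommit) {
    this.latestCommit = latestCommit;
  }

  String getLatestCommit() {
    return latestCommit;
  }
}

final class MatchSemantics {
  // Old assertion: every split must carry the expected latest commit.
  // allMatch is vacuously true for an empty list.
  static boolean allCarry(List<Split> splits, String commit) {
    return splits.stream().allMatch(split -> split.getLatestCommit().equals(commit));
  }

  // New assertion: at least one split carries it; anyMatch is false for an empty list.
  static boolean anyCarries(List<Split> splits, String commit) {
    return splits.stream().anyMatch(split -> split.getLatestCommit().equals(commit));
  }
}
```

Take splits carrying `c1` and `c2`. For these, `allCarry(..., "c2")` is false but `anyCarries(..., "c2")` is true. On an empty list the results flip to true and false respectively.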



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Closed] (HUDI-7019) Add instant details consumer to HoodieArchivedTimeline

2023-11-01 Thread Danny Chen (Jira)


 [ https://issues.apache.org/jira/browse/HUDI-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Danny Chen closed HUDI-7019.

Resolution: Fixed

Fixed via master branch: b4f96440b562d575878afc5ff09435637cdac8d0

> Add instant details consumer to HoodieArchivedTimeline
> --
>
> Key: HUDI-7019
> URL: https://issues.apache.org/jira/browse/HUDI-7019
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: core
>Reporter: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


(hudi) branch master updated: [HUDI-7019] Add instant details consumer to HoodieArchivedTimeline (#9969)

2023-11-01 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new b4f96440b56 [HUDI-7019] Add instant details consumer to HoodieArchivedTimeline (#9969)
b4f96440b56 is described below

commit b4f96440b562d575878afc5ff09435637cdac8d0
Author: Danny Chan 
AuthorDate: Thu Nov 2 09:17:36 2023 +0800

[HUDI-7019] Add instant details consumer to HoodieArchivedTimeline (#9969)
---
 .../table/timeline/HoodieArchivedTimeline.java | 38 --
 1 file changed, 21 insertions(+), 17 deletions(-)

diff --git a/hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieArchivedTimeline.java b/hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieArchivedTimeline.java
index cdffd4c0b3c..c489362ae29 100644
--- a/hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieArchivedTimeline.java
+++ b/hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieArchivedTimeline.java
@@ -150,31 +150,34 @@ public class HoodieArchivedTimeline extends HoodieDefaultTimeline {
     return new HoodieArchivedTimeline(metaClient);
   }
 
-  private HoodieInstant readCommit(String instantTime, GenericRecord record, LoadMode loadMode) {
+  private HoodieInstant readCommit(String instantTime, GenericRecord record, Option<BiConsumer<String, GenericRecord>> instantDetailsConsumer) {
     final String action = record.get(ACTION_ARCHIVED_META_FIELD).toString();
     final String completionTime = record.get(COMPLETION_TIME_ARCHIVED_META_FIELD).toString();
-    loadInstantDetails(record, instantTime, loadMode);
+    instantDetailsConsumer.ifPresent(consumer -> consumer.accept(instantTime, record));
     return new HoodieInstant(HoodieInstant.State.COMPLETED, action, instantTime, completionTime);
   }
 
-  private void loadInstantDetails(GenericRecord record, String instantTime, LoadMode loadMode) {
+  @Nullable
+  private BiConsumer<String, GenericRecord> getInstantDetailsFunc(LoadMode loadMode) {
     switch (loadMode) {
       case METADATA:
-        ByteBuffer commitMeta = (ByteBuffer) record.get(METADATA_ARCHIVED_META_FIELD);
-        if (commitMeta != null) {
-          // in case the entry comes from an empty completed meta file
-          this.readCommits.put(instantTime, commitMeta.array());
-        }
-        break;
+        return (instant, record) -> {
+          ByteBuffer commitMeta = (ByteBuffer) record.get(METADATA_ARCHIVED_META_FIELD);
+          if (commitMeta != null) {
+            // in case the entry comes from an empty completed meta file
+            this.readCommits.put(instant, commitMeta.array());
+          }
+        };
       case PLAN:
-        ByteBuffer plan = (ByteBuffer) record.get(PLAN_ARCHIVED_META_FIELD);
-        if (plan != null) {
-          // in case the entry comes from an empty completed meta file
-          this.readCommits.put(instantTime, plan.array());
-        }
-        break;
+        return (instant, record) -> {
+          ByteBuffer plan = (ByteBuffer) record.get(PLAN_ARCHIVED_META_FIELD);
+          if (plan != null) {
+            // in case the entry comes from an empty completed meta file
+            this.readCommits.put(instant, plan.array());
+          }
+        };
       default:
-        // no operation
+        return null;
     }
   }
 
@@ -201,7 +204,8 @@ public class HoodieArchivedTimeline extends HoodieDefaultTimeline {
   LoadMode loadMode,
   Function commitsFilter) {
     Map<String, HoodieInstant> instantsInRange = new ConcurrentHashMap<>();
-    loadInstants(metaClient, filter, loadMode, commitsFilter, (instantTime, avroRecord) -> instantsInRange.putIfAbsent(instantTime, readCommit(instantTime, avroRecord, loadMode)));
+    Option<BiConsumer<String, GenericRecord>> instantDetailsConsumer = Option.ofNullable(getInstantDetailsFunc(loadMode));
+    loadInstants(metaClient, filter, loadMode, commitsFilter, (instantTime, avroRecord) -> instantsInRange.putIfAbsent(instantTime, readCommit(instantTime, avroRecord, instantDetailsConsumer)));
     ArrayList<HoodieInstant> result = new ArrayList<>(instantsInRange.values());
     Collections.sort(result);
     return result;
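The refactoring moves the `switch` on `LoadMode` out of the per-record path. The consumer is chosen once per load and then applied to every archived record. The sketch below mirrors that shape with these simplifications:

* `java.util.Optional` replaces Hudi's `Option`.
* A `Map<String, ByteBuffer>` replaces Avro's `GenericRecord`.
* The `LoadMode` values and the `"metadata"`/`"plan"` field names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.BiConsumer;

// Hypothetical, simplified load modes.
enum LoadMode { NONE, METADATA, PLAN }

final class DetailsLoader {
  // Instant time -> raw bytes, playing the role of readCommits.
  final Map<String, byte[]> readCommits = new HashMap<>();

  // Resolve the per-record action once per load instead of once per record.
  Optional<BiConsumer<String, Map<String, ByteBuffer>>> detailsConsumer(LoadMode mode) {
    switch (mode) {
      case METADATA:
        return Optional.of(fieldLoader("metadata"));
      case PLAN:
        return Optional.of(fieldLoader("plan"));
      default:
        return Optional.empty(); // nothing to load for this mode
    }
  }

  private BiConsumer<String, Map<String, ByteBuffer>> fieldLoader(String field) {
    return (instant, record) -> {
      ByteBuffer bytes = record.get(field);
      if (bytes != null) { // the entry may come from an empty completed meta file
        readCommits.put(instant, bytes.array());
      }
    };
  }

  // Counterpart of readCommit: apply the consumer only if one was resolved.
  // The real method builds a HoodieInstant; this sketch just returns the time.
  String readCommit(String instant, Map<String, ByteBuffer> record,
                    Optional<BiConsumer<String, Map<String, ByteBuffer>>> consumer) {
    consumer.ifPresent(c -> c.accept(instant, record));
    return instant;
  }
}
```

Returning `Optional.empty()` for modes that load nothing lets the caller skip the work with `ifPresent`. In the actual diff, the same effect comes from wrapping the `@Nullable` return in `Option.ofNullable(getInstantDetailsFunc(loadMode))`.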



Re: [PR] [HUDI-7019] Add instant details consumer to HoodieArchivedTimeline [hudi]

2023-11-01 Thread via GitHub


danny0405 merged PR #9969:
URL: https://github.com/apache/hudi/pull/9969


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7019] Add instant details consumer to HoodieArchivedTimeline [hudi]

2023-11-01 Thread via GitHub


danny0405 commented on PR #9969:
URL: https://github.com/apache/hudi/pull/9969#issuecomment-1789912010

   All the test failures are caused by `TestWaitBasedTimeGenerator.testSlowerThreadLaterAcquiredLock`, which is known to be flaky; will merge it soon~


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]

2023-11-01 Thread via GitHub


danny0405 commented on code in PR #9904:
URL: https://github.com/apache/hudi/pull/9904#discussion_r1379478910


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/OptionsResolver.java:
##
@@ -272,6 +272,13 @@ public static boolean hasNoSpecificReadCommits(Configuration conf) {
     return !conf.contains(FlinkOptions.READ_START_COMMIT) && !conf.contains(FlinkOptions.READ_END_COMMIT);
   }
 
+  /**
+   * Returns whether the read commits limit is specified.
+   */
+  public static boolean isSpecificReadCommitsLimit(Configuration conf) {
+    return conf.contains(FlinkOptions.READ_COMMITS_LIMIT);

Review Comment:
   `isSpecificReadCommitsLimit` -> `hasReadCommitsLimit`
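Below is a rough sketch of how a renamed `hasReadCommitsLimit` check might gate a per-round cap on consumed instants. It makes three assumptions:

* The limit bounds how many pending instants one monitoring round hands to readers.
* A plain `Map` stands in for Flink's `Configuration`.
* The key string `read.commits.limit` is a hypothetical placeholder, not the option key defined in the PR.

```java
import java.util.List;
import java.util.Map;

final class ReadLimitSketch {
  // Hypothetical key; the real FlinkOptions.READ_COMMITS_LIMIT key may differ.
  static final String READ_COMMITS_LIMIT = "read.commits.limit";

  // Reviewer-suggested name: reports whether the user configured a limit at all.
  static boolean hasReadCommitsLimit(Map<String, String> conf) {
    return conf.containsKey(READ_COMMITS_LIMIT);
  }

  // Caps how many pending instants one monitoring round consumes (assumed semantics).
  static List<String> limitInstants(List<String> pendingInstants, Map<String, String> conf) {
    if (!hasReadCommitsLimit(conf)) {
      return pendingInstants;
    }
    int limit = Integer.parseInt(conf.get(READ_COMMITS_LIMIT));
    // Clamp to [0, size] so a small or negative limit never throws.
    return pendingInstants.subList(0, Math.max(0, Math.min(limit, pendingInstants.size())));
  }
}
```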



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6991] Fix hoodie.parquet.max.file.size conf reset error [hudi]

2023-11-01 Thread via GitHub


danny0405 commented on PR #9924:
URL: https://github.com/apache/hudi/pull/9924#issuecomment-1789905730

   The test is known to be flaky:
https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=20581&view=logs&j=600e7de6-e133-5e69-e615-50ee129b3c08&t=bbbd7bcc-ae73-56b8-887a-cd2d6deaafc7


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7017] Prevent full schema evolution from wrongly falling back t… [hudi]

2023-11-01 Thread via GitHub


danny0405 commented on PR #9966:
URL: https://github.com/apache/hudi/pull/9966#issuecomment-1789902507

   cc @xiarixiaoyao for the review~


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7013] Drop table command cannot delete dir even when purge is enable [hudi]

2023-11-01 Thread via GitHub


danny0405 commented on PR #9960:
URL: https://github.com/apache/hudi/pull/9960#issuecomment-1789894304

   @hudi-bot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-2461] Support out of order commits in MDT with completion time view [hudi]

2023-11-01 Thread via GitHub


danny0405 commented on code in PR #9871:
URL: https://github.com/apache/hudi/pull/9871#discussion_r1379469734


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/timeline/HoodieTimelineArchiver.java:
##
@@ -270,13 +270,41 @@ private List getCommitInstantsToArchive() throws IOException {
 // stop at first savepoint commit
 return !firstSavepoint.isPresent() || compareTimestamps(s.getTimestamp(), LESSER_THAN, firstSavepoint.get().getTimestamp());
   }
-}).filter(s -> earliestInstantToRetain
-.map(instant -> compareTimestamps(s.getTimestamp(), LESSER_THAN, instant.getTimestamp()))
+})
+.filter(s -> earliestInstantToRetain.map(
+instant -> compareTimestamps(s.getTimestamp(), LESSER_THAN, instant.getTimestamp())
+// for the compaction c2 in metadata table triggered by commit c1 in data table,
+// c2.getTimestamp() > c1.getTimestamp() because: c2 happens before c1 completes,
+// and we are generating new instant time for c2 after c1 has started. Effectively,

Review Comment:
   > Lines 208-222 are fine, as that code is trying to get the earliest candidate instant to retain. The main filtering happens after all such candidate instants have been collected. I think the fix should be where the filtering happens.
   
   I still think lines 208-222 are the right place for the fix.
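Below is a stripped-down sketch of the two cut-offs visible in this hunk: the first savepoint and the earliest instant to retain. It assumes that instant times are fixed-width digit strings, so comparing them lexicographically matches chronological order. It also assumes an absent earliest-instant-to-retain means no cut-off. It deliberately does not model the metadata-table compaction case under discussion. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

final class ArchiveFilterSketch {
  // Assumes fixed-width digit timestamps, so string order equals time order.
  static boolean lessThan(String a, String b) {
    return a.compareTo(b) < 0;
  }

  static List<String> instantsToArchive(List<String> completed,
                                        Optional<String> firstSavepoint,
                                        Optional<String> earliestToRetain) {
    return completed.stream()
        // stop at the first savepoint commit
        .filter(ts -> !firstSavepoint.isPresent() || lessThan(ts, firstSavepoint.get()))
        // only archive instants strictly before the earliest instant to retain
        .filter(ts -> earliestToRetain.map(r -> lessThan(ts, r)).orElse(true))
        .collect(Collectors.toList());
  }
}
```

Wherever the MDT fix lands, it ultimately has to change what ends up in `earliestToRetain`. Otherwise, the second filter will still let an instant through.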



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] Solution for synchronizing the entire database table in flink [hudi]

2023-11-01 Thread via GitHub


bajiaolong commented on issue #9965:
URL: https://github.com/apache/hudi/issues/9965#issuecomment-1789890659

   > How many tables are there in your database? This is feasible if you have just a handful of tables, say 20: consume the Kafka topic, partition the stream by table name, and pipe each partitioned stream into a Hudi sink. You will need to write some DataStream pipelines manually; see
https://github.com/apache/hudi/blob/master/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/util/HoodiePipeline.java
 for a single-table example.
   
   1. Why is the number of tables limited to 20, and what is the reason?
   
   2. Currently, the data from all tables in my database is synchronized into one table, partitioned by database schema and table name. When reading downstream, I filter the stream by table name, but this is very time-consuming. Is there a streaming read that only reads specific partitions, so that I can get a single table?
   
   3. Do you have any suggestions for the second question?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6382] support hoodie-table-type changing in hudi-cli [hudi]

2023-11-01 Thread via GitHub


danny0405 commented on code in PR #9937:
URL: https://github.com/apache/hudi/pull/9937#discussion_r1379460737


##
hudi-cli/src/main/java/org/apache/hudi/cli/commands/TableCommand.java:
##
@@ -233,10 +237,28 @@ public String deleteTableConfig(
 return renderOldNewProps(newProps, oldProps);
   }
 
-  @ShellMethod(key = "table change-table-type", value = "Change hudi table type, COW_TO_MOR or MOR_TO_COW.")
+  @ShellMethod(key = "table change-table-type", value = "Change hudi table type to target type: COW or MOR. "
+  + "Note: before changing to COW, by default this command will execute all the pending compactions and execute a full compaction if needed.")
   public String changeTableType(
-  @ShellOption(value = {"--type"},
-  help = "COW_TO_MOR or MOR_TO_COW stands for changing table from COW to MOR or from MOR to COW respectively.") final String changeType) {
+  @ShellOption(value = {"--target-type"},
+  help = "the target hoodie table type: COW or MOR") final String changeType,
+  @ShellOption(value = {"--enable-compaction"}, defaultValue = "true",
+  help = "Valid in MOR to COW case. Before changing to COW, need to perform a full compaction to compact all log files. Default true") final boolean enableCompaction,
+  @ShellOption(value = {"--parallelism"}, defaultValue = "3",
+  help = "Valid in MOR to COW case. Parallelism for hoodie compaction") final String parallelism,
+  @ShellOption(value = "--schemaFilePath", defaultValue = "",
+  help = "Valid in MOR to COW case. Path for Avro schema file") final String schemaFilePath,

Review Comment:
   It can be used as a fallback; the `--schemaFilePath` option is awkward to use.
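Below is a hypothetical sketch of the decision logic the new options describe:

* Parse `--target-type`.
* Run compaction only for MOR to COW, and only when `--enable-compaction` is true.
* On one reading of the review comment, prefer the table's own schema and use the schema loaded from `--schemaFilePath` only as a fallback.

None of these helpers exist in `TableCommand`; the names are illustrative.

```java
import java.util.Locale;
import java.util.Optional;

enum TableType { COW, MOR }

final class ChangeTableTypeSketch {
  // Parses --target-type; case-insensitive matching is an assumption of this sketch.
  static TableType parseTarget(String value) {
    return TableType.valueOf(value.trim().toUpperCase(Locale.ROOT));
  }

  // Per the help text: only MOR -> COW needs compaction first, and only if enabled.
  static boolean needsCompaction(TableType current, TableType target, boolean enableCompaction) {
    return current == TableType.MOR && target == TableType.COW && enableCompaction;
  }

  // Prefer the schema resolved from the table; fall back to the one read from the file.
  static String resolveSchema(Optional<String> tableSchema, Optional<String> schemaFromFile) {
    return tableSchema.or(() -> schemaFromFile)
        .orElseThrow(() -> new IllegalArgumentException("no schema available"));
  }
}
```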



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7011] a metric to indicate whether rollback has occurred in final compaction state [hudi]

2023-11-01 Thread via GitHub


danny0405 commented on code in PR #9956:
URL: https://github.com/apache/hudi/pull/9956#discussion_r1379460318


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/metrics/FlinkCompactionMetrics.java:
##
@@ -110,16 +110,18 @@ public void endCompaction() {
     this.compactionCost = stopTimer(COMPACTION_KEY);
   }
 
-  public void setCompactionFailedState(CompactionState compactionState){
-    this.compactionFailedState = compactionState.state;
+  public void markCompactionCompleted(CompactionState compactionState) {
+    this.compactionStateSignal = compactionState.state;
+  }

Review Comment:
   There is no need to pass `CompactionState` around; just set the constant 1 or 0 internally, and `CompactionState` can be eliminated.
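Below is a minimal sketch of the reviewer's suggestion: the metrics class sets a fixed 1/0 gauge value itself, so callers no longer pass a `CompactionState`. The following are assumptions:

* the method names other than `markCompactionCompleted`;
* the mapping of 1 to completed and 0 to rolled back;
* the initial value.

```java
final class CompactionMetricsSketch {
  // Assumed encoding: 1 = compaction completed, 0 = compaction rolled back.
  private static final int COMPLETED = 1;
  private static final int ROLLED_BACK = 0;

  // volatile because a metrics reporter thread may read the gauge (assumption).
  private volatile int compactionStateSignal = COMPLETED;

  void markCompactionCompleted() {
    compactionStateSignal = COMPLETED;
  }

  void markCompactionRolledBack() {
    compactionStateSignal = ROLLED_BACK;
  }

  // Value a gauge would report.
  int getCompactionStateSignal() {
    return compactionStateSignal;
  }
}
```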



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6990] Configurable clustering task parallelism [hudi]

2023-11-01 Thread via GitHub


hudi-bot commented on PR #9925:
URL: https://github.com/apache/hudi/pull/9925#issuecomment-1789878422

   
   ## CI report:
   
   * c782b5ebbab7e1f1a2b8a1e7ac1c30c6942e10c5 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20610)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7012] The BootstrapOperator reduces the memory. [hudi]

2023-11-01 Thread via GitHub


danny0405 commented on code in PR #9959:
URL: https://github.com/apache/hudi/pull/9959#discussion_r1379458224


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/bootstrap/BootstrapOperator.java:
##
@@ -155,6 +155,7 @@ protected void preLoadIndexRecords() throws Exception {
 
     // wait for the other bootstrap tasks finish bootstrapping.
     waitForBootstrapReady(getRuntimeContext().getIndexOfThisSubtask());
+    hoodieTable = null;
   }

Review Comment:
   Sure, can you add a comment to elaborate on the details?
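Below is a small sketch of the pattern behind `hoodieTable = null;`, which the requested code comment could describe. An object that is needed only while pre-loading index records is dropped afterwards. The garbage collector can then reclaim it for the rest of the operator's lifetime. This only frees memory if nothing else still references the object. `BootstrapSketch` and `HeavyTable` are hypothetical stand-ins, not Hudi classes.

```java
import java.util.function.Supplier;

final class BootstrapSketch {
  // Hypothetical heavyweight object, e.g. holding file listings and metadata.
  static final class HeavyTable {
    final byte[] cache = new byte[1 << 20];
    final int[] indexRecords = {1, 2, 3};
  }

  private HeavyTable table;
  private int loadedRecords;

  BootstrapSketch(Supplier<HeavyTable> tableFactory) {
    this.table = tableFactory.get();
  }

  void preLoadIndexRecords() {
    loadedRecords = table.indexRecords.length; // stand-in for the real index loading
    // Bootstrapping is one-time: drop this reference so the table becomes
    // eligible for GC, provided no other object still holds it.
    table = null;
  }

  boolean holdsTable() {
    return table != null;
  }

  int loadedRecords() {
    return loadedRecords;
  }
}
```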



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] Solution for synchronizing the entire database table in flink [hudi]

2023-11-01 Thread via GitHub


danny0405 commented on issue #9965:
URL: https://github.com/apache/hudi/issues/9965#issuecomment-1789875033

   How many tables are there in your database? This is feasible if you have just a handful of tables, say 20: consume the Kafka topic, partition the stream by table name, and pipe each partitioned stream into a Hudi sink. You will need to write some DataStream pipelines manually; see
https://github.com/apache/hudi/blob/master/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/util/HoodiePipeline.java
 for a single-table example.
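The routing step described above boils down to keying each change record by its source table. In Flink, this would typically be expressed with per-table `filter` calls or side outputs, with a Hudi sink attached to each branch. The plain-Java sketch below only shows the grouping key, and the `Change` type is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class RouteByTable {
  // Hypothetical CDC change record: source database, table, and raw payload.
  static final class Change {
    final String database;
    final String table;
    final String payload;

    Change(String database, String table, String payload) {
      this.database = database;
      this.table = table;
      this.payload = payload;
    }

    // Routing key: one downstream pipeline (and Hudi table) per source table.
    String tableKey() {
      return database + "." + table;
    }
  }

  // Groups changes by table, preserving first-seen table order.
  static Map<String, List<Change>> byTable(List<Change> changes) {
    return changes.stream()
        .collect(Collectors.groupingBy(Change::tableKey, LinkedHashMap::new, Collectors.toList()));
  }
}
```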


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6990] Configurable clustering task parallelism [hudi]

2023-11-01 Thread via GitHub


hudi-bot commented on PR #9925:
URL: https://github.com/apache/hudi/pull/9925#issuecomment-1789872472

   
   ## CI report:
   
   * c782b5ebbab7e1f1a2b8a1e7ac1c30c6942e10c5 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20610)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6806] Support Spark 3.5.0 [hudi]

2023-11-01 Thread via GitHub


hudi-bot commented on PR #9717:
URL: https://github.com/apache/hudi/pull/9717#issuecomment-1789872215

   
   ## CI report:
   
   * 9b8fdd2d1b69da528069e364790b53af1d6150af UNKNOWN
   * dbb4fd378f88372271167c2b89dc709b87c16e57 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20622)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6990] Configurable clustering task parallelism [hudi]

2023-11-01 Thread via GitHub


danny0405 commented on PR #9925:
URL: https://github.com/apache/hudi/pull/9925#issuecomment-1789871817

   @hudi-bot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



(hudi) branch master updated: [MINOR] Delete the duplicated getMetadata method in HoodieTable (#9964)

2023-11-01 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new acc418235e7 [MINOR] Delete the duplicated getMetadata method in HoodieTable (#9964)
acc418235e7 is described below

commit acc418235e7c78cb8901edeab6c42639a2112b4b
Author: majian <47964462+majian1...@users.noreply.github.com>
AuthorDate: Thu Nov 2 08:22:58 2023 +0800

[MINOR] Delete the duplicated getMetadata method in HoodieTable (#9964)
---
 .../src/main/java/org/apache/hudi/table/HoodieTable.java  | 4 
 .../test/scala/org/apache/hudi/functional/TestRecordLevelIndex.scala  | 2 +-
 2 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/HoodieTable.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/HoodieTable.java
index 36a5e6de21a..c44d3b0f4cb 100644
--- a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/HoodieTable.java
+++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/HoodieTable.java
@@ -170,10 +170,6 @@ public abstract class HoodieTable implements 
Serializable {
 return viewManager;
   }
 
-  public HoodieTableMetadata getMetadata() {
-return metadata;
-  }
-
   /**
* Upsert a batch of new records into Hoodie table at the supplied 
instantTime.
* @param contextHoodieEngineContext
diff --git a/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestRecordLevelIndex.scala b/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestRecordLevelIndex.scala
index b1973e250f4..17e3cadeeff 100644
--- a/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestRecordLevelIndex.scala
+++ b/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestRecordLevelIndex.scala
@@ -385,7 +385,7 @@ class TestRecordLevelIndex extends RecordLevelIndexTestBase {
     doWriteAndValidateDataAndRecordIndex(hudiOpts,
       operation = DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL,
       saveMode = SaveMode.Append)
-    val metadataTableFSView = getHoodieTable(metaClient, getWriteConfig(hudiOpts)).getMetadata
+    val metadataTableFSView = getHoodieTable(metaClient, getWriteConfig(hudiOpts)).getMetadataTable
       .asInstanceOf[HoodieBackedTableMetadata].getMetadataFileSystemView
     val compactionTimeline = metadataTableFSView.getVisibleCommitsAndCompactionTimeline.filterCompletedAndCompactionInstants()
     val lastCompactionInstant = compactionTimeline



Re: [PR] [MINOR] Delete the duplicated getMetadata method in HoodieTable. [hudi]

2023-11-01 Thread via GitHub


danny0405 merged PR #9964:
URL: https://github.com/apache/hudi/pull/9964





Re: [I] [SUPPORT] Simple Bucket Index - discrepancy between Spark and Flink [hudi]

2023-11-01 Thread via GitHub


danny0405 commented on issue #9971:
URL: https://github.com/apache/hudi/issues/9971#issuecomment-1789867376

   It looks like your Flink simple hashing index does not really take effect. Could you share your Flink options with us, so that we might find more clues about the unexpected discrepancy?





Re: [PR] [HUDI-6872] Simplify Out Of Box Schema Evolution Functionality [hudi]

2023-11-01 Thread via GitHub


nsivabalan commented on code in PR #9743:
URL: https://github.com/apache/hudi/pull/9743#discussion_r1379422146


##
hudi-common/src/main/java/org/apache/hudi/internal/schema/utils/AvroSchemaEvolutionUtils.java:
##
@@ -111,18 +117,22 @@ public static InternalSchema reconcileSchema(Schema incomingSchema, InternalSche
     return SchemaChangeUtils.applyTableChanges2Schema(internalSchemaAfterAddColumns, typeChange);
   }
 
+  public static Schema reconcileSchema(Schema incomingSchema, Schema oldTableSchema) {
+    return convert(reconcileSchema(incomingSchema, convert(oldTableSchema)), oldTableSchema.getFullName());
+  }
+
   /**
-   * Reconciles nullability requirements b/w {@code source} and {@code target} schemas,
+   * Reconciles nullability and datatype requirements b/w {@code source} and {@code target} schemas,
    * by adjusting these of the {@code source} schema to be in-line with the ones of the
    * {@code target} one
    *
    * @param sourceSchema source schema that needs reconciliation
    * @param targetSchema target schema that source schema will be reconciled against
-   * @param opts config options
-   * @return schema (based off {@code source} one) that has nullability constraints reconciled
+   * @param opts         config options
+   * @return schema (based off {@code source} one) that has nullability constraints and datatypes reconciled
    */
-  public static Schema reconcileNullability(Schema sourceSchema, Schema targetSchema, Map opts) {
-    if (sourceSchema.getFields().isEmpty() || targetSchema.getFields().isEmpty()) {
+  public static Schema reconcileSchemaRequirements(Schema sourceSchema, Schema targetSchema, Map opts) {
+    if (sourceSchema.getType() == Schema.Type.NULL || sourceSchema.getFields().isEmpty() || targetSchema.getFields().isEmpty()) {

Review Comment:
   if source schema fields are empty, shouldn't we be returning targetSchema?



##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##
@@ -145,6 +145,13 @@ object HoodieSparkSqlWriter {
     new HoodieSparkSqlWriterInternal().deduceWriterSchema(sourceSchema, latestTableSchemaOpt, internalSchemaOpt, opts)
   }
 
+  def deduceWriterSchema(sourceSchema: Schema,

Review Comment:
   if it's static, shouldn't we move them to object HoodieSparkSqlWriter?



##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##
@@ -624,17 +635,25 @@ class HoodieSparkSqlWriterInternal {
       } else {
         if (!shouldValidateSchemasCompatibility) {
           // if no validation is enabled, check for col drop
-          // if col drop is allowed, go ahead. if not, check for projection, so that we do not allow dropping cols
-          if (allowAutoEvolutionColumnDrop || canProject(latestTableSchema, canonicalizedSourceSchema)) {
+          if (allowAutoEvolutionColumnDrop) {
             canonicalizedSourceSchema
           } else {
-            log.error(
-              s"""Incoming batch schema is not compatible with the table's one.
-                 |Incoming schema ${sourceSchema.toString(true)}
-                 |Incoming schema (canonicalized) ${canonicalizedSourceSchema.toString(true)}
-                 |Table's schema ${latestTableSchema.toString(true)}
-                 |""".stripMargin)
-            throw new SchemaCompatibilityException("Incoming batch schema is not compatible with the table's one")
+            val reconciledSchema = if (addNullForDeletedColumns) {
+              AvroSchemaEvolutionUtils.reconcileSchema(canonicalizedSourceSchema, latestTableSchema)
+            } else {
+              canonicalizedSourceSchema
+            }
+            if (isValidEvolutionOf(reconciledSchema, latestTableSchema)) {
+              reconciledSchema
+            } else {
+              log.error(
+                s"""Incoming batch schema is not compatible with the table's one.
+                   |Incoming schema ${sourceSchema.toString(true)}
+                   |Incoming schema (canonicalized) ${reconciledSchema.toString(true)}
+                   |Table's schema ${latestTableSchema.toString(true)}
+                   |""".stripMargin)
+              throw new SchemaCompatibilityException("Incoming batch schema is not compatible with the table's one")
+            }

Review Comment:
   also, can you shed light on why we don't call AvroSchemaEvolutionUtils.reconcileSchema(canonicalizedSourceSchema, latestTableSchema) in the else block at L658?
   
   i.e. when reconcile schema is set to false and AVRO_SCHEMA_VALIDATE_ENABLE is set to true, it looks like we never call AvroSchemaEvolutionUtils.reconcileSchema(canonicalizedSourceSchema, latestTableSchema).
   
   also, curious to know what's the diff b/w AvroSchemaEvolutionUtils.r

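The branch order introduced in the `HoodieSparkSqlWriterInternal` hunk above (column drop allowed: accept the incoming schema; otherwise optionally reconcile, then validate, else fail) can be modeled in a few lines of Java. This is a minimal sketch under loud assumptions: a schema is reduced to an ordered map of column name to type name, `reconcile` and `isValidEvolutionOf` are simplified, hypothetical stand-ins rather than Hudi's `AvroSchemaEvolutionUtils.reconcileSchema` or its real validity check, and `IllegalStateException` stands in for `SchemaCompatibilityException`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the writer-schema branch in the diff above; not Hudi code.
// A "schema" here is just an ordered map of column name -> type name.
final class SchemaDeductionSketch {

  // Simplified stand-in for reconcileSchema: start from the table's columns
  // (so columns missing from the batch are kept and can be null-filled),
  // then append columns that only the incoming batch has.
  // Type changes are ignored in this model: the table's type wins.
  static Map<String, String> reconcile(Map<String, String> incoming, Map<String, String> table) {
    Map<String, String> out = new LinkedHashMap<>(table);
    incoming.forEach(out::putIfAbsent);
    return out;
  }

  // Simplified stand-in for isValidEvolutionOf: every table column must still
  // be present in the candidate with the same type.
  static boolean isValidEvolutionOf(Map<String, String> candidate, Map<String, String> table) {
    return table.entrySet().stream()
        .allMatch(e -> e.getValue().equals(candidate.get(e.getKey())));
  }

  // Mirrors the non-validating branch: column drop allowed -> accept as-is;
  // otherwise optionally reconcile, then validate or fail.
  static Map<String, String> deduceWriterSchema(Map<String, String> source,
                                                Map<String, String> table,
                                                boolean allowAutoEvolutionColumnDrop,
                                                boolean addNullForDeletedColumns) {
    if (allowAutoEvolutionColumnDrop) {
      return source;
    }
    Map<String, String> reconciled =
        addNullForDeletedColumns ? reconcile(source, table) : source;
    if (isValidEvolutionOf(reconciled, table)) {
      return reconciled;
    }
    throw new IllegalStateException("Incoming batch schema is not compatible with the table's one");
  }

  public static void main(String[] args) {
    Map<String, String> table = new LinkedHashMap<>();
    table.put("id", "int");
    table.put("name", "string");
    Map<String, String> batch = new LinkedHashMap<>();
    batch.put("id", "int");
    batch.put("ts", "long");
    // Reconciliation keeps "name" from the table and adds the new "ts" column.
    System.out.println(deduceWriterSchema(batch, table, false, true));
    // Without reconciliation the batch lacks "name", so validation fails.
    try {
      deduceWriterSchema(batch, table, false, false);
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

In this model, a batch that omits a table column is rejected unless `addNullForDeletedColumns` is true, which is the behavior the reviewer's question about the other else branch is probing.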
Re: [PR] [HUDI-6806] Support Spark 3.5.0 [hudi]

2023-11-01 Thread via GitHub


hudi-bot commented on PR #9717:
URL: https://github.com/apache/hudi/pull/9717#issuecomment-1789837854

   
   ## CI report:
   
   * 9b8fdd2d1b69da528069e364790b53af1d6150af UNKNOWN
   * a972c66ddd8790ae58f6a70b826fba8b858b40a3 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20621)
 
   * dbb4fd378f88372271167c2b89dc709b87c16e57 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20622)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6806] Support Spark 3.5.0 [hudi]

2023-11-01 Thread via GitHub


hudi-bot commented on PR #9717:
URL: https://github.com/apache/hudi/pull/9717#issuecomment-1789831157

   
   ## CI report:
   
   * 9b8fdd2d1b69da528069e364790b53af1d6150af UNKNOWN
   * a972c66ddd8790ae58f6a70b826fba8b858b40a3 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20621)
 
   * dbb4fd378f88372271167c2b89dc709b87c16e57 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6806] Support Spark 3.5.0 [hudi]

2023-11-01 Thread via GitHub


hudi-bot commented on PR #9717:
URL: https://github.com/apache/hudi/pull/9717#issuecomment-1789823937

   
   ## CI report:
   
   * 9b8fdd2d1b69da528069e364790b53af1d6150af UNKNOWN
   * 8b54417859f80e09867bd26beca7cc15744fe192 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20620)
 
   * a972c66ddd8790ae58f6a70b826fba8b858b40a3 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6999] Adding row writer support to HoodieStreamer [hudi]

2023-11-01 Thread via GitHub


hudi-bot commented on PR #9913:
URL: https://github.com/apache/hudi/pull/9913#issuecomment-1789769111

   
   ## CI report:
   
   * a8dd6df5fbf92a57e4c31e8d0954e0f68c17c9a2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20619)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[I] [SUPPORT] Simple Bucket Index - discrepancy between Spark and Flink [hudi]

2023-11-01 Thread via GitHub


joeytman opened a new issue, #9971:
URL: https://github.com/apache/hudi/issues/9971

   **_Tips before filing an issue_**
   
   - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
   
   - Join the mailing list to engage in conversations and get faster support at 
dev-subscr...@hudi.apache.org.
   
   - If you have triaged this as a bug, then file an 
[issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
   
   **Describe the problem you faced**
   
   Simple Bucket Index behaves differently in Spark vs Flink. Spark's Hudi writer maps records to numeric bucket IDs like `0004906`, whereas Flink's Hudi writer maps records to hexadecimal bucket IDs like `af35f7fa`. As a result, if you bootstrap a Hudi table with a batch Spark job in bulk insert mode and then set up a Flink streaming job to apply CDC updates to the table, records are mapped to entirely new bucket IDs. The old files are therefore never updated: new buckets are created as records arrive, and updates are treated as inserts, which leads to duplicates.
   
   Hudi provides Flink [Index 
Bootstrap](https://hudi.apache.org/docs/hoodie_streaming_ingestion#index-bootstrap)
 to address this issue, but it seems like an incomplete approach. As per the 
docs [here](https://hudi.apache.org/docs/indexing/#flink-based-configs) both 
BUCKET index and FLINK_STATE index are supported. The description of index 
bootstrap states that:
   > When index bootstrap is enabled, the remain records in Hudi table will be 
loaded into the Flink state at one time
   
   This seems to imply that index bootstrap only works for FLINK_STATE index 
type -- as I understand it, we should not need to read all of the records into 
Flink state in order to bootstrap state for BUCKET index, the file group for a 
record should be uniquely identified by hashing the primary key(s). 
   
   So, why do Spark and Flink have differing bucket index behavior? It seems we could avoid index bootstrap altogether if both engines used the same hash function and the same bucket naming convention.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Create a Hudi table using Spark with Simple Bucket Index via Bulk Insert 
   2. Configure a Flink pipeline to UPSERT records to the Hudi table, using the 
same settings for bucket index.
   3. Observe that entirely new file groups are created and the existing Parquet files are never updated
   
   **Expected behavior**
   
   When the Flink pipeline is configured with the same settings as the Spark pipeline, it should recognize the existing buckets and map rows to the buckets created by the Spark bootstrap.
   
   **Environment Description**
   
   * EMR version: 6.14
   
   * Hudi version : 0.14.0 (Spark), 0.13.1 (Flink) [Note: the version mismatch is because EMR ships different Hudi versions per execution engine]
   
   * Spark version : 3.4
   
   * Flink version : 1.17
   
   * Hive version : 3.1.3
   
   * Hadoop version : 3.3.3
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : No
   
   
   **Additional Context**
   
   Bootstrapped files look like:
   ```
   2023-10-27 19:50:16  454.2 MiB 
0009-a122-4ea6-8872-1ff613d133c3-0_9-247-0_20231027194527266.parquet
   ```
   
   Flink-written files look like:
   ```
   2023-11-01 14:16:06   60.7 MiB 
d181766b-8624-47dd-b025-a17da16c9296_0-4-1_20231101135648178.parquet
   ```
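
   The reporter's premise is that a bucket index needs no loaded state, because the bucket is a pure function of the record key. A minimal Java sketch of that idea follows. The hash function, the 8-digit zero-padded file-ID prefix, and every class and method name are hypothetical illustrations, not Hudi's actual implementation, which this sketch does not reproduce. It also shows why a file group named after a plain random UUID, like the Flink-written file above, cannot be matched back to a numbered bucket.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.UUID;

// Hypothetical sketch of hash-based bucket assignment; not Hudi's code.
// Assumptions: bucket = (hash of key fields, sign bit masked) mod numBuckets,
// and the bucket number is encoded as an 8-digit zero-padded file-ID prefix.
final class BucketSketch {

  // Deterministic: List.hashCode and String.hashCode are fully specified by the
  // JDK, so the same key maps to the same bucket wherever this function runs.
  static int bucketId(List<String> keyFields, int numBuckets) {
    return (keyFields.hashCode() & Integer.MAX_VALUE) % numBuckets;
  }

  // Replace the first UUID segment with the zero-padded bucket number.
  static String fileGroupId(int bucket) {
    String uuid = UUID.randomUUID().toString();
    return String.format(Locale.ROOT, "%08d", bucket) + uuid.substring(8);
  }

  // Read the bucket back from the segment before the first '-';
  // returns -1 when that segment is not purely numeric.
  static int parseBucket(String fileName) {
    int dash = fileName.indexOf('-');
    String prefix = dash < 0 ? fileName : fileName.substring(0, dash);
    boolean numeric = !prefix.isEmpty() && prefix.chars().allMatch(Character::isDigit);
    return numeric ? Integer.parseInt(prefix) : -1;
  }

  public static void main(String[] args) {
    int bucket = bucketId(Arrays.asList("order-42"), 16);
    String id = fileGroupId(bucket);
    System.out.println(bucket + " -> " + id + " -> " + parseBucket(id));
    // A file group named after a plain random UUID carries no bucket number: prints -1.
    System.out.println(parseBucket("d181766b-8624-47dd-b025-a17da16c9296"));
  }
}
```

   Two writers agree on bucket placement only if they share both the hash function and the naming convention; if either differs, one writer cannot find the other's file groups, which is consistent with the duplicate file groups described in this issue. (A random UUID segment can occasionally be all digits, so a real scheme needs a stricter convention than this sketch's prefix check.)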
   
   





Re: [PR] [HUDI-6806] Support Spark 3.5.0 [hudi]

2023-11-01 Thread via GitHub


hudi-bot commented on PR #9717:
URL: https://github.com/apache/hudi/pull/9717#issuecomment-1789646470

   
   ## CI report:
   
   * 9b8fdd2d1b69da528069e364790b53af1d6150af UNKNOWN
   * 95a05c09af41cd1f570b796f25e753982e571e97 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20577)
 
   * 8b54417859f80e09867bd26beca7cc15744fe192 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20620)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6806] Support Spark 3.5.0 [hudi]

2023-11-01 Thread via GitHub


hudi-bot commented on PR #9717:
URL: https://github.com/apache/hudi/pull/9717#issuecomment-1789634178

   
   ## CI report:
   
   * 9b8fdd2d1b69da528069e364790b53af1d6150af UNKNOWN
   * 95a05c09af41cd1f570b796f25e753982e571e97 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20577)
 
   * 8b54417859f80e09867bd26beca7cc15744fe192 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6999] Adding row writer support to HoodieStreamer [hudi]

2023-11-01 Thread via GitHub


hudi-bot commented on PR #9913:
URL: https://github.com/apache/hudi/pull/9913#issuecomment-1789544018

   
   ## CI report:
   
   * cef842213bef83d99e50a8babc2224c9904b22eb Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20536)
 
   * a8dd6df5fbf92a57e4c31e8d0954e0f68c17c9a2 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20619)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   




