[GitHub] [hudi] hudi-bot commented on pull request #8353: [MINOR] Remove unused code

2023-04-01 Thread via GitHub


hudi-bot commented on PR #8353:
URL: https://github.com/apache/hudi/pull/8353#issuecomment-1493229240

   
   ## CI report:
   
   * cc012c55987f2d62eff0d00d9d5432200167601d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16061)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] bithw1 opened a new issue, #8356: [SUPPORT]What is the final for the MOR compaction operation.

2023-04-01 Thread via GitHub


bithw1 opened a new issue, #8356:
URL: https://github.com/apache/hudi/issues/8356

   Hi,
   
   I am running the following Flink SQL job, which writes records to a Hudi 
table. I have enabled compaction by setting 
`'compaction.async.enabled'='true'`.
   
   The whole SQL is:
   
   ```
   val create_target_table_sql =
     s"""
        create table $hudi_table_name (
          uuid varchar(20) PRIMARY KEY NOT ENFORCED,  -- the primary key must be specified
          name varchar(20),
          age int,
          ts timestamp(3),
          part varchar(20)
        )
        partitioned by (part)
        with (
          'connector' = 'hudi',
          'path' = '$base_path',
          'table.type' = 'MERGE_ON_READ',

          'hoodie.datasource.write.recordkey.field' = 'uuid',
          'hoodie.datasource.write.precombine.field' = 'ts',
          'write.precombine.field' = 'ts',

          'write.tasks' = '2',
          'write.bucket_assign.tasks' = '3',

          'compaction.tasks' = '1',
          'compaction.async.enabled' = 'true',
          'compaction.schedule.enabled' = 'true',
          'compaction.trigger.strategy' = 'num_commits',
          'compaction.delta_commits' = '5'
        )
     """
   ```
   
   While the SQL job keeps writing data to Hudi, I watched the table's 
`.hoodie` directory. I noticed 
`20230402122706.compaction.requested` and `.20230402122706.compaction.inflight` 
there, but no file seems to be created when the compaction completes 
(for example, `20230402122706.compaction`). Which file (and file naming) is 
created when the compaction completes successfully?
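
   For watching the timeline while the job runs, a small sketch like the following can group the file names found in `.hoodie` by instant time and show which suffixes have appeared for each instant. It is not Hudi API: `TimelineFilesSketch` and `groupByInstant` are hypothetical names. The only naming assumption is that a timeline file is an optional leading dot, then the numeric instant time, then a dot and a suffix. It deliberately does not assume what the completed file is called.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class TimelineFilesSketch {

  // Groups timeline file names by instant time; the value is the set of
  // suffixes (everything after "<instant>.") seen for that instant.
  static Map<String, Set<String>> groupByInstant(List<String> fileNames) {
    Map<String, Set<String>> byInstant = new TreeMap<>();
    for (String raw : fileNames) {
      // Tolerate a leading dot, as in the inflight file name quoted above.
      String name = raw.startsWith(".") ? raw.substring(1) : raw;
      int dot = name.indexOf('.');
      if (dot <= 0) {
        continue; // no "<instant>." prefix
      }
      String instant = name.substring(0, dot);
      if (!instant.chars().allMatch(Character::isDigit)) {
        continue; // not an instant file, e.g. hoodie.properties
      }
      byInstant.computeIfAbsent(instant, k -> new TreeSet<>()).add(name.substring(dot + 1));
    }
    return byInstant;
  }

  public static void main(String[] args) {
    List<String> names = Arrays.asList(
        "20230402122706.compaction.requested",
        ".20230402122706.compaction.inflight",
        "hoodie.properties");
    // prints {20230402122706=[compaction.inflight, compaction.requested]}
    System.out.println(groupByInstant(names));
  }
}
```

   Feeding it a fresh directory listing after each checkpoint shows any new suffix for the same instant as soon as it appears.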
   





[GitHub] [hudi] TranHuyTiep commented on issue #8340: [SUPPORT] cannot assign instance of java.lang.invoke.SerializedLambda

2023-04-01 Thread via GitHub


TranHuyTiep commented on issue #8340:
URL: https://github.com/apache/hudi/issues/8340#issuecomment-1493224689

   > Does it work locally (not on k8s)?
   
   Yes, it works locally.
   With `spark_conf.setMaster("local[*]")` it also works on k8s, but no 
executors are created and everything runs in the single driver.





[GitHub] [hudi] hudi-bot commented on pull request #8355: [HUDI-6016] HoodieCLIUtils supports creating HoodieClient with non-default database

2023-04-01 Thread via GitHub


hudi-bot commented on PR #8355:
URL: https://github.com/apache/hudi/pull/8355#issuecomment-1493223583

   
   ## CI report:
   
   * a00ea5107f04b2ad26ea37ae5a6e7b462b0aecbf Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16063)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8355: [HUDI-6016] HoodieCLIUtils supports creating HoodieClient with non-default database

2023-04-01 Thread via GitHub


hudi-bot commented on PR #8355:
URL: https://github.com/apache/hudi/pull/8355#issuecomment-1493222731

   
   ## CI report:
   
   * a00ea5107f04b2ad26ea37ae5a6e7b462b0aecbf UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8349: [HUDI-6012] Delete base path when failed to run bootstrap procedure

2023-04-01 Thread via GitHub


hudi-bot commented on PR #8349:
URL: https://github.com/apache/hudi/pull/8349#issuecomment-1493221815

   
   ## CI report:
   
   * a1d57ff0e1f246cefa8f078c87dee970ab0f3e5e Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16059)
 
   
   



[GitHub] [hudi] DavidZ1 commented on issue #8267: [SUPPORT] Why some delta commit logs files are not converted to parquet ?

2023-04-01 Thread via GitHub


DavidZ1 commented on issue #8267:
URL: https://github.com/apache/hudi/issues/8267#issuecomment-1493220098

   We run compaction offline. Compaction starts normally, but after running 
for a while it falls far behind: the instant time being compacted is still 
from yesterday.
   
   We looked through the `FlinkCompactionConfig` parameters and have not 
found any that can be tuned.
   
   
![1680408973530](https://user-images.githubusercontent.com/30795397/229331242-30ee4d8a-2be9-46ab-baa7-36126cb33c7d.png)
   
   
![1680409026244](https://user-images.githubusercontent.com/30795397/229331245-8fb078e5-d425-4073-a7cd-35cbbb56c023.png)
   
   
   





[jira] [Updated] (HUDI-6016) HoodieCLIUtils supports creating HoodieClient with non-default database

2023-04-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-6016:
-
Labels: pull-request-available  (was: )

> HoodieCLIUtils supports creating HoodieClient with non-default database
> ---
>
> Key: HUDI-6016
> URL: https://issues.apache.org/jira/browse/HUDI-6016
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: xiaoping.huang
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] huangxiaopingRD opened a new pull request, #8355: [HUDI-6016] HoodieCLIUtils supports creating HoodieClient with non-default database

2023-04-01 Thread via GitHub


huangxiaopingRD opened a new pull request, #8355:
URL: https://github.com/apache/hudi/pull/8355

   
   
   ### Change Logs
   
   If the database is not specified, `getHoodieCatalogTable` defaults to 
looking up the table in the `default` database. If the table is not in the 
`default` database, the lookup fails with
   `Error in query: Table or view 'hudi_mor_tbl' not found in database 'default'`.
   This PR supports creating `HoodieClient` for a table in a non-default 
database.
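
   The lookup behaviour described above can be illustrated with a small, self-contained sketch. It is not the PR's code: `TableIdSketch` and `resolve` are hypothetical names, and the `db.table` identifier form and the example database name `hudi_db` are assumptions made only for illustration.

```java
import java.util.Arrays;
import java.util.List;

class TableIdSketch {

  static final String DEFAULT_DATABASE = "default";

  // Splits an identifier into [database, table]. An unqualified name falls
  // back to the "default" database, which is the behaviour that leads to the
  // "not found in database 'default'" error for tables that live elsewhere.
  static List<String> resolve(String identifier) {
    int dot = identifier.indexOf('.');
    if (dot < 0) {
      return Arrays.asList(DEFAULT_DATABASE, identifier);
    }
    return Arrays.asList(identifier.substring(0, dot), identifier.substring(dot + 1));
  }

  public static void main(String[] args) {
    System.out.println(resolve("hudi_mor_tbl"));         // prints [default, hudi_mor_tbl]
    System.out.println(resolve("hudi_db.hudi_mor_tbl")); // prints [hudi_db, hudi_mor_tbl]
  }
}
```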
   
   ### Impact
   
   no
   ### Risk level (write none, low medium or high below)
   
   none
   ### Documentation Update
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] DavidZ1 commented on issue #8267: [SUPPORT] Why some delta commit logs files are not converted to parquet ?

2023-04-01 Thread via GitHub


DavidZ1 commented on issue #8267:
URL: https://github.com/apache/hudi/issues/8267#issuecomment-1493216537

   Yes, I understand. We are testing the effect of different bucket numbers.





[jira] [Created] (HUDI-6016) HoodieCLIUtils supports creating HoodieClient with non-default database

2023-04-01 Thread xiaoping.huang (Jira)
xiaoping.huang created HUDI-6016:


 Summary: HoodieCLIUtils supports creating HoodieClient with 
non-default database
 Key: HUDI-6016
 URL: https://issues.apache.org/jira/browse/HUDI-6016
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: xiaoping.huang








[GitHub] [hudi] codope commented on a diff in pull request #8076: [HUDI-5884] Support bulk_insert for insert_overwrite and insert_overwrite_table

2023-04-01 Thread via GitHub


codope commented on code in PR #8076:
URL: https://github.com/apache/hudi/pull/8076#discussion_r1155233044


##
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/SparkSingleFileSortExecutionStrategy.java:
##
@@ -77,8 +78,11 @@ public HoodieData 
performClusteringWithRecordsAsRow(Dataset in
 // Since clustering will write to single file group using 
HoodieUnboundedCreateHandle, set max file size to a large value.
 newConfig.setValue(HoodieStorageConfig.PARQUET_MAX_FILE_SIZE, 
String.valueOf(Long.MAX_VALUE));
 
-return HoodieDatasetBulkInsertHelper.bulkInsert(inputRecords, instantTime, 
getHoodieTable(), newConfig,
-getRowPartitioner(strategyParams, schema), numOutputGroups, 
shouldPreserveHoodieMetadata);
+BulkInsertPartitioner> partitioner = 
getRowPartitioner(strategyParams, schema);
+Dataset repartitionedRecords = 
partitioner.repartitionRecords(inputRecords, numOutputGroups);

Review Comment:
   Doesn't this already happen inside 
`HoodieDatasetBulkInsertHelper.bulkInsert`?



##
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala:
##
@@ -88,17 +88,15 @@ object InsertIntoHoodieTableCommand extends Logging with 
ProvidesHoodieConfig wi
   extraOptions: Map[String, String] = Map.empty): Boolean = {
 val catalogTable = new HoodieCatalogTable(sparkSession, table)
 
-var mode = SaveMode.Append
-var isOverWriteTable = false
-var isOverWritePartition = false
-if (overwrite && partitionSpec.isEmpty) {
-  // insert overwrite table
-  mode = SaveMode.Overwrite
-  isOverWriteTable = true
+val mode = if (overwrite) {
+  SaveMode.Overwrite

Review Comment:
   I think that's a good suggestion. cc @nsivabalan @yihua 



##
hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/commit/BaseDatasetBulkCommitActionExecutor.java:
##
@@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.commit;
+
+import org.apache.hudi.DataSourceUtils;
+import org.apache.hudi.DataSourceWriteOptions;
+import org.apache.hudi.HoodieDatasetBulkInsertHelper;
+import org.apache.hudi.client.HoodieWriteResult;
+import org.apache.hudi.client.SparkRDDWriteClient;
+import org.apache.hudi.client.WriteStatus;
+import org.apache.hudi.common.data.HoodieData;
+import org.apache.hudi.common.model.WriteOperationType;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.util.CommitUtils;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.data.HoodieJavaRDD;
+import org.apache.hudi.exception.HoodieException;
+import 
org.apache.hudi.execution.bulkinsert.BulkInsertInternalPartitionerWithRowsFactory;
+import org.apache.hudi.execution.bulkinsert.NonSortPartitionerWithRows;
+import org.apache.hudi.table.BulkInsertPartitioner;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.HoodieWriteMetadata;
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+
+import java.io.Serializable;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+
+public abstract class BaseDatasetBulkCommitActionExecutor implements 
Serializable {

Review Comment:
   Do we need this abstraction at a higher layer i.e. in `hudi-client-common`? 
And then maybe extend in hudi-spark-common for Dataset?



##
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/SparkSortAndSizeExecutionStrategy.java:
##
@@ -69,8 +70,11 @@ public HoodieData 
performClusteringWithRecordsAsRow(Dataset in
 
 newConfig.setValue(HoodieStorageConfig.PARQUET_MAX_FILE_SIZE, 
String.valueOf(getWriteConfig().getClusteringMaxBytesInGroup()));
 
-return HoodieDatasetBulkInsertHelper.bulkInsert(inputRecords, instantTime, 
getHoodieTable(), newConfig,
-getRowPartitioner(strategyParams, schema), numOutputGroups, 
shouldPreserveHoodieMetadata);
+BulkInsertPartitioner> partitioner = 

[GitHub] [hudi] bvaradar commented on pull request #7255: [HUDI-5250] use the estimate record size when estimation threshold is l…

2023-04-01 Thread via GitHub


bvaradar commented on PR #7255:
URL: https://github.com/apache/hudi/pull/7255#issuecomment-1493212690

   @honeyaya : Interested in taking this PR to completion ? 





[GitHub] [hudi] bvaradar commented on pull request #7457: [HUDI-5389] Remove Hudi Cli Duplicates Code.

2023-04-01 Thread via GitHub


bvaradar commented on PR #7457:
URL: https://github.com/apache/hudi/pull/7457#issuecomment-1493212484

   @slfan1989 : Did you accidentally close this PR ? Asking because you 
mentioned you will update the PR. Thanks





[GitHub] [hudi] bvaradar commented on pull request #7143: [HUDI-5175] Improving FileIndex load performance in PARALLELISM mode

2023-04-01 Thread via GitHub


bvaradar commented on PR #7143:
URL: https://github.com/apache/hudi/pull/7143#issuecomment-1493211715

   @zhangyue19921010 : Thanks for the PR. Please look at the question and 
comments above.





[jira] [Updated] (HUDI-5175) Improving FileIndex load performance in PARALLELISM mode

2023-04-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-5175:
-
Labels: pull-request-available  (was: )

> Improving FileIndex load performance in PARALLELISM mode
> 
>
> Key: HUDI-5175
> URL: https://issues.apache.org/jira/browse/HUDI-5175
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: index
>Reporter: Yue Zhang
>Assignee: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>






[GitHub] [hudi] bvaradar commented on a diff in pull request #7143: [HUDI-5175] Improving FileIndex load performance in PARALLELISM mode

2023-04-01 Thread via GitHub


bvaradar commented on code in PR #7143:
URL: https://github.com/apache/hudi/pull/7143#discussion_r1155230011


##
hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java:
##
@@ -229,17 +238,93 @@ protected List 
getInputFileSlices(PartitionPath partition) {
   }
 
   private Map> 
loadFileSlicesForPartitions(List partitions) {
-FileStatus[] allFiles = listPartitionPathFiles(partitions);
+Pair, Map> 
partitionFilesPair = listPartitionPathFiles(partitions);
 HoodieTimeline activeTimeline = getActiveTimeline();
 Option latestInstant = activeTimeline.lastInstant();
 
-HoodieTableFileSystemView fileSystemView =
-new HoodieTableFileSystemView(metaClient, activeTimeline, allFiles);
-
 Option queryInstant = specifiedQueryInstant.or(() -> 
latestInstant.map(HoodieInstant::getTimestamp));
 
 validate(activeTimeline, queryInstant);
 
+int parallelism = 
Integer.parseInt(String.valueOf(configProperties.getOrDefault(HoodieCommonConfig.TABLE_LOADING_PARALLELISM.key(),
+HoodieCommonConfig.TABLE_LOADING_PARALLELISM.defaultValue())));

Review Comment:
   This should be disabled by default. We should enable parallelism by 
default only after this code has had sufficient runway in use. 



##
hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java:
##
@@ -229,17 +238,93 @@ protected List 
getInputFileSlices(PartitionPath partition) {
   }
 
   private Map> 
loadFileSlicesForPartitions(List partitions) {
-FileStatus[] allFiles = listPartitionPathFiles(partitions);
+Pair, Map> 
partitionFilesPair = listPartitionPathFiles(partitions);
 HoodieTimeline activeTimeline = getActiveTimeline();
 Option latestInstant = activeTimeline.lastInstant();
 
-HoodieTableFileSystemView fileSystemView =
-new HoodieTableFileSystemView(metaClient, activeTimeline, allFiles);
-
 Option queryInstant = specifiedQueryInstant.or(() -> 
latestInstant.map(HoodieInstant::getTimestamp));
 
 validate(activeTimeline, queryInstant);
 
+int parallelism = 
Integer.parseInt(String.valueOf(configProperties.getOrDefault(HoodieCommonConfig.TABLE_LOADING_PARALLELISM.key(),
+HoodieCommonConfig.TABLE_LOADING_PARALLELISM.defaultValue())));
+
+Map> cachedAllInputFileSlices;
+long buildCacheFileSlicesLocalStart = System.currentTimeMillis();
+if (parallelism > 0 && partitions.size() > 0) {
+
+  // convert Map to Map
+  Map left = 
partitionFilesPair.getLeft().entrySet().stream().map(entry -> {
+String partitionPath = entry.getKey().toString();
+FileStatus[] statuses = entry.getValue();
+return Pair.of(partitionPath, statuses);
+  }).collect(Collectors.toMap(Pair::getKey, Pair::getValue));
+
+  Map partitionFiles = combine(left, 
partitionFilesPair.getRight());
+
+  cachedAllInputFileSlices = 
buildCacheFileSlicesLocalParallel(parallelism, partitions, partitionFiles, 
activeTimeline, queryInstant);
+} else {
+  FileStatus[] allFiles = 
combine(flatMap(partitionFilesPair.getLeft().values()), 
flatMap(partitionFilesPair.getRight().values()));
+  HoodieTableFileSystemView fileSystemView =
+  new HoodieTableFileSystemView(metaClient, activeTimeline, allFiles);
+
+  cachedAllInputFileSlices =  getCandidateFileSlices(partitions, 
queryInstant, fileSystemView);
+}
+
+long buildCacheFileSlicesLocalEnd = System.currentTimeMillis();
+LOG.info(String.format("Build cache file slices, spent: %d ms", 
buildCacheFileSlicesLocalEnd - buildCacheFileSlicesLocalStart));
+
+return cachedAllInputFileSlices;
+  }
+
+  private Map> 
buildCacheFileSlicesLocalParallel(int parallelism, List 
partitions, Map partitionFiles,
+   
 HoodieTimeline activeTimeline, Option queryInstant) {
+HashMap> res = new HashMap<>();
+parallelism = Math.max(1, Math.min(parallelism, partitionFiles.size()));
+int totalPartitions = partitionFiles.size();
+int cursor = 0;
+int step = totalPartitions / parallelism;
+
+ExecutorService pool =  Executors.newFixedThreadPool((parallelism + 1));
+ArrayList>>> 
futureList = new ArrayList<>(parallelism + 1);
+
+while (cursor + step <= totalPartitions) {

Review Comment:
   Can we use plain Java Streams (`parallelStream`) to parallelize here, 
instead of subdividing and then using parallel streams?
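
   A minimal sketch of what this suggestion could look like, under stated assumptions: `ParallelSlicesSketch`, `loadAll`, and `buildSlices` are hypothetical names, plain strings stand in for partitions and file slices, and `buildSlices` is a placeholder for the per-partition work. None of this is taken from the PR. Note that `parallelStream` uses the common fork-join pool by default, so this form does not by itself honour a configured parallelism value.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ParallelSlicesSketch {

  // Hypothetical stand-in for building the file slices of one partition.
  static List<String> buildSlices(String partition) {
    return Arrays.asList(partition + "/slice-0", partition + "/slice-1");
  }

  // Processes every partition in parallel and collects the results into one
  // map, without manually splitting the partition list into chunks.
  // Assumes the partition names are distinct (toMap rejects duplicate keys).
  static Map<String, List<String>> loadAll(List<String> partitions) {
    return partitions.parallelStream()
        .collect(Collectors.toMap(p -> p, p -> buildSlices(p)));
  }

  public static void main(String[] args) {
    Map<String, List<String>> slices = loadAll(Arrays.asList("2023/04/01", "2023/04/02"));
    // prints [2023/04/01/slice-0, 2023/04/01/slice-1]
    System.out.println(slices.get("2023/04/01"));
  }
}
```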



##
hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java:
##
@@ -229,17 +238,93 @@ protected List 
getInputFileSlices(PartitionPath partition) {
   }
 
   private Map> 
loadFileSlicesForPartitions(List partitions) {
-FileStatus[] allFiles = listPartitionPathFiles(partitions);
+Pair, Map> 
partitionFilesPair = listPartitionPathFiles(partitions);
 HoodieTimeline activeTimeline = getActiveTimeline();
 Option latestInstant = activeTimeline.lastInstant();
 
-

[GitHub] [hudi] hudi-bot commented on pull request #8354: Exclude pentaho deps from hive-exec

2023-04-01 Thread via GitHub


hudi-bot commented on PR #8354:
URL: https://github.com/apache/hudi/pull/8354#issuecomment-1493206770

   
   ## CI report:
   
   * a83de6a86d5479a97f23680724054d988ab9ae38 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16062)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8353: [MINOR] Remove unused code

2023-04-01 Thread via GitHub


hudi-bot commented on PR #8353:
URL: https://github.com/apache/hudi/pull/8353#issuecomment-1493206765

   
   ## CI report:
   
   * cc012c55987f2d62eff0d00d9d5432200167601d Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16061)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8354: Exclude pentaho deps from hive-exec

2023-04-01 Thread via GitHub


hudi-bot commented on PR #8354:
URL: https://github.com/apache/hudi/pull/8354#issuecomment-1493205886

   
   ## CI report:
   
   * a83de6a86d5479a97f23680724054d988ab9ae38 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8353: [MINOR] Remove unused code

2023-04-01 Thread via GitHub


hudi-bot commented on PR #8353:
URL: https://github.com/apache/hudi/pull/8353#issuecomment-1493205872

   
   ## CI report:
   
   * cc012c55987f2d62eff0d00d9d5432200167601d UNKNOWN
   
   



[GitHub] [hudi] codope opened a new pull request, #8354: Exclude pentaho deps from hive-exec

2023-04-01 Thread via GitHub


codope opened a new pull request, #8354:
URL: https://github.com/apache/hudi/pull/8354

   ### Change Logs
   
   The Hudi build on a new machine (or after removing `.m2`) fails with:
   ```
   [ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-remote-resources-plugin:1.5:process 
(process-resource-bundles) on project hudi-hadoop-mr: Failed to resolve 
dependencies for one or more projects in the reactor. Reason: Unable to get 
dependency information for 
org.pentaho:pentaho-aggdesigner-algorithm:jar:5.1.5-jhyde: Failed to retrieve 
POM for org.pentaho:pentaho-aggdesigner-algorithm:jar:5.1.5-jhyde: Could not 
transfer artifact org.pentaho:pentaho-aggdesigner-algorithm:pom:5.1.5-jhyde 
from/to conjars (http://conjars.org/repo): Connect to conjars.org:80 
[conjars.org/54.235.127.59] failed: Connection timed out (Connection timed out)
   ```
   Hudi does not use `pentaho`. It's a third-party dependency being pulled in 
transitively. We should exclude it from `hive-exec`. The parent `pom.xml` does 
exclude it, but `hive-exec` in the parent pom is in `provided` scope, so the 
exclusion rule is probably not inherited.
   
   ### Impact
   
   Fix build failure.
   
   ### Risk level (write none, low medium or high below)
   
   low
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] lvyanquan commented on a diff in pull request #8349: [HUDI-6012] Delete base path when failed to run bootstrap procedure

2023-04-01 Thread via GitHub


lvyanquan commented on code in PR #8349:
URL: https://github.com/apache/hudi/pull/8349#discussion_r1155228399


##
hudi-spark-datasource/hudi-spark/src/main/java/org/apache/hudi/cli/BootstrapExecutorUtils.java:
##
@@ -182,6 +182,13 @@ public void execute() throws IOException {
   checkpointCommitMetadata.put(CHECKPOINT_KEY, Config.checkpoint);
   bootstrapClient.bootstrap(Option.of(checkpointCommitMetadata));
   syncHive();

Review Comment:
   Thanks for your suggestion, addressed.






[GitHub] [hudi] huangxiaopingRD opened a new pull request, #8353: [MINOR] Remove unused code

2023-04-01 Thread via GitHub


huangxiaopingRD opened a new pull request, #8353:
URL: https://github.com/apache/hudi/pull/8353

   ### Change Logs
   
   Remove unused code
   
   ### Impact
   
   no
   
   ### Risk level (write none, low medium or high below)
   
   none
   ### Documentation Update
   
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] KnightChess commented on issue #8340: [SUPPORT] cannot assign instance of java.lang.invoke.SerializedLambda

2023-04-01 Thread via GitHub


KnightChess commented on issue #8340:
URL: https://github.com/apache/hudi/issues/8340#issuecomment-1493198904

   Does it work locally (not on k8s)?





[hudi] branch master updated (9a79a6d4631 -> 35401194599)

2023-04-01 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 9a79a6d4631 [HUDI-5649] Unify all the loggers to slf4j (#7955) (#7955)
 add 35401194599 [HUDI-5780] Refactor Deltastreamer source configs to use 
HoodieConfig/ConfigProperty (#8184)

No new revisions were added by this update.

Summary of changes:
 .../apache/hudi/common/config/ConfigGroups.java|   5 +-
 .../integ/testsuite/dag/nodes/RollbackNode.java|   3 +-
 .../helpers/DFSTestSuitePathSelector.java  |   3 +-
 .../hudi/utilities/config/CloudSourceConfig.java   |  94 
 ...ormerConfig.java => DFSPathSelectorConfig.java} |  22 ++--
 .../config/DatePartitionPathSelectorConfig.java|  64 +++
 ...viderConfig.java => GCSEventsSourceConfig.java} |  25 +++--
 ...erConfig.java => HiveIncrPullSourceConfig.java} |  20 ++--
 .../utilities/config/HoodieIncrSourceConfig.java   |  86 +++
 .../hudi/utilities/config/JdbcSourceConfig.java|  93 
 .../config/JsonKafkaPostProcessorConfig.java   |  68 
 .../hudi/utilities/config/KafkaSourceConfig.java   |  83 +++
 .../hudi/utilities/config/PulsarSourceConfig.java  |  68 
 .../config/S3EventsHoodieIncrSourceConfig.java |  70 
 .../hudi/utilities/config/S3SourceConfig.java  |  76 +
 ...TransformerConfig.java => SqlSourceConfig.java} |  19 ++--
 .../hudi/utilities/deltastreamer/DeltaSync.java|   4 +-
 .../sources/GcsEventsHoodieIncrSource.java |  25 ++---
 .../hudi/utilities/sources/GcsEventsSource.java|  16 ++-
 .../hudi/utilities/sources/HiveIncrPullSource.java |   9 +-
 .../hudi/utilities/sources/HoodieIncrSource.java   |  78 --
 .../apache/hudi/utilities/sources/JdbcSource.java  | 118 ++---
 .../hudi/utilities/sources/JsonKafkaSource.java|   3 +-
 .../apache/hudi/utilities/sources/KafkaSource.java |   3 +-
 .../hudi/utilities/sources/PulsarSource.java   |  54 ++
 .../sources/S3EventsHoodieIncrSource.java  |  72 +++--
 .../hudi/utilities/sources/S3EventsSource.java |   3 +-
 .../apache/hudi/utilities/sources/SqlSource.java   |  13 +--
 .../utilities/sources/debezium/DebeziumSource.java |   5 +-
 .../sources/helpers/CloudObjectsSelector.java  |  63 ++-
 .../sources/helpers/CloudStoreIngestionConfig.java |  46 
 .../utilities/sources/helpers/DFSPathSelector.java |  17 +--
 .../sources/helpers/DatePartitionPathSelector.java |  64 ++-
 .../helpers/IncrSourceCloudStorageHelper.java  |   8 +-
 .../sources/helpers/IncrSourceHelper.java  |  50 ++---
 .../utilities/sources/helpers/KafkaOffsetGen.java  |  90 +++-
 .../sources/helpers/S3EventsMetaSelector.java  |   3 +-
 .../sources/helpers/gcs/FilePathsFetcher.java  |  16 +--
 .../sources/helpers/gcs/GcsIngestionConfig.java|   8 +-
 .../MaxwellJsonKafkaSourcePostProcessor.java   |  40 ++-
 .../hudi/utilities/config/SourceTestConfig.java|  49 +
 .../deltastreamer/TestHoodieDeltaStreamer.java |  12 +--
 .../TestHoodieDeltaStreamerWithMultiWriter.java|  18 ++--
 .../utilities/sources/BaseTestKafkaSource.java |   6 +-
 .../utilities/sources/TestAvroKafkaSource.java |   4 +-
 .../utilities/sources/TestGcsEventsSource.java |   8 +-
 .../utilities/sources/TestJsonKafkaSource.java |  10 +-
 .../sources/TestJsonKafkaSourcePostProcessor.java  |  23 ++--
 .../utilities/sources/TestProtoKafkaSource.java|   4 +-
 .../hudi/utilities/sources/TestS3EventsSource.java |  12 +--
 .../sources/helpers/TestCloudObjectsSelector.java  |   8 +-
 .../helpers/TestDFSPathSelectorCommonMethods.java  |   8 +-
 .../helpers/TestDatePartitionPathSelector.java |  24 ++---
 .../sources/helpers/TestS3EventsMetaSelector.java  |   8 +-
 .../testutils/sources/AbstractBaseTestSource.java  |  10 +-
 .../sources/DistributedTestDataSource.java |   8 +-
 .../testutils/sources/config/SourceConfigs.java|  43 
 57 files changed, 1225 insertions(+), 637 deletions(-)
 create mode 100644 
hudi-utilities/src/main/java/org/apache/hudi/utilities/config/CloudSourceConfig.java
 copy 
hudi-utilities/src/main/java/org/apache/hudi/utilities/config/{SqlTransformerConfig.java
 => DFSPathSelectorConfig.java} (64%)
 create mode 100644 
hudi-utilities/src/main/java/org/apache/hudi/utilities/config/DatePartitionPathSelectorConfig.java
 copy 
hudi-utilities/src/main/java/org/apache/hudi/utilities/config/{FilebasedSchemaProviderConfig.java
 => GCSEventsSourceConfig.java} (58%)
 copy 
hudi-utilities/src/main/java/org/apache/hudi/utilities/config/{SqlTransformerConfig.java
 => HiveIncrPullSourceConfig.java} (63%)
 create mode 100644 
hudi-utilities/src/main/java/org/apache/hudi/utilities/config/HoodieIncrSourceConfig.java
 create mode 100644 

[GitHub] [hudi] yihua merged pull request #8184: [HUDI-5780] Refactor Deltastreamer source configs to use HoodieConfig/ConfigProperty

2023-04-01 Thread via GitHub


yihua merged PR #8184:
URL: https://github.com/apache/hudi/pull/8184





[GitHub] [hudi] hudi-bot commented on pull request #8349: [HUDI-6012] Delete base path when failed to run bootstrap procedure

2023-04-01 Thread via GitHub


hudi-bot commented on PR #8349:
URL: https://github.com/apache/hudi/pull/8349#issuecomment-1493196451

   
   ## CI report:
   
   * 1f8525dea0a58bcb174efb7bd8a42f32183d6df1 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16049)
 
   * a1d57ff0e1f246cefa8f078c87dee970ab0f3e5e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16059)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8349: [HUDI-6012] Delete base path when failed to run bootstrap procedure

2023-04-01 Thread via GitHub


hudi-bot commented on PR #8349:
URL: https://github.com/apache/hudi/pull/8349#issuecomment-1493195273

   
   ## CI report:
   
   * 1f8525dea0a58bcb174efb7bd8a42f32183d6df1 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16049)
 
   * a1d57ff0e1f246cefa8f078c87dee970ab0f3e5e UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8344: [HUDI-5968] Fix global index duplicate when update partition

2023-04-01 Thread via GitHub


hudi-bot commented on PR #8344:
URL: https://github.com/apache/hudi/pull/8344#issuecomment-1493194480

   
   ## CI report:
   
   * fa1b1525a163af85271f0dc9e0d5765ea2075044 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16058)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] lvyanquan commented on a diff in pull request #8349: [HUDI-6012] Delete base path when failed to run bootstrap procedure

2023-04-01 Thread via GitHub


lvyanquan commented on code in PR #8349:
URL: https://github.com/apache/hudi/pull/8349#discussion_r1155223998


##
hudi-spark-datasource/hudi-spark/src/main/java/org/apache/hudi/cli/BootstrapExecutorUtils.java:
##
@@ -182,6 +182,13 @@ public void execute() throws IOException {
   checkpointCommitMetadata.put(CHECKPOINT_KEY, Config.checkpoint);
   bootstrapClient.bootstrap(Option.of(checkpointCommitMetadata));
   syncHive();

Review Comment:
   Thanks for your suggestion; addressed.
   Yeah, if `syncHive` fails, we could call 
[hive_sync](https://hudi.apache.org/docs/next/procedures/#hive_sync) to avoid 
rerunning the long-running bootstrap.
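
   A minimal Java sketch of the control flow discussed here, assuming the cleanup 
is meant to cover only the bootstrap step. `BootstrapCleanupSketch`, `Step`, and 
`execute` are hypothetical names used for illustration; they are not part of 
`BootstrapExecutorUtils`:

```java
// Hypothetical sketch; none of these names come from the Hudi code base.
final class BootstrapCleanupSketch {

  /** A unit of work that may fail, standing in for bootstrap or Hive sync. */
  interface Step {
    void run() throws Exception;
  }

  /**
   * Runs bootstrap, then Hive sync. The base path is deleted only when
   * bootstrap itself fails; a Hive sync failure propagates but leaves the
   * bootstrapped table in place so that the sync can be retried on its own.
   */
  static void execute(Step bootstrap, Step syncHive, Runnable deleteBasePath) throws Exception {
    try {
      bootstrap.run();
    } catch (Exception e) {
      deleteBasePath.run(); // discard the partially written table
      throw e;
    }
    syncHive.run(); // deliberately outside the cleanup scope
  }

  public static void main(String[] args) {
    boolean[] deleted = {false};
    try {
      execute(() -> { }, () -> { throw new IllegalStateException("hive sync failed"); },
          () -> deleted[0] = true);
    } catch (Exception e) {
      System.out.println("sync failed, base path deleted: " + deleted[0]);
    }
  }
}
```

   With this shape, a failed sync can be followed by a separate `hive_sync` call 
instead of a second bootstrap run.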






[GitHub] [hudi] lvyanquan commented on a diff in pull request #8349: [HUDI-6012] Delete base path when failed to run bootstrap procedure

2023-04-01 Thread via GitHub


lvyanquan commented on code in PR #8349:
URL: https://github.com/apache/hudi/pull/8349#discussion_r1155223998


##
hudi-spark-datasource/hudi-spark/src/main/java/org/apache/hudi/cli/BootstrapExecutorUtils.java:
##
@@ -182,6 +182,13 @@ public void execute() throws IOException {
   checkpointCommitMetadata.put(CHECKPOINT_KEY, Config.checkpoint);
   bootstrapClient.bootstrap(Option.of(checkpointCommitMetadata));
   syncHive();

Review Comment:
   Thanks for your suggestion; addressed.
   Yeah, if `syncHive` fails, we could call 
[hive_sync](https://hudi.apache.org/docs/next/procedures/#hive_sync) to avoid 
rerunning the long-running bootstrap.






[GitHub] [hudi] hudi-bot commented on pull request #8344: [HUDI-5968] Fix global index duplicate when update partition

2023-04-01 Thread via GitHub


hudi-bot commented on PR #8344:
URL: https://github.com/apache/hudi/pull/8344#issuecomment-1493151752

   
   ## CI report:
   
   * 51b9969e85c03f3f5c8274782d6ac5810c760ab1 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16029)
 
   * fa1b1525a163af85271f0dc9e0d5765ea2075044 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16058)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8344: [HUDI-5968] Fix global index duplicate when update partition

2023-04-01 Thread via GitHub


hudi-bot commented on PR #8344:
URL: https://github.com/apache/hudi/pull/8344#issuecomment-1493149491

   
   ## CI report:
   
   * 51b9969e85c03f3f5c8274782d6ac5810c760ab1 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16029)
 
   * fa1b1525a163af85271f0dc9e0d5765ea2075044 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8352: [HUDI-6015] Refresh the table after executing rollback to instantTime

2023-04-01 Thread via GitHub


hudi-bot commented on PR #8352:
URL: https://github.com/apache/hudi/pull/8352#issuecomment-1493146909

   
   ## CI report:
   
   * 4b24209938c245227cd6f1ccc1b428d50f3b51a9 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16057)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] xushiyan commented on a diff in pull request #8344: [HUDI-5968] Fix global index duplicate when update partition

2023-04-01 Thread via GitHub


xushiyan commented on code in PR #8344:
URL: https://github.com/apache/hudi/pull/8344#discussion_r1155185037


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/simple/HoodieGlobalSimpleIndex.java:
##
@@ -135,8 +135,8 @@ private  HoodieData> getTaggedRecords(
   HoodieRecord deleteRecord = new HoodieAvroRecord(new 
HoodieKey(inputRecord.getRecordKey(), partitionPath), new 
EmptyHoodieRecordPayload());
   deleteRecord.setCurrentLocation(location);
   deleteRecord.seal();
-  // Tag the incoming record for inserting to the new partition
-  HoodieRecord insertRecord = (HoodieRecord) 
HoodieIndexUtils.getTaggedRecord(inputRecord, Option.empty());
+  // Tag the incoming record for inserting to the new partition; 
left unsealed for marking as dedup later
+  HoodieRecord insertRecord = (HoodieRecord) 
HoodieIndexUtils.getUnsealedTaggedRecord(inputRecord, Option.empty());

Review Comment:
   fair enough. updated






[GitHub] [hudi] hudi-bot commented on pull request #8350: [HUDI-6014] Remove unused import in hudi-spark

2023-04-01 Thread via GitHub


hudi-bot commented on PR #8350:
URL: https://github.com/apache/hudi/pull/8350#issuecomment-1493131581

   
   ## CI report:
   
   * 7348e1ee8204fe07f1ede4dc3ef569662b2294d5 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16055)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8184: [HUDI-5780] Refactor Deltastreamer source configs to use HoodieConfig/ConfigProperty

2023-04-01 Thread via GitHub


hudi-bot commented on PR #8184:
URL: https://github.com/apache/hudi/pull/8184#issuecomment-1493131516

   
   ## CI report:
   
   * 91c817cde3999d618830b882c0905b3a3af567f1 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16054)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7881: [HUDI-5723] Automate and standardize enum configs

2023-04-01 Thread via GitHub


hudi-bot commented on PR #7881:
URL: https://github.com/apache/hudi/pull/7881#issuecomment-1493112605

   
   ## CI report:
   
   * c378a74c177a2f1a924609a44f0978ee347d272a UNKNOWN
   * fac9401a3fcaa99283e204f335d7d0d38dc1b748 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16056)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8351: [HUDI-6013] Support database name for meta sync in bootstrap

2023-04-01 Thread via GitHub


hudi-bot commented on PR #8351:
URL: https://github.com/apache/hudi/pull/8351#issuecomment-1493094361

   
   ## CI report:
   
   * 62cce26c004b5dabd45271bda4141a730ddad6cb Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16052)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7881: [HUDI-5723] Automate and standardize enum configs

2023-04-01 Thread via GitHub


hudi-bot commented on PR #7881:
URL: https://github.com/apache/hudi/pull/7881#issuecomment-1493074608

   
   ## CI report:
   
   * c378a74c177a2f1a924609a44f0978ee347d272a UNKNOWN
   * 8674a4b5d21a0b6254e02a4c36e69a9056e0f2e3 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15964)
 
   * fac9401a3fcaa99283e204f335d7d0d38dc1b748 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16056)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8352: [HUDI-6015] Refresh the table after executing rollback to instantTime

2023-04-01 Thread via GitHub


hudi-bot commented on PR #8352:
URL: https://github.com/apache/hudi/pull/8352#issuecomment-1493073594

   
   ## CI report:
   
   * 4b24209938c245227cd6f1ccc1b428d50f3b51a9 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16057)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7881: [HUDI-5723] Automate and standardize enum configs

2023-04-01 Thread via GitHub


hudi-bot commented on PR #7881:
URL: https://github.com/apache/hudi/pull/7881#issuecomment-1493073405

   
   ## CI report:
   
   * c378a74c177a2f1a924609a44f0978ee347d272a UNKNOWN
   * 8674a4b5d21a0b6254e02a4c36e69a9056e0f2e3 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15964)
 
   * fac9401a3fcaa99283e204f335d7d0d38dc1b748 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8352: [HUDI-6015] Refresh the table after executing rollback to instantTime

2023-04-01 Thread via GitHub


hudi-bot commented on PR #8352:
URL: https://github.com/apache/hudi/pull/8352#issuecomment-1493071944

   
   ## CI report:
   
   * 4b24209938c245227cd6f1ccc1b428d50f3b51a9 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8349: [HUDI-6012] Delete base path when failed to run bootstrap procedure

2023-04-01 Thread via GitHub


hudi-bot commented on PR #8349:
URL: https://github.com/apache/hudi/pull/8349#issuecomment-1493071842

   
   ## CI report:
   
   * 1f8525dea0a58bcb174efb7bd8a42f32183d6df1 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16049)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Updated] (HUDI-6015) Refresh the table after executing rollback to instantTime

2023-04-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-6015:
-
Labels: pull-request-available  (was: )

> Refresh the table after executing rollback to instantTime
> -
>
> Key: HUDI-6015
> URL: https://issues.apache.org/jira/browse/HUDI-6015
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: xiaoping.huang
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] huangxiaopingRD opened a new pull request, #8352: [HUDI-6015] Refresh the table after executing rollback to instantTime

2023-04-01 Thread via GitHub


huangxiaopingRD opened a new pull request, #8352:
URL: https://github.com/apache/hudi/pull/8352

   ### Change Logs
   
   Spark caches some of the table's meta information. After 
RollbackToInstantTimeProcedure is executed on the table, that meta information 
changes and the table needs to be refreshed. Otherwise, the following error 
occurs when the data is queried again:
   
   ```
   Caused by: java.io.FileNotFoundException: File does not exist: 
hdfs://x/user/hive/warehouse/hudi_cow_nonpcf_tbl2/7a19abfb-35ab-40bb-9580-6b1af681506a-0_0-23-20_20230402002001284.parquet
   It is possible the underlying files have been updated. You can explicitly 
invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in 
SQL or by recreating the Dataset/DataFrame involved.
at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:124)
at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:187)
at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:93)
at 
org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:503)
   ```
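
   The symptom above can be imitated with a toy model in plain Java. This is only 
a sketch under stated assumptions: `StaleListingSketch` and its methods are 
hypothetical, and the class does not reproduce Spark's actual caching code. It 
only shows why a remembered file listing fails after a rollback removes a file, 
and why dropping the listing fixes the next read:

```java
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model only: it imitates the symptom, not Spark's implementation.
final class StaleListingSketch {
  private final Set<String> storage = new LinkedHashSet<>(); // files that really exist
  private List<String> cachedListing;                        // listing remembered by the session

  void addFile(String name) {
    storage.add(name);
  }

  /** Models a rollback that removes a data file without telling the session. */
  void rollback(String name) {
    storage.remove(name);
  }

  /** Models a table refresh: forget the listing so the next read re-lists storage. */
  void refresh() {
    cachedListing = null;
  }

  /** Reads every file in the (possibly stale) listing and returns how many were read. */
  int read() throws FileNotFoundException {
    if (cachedListing == null) {
      cachedListing = new ArrayList<>(storage);
    }
    for (String file : cachedListing) {
      if (!storage.contains(file)) {
        throw new FileNotFoundException("File does not exist: " + file);
      }
    }
    return cachedListing.size();
  }

  public static void main(String[] args) throws FileNotFoundException {
    StaleListingSketch table = new StaleListingSketch();
    table.addFile("f1.parquet");
    table.addFile("f2.parquet");
    table.read();                 // caches the listing of both files
    table.rollback("f2.parquet"); // storage changes behind the cache
    try {
      table.read();
    } catch (FileNotFoundException e) {
      System.out.println("stale: " + e.getMessage());
    }
    table.refresh();
    System.out.println("files after refresh: " + table.read());
  }
}
```

   In Spark itself the corresponding step is the `REFRESH TABLE tableName` command 
that the error message suggests.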
   
   ### Impact
   
   No
   ### Risk level (write none, low medium or high below)
   
   none
   ### Documentation Update
   
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Created] (HUDI-6015) Refresh the table after executing rollback to instantTime

2023-04-01 Thread xiaoping.huang (Jira)
xiaoping.huang created HUDI-6015:


 Summary: Refresh the table after executing rollback to instantTime
 Key: HUDI-6015
 URL: https://issues.apache.org/jira/browse/HUDI-6015
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: xiaoping.huang








[GitHub] [hudi] hudi-bot commented on pull request #7674: Bump mysql-connector-java from 8.0.22 to 8.0.28 in /hudi-platform-service/hudi-metaserver/hudi-metaserver-server

2023-04-01 Thread via GitHub


hudi-bot commented on PR #7674:
URL: https://github.com/apache/hudi/pull/7674#issuecomment-1493054669

   
   ## CI report:
   
   * 1a2a3dec3dc7711148db3a1ac1a9bf4b6234dff5 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16048)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8350: [HUDI-6014] Remove unused import in hudi-spark

2023-04-01 Thread via GitHub


hudi-bot commented on PR #8350:
URL: https://github.com/apache/hudi/pull/8350#issuecomment-1493043019

   
   ## CI report:
   
   * 144ee6c32ea038bc30194425dd8ef5101b24d5a1 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16053)
 
   * 7348e1ee8204fe07f1ede4dc3ef569662b2294d5 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16055)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8184: [HUDI-5780] Refactor Deltastreamer source configs to use HoodieConfig/ConfigProperty

2023-04-01 Thread via GitHub


hudi-bot commented on PR #8184:
URL: https://github.com/apache/hudi/pull/8184#issuecomment-1493042927

   
   ## CI report:
   
   * 698b05e1e29fde7cff6e450e666dd87bff00d85f Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16043)
 
   * 91c817cde3999d618830b882c0905b3a3af567f1 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16054)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8350: [HUDI-6014] Remove unused import in hudi-spark

2023-04-01 Thread via GitHub


hudi-bot commented on PR #8350:
URL: https://github.com/apache/hudi/pull/8350#issuecomment-1493041598

   
   ## CI report:
   
   * bc3780ce4f668b7455e3e787f740e3c54bc0fb8a Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16051)
 
   * 144ee6c32ea038bc30194425dd8ef5101b24d5a1 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16053)
 
   * 7348e1ee8204fe07f1ede4dc3ef569662b2294d5 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8184: [HUDI-5780] Refactor Deltastreamer source configs to use HoodieConfig/ConfigProperty

2023-04-01 Thread via GitHub


hudi-bot commented on PR #8184:
URL: https://github.com/apache/hudi/pull/8184#issuecomment-1493041484

   
   ## CI report:
   
   * 698b05e1e29fde7cff6e450e666dd87bff00d85f Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16043)
 
   * 91c817cde3999d618830b882c0905b3a3af567f1 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] huangxiaopingRD commented on pull request #8350: [HUDI-6014] Remove unused import in hudi-spark

2023-04-01 Thread via GitHub


huangxiaopingRD commented on PR #8350:
URL: https://github.com/apache/hudi/pull/8350#issuecomment-1493034132

   @hudi-bot run azure





[GitHub] [hudi] hudi-bot commented on pull request #7515: Bump gson from 2.6.2 to 2.8.9 in /packaging/hudi-cli-bundle

2023-04-01 Thread via GitHub


hudi-bot commented on PR #7515:
URL: https://github.com/apache/hudi/pull/7515#issuecomment-1493007338

   
   ## CI report:
   
   * c1b20713d7ad85c536f38407082b0e9f183325ed Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16046)
 
   
   



[GitHub] [hudi] LinMingQiang commented on a diff in pull request #8338: [HUDI-5996] Verify the consistency of bucket num at job sta…

2023-04-01 Thread via GitHub


LinMingQiang commented on code in PR #8338:
URL: https://github.com/apache/hudi/pull/8338#discussion_r1155123399


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/util/StreamerUtil.java:
##
@@ -217,18 +218,52 @@ public static HoodieTableMetaClient initTableIfNotExists(
   .setCDCEnabled(conf.getBoolean(FlinkOptions.CDC_ENABLED))
   .setCDCSupplementalLoggingMode(conf.getString(FlinkOptions.SUPPLEMENTAL_LOGGING_MODE))
   .setTimelineLayoutVersion(1)
+  .setHoodieIndexConf(OptionsResolver.getHoodieIndexConf(conf))
   .initTable(hadoopConf, basePath);
   LOG.info("Table initialized under base path {}", basePath);
   return metaClient;
 } else {
   LOG.info("Table [{}/{}] already exists, no need to initialize the table",
   basePath, conf.getString(FlinkOptions.TABLE_NAME));
-  return StreamerUtil.createMetaClient(basePath, hadoopConf);
+  HoodieTableMetaClient client = StreamerUtil.createMetaClient(basePath, hadoopConf);
+  validateTableConfig(conf, client.getTableConfig());

Review Comment:
   We only need to run this validation in `StreamWriteOperatorCoordinator#start` or 
`HoodieTableFactory#setupTableOptions`. That is enough to ensure that the job cannot be 
started if the validation fails.
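
One way to read this suggestion is as a fail-fast check that runs once at job start. The sketch below is a hypothetical illustration in plain Java, not code from the PR: the class `BucketNumValidator`, its `validate` method, and the use of plain `Map`s in place of the Flink job configuration and the Hudi table config are all assumptions. Only the option key `hoodie.bucket.index.num.buckets` is taken from table definitions posted on this list.

```java
import java.util.Map;

// Hypothetical sketch: the class and method names are illustrative, not Hudi APIs.
final class BucketNumValidator {
    // Option key as it appears in the Flink table definitions quoted on this list.
    static final String NUM_BUCKETS_KEY = "hoodie.bucket.index.num.buckets";

    /**
     * Throws if the job requests a bucket count that differs from the one
     * already recorded for the table. Does nothing if either side has no value.
     */
    static void validate(Map<String, String> jobConf, Map<String, String> tableConf) {
        String persisted = tableConf.get(NUM_BUCKETS_KEY);
        String requested = jobConf.get(NUM_BUCKETS_KEY);
        if (persisted == null || requested == null) {
            return; // nothing to compare against
        }
        if (!persisted.trim().equals(requested.trim())) {
            throw new IllegalStateException(
                "Bucket number mismatch: table has " + persisted
                    + " but the job requested " + requested);
        }
    }
}
```

Calling such a check from a single start-up hook means a mismatched job fails before any data is written, which is the behaviour the comment asks for.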






[GitHub] [hudi] chenbodeng719 commented on issue #8166: [SUPPORT] Hudi Bucket Index

2023-04-01 Thread via GitHub


chenbodeng719 commented on issue #8166:
URL: https://github.com/apache/hudi/issues/8166#issuecomment-1492990599

   @soumilshah1995  How can I use flink to do this?





[GitHub] [hudi] hudi-bot commented on pull request #8350: [HUDI-6014] Remove unused import in hudi-spark

2023-04-01 Thread via GitHub


hudi-bot commented on PR #8350:
URL: https://github.com/apache/hudi/pull/8350#issuecomment-1492989109

   
   ## CI report:
   
   * bc3780ce4f668b7455e3e787f740e3c54bc0fb8a Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16051)
 
   * 144ee6c32ea038bc30194425dd8ef5101b24d5a1 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16053)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8347: [HUDI-5173] Fix new config naming around single file group clustering

2023-04-01 Thread via GitHub


hudi-bot commented on PR #8347:
URL: https://github.com/apache/hudi/pull/8347#issuecomment-1492989100

   
   ## CI report:
   
   * 1cb0d7904acccd83eabd47b28841dd159be9711e Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16045)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8350: [HUDI-6014] Remove unused import in hudi-spark

2023-04-01 Thread via GitHub


hudi-bot commented on PR #8350:
URL: https://github.com/apache/hudi/pull/8350#issuecomment-1492976543

   
   ## CI report:
   
   * bc3780ce4f668b7455e3e787f740e3c54bc0fb8a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16051)
 
   * 144ee6c32ea038bc30194425dd8ef5101b24d5a1 UNKNOWN
   
   



[jira] [Updated] (HUDI-6014) Remove unused import in hudi-spark

2023-04-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-6014:
-
Labels: pull-request-available  (was: )

> Remove unused import in hudi-spark
> --
>
> Key: HUDI-6014
> URL: https://issues.apache.org/jira/browse/HUDI-6014
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: xiaoping.huang
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] hudi-bot commented on pull request #8351: [HUDI-6013] Support database name for meta sync in bootstrap

2023-04-01 Thread via GitHub


hudi-bot commented on PR #8351:
URL: https://github.com/apache/hudi/pull/8351#issuecomment-1492976548

   
   ## CI report:
   
   * 62cce26c004b5dabd45271bda4141a730ddad6cb Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16052)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8346: [HUDI-6004] Allow procedures to operate table which is not in the current session

2023-04-01 Thread via GitHub


hudi-bot commented on PR #8346:
URL: https://github.com/apache/hudi/pull/8346#issuecomment-1492976520

   
   ## CI report:
   
   * b12bb9dd0337f8939bec1129a5b28e520c57dabc Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16044)
 
   
   



[GitHub] [hudi] soumilshah1995 commented on issue #8166: [SUPPORT] Hudi Bucket Index

2023-04-01 Thread via GitHub


soumilshah1995 commented on issue #8166:
URL: https://github.com/apache/hudi/issues/8166#issuecomment-1492976512

   Please refer to the following video: 
   https://www.youtube.com/watch?v=lOQFUrfJFP4&t=248s 
   Hope this helps. 





[GitHub] [hudi] chenbodeng719 commented on issue #8166: [SUPPORT] Hudi Bucket Index

2023-04-01 Thread via GitHub


chenbodeng719 commented on issue #8166:
URL: https://github.com/apache/hudi/issues/8166#issuecomment-1492976098

   @KnightChess Hi, I used the configuration below to test bulk insert, but it produced 
only one parquet file. Did I miss something? I expected 5 parquet files (5 buckets). My 
dataset is about 120GB.
   ```
   
   CREATE TABLE hbase2hudi_sink(
   uid STRING PRIMARY KEY NOT ENFORCED,
   oridata STRING,
   update_time TIMESTAMP_LTZ(3)
   ) WITH (
   'table.type' = 'MERGE_ON_READ',
   'connector' = 'hudi',
   'path' = '%s',
   'write.operation' = 'bulk_insert',
   'precombine.field' = 'update_time',
   'write.tasks' = '2',
   'index.type' = 'BUCKET',
   'hoodie.bucket.index.hash.field' = 'uid',
   'hoodie.bucket.index.num.buckets' = '5'
   )
   
   ```
   https://user-images.githubusercontent.com/104059106/229291867-c6c4f9fa-1183-4adb-838b-c72684868b6f.png
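
For context on why five files might be expected, here is a minimal sketch of a generic hash-mod bucket assignment. It is an assumption made for illustration, not Hudi's confirmed implementation: the class `BucketIdSketch` and both of its methods are hypothetical, and the real index may hash the key fields differently.

```java
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: a generic hash-mod bucket assignment, not Hudi's actual code.
final class BucketIdSketch {
    /** Maps a record key to a bucket id in the range [0, numBuckets). */
    static int bucketIdFor(String recordKey, int numBuckets) {
        if (numBuckets <= 0) {
            throw new IllegalArgumentException("numBuckets must be positive");
        }
        // Clear the sign bit so the modulo result is never negative.
        return (recordKey.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }

    /** Collects the distinct bucket ids hit by a batch of record keys. */
    static Set<Integer> bucketsHit(Iterable<String> recordKeys, int numBuckets) {
        Set<Integer> ids = new TreeSet<>();
        for (String key : recordKeys) {
            ids.add(bucketIdFor(key, numBuckets));
        }
        return ids;
    }
}
```

Under a scheme like this, a data set with many distinct `uid` values would normally touch every bucket id, so one file group per bucket is a reasonable expectation. Whether Flink `bulk_insert` routes records this way is not confirmed in this thread.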
   





[GitHub] [hudi] hudi-bot commented on pull request #8351: [HUDI-6013] Support database name for meta sync in bootstrap

2023-04-01 Thread via GitHub


hudi-bot commented on PR #8351:
URL: https://github.com/apache/hudi/pull/8351#issuecomment-1492975078

   
   ## CI report:
   
   * 62cce26c004b5dabd45271bda4141a730ddad6cb UNKNOWN
   
   



[jira] [Created] (HUDI-6014) Remove unused import in hudi-spark

2023-04-01 Thread xiaoping.huang (Jira)
xiaoping.huang created HUDI-6014:


 Summary: Remove unused import in hudi-spark
 Key: HUDI-6014
 URL: https://issues.apache.org/jira/browse/HUDI-6014
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: xiaoping.huang








[jira] [Updated] (HUDI-6013) Support database name for meta sync in bootstrap

2023-04-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-6013:
-
Labels: pull-request-available  (was: )

> Support database name for meta sync in bootstrap
> 
>
> Key: HUDI-6013
> URL: https://issues.apache.org/jira/browse/HUDI-6013
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: xiaoping.huang
>Priority: Major
>  Labels: pull-request-available
>






[GitHub] [hudi] huangxiaopingRD opened a new pull request, #8351: [HUDI-6013] Support database name for meta sync in bootstrap

2023-04-01 Thread via GitHub


huangxiaopingRD opened a new pull request, #8351:
URL: https://github.com/apache/hudi/pull/8351

   ### Change Logs
   
   Support database name for meta sync in bootstrap.
   
   ### Impact
   
   no
   ### Risk level (write none, low medium or high below)
   
   none
   ### Documentation Update
   
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Created] (HUDI-6013) Support database name for meta sync in bootstrap

2023-04-01 Thread xiaoping.huang (Jira)
xiaoping.huang created HUDI-6013:


 Summary: Support database name for meta sync in bootstrap
 Key: HUDI-6013
 URL: https://issues.apache.org/jira/browse/HUDI-6013
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: xiaoping.huang








[GitHub] [hudi] chenbodeng719 commented on issue #8279: [SUPPORT]I use flink to bulk insert a mor table with bucket index. But it seems that you can not change the write.tasks when you stop insert and

2023-04-01 Thread via GitHub


chenbodeng719 commented on issue #8279:
URL: https://github.com/apache/hudi/issues/8279#issuecomment-1492970849

   I used the configuration below to test bulk insert, but it produced only one parquet 
file. Did I miss something? I expected 5 parquet files. My dataset is about 120GB.
   ```
   
   CREATE TABLE hbase2hudi_sink(
   uid STRING PRIMARY KEY NOT ENFORCED,
   oridata STRING,
   update_time TIMESTAMP_LTZ(3)
   ) WITH (
   'table.type' = 'MERGE_ON_READ',
   'connector' = 'hudi',
   'path' = '%s',
   'write.operation' = 'bulk_insert',
   'precombine.field' = 'update_time',
   'write.tasks' = '2',
   'index.type' = 'BUCKET',
   'hoodie.bucket.index.hash.field' = 'uid',
   'hoodie.bucket.index.num.buckets' = '5'
   )
   
   ```
   https://user-images.githubusercontent.com/104059106/229291867-c6c4f9fa-1183-4adb-838b-c72684868b6f.png
   





[GitHub] [hudi] huangxiaopingRD commented on a diff in pull request #8349: [HUDI-6012] Delete base path when failed to run bootstrap procedure

2023-04-01 Thread via GitHub


huangxiaopingRD commented on code in PR #8349:
URL: https://github.com/apache/hudi/pull/8349#discussion_r1155108286


##
hudi-spark-datasource/hudi-spark/src/main/java/org/apache/hudi/cli/BootstrapExecutorUtils.java:
##
@@ -182,6 +182,13 @@ public void execute() throws IOException {
   checkpointCommitMetadata.put(CHECKPOINT_KEY, Config.checkpoint);
   bootstrapClient.bootstrap(Option.of(checkpointCommitMetadata));
   syncHive();

Review Comment:
   If `syncHive` fails when `enable_hive_sync` is enabled, perhaps we can skip 
cleaning the path.






[GitHub] [hudi] huangxiaopingRD commented on a diff in pull request #8349: [HUDI-6012] Delete base path when failed to run bootstrap procedure

2023-04-01 Thread via GitHub


huangxiaopingRD commented on code in PR #8349:
URL: https://github.com/apache/hudi/pull/8349#discussion_r1155108150


##
hudi-spark-datasource/hudi-spark/src/main/java/org/apache/hudi/cli/BootstrapExecutorUtils.java:
##


Review Comment:
   If `syncHive` fails when `enable_hive_sync` is enabled, perhaps we can skip 
cleaning the path.






[GitHub] [hudi] hudi-bot commented on pull request #8350: [MINOR] Remove unused import

2023-04-01 Thread via GitHub


hudi-bot commented on PR #8350:
URL: https://github.com/apache/hudi/pull/8350#issuecomment-1492961549

   
   ## CI report:
   
   * bc3780ce4f668b7455e3e787f740e3c54bc0fb8a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16051)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8350: [MINOR] Remove unused import

2023-04-01 Thread via GitHub


hudi-bot commented on PR #8350:
URL: https://github.com/apache/hudi/pull/8350#issuecomment-1492960321

   
   ## CI report:
   
   * bc3780ce4f668b7455e3e787f740e3c54bc0fb8a UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8184: [HUDI-5780] Refactor Deltastreamer source configs to use HoodieConfig/ConfigProperty

2023-04-01 Thread via GitHub


hudi-bot commented on PR #8184:
URL: https://github.com/apache/hudi/pull/8184#issuecomment-1492958877

   
   ## CI report:
   
   * 698b05e1e29fde7cff6e450e666dd87bff00d85f Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16043)
 
   
   



[GitHub] [hudi] huangxiaopingRD opened a new pull request, #8350: [MINOR] Remove unused import

2023-04-01 Thread via GitHub


huangxiaopingRD opened a new pull request, #8350:
URL: https://github.com/apache/hudi/pull/8350

   ### Change Logs
   
   Remove unused import
   ### Impact
   
   No
   ### Risk level (write none, low medium or high below)
   
   none
   ### Documentation Update
   
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Closed] (HUDI-5649) Unify all the loggers to slf4j

2023-04-01 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-5649.

Resolution: Fixed

Fixed via master branch: 9a79a6d463106dc1c579ae5bc194a2f1605980ad

> Unify all the loggers to slf4j
> --
>
> Key: HUDI-5649
> URL: https://issues.apache.org/jira/browse/HUDI-5649
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: core
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Updated] (HUDI-5649) Unify all the loggers to slf4j

2023-04-01 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-5649:
-
Fix Version/s: 0.14.0

> Unify all the loggers to slf4j
> --
>
> Key: HUDI-5649
> URL: https://issues.apache.org/jira/browse/HUDI-5649
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: core
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>






[hudi] branch master updated (9c97340e001 -> 9a79a6d4631)

2023-04-01 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 9c97340e001 Bump protobuf-java from 3.21.5 to 3.21.7 (#6871)
 add 9a79a6d4631 [HUDI-5649] Unify all the loggers to slf4j (#7955) (#7955)

No new revisions were added by this update.

Summary of changes:
 .../hudi/aws/cloudwatch/CloudWatchReporter.java|  6 ++--
 .../HoodieConfigAWSCredentialsProvider.java| 11 
 .../hudi/aws/sync/AWSGlueCatalogSyncClient.java|  8 +++---
 .../lock/DynamoDBBasedLockProvider.java| 25 
 .../hudi/cli/commands/ArchivedCommitsCommand.java  |  6 ++--
 .../hudi/cli/commands/CompactionCommand.java   |  6 ++--
 .../apache/hudi/cli/commands/MetadataCommand.java  |  6 ++--
 .../apache/hudi/cli/commands/RepairsCommand.java   |  6 ++--
 .../org/apache/hudi/cli/commands/SparkMain.java|  6 ++--
 .../org/apache/hudi/cli/commands/TableCommand.java | 11 
 .../apache/hudi/cli/commands/TimelineCommand.java  | 13 +
 .../apache/hudi/cli/utils/InputStreamConsumer.java |  8 +++---
 .../hudi/cli/utils/SparkTempViewProvider.java  |  7 +++--
 .../scala/org/apache/hudi/cli/DedupeSparkJob.scala |  7 ++---
 .../hudi/cli/commands/TestRepairsCommand.java  |  4 +--
 .../cli/testutils/ShellEvaluationResultUtil.java   |  6 ++--
 .../org/apache/hudi/async/AsyncArchiveService.java |  6 ++--
 .../org/apache/hudi/async/AsyncCleanerService.java |  6 ++--
 .../apache/hudi/async/AsyncClusteringService.java  |  6 ++--
 .../org/apache/hudi/async/AsyncCompactService.java |  6 ++--
 .../org/apache/hudi/async/HoodieAsyncService.java  |  6 ++--
 .../http/HoodieWriteCommitHttpCallbackClient.java  | 11 
 .../impl/HoodieWriteCommitHttpCallback.java|  6 ++--
 .../org/apache/hudi/client/BaseHoodieClient.java   |  6 ++--
 .../hudi/client/BaseHoodieTableServiceClient.java  |  6 ++--
 .../apache/hudi/client/BaseHoodieWriteClient.java  |  6 ++--
 .../apache/hudi/client/CompactionAdminClient.java  |  6 ++--
 .../client/HoodieTableServiceManagerClient.java|  6 ++--
 .../apache/hudi/client/HoodieTimelineArchiver.java |  6 ++--
 .../apache/hudi/client/ReplaceArchivalHelper.java  | 10 +++
 .../org/apache/hudi/client/RunsTableService.java   |  6 ++--
 .../java/org/apache/hudi/client/WriteStatus.java   |  6 ++--
 .../bootstrap/FullRecordBootstrapDataProvider.java | 10 +++
 .../selector/BootstrapRegexModeSelector.java   | 16 ++-
 .../embedded/EmbeddedTimelineServerHelper.java |  6 ++--
 .../client/embedded/EmbeddedTimelineService.java   |  6 ++--
 .../hudi/client/heartbeat/HeartbeatUtils.java  |  6 ++--
 .../client/heartbeat/HoodieHeartbeatClient.java|  6 ++--
 ...urrentFileWritesConflictResolutionStrategy.java |  7 +++--
 .../client/transaction/TransactionManager.java |  6 ++--
 .../lock/FileSystemBasedLockProvider.java  | 17 +--
 .../transaction/lock/InProcessLockProvider.java|  6 ++--
 .../hudi/client/transaction/lock/LockManager.java  |  6 ++--
 .../lock/ZookeeperBasedLockProvider.java   |  6 ++--
 .../apache/hudi/client/utils/TransactionUtils.java | 10 +++
 .../org/apache/hudi/config/HoodieIndexConfig.java  |  6 ++--
 .../org/apache/hudi/config/HoodieWriteConfig.java  |  6 ++--
 .../org/apache/hudi/index/HoodieIndexUtils.java|  6 ++--
 .../apache/hudi/index/bloom/HoodieBloomIndex.java  |  7 +++--
 .../hudi/index/bucket/HoodieBucketIndex.java   |  6 ++--
 .../hudi/index/bucket/HoodieSimpleBucketIndex.java |  7 +++--
 .../hbase/DefaultHBaseQPSResourceAllocator.java|  6 ++--
 .../org/apache/hudi/io/HoodieAppendHandle.java | 13 +
 .../org/apache/hudi/io/HoodieConcatHandle.java |  6 ++--
 .../org/apache/hudi/io/HoodieCreateHandle.java | 11 
 .../org/apache/hudi/io/HoodieKeyLookupHandle.java  |  9 +++---
 .../java/org/apache/hudi/io/HoodieMergeHandle.java |  6 ++--
 .../hudi/io/HoodieMergeHandleWithChangeLog.java| 13 +
 .../hudi/io/HoodieUnboundedCreateHandle.java   |  6 ++--
 .../java/org/apache/hudi/io/HoodieWriteHandle.java |  6 ++--
 .../metadata/HoodieBackedTableMetadataWriter.java  |  6 ++--
 .../hudi/metrics/ConsoleMetricsReporter.java   |  6 ++--
 .../org/apache/hudi/metrics/HoodieMetrics.java |  6 ++--
 .../apache/hudi/metrics/JmxMetricsReporter.java|  5 ++--
 .../main/java/org/apache/hudi/metrics/Metrics.java |  6 ++--
 .../hudi/metrics/MetricsGraphiteReporter.java  |  6 ++--
 .../hudi/metrics/MetricsReporterFactory.java   |  6 ++--
 .../cloudwatch/CloudWatchMetricsReporter.java  |  6 ++--
 .../hudi/metrics/datadog/DatadogHttpClient.java|  6 ++--
 .../hudi/metrics/datadog/DatadogReporter.java  |  6 ++--
 .../metrics/prometheus/PrometheusReporter.java |  6 ++--
 .../metrics/prometheus/PushGatewayReporter.java| 13 -
 .../java/org/apache/hudi/table/HoodieTable.java|  6 ++--
 

[GitHub] [hudi] danny0405 merged pull request #7955: [HUDI-5649] Unify all the loggers to slf4j

2023-04-01 Thread via GitHub


danny0405 merged PR #7955:
URL: https://github.com/apache/hudi/pull/7955





[GitHub] [hudi] hudi-bot commented on pull request #8231: [HUDI-5963] Release 0.13.1 prep

2023-04-01 Thread via GitHub


hudi-bot commented on PR #8231:
URL: https://github.com/apache/hudi/pull/8231#issuecomment-1492945709

   
   ## CI report:
   
   * 55838f0c1b495a57289b7f907560fa5b2c566981 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16041)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8349: [HUDI-6012] Delete base path when failed to run bootstrap procedure

2023-04-01 Thread via GitHub


hudi-bot commented on PR #8349:
URL: https://github.com/apache/hudi/pull/8349#issuecomment-1492921694

   
   ## CI report:
   
   * 1f8525dea0a58bcb174efb7bd8a42f32183d6df1 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16049)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8349: [HUDI-6012] Delete base path when failed to run bootstrap procedure

2023-04-01 Thread via GitHub


hudi-bot commented on PR #8349:
URL: https://github.com/apache/hudi/pull/8349#issuecomment-1492920494

   
   ## CI report:
   
   * 1f8525dea0a58bcb174efb7bd8a42f32183d6df1 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8345: [HUDI-6011] Fix cli show archived commits breaks for replacecommit

2023-04-01 Thread via GitHub


hudi-bot commented on PR #8345:
URL: https://github.com/apache/hudi/pull/8345#issuecomment-1492919336

   
   ## CI report:
   
   * ff23f8bdae6dc5eac49789350c1d2a79c3538949 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16040)
 
   
   



[jira] [Updated] (HUDI-6012) delete base path when failed to run bootstrap procedure

2023-04-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-6012:
-
Labels: pull-request-available  (was: )

> delete base path when failed to run bootstrap procedure
> ---
>
> Key: HUDI-6012
> URL: https://issues.apache.org/jira/browse/HUDI-6012
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: bootstrap
>Reporter: lvyanquan
>Priority: Major
>  Labels: pull-request-available
>
> The [run_bootstrap](https://hudi.apache.org/docs/next/procedures#run_bootstrap) 
> procedure is called like this: 
> {code:java}
> call run_bootstrap(table => 'test_hudi_table', table_type => 'COPY_ON_WRITE', 
> bootstrap_path => 'hdfs://ns1/hive/warehouse/hudi.db/test_hudi_table', 
> base_path => 'hdfs://ns1//tmp/hoodie/test_hudi_table', rowKey_field => 'id', 
> partition_path_field => 'dt'); {code}
> In some exceptional cases this procedure fails, for example when bootstrap_path 
> does not exist or is empty. The `base_path` in HDFS is then left behind with a 
> `.hoodie` directory.
> Though we can still rerun the bootstrap procedure and pass the `bootstrap_overwrite` 
> parameter, it's better to clean up the path that we created after a failure.





[GitHub] [hudi] lvyanquan opened a new pull request, #8349: [HUDI-6012] Delete base path when failed to run bootstrap procedure

2023-04-01 Thread via GitHub


lvyanquan opened a new pull request, #8349:
URL: https://github.com/apache/hudi/pull/8349

   ### Change Logs
   
   Deleted the `base_path` that we created when bootstrap failed.
   `base_path` is always empty when bootstrap starts, because this is checked in 
[BootstrapExecutorUtils](https://github.com/apache/hudi/blob/master/hudi-spark-datasource/hudi-spark/src/main/java/org/apache/hudi/cli/BootstrapExecutorUtils.java)#initializeTable.
   Although we can still rerun the bootstrap procedure and pass the `bootstrap_overwrite` 
parameter, it is better to clean up the path we created after the failure.
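
As a rough illustration of the cleanup pattern described above (not the code in this PR), the sketch below uses the local filesystem instead of HDFS. The names `run_with_cleanup` and `failing_bootstrap` are hypothetical, and creating a `.hoodie` directory is only a stand-in for table initialization.

```python
import shutil
import tempfile
from pathlib import Path


def run_with_cleanup(base_path: Path, bootstrap) -> None:
    """Create base_path, run bootstrap, and delete base_path if bootstrap raises."""
    # Precondition from the description: the target must be empty when we start.
    if base_path.exists() and any(base_path.iterdir()):
        raise FileExistsError(f"{base_path} is not empty")
    base_path.mkdir(parents=True, exist_ok=True)
    # Stand-in (assumption) for the metadata directory created at initialization.
    (base_path / ".hoodie").mkdir(exist_ok=True)
    try:
        bootstrap(base_path)
    except Exception:
        # Safe to remove: the path held nothing before this call.
        shutil.rmtree(base_path, ignore_errors=True)
        raise


def failing_bootstrap(path: Path) -> None:
    """Hypothetical bootstrap step that always fails."""
    raise ValueError("bootstrap_path does not exist or is empty")


if __name__ == "__main__":
    target = Path(tempfile.mkdtemp()) / "test_hudi_table"
    try:
        run_with_cleanup(target, failing_bootstrap)
    except ValueError:
        pass
    print(target.exists())
```

Re-raising after the cleanup keeps the original failure visible to the caller, so the only behavioral change is that no half-initialized directory is left behind.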

   
   ### Impact
   
   `run_bootstrap` spark procedure
   
   ### Risk level (write none, low medium or high below)
   
   low
   
   ### Documentation Update
   
   none
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Updated] (HUDI-6012) delete base path when failed to run bootstrap procedure

2023-04-01 Thread lvyanquan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lvyanquan updated HUDI-6012:

Description: 
[run_bootstrap](https://hudi.apache.org/docs/next/procedures#run_bootstrap) 
procedure is called like this 
{code:java}
call run_bootstrap(table => 'test_hudi_table', table_type => 'COPY_ON_WRITE', 
bootstrap_path => 'hdfs://ns1/hive/warehouse/hudi.db/test_hudi_table', 
base_path => 'hdfs://ns1//tmp/hoodie/test_hudi_table', rowKey_field => 'id', 
partition_path_field => 'dt'); {code}
In some exceptional cases this procedure fails, for example when bootstrap_path 
does not exist or is empty. The `base_path` in HDFS is then left behind containing 
a `.hoodie` directory.

Although we can still rerun the bootstrap procedure and pass the `bootstrap_overwrite` 
parameter, it is better to clean up the path we created after the failure.

  was:
When the `run_bootstrap` procedure fails, the `base_path` is left behind with a 
`.hoodie` directory.

Although we can still rerun the bootstrap procedure and pass the `bootstrap_overwrite` 
parameter, it is better to clean up the path we created after the failure.


> delete base path when failed to run bootstrap procedure
> ---
>
> Key: HUDI-6012
> URL: https://issues.apache.org/jira/browse/HUDI-6012
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: bootstrap
>Reporter: lvyanquan
>Priority: Major
>
> [run_bootstrap](https://hudi.apache.org/docs/next/procedures#run_bootstrap) 
> procedure is called like this 
> {code:java}
> call run_bootstrap(table => 'test_hudi_table', table_type => 'COPY_ON_WRITE', 
> bootstrap_path => 'hdfs://ns1/hive/warehouse/hudi.db/test_hudi_table', 
> base_path => 'hdfs://ns1//tmp/hoodie/test_hudi_table', rowKey_field => 'id', 
> partition_path_field => 'dt'); {code}
> In some exceptional cases this procedure fails, for example when bootstrap_path 
> does not exist or is empty. The `base_path` in HDFS is then left behind containing 
> a `.hoodie` directory.
> Although we can still rerun the bootstrap procedure and pass the `bootstrap_overwrite` 
> parameter, it is better to clean up the path we created after the failure.





[jira] [Created] (HUDI-6012) delete base path when failed to run bootstrap procedure

2023-04-01 Thread lvyanquan (Jira)
lvyanquan created HUDI-6012:
---

 Summary: delete base path when failed to run bootstrap procedure
 Key: HUDI-6012
 URL: https://issues.apache.org/jira/browse/HUDI-6012
 Project: Apache Hudi
  Issue Type: Improvement
  Components: bootstrap
Reporter: lvyanquan


When the `run_bootstrap` procedure fails, the `base_path` is left behind with a 
`.hoodie` directory.

Although we can still rerun the bootstrap procedure and pass the `bootstrap_overwrite` 
parameter, it is better to clean up the path we created after the failure.





[GitHub] [hudi] hudi-bot commented on pull request #7955: [HUDI-5649] Unify all the loggers to slf4j

2023-04-01 Thread via GitHub


hudi-bot commented on PR #7955:
URL: https://github.com/apache/hudi/pull/7955#issuecomment-1492893722

   
   ## CI report:
   
   * 7c0823ea668154194f11e562f79fa9a6ecda2bee Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16038)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] chenbodeng719 commented on issue #8279: [SUPPORT]I use flink to bulk insert a mor table with bucket index. But it seems that you can not change the write.tasks when you stop insert and

2023-04-01 Thread via GitHub


chenbodeng719 commented on issue #8279:
URL: https://github.com/apache/hudi/issues/8279#issuecomment-1492884944

   > You do not declare the index type as bucket while doing the bulk_insert.
   
   So do you mean I should change my bulk insert config as shown below?
   
   ```
   CREATE TABLE 2hudi_sink(
   uid STRING PRIMARY KEY NOT ENFORCED,
   oridata STRING,
   update_time TIMESTAMP_LTZ(3)
   ) WITH (
   'table.type' = 'MERGE_ON_READ',
   'connector' = 'hudi',
   'path' = '%s',
   'write.operation' = 'bulk_insert',
   'precombine.field' = 'update_time',
   'write.tasks' = '256',
   'hoodie.index.type' = 'BUCKET',
   'hoodie.bucket.index.hash.field' = 'uid',
   'hoodie.bucket.index.num.buckets' = '256'
   )
   
   ```
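
For readers unfamiliar with hash bucketing, here is a toy model of what options such as `hoodie.bucket.index.hash.field` and `hoodie.bucket.index.num.buckets` suggest. It is only a sketch: CRC32 is an arbitrary stable hash chosen for illustration and is an assumption, not Hudi's actual hash function, and `bucket_id` and `assign` are hypothetical names.

```python
import zlib


def bucket_id(record_key: str, num_buckets: int) -> int:
    """Map a record key to a bucket using a stable hash (CRC32, illustrative only)."""
    return zlib.crc32(record_key.encode("utf-8")) % num_buckets


def assign(keys, num_buckets):
    """Group record keys by their bucket id."""
    buckets = {}
    for key in keys:
        buckets.setdefault(bucket_id(key, num_buckets), []).append(key)
    return buckets


if __name__ == "__main__":
    uids = [f"uid-{i}" for i in range(10)]
    for bucket, members in sorted(assign(uids, 256).items()):
        print(bucket, members)
```

In this toy model the bucket of a key depends only on the key and the bucket count. Whether and how that relates to `write.tasks` in Hudi is what this thread is discussing, and the sketch does not settle it.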





[GitHub] [hudi] bithw1 opened a new issue, #8348: [SUPPORT]One question about flink hudi streaming query

2023-04-01 Thread via GitHub


bithw1 opened a new issue, #8348:
URL: https://github.com/apache/hudi/issues/8348

   Hi,
   
   I am reading at 
https://hudi.apache.org/docs/flink-quick-start-guide#streaming-query
   
   The example query is as follows:
   
   ```
   CREATE TABLE t1(
 uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED,
 name VARCHAR(10),
 age INT,
 ts TIMESTAMP(3),
 `partition` VARCHAR(20)
   )
   PARTITIONED BY (`partition`)
   WITH (
 'connector' = 'hudi',
 'path' = '${path}',
 'table.type' = 'MERGE_ON_READ',
 'read.streaming.enabled' = 'true',  -- this option enable the streaming read
 'read.start-commit' = '20210316134557', -- specifies the start commit instant time
 'read.streaming.check-interval' = '4' -- specifies the check interval for finding new source commits, default 60s.
   );
   
   -- Then query the table in stream mode
   select * from t1;
   
   ```
   I have a question about the option `read.start-commit`:
When I run the query for the first time, `read.start-commit` specifies where 
the query starts. The query then runs for a while (e.g., one day) and stops; 
during this period many new Hudi commits have been made.
   
   When I restart the query, how should I deal with the commit time? Should I 
manually specify a newer start commit (it is very likely that I don't know which 
commits the Flink query has already processed)? 
   Is there a checkpoint mechanism for `read.start-commit`?
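
To make the question concrete, the sketch below shows the general idea of checkpointing a read position. It is hypothetical and says nothing about how Hudi or Flink actually behave: `pending_instants`, `save_position`, `load_position` and the JSON state file are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


def pending_instants(timeline, start_commit, last_consumed=None):
    """Pick the instants to read next.

    Instant times such as '20210316134557' are fixed-width digit strings,
    so comparing them as strings matches chronological order.
    """
    if last_consumed is not None:
        # Resuming: read only what is strictly newer than the saved position.
        return sorted(t for t in timeline if t > last_consumed)
    # First run: fall back to the configured start commit (inclusive).
    return sorted(t for t in timeline if t >= start_commit)


def save_position(state_file: Path, instant: str) -> None:
    """Persist the last consumed instant."""
    state_file.write_text(json.dumps({"last_consumed": instant}))


def load_position(state_file: Path):
    """Return the saved instant, or None if nothing has been saved yet."""
    if not state_file.exists():
        return None
    return json.loads(state_file.read_text())["last_consumed"]


if __name__ == "__main__":
    state = Path(tempfile.mkdtemp()) / "reader_state.json"
    timeline = ["20210315120000", "20210316134557", "20210317090000"]
    first_run = pending_instants(timeline, "20210316134557", load_position(state))
    save_position(state, first_run[-1])
    timeline.append("20210318000000")  # a new commit arrives while stopped
    print(pending_instants(timeline, "20210316134557", load_position(state)))
```

With a saved position, the configured start commit only matters on the very first run. That is the behavior the question is asking whether Hudi provides.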
   
   
   
   





[GitHub] [hudi] hudi-bot commented on pull request #7674: Bump mysql-connector-java from 8.0.22 to 8.0.28 in /hudi-platform-service/hudi-metaserver/hudi-metaserver-server

2023-04-01 Thread via GitHub


hudi-bot commented on PR #7674:
URL: https://github.com/apache/hudi/pull/7674#issuecomment-1492880365

   
   ## CI report:
   
   * c00d18e74a380a8a9d6f3bba0a8cef91dd9210d2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14335)
 
   * 1a2a3dec3dc7711148db3a1ac1a9bf4b6234dff5 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16048)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7515: Bump gson from 2.6.2 to 2.8.9 in /packaging/hudi-cli-bundle

2023-04-01 Thread via GitHub


hudi-bot commented on PR #7515:
URL: https://github.com/apache/hudi/pull/7515#issuecomment-1492879107

   
   ## CI report:
   
   * b5389217bd15707ddd4de7036b439587bb1e5bfb Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15410)
 
   * c1b20713d7ad85c536f38407082b0e9f183325ed Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16046)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7674: Bump mysql-connector-java from 8.0.22 to 8.0.28 in /hudi-platform-service/hudi-metaserver/hudi-metaserver-server

2023-04-01 Thread via GitHub


hudi-bot commented on PR #7674:
URL: https://github.com/apache/hudi/pull/7674#issuecomment-1492879147

   
   ## CI report:
   
   * c00d18e74a380a8a9d6f3bba0a8cef91dd9210d2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14335)
 
   * 1a2a3dec3dc7711148db3a1ac1a9bf4b6234dff5 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8347: [HUDI-5173] Fix new config naming around single file group clustering

2023-04-01 Thread via GitHub


hudi-bot commented on PR #8347:
URL: https://github.com/apache/hudi/pull/8347#issuecomment-1492868731

   
   ## CI report:
   
   * 1cb0d7904acccd83eabd47b28841dd159be9711e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16045)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7515: Bump gson from 2.6.2 to 2.8.9 in /packaging/hudi-cli-bundle

2023-04-01 Thread via GitHub


hudi-bot commented on PR #7515:
URL: https://github.com/apache/hudi/pull/7515#issuecomment-1492868419

   
   ## CI report:
   
   * b5389217bd15707ddd4de7036b439587bb1e5bfb Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15410)
 
   * c1b20713d7ad85c536f38407082b0e9f183325ed UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[hudi] branch dependabot/maven/hudi-platform-service/hudi-metaserver/hudi-metaserver-server/mysql-mysql-connector-java-8.0.28 updated (c00d18e74a3 -> 1a2a3dec3dc)

2023-04-01 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a change to branch 
dependabot/maven/hudi-platform-service/hudi-metaserver/hudi-metaserver-server/mysql-mysql-connector-java-8.0.28
in repository https://gitbox.apache.org/repos/asf/hudi.git


omit c00d18e74a3 Bump mysql-connector-java
 add d2a3d11977d [MINOR] Add database config for flink (#7682)
 add 6cb4580defc [HUDI-4710] Fix flaky: 
TestKeyRangeLookupTree#testFileGroupLookUpManyEntriesWithSameStartValue (#7571)
 add c9bc03ed868 [HUDI-4148] Add client for Hudi table service manager 
(TSM) (#6732)
 add ec5022b4fdd [MINOR] Unify naming for record merger (#7660)
 add 27a8866a1e1 [HUDI-5433] Fix the way we deduce the pending instants for 
MDT writes (#7544)
 add 124ab5fda66 [HUDI-5488] Make sure Disrupt queue start first, then 
insert records (#7582)
 add 43d23a49a02 [HUDI-5577] Validate option catalog.path in dfs mode 
(#7698)
 add 9ee36de98ab [Minor] add missing link and fix typo (#7696)
 add f8028a400eb [HUDI-4911][HUDI-3301] Fixing 
`HoodieMetadataLogRecordReader` to avoid flushing cache for every lookup (#6782)
 add 86be8554820 [HUDI-5336] Fixing log file pattern match to ignore 
extraneous files (#7612)
 add 0e9bb024fb9 [HUDI-5559] Support CDC for flink bounded source (#7677)
 add e62b9da66b6 [HUDI-5516] Reduce memory footprint on workload with 
thousand active partitions (#7626)
 add b1552eff7af [HUDI-5384] Adding optimization rule to appropriately push 
down filters into the `HoodieFileIndex` (#7423)
 add 6f4c60e1835 [HUDI-5589] Fix Hudi config inference (#7713)
 add f0f8d618b30 [HUDI-5499] Fixing Spark SQL configs not being properly 
propagated for CTAS and other commands (#7607)
 add d03c8f9c155 [MINOR] Disable async clean in 
testCleanerDeleteReplacedDataWithArchive (#7721)
 add 6f6652a84aa [HUDI-5407][HUDI-5408] Fixing rollback in MDT to be eager 
(#7490)
 add febff4afd2a [HUDI-5417] support to read avro from non-legacy map/list 
in parquet log (#7512)
 add a70355f4457 [HUDI-5579] Fixing Kryo registration to be properly wired 
into Spark sessions (#7702)
 add 1d28d51d00c [HUDI-5596] Fix hudi-cli-bundle startup issue with gson 
(#7728)
 add 6593e8ba9d9 [minor] Fix flink 1.15 build profile (#7731)
 add c18d6153e10 [HUDI-1575] Early Conflict Detection For Multi-writer 
(#6133)
 add 811dcc591cd [MINOR] Eliminating Kryo from `hudi-integ-test-bundle` 
since it's being used w/ Spark (#7735)
 add 8917971e9f7 [HUDI-] Set class-loader in parquet data block (#7670)
 add 4f6b831ea11 [HUDI-5235] Clustering target size should larger than 
small file limit (#7232)
 add fc1831b22c3 Fixing FS `InputStream` leaks (#7741)
 add 5e3ca834366 [MINOR] Fixing `TestStructuredStreaming` test (#7736)
 add 146f39d49e5 [HUDI-5593] Fixing deadlocks due to async cleaner awaiting 
for lock while main thread is acquired the lock and awaiting for async cleaner 
to finish (#7739)
 add d439fab2421 [HUDI-3673] Clean up hbase shading dependencies (#7371)
 add 2fc20c186b7 [HUDI-5575] Adding/Fixing auto generation of record keys 
w/ hudi (#7726)
 add 25afb357df9 [HUDI-5401] Ensure user-provided hive metastore uri is set 
in HiveConf if not already set (#7543)
 add 20969c26059 [HUDI-5392] Fixing Bootstrapping flow handling of arrays 
(#7461)
 add 26b719a7fba [HUDI-2608] Support json schema in SchemaRegistryProvider 
(#7727)
 add 1769ff8fb90 [HUDI-5443] Fixing exception trying to read MOR table 
after `NestedSchemaPruning` rule has been applied (#7528)
 add a79f8093755 [HUDI-5582] Do not let users override internal metadata 
configs (#7709)
 add 65044d38fbe [HUDI-2118] Skip checking corrupt log blocks for 
transactional write file systems (#6830)
 add f49b5d34342 [HUDI-4991] Allow kafka-like configs to set truststore and 
keystore for the SchemaProvider
 add 2f22b07385f [HUDI-5276] Fix getting partition paths under relative 
paths (#7744)
 add c95abd3213f Revert "[HUDI-5575] Adding/Fixing auto generation of 
record keys w/ hudi (#7726)" (#7747)
 add 8c640b51331 [HUDI-5610] Fix hudi-cli-bundle startup conflict for spark 
3.2.0 (#7746)
 add 31b02d1798e [HUDI-5594] Add metaserver bundle validation (#7722)
 add e00ee2d88e2 [HUDI-5620] Fix metaserver bundle validation (#7749)
 add 7e35874c7ba [HUDI-5617] Rename configs for async conflict detector for 
clarity (#7750)
 add d4dcb3d1190 [HUDI-5618] Add `since version` to new configs for 0.13.0 
release (#7751)
 add 3a08bdc3f97 [HUDI-5363] Removing default value for shuffle parallelism 
configs (#7723)
 add 2e59c8e6b9d [MINOR] Correct RFC numbering (#7754)
 add 98643c63056 [HUDI-5380] Fixing change table path but table location in 
metastore … (#7445)
 add 45da30dc3ec [HUDI-5485] Add File System View API for batch listing and 
improve savepoint performance with metadata table (#7690)
 add e969a4c7848 [HUDI-5592] 

[GitHub] [hudi] yihua merged pull request #6871: Bump protobuf-java from 3.21.5 to 3.21.7

2023-04-01 Thread via GitHub


yihua merged PR #6871:
URL: https://github.com/apache/hudi/pull/6871





[hudi] branch master updated (c7397013f64 -> 9c97340e001)

2023-04-01 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from c7397013f64 [HUDI-5954] Infer cleaning policy based on clean configs 
(#8238)
 add 9c97340e001 Bump protobuf-java from 3.21.5 to 3.21.7 (#6871)

No new revisions were added by this update.

Summary of changes:
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


