date:20230601

[GitHub] [hudi] linfey90 commented on a diff in pull request #8865: [HUDI-6306] dynamic catalog parameter

2023-06-01 Thread via GitHub



linfey90 commented on code in PR #8865:
URL: https://github.com/apache/hudi/pull/8865#discussion_r1213956512


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/HadoopConfigurations.java:
##
@@ -63,6 +63,7 @@ public static org.apache.hadoop.conf.Configuration getHiveConf(Configuration con
     if (explicitDir != null) {
       hadoopConf.addResource(new Path(explicitDir, "hive-site.xml"));
     }
+    conf.toMap().forEach(hadoopConf::set);
     return hadoopConf;

Review Comment:
   We want to pass not only the hive-site.xml configuration but also the storage
configuration down to the underlying layer dynamically, because the two
configurations are not separated. The goal is for the upper-layer application
to be able to switch both the catalog and the underlying storage dynamically.
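
The effect of the added line `conf.toMap().forEach(hadoopConf::set)` can be sketched roughly as follows. This is only an illustration that uses plain `java.util.Map` objects instead of the real Flink and Hadoop `Configuration` classes; the class `ConfigOverlay`, the method `overlay`, and the sample values are hypothetical and not part of the PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the Flink -> Hadoop configuration hand-off.
final class ConfigOverlay {

  // Returns a copy of fileConf with every entry of dynamicConf applied on top,
  // so dynamically supplied keys win over values loaded from a file.
  static Map<String, String> overlay(Map<String, String> fileConf,
                                     Map<String, String> dynamicConf) {
    Map<String, String> merged = new LinkedHashMap<>(fileConf);
    dynamicConf.forEach(merged::put);
    return merged;
  }

  public static void main(String[] args) {
    // Values as they might come from hive-site.xml (made-up hosts).
    Map<String, String> fromHiveSite = new LinkedHashMap<>();
    fromHiveSite.put("hive.metastore.uris", "thrift://metastore-a:9083");

    // Values supplied dynamically by the upper-layer application.
    Map<String, String> fromJob = new LinkedHashMap<>();
    fromJob.put("hive.metastore.uris", "thrift://metastore-b:9083");
    fromJob.put("fs.defaultFS", "hdfs://cluster-b");

    System.out.println(overlay(fromHiveSite, fromJob));
  }
}
```

One consequence of copying every entry, under these assumptions, is that keys unrelated to Hive or storage are also pushed into the target configuration.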



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Closed] (HUDI-6308) add num_commits_after_last_request to flink

2023-06-01 Thread eric (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-6308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

eric closed HUDI-6308.
--
Resolution: Not A Problem

> add num_commits_after_last_request to flink
> ---
>
> Key: HUDI-6308
> URL: https://issues.apache.org/jira/browse/HUDI-6308
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: eric
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[GitHub] [hudi] hudi-bot commented on pull request #8867: [HUDI-6307] Sync TIMESTAMP_MILLIS to hive

2023-06-01 Thread via GitHub



hudi-bot commented on PR #8867:
URL: https://github.com/apache/hudi/pull/8867#issuecomment-1573191026

   
   ## CI report:
   
   * 7e24575b30fc34d8174a74a98431c6e1f42bef7c Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17569)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] eric9204 commented on a diff in pull request #8871: [HUDI-6308]add num_commits_after_last_request to flink

2023-06-01 Thread via GitHub



eric9204 commented on code in PR #8871:
URL: https://github.com/apache/hudi/pull/8871#discussion_r1213950388


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java:
##
@@ -642,6 +642,7 @@ private FlinkOptions() {
   public static final String TIME_ELAPSED = "time_elapsed";
   public static final String NUM_AND_TIME = "num_and_time";
   public static final String NUM_OR_TIME = "num_or_time";
+  public static final String NUM_COMMITS_AFTER_LAST_REQUEST = "num_commits_after_last_request";
   @AdvancedConfig
   public static final ConfigOption COMPACTION_TRIGGER_STRATEGY = ConfigOptions

Review Comment:
   Got it.




[GitHub] [hudi] eric9204 closed pull request #8871: [HUDI-6308]add num_commits_after_last_request to flink

2023-06-01 Thread via GitHub



eric9204 closed pull request #8871: [HUDI-6308]add num_commits_after_last_request to flink
URL: https://github.com/apache/hudi/pull/8871



[GitHub] [hudi] danny0405 commented on a diff in pull request #8871: [HUDI-6308]add num_commits_after_last_request to flink

2023-06-01 Thread via GitHub



danny0405 commented on code in PR #8871:
URL: https://github.com/apache/hudi/pull/8871#discussion_r1213944708


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java:
##
@@ -642,6 +642,7 @@ private FlinkOptions() {
   public static final String TIME_ELAPSED = "time_elapsed";
   public static final String NUM_AND_TIME = "num_and_time";
   public static final String NUM_OR_TIME = "num_or_time";
+  public static final String NUM_COMMITS_AFTER_LAST_REQUEST = "num_commits_after_last_request";
   @AdvancedConfig
   public static final ConfigOption COMPACTION_TRIGGER_STRATEGY = ConfigOptions

Review Comment:
   The variable `NUM_COMMITS_AFTER_LAST_REQUEST` is not used by any other code.




[GitHub] [hudi] eric9204 commented on a diff in pull request #8871: [HUDI-6308]add num_commits_after_last_request to flink

2023-06-01 Thread via GitHub



eric9204 commented on code in PR #8871:
URL: https://github.com/apache/hudi/pull/8871#discussion_r1213939610


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java:
##
@@ -642,6 +642,7 @@ private FlinkOptions() {
   public static final String TIME_ELAPSED = "time_elapsed";
   public static final String NUM_AND_TIME = "num_and_time";
   public static final String NUM_OR_TIME = "num_or_time";
+  public static final String NUM_COMMITS_AFTER_LAST_REQUEST = "num_commits_after_last_request";
   @AdvancedConfig
   public static final ConfigOption COMPACTION_TRIGGER_STRATEGY = ConfigOptions

Review Comment:
   @danny0405 Thank you for your reply. 
   
   After setting the parameter
`'compaction.trigger.strategy'='num_commits_after_last_request'`, in my test
the job generates a compaction plan after every fixed number of `deltacommits`.
   
   I don't understand the comment "The variable may not be used anywhere". Is
there a problem with this compaction trigger strategy in some scenarios?
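
To make the behavior under discussion concrete, here is a minimal sketch of how a commit-count trigger could differ depending on whether it counts from the last completed compaction or from the last compaction request. The class `CompactionTrigger`, its enum, and the parameter names are hypothetical; this is an assumption about the intended semantics, not Hudi's actual implementation.

```java
// Hypothetical model of two commit-count based compaction triggers.
final class CompactionTrigger {

  // Illustrative subset of strategies; names mirror the option values in the thread.
  enum Strategy { NUM_COMMITS, NUM_COMMITS_AFTER_LAST_REQUEST }

  // Decides whether a new compaction plan should be scheduled.
  // commitsSinceLastCompleted: delta commits after the last completed compaction.
  // commitsSinceLastRequested: delta commits after the last compaction request,
  //                            whether or not that request has been executed yet.
  static boolean shouldSchedule(Strategy strategy,
                                int commitsSinceLastCompleted,
                                int commitsSinceLastRequested,
                                int threshold) {
    switch (strategy) {
      case NUM_COMMITS:
        return commitsSinceLastCompleted >= threshold;
      case NUM_COMMITS_AFTER_LAST_REQUEST:
        return commitsSinceLastRequested >= threshold;
      default:
        throw new IllegalArgumentException("Unknown strategy: " + strategy);
    }
  }

  public static void main(String[] args) {
    // A plan was requested 2 delta commits ago but has not completed yet;
    // the last completed compaction is 7 delta commits back.
    System.out.println(shouldSchedule(Strategy.NUM_COMMITS, 7, 2, 5));
    System.out.println(shouldSchedule(Strategy.NUM_COMMITS_AFTER_LAST_REQUEST, 7, 2, 5));
  }
}
```

Under these assumptions, counting from the last request spaces plans a fixed number of delta commits apart even while an earlier plan is still pending.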




[GitHub] [hudi] SteNicholas commented on a diff in pull request #8759: Add metrics counters for compaction requested/completed events.

2023-06-01 Thread via GitHub



SteNicholas commented on code in PR #8759:
URL: https://github.com/apache/hudi/pull/8759#discussion_r1213933746


##
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/action/compact/TestHoodieCompactor.java:
##
@@ -129,6 +152,10 @@ public void testCompactionEmpty() {
   String compactionInstantTime = HoodieActiveTimeline.createNewInstantTime();
   Option plan = table.scheduleCompaction(context, compactionInstantTime, Option.empty());
   assertFalse(plan.isPresent(), "If there is nothing to compact, result will be empty");
+
+  // Verify compaction.requested, compaction.completed metrics counts.
+  assertEquals(0, getCompactionMetricCount("counter", "compaction.requested"));

Review Comment:
   @amrishlal, you could use the constant `REQUESTED_COMPACTION_EXTENSION` in
`HoodieTimeline`.




[GitHub] [hudi] SteNicholas commented on pull request #8759: Add metrics counters for compaction requested/completed events.

2023-06-01 Thread via GitHub



SteNicholas commented on PR #8759:
URL: https://github.com/apache/hudi/pull/8759#issuecomment-1573162218

   @amrishlal, could you create a JIRA ticket or issue for introducing this
metric? Also, please update the title of this pull request; otherwise it will
fail validation.



[jira] [Updated] (HUDI-6241) HIVE_SYNC_TABLE_STRATEGY in HiveSyncConfigHolder Documentation fix

2023-06-01 Thread xy (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xy updated HUDI-6241:
-
Fix Version/s: 0.14.0

> HIVE_SYNC_TABLE_STRATEGY in HiveSyncConfigHolder Documentation fix
> --
>
> Key: HUDI-6241
> URL: https://issues.apache.org/jira/browse/HUDI-6241
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: hive
>Reporter: xy
>Assignee: xy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> document: Hive table synchronization strategy. Available option: ONLY_RO, 
> ONLY_RT, ALL. 
>  
> ONLY_RO,ONLY_RT need to change to RO and RT




[GitHub] [hudi] stream2000 commented on a diff in pull request #8745: [HUDI-6182] Hive sync use state transient time to avoid losing partit…

2023-06-01 Thread via GitHub



stream2000 commented on code in PR #8745:
URL: https://github.com/apache/hudi/pull/8745#discussion_r1213873676


##
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java:
##
@@ -298,6 +298,22 @@ protected void syncHoodieTable(String tableName, boolean useRealtimeInputFormat,
     LOG.info("Sync complete for " + tableName);
   }
 
+  private boolean needToSyncAllPartitions(Option lastCommitTimeSynced) {
+    if (!lastCommitTimeSynced.isPresent()) {
+      return true;
+    }
+    if (config.getBoolean(META_SYNC_USE_STATE_TRANSIENT_TIME)) {
+      // If we use state transient time to sync partitions and the last commit time synced is before latest archive time
+      // We need to fall back to list all partitions instead of load the whole archive timeline
+      Option latestArchiveTime = syncClient.getLastArchiveTime();

Review Comment:
   Thanks for your advice. I will take a look at that PR and see whether we can
avoid introducing the config `META_SYNC_USE_STATE_TRANSIENT_TIME`.
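
The quoted hunk is cut off before the comparison itself, so the following is only a sketch of the fallback that the code comment describes. It assumes instant times are fixed-width timestamp strings that sort lexicographically; the class `PartitionSyncDecision` and its parameters are hypothetical, and `java.util.Optional` stands in for Hudi's own `Option` type.

```java
import java.util.Optional;

// Hypothetical model of the "list all partitions" fallback described in the diff comment.
final class PartitionSyncDecision {

  // Returns true when an incremental sync is not possible and all partitions
  // should be listed instead of reading the archived timeline.
  static boolean needToSyncAllPartitions(Optional<String> lastCommitTimeSynced,
                                         boolean useStateTransitionTime,
                                         Optional<String> latestArchiveTime) {
    if (!lastCommitTimeSynced.isPresent()) {
      return true; // nothing has been synced yet
    }
    if (useStateTransitionTime && latestArchiveTime.isPresent()) {
      // Assumption: the last synced commit predates the newest archived instant,
      // so part of the history to replay is no longer on the active timeline.
      return lastCommitTimeSynced.get().compareTo(latestArchiveTime.get()) < 0;
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println(needToSyncAllPartitions(
        Optional.of("20230601100000000"), true, Optional.of("20230601120000000")));
  }
}
```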




[GitHub] [hudi] nsivabalan commented on a diff in pull request #8758: [HUDI-53] Implementation of record_index - a HUDI index based on the metadata table.

2023-06-01 Thread via GitHub



nsivabalan commented on code in PR #8758:
URL: https://github.com/apache/hudi/pull/8758#discussion_r1213863164


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java:
##
@@ -351,6 +347,7 @@ public void run() {
 .filterCompletedInstants().filter(i -> i.getTimestamp().equals(instantTime)).firstInstant();
 instant = currentInstant.orElse(instant);
 // so that timeline is not reloaded very frequently
+// TODO: this does not handle the case that the commit has indeed failed. Maybe use HB detection here.

Review Comment:
   Can we file follow-up tickets for all of these, please?



##
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/SparkRDDWriteClient.java:
##
@@ -338,14 +338,20 @@ protected void initMetadataTable(Option instantTime) {
    * @param inFlightInstantTimestamp - The in-flight action responsible for the metadata table initialization
    */
   private void initializeMetadataTable(Option inFlightInstantTimestamp) {
-    if (config.isMetadataTableEnabled()) {
-      HoodieTableMetadataWriter writer = SparkHoodieBackedTableMetadataWriter.create(context.getHadoopConf().get(), config,
-          context, Option.empty(), inFlightInstantTimestamp);
-      try {
-        writer.close();
-      } catch (Exception e) {
-        throw new HoodieException("Failed to instantiate Metadata table ", e);
+    if (!config.isMetadataTableEnabled()) {
+      LOG.error("11");

Review Comment:
   Let's fix these unintended changes.



##
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/index/SparkMetadataTableRecordIndex.java:
##
@@ -0,0 +1,227 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.index;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hudi.client.WriteStatus;
+import org.apache.hudi.common.data.HoodieData;
+import org.apache.hudi.common.data.HoodiePairData;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieAvroRecord;
+import org.apache.hudi.common.model.HoodieKey;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.model.HoodieRecordGlobalLocation;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.view.HoodieTableFileSystemView;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.common.util.collection.ImmutablePair;
+import org.apache.hudi.config.HoodieIndexConfig;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.data.HoodieJavaPairRDD;
+import org.apache.hudi.data.HoodieJavaRDD;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.exception.TableNotFoundException;
+import org.apache.hudi.metadata.HoodieTableMetadata;
+import org.apache.hudi.metadata.HoodieTableMetadataUtil;
+import org.apache.hudi.metadata.MetadataPartitionType;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.function.PairFlatMapFunction;
+import org.apache.spark.sql.execution.PartitionIdPassthrough;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import scala.Tuple2;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.GREATER_THAN;
+
+/**
+ * Hoodie Index implementation backed by the record index present in the Metadata Table.
+ */
+public class SparkMetadataTableRecordIndex extends HoodieIndex {
+
+  private static final Logger LOG = LoggerFactory.getLogger(SparkMetadataTableRecordIndex.class);
+  // The index to fallback upon when record index is not initialized yet.
+  // This should be a global index like record index so that the behavior of tagging across partitions is not changed.
+  private static final

[GitHub] [hudi] danny0405 commented on a diff in pull request #8745: [HUDI-6182] Hive sync use state transient time to avoid losing partit…

2023-06-01 Thread via GitHub



danny0405 commented on code in PR #8745:
URL: https://github.com/apache/hudi/pull/8745#discussion_r1213871041


##
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java:
##
@@ -298,6 +298,22 @@ protected void syncHoodieTable(String tableName, boolean useRealtimeInputFormat,
     LOG.info("Sync complete for " + tableName);
   }
 
+  private boolean needToSyncAllPartitions(Option lastCommitTimeSynced) {
+    if (!lastCommitTimeSynced.isPresent()) {
+      return true;
+    }
+    if (config.getBoolean(META_SYNC_USE_STATE_TRANSIENT_TIME)) {
+      // If we use state transient time to sync partitions and the last commit time synced is before latest archive time
+      // We need to fall back to list all partitions instead of load the whole archive timeline
+      Option latestArchiveTime = syncClient.getLastArchiveTime();

Review Comment:
   I believe we can get rid of the config option
`META_SYNC_USE_STATE_TRANSIENT_TIME` if we keep both the start time (instant
time) and the max completion time (transition time) in the HMS.
   
   We can use the instant time to check the max version id, and the completion
time to track the real sync progress. Then we can fix the 'hollow' instants that
are missed, just as I did in https://github.com/apache/hudi/pull/8611
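
As a rough illustration of the 'hollow' instant problem, the sketch below selects instants by completion time rather than by start time. The types `SyncedInstant` and `HollowInstantSync` are hypothetical, and the short string timestamps are made up; this is one possible reading of the proposal, not the code from the referenced PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: an instant that starts early but completes late ("hollow")
// is missed if sync progress is tracked only by start (instant) time.
final class HollowInstantSync {

  static final class SyncedInstant {
    final String instantTime;    // start time of the action
    final String completionTime; // time the action transitioned to completed

    SyncedInstant(String instantTime, String completionTime) {
      this.instantTime = instantTime;
      this.completionTime = completionTime;
    }
  }

  // Selects completed instants that finished after the last synced completion time,
  // regardless of when they started.
  static List<SyncedInstant> instantsToSync(List<SyncedInstant> completed,
                                            String lastSyncedCompletionTime) {
    List<SyncedInstant> result = new ArrayList<>();
    for (SyncedInstant instant : completed) {
      if (instant.completionTime.compareTo(lastSyncedCompletionTime) > 0) {
        result.add(instant);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<SyncedInstant> completed = new ArrayList<>();
    completed.add(new SyncedInstant("001", "005")); // started first, finished last
    completed.add(new SyncedInstant("002", "003")); // already synced earlier
    for (SyncedInstant instant : instantsToSync(completed, "003")) {
      System.out.println(instant.instantTime);
    }
  }
}
```

In this example a filter on `instantTime > "002"` would return nothing, while the completion-time filter still picks up the instant that started at `001`.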




[GitHub] [hudi] hudi-bot commented on pull request #8871: [HUDI-6308]add num_commits_after_last_request to flink

2023-06-01 Thread via GitHub



hudi-bot commented on PR #8871:
URL: https://github.com/apache/hudi/pull/8871#issuecomment-1573073017

   
   ## CI report:
   
   * 88a2f49c4c02ce1ade13549ac56d6bf396411289 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17570)
 
   
   

[GitHub] [hudi] hudi-bot commented on pull request #8867: [HUDI-6307] Sync TIMESTAMP_MILLIS to hive

2023-06-01 Thread via GitHub



hudi-bot commented on PR #8867:
URL: https://github.com/apache/hudi/pull/8867#issuecomment-1573072987

   
   ## CI report:
   
   * 6a8fa73c9e31a90f6249772b5b840acf42ae1df5 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17560)
 
   * 7e24575b30fc34d8174a74a98431c6e1f42bef7c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17569)
 
   
   

[GitHub] [hudi] hudi-bot commented on pull request #8871: [HUDI-6308]add num_commits_after_last_request to flink

2023-06-01 Thread via GitHub



hudi-bot commented on PR #8871:
URL: https://github.com/apache/hudi/pull/8871#issuecomment-1573067586

   
   ## CI report:
   
   * 88a2f49c4c02ce1ade13549ac56d6bf396411289 UNKNOWN
   
   

[GitHub] [hudi] hudi-bot commented on pull request #8867: [HUDI-6307] Sync TIMESTAMP_MILLIS to hive

2023-06-01 Thread via GitHub



hudi-bot commented on PR #8867:
URL: https://github.com/apache/hudi/pull/8867#issuecomment-1573067541

   
   ## CI report:
   
   * 6a8fa73c9e31a90f6249772b5b840acf42ae1df5 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17560)
 
   * 7e24575b30fc34d8174a74a98431c6e1f42bef7c UNKNOWN
   
   

[GitHub] [hudi] danny0405 commented on a diff in pull request #8871: [HUDI-6308]add num_commits_after_last_request to flink

2023-06-01 Thread via GitHub



danny0405 commented on code in PR #8871:
URL: https://github.com/apache/hudi/pull/8871#discussion_r1213865107


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java:
##
@@ -642,6 +642,7 @@ private FlinkOptions() {
   public static final String TIME_ELAPSED = "time_elapsed";
   public static final String NUM_AND_TIME = "num_and_time";
   public static final String NUM_OR_TIME = "num_or_time";
+  public static final String NUM_COMMITS_AFTER_LAST_REQUEST = "num_commits_after_last_request";
   @AdvancedConfig
   public static final ConfigOption COMPACTION_TRIGGER_STRATEGY = ConfigOptions

Review Comment:
   The variable may not be used anywhere; maybe we should just fix the doc.




[GitHub] [hudi] danny0405 commented on issue #8855: [SUPPORT][FLINK SQL] Can not insert join result into hudi table

2023-06-01 Thread via GitHub



danny0405 commented on issue #8855:
URL: https://github.com/apache/hudi/issues/8855#issuecomment-1573064043

   Interesting, can you share the Flink checkpoint configuration params with us?



[jira] [Commented] (HUDI-6293) Make HoodieFlinkCompactor's parallelism of compact_task more reasonable.

2023-06-01 Thread Danny Chen (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17728544#comment-17728544
 ] 

Danny Chen commented on HUDI-6293:
--

Another fix for clustering: b36e7c459904860b0be086c144ba0b175961e805

> Make HoodieFlinkCompactor's  parallelism of compact_task more reasonable.
> -
>
> Key: HUDI-6293
> URL: https://issues.apache.org/jira/browse/HUDI-6293
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: eric
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
> Attachments: image-2023-05-31-16-41-02-798.png
>
>
> !image-2023-05-31-16-41-02-798.png!




[GitHub] [hudi] danny0405 merged pull request #8866: [HUDI-6293] Make HoodieClusteringJob's parallelism of clustering_task…

2023-06-01 Thread via GitHub



danny0405 merged PR #8866:
URL: https://github.com/apache/hudi/pull/8866



[hudi] branch master updated: [HUDI-6293] Make HoodieClusteringJob's parallelism of clustering_task more reasonable (#8866)

2023-06-01 Thread danny0405

This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new b36e7c45990 [HUDI-6293] Make HoodieClusteringJob's parallelism of clustering_task more reasonable (#8866)
b36e7c45990 is described below

commit b36e7c459904860b0be086c144ba0b175961e805
Author: voonhous 
AuthorDate: Fri Jun 2 10:52:04 2023 +0800

[HUDI-6293] Make HoodieClusteringJob's parallelism of clustering_task more reasonable (#8866)
---
 .../org/apache/hudi/sink/clustering/HoodieFlinkClusteringJob.java| 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/clustering/HoodieFlinkClusteringJob.java b/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/clustering/HoodieFlinkClusteringJob.java
index 633f06b0e4f..223f85defca 100644
--- a/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/clustering/HoodieFlinkClusteringJob.java
+++ b/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/clustering/HoodieFlinkClusteringJob.java
@@ -310,9 +310,12 @@ public class HoodieFlinkClusteringJob {
 
       HoodieInstant instant = HoodieTimeline.getReplaceCommitRequestedInstant(clusteringInstant.getTimestamp());
 
+      int inputGroupSize = clusteringPlan.getInputGroups().size();
+
       // get clusteringParallelism.
       int clusteringParallelism = conf.getInteger(FlinkOptions.CLUSTERING_TASKS) == -1
-          ? clusteringPlan.getInputGroups().size() : conf.getInteger(FlinkOptions.CLUSTERING_TASKS);
+          ? inputGroupSize
+          : Math.min(conf.getInteger(FlinkOptions.CLUSTERING_TASKS), inputGroupSize);
 
       // Mark instant as clustering inflight
       table.getActiveTimeline().transitionReplaceRequestedToInflight(instant, Option.empty());
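
The parallelism rule introduced by this commit can be isolated as a small pure function. The sketch below restates the ternary from the diff with plain integers; the class name `ClusteringParallelism` and the method `resolve` are hypothetical and exist only for illustration.

```java
// Hypothetical restatement of the parallelism rule from the diff above.
final class ClusteringParallelism {

  // configuredTasks: the configured clustering task count, where -1 means "not set".
  // inputGroupSize:  the number of input groups in the clustering plan.
  static int resolve(int configuredTasks, int inputGroupSize) {
    return configuredTasks == -1
        ? inputGroupSize
        : Math.min(configuredTasks, inputGroupSize);
  }

  public static void main(String[] args) {
    System.out.println(resolve(-1, 8)); // unset: one task per input group
    System.out.println(resolve(4, 8));  // configured value below the group count
    System.out.println(resolve(16, 8)); // configured value capped at the group count
  }
}
```

Capping at the number of input groups avoids starting more tasks than there are groups to process.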

[jira] [Updated] (HUDI-6258) support olap engine query mor table in table name without ro/rt suffix

2023-06-01 Thread xy (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-6258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xy updated HUDI-6258:
-
Description: when we query mor table with olap engine such as 
starrocks, doris, presto, we can get data only from rt/ro table. This is because 
hudi did not sync meta to tablename, so need fix it to support query as above 
conditions  (was: when we query mor table with olap engine such as 
starrocks, doris, presto, we can get data only from rt/ro table. This is because 
hudi did not sync meta to tablename, so need fix it to support query as above 
condition)

> support olap engine query mor table in table name without ro/rt suffix
> --
>
> Key: HUDI-6258
> URL: https://issues.apache.org/jira/browse/HUDI-6258
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: hive
>Reporter: xy
>Assignee: xy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> when we query mor table with olap engine such as starrocks, doris, presto, we 
> can get data only from rt/ro table. This is because hudi did not sync meta to 
> tablename, so need fix it to support query as above conditions




[jira] [Updated] (HUDI-6258) support olap engine query mor table in table name without ro/rt suffix

2023-06-01 Thread xy (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-6258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xy updated HUDI-6258:
-
Description: when we query mor table with olap engine such as 
starrocks, doris, presto, we can get data only from rt/ro table. This is because 
hudi did not sync meta to tablename, so need fix it to support query as above 
condition  (was: when we query mor table with olap engine such as 
starrocks, doris, we can get data only from rt/ro table. This is because hudi did 
not sync meta to tablename, so need fix it to support query as above condition)

> support olap engine query mor table in table name without ro/rt suffix
> --
>
> Key: HUDI-6258
> URL: https://issues.apache.org/jira/browse/HUDI-6258
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: hive
>Reporter: xy
>Assignee: xy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> when we query mor table with olap engine such as starrocks, doris, presto, we 
> can get data only from rt/ro table. This is because hudi did not sync meta to 
> tablename, so need fix it to support query as above condition




[GitHub] [hudi] danny0405 commented on a diff in pull request #8830: [MINOR] auto generate init client id

2023-06-01 Thread via GitHub



danny0405 commented on code in PR #8830:
URL: https://github.com/apache/hudi/pull/8830#discussion_r1213860936


##
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/configuration/TestOptionsInference.java:
##
@@ -69,6 +70,12 @@ void testSetupClientId() throws Exception {
 }
   }
 
+  @Test
+  void testAutoGenerateClient() {
+  Configuration conf = getConf();
+  OptionsInference.setupClientId(conf);
+  assertNotNull(conf.getString(FlinkOptions.WRITE_CLIENT_ID), "auto generate client failed!");
+  }

Review Comment:
   > all writer will shared the ckp_meta
   
   How could that happen, given that the last client has already sent the
heartbeat?




[GitHub] [hudi] danny0405 commented on pull request #8867: [HUDI-6307] Sync TIMESTAMP_MILLIS to hive

2023-06-01 Thread via GitHub



danny0405 commented on PR #8867:
URL: https://github.com/apache/hudi/pull/8867#issuecomment-1573057917

   Hi @satishkotha, can you help double-check this change? I need some
background on why https://github.com/apache/hudi/pull/2129 syncs only
timestamp(6) as timestamp in Hive.



[GitHub] [hudi] garyli1019 commented on a diff in pull request #8679: [DOCS] [RFC-69] Hudi 1.X

2023-06-01 Thread via GitHub



garyli1019 commented on code in PR #8679:
URL: https://github.com/apache/hudi/pull/8679#discussion_r1213858505


##
rfc/rfc-69/rfc-69.md:
##
@@ -0,0 +1,159 @@
+
+# RFC-69: Hudi 1.X
+
+## Proposers
+
+* Vinoth Chandar
+
+## Approvers
+
+*   Hudi PMC
+
+## Status
+
+Under Review
+
+## Abstract
+
+This RFC proposes an exciting and powerful re-imagination of the transactional 
database layer in Hudi to power continued innovation across the community in 
the coming years. We have 
[grown](https://git-contributor.com/?chart=contributorOverTime=apache/hudi)
 more than 6x contributors in the past few years, and this RFC serves as the 
perfect opportunity to clarify and align the community around a core vision. 
This RFC aims to serve as a starting point for this discussion, then solicit 
feedback, embrace new ideas and collaboratively build consensus towards an 
impactful Hudi 1.X vision, then distill down what constitutes the first release 
- Hudi 1.0.
+
+## **State of the Project**
+
+As many of you know, Hudi was originally created at Uber in 2016 to solve 
[large-scale data ingestion](https://www.uber.com/blog/uber-big-data-platform/) 
and [incremental data 
processing](https://www.uber.com/blog/ubers-lakehouse-architecture/) problems 
and later [donated](https://www.uber.com/blog/apache-hudi/) to the ASF. 
+Since its graduation as a top-level Apache project in 2020, the community has 
made impressive progress toward the [streaming data lake 
vision](https://hudi.apache.org/blog/2021/07/21/streaming-data-lake-platform) 
to make data lakes more real-time and efficient with incremental processing on 
top of a robust set of platform components. 
+The most recent 0.13 brought together several notable features to empower 
incremental data pipelines, including - [_RFC-51 Change Data 
Capture_](https://github.com/apache/hudi/blob/master/rfc/rfc-51/rfc-51.md), 
more advanced indexing techniques like [_consistent hash 
indexes_](https://github.com/apache/hudi/blob/master/rfc/rfc-42/rfc-42.md) and 
+novel innovations like [_early conflict 
detection_](https://github.com/apache/hudi/blob/master/rfc/rfc-56/rfc-56.md) - 
to name a few.
+
+
+
+Today, Hudi [users](https://hudi.apache.org/powered-by) are able to solve 
end-end use cases using Hudi as a data lake platform that delivers a 
significant amount of automation on top of an interoperable open storage 
format. 
+Users can ingest incrementally from files/streaming systems/databases and 
insert/update/delete that data into Hudi tables, with a wide selection of 
performant indexes. 
+Thanks to the core design choices like record-level metadata and 
incremental/CDC queries, users are able to consistently chain the ingested data 
into downstream pipelines, with the help of strong stream processing support in 
+recent years in frameworks like Apache Spark, Apache Flink and Kafka Connect. 
Hudi's table services automatically kick in across this ingested and derived 
data to manage different aspects of table bookkeeping, metadata and storage 
layout. 
+Finally, Hudi's broad support for different catalogs and wide integration 
across various query engines mean Hudi tables can also be "batch" processed 
old-school style or accessed from interactive query engines.
+
+## **Future Opportunities**
+
+We have been adding new capabilities in the 0.x release line, but we can also 
turn the core of Hudi into a more general-purpose database experience for the 
lake. As the first kid on the lakehouse block (we called it "transactional data 
lakes" or "streaming data lakes" 
+to speak the warehouse users' and data engineers' languages, respectively), we 
made some conservative choices based on the ecosystem at that time. However, 
revisiting those choices is important to see if they still hold up.
+
+*   **Deep Query Engine Integrations:** Back then, query engines like Presto, 
Spark, Flink, Trino and Hive were getting good at queries on columnar data 
files but painfully hard to integrate into. Over time, we expected clear API 
abstractions 
+around indexing/metadata/table snapshots in the parquet/orc read paths that a 
project like Hudi can tap into to easily leverage innovations like 
Velox/PrestoDB. However, most engines preferred a separate integration - 
leading to Hudi maintaining its own Spark Datasource, 
+Presto and Trino connectors. However, this now opens up the opportunity to 
fully leverage Hudi's multi-modal indexing capabilities during query planning 
and execution.
+*   **Generalized Data Model:** While Hudi supported keys, we focused on 
updating Hudi tables as if they were a key-value store, while SQL queries ran 
on top, blissfully unchanged and unaware. Back then, generalizing the support 
for 
+keys felt premature based on where the ecosystem was, which was still doing 
large batch M/R jobs. Today, more performant, advanced engines like Apache 
Spark and Apache Flink have mature extensible SQL support that can support a 
generalized, 
+relational

[GitHub] [hudi] danny0405 commented on a diff in pull request #8865: [HUDI-6306] dynamic catalog parameter

2023-06-01 Thread via GitHub



danny0405 commented on code in PR #8865:
URL: https://github.com/apache/hudi/pull/8865#discussion_r1213858024


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/HadoopConfigurations.java:
##
@@ -63,6 +63,7 @@ public static org.apache.hadoop.conf.Configuration getHiveConf(Configuration con
 if (explicitDir != null) {
   hadoopConf.addResource(new Path(explicitDir, "hive-site.xml"));
 }
+conf.toMap().forEach(hadoopConf::set);
 return hadoopConf;

Review Comment:
   It seems you want to pass the Hive config options through Flink SQL 
options. This is not recommended; we prefer to configure the Hive properties 
through the `hive-site.xml` on the classpath.




[GitHub] [hudi] c-f-cooper commented on a diff in pull request #8830: [MINOR] auto generate init client id

2023-06-01 Thread via GitHub



c-f-cooper commented on code in PR #8830:
URL: https://github.com/apache/hudi/pull/8830#discussion_r1213854825


##
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/configuration/TestOptionsInference.java:
##
@@ -69,6 +70,12 @@ void testSetupClientId() throws Exception {
 }
   }
 
+  @Test
+  void testAutoGenerateClient() {
+  Configuration conf = getConf();
+  OptionsInference.setupClientId(conf);
+  assertNotNull(conf.getString(FlinkOptions.WRITE_CLIENT_ID), "auto generate client failed!");
+  }

Review Comment:
   > It should be a bug, the client still sends a heartbeat anyway for the 
INIT_CLIENT_ID:
   > 
   > 
https://github.com/apache/hudi/blob/00d50e91abe24aba31daa2fe2806de5414f03c77/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/util/ClientIds.java#L179
   
   Maybe, but if the INIT_CLIENT_ID is empty, all writers will share the 
ckp_meta, and there is a risk of concurrent modification.




[GitHub] [hudi] nsivabalan commented on a diff in pull request #8758: [HUDI-53] Implementation of record_index - a HUDI index based on the metadata table.

2023-06-01 Thread via GitHub



nsivabalan commented on code in PR #8758:
URL: https://github.com/apache/hudi/pull/8758#discussion_r1213814148


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -111,18 +111,27 @@ public abstract class HoodieBackedTableMetadataWriter implements HoodieTableMeta
 
   public static final String METADATA_COMPACTION_TIME_SUFFIX = "001";
 
+  // Virtual keys support for metadata table. This Field is
+  // from the metadata payload schema.
+  private static final String RECORD_KEY_FIELD_NAME = HoodieMetadataPayload.KEY_FIELD_NAME;
+
+  // Average size of a record saved within the record index.
+  // Record index has a fixed size schema. This has been calculated based on experiments with default settings
+  // for block size (4MB), compression (GZ) and disabling the hudi metadata fields.

Review Comment:
   The default HFile block size on write in OSS is 1MB. Do we need to fix that? 



##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -370,11 +336,10 @@ private  boolean 
isBootstrapNeeded(Option partitionInfoList = filesPartitionAvailable ? 
listAllPartitionsFromMDT(initializationTime) : 
listAllPartitionsFromFilesystem(initializationTime);
+Map> partitionToFilesMap = 
partitionInfoList.stream()
+.map(p -> {
+  String partitionName = 
HoodieTableMetadataUtil.getPartitionIdentifier(p.getRelativePath());
+  return Pair.of(partitionName, p.getFileNameToSizeMap());
+})
+.collect(Collectors.toMap(Pair::getKey, Pair::getValue));
+
+for (MetadataPartitionType partitionType : partitionsToInit) {
+  // Find the commit timestamp to use for this partition. Each 
initialization should use its own unique commit time.
+  String commitTimeForPartition = 
generateUniqueCommitInstantTime(initializationTime);
+
+  LOG.info("Initializing MDT partition " + partitionType + " at instant " 
+ commitTimeForPartition);
+
+  Pair> fileGroupCountAndRecordsPair;
+  switch (partitionType) {
+case FILES:
+  fileGroupCountAndRecordsPair = 
initializeFilesPartition(initializationTime, partitionInfoList);
+  break;
+case BLOOM_FILTERS:
+  fileGroupCountAndRecordsPair = 
initializeBloomFiltersPartition(initializationTime, partitionToFilesMap);
+  break;
+case COLUMN_STATS:
+  fileGroupCountAndRecordsPair = 
initializeColumnStatsPartition(partitionToFilesMap);
+  break;
+case RECORD_INDEX:
+  fileGroupCountAndRecordsPair = initializeRecordIndexPartition();
+  break;
+default:
+  throw new HoodieMetadataException("Unsupported MDT partition type: " 
+ partitionType);
+  }
+
+  // Generate the file groups
+  final int fileGroupCount = fileGroupCountAndRecordsPair.getKey();
+  ValidationUtils.checkArgument(fileGroupCount > 0, "FileGroup count for 
MDT partition " + partitionType + " should be > 0");
+  initializeFileGroups(dataMetaClient, partitionType, 
commitTimeForPartition, fileGroupCount);
+
+  // Perform the commit using bulkCommit
+  HoodieData records = 
fileGroupCountAndRecordsPair.getValue();
+  bulkCommit(commitTimeForPartition, partitionType, records, 
fileGroupCount);
+  metadataMetaClient.reloadActiveTimeline();
+  
dataMetaClient.getTableConfig().setMetadataPartitionState(dataMetaClient, 
partitionType, true);
+  initMetadataReader();
 }
-initializeEnabledFileGroups(dataMetaClient, createInstantTime, 
enabledPartitionTypes);
-initialCommit(createInstantTime, enabledPartitionTypes);
-updateInitializedPartitionsInTableConfig(enabledPartitionTypes);
+
 return true;
   }
 
-  private String getInitialCommitInstantTime(HoodieTableMetaClient 
dataMetaClient) {
-// If there is no commit on the dataset yet, use the SOLO_COMMIT_TIMESTAMP 
as the instant time for initial commit
-// Otherwise, we use the timestamp of the latest completed action.
-String createInstantTime = 
dataMetaClient.getActiveTimeline().filterCompletedInstants()
-
.getReverseOrderedInstants().findFirst().map(HoodieInstant::getTimestamp).orElse(SOLO_COMMIT_TIMESTAMP);
-LOG.info("Creating a new metadata table in " + 
metadataWriteConfig.getBasePath() + " at instant " + createInstantTime);
-return createInstantTime;
+  /**
+   * Returns a unique timestamp to use for initializing a MDT partition.
+   * 
+   * Since commits are immutable, we should use unique timestamps to 
initialize each partition. For this, we will add a suffix to the given 
initializationTime
+   * until we find a unique timestamp.
+   *
+   * @param initializationTime Timestamp from dataset to use for initialization
+   * @return a unique timestamp for MDT
+   */
+  private String generateUniqueCommitInstantTime(String initializationTime) {
+// Add
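
The Javadoc above describes appending a suffix to `initializationTime` until an unused timestamp is found. A minimal sketch of that idea follows. The class name, the three-digit suffix width, the retry bound, and the `existingInstants` set are all assumptions made for illustration; they are not taken from the truncated diff and may differ from the PR's actual implementation.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; not the code from PR #8758.
class UniqueInstantSketch {

  // Appends a zero-padded numeric suffix to the base timestamp and returns
  // the first candidate that is not already present in existingInstants.
  static String generateUniqueInstant(String initializationTime, Set<String> existingInstants) {
    for (int offset = 0; offset < 1000; offset++) {
      String candidate = initializationTime + String.format("%03d", offset);
      if (!existingInstants.contains(candidate)) {
        return candidate;
      }
    }
    throw new IllegalStateException("No unique instant found for " + initializationTime);
  }

  public static void main(String[] args) {
    Set<String> taken = new HashSet<>();
    taken.add("20230601120000000");
    taken.add("20230601120000001");
    // Suffixes 000 and 001 are taken, so this prints 20230601120000002.
    System.out.println(generateUniqueInstant("20230601120000", taken));
  }
}
```

Because each MDT partition is initialized with its own call, every partition would end up with a distinct instant even when all of them start from the same `initializationTime`.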

[GitHub] [hudi] danny0405 commented on a diff in pull request #8863: [HUDI-6305] s3a parameters cannot be filtered

2023-06-01 Thread via GitHub



danny0405 commented on code in PR #8863:
URL: https://github.com/apache/hudi/pull/8863#discussion_r1213851720


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/HadoopConfigurations.java:
##
@@ -49,8 +48,7 @@ public static org.apache.hadoop.conf.Configuration getParquetConf(
*/
   public static org.apache.hadoop.conf.Configuration getHadoopConf(Configuration conf) {
 org.apache.hadoop.conf.Configuration hadoopConf = FlinkClientUtil.getHadoopConf();
-Map options = FlinkOptions.getPropertiesWithPrefix(conf.toMap(), HADOOP_PREFIX);
-options.forEach(hadoopConf::set);
+conf.toMap().forEach(hadoopConf::set);

Review Comment:
   This could be a breaking change.
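
To illustrate why removing the prefix filter might change behavior, here is a small self-contained sketch using plain `java.util` maps. The helper `withPrefix` is a hypothetical stand-in for `FlinkOptions.getPropertiesWithPrefix`; that it strips the prefix from forwarded keys, and that the prefix is `hadoop.`, are assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; does not use the Hudi or Flink APIs.
class PrefixFilterSketch {

  // Keeps only the entries whose key starts with the prefix and strips the
  // prefix from each forwarded key.
  static Map<String, String> withPrefix(Map<String, String> options, String prefix) {
    Map<String, String> result = new HashMap<>();
    for (Map.Entry<String, String> entry : options.entrySet()) {
      if (entry.getKey().startsWith(prefix)) {
        result.put(entry.getKey().substring(prefix.length()), entry.getValue());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> options = new HashMap<>();
    options.put("hadoop.fs.s3a.endpoint", "http://localhost:9000");
    options.put("path", "s3a://bucket/table");

    // Filtered: only the prefixed option is forwarded, without its prefix.
    System.out.println(withPrefix(options, "hadoop."));

    // Unfiltered: every option, including "path", would reach the Hadoop
    // configuration, and prefixed keys would keep their prefix.
    System.out.println(options.keySet());
  }
}
```

Under these assumptions, a job that relied on `hadoop.`-prefixed keys being rewritten would see different keys after the change, which is one way the change could break existing users.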




[jira] [Updated] (HUDI-6308) add num_commits_after_last_request to flink

2023-06-01 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-6308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-6308:
-
Labels: pull-request-available  (was: )

> add num_commits_after_last_request to flink
> ---
>
> Key: HUDI-6308
> URL: https://issues.apache.org/jira/browse/HUDI-6308
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: eric
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>





[GitHub] [hudi] eric9204 opened a new pull request, #8871: [HUDI-6308]add num_commits_after_last_request to flink

2023-06-01 Thread via GitHub



eric9204 opened a new pull request, #8871:
URL: https://github.com/apache/hudi/pull/8871

   ### Change Logs
   
   None
   
   ### Impact
   
   None
   
   ### Risk level (write none, low medium or high below)
   
   None
   
   ### Documentation Update
   
   None
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   



[jira] [Comment Edited] (HUDI-6258) support olap engine query mor table in table name without ro/rt suffix

2023-06-01 Thread xy (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-6258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17728532#comment-17728532
 ] 

xy edited comment on HUDI-6258 at 6/2/23 2:19 AM:
--

Fixed via master branch: 3f9c45fdfa9b03e8092db07188b76c5931475733


was (Author: xuzifu):
master commit: 3f9c45fdfa9b03e8092db07188b76c5931475733

> support olap engine query mor table in table name without ro/rt suffix
> --
>
> Key: HUDI-6258
> URL: https://issues.apache.org/jira/browse/HUDI-6258
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: hive
>Reporter: xy
>Assignee: xy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> When we query a MOR table with an OLAP engine such as StarRocks or Doris, we can get 
> data only from the rt/ro tables. This is because Hudi did not sync the metadata to 
> the table name without the suffix, so a fix is needed to support such queries.




[jira] [Resolved] (HUDI-6258) support olap engine query mor table in table name without ro/rt suffix

2023-06-01 Thread xy (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-6258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xy resolved HUDI-6258.
--

> support olap engine query mor table in table name without ro/rt suffix
> --
>
> Key: HUDI-6258
> URL: https://issues.apache.org/jira/browse/HUDI-6258
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: hive
>Reporter: xy
>Assignee: xy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> When we query a MOR table with an OLAP engine such as StarRocks or Doris, we can get 
> data only from the rt/ro tables. This is because Hudi did not sync the metadata to 
> the table name without the suffix, so a fix is needed to support such queries.




[jira] [Commented] (HUDI-6258) support olap engine query mor table in table name without ro/rt suffix

2023-06-01 Thread xy (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-6258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17728532#comment-17728532
 ] 

xy commented on HUDI-6258:
--

master commit: 3f9c45fdfa9b03e8092db07188b76c5931475733

> support olap engine query mor table in table name without ro/rt suffix
> --
>
> Key: HUDI-6258
> URL: https://issues.apache.org/jira/browse/HUDI-6258
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: hive
>Reporter: xy
>Assignee: xy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> When we query a MOR table with an OLAP engine such as StarRocks or Doris, we can get 
> data only from the rt/ro tables. This is because Hudi did not sync the metadata to 
> the table name without the suffix, so a fix is needed to support such queries.




[GitHub] [hudi] zhangyue19921010 commented on pull request #6868: [Hudi-4882] Multiple ordering fields and null value update for partial update to handle out-of-order events

2023-06-01 Thread via GitHub



zhangyue19921010 commented on PR #6868:
URL: https://github.com/apache/hudi/pull/6868#issuecomment-1573024411

   @fengjian428 would you mind rebasing onto master? Thanks.



[GitHub] [hudi] stream2000 commented on a diff in pull request #8745: [HUDI-6182] Hive sync use state transient time to avoid losing partit…

2023-06-01 Thread via GitHub



stream2000 commented on code in PR #8745:
URL: https://github.com/apache/hudi/pull/8745#discussion_r1213840107


##
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java:
##
@@ -298,6 +298,22 @@ protected void syncHoodieTable(String tableName, boolean useRealtimeInputFormat,
 LOG.info("Sync complete for " + tableName);
   }
 
+  private boolean needToSyncAllPartitions(Option lastCommitTimeSynced) {
+if (!lastCommitTimeSynced.isPresent()) {
+  return true;
+}
+if (config.getBoolean(META_SYNC_USE_STATE_TRANSIENT_TIME)) {
+  // If we use state transient time to sync partitions and the last commit time synced is before latest archive time
+  // We need to fall back to list all partitions instead of load the whole archive timeline
+  Option latestArchiveTime = syncClient.getLastArchiveTime();

Review Comment:
   Maybe we need to scan some archive logs here, at least those created after 
`lastCommitTimeSynced`. Otherwise, we need to sync all partitions every time 
we archive. 
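
The fallback rule described in the diff comments can be sketched as below. This is a simplified illustration and not the PR's code: it uses `java.util.Optional` instead of Hudi's `Option`, omits the `META_SYNC_USE_STATE_TRANSIENT_TIME` check, and assumes instant times are fixed-width timestamp strings so that lexicographic comparison matches chronological order.

```java
import java.util.Optional;

// Hypothetical illustration only; not the code from PR #8745.
class SyncDecisionSketch {

  // Returns true when an incremental sync cannot be trusted: either nothing
  // was synced before, or the last synced commit is older than the latest
  // archived instant, so part of the needed timeline is no longer active.
  static boolean needToSyncAllPartitions(Optional<String> lastCommitTimeSynced,
                                         Optional<String> latestArchiveTime) {
    if (!lastCommitTimeSynced.isPresent()) {
      return true;
    }
    return latestArchiveTime.isPresent()
        && lastCommitTimeSynced.get().compareTo(latestArchiveTime.get()) < 0;
  }

  public static void main(String[] args) {
    // Last sync happened before the latest archived instant: full sync.
    System.out.println(needToSyncAllPartitions(
        Optional.of("20230601100000000"), Optional.of("20230601110000000")));
    // Last sync happened after the latest archived instant: incremental sync.
    System.out.println(needToSyncAllPartitions(
        Optional.of("20230601120000000"), Optional.of("20230601110000000")));
  }
}
```

With this rule, every archival that moves past the last synced commit forces a full partition listing, which is the cost the review comment is pointing at.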




[jira] [Created] (HUDI-6308) add num_commits_after_last_request to flink

2023-06-01 Thread eric (Jira)

eric created HUDI-6308:
--

 Summary: add num_commits_after_last_request to flink
 Key: HUDI-6308
 URL: https://issues.apache.org/jira/browse/HUDI-6308
 Project: Apache Hudi
  Issue Type: Improvement
  Components: flink
Reporter: eric
 Fix For: 0.14.0







[GitHub] [hudi] Riddle4045 commented on issue #8870: [SUPPORT] Trino returns 0 rows when reading Hudi tables written by Flink 1.16

2023-06-01 Thread via GitHub



Riddle4045 commented on issue #8870:
URL: https://github.com/apache/hudi/issues/8870#issuecomment-1573003923

   > The compaction is executed async by default every 5 delta_commit on the 
table, did you have any chance to see the Parquet files already?
   
   @danny0405 No, there were 6 commits in total and no compaction. Is there a 
setting to toggle it? Maybe it's turned off by default in Flink? I can also 
share the `.hoodie` folder if it helps you understand what's going on.



[GitHub] [hudi] danny0405 commented on a diff in pull request #8867: [HUDI-6307] Sync TIMESTAMP_MILLIS to hive

2023-06-01 Thread via GitHub



danny0405 commented on code in PR #8867:
URL: https://github.com/apache/hudi/pull/8867#discussion_r1213826815


##
hudi-sync/hudi-hive-sync/src/test/java/org/apache/hudi/hive/TestHiveSyncTool.java:
##
@@ -641,6 +643,26 @@ public void testSyncWithSchema(String syncMode, String 
enablePushDown) throws Ex
 "The last commit that was synced should be updated in the 
TBLPROPERTIES");
   }
 
+  @ParameterizedTest
+  @MethodSource("syncModeAndEnablePushDown")
+  public void testSyncTimestamp(String syncMode, String enablePushDown) throws 
Exception {
+hiveSyncProps.setProperty(HIVE_SYNC_MODE.key(), syncMode);

Review Comment:
   Can we add a test case similar to `testSchemaConvertTimestampMicros`? 
There is no need to add Avro schema files.




[GitHub] [hudi] danny0405 commented on issue #8870: [SUPPORT] Trino returns 0 rows when reading Hudi tables written by Flink 1.16

2023-06-01 Thread via GitHub



danny0405 commented on issue #8870:
URL: https://github.com/apache/hudi/issues/8870#issuecomment-1572999177

   Compaction is executed asynchronously by default every 5 delta commits on 
the table. Have you had a chance to see the Parquet files yet?



[jira] [Closed] (HUDI-6256) fix the data table archiving and MDT cleaning config conflict

2023-06-01 Thread Danny Chen (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-6256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-6256.

Fix Version/s: 0.14.0
   Resolution: Fixed

Fixed via master branch: 32adbe4dfb2a0976cb312c2fa14eb49f5a29a151

> fix the data table archiving and MDT cleaning config conflict
> -
>
> Key: HUDI-6256
> URL: https://issues.apache.org/jira/browse/HUDI-6256
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: yonghua jian
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> fix the data table archiving and MDT cleaning config conflict




[GitHub] [hudi] danny0405 merged pull request #8792: [HUDI-6256] Fix the data table archiving and MDT cleaning config conf…

2023-06-01 Thread via GitHub



danny0405 merged PR #8792:
URL: https://github.com/apache/hudi/pull/8792



[hudi] branch master updated: [HUDI-6256] Fix the data table archiving and MDT cleaning config conf… (#8792)

2023-06-01 Thread danny0405

This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 32adbe4dfb2 [HUDI-6256] Fix the data table archiving and MDT cleaning 
config conf… (#8792)
32adbe4dfb2 is described below

commit 32adbe4dfb2a0976cb312c2fa14eb49f5a29a151
Author: flashJd 
AuthorDate: Fri Jun 2 09:22:17 2023 +0800

[HUDI-6256] Fix the data table archiving and MDT cleaning config conf… 
(#8792)

* Fix the data table archiving and MDT cleaning config conflict
* Takes the MDT cleaning num commits as min(3, num_commits_DT), while 3 is 
the hardcode max cleaning num commits for MDT

-

Co-authored-by: Danny Chan 
---
 .../hudi/metadata/HoodieMetadataWriteUtils.java|  2 +-
 .../functional/TestHoodieBackedMetadata.java   | 40 ++
 .../client/functional/TestHoodieMetadataBase.java  |  2 +-
 3 files changed, 42 insertions(+), 2 deletions(-)

diff --git 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataWriteUtils.java
 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataWriteUtils.java
index 5221f6523b0..df951ff3796 100644
--- 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataWriteUtils.java
+++ 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataWriteUtils.java
@@ -93,7 +93,7 @@ public class HoodieMetadataWriteUtils {
 .withCleanerParallelism(parallelism)
 .withCleanerPolicy(HoodieCleaningPolicy.KEEP_LATEST_COMMITS)
 .withFailedWritesCleaningPolicy(failedWritesCleaningPolicy)
-.retainCommits(DEFAULT_METADATA_CLEANER_COMMITS_RETAINED)
+.retainCommits(Math.min(writeConfig.getCleanerCommitsRetained(), 
DEFAULT_METADATA_CLEANER_COMMITS_RETAINED))
 .build())
 // we will trigger archive manually, to ensure only regular writer 
invokes it
 .withArchivalConfig(HoodieArchivalConfig.newBuilder()
diff --git 
a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java
 
b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java
index 10b134887c4..b540f97d806 100644
--- 
a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java
+++ 
b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java
@@ -538,6 +538,46 @@ public class TestHoodieBackedMetadata extends 
TestHoodieMetadataBase {
 assertEquals("004", 
metadataTimeline.getCommitsTimeline().firstInstant().get().getTimestamp());
   }
 
+  @ParameterizedTest
+  @EnumSource(HoodieTableType.class)
+  public void testMetadataArchivalCleanConfig(HoodieTableType tableType) 
throws Exception {
+init(tableType, false);
+writeConfig = getWriteConfigBuilder(true, true, false)
+.withMetadataConfig(HoodieMetadataConfig.newBuilder()
+.enable(true)
+.enableMetrics(false)
+.withMaxNumDeltaCommitsBeforeCompaction(1)
+.build())
+.withCleanConfig(HoodieCleanConfig.newBuilder()
+.retainCommits(1)
+.build())
+.withArchivalConfig(HoodieArchivalConfig.newBuilder()
+.archiveCommitsWith(2, 3)
+.build())
+.build();
+initWriteConfigAndMetatableWriter(writeConfig, true);
+
+AtomicInteger commitTime = new AtomicInteger(1);
+// Trigger 4 regular writes in data table.
+for (int i = 1; i <= 4; i++) {
+  doWriteOperation(testTable, "00" + (commitTime.getAndIncrement()), 
INSERT);
+}
+
+// The earliest deltacommit in the metadata table should be "001",
+// and the "00" init deltacommit should be archived.
+HoodieTableMetaClient metadataMetaClient = 
HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(metadataTableBasePath).build();
+HoodieActiveTimeline metadataTimeline = 
metadataMetaClient.reloadActiveTimeline();
+assertEquals("001", 
metadataTimeline.getCommitsTimeline().firstInstant().get().getTimestamp());
+
+getHoodieWriteClient(writeConfig);
+// Trigger data table archive, should archive "001", "002"
+archiveDataTable(writeConfig, 
HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(basePath).build());
+// Trigger a regular write operation. metadata timeline archival should 
kick in and catch up with data table.
+doWriteOperation(testTable, "00" + (commitTime.getAndIncrement()), 
INSERT);
+metadataTimeline = metadataMetaClient.reloadActiveTimeline();
+assertEquals("003", 
metadataTimeline.getCommitsTimeline().firstInstant().get().getTimestamp());
+  }
+
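
The retention rule stated in the commit message, min(3, num_commits_DT), can be sketched in isolation as follows. The class and method names are hypothetical; only the constant name and the `Math.min` expression come from the diff above, and the value 3 comes from the commit message.

```java
// Hypothetical illustration only; mirrors the Math.min expression in the diff.
class MetadataCleanerRetentionSketch {

  // Per the commit message, 3 is the hard-coded maximum for the metadata table.
  static final int DEFAULT_METADATA_CLEANER_COMMITS_RETAINED = 3;

  // The metadata table retains no more commits than the data table does,
  // capped at the metadata table's own default.
  static int metadataCommitsRetained(int dataTableCommitsRetained) {
    return Math.min(dataTableCommitsRetained, DEFAULT_METADATA_CLEANER_COMMITS_RETAINED);
  }

  public static void main(String[] args) {
    System.out.println(metadataCommitsRetained(1));  // prints 1
    System.out.println(metadataCommitsRetained(10)); // prints 3
  }
}
```

In the new test, the data table is configured with `retainCommits(1)`, so under this rule the metadata table would also retain a single commit instead of the default.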

[GitHub] [hudi] XuQianJin-Stars commented on pull request #8795: [HUDI-6258] support olap engine query mor table in table name without ro/rt suffix

2023-06-01 Thread via GitHub



XuQianJin-Stars commented on PR #8795:
URL: https://github.com/apache/hudi/pull/8795#issuecomment-1572974659

   @hudi-bot run azure



[hudi] branch master updated (e8ca0d4121a -> 3f9c45fdfa9)

2023-06-01 Thread forwardxu

This is an automated email from the ASF dual-hosted git repository.

forwardxu pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from e8ca0d4121a [HUDI-6303] Bump flink version to 1.16.2 and 1.17.1 (#8861)
 add 3f9c45fdfa9 [HUDI-6258] support olap engine query mor table in table 
name without ro/rt suffix (#8795)

No new revisions were added by this update.

Summary of changes:
 .../src/main/java/org/apache/hudi/hive/HiveSyncTool.java| 5 +
 .../src/main/java/org/apache/hudi/sync/common/HoodieSyncConfig.java | 6 ++
 2 files changed, 11 insertions(+)

[GitHub] [hudi] XuQianJin-Stars merged pull request #8795: [HUDI-6258] support olap engine query mor table in table name without ro/rt suffix

2023-06-01 Thread via GitHub



XuQianJin-Stars merged PR #8795:
URL: https://github.com/apache/hudi/pull/8795



[GitHub] [hudi] hudi-bot commented on pull request #8856: [HUDI-6300] fix file size parallelism not work when init metadata table

2023-06-01 Thread via GitHub



hudi-bot commented on PR #8856:
URL: https://github.com/apache/hudi/pull/8856#issuecomment-1572836228

   
   ## CI report:
   
   * 2d4e285ba5ef3c5b07ec91af6ab3a2669d2b485d Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17565)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] Riddle4045 commented on issue #8870: [SUPPORT] Trino returns 0 rows when reading Hudi tables written by Flink 1.16

2023-06-01 Thread via GitHub



Riddle4045 commented on issue #8870:
URL: https://github.com/apache/hudi/issues/8870#issuecomment-1572789850

   possibly related to https://github.com/apache/hudi/issues/8038
   @codope could you help me understand how to configure the table for 
read-optimized queries? Or is it something that the Hudi Sync tool should 
handle out of the box? I am not sure why I am not getting any rows back.



[GitHub] [hudi] hudi-bot commented on pull request #8869: Added logic to correctly verify partition keys for CustomAvroKeyGen

2023-06-01 Thread via GitHub



hudi-bot commented on PR #8869:
URL: https://github.com/apache/hudi/pull/8869#issuecomment-1572785544

   
   ## CI report:
   
   * 54977785e91e2ee46baddd399a0d1889a323c612 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17566)
 
   
   

[GitHub] [hudi] hudi-bot commented on pull request #8795: [HUDI-6258] support olap engine query mor table in table name without ro/rt suffix

2023-06-01 Thread via GitHub



hudi-bot commented on PR #8795:
URL: https://github.com/apache/hudi/pull/8795#issuecomment-1572772807

   
   ## CI report:
   
   * 130523be1324218f56ce15ddc6ac3255e7cfcd9a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17550)
 
   
   

[GitHub] [hudi] Riddle4045 opened a new issue, #8870: [SUPPORT] Trino returns 0 rows when reading Hudi tables written by Flink 1.16

2023-06-01 Thread via GitHub



Riddle4045 opened a new issue, #8870:
URL: https://github.com/apache/hudi/issues/8870

   **_Tips before filing an issue_**
   
   - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
   
   - Join the mailing list to engage in conversations and get faster support at 
dev-subscr...@hudi.apache.org.
   
   - If you have triaged this as a bug, then file an 
[issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
   
   **Describe the problem you faced**
   TL;DR: Trino returns 0 records from the Hudi table even though I can see 
data in the object store.
   
   I am writing Hudi tables to ABFS (reduced code):
   
   ```java
   DataStream fares = env.addSource(new TaxiFareGenerator()).map(
   event -> GenericRowData.of(
   event.getRideId(),
   event.getDriverId(),
   event.getTaxiId(),
   event.getStartTime(),
   event.getTip(),
   event.getTolls(),
   event.getTotalFare()//,
   //event.getPaymentType()
   ));
   
   String targetTable = "TaxiFare";
   String outputPath = String.join("/",basePath, "hudi4");
   Map<String, String> options = new HashMap<>();
   
   options.put(FlinkOptions.PATH.key(), outputPath);
   options.put(FlinkOptions.TABLE_TYPE.key(), HoodieTableType.MERGE_ON_READ.name());
   
   HoodiePipeline.Builder builder = HoodiePipeline.builder(targetTable)
   .column("rideId BIGINT")
   .column("driverId BIGINT")
   .column("taxiId BIGINT")
   .column("startTime BIGINT")
   .column("tip FLOAT")
   .column("tolls FLOAT")
   .column("totalFare FLOAT")
   .pk("driverId")
   .options(options);
   
   builder.sink(fares, false);
   env.execute("Hudi Table");
   ```
   
   I sync these tables to HMS using Hudi-Sync-Tool. 
   ```
   2023-06-01T13:15:09,757 INFO [main] org.apache.hudi.hive.HiveSyncTool - Sync 
complete for **hudi5_ro**
   2023-06-01T13:15:09,757 INFO [main] org.apache.hudi.hive.HiveSyncTool - 
Trying to sync hoodie table hudi5_rt with base path 
abfs://flink@.dfs.core.windows.net/flink/click_events/hudi4 of type 
MERGE_ON_READ
   2023-06-01T13:15:11,977 INFO [main] org.apache.hudi.hive.HiveSyncTool - Sync 
table hudi5_rt for the first time.
   2023-06-01T13:15:17,712 INFO [main] org.apache.hudi.hive.HiveSyncTool - Last 
commit time synced was found to be null
   2023-06-01T13:15:17,712 INFO [main] org.apache.hudi.hive.HiveSyncTool - Sync 
all partitions given the last commit time synced is empty or before the start 
of the active timeline. Listing all partitions in 
abfs://flink@.dfs.core.windows.net/flink/click_events/hudi4, file system: 
AzureBlobFileSystem{uri=abfs://flink@.dfs.core.windows.net, user='ispatw', 
primaryUserGroup='ispatw'}
   2023-06-01T13:15:24,755 INFO [main] org.apache.hudi.hive.HiveSyncTool - Sync 
complete for **hudi5_rt**
   2023-06-01T13:15:24,761 INFO [main] 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient - Closed a connection to 
metastore, current connections: 0
   ```
   
   I can see data streaming into the ABFS location 
   
![image](https://github.com/apache/hudi/assets/3648351/66b233bf-6b14-43b0-a462-d903895ad664)
   
   When I try to query it using Trino my tables have no records
   
![image](https://github.com/apache/hudi/assets/3648351/5d85d470-a818-450e-997a-f79ab4158475)
   
   
   **Expected behavior**
   
   A clear and concise description of what you expected to happen.
   
   **Environment Description**
   
   * Hudi version : 0.13
   * trino : 410
   * Storage (HDFS/S3/GCS..) : ABFS
   
   
   



[GitHub] [hudi] nsivabalan commented on a diff in pull request #8758: [HUDI-53] Implementation of record_index - a HUDI index based on the metadata table.

2023-06-01 Thread via GitHub



nsivabalan commented on code in PR #8758:
URL: https://github.com/apache/hudi/pull/8758#discussion_r1213649019


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataWriter.java:
##
@@ -41,35 +44,23 @@ public interface HoodieTableMetadataWriter extends 
Serializable, AutoCloseable {
* @param engineContext
* @param indexPartitionInfos - information about partitions to build such 
as partition type and base instant time
*/
-  void buildMetadataPartitions(HoodieEngineContext engineContext, 
List indexPartitionInfos);
-
-  /**
-   * Initialize file groups for the given metadata partitions when indexing is 
requested.
-   *
-   * @param dataMetaClient - meta client for the data table
-   * @param metadataPartitions - metadata partitions for which file groups 
needs to be initialized
-   * @param instantTime- instant time of the index action
-   * @throws IOException
-   */
-  void initializeMetadataPartitions(HoodieTableMetaClient dataMetaClient, 
List metadataPartitions, String instantTime) throws 
IOException;
+  void buildMetadataPartitions(HoodieEngineContext engineContext, 
List indexPartitionInfos) throws IOException;
 
   /**
* Drop the given metadata partitions.
*
-   * @param metadataPartitions
-   * @throws IOException
+   * @param metadataPartitions List of MDT partitions to drop
+   * @throws IOException on failures
*/
   void dropMetadataPartitions(List metadataPartitions) 
throws IOException;
 
   /**
* Update the metadata table due to a COMMIT operation.
*
-   * @param commitMetadata   commit metadata of the operation of interest.
-   * @param instantTime  instant time of the commit.
-   * @param isTableServiceAction true if caller is a table service. false 
otherwise. Only regular write operations can trigger metadata table services 
and this argument
-   * will assist in this.
+   * @param commitMetadata commit metadata of the operation of interest.
+   * @param instantTimeinstant time of the commit.
*/
-  void update(HoodieCommitMetadata commitMetadata, String instantTime, boolean 
isTableServiceAction);
+  void update(HoodieCommitMetadata commitMetadata, HoodieData 
writeStatuses, String instantTime);

Review Comment:
   Nope. Previously we used this flag only to trigger compaction and cleaning 
within update(). Now performTableService is a separate method that is invoked 
separately, so update() no longer needs the flag.
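
The split described above can be sketched roughly as follows. This is only an illustration under assumptions: `MetadataWriter`, `RecordingWriter`, and `commit` are hypothetical stand-ins, not Hudi classes, and the real method signatures in the PR may differ. The point is that update() only applies the commit, while the caller decides whether table services run afterwards.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins; none of these types are Hudi classes.
final class SeparateTableServiceSketch {
  interface MetadataWriter {
    void update(String instantTime);   // applies the commit to the metadata table only
    void performTableServices();       // compaction and cleaning, now invoked separately
  }

  // Records the order of calls so the control flow is visible.
  static final class RecordingWriter implements MetadataWriter {
    final List<String> calls = new ArrayList<>();

    @Override
    public void update(String instantTime) {
      calls.add("update:" + instantTime);
    }

    @Override
    public void performTableServices() {
      calls.add("tableServices");
    }
  }

  // The caller, not update(), decides whether table services should run.
  static void commit(MetadataWriter writer, String instantTime, boolean isTableServiceAction) {
    writer.update(instantTime);
    if (!isTableServiceAction) {
      writer.performTableServices();
    }
  }

  public static void main(String[] args) {
    RecordingWriter writer = new RecordingWriter();
    commit(writer, "001", false);  // regular write: update, then table services
    commit(writer, "002", true);   // table service write: update only
    System.out.println(writer.calls);
  }
}
```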
   




[GitHub] [hudi] hudi-bot commented on pull request #8574: [HUDI-6139] Add support for Transformer schema validation in DeltaStreamer

2023-06-01 Thread via GitHub



hudi-bot commented on PR #8574:
URL: https://github.com/apache/hudi/pull/8574#issuecomment-1572681066

   
   ## CI report:
   
   * f71ca7ad4339c60719c97f3d54339b6a7bd5205f Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17564)
 
   
   

[GitHub] [hudi] hudi-bot commented on pull request #8868: [HUDI-6278] Fixed the use of DynamoDBLockConfig class

2023-06-01 Thread via GitHub



hudi-bot commented on PR #8868:
URL: https://github.com/apache/hudi/pull/8868#issuecomment-1572673768

   
   ## CI report:
   
   * b6f86c770f7e35d7488cff0066d2d760453eb931 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17563)
 
   
   

[GitHub] [hudi] hudi-bot commented on pull request #8851: [HUDI-6281] Comprehensive schema evolution supports column change with a default value

2023-06-01 Thread via GitHub



hudi-bot commented on PR #8851:
URL: https://github.com/apache/hudi/pull/8851#issuecomment-1572673644

   
   ## CI report:
   
   * 2db6852dd391973eab275dc7ef70c02bfbc5f652 UNKNOWN
   * 60c1399ac012bc61421f3bb1feb208decbcb6b6a UNKNOWN
   * 0328e76358dd170d62b94fd286a9ffb728516429 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17562)
 
   
   

[GitHub] [hudi] parisni commented on pull request #8740: [HUDI-6231] Handle glue comments

2023-06-01 Thread via GitHub



parisni commented on PR #8740:
URL: https://github.com/apache/hudi/pull/8740#issuecomment-1572624810

   > @parisni Hi, do we have plan to push-forward this feature?
   
   Yes, I do. I am currently on vacation.



[jira] [Assigned] (HUDI-6253) Treat full bootstrap table as regular table

2023-06-01 Thread Jonathan Vexler (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-6253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Vexler reassigned HUDI-6253:
-

Assignee: (was: Jonathan Vexler)

> Treat full bootstrap table as regular table
> ---
>
> Key: HUDI-6253
> URL: https://issues.apache.org/jira/browse/HUDI-6253
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: bootstrap
>Reporter: Jonathan Vexler
>Priority: Major
>
> Bootstrap tables have a performance hit compared to regular tables. If you 
> bootstrap with full bootstrap mode, I think we should just treat the table 
> like a regular table. I think the easiest way to do this would be to prevent 
> setting bootstrap base path in the tableconfig. If that isn't possible, then 
> we could add another table config stating if it has metadata only bootstrap 
> files.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[GitHub] [hudi] codope commented on a diff in pull request #8758: [HUDI-53] Implementation of record_index - a HUDI index based on the metadata table.

2023-06-01 Thread via GitHub



codope commented on code in PR #8758:
URL: https://github.com/apache/hudi/pull/8758#discussion_r1213269335


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataWriter.java:
##
@@ -99,7 +90,25 @@ public interface HoodieTableMetadataWriter extends 
Serializable, AutoCloseable {
* Deletes the given metadata partitions. This path reuses DELETE_PARTITION 
operation.
*
* @param instantTime - instant time when replacecommit corresponding to the 
drop will be recorded in the metadata timeline
-   * @param partitions - list of {@link MetadataPartitionType} to drop
+   * @param partitions  - list of {@link MetadataPartitionType} to drop
*/
   void deletePartitions(String instantTime, List 
partitions);
+
+  /**
+   * It returns write client for metadata table.
+   */
+  BaseHoodieWriteClient getWriteClient();

Review Comment:
   rename to `getMetadataTableWriteClient` for clarity?



##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataWriter.java:
##
@@ -99,7 +90,25 @@ public interface HoodieTableMetadataWriter extends 
Serializable, AutoCloseable {
* Deletes the given metadata partitions. This path reuses DELETE_PARTITION 
operation.
*
* @param instantTime - instant time when replacecommit corresponding to the 
drop will be recorded in the metadata timeline
-   * @param partitions - list of {@link MetadataPartitionType} to drop
+   * @param partitions  - list of {@link MetadataPartitionType} to drop
*/
   void deletePartitions(String instantTime, List 
partitions);
+
+  /**
+   * It returns write client for metadata table.
+   */
+  BaseHoodieWriteClient getWriteClient();
+
+  /**
+   * Returns true if the metadata table is initialized.
+   */
+  boolean isInitialized();

Review Comment:
   Is it needed? Can we not get this from table config? If MDT is initialized 
then we should have some MDT partition as value for 
`hoodie.table.metadata.partitions` or  
`hoodie.table.metadata.partitions.inflight` right?
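
A minimal sketch of the table-config check suggested above, assuming the table config is available as plain key/value properties. `MdtInitCheck` and its methods are hypothetical helpers, not Hudi APIs; only the two property keys come from the comment, and `files` is used purely as a sample partition name.

```java
import java.util.Properties;

// Hypothetical helper, not part of Hudi: derives "is the MDT initialized?"
// from table-config properties instead of asking the metadata writer.
final class MdtInitCheck {
  static final String PARTITIONS = "hoodie.table.metadata.partitions";
  static final String PARTITIONS_INFLIGHT = "hoodie.table.metadata.partitions.inflight";

  // True when the key is present and holds a non-blank value.
  static boolean hasValue(Properties props, String key) {
    String value = props.getProperty(key);
    return value != null && !value.trim().isEmpty();
  }

  // Treats the MDT as initialized if any partition is listed as completed or inflight.
  static boolean isInitialized(Properties tableConfig) {
    return hasValue(tableConfig, PARTITIONS) || hasValue(tableConfig, PARTITIONS_INFLIGHT);
  }

  public static void main(String[] args) {
    Properties tableConfig = new Properties();
    System.out.println(isInitialized(tableConfig));
    tableConfig.setProperty(PARTITIONS, "files");  // sample value for illustration
    System.out.println(isInitialized(tableConfig));
  }
}
```

Whether an inflight-only partition should count as "initialized" is a design choice this sketch does not settle; it simply follows the wording of the comment.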



##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataWriter.java:
##
@@ -41,35 +44,23 @@ public interface HoodieTableMetadataWriter extends 
Serializable, AutoCloseable {
* @param engineContext
* @param indexPartitionInfos - information about partitions to build such 
as partition type and base instant time
*/
-  void buildMetadataPartitions(HoodieEngineContext engineContext, 
List indexPartitionInfos);
-
-  /**
-   * Initialize file groups for the given metadata partitions when indexing is 
requested.
-   *
-   * @param dataMetaClient - meta client for the data table
-   * @param metadataPartitions - metadata partitions for which file groups 
needs to be initialized
-   * @param instantTime- instant time of the index action
-   * @throws IOException
-   */
-  void initializeMetadataPartitions(HoodieTableMetaClient dataMetaClient, 
List metadataPartitions, String instantTime) throws 
IOException;
+  void buildMetadataPartitions(HoodieEngineContext engineContext, 
List indexPartitionInfos) throws IOException;
 
   /**
* Drop the given metadata partitions.
*
-   * @param metadataPartitions
-   * @throws IOException
+   * @param metadataPartitions List of MDT partitions to drop
+   * @throws IOException on failures
*/
   void dropMetadataPartitions(List metadataPartitions) 
throws IOException;
 
   /**
* Update the metadata table due to a COMMIT operation.
*
-   * @param commitMetadata   commit metadata of the operation of interest.
-   * @param instantTime  instant time of the commit.
-   * @param isTableServiceAction true if caller is a table service. false 
otherwise. Only regular write operations can trigger metadata table services 
and this argument
-   * will assist in this.
+   * @param commitMetadata commit metadata of the operation of interest.
+   * @param instantTimeinstant time of the commit.
*/
-  void update(HoodieCommitMetadata commitMetadata, String instantTime, boolean 
isTableServiceAction);
+  void update(HoodieCommitMetadata commitMetadata, HoodieData 
writeStatuses, String instantTime);

Review Comment:
   Why remove `isTableServiceAction`? Wouldn't we want to distinguish the 
update call due to regular ingestion writer from table service writer?



##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -18,14 +18,19 @@
 
 package org.apache.hudi.metadata;
 
+import org.apache.avro.specific.SpecificRecordBase;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;

Review Comment:
   nit: if we can avoid re-ordering imports, it would make review easier. Also, 
I think we put hudi imports first.



##

[GitHub] [hudi] hudi-bot commented on pull request #8526: [HUDI-6116] Optimize log block reading by removing seeks to check corrupted blocks.

2023-06-01 Thread via GitHub



hudi-bot commented on PR #8526:
URL: https://github.com/apache/hudi/pull/8526#issuecomment-1572527934

   
   ## CI report:
   
   * 0f2f4ddd192879cdc6a9c91aa2b2c5c6813ab490 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16511)
 
   * 673f502686ebf316ab9f6ba802fd318e5c5bd613 UNKNOWN
   * 09a1ea8789d509b6200018e60ec6911bea50bca7 UNKNOWN
   
   

[GitHub] [hudi] prashantwason commented on a diff in pull request #8837: [HUDI-6153] Changed the rollback mechanism for MDT to actual rollbacks rather than appending revert blocks.

2023-06-01 Thread via GitHub



prashantwason commented on code in PR #8837:
URL: https://github.com/apache/hudi/pull/8837#discussion_r1213476749


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -851,26 +919,49 @@ public void update(HoodieRestoreMetadata restoreMetadata, 
String instantTime) {
*/
   @Override
   public void update(HoodieRollbackMetadata rollbackMetadata, String 
instantTime) {
-if (enabled && metadata != null) {
-  // Is this rollback of an instant that has been synced to the metadata 
table?
-  String rollbackInstant = rollbackMetadata.getCommitsRollback().get(0);
-  boolean wasSynced = 
metadataMetaClient.getActiveTimeline().containsInstant(new HoodieInstant(false, 
HoodieTimeline.DELTA_COMMIT_ACTION, rollbackInstant));
-  if (!wasSynced) {
-// A compaction may have taken place on metadata table which would 
have included this instant being rolled back.
-// Revisit this logic to relax the compaction fencing : 
https://issues.apache.org/jira/browse/HUDI-2458
-Option latestCompaction = metadata.getLatestCompactionTime();
-if (latestCompaction.isPresent()) {
-  wasSynced = HoodieTimeline.compareTimestamps(rollbackInstant, 
HoodieTimeline.LESSER_THAN_OR_EQUALS, latestCompaction.get());
-}
+// The commit which is being rolled back on the dataset
+final String commitInstantTime = 
rollbackMetadata.getCommitsRollback().get(0);
+// Find the deltacommits since the last compaction
+Option> deltaCommitsInfo =
+
CompactionUtils.getDeltaCommitsSinceLatestCompaction(metadataMetaClient.getActiveTimeline());
+if (!deltaCommitsInfo.isPresent()) {
+  LOG.info(String.format("Ignoring rollback of instant %s at %s since 
there are no deltacommits on MDT", commitInstantTime, instantTime));
+  return;
+}
+
+// This could be a compaction or deltacommit instant (See 
CompactionUtils.getDeltaCommitsSinceLatestCompaction)
+HoodieInstant compactionInstant = deltaCommitsInfo.get().getValue();
+HoodieTimeline deltacommitsSinceCompaction = 
deltaCommitsInfo.get().getKey();
+
+// The deltacommit that will be rolled back
+HoodieInstant deltaCommitInstant = new HoodieInstant(false, 
HoodieTimeline.DELTA_COMMIT_ACTION, commitInstantTime);
+
+// The commit being rolled back should not be older than the latest 
compaction on the MDT. Compaction on MDT only occurs when all actions
+// are completed on the dataset. Hence, this case implies a rollback of 
completed commit which should actually be handled using restore.
+if (compactionInstant.getAction().equals(HoodieTimeline.COMMIT_ACTION)) {

Review Comment:
   CompactionUtils.getDeltaCommitsSinceLatestCompaction returns a Pair. The 
value in that Pair can be either a DeltaCommit instant (if no compactions 
happened) or a Commit action (if a compaction was found).
   
   We only want to check for the compaction here.
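
The Pair semantics described above can be sketched as follows. This is an illustration with stand-in types, not the Hudi implementation: `Instant` is a hypothetical substitute for HoodieInstant, the pair is modelled with `Map.Entry`, and the two action strings are assumed to match the values of `HoodieTimeline.COMMIT_ACTION` and `HoodieTimeline.DELTA_COMMIT_ACTION`.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins; none of these types are Hudi classes.
final class CompactionAnchorSketch {
  // Assumed to equal HoodieTimeline.COMMIT_ACTION and DELTA_COMMIT_ACTION.
  static final String COMMIT_ACTION = "commit";
  static final String DELTA_COMMIT_ACTION = "deltacommit";

  // Minimal stand-in for HoodieInstant: an action plus an instant time.
  static final class Instant {
    final String action;
    final String timestamp;

    Instant(String action, String timestamp) {
      this.action = action;
      this.timestamp = timestamp;
    }
  }

  // Key: deltacommit times after the anchor. Value: the anchor instant, which is
  // a commit if a compaction was found and a deltacommit otherwise.
  static boolean anchorIsCompaction(Map.Entry<List<String>, Instant> info) {
    return COMMIT_ACTION.equals(info.getValue().action);
  }

  public static void main(String[] args) {
    Map.Entry<List<String>, Instant> afterCompaction = new SimpleImmutableEntry<>(
        Arrays.asList("003", "004"), new Instant(COMMIT_ACTION, "002"));
    Map.Entry<List<String>, Instant> noCompaction = new SimpleImmutableEntry<>(
        Arrays.asList("002", "003"), new Instant(DELTA_COMMIT_ACTION, "001"));
    System.out.println(anchorIsCompaction(afterCompaction));
    System.out.println(anchorIsCompaction(noCompaction));
  }
}
```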




[GitHub] [hudi] prashantwason commented on a diff in pull request #8837: [HUDI-6153] Changed the rollback mechanism for MDT to actual rollbacks rather than appending revert blocks.

2023-06-01 Thread via GitHub



prashantwason commented on code in PR #8837:
URL: https://github.com/apache/hudi/pull/8837#discussion_r1213473799


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -837,10 +840,75 @@ public void update(HoodieCleanMetadata cleanMetadata, 
String instantTime) {
*/
   @Override
   public void update(HoodieRestoreMetadata restoreMetadata, String 
instantTime) {
-processAndCommit(instantTime, () -> 
HoodieTableMetadataUtil.convertMetadataToRecords(engineContext,
-metadataMetaClient.getActiveTimeline(), restoreMetadata, 
getRecordsGenerationParams(), instantTime,
-metadata.getSyncedInstantTime()), false);
-closeInternal();
+dataMetaClient.reloadActiveTimeline();
+
+// Since the restore has completed on the dataset, the latest write 
timeline instant is the one to which the
+// restore was performed. This should be always present.
+final String restoreToInstantTime = 
dataMetaClient.getActiveTimeline().getWriteTimeline()
+.getReverseOrderedInstants().findFirst().get().getTimestamp();
+
+// We cannot restore to before the oldest compaction on MDT as we don't 
have the basefiles before that time.
+Option lastCompaction = 
metadataMetaClient.getCommitTimeline().filterCompletedInstants().lastInstant();

Review Comment:
   Yes, the BaseHoodieWriteClient also has this check. It is duplicated because 
we allow update() methods on the MDT to be called from outside the write client 
path. It's safer this way, I suppose, even though it is duplicated.
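
The guard being duplicated can be sketched like this. It is an assumption-laden illustration, not the code from the PR: `RestoreGuardSketch` and `canRestoreTo` are hypothetical names, instant times are assumed to be fixed-width timestamp strings so that `String.compareTo` orders them chronologically, and treating a restore to exactly the compaction time as allowed is a guess about the boundary case.

```java
import java.util.Optional;

// Hypothetical helper, not part of Hudi.
final class RestoreGuardSketch {
  // A restore target is acceptable when the MDT has no completed compaction yet,
  // or when the target is not older than that compaction (no base files exist
  // before it, so an older target cannot be reconstructed).
  static boolean canRestoreTo(String restoreToInstantTime, Optional<String> lastMdtCompactionTime) {
    return lastMdtCompactionTime
        .map(compactionTime -> restoreToInstantTime.compareTo(compactionTime) >= 0)
        .orElse(true);
  }

  public static void main(String[] args) {
    Optional<String> compaction = Optional.of("20230601120000");
    System.out.println(canRestoreTo("20230601130000", compaction));        // after the compaction
    System.out.println(canRestoreTo("20230601110000", compaction));        // before the compaction
    System.out.println(canRestoreTo("20230601110000", Optional.empty()));  // MDT never compacted
  }
}
```

Keeping such a check in both the write client and the MDT writer costs a few lines but protects callers that reach update() directly.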




[GitHub] [hudi] prashantwason commented on a diff in pull request #8837: [HUDI-6153] Changed the rollback mechanism for MDT to actual rollbacks rather than appending revert blocks.

2023-06-01 Thread via GitHub



prashantwason commented on code in PR #8837:
URL: https://github.com/apache/hudi/pull/8837#discussion_r1213473799


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -837,10 +840,75 @@ public void update(HoodieCleanMetadata cleanMetadata, 
String instantTime) {
*/
   @Override
   public void update(HoodieRestoreMetadata restoreMetadata, String 
instantTime) {
-processAndCommit(instantTime, () -> 
HoodieTableMetadataUtil.convertMetadataToRecords(engineContext,
-metadataMetaClient.getActiveTimeline(), restoreMetadata, 
getRecordsGenerationParams(), instantTime,
-metadata.getSyncedInstantTime()), false);
-closeInternal();
+dataMetaClient.reloadActiveTimeline();
+
+// Since the restore has completed on the dataset, the latest write 
timeline instant is the one to which the
+// restore was performed. This should be always present.
+final String restoreToInstantTime = 
dataMetaClient.getActiveTimeline().getWriteTimeline()
+.getReverseOrderedInstants().findFirst().get().getTimestamp();
+
+// We cannot restore to before the oldest compaction on MDT as we don't 
have the basefiles before that time.
+Option lastCompaction = 
metadataMetaClient.getCommitTimeline().filterCompletedInstants().lastInstant();

Review Comment:
   Yes, the BaseHoodieWriteClient also has this check. It is duplicated because 
we allow update() methods on the MDT to be called from outside. It's safer this 
way, I suppose, even though it is duplicated.




[GitHub] [hudi] prashantwason commented on a diff in pull request #8837: [HUDI-6153] Changed the rollback mechanism for MDT to actual rollbacks rather than appending revert blocks.

2023-06-01 Thread via GitHub



prashantwason commented on code in PR #8837:
URL: https://github.com/apache/hudi/pull/8837#discussion_r1213470892


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java:
##
@@ -669,32 +669,51 @@ public void restoreToSavepoint() {
* @param savepointTime Savepoint time to rollback to
*/
   public void restoreToSavepoint(String savepointTime) {
-boolean initialMetadataTableIfNecessary = config.isMetadataTableEnabled();
-if (initialMetadataTableIfNecessary) {
+boolean initializeMetadataTableIfNecessary = 
config.isMetadataTableEnabled();
+if (initializeMetadataTableIfNecessary) {
   try {
-// Delete metadata table directly when users trigger savepoint 
rollback if mdt existed and beforeTimelineStarts
+// Delete metadata table directly when users trigger savepoint 
rollback if mdt existed and if the savePointTime is beforeTimelineStarts
+// or before the oldest compaction on MDT.
+// We cannot restore to before the oldest compaction on MDT as we 
don't have the basefiles before that time.
 String metadataTableBasePathStr = 
HoodieTableMetadata.getMetadataTableBasePath(config.getBasePath());
 HoodieTableMetaClient mdtClient = 
HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(metadataTableBasePathStr).build();
-// Same as HoodieTableMetadataUtil#processRollbackMetadata
+Option lastCompaction = 
mdtClient.getCommitTimeline().filterCompletedInstants().lastInstant();

Review Comment:
   Done
   




[GitHub] [hudi] prashantwason commented on a diff in pull request #8604: [HUDI-6151] Rollback previously applied commits to MDT when operations are retried.

2023-06-01 Thread via GitHub



prashantwason commented on code in PR #8604:
URL: https://github.com/apache/hudi/pull/8604#discussion_r1213467835


##
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/metadata/SparkHoodieBackedTableMetadataWriter.java:
##
@@ -159,6 +162,13 @@ protected void commit(String instantTime, 
Maphttps://github.com/apache/hudi/pull/8684 where the new 
partition enabling has been changed to:
   1. Use bulkInsert for initial commit
   2. Always use a unique timestamp on MDT




[GitHub] [hudi] prashantwason commented on pull request #8526: [HUDI-6116] Optimize log block reading by removing seeks to check corrupted blocks.

2023-06-01 Thread via GitHub



prashantwason commented on PR #8526:
URL: https://github.com/apache/hudi/pull/8526#issuecomment-1572480457

   @danny0405 PTAL again.



[GitHub] [hudi] prashantwason commented on a diff in pull request #8526: [HUDI-6116] Optimize log block reading by removing seeks to check corrupted blocks.

2023-06-01 Thread via GitHub



prashantwason commented on code in PR #8526:
URL: https://github.com/apache/hudi/pull/8526#discussion_r1213452295


##
hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFileReader.java:
##
@@ -152,98 +153,107 @@ private void addShutDownHook() {
   // TODO : convert content and block length to long by using ByteBuffer, raw byte [] allows
   // for max of Integer size
   private HoodieLogBlock readBlock() throws IOException {
-    int blockSize;
-    long blockStartPos = inputStream.getPos();
-    try {
-      // 1 Read the total size of the block
-      blockSize = (int) inputStream.readLong();
-    } catch (EOFException | CorruptedLogFileException e) {
-      // An exception reading any of the above indicates a corrupt block
-      // Create a corrupt block by finding the next MAGIC marker or EOF
-      return createCorruptBlock(blockStartPos);
-    }
-
-    // We may have had a crash which could have written this block partially
-    // Skip blockSize in the stream and we should either find a sync marker (start of the next
-    // block) or EOF. If we did not find either of it, then this block is a corrupted block.
-    boolean isCorrupted = isBlockCorrupted(blockSize);
-    if (isCorrupted) {
-      return createCorruptBlock(blockStartPos);
-    }
-
-    // 2. Read the version for this log format
-    HoodieLogFormat.LogFormatVersion nextBlockVersion = readVersion();
+    long blockStartPos = 0;
+    long blockSize = 0;
 
-    // 3. Read the block type for a log block
-    HoodieLogBlockType blockType = tryReadBlockType(nextBlockVersion);
+    try {
+      blockStartPos = inputStream.getPos();
 
-    // 4. Read the header for a log block, if present
+      // 1 Read the total size of the block
+      blockSize = inputStream.readLong();
+
+      // We may have had a crash which could have written this block partially. We are deferring the check for corrupted block so as not to pay the
+      // penalty of doing seeks + read and then re-seeks. More aggressive checks after reading each item as well as a final corrupted check should ensure we
+      // find the corrupted block eventually.
+
+      // 2. Read the version for this log format
+      HoodieLogFormat.LogFormatVersion nextBlockVersion = readVersion();
+
+      // 3. Read the block type for a log block
+      HoodieLogBlockType blockType = tryReadBlockType(nextBlockVersion);
+
+      // 4. Read the header for a log block, if present
+      Map<HeaderMetadataType, String> header =
+          nextBlockVersion.hasHeader() ? HoodieLogBlock.getLogMetadata(inputStream) : null;
+
+      // 5. Read the content length for the content
+      // Fallback to full-block size if no content-length
+      // TODO replace w/ hasContentLength
+      long contentLength =
+          nextBlockVersion.getVersion() != HoodieLogFormatVersion.DEFAULT_VERSION ? (int) inputStream.readLong() : blockSize;
+      checkArgument(contentLength >= 0, "Content Length should be greater than or equal to 0 " + contentLength);
+
+      // 6. Read the content or skip content based on IO vs Memory trade-off by client
+      long contentPosition = inputStream.getPos();
+      boolean shouldReadLazily = readBlockLazily && nextBlockVersion.getVersion() != HoodieLogFormatVersion.DEFAULT_VERSION;
+      Option<byte[]> content = HoodieLogBlock.tryReadContent(inputStream, contentLength, shouldReadLazily);
+
+      // 7. Read footer if any
+      Map<HeaderMetadataType, String> footer =
+          nextBlockVersion.hasFooter() ? HoodieLogBlock.getLogMetadata(inputStream) : null;
+
+      // 8. Read log block length, if present. This acts as a reverse pointer when traversing a
+      // log file in reverse
+      if (nextBlockVersion.hasLogBlockLength()) {
+        long currentPos = inputStream.getPos();
+        long logBlockLength = inputStream.readLong();
+        if (blockSize != (logBlockLength - magicBuffer.length) || currentPos != (blockStartPos + blockSize)) {
+          return createCorruptBlock(blockStartPos);
+        }
+      }
 
-    Map<HeaderMetadataType, String> header =
-        nextBlockVersion.hasHeader() ? HoodieLogBlock.getLogMetadata(inputStream) : null;
+      // 9. Read the log block end position in the log file
+      long blockEndPos = inputStream.getPos();
 
-    // 5. Read the content length for the content
-    // Fallback to full-block size if no content-length
-    // TODO replace w/ hasContentLength
-    int contentLength =
-        nextBlockVersion.getVersion() != HoodieLogFormatVersion.DEFAULT_VERSION ? (int) inputStream.readLong() : blockSize;
+      HoodieLogBlock.HoodieLogBlockContentLocation logBlockContentLoc =
+          new HoodieLogBlock.HoodieLogBlockContentLocation(hadoopConf, logFile, contentPosition, contentLength, blockEndPos);
 
-    // 6. Read the content or skip content based on IO vs Memory trade-off by client
-    long contentPosition = inputStream.getPos();
-    boolean shouldReadLazily = readBlockLazily && nextBlockVersion.getVersion() != HoodieLogFormatVersion.DEFAULT_VERSION;
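
For readers following this thread, the idea in the diff above (read the block sequentially and defer the corruption check to the end, rather than seeking ahead first) can be illustrated with a minimal, self-contained sketch. This is not Hudi's implementation: the class name, the `Block` type and the simplified layout `[long size][payload][long trailing size]` are assumptions made purely for illustration, and the real log format also carries a magic marker, version, block type, header and footer.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

public class DeferredCorruptionCheck {

  /** Outcome of reading one block: the payload, or a corruption marker. */
  public static final class Block {
    public final byte[] payload; // null when the block is corrupt
    public final boolean corrupt;

    Block(byte[] payload, boolean corrupt) {
      this.payload = payload;
      this.corrupt = corrupt;
    }
  }

  // Hypothetical layout: [long size][size bytes of payload][long trailing size].
  public static byte[] writeBlock(byte[] payload) {
    return ByteBuffer.allocate(Long.BYTES + payload.length + Long.BYTES)
        .putLong(payload.length)
        .put(payload)
        .putLong(payload.length)
        .array();
  }

  public static Block readBlock(byte[] raw) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
      long size = in.readLong();
      // Cheap sanity check on the value just read; no seek-ahead validation.
      if (size < 0 || size > raw.length) {
        return new Block(null, true);
      }
      byte[] payload = new byte[(int) size];
      in.readFully(payload); // throws EOFException if the block was partially written
      // Deferred check: the trailing length must agree with the leading size.
      long trailingSize = in.readLong();
      if (trailingSize != size) {
        return new Block(null, true);
      }
      return new Block(payload, false);
    } catch (IOException e) {
      // EOFException (a subclass of IOException) lands here: treat as corrupt.
      return new Block(null, true);
    }
  }
}
```

In this sketch a truncated or tampered block is only detected once the sequential read reaches the inconsistent part, which is the trade-off the diff comment describes: fewer seeks on the common path in exchange for finding corruption later.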

[GitHub] [hudi] prashantwason commented on a diff in pull request #8526: [HUDI-6116] Optimize log block reading by removing seeks to check corrupted blocks.

2023-06-01 Thread via GitHub



prashantwason commented on code in PR #8526:
URL: https://github.com/apache/hudi/pull/8526#discussion_r1213451923



[GitHub] [hudi] prashantwason commented on pull request #8487: [HUDI-6093] Use the correct partitionToReplacedFileIds during commit.

2023-06-01 Thread via GitHub



prashantwason commented on PR #8487:
URL: https://github.com/apache/hudi/pull/8487#issuecomment-1572447665

   @nsivabalan I fixed the conflict and all tests are passing. 



[GitHub] [hudi] hudi-bot commented on pull request #8792: [HUDI-6256] Fix the data table archiving and MDT cleaning config conf…

2023-06-01 Thread via GitHub



hudi-bot commented on PR #8792:
URL: https://github.com/apache/hudi/pull/8792#issuecomment-1572436343

   
   ## CI report:
   
   * 683dc368e714ace1c44d741d642f1fe64b7910b2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17548)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17559)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[jira] [Assigned] (HUDI-6253) Treat full bootstrap table as regular table

2023-06-01 Thread Jonathan Vexler (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-6253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Vexler reassigned HUDI-6253:
-

Assignee: Jonathan Vexler

> Treat full bootstrap table as regular table
> ---
>
> Key: HUDI-6253
> URL: https://issues.apache.org/jira/browse/HUDI-6253
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: bootstrap
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Major
>
> Bootstrap tables have a performance hit compared to regular tables. If you 
> bootstrap with full bootstrap mode, I think we should just treat the table 
> like a regular table. I think the easiest way to do this would be to prevent 
> setting bootstrap base path in the tableconfig. If that isn't possible, then 
> we could add another table config stating if it has metadata only bootstrap 
> files.




[jira] [Closed] (HUDI-5987) Clustering on bootstrap table fails when row writer is disabled

2023-06-01 Thread Jonathan Vexler (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Vexler closed HUDI-5987.
-
Resolution: Fixed

> Clustering on bootstrap table fails when row writer is disabled
> ---
>
> Key: HUDI-5987
> URL: https://issues.apache.org/jira/browse/HUDI-5987
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: bootstrap, table-service
>Reporter: Sagar Sumit
>Assignee: Jonathan Vexler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> As was pointed out in 
> [https://github.com/apache/hudi/pull/8206#pullrequestreview-1345104330], 
> clustering on a bootstrap table fails when the row writer is disabled. The non-row 
> writer path does not handle bootstrap file paths. An attempt to fix this was 
> made in [https://github.com/apache/hudi/pull/8289], but it only succeeds for 
> Spark 3.2+ versions.




[GitHub] [hudi] jonvex commented on a diff in pull request #8679: [DOCS] [RFC-69] Hudi 1.X

2023-06-01 Thread via GitHub



jonvex commented on code in PR #8679:
URL: https://github.com/apache/hudi/pull/8679#discussion_r1213427044


##
rfc/rfc-69/rfc-69.md:
##
@@ -0,0 +1,159 @@
+
+# RFC-69: Hudi 1.X
+
+## Proposers
+
+* Vinoth Chandar
+
+## Approvers
+
+*   Hudi PMC
+
+## Status
+
+Under Review
+
+## Abstract
+
+This RFC proposes an exciting and powerful re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming years. We have [grown](https://git-contributor.com/?chart=contributorOverTime&repo=apache/hudi) more than 6x contributors in the past few years, and this RFC serves as the perfect opportunity to clarify and align the community around a core vision. This RFC aims to serve as a starting point for this discussion, then solicit feedback, embrace new ideas and collaboratively build consensus towards an impactful Hudi 1.X vision, then distill down what constitutes the first release - Hudi 1.0.
+
+## **State of the Project**
+
+As many of you know, Hudi was originally created at Uber in 2016 to solve [large-scale data ingestion](https://www.uber.com/blog/uber-big-data-platform/) and [incremental data processing](https://www.uber.com/blog/ubers-lakehouse-architecture/) problems and later [donated](https://www.uber.com/blog/apache-hudi/) to the ASF.
+Since its graduation as a top-level Apache project in 2020, the community has made impressive progress toward the [streaming data lake vision](https://hudi.apache.org/blog/2021/07/21/streaming-data-lake-platform) to make data lakes more real-time and efficient with incremental processing on top of a robust set of platform components.
+The most recent 0.13 brought together several notable features to empower incremental data pipelines, including - [_RFC-51 Change Data Capture_](https://github.com/apache/hudi/blob/master/rfc/rfc-51/rfc-51.md), more advanced indexing techniques like [_consistent hash indexes_](https://github.com/apache/hudi/blob/master/rfc/rfc-42/rfc-42.md) and
+novel innovations like [_early conflict detection_](https://github.com/apache/hudi/blob/master/rfc/rfc-56/rfc-56.md) - to name a few.
+
+
+
+Today, Hudi [users](https://hudi.apache.org/powered-by) are able to solve end-to-end use cases using Hudi as a data lake platform that delivers a significant amount of automation on top of an interoperable open storage format.
+Users can ingest incrementally from files/streaming systems/databases and insert/update/delete that data into Hudi tables, with a wide selection of performant indexes.
+Thanks to the core design choices like record-level metadata and incremental/CDC queries, users are able to consistently chain the ingested data into downstream pipelines, with the help of strong stream processing support in
+recent years in frameworks like Apache Spark, Apache Flink and Kafka Connect. Hudi's table services automatically kick in across this ingested and derived data to manage different aspects of table bookkeeping, metadata and storage layout.
+Finally, Hudi's broad support for different catalogs and wide integration across various query engines mean Hudi tables can also be "batch" processed old-school style or accessed from interactive query engines.
+
+## **Future Opportunities**
+
+We're adding new capabilities in the 0.x release line, but we can also turn the core of Hudi into a more general-purpose database experience for the lake. As the first kid on the lakehouse block (we called it "transactional data lakes" or "streaming data lakes"
+to speak the warehouse users' and data engineers' languages, respectively), we made some conservative choices based on the ecosystem at that time. However, revisiting those choices is important to see if they still hold up.
+
+*   **Deep Query Engine Integrations:** Back then, query engines like Presto, Spark, Trino and Hive were getting good at queries on columnar data files but painfully hard to integrate into. Over time, we expected clear API abstractions
+around indexing/metadata/table snapshots in the parquet/orc read paths that a project like Hudi can tap into to easily leverage innovations like Velox/PrestoDB. However, most engines preferred a separate integration - leading to Hudi maintaining its own Spark Datasource,
+Presto and Trino connectors. However, this now opens up the opportunity to fully leverage Hudi's multi-modal indexing capabilities during query planning and execution.
+*   **Generalized Data Model:** While Hudi supported keys, we focused on updating Hudi tables as if they were a key-value store, while SQL queries ran on top, blissfully unchanged and unaware. Back then, generalizing the support for
+keys felt premature based on where the ecosystem was, which was still doing large batch M/R jobs. Today, more performant, advanced engines like Apache Spark and Apache Flink have mature extensible SQL support that can support a generalized,
+relational data model for Hudi

[GitHub] [hudi] hudi-bot commented on pull request #8867: [HUDI-6307] Sync TIMESTAMP_MILLIS to hive

2023-06-01 Thread via GitHub



hudi-bot commented on PR #8867:
URL: https://github.com/apache/hudi/pull/8867#issuecomment-1572398800

   
   ## CI report:
   
   * 6a8fa73c9e31a90f6249772b5b840acf42ae1df5 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17560)
 
   
   

[GitHub] [hudi] hudi-bot commented on pull request #8795: [HUDI-6258] support olap engine query mor table in table name without ro/rt suffix

2023-06-01 Thread via GitHub



hudi-bot commented on PR #8795:
URL: https://github.com/apache/hudi/pull/8795#issuecomment-1572398057

   
   ## CI report:
   
   * 130523be1324218f56ce15ddc6ac3255e7cfcd9a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17550)
 
   
   

[GitHub] [hudi] bvaradar commented on pull request #8847: [HUDI-2071] Support Reading Bootstrap MOR RT Table In Spark DataSource Table

2023-06-01 Thread via GitHub



bvaradar commented on PR #8847:
URL: https://github.com/apache/hudi/pull/8847#issuecomment-1572371115

   Sure. 



[GitHub] [hudi] jonvex commented on pull request #8847: [HUDI-2071] Support Reading Bootstrap MOR RT Table In Spark DataSource Table

2023-06-01 Thread via GitHub



jonvex commented on PR #8847:
URL: https://github.com/apache/hudi/pull/8847#issuecomment-1572364211

   @bvaradar do you think you would be able to review?



[GitHub] [hudi] gamblewin commented on issue #8855: [SUPPORT][FLINK SQL] Can not insert join result into hudi table

2023-06-01 Thread via GitHub



gamblewin commented on issue #8855:
URL: https://github.com/apache/hudi/issues/8855#issuecomment-1572347014

   https://github.com/apache/hudi/assets/39117591/2def2c5a-39bc-4bfc-9d8b-0575d3fc3119
   It seems like it doesn't trigger a checkpoint.



[GitHub] [hudi] hudi-bot commented on pull request #8795: [HUDI-6258] support olap engine query mor table in table name without ro/rt suffix

2023-06-01 Thread via GitHub



hudi-bot commented on PR #8795:
URL: https://github.com/apache/hudi/pull/8795#issuecomment-1572324712

   
   ## CI report:
   
   * 130523be1324218f56ce15ddc6ac3255e7cfcd9a UNKNOWN
   
   

[GitHub] [hudi] hudi-bot commented on pull request #8869: Added logic to correctly verify partition keys for CustomAvroKeyGen

2023-06-01 Thread via GitHub



hudi-bot commented on PR #8869:
URL: https://github.com/apache/hudi/pull/8869#issuecomment-1572325298

   
   ## CI report:
   
   * 54977785e91e2ee46baddd399a0d1889a323c612 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17566)
 
   
   

[GitHub] [hudi] hudi-bot commented on pull request #8869: Added logic to correctly verify partition keys for CustomAvroKeyGen

2023-06-01 Thread via GitHub



hudi-bot commented on PR #8869:
URL: https://github.com/apache/hudi/pull/8869#issuecomment-1572312116

   
   ## CI report:
   
   * 54977785e91e2ee46baddd399a0d1889a323c612 UNKNOWN
   
   

[GitHub] [hudi] hudi-bot commented on pull request #8452: [HUDI-6077] Add more partition push down filters

2023-06-01 Thread via GitHub



hudi-bot commented on PR #8452:
URL: https://github.com/apache/hudi/pull/8452#issuecomment-1572310179

   
   ## CI report:
   
   * 8082df232089396b2a9f9be2b915e51b3645f172 UNKNOWN
   * 9e5504e078b93d1997cf901868234e36c69dd97e Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17558)
 
   
   

[GitHub] [hudi] cbts-alec-johnson commented on issue #8857: [SUPPORT] Column comments not syncing to AWS Glue Catalog

2023-06-01 Thread via GitHub



cbts-alec-johnson commented on issue #8857:
URL: https://github.com/apache/hudi/issues/8857#issuecomment-1572291524

   > Guess this is what you needed: 
https://github.com/apache/hudi/pull/8740/files
   
   Yes, this is what I need. Also, I think you may have tagged this gcp-support 
instead of aws-support?



[GitHub] [hudi] ad1happy2go opened a new pull request, #8869: Added logic to correctly verify partition keys for CustomAvroKeyGen

2023-06-01 Thread via GitHub



ad1happy2go opened a new pull request, #8869:
URL: https://github.com/apache/hudi/pull/8869

   ### Change Logs
   
   Added logic to correctly verify partition keys for CustomAvroKeyGenerator.
   Fixes GitHub issue https://github.com/apache/hudi/issues/8372
   
   ### Impact
   
   none
   
   ### Risk level (write none, low medium or high below)
   
   none
   
   ### Documentation Update
   
   none
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   



[GitHub] [hudi] bkosuru commented on issue #8800: GCP: Hudi stopped working in Dataproc Serverless 1.1.4

2023-06-01 Thread via GitHub



bkosuru commented on issue #8800:
URL: https://github.com/apache/hudi/issues/8800#issuecomment-1572227348

   It started working with Hudi 0.13.1. Did you fix anything in 0.13.1 to make 
it work?



[GitHub] [hudi] machadoluiz commented on issue #8824: [SUPPORT] Performance and Data Integrity Issues with Hudi for Long-Term Data Retention

2023-06-01 Thread via GitHub



machadoluiz commented on issue #8824:
URL: https://github.com/apache/hudi/issues/8824#issuecomment-1572195137

   @ad1happy2go, the runtime increase happens gradually. In one specific example, it reached 2 minutes and 30 seconds at around 300 commits (about 10 months). This poses a challenge for us, given that it represents less than a year's worth of data. Is there any way to improve this performance, or is this a trade-off we must deal with?
   
   Does Hudi perform operations using actual data or just metadata in the background?
   
   Does this mean that if we expand the size of the database, the cost/runtime of managing the metadata will increase proportionally? Or is this related only to the filenames, in which case the cost will be roughly constant, regardless of the size of the database?



[GitHub] [hudi] hudi-bot commented on pull request #8866: [HUDI-6293] Make HoodieClusteringJob's parallelism of clustering_task…

2023-06-01 Thread via GitHub



hudi-bot commented on PR #8866:
URL: https://github.com/apache/hudi/pull/8866#issuecomment-1572191506

   
   ## CI report:
   
   * badb098e6bd6b0ee8b317514f08eb460659a8d93 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17557)
 
   
   

[GitHub] [hudi] hudi-bot commented on pull request #8745: [HUDI-6182] Hive sync use state transient time to avoid losing partit…

2023-06-01 Thread via GitHub



hudi-bot commented on PR #8745:
URL: https://github.com/apache/hudi/pull/8745#issuecomment-1572190692

   
   ## CI report:
   
   * 62377696531fc1d4ee2b7c0c86897d1cfb6b5de9 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17556)
 
   
   

[GitHub] [hudi] hudi-bot commented on pull request #8865: [HUDI-6306] dynamic catalog parameter

2023-06-01 Thread via GitHub



hudi-bot commented on PR #8865:
URL: https://github.com/apache/hudi/pull/8865#issuecomment-1572082961

   
   ## CI report:
   
   * 821e287f35e93974ae28f1e1e7a513c68749c281 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17554)
 
   
   

[GitHub] [hudi] stream2000 commented on pull request #8745: [HUDI-6182] Hive sync use state transient time to avoid losing partit…

2023-06-01 Thread via GitHub



stream2000 commented on PR #8745:
URL: https://github.com/apache/hudi/pull/8745#issuecomment-1572050050

   @danny0405 Hi Danny, could you help review this PR?



[GitHub] [hudi] xiarixiaoyao commented on a diff in pull request #8851: [HUDI-6281] Comprehensive schema evolution supports column change with a default value

2023-06-01 Thread via GitHub



xiarixiaoyao commented on code in PR #8851:
URL: https://github.com/apache/hudi/pull/8851#discussion_r1211598407


##
hudi-spark-datasource/hudi-spark3.0.x/src/main/java/org/apache/spark/sql/execution/datasources/parquet/Spark30HoodieVectorizedParquetRecordReader.java:
##
@@ -184,4 +209,62 @@ public boolean nextKeyValue() throws IOException {
 ++batchIdx;
 return true;
   }
+
+  private void initializeInternal() throws IOException, UnsupportedOperationException {
+    // Check that the requested schema is supported.
+    missingColumns = new HashMap<>();
+    List<ColumnDescriptor> columns = requestedSchema.getColumns();
+    List<String[]> paths = requestedSchema.getPaths();
+    for (int i = 0; i < requestedSchema.getFieldCount(); ++i) {
+      String[] colPath = paths.get(i);
+      if (!fileSchema.containsPath(colPath)) {
+        if (columns.get(i).getMaxDefinitionLevel() == 0) {
+          // Column is missing in data but the required data is non-nullable. This file is invalid.
+          throw new IOException("Required column is missing in data file. Col: " + Arrays.toString(colPath));
+        }
+        missingColumns.put(i, requestedSchema.getFields().get(i).getName());
+      }
+    }
+    missed = schema != null && missingColumns.keySet().stream()
+        .allMatch(columnIndex -> Objects.nonNull(schema.findField(missingColumns.get(columnIndex)).getDefaultValue()));
+  }
+
+  private void setColumnDefaultValue(int columnIndex) {

Review Comment:
   ditto



##
hudi-spark-datasource/hudi-spark3.0.x/src/main/java/org/apache/spark/sql/execution/datasources/parquet/Spark30HoodieVectorizedParquetRecordReader.java:
##
@@ -184,4 +209,62 @@ public boolean nextKeyValue() throws IOException {
 ++batchIdx;
 return true;
   }
+
+  private void initializeInternal() throws IOException, UnsupportedOperationException {

Review Comment:
   Please extract this method so that it can be reused by the parquet record readers for the other Spark versions.
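
   One possible reading of this suggestion, as a minimal sketch and not the PR's actual code: move the missing-column check into a static helper that each version-specific reader calls. The class `MissingColumnHelper`, the method `findMissingColumns`, and its plain-Java parameters (standing in for Parquet's `MessageType` and `ColumnDescriptor`) are hypothetical names introduced only for illustration.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical shared helper; not part of Hudi or Spark.
final class MissingColumnHelper {
  private MissingColumnHelper() {
  }

  /**
   * Returns requested-column index -> field name for every requested column
   * that the data file does not contain.
   *
   * @param paths               column paths of the requested schema
   * @param fieldNames          field names of the requested schema, in the same order as paths
   * @param maxDefinitionLevels max definition level per requested column (0 means required)
   * @param fileContainsPath    stand-in for fileSchema.containsPath(colPath)
   */
  static Map<Integer, String> findMissingColumns(
      List<String[]> paths,
      List<String> fieldNames,
      int[] maxDefinitionLevels,
      Predicate<String[]> fileContainsPath) throws IOException {
    Map<Integer, String> missing = new HashMap<>();
    for (int i = 0; i < paths.size(); i++) {
      String[] colPath = paths.get(i);
      if (fileContainsPath.test(colPath)) {
        continue;
      }
      if (maxDefinitionLevels[i] == 0) {
        // A required column cannot be filled with a default, so the file is invalid.
        throw new IOException("Required column is missing in data file. Col: " + Arrays.toString(colPath));
      }
      missing.put(i, fieldNames.get(i));
    }
    return missing;
  }
}
```

   Each reader would then keep only its version-specific wiring and pass its own schema accessors into the helper, so the validation logic lives in one place.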



##
hudi-common/src/main/java/org/apache/hudi/internal/schema/utils/SerDeHelper.java:
##
@@ -144,6 +148,11 @@ private static void toJson(Types.RecordType record, Integer maxColumnId, Long ve
       if (field.doc() != null) {
         generator.writeStringField(DOC, field.doc());
       }
+      // NOTE: The value of null field is JsonProperties.NULL_VALUE.
+      if (field.getDefaultValue() != null && field.getDefaultValue() != JsonProperties.NULL_VALUE) {
+        generator.writeFieldName(DEFAULT_VALUE);
+        generator.writeObject(field.getDefaultValue());

Review Comment:
   Should we verify that the defaultValue matches the field type before serialization, rather than serializing it directly? For example:
   private static void writeDefaultValue(Types.Field field, JsonGenerator generator) throws IOException {
     if (field.getDefaultValue() == null || field.getDefaultValue() == JsonProperties.NULL_VALUE) {
       return;
     }
     switch (field.type().typeId()) {
       case RECORD:
       case ARRAY:
       case MAP:
         JsonNode defaultNode = JacksonUtils.toJsonNode(field.getDefaultValue());
         generator.writeObjectField(DEFAULT_VALUE, defaultNode);
         break;
       case STRING:
         generator.writeStringField(DEFAULT_VALUE, field.getDefaultValue().toString());
         break;
       case INT:
         generator.writeNumberField(DEFAULT_VALUE, Integer.valueOf(field.getDefaultValue().toString()));
         break;
       case LONG:
         generator.writeNumberField(DEFAULT_VALUE, Long.valueOf(field.getDefaultValue().toString()));
         break;
       case FLOAT:
         generator.writeNumberField(DEFAULT_VALUE, Double.valueOf(field.getDefaultValue().toString()));
         break;
       case DOUBLE:
         generator.writeNumberField(DEFAULT_VALUE, Double.valueOf(field.getDefaultValue().toString()));
         break;
       case BOOLEAN:
         generator.writeBooleanField(DEFAULT_VALUE, Boolean.valueOf(field.getDefaultValue().toString()));
         break;
       case DECIMAL:
         generator.writeBinaryField(DEFAULT_VALUE, (byte[]) field.getDefaultValue());
         break;
       case FIXED:
         generator.writeBinaryField(DEFAULT_VALUE, (byte[]) field.getDefaultValue());
         break;
       case BINARY:
         generator.writeBinaryField(DEFAULT_VALUE, (byte[]) field.getDefaultValue());
         break;
       case DATE:
         generator.writeNumberField(DEFAULT_VALUE, Integer.valueOf(field.getDefaultValue().toString()));
         break;
       case TIME:
         generator.writeNumberField(DEFAULT_VALUE, Long.valueOf(field.getDefaultValue().toString()));
         break;
       case TIMESTAMP:
         generator.writeNumberField(DEFAULT_VALUE, Long.valueOf(field.getDefaultValue().toString()));
         break;
       case UUID:
         generator.writeStringField(DEFAULT_VALUE,
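
   The kind of type check asked about in this comment could be sketched roughly as follows. This is only an illustration under stated assumptions: `DefaultValueValidator`, its nested `TypeId` enum, and `isCompatible` are hypothetical stand-ins rather than Hudi's `Types` API, and the mapping from each type to an accepted Java class is assumed, not taken from the PR.

```java
// Hypothetical validator; the names and the type-to-class mapping are assumptions.
final class DefaultValueValidator {
  // Simplified stand-in for the schema's type ids.
  enum TypeId { BOOLEAN, INT, LONG, FLOAT, DOUBLE, STRING, BINARY }

  private DefaultValueValidator() {
  }

  /** Returns true when the default value can be written for the given type. */
  static boolean isCompatible(TypeId typeId, Object value) {
    if (value == null) {
      // No default value is configured, so there is nothing to validate.
      return true;
    }
    switch (typeId) {
      case BOOLEAN:
        return value instanceof Boolean;
      case INT:
        return value instanceof Integer;
      case LONG:
        // An int default widens to long without loss.
        return value instanceof Long || value instanceof Integer;
      case FLOAT:
      case DOUBLE:
        return value instanceof Number;
      case STRING:
        return value instanceof CharSequence;
      case BINARY:
        return value instanceof byte[];
      default:
        return false;
    }
  }
}
```

   A serializer could call such a check first and fail fast on a mismatch, instead of relying on `toString()` parsing or a cast to raise an error midway through writing the JSON.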

[GitHub] [hudi] hudi-bot commented on pull request #8856: [HUDI-6300] fix file size parallelism not work when init metadata table

2023-06-01 Thread via GitHub



hudi-bot commented on PR #8856:
URL: https://github.com/apache/hudi/pull/8856#issuecomment-1572012983

   
   ## CI report:
   
   * 23a574b64681c95c17db47d4c63c86d7e0215ba9 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17528)
   * 2d4e285ba5ef3c5b07ec91af6ab3a2669d2b485d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17565)
 
   
   

[GitHub] [hudi] hudi-bot commented on pull request #8574: [HUDI-6139] Add support for Transformer schema validation in DeltaStreamer

2023-06-01 Thread via GitHub



hudi-bot commented on PR #8574:
URL: https://github.com/apache/hudi/pull/8574#issuecomment-1572011208

   
   ## CI report:
   
   * dacba722974aa32f506626c106d90fa86d22cd23 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17375)
   * f71ca7ad4339c60719c97f3d54339b6a7bd5205f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17564)
 
   
   

[GitHub] [hudi] hudi-bot commented on pull request #8856: [HUDI-6300] fix file size parallelism not work when init metadata table

2023-06-01 Thread via GitHub



hudi-bot commented on PR #8856:
URL: https://github.com/apache/hudi/pull/8856#issuecomment-1571998458

   
   ## CI report:
   
   * 23a574b64681c95c17db47d4c63c86d7e0215ba9 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17528)
 
   * 2d4e285ba5ef3c5b07ec91af6ab3a2669d2b485d UNKNOWN
   
   

[GitHub] [hudi] hudi-bot commented on pull request #8574: [HUDI-6139] Add support for Transformer schema validation in DeltaStreamer

2023-06-01 Thread via GitHub



hudi-bot commented on PR #8574:
URL: https://github.com/apache/hudi/pull/8574#issuecomment-1571996984

   
   ## CI report:
   
   * dacba722974aa32f506626c106d90fa86d22cd23 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17375)
 
   * f71ca7ad4339c60719c97f3d54339b6a7bd5205f UNKNOWN
   
   

[GitHub] [hudi] hudi-bot commented on pull request #8487: [HUDI-6093] Use the correct partitionToReplacedFileIds during commit.

2023-06-01 Thread via GitHub



hudi-bot commented on PR #8487:
URL: https://github.com/apache/hudi/pull/8487#issuecomment-1571996647

   
   ## CI report:
   
   * 280515ea1c939f0afa7a4cd8a5593e55bd394648 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17553)
 
   
   

[GitHub] [hudi] KnightChess commented on pull request #8856: [HUDI-6300] fix file size parallelism not work when init metadata table

2023-06-01 Thread via GitHub



KnightChess commented on PR #8856:
URL: https://github.com/apache/hudi/pull/8856#issuecomment-1571974348

   before:
   https://github.com/apache/hudi/assets/20125927/9dc94884-347f-4284-8c9c-58d38a6c936a
   
   after:
   https://github.com/apache/hudi/assets/20125927/12f6bc24-42fe-4ee4-83ef-929e117331a2



[GitHub] [hudi] hudi-bot commented on pull request #8868: [HUDI-6278] Fixed the use of DynamoDBLockConfig class

2023-06-01 Thread via GitHub



hudi-bot commented on PR #8868:
URL: https://github.com/apache/hudi/pull/8868#issuecomment-1571922230

   
   ## CI report:
   
   * b6f86c770f7e35d7488cff0066d2d760453eb931 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17563)
 
   
   

[GitHub] [hudi] hudi-bot commented on pull request #8851: [HUDI-6281] Comprehensive schema evolution supports column change with a default value

2023-06-01 Thread via GitHub



hudi-bot commented on PR #8851:
URL: https://github.com/apache/hudi/pull/8851#issuecomment-1571921986

   
   ## CI report:
   
   * 2db6852dd391973eab275dc7ef70c02bfbc5f652 UNKNOWN
   * 60c1399ac012bc61421f3bb1feb208decbcb6b6a UNKNOWN
   * e712d534d9c0a16b3027706ed394de88ff2b293d Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17555)
   * 0328e76358dd170d62b94fd286a9ffb728516429 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17562)
 
   
   
