[GitHub] [hudi] hudi-bot commented on pull request #8795: [HUDI-6258] support olap engine query mor table in table name without ro/rt suffix

2023-05-31 Thread via GitHub


hudi-bot commented on PR #8795:
URL: https://github.com/apache/hudi/pull/8795#issuecomment-1571396063

   
   ## CI report:
   
   * fa6a550cbe4c52a22314796d4926a34c04d0e6cc Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17541)
 
   * 130523be1324218f56ce15ddc6ac3255e7cfcd9a UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8851: [HUDI-6281] Comprehensive schema evolution supports column change with a default value

2023-05-31 Thread via GitHub


hudi-bot commented on PR #8851:
URL: https://github.com/apache/hudi/pull/8851#issuecomment-1571390186

   
   ## CI report:
   
   * 2db6852dd391973eab275dc7ef70c02bfbc5f652 UNKNOWN
   * 60c1399ac012bc61421f3bb1feb208decbcb6b6a UNKNOWN
   * be4650d94edd3eca1a15a956b6a60e2b8ec5daca Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17542)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] marsupialtail commented on pull request #8768: [HUDI-1407] Basic python reader for Hudi

2023-05-31 Thread via GitHub


marsupialtail commented on PR #8768:
URL: https://github.com/apache/hudi/pull/8768#issuecomment-1571389826

   Any word on when writers are coming?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] danny0405 commented on pull request #8851: [HUDI-6281] Comprehensive schema evolution supports column change with a default value

2023-05-31 Thread via GitHub


danny0405 commented on PR #8851:
URL: https://github.com/apache/hudi/pull/8851#issuecomment-1571366756

   cc @xiarixiaoyao Do you have chance to take a look at this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] danny0405 commented on a diff in pull request #8859: set operation to bulk insert on fresh table

2023-05-31 Thread via GitHub


danny0405 commented on code in PR #8859:
URL: https://github.com/apache/hudi/pull/8859#discussion_r1212585549


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/ProvidesHoodieConfig.scala:
##
@@ -111,8 +111,13 @@ trait ProvidesHoodieConfig extends Logging {
 val tableType = hoodieCatalogTable.tableTypeName
 val tableConfig = hoodieCatalogTable.tableConfig
 
-val combinedOpts: Map[String, String] = combineOptions(hoodieCatalogTable, 
tableConfig, sparkSession.sqlContext.conf,
+var combinedOpts: Map[String, String] = combineOptions(hoodieCatalogTable, 
tableConfig, sparkSession.sqlContext.conf,
   defaultOpts = Map.empty, overridingOpts = extraOptions)
+if 
(!combinedOpts.getOrElse(INSERT_DROP_DUPS.key(),INSERT_DROP_DUPS.defaultValue()).toBoolean
 &&

Review Comment:
   ```suggestion
   if (!combinedOpts.getOrElse(INSERT_DROP_DUPS.key(), 
INSERT_DROP_DUPS.defaultValue()).toBoolean &&
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] danny0405 commented on a diff in pull request #8858: [HUDI-6301][UBER] Support SqlFileBasedSource for DeltaStreamer

2023-05-31 Thread via GitHub


danny0405 commented on code in PR #8858:
URL: https://github.com/apache/hudi/pull/8858#discussion_r1212583839


##
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/SqlFileBasedSource.java:
##
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.sources;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+
+import org.apache.hudi.DataSourceUtils;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.fs.FSUtils;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.collection.Pair;
+import org.apache.hudi.exception.HoodieIOException;
+import org.apache.hudi.utilities.schema.SchemaProvider;
+
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SparkSession;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.util.Collections;
+import java.util.Scanner;
+
+/**
+ * File-based SQL Source that uses SQL queries in a file to read from any 
table.
+ *
+ * SQL file path should be configured using this hoodie config:
+ *
+ * hoodie.deltastreamer.source.sql.file = 'hdfs://xxx/source.sql'
+ *
+ * File-based SQL Source is used for one time backfill scenarios, this 
won't update the deltastreamer.checkpoint.key
+ * to the processed commit, instead it will fetch the latest successful 
checkpoint key and set that value as
+ * this backfill commits checkpoint so that it won't interrupt the regular 
incremental processing.
+ *
+ * To fetch and use the latest incremental checkpoint, you need to also set 
this hoodie_conf for deltastremer jobs:
+ *
+ * hoodie.write.meta.key.prefixes = 'deltastreamer.checkpoint.key'
+ */
+public class SqlFileBasedSource extends RowSource {
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(SqlFileBasedSource.class);
+  private final String sourceSqlFile;
+  private final boolean shouldEmitCheckPoint;
+
+  public SqlFileBasedSource(
+  TypedProperties props,
+  JavaSparkContext sparkContext,
+  SparkSession sparkSession,
+  SchemaProvider schemaProvider) {
+super(props, sparkContext, sparkSession, schemaProvider);
+DataSourceUtils.checkRequiredProperties(
+props, 
Collections.singletonList(SqlFileBasedSource.Config.SOURCE_SQL_FILE));
+sourceSqlFile = props.getString(SqlFileBasedSource.Config.SOURCE_SQL_FILE);
+shouldEmitCheckPoint = props.getBoolean(Config.EMIT_EPOCH_CHECKPOINT, 
false);
+  }
+
+  @Override
+  protected Pair>, String> fetchNextBatch(
+  Option lastCkptStr, long sourceLimit) {
+Dataset rows = null;
+final FileSystem fs = FSUtils.getFs(sourceSqlFile, 
sparkContext.hadoopConfiguration(), true);
+try {
+  final Scanner scanner = new Scanner(fs.open(new Path(sourceSqlFile)));
+  scanner.useDelimiter(";");
+  while (scanner.hasNext()) {
+String sqlStr = scanner.next().trim();
+if (!sqlStr.isEmpty()) {
+  LOG.info(sqlStr);
+  // overwrite the same dataset object until the last statement then 
return.
+  rows = sparkSession.sql(sqlStr);
+}
+  }
+  return Pair.of(Option.of(rows), shouldEmitCheckPoint ? 
String.valueOf(System.currentTimeMillis()) : null);
+} catch (IOException ioe) {
+  throw new HoodieIOException("Error reading source SQL file.", ioe);
+}
+  }
+
+  /**
+   * Configs supported.
+   */
+  private static class Config {
+private static final String SOURCE_SQL_FILE = 
"hoodie.deltastreamer.source.sql.file";
+private static final String EMIT_EPOCH_CHECKPOINT = 
"hoodie.deltastreamer.source.sql.checkpoint.emit";

Review Comment:
   Should we make these keys public?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] danny0405 commented on a diff in pull request #8858: [HUDI-6301][UBER] Support SqlFileBasedSource for DeltaStreamer

2023-05-31 Thread via GitHub


danny0405 commented on code in PR #8858:
URL: https://github.com/apache/hudi/pull/8858#discussion_r1212583636


##
hudi-utilities/src/test/java/org/apache/hudi/utilities/deltastreamer/TestHoodieDeltaStreamer.java:
##
@@ -2325,7 +2325,7 @@ public void 
testCsvDFSSourceNoHeaderWithoutSchemaProviderAndWithTransformer() th
   public void testCsvDFSSourceNoHeaderWithSchemaProviderAndTransformer() 
throws Exception {
 // The CSV files do not have header, the columns are separated by '\t'
 // File schema provider is used, transformer is applied
-// In this case, the source and target schema come from the Avro schema 
files
+// In this case, the source and target schema come from the Avro schema 
filesTestSqlFileBasedSour
 testCsvDFSSource(false, '\t', true, 
Collections.singletonList(TripsWithDistanceTransformer.class.getName()));

Review Comment:
   unnecessary change?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] danny0405 commented on a diff in pull request #8783: [HUDI-5569] Maintain commit timeline even in case of long standing inflights

2023-05-31 Thread via GitHub


danny0405 commented on code in PR #8783:
URL: https://github.com/apache/hudi/pull/8783#discussion_r1212579733


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/HoodieTimelineArchiver.java:
##
@@ -460,19 +465,12 @@ private Stream 
getCommitInstantsToArchive() throws IOException {
   return !(firstSavepoint.isPresent() && 
compareTimestamps(firstSavepoint.get().getTimestamp(), LESSER_THAN_OR_EQUALS, 
s.getTimestamp()));
 }
   }).filter(s -> {
-// Ensure commits >= the oldest pending compaction/replace commit 
is retained
-return oldestPendingCompactionAndReplaceInstant
+// oldestCommitToRetain is the highest completed commit instant 
that is less than the oldest inflight instant.
+// By filter out any commit >= oldestCommitToRetain, we can ensure 
there are no gaps in the timeline
+// when inflight commits are present.
+return oldestCommitToRetain

Review Comment:
   Yeah, it can fixes https://github.com/apache/hudi/pull/7738, although a 
little obscure to understand.



##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/HoodieTimelineArchiver.java:
##
@@ -460,19 +465,12 @@ private Stream 
getCommitInstantsToArchive() throws IOException {
   return !(firstSavepoint.isPresent() && 
compareTimestamps(firstSavepoint.get().getTimestamp(), LESSER_THAN_OR_EQUALS, 
s.getTimestamp()));
 }
   }).filter(s -> {
-// Ensure commits >= the oldest pending compaction/replace commit 
is retained
-return oldestPendingCompactionAndReplaceInstant
+// oldestCommitToRetain is the highest completed commit instant 
that is less than the oldest inflight instant.
+// By filter out any commit >= oldestCommitToRetain, we can ensure 
there are no gaps in the timeline
+// when inflight commits are present.
+return oldestCommitToRetain

Review Comment:
   Yeah, it can fix https://github.com/apache/hudi/pull/7738, although a little 
obscure to understand.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] SteNicholas commented on a diff in pull request #8437: [HUDI-6066] HoodieTableSource supports parquet predicate push down

2023-05-31 Thread via GitHub


SteNicholas commented on code in PR #8437:
URL: https://github.com/apache/hudi/pull/8437#discussion_r1212572127


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/ExpressionPredicates.java:
##
@@ -0,0 +1,654 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.source;
+
+import org.apache.flink.table.expressions.CallExpression;
+import org.apache.flink.table.expressions.Expression;
+import org.apache.flink.table.expressions.FieldReferenceExpression;
+import org.apache.flink.table.expressions.ResolvedExpression;
+import org.apache.flink.table.expressions.ValueLiteralExpression;
+import org.apache.flink.table.functions.BuiltInFunctionDefinitions;
+import org.apache.flink.table.functions.FunctionDefinition;
+import org.apache.flink.table.types.logical.LogicalType;
+import org.apache.parquet.filter2.predicate.FilterPredicate;
+import org.apache.parquet.filter2.predicate.Operators;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.Serializable;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Objects;
+import java.util.stream.Collectors;
+import java.util.stream.IntStream;
+
+import static org.apache.hudi.common.util.ValidationUtils.checkState;
+import static org.apache.hudi.util.ExpressionUtils.getValueFromLiteral;
+import static org.apache.parquet.filter2.predicate.FilterApi.and;
+import static org.apache.parquet.filter2.predicate.FilterApi.binaryColumn;
+import static org.apache.parquet.filter2.predicate.FilterApi.booleanColumn;
+import static org.apache.parquet.filter2.predicate.FilterApi.doubleColumn;
+import static org.apache.parquet.filter2.predicate.FilterApi.eq;
+import static org.apache.parquet.filter2.predicate.FilterApi.floatColumn;
+import static org.apache.parquet.filter2.predicate.FilterApi.gt;
+import static org.apache.parquet.filter2.predicate.FilterApi.gtEq;
+import static org.apache.parquet.filter2.predicate.FilterApi.intColumn;
+import static org.apache.parquet.filter2.predicate.FilterApi.longColumn;
+import static org.apache.parquet.filter2.predicate.FilterApi.lt;
+import static org.apache.parquet.filter2.predicate.FilterApi.ltEq;
+import static org.apache.parquet.filter2.predicate.FilterApi.not;
+import static org.apache.parquet.filter2.predicate.FilterApi.notEq;
+import static org.apache.parquet.filter2.predicate.FilterApi.or;
+import static org.apache.parquet.io.api.Binary.fromConstantByteArray;
+import static org.apache.parquet.io.api.Binary.fromString;
+
+/**
+ * Tool to predicate the {@link 
org.apache.flink.table.expressions.ResolvedExpression}s.
+ */
+public class ExpressionPredicates {

Review Comment:
   @XuQianJin-Stars, why implement `ExpressionVisitor` interface? 
`ExpressionEvaluators` doesn't also implement this interface.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8859: set operation to bulk insert on fresh table

2023-05-31 Thread via GitHub


hudi-bot commented on PR #8859:
URL: https://github.com/apache/hudi/pull/8859#issuecomment-1571341385

   
   ## CI report:
   
   * 0781a5259d9432c5264f642e392b27ca5eadf6fd Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17547)
 
   * 8f4d284eef389ff420b473b9a721212de46302da Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17549)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] SteNicholas commented on a diff in pull request #8437: [HUDI-6066] HoodieTableSource supports parquet predicate push down

2023-05-31 Thread via GitHub


SteNicholas commented on code in PR #8437:
URL: https://github.com/apache/hudi/pull/8437#discussion_r1212572127


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/ExpressionPredicates.java:
##
@@ -0,0 +1,654 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.source;
+
+import org.apache.flink.table.expressions.CallExpression;
+import org.apache.flink.table.expressions.Expression;
+import org.apache.flink.table.expressions.FieldReferenceExpression;
+import org.apache.flink.table.expressions.ResolvedExpression;
+import org.apache.flink.table.expressions.ValueLiteralExpression;
+import org.apache.flink.table.functions.BuiltInFunctionDefinitions;
+import org.apache.flink.table.functions.FunctionDefinition;
+import org.apache.flink.table.types.logical.LogicalType;
+import org.apache.parquet.filter2.predicate.FilterPredicate;
+import org.apache.parquet.filter2.predicate.Operators;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.Serializable;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Objects;
+import java.util.stream.Collectors;
+import java.util.stream.IntStream;
+
+import static org.apache.hudi.common.util.ValidationUtils.checkState;
+import static org.apache.hudi.util.ExpressionUtils.getValueFromLiteral;
+import static org.apache.parquet.filter2.predicate.FilterApi.and;
+import static org.apache.parquet.filter2.predicate.FilterApi.binaryColumn;
+import static org.apache.parquet.filter2.predicate.FilterApi.booleanColumn;
+import static org.apache.parquet.filter2.predicate.FilterApi.doubleColumn;
+import static org.apache.parquet.filter2.predicate.FilterApi.eq;
+import static org.apache.parquet.filter2.predicate.FilterApi.floatColumn;
+import static org.apache.parquet.filter2.predicate.FilterApi.gt;
+import static org.apache.parquet.filter2.predicate.FilterApi.gtEq;
+import static org.apache.parquet.filter2.predicate.FilterApi.intColumn;
+import static org.apache.parquet.filter2.predicate.FilterApi.longColumn;
+import static org.apache.parquet.filter2.predicate.FilterApi.lt;
+import static org.apache.parquet.filter2.predicate.FilterApi.ltEq;
+import static org.apache.parquet.filter2.predicate.FilterApi.not;
+import static org.apache.parquet.filter2.predicate.FilterApi.notEq;
+import static org.apache.parquet.filter2.predicate.FilterApi.or;
+import static org.apache.parquet.io.api.Binary.fromConstantByteArray;
+import static org.apache.parquet.io.api.Binary.fromString;
+
+/**
+ * Tool to predicate the {@link 
org.apache.flink.table.expressions.ResolvedExpression}s.
+ */
+public class ExpressionPredicates {

Review Comment:
   @XuQianJin-Stars, why implement `ExpressionVisitor` interface? 
`ExpressionEvoluators` doesn't also implement this interface.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8452: [HUDI-6077] Add more partition push down filters

2023-05-31 Thread via GitHub


hudi-bot commented on PR #8452:
URL: https://github.com/apache/hudi/pull/8452#issuecomment-1571340159

   
   ## CI report:
   
   * 8082df232089396b2a9f9be2b915e51b3645f172 UNKNOWN
   * 55ece2d5ee0d752f48d7c33cd6b8235b5ffacac5 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17540)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] SteNicholas commented on a diff in pull request #8437: [HUDI-6066] HoodieTableSource supports parquet predicate push down

2023-05-31 Thread via GitHub


SteNicholas commented on code in PR #8437:
URL: https://github.com/apache/hudi/pull/8437#discussion_r1212572127


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/ExpressionPredicates.java:
##
@@ -0,0 +1,654 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.source;
+
+import org.apache.flink.table.expressions.CallExpression;
+import org.apache.flink.table.expressions.Expression;
+import org.apache.flink.table.expressions.FieldReferenceExpression;
+import org.apache.flink.table.expressions.ResolvedExpression;
+import org.apache.flink.table.expressions.ValueLiteralExpression;
+import org.apache.flink.table.functions.BuiltInFunctionDefinitions;
+import org.apache.flink.table.functions.FunctionDefinition;
+import org.apache.flink.table.types.logical.LogicalType;
+import org.apache.parquet.filter2.predicate.FilterPredicate;
+import org.apache.parquet.filter2.predicate.Operators;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.Serializable;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Objects;
+import java.util.stream.Collectors;
+import java.util.stream.IntStream;
+
+import static org.apache.hudi.common.util.ValidationUtils.checkState;
+import static org.apache.hudi.util.ExpressionUtils.getValueFromLiteral;
+import static org.apache.parquet.filter2.predicate.FilterApi.and;
+import static org.apache.parquet.filter2.predicate.FilterApi.binaryColumn;
+import static org.apache.parquet.filter2.predicate.FilterApi.booleanColumn;
+import static org.apache.parquet.filter2.predicate.FilterApi.doubleColumn;
+import static org.apache.parquet.filter2.predicate.FilterApi.eq;
+import static org.apache.parquet.filter2.predicate.FilterApi.floatColumn;
+import static org.apache.parquet.filter2.predicate.FilterApi.gt;
+import static org.apache.parquet.filter2.predicate.FilterApi.gtEq;
+import static org.apache.parquet.filter2.predicate.FilterApi.intColumn;
+import static org.apache.parquet.filter2.predicate.FilterApi.longColumn;
+import static org.apache.parquet.filter2.predicate.FilterApi.lt;
+import static org.apache.parquet.filter2.predicate.FilterApi.ltEq;
+import static org.apache.parquet.filter2.predicate.FilterApi.not;
+import static org.apache.parquet.filter2.predicate.FilterApi.notEq;
+import static org.apache.parquet.filter2.predicate.FilterApi.or;
+import static org.apache.parquet.io.api.Binary.fromConstantByteArray;
+import static org.apache.parquet.io.api.Binary.fromString;
+
+/**
+ * Tool to predicate the {@link 
org.apache.flink.table.expressions.ResolvedExpression}s.
+ */
+public class ExpressionPredicates {

Review Comment:
   Why implement `ExpressionVisitor` interface?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8859: set operation to bulk insert on fresh table

2023-05-31 Thread via GitHub


hudi-bot commented on PR #8859:
URL: https://github.com/apache/hudi/pull/8859#issuecomment-1571335287

   
   ## CI report:
   
   * 641c66d64597d8d83f6234172c3d3dc2c8bc358e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17538)
 
   * 0781a5259d9432c5264f642e392b27ca5eadf6fd Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17547)
 
   * 8f4d284eef389ff420b473b9a721212de46302da UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8792: [HUDI-6256] Fix the data table archiving and MDT cleaning config conf…

2023-05-31 Thread via GitHub


hudi-bot commented on PR #8792:
URL: https://github.com/apache/hudi/pull/8792#issuecomment-1571335091

   
   ## CI report:
   
   * f07a16ccf74a657ff312c5706d617fdaac23a752 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17546)
 
   * 683dc368e714ace1c44d741d642f1fe64b7910b2 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17548)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8795: [HUDI-6258] support olap engine query mor table in table name without ro/rt suffix

2023-05-31 Thread via GitHub


hudi-bot commented on PR #8795:
URL: https://github.com/apache/hudi/pull/8795#issuecomment-1571330406

   
   ## CI report:
   
   * fa6a550cbe4c52a22314796d4926a34c04d0e6cc Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17541)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8792: [HUDI-6256] Fix the data table archiving and MDT cleaning config conf…

2023-05-31 Thread via GitHub


hudi-bot commented on PR #8792:
URL: https://github.com/apache/hudi/pull/8792#issuecomment-1571330366

   
   ## CI report:
   
   * 14dd620516bf89c288bf437aec74e08ee3da27a3 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17454)
 
   * f07a16ccf74a657ff312c5706d617fdaac23a752 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17546)
 
   * 683dc368e714ace1c44d741d642f1fe64b7910b2 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] danny0405 commented on issue #8855: [SUPPORT][FLINK SQL] Can not insert join result into hudi table

2023-05-31 Thread via GitHub


danny0405 commented on issue #8855:
URL: https://github.com/apache/hudi/issues/8855#issuecomment-1571318663

   The stream writer has no outputs, but it would flush the records to Hudi 
table anyway. Does your job checkpoint proceeds normally?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (HUDI-6304) [UBER] Rollback enhancements

2023-05-31 Thread Danny Chen (Jira)
Danny Chen created HUDI-6304:


 Summary: [UBER] Rollback enhancements
 Key: HUDI-6304
 URL: https://issues.apache.org/jira/browse/HUDI-6304
 Project: Apache Hudi
  Issue Type: Improvement
  Components: writer-core
Reporter: Danny Chen






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] danny0405 commented on a diff in pull request #8860: Fix spark sql core flow

2023-05-31 Thread via GitHub


danny0405 commented on code in PR #8860:
URL: https://github.com/apache/hudi/pull/8860#discussion_r1212546857


##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestSparkSqlCoreFlow.scala:
##
@@ -46,18 +47,18 @@ class TestSparkSqlCoreFlow extends HoodieSparkSqlTestBase {
 
"COPY_ON_WRITE|false|false|org.apache.hudi.keygen.SimpleKeyGenerator|SIMPLE",
 
"COPY_ON_WRITE|true|false|org.apache.hudi.keygen.SimpleKeyGenerator|SIMPLE",
 
"COPY_ON_WRITE|true|true|org.apache.hudi.keygen.SimpleKeyGenerator|SIMPLE",
-
"COPY_ON_WRITE|false|false|org.apache.hudi.keygen.NonpartitionedKeyGenerator|GLOBAL_BLOOM",
-
"COPY_ON_WRITE|true|false|org.apache.hudi.keygen.NonpartitionedKeyGenerator|GLOBAL_BLOOM",
-
"COPY_ON_WRITE|true|true|org.apache.hudi.keygen.NonpartitionedKeyGenerator|GLOBAL_BLOOM",
+//
"COPY_ON_WRITE|false|false|org.apache.hudi.keygen.NonpartitionedKeyGenerator|GLOBAL_BLOOM",
+//
"COPY_ON_WRITE|true|false|org.apache.hudi.keygen.NonpartitionedKeyGenerator|GLOBAL_BLOOM",

Review Comment:
   can we just remove the comment?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] danny0405 commented on a diff in pull request #8830: [MINOR] auto generate init client id

2023-05-31 Thread via GitHub


danny0405 commented on code in PR #8830:
URL: https://github.com/apache/hudi/pull/8830#discussion_r1212541100


##
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/configuration/TestOptionsInference.java:
##
@@ -69,6 +70,12 @@ void testSetupClientId() throws Exception {
 }
   }
 
+  @Test
+  void testAutoGenerateClient() {
+  Configuration conf = getConf();
+  OptionsInference.setupClientId(conf);
+  assertNotNull(conf.getString(FlinkOptions.WRITE_CLIENT_ID), "auto 
generate client failed!");
+  }

Review Comment:
   Can you explain again why we need this fix, when a Flink job was submitted, 
it is assigned a client id automically, this id is hold by this job with a 
hearbeat, when another job was submitted, it saw the heartbeat and would create 
a new client id, is this what you saw in you case?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8859: set operation to bulk insert on fresh table

2023-05-31 Thread via GitHub


hudi-bot commented on PR #8859:
URL: https://github.com/apache/hudi/pull/8859#issuecomment-1571301873

   
   ## CI report:
   
   * 641c66d64597d8d83f6234172c3d3dc2c8bc358e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17538)
 
   * 0781a5259d9432c5264f642e392b27ca5eadf6fd Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17547)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8792: [HUDI-6256] Fix the data table archiving and MDT cleaning config conf…

2023-05-31 Thread via GitHub


hudi-bot commented on PR #8792:
URL: https://github.com/apache/hudi/pull/8792#issuecomment-1571301640

   
   ## CI report:
   
   * 14dd620516bf89c288bf437aec74e08ee3da27a3 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17454)
 
   * f07a16ccf74a657ff312c5706d617fdaac23a752 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17546)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8861: [HUDI-6303] Bump flink version to 1.16.2 and 1.17.1

2023-05-31 Thread via GitHub


hudi-bot commented on PR #8861:
URL: https://github.com/apache/hudi/pull/8861#issuecomment-1571296759

   
   ## CI report:
   
   * 611580e4214c5a3cc1922f06fb820b5fc3a7355e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17545)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8859: set operation to bulk insert on fresh table

2023-05-31 Thread via GitHub


hudi-bot commented on PR #8859:
URL: https://github.com/apache/hudi/pull/8859#issuecomment-1571296743

   
   ## CI report:
   
   * 641c66d64597d8d83f6234172c3d3dc2c8bc358e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17538)
 
   * 0781a5259d9432c5264f642e392b27ca5eadf6fd UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8792: [HUDI-6256] Fix the data table archiving and MDT cleaning config conf…

2023-05-31 Thread via GitHub


hudi-bot commented on PR #8792:
URL: https://github.com/apache/hudi/pull/8792#issuecomment-1571296608

   
   ## CI report:
   
   * 14dd620516bf89c288bf437aec74e08ee3da27a3 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17454)
 
   * f07a16ccf74a657ff312c5706d617fdaac23a752 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8861: [HUDI-6303] Bump flink version to 1.16.2 and 1.17.1

2023-05-31 Thread via GitHub


hudi-bot commented on PR #8861:
URL: https://github.com/apache/hudi/pull/8861#issuecomment-1571292341

   
   ## CI report:
   
   * 611580e4214c5a3cc1922f06fb820b5fc3a7355e UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] danny0405 commented on a diff in pull request #8840: [HUDI-5352] Fix `LocalDate` serialization in colstats

2023-05-31 Thread via GitHub


danny0405 commented on code in PR #8840:
URL: https://github.com/apache/hudi/pull/8840#discussion_r1212530303


##
hudi-common/src/main/java/org/apache/hudi/common/util/JsonUtils.java:
##
@@ -20,41 +20,74 @@
 package org.apache.hudi.common.util;
 
 import org.apache.hudi.exception.HoodieIOException;
+import org.apache.hudi.util.Lazy;
 
 import com.fasterxml.jackson.annotation.JsonAutoDetect;
 import com.fasterxml.jackson.annotation.PropertyAccessor;
 import com.fasterxml.jackson.core.JsonProcessingException;
 import com.fasterxml.jackson.databind.DeserializationFeature;
 import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.SerializationFeature;
+import com.fasterxml.jackson.databind.util.StdDateFormat;
+import com.fasterxml.jackson.datatype.jsr310.JavaTimeModule;
 
 /**
  * Utils for JSON serialization and deserialization.
  */
 public class JsonUtils {
 
-  private static final ObjectMapper MAPPER = new ObjectMapper();
-
-  static {
-MAPPER.disable(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES);
-// We need to exclude custom getters, setters and creators which can use 
member fields
-// to derive new fields, so that they are not included in the serialization
-MAPPER.setVisibility(PropertyAccessor.FIELD, 
JsonAutoDetect.Visibility.ANY);
-MAPPER.setVisibility(PropertyAccessor.GETTER, 
JsonAutoDetect.Visibility.NONE);
-MAPPER.setVisibility(PropertyAccessor.IS_GETTER, 
JsonAutoDetect.Visibility.NONE);
-MAPPER.setVisibility(PropertyAccessor.SETTER, 
JsonAutoDetect.Visibility.NONE);
-MAPPER.setVisibility(PropertyAccessor.CREATOR, 
JsonAutoDetect.Visibility.NONE);
-  }
+  private static final Lazy MAPPER = 
Lazy.lazily(JsonUtils::instantiateObjectMapper);
 
   public static ObjectMapper getObjectMapper() {
-return MAPPER;
+return MAPPER.get();
   }
 
   public static String toString(Object value) {
 try {
-  return MAPPER.writeValueAsString(value);
+  return MAPPER.get().writeValueAsString(value);
 } catch (JsonProcessingException e) {
   throw new HoodieIOException(
   "Fail to convert the class: " + value.getClass().getName() + " to 
Json String", e);
 }
   }
+
+  private static ObjectMapper instantiateObjectMapper() {
+ObjectMapper mapper = new ObjectMapper();
+
+registerModules(mapper);
+
+// We're writing out dates as their string representations instead of 
(int) timestamps
+mapper.disable(SerializationFeature.WRITE_DATES_AS_TIMESTAMPS);
+// NOTE: This is necessary to make sure that w/ Jackson >= 2.11 colon is 
not infixed
+//   into the timezone value ("+00:00" as opposed to "+" before 
2.11)
+//   While Jackson is able to parse both of these formats, we keep it 
as false
+//   to make sure metadata produced by Hudi stays consistent across 
Jackson versions
+configureColonInTimezone(mapper);

Review Comment:
   Yeah, wondering when we need to serialize the column stats to json string?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Closed] (HUDI-6293) Make HoodieFlinkCompactor's parallelism of compact_task more reasonable.

2023-05-31 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-6293.

Resolution: Fixed

Fixed via master branch: 00d50e91abe24aba31daa2fe2806de5414f03c77

> Make HoodieFlinkCompactor's  parallelism of compact_task more reasonable.
> -
>
> Key: HUDI-6293
> URL: https://issues.apache.org/jira/browse/HUDI-6293
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: eric
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2023-05-31-16-41-02-798.png
>
>
> !image-2023-05-31-16-41-02-798.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-6293) Make HoodieFlinkCompactor's parallelism of compact_task more reasonable.

2023-05-31 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-6293:
-
Fix Version/s: 0.14.0

> Make HoodieFlinkCompactor's  parallelism of compact_task more reasonable.
> -
>
> Key: HUDI-6293
> URL: https://issues.apache.org/jira/browse/HUDI-6293
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: eric
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
> Attachments: image-2023-05-31-16-41-02-798.png
>
>
> !image-2023-05-31-16-41-02-798.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[hudi] branch master updated: [HUDI-6293] Make HoodieFlinkCompactor's parallelism of compact_task more reasonable (#8854)

2023-05-31 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 00d50e91abe [HUDI-6293] Make HoodieFlinkCompactor's  parallelism of 
compact_task more reasonable (#8854)
00d50e91abe is described below

commit 00d50e91abe24aba31daa2fe2806de5414f03c77
Author: Dongsj <90449228+eric9...@users.noreply.github.com>
AuthorDate: Thu Jun 1 11:38:27 2023 +0800

[HUDI-6293] Make HoodieFlinkCompactor's  parallelism of compact_task more 
reasonable (#8854)

Co-authored-by: dongsj 
---
 .../java/org/apache/hudi/sink/compact/HoodieFlinkCompactor.java | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git 
a/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/HoodieFlinkCompactor.java
 
b/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/HoodieFlinkCompactor.java
index 63dfd26c4ac..e396897dc7e 100644
--- 
a/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/HoodieFlinkCompactor.java
+++ 
b/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/HoodieFlinkCompactor.java
@@ -274,10 +274,12 @@ public class HoodieFlinkCompactor {
 
   List instants = 
compactionInstantTimes.stream().map(HoodieTimeline::getCompactionRequestedInstant).collect(Collectors.toList());
 
+  int totalOperations = 
Math.toIntExact(compactionPlans.stream().mapToLong(pair -> 
pair.getRight().getOperations().size()).sum());
+
   // get compactionParallelism.
   int compactionParallelism = 
conf.getInteger(FlinkOptions.COMPACTION_TASKS) == -1
-  ? Math.toIntExact(compactionPlans.stream().mapToLong(pair -> 
pair.getRight().getOperations().size()).sum())
-  : conf.getInteger(FlinkOptions.COMPACTION_TASKS);
+  ? totalOperations
+  : Math.min(conf.getInteger(FlinkOptions.COMPACTION_TASKS), 
totalOperations);
 
   LOG.info("Start to compaction for instant " + compactionInstantTimes);
 



[GitHub] [hudi] danny0405 merged pull request #8854: [HUDI-6293]Make HoodieFlinkCompactor's parallelism of compact_task m…

2023-05-31 Thread via GitHub


danny0405 merged PR #8854:
URL: https://github.com/apache/hudi/pull/8854


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] danny0405 commented on a diff in pull request #8792: [HUDI-6256] Fix the data table archiving and MDT cleaning config conf…

2023-05-31 Thread via GitHub


danny0405 commented on code in PR #8792:
URL: https://github.com/apache/hudi/pull/8792#discussion_r1212527300


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataWriteUtils.java:
##
@@ -93,7 +93,7 @@ public static HoodieWriteConfig createMetadataWriteConfig(
 .withCleanerParallelism(parallelism)
 .withCleanerPolicy(HoodieCleaningPolicy.KEEP_LATEST_COMMITS)
 .withFailedWritesCleaningPolicy(failedWritesCleaningPolicy)
-.retainCommits(DEFAULT_METADATA_CLEANER_COMMITS_RETAINED)
+
.retainCommits(Math.min(writeConfig.getCleanerCommitsRetained(),DEFAULT_METADATA_CLEANER_COMMITS_RETAINED))

Review Comment:
   ```suggestion
   .retainCommits(Math.min(writeConfig.getCleanerCommitsRetained(), 
DEFAULT_METADATA_CLEANER_COMMITS_RETAINED))
   ```



##
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieMetadataBase.java:
##
@@ -400,7 +400,7 @@ protected HoodieWriteConfig 
getMetadataWriteConfig(HoodieWriteConfig writeConfig
 .withCleanerParallelism(parallelism)
 .withCleanerPolicy(HoodieCleaningPolicy.KEEP_LATEST_COMMITS)
 
.withFailedWritesCleaningPolicy(HoodieFailedWritesCleaningPolicy.LAZY)
-.retainCommits(DEFAULT_METADATA_CLEANER_COMMITS_RETAINED)
+
.retainCommits(Math.min(writeConfig.getCleanerCommitsRetained(),DEFAULT_METADATA_CLEANER_COMMITS_RETAINED))

Review Comment:
   ```suggestion
   .retainCommits(Math.min(writeConfig.getCleanerCommitsRetained(), 
DEFAULT_METADATA_CLEANER_COMMITS_RETAINED))
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] danny0405 commented on pull request #8740: [HUDI-6231] Handle glue comments

2023-05-31 Thread via GitHub


danny0405 commented on PR #8740:
URL: https://github.com/apache/hudi/pull/8740#issuecomment-1571277471

   @parisni Hi, do we have plan to push-forward this feature?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] danny0405 commented on issue #8857: [SUPPORT] Column comments not syncing to AWS Glue Catalog

2023-05-31 Thread via GitHub


danny0405 commented on issue #8857:
URL: https://github.com/apache/hudi/issues/8857#issuecomment-1571274339

   Guess this is what you needed: https://github.com/apache/hudi/pull/8740/files


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] XuQianJin-Stars commented on a diff in pull request #8437: [HUDI-6066] HoodieTableSource supports parquet predicate push down

2023-05-31 Thread via GitHub


XuQianJin-Stars commented on code in PR #8437:
URL: https://github.com/apache/hudi/pull/8437#discussion_r1212523550


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/ExpressionPredicates.java:
##
@@ -0,0 +1,654 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.source;
+
+import org.apache.flink.table.expressions.CallExpression;
+import org.apache.flink.table.expressions.Expression;
+import org.apache.flink.table.expressions.FieldReferenceExpression;
+import org.apache.flink.table.expressions.ResolvedExpression;
+import org.apache.flink.table.expressions.ValueLiteralExpression;
+import org.apache.flink.table.functions.BuiltInFunctionDefinitions;
+import org.apache.flink.table.functions.FunctionDefinition;
+import org.apache.flink.table.types.logical.LogicalType;
+import org.apache.parquet.filter2.predicate.FilterPredicate;
+import org.apache.parquet.filter2.predicate.Operators;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.Serializable;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Objects;
+import java.util.stream.Collectors;
+import java.util.stream.IntStream;
+
+import static org.apache.hudi.common.util.ValidationUtils.checkState;
+import static org.apache.hudi.util.ExpressionUtils.getValueFromLiteral;
+import static org.apache.parquet.filter2.predicate.FilterApi.and;
+import static org.apache.parquet.filter2.predicate.FilterApi.binaryColumn;
+import static org.apache.parquet.filter2.predicate.FilterApi.booleanColumn;
+import static org.apache.parquet.filter2.predicate.FilterApi.doubleColumn;
+import static org.apache.parquet.filter2.predicate.FilterApi.eq;
+import static org.apache.parquet.filter2.predicate.FilterApi.floatColumn;
+import static org.apache.parquet.filter2.predicate.FilterApi.gt;
+import static org.apache.parquet.filter2.predicate.FilterApi.gtEq;
+import static org.apache.parquet.filter2.predicate.FilterApi.intColumn;
+import static org.apache.parquet.filter2.predicate.FilterApi.longColumn;
+import static org.apache.parquet.filter2.predicate.FilterApi.lt;
+import static org.apache.parquet.filter2.predicate.FilterApi.ltEq;
+import static org.apache.parquet.filter2.predicate.FilterApi.not;
+import static org.apache.parquet.filter2.predicate.FilterApi.notEq;
+import static org.apache.parquet.filter2.predicate.FilterApi.or;
+import static org.apache.parquet.io.api.Binary.fromConstantByteArray;
+import static org.apache.parquet.io.api.Binary.fromString;
+
+/**
+ * Tool to predicate the {@link 
org.apache.flink.table.expressions.ResolvedExpression}s.
+ */
+public class ExpressionPredicates {

Review Comment:
   ```implements ExpressionVisitor``` ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] XuQianJin-Stars commented on a diff in pull request #8437: [HUDI-6066] HoodieTableSource supports parquet predicate push down

2023-05-31 Thread via GitHub


XuQianJin-Stars commented on code in PR #8437:
URL: https://github.com/apache/hudi/pull/8437#discussion_r1212523550


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/ExpressionPredicates.java:
##
@@ -0,0 +1,654 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.source;
+
+import org.apache.flink.table.expressions.CallExpression;
+import org.apache.flink.table.expressions.Expression;
+import org.apache.flink.table.expressions.FieldReferenceExpression;
+import org.apache.flink.table.expressions.ResolvedExpression;
+import org.apache.flink.table.expressions.ValueLiteralExpression;
+import org.apache.flink.table.functions.BuiltInFunctionDefinitions;
+import org.apache.flink.table.functions.FunctionDefinition;
+import org.apache.flink.table.types.logical.LogicalType;
+import org.apache.parquet.filter2.predicate.FilterPredicate;
+import org.apache.parquet.filter2.predicate.Operators;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.Serializable;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Objects;
+import java.util.stream.Collectors;
+import java.util.stream.IntStream;
+
+import static org.apache.hudi.common.util.ValidationUtils.checkState;
+import static org.apache.hudi.util.ExpressionUtils.getValueFromLiteral;
+import static org.apache.parquet.filter2.predicate.FilterApi.and;
+import static org.apache.parquet.filter2.predicate.FilterApi.binaryColumn;
+import static org.apache.parquet.filter2.predicate.FilterApi.booleanColumn;
+import static org.apache.parquet.filter2.predicate.FilterApi.doubleColumn;
+import static org.apache.parquet.filter2.predicate.FilterApi.eq;
+import static org.apache.parquet.filter2.predicate.FilterApi.floatColumn;
+import static org.apache.parquet.filter2.predicate.FilterApi.gt;
+import static org.apache.parquet.filter2.predicate.FilterApi.gtEq;
+import static org.apache.parquet.filter2.predicate.FilterApi.intColumn;
+import static org.apache.parquet.filter2.predicate.FilterApi.longColumn;
+import static org.apache.parquet.filter2.predicate.FilterApi.lt;
+import static org.apache.parquet.filter2.predicate.FilterApi.ltEq;
+import static org.apache.parquet.filter2.predicate.FilterApi.not;
+import static org.apache.parquet.filter2.predicate.FilterApi.notEq;
+import static org.apache.parquet.filter2.predicate.FilterApi.or;
+import static org.apache.parquet.io.api.Binary.fromConstantByteArray;
+import static org.apache.parquet.io.api.Binary.fromString;
+
+/**
+ * Tool to predicate the {@link 
org.apache.flink.table.expressions.ResolvedExpression}s.
+ */
+public class ExpressionPredicates {

Review Comment:
   implements ExpressionVisitor ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] codope commented on pull request #8076: [HUDI-5884] Support bulk_insert for insert_overwrite and insert_overwrite_table

2023-05-31 Thread via GitHub


codope commented on PR #8076:
URL: https://github.com/apache/hudi/pull/8076#issuecomment-1571270237

   Looks good to me. @yihua @nsivabalan If you can take one pass, that would be 
great.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Assigned] (HUDI-6286) Overwrite mode should not delete old data

2023-05-31 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit reassigned HUDI-6286:
-

Assignee: Hui An  (was: Sagar Sumit)

> Overwrite mode should not delete old data
> -
>
> Key: HUDI-6286
> URL: https://issues.apache.org/jira/browse/HUDI-6286
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: spark, writer-core
>Reporter: Hui An
>Assignee: Hui An
>Priority: Major
> Fix For: 0.14.0
>
>
> https://github.com/apache/hudi/pull/8076/files#r1127283648
> For *Overwrite* mode, we should not delete the basePath. Just overwrite the 
> existing data



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-6286) Overwrite mode should not delete old data

2023-05-31 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-6286:
--
Fix Version/s: 0.14.0

> Overwrite mode should not delete old data
> -
>
> Key: HUDI-6286
> URL: https://issues.apache.org/jira/browse/HUDI-6286
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: spark, writer-core
>Reporter: Hui An
>Priority: Major
> Fix For: 0.14.0
>
>
> https://github.com/apache/hudi/pull/8076/files#r1127283648
> For *Overwrite* mode, we should not delete the basePath. Just overwrite the 
> existing data



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-6286) Overwrite mode should not delete old data

2023-05-31 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit reassigned HUDI-6286:
-

Assignee: Sagar Sumit

> Overwrite mode should not delete old data
> -
>
> Key: HUDI-6286
> URL: https://issues.apache.org/jira/browse/HUDI-6286
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: spark, writer-core
>Reporter: Hui An
>Assignee: Sagar Sumit
>Priority: Major
> Fix For: 0.14.0
>
>
> https://github.com/apache/hudi/pull/8076/files#r1127283648
> For *Overwrite* mode, we should not delete the basePath. Just overwrite the 
> existing data



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (HUDI-6290) Fix Flink MDT compaction strategy

2023-05-31 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-6290.

Resolution: Fixed

Fixed via master branch: 399b46c494252f16a4efcd9f19ccfa66b488b563

> Fix Flink MDT compaction strategy
> -
>
> Key: HUDI-6290
> URL: https://issues.apache.org/jira/browse/HUDI-6290
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: writer-core
>Reporter: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[hudi] branch master updated (6671d1a2460 -> 399b46c4942)

2023-05-31 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 6671d1a2460 [HUDI-6294] Fix log classname in 
SqlQuerySingleResultPreCommitValidator (#8853)
 add 399b46c4942 [HUDI-6290] Fix Flink MDT compaction strategy (#8850)

No new revisions were added by this update.

Summary of changes:
 .../metadata/HoodieBackedTableMetadataWriter.java  | 51 ++
 .../FlinkHoodieBackedTableMetadataWriter.java  |  8 
 .../apache/hudi/table/ITTestHoodieDataSource.java  |  8 ++--
 3 files changed, 43 insertions(+), 24 deletions(-)



[GitHub] [hudi] danny0405 merged pull request #8850: [HUDI-6290] Fix Flink MDT compaction strategy

2023-05-31 Thread via GitHub


danny0405 merged PR #8850:
URL: https://github.com/apache/hudi/pull/8850


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] flashJd commented on a diff in pull request #8792: [HUDI-6256] Fix the data table archiving and MDT cleaning config conf…

2023-05-31 Thread via GitHub


flashJd commented on code in PR #8792:
URL: https://github.com/apache/hudi/pull/8792#discussion_r1212516124


##
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/TestStreamWriteOperatorCoordinator.java:
##
@@ -247,34 +247,16 @@ void testSyncMetadataTable() throws Exception {
 assertThat("One instant need to sync to metadata table", 
completedTimeline.countInstants(), is(7));
 assertThat(completedTimeline.nthFromLastInstant(1).get().getTimestamp(), 
is(instant + "001"));
 assertThat(completedTimeline.nthFromLastInstant(1).get().getAction(), 
is(HoodieTimeline.COMMIT_ACTION));
-// write another 2 commits
-for (int i = 7; i < 8; i++) {
+
+// write another 27 commits to trigger the first clean
+for (int i = 7; i < 34; i++) {
   instant = mockWriteWithMetadata();
-  metadataTableMetaClient.reloadActiveTimeline();
-  completedTimeline = 
metadataTableMetaClient.getActiveTimeline().filterCompletedInstants();
-  assertThat("One instant need to sync to metadata table", 
completedTimeline.countInstants(), is(i + 1));

Review Comment:
   That sounds reasonable,I've modified it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] SteNicholas commented on pull request #8437: [HUDI-6066] HoodieTableSource supports parquet predicate push down

2023-05-31 Thread via GitHub


SteNicholas commented on PR #8437:
URL: https://github.com/apache/hudi/pull/8437#issuecomment-1571256809

   @danny0405, I have addressed comments. Please help to review again.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8851: [HUDI-6281] Comprehensive schema evolution supports column change with a default value

2023-05-31 Thread via GitHub


hudi-bot commented on PR #8851:
URL: https://github.com/apache/hudi/pull/8851#issuecomment-1571254989

   
   ## CI report:
   
   * 2db6852dd391973eab275dc7ef70c02bfbc5f652 UNKNOWN
   * 60c1399ac012bc61421f3bb1feb208decbcb6b6a UNKNOWN
   * 720b8aec68410a1e48e57fc540b1885bee6a7423 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17535)
 
   * be4650d94edd3eca1a15a956b6a60e2b8ec5daca Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17542)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8830: [MINOR] auto generate init client id

2023-05-31 Thread via GitHub


hudi-bot commented on PR #8830:
URL: https://github.com/apache/hudi/pull/8830#issuecomment-1571254881

   
   ## CI report:
   
   * 065791647cb669b5d18c4c1f4e9dfcf9d4e0c9c1 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17537)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] SteNicholas commented on pull request #8851: [HUDI-6281] Comprehensive schema evolution supports column change with a default value

2023-05-31 Thread via GitHub


SteNicholas commented on PR #8851:
URL: https://github.com/apache/hudi/pull/8851#issuecomment-1571254679

   @xiarixiaoyao, @yihua, could you help to review this pull request?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-6303) Bump flink version to 1.16.2 and 1.17.1

2023-05-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-6303:
-
Labels: pull-request-available  (was: )

> Bump flink version to 1.16.2 and 1.17.1
> ---
>
> Key: HUDI-6303
> URL: https://issues.apache.org/jira/browse/HUDI-6303
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: Weijie Guo
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] reswqa opened a new pull request, #8861: [HUDI-6303] Bump flink version to 1.16.2 and 1.17.1

2023-05-31 Thread via GitHub


reswqa opened a new pull request, #8861:
URL: https://github.com/apache/hudi/pull/8861

   ### Change Logs
   
   Bump flink version to 1.16.2 and 1.17.1
   
   ### Impact
   
   Expected no impact
   
   ### Risk level (write none, low medium or high below)
   
   None
   
   ### Documentation Update
   
   None
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a diff in pull request #8837: [HUDI-6153] Changed the rollback mechanism for MDT to actual rollbacks rather than appending revert blocks.

2023-05-31 Thread via GitHub


nsivabalan commented on code in PR #8837:
URL: https://github.com/apache/hudi/pull/8837#discussion_r1212500637


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -837,10 +840,75 @@ public void update(HoodieCleanMetadata cleanMetadata, 
String instantTime) {
*/
   @Override
   public void update(HoodieRestoreMetadata restoreMetadata, String 
instantTime) {
-processAndCommit(instantTime, () -> 
HoodieTableMetadataUtil.convertMetadataToRecords(engineContext,
-metadataMetaClient.getActiveTimeline(), restoreMetadata, 
getRecordsGenerationParams(), instantTime,
-metadata.getSyncedInstantTime()), false);
-closeInternal();
+dataMetaClient.reloadActiveTimeline();
+
+// Since the restore has completed on the dataset, the latest write 
timeline instant is the one to which the
+// restore was performed. This should be always present.
+final String restoreToInstantTime = 
dataMetaClient.getActiveTimeline().getWriteTimeline()
+.getReverseOrderedInstants().findFirst().get().getTimestamp();
+
+// We cannot restore to before the oldest compaction on MDT as we don't 
have the basefiles before that time.
+Option lastCompaction = 
metadataMetaClient.getCommitTimeline().filterCompletedInstants().lastInstant();

Review Comment:
   shouldn't this logic go into restore method in BaseHoodieWriteClient where 
we trigger the restore for data table. so, this is double/repeated validation 
right? 



##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java:
##
@@ -669,32 +669,51 @@ public void restoreToSavepoint() {
* @param savepointTime Savepoint time to rollback to
*/
   public void restoreToSavepoint(String savepointTime) {
-boolean initialMetadataTableIfNecessary = config.isMetadataTableEnabled();
-if (initialMetadataTableIfNecessary) {
+boolean initializeMetadataTableIfNecessary = 
config.isMetadataTableEnabled();
+if (initializeMetadataTableIfNecessary) {
   try {
-// Delete metadata table directly when users trigger savepoint 
rollback if mdt existed and beforeTimelineStarts
+// Delete metadata table directly when users trigger savepoint 
rollback if mdt existed and if the savePointTime is beforeTimelineStarts
+// or before the oldest compaction on MDT.
+// We cannot restore to before the oldest compaction on MDT as we 
don't have the basefiles before that time.
 String metadataTableBasePathStr = 
HoodieTableMetadata.getMetadataTableBasePath(config.getBasePath());
 HoodieTableMetaClient mdtClient = 
HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(metadataTableBasePathStr).build();
-// Same as HoodieTableMetadataUtil#processRollbackMetadata
+Option lastCompaction = 
mdtClient.getCommitTimeline().filterCompletedInstants().lastInstant();

Review Comment:
   rename to "latestMdtCompaction"



##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -837,10 +840,75 @@ public void update(HoodieCleanMetadata cleanMetadata, 
String instantTime) {
*/
   @Override
   public void update(HoodieRestoreMetadata restoreMetadata, String 
instantTime) {
-processAndCommit(instantTime, () -> 
HoodieTableMetadataUtil.convertMetadataToRecords(engineContext,
-metadataMetaClient.getActiveTimeline(), restoreMetadata, 
getRecordsGenerationParams(), instantTime,
-metadata.getSyncedInstantTime()), false);
-closeInternal();
+dataMetaClient.reloadActiveTimeline();
+
+// Since the restore has completed on the dataset, the latest write 
timeline instant is the one to which the
+// restore was performed. This should be always present.
+final String restoreToInstantTime = 
dataMetaClient.getActiveTimeline().getWriteTimeline()
+.getReverseOrderedInstants().findFirst().get().getTimestamp();
+
+// We cannot restore to before the oldest compaction on MDT as we don't 
have the basefiles before that time.
+Option lastCompaction = 
metadataMetaClient.getCommitTimeline().filterCompletedInstants().lastInstant();
+if (lastCompaction.isPresent()) {
+  if (HoodieTimeline.LESSER_THAN_OR_EQUALS.test(restoreToInstantTime, 
lastCompaction.get().getTimestamp())) {
+String msg = String.format("Cannot restore MDT to %s because it is 
older than latest compaction at %s", restoreToInstantTime,
+lastCompaction.get().getTimestamp()) + ". Please delete MDT and 
restore again";
+LOG.error(msg);
+throw new HoodieMetadataException(msg);
+  }
+}
+
+// Restore requires the existing pipelines to be shutdown. So we can 
safely scan the dataset to find the current
+// list of files in the filesystem.
+List dirInfoList = listAllPartitions(dataMetaClient);
+Map dirInfoMap = 

[jira] [Created] (HUDI-6303) Bump flink version to 1.16.2 and 1.17.1

2023-05-31 Thread Weijie Guo (Jira)
Weijie Guo created HUDI-6303:


 Summary: Bump flink version to 1.16.2 and 1.17.1
 Key: HUDI-6303
 URL: https://issues.apache.org/jira/browse/HUDI-6303
 Project: Apache Hudi
  Issue Type: Improvement
  Components: flink
Reporter: Weijie Guo






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] hudi-bot commented on pull request #8851: [HUDI-6281] Comprehensive schema evolution supports column change with a default value

2023-05-31 Thread via GitHub


hudi-bot commented on PR #8851:
URL: https://github.com/apache/hudi/pull/8851#issuecomment-1571247655

   
   ## CI report:
   
   * 2db6852dd391973eab275dc7ef70c02bfbc5f652 UNKNOWN
   * 60c1399ac012bc61421f3bb1feb208decbcb6b6a UNKNOWN
   * 720b8aec68410a1e48e57fc540b1885bee6a7423 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17535)
 
   * be4650d94edd3eca1a15a956b6a60e2b8ec5daca UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8859: set operation to bulk insert on fresh table

2023-05-31 Thread via GitHub


hudi-bot commented on PR #8859:
URL: https://github.com/apache/hudi/pull/8859#issuecomment-1571242238

   
   ## CI report:
   
   * 641c66d64597d8d83f6234172c3d3dc2c8bc358e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17538)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] boneanxs commented on a diff in pull request #7304: [HUDI-5278] Support more conf to cluster procedure

2023-05-31 Thread via GitHub


boneanxs commented on code in PR #7304:
URL: https://github.com/apache/hudi/pull/7304#discussion_r1212496811


##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/procedure/TestClusteringProcedure.scala:
##
@@ -385,4 +395,261 @@ class TestClusteringProcedure extends 
HoodieSparkProcedureTestBase {
   }
 }
   }
+
+  test("Test Call run_clustering Procedure with specific instants") {
+withTempDir { tmp =>
+  val tableName = generateTableName
+  val basePath = s"${tmp.getCanonicalPath}/$tableName"
+
+  spark.sql(
+s"""
+   |create table $tableName (
+   |  c1 int,
+   |  c2 string,
+   |  c3 double
+   |) using hudi
+   | options (
+   |  primaryKey = 'c1',
+   |  type = 'cow',
+   |  hoodie.metadata.enable = 'true',
+   |  hoodie.metadata.index.column.stats.enable = 'true',
+   |  hoodie.enable.data.skipping = 'true',
+   |  hoodie.datasource.write.operation = 'insert'
+   | )
+   | location '$basePath'
+ """.stripMargin)
+
+  writeRecords(2, 4, 0, basePath, Map("hoodie.avro.schema.validate" -> 
"false"))
+  spark.sql(s"call run_clustering(table => '$tableName', op => 
'schedule')")
+
+  writeRecords(2, 4, 0, basePath, Map("hoodie.avro.schema.validate" -> 
"false"))
+  spark.sql(s"call run_clustering(table => '$tableName', op => 
'schedule')")
+
+  val conf = new Configuration
+  val metaClient = 
HoodieTableMetaClient.builder.setConf(conf).setBasePath(basePath).build
+  val instants = 
metaClient.getActiveTimeline.filterPendingReplaceTimeline().getInstants.iterator().asScala.map(_.getTimestamp).toSeq
+  assert(2 == instants.size)
+
+  checkExceptionContain(
+s"call run_clustering(table => '$tableName', instants => '00, 
${instants.head}')"
+  )("specific 00 instants is not exist")
+  metaClient.reloadActiveTimeline()
+  assert(0 == 
metaClient.getActiveTimeline.getCompletedReplaceTimeline.getInstants.size())
+  assert(2 == 
metaClient.getActiveTimeline.filterPendingReplaceTimeline.getInstants.size())
+
+  writeRecords(2, 4, 0, basePath, Map("hoodie.avro.schema.validate" -> 
"false"))
+  // specific instants will not schedule new cluster plan
+  spark.sql(s"call run_clustering(table => '$tableName', instants => 
'${instants.mkString(",")}')")
+  metaClient.reloadActiveTimeline()
+  assert(2 == 
metaClient.getActiveTimeline.getCompletedReplaceTimeline.getInstants.size())
+  assert(0 == 
metaClient.getActiveTimeline.filterPendingReplaceTimeline.getInstants.size())
+
+  // test with operator schedule
+  checkExceptionContain(
+  s"call run_clustering(table => '$tableName', instants => '00', op => 
'schedule')"
+  )("specific instants only can be used in 'execute' op or not specific 
op")
+
+  // test with operator scheduleAndExecute
+  checkExceptionContain(
+s"call run_clustering(table => '$tableName', instants => '00', op 
=> 'scheduleAndExecute')"
+  )("specific instants only can be used in 'execute' op or not specific 
op")
+
+  // test with operator execute
+  spark.sql(s"call run_clustering(table => '$tableName', op => 
'schedule')")
+  metaClient.reloadActiveTimeline()
+  val instants2 = 
metaClient.getActiveTimeline.filterPendingReplaceTimeline().getInstants.iterator().asScala.map(_.getTimestamp).toSeq
+  spark.sql(s"call run_clustering(table => '$tableName', instants => 
'${instants2.mkString(",")}', op => 'execute')")
+  metaClient.reloadActiveTimeline()
+  assert(3 == 
metaClient.getActiveTimeline.getCompletedReplaceTimeline.getInstants.size())
+  assert(0 == 
metaClient.getActiveTimeline.filterPendingReplaceTimeline.getInstants.size())
+}
+  }
+
+  test("Test Call run_clustering Procedure op") {
+withTempDir { tmp =>
+  val tableName = generateTableName
+  val basePath = s"${tmp.getCanonicalPath}/$tableName"
+
+  spark.sql(
+s"""
+   |create table $tableName (
+   |  c1 int,
+   |  c2 string,
+   |  c3 double
+   |) using hudi
+   | options (
+   |  primaryKey = 'c1',
+   |  type = 'cow',
+   |  hoodie.metadata.enable = 'true',
+   |  hoodie.metadata.index.column.stats.enable = 'true',
+   |  hoodie.enable.data.skipping = 'true',
+   |  hoodie.datasource.write.operation = 'insert'
+   | )
+   | location '$basePath'
+ """.stripMargin)
+
+  writeRecords(2, 4, 0, basePath, Map("hoodie.avro.schema.validate"-> 
"false"))
+  val conf = new Configuration
+  val metaClient = 
HoodieTableMetaClient.builder.setConf(conf).setBasePath(basePath).build
+  assert(0 == 
metaClient.getActiveTimeline.getCompletedReplaceTimeline.getInstants.size())
+  

[GitHub] [hudi] hudi-bot commented on pull request #8795: [HUDI-6258] support olap engine query mor table in table name without ro/rt suffix

2023-05-31 Thread via GitHub


hudi-bot commented on PR #8795:
URL: https://github.com/apache/hudi/pull/8795#issuecomment-1571212001

   
   ## CI report:
   
   * 2291d56660efdbace36ec3e52a05a41ab236fff0 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17501)
 
   * fa6a550cbe4c52a22314796d4926a34c04d0e6cc Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17541)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8452: [HUDI-6077] Add more partition push down filters

2023-05-31 Thread via GitHub


hudi-bot commented on PR #8452:
URL: https://github.com/apache/hudi/pull/8452#issuecomment-1571210983

   
   ## CI report:
   
   * 8082df232089396b2a9f9be2b915e51b3645f172 UNKNOWN
   * 5d3bc78d0eab3a5a1b72ba54a0577d36bee7845b Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17511)
 
   * 55ece2d5ee0d752f48d7c33cd6b8235b5ffacac5 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17540)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8795: [HUDI-6258] support olap engine query mor table in table name without ro/rt suffix

2023-05-31 Thread via GitHub


hudi-bot commented on PR #8795:
URL: https://github.com/apache/hudi/pull/8795#issuecomment-1571205582

   
   ## CI report:
   
   * 2291d56660efdbace36ec3e52a05a41ab236fff0 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17501)
 
   * fa6a550cbe4c52a22314796d4926a34c04d0e6cc UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8452: [HUDI-6077] Add more partition push down filters

2023-05-31 Thread via GitHub


hudi-bot commented on PR #8452:
URL: https://github.com/apache/hudi/pull/8452#issuecomment-1571205102

   
   ## CI report:
   
   * 8082df232089396b2a9f9be2b915e51b3645f172 UNKNOWN
   * 5d3bc78d0eab3a5a1b72ba54a0577d36bee7845b Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17511)
 
   * 55ece2d5ee0d752f48d7c33cd6b8235b5ffacac5 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8860: Fix spark sql core flow

2023-05-31 Thread via GitHub


hudi-bot commented on PR #8860:
URL: https://github.com/apache/hudi/pull/8860#issuecomment-1571201295

   
   ## CI report:
   
   * 7dfdce4528315764dd140b5e186f1b6052c9be43 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17539)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8851: [HUDI-6281] Comprehensive schema evolution supports column change with a default value

2023-05-31 Thread via GitHub


hudi-bot commented on PR #8851:
URL: https://github.com/apache/hudi/pull/8851#issuecomment-1571201237

   
   ## CI report:
   
   * 2db6852dd391973eab275dc7ef70c02bfbc5f652 UNKNOWN
   * 60c1399ac012bc61421f3bb1feb208decbcb6b6a UNKNOWN
   * 720b8aec68410a1e48e57fc540b1885bee6a7423 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17535)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8847: [HUDI-2071] Support Reading Bootstrap MOR RT Table In Spark DataSource Table

2023-05-31 Thread via GitHub


hudi-bot commented on PR #8847:
URL: https://github.com/apache/hudi/pull/8847#issuecomment-1571201140

   
   ## CI report:
   
   * fe991dc492e5bec19b4bfd91dc0b210e6b152b7a UNKNOWN
   * 19630df001d430cdf4ec6413c6f025e48bb84bb3 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17533)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] KnightChess commented on a diff in pull request #8856: [HUDI-6300] fix file size parallelism not work when init metadata table

2023-05-31 Thread via GitHub


KnightChess commented on code in PR #8856:
URL: https://github.com/apache/hudi/pull/8856#discussion_r1212457103


##
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java:
##
@@ -842,7 +842,8 @@ public static HoodieData 
convertFilesToBloomFilterRecords(HoodieEn
 
 List>> partitionToDeletedFilesList = 
partitionToDeletedFiles.entrySet()
 .stream().map(e -> Pair.of(e.getKey(), 
e.getValue())).collect(Collectors.toList());
-int parallelism = Math.max(Math.min(partitionToDeletedFilesList.size(), 
recordsGenerationParams.getBloomIndexParallelism()), 1);
+int totalDeleteFileSize = partitionToDeletedFilesList.stream().mapToInt(p 
-> p.getValue().size()).sum();
+int parallelism = Math.max(Math.min(totalDeleteFileSize, 
recordsGenerationParams.getBloomIndexParallelism()), 1);

Review Comment:
   oh, I got it. I mistakenly think that the later processing logic is the same 
as the update, I will try fix it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8860: Fix spark sql core flow

2023-05-31 Thread via GitHub


hudi-bot commented on PR #8860:
URL: https://github.com/apache/hudi/pull/8860#issuecomment-1571161421

   
   ## CI report:
   
   * 7dfdce4528315764dd140b5e186f1b6052c9be43 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17539)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8859: set operation to bulk insert on fresh table

2023-05-31 Thread via GitHub


hudi-bot commented on PR #8859:
URL: https://github.com/apache/hudi/pull/8859#issuecomment-1571161385

   
   ## CI report:
   
   * 641c66d64597d8d83f6234172c3d3dc2c8bc358e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17538)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8830: [MINOR] auto generate init client id

2023-05-31 Thread via GitHub


hudi-bot commented on PR #8830:
URL: https://github.com/apache/hudi/pull/8830#issuecomment-1571161237

   
   ## CI report:
   
   * b17cd27589692f3d8cc37865eaa7c75b12ec7141 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17516)
 
   * 065791647cb669b5d18c4c1f4e9dfcf9d4e0c9c1 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17537)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a diff in pull request #8849: [UBER] Rollback enhancements

2023-05-31 Thread via GitHub


nsivabalan commented on code in PR #8849:
URL: https://github.com/apache/hudi/pull/8849#discussion_r1212454576


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/BaseRollbackActionExecutor.java:
##
@@ -240,22 +244,21 @@ protected void finishRollback(HoodieInstant 
inflightInstant, HoodieRollbackMetad
 this.txnManager.beginTransaction(Option.of(inflightInstant), 
Option.empty());
   }
 
-  // If publish the rollback to the timeline, we first write the rollback 
metadata
-  // to metadata table
+  // If publish the rollback to the timeline, we first write the rollback 
metadata to metadata table
+  // Then transition the inflight rollback to completed state.
   if (!skipTimelinePublish) {
 writeTableMetadata(rollbackMetadata);
-  }
-
-  // Then we delete the inflight instant in the data table timeline if 
enabled
-  deleteInflightAndRequestedInstant(deleteInstants, 
table.getActiveTimeline(), resolvedInstant);
-
-  // If publish the rollback to the timeline, we finally transition the 
inflight rollback
-  // to complete in the data table timeline
-  if (!skipTimelinePublish) {
 
table.getActiveTimeline().transitionRollbackInflightToComplete(inflightInstant,
 TimelineMetadataUtils.serializeRollbackMetadata(rollbackMetadata));
 LOG.info("Rollback of Commits " + 
rollbackMetadata.getCommitsRollback() + " is complete");
   }
+
+  // Commit to rollback instant files are deleted after the rollback 
commit is transitioned from inflight to completed
+  // If job were to fail after transitioning rollback from inflight to 
complete and before delete the instant files,
+  // then subsequent retries of the rollback for this instant will see if 
there is a completed rollback present for this instant
+  // and then directly delete the files and abort.

Review Comment:
   can you point me to this part of the code. Where we check for completed 
rollback instants and clean up the files directly ? 
   From what I know, we only check for pending rollbacks 
   ref: getPendingRollbackInfos
   and re-use the rollback instant if applicable. 
   
   Also, do we have a test for this ?



##
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/action/rollback/TestCopyOnWriteRollbackActionExecutor.java:
##
@@ -328,4 +340,213 @@ private void performRollbackAndValidate(boolean 
isUsingMarkers, HoodieWriteConfi
 
 assertFalse(WriteMarkersFactory.get(cfg.getMarkersType(), table, 
commitInstant.getTimestamp()).doesMarkerDirExist());
   }
+
+  /**
+   * This method tests rollback of completed ingestion commits and 
replacecommit inflight files
+   * when there is another replacecommit with greater timestamp already 
present in the timeline.
+   */
+  @Test
+  public void testRollbackWhenReplaceCommitIsPresent() throws Exception {

Review Comment:
   can you move these into separate PRs. will help landing the PRs at diff 
pace. also, the PR description will be succinct. its fine for this PR. but in 
future



##
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/action/cluster/ClusteringTestBase.java:
##
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.action.cluster;
+
+import org.apache.hudi.client.SparkRDDWriteClient;
+import org.apache.hudi.client.validator.SqlQueryEqualityPreCommitValidator;
+import org.apache.hudi.common.config.HoodieStorageConfig;
+import org.apache.hudi.common.fs.ConsistencyGuardConfig;
+import org.apache.hudi.common.model.HoodieFailedWritesCleaningPolicy;
+import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
+import org.apache.hudi.common.table.timeline.versioning.TimelineLayoutVersion;
+import org.apache.hudi.common.table.view.FileSystemViewStorageConfig;
+import org.apache.hudi.common.table.view.FileSystemViewStorageType;
+import org.apache.hudi.common.testutils.HoodieTestDataGenerator;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.config.HoodieCleanConfig;
+import org.apache.hudi.config.HoodieClusteringConfig;
+import 

[GitHub] [hudi] nsivabalan commented on pull request #8849: [UBER] Rollback enhancements

2023-05-31 Thread via GitHub


nsivabalan commented on PR #8849:
URL: https://github.com/apache/hudi/pull/8849#issuecomment-1571156439

   can we please create a jira and link it in title.?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8860: Fix spark sql core flow

2023-05-31 Thread via GitHub


hudi-bot commented on PR #8860:
URL: https://github.com/apache/hudi/pull/8860#issuecomment-1571153775

   
   ## CI report:
   
   * 7dfdce4528315764dd140b5e186f1b6052c9be43 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8859: set operation to bulk insert on fresh table

2023-05-31 Thread via GitHub


hudi-bot commented on PR #8859:
URL: https://github.com/apache/hudi/pull/8859#issuecomment-1571153750

   
   ## CI report:
   
   * 641c66d64597d8d83f6234172c3d3dc2c8bc358e UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8830: [MINOR] auto generate init client id

2023-05-31 Thread via GitHub


hudi-bot commented on PR #8830:
URL: https://github.com/apache/hudi/pull/8830#issuecomment-1571153675

   
   ## CI report:
   
   * b17cd27589692f3d8cc37865eaa7c75b12ec7141 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17516)
 
   * 065791647cb669b5d18c4c1f4e9dfcf9d4e0c9c1 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8826: [DO NOT MERGE][HUDI-6198] Testing Spark 3.4.0 Upgrade on 0.13.1

2023-05-31 Thread via GitHub


hudi-bot commented on PR #8826:
URL: https://github.com/apache/hudi/pull/8826#issuecomment-1571153635

   
   ## CI report:
   
   * e782ae92011fc81144dd9b488e6c2bfd9d5d5906 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17536)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Closed] (HUDI-5257) Spark-Sql duplicates and re-uses record keys under certain configs and use cases

2023-05-31 Thread Jonathan Vexler (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Vexler closed HUDI-5257.
-
Resolution: Not A Problem

I think this was due to caching dataframes incorrectly

> Spark-Sql duplicates and re-uses record keys under certain configs and use 
> cases
> 
>
> Key: HUDI-5257
> URL: https://issues.apache.org/jira/browse/HUDI-5257
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: bootstrap, spark-sql
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Major
> Attachments: bad_data.txt
>
>
> On a new table with primary key  _row_key and partitioned by partition_path, 
> if you do a bulk insert by:
> {code:java}
> insertDf.createOrReplaceTempView("insert_temp_table")
> spark.sql(s"set hoodie.datasource.write.operation=bulk_insert")
> spark.sql("set hoodie.sql.bulk.insert.enable=true")
> spark.sql("set hoodie.sql.insert.mode=non-strict")
> spark.sql(s"insert into $tableName select * from insert_temp_table") {code}
> you will get data with: [^bad_data.txt] where multiple records have the same 
> key even though they have different primary key values, and that there are 
> multiple files even though there are only 10 records
> changing hoodie.datasource.write.operation=bulk_insert to 
> hoodie.datasource.write.operation=insert causes the data to be inserted 
> correctly. I do not know if it is using bulk insert with this change. 
>  
> However, if you use bulk insert with raw data like 
> {code:java}
> spark.sql(s""" 
> | insert into $tableName values 
> | $values 
> |""".stripMargin
> ){code}
> where $values is something like
> {code:java}
> (1, 'a1', 10, 1000, "2021-01-05"), {code}
> then hoodie.datasource.write.operation=bulk_insert works as expected



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] jonvex opened a new pull request, #8860: Fix spark sql core flow

2023-05-31 Thread via GitHub


jonvex opened a new pull request, #8860:
URL: https://github.com/apache/hudi/pull/8860

   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was 
copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8826: [DO NOT MERGE][HUDI-6198] Testing Spark 3.4.0 Upgrade on 0.13.1

2023-05-31 Thread via GitHub


hudi-bot commented on PR #8826:
URL: https://github.com/apache/hudi/pull/8826#issuecomment-1571147923

   
   ## CI report:
   
   * 6b0dfada89e8aa7a083d717cbe38fff36c3ce745 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17526)
 
   * e782ae92011fc81144dd9b488e6c2bfd9d5d5906 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] nsivabalan closed pull request #7738: [HUDI-5569] Fixing TableFileSystemView to detect early failed commits

2023-05-31 Thread via GitHub


nsivabalan closed pull request #7738: [HUDI-5569] Fixing TableFileSystemView to 
detect early failed commits
URL: https://github.com/apache/hudi/pull/7738


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] nsivabalan commented on pull request #7738: [HUDI-5569] Fixing TableFileSystemView to detect early failed commits

2023-05-31 Thread via GitHub


nsivabalan commented on PR #7738:
URL: https://github.com/apache/hudi/pull/7738#issuecomment-1571131832

   Closing this in favor of https://github.com/apache/hudi/pull/8783


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a diff in pull request #8783: Maintain commit timeline even in case of long standing inflights

2023-05-31 Thread via GitHub


nsivabalan commented on code in PR #8783:
URL: https://github.com/apache/hudi/pull/8783#discussion_r1212425833


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/HoodieTimelineArchiver.java:
##
@@ -460,19 +465,12 @@ private Stream 
getCommitInstantsToArchive() throws IOException {
   return !(firstSavepoint.isPresent() && 
compareTimestamps(firstSavepoint.get().getTimestamp(), LESSER_THAN_OR_EQUALS, 
s.getTimestamp()));
 }
   }).filter(s -> {
-// Ensure commits >= the oldest pending compaction/replace commit 
is retained
-return oldestPendingCompactionAndReplaceInstant
+// oldestCommitToRetain is the highest completed commit instant 
that is less than the oldest inflight instant.
+// By filter out any commit >= oldestCommitToRetain, we can ensure 
there are no gaps in the timeline
+// when inflight commits are present.
+return oldestCommitToRetain

Review Comment:
   this simplifies a lot :) 



##
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/io/TestHoodieTimelineArchiver.java:
##
@@ -1416,6 +1418,91 @@ public void 
testArchivalWithMaxDeltaCommitsGuaranteeForCompaction(boolean enable
 }
   }
 
+  @Test

Review Comment:
   can we add some java docs on what we are testing here. 
   simple illustration of timeline might also help



##
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/io/TestHoodieTimelineArchiver.java:
##
@@ -1416,6 +1418,91 @@ public void 
testArchivalWithMaxDeltaCommitsGuaranteeForCompaction(boolean enable
 }
   }
 
+  @Test
+  public void testGetCommitInstantsToArchiveDuringInflightCommits() throws 
Exception {
+HoodieWriteConfig cfg = initTestTableAndGetWriteConfig(true, 4, 5, 5);
+HoodieTestDataGenerator.createCommitFile(basePath, "100", 
wrapperFs.getConf());
+
+HoodieTestDataGenerator.createReplaceCommitRequestedFile(basePath, "101", 
wrapperFs.getConf());
+HoodieTestDataGenerator.createReplaceCommitInflightFile(basePath, "101", 
wrapperFs.getConf());
+HoodieTestDataGenerator.createCommitFile(basePath, "102", 
wrapperFs.getConf());
+HoodieTestDataGenerator.createCommitFile(basePath, "103", 
wrapperFs.getConf());
+HoodieTestDataGenerator.createCompactionRequestedFile(basePath, "104", 
wrapperFs.getConf());
+HoodieTestDataGenerator.createCompactionAuxiliaryMetadata(basePath,
+new HoodieInstant(State.REQUESTED, HoodieTimeline.COMPACTION_ACTION, 
"104"), wrapperFs.getConf());
+HoodieTestDataGenerator.createCommitFile(basePath, "105", 
wrapperFs.getConf());
+HoodieTestDataGenerator.createCommitFile(basePath, "106", 
wrapperFs.getConf());
+HoodieTestDataGenerator.createCommitFile(basePath, "107", 
wrapperFs.getConf());
+HoodieTable table = HoodieSparkTable.create(cfg, context, metaClient);
+HoodieTimelineArchiver archiver = new HoodieTimelineArchiver(cfg, table);
+
+HoodieTimeline timeline = 
metaClient.getActiveTimeline().getWriteTimeline();//getCommitsAndCompactionTimeline();
+assertEquals(8, timeline.countInstants(), "Loaded 8 commits and the count 
should match");
+boolean result = archiver.archiveIfRequired(context);
+assertTrue(result);
+timeline = 
metaClient.getActiveTimeline().reload().getWriteTimeline();//getCommitsAndCompactionTimeline();
+assertTrue(timeline.containsInstant(new HoodieInstant(false, 
HoodieTimeline.COMMIT_ACTION, "100")),
+"At least one instant before oldest pending replacecommit need to stay 
in the timeline");
+assertEquals(8, timeline.countInstants(),
+"Since we have a pending replacecommit at 101, we should never archive 
any commit "
++ "after 101 and also to maintain timeline have at least one 
completed commit before pending commit");
+assertTrue(timeline.containsInstant(new HoodieInstant(State.INFLIGHT, 
HoodieTimeline.REPLACE_COMMIT_ACTION, "101")),
+"Inflight replacecommit must still be present");
+assertTrue(timeline.containsInstant(new HoodieInstant(false, 
HoodieTimeline.COMMIT_ACTION, "102")),
+"Instants greater than oldest pending commit must be present");
+assertTrue(timeline.containsInstant(new HoodieInstant(false, 
HoodieTimeline.COMMIT_ACTION, "103")),
+"Instants greater than oldest pending commit must be present");
+assertTrue(timeline.containsInstant(new HoodieInstant(State.REQUESTED, 
HoodieTimeline.COMPACTION_ACTION, "104")),
+"Instants greater than oldest pending commit must be present");
+assertTrue(timeline.containsInstant(new HoodieInstant(false, 
HoodieTimeline.COMMIT_ACTION, "105")),
+"Instants greater than oldest pending commit must be present");
+assertTrue(timeline.containsInstant(new HoodieInstant(false, 
HoodieTimeline.COMMIT_ACTION, "106")),
+"Instants greater than oldest pending commit must be present");
+

[GitHub] [hudi] jonvex opened a new pull request, #8859: set operation to bulk insert on fresh table

2023-05-31 Thread via GitHub


jonvex opened a new pull request, #8859:
URL: https://github.com/apache/hudi/pull/8859

   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was 
copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8851: [HUDI-6281] Comprehensive schema evolution supports column change with a default value

2023-05-31 Thread via GitHub


hudi-bot commented on PR #8851:
URL: https://github.com/apache/hudi/pull/8851#issuecomment-157898

   
   ## CI report:
   
   * 2db6852dd391973eab275dc7ef70c02bfbc5f652 UNKNOWN
   * 60c1399ac012bc61421f3bb1feb208decbcb6b6a UNKNOWN
   * abacb628b9d61b0263f451e34daad0fea65cf496 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17534)
 
   * 720b8aec68410a1e48e57fc540b1885bee6a7423 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17535)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8851: [HUDI-6281] Comprehensive schema evolution supports column change with a default value

2023-05-31 Thread via GitHub


hudi-bot commented on PR #8851:
URL: https://github.com/apache/hudi/pull/8851#issuecomment-1571106196

   
   ## CI report:
   
   * 2db6852dd391973eab275dc7ef70c02bfbc5f652 UNKNOWN
   * 60c1399ac012bc61421f3bb1feb208decbcb6b6a UNKNOWN
   * 4c11d10224ebd368cdb83867ec8dbca1c56993f4 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17532)
 
   * abacb628b9d61b0263f451e34daad0fea65cf496 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17534)
 
   * 720b8aec68410a1e48e57fc540b1885bee6a7423 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8858: [HUDI-6301][UBER] Support SqlFileBasedSource for DeltaStreamer

2023-05-31 Thread via GitHub


hudi-bot commented on PR #8858:
URL: https://github.com/apache/hudi/pull/8858#issuecomment-1571101921

   
   ## CI report:
   
   * ac360c703518f4f7a468c86e93f08cbcbacf22b4 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17530)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8851: [HUDI-6281] Comprehensive schema evolution supports column change with a default value

2023-05-31 Thread via GitHub


hudi-bot commented on PR #8851:
URL: https://github.com/apache/hudi/pull/8851#issuecomment-1571101877

   
   ## CI report:
   
   * 2db6852dd391973eab275dc7ef70c02bfbc5f652 UNKNOWN
   * 60c1399ac012bc61421f3bb1feb208decbcb6b6a UNKNOWN
   * 4c11d10224ebd368cdb83867ec8dbca1c56993f4 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17532)
 
   * abacb628b9d61b0263f451e34daad0fea65cf496 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17534)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8847: [HUDI-2071] Support Reading Bootstrap MOR RT Table In Spark DataSource Table

2023-05-31 Thread via GitHub


hudi-bot commented on PR #8847:
URL: https://github.com/apache/hudi/pull/8847#issuecomment-1571101813

   
   ## CI report:
   
   * fe991dc492e5bec19b4bfd91dc0b210e6b152b7a UNKNOWN
   * ffc65bc577ff5665a4b16f1f3759586fd0b01d2f Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17531)
 
   * 19630df001d430cdf4ec6413c6f025e48bb84bb3 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17533)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a diff in pull request #8829: [HUDI-6277][UBER] Clustering enhancements

2023-05-31 Thread via GitHub


nsivabalan commented on code in PR #8829:
URL: https://github.com/apache/hudi/pull/8829#discussion_r1212410928


##
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/validator/SqlQueryEqualityPreCommitValidator.java:
##
@@ -57,9 +57,11 @@ protected void validateUsingQuery(String query, String 
prevTableSnapshot, String
 String queryWithPrevSnapshot = 
query.replaceAll(HoodiePreCommitValidatorConfig.VALIDATOR_TABLE_VARIABLE, 
prevTableSnapshot);
 String queryWithNewSnapshot = 
query.replaceAll(HoodiePreCommitValidatorConfig.VALIDATOR_TABLE_VARIABLE, 
newTableSnapshot);
 LOG.info("Running query on previous state: " + queryWithPrevSnapshot);
-Dataset prevRows = sqlContext.sql(queryWithPrevSnapshot);
+Dataset prevRows = sqlContext.sql(queryWithPrevSnapshot).cache();
+LOG.info("Total rows in prevRows " + prevRows.count());
 LOG.info("Running query on new state: " + queryWithNewSnapshot);
-Dataset newRows  = sqlContext.sql(queryWithNewSnapshot);
+Dataset newRows  = sqlContext.sql(queryWithNewSnapshot).cache();
+LOG.info("Total rows in newRows " + newRows.count());

Review Comment:
   should we say "total rows in after state" or something. "newRows" seems not 
a good name



##
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/validator/SqlQueryInequalityPreCommitValidator.java:
##
@@ -56,9 +56,11 @@ protected void validateUsingQuery(String query, String 
prevTableSnapshot, String
 String queryWithPrevSnapshot = 
query.replaceAll(HoodiePreCommitValidatorConfig.VALIDATOR_TABLE_VARIABLE, 
prevTableSnapshot);
 String queryWithNewSnapshot = 
query.replaceAll(HoodiePreCommitValidatorConfig.VALIDATOR_TABLE_VARIABLE, 
newTableSnapshot);
 LOG.info("Running query on previous state: " + queryWithPrevSnapshot);
-Dataset prevRows = sqlContext.sql(queryWithPrevSnapshot);
+Dataset prevRows = sqlContext.sql(queryWithPrevSnapshot).cache();
+LOG.info("Total rows in prevRows " + prevRows.count());
 LOG.info("Running query on new state: " + queryWithNewSnapshot);
-Dataset newRows  = sqlContext.sql(queryWithNewSnapshot);
+Dataset newRows  = sqlContext.sql(queryWithNewSnapshot).cache();
+LOG.info("Total rows in newRows " + newRows.count());

Review Comment:
   same here. 
   



##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/execution/HoodieLazyInsertIterable.java:
##
@@ -90,12 +90,16 @@ public R getResult() {
 }
   }
 
-  static  Function, 
HoodieInsertValueGenResult> getTransformer(Schema schema,
+  /**
+   * Transformer function to help transform a HoodieRecord. This transformer 
is used by BufferedIterator to offload some
+   * expensive operations of transformation to the reader thread.
+   */
+  public  Function, 
HoodieInsertValueGenResult> getTransformer(Schema schema,

 HoodieWriteConfig writeConfig) {
 return getTransformerInternal(schema, writeConfig);
   }
 
-  private static  Function, 
HoodieInsertValueGenResult> getTransformerInternal(Schema schema,
+  public static  Function, 
HoodieInsertValueGenResult> getTransformerInternal(Schema schema,

Review Comment:
   does this need to be static then ? 



##
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/validator/SparkPreCommitValidator.java:
##
@@ -73,7 +79,23 @@ public void validate(String instantTime, 
HoodieWriteMetadata writeResult, Dat
 try {
   validateRecordsBeforeAndAfter(before, after, 
getPartitionsModified(writeResult));
 } finally {
-  LOG.info(getClass() + " validator took " + timer.endTimer() + " ms");
+  long duration = timer.endTimer();
+  LOG.info(getClass() + " validator took " + duration + " ms" + ", metrics 
on? " + getWriteConfig().isMetricsOn());
+  publishRunStats(instantTime, duration);
+}
+  }
+
+  /**
+   * Publish pre-commit validator run stats for a given commit action.
+   */
+  private void publishRunStats(String instantTime, long duration) {
+// Record validator duration metrics.
+if (getWriteConfig().isMetricsOn()) {
+  HoodieTableMetaClient metaClient = getHoodieTable().getMetaClient();
+  Option currentInstant = metaClient.getActiveTimeline()
+  .findInstantsAfterOrEquals(instantTime, 1)
+  .firstInstant();
+  metrics.reportMetrics(currentInstant.get().getAction(), 
getClass().getSimpleName(), duration);

Review Comment:
   not too strong on the suggestion. 
   but we can fetch the write operation type from HoodieCommitMetadata which is 
within HoodieWriteMetadata. 
   we can avoid loading the timeline if we go that route. 
   



##
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java:
##
@@ -124,8 +125,15 @@ private HoodieData> 
clusteringHandleUpdate(HoodieData>> 

[GitHub] [hudi] hudi-bot commented on pull request #8851: [HUDI-6281] Comprehensive schema evolution supports column change with a default value

2023-05-31 Thread via GitHub


hudi-bot commented on PR #8851:
URL: https://github.com/apache/hudi/pull/8851#issuecomment-1571074411

   
   ## CI report:
   
   * 2db6852dd391973eab275dc7ef70c02bfbc5f652 UNKNOWN
   * 60c1399ac012bc61421f3bb1feb208decbcb6b6a UNKNOWN
   * ef606cf4ab9b726ee9c9af333145b7e8079fe20a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17522)
 
   * 4c11d10224ebd368cdb83867ec8dbca1c56993f4 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17532)
 
   * abacb628b9d61b0263f451e34daad0fea65cf496 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8851: [HUDI-6281] Comprehensive schema evolution supports column change with a default value

2023-05-31 Thread via GitHub


hudi-bot commented on PR #8851:
URL: https://github.com/apache/hudi/pull/8851#issuecomment-1571067264

   
   ## CI report:
   
   * 2db6852dd391973eab275dc7ef70c02bfbc5f652 UNKNOWN
   * 60c1399ac012bc61421f3bb1feb208decbcb6b6a UNKNOWN
   * ef606cf4ab9b726ee9c9af333145b7e8079fe20a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17522)
 
   * 4c11d10224ebd368cdb83867ec8dbca1c56993f4 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8847: [HUDI-2071] Support Reading Bootstrap MOR RT Table In Spark DataSource Table

2023-05-31 Thread via GitHub


hudi-bot commented on PR #8847:
URL: https://github.com/apache/hudi/pull/8847#issuecomment-1571067210

   
   ## CI report:
   
   * fe991dc492e5bec19b4bfd91dc0b210e6b152b7a UNKNOWN
   * ffc65bc577ff5665a4b16f1f3759586fd0b01d2f Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17531)
 
   * 19630df001d430cdf4ec6413c6f025e48bb84bb3 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8847: [HUDI-2071] Support Reading Bootstrap MOR RT Table In Spark DataSource Table

2023-05-31 Thread via GitHub


hudi-bot commented on PR #8847:
URL: https://github.com/apache/hudi/pull/8847#issuecomment-1571061554

   
   ## CI report:
   
   * fe991dc492e5bec19b4bfd91dc0b210e6b152b7a UNKNOWN
   * ffc65bc577ff5665a4b16f1f3759586fd0b01d2f Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17531)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] forest455 closed issue #8841: [SUPPORT] presto querying hudi COW table can't get latest snapshot

2023-05-31 Thread via GitHub


forest455 closed issue #8841: [SUPPORT]  presto querying hudi COW table  can't 
get latest snapshot  
URL: https://github.com/apache/hudi/issues/8841


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8847: [HUDI-2071] Support Reading Bootstrap MOR RT Table In Spark DataSource Table

2023-05-31 Thread via GitHub


hudi-bot commented on PR #8847:
URL: https://github.com/apache/hudi/pull/8847#issuecomment-1571020939

   
   ## CI report:
   
   * fe991dc492e5bec19b4bfd91dc0b210e6b152b7a UNKNOWN
   * bf40585b9b516ff7700d78bd4fc2cc9b89854f3e Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17527)
 
   * ffc65bc577ff5665a4b16f1f3759586fd0b01d2f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17531)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (HUDI-6302) Fix Record Level Index for update partition path config values

2023-05-31 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-6302:
-

 Summary: Fix Record Level Index for update partition path config 
values
 Key: HUDI-6302
 URL: https://issues.apache.org/jira/browse/HUDI-6302
 Project: Apache Hudi
  Issue Type: Improvement
  Components: index, writer-core
Reporter: sivabalan narayanan


We are adding RLI in 0.14.0. We also need to support update partition path 
config. 

If its set to true, we might have to migrate the record to new partition. 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] hudi-bot commented on pull request #8856: [HUDI-6300] fix file size parallelism not work when init metadata table

2023-05-31 Thread via GitHub


hudi-bot commented on PR #8856:
URL: https://github.com/apache/hudi/pull/8856#issuecomment-1571006226

   
   ## CI report:
   
   * 23a574b64681c95c17db47d4c63c86d7e0215ba9 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17528)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8847: [HUDI-2071] Support Reading Bootstrap MOR RT Table In Spark DataSource Table

2023-05-31 Thread via GitHub


hudi-bot commented on PR #8847:
URL: https://github.com/apache/hudi/pull/8847#issuecomment-1571006019

   
   ## CI report:
   
   * fe991dc492e5bec19b4bfd91dc0b210e6b152b7a UNKNOWN
   * bf40585b9b516ff7700d78bd4fc2cc9b89854f3e Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17527)
 
   * ffc65bc577ff5665a4b16f1f3759586fd0b01d2f UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8858: [HUDI-6301][UBER] Support SqlFileBasedSource for DeltaStreamer

2023-05-31 Thread via GitHub


hudi-bot commented on PR #8858:
URL: https://github.com/apache/hudi/pull/8858#issuecomment-1570948967

   
   ## CI report:
   
   * ac360c703518f4f7a468c86e93f08cbcbacf22b4 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17530)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-6301) Support SqlFileBasedSource for DeltaStreamer

2023-05-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-6301:
-
Labels: pull-request-available  (was: )

> Support SqlFileBasedSource for DeltaStreamer
> 
>
> Key: HUDI-6301
> URL: https://issues.apache.org/jira/browse/HUDI-6301
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Surya Prasanna Yalla
>Assignee: Surya Prasanna Yalla
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] hudi-bot commented on pull request #8858: [HUDI-6301][UBER] Support SqlFileBasedSource for DeltaStreamer

2023-05-31 Thread via GitHub


hudi-bot commented on PR #8858:
URL: https://github.com/apache/hudi/pull/8858#issuecomment-1570939581

   
   ## CI report:
   
   * ac360c703518f4f7a468c86e93f08cbcbacf22b4 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (HUDI-6301) Support SqlFileBasedSource for DeltaStreamer

2023-05-31 Thread Surya Prasanna Yalla (Jira)
Surya Prasanna Yalla created HUDI-6301:
--

 Summary: Support SqlFileBasedSource for DeltaStreamer
 Key: HUDI-6301
 URL: https://issues.apache.org/jira/browse/HUDI-6301
 Project: Apache Hudi
  Issue Type: New Feature
Reporter: Surya Prasanna Yalla
Assignee: Surya Prasanna Yalla






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] suryaprasanna opened a new pull request, #8858: [UBER] Support SqlFileBasedSource for DeltaStreamer

2023-05-31 Thread via GitHub


suryaprasanna opened a new pull request, #8858:
URL: https://github.com/apache/hudi/pull/8858

   ### Change Logs
   
   Introducing a new Source type called SqlFileBasedSource which can be used to 
create complex SQL statements by and caches datasets in-memory and running 
complex joins on them.
   
   ### Impact
   
   This is a new feature should not impact existing use cases.
   
   ### Risk level (write none, low medium or high below)
   
   None.
   
   ### Documentation Update
   
   Some documentation is added within the classes, but since new configs and 
source are introduced it requires updates to the 
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Change Logs and Impact were stated clearly
   - [x] Adequate tests were added if applicable
   - [x] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



  1   2   3   >