[hudi] branch master updated (205e48f -> 50fa5a6)

2022-01-05 Thread codope
This is an automated email from the ASF dual-hosted git repository.

codope pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from 205e48f  [HUDI-3132] Minor fixes for HoodieCatalog
 add 50fa5a6  Update HiveIncrementalPuller to configure filesystem (#4431)

No new revisions were added by this update.

Summary of changes:
 .../src/main/java/org/apache/hudi/utilities/HiveIncrementalPuller.java | 3 +++
 1 file changed, 3 insertions(+)
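
For context, a minimal, hypothetical sketch (not the three-line patch itself) of what configuring the filesystem from the target path and the active Hadoop configuration generally looks like; the class and method names below are illustrative only:

```java
// Hypothetical illustration, not the code from #4431: resolve the FileSystem from
// the path's scheme and the supplied Hadoop configuration instead of fs.defaultFS.
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FileSystemConfigSketch {
  public static FileSystem resolveFs(String basePath, Configuration hadoopConf) throws IOException {
    URI uri = new Path(basePath).toUri();
    // FileSystem.get picks the implementation (HDFS, S3A, local, ...) matching the URI scheme.
    return FileSystem.get(uri, hadoopConf);
  }
}
```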


[GitHub] [hudi] codope merged pull request #4431: Update HiveIncrementalPuller.java

2022-01-05 Thread GitBox


codope merged pull request #4431:
URL: https://github.com/apache/hudi/pull/4431


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-3158) Reduce warn logs in Spark SQL INSERT OVERWRITE

2022-01-05 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3158:
-
Status: In Progress  (was: Open)

> Reduce warn logs in Spark SQL INSERT OVERWRITE
> --
>
> Key: HUDI-3158
> URL: https://issues.apache.org/jira/browse/HUDI-3158
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: Raymond Xu
>Assignee: 董可伦
>Priority: Major
>  Labels: pull-request-available, sev:normal
> Fix For: 0.11.0, 0.10.1
>
>
> {code:java}
> 22/01/03 19:35:12 WARN ClusteringUtils: No content found in requested file for instant [==>20220103192919722__replacecommit__REQUESTED]
> 22/01/03 19:35:12 WARN ClusteringUtils: No content found in requested file for instant [==>20220103192919722__replacecommit__REQUESTED]
> 22/01/03 19:35:12 WARN ClusteringUtils: No content found in requested file for instant [==>20220103192919722__replacecommit__REQUESTED]
> 22/01/03 19:35:12 WARN ClusteringUtils: No content found in requested file for instant [==>20220103192919722__replacecommit__REQUESTED]
> 22/01/03 19:35:12 WARN ClusteringUtils: No content found in requested file for instant [==>20220103192919722__replacecommit__REQUESTED]
> 22/01/03 19:35:12 WARN ClusteringUtils: No content found in requested file for instant [==>20220103192919722__replacecommit__REQUESTED]
> 22/01/03 19:35:12 WARN ClusteringUtils: No content found in requested file for instant [==>20220103192919722__replacecommit__REQUESTED]
> 22/01/03 19:35:12 WARN ClusteringUtils: No content found in requested file for instant [==>20220103192919722__replacecommit__REQUESTED]
> 22/01/03 19:35:12 WARN ClusteringUtils: No content found in requested file for instant [==>20220103192919722__replacecommit__REQUESTED]
> {code}
> The goal is to reduce these repeated warn logs.
>  
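
Not part of the linked pull request: a minimal Java sketch of one way to collapse such repeated warnings, assuming a warn-once guard per instant is acceptable (names are illustrative):

{code:java}
// Hypothetical sketch: remember which instants were already reported so the
// warning is emitted once per instant instead of once per call.
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.log4j.LogManager;
import org.apache.log4j.Logger;

class ClusteringWarnOnce {
  private static final Logger LOG = LogManager.getLogger(ClusteringWarnOnce.class);
  private static final Set<String> WARNED_INSTANTS = ConcurrentHashMap.newKeySet();

  static void warnNoContentOnce(String instant) {
    if (WARNED_INSTANTS.add(instant)) {
      LOG.warn("No content found in requested file for instant " + instant);
    }
  }
}
{code}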



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3158) Reduce warn logs in Spark SQL INSERT OVERWRITE

2022-01-05 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3158:
-
Reviewers: Raymond Xu, sivabalan narayanan

> Reduce warn logs in Spark SQL INSERT OVERWRITE
> --
>
> Key: HUDI-3158
> URL: https://issues.apache.org/jira/browse/HUDI-3158
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: Raymond Xu
>Assignee: 董可伦
>Priority: Major
>  Labels: pull-request-available, sev:normal
> Fix For: 0.11.0, 0.10.1
>
>
> {code:java}
> 22/01/03 19:35:12 WARN ClusteringUtils: No content found in requested file for instant [==>20220103192919722__replacecommit__REQUESTED]
> 22/01/03 19:35:12 WARN ClusteringUtils: No content found in requested file for instant [==>20220103192919722__replacecommit__REQUESTED]
> 22/01/03 19:35:12 WARN ClusteringUtils: No content found in requested file for instant [==>20220103192919722__replacecommit__REQUESTED]
> 22/01/03 19:35:12 WARN ClusteringUtils: No content found in requested file for instant [==>20220103192919722__replacecommit__REQUESTED]
> 22/01/03 19:35:12 WARN ClusteringUtils: No content found in requested file for instant [==>20220103192919722__replacecommit__REQUESTED]
> 22/01/03 19:35:12 WARN ClusteringUtils: No content found in requested file for instant [==>20220103192919722__replacecommit__REQUESTED]
> 22/01/03 19:35:12 WARN ClusteringUtils: No content found in requested file for instant [==>20220103192919722__replacecommit__REQUESTED]
> 22/01/03 19:35:12 WARN ClusteringUtils: No content found in requested file for instant [==>20220103192919722__replacecommit__REQUESTED]
> 22/01/03 19:35:12 WARN ClusteringUtils: No content found in requested file for instant [==>20220103192919722__replacecommit__REQUESTED]
> {code}
> The goal is to reduce these repeated warn logs.
>  





[jira] [Assigned] (HUDI-3158) Reduce warn logs in Spark SQL INSERT OVERWRITE

2022-01-05 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu reassigned HUDI-3158:


Assignee: 董可伦  (was: sivabalan narayanan)

> Reduce warn logs in Spark SQL INSERT OVERWRITE
> --
>
> Key: HUDI-3158
> URL: https://issues.apache.org/jira/browse/HUDI-3158
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: Raymond Xu
>Assignee: 董可伦
>Priority: Major
>  Labels: pull-request-available, sev:normal
> Fix For: 0.11.0, 0.10.1
>
>
> {code:java}
> 22/01/03 19:35:12 WARN ClusteringUtils: No content found in requested file for instant [==>20220103192919722__replacecommit__REQUESTED]
> 22/01/03 19:35:12 WARN ClusteringUtils: No content found in requested file for instant [==>20220103192919722__replacecommit__REQUESTED]
> 22/01/03 19:35:12 WARN ClusteringUtils: No content found in requested file for instant [==>20220103192919722__replacecommit__REQUESTED]
> 22/01/03 19:35:12 WARN ClusteringUtils: No content found in requested file for instant [==>20220103192919722__replacecommit__REQUESTED]
> 22/01/03 19:35:12 WARN ClusteringUtils: No content found in requested file for instant [==>20220103192919722__replacecommit__REQUESTED]
> 22/01/03 19:35:12 WARN ClusteringUtils: No content found in requested file for instant [==>20220103192919722__replacecommit__REQUESTED]
> 22/01/03 19:35:12 WARN ClusteringUtils: No content found in requested file for instant [==>20220103192919722__replacecommit__REQUESTED]
> 22/01/03 19:35:12 WARN ClusteringUtils: No content found in requested file for instant [==>20220103192919722__replacecommit__REQUESTED]
> 22/01/03 19:35:12 WARN ClusteringUtils: No content found in requested file for instant [==>20220103192919722__replacecommit__REQUESTED]
> {code}
> The goal is to reduce these repeated warn logs.
>  





[jira] [Updated] (HUDI-3163) Validate/certify hudi against diff spark 3 versions

2022-01-05 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3163:
-
Story Points: 2

> Validate/certify hudi against diff spark 3 versions 
> 
>
> Key: HUDI-3163
> URL: https://issues.apache.org/jira/browse/HUDI-3163
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Spark Integration
>Reporter: sivabalan narayanan
>Assignee: Raymond Xu
>Priority: Major
>  Labels: user-support-issues
> Fix For: 0.11.0, 0.10.1
>
>
> We have different Spark 3 versions. Let's validate/certify each Spark 3 version
> against 0.10.0 and master.
>  
> I do see this in our GitHub README. If it's already certified, feel free to
> close this out (and link to the original ticket where the verifications are documented).
> {code:sh}
> # Build against Spark 3.2.0 (default build shipped with the public jars)
> mvn clean package -DskipTests -Dspark3
> # Build against Spark 3.1.2
> mvn clean package -DskipTests -Dspark3.1.x
> # Build against Spark 3.0.3
> mvn clean package -DskipTests -Dspark3.0.x
> {code}
>  





[jira] [Updated] (HUDI-3163) Validate/certify hudi against diff spark 3 versions

2022-01-05 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3163:
-
Epic Link: HUDI-1658

> Validate/certify hudi against diff spark 3 versions 
> 
>
> Key: HUDI-3163
> URL: https://issues.apache.org/jira/browse/HUDI-3163
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Spark Integration
>Reporter: sivabalan narayanan
>Assignee: Raymond Xu
>Priority: Major
>  Labels: user-support-issues
> Fix For: 0.11.0, 0.10.1
>
>
> We have different Spark 3 versions. Let's validate/certify each Spark 3 version
> against 0.10.0 and master.
>  
> I do see this in our GitHub README. If it's already certified, feel free to
> close this out (and link to the original ticket where the verifications are documented).
> {code:sh}
> # Build against Spark 3.2.0 (default build shipped with the public jars)
> mvn clean package -DskipTests -Dspark3
> # Build against Spark 3.1.2
> mvn clean package -DskipTests -Dspark3.1.x
> # Build against Spark 3.0.3
> mvn clean package -DskipTests -Dspark3.0.x
> {code}
>  





[jira] [Closed] (HUDI-2915) Fix field not found in record error for spark-sql

2022-01-05 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu closed HUDI-2915.

Fix Version/s: (was: 0.11.0)
   (was: 0.10.1)
   Resolution: Won't Fix

> Fix field not found in record error for spark-sql
> -
>
> Key: HUDI-2915
> URL: https://issues.apache.org/jira/browse/HUDI-2915
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: Forward Xu
>Assignee: Forward Xu
>Priority: Critical
>  Labels: pull-request-available
> Attachments: image-2021-12-02-19-37-10-346.png
>
>
> !image-2021-12-02-19-37-10-346.png!





[jira] [Updated] (HUDI-2915) Fix field not found in record error for spark-sql

2022-01-05 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2915:
-
Reporter: Forward Xu  (was: Raymond Xu)

> Fix field not found in record error for spark-sql
> -
>
> Key: HUDI-2915
> URL: https://issues.apache.org/jira/browse/HUDI-2915
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: Forward Xu
>Assignee: Forward Xu
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.11.0, 0.10.1
>
> Attachments: image-2021-12-02-19-37-10-346.png
>
>
> !image-2021-12-02-19-37-10-346.png!





[jira] [Closed] (HUDI-2661) java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.catalog.CatalogTable.copy

2022-01-05 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu closed HUDI-2661.

Fix Version/s: (was: 0.11.0)
   (was: 0.10.1)
   Resolution: Not A Bug

> java.lang.NoSuchMethodError: 
> org.apache.spark.sql.catalyst.catalog.CatalogTable.copy
> 
>
> Key: HUDI-2661
> URL: https://issues.apache.org/jira/browse/HUDI-2661
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Affects Versions: 0.10.0
>Reporter: Changjun Zhang
>Assignee: Forward Xu
>Priority: Critical
> Attachments: image-2021-11-01-21-47-44-538.png, 
> image-2021-11-01-21-48-22-765.png
>
>
> Hudi integration with Spark SQL: when I launch spark-sql with
> {code:sh}
> spark-sql --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
>   --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
> {code}
> and then create a table on an existing Hudi table:
> {code:sql}
> create table testdb.tb_hudi_operation_test using hudi
> location '/tmp/flinkdb/datas/tb_hudi_operation';
> {code}
> the following exception is thrown:
>  !image-2021-11-01-21-47-44-538.png|thumbnail! 
>  !image-2021-11-01-21-48-22-765.png|thumbnail! 





[GitHub] [hudi] xushiyan commented on a change in pull request #2903: [HUDI-1850] Fixing read of a empty table but with failed write

2022-01-05 Thread GitBox


xushiyan commented on a change in pull request #2903:
URL: https://github.com/apache/hudi/pull/2903#discussion_r779342696



##
File path: 
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/HoodieSparkSqlWriterSuite.scala
##
@@ -336,7 +337,42 @@ class HoodieSparkSqlWriterSuite extends FunSuite with Matchers {
 }
   }
 
-  test("test bulk insert dataset with datasource impl multiple rounds") {
+  test("test read of a table with one failed write") {
+initSparkContext("test_read_table_with_one_failed_write")
+val path = java.nio.file.Files.createTempDirectory("hoodie_test_path")
+try {
+  val hoodieFooTableName = "hoodie_foo_tbl"
+  val fooTableModifier = Map("path" -> path.toAbsolutePath.toString,
+HoodieWriteConfig.TABLE_NAME.key() -> hoodieFooTableName,
+DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY.key() -> "_row_key",
+DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY.key() -> 
"partition")
+
+  val fooTableParams = 
HoodieWriterUtils.parametersWithWriteDefaults(fooTableModifier)
+  val props = new Properties()
+  fooTableParams.foreach(entry => props.setProperty(entry._1, entry._2))
+  val metaClient = 
HoodieTableMetaClient.initTableAndGetMetaClient(spark.sparkContext.hadoopConfiguration,
 path.toAbsolutePath.toString, props)
+
+  val partitionAndFileId = new util.HashMap[String, String]()
+  
partitionAndFileId.put(HoodieTestDataGenerator.DEFAULT_FIRST_PARTITION_PATH, 
"file-1")
+
+  
HoodieTestTable.of(metaClient).withPartitionMetaFiles(HoodieTestDataGenerator.DEFAULT_FIRST_PARTITION_PATH)
+.addInflightCommit("001")
+.withBaseFilesInPartitions(partitionAndFileId)
+
+  val snapshotDF1 = spark.read.format("org.apache.hudi")
+.load(path.toAbsolutePath.toString + "/*/*/*/*")
+  snapshotDF1.count()
+  assertFalse(true)
+}  catch {
+  case e: InvalidTableException =>
+assertTrue(e.getMessage.contains("Invalid Hoodie Table"))

Review comment:
   Looks like we should fix `InvalidTableException` to say `Hudi` instead of `Hoodie`, since this is a user-facing message.
   
   ```suggestion
   assertTrue(e.getMessage.contains("Invalid Hudi Table"))
   ```

##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala
##
@@ -106,14 +106,16 @@ class DefaultSource extends RelationProvider
     val metaClient = HoodieTableMetaClient.builder().setConf(fs.getConf).setBasePath(tablePath).build()
     val isBootstrappedTable = metaClient.getTableConfig.getBootstrapBasePath.isPresent
     val tableType = metaClient.getTableType
-
     // First check if the ConfigUtils.IS_QUERY_AS_RO_TABLE has set by HiveSyncTool,
     // or else use query type from QUERY_TYPE_OPT_KEY.
     val queryType = parameters.get(ConfigUtils.IS_QUERY_AS_RO_TABLE)
       .map(is => if (is.toBoolean) QUERY_TYPE_READ_OPTIMIZED_OPT_VAL else QUERY_TYPE_SNAPSHOT_OPT_VAL)
       .getOrElse(parameters.getOrElse(QUERY_TYPE_OPT_KEY.key, QUERY_TYPE_OPT_KEY.defaultValue()))

     log.info(s"Is bootstrapped table => $isBootstrappedTable, tableType is: $tableType, queryType is: $queryType")
+    if (metaClient.getCommitsTimeline.filterCompletedInstants.empty()) {
+      throw new InvalidTableException("No valid commits found in the given path " + metaClient.getBasePath)
+    }

Review comment:
   @nsivabalan since this is a validation check, can we move it to L107 
just below `metaClient` creation?

##
File path: 
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/HoodieSparkSqlWriterSuite.scala
##
@@ -336,7 +337,42 @@ class HoodieSparkSqlWriterSuite extends FunSuite with Matchers {
 }
   }
 
-  test("test bulk insert dataset with datasource impl multiple rounds") {
+  test("test read of a table with one failed write") {

Review comment:
   I don't think we need a functional test for this. We should be cognizant 
about over-testing (it causes long test times and maintenance effort). The 
simple logic flow can be properly covered by a unit test on `createRelation()` 
that asserts the exception.








[GitHub] [hudi] leesf commented on a change in pull request #4514: [HUDI-3172] Refactor hudi existing modules to make more code reuse in V2 Implementation

2022-01-05 Thread GitBox


leesf commented on a change in pull request #4514:
URL: https://github.com/apache/hudi/pull/4514#discussion_r779339698



##
File path: 
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/HoodieSqlCommonUtils.scala
##
@@ -0,0 +1,316 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hudi
+
+import scala.collection.JavaConverters._
+import java.net.URI
+import java.util.{Date, Locale, Properties}
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+
+import org.apache.hudi.{AvroConversionUtils, SparkAdapterSupport}
+import org.apache.hudi.client.common.HoodieSparkEngineContext
+import org.apache.hudi.common.config.DFSPropertiesConfiguration
+import org.apache.hudi.common.config.HoodieMetadataConfig
+import org.apache.hudi.common.fs.FSUtils
+import org.apache.hudi.common.model.HoodieRecord
+import org.apache.hudi.common.table.{HoodieTableMetaClient, TableSchemaResolver}
+import org.apache.hudi.common.table.timeline.{HoodieActiveTimeline, HoodieInstantTimeGenerator}
+import org.apache.spark.SPARK_VERSION
+import org.apache.spark.sql.{Column, DataFrame, SparkSession}
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation
+import org.apache.spark.sql.catalyst.catalog.{CatalogTable, CatalogTableType}
+import org.apache.spark.sql.catalyst.expressions.{And, Attribute, Cast, Expression, Literal}
+import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, SubqueryAlias}
+import org.apache.spark.sql.execution.datasources.LogicalRelation
+import org.apache.spark.sql.internal.{SQLConf, StaticSQLConf}
+import org.apache.spark.api.java.JavaSparkContext
+import org.apache.spark.sql.types.{DataType, NullType, StringType, StructField, StructType}
+
+import java.text.SimpleDateFormat
+
+import scala.collection.immutable.Map
+
+object HoodieSqlCommonUtils extends SparkAdapterSupport {
+  // NOTE: {@code SimpleDateFormat} is NOT thread-safe
+  // TODO replace w/ DateTimeFormatter
+  private val defaultDateFormat =
+    ThreadLocal.withInitial(new java.util.function.Supplier[SimpleDateFormat] {
+      override def get() = new SimpleDateFormat("yyyy-MM-dd")
+    })
+
+  def isHoodieTable(table: CatalogTable): Boolean = {
+    table.provider.map(_.toLowerCase(Locale.ROOT)).orNull == "hudi"
+  }
+
+  def isHoodieTable(tableId: TableIdentifier, spark: SparkSession): Boolean = {
+    val table = spark.sessionState.catalog.getTableMetadata(tableId)
+    isHoodieTable(table)
+  }
+
+  def isHoodieTable(table: LogicalPlan, spark: SparkSession): Boolean = {
+    tripAlias(table) match {
+      case LogicalRelation(_, _, Some(tbl), _) => isHoodieTable(tbl)
+      case relation: UnresolvedRelation =>
+        isHoodieTable(sparkAdapter.toTableIdentifier(relation), spark)
+      case _ => false
+    }
+  }
+
+  def getTableIdentify(table: LogicalPlan): TableIdentifier = {

Review comment:
   > Let me make another pass at all the pom changes. That seems to be the main 
thing here. In the meantime, could you clarify these comments?
   > 
   > Also, have you tested these changes across Spark 2.x and 3.1/3.2 bundles?
   
   @vinothchandar Yes, I have manually tested with Spark 3.2.0 and Spark 3.1.2 using 
the hudi-spark-bundle jar with Spark SQL; Spark 2.x is covered since CI passed.












[GitHub] [hudi] hudi-bot commented on pull request #4471: [HUDI-3125] spark-sql write timestamp directly

2022-01-05 Thread GitBox


hudi-bot commented on pull request #4471:
URL: https://github.com/apache/hudi/pull/4471#issuecomment-1006331887


   
   ## CI report:
   
   * 29b1742747a4195db690d09f09de972ab7f409db Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4895)
 
   * a5dcf171a39b236a74b9a70b0eb0b49e74ebc3b5 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4934)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4471: [HUDI-3125] spark-sql write timestamp directly

2022-01-05 Thread GitBox


hudi-bot removed a comment on pull request #4471:
URL: https://github.com/apache/hudi/pull/4471#issuecomment-1006330382


   
   ## CI report:
   
   * 29b1742747a4195db690d09f09de972ab7f409db Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4895)
 
   * a5dcf171a39b236a74b9a70b0eb0b49e74ebc3b5 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4471: [HUDI-3125] spark-sql write timestamp directly

2022-01-05 Thread GitBox


hudi-bot removed a comment on pull request #4471:
URL: https://github.com/apache/hudi/pull/4471#issuecomment-1005421630


   
   ## CI report:
   
   * 29b1742747a4195db690d09f09de972ab7f409db Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4895)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4471: [HUDI-3125] spark-sql write timestamp directly

2022-01-05 Thread GitBox


hudi-bot commented on pull request #4471:
URL: https://github.com/apache/hudi/pull/4471#issuecomment-1006330382


   
   ## CI report:
   
   * 29b1742747a4195db690d09f09de972ab7f409db Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4895)
 
   * a5dcf171a39b236a74b9a70b0eb0b49e74ebc3b5 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4521: [HUDI-3183] Wrong result of HoodieArchivedTimeline loadInstants with TimeRangeFilter

2022-01-05 Thread GitBox


hudi-bot commented on pull request #4521:
URL: https://github.com/apache/hudi/pull/4521#issuecomment-1006329173


   
   ## CI report:
   
   * 16d5dc61ae5c7a9962fc3756720d8262bdadf6b9 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4932)
 
   * d708467de740637a394375335181979a343979bd Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4933)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4521: [HUDI-3183] Wrong result of HoodieArchivedTimeline loadInstants with TimeRangeFilter

2022-01-05 Thread GitBox


hudi-bot removed a comment on pull request #4521:
URL: https://github.com/apache/hudi/pull/4521#issuecomment-1006327925


   
   ## CI report:
   
   * 16d5dc61ae5c7a9962fc3756720d8262bdadf6b9 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4932)
 
   * d708467de740637a394375335181979a343979bd Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4933)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4521: [HUDI-3183] Wrong result of HoodieArchivedTimeline loadInstants with TimeRangeFilter

2022-01-05 Thread GitBox


hudi-bot commented on pull request #4521:
URL: https://github.com/apache/hudi/pull/4521#issuecomment-1006327925


   
   ## CI report:
   
   * 16d5dc61ae5c7a9962fc3756720d8262bdadf6b9 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4932)
 
   * d708467de740637a394375335181979a343979bd Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4933)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4521: [HUDI-3183] Wrong result of HoodieArchivedTimeline loadInstants with TimeRangeFilter

2022-01-05 Thread GitBox


hudi-bot removed a comment on pull request #4521:
URL: https://github.com/apache/hudi/pull/4521#issuecomment-1006326640


   
   ## CI report:
   
   * 16d5dc61ae5c7a9962fc3756720d8262bdadf6b9 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4932)
 
   * d708467de740637a394375335181979a343979bd UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4521: [HUDI-3183] Wrong result of HoodieArchivedTimeline loadInstants with TimeRangeFilter

2022-01-05 Thread GitBox


hudi-bot commented on pull request #4521:
URL: https://github.com/apache/hudi/pull/4521#issuecomment-1006326640


   
   ## CI report:
   
   * 16d5dc61ae5c7a9962fc3756720d8262bdadf6b9 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4932)
 
   * d708467de740637a394375335181979a343979bd UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4521: [HUDI-3183] Wrong result of HoodieArchivedTimeline loadInstants with TimeRangeFilter

2022-01-05 Thread GitBox


hudi-bot removed a comment on pull request #4521:
URL: https://github.com/apache/hudi/pull/4521#issuecomment-1006325489


   
   ## CI report:
   
   * 16d5dc61ae5c7a9962fc3756720d8262bdadf6b9 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4932)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4521: [HUDI-3183] Wrong result of HoodieArchivedTimeline loadInstants with TimeRangeFilter

2022-01-05 Thread GitBox


hudi-bot commented on pull request #4521:
URL: https://github.com/apache/hudi/pull/4521#issuecomment-1006325489


   
   ## CI report:
   
   * 16d5dc61ae5c7a9962fc3756720d8262bdadf6b9 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4932)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4521: [HUDI-3183] Wrong result of HoodieArchivedTimeline loadInstants with TimeRangeFilter

2022-01-05 Thread GitBox


hudi-bot removed a comment on pull request #4521:
URL: https://github.com/apache/hudi/pull/4521#issuecomment-1006324209


   
   ## CI report:
   
   * 16d5dc61ae5c7a9962fc3756720d8262bdadf6b9 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] xushiyan edited a comment on pull request #4440: [HUDI-3100] Add config for hive conditional sync

2022-01-05 Thread GitBox


xushiyan edited a comment on pull request #4440:
URL: https://github.com/apache/hudi/pull/4440#issuecomment-1006324357


   > buildHiveSyncConfig
   
   @nsivabalan This is already changed in the patch, right?
   
   On a separate note, do you think we should still include this in 0.10.1 
given that it adds a new config? It's a matter of whether we should include a new 
config to fix something in a minor release.






[GitHub] [hudi] xushiyan commented on pull request #4440: [HUDI-3100] Add config for hive conditional sync

2022-01-05 Thread GitBox


xushiyan commented on pull request #4440:
URL: https://github.com/apache/hudi/pull/4440#issuecomment-1006324357


   > buildHiveSyncConfig
   
   @nsivabalan This is already changed in the patch, right?






[GitHub] [hudi] hudi-bot commented on pull request #4521: [HUDI-3183] Wrong result of HoodieArchivedTimeline loadInstants with TimeRangeFilter

2022-01-05 Thread GitBox


hudi-bot commented on pull request #4521:
URL: https://github.com/apache/hudi/pull/4521#issuecomment-1006324209


   
   ## CI report:
   
   * 16d5dc61ae5c7a9962fc3756720d8262bdadf6b9 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Updated] (HUDI-3183) Fix wrong result of HoodieArchivedTimeline loadInstants with TimeRangeFilter

2022-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-3183:
-
Labels: pull-request-available  (was: )

> Fix wrong result of HoodieArchivedTimeline loadInstants with TimeRangeFilter
> 
>
> Key: HUDI-3183
> URL: https://issues.apache.org/jira/browse/HUDI-3183
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
>






[GitHub] [hudi] zhangyue19921010 opened a new pull request #4521: [HUDI-3183] Wrong result of HoodieArchivedTimeline loadInstants with TimeRangeFilter

2022-01-05 Thread GitBox


zhangyue19921010 opened a new pull request #4521:
URL: https://github.com/apache/hudi/pull/4521


   https://issues.apache.org/jira/browse/HUDI-3183
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4514: [HUDI-3172] Refactor hudi existing modules to make more code reuse in V2 Implementation

2022-01-05 Thread GitBox


hudi-bot removed a comment on pull request #4514:
URL: https://github.com/apache/hudi/pull/4514#issuecomment-1006302459


   
   ## CI report:
   
   * ddc3af0c32bafef6b10c32c43132df32a5f7d83c UNKNOWN
   * e1ba726105dfa7ae07d802546c71a0cf1ad8b172 UNKNOWN
   * 306e7d462959e0249e230f60c2e9ea6602342e08 UNKNOWN
   * 15122772d9430d91807053555e12afaeda30e688 UNKNOWN
   * d9381d0ab97632b4234c9c0ccc5be3b01192bafa Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4924)
 
   * ac8d014a0602e3c499771f3313f0f88de57cdda1 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4931)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4514: [HUDI-3172] Refactor hudi existing modules to make more code reuse in V2 Implementation

2022-01-05 Thread GitBox


hudi-bot commented on pull request #4514:
URL: https://github.com/apache/hudi/pull/4514#issuecomment-1006323042


   
   ## CI report:
   
   * ddc3af0c32bafef6b10c32c43132df32a5f7d83c UNKNOWN
   * e1ba726105dfa7ae07d802546c71a0cf1ad8b172 UNKNOWN
   * 306e7d462959e0249e230f60c2e9ea6602342e08 UNKNOWN
   * 15122772d9430d91807053555e12afaeda30e688 UNKNOWN
   * ac8d014a0602e3c499771f3313f0f88de57cdda1 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4931)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Created] (HUDI-3183) Fix wrong result of HoodieArchivedTimeline loadInstants with TimeRangeFilter

2022-01-05 Thread Yue Zhang (Jira)
Yue Zhang created HUDI-3183:
---

 Summary: Fix wrong result of HoodieArchivedTimeline loadInstants 
with TimeRangeFilter
 Key: HUDI-3183
 URL: https://issues.apache.org/jira/browse/HUDI-3183
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Yue Zhang








[GitHub] [hudi] codope commented on a change in pull request #4451: [HUDI-3104] Kafka-connect support hadoop config environments and properties

2022-01-05 Thread GitBox


codope commented on a change in pull request #4451:
URL: https://github.com/apache/hudi/pull/4451#discussion_r779328474



##
File path: hudi-kafka-connect/pom.xml
##
@@ -271,5 +271,12 @@
 junit-platform-commons
 test
 
+
+

Review comment:
   I see this dependency is required to set env vars at test time. Is there 
a way to avoid this? Maybe add a wrapper class for the System.getenv calls and mock 
that class in tests?
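
   A minimal, hypothetical sketch of the wrapper being suggested (names are illustrative, not from the patch): production code reads the environment through a small seam that tests can stub, so no library is needed to mutate real environment variables.

   ```java
   // Hypothetical sketch of a seam around System.getenv.
   import java.util.function.Function;

   final class HadoopEnv {
     private final Function<String, String> envReader;

     HadoopEnv(Function<String, String> envReader) {
       this.envReader = envReader;
     }

     static HadoopEnv fromSystem() {
       return new HadoopEnv(System::getenv);
     }

     String hadoopConfDir() {
       return envReader.apply("HADOOP_CONF_DIR");
     }

     String hadoopHome() {
       return envReader.apply("HADOOP_HOME");
     }
   }
   // In a test: new HadoopEnv(name -> "/tmp/fake-hadoop") avoids touching the real environment.
   ```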

##
File path: 
hudi-kafka-connect/src/main/java/org/apache/hudi/connect/utils/KafkaConnectUtils.java
##
@@ -65,6 +70,47 @@
 
   private static final Logger LOG = LogManager.getLogger(KafkaConnectUtils.class);
   private static final String HOODIE_CONF_PREFIX = "hoodie.";
+  private static final List DEFAULT_HADOOP_CONF_FILES;
+
+  static {
+DEFAULT_HADOOP_CONF_FILES = new ArrayList<>();
+try {
+  String hadoopConfigPath = System.getenv("HADOOP_CONF_DIR");
+  String hadoopHomePath = System.getenv("HADOOP_HOME");

Review comment:
   Extract HADOOP_CONF_DIR and HADOOP_HOME to constants?
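
   A hypothetical sketch of the extraction being asked about (the constant values mirror the environment variables already read in the diff; the holder class name is illustrative):

   ```java
   // Hypothetical sketch only: name the environment variables once.
   final class HadoopEnvVars {
     static final String HADOOP_CONF_DIR = "HADOOP_CONF_DIR";
     static final String HADOOP_HOME = "HADOOP_HOME";

     private HadoopEnvVars() {
     }
   }
   // Usage: String hadoopConfigPath = System.getenv(HadoopEnvVars.HADOOP_CONF_DIR);
   ```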








[GitHub] [hudi] leesf commented on a change in pull request #4514: [HUDI-3172] Refactor hudi existing modules to make more code reuse in V2 Implementation

2022-01-05 Thread GitBox


leesf commented on a change in pull request #4514:
URL: https://github.com/apache/hudi/pull/4514#discussion_r779325386



##
File path: 
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala
##
@@ -177,7 +177,7 @@ class DefaultSource extends RelationProvider
   outputMode)
   }
 
-  override def shortName(): String = "hudi"
+  override def shortName(): String = "hudi_v1"

Review comment:
   If we do not change it, it would conflict with the hudi-spark2/hudi-spark3.1.x/hudi-spark3 module format.








[jira] [Closed] (HUDI-3182) Support parquet modular encryption

2022-01-05 Thread liujinhui (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liujinhui closed HUDI-3182.
---
Resolution: Duplicate

> Support parquet modular encryption
> --
>
> Key: HUDI-3182
> URL: https://issues.apache.org/jira/browse/HUDI-3182
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: liujinhui
>Priority: Major
>






[jira] [Created] (HUDI-3182) Support parquet modular encryption

2022-01-05 Thread liujinhui (Jira)
liujinhui created HUDI-3182:
---

 Summary: Support parquet modular encryption
 Key: HUDI-3182
 URL: https://issues.apache.org/jira/browse/HUDI-3182
 Project: Apache Hudi
  Issue Type: New Feature
Reporter: liujinhui








[GitHub] [hudi] xushiyan commented on a change in pull request #4514: [HUDI-3172] Refactor hudi existing modules to make more code reuse in V2 Implementation

2022-01-05 Thread GitBox


xushiyan commented on a change in pull request #4514:
URL: https://github.com/apache/hudi/pull/4514#discussion_r779322467



##
File path: 
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala
##
@@ -177,7 +177,7 @@ class DefaultSource extends RelationProvider
   outputMode)
   }
 
-  override def shortName(): String = "hudi"
+  override def shortName(): String = "hudi_v1"

Review comment:
   @leesf I suppose this refactoring PR is not meant to include this change?








[GitHub] [hudi] waywtdcc edited a comment on issue #4508: [SUPPORT]Duplicate Flink Hudi data

2022-01-05 Thread GitBox


waywtdcc edited a comment on issue #4508:
URL: https://github.com/apache/hudi/issues/4508#issuecomment-1006260809


   Hello, have you been able to reproduce this issue? Bootstrap did not read all the data. 
What caused it? @danny0405 
   






[GitHub] [hudi] codope commented on a change in pull request #4485: [HUDI-2947] Fixing checkpoint fetch in detlastreamer

2022-01-05 Thread GitBox


codope commented on a change in pull request #4485:
URL: https://github.com/apache/hudi/pull/4485#discussion_r779317011



##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
##
@@ -471,6 +468,22 @@ public void refreshTimeline() throws IOException {
 }).filter(Option::isPresent).findFirst().orElse(Option.empty());
   }
 
+  protected Option<HoodieCommitMetadata> getLatestCommitMetadataWithValidCheckpointInfo(HoodieTimeline timeline) throws IOException {
+    return (Option<HoodieCommitMetadata>) timeline.getReverseOrderedInstants().map(instant -> {
+      try {
+        HoodieCommitMetadata commitMetadata = HoodieCommitMetadata
+            .fromBytes(timeline.getInstantDetails(instant).get(), HoodieCommitMetadata.class);
+        if (commitMetadata.getMetadata(CHECKPOINT_KEY) != null || commitMetadata.getMetadata(CHECKPOINT_RESET_KEY) != null) {

Review comment:
   Should we check for empty string as well?
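
   A small, self-contained illustration of the stricter check being asked about, treating an empty string the same as a missing value (helper names are hypothetical, not from the patch):

   ```java
   // Hypothetical sketch: empty checkpoint metadata counts as absent.
   final class CheckpointChecks {
     private CheckpointChecks() {
     }

     static boolean hasValue(String metadataValue) {
       return metadataValue != null && !metadataValue.isEmpty();
     }

     static boolean hasCheckpointInfo(String checkpointValue, String checkpointResetValue) {
       return hasValue(checkpointValue) || hasValue(checkpointResetValue);
     }
   }
   ```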
   

##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
##
@@ -471,6 +468,22 @@ public void refreshTimeline() throws IOException {
 }).filter(Option::isPresent).findFirst().orElse(Option.empty());
   }
 
+  protected Option<HoodieCommitMetadata> getLatestCommitMetadataWithValidCheckpointInfo(HoodieTimeline timeline) throws IOException {

Review comment:
   Why can't we directly return the checkpoint optional instead of 
HoodieCommitMetadata?
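
   A hypothetical shape of that suggestion: resolve the checkpoint string while scanning the timeline and return it directly. Here java.util.Optional stands in for Hudi's Option type, and CommitMeta is an illustrative stand-in for the commit metadata, not the real API.

   ```java
   // Hypothetical sketch: return the checkpoint value itself rather than the metadata object.
   import java.util.Optional;
   import java.util.stream.Stream;

   final class CheckpointResolver {
     interface CommitMeta {
       String checkpoint(); // value stored under CHECKPOINT_KEY, possibly null
     }

     static Optional<String> latestCheckpoint(Stream<CommitMeta> reverseOrderedCommits) {
       return reverseOrderedCommits
           .map(meta -> Optional.ofNullable(meta.checkpoint()))
           .filter(Optional::isPresent)
           .map(Optional::get)
           .findFirst();
     }
   }
   ```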

##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
##
@@ -330,31 +330,28 @@ public void refreshTimeline() throws IOException {
 if (commitTimelineOpt.isPresent()) {
   Option lastCommit = commitTimelineOpt.get().lastInstant();
   if (lastCommit.isPresent()) {
-        HoodieCommitMetadata commitMetadata = HoodieCommitMetadata
-            .fromBytes(commitTimelineOpt.get().getInstantDetails(lastCommit.get()).get(), HoodieCommitMetadata.class);
-        if (cfg.checkpoint != null && (StringUtils.isNullOrEmpty(commitMetadata.getMetadata(CHECKPOINT_RESET_KEY))
-            || !cfg.checkpoint.equals(commitMetadata.getMetadata(CHECKPOINT_RESET_KEY)))) {
-          resumeCheckpointStr = Option.of(cfg.checkpoint);
-        } else if (!StringUtils.isNullOrEmpty(commitMetadata.getMetadata(CHECKPOINT_KEY))) {
-          //if previous checkpoint is an empty string, skip resume use Option.empty()
-          resumeCheckpointStr = Option.of(commitMetadata.getMetadata(CHECKPOINT_KEY));
-        } else if (HoodieTimeline.compareTimestamps(HoodieTimeline.FULL_BOOTSTRAP_INSTANT_TS,
-            HoodieTimeline.LESSER_THAN, lastCommit.get().getTimestamp())) {
-          // if previous commit metadata did not have the checkpoint key, try traversing previous commits until we find one.
-          Option<String> prevCheckpoint = getPreviousCheckpoint(commitTimelineOpt.get());
-          if (prevCheckpoint.isPresent()) {
-            resumeCheckpointStr = prevCheckpoint;
-          } else {
+        // if previous commit metadata did not have the checkpoint key, try traversing previous commits until we find one.
+        Option<HoodieCommitMetadata> commitMetadataOption = getLatestCommitMetadataWithValidCheckpointInfo(commitTimelineOpt.get());
+        if (commitMetadataOption.isPresent()) {
+          HoodieCommitMetadata commitMetadata = commitMetadataOption.get();
+          if (cfg.checkpoint != null && (StringUtils.isNullOrEmpty(commitMetadata.getMetadata(CHECKPOINT_RESET_KEY))
+              || !cfg.checkpoint.equals(commitMetadata.getMetadata(CHECKPOINT_RESET_KEY)))) {
+            resumeCheckpointStr = Option.of(cfg.checkpoint);
+          } else if (!StringUtils.isNullOrEmpty(commitMetadata.getMetadata(CHECKPOINT_KEY))) {
+            //if previous checkpoint is an empty string, skip resume use Option.empty()
+            resumeCheckpointStr = Option.of(commitMetadata.getMetadata(CHECKPOINT_KEY));
+          } else if (HoodieTimeline.compareTimestamps(HoodieTimeline.FULL_BOOTSTRAP_INSTANT_TS,
+              HoodieTimeline.LESSER_THAN, lastCommit.get().getTimestamp())) {
             throw new HoodieDeltaStreamerException(
                 "Unable to find previous checkpoint. Please double check if this table "
                     + "was indeed built via delta streamer. Last Commit :" + lastCommit + ", Instants :"
                     + commitTimelineOpt.get().getInstants().collect(Collectors.toList()) + ", CommitMetadata="
                     + commitMetadata.toJsonString());
           }
-        }
-        // KAFKA_CHECKPOINT_TYPE will be honored only for first batch.
-        if (!StringUtils.isNullOrEmpty(commitMetadata.getMetadata(CHECKPOINT_RESET_KEY))) {
-          props.remove(KafkaOffsetGen.Config.KAFKA_CHECKPOINT_TYPE.key());
+          // KAFKA_CHECKPOINT_TYPE will be honored only for first batch.
+          if (!StringUtils.isNullOrEmpty(commitMetadata.getMetadata(CHECKPOINT_RESET_KEY))) {
+

[GitHub] [hudi] leesf commented on a change in pull request #4514: [HUDI-3172] Refactor hudi existing modules to make more code reuse in V2 Implementation

2022-01-05 Thread GitBox


leesf commented on a change in pull request #4514:
URL: https://github.com/apache/hudi/pull/4514#discussion_r779318890



##
File path: 
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/command/ValidateDuplicateKeyPayload.scala
##
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hudi.command
+
+import org.apache.avro.Schema
+import org.apache.avro.generic.{GenericRecord, IndexedRecord}
+import org.apache.hudi.common.model.{DefaultHoodieRecordPayload, HoodieRecord}
+import org.apache.hudi.common.util.{Option => HOption}
+import org.apache.hudi.exception.HoodieDuplicateKeyException
+
+
+import java.util.Properties
+
+/**
+ * Validate the duplicate key for insert statement without enable the 
INSERT_DROP_DUPS_OPT
+ * config.
+ */
+class ValidateDuplicateKeyPayload(record: GenericRecord, orderingVal: 
Comparable[_])

Review comment:
   yes, moved from InsertIntoHoodieTableCommand
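
   For context, the body of the moved class (elided in the excerpt above) is along these lines; this sketch is reconstructed from the pre-refactor InsertIntoHoodieTableCommand and the imports shown above, so treat the details as illustrative rather than authoritative:

{code:scala}
class ValidateDuplicateKeyPayload(record: GenericRecord, orderingVal: Comparable[_])
  extends DefaultHoodieRecordPayload(record, orderingVal) {

  def this(record: HOption[GenericRecord]) {
    this(if (record.isPresent) record.get else null, 0)
  }

  override def combineAndGetUpdateValue(currentValue: IndexedRecord,
                                        schema: Schema,
                                        properties: Properties): HOption[IndexedRecord] = {
    // A record with the same key already exists: fail the INSERT instead of merging,
    // since INSERT_DROP_DUPS_OPT is not enabled.
    val key = currentValue.asInstanceOf[GenericRecord]
      .get(HoodieRecord.RECORD_KEY_METADATA_FIELD).toString
    throw new HoodieDuplicateKeyException(key)
  }
}
{code}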




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] waywtdcc edited a comment on issue #4508: [SUPPORT]Duplicate Flink Hudi data

2022-01-05 Thread GitBox


waywtdcc edited a comment on issue #4508:
URL: https://github.com/apache/hudi/issues/4508#issuecomment-1006260809


   Hello, have you been able to reproduce this issue? Bootstrap did not read all of the data. What caused it? @danny0405 
   
   There was a known bug that has since been fixed; use the code that I answered with above.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4514: [HUDI-3172] Refactor hudi existing modules to make more code reuse in V2 Implementation

2022-01-05 Thread GitBox


hudi-bot commented on pull request #4514:
URL: https://github.com/apache/hudi/pull/4514#issuecomment-1006302459


   
   ## CI report:
   
   * ddc3af0c32bafef6b10c32c43132df32a5f7d83c UNKNOWN
   * e1ba726105dfa7ae07d802546c71a0cf1ad8b172 UNKNOWN
   * 306e7d462959e0249e230f60c2e9ea6602342e08 UNKNOWN
   * 15122772d9430d91807053555e12afaeda30e688 UNKNOWN
   * d9381d0ab97632b4234c9c0ccc5be3b01192bafa Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4924)
 
   * ac8d014a0602e3c499771f3313f0f88de57cdda1 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4931)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4514: [HUDI-3172] Refactor hudi existing modules to make more code reuse in V2 Implementation

2022-01-05 Thread GitBox


hudi-bot removed a comment on pull request #4514:
URL: https://github.com/apache/hudi/pull/4514#issuecomment-1006293124


   
   ## CI report:
   
   * ddc3af0c32bafef6b10c32c43132df32a5f7d83c UNKNOWN
   * e1ba726105dfa7ae07d802546c71a0cf1ad8b172 UNKNOWN
   * 306e7d462959e0249e230f60c2e9ea6602342e08 UNKNOWN
   * 15122772d9430d91807053555e12afaeda30e688 UNKNOWN
   * d9381d0ab97632b4234c9c0ccc5be3b01192bafa Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4924)
 
   * ac8d014a0602e3c499771f3313f0f88de57cdda1 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] vinothchandar commented on a change in pull request #4350: [HUDI-3047] Basic Implementation of Spark Datasource V2

2022-01-05 Thread GitBox


vinothchandar commented on a change in pull request #4350:
URL: https://github.com/apache/hudi/pull/4350#discussion_r779313494



##
File path: 
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala
##
@@ -177,7 +177,7 @@ class DefaultSource extends RelationProvider
   outputMode)
   }
 
-  override def shortName(): String = "hudi"
+  override def shortName(): String = "hudi_v1"

Review comment:
   this would cause every job out there to be upgraded? Not sure if we can 
afford to do this. Also, I would like to clearly understand whether the new v2 
implementation will support ALL the existing functionality or be a drop-in 
replacement for the current v1 implementation.
   
   I think it's crucial to get aligned on this before we proceed further. 
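
   To make the compatibility concern concrete, existing pipelines resolve the datasource purely through this short name, along the lines of the spark-shell sketch below (the path is made up; assumes an active SparkSession `spark` with the Hudi bundle on the classpath):

{code:scala}
// Reads and writes locate the DefaultSource whose shortName() returns "hudi".
// If only "hudi_v1" were registered, every such job would need to change its format string.
val basePath = "file:///tmp/hudi_trips_cow"   // hypothetical table path

val df = spark.read
  .format("hudi")          // tied to shortName() == "hudi"
  .load(basePath)

df.write
  .format("hudi")
  .mode("append")
  .save(basePath)
{code}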




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] vinothchandar commented on a change in pull request #4514: [HUDI-3172] Refactor hudi existing modules to make more code reuse in V2 Implementation

2022-01-05 Thread GitBox


vinothchandar commented on a change in pull request #4514:
URL: https://github.com/apache/hudi/pull/4514#discussion_r779306660



##
File path: 
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/HoodieSqlCommonUtils.scala
##
@@ -0,0 +1,316 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hudi
+
+import scala.collection.JavaConverters._
+import java.net.URI
+import java.util.{Date, Locale, Properties}
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+
+import org.apache.hudi.{AvroConversionUtils, SparkAdapterSupport}
+import org.apache.hudi.client.common.HoodieSparkEngineContext
+import org.apache.hudi.common.config.DFSPropertiesConfiguration
+import org.apache.hudi.common.config.HoodieMetadataConfig
+import org.apache.hudi.common.fs.FSUtils
+import org.apache.hudi.common.model.HoodieRecord
+import org.apache.hudi.common.table.{HoodieTableMetaClient, 
TableSchemaResolver}
+import org.apache.hudi.common.table.timeline.{HoodieActiveTimeline, 
HoodieInstantTimeGenerator}
+import org.apache.spark.SPARK_VERSION
+import org.apache.spark.sql.{Column, DataFrame, SparkSession}
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation
+import org.apache.spark.sql.catalyst.catalog.{CatalogTable, CatalogTableType}
+import org.apache.spark.sql.catalyst.expressions.{And, Attribute, Cast, 
Expression, Literal}
+import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, SubqueryAlias}
+import org.apache.spark.sql.execution.datasources.LogicalRelation
+import org.apache.spark.sql.internal.{SQLConf, StaticSQLConf}
+import org.apache.spark.api.java.JavaSparkContext
+import org.apache.spark.sql.types.{DataType, NullType, StringType, 
StructField, StructType}
+
+import java.text.SimpleDateFormat
+
+import scala.collection.immutable.Map
+
+object HoodieSqlCommonUtils extends SparkAdapterSupport {
+  // NOTE: {@code SimpleDataFormat} is NOT thread-safe
+  // TODO replace w/ DateTimeFormatter
+  private val defaultDateFormat =
+  ThreadLocal.withInitial(new java.util.function.Supplier[SimpleDateFormat] {
+override def get() = new SimpleDateFormat("yyyy-MM-dd")
+  })
+
+  def isHoodieTable(table: CatalogTable): Boolean = {
+table.provider.map(_.toLowerCase(Locale.ROOT)).orNull == "hudi"
+  }
+
+  def isHoodieTable(tableId: TableIdentifier, spark: SparkSession): Boolean = {
+val table = spark.sessionState.catalog.getTableMetadata(tableId)
+isHoodieTable(table)
+  }
+
+  def isHoodieTable(table: LogicalPlan, spark: SparkSession): Boolean = {
+tripAlias(table) match {
+  case LogicalRelation(_, _, Some(tbl), _) => isHoodieTable(tbl)
+  case relation: UnresolvedRelation =>
+isHoodieTable(sparkAdapter.toTableIdentifier(relation), spark)
+  case _=> false
+}
+  }
+
+  def getTableIdentify(table: LogicalPlan): TableIdentifier = {

Review comment:
   getTableIdentifier?
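
   For readers following the refactor, a rough spark-shell sketch of how these helpers are typically called (the table name `t1` is hypothetical; assumes an active SparkSession `spark` with the Hudi bundle on the classpath):

{code:scala}
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.hudi.HoodieSqlCommonUtils

// From a resolved catalog table
val catalogTable = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t1"))
val fromCatalog = HoodieSqlCommonUtils.isHoodieTable(catalogTable)

// From a table identifier plus the session
val fromIdentifier = HoodieSqlCommonUtils.isHoodieTable(TableIdentifier("t1"), spark)

// From a logical plan, e.g. the target of an UPDATE/MERGE command
val plan = spark.table("t1").queryExecution.analyzed
val fromPlan = HoodieSqlCommonUtils.isHoodieTable(plan, spark)
{code}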

##
File path: 
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/command/ValidateDuplicateKeyPayload.scala
##
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hudi.command
+
+import org.apache.avro.Schema
+import org.apache.avro.generic.{GenericRecord, IndexedRecord}
+import org.apache.hudi.common.model.{DefaultHoodieRecordPayload, HoodieRecord}

[GitHub] [hudi] hudi-bot commented on pull request #4520: [HUDI-3179][Stacked on 4417] Extracted common `AbstractHoodieTableFileIndex` to be shared across engines

2022-01-05 Thread GitBox


hudi-bot commented on pull request #4520:
URL: https://github.com/apache/hudi/pull/4520#issuecomment-1006296565


   
   ## CI report:
   
   * 312dd2568ea14fe350bb365dfc83796a672fbe9f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4930)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4520: [HUDI-3179][Stacked on 4417] Extracted common `AbstractHoodieTableFileIndex` to be shared across engines

2022-01-05 Thread GitBox


hudi-bot removed a comment on pull request #4520:
URL: https://github.com/apache/hudi/pull/4520#issuecomment-1006295396


   
   ## CI report:
   
   * dfcae119aabb9aa43a98de5f868984f84ebe7c2f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4926)
 
   * 312dd2568ea14fe350bb365dfc83796a672fbe9f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4930)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4520: [HUDI-3179][Stacked on 4417] Extracted common `AbstractHoodieTableFileIndex` to be shared across engines

2022-01-05 Thread GitBox


hudi-bot commented on pull request #4520:
URL: https://github.com/apache/hudi/pull/4520#issuecomment-1006295396


   
   ## CI report:
   
   * dfcae119aabb9aa43a98de5f868984f84ebe7c2f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4926)
 
   * 312dd2568ea14fe350bb365dfc83796a672fbe9f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4930)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4520: [HUDI-3179][Stacked on 4417] Extracted common `AbstractHoodieTableFileIndex` to be shared across engines

2022-01-05 Thread GitBox


hudi-bot removed a comment on pull request #4520:
URL: https://github.com/apache/hudi/pull/4520#issuecomment-1006294261


   
   ## CI report:
   
   * dfcae119aabb9aa43a98de5f868984f84ebe7c2f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4926)
 
   * 312dd2568ea14fe350bb365dfc83796a672fbe9f UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4520: [HUDI-3179][Stacked on 4417] Extracted common `AbstractHoodieTableFileIndex` to be shared across engines

2022-01-05 Thread GitBox


hudi-bot commented on pull request #4520:
URL: https://github.com/apache/hudi/pull/4520#issuecomment-1006294261


   
   ## CI report:
   
   * dfcae119aabb9aa43a98de5f868984f84ebe7c2f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4926)
 
   * 312dd2568ea14fe350bb365dfc83796a672fbe9f UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4520: [HUDI-3179][Stacked on 4417] Extracted common `AbstractHoodieTableFileIndex` to be shared across engines

2022-01-05 Thread GitBox


hudi-bot removed a comment on pull request #4520:
URL: https://github.com/apache/hudi/pull/4520#issuecomment-1006272781


   
   ## CI report:
   
   * dfcae119aabb9aa43a98de5f868984f84ebe7c2f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4926)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4514: [HUDI-3172] Refactor hudi existing modules to make more code reuse in V2 Implementation

2022-01-05 Thread GitBox


hudi-bot commented on pull request #4514:
URL: https://github.com/apache/hudi/pull/4514#issuecomment-1006293124


   
   ## CI report:
   
   * ddc3af0c32bafef6b10c32c43132df32a5f7d83c UNKNOWN
   * e1ba726105dfa7ae07d802546c71a0cf1ad8b172 UNKNOWN
   * 306e7d462959e0249e230f60c2e9ea6602342e08 UNKNOWN
   * 15122772d9430d91807053555e12afaeda30e688 UNKNOWN
   * d9381d0ab97632b4234c9c0ccc5be3b01192bafa Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4924)
 
   * ac8d014a0602e3c499771f3313f0f88de57cdda1 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4514: [HUDI-3172] Refactor hudi existing modules to make more code reuse in V2 Implementation

2022-01-05 Thread GitBox


hudi-bot removed a comment on pull request #4514:
URL: https://github.com/apache/hudi/pull/4514#issuecomment-1006266018


   
   ## CI report:
   
   * ddc3af0c32bafef6b10c32c43132df32a5f7d83c UNKNOWN
   * e1ba726105dfa7ae07d802546c71a0cf1ad8b172 UNKNOWN
   * 306e7d462959e0249e230f60c2e9ea6602342e08 UNKNOWN
   * 15122772d9430d91807053555e12afaeda30e688 UNKNOWN
   * d9381d0ab97632b4234c9c0ccc5be3b01192bafa Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4924)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4287: [DO NOT MERGE] 0.10.0 release patch for flink

2022-01-05 Thread GitBox


hudi-bot removed a comment on pull request #4287:
URL: https://github.com/apache/hudi/pull/4287#issuecomment-1006265895


   
   ## CI report:
   
   * 5b7a535559d80359a3febc2d1a80bf9a8ac20cf9 UNKNOWN
   * 5aab0ab800917602dd3c4c42b42d2b6130d859c9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4824)
 
   * 5b9130b16d5931b0031bfc2c6fc051d03fa4f49b Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4929)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4287: [DO NOT MERGE] 0.10.0 release patch for flink

2022-01-05 Thread GitBox


hudi-bot commented on pull request #4287:
URL: https://github.com/apache/hudi/pull/4287#issuecomment-1006286289


   
   ## CI report:
   
   * 5b7a535559d80359a3febc2d1a80bf9a8ac20cf9 UNKNOWN
   * 5b9130b16d5931b0031bfc2c6fc051d03fa4f49b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4929)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4519: [HUDI-3180] Include files from completed commits while bootstrapping metadata table

2022-01-05 Thread GitBox


hudi-bot removed a comment on pull request #4519:
URL: https://github.com/apache/hudi/pull/4519#issuecomment-1006253880


   
   ## CI report:
   
   * a522d619ceddce3a0241b5363c4762d63a6f7354 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4922)
 
   * fba1437312207e135f0f7aef489b5b16f9fbe495 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4927)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4519: [HUDI-3180] Include files from completed commits while bootstrapping metadata table

2022-01-05 Thread GitBox


hudi-bot commented on pull request #4519:
URL: https://github.com/apache/hudi/pull/4519#issuecomment-1006278905


   
   ## CI report:
   
   * fba1437312207e135f0f7aef489b5b16f9fbe495 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4927)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4520: [HUDI-3179][Stacked on 4417] Extracted common `AbstractHoodieTableFileIndex` to be shared across engines

2022-01-05 Thread GitBox


hudi-bot removed a comment on pull request #4520:
URL: https://github.com/apache/hudi/pull/4520#issuecomment-1006244617


   
   ## CI report:
   
   * 26b8cc233c6400127a65f40e3df63b84b51b900a Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4923)
 
   * dfcae119aabb9aa43a98de5f868984f84ebe7c2f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4926)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4520: [HUDI-3179][Stacked on 4417] Extracted common `AbstractHoodieTableFileIndex` to be shared across engines

2022-01-05 Thread GitBox


hudi-bot commented on pull request #4520:
URL: https://github.com/apache/hudi/pull/4520#issuecomment-1006272781


   
   ## CI report:
   
   * dfcae119aabb9aa43a98de5f868984f84ebe7c2f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4926)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2915) Fix field not found in record error for spark-sql

2022-01-05 Thread Forward Xu (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17469665#comment-17469665
 ] 

Forward Xu commented on HUDI-2915:
--

This failure occurs when the precombine field is not specified while executing an UPDATE SQL statement.
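
For reference, a spark-shell sketch of the configuration in question (table name, path, and values are made up): declaring `preCombineField` when the table is created gives UPDATE an ordering column to resolve. Whether this covers the full fix is per the comment above; the sketch only shows where the field is declared.

{code:scala}
spark.sql(
  """
    |create table hudi_update_demo (
    |  id int,
    |  name string,
    |  price double,
    |  ts long
    |) using hudi
    |tblproperties (
    |  type = 'cow',
    |  primaryKey = 'id',
    |  preCombineField = 'ts'
    |)
    |location 'file:///tmp/hudi_update_demo'
  """.stripMargin)

spark.sql("insert into hudi_update_demo values (1, 'a1', 10.0, 1000)")
spark.sql("update hudi_update_demo set price = price * 1.1 where id = 1")
{code}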

> Fix field not found in record error for spark-sql
> -
>
> Key: HUDI-2915
> URL: https://issues.apache.org/jira/browse/HUDI-2915
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: Raymond Xu
>Assignee: Forward Xu
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.11.0, 0.10.1
>
> Attachments: image-2021-12-02-19-37-10-346.png
>
>
> !image-2021-12-02-19-37-10-346.png!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-2833) Clean up unused archive files instead of expanding indefinitely

2022-01-05 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2833:
--
Status: In Progress  (was: Open)

> Clean up unused archive files instead of expanding indefinitely
> ---
>
> Key: HUDI-2833
> URL: https://issues.apache.org/jira/browse/HUDI-2833
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Yue Zhang
>Priority: Major
>  Labels: core-flow-ds, pull-request-available, sev:high
>
> As we know, most storage systems do not support an append action, so Hudi will 
> create a new archive file under the archived directory each time it archives.
> As time goes by, there may be thousands of archive files, most of which are 
> not useful anymore.
> It would be worthwhile to have a function to clean up these unused archive files.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3118) Add default HUDI_DIR in setupKafka.sh

2022-01-05 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-3118:
--
Status: In Progress  (was: Open)

> Add default HUDI_DIR in setupKafka.sh
> -
>
> Key: HUDI-3118
> URL: https://issues.apache.org/jira/browse/HUDI-3118
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: cdmikechen
>Priority: Major
>  Labels: pull-request-available
>
> Add default HUDI_DIR in setupKafka.sh when $HUDI_DIR is not set



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3096) fixed the bug that the cow table(contains decimalType) write by flink cannot be read by spark

2022-01-05 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-3096:
--
Status: In Progress  (was: Open)

> fixed the bug that  the cow table(contains decimalType) write by flink cannot 
> be read by spark
> --
>
> Key: HUDI-3096
> URL: https://issues.apache.org/jira/browse/HUDI-3096
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Affects Versions: 0.10.0
> Environment: flink  1.13.1
> spark 3.1.1
>Reporter: Tao Meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0, 0.10.1
>
>
> Currently, Flink writes DecimalType values as byte[].
> When Spark reads that decimal type, if Spark finds the precision of the 
> decimal is small, it treats it as int/long, which causes the following error:
>  
> Caused by: org.apache.spark.sql.execution.QueryExecutionException: Parquet 
> column cannot be converted in file 
> hdfs://x/tmp/hudi/hudi_x/46d44c57-aa43-41e2-a8aa-76dcc9dac7e4_0-4-0_20211221201230.parquet.
>  Column: [c7], Expected: decimal(10,4), Found: BINARY
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:179)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:93)
>   at 
> org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:517)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown
>  Source)



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] hudi-bot removed a comment on pull request #4514: [HUDI-3172] Refactor hudi existing modules to make more code reuse in…

2022-01-05 Thread GitBox


hudi-bot removed a comment on pull request #4514:
URL: https://github.com/apache/hudi/pull/4514#issuecomment-1006239468


   
   ## CI report:
   
   * ddc3af0c32bafef6b10c32c43132df32a5f7d83c UNKNOWN
   * e1ba726105dfa7ae07d802546c71a0cf1ad8b172 UNKNOWN
   * 306e7d462959e0249e230f60c2e9ea6602342e08 UNKNOWN
   * 3a20dc12de8e145bd47a10ded50e5296f657de5c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4905)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4907)
 
   * 15122772d9430d91807053555e12afaeda30e688 UNKNOWN
   * d9381d0ab97632b4234c9c0ccc5be3b01192bafa Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4924)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4514: [HUDI-3172] Refactor hudi existing modules to make more code reuse in…

2022-01-05 Thread GitBox


hudi-bot commented on pull request #4514:
URL: https://github.com/apache/hudi/pull/4514#issuecomment-1006266018


   
   ## CI report:
   
   * ddc3af0c32bafef6b10c32c43132df32a5f7d83c UNKNOWN
   * e1ba726105dfa7ae07d802546c71a0cf1ad8b172 UNKNOWN
   * 306e7d462959e0249e230f60c2e9ea6602342e08 UNKNOWN
   * 15122772d9430d91807053555e12afaeda30e688 UNKNOWN
   * d9381d0ab97632b4234c9c0ccc5be3b01192bafa Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4924)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-3096) fixed the bug that the cow table(contains decimalType) write by flink cannot be read by spark

2022-01-05 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-3096:
--
Reviewers: Danny Chen

> fixed the bug that  the cow table(contains decimalType) write by flink cannot 
> be read by spark
> --
>
> Key: HUDI-3096
> URL: https://issues.apache.org/jira/browse/HUDI-3096
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Affects Versions: 0.10.0
> Environment: flink  1.13.1
> spark 3.1.1
>Reporter: Tao Meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0, 0.10.1
>
>
> Currently, Flink writes DecimalType values as byte[].
> When Spark reads that decimal type, if Spark finds the precision of the 
> decimal is small, it treats it as int/long, which causes the following error:
>  
> Caused by: org.apache.spark.sql.execution.QueryExecutionException: Parquet 
> column cannot be converted in file 
> hdfs://x/tmp/hudi/hudi_x/46d44c57-aa43-41e2-a8aa-76dcc9dac7e4_0-4-0_20211221201230.parquet.
>  Column: [c7], Expected: decimal(10,4), Found: BINARY
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:179)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:93)
>   at 
> org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:517)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown
>  Source)



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-2833) Clean up unused archive files instead of expanding indefinitely

2022-01-05 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2833:
--
Reviewers: sivabalan narayanan

> Clean up unused archive files instead of expanding indefinitely
> ---
>
> Key: HUDI-2833
> URL: https://issues.apache.org/jira/browse/HUDI-2833
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Yue Zhang
>Priority: Major
>  Labels: core-flow-ds, pull-request-available, sev:high
>
> As we know, most storage systems do not support an append action, so Hudi will 
> create a new archive file under the archived directory each time it archives.
> As time goes by, there may be thousands of archive files, most of which are 
> not useful anymore.
> It would be worthwhile to have a function to clean up these unused archive files.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3118) Add default HUDI_DIR in setupKafka.sh

2022-01-05 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-3118:
--
Reviewers: Y Ethan Guo

> Add default HUDI_DIR in setupKafka.sh
> -
>
> Key: HUDI-3118
> URL: https://issues.apache.org/jira/browse/HUDI-3118
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: cdmikechen
>Priority: Major
>  Labels: pull-request-available
>
> Add default HUDI_DIR in setupKafka.sh when $HUDI_DIR is not set



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] hudi-bot commented on pull request #4287: [DO NOT MERGE] 0.10.0 release patch for flink

2022-01-05 Thread GitBox


hudi-bot commented on pull request #4287:
URL: https://github.com/apache/hudi/pull/4287#issuecomment-1006265895


   
   ## CI report:
   
   * 5b7a535559d80359a3febc2d1a80bf9a8ac20cf9 UNKNOWN
   * 5aab0ab800917602dd3c4c42b42d2b6130d859c9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4824)
 
   * 5b9130b16d5931b0031bfc2c6fc051d03fa4f49b Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4929)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4287: [DO NOT MERGE] 0.10.0 release patch for flink

2022-01-05 Thread GitBox


hudi-bot removed a comment on pull request #4287:
URL: https://github.com/apache/hudi/pull/4287#issuecomment-1006256702


   
   ## CI report:
   
   * 5b7a535559d80359a3febc2d1a80bf9a8ac20cf9 UNKNOWN
   * 5aab0ab800917602dd3c4c42b42d2b6130d859c9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4824)
 
   * 5b9130b16d5931b0031bfc2c6fc051d03fa4f49b UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-3065) spark auto partition discovery does not work from 0.9.0

2022-01-05 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-3065:
--
Reviewers: Raymond Xu

> spark auto partition discovery does not work from 0.9.0
> ---
>
> Key: HUDI-3065
> URL: https://issues.apache.org/jira/browse/HUDI-3065
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: sivabalan narayanan
>Assignee: Yann Byron
>Priority: Major
>  Labels: core-flow-ds, sev:critical, spark
> Fix For: 0.11.0, 0.10.1
>
>
> With 0.8.0, if the partition path is of the format "/partitionKey=partitionValue", 
> Spark auto partition discovery will kick in, and we can see the explicit fields in 
> Hudi's table schema. 
> But with 0.9.0, it does not happen. 
> // launch spark-shell with 0.8.0 
> {code:java}
> import org.apache.hudi.QuickstartUtils._
> import scala.collection.JavaConversions._
> import org.apache.spark.sql.SaveMode._
> import org.apache.hudi.DataSourceReadOptions._
> import org.apache.hudi.DataSourceWriteOptions._
> import org.apache.hudi.config.HoodieWriteConfig._
>
> val tableName = "hudi_trips_cow"
> val basePath = "file:///tmp/hudi_trips_cow"
> val dataGen = new DataGenerator
> val inserts = convertToStringList(dataGen.generateInserts(10))
> val df = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
> val newDf = df.withColumn("partitionpath", regexp_replace($"partitionpath", 
>   "(.*)(\\/){1}(.*)(\\/){1}", "continent=$1$2country=$3$4city="))
> newDf.write.format("hudi").
>   options(getQuickstartWriteConfigs).
>   option(PRECOMBINE_FIELD_OPT_KEY, "ts").
>   option(RECORDKEY_FIELD_OPT_KEY, "uuid").
>   option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
>   option(TABLE_NAME, tableName).
>   mode(Overwrite).
>   save(basePath)
> val tripsSnapshotDF = spark.
>   read.
>   format("hudi").
>   load(basePath)
> tripsSnapshotDF.printSchema
> {code}
> // output: check for continent, country, city at the end. 
> root
>  |-- _hoodie_commit_time: string (nullable = true)
>  |-- _hoodie_commit_seqno: string (nullable = true)
>  |-- _hoodie_record_key: string (nullable = true)
>  |-- _hoodie_partition_path: string (nullable = true)
>  |-- _hoodie_file_name: string (nullable = true)
>  |-- begin_lat: double (nullable = true)
>  |-- begin_lon: double (nullable = true)
>  |-- driver: string (nullable = true)
>  |-- end_lat: double (nullable = true)
>  |-- end_lon: double (nullable = true)
>  |-- fare: double (nullable = true)
>  |-- partitionpath: string (nullable = true)
>  |-- rider: string (nullable = true)
>  |-- ts: long (nullable = true)
>  |-- uuid: string (nullable = true)
>  |-- continent: string (nullable = true)
>  |-- country: string (nullable = true)
>  |-- city: string (nullable = true)
>  
>  
> Let's run this with 0.9.0.
> {code:java}
> import org.apache.hudi.QuickstartUtils._
> import scala.collection.JavaConversions._
> import org.apache.spark.sql.SaveMode._
> import org.apache.hudi.DataSourceReadOptions._
> import org.apache.hudi.DataSourceWriteOptions._
> import org.apache.hudi.config.HoodieWriteConfig._
>
> val tableName = "hudi_trips_cow"
> val basePath = "file:///tmp/hudi_trips_cow"
> val dataGen = new DataGenerator
> val inserts = convertToStringList(dataGen.generateInserts(10))
> val df = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
> val newDf = df.withColumn("partitionpath", regexp_replace($"partitionpath", 
>   "(.*)(\\/){1}(.*)(\\/){1}", "continent=$1$2country=$3$4city="))
> newDf.write.format("hudi").
>   options(getQuickstartWriteConfigs).
>   option(PRECOMBINE_FIELD_OPT_KEY, "ts").
>   option(RECORDKEY_FIELD_OPT_KEY, "uuid").
>   option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
>   option(TABLE_NAME, tableName).
>   mode(Overwrite).
>   save(basePath)
> val tripsSnapshotDF = spark.
>   read.
>   format("hudi").
>   load(basePath)
> tripsSnapshotDF.printSchema
> {code}
> // output: continent, country, city are missing. 
> root
>  |-- _hoodie_commit_time: string (nullable = true)
>  |-- _hoodie_commit_seqno: string (nullable = true)
>  |-- _hoodie_record_key: string (nullable = true)
>  |-- _hoodie_partition_path: string (nullable = true)
>  |-- _hoodie_file_name: string (nullable = true)
>  |-- begin_lat: double (nullable = true)
>  |-- begin_lon: double (nullable = true)
>  |-- driver: string (nullable = true)
>  |-- end_lat: double (nullable = true)
>  |-- end_lon: double (nullable = true)
>  |-- fare: double (nullable = true)
>  |-- rider: string (nullable = true)
>  |-- ts: long (nullable = true)
>  |-- uuid: string (nullable = true)
>  |-- partitionpath: string (nullable = true)
>  
> Ref issue: [https://github.com/apache/hudi/issues/3984]
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3165) Enable In Process lock manager for all tests instead of FileSystemBasedTestlock

2022-01-05 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-3165:
--
Story Points: 0

> Enable In Process lock manager for all tests instead of 
> FileSystemBasedTestlock
> ---
>
> Key: HUDI-3165
> URL: https://issues.apache.org/jira/browse/HUDI-3165
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
> Fix For: 0.10.1
>
>
> Enable In Process lock manager for all tests instead of 
> FileSystemBasedTestlock



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] hudi-bot removed a comment on pull request #4417: [HUDI-3094] Unify Hive's InputFormat implementations to avoid duplication

2022-01-05 Thread GitBox


hudi-bot removed a comment on pull request #4417:
URL: https://github.com/apache/hudi/pull/4417#issuecomment-1006243396


   
   ## CI report:
   
   * 27f28703fb0e5f64421fdd08ff7d14db2ce79cb3 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4918)
 
   * b83aa3a414f5f4fbf9d4b39f3d6682f15e1f464b Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4925)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4417: [HUDI-3094] Unify Hive's InputFormat implementations to avoid duplication

2022-01-05 Thread GitBox


hudi-bot commented on pull request #4417:
URL: https://github.com/apache/hudi/pull/4417#issuecomment-1006262250


   
   ## CI report:
   
   * b83aa3a414f5f4fbf9d4b39f3d6682f15e1f464b Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4925)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-3026) HoodieAppendhandle may result in duplicate key for hbase index

2022-01-05 Thread ZiyueGuan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17469656#comment-17469656
 ] 

ZiyueGuan commented on HUDI-3026:
-

Thanks for your kind explanation. I have little experience with Hudi on Flink. 
This problem may only occur with Spark.

> HoodieAppendhandle may result in duplicate key for hbase index
> --
>
> Key: HUDI-3026
> URL: https://issues.apache.org/jira/browse/HUDI-3026
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: ZiyueGuan
>Assignee: ZiyueGuan
>Priority: Major
>  Labels: pull-request-available
>
> Problem: a same key may occur in two file group when Hbase index is used. 
> These two file group will have same FileID prefix. As Hbase index is global, 
> this is unexpected
> How to repro:
> We should have a table w/o record sorted in spark. Let's say we have five 
> records with key 1,2,3,4,5 to write. They may be iterated in different order. 
> In the first attempt 1, we write three records 5,4,3 to 
> fileID_1_log.1_attempt1. But this attempt failed. Spark will have a try in 
> the second task attempt (attempt 2), we write four records 1,2,3,4 to  
> fileID_1_log.1_attempt2. And then, we find this filegroup is large enough by 
> call canWrite. So hudi write record 5 to fileID_2_log.1_attempt2 and finish 
> this commit.
> When we do compaction, fileID_1_log.1_attempt1 and fileID_1_log.1_attempt2 
> will be compacted. And we finally got 543 + 1234 = 12345 in fileID_1 while we 
> also got 5 in fileID_2. Record 5 will appear in two fileGroup.
> Reason: Markerfile doesn't reconcile log file as code show in  
> [https://github.com/apache/hudi/blob/9a2030ab3190acf600ce4820be9a08929595763e/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/HoodieTable.java#L553.]
> And log file is actually not fail-safe.
> I'm not sure if [~danny0405] have found this problem too as I find 
> FlinkAppendHandle had been made to always return true. But it was just 
> changed back recently. 
> Solution:
> We may have a quick fix by making canWrite in HoodieAppendHandle always 
> return true. However, I think there may be a more elegant solution that we 
> use append result to generate compaction plan rather than list log file, in 
> which we will have a more granular control on log block instead of log file. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] waywtdcc commented on issue #4508: [SUPPORT]Duplicate Flink Hudi data

2022-01-05 Thread GitBox


waywtdcc commented on issue #4508:
URL: https://github.com/apache/hudi/issues/4508#issuecomment-1006260809


   Hello, have you been able to reproduce this issue? Bootstrap did not read all of the data. What caused it? @danny0405 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-2780) Mor reads the log file and skips the complete block as a bad block, resulting in data loss

2022-01-05 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2780:
--
Reviewers: Alexey Kudinkin

> Mor reads the log file and skips the complete block as a bad block, resulting 
> in data loss
> --
>
> Key: HUDI-2780
> URL: https://issues.apache.org/jira/browse/HUDI-2780
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: jing
>Assignee: jing
>Priority: Critical
>  Labels: core-flow-ds, pull-request-available, sev:critical
> Fix For: 0.11.0, 0.10.1
>
> Attachments: image-2021-11-17-15-45-33-031.png, 
> image-2021-11-17-15-46-04-313.png, image-2021-11-17-15-46-14-694.png
>
>
> Debugging the data in the middle of the bad block shows that the lost data lies 
> within the bad block's offset range. Because of the EOF skip during reading, the 
> compaction merge could not write that data to parquet at the time, even though 
> the deltacommit for that time succeeded. There are two consecutive HUDI magic 
> markers in the middle of the bad block; reading the blocksize at the next 
> position actually reads the binary conversion of #HUDI#, i.e. 1227030528, which 
> means an EOF exception is reported because the value exceeds the file size.
> !image-2021-11-17-15-45-33-031.png!
> When detecting the position of the next block in order to skip the bad block, 
> the scan should not start from the position after the blocksize has been read, 
> but from the position before the blocksize was read.
> !image-2021-11-17-15-46-04-313.png!
> !image-2021-11-17-15-46-14-694.png!
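
A minimal, self-contained Scala sketch of the rewind idea described above; it is not Hudi's actual HoodieLogFileReader, and the file layout, names, and offsets are purely illustrative:

{code:scala}
import java.io.RandomAccessFile

// Illustrative only: resume the magic scan from *before* the block-size field,
// so a magic marker sitting immediately after the corrupt one is not skipped.
object CorruptBlockSkipSketch {
  private val Magic: Array[Byte] = "#HUDI#".getBytes("UTF-8")

  // Scan forward from `from` for the next magic marker; -1 if none is found.
  def findNextMagic(file: RandomAccessFile, from: Long): Long = {
    val window = new Array[Byte](Magic.length)
    var pos = from
    while (pos + Magic.length <= file.length()) {
      file.seek(pos)
      file.readFully(window)
      if (window.sameElements(Magic)) return pos
      pos += 1
    }
    -1L
  }

  // `magicPos` is where the corrupt block's magic was read. The buggy behavior is to
  // resume after the (bogus) block-size field that was already consumed; the fix is
  // to resume right after the magic, i.e. before the block-size field.
  def corruptBlockEnd(file: RandomAccessFile, magicPos: Long): Long = {
    val resumeFrom = magicPos + Magic.length
    val next = findNextMagic(file, resumeFrom)
    if (next >= 0) next else file.length()
  }
}
{code}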



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3026) HoodieAppendhandle may result in duplicate key for hbase index

2022-01-05 Thread ZiyueGuan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZiyueGuan updated HUDI-3026:

Description: 
Problem: the same key may occur in two file groups when the HBase index is used. 
These two file groups will have the same FileID prefix. As the HBase index is 
global, this is unexpected.

How to repro:

We need a table in Spark whose records are not sorted. Let's say we have five 
records with keys 1,2,3,4,5 to write; they may be iterated in a different order 
on each attempt. 

In the first task attempt (attempt 1), we write three records 5,4,3 to 
fileID_1_log.1_attempt1, but this attempt fails. Spark retries in the 
second task attempt (attempt 2), where we write four records 1,2,3,4 to 
fileID_1_log.1_attempt2. At that point we find this file group is large enough 
by calling canWrite, so Hudi writes record 5 to fileID_2_log.1_attempt2 and 
finishes this commit.

When we run compaction, fileID_1_log.1_attempt1 and fileID_1_log.1_attempt2 will 
be compacted, and we finally get 543 + 1234 = 12345 in fileID_1 while we also 
get 5 in fileID_2. Record 5 appears in two file groups.

Reason: the marker files do not reconcile log files, as the code shows in  
[https://github.com/apache/hudi/blob/9a2030ab3190acf600ce4820be9a08929595763e/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/HoodieTable.java#L553.]

And the log file is actually not fail-safe.

I'm not sure whether [~danny0405] has found this problem too, as I see 
FlinkAppendHandle had been made to always return true, but it was just changed 
back recently. 

Solution:

We may have a quick fix by making canWrite in HoodieAppendHandle always return 
true. However, I think there may be a more elegant solution: use the append 
result to generate the compaction plan rather than listing log files, which 
gives more granular control at the log block level instead of the log file level. 

  was:
Problem: a same key may occur in two file group when Hbase index is used. These 
two file group will have same FileID prefix. As Hbase index is global, this is 
unexpected

How to repro:

We should have a table w/o record sorted in spark. Let's say we have 1,2,3,4,5 
records to write. They may be iterated in different order. 

In the first attempt 1, we write 543 to fileID_1_log.1_attempt1. But this 
attempt failed. Spark will have a try in the second task attempt (attempt 2), 
we write 1234 to  fileID_1_log.1_attempt2. And then, we find this filegroup is 
large enough by call canWrite. So hudi write record 5 to 
fileID_2_log.1_attempt2 and finish this commit.

When we do compaction, fileID_1_log.1_attempt1 and fileID_1_log.1_attempt2 will 
be compacted. And we finally got 543 + 1234 = 12345 in fileID_1 while we also 
got 5 in fileID_2. Record 5 will appear in two fileGroup.

Reason: Markerfile doesn't reconcile log file as code show in  
[https://github.com/apache/hudi/blob/9a2030ab3190acf600ce4820be9a08929595763e/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/HoodieTable.java#L553.]

And log file is actually not fail-safe.

I'm not sure if [~danny0405] have found this problem too as I find 
FlinkAppendHandle had been made to always return true. But it was just changed 
back recently. 

Solution:

We may have a quick fix by making canWrite in HoodieAppendHandle always return 
true. However, I think there may be a more elegant solution that we use append 
result to generate compaction plan rather than list log file, in which we will 
have a more granular control on log block instead of log file. 


> HoodieAppendhandle may result in duplicate key for hbase index
> --
>
> Key: HUDI-3026
> URL: https://issues.apache.org/jira/browse/HUDI-3026
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: ZiyueGuan
>Assignee: ZiyueGuan
>Priority: Major
>  Labels: pull-request-available
>
> Problem: a same key may occur in two file group when Hbase index is used. 
> These two file group will have same FileID prefix. As Hbase index is global, 
> this is unexpected
> How to repro:
> We should have a table w/o record sorted in spark. Let's say we have five 
> records with key 1,2,3,4,5 to write. They may be iterated in different order. 
> In the first attempt 1, we write three records 5,4,3 to 
> fileID_1_log.1_attempt1. But this attempt failed. Spark will have a try in 
> the second task attempt (attempt 2), we write four records 1,2,3,4 to  
> fileID_1_log.1_attempt2. And then, we find this filegroup is large enough by 
> call canWrite. So hudi write record 5 to fileID_2_log.1_attempt2 and finish 
> this commit.
> When we do compaction, fileID_1_log.1_attempt1 and fileID_1_log.1_attempt2 
> will be compacted. And we finally got 543 + 1234 = 12345 in fileID_1 while we 
> also got 5 in fileID_2. Record 5 will appear in two fileGroup.
> Reason: Markerfile doesn't reconcile log file 

[GitHub] [hudi] hudi-bot commented on pull request #4287: [DO NOT MERGE] 0.10.0 release patch for flink

2022-01-05 Thread GitBox


hudi-bot commented on pull request #4287:
URL: https://github.com/apache/hudi/pull/4287#issuecomment-1006256702


   
   ## CI report:
   
   * 5b7a535559d80359a3febc2d1a80bf9a8ac20cf9 UNKNOWN
   * 5aab0ab800917602dd3c4c42b42d2b6130d859c9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4824)
 
   * 5b9130b16d5931b0031bfc2c6fc051d03fa4f49b UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4287: [DO NOT MERGE] 0.10.0 release patch for flink

2022-01-05 Thread GitBox


hudi-bot removed a comment on pull request #4287:
URL: https://github.com/apache/hudi/pull/4287#issuecomment-1003288028


   
   ## CI report:
   
   * 5b7a535559d80359a3febc2d1a80bf9a8ac20cf9 UNKNOWN
   * 5aab0ab800917602dd3c4c42b42d2b6130d859c9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4824)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-3132) Minor fixes for HoodieCatalog

2022-01-05 Thread Danny Chen (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-3132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469650#comment-17469650
 ] 

Danny Chen commented on HUDI-3132:
--

This is a small fix; moving it out of 0.10.1.

> Minor fixes for HoodieCatalog
> -
>
> Key: HUDI-3132
> URL: https://issues.apache.org/jira/browse/HUDI-3132
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: dalongliu
>Priority: Major
>  Labels: pull-request-available, user-support-issues
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (HUDI-3132) Minor fixes for HoodieCatalog

2022-01-05 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen resolved HUDI-3132.
--

> Minor fixes for HoodieCatalog
> -
>
> Key: HUDI-3132
> URL: https://issues.apache.org/jira/browse/HUDI-3132
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: dalongliu
>Priority: Major
>  Labels: pull-request-available, user-support-issues
> Fix For: 0.11.0, 0.10.1
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3132) Minor fixes for HoodieCatalog

2022-01-05 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-3132:
-
Fix Version/s: (was: 0.10.1)

> Minor fixes for HoodieCatalog
> -
>
> Key: HUDI-3132
> URL: https://issues.apache.org/jira/browse/HUDI-3132
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: dalongliu
>Priority: Major
>  Labels: pull-request-available, user-support-issues
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HUDI-3132) Minor fixes for HoodieCatalog

2022-01-05 Thread Danny Chen (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-3132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469649#comment-17469649
 ] 

Danny Chen commented on HUDI-3132:
--

Fixed via master branch: 205e48f53f24086419c01dd8652748347ecd295f

> Minor fixes for HoodieCatalog
> -
>
> Key: HUDI-3132
> URL: https://issues.apache.org/jira/browse/HUDI-3132
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: dalongliu
>Priority: Major
>  Labels: pull-request-available, user-support-issues
> Fix For: 0.11.0, 0.10.1
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] hudi-bot removed a comment on pull request #4519: [HUDI-3180] Include files from completed commits while bootstrapping metadata table

2022-01-05 Thread GitBox


hudi-bot removed a comment on pull request #4519:
URL: https://github.com/apache/hudi/pull/4519#issuecomment-1006252388


   
   ## CI report:
   
   * a522d619ceddce3a0241b5363c4762d63a6f7354 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4922)
 
   * fba1437312207e135f0f7aef489b5b16f9fbe495 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4519: [HUDI-3180] Include files from completed commits while bootstrapping metadata table

2022-01-05 Thread GitBox


hudi-bot commented on pull request #4519:
URL: https://github.com/apache/hudi/pull/4519#issuecomment-1006253880


   
   ## CI report:
   
   * a522d619ceddce3a0241b5363c4762d63a6f7354 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4922)
 
   * fba1437312207e135f0f7aef489b5b16f9fbe495 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4927)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[hudi] branch master updated (eee715b -> 205e48f)

2022-01-05 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from eee715b  [HUDI-3168] Fixing null schema with empty commit in 
incremental relation (#4513)
 add 205e48f  [HUDI-3132] Minor fixes for HoodieCatalog

No new revisions were added by this update.

Summary of changes:
 .../src/main/java/org/apache/hudi/table/catalog/HoodieCatalog.java   | 4 +++-
 .../test/java/org/apache/hudi/table/catalog/TestHoodieCatalog.java   | 5 +
 2 files changed, 8 insertions(+), 1 deletion(-)


[GitHub] [hudi] yanghua commented on a change in pull request #4483: [HUDI-2370] [TEST] Parquet Encryption

2022-01-05 Thread GitBox


yanghua commented on a change in pull request #4483:
URL: https://github.com/apache/hudi/pull/4483#discussion_r779274659



##
File path: 
hudi-spark-datasource/hudi-spark3/src/main/java/org/apache/hudi/spark3/crypto/kms/InMemoryKMS.java
##
@@ -0,0 +1,103 @@
+package org.apache.hudi.spark3.crypto.kms;

Review comment:
   Could you please make sure the license header is placed in the right place?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] danny0405 closed pull request #4486: [HUDI-3132] Minor fixes for HoodieCatalog

2022-01-05 Thread GitBox


danny0405 closed pull request #4486:
URL: https://github.com/apache/hudi/pull/4486


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4519: [HUDI-3180] Include files from completed commits while bootstrapping metadata table

2022-01-05 Thread GitBox


hudi-bot removed a comment on pull request #4519:
URL: https://github.com/apache/hudi/pull/4519#issuecomment-1006220286


   
   ## CI report:
   
   * a522d619ceddce3a0241b5363c4762d63a6f7354 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4922)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4519: [HUDI-3180] Include files from completed commits while bootstrapping metadata table

2022-01-05 Thread GitBox


hudi-bot commented on pull request #4519:
URL: https://github.com/apache/hudi/pull/4519#issuecomment-1006252388


   
   ## CI report:
   
   * a522d619ceddce3a0241b5363c4762d63a6f7354 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4922)
 
   * fba1437312207e135f0f7aef489b5b16f9fbe495 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-2833) Clean up unused archive files instead of expanding indefinitely

2022-01-05 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2833:
--
Sprint: Hudi 0.10.1 -  2021/01/03

> Clean up unused archive files instead of expanding indefinitely
> ---
>
> Key: HUDI-2833
> URL: https://issues.apache.org/jira/browse/HUDI-2833
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Yue Zhang
>Priority: Major
>  Labels: core-flow-ds, pull-request-available, sev:high
>
> As we know, most storage systems do not support the append operation, so Hudi 
> creates a new archive file under the archived directory every time it archives.
> As time goes by, there may be thousands of archive files, most of which are 
> no longer useful.
> It would be worthwhile to have a function that cleans up these unused archive files.
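
For illustration, here is a minimal sketch of the append-support check this 
behaviour hinges on. StorageSchemes.isAppendSupported is the call used in the 
patch under review; the wrapper class, method and example schemes below are 
assumptions made for the sketch, not Hudi internals.

{code:java}
import org.apache.hudi.common.fs.StorageSchemes;

// Illustration only: why each archive run tends to leave another small
// .commits_.archive file behind on stores that cannot append.
public class ArchiveRollSketch {

  // If the underlying scheme cannot append, the archiver has to roll a brand
  // new archive file on every run instead of appending to the previous one.
  static boolean mustRollNewArchiveFile(String scheme) {
    return !StorageSchemes.isAppendSupported(scheme);
  }

  public static void main(String[] args) {
    // Example schemes; actual support is whatever StorageSchemes declares.
    System.out.println("hdfs -> new file per run? " + mustRollNewArchiveFile("hdfs"));
    System.out.println("s3a  -> new file per run? " + mustRollNewArchiveFile("s3a"));
  }
}
{code}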



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] nsivabalan commented on a change in pull request #4078: [HUDI-2833] Clean up unused archive files instead of expanding indefinitely.

2022-01-05 Thread GitBox


nsivabalan commented on a change in pull request #4078:
URL: https://github.com/apache/hudi/pull/4078#discussion_r779271122



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/HoodieTimelineArchiveLog.java
##
@@ -134,12 +161,199 @@ public boolean archiveIfRequired(HoodieEngineContext 
context) throws IOException
 LOG.info("No Instants to archive");
   }
 
+  if (config.getArchiveAutoMergeEnable()) {
+mergeArchiveFilesIfNecessary(context);
+  }
   return success;
 } finally {
   close();
 }
   }
 
+  private void mergeArchiveFilesIfNecessary(HoodieEngineContext context) 
throws IOException {
+Path planPath = new Path(metaClient.getArchivePath(), 
mergeArchivePlanName);
+// Flush remaining content if present and open a new writer
+reOpenWriter();
+// List all archive files
+FileStatus[] fsStatuses = metaClient.getFs().globStatus(
+new Path(metaClient.getArchivePath() + "/.commits_.archive*"));
+List<FileStatus> mergeCandidate = new ArrayList<>();
+int archiveFilesCompactBatch = config.getArchiveFilesMergeBatchSize();
+long smallFileLimitBytes = config.getArchiveMergeSmallFileLimitBytes();
+
+for (FileStatus fs: fsStatuses) {
+  if (fs.getLen() < smallFileLimitBytes) {
+mergeCandidate.add(fs);
+  }
+  if (mergeCandidate.size() >= archiveFilesCompactBatch) {
+break;
+  }
+}
+
+if (mergeCandidate.size() >= archiveFilesCompactBatch) {
+  List<String> candidateFiles = mergeCandidate.stream().map(fs -> 
fs.getPath().toString()).collect(Collectors.toList());
+  // before merge archive files build merge plan
+  String logFileName = computeLogFileName();
+  buildArchiveMergePlan(candidateFiles, planPath, logFileName);
+  // merge archive files
+  mergeArchiveFiles(mergeCandidate);
+  // after merge, delete the small archive files.
+  deleteFilesParallelize(metaClient, candidateFiles, context, true);
+  // finally, delete archiveMergePlan, which means the merge of small archive 
files succeeded.
+  metaClient.getFs().delete(planPath, false);
+}
+  }
+
+  /**
+   * Get final written archive file name based on storageSchemes support 
append or not.
+   */
+  private String computeLogFileName() throws IOException {
+if (!StorageSchemes.isAppendSupported(metaClient.getFs().getScheme())) {
+  String logWriteToken = writer.getLogFile().getLogWriteToken();
+  HoodieLogFile hoodieLogFile = 
writer.getLogFile().rollOver(metaClient.getFs(), logWriteToken);
+  return hoodieLogFile.getFileName();
+} else {
+  return writer.getLogFile().getFileName();

Review comment:
   @vinothchandar @yihua: Do you folks have any thoughts here? This is the only 
pending item we need to resolve; the rest of the patch looks fine to me. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4520: [HUDI-3179][Stacked on 4417] Extracted common `AbstractHoodieTableFileIndex` to be shared across engines

2022-01-05 Thread GitBox


hudi-bot commented on pull request #4520:
URL: https://github.com/apache/hudi/pull/4520#issuecomment-1006244617


   
   ## CI report:
   
   * 26b8cc233c6400127a65f40e3df63b84b51b900a Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4923)
 
   * dfcae119aabb9aa43a98de5f868984f84ebe7c2f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4926)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on a change in pull request #4078: [HUDI-2833] Clean up unused archive files instead of expanding indefinitely.

2022-01-05 Thread GitBox


nsivabalan commented on a change in pull request #4078:
URL: https://github.com/apache/hudi/pull/4078#discussion_r779269087



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/HoodieTimelineArchiveLog.java
##
@@ -134,12 +161,199 @@ public boolean archiveIfRequired(HoodieEngineContext 
context) throws IOException
 LOG.info("No Instants to archive");
   }
 
+  if (config.getArchiveAutoMergeEnable()) {
+mergeArchiveFilesIfNecessary(context);
+  }
   return success;
 } finally {
   close();
 }
   }
 
+  private void mergeArchiveFilesIfNecessary(HoodieEngineContext context) 
throws IOException {
+Path planPath = new Path(metaClient.getArchivePath(), 
mergeArchivePlanName);
+// Flush remaining content if present and open a new writer
+reOpenWriter();
+// List all archive files
+FileStatus[] fsStatuses = metaClient.getFs().globStatus(

Review comment:
   yes, sounds good to me.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4520: [HUDI-3179][Stacked on 4417] Extracted common `AbstractHoodieTableFileIndex` to be shared across engines

2022-01-05 Thread GitBox


hudi-bot removed a comment on pull request #4520:
URL: https://github.com/apache/hudi/pull/4520#issuecomment-1006243558


   
   ## CI report:
   
   * 26b8cc233c6400127a65f40e3df63b84b51b900a Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4923)
 
   * dfcae119aabb9aa43a98de5f868984f84ebe7c2f UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4520: [HUDI-3179][Stacked on 4417] Extracted common `AbstractHoodieTableFileIndex` to be shared across engines

2022-01-05 Thread GitBox


hudi-bot commented on pull request #4520:
URL: https://github.com/apache/hudi/pull/4520#issuecomment-1006243558


   
   ## CI report:
   
   * 26b8cc233c6400127a65f40e3df63b84b51b900a Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4923)
 
   * dfcae119aabb9aa43a98de5f868984f84ebe7c2f UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4520: [HUDI-3179][Stacked on 4417] Extracted common `AbstractHoodieTableFileIndex` to be shared across engines

2022-01-05 Thread GitBox


hudi-bot removed a comment on pull request #4520:
URL: https://github.com/apache/hudi/pull/4520#issuecomment-1006242403


   
   ## CI report:
   
   * 26b8cc233c6400127a65f40e3df63b84b51b900a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4923)
 
   * dfcae119aabb9aa43a98de5f868984f84ebe7c2f UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4417: [HUDI-3094] Unify Hive's InputFormat implementations to avoid duplication

2022-01-05 Thread GitBox


hudi-bot removed a comment on pull request #4417:
URL: https://github.com/apache/hudi/pull/4417#issuecomment-1006242303


   
   ## CI report:
   
   * 27f28703fb0e5f64421fdd08ff7d14db2ce79cb3 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4918)
 
   * b83aa3a414f5f4fbf9d4b39f3d6682f15e1f464b UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



