[GitHub] [hudi] hudi-bot commented on pull request #7551: [HUDI-5446] Record level index write support Part2

2022-12-23 Thread GitBox


hudi-bot commented on PR #7551:
URL: https://github.com/apache/hudi/pull/7551#issuecomment-1364481106

   
   ## CI report:
   
   * 2f7a7f635afdfadf5c1165df3edd4ca5ecab499e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13946)
 
   * 34b0d6408a6885f11b5aa0840d8cbbf4d9b863c6 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13956)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[hudi] branch master updated: [HUDI-5400] Fix read issues when Hudi-FULL schema evolution is not enabled (#7480)

2022-12-23 Thread mengtao
This is an automated email from the ASF dual-hosted git repository.

mengtao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 64b814ea23 [HUDI-5400] Fix read issues when Hudi-FULL schema evolution is not enabled (#7480)
64b814ea23 is described below

commit 64b814ea237bd1576af3673d04c7bb965218fdef
Author: voonhous 
AuthorDate: Sat Dec 24 15:41:59 2022 +0800

[HUDI-5400] Fix read issues when Hudi-FULL schema evolution is not enabled (#7480)
---
 .../parquet/HoodieParquetFileFormatHelper.scala|  72 ++
 .../hudi/TestAvroSchemaResolutionSupport.scala | 794 +
 ...Spark24HoodieVectorizedParquetRecordReader.java | 185 +
 .../parquet/Spark24HoodieParquetFileFormat.scala   |  62 +-
 .../parquet/Spark31HoodieParquetFileFormat.scala   |  12 +-
 .../Spark32PlusHoodieParquetFileFormat.scala   |  10 +-
 6 files changed, 1116 insertions(+), 19 deletions(-)

diff --git a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/HoodieParquetFileFormatHelper.scala b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/HoodieParquetFileFormatHelper.scala
new file mode 100644
index 00..ce1a719cb9
--- /dev/null
+++ b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/HoodieParquetFileFormatHelper.scala
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.parquet
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.parquet.hadoop.metadata.FileMetaData
+import org.apache.spark.sql.types.{ArrayType, DataType, MapType, StructField, StructType}
+
+object HoodieParquetFileFormatHelper {
+
+  def buildImplicitSchemaChangeInfo(hadoopConf: Configuration,
+                                    parquetFileMetaData: FileMetaData,
+                                    requiredSchema: StructType): (java.util.Map[Integer, org.apache.hudi.common.util.collection.Pair[DataType, DataType]], StructType) = {
+    val implicitTypeChangeInfo: java.util.Map[Integer, org.apache.hudi.common.util.collection.Pair[DataType, DataType]] = new java.util.HashMap()
+    val convert = new ParquetToSparkSchemaConverter(hadoopConf)
+    val fileStruct = convert.convert(parquetFileMetaData.getSchema)
+    val fileStructMap = fileStruct.fields.map(f => (f.name, f.dataType)).toMap
+    val sparkRequestStructFields = requiredSchema.map(f => {
+      val requiredType = f.dataType
+      if (fileStructMap.contains(f.name) && !isDataTypeEqual(requiredType, fileStructMap(f.name))) {
+        implicitTypeChangeInfo.put(new Integer(requiredSchema.fieldIndex(f.name)), org.apache.hudi.common.util.collection.Pair.of(requiredType, fileStructMap(f.name)))
+        StructField(f.name, fileStructMap(f.name), f.nullable)
+      } else {
+        f
+      }
+    })
+    (implicitTypeChangeInfo, StructType(sparkRequestStructFields))
+  }
+
+  def isDataTypeEqual(requiredType: DataType, fileType: DataType): Boolean = (requiredType, fileType) match {
+    case (requiredType, fileType) if requiredType == fileType => true
+
+    case (ArrayType(rt, _), ArrayType(ft, _)) =>
+      // Do not care about nullability as schema evolution require fields to be nullable
+      isDataTypeEqual(rt, ft)
+
+    case (MapType(requiredKey, requiredValue, _), MapType(fileKey, fileValue, _)) =>
+      // Likewise, do not care about nullability as schema evolution require fields to be nullable
+      isDataTypeEqual(requiredKey, fileKey) && isDataTypeEqual(requiredValue, fileValue)
+
+    case (StructType(requiredFields), StructType(fileFields)) =>
+      // Find fields that are in requiredFields and fileFields as they might not be the same during add column + change column operations
+      val commonFieldNames = requiredFields.map(_.name) intersect fileFields.map(_.name)
+
+      // Need to match by name instead of StructField as name will stay the same whilst type may change
+      val fileFilteredFields = fileFields.filter(f 

[GitHub] [hudi] xiarixiaoyao merged pull request #7480: [HUDI-5400] Fix read issues when Hudi-FULL schema evolution is not enabled

2022-12-23 Thread GitBox


xiarixiaoyao merged PR #7480:
URL: https://github.com/apache/hudi/pull/7480





[GitHub] [hudi] xiarixiaoyao commented on pull request #7480: [HUDI-5400] Fix read issues when Hudi-FULL schema evolution is not enabled

2022-12-23 Thread GitBox


xiarixiaoyao commented on PR #7480:
URL: https://github.com/apache/hudi/pull/7480#issuecomment-1364480552

   @voonhous 
   Thanks for your contribution





[GitHub] [hudi] hudi-bot commented on pull request #7551: [HUDI-5446] Record level index write support Part2

2022-12-23 Thread GitBox


hudi-bot commented on PR #7551:
URL: https://github.com/apache/hudi/pull/7551#issuecomment-1364480327

   
   ## CI report:
   
   * 2f7a7f635afdfadf5c1165df3edd4ca5ecab499e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13946)
 
   * 34b0d6408a6885f11b5aa0840d8cbbf4d9b863c6 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6361: [HUDI-4690][HUDI-4503] Cleaning up Hudi custom Spark `Rule`s

2022-12-23 Thread GitBox


hudi-bot commented on PR #6361:
URL: https://github.com/apache/hudi/pull/6361#issuecomment-1364479440

   
   ## CI report:
   
   * 11e6334804d1a20f915965a34138a5b5e6e44adf Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13951)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7543: [HUDI-5401] Ensure user-provided hive metastore uri is set in HiveConf if not already set

2022-12-23 Thread GitBox


hudi-bot commented on PR #7543:
URL: https://github.com/apache/hudi/pull/7543#issuecomment-1364470731

   
   ## CI report:
   
   * 2a45845fabc82807e1d65aecdf7728b4968aa131 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13934)
 
   * bcad79c521981691cb46362463ad53ed5babe546 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13955)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7543: [HUDI-5401] Ensure user-provided hive metastore uri is set in HiveConf if not already set

2022-12-23 Thread GitBox


hudi-bot commented on PR #7543:
URL: https://github.com/apache/hudi/pull/7543#issuecomment-1364470128

   
   ## CI report:
   
   * 2a45845fabc82807e1d65aecdf7728b4968aa131 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13934)
 
   * bcad79c521981691cb46362463ad53ed5babe546 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7552: [HUDI-5446] Adding write support for Record level index Part1

2022-12-23 Thread GitBox


hudi-bot commented on PR #7552:
URL: https://github.com/apache/hudi/pull/7552#issuecomment-1364461301

   
   ## CI report:
   
   * 80f587051221a0d10b5cc0e041855f83dadca90c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13950)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7553: [HUDI-5408] Rollback partially failed commits in MDT in all cases

2022-12-23 Thread GitBox


hudi-bot commented on PR #7553:
URL: https://github.com/apache/hudi/pull/7553#issuecomment-1364461306

   
   ## CI report:
   
   * 8f5b05486bad7c6e9f70d560d6b5560094973f50 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13954)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7544: [HUDI-5433] Fix the way we deduce the pending instants for MDT writes

2022-12-23 Thread GitBox


hudi-bot commented on PR #7544:
URL: https://github.com/apache/hudi/pull/7544#issuecomment-1364461295

   
   ## CI report:
   
   * 6bcf99cee270008dd2248a74373b8ae8a890c50c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13941)
 
   * bef2f0cb7f3d76ecc27b7f294b61d7b540b79e24 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13953)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7527: [HUDI-5411] Avoid virtual key info for COW table in the input format

2022-12-23 Thread GitBox


hudi-bot commented on PR #7527:
URL: https://github.com/apache/hudi/pull/7527#issuecomment-1364461285

   
   ## CI report:
   
   * f25fdb45646c869c63ee1d0ace6f544e079a2c01 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13936)
 
   * bd719350c9e2cf6dfa396e81f9a276325eff827e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13952)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6361: [HUDI-4690][HUDI-4503] Cleaning up Hudi custom Spark `Rule`s

2022-12-23 Thread GitBox


hudi-bot commented on PR #6361:
URL: https://github.com/apache/hudi/pull/6361#issuecomment-1364461139

   
   ## CI report:
   
   * 337a8df71e2d48bccd7ff803855be60fee30f971 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13922)
 
   * 11e6334804d1a20f915965a34138a5b5e6e44adf Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13951)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7553: [HUDI-5408] Rollback partially failed commits in MDT in all cases

2022-12-23 Thread GitBox


hudi-bot commented on PR #7553:
URL: https://github.com/apache/hudi/pull/7553#issuecomment-1364460774

   
   ## CI report:
   
   * 8f5b05486bad7c6e9f70d560d6b5560094973f50 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7552: [HUDI-5446] Adding write support for Record level index Part1

2022-12-23 Thread GitBox


hudi-bot commented on PR #7552:
URL: https://github.com/apache/hudi/pull/7552#issuecomment-1364460770

   
   ## CI report:
   
   * 0e380082bf8c95c464a7bbb991e9dcec37d0c7a0 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13947)
 
   * 80f587051221a0d10b5cc0e041855f83dadca90c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13950)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7544: [HUDI-5433] Fix the way we deduce the pending instants for MDT writes

2022-12-23 Thread GitBox


hudi-bot commented on PR #7544:
URL: https://github.com/apache/hudi/pull/7544#issuecomment-1364460763

   
   ## CI report:
   
   * 6bcf99cee270008dd2248a74373b8ae8a890c50c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13941)
 
   * bef2f0cb7f3d76ecc27b7f294b61d7b540b79e24 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7527: [HUDI-5411] Avoid virtual key info for COW table in the input format

2022-12-23 Thread GitBox


hudi-bot commented on PR #7527:
URL: https://github.com/apache/hudi/pull/7527#issuecomment-1364460755

   
   ## CI report:
   
   * f25fdb45646c869c63ee1d0ace6f544e079a2c01 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13936)
 
   * bd719350c9e2cf6dfa396e81f9a276325eff827e UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6361: [HUDI-4690][HUDI-4503] Cleaning up Hudi custom Spark `Rule`s

2022-12-23 Thread GitBox


hudi-bot commented on PR #6361:
URL: https://github.com/apache/hudi/pull/6361#issuecomment-1364460586

   
   ## CI report:
   
   * 337a8df71e2d48bccd7ff803855be60fee30f971 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13922)
 
   * 11e6334804d1a20f915965a34138a5b5e6e44adf UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7550: minor edit to deploy scripts to avoid running tests as part of bundle uploads

2022-12-23 Thread GitBox


hudi-bot commented on PR #7550:
URL: https://github.com/apache/hudi/pull/7550#issuecomment-1364459852

   
   ## CI report:
   
   * 144c9ef6e5f5b1ee2f0bbdd8a7684a7cdb52a352 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13945)
 
   
   



[GitHub] [hudi] codope opened a new pull request, #7553: [HUDI-5408] Rollback partially failed commits in MDT in all cases

2022-12-23 Thread GitBox


codope opened a new pull request, #7553:
URL: https://github.com/apache/hudi/pull/7553

   ### Change Logs
   
   Fixes HUDI-5408 by rolling back partially failed commits in the metadata table in all cases.
   This is still WIP. Please do not merge.
   
   ### Impact
   
   Impacts metadata table based file system view.
   
   ### Risk level (write none, low medium or high below)
   
   high
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #6361: [HUDI-4690][HUDI-4503] Cleaning up Hudi custom Spark `Rule`s

2022-12-23 Thread GitBox


alexeykudinkin commented on code in PR #6361:
URL: https://github.com/apache/hudi/pull/6361#discussion_r1055978201


##
hudi-common/src/main/java/org/apache/hudi/internal/schema/action/TableChange.java:
##
@@ -83,10 +83,16 @@ abstract class BaseColumnChange implements TableChange {
 protected final InternalSchema internalSchema;
 protected final Map id2parent;
 protected final Map> positionChangeMap = new HashMap<>();
+protected final boolean caseSensitive;
 
 BaseColumnChange(InternalSchema schema) {
+  this(schema, false);

Review Comment:
   This change is necessary to properly handle the different case-sensitivity modes.
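
The constructor delegation in the hunk above can be illustrated with a small, self-contained Java sketch. This is not Hudi's `BaseColumnChange`; the class `ColumnLookupSketch`, its `findId` method, and the name-to-id map are hypothetical and exist only to show how a legacy constructor can default to case-insensitive matching while a new overload lets callers opt in to case-sensitive matching.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a column-change helper; NOT Hudi's BaseColumnChange.
public class ColumnLookupSketch {
    private final Map<String, Integer> nameToId = new LinkedHashMap<>();
    private final boolean caseSensitive;

    // Legacy constructor: delegates with caseSensitive = false,
    // analogous to `this(schema, false)` in the reviewed hunk.
    public ColumnLookupSketch(Map<String, Integer> schema) {
        this(schema, false);
    }

    // New overload: the caller chooses the case-sensitivity mode explicitly.
    public ColumnLookupSketch(Map<String, Integer> schema, boolean caseSensitive) {
        this.caseSensitive = caseSensitive;
        for (Map.Entry<String, Integer> e : schema.entrySet()) {
            // In case-insensitive mode, names differing only by case collapse to one key.
            nameToId.put(normalize(e.getKey()), e.getValue());
        }
    }

    // Lower-cases names only when matching is case-insensitive.
    private String normalize(String name) {
        return caseSensitive ? name : name.toLowerCase(Locale.ROOT);
    }

    // Returns the column id, or -1 if the name is not found under the active mode.
    public int findId(String name) {
        Integer id = nameToId.get(normalize(name));
        return id == null ? -1 : id;
    }
}
```

With a schema containing a single column `UserId`, the one-argument constructor resolves `userid`, while the two-argument form with `true` resolves only the exact spelling.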






[hudi] branch master updated: Revert "[HUDI-5409] Avoid file index and use fs view cache in COW input format (#7493)" (#7526)

2022-12-23 Thread codope
This is an automated email from the ASF dual-hosted git repository.

codope pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 2567ada6d5 Revert "[HUDI-5409] Avoid file index and use fs view cache in COW input format (#7493)" (#7526)
2567ada6d5 is described below

commit 2567ada6d5654bf8463cb55e25f5d662aa5a8475
Author: Sagar Sumit 
AuthorDate: Sat Dec 24 09:06:49 2022 +0530

Revert "[HUDI-5409] Avoid file index and use fs view cache in COW input format (#7493)" (#7526)

This reverts commit cc1c1e7b33d9c95e5a2ba0e9a1db428d1e1b2a00.
---
 .../hudi/execution/TestDisruptorMessageQueue.java  |   4 +-
 .../hadoop/HoodieCopyOnWriteTableInputFormat.java  | 144 +++--
 .../HoodieMergeOnReadTableInputFormat.java |  30 ++---
 .../hudi/hadoop/utils/HoodieInputFormatUtils.java  |   2 +-
 4 files changed, 61 insertions(+), 119 deletions(-)

diff --git a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/execution/TestDisruptorMessageQueue.java b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/execution/TestDisruptorMessageQueue.java
index 7d324e5296..76c22f96e7 100644
--- a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/execution/TestDisruptorMessageQueue.java
+++ b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/execution/TestDisruptorMessageQueue.java
@@ -39,7 +39,6 @@ import org.apache.spark.TaskContext;
 import org.apache.spark.TaskContext$;
 import org.junit.jupiter.api.AfterEach;
 import org.junit.jupiter.api.BeforeEach;
-import org.junit.jupiter.api.Disabled;
 import org.junit.jupiter.api.Test;
 import org.junit.jupiter.api.Timeout;
 import scala.Tuple2;
@@ -86,11 +85,10 @@ public class TestDisruptorMessageQueue extends HoodieClientTestHarness {
 
   // Test to ensure that we are reading all records from queue iterator in the same order
   // without any exceptions.
-  @Disabled("Disabled for unblocking 0.12.2 release. Disruptor queue is not part of this minor release. Tracked in HUDI-5410")
   @SuppressWarnings("unchecked")
   @Test
   @Timeout(value = 60)
-  public void testRecordReading() {
+  public void testRecordReading() throws Exception {
 
 final List hoodieRecords = dataGen.generateInserts(instantTime, 100);
 ArrayList beforeRecord = new ArrayList<>();
diff --git a/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieCopyOnWriteTableInputFormat.java b/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieCopyOnWriteTableInputFormat.java
index ce441bf2e2..140e7ff5b6 100644
--- a/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieCopyOnWriteTableInputFormat.java
+++ b/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieCopyOnWriteTableInputFormat.java
@@ -18,9 +18,21 @@
 
 package org.apache.hudi.hadoop;
 
+import org.apache.avro.Schema;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.io.ArrayWritable;
+import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.mapred.FileInputFormat;
+import org.apache.hadoop.mapred.FileSplit;
+import org.apache.hadoop.mapred.InputSplit;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapred.RecordReader;
+import org.apache.hadoop.mapred.Reporter;
+import org.apache.hadoop.mapreduce.Job;
 import org.apache.hudi.common.config.TypedProperties;
 import org.apache.hudi.common.engine.HoodieLocalEngineContext;
-import org.apache.hudi.common.fs.FSUtils;
 import org.apache.hudi.common.model.FileSlice;
 import org.apache.hudi.common.model.HoodieBaseFile;
 import org.apache.hudi.common.model.HoodieLogFile;
@@ -30,8 +42,7 @@ import org.apache.hudi.common.table.HoodieTableMetaClient;
 import org.apache.hudi.common.table.TableSchemaResolver;
 import org.apache.hudi.common.table.timeline.HoodieInstant;
 import org.apache.hudi.common.table.timeline.HoodieTimeline;
-import org.apache.hudi.common.table.view.FileSystemViewManager;
-import org.apache.hudi.common.table.view.HoodieTableFileSystemView;
+import org.apache.hudi.common.util.CollectionUtils;
 import org.apache.hudi.common.util.Option;
 import org.apache.hudi.common.util.StringUtils;
 import org.apache.hudi.exception.HoodieException;
@@ -39,42 +50,21 @@ import org.apache.hudi.exception.HoodieIOException;
 import org.apache.hudi.hadoop.realtime.HoodieVirtualKeyInfo;
 import org.apache.hudi.hadoop.utils.HoodieHiveUtils;
 import org.apache.hudi.hadoop.utils.HoodieInputFormatUtils;
-import org.apache.hudi.metadata.HoodieTableMetadataUtil;
-
-import org.apache.avro.Schema;
-import org.apache.hadoop.fs.FileStatus;
-import org.apache.hadoop.fs.FileSystem;
-import org.apache.hadoop.fs.Path;
-import org.apache.hadoop.io.ArrayWritable;
-import org.apache.hadoop.io.NullWritable;
-import org.apache.hadoop.mapred.FileInputFormat;
-import org.apache.hadoop.mapred.FileSplit;

[GitHub] [hudi] codope merged pull request #7526: Revert "[HUDI-5409] Avoid file index and use fs view cache in COW input format (#7493)"

2022-12-23 Thread GitBox


codope merged PR #7526:
URL: https://github.com/apache/hudi/pull/7526





[GitHub] [hudi] codope commented on pull request #7526: Revert "[HUDI-5409] Avoid file index and use fs view cache in COW input format (#7493)"

2022-12-23 Thread GitBox


codope commented on PR #7526:
URL: https://github.com/apache/hudi/pull/7526#issuecomment-1364452092

   The CI job is crashing midway here and in other PRs; the crash is unrelated to the code changes.
   ```
   [ERROR] The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
   [ERROR] Command was /bin/sh -c cd /home/vsts/work/1/s/hudi-utilities && /usr/lib/jvm/temurin-8-jdk-amd64/jre/bin/java -Xmx2g org.apache.maven.surefire.booter.ForkedBooter /home/vsts/work/1/s/hudi-utilities/target/surefire 2022-12-23T15-05-05_751-jvmRun1 surefire4377487798425727200tmp surefire_85379564389384658807tmp
   [ERROR] Error occurred in starting fork, check output in log
   [ERROR] Process Exit Code: 255
   [ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
   [ERROR] Command was /bin/sh -c cd /home/vsts/work/1/s/hudi-utilities && /usr/lib/jvm/temurin-8-jdk-amd64/jre/bin/java -Xmx2g org.apache.maven.surefire.booter.ForkedBooter /home/vsts/work/1/s/hudi-utilities/target/surefire 2022-12-23T15-05-05_751-jvmRun1 surefire4377487798425727200tmp surefire_85379564389384658807tmp
   [ERROR] Error occurred in starting fork, check output in log
   [ERROR] Process Exit Code: 255
   [ERROR]  at org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:669)
   [ERROR]  at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:282)
   [ERROR]  at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:245)
   [ERROR]  at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1183)
   [ERROR]  at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1011)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #7552: [HUDI-5446] Adding write support for Record level index Part1

2022-12-23 Thread GitBox


hudi-bot commented on PR #7552:
URL: https://github.com/apache/hudi/pull/7552#issuecomment-1364451471

   
   ## CI report:
   
   * 0e380082bf8c95c464a7bbb991e9dcec37d0c7a0 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13947)
 
   * 80f587051221a0d10b5cc0e041855f83dadca90c UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7336: [HUDI-5297][HUDI-5298] Refactoring WriteStatus

2022-12-23 Thread GitBox


hudi-bot commented on PR #7336:
URL: https://github.com/apache/hudi/pull/7336#issuecomment-1364451388

   
   ## CI report:
   
   * 39c20cdca85c676fc3919fdec68bb53f720c1f25 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13948)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7552: [HUDI-5446] Adding write support for Record level index Part1

2022-12-23 Thread GitBox


hudi-bot commented on PR #7552:
URL: https://github.com/apache/hudi/pull/7552#issuecomment-1364450622

   
   ## CI report:
   
   * 0e380082bf8c95c464a7bbb991e9dcec37d0c7a0 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13947)
 
   * 80f587051221a0d10b5cc0e041855f83dadca90c UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7551: [HUDI-5446] Record level index write support Part2

2022-12-23 Thread GitBox


hudi-bot commented on PR #7551:
URL: https://github.com/apache/hudi/pull/7551#issuecomment-1364450618

   
   ## CI report:
   
   * 2f7a7f635afdfadf5c1165df3edd4ca5ecab499e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13946)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7336: [HUDI-5297][HUDI-5298] Refactoring WriteStatus

2022-12-23 Thread GitBox


hudi-bot commented on PR #7336:
URL: https://github.com/apache/hudi/pull/7336#issuecomment-1364450561

   
   ## CI report:
   
   * e59e6a4a5b05ba630659f62abe7074f998b9271f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13337)
 
   * 39c20cdca85c676fc3919fdec68bb53f720c1f25 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7552: [HUDI-5446] Adding write support for Record level index Part1

2022-12-23 Thread GitBox


hudi-bot commented on PR #7552:
URL: https://github.com/apache/hudi/pull/7552#issuecomment-1364449972

   
   ## CI report:
   
   * 0e380082bf8c95c464a7bbb991e9dcec37d0c7a0 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7551: [HUDI-5446] Record level index write support Part2

2022-12-23 Thread GitBox


hudi-bot commented on PR #7551:
URL: https://github.com/apache/hudi/pull/7551#issuecomment-1364449966

   
   ## CI report:
   
   * 2f7a7f635afdfadf5c1165df3edd4ca5ecab499e UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7550: minor edit to deploy scripts to avoid running tests as part of bundle uploads

2022-12-23 Thread GitBox


hudi-bot commented on PR #7550:
URL: https://github.com/apache/hudi/pull/7550#issuecomment-1364449955

   
   ## CI report:
   
   * 144c9ef6e5f5b1ee2f0bbdd8a7684a7cdb52a352 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13945)
 
   
   



[GitHub] [hudi] nsivabalan opened a new pull request, #7552: [HUDI-5446] Adding write support for Record level index Part1

2022-12-23 Thread GitBox


nsivabalan opened a new pull request, #7552:
URL: https://github.com/apache/hudi/pull/7552

   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was 
copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] hudi-bot commented on pull request #7550: minor edit to deploy scripts to avoid running tests as part of bundle uploads

2022-12-23 Thread GitBox


hudi-bot commented on PR #7550:
URL: https://github.com/apache/hudi/pull/7550#issuecomment-1364439224

   
   ## CI report:
   
   * 144c9ef6e5f5b1ee2f0bbdd8a7684a7cdb52a352 UNKNOWN
   
   



[jira] [Updated] (HUDI-5446) Add support to write record level index to MDT

2022-12-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-5446:
-
Labels: pull-request-available  (was: )

> Add support to write record level index to MDT
> --
>
> Key: HUDI-5446
> URL: https://issues.apache.org/jira/browse/HUDI-5446
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: metadata
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> Add support to write our record level index partition to MDT



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] nsivabalan opened a new pull request, #7551: [HUDI-5446] Record level index write support

2022-12-23 Thread GitBox


nsivabalan opened a new pull request, #7551:
URL: https://github.com/apache/hudi/pull/7551

   





[GitHub] [hudi] satishkotha opened a new pull request, #7550: minor edit to deploy scripts to avoid running tests as part of bundle uploads

2022-12-23 Thread GitBox


satishkotha opened a new pull request, #7550:
URL: https://github.com/apache/hudi/pull/7550

   ### Change Logs
   
   minor edit to deploy scripts to avoid running tests as part of bundle uploads
   
   ### Impact
   
   Deploy is a lot simpler. Tests are already run before the deploy steps.
   
   ### Risk level (write none, low medium or high below)
   
   Low 
   
   ### Documentation Update
   
   None
   ### Contributor's checklist
   
   - [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Change Logs and Impact were stated clearly
   - [x] Adequate tests were added if applicable
   - [x] CI passed
   





[GitHub] [hudi] kazdy opened a new pull request, #7549: [DOCS] improve spark quickstart, add info about table maintenance and async …

2022-12-23 Thread GitBox


kazdy opened a new pull request, #7549:
URL: https://github.com/apache/hudi/pull/7549

   …services when metadata table is enabled
   
   ### Change Logs
   
   Improve the Spark quickstart: add info about table maintenance and async services when the metadata table is enabled (one should be aware that OCC needs to be enabled even in single-writer mode).
   As a user, I would like to be informed about this in the quick start, since I rarely read the pages under "concepts" where this information is provided. It's especially important since MT is enabled by default starting from 0.11, and misconfiguration can cause data loss.
   Fixes the Spark quickstart from 0.10 to current.
   
   ### Impact
   
   None
   
   ### Risk level (write none, low medium or high below)
   
   Low
   
   ### Documentation Update
   
   Improve the Spark quickstart: add info about table maintenance and async services when the metadata table is enabled (one should be aware that OCC needs to be enabled even in single-writer mode).
   As a user, I would like to be informed about this in the quick start, since I rarely read the pages under "concepts" where this information is provided. It's especially important since MT is enabled by default starting from 0.11, and misconfiguration can cause data loss.
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] kazdy opened a new pull request, #7548: fix when I click on Update or MergeInto link in spark quickstart it d…

2022-12-23 Thread GitBox


kazdy opened a new pull request, #7548:
URL: https://github.com/apache/hudi/pull/7548

   …oes not bring me to the update/merge into documentation
   
   ### Change Logs
   
   Split Update and MergeInto into two tables, so clicking Update or MergeInto now goes to that section. Before, only the URL changed; I had to be in the SparkSQL tab first to go straight to the Update or MergeInto section of the quickstart. Fixed in 0.12.1 and current.
   
   ### Impact
   
   None
   
   ### Risk level (write none, low medium or high below)
   
   Low
   
   ### Documentation Update
   
   Split Update and MergeInto into two tables, so clicking Update or MergeInto now goes to that section. Before, only the URL changed; I had to be in the SparkSQL tab first to go straight to the Update or MergeInto section of the quickstart. Fixed in 0.12.1 and current.
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] nsivabalan commented on a diff in pull request #7544: [HUDI-5433] Fix the way we deduce the pending instants for MDT writes

2022-12-23 Thread GitBox


nsivabalan commented on code in PR #7544:
URL: https://github.com/apache/hudi/pull/7544#discussion_r1056676874


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -1019,21 +1021,27 @@ protected void compactIfNecessary(BaseHoodieWriteClient writeClient, String inst
 // finish off any pending compactions if any from previous attempt.
 writeClient.runAnyPendingCompactions();
 
-String latestDeltaCommitTime = metadataMetaClient.reloadActiveTimeline().getDeltaCommitTimeline().filterCompletedInstants().lastInstant()
-.get().getTimestamp();
-List pendingInstants = dataMetaClient.reloadActiveTimeline().filterInflightsAndRequested()
-.findInstantsBefore(instantTime).getInstants();
+String latestDeltaCommitTimeInMetadataTable = metadataMetaClient.reloadActiveTimeline()
+.getDeltaCommitTimeline()
+.filterCompletedInstants()
+.lastInstant().orElseThrow(() -> new HoodieMetadataException("No completed deltacommit in metadata table"))
+.getTimestamp();
+List pendingInstantsInDataTable = dataMetaClient.reloadActiveTimeline()
+.filterInflightsAndRequested()
+.getInstantsAsStream()
+.filter(instant -> !instant.getTimestamp().equals(instantTime))
+.collect(Collectors.toList());
 
-if (!pendingInstants.isEmpty()) {
+if (!pendingInstantsInDataTable.isEmpty()) {
   LOG.info(String.format("Cannot compact metadata table as there are %d inflight instants before latest deltacommit %s: %s",
-  pendingInstants.size(), latestDeltaCommitTime, Arrays.toString(pendingInstants.toArray())));
+  pendingInstantsInDataTable.size(), latestDeltaCommitTimeInMetadataTable, Arrays.toString(pendingInstantsInDataTable.toArray())));

Review Comment:
   Can we fix the logging statement? It says "before".
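
For readers following the review, here is a minimal Python sketch (not Hudi code) of why the word "before" may no longer match the filter in the diff. The `Instant` class and both helper functions are hypothetical stand-ins, and timestamps are plain strings compared lexicographically, which is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Instant:
    """Hypothetical stand-in for a timeline instant."""
    timestamp: str
    completed: bool


def pending_before(instants: List[Instant], instant_time: str) -> List[Instant]:
    # Models the removed lines: pending instants strictly before instant_time.
    return [i for i in instants if not i.completed and i.timestamp < instant_time]


def pending_excluding_current(instants: List[Instant], instant_time: str) -> List[Instant]:
    # Models the added lines: every pending instant except instant_time itself.
    return [i for i in instants if not i.completed and i.timestamp != instant_time]


timeline = [
    Instant("001", completed=True),
    Instant("002", completed=False),
    Instant("003", completed=False),  # the instant currently being written
    Instant("004", completed=False),  # started later and still pending
]

old = pending_before(timeline, "003")
new = pending_excluding_current(timeline, "003")
print([i.timestamp for i in old])
print([i.timestamp for i in new])
```

In this toy timeline the newer filter also returns the later instant "004", so a log message that says "before" would describe only part of what is counted.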






[GitHub] [hudi] kazdy opened a new pull request, #7547: add DROP TABLE, TRUNCATE TABLE docs to spark quick start guide, minor syntax fixes to ALTER TABLE docs

2022-12-23 Thread GitBox


kazdy opened a new pull request, #7547:
URL: https://github.com/apache/hudi/pull/7547

   ### Change Logs
   
   Add DROP TABLE and TRUNCATE TABLE docs to the Spark quick start guide (0.12.1 and current); minor style fixes to the ALTER TABLE docs.
   
   ### Impact
   
   low
   
   ### Risk level (write none, low medium or high below)
   
   none
   
   ### Documentation Update
   
   Add DROP TABLE and TRUNCATE TABLE docs to the Spark quick start guide (0.12.1 and current); minor style fixes to the ALTER TABLE docs.
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] hudi-bot commented on pull request #6732: [HUDI-4148] Add client for hudi table management service

2022-12-23 Thread GitBox


hudi-bot commented on PR #6732:
URL: https://github.com/apache/hudi/pull/6732#issuecomment-1364306798

   
   ## CI report:
   
   * 4ed7915b066588e620163ecd0abafc6c61fd587a UNKNOWN
   * 46f647dae80082ea855489fd86c42b17368af005 UNKNOWN
   * 097828528e3529f0590d38be30e27e2787a64fa2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13944)
 
   
   



[GitHub] [hudi] xushiyan commented on pull request #7474: [HUDI-5246] Added validation for Partition Path to not begin with "/"

2022-12-23 Thread GitBox


xushiyan commented on PR #7474:
URL: https://github.com/apache/hudi/pull/7474#issuecomment-1364171946

   I would suggest validating only at the config level, not the actual data (partition paths). This validation should not incur any performance hit.





[GitHub] [hudi] hudi-bot commented on pull request #7526: Revert "[HUDI-5409] Avoid file index and use fs view cache in COW input format (#7493)"

2022-12-23 Thread GitBox


hudi-bot commented on PR #7526:
URL: https://github.com/apache/hudi/pull/7526#issuecomment-1364136970

   
   ## CI report:
   
   * f375175f14386b3297abc5279df62c64f53083bd Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13932)
 
   
   



[GitHub] [hudi] yihua commented on issue #7546: [SUPPORT]Fail to execute offline flink compactor in service mode.

2022-12-23 Thread GitBox


yihua commented on issue #7546:
URL: https://github.com/apache/hudi/issues/7546#issuecomment-1364130124

   @danny0405 I think Hudi supports the Flink compactor in service mode.  Is 
there any config to get it right?





[GitHub] [hudi] hudi-bot commented on pull request #7544: [HUDI-5433] Fix the way we deduce the pending instants for MDT writes

2022-12-23 Thread GitBox


hudi-bot commented on PR #7544:
URL: https://github.com/apache/hudi/pull/7544#issuecomment-1364127113

   
   ## CI report:
   
   * 6bcf99cee270008dd2248a74373b8ae8a890c50c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13941)
 
   
   



[GitHub] [hudi] yihua commented on issue #7545: [SUPPORT]How to sync data from Kafka to Hudi when use Flink SQL canal-json format

2022-12-23 Thread GitBox


yihua commented on issue #7545:
URL: https://github.com/apache/hudi/issues/7545#issuecomment-1364126549

   @danny0405 @yuzhaojing could you guys provide any insight here?





[GitHub] [hudi] hudi-bot commented on pull request #6732: [HUDI-4148] Add client for hudi table management service

2022-12-23 Thread GitBox


hudi-bot commented on PR #6732:
URL: https://github.com/apache/hudi/pull/6732#issuecomment-1364068271

   
   ## CI report:
   
   * 4ed7915b066588e620163ecd0abafc6c61fd587a UNKNOWN
   * 46f647dae80082ea855489fd86c42b17368af005 UNKNOWN
   * 24f7f28bb6f5d124e250afae9e5a93b51cb0bd07 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13943)
 
   * 097828528e3529f0590d38be30e27e2787a64fa2 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13944)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #5913: [HUDI-4287] Optimize Flink checkpoint meta mechanism to fix mistaken pending instants

2022-12-23 Thread GitBox


hudi-bot commented on PR #5913:
URL: https://github.com/apache/hudi/pull/5913#issuecomment-1364066729

   
   ## CI report:
   
   * fbb9ed33252880509289e8385fa8615e7f46b7fe Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13938)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6732: [HUDI-4148] Add client for hudi table management service

2022-12-23 Thread GitBox


hudi-bot commented on PR #6732:
URL: https://github.com/apache/hudi/pull/6732#issuecomment-1364061275

   
   ## CI report:
   
   * 4ed7915b066588e620163ecd0abafc6c61fd587a UNKNOWN
   * 46f647dae80082ea855489fd86c42b17368af005 UNKNOWN
   * a614791ea45854805a892bf53ec99f0dcdad30c6 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13939)
 
   * 24f7f28bb6f5d124e250afae9e5a93b51cb0bd07 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13943)
 
   * 097828528e3529f0590d38be30e27e2787a64fa2 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #4966: [HUDI-3572]support DAY_ROLLING strategy in ClusteringPlanPartitionFilterMode

2022-12-23 Thread GitBox


hudi-bot commented on PR #4966:
URL: https://github.com/apache/hudi/pull/4966#issuecomment-1364060351

   
   ## CI report:
   
   * 2f04d4f01d2898541e9f089f33385de0c78ee66b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13937)
 
   
   



[GitHub] [hudi] Leoyzen opened a new issue, #7546: [SUPPORT]Fail to execute offline flink compactor in service mode.

2022-12-23 Thread GitBox


Leoyzen opened a new issue, #7546:
URL: https://github.com/apache/hudi/issues/7546

   **_Tips before filing an issue_**
   
   - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
   Yes
   
   **Describe the problem you faced**
   
   I'm using HoodieFlinkCompactor to run an offline compaction job, and it fails in service mode.
   
   The failure is `Cannot have more than one execute() or executeAsync() call in a single environment`.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Start a standalone Flink compactor job.
   2. Enable service mode.
   3. The job fails once the first "parallelism" jobs are done (on the next loop).
   4. The job restarts.
   
   **Expected behavior**
   
   The second loop (after the first "parallelism" jobs are done) succeeds when using service mode.
   
   **Environment Description**
   
   * Hudi version :
   
   0.12.1
   
   * Spark version :
   
   * Hive version :
   
   * Hadoop version :
   
   * Storage (HDFS/S3/GCS..) :
   
   * Running on Docker? (yes/no) :
   
   
   **Additional context**
   
   job config
   
   ```bash
   --path oss://dengine-lake-zjk/cloudcode_prod/dwd_egc_adv_req_outra
   --compaction-max-memory 2048
   --seq LIFO
   --compaction-tasks 16
   --plan-select-strategy all
   --min-compaction-interval-seconds 30
   --service
   ```
   
   **Stacktrace**
   
   ```LOG
   2022-12-23 23:14:05,976 [pool-17-thread-1] INFO  org.apache.flink.api.java.typeutils.TypeExtractor[] - Class class org.apache.hudi.common.model.CompactionOperation cannot be used as a POJO type because not all fields are valid POJO fields, and must be processed as GenericType. Please read the Flink documentation on "Data Types & Serialization" for details of the effect on performance.
   2022-12-23 23:14:05,983 [pool-17-thread-1] WARN  org.apache.flink.resourceplan.applyagent.StreamGraphModifier [] - Path of resource plan is not specified, do nothing.
   2022-12-23 23:14:05,983 [pool-17-thread-1] ERROR org.apache.hudi.client.RunsTableService  [] - Shutting down compaction service due to exception
   org.apache.flink.util.FlinkRuntimeException: Cannot have more than one execute() or executeAsync() call in a single environment.
       at org.apache.flink.client.program.StreamContextEnvironment.validateAllowedExecution(StreamContextEnvironment.java:199) ~[flink-dist-1.15-vvr-6.0.2-3-SNAPSHOT.jar:1.15-vvr-6.0.2-3-SNAPSHOT]
       at org.apache.flink.client.program.StreamContextEnvironment.executeAsync(StreamContextEnvironment.java:187) ~[flink-dist-1.15-vvr-6.0.2-3-SNAPSHOT.jar:1.15-vvr-6.0.2-3-SNAPSHOT]
       at org.apache.flink.client.program.StreamContextEnvironment.execute(StreamContextEnvironment.java:119) ~[flink-dist-1.15-vvr-6.0.2-3-SNAPSHOT.jar:1.15-vvr-6.0.2-3-SNAPSHOT]
       at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1985) ~[flink-dist-1.15-vvr-6.0.2-3-SNAPSHOT.jar:1.15-vvr-6.0.2-3-SNAPSHOT]
       at org.apache.hudi.sink.compact.HoodieFlinkCompactor$AsyncCompactionService.compact(HoodieFlinkCompactor.java:322) ~[flink-hudi-bundle-1.3-SNAPSHOT-jar-with-dependencies-20221217104900.jar:?]
       at org.apache.hudi.sink.compact.HoodieFlinkCompactor$AsyncCompactionService.lambda$startService$0(HoodieFlinkCompactor.java:204) ~[flink-hudi-bundle-1.3-SNAPSHOT-jar-with-dependencies-20221217104900.jar:?]
       at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) [?:1.8.0_102]
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147) [?:1.8.0_102]
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622) [?:1.8.0_102]
       at java.lang.Thread.run(Thread.java:834) [?:1.8.0_102]
   ```
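
   The failure mode in the stack trace can be illustrated with a toy model. This is only a sketch: `SingleJobEnvironment` and `CompactionServiceLoop` are hypothetical classes, not Flink or Hudi code, and the sketch assumes nothing more than what the error message says, namely that an environment rejects a second `execute()` call. A service loop that reuses one environment stops after the first round, while a loop that obtains a new environment per round does not hit the guard.

```java
import java.util.function.Supplier;

// Hypothetical service loop; it only models "one execute() per environment".
class CompactionServiceLoop {
    // Runs up to `rounds` rounds, taking the environment for each round from `envs`.
    // Returns how many rounds completed before the guard was hit.
    static int run(int rounds, Supplier<SingleJobEnvironment> envs) {
        int completed = 0;
        for (int i = 0; i < rounds; i++) {
            try {
                envs.get().execute("compaction-round-" + i);
                completed++;
            } catch (IllegalStateException e) {
                break; // the real service shuts down at this point
            }
        }
        return completed;
    }

    public static void main(String[] args) {
        SingleJobEnvironment shared = new SingleJobEnvironment();
        // Reusing one environment: the second round is rejected.
        System.out.println(run(3, () -> shared));
        // A new environment per round: no round is rejected.
        System.out.println(run(3, SingleJobEnvironment::new));
    }
}

// Toy stand-in for an environment that permits a single job submission (assumption).
class SingleJobEnvironment {
    private int jobCounter = 0;

    String execute(String jobName) {
        if (jobCounter > 0) {
            throw new IllegalStateException(
                "Cannot have more than one execute() or executeAsync() call in a single environment.");
        }
        jobCounter++;
        return jobName + " finished";
    }
}
```

   Whether a fresh environment per round is possible, or sufficient, in the deployment described in this issue is not established by the report.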
   
   





[GitHub] [hudi] hudi-bot commented on pull request #6732: [HUDI-4148] Add client for hudi table management service

2022-12-23 Thread GitBox


hudi-bot commented on PR #6732:
URL: https://github.com/apache/hudi/pull/6732#issuecomment-1364004381

   
   ## CI report:
   
   * 4ed7915b066588e620163ecd0abafc6c61fd587a UNKNOWN
   * 46f647dae80082ea855489fd86c42b17368af005 UNKNOWN
   * a614791ea45854805a892bf53ec99f0dcdad30c6 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13939)
 
   * 24f7f28bb6f5d124e250afae9e5a93b51cb0bd07 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13943)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Updated] (HUDI-5470) DFSPropertiesConfiguration support read include confs from diferenet DFS path

2022-12-23 Thread Xianghu Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xianghu Wang updated HUDI-5470:
---
Description: 
Currently, if we set `include=xxx.properties` in our props file, we have to make 
sure the two props files are in the same path.

This patch removes that restriction, which means the props files can be located 
in different paths.

```

s3://hudi_bucket/confs/delta.properties

include=s3://hudi_bucket/confs/common/base.properties

```
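
The resolution rule described above can be sketched as follows. This is an illustration only: `IncludeResolver` is a hypothetical helper built on `java.net.URI`, not the code in `DFSPropertiesConfiguration`, and it assumes that an `include` value carrying its own scheme is used as-is while a bare file name is resolved next to the including file.

```java
import java.net.URI;

// Hypothetical helper, not part of Hudi.
class IncludeResolver {
    // Resolves an `include=` value against the props file that contains it.
    static String resolve(String currentFile, String includeValue) {
        URI include = URI.create(includeValue);
        if (include.isAbsolute()) {
            // The value names a full location (it has a scheme): use it unchanged.
            return include.toString();
        }
        // Otherwise treat it as relative to the directory of the including file.
        return URI.create(currentFile).resolve(include).toString();
    }

    public static void main(String[] args) {
        String current = "s3://hudi_bucket/confs/delta.properties";
        System.out.println(resolve(current, "s3://hudi_bucket/confs/common/base.properties"));
        System.out.println(resolve(current, "base.properties"));
    }
}
```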

 

 

  was:
For now, if set `include=xxx.properties` in our props file, we have to make 
sure these two props files are in the same path.

This patch removes the above restrictions,that means we can those props can 
locate in different path

 

 


> DFSPropertiesConfiguration support read include confs from diferenet DFS path
> -
>
> Key: HUDI-5470
> URL: https://issues.apache.org/jira/browse/HUDI-5470
> Project: Apache Hudi
>  Issue Type: Task
>  Components: deltastreamer
>Reporter: Xianghu Wang
>Assignee: Xianghu Wang
>Priority: Major
> Fix For: 0.13.0
>
>
> Currently, if we set `include=xxx.properties` in our props file, we have to make 
> sure the two props files are in the same path.
> This patch removes that restriction, which means the props files can be located 
> in different paths.
> ```
> s3://hudi_bucket/confs/delta.properties
> include=s3://hudi_bucket/confs/common/base.properties
> ```
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] hudi-bot commented on pull request #6732: [HUDI-4148] Add client for hudi table management service

2022-12-23 Thread GitBox


hudi-bot commented on PR #6732:
URL: https://github.com/apache/hudi/pull/6732#issuecomment-1364000527

   
   ## CI report:
   
   * 4ed7915b066588e620163ecd0abafc6c61fd587a UNKNOWN
   * 46f647dae80082ea855489fd86c42b17368af005 UNKNOWN
   * a614791ea45854805a892bf53ec99f0dcdad30c6 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13939)
 
   * 24f7f28bb6f5d124e250afae9e5a93b51cb0bd07 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Created] (HUDI-5470) DFSPropertiesConfiguration support read include confs from diferenet DFS path

2022-12-23 Thread Xianghu Wang (Jira)
Xianghu Wang created HUDI-5470:
--

 Summary: DFSPropertiesConfiguration support read include confs 
from diferenet DFS path
 Key: HUDI-5470
 URL: https://issues.apache.org/jira/browse/HUDI-5470
 Project: Apache Hudi
  Issue Type: Task
  Components: deltastreamer
Reporter: Xianghu Wang
Assignee: Xianghu Wang
 Fix For: 0.13.0


Currently, if we set `include=xxx.properties` in our props file, we have to make 
sure the two props files are in the same path.

This patch removes that restriction, which means the props files can be located 
in different paths.

 

 





[GitHub] [hudi] hudi-bot commented on pull request #7527: [HUDI-5411] Avoid virtual key info for COW table in the input format

2022-12-23 Thread GitBox


hudi-bot commented on PR #7527:
URL: https://github.com/apache/hudi/pull/7527#issuecomment-1363997295

   
   ## CI report:
   
   * f25fdb45646c869c63ee1d0ace6f544e079a2c01 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13936)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6732: [HUDI-4148] Add client for hudi table management service

2022-12-23 Thread GitBox


hudi-bot commented on PR #6732:
URL: https://github.com/apache/hudi/pull/6732#issuecomment-1363996282

   
   ## CI report:
   
   * 4ed7915b066588e620163ecd0abafc6c61fd587a UNKNOWN
   * 46f647dae80082ea855489fd86c42b17368af005 UNKNOWN
   * a614791ea45854805a892bf53ec99f0dcdad30c6 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13939)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7543: [HUDI-5401] Ensure user-provided hive metastore uri is set in HiveConf if not already set

2022-12-23 Thread GitBox


hudi-bot commented on PR #7543:
URL: https://github.com/apache/hudi/pull/7543#issuecomment-1363933814

   
   ## CI report:
   
   * 2a45845fabc82807e1d65aecdf7728b4968aa131 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13934)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7542: [HUDI-5469] Hive doesn't respect the space at the end of partition path, so remove it to avoid dupl…

2022-12-23 Thread GitBox


hudi-bot commented on PR #7542:
URL: https://github.com/apache/hudi/pull/7542#issuecomment-1363933776

   
   ## CI report:
   
   * e33e8c83aeb7709022ac01b8a2a83f21da797156 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13933)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Commented] (HUDI-5456) Flink streaming read skips uncommitted instants

2022-12-23 Thread Danny Chen (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17651636#comment-17651636
 ] 

Danny Chen commented on HUDI-5456:
--

Fixed via master branch: 2a472f4c4367b30544a93c87448156964df12202

> Flink streaming read skips uncommitted instants
> ---
>
> Key: HUDI-5456
> URL: https://issues.apache.org/jira/browse/HUDI-5456
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Reporter: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>






[jira] [Resolved] (HUDI-5456) Flink streaming read skips uncommitted instants

2022-12-23 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen resolved HUDI-5456.
--

> Flink streaming read skips uncommitted instants
> ---
>
> Key: HUDI-5456
> URL: https://issues.apache.org/jira/browse/HUDI-5456
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Reporter: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>






[hudi] branch master updated (bdfaa4e116 -> 2a472f4c43)

2022-12-23 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from bdfaa4e116 HUDI-5398. Fix Typo in hudi-integ-test#README.md. (#7477)
 add 2a472f4c43 [HUDI-5456] Flink streaming read skips uncommitted instants 
(#7540)

No new revisions were added by this update.

Summary of changes:
 .../main/java/org/apache/hudi/source/IncrementalInputSplits.java  | 5 ++---
 .../java/org/apache/hudi/source/TestIncrementalInputSplits.java   | 8 ++++++++
 2 files changed, 10 insertions(+), 3 deletions(-)



[GitHub] [hudi] danny0405 merged pull request #7540: [HUDI-5456] Flink streaming read skips uncommitted instants

2022-12-23 Thread GitBox


danny0405 merged PR #7540:
URL: https://github.com/apache/hudi/pull/7540





[GitHub] [hudi] With-winds opened a new issue, #7545: [SUPPORT]How to sync data from Kafka to Hudi when use Flink SQL canal-json format

2022-12-23 Thread GitBox


With-winds opened a new issue, #7545:
URL: https://github.com/apache/hudi/issues/7545

   **_Tips before filing an issue_**
   
   - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
   
   - Join the mailing list to engage in conversations and get faster support at 
dev-subscr...@hudi.apache.org.
   
   - If you have triaged this as a bug, then file an 
[issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
   
   **Describe the problem you faced**
   
   When reading from Kafka with the Flink SQL canal-json format, delete events 
are not captured. What should I do to handle this?
   
   **Expected behavior**
   
   Data is synced from Kafka to Hudi when using the Flink SQL canal-json format.
   
   **Environment Description**
   
   * Hudi version : 0.12.1
   
   * Hive version : 2.3.7
   
   * Hadoop version : 2.7.3
   
   * Storage (HDFS/S3/GCS..) : HDFS
   
   * Running on Docker? (yes/no) : no
   
   
   **Additional context**
   
   Add any other context about the problem here.
   
   **Stacktrace**
   
   ```Add the stacktrace of the error.```
   
   





[GitHub] [hudi] codope commented on issue #7125: [SUPPORT] Metadata Column Stats Index failing with Merge on Read table.

2022-12-23 Thread GitBox


codope commented on issue #7125:
URL: https://github.com/apache/hudi/issues/7125#issuecomment-1363825375

   We synced up on a call and looked at the clean commit metadata. The issue does 
not seem to be related to the metadata table. One thing to note is that the table 
is multi-level partitioned, i.e. there are multiple partition fields, and 
`filePathsToBeDeletedPerPartition` was empty for quite a few partitions. 
Suggested workaround: disable and then re-enable the metadata table, after which 
the commits should go through. Follow-ups:
   1. Requested to share the non-empty `filePathsToBeDeletedPerPartition`. OP 
will get back after checking with the team.
   2. We need to reproduce the issue with similar configs but multiple 
partition fields.





[GitHub] [hudi] hudi-bot commented on pull request #7544: [HUDI-5433] Fix the way we deduce the pending instants for MDT writes

2022-12-23 Thread GitBox


hudi-bot commented on PR #7544:
URL: https://github.com/apache/hudi/pull/7544#issuecomment-1363822466

   
   ## CI report:
   
   * 6bcf99cee270008dd2248a74373b8ae8a890c50c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13941)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6782: [HUDI-4911][HUDI-3301] Fixing `HoodieMetadataLogRecordReader` to avoid flushing cache for every lookup

2022-12-23 Thread GitBox


hudi-bot commented on PR #6782:
URL: https://github.com/apache/hudi/pull/6782#issuecomment-1363821452

   
   ## CI report:
   
   * 0a57ee15196e2b6978dbf74efad95cd58f6e7f13 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13923)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13931)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7544: [HUDI-5433] Fix the way we deduce the pending instants for MDT writes

2022-12-23 Thread GitBox


hudi-bot commented on PR #7544:
URL: https://github.com/apache/hudi/pull/7544#issuecomment-1363817609

   
   ## CI report:
   
   * 6bcf99cee270008dd2248a74373b8ae8a890c50c UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7540: [HUDI-5456] Flink streaming read skips uncommitted instants

2022-12-23 Thread GitBox


hudi-bot commented on PR #7540:
URL: https://github.com/apache/hudi/pull/7540#issuecomment-1363813108

   
   ## CI report:
   
   * 18923541a39e6c1ec15b82a7dec143fcc51ae5d6 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13914)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13930)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7528: [HUDI-5443] Fixing exception trying to read MOR table after `NestedSchemaPruning` rule has been applied

2022-12-23 Thread GitBox


hudi-bot commented on PR #7528:
URL: https://github.com/apache/hudi/pull/7528#issuecomment-1363813064

   
   ## CI report:
   
   * f3a439884f90500e29da0075f4d0ad7d73a484b3 UNKNOWN
   * 636b3000094521146d90c541b8cfd3b4ee6e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13929)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Updated] (HUDI-5433) Fix the way we deduce the pending instants for MDT writes

2022-12-23 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-5433:
--
Status: Patch Available  (was: In Progress)

> Fix the way we deduce the pending instants for MDT writes
> -
>
> Key: HUDI-5433
> URL: https://issues.apache.org/jira/browse/HUDI-5433
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: metadata
>Reporter: sivabalan narayanan
>Assignee: Sagar Sumit
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> We trigger compaction in the MDT only when there are no pending inflight instants apart 
> from the one that is currently updating the MDT. We use the code snippet below 
> for this. 
>  
> {code:java}
>  List pendingInstants = 
> dataMetaClient.reloadActiveTimeline().filterInflightsAndRequested()
> .findInstantsBefore(instantTime).getInstants(); {code}
> As you can see, we use "findInstantsBefore", which does not yield the right 
> results at all times.
>  
> So we need to find all inflight instants and check whether there are any other than the 
> current commit that is updating the MDT. If there are, we should defer 
> compaction.
> Impact:
> Writes to the MDT might fail if an inflight instant was missed and later 
> rolled back. Users have to disable the MDT to make progress.





[jira] [Updated] (HUDI-5433) Fix the way we deduce the pending instants for MDT writes

2022-12-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-5433:
-
Labels: pull-request-available  (was: )

> Fix the way we deduce the pending instants for MDT writes
> -
>
> Key: HUDI-5433
> URL: https://issues.apache.org/jira/browse/HUDI-5433
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: metadata
>Reporter: sivabalan narayanan
>Assignee: Sagar Sumit
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> We trigger compaction in the MDT only when there are no pending inflight instants apart 
> from the one that is currently updating the MDT. We use the code snippet below 
> for this. 
>  
> {code:java}
>  List pendingInstants = 
> dataMetaClient.reloadActiveTimeline().filterInflightsAndRequested()
> .findInstantsBefore(instantTime).getInstants(); {code}
> As you can see, we use "findInstantsBefore", which does not yield the right 
> results at all times.
>  
> So we need to find all inflight instants and check whether there are any other than the 
> current commit that is updating the MDT. If there are, we should defer 
> compaction.
> Impact:
> Writes to the MDT might fail if an inflight instant was missed and later 
> rolled back. Users have to disable the MDT to make progress.





[GitHub] [hudi] codope opened a new pull request, #7544: [HUDI-5433] Fix the way we deduce the pending instants for MDT writes

2022-12-23 Thread GitBox


codope opened a new pull request, #7544:
URL: https://github.com/apache/hudi/pull/7544

   ### Change Logs
   
   Compaction in the metadata table (MDT) is triggered only when there are no 
pending inflight instants apart from the one that is currently updating the MDT 
(call it C3). The current code only checks for inflight instants before C3. 
In fact, we should be considering all inflight instants except C3. This PR 
fixes that behavior.
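
   The difference between the two checks can be sketched with a simplified timeline. This is a minimal illustration, not the Hudi implementation: `SimpleInstant`, `pendingBefore`, and `pendingExcept` are hypothetical names, and instants are reduced to a timestamp string plus a completed flag.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical model of the check; not Hudi's timeline API.
class PendingInstantCheck {
    // Simplified instant: a sortable timestamp and whether it has completed.
    static final class SimpleInstant {
        final String timestamp;
        final boolean completed;

        SimpleInstant(String timestamp, boolean completed) {
            this.timestamp = timestamp;
            this.completed = completed;
        }
    }

    // Check described as the current behavior: pending instants strictly before `current`.
    static List<String> pendingBefore(List<SimpleInstant> timeline, String current) {
        return timeline.stream()
            .filter(i -> !i.completed)
            .filter(i -> i.timestamp.compareTo(current) < 0)
            .map(i -> i.timestamp)
            .collect(Collectors.toList());
    }

    // Check described as the fix: every pending instant other than `current`.
    static List<String> pendingExcept(List<SimpleInstant> timeline, String current) {
        return timeline.stream()
            .filter(i -> !i.completed)
            .filter(i -> !i.timestamp.equals(current))
            .map(i -> i.timestamp)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<SimpleInstant> timeline = Arrays.asList(
            new SimpleInstant("001", true),   // completed
            new SimpleInstant("003", false),  // C3, currently updating the MDT
            new SimpleInstant("004", false)); // another pending instant
        // The "before" check sees nothing pending, so compaction would proceed.
        System.out.println(pendingBefore(timeline, "003"));
        // The "except current" check finds 004, so compaction would be deferred.
        System.out.println(pendingExcept(timeline, "003"));
    }
}
```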
   
   ### Impact
   
   MDT compaction.
   
   ### Risk level (write none, low medium or high below)
   
   medium
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] hudi-bot commented on pull request #7480: [HUDI-5400] Fix read issues when Hudi-FULL schema evolution is not enabled

2022-12-23 Thread GitBox


hudi-bot commented on PR #7480:
URL: https://github.com/apache/hudi/pull/7480#issuecomment-1363745805

   
   ## CI report:
   
   * a3015b0c6ecdfbdaebdb6f539f3ba167eabb327c Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13899)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7480: [HUDI-5400] Fix read issues when Hudi-FULL schema evolution is not enabled

2022-12-23 Thread GitBox


hudi-bot commented on PR #7480:
URL: https://github.com/apache/hudi/pull/7480#issuecomment-1363741831

   
   ## CI report:
   
   * ac9854b7f378ba0968373d76033ce07f962f9a49 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13926)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13940)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] fengjian428 commented on a diff in pull request #7542: [HUDI-5469] Hive doesn't respect the space at the end of partition path, so remove it to avoid dupl…

2022-12-23 Thread GitBox


fengjian428 commented on code in PR #7542:
URL: https://github.com/apache/hudi/pull/7542#discussion_r1056143997


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/KeyGenUtils.java:
##
@@ -131,6 +131,9 @@ public static String getRecordPartitionPath(GenericRecord 
record, List p
   } else {
 if (encodePartitionPath) {
   fieldVal = PartitionPathEncodeUtils.escapePathName(fieldVal);
+} else {
+  // Hive doesn't respect the space at the end, so remove it to avoid duplicate keys error
+  fieldVal = fieldVal.trim();
 }

Review Comment:
   I haven't tested it against engines other than HMS. Maybe we can add a 
configuration for it and infer the value from HMS? WDYT
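
   One detail relevant to the diff above: the code comment speaks of the space at the end, while `String.trim()` removes whitespace at both ends. The sketch below contrasts the two. `stripTrailingSpaces` is a hypothetical helper, not part of this PR, and the thread does not say whether leading spaces are also an issue for Hive.

```java
// Hypothetical helper for illustration only.
class TrailingSpace {
    // Removes only trailing ' ' characters and leaves leading ones untouched.
    static String stripTrailingSpaces(String value) {
        int end = value.length();
        while (end > 0 && value.charAt(end - 1) == ' ') {
            end--;
        }
        return value.substring(0, end);
    }

    public static void main(String[] args) {
        String fieldVal = " 2022-12-23 ";
        // trim() drops both the leading and the trailing space.
        System.out.println("[" + fieldVal.trim() + "]");
        // The helper drops only the trailing space.
        System.out.println("[" + stripTrailingSpaces(fieldVal) + "]");
    }
}
```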






[GitHub] [hudi] fengjian428 commented on a diff in pull request #7542: [HUDI-5469] Hive doesn't respect the space at the end of partition path, so remove it to avoid dupl…

2022-12-23 Thread GitBox


fengjian428 commented on code in PR #7542:
URL: https://github.com/apache/hudi/pull/7542#discussion_r1056143997


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/KeyGenUtils.java:
##
@@ -131,6 +131,9 @@ public static String getRecordPartitionPath(GenericRecord 
record, List p
   } else {
 if (encodePartitionPath) {
   fieldVal = PartitionPathEncodeUtils.escapePathName(fieldVal);
+} else {
+  // Hive doesn't respect the space at the end, so remove it to avoid duplicate keys error
+  fieldVal = fieldVal.trim();
 }

Review Comment:
   Haven't tested it against other engines besides HMS. 






[GitHub] [hudi] fengjian428 commented on a diff in pull request #7542: [HUDI-5469] Hive doesn't respect the space at the end of partition path, so remove it to avoid dupl…

2022-12-23 Thread GitBox


fengjian428 commented on code in PR #7542:
URL: https://github.com/apache/hudi/pull/7542#discussion_r1056143997


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/KeyGenUtils.java:
##
@@ -131,6 +131,9 @@ public static String getRecordPartitionPath(GenericRecord 
record, List p
   } else {
 if (encodePartitionPath) {
   fieldVal = PartitionPathEncodeUtils.escapePathName(fieldVal);
+} else {
+  // Hive doesn't respect the space at the end, so remove it to avoid 
duplicate keys error
+  fieldVal = fieldVal.trim();
 }

Review Comment:
   Haven't tested other engines besides HMS


