[GitHub] [hudi] hudi-bot commented on pull request #6059: [HUDI-1575] Early Conflict Detection For Multi-writer

2022-07-08 Thread GitBox


hudi-bot commented on PR #6059:
URL: https://github.com/apache/hudi/pull/6059#issuecomment-1179491203

   
   ## CI report:
   
   * 58ea19aec0a87cf9567e805acb577dba7f1281bc UNKNOWN
   * eb5767f06f802074f1efd40f6696e3a313df3c69 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9800)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5937: [HUDI-4298] When reading the mor table with QUERY_TYPE_SNAPSHOT,Unabl…

2022-07-08 Thread GitBox


hudi-bot commented on PR #5937:
URL: https://github.com/apache/hudi/pull/5937#issuecomment-1179486063

   
   ## CI report:
   
   * 710bd4a41b9b884750aaedcafcbabcdddc599e15 UNKNOWN
   * 176881ae3e8411a1f49d2a69bfa1ce52befd261e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9799)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] XuQianJin-Stars closed pull request #6065: [HUDI-3503] Add call procedure for CleanCommand

2022-07-08 Thread GitBox


XuQianJin-Stars closed pull request #6065: [HUDI-3503]  Add call procedure for 
CleanCommand
URL: https://github.com/apache/hudi/pull/6065


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #6059: [HUDI-1575] Early Conflict Detection For Multi-writer

2022-07-08 Thread GitBox


hudi-bot commented on PR #6059:
URL: https://github.com/apache/hudi/pull/6059#issuecomment-1179478572

   
   ## CI report:
   
   * 58ea19aec0a87cf9567e805acb577dba7f1281bc UNKNOWN
   * 75704b9e85a30643ac667f5779c3d493c8cbadfb Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9789)
 
   * eb5767f06f802074f1efd40f6696e3a313df3c69 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9800)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5937: [HUDI-4298] When reading the mor table with QUERY_TYPE_SNAPSHOT,Unabl…

2022-07-08 Thread GitBox


hudi-bot commented on PR #5937:
URL: https://github.com/apache/hudi/pull/5937#issuecomment-1179478547

   
   ## CI report:
   
   * 710bd4a41b9b884750aaedcafcbabcdddc599e15 UNKNOWN
   * ce3ff4a5ff17b2fdebd0b5ef1ac06c46f84c59e7 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9768)
 
   * 176881ae3e8411a1f49d2a69bfa1ce52befd261e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9799)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #6059: [HUDI-1575] Early Conflict Detection For Multi-writer

2022-07-08 Thread GitBox


hudi-bot commented on PR #6059:
URL: https://github.com/apache/hudi/pull/6059#issuecomment-1179478147

   
   ## CI report:
   
   * 58ea19aec0a87cf9567e805acb577dba7f1281bc UNKNOWN
   * 75704b9e85a30643ac667f5779c3d493c8cbadfb Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9789)
 
   * eb5767f06f802074f1efd40f6696e3a313df3c69 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5937: [HUDI-4298] When reading the mor table with QUERY_TYPE_SNAPSHOT,Unabl…

2022-07-08 Thread GitBox


hudi-bot commented on PR #5937:
URL: https://github.com/apache/hudi/pull/5937#issuecomment-1179478129

   
   ## CI report:
   
   * 710bd4a41b9b884750aaedcafcbabcdddc599e15 UNKNOWN
   * ce3ff4a5ff17b2fdebd0b5ef1ac06c46f84c59e7 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9768)
 
   * 176881ae3e8411a1f49d2a69bfa1ce52befd261e UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #6066: [HUDI-4372] Enable matadata table by default for flink

2022-07-08 Thread GitBox


hudi-bot commented on PR #6066:
URL: https://github.com/apache/hudi/pull/6066#issuecomment-1179462118

   
   ## CI report:
   
   * cc7277e131b4c19c268202adc1a86d3745603696 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9797)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #6065: [HUDI-3503] Add call procedure for CleanCommand

2022-07-08 Thread GitBox


hudi-bot commented on PR #6065:
URL: https://github.com/apache/hudi/pull/6065#issuecomment-1179462114

   
   ## CI report:
   
   * 708c4225afc3106d46e6fd844d48fc46d69ebc5a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9781)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #6065: [HUDI-3503] Add call procedure for CleanCommand

2022-07-08 Thread GitBox


hudi-bot commented on PR #6065:
URL: https://github.com/apache/hudi/pull/6065#issuecomment-1179455222

   
   ## CI report:
   
   * 708c4225afc3106d46e6fd844d48fc46d69ebc5a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9781)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] jiezi2026 opened a new issue, #6070: [SUPPORT]'hoodie.datasource.write.hive_style_partitioning':'true' does not take effect in hudi-0.11.1 & spark 3.2.1

2022-07-08 Thread GitBox


jiezi2026 opened a new issue, #6070:
URL: https://github.com/apache/hudi/issues/6070

   In hudi-0.11.1, when I use spark 3.2.1 for data initialization (bulk_insert 
mode), 'hoodie.datasource.write.hive_style_partitioning':'true' does not 
take effect. But it works in upsert mode.
   
   
   
![image](https://user-images.githubusercontent.com/98273236/178086681-7b261036-dcda-4941-a47a-3c496ba47fef.png)
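
   For context, `hive_style_partitioning` only controls how the partition path is rendered on storage (`year=2022/month=07` versus `2022/07`). A minimal standalone sketch of that rendering, with illustrative names that are not Hudi's internals:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class HiveStylePartitionSketch {
    // Renders a partition path either hive-style ("col=val/...") or plain ("val/...").
    static String partitionPath(Map<String, String> fields, boolean hiveStyle) {
        return fields.entrySet().stream()
                .map(e -> hiveStyle ? e.getKey() + "=" + e.getValue() : e.getValue())
                .collect(Collectors.joining("/"));
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps partition columns in declaration order.
        Map<String, String> parts = new LinkedHashMap<>();
        parts.put("year", "2022");
        parts.put("month", "07");
        System.out.println(partitionPath(parts, true));   // year=2022/month=07
        System.out.println(partitionPath(parts, false));  // 2022/07
    }
}
```

   The bug report above is that bulk_insert produces the plain form even when the hive-style flag is set.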
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] XuQianJin-Stars commented on pull request #6065: [HUDI-3503] Add call procedure for CleanCommand

2022-07-08 Thread GitBox


XuQianJin-Stars commented on PR #6065:
URL: https://github.com/apache/hudi/pull/6065#issuecomment-1179454814

   @hudi-bot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[hudi] branch master updated: [HUDI-3500] Add call procedure for RepairsCommand (#6053)

2022-07-08 Thread forwardxu
This is an automated email from the ASF dual-hosted git repository.

forwardxu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 6566fc6625 [HUDI-3500] Add call procedure for RepairsCommand (#6053)
6566fc6625 is described below

commit 6566fc6625072b76dd121f0b28ce7b1ef11b6259
Author: superche <73096722+hechao-u...@users.noreply.github.com>
AuthorDate: Sat Jul 9 09:29:14 2022 +0800

[HUDI-3500] Add call procedure for RepairsCommand (#6053)
---
 .../org/apache/spark/sql/hudi/DeDupeType.scala |  28 ++
 .../org/apache/spark/sql/hudi/DedupeSparkJob.scala | 245 ++
 .../org/apache/spark/sql/hudi/SparkHelpers.scala   | 134 ++
 .../hudi/command/procedures/HoodieProcedures.scala |   5 +
 .../RepairAddpartitionmetaProcedure.scala  |  89 
 .../RepairCorruptedCleanFilesProcedure.scala   |  86 
 .../procedures/RepairDeduplicateProcedure.scala|  86 
 .../RepairMigratePartitionMetaProcedure.scala  | 112 +
 .../RepairOverwriteHoodiePropsProcedure.scala  |  89 
 .../src/test/resources/table-config.properties |  21 +
 .../sql/hudi/procedure/TestRepairsProcedure.scala  | 507 +
 11 files changed, 1402 insertions(+)

diff --git 
a/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/DeDupeType.scala
 
b/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/DeDupeType.scala
new file mode 100644
index 00..93cec470ec
--- /dev/null
+++ 
b/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/DeDupeType.scala
@@ -0,0 +1,28 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hudi
+
+object DeDupeType extends Enumeration {
+
+  type dedupeType = Value
+
+  val INSERT_TYPE = Value("insert_type")
+  val UPDATE_TYPE = Value("update_type")
+  val UPSERT_TYPE = Value("upsert_type")
+}
diff --git 
a/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/DedupeSparkJob.scala
 
b/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/DedupeSparkJob.scala
new file mode 100644
index 00..b6f610e7d7
--- /dev/null
+++ 
b/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/DedupeSparkJob.scala
@@ -0,0 +1,245 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hudi
+
+import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}
+import org.apache.hudi.common.fs.FSUtils
+import org.apache.hudi.common.model.{HoodieBaseFile, HoodieRecord}
+import org.apache.hudi.common.table.HoodieTableMetaClient
+import org.apache.hudi.common.table.view.HoodieTableFileSystemView
+import org.apache.hudi.exception.HoodieException
+import org.apache.log4j.Logger
+import org.apache.spark.sql.{DataFrame, Row, SQLContext}
+
+import java.util.stream.Collectors
+import scala.collection.JavaConversions._
+import scala.collection.mutable.{Buffer, HashMap, HashSet, ListBuffer}
+
+/**
+  * Spark job to de-duplicate data present in a partition path
+  */
+class DedupeSparkJob(basePath: String,
+ duplicatedPartitionPath: String,
+ repairOutputPath: String,
+ sqlContext: SQLContext,
+ fs: FileSystem,
+ dedupeType: DeDupeType.Value) {
+
+  val sparkHelper = new SparkHelper

[GitHub] [hudi] XuQianJin-Stars merged pull request #6053: [HUDI-3500] Add call procedure for RepairsCommand

2022-07-08 Thread GitBox


XuQianJin-Stars merged PR #6053:
URL: https://github.com/apache/hudi/pull/6053


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-1822) [Umbrella] Multi Modal Indexing

2022-07-08 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-1822:
-
Fix Version/s: 0.11.0
   (was: 0.12.0)

> [Umbrella] Multi Modal Indexing
> ---
>
> Key: HUDI-1822
> URL: https://issues.apache.org/jira/browse/HUDI-1822
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: index
>Reporter: satish
>Assignee: Alexey Kudinkin
>Priority: Blocker
>  Labels: hudi-umbrellas, pull-request-available
> Fix For: 0.11.0
>
>
> RFC-27 umbrella ticket. Goal is to support global range index to improve 
> query planning time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (HUDI-1822) [Umbrella] Multi Modal Indexing

2022-07-08 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu closed HUDI-1822.

 Reviewers: Ethan Guo, Sagar Sumit
Resolution: Done

> [Umbrella] Multi Modal Indexing
> ---
>
> Key: HUDI-1822
> URL: https://issues.apache.org/jira/browse/HUDI-1822
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: index
>Reporter: satish
>Assignee: Alexey Kudinkin
>Priority: Blocker
>  Labels: hudi-umbrellas, pull-request-available
> Fix For: 0.11.0
>
>
> RFC-27 umbrella ticket. Goal is to support global range index to improve 
> query planning time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] hudi-bot commented on pull request #6066: [HUDI-4372] Enable matadata table by default for flink

2022-07-08 Thread GitBox


hudi-bot commented on PR #6066:
URL: https://github.com/apache/hudi/pull/6066#issuecomment-1179441018

   
   ## CI report:
   
   * fcad9be8d245787907953977cd3567a03c842cdc Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9787)
 
   * cc7277e131b4c19c268202adc1a86d3745603696 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9797)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Closed] (HUDI-2429) [UMBRELLA] Comprehensive Schema evolution in Hudi

2022-07-08 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu closed HUDI-2429.

Resolution: Done

> [UMBRELLA] Comprehensive Schema evolution in Hudi
> -
>
> Key: HUDI-2429
> URL: https://issues.apache.org/jira/browse/HUDI-2429
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: Common Core
>Reporter: tao meng
>Assignee: tao meng
>Priority: Blocker
>  Labels: hudi-umbrellas, pull-request-available
> Fix For: 0.11.0
>
>
> [https://cwiki.apache.org/confluence/display/HUDI/RFC+-+33++Hudi+supports+more+comprehensive+Schema+Evolution]
>  
> Support comprehensive schema evolution in Hudi
>  * rename cols
>  * drop cols
>  * reorder cols
>  * re-add cols



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-2429) [UMBRELLA] Comprehensive Schema evolution in Hudi

2022-07-08 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2429:
-
Fix Version/s: 0.11.0
   (was: 0.12.0)

> [UMBRELLA] Comprehensive Schema evolution in Hudi
> -
>
> Key: HUDI-2429
> URL: https://issues.apache.org/jira/browse/HUDI-2429
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: Common Core
>Reporter: tao meng
>Assignee: tao meng
>Priority: Blocker
>  Labels: hudi-umbrellas, pull-request-available
> Fix For: 0.11.0
>
>
> [https://cwiki.apache.org/confluence/display/HUDI/RFC+-+33++Hudi+supports+more+comprehensive+Schema+Evolution]
>  
> Support comprehensive schema evolution in Hudi
>  * rename cols
>  * drop cols
>  * reorder cols
>  * re-add cols



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] hudi-bot commented on pull request #6066: [HUDI-4372] Enable matadata table by default for flink

2022-07-08 Thread GitBox


hudi-bot commented on PR #6066:
URL: https://github.com/apache/hudi/pull/6066#issuecomment-1179439908

   
   ## CI report:
   
   * fcad9be8d245787907953977cd3567a03c842cdc Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9787)
 
   * cc7277e131b4c19c268202adc1a86d3745603696 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-4015) Integ test Infra

2022-07-08 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-4015:
-
Fix Version/s: (was: 0.12.0)

> Integ test Infra
> 
>
> Key: HUDI-4015
> URL: https://issues.apache.org/jira/browse/HUDI-4015
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: tests-ci
>Reporter: sivabalan narayanan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-4324) Remove useJdbc config from meta sync tools

2022-07-08 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu reassigned HUDI-4324:


Assignee: Raymond Xu  (was: Jian Feng)

> Remove useJdbc config from meta sync tools
> --
>
> Key: HUDI-4324
> URL: https://issues.apache.org/jira/browse/HUDI-4324
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: meta-sync
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Blocker
> Fix For: 0.12.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-4137) Implement SnowflakeSyncTool to support Hudi to Snowflake Integration

2022-07-08 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-4137:
-
Status: In Progress  (was: Open)

> Implement SnowflakeSyncTool to support Hudi to Snowflake Integration
> 
>
> Key: HUDI-4137
> URL: https://issues.apache.org/jira/browse/HUDI-4137
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Vinoth Govindarajan
>Assignee: Vinoth Govindarajan
>Priority: Major
>  Labels: integration, pull-request-available
> Fix For: 0.12.0
>
>
> Implement SnowflakeSyncTool similar to the BigQuerySyncTool to support Hudi 
> to Snowflake Integration



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] xushiyan commented on a diff in pull request #5855: [HUDI-4249] Fixing in-memory `HoodieData` implementation to operate lazily

2022-07-08 Thread GitBox


xushiyan commented on code in PR #5855:
URL: https://github.com/apache/hudi/pull/5855#discussion_r917164576


##
hudi-common/src/main/java/org/apache/hudi/common/util/CollectionUtils.java:
##
@@ -107,6 +108,19 @@ public static <K, V> HashMap<K, V> combine(Map<K, V> one, Map<K, V> another) {
 return combined;
   }
 
+  /**
+   * Combines provided {@link Map}s into one, returning new instance of {@link 
HashMap}.
+   *
+   * NOTE: That values associated with overlapping keys from the second map, 
will override
+   *   values from the first one
+   */
+  public static <K, V> HashMap<K, V> combine(Map<K, V> one, Map<K, V> another, BiFunction<V, V, V> merge) {

Review Comment:
   UT cover this?
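
   The merge-enabled `combine` under review can be sketched with plain `java.util` types; this mirrors the signature shown in the diff but is an illustrative standalone version, not Hudi's implementation:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

public class CombineSketch {
    // Combines two maps into a new HashMap; overlapping keys are resolved
    // by the supplied merge function: (value from first, value from second) -> merged.
    static <K, V> HashMap<K, V> combine(Map<K, V> one, Map<K, V> another,
                                        BiFunction<V, V, V> merge) {
        HashMap<K, V> combined = new HashMap<>(one);
        another.forEach((k, v) -> combined.merge(k, v, merge));
        return combined;
    }

    public static void main(String[] args) {
        Map<String, Integer> a = Map.of("x", 1, "y", 2);
        Map<String, Integer> b = Map.of("y", 10, "z", 3);
        HashMap<String, Integer> merged = combine(a, b, Integer::sum);
        System.out.println(merged.get("y")); // 12 -- 2 + 10 via Integer::sum
    }
}
```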



##
hudi-common/src/main/java/org/apache/hudi/common/data/HoodieListPairData.java:
##
@@ -0,0 +1,175 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.data;
+
+import org.apache.hudi.common.function.SerializableBiFunction;
+import org.apache.hudi.common.function.SerializableFunction;
+import org.apache.hudi.common.function.SerializablePairFunction;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.common.util.collection.Pair;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.function.Function;
+import java.util.stream.Collector;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static 
org.apache.hudi.common.function.FunctionWrapper.throwingMapToPairWrapper;
+import static 
org.apache.hudi.common.function.FunctionWrapper.throwingMapWrapper;
+
+/**
+ * In-memory implementation of {@link HoodiePairData} holding internally a 
{@link Stream} of {@link Pair}s.
+ *
+ * NOTE: This is an in-memory counterpart for {@code HoodieJavaPairRDD}, and 
it strives to provide
+ *   similar semantic as RDD container -- all intermediate (non-terminal, 
not de-referencing
+ *   the stream like "collect", "groupBy", etc) operations are executed 
*lazily*.
+ *   This allows to make sure that compute/memory churn is minimal since 
only necessary
+ *   computations will ultimately be performed.
+ *
+ * @param <K> type of the key in the pair
+ * @param <V> type of the value in the pair
+ */
+public class HoodieListPairData<K, V> extends HoodiePairData<K, V> {

Review Comment:
   add UT for this?
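
   The laziness contract described in the class javadoc can be demonstrated with a plain `java.util.stream.Stream` (a standalone sketch, not the Hudi class itself): intermediate operations such as `map` perform no work until a terminal operation runs.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

public class LazyStreamDemo {
    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();

        // Intermediate operation only: the mapping function has not run yet.
        Stream<Integer> doubled = Stream.of(1, 2, 3)
                .map(n -> { calls.incrementAndGet(); return n * 2; });
        System.out.println(calls.get()); // 0

        // The terminal operation pulls each element through the pipeline once.
        int sum = doubled.mapToInt(Integer::intValue).sum();
        System.out.println(sum + " after " + calls.get() + " calls"); // 12 after 3 calls
    }
}
```

   This is the behavior the PR brings the in-memory `HoodieData` containers in line with, matching the RDD semantics.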



##
hudi-common/src/main/java/org/apache/hudi/common/data/HoodieListPairData.java:
##
@@ -0,0 +1,175 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.data;
+
+import org.apache.hudi.common.function.SerializableBiFunction;
+import org.apache.hudi.common.function.SerializableFunction;
+import org.apache.hudi.common.function.SerializablePairFunction;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.common.util.collection.Pair;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.function.Function;
+import java.util.stream.Collector;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static 
org.apache.hudi.common.function.FunctionWrapper.throwingMapToPairWrapper;
+import static 
org.apache.hudi.common.function.FunctionWrapper.throwingMapWrapper;
+
+/**
+ * In-memory implementation of {@link HoodiePairData} holding internally a 
{@link Stream} of {@link Pair}s.
+ *
+ * NOTE: This i

[GitHub] [hudi] hudi-bot commented on pull request #5855: [HUDI-4249] Fixing in-memory `HoodieData` implementation to operate lazily

2022-07-08 Thread GitBox


hudi-bot commented on PR #5855:
URL: https://github.com/apache/hudi/pull/5855#issuecomment-1179420559

   
   ## CI report:
   
   * 17041eb177d92a050fee2f9c26a58b61066a730f UNKNOWN
   * fd4df3739b15fb1fccea78644aaf0c53e2549bed Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9796)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5855: [HUDI-4249] Fixing in-memory `HoodieData` implementation to operate lazily

2022-07-08 Thread GitBox


hudi-bot commented on PR #5855:
URL: https://github.com/apache/hudi/pull/5855#issuecomment-1179394918

   
   ## CI report:
   
   * 17041eb177d92a050fee2f9c26a58b61066a730f UNKNOWN
   * 348fb378321389b8c1b9d2add9864e2b0fb52975 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9775)
 
   * fd4df3739b15fb1fccea78644aaf0c53e2549bed Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9796)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5855: [HUDI-4249] Fixing in-memory `HoodieData` implementation to operate lazily

2022-07-08 Thread GitBox


hudi-bot commented on PR #5855:
URL: https://github.com/apache/hudi/pull/5855#issuecomment-1179364304

   
   ## CI report:
   
   * 17041eb177d92a050fee2f9c26a58b61066a730f UNKNOWN
   * 348fb378321389b8c1b9d2add9864e2b0fb52975 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9775)
 
   * fd4df3739b15fb1fccea78644aaf0c53e2549bed UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #4915: [HUDI-3764] Allow loading external configs while querying Hudi tables with Spark

2022-07-08 Thread GitBox


hudi-bot commented on PR #4915:
URL: https://github.com/apache/hudi/pull/4915#issuecomment-1179356740

   
   ## CI report:
   
   * 065cd528826007f6f40154ab75d5a769447823f4 UNKNOWN
   * c7060fd1e5e5b479f97d6fe5c307d1d6449ba8ed Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9368)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #4915: [HUDI-3764] Allow loading external configs while querying Hudi tables with Spark

2022-07-08 Thread GitBox


hudi-bot commented on PR #4915:
URL: https://github.com/apache/hudi/pull/4915#issuecomment-1179354046

   
   ## CI report:
   
   * 065cd528826007f6f40154ab75d5a769447823f4 UNKNOWN
   * c7060fd1e5e5b479f97d6fe5c307d1d6449ba8ed Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9368)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] zhedoubushishi commented on pull request #4915: [HUDI-3764] Allow loading external configs while querying Hudi tables with Spark

2022-07-08 Thread GitBox


zhedoubushishi commented on PR #4915:
URL: https://github.com/apache/hudi/pull/4915#issuecomment-1179338744

   @hudi-bot run azure





[GitHub] [hudi] desaismi opened a new issue, #6069: [SUPPORT] /hoodie/temp Folder and contents not getting deleted

2022-07-08 Thread GitBox


desaismi opened a new issue, #6069:
URL: https://github.com/apache/hudi/issues/6069

   **_Tips before filing an issue_**
   
   - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
   
   - Join the mailing list to engage in conversations and get faster support at 
dev-subscr...@hudi.apache.org.
   
   - If you have triaged this as a bug, then file an 
[issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
   
   **Describe the problem you faced**
   
   A clear and concise description of the problem.
   Upon writing to tables in S3 using Hudi, Hudi creates `.hoodie/.temp/` 
artifacts in the table's metadata folder. After the write is complete, the temp 
artifacts are deleted along with the `.temp/` folder. For a couple of our 
tables, we have noticed the temp artifacts never got deleted. We want to figure 
out why this occurred, and whether it is safe to manually delete the artifacts 
remaining from past writes.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   Not sure; we write to the same table every 10 minutes consistently and have 
seen this occur once for a couple of tables.
   
   **Expected behavior**
   
   We expect that the temp artifacts are deleted after each write to a table
   
   **Environment Description**
   
   * Hudi version : 0.8.0
   
   * Spark version : Spark 2.4.7
   
   * Hive version : Hive 2.3.7
   
   * Hadoop version : Amazon 2.10.1
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : No
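
   If the leftover files do need manual cleanup, a sensible first step is 
enumerating them before deleting anything. A minimal, Hudi-free sketch that 
filters an object listing down to a table's `.hoodie/.temp/` artifacts (the 
bucket layout and key names below are hypothetical examples, not the reporter's 
actual paths):

```python
def leftover_temp_artifacts(keys, table_prefix):
    """Return listing keys that live under <table_prefix>/.hoodie/.temp/."""
    marker = f"{table_prefix.rstrip('/')}/.hoodie/.temp/"
    return [k for k in keys if k.startswith(marker)]

# Example listing, as it might come back from an S3 list-objects call:
keys = [
    "warehouse/orders/.hoodie/.temp/20220708/part-0001",
    "warehouse/orders/.hoodie/hoodie.properties",
    "warehouse/orders/part=1/file.parquet",
]
stale = leftover_temp_artifacts(keys, "warehouse/orders")
```

   The function only selects candidates; whether deleting them is safe for a 
given table state is exactly the question raised above.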
   
   





[jira] [Updated] (HUDI-2531) [UMBRELLA] Support Dataset APIs in writer paths

2022-07-08 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2531:
-
Fix Version/s: (was: 0.12.0)

> [UMBRELLA] Support Dataset APIs in writer paths
> ---
>
> Key: HUDI-2531
> URL: https://issues.apache.org/jira/browse/HUDI-2531
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: spark
>Reporter: Raymond Xu
>Assignee: XiaoyuGeng
>Priority: Major
>  Labels: hudi-umbrellas
>
> To make use of Dataset APIs in writer paths instead of RDD.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (HUDI-1236) [UMBRELLA] Integ Test suite infra

2022-07-08 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu closed HUDI-1236.

Resolution: Done

> [UMBRELLA] Integ Test suite infra 
> --
>
> Key: HUDI-1236
> URL: https://issues.apache.org/jira/browse/HUDI-1236
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: Testing, tests-ci
>Affects Versions: 0.9.0
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Blocker
>  Labels: hudi-umbrellas
> Fix For: 0.12.0
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Long running test suite that checks for correctness across all deployment 
> modes (batch/streaming) and writers (deltastreamer/spark) and readers (hive, 
> presto, spark)





[GitHub] [hudi] hudi-bot commented on pull request #6013: [HUDI-4089] Support HMS for flink HoodieCatalog

2022-07-08 Thread GitBox


hudi-bot commented on PR #6013:
URL: https://github.com/apache/hudi/pull/6013#issuecomment-1179313622

   
   ## CI report:
   
   * 4cb269cf6fadc020ed8e512d673d3435a69e4740 UNKNOWN
   * 0c113baae795ddf7f884b868dcde0ddc3f8adb92 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9792)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] xiarixiaoyao commented on a diff in pull request #6013: [HUDI-4089] Support HMS for flink HoodieCatalog

2022-07-08 Thread GitBox


xiarixiaoyao commented on code in PR #6013:
URL: https://github.com/apache/hudi/pull/6013#discussion_r917097007


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HoodieHiveCatalog.java:
##
@@ -0,0 +1,894 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.catalog;
+
+import org.apache.hudi.common.fs.FSUtils;
+import org.apache.hudi.common.model.HoodieFileFormat;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.TableSchemaResolver;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.configuration.FlinkOptions;
+import org.apache.hudi.exception.HoodieCatalogException;
+import org.apache.hudi.hadoop.utils.HoodieInputFormatUtils;
+import org.apache.hudi.sync.common.util.ConfigUtils;
+import org.apache.hudi.table.format.FilePathUtils;
+import org.apache.hudi.util.AvroSchemaConverter;
+import org.apache.hudi.util.StreamerUtil;
+
+import org.apache.avro.Schema;
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.sql.parser.hive.ddl.SqlAlterHiveDatabase;
+import org.apache.flink.sql.parser.hive.ddl.SqlAlterHiveDatabaseOwner;
+import org.apache.flink.sql.parser.hive.ddl.SqlCreateHiveDatabase;
+import org.apache.flink.table.catalog.AbstractCatalog;
+import org.apache.flink.table.catalog.CatalogBaseTable;
+import org.apache.flink.table.catalog.CatalogDatabase;
+import org.apache.flink.table.catalog.CatalogDatabaseImpl;
+import org.apache.flink.table.catalog.CatalogFunction;
+import org.apache.flink.table.catalog.CatalogPartition;
+import org.apache.flink.table.catalog.CatalogPartitionSpec;
+import org.apache.flink.table.catalog.CatalogPropertiesUtil;
+import org.apache.flink.table.catalog.CatalogTable;
+import org.apache.flink.table.catalog.CatalogView;
+import org.apache.flink.table.catalog.ObjectPath;
+import org.apache.flink.table.catalog.exceptions.CatalogException;
+import org.apache.flink.table.catalog.exceptions.DatabaseAlreadyExistException;
+import org.apache.flink.table.catalog.exceptions.DatabaseNotEmptyException;
+import org.apache.flink.table.catalog.exceptions.DatabaseNotExistException;
+import org.apache.flink.table.catalog.exceptions.FunctionAlreadyExistException;
+import org.apache.flink.table.catalog.exceptions.FunctionNotExistException;
+import 
org.apache.flink.table.catalog.exceptions.PartitionAlreadyExistsException;
+import org.apache.flink.table.catalog.exceptions.PartitionNotExistException;
+import org.apache.flink.table.catalog.exceptions.PartitionSpecInvalidException;
+import org.apache.flink.table.catalog.exceptions.TableAlreadyExistException;
+import org.apache.flink.table.catalog.exceptions.TableNotExistException;
+import org.apache.flink.table.catalog.exceptions.TableNotPartitionedException;
+import org.apache.flink.table.catalog.exceptions.TablePartitionedException;
+import org.apache.flink.table.catalog.stats.CatalogColumnStatistics;
+import org.apache.flink.table.catalog.stats.CatalogTableStatistics;
+import org.apache.flink.table.expressions.Expression;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.IMetaStoreClient;
+import org.apache.hadoop.hive.metastore.api.AlreadyExistsException;
+import org.apache.hadoop.hive.metastore.api.Database;
+import org.apache.hadoop.hive.metastore.api.FieldSchema;
+import org.apache.hadoop.hive.metastore.api.InvalidOperationException;
+import org.apache.hadoop.hive.metastore.api.NoSuchObjectException;
+import org.apache.hadoop.hive.metastore.api.PrincipalType;
+import org.apache.hadoop.hive.metastore.api.SerDeInfo;
+import org.apache.hadoop.hive.metastore.api.StorageDescriptor;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.metastore.api.UnknownDBException;
+import org.apache.hadoop.hive.ql.metadata.Hive;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.apache.thrift.TException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.

[jira] [Updated] (HUDI-3039) [Umbrella] Bucket Index

2022-07-08 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3039:
-
Fix Version/s: 0.11.0
   (was: 0.12.0)

> [Umbrella] Bucket Index
> ---
>
> Key: HUDI-3039
> URL: https://issues.apache.org/jira/browse/HUDI-3039
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: index
>Reporter: XiaoyuGeng
>Assignee: XiaoyuGeng
>Priority: Major
> Fix For: 0.11.0
>
>
> RFC-29 umbrella ticket.





[jira] [Closed] (HUDI-3039) [Umbrella] Bucket Index

2022-07-08 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu closed HUDI-3039.

Resolution: Done

> [Umbrella] Bucket Index
> ---
>
> Key: HUDI-3039
> URL: https://issues.apache.org/jira/browse/HUDI-3039
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: index
>Reporter: XiaoyuGeng
>Assignee: XiaoyuGeng
>Priority: Major
> Fix For: 0.11.0
>
>
> RFC-29 umbrella ticket.





[GitHub] [hudi] xiarixiaoyao commented on a diff in pull request #6013: [HUDI-4089] Support HMS for flink HoodieCatalog

2022-07-08 Thread GitBox


xiarixiaoyao commented on code in PR #6013:
URL: https://github.com/apache/hudi/pull/6013#discussion_r917090371


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HoodieHiveCatalog.java:
##
@@ -0,0 +1,894 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.catalog;
+
+import org.apache.hudi.common.fs.FSUtils;
+import org.apache.hudi.common.model.HoodieFileFormat;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.TableSchemaResolver;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.configuration.FlinkOptions;
+import org.apache.hudi.exception.HoodieCatalogException;
+import org.apache.hudi.hadoop.utils.HoodieInputFormatUtils;
+import org.apache.hudi.sync.common.util.ConfigUtils;
+import org.apache.hudi.table.format.FilePathUtils;
+import org.apache.hudi.util.AvroSchemaConverter;
+import org.apache.hudi.util.StreamerUtil;
+
+import org.apache.avro.Schema;
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.sql.parser.hive.ddl.SqlAlterHiveDatabase;
+import org.apache.flink.sql.parser.hive.ddl.SqlAlterHiveDatabaseOwner;
+import org.apache.flink.sql.parser.hive.ddl.SqlCreateHiveDatabase;
+import org.apache.flink.table.catalog.AbstractCatalog;
+import org.apache.flink.table.catalog.CatalogBaseTable;
+import org.apache.flink.table.catalog.CatalogDatabase;
+import org.apache.flink.table.catalog.CatalogDatabaseImpl;
+import org.apache.flink.table.catalog.CatalogFunction;
+import org.apache.flink.table.catalog.CatalogPartition;
+import org.apache.flink.table.catalog.CatalogPartitionSpec;
+import org.apache.flink.table.catalog.CatalogPropertiesUtil;
+import org.apache.flink.table.catalog.CatalogTable;
+import org.apache.flink.table.catalog.CatalogView;
+import org.apache.flink.table.catalog.ObjectPath;
+import org.apache.flink.table.catalog.exceptions.CatalogException;
+import org.apache.flink.table.catalog.exceptions.DatabaseAlreadyExistException;
+import org.apache.flink.table.catalog.exceptions.DatabaseNotEmptyException;
+import org.apache.flink.table.catalog.exceptions.DatabaseNotExistException;
+import org.apache.flink.table.catalog.exceptions.FunctionAlreadyExistException;
+import org.apache.flink.table.catalog.exceptions.FunctionNotExistException;
+import 
org.apache.flink.table.catalog.exceptions.PartitionAlreadyExistsException;
+import org.apache.flink.table.catalog.exceptions.PartitionNotExistException;
+import org.apache.flink.table.catalog.exceptions.PartitionSpecInvalidException;
+import org.apache.flink.table.catalog.exceptions.TableAlreadyExistException;
+import org.apache.flink.table.catalog.exceptions.TableNotExistException;
+import org.apache.flink.table.catalog.exceptions.TableNotPartitionedException;
+import org.apache.flink.table.catalog.exceptions.TablePartitionedException;
+import org.apache.flink.table.catalog.stats.CatalogColumnStatistics;
+import org.apache.flink.table.catalog.stats.CatalogTableStatistics;
+import org.apache.flink.table.expressions.Expression;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.IMetaStoreClient;
+import org.apache.hadoop.hive.metastore.api.AlreadyExistsException;
+import org.apache.hadoop.hive.metastore.api.Database;
+import org.apache.hadoop.hive.metastore.api.FieldSchema;
+import org.apache.hadoop.hive.metastore.api.InvalidOperationException;
+import org.apache.hadoop.hive.metastore.api.NoSuchObjectException;
+import org.apache.hadoop.hive.metastore.api.PrincipalType;
+import org.apache.hadoop.hive.metastore.api.SerDeInfo;
+import org.apache.hadoop.hive.metastore.api.StorageDescriptor;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.metastore.api.UnknownDBException;
+import org.apache.hadoop.hive.ql.metadata.Hive;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.apache.thrift.TException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.

[GitHub] [hudi] xiarixiaoyao commented on pull request #6046: [HUDI-4363] Support Clustering row writer to improve performance

2022-07-08 Thread GitBox


xiarixiaoyao commented on PR #6046:
URL: https://github.com/apache/hudi/pull/6046#issuecomment-1179289240

   nice work!
   





[hudi] branch master updated (fc8d96246a -> b686c07407)

2022-07-08 Thread mengtao
This is an automated email from the ASF dual-hosted git repository.

mengtao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from fc8d96246a [HUDI-4335] Bug fixes in AWSGlueCatalogSyncClient post 
schema evolution. (#5995)
 add b686c07407 [HUDI-4276] Reconcile schema-inject null values for missing 
fields and add new fields (#6017)

No new revisions were added by this update.

Summary of changes:
 .../apache/hudi/client/BaseHoodieWriteClient.java  | 17 +++--
 .../table/action/commit/HoodieMergeHelper.java |  2 +-
 .../scala/org/apache/hudi/HoodieSparkUtils.scala   |  9 +--
 .../java/org/apache/hudi/avro/HoodieAvroUtils.java | 42 ++--
 .../hudi/common/config/HoodieCommonConfig.java |  7 ++
 .../table/log/AbstractHoodieLogRecordReader.java   |  4 +-
 .../schema/action/InternalSchemaMerger.java| 10 ++-
 .../schema/utils/AvroSchemaEvolutionUtils.java | 74 ++--
 .../internal/schema/utils/InternalSchemaUtils.java |  7 +-
 .../org/apache/hudi/avro/TestHoodieAvroUtils.java  | 30 +
 .../schema/utils/TestAvroSchemaEvolutionUtils.java | 78 ++
 .../scala/org/apache/hudi/DataSourceOptions.scala  |  7 +-
 .../org/apache/hudi/HoodieSparkSqlWriter.scala | 24 +--
 .../org/apache/hudi/TestHoodieSparkUtils.scala |  4 +-
 .../org/apache/spark/sql/hudi/TestSpark3DDL.scala  | 68 ++-
 15 files changed, 273 insertions(+), 110 deletions(-)



[GitHub] [hudi] xiarixiaoyao merged pull request #6017: [HUDI-4276] Reconcile schema-inject null values for missing fields and add new fields

2022-07-08 Thread GitBox


xiarixiaoyao merged PR #6017:
URL: https://github.com/apache/hudi/pull/6017





[GitHub] [hudi] xiarixiaoyao commented on pull request #6017: [HUDI-4276] Reconcile schema-inject null values for missing fields and add new fields

2022-07-08 Thread GitBox


xiarixiaoyao commented on PR #6017:
URL: https://github.com/apache/hudi/pull/6017#issuecomment-1179284265

   ci passed





[GitHub] [hudi] parisni opened a new issue, #6068: [SUPPORT] Partition prunning broken with metadata disable

2022-07-08 Thread GitBox


parisni opened a new issue, #6068:
URL: https://github.com/apache/hudi/issues/6068

   hudi 0.11.1
   spark 3.2.1
   ---
   
   I see a huge performance drop when disabling the metadata table at read time.
   Here is a reproducible example with 2k partitions (spotted in production 
with 40k partitions).
   
   ```
   basePath = "/tmp/test_table"
   df = spark.range(1,2000).selectExpr("id", "id as part", "id as combine")
   hudi_options = {
   "hoodie.table.name": "test_table",
   "hoodie.datasource.write.recordkey.field": "id",
   "hoodie.datasource.write.partitionpath.field": "part",
   "hoodie.datasource.write.table.name": "test_table",
   "hoodie.datasource.write.operation": "bulk_insert",
   "hoodie.datasource.write.precombine.field": "combine",
   "hoodie.datasource.write.keygenerator.class": 
"org.apache.hudi.keygen.ComplexKeyGenerator",
   "hoodie.datasource.write.hive_style_partitioning": "true",
   "hoodie.datasource.hive_sync.enable": "false",
   "hoodie.datasource.write.keygenerator.class": 
"org.apache.hudi.keygen.ComplexKeyGenerator",
   "hoodie.metadata.enable": "true",
   }
   
(df.write.format("hudi").options(**hudi_options).mode("overwrite").save(basePath))
   ```
   
   Then try both (restating spark between two tests)
   ```
   
spark.read.format("hudi").option("hoodie.metadata.enable","true").load(basePath).filter("part=1").show()
   
spark.read.format("hudi").option("hoodie.metadata.enable","false").load(basePath).filter("part=1").show()
   ```
   
   The former is fast, while the latter is as slow as reading the whole table 
(no partition pruning).
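
   The gap is plausible given how partition discovery works: with the metadata 
table, partition listing is a single indexed lookup, while without it the 
reader falls back to enumerating every partition path before pruning. A 
Hudi-free simulation of the two access patterns (all names are illustrative):

```python
# Contrast an indexed partition lookup (metadata table on) with a full
# listing followed by a filter (metadata table off).

def files_with_index(partition_index, part):
    # One keyed lookup, independent of the number of partitions.
    return partition_index.get(f"part={part}", [])

def files_without_index(partition_index, part):
    # Every partition is enumerated before the filter can prune.
    return [f
            for p, files in partition_index.items()   # full listing
            if p == f"part={part}"
            for f in files]

# 2k partitions with one file each, mirroring the reproduction above.
partition_index = {f"part={i}": [f"part={i}/f.parquet"] for i in range(1, 2000)}
```

   Both calls return the same files; the difference is that the second touches 
all 2k partitions to do it, which matches the slowdown reported above.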





[GitHub] [hudi] hudi-bot commented on pull request #6013: [HUDI-4089] Support HMS for flink HoodieCatalog

2022-07-08 Thread GitBox


hudi-bot commented on PR #6013:
URL: https://github.com/apache/hudi/pull/6013#issuecomment-1179233985

   
   ## CI report:
   
   * 185ec5470953fc3420fb85c8d72c40bbf317f995 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9791)
 
   * 4cb269cf6fadc020ed8e512d673d3435a69e4740 UNKNOWN
   * 0c113baae795ddf7f884b868dcde0ddc3f8adb92 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9792)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6013: [HUDI-4089] Support HMS for flink HoodieCatalog

2022-07-08 Thread GitBox


hudi-bot commented on PR #6013:
URL: https://github.com/apache/hudi/pull/6013#issuecomment-1179227887

   
   ## CI report:
   
   * 8032fcaf15f8b246c8652ed5df49fa01564e2649 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9790)
 
   * 185ec5470953fc3420fb85c8d72c40bbf317f995 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9791)
 
   * 4cb269cf6fadc020ed8e512d673d3435a69e4740 UNKNOWN
   * 0c113baae795ddf7f884b868dcde0ddc3f8adb92 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9792)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] zhedoubushishi commented on pull request #4915: [HUDI-3764] Allow loading external configs while querying Hudi tables with Spark

2022-07-08 Thread GitBox


zhedoubushishi commented on PR #4915:
URL: https://github.com/apache/hudi/pull/4915#issuecomment-1179223141

   @hudi-bot run azure





[GitHub] [hudi] fengjian428 commented on issue #6038: [SUPPORT] MOR taking more time than COW using HoodieJavaWriteClient

2022-07-08 Thread GitBox


fengjian428 commented on issue #6038:
URL: https://github.com/apache/hudi/issues/6038#issuecomment-1179203197

   Already discussed with @tommss via Slack; I recommend using SparkWriteClient.
   @tommss do you have any update on this?





[GitHub] [hudi] fengjian428 opened a new issue, #6067: [SUPPORT] what the Incremental query should be ?

2022-07-08 Thread GitBox


fengjian428 opened a new issue, #6067:
URL: https://github.com/apache/hudi/issues/6067

   
   **Describe the problem you faced**
   
   Referring to the definition at 
https://hudi.apache.org/docs/quick-start-guide#incremental-query, I thought an 
incremental query could return the full change-data history, but after using 
it, I found it only returns one version per file ID. Should we improve this?
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. create table hudi_cow (
 id int,
 name string,
 price double,
 ts long,
 par string
   ) using hudi
   tblproperties (
 type = 'cow',
 primaryKey = 'id',
 preCombineField = 'ts'
   )
   partitioned by (par)
   location 'basepath';
   2. 
   insert into hudi_cow select 1, 'a1', 20, 1000,'p1';
   insert into hudi_cow select 2, 'b1', 10, 344,'p1';
   insert into hudi_cow select 1, 'a3', 21, 2000,'p1';
   
   3. val hudi_df_begin = spark.read.format("hudi").
 option(DataSourceReadOptions.QUERY_TYPE_OPT_KEY, 
DataSourceReadOptions.QUERY_TYPE_INCREMENTAL_OPT_VAL).
 option(DataSourceReadOptions.BEGIN_INSTANTTIME_OPT_KEY, "0").
 load("basepath/*")
   
   **Expected behavior**
   
   It should return all 3 records, but only two records were returned.
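
   For context, the result is consistent with Hudi's incremental-query 
semantics: it returns the latest merged value of each record key that changed 
after the begin instant, not every intermediate version. A Hudi-free sketch of 
that merge over the three inserts above:

```python
# Replay the three inserts from the reproduction and merge them the way an
# incremental read does: keep only the latest version per record key.

commits = [
    (1, {"id": 1, "name": "a1", "price": 20, "ts": 1000}),
    (2, {"id": 2, "name": "b1", "price": 10, "ts": 344}),
    (3, {"id": 1, "name": "a3", "price": 21, "ts": 2000}),
]

def incremental_read(commits, begin_instant):
    latest = {}
    for instant, row in commits:
        if instant > begin_instant:   # commit falls in the incremental window
            latest[row["id"]] = row   # later commits overwrite earlier versions
    return sorted(latest.values(), key=lambda r: r["id"])

rows = incremental_read(commits, begin_instant=0)
```

   With id=1 written twice, only the later version ('a3') survives the merge, 
so two rows come back instead of three.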
   
   **Environment Description**
   
   * Hudi version : master
   
   * Spark version : 3.1
   
   * Hive version :
   
   * Hadoop version :
   
   * Storage (HDFS/S3/GCS..) :
   
   * Running on Docker? (yes/no) : no
   
   
   
   





[GitHub] [hudi] fengjian428 commented on issue #6058: [SUPPORT] Incremental and snapshot reads shows different results

2022-07-08 Thread GitBox


fengjian428 commented on issue #6058:
URL: https://github.com/apache/hudi/issues/6058#issuecomment-1179190199

   
   
   no, Hudi always keeps the latest version of records in DFS.
   
   > It looks like the response is here: #2841
   > 
   > So if I have a job that commits data 1 time per 15 minutes, to keep weekly 
data do I need to set the parameter `hoodie.cleaner.commits.retained` to 4 * 24 
* 7?
   
   For the differing-results issue, could you upload the files under /.hoodie 
here? Or a screenshot first is also OK.
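
   As a sanity check on the retention arithmetic in the quoted question: at one 
commit every 15 minutes, a week's worth of commits works out as:

```python
commits_per_hour = 60 // 15                   # one commit every 15 minutes
commits_per_week = commits_per_hour * 24 * 7  # 4 * 24 * 7
print(commits_per_week)                       # 672
```

   Whether retaining 672 commits is otherwise appropriate (storage, archival 
cadence) is a separate tuning question.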
   
   





[GitHub] [hudi] hudi-bot commented on pull request #5629: [HUDI-3384][HUDI-3385] Spark specific file reader/writer.

2022-07-08 Thread GitBox


hudi-bot commented on PR #5629:
URL: https://github.com/apache/hudi/pull/5629#issuecomment-1179186723

   
   ## CI report:
   
   * d0f078159313f8b35a41b1d1e016583204811383 UNKNOWN
   * 028507e70c6ab8ea5682742495205c88f3c8c623 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9135)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #5629: [HUDI-3384][HUDI-3385] Spark specific file reader/writer.

2022-07-08 Thread GitBox


hudi-bot commented on PR #5629:
URL: https://github.com/apache/hudi/pull/5629#issuecomment-1179183542

   
   ## CI report:
   
   * d0f078159313f8b35a41b1d1e016583204811383 UNKNOWN
   * 42ba86dce3ec0b072cc6ec727a27ff7fd8a8a51f Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9715)
 
   * 028507e70c6ab8ea5682742495205c88f3c8c623 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6059: [HUDI-1575] Early Conflict Detection For Multi-writer

2022-07-08 Thread GitBox


hudi-bot commented on PR #6059:
URL: https://github.com/apache/hudi/pull/6059#issuecomment-1179177485

   
   ## CI report:
   
   * 58ea19aec0a87cf9567e805acb577dba7f1281bc UNKNOWN
   * 75704b9e85a30643ac667f5779c3d493c8cbadfb Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9789)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6017: [HUDI-4276] Reconcile schema-inject null values for missing fields and add new fields

2022-07-08 Thread GitBox


hudi-bot commented on PR #6017:
URL: https://github.com/apache/hudi/pull/6017#issuecomment-1179177388

   
   ## CI report:
   
   * 572b3bd83c499348795f380004520f880506cf86 UNKNOWN
   * 65d15683ec3b8084330a6df7e121ca4218b83b2f UNKNOWN
   * 096cf9e861a5589ee57da2bc95e4e4cb28828431 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9788)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] yihua commented on issue #6056: [SUPPORT] Metadata table suddenly not cleaned / compacted anymore

2022-07-08 Thread GitBox


yihua commented on issue #6056:
URL: https://github.com/apache/hudi/issues/6056#issuecomment-1179171076

   After inspecting the timeline and the `.hoodie` folder, it looks like the 
rollback of the pending requested commits does not kick in; I'll investigate 
that. In the meantime, you can manually roll back those commits through the 
Hudi CLI using `commit rollback` before restarting the jobs.





[GitHub] [hudi] yihua commented on pull request #6066: [HUDI-4372] Enable matadata table by default for flink

2022-07-08 Thread GitBox


yihua commented on PR #6066:
URL: https://github.com/apache/hudi/pull/6066#issuecomment-1179157538

   @danny0405 In `HoodieMetadataConfig`, the default of 
"hoodie.metadata.enable" is hardcoded to `false` for the Flink engine by 
`getDefaultMetadataEnable()`.  Could you change that as well?





[GitHub] [hudi] hudi-bot commented on pull request #5629: [HUDI-3384][HUDI-3385] Spark specific file reader/writer.

2022-07-08 Thread GitBox


hudi-bot commented on PR #5629:
URL: https://github.com/apache/hudi/pull/5629#issuecomment-1179125217

   
   ## CI report:
   
   * d0f078159313f8b35a41b1d1e016583204811383 UNKNOWN
   * 42ba86dce3ec0b072cc6ec727a27ff7fd8a8a51f Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9715)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #5629: [HUDI-3384][HUDI-3385] Spark specific file reader/writer.

2022-07-08 Thread GitBox


hudi-bot commented on PR #5629:
URL: https://github.com/apache/hudi/pull/5629#issuecomment-1179120702

   
   ## CI report:
   
   * d0f078159313f8b35a41b1d1e016583204811383 UNKNOWN
   * 00d5fed1954348b749859f8f81fec593422df774 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8758)
 
   * 42ba86dce3ec0b072cc6ec727a27ff7fd8a8a51f UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[hudi] branch master updated (f20acb8dc3 -> fc8d96246a)

2022-07-08 Thread xushiyan
This is an automated email from the ASF dual-hosted git repository.

xushiyan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from f20acb8dc3 [HUDI-4367] Support copyToTable on call (#6054)
 add fc8d96246a [HUDI-4335] Bug fixes in AWSGlueCatalogSyncClient post 
schema evolution. (#5995)

No new revisions were added by this update.

Summary of changes:
 .../hudi/aws/sync/AWSGlueCatalogSyncClient.java| 34 --
 1 file changed, 18 insertions(+), 16 deletions(-)



[GitHub] [hudi] xushiyan merged pull request #5995: [HUDI-4335] Bug fixes in AWSGlueCatalogSyncClient post schema evolution.

2022-07-08 Thread GitBox


xushiyan merged PR #5995:
URL: https://github.com/apache/hudi/pull/5995





[GitHub] [hudi] hudi-bot commented on pull request #6013: [HUDI-4089] Support HMS for flink HoodieCatalog

2022-07-08 Thread GitBox


hudi-bot commented on PR #6013:
URL: https://github.com/apache/hudi/pull/6013#issuecomment-1179068194

   
   ## CI report:
   
   * c5051b524f5e241103afbdd9394d0a14a660c3bd Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9779)
 
   * 8032fcaf15f8b246c8652ed5df49fa01564e2649 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9790)
 
   * 185ec5470953fc3420fb85c8d72c40bbf317f995 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9791)
 
   * 4cb269cf6fadc020ed8e512d673d3435a69e4740 UNKNOWN
   * 0c113baae795ddf7f884b868dcde0ddc3f8adb92 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9792)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6013: [HUDI-4089] Support HMS for flink HoodieCatalog

2022-07-08 Thread GitBox


hudi-bot commented on PR #6013:
URL: https://github.com/apache/hudi/pull/6013#issuecomment-1179059344

   
   ## CI report:
   
   * c5051b524f5e241103afbdd9394d0a14a660c3bd Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9779)
 
   * 8032fcaf15f8b246c8652ed5df49fa01564e2649 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9790)
 
   * 185ec5470953fc3420fb85c8d72c40bbf317f995 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9791)
 
   * 4cb269cf6fadc020ed8e512d673d3435a69e4740 UNKNOWN
   * 0c113baae795ddf7f884b868dcde0ddc3f8adb92 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6013: [HUDI-4089] Support HMS for flink HoodieCatalog

2022-07-08 Thread GitBox


hudi-bot commented on PR #6013:
URL: https://github.com/apache/hudi/pull/6013#issuecomment-1179054663

   
   ## CI report:
   
   * c5051b524f5e241103afbdd9394d0a14a660c3bd Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9779)
 
   * 8032fcaf15f8b246c8652ed5df49fa01564e2649 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9790)
 
   * 185ec5470953fc3420fb85c8d72c40bbf317f995 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9791)
 
   * 4cb269cf6fadc020ed8e512d673d3435a69e4740 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6059: [HUDI-1575] Early Conflict Detection For Multi-writer

2022-07-08 Thread GitBox


hudi-bot commented on PR #6059:
URL: https://github.com/apache/hudi/pull/6059#issuecomment-1179049897

   
   ## CI report:
   
   * 58ea19aec0a87cf9567e805acb577dba7f1281bc UNKNOWN
   * 66e21e6f632355ec72e89ca876c35953acf2b684 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9786)
 
   * 75704b9e85a30643ac667f5779c3d493c8cbadfb Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9789)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6066: [HUDI-4372] Enable matadata table by default for flink

2022-07-08 Thread GitBox


hudi-bot commented on PR #6066:
URL: https://github.com/apache/hudi/pull/6066#issuecomment-1179049982

   
   ## CI report:
   
   * fcad9be8d245787907953977cd3567a03c842cdc Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9787)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6053: [HUDI-3500] Add call procedure for RepairsCommand

2022-07-08 Thread GitBox


hudi-bot commented on PR #6053:
URL: https://github.com/apache/hudi/pull/6053#issuecomment-1179049828

   
   ## CI report:
   
   * 2f650218bde3804e2f1b293db2c15e3bc0f73d90 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9785)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] rishabhbandi commented on issue #6055: Hudi Partial Update not working by using MERGE statement on Hudi External Table

2022-07-08 Thread GitBox


rishabhbandi commented on issue #6055:
URL: https://github.com/apache/hudi/issues/6055#issuecomment-1179026939

   @hassan-ammar the command below is being used to launch the Spark shell - 
spark-shell --jars 
gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.12-0.22.2.jar,/edge_data/code/svcordrdats/pipeline-resources/hudi-support-jars/hudi-spark-bundle_2.12-0.11.0.jar
 --conf spark.serializer=org.apache.spark.serializer.KryoSerializer --conf 
spark.kryoserializer.buffer.max=512m --conf 
'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' 
--conf 'spark.sql.catalogImplementation=hive'
   
   
   You can save the Hudi config mentioned in my Jira as a hudiConf.conf file 
and use that conf file in the options method.
   





[GitHub] [hudi] wzx140 commented on a diff in pull request #5629: [HUDI-3384][HUDI-3385] Spark specific file reader/writer.

2022-07-08 Thread GitBox


wzx140 commented on code in PR #5629:
URL: https://github.com/apache/hudi/pull/5629#discussion_r916849149


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/HoodieMergeHelper.java:
##
@@ -135,21 +131,22 @@ public void runMerge(HoodieTable>, HoodieData encoderCache = new ThreadLocal<>();
-  ThreadLocal decoderCache = new ThreadLocal<>();
   wrapper = new 
BoundedInMemoryExecutor(table.getConfig().getWriteBufferLimitBytes(), 
readerIterator,
   new UpdateHandler(mergeHandle), record -> {
 if (!externalSchemaTransformation) {
   return record;
 }
-// TODO Other type of record need to change
-return transformRecordBasedOnNewSchema(gReader, gWriter, encoderCache, 
decoderCache, (GenericRecord) ((HoodieRecord)record).getData());
+try {
+  return ((HoodieRecord) record).rewriteRecord(writerSchema, 
readerSchema, new TypedProperties());

Review Comment:
   Properties is not needed; will remove.






[GitHub] [hudi] wzx140 commented on a diff in pull request #5629: [HUDI-3384][HUDI-3385] Spark specific file reader/writer.

2022-07-08 Thread GitBox


wzx140 commented on code in PR #5629:
URL: https://github.com/apache/hudi/pull/5629#discussion_r916849149


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/HoodieMergeHelper.java:
##
@@ -135,21 +131,22 @@ public void runMerge(HoodieTable>, HoodieData encoderCache = new ThreadLocal<>();
-  ThreadLocal decoderCache = new ThreadLocal<>();
   wrapper = new 
BoundedInMemoryExecutor(table.getConfig().getWriteBufferLimitBytes(), 
readerIterator,
   new UpdateHandler(mergeHandle), record -> {
 if (!externalSchemaTransformation) {
   return record;
 }
-// TODO Other type of record need to change
-return transformRecordBasedOnNewSchema(gReader, gWriter, encoderCache, 
decoderCache, (GenericRecord) ((HoodieRecord)record).getData());
+try {
+  return ((HoodieRecord) record).rewriteRecord(writerSchema, 
readerSchema, new TypedProperties());

Review Comment:
   will remove






[GitHub] [hudi] minihippo commented on a diff in pull request #5629: [HUDI-3384][HUDI-3385] Spark specific file reader/writer.

2022-07-08 Thread GitBox


minihippo commented on code in PR #5629:
URL: https://github.com/apache/hudi/pull/5629#discussion_r914608155


##
hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieParquetDataBlock.java:
##
@@ -93,53 +89,41 @@ protected byte[] serializeRecords(List 
records) throws IOException
 }
 
 Schema writerSchema = new 
Schema.Parser().parse(super.getLogBlockHeader().get(HeaderMetadataType.SCHEMA));
-
-HoodieAvroWriteSupport writeSupport = new HoodieAvroWriteSupport(
-new AvroSchemaConverter().convert(writerSchema), writerSchema, 
Option.empty());
-
-HoodieParquetConfig avroParquetConfig =
-new HoodieParquetConfig<>(
-writeSupport,
-compressionCodecName.get(),
-ParquetWriter.DEFAULT_BLOCK_SIZE,
-ParquetWriter.DEFAULT_PAGE_SIZE,
-1024 * 1024 * 1024,
-new Configuration(),
-
Double.parseDouble(String.valueOf(0.1)));//HoodieStorageConfig.PARQUET_COMPRESSION_RATIO.defaultValue()));
-
 ByteArrayOutputStream baos = new ByteArrayOutputStream();
-
 try (FSDataOutputStream outputStream = new FSDataOutputStream(baos)) {
-  try (HoodieParquetStreamWriter parquetWriter = new 
HoodieParquetStreamWriter(outputStream, avroParquetConfig)) {
-for (HoodieRecord record : records) {
+  HoodieFileWriter parquetWriter = null;
+  HoodieStorageConfig storageConfig =  
HoodieStorageConfig.newBuilder().build();

Review Comment:
   A better way to do it



##
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/storage/HoodieSparkParquetReader.java:
##
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.io.storage;
+
+import org.apache.avro.Schema;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hudi.HoodieInternalRowUtils;
+import org.apache.hudi.common.bloom.BloomFilter;
+import org.apache.hudi.common.model.HoodieFileFormat;
+import org.apache.hudi.common.util.BaseFileUtils;
+import org.apache.hudi.common.util.ClosableIterator;
+import org.apache.hudi.common.util.ParquetReaderIterator;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.hadoop.api.ReadSupport;
+import org.apache.parquet.hadoop.util.HadoopInputFile;
+import org.apache.parquet.io.InputFile;
+import org.apache.spark.sql.catalyst.InternalRow;
+import org.apache.spark.sql.execution.datasources.parquet.ParquetReadSupport;
+import org.apache.spark.sql.internal.SQLConf;
+import org.apache.spark.sql.types.StructType;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Set;
+
+public class HoodieSparkParquetReader implements HoodieSparkFileReader {
+
+  private final Path path;
+  private final Configuration conf;
+  private final BaseFileUtils parquetUtils;
+  private List readerIterators = new ArrayList<>();
+
+  public HoodieSparkParquetReader(Configuration conf, Path path) {
+this.path = path;
+this.conf = conf;
+this.parquetUtils = BaseFileUtils.getInstance(HoodieFileFormat.PARQUET);
+  }
+
+  @Override
+  public String[] readMinMaxRecordKeys() {
+return parquetUtils.readMinMaxRecordKeys(conf, path);
+  }
+
+  @Override
+  public BloomFilter readBloomFilter() {
+return parquetUtils.readBloomFilterFromMetadata(conf, path);
+  }
+
+  @Override
+  public Set filterRowKeys(Set candidateRowKeys) {
+return parquetUtils.filterRowKeys(conf, path, candidateRowKeys);
+  }
+
+  @Override
+  public ClosableIterator getInternalRowIterator(Schema schema) 
throws IOException {
+StructType structType = HoodieInternalRowUtils.getCachedSchema(schema);
+conf.set(ParquetReadSupport.SPARK_ROW_REQUESTED_SCHEMA(), 
structType.json());
+// todo: get it from spark context
+conf.setBoolean(SQLConf.PARQUET_BINARY_AS_STRING().key(),false);
+conf.setBoolean(SQLConf.PARQUET_INT96_AS_TIMESTAMP().key(), true);
+InputFile inputFile = HadoopInputFile.fromPath(path, conf);
+ParquetReader reader = new ParquetReader.Builder(inputFile) {
+  @Override
+  protected ReadSupport getReadSupport() {
+return new ParquetReadSuppo

[GitHub] [hudi] hudi-bot commented on pull request #6013: [HUDI-4089] Support HMS for flink HoodieCatalog

2022-07-08 Thread GitBox


hudi-bot commented on PR #6013:
URL: https://github.com/apache/hudi/pull/6013#issuecomment-1179001089

   
   ## CI report:
   
   * c5051b524f5e241103afbdd9394d0a14a660c3bd Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9779)
 
   * 8032fcaf15f8b246c8652ed5df49fa01564e2649 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9790)
 
   * 185ec5470953fc3420fb85c8d72c40bbf317f995 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9791)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] cuibo01 commented on a diff in pull request #6013: [HUDI-4089] Support HMS for flink HoodieCatalog

2022-07-08 Thread GitBox


cuibo01 commented on code in PR #6013:
URL: https://github.com/apache/hudi/pull/6013#discussion_r916826110


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HoodieHiveCatalog.java:
##
@@ -0,0 +1,894 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.catalog;
+
+import org.apache.hudi.common.fs.FSUtils;
+import org.apache.hudi.common.model.HoodieFileFormat;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.TableSchemaResolver;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.configuration.FlinkOptions;
+import org.apache.hudi.exception.HoodieCatalogException;
+import org.apache.hudi.hadoop.utils.HoodieInputFormatUtils;
+import org.apache.hudi.sync.common.util.ConfigUtils;
+import org.apache.hudi.table.format.FilePathUtils;
+import org.apache.hudi.util.AvroSchemaConverter;
+import org.apache.hudi.util.StreamerUtil;
+
+import org.apache.avro.Schema;
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.sql.parser.hive.ddl.SqlAlterHiveDatabase;
+import org.apache.flink.sql.parser.hive.ddl.SqlAlterHiveDatabaseOwner;
+import org.apache.flink.sql.parser.hive.ddl.SqlCreateHiveDatabase;
+import org.apache.flink.table.catalog.AbstractCatalog;
+import org.apache.flink.table.catalog.CatalogBaseTable;
+import org.apache.flink.table.catalog.CatalogDatabase;
+import org.apache.flink.table.catalog.CatalogDatabaseImpl;
+import org.apache.flink.table.catalog.CatalogFunction;
+import org.apache.flink.table.catalog.CatalogPartition;
+import org.apache.flink.table.catalog.CatalogPartitionSpec;
+import org.apache.flink.table.catalog.CatalogPropertiesUtil;
+import org.apache.flink.table.catalog.CatalogTable;
+import org.apache.flink.table.catalog.CatalogView;
+import org.apache.flink.table.catalog.ObjectPath;
+import org.apache.flink.table.catalog.exceptions.CatalogException;
+import org.apache.flink.table.catalog.exceptions.DatabaseAlreadyExistException;
+import org.apache.flink.table.catalog.exceptions.DatabaseNotEmptyException;
+import org.apache.flink.table.catalog.exceptions.DatabaseNotExistException;
+import org.apache.flink.table.catalog.exceptions.FunctionAlreadyExistException;
+import org.apache.flink.table.catalog.exceptions.FunctionNotExistException;
+import 
org.apache.flink.table.catalog.exceptions.PartitionAlreadyExistsException;
+import org.apache.flink.table.catalog.exceptions.PartitionNotExistException;
+import org.apache.flink.table.catalog.exceptions.PartitionSpecInvalidException;
+import org.apache.flink.table.catalog.exceptions.TableAlreadyExistException;
+import org.apache.flink.table.catalog.exceptions.TableNotExistException;
+import org.apache.flink.table.catalog.exceptions.TableNotPartitionedException;
+import org.apache.flink.table.catalog.exceptions.TablePartitionedException;
+import org.apache.flink.table.catalog.stats.CatalogColumnStatistics;
+import org.apache.flink.table.catalog.stats.CatalogTableStatistics;
+import org.apache.flink.table.expressions.Expression;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.IMetaStoreClient;
+import org.apache.hadoop.hive.metastore.api.AlreadyExistsException;
+import org.apache.hadoop.hive.metastore.api.Database;
+import org.apache.hadoop.hive.metastore.api.FieldSchema;
+import org.apache.hadoop.hive.metastore.api.InvalidOperationException;
+import org.apache.hadoop.hive.metastore.api.NoSuchObjectException;
+import org.apache.hadoop.hive.metastore.api.PrincipalType;
+import org.apache.hadoop.hive.metastore.api.SerDeInfo;
+import org.apache.hadoop.hive.metastore.api.StorageDescriptor;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.metastore.api.UnknownDBException;
+import org.apache.hadoop.hive.ql.metadata.Hive;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.apache.thrift.TException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.

[GitHub] [hudi] cuibo01 commented on a diff in pull request #6013: [HUDI-4089] Support HMS for flink HoodieCatalog

2022-07-08 Thread GitBox


cuibo01 commented on code in PR #6013:
URL: https://github.com/apache/hudi/pull/6013#discussion_r916825060


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HoodieHiveCatalog.java:
##
@@ -0,0 +1,894 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.catalog;
+
+import org.apache.hudi.common.fs.FSUtils;
+import org.apache.hudi.common.model.HoodieFileFormat;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.TableSchemaResolver;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.configuration.FlinkOptions;
+import org.apache.hudi.exception.HoodieCatalogException;
+import org.apache.hudi.hadoop.utils.HoodieInputFormatUtils;
+import org.apache.hudi.sync.common.util.ConfigUtils;
+import org.apache.hudi.table.format.FilePathUtils;
+import org.apache.hudi.util.AvroSchemaConverter;
+import org.apache.hudi.util.StreamerUtil;
+
+import org.apache.avro.Schema;
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.sql.parser.hive.ddl.SqlAlterHiveDatabase;
+import org.apache.flink.sql.parser.hive.ddl.SqlAlterHiveDatabaseOwner;
+import org.apache.flink.sql.parser.hive.ddl.SqlCreateHiveDatabase;
+import org.apache.flink.table.catalog.AbstractCatalog;
+import org.apache.flink.table.catalog.CatalogBaseTable;
+import org.apache.flink.table.catalog.CatalogDatabase;
+import org.apache.flink.table.catalog.CatalogDatabaseImpl;
+import org.apache.flink.table.catalog.CatalogFunction;
+import org.apache.flink.table.catalog.CatalogPartition;
+import org.apache.flink.table.catalog.CatalogPartitionSpec;
+import org.apache.flink.table.catalog.CatalogPropertiesUtil;
+import org.apache.flink.table.catalog.CatalogTable;
+import org.apache.flink.table.catalog.CatalogView;
+import org.apache.flink.table.catalog.ObjectPath;
+import org.apache.flink.table.catalog.exceptions.CatalogException;
+import org.apache.flink.table.catalog.exceptions.DatabaseAlreadyExistException;
+import org.apache.flink.table.catalog.exceptions.DatabaseNotEmptyException;
+import org.apache.flink.table.catalog.exceptions.DatabaseNotExistException;
+import org.apache.flink.table.catalog.exceptions.FunctionAlreadyExistException;
+import org.apache.flink.table.catalog.exceptions.FunctionNotExistException;
+import org.apache.flink.table.catalog.exceptions.PartitionAlreadyExistsException;
+import org.apache.flink.table.catalog.exceptions.PartitionNotExistException;
+import org.apache.flink.table.catalog.exceptions.PartitionSpecInvalidException;
+import org.apache.flink.table.catalog.exceptions.TableAlreadyExistException;
+import org.apache.flink.table.catalog.exceptions.TableNotExistException;
+import org.apache.flink.table.catalog.exceptions.TableNotPartitionedException;
+import org.apache.flink.table.catalog.exceptions.TablePartitionedException;
+import org.apache.flink.table.catalog.stats.CatalogColumnStatistics;
+import org.apache.flink.table.catalog.stats.CatalogTableStatistics;
+import org.apache.flink.table.expressions.Expression;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.IMetaStoreClient;
+import org.apache.hadoop.hive.metastore.api.AlreadyExistsException;
+import org.apache.hadoop.hive.metastore.api.Database;
+import org.apache.hadoop.hive.metastore.api.FieldSchema;
+import org.apache.hadoop.hive.metastore.api.InvalidOperationException;
+import org.apache.hadoop.hive.metastore.api.NoSuchObjectException;
+import org.apache.hadoop.hive.metastore.api.PrincipalType;
+import org.apache.hadoop.hive.metastore.api.SerDeInfo;
+import org.apache.hadoop.hive.metastore.api.StorageDescriptor;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.metastore.api.UnknownDBException;
+import org.apache.hadoop.hive.ql.metadata.Hive;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.apache.thrift.TException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.

[GitHub] [hudi] hudi-bot commented on pull request #6013: [HUDI-4089] Support HMS for flink HoodieCatalog

2022-07-08 Thread GitBox


hudi-bot commented on PR #6013:
URL: https://github.com/apache/hudi/pull/6013#issuecomment-1178996622

   
   ## CI report:
   
   * c5051b524f5e241103afbdd9394d0a14a660c3bd Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9779)
 
   * 8032fcaf15f8b246c8652ed5df49fa01564e2649 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9790)
 
   * 185ec5470953fc3420fb85c8d72c40bbf317f995 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5995: [HUDI-4335] Bug fixes in AWSGlueCatalogSyncClient post schema evolution.

2022-07-08 Thread GitBox


hudi-bot commented on PR #5995:
URL: https://github.com/apache/hudi/pull/5995#issuecomment-1178983083

   
   ## CI report:
   
   * 62b02fd7de0e13ef06467d346497adfd69f99677 UNKNOWN
   * ef59231579c15d869a5c32db5bab7e30c5fc0e7f Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9783)
 
   
   



[GitHub] [hudi] wzx140 commented on a diff in pull request #5629: [HUDI-3384][HUDI-3385] Spark specific file reader/writer.

2022-07-08 Thread GitBox


wzx140 commented on code in PR #5629:
URL: https://github.com/apache/hudi/pull/5629#discussion_r916796368


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/HoodieMergeHelper.java:
##
@@ -84,21 +80,21 @@ public void runMerge(…)
-final GenericDatumWriter<GenericRecord> gWriter;
-final GenericDatumReader<GenericRecord> gReader;
 Schema readSchema;
+Schema readerSchema;
+Schema writerSchema;

Review Comment:
   Will fix






[GitHub] [hudi] wzx140 commented on a diff in pull request #5629: [HUDI-3384][HUDI-3385] Spark specific file reader/writer.

2022-07-08 Thread GitBox


wzx140 commented on code in PR #5629:
URL: https://github.com/apache/hudi/pull/5629#discussion_r916795887


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/HoodieDeleteHelper.java:
##
@@ -84,8 +86,13 @@ public HoodieWriteMetadata<HoodieData<WriteStatus>> execute(String instantTime,
 dedupedKeys = keys.repartition(parallelism);
   }
 
-  HoodieData<HoodieRecord<T>> dedupedRecords =
-      dedupedKeys.map(key -> new HoodieAvroRecord(key, new EmptyHoodieRecordPayload()));
+  HoodieData<HoodieRecord> dedupedRecords;
+  if (config.getRecordType() == HoodieRecordType.AVRO) {
+    dedupedRecords =
+        dedupedKeys.map(key -> new HoodieAvroRecord(key, new EmptyHoodieRecordPayload()));
+  } else {
+    dedupedRecords = dedupedKeys.map(key -> new HoodieEmptyRecord<>(key, config.getRecordType()));
+  }

Review Comment:
   Yes, I will add some comments
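
   The dispatch in the diff — AVRO deletes become payload-carrying records, while other record types get an engine-agnostic empty record — can be sketched in isolation. The classes below are minimal stand-ins invented for illustration, not the real Hudi types; only the branching pattern mirrors the change.

   ```java
   import java.util.Arrays;
   import java.util.List;
   import java.util.stream.Collectors;

   // Minimal stand-ins for the Hudi types in the diff above; the real
   // HoodieAvroRecord/HoodieEmptyRecord classes carry far more state.
   public class DeleteDispatchSketch {
       enum HoodieRecordType { AVRO, SPARK }

       // Mirrors the branch in HoodieDeleteHelper: AVRO gets a record with an
       // empty payload, any other engine gets a generic empty record.
       static String emptyRecordFor(String key, HoodieRecordType type) {
           if (type == HoodieRecordType.AVRO) {
               return "HoodieAvroRecord(" + key + ", EmptyHoodieRecordPayload)";
           }
           return "HoodieEmptyRecord(" + key + ", " + type + ")";
       }

       // Equivalent of mapping dedupedKeys to dedupedRecords in the diff.
       static List<String> dedupedRecords(List<String> dedupedKeys, HoodieRecordType type) {
           return dedupedKeys.stream()
                   .map(k -> emptyRecordFor(k, type))
                   .collect(Collectors.toList());
       }

       public static void main(String[] args) {
           System.out.println(dedupedRecords(Arrays.asList("k1", "k2"), HoodieRecordType.AVRO));
           System.out.println(dedupedRecords(Arrays.asList("k1"), HoodieRecordType.SPARK));
       }
   }
   ```

   The point of branching on the record type is that the delete path no longer assumes an Avro payload exists for every engine.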






[GitHub] [hudi] hudi-bot commented on pull request #6013: [HUDI-4089] Support HMS for flink HoodieCatalog

2022-07-08 Thread GitBox


hudi-bot commented on PR #6013:
URL: https://github.com/apache/hudi/pull/6013#issuecomment-1178927681

   
   ## CI report:
   
   * c5051b524f5e241103afbdd9394d0a14a660c3bd Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9779)
 
   * 8032fcaf15f8b246c8652ed5df49fa01564e2649 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9790)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6013: [HUDI-4089] Support HMS for flink HoodieCatalog

2022-07-08 Thread GitBox


hudi-bot commented on PR #6013:
URL: https://github.com/apache/hudi/pull/6013#issuecomment-1178923894

   
   ## CI report:
   
   * c5051b524f5e241103afbdd9394d0a14a660c3bd Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9779)
 
   * 8032fcaf15f8b246c8652ed5df49fa01564e2649 UNKNOWN
   
   



[GitHub] [hudi] nochimow commented on issue #4622: [SUPPORT] Can't query Redshift rows even after downgrade from 0.10

2022-07-08 Thread GitBox


nochimow commented on issue #4622:
URL: https://github.com/apache/hudi/issues/4622#issuecomment-1178921465

   Same here. Still waiting for an update to upgrade our Hudi from 0.9





[GitHub] [hudi] hudi-bot commented on pull request #6061: [HUDI-2150] Rename/Restructure configs for better modularity

2022-07-08 Thread GitBox


hudi-bot commented on PR #6061:
URL: https://github.com/apache/hudi/pull/6061#issuecomment-1178919708

   
   ## CI report:
   
   * 0dd3e11fa3848660f8f38d27969292151068799b UNKNOWN
   * 8681a0df94cbe30169fe4822260a0b38a08a5bf9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9782)
 
   
   



[jira] [Created] (HUDI-4374) Support BULK_INSERT row-writing on streaming Dataset/DataFrame

2022-07-08 Thread Sagar Sumit (Jira)
Sagar Sumit created HUDI-4374:
-

 Summary: Support BULK_INSERT row-writing on streaming 
Dataset/DataFrame 
 Key: HUDI-4374
 URL: https://issues.apache.org/jira/browse/HUDI-4374
 Project: Apache Hudi
  Issue Type: Task
Reporter: Sagar Sumit
Assignee: Sagar Sumit
 Fix For: 0.12.0


In a structured streaming setup, when a Hudi table is written from a streaming 
source, HoodieStreamingSink calls HoodieSparkSqlWriter.write(). If the 
BULK_INSERT operation type is set, HoodieSparkSqlWriter.write() internally 
calls HoodieSparkSqlWriter.bulkInsertAsRow(), which does a simple 
df.write.format("hudi").options(...).save(). The 'write' call cannot be made on 
a streaming Dataset/DataFrame.
{code:java}
org.apache.spark.sql.AnalysisException: 'write' can not be called on streaming 
Dataset/DataFrame
    at 
org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
    at org.apache.spark.sql.Dataset.write(Dataset.scala:3377)
    at 
org.apache.hudi.HoodieSparkSqlWriter$.bulkInsertAsRow(HoodieSparkSqlWriter.scala:557)
    at 
org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:178)
    at 
org.apache.hudi.HoodieStreamingSink.$anonfun$addBatch$2(HoodieStreamingSink.scala:91)
    at scala.util.Try$.apply(Try.scala:213)
    at 
org.apache.hudi.HoodieStreamingSink.$anonfun$addBatch$1(HoodieStreamingSink.scala:90)
    at org.apache.hudi.HoodieStreamingSink.retry(HoodieStreamingSink.scala:166)
    at 
org.apache.hudi.HoodieStreamingSink.addBatch(HoodieStreamingSink.scala:89) 
{code}
Bulk insert can still be done by avoiding the row-writing path, but we need to 
fix HoodieStreamingSink to support bulk insert via row-writing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] hudi-bot commented on pull request #5409: [HUDI-3959] Rename class name for spark rdd reader

2022-07-08 Thread GitBox


hudi-bot commented on PR #5409:
URL: https://github.com/apache/hudi/pull/5409#issuecomment-1178862559

   
   ## CI report:
   
   * ce2821af9c594e381495433b7a0ee142347cd15c UNKNOWN
   * b32a1caaf9bd94606af20aca09b2011a34e110d4 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9780)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6065: [HUDI-3503] Add call procedure for CleanCommand

2022-07-08 Thread GitBox


hudi-bot commented on PR #6065:
URL: https://github.com/apache/hudi/pull/6065#issuecomment-1178814977

   
   ## CI report:
   
   * 708c4225afc3106d46e6fd844d48fc46d69ebc5a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9781)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6059: [HUDI-1575] Early Conflict Detection For Multi-writer

2022-07-08 Thread GitBox


hudi-bot commented on PR #6059:
URL: https://github.com/apache/hudi/pull/6059#issuecomment-1178807009

   
   ## CI report:
   
   * 58ea19aec0a87cf9567e805acb577dba7f1281bc UNKNOWN
   * f79cccf7b6b38289f3d70da5e25e247ef2fc87f1 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9766)
 
   * 66e21e6f632355ec72e89ca876c35953acf2b684 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9786)
 
   * 75704b9e85a30643ac667f5779c3d493c8cbadfb Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9789)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6017: [HUDI-4276] Reconcile schema-inject null values for missing fields and add new fields

2022-07-08 Thread GitBox


hudi-bot commented on PR #6017:
URL: https://github.com/apache/hudi/pull/6017#issuecomment-1178806870

   
   ## CI report:
   
   * 572b3bd83c499348795f380004520f880506cf86 UNKNOWN
   * 65d15683ec3b8084330a6df7e121ca4218b83b2f UNKNOWN
   * dd5bce9cad8c64f5afb7d31663e9f281eee866c9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9763)
 
   * 096cf9e861a5589ee57da2bc95e4e4cb28828431 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9788)
 
   
   



[GitHub] [hudi] ccchenhe commented on issue #6034: [SUPPORT] Flink Bucket Index Can't Update Records Using 0.11.1

2022-07-08 Thread GitBox


ccchenhe commented on issue #6034:
URL: https://github.com/apache/hudi/issues/6034#issuecomment-1178805367

   > I see many unnecessary options in the SQL; can you try these options again:
   > 
   > ```sql
   >   'hoodie.table.type' = 'COPY_ON_WRITE' 
   >  ,'hoodie.datasource.write.recordkey.field' = 'database,table,id'
   >  ,'hoodie.datasource.write.precombine.field' = 'update_time'
   >  ,'hoodie.parquet.compression.codec'= 'snappy'
   >  ,'connector' = 'hudi'
   >  ,'path' = '$hdfsPath'
   >  ,'index.bootstrap.enabled' = 'true'
   >  ,'index.type' = 'BUCKET'
   >  ,'hive_sync.partition_fields' = 'grass_date'
   >  ,'hive_sync.metastore.uris' = '$thrift://xxx'
   >  ,'hive_sync.db' = '$hiveDatabaseName'
   >  ,'hive_sync.table' = '$hiveTableName'
   >  ,'hive_sync.enable' = 'true'
   >  ,'hive_sync.use_jdbc' = 'false'
   >  ,'hive_sync.mode' = 'hms'
   >  ,'hoodie.datasource.write.hive_style_partitioning'= 'true'
   >  ,'write.tasks'='4'
   >  ,'write.index_bootstrap.tasks'='32'
   >  ,'write.rate.limit' = '64000'
   >  ,'write.precombine.field' = 'update_time'
   >  ,'hoodie.datasource.write.partitionpath.field' = 'grass_date'
   >  ,'hive_sync.partition_extractor_class' = 
'org.apache.hudi.hive.MultiPartKeysValueExtractor'
   >  ,'hoodie.bucket.index.num.buckets' = '20'
   >  ,'hoodie.bucket.index.hash.field' = 'database,table,id'
   > ```
   
   after testing, it still does not work :(
   





[GitHub] [hudi] hudi-bot commented on pull request #6059: [HUDI-1575] Early Conflict Detection For Multi-writer

2022-07-08 Thread GitBox


hudi-bot commented on PR #6059:
URL: https://github.com/apache/hudi/pull/6059#issuecomment-1178803001

   
   ## CI report:
   
   * 58ea19aec0a87cf9567e805acb577dba7f1281bc UNKNOWN
   * f79cccf7b6b38289f3d70da5e25e247ef2fc87f1 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9766)
 
   * 66e21e6f632355ec72e89ca876c35953acf2b684 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9786)
 
   * 75704b9e85a30643ac667f5779c3d493c8cbadfb UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6017: [HUDI-4276] Reconcile schema-inject null values for missing fields and add new fields

2022-07-08 Thread GitBox


hudi-bot commented on PR #6017:
URL: https://github.com/apache/hudi/pull/6017#issuecomment-1178802903

   
   ## CI report:
   
   * 572b3bd83c499348795f380004520f880506cf86 UNKNOWN
   * 65d15683ec3b8084330a6df7e121ca4218b83b2f UNKNOWN
   * dd5bce9cad8c64f5afb7d31663e9f281eee866c9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9763)
 
   * 096cf9e861a5589ee57da2bc95e4e4cb28828431 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6013: [HUDI-4089] Support HMS for flink HoodieCatalog

2022-07-08 Thread GitBox


hudi-bot commented on PR #6013:
URL: https://github.com/apache/hudi/pull/6013#issuecomment-1178798507

   
   ## CI report:
   
   * c5051b524f5e241103afbdd9394d0a14a660c3bd Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9779)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6017: [HUDI-4276] Reconcile schema-inject null values for missing fields and add new fields

2022-07-08 Thread GitBox


hudi-bot commented on PR #6017:
URL: https://github.com/apache/hudi/pull/6017#issuecomment-1178736414

   
   ## CI report:
   
   * 572b3bd83c499348795f380004520f880506cf86 UNKNOWN
   * 65d15683ec3b8084330a6df7e121ca4218b83b2f UNKNOWN
   * dd5bce9cad8c64f5afb7d31663e9f281eee866c9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9763)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6065: [HUDI-3503] Add call procedure for CleanCommand

2022-07-08 Thread GitBox


hudi-bot commented on PR #6065:
URL: https://github.com/apache/hudi/pull/6065#issuecomment-1178736603

   
   ## CI report:
   
   * 3a2585a211eaefa224b62f17e5faed6a2f2f5c3a Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9778)
 
   * 708c4225afc3106d46e6fd844d48fc46d69ebc5a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9781)
 
   
   



[GitHub] [hudi] XuQianJin-Stars commented on pull request #6066: [HUDI-4372] Enable metadata table by default for flink

2022-07-08 Thread GitBox


XuQianJin-Stars commented on PR #6066:
URL: https://github.com/apache/hudi/pull/6066#issuecomment-1178707572

   This change is expected to be released after version 0.12.





[GitHub] [hudi] hudi-bot commented on pull request #6065: [HUDI-3503] Add call procedure for CleanCommand

2022-07-08 Thread GitBox


hudi-bot commented on PR #6065:
URL: https://github.com/apache/hudi/pull/6065#issuecomment-1178692166

   
   ## CI report:
   
   * d1165514b2fa40be8d6ad5c5edbc15a4e916c015 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9777)
 
   * 3a2585a211eaefa224b62f17e5faed6a2f2f5c3a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9778)
 
   * 708c4225afc3106d46e6fd844d48fc46d69ebc5a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9781)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6017: [HUDI-4276] Reconcile schema-inject null values for missing fields and add new fields

2022-07-08 Thread GitBox


hudi-bot commented on PR #6017:
URL: https://github.com/apache/hudi/pull/6017#issuecomment-1178692017

   
   ## CI report:
   
   * 572b3bd83c499348795f380004520f880506cf86 UNKNOWN
   * 65d15683ec3b8084330a6df7e121ca4218b83b2f UNKNOWN
   * dd5bce9cad8c64f5afb7d31663e9f281eee866c9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9763)
 
   
   



[GitHub] [hudi] xiarixiaoyao commented on pull request #6017: [HUDI-4276] Reconcile schema-inject null values for missing fields and add new fields

2022-07-08 Thread GitBox


xiarixiaoyao commented on PR #6017:
URL: https://github.com/apache/hudi/pull/6017#issuecomment-1178691528

   @hudi-bot run azure





[jira] [Updated] (HUDI-4373) Consistent bucket index write path for Flink engine

2022-07-08 Thread Yuwei Xiao (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuwei Xiao updated HUDI-4373:
-
Status: Open  (was: In Progress)

> Consistent bucket index write path for Flink engine
> ---
>
> Key: HUDI-4373
> URL: https://issues.apache.org/jira/browse/HUDI-4373
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Yuwei Xiao
>Assignee: Yuwei Xiao
>Priority: Major
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Simple bucket index (with a fixed bucket number) is ready for the Flink engine 
> and has been used widely in the community. 
> Since Spark now supports consistent bucket (dynamic bucket number), we should 
> bridge the gap and bring this feature to Flink too.





[jira] [Updated] (HUDI-4373) Consistent bucket index write path for Flink engine

2022-07-08 Thread Yuwei Xiao (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuwei Xiao updated HUDI-4373:
-
Epic Link: HUDI-3000

> Consistent bucket index write path for Flink engine
> ---
>
> Key: HUDI-4373
> URL: https://issues.apache.org/jira/browse/HUDI-4373
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Yuwei Xiao
>Assignee: Yuwei Xiao
>Priority: Major
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Simple bucket index (with a fixed bucket number) is ready for the Flink engine 
> and has been used widely in the community. 
> Since Spark now supports consistent bucket (dynamic bucket number), we should 
> bridge the gap and bring this feature to Flink too.





[jira] [Updated] (HUDI-4373) Consistent bucket index write path for Flink engine

2022-07-08 Thread Yuwei Xiao (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuwei Xiao updated HUDI-4373:
-
Status: In Progress  (was: Open)

> Consistent bucket index write path for Flink engine
> ---
>
> Key: HUDI-4373
> URL: https://issues.apache.org/jira/browse/HUDI-4373
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Yuwei Xiao
>Assignee: Yuwei Xiao
>Priority: Major
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Simple bucket index (with a fixed bucket number) is ready for the Flink engine 
> and has been used widely in the community. 
> Since Spark now supports consistent bucket (dynamic bucket number), we should 
> bridge the gap and bring this feature to Flink too.





[GitHub] [hudi] hudi-bot commented on pull request #6066: [HUDI-4372] Enable metadata table by default for flink

2022-07-08 Thread GitBox


hudi-bot commented on PR #6066:
URL: https://github.com/apache/hudi/pull/6066#issuecomment-1178688019

   
   ## CI report:
   
   * fcad9be8d245787907953977cd3567a03c842cdc Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9787)
 
   
   



[jira] [Updated] (HUDI-4373) Consistent bucket index write path for Flink engine

2022-07-08 Thread Yuwei Xiao (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuwei Xiao updated HUDI-4373:
-
Parent: (was: HUDI-3000)
Issue Type: New Feature  (was: Sub-task)

> Consistent bucket index write path for Flink engine
> ---
>
> Key: HUDI-4373
> URL: https://issues.apache.org/jira/browse/HUDI-4373
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Yuwei Xiao
>Assignee: Yuwei Xiao
>Priority: Major
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Simple bucket index (with a fixed bucket number) is ready for the Flink engine 
> and has been used widely in the community. 
> Since Spark now supports consistent bucket (dynamic bucket number), we should 
> bridge the gap and bring this feature to Flink too.




