[GitHub] [hudi] hudi-bot commented on pull request #6450: [HUDI-4665] Flipping default for "ignore failed batch" config in streaming sink to false

2022-08-24 Thread GitBox


hudi-bot commented on PR #6450:
URL: https://github.com/apache/hudi/pull/6450#issuecomment-1225818529

   
   ## CI report:
   
   * 50a075377f3723d1f8d4c222f653f0ae7446b28c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10898)
   * 214c313d89e0d8abc5ea356d0fc10c475b138ad2 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10923)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5629: [HUDI-3384][HUDI-3385] Spark specific file reader/writer.

2022-08-24 Thread GitBox


hudi-bot commented on PR #5629:
URL: https://github.com/apache/hudi/pull/5629#issuecomment-1225816974

   
   ## CI report:
   
   * d0f078159313f8b35a41b1d1e016583204811383 UNKNOWN
   * 8bd34a6bee3084bdc6029f3c0740cf06906acfd5 UNKNOWN
   * 819824dcc83e97a7a36dab27cb2f877c113de4c6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10920)
   * 5165092d2dca99c4e684d76811ff3d38ca0ee049 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10922)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #6450: [HUDI-4665] Flipping default for "ignore failed batch" config in streaming sink to false

2022-08-24 Thread GitBox


hudi-bot commented on PR #6450:
URL: https://github.com/apache/hudi/pull/6450#issuecomment-1225811549

   
   ## CI report:
   
   * 50a075377f3723d1f8d4c222f653f0ae7446b28c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10898)
   * 214c313d89e0d8abc5ea356d0fc10c475b138ad2 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5629: [HUDI-3384][HUDI-3385] Spark specific file reader/writer.

2022-08-24 Thread GitBox


hudi-bot commented on PR #5629:
URL: https://github.com/apache/hudi/pull/5629#issuecomment-1225809494

   
   ## CI report:
   
   * d0f078159313f8b35a41b1d1e016583204811383 UNKNOWN
   * 8bd34a6bee3084bdc6029f3c0740cf06906acfd5 UNKNOWN
   * 819824dcc83e97a7a36dab27cb2f877c113de4c6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10920)
   * 5165092d2dca99c4e684d76811ff3d38ca0ee049 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #6486: [HUDI-4706] Fix InternalSchemaChangeApplier#applyAddChange error to add nest type

2022-08-24 Thread GitBox


hudi-bot commented on PR #6486:
URL: https://github.com/apache/hudi/pull/6486#issuecomment-1225803574

   
   ## CI report:
   
   * d6b7c487e76c46460a2fb0c9647aeea901d17995 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10921)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (HUDI-4708) RFC for diagnostic reporter

2022-08-24 Thread Sagar Sumit (Jira)
Sagar Sumit created HUDI-4708:
-

 Summary: RFC for diagnostic reporter
 Key: HUDI-4708
 URL: https://issues.apache.org/jira/browse/HUDI-4708
 Project: Apache Hudi
  Issue Type: Task
Reporter: Sagar Sumit






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-4707) Diagnostic Reporter

2022-08-24 Thread Sagar Sumit (Jira)
Sagar Sumit created HUDI-4707:
-

 Summary: Diagnostic Reporter
 Key: HUDI-4707
 URL: https://issues.apache.org/jira/browse/HUDI-4707
 Project: Apache Hudi
  Issue Type: Epic
Reporter: Sagar Sumit






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] hudi-bot commented on pull request #5629: [HUDI-3384][HUDI-3385] Spark specific file reader/writer.

2022-08-24 Thread GitBox


hudi-bot commented on PR #5629:
URL: https://github.com/apache/hudi/pull/5629#issuecomment-1225649936

   
   ## CI report:
   
   * d0f078159313f8b35a41b1d1e016583204811383 UNKNOWN
   * 8bd34a6bee3084bdc6029f3c0740cf06906acfd5 UNKNOWN
   * 819824dcc83e97a7a36dab27cb2f877c113de4c6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10920)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #6486: [HUDI-4706] Fix InternalSchemaChangeApplier#applyAddChange error to add nest type

2022-08-24 Thread GitBox


hudi-bot commented on PR #6486:
URL: https://github.com/apache/hudi/pull/6486#issuecomment-1225645757

   
   ## CI report:
   
   * d6b7c487e76c46460a2fb0c9647aeea901d17995 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10921)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5629: [HUDI-3384][HUDI-3385] Spark specific file reader/writer.

2022-08-24 Thread GitBox


hudi-bot commented on PR #5629:
URL: https://github.com/apache/hudi/pull/5629#issuecomment-1225644286

   
   ## CI report:
   
   * d0f078159313f8b35a41b1d1e016583204811383 UNKNOWN
   * 8bd34a6bee3084bdc6029f3c0740cf06906acfd5 UNKNOWN
   * a80d4bdd93c349b09b6e640dd2229379f2173ff0 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10661)
   * 819824dcc83e97a7a36dab27cb2f877c113de4c6 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10920)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #6486: [HUDI-4706] Fix InternalSchemaChangeApplier#applyAddChange error to add nest type

2022-08-24 Thread GitBox


hudi-bot commented on PR #6486:
URL: https://github.com/apache/hudi/pull/6486#issuecomment-1225640170

   
   ## CI report:
   
   * d6b7c487e76c46460a2fb0c9647aeea901d17995 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5629: [HUDI-3384][HUDI-3385] Spark specific file reader/writer.

2022-08-24 Thread GitBox


hudi-bot commented on PR #5629:
URL: https://github.com/apache/hudi/pull/5629#issuecomment-1225638673

   
   ## CI report:
   
   * d0f078159313f8b35a41b1d1e016583204811383 UNKNOWN
   * 8bd34a6bee3084bdc6029f3c0740cf06906acfd5 UNKNOWN
   * a80d4bdd93c349b09b6e640dd2229379f2173ff0 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10661)
   * 819824dcc83e97a7a36dab27cb2f877c113de4c6 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-4706) Fix InternalSchemaChangeApplier#applyAddChange error to add nest type

2022-08-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-4706:
-
Labels: pull-request-available  (was: )

> Fix InternalSchemaChangeApplier#applyAddChange error to add nest type
> -
>
> Key: HUDI-4706
> URL: https://issues.apache.org/jira/browse/HUDI-4706
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Frank Wong
>Priority: Major
>  Labels: pull-request-available
>
> Forget to remove parent name in InternalSchemaChangeApplier#applyAddChange
>  
> // line 52
> TableChanges.ColumnAddChange add = TableChanges.ColumnAddChange.get(latestSchema);
> String parentName = TableChangesHelper.getParentName(colName);
> // insert col a.b inside of b
> // we need to insert col a inside of b
> // see the usage of ColumnAddChange#addColumns
> add.addColumns(parentName, colName, colType, doc);



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] wzx140 opened a new pull request, #6486: [HUDI-4706] Fix InternalSchemaChangeApplier#applyAddChange error to add nest type

2022-08-24 Thread GitBox


wzx140 opened a new pull request, #6486:
URL: https://github.com/apache/hudi/pull/6486

   InternalSchemaChangeApplier#applyAddChange forgets to remove the parent name when calling ColumnAddChange#addColumns
   
   ### Change Logs
   
   Remove the parent name before calling ColumnAddChange#addColumns in InternalSchemaChangeApplier#applyAddChange, as sketched below.
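
   For illustration, a minimal sketch of the fix in Scala syntax (the actual class is Java; `latestSchema`, `colName`, `colType` and `doc` stand in for `applyAddChange`'s locals, and the exact leaf-name computation is an assumption, not the upstream patch):

   ```scala
   val add = TableChanges.ColumnAddChange.get(latestSchema)
   val parentName = TableChangesHelper.getParentName(colName)
   // colName arrives fully qualified (e.g. "a.b"); strip the parent prefix so
   // that only the leaf name ("b") is registered under parentName ("a").
   val leafName =
     if (parentName == null || parentName.isEmpty) colName
     else colName.substring(parentName.length + 1)
   add.addColumns(parentName, leafName, colType, doc)
   ```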
   
   ### Impact
   
   Error when using InternalSchemaChangeApplier#applyAddChange to add a nested type
   
   **Risk level: none | low | medium | high**
   
   _Choose one. If medium or high, explain what verification was done to 
mitigate the risks._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-4706) Fix InternalSchemaChangeApplier#applyAddChange error to add nest type

2022-08-24 Thread Frank Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Frank Wong updated HUDI-4706:
-
Description: 
Forget to remove parent name in InternalSchemaChangeApplier#applyAddChange

 

// line 52

TableChanges.ColumnAddChange add = TableChanges.ColumnAddChange.get(latestSchema);
String parentName = TableChangesHelper.getParentName(colName);

// insert col a.b inside of b

// we need to insert col a inside of b

// see the usage of ColumnAddChange#addColumns
add.addColumns(parentName, colName, colType, doc);

  was:
Forget to remove parent name in InternalSchemaChangeApplier#applyAddChange

 

// line 52

TableChanges.ColumnAddChange add = TableChanges.ColumnAddChange.get(latestSchema);
String parentName = TableChangesHelper.getParentName(colName);

// add a column called a.b inside of b We need insert a inside of b

// see the usage of ColumnAddChange#addColumns
add.addColumns(parentName, colName, colType, doc);


> Fix InternalSchemaChangeApplier#applyAddChange error to add nest type
> -
>
> Key: HUDI-4706
> URL: https://issues.apache.org/jira/browse/HUDI-4706
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Frank Wong
>Priority: Major
>
> Forget to remove parent name in InternalSchemaChangeApplier#applyAddChange
>  
> // line 52
> TableChanges.ColumnAddChange add = TableChanges.ColumnAddChange.get(latestSchema);
> String parentName = TableChangesHelper.getParentName(colName);
> // insert col a.b inside of b
> // we need to insert col a inside of b
> // see the usage of ColumnAddChange#addColumns
> add.addColumns(parentName, colName, colType, doc);



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-4706) Fix InternalSchemaChangeApplier#applyAddChange error to add nest type

2022-08-24 Thread Frank Wong (Jira)
Frank Wong created HUDI-4706:


 Summary: Fix InternalSchemaChangeApplier#applyAddChange error to add nest type
 Key: HUDI-4706
 URL: https://issues.apache.org/jira/browse/HUDI-4706
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Frank Wong


Forget to remove parent name in InternalSchemaChangeApplier#applyAddChange

 

// line 52

TableChanges.ColumnAddChange add = TableChanges.ColumnAddChange.get(latestSchema);
String parentName = TableChangesHelper.getParentName(colName);

// add a column called a.b inside of b We need insert a inside of b

// see the usage of ColumnAddChange#addColumns
add.addColumns(parentName, colName, colType, doc);



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] rubenssoto commented on issue #4622: [SUPPORT] Can't query Redshift rows even after downgrade from 0.10

2022-08-24 Thread GitBox


rubenssoto commented on issue #4622:
URL: https://github.com/apache/hudi/issues/4622#issuecomment-1225597366

   Great to know, I will test, thank you so much guys!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[hudi] branch master updated: HUDI-4687 add show_invalid_parquet procedure (#6480)

2022-08-24 Thread forwardxu
This is an automated email from the ASF dual-hosted git repository.

forwardxu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 1e162bb73a HUDI-4687 add show_invalid_parquet procedure (#6480)
1e162bb73a is described below

commit 1e162bb73af9f39024e6e5d098958b9c0d926e6a
Author: shaoxiong.zhan <31836510+microbe...@users.noreply.github.com>
AuthorDate: Wed Aug 24 19:28:26 2022 +0800

HUDI-4687 add show_invalid_parquet procedure (#6480)

Co-authored-by: zhanshaoxiong 
---
 .../hudi/command/procedures/HoodieProcedures.scala |  1 +
 .../procedures/ShowInvalidParquetProcedure.scala   | 83 ++
 .../TestShowInvalidParquetProcedure.scala  | 71 ++
 3 files changed, 155 insertions(+)
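
For context, a minimal usage sketch of the new procedure from the Scala spark-shell (assumes the Hudi Spark SQL extensions are enabled; the table base path is illustrative):

    // List files under every partition of the table that cannot be read as valid parquet.
    spark.sql("CALL show_invalid_parquet(path => '/tmp/hudi/trips_cow')").show(false)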

diff --git a/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/HoodieProcedures.scala b/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/HoodieProcedures.scala
index b245b54f61..49c88e5cd6 100644
--- a/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/HoodieProcedures.scala
+++ b/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/HoodieProcedures.scala
@@ -82,6 +82,7 @@ object HoodieProcedures {
     mapBuilder.put(RepairOverwriteHoodiePropsProcedure.NAME, RepairOverwriteHoodiePropsProcedure.builder)
     mapBuilder.put(RunCleanProcedure.NAME, RunCleanProcedure.builder)
     mapBuilder.put(ValidateHoodieSyncProcedure.NAME, ValidateHoodieSyncProcedure.builder)
+    mapBuilder.put(ShowInvalidParquetProcedure.NAME, ShowInvalidParquetProcedure.builder)
     mapBuilder.build
   }
 }
diff --git a/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/ShowInvalidParquetProcedure.scala b/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/ShowInvalidParquetProcedure.scala
new file mode 100644
index 00..11d170bbed
--- /dev/null
+++ b/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/ShowInvalidParquetProcedure.scala
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hudi.command.procedures
+
+import org.apache.hadoop.fs.Path
+import org.apache.hudi.client.common.HoodieSparkEngineContext
+import org.apache.hudi.common.config.SerializableConfiguration
+import org.apache.hudi.common.fs.FSUtils
+import org.apache.parquet.format.converter.ParquetMetadataConverter.SKIP_ROW_GROUPS
+import org.apache.parquet.hadoop.ParquetFileReader
+import org.apache.spark.api.java.JavaRDD
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.types.{DataTypes, Metadata, StructField, StructType}
+
+import java.util.function.Supplier
+
+class ShowInvalidParquetProcedure extends BaseProcedure with ProcedureBuilder {
+  private val PARAMETERS = Array[ProcedureParameter](
+    ProcedureParameter.required(0, "path", DataTypes.StringType, None)
+  )
+
+  private val OUTPUT_TYPE = new StructType(Array[StructField](
+    StructField("path", DataTypes.StringType, nullable = true, Metadata.empty))
+  )
+
+  def parameters: Array[ProcedureParameter] = PARAMETERS
+
+  def outputType: StructType = OUTPUT_TYPE
+
+  override def call(args: ProcedureArgs): Seq[Row] = {
+    super.checkArgs(PARAMETERS, args)
+
+    val srcPath = getArgValueOrDefault(args, PARAMETERS(0)).get.asInstanceOf[String]
+    val partitionPaths: java.util.List[String] = FSUtils.getAllPartitionPaths(new HoodieSparkEngineContext(jsc), srcPath, false, false)
+    val javaRdd: JavaRDD[String] = jsc.parallelize(partitionPaths, partitionPaths.size())
+    val serHadoopConf = new SerializableConfiguration(jsc.hadoopConfiguration())
+    javaRdd.rdd.map(part => {
+      val fs = FSUtils.getFs(new Path(srcPath), serHadoopConf.get())
+      FSUtils.getAllDataFilesInPartition(fs, FSUtils.getPartitionPath(srcPath, part))
+    }).flatMap(_.toList)
+      .filter(status => {
+        val filePath = status.getPath
+        var 

[GitHub] [hudi] XuQianJin-Stars merged pull request #6480: [HUDI-4687] add show_invalid_parquet procedure

2022-08-24 Thread GitBox


XuQianJin-Stars merged PR #6480:
URL: https://github.com/apache/hudi/pull/6480


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #6480: [HUDI-4687] add show_invalid_parquet procedure

2022-08-24 Thread GitBox


hudi-bot commented on PR #6480:
URL: https://github.com/apache/hudi/pull/6480#issuecomment-1225560364

   
   ## CI report:
   
   * c667cc5a20e0f37406d4729188dd619abe384e30 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10916)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #6000: [HUDI-4340] fix not parsable text DateTimeParseException in HoodieInstantTimeGenerator.parseDateFromInstantTime

2022-08-24 Thread GitBox


hudi-bot commented on PR #6000:
URL: https://github.com/apache/hudi/pull/6000#issuecomment-1225559313

   
   ## CI report:
   
   * f97660e842e3302f48e47d2801cf7436eaa1728d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10918)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] y0908105023 closed pull request #5188: [HUDI-3759] hive-exec shade is lost in flink-bundle-jar

2022-08-24 Thread GitBox


y0908105023 closed pull request #5188: [HUDI-3759] hive-exec shade is lost in 
flink-bundle-jar
URL: https://github.com/apache/hudi/pull/5188


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #6485: [HUDI-4528] Add diff tool to compare commit metadata

2022-08-24 Thread GitBox


hudi-bot commented on PR #6485:
URL: https://github.com/apache/hudi/pull/6485#issuecomment-1225496278

   
   ## CI report:
   
   * c10b69a5a241b1544f878a61c1f41b282a7dbeb1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10917)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #6000: [HUDI-4340] fix not parsable text DateTimeParseException in HoodieInstantTimeGenerator.parseDateFromInstantTime

2022-08-24 Thread GitBox


hudi-bot commented on PR #6000:
URL: https://github.com/apache/hudi/pull/6000#issuecomment-1225495448

   
   ## CI report:
   
   * 06f352b0235cbbac215174c2755fca24009799c5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10912)
   * f97660e842e3302f48e47d2801cf7436eaa1728d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10918)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #6000: [HUDI-4340] fix not parsable text DateTimeParseException in HoodieInstantTimeGenerator.parseDateFromInstantTime

2022-08-24 Thread GitBox


hudi-bot commented on PR #6000:
URL: https://github.com/apache/hudi/pull/6000#issuecomment-1225490223

   
   ## CI report:
   
   * 06f352b0235cbbac215174c2755fca24009799c5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10912)
   * f97660e842e3302f48e47d2801cf7436eaa1728d UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #6484: [HUDI-4703] use the historical schema to response time travel query

2022-08-24 Thread GitBox


hudi-bot commented on PR #6484:
URL: https://github.com/apache/hudi/pull/6484#issuecomment-1225485349

   
   ## CI report:
   
   * 5fa2c9cab3ed92e80292666043cbbd71cb24dc23 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10915)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] xiarixiaoyao commented on issue #6424: [SUPPORT] After schema evaluation, when time travel queries the historical data, the results show the latest schema instead of the historical sch

2022-08-24 Thread GitBox


xiarixiaoyao commented on issue #6424:
URL: https://github.com/apache/hudi/issues/6424#issuecomment-1225468199

   Let me fix it this week, but maybe we need a config to control this behavior; it also makes sense to return the latest schema. WDYT @YannByron @xxWSHxx 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Closed] (HUDI-4603) Improve HMS Catalog function in flink

2022-08-24 Thread xiaozhongcheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaozhongcheng closed HUDI-4603.

Fix Version/s: (was: 0.13.0)
   Resolution: Fixed

> Improve HMS Catalog function in flink
> -
>
> Key: HUDI-4603
> URL: https://issues.apache.org/jira/browse/HUDI-4603
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: xiaozhongcheng
>Priority: Major
>  Labels: pull-request-available
>
> 1. Let users choose whether to sync the ro and rt tables when using the HMS 
> catalog in Flink.
> Currently, if users use the HMS catalog in Flink, the ro and rt tables are 
> also synced to the HMS.
> But if I just want to sync the metadata of the Hudi table, I don't want to 
> sync the ro and rt tables.
> So I think users should be able to choose whether to sync the ro and rt tables.
>  
> 2. Let users sync the ro and rt tables correctly when the table is a 
> partitioned table.
> Currently, if users create a partitioned table in the HMS catalog whose 
> partition field is not in the form yyyy/mm/dd, and they don't set 
> hive_sync.partition_extractor_class to 
> org.apache.hudi.hive.HiveStylePartitionValueExtractor, the ro and rt tables 
> will not be synced correctly.
> The stack trace is listed below:
>  
> {code:java}
> org.apache.hudi.exception.HoodieException: Got runtime exception when hive syncing student22
>   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:144) ~[hudi-flink1.15-bundle-0.12.0-rc2.jar:0.12.0-rc2]
>   at org.apache.hudi.sink.StreamWriteOperatorCoordinator.doSyncHive(StreamWriteOperatorCoordinator.java:335) ~[hudi-flink1.15-bundle-0.12.0-rc2.jar:0.12.0-rc2]
>   at org.apache.hudi.sink.StreamWriteOperatorCoordinator.syncHive(StreamWriteOperatorCoordinator.java:326) ~[hudi-flink1.15-bundle-0.12.0-rc2.jar:0.12.0-rc2]
>   at org.apache.hudi.sink.StreamWriteOperatorCoordinator.handleEndInputEvent(StreamWriteOperatorCoordinator.java:426) ~[hudi-flink1.15-bundle-0.12.0-rc2.jar:0.12.0-rc2]
>   at org.apache.hudi.sink.StreamWriteOperatorCoordinator.lambda$handleEventFromOperator$3(StreamWriteOperatorCoordinator.java:278) ~[hudi-flink1.15-bundle-0.12.0-rc2.jar:0.12.0-rc2]
>   at org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0(NonThrownExecutor.java:130) ~[hudi-flink1.15-bundle-0.12.0-rc2.jar:0.12.0-rc2]
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_241]
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_241]
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_241]
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_241]
>   at java.lang.Thread.run(Thread.java:748) [?:1.8.0_241]
> Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync partitions for table student22_ro
>   at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:340) ~[hudi-flink1.15-bundle-0.12.0-rc2.jar:0.12.0-rc2]
>   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:232) ~[hudi-flink1.15-bundle-0.12.0-rc2.jar:0.12.0-rc2]
>   at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:157) ~[hudi-flink1.15-bundle-0.12.0-rc2.jar:0.12.0-rc2]
>   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:141) ~[hudi-flink1.15-bundle-0.12.0-rc2.jar:0.12.0-rc2]
>   ... 10 more
> Caused by: java.lang.IllegalArgumentException: Partition path school=beida is not in the form yyyy/mm/dd
>   at org.apache.hudi.hive.SlashEncodedDayPartitionValueExtractor.extractPartitionValuesInPath(SlashEncodedDayPartitionValueExtractor.java:58) ~[hudi-flink1.15-bundle-0.12.0-rc2.jar:0.12.0-rc2]
>   at org.apache.hudi.sync.common.HoodieSyncClient.getPartitionEvents(HoodieSyncClient.java:144) ~[hudi-flink1.15-bundle-0.12.0-rc2.jar:0.12.0-rc2]
>   at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:318) ~[hudi-flink1.15-bundle-0.12.0-rc2.jar:0.12.0-rc2]
>   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:232) ~[hudi-flink1.15-bundle-0.12.0-rc2.jar:0.12.0-rc2]
>   at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:157) ~[hudi-flink1.15-bundle-0.12.0-rc2.jar:0.12.0-rc2]
>   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:141) ~[hudi-flink1.15-bundle-0.12.0-rc2.jar:0.12.0-rc2]
>   ... 10 more {code}
>  
>  
>  
>  
>  
>  
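
A hedged Flink SQL sketch of the configuration workaround named above, wrapped in the Scala Table API (the table schema, name, and path are illustrative; only the `hive_sync.partition_extractor_class` override is the point):

{code:scala}
import org.apache.flink.table.api.{EnvironmentSettings, TableEnvironment}

val tableEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode())
// For a partition field like `school` that is not in the yyyy/mm/dd form,
// point the extractor at HiveStylePartitionValueExtractor explicitly.
tableEnv.executeSql(
  """CREATE TABLE student22 (
    |  id BIGINT PRIMARY KEY NOT ENFORCED,
    |  name STRING,
    |  school STRING
    |) PARTITIONED BY (`school`) WITH (
    |  'connector' = 'hudi',
    |  'path' = 'hdfs:///tmp/student22',
    |  'table.type' = 'MERGE_ON_READ',
    |  'hive_sync.partition_extractor_class' =
    |    'org.apache.hudi.hive.HiveStylePartitionValueExtractor'
    |)""".stripMargin)
{code}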



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (HUDI-4383) Make hudi-flink-bundle module compile with the correct flink version

2022-08-24 Thread xiaozhongcheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaozhongcheng closed HUDI-4383.

Fix Version/s: 0.12.1
   (was: 0.13.0)
   Resolution: Fixed

> Make hudi-flink-bundle module compile with the correct flink version
> 
>
> Key: HUDI-4383
> URL: https://issues.apache.org/jira/browse/HUDI-4383
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Reporter: xiaozhongcheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.1
>
>
> At present, if I compile Hudi with the command
> {code:java}
> mvn clean package -DskipTests -Dhive.version=3.1.2 -Dscala.version=2.12.12 -Dscala.binary.version=2.12 -Pflink-bundle-shade-hive3 -Pflink1.15 -Dcheckstyle.skip=true -Dspark3.2 -Dscala-2.12 {code}
> the flink version of hudi-flink-bundle is still 1.14.
> Compile output:
> {code:java}
> [root@hadoop3 hudi-master]# mvn clean package -DskipTests 
> -Dhive.version=3.1.2 -Dscala.version=2.12.12 -Dscala.binary.version=2.12 
> -Pflink-bundle-shade-hive3 -Pflink1.15 -Dcheckstyle.skip=true -Dspark3.2 
> -Dscala-2.12
> [INFO] Scanning for projects...
> [WARNING] 
> [WARNING] Some problems were encountered while building the effective model 
> for org.apache.hudi:hudi-common:jar:0.12.0-SNAPSHOT
> [WARNING] 
> 'dependencyManagement.dependencies.dependency.(groupId:artifactId:type:classifier)'
>  must be unique: org.apache.logging.log4j:log4j-core:jar -> version 
> ${log4j2.version} vs ${log4j.test.version} @ 
> org.apache.hudi:hudi:0.12.0-SNAPSHOT, /data/hudi-master/pom.xml, line 1175, 
> column 19
> [WARNING] 
> 'dependencyManagement.dependencies.dependency.(groupId:artifactId:type:classifier)'
>  must be unique: org.apache.logging.log4j:log4j-api:jar -> version 
> ${log4j2.version} vs ${log4j.test.version} @ 
> org.apache.hudi:hudi:0.12.0-SNAPSHOT, /data/hudi-master/pom.xml, line 1182, 
> column 19
> [WARNING] 
> [WARNING] Some problems were encountered while building the effective model 
> for org.apache.hudi:hudi-cli:jar:0.12.0-SNAPSHOT
> [WARNING] 'dependencies.dependency.(groupId:artifactId:type:classifier)' must 
> be unique: org.apache.logging.log4j:log4j-core:jar -> duplicate declaration 
> of version (?) @ line 216, column 17
> [WARNING] 'dependencies.dependency.(groupId:artifactId:type:classifier)' must 
> be unique: org.apache.logging.log4j:log4j-api:jar -> duplicate declaration of 
> version (?) @ line 222, column 17
> [WARNING] 
> [WARNING] Some problems were encountered while building the effective model 
> for org.apache.hudi:hudi-spark-common_2.12:jar:0.12.0-SNAPSHOT
> [WARNING] 'artifactId' contains an expression but should be a constant. @ 
> org.apache.hudi:hudi-spark-common_${scala.binary.version}:0.12.0-SNAPSHOT, 
> /data/hudi-master/hudi-spark-datasource/hudi-spark-common/pom.xml, line 24, 
> column 15
> [WARNING] 
> [WARNING] Some problems were encountered while building the effective model 
> for org.apache.hudi:hudi-spark_2.12:jar:0.12.0-SNAPSHOT
> [WARNING] 'artifactId' contains an expression but should be a constant. @ 
> org.apache.hudi:hudi-spark_${scala.binary.version}:0.12.0-SNAPSHOT, 
> /data/hudi-master/hudi-spark-datasource/hudi-spark/pom.xml, line 26, column 15
> [WARNING] 
> [WARNING] Some problems were encountered while building the effective model 
> for org.apache.hudi:hudi-utilities_2.12:jar:0.12.0-SNAPSHOT
> [WARNING] 'artifactId' contains an expression but should be a constant. @ 
> org.apache.hudi:hudi-utilities_${scala.binary.version}:0.12.0-SNAPSHOT, 
> /data/hudi-master/hudi-utilities/pom.xml, line 26, column 15
> [WARNING] 'dependencies.dependency.(groupId:artifactId:type:classifier)' must 
> be unique: org.apache.logging.log4j:log4j-core:jar -> duplicate declaration 
> of version (?) @ 
> org.apache.hudi:hudi-utilities_${scala.binary.version}:0.12.0-SNAPSHOT, 
> /data/hudi-master/hudi-utilities/pom.xml, line 501, column 17
> [WARNING] 
> [WARNING] Some problems were encountered while building the effective model 
> for org.apache.hudi:hudi-hive-sync:jar:0.12.0-SNAPSHOT
> [WARNING] 'dependencies.dependency.(groupId:artifactId:type:classifier)' must 
> be unique: org.apache.logging.log4j:log4j-core:jar -> duplicate declaration 
> of version (?) @ line 162, column 17
> [WARNING] 
> [WARNING] Some problems were encountered while building the effective model 
> for org.apache.hudi:hudi-spark3.2-bundle_2.12:jar:0.12.0-SNAPSHOT
> [WARNING] 'artifactId' contains an expression but should be a constant. @ 
> org.apache.hudi:hudi-spark${sparkbundle.version}-bundle_${scala.binary.version}:0.12.0-SNAPSHOT,
>  /data/hudi-master/packaging/hudi-spark-bundle/pom.xml, line 26, column 15
> [WARNING] 
> [WARNING] Some problems were encountered while 

[GitHub] [hudi] hudi-bot commented on pull request #6480: [HUDI-4687] add show_invalid_parquet procedure

2022-08-24 Thread GitBox


hudi-bot commented on PR #6480:
URL: https://github.com/apache/hudi/pull/6480#issuecomment-1225386287

   
   ## CI report:
   
   * 908a242fc97a46ab57fd3f20f6481376a1595e78 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10914)
   * c667cc5a20e0f37406d4729188dd619abe384e30 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10916)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #6485: [HUDI-4528] Add diff tool to compare commit metadata

2022-08-24 Thread GitBox


hudi-bot commented on PR #6485:
URL: https://github.com/apache/hudi/pull/6485#issuecomment-1225386358

   
   ## CI report:
   
   * c10b69a5a241b1544f878a61c1f41b282a7dbeb1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10917)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #6485: [HUDI-4528] Add diff tool to compare commit metadata

2022-08-24 Thread GitBox


hudi-bot commented on PR #6485:
URL: https://github.com/apache/hudi/pull/6485#issuecomment-1225380741

   
   ## CI report:
   
   * c10b69a5a241b1544f878a61c1f41b282a7dbeb1 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #6480: [HUDI-4687] add show_invalid_parquet procedure

2022-08-24 Thread GitBox


hudi-bot commented on PR #6480:
URL: https://github.com/apache/hudi/pull/6480#issuecomment-1225380676

   
   ## CI report:
   
   * 908a242fc97a46ab57fd3f20f6481376a1595e78 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10914)
   * c667cc5a20e0f37406d4729188dd619abe384e30 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (HUDI-4705) Support Write-on-compaction mode when query cdc on MOR tables

2022-08-24 Thread Yann Byron (Jira)
Yann Byron created HUDI-4705:


 Summary: Support Write-on-compaction mode when query cdc on MOR tables
 Key: HUDI-4705
 URL: https://issues.apache.org/jira/browse/HUDI-4705
 Project: Apache Hudi
  Issue Type: New Feature
  Components: compaction, spark
Reporter: Yann Byron


For the case of querying CDC on MOR tables, the initial implementation uses the `Write-on-indexing` way to extract the CDC data by merging the base file and log files in flight.

This ticket wants to support the `Write-on-compaction` way, which gets the CDC data just by reading the persisted CDC files written during the compaction operation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] prasannarajaperumal commented on a diff in pull request #6256: [RFC-51][HUDI-3478] Update RFC: CDC support

2022-08-24 Thread GitBox


prasannarajaperumal commented on code in PR #6256:
URL: https://github.com/apache/hudi/pull/6256#discussion_r953478052


##
rfc/rfc-51/rfc-51.md:
##
@@ -148,20 +152,46 @@ hudi_cdc_table/
 
 Under a partition directory, the `.log` file with `CDCBlock` above will keep 
the changing data we have to materialize.
 
-There is an option to control what data is written to `CDCBlock`, that is 
`hoodie.table.cdc.supplemental.logging`. See the description of this config 
above.
+ Persisting CDC in MOR: Write-on-indexing vs Write-on-compaction
+
+2 design choices on when to persist CDC in MOR tables:
+
+Write-on-indexing allows CDC info to be persisted at the earliest, however, in 
case of Flink writer or Bucket

Review Comment:
   In the case where the row_key lookup is NOT done during writing (e.g. Bucket Index), the `op` data cannot be deduced and hence the CDC log block cannot be written out. 



##
rfc/rfc-51/rfc-51.md:
##
@@ -148,20 +152,46 @@ hudi_cdc_table/
 
 Under a partition directory, the `.log` file with `CDCBlock` above will keep 
the changing data we have to materialize.
 
-There is an option to control what data is written to `CDCBlock`, that is 
`hoodie.table.cdc.supplemental.logging`. See the description of this config 
above.
+ Persisting CDC in MOR: Write-on-indexing vs Write-on-compaction
+
+2 design choices on when to persist CDC in MOR tables:
+
+Write-on-indexing allows CDC info to be persisted at the earliest, however, in 
case of Flink writer or Bucket
+indexing, `op` (I/U/D) data is not available at indexing.
+
+Write-on-compaction can always persist CDC info and achieve standardization of 
implementation logic across engines,
+however, some delays are added to the CDC query results. Based on the business 
requirements, Log Compaction (RFC-48) or
+scheduling more frequent compaction can be used to minimize the latency.
 
-Spark DataSource example:
+The semantics we propose to establish are: when base files are written, the 
corresponding CDC data is also persisted.
+
+- For Spark
+  - inserts are written to base files: the CDC data `op=I` will be persisted
+  - updates/deletes that written to log files are compacted into base files: 
the CDC data `op=U|D` will be persisted
+- For Flink
+  - inserts/updates/deletes that written to log files are compacted into base 
files: the CDC data `op=I|U|D` will be
+persisted
+
+In summary, we propose CDC data to be persisted synchronously upon base files 
generation. It is therefore
+write-on-indexing for Spark inserts (non-bucket index) and write-on-compaction 
for everything else.
+
+Note that it may also be necessary to provide capabilities for asynchronously 
persisting CDC data, in terms of a
+separate table service like `ChangeTrackingService`, which can be scheduled to 
fine-tune the CDC-persisting timings.
+This can be used to meet low-latency optimized-read requirements when 
applicable.

Review Comment:
   I would take this line out. Not related in my opinion. 



##
rfc/rfc-51/rfc-51.md:
##
@@ -42,11 +43,11 @@ In cases where Hudi tables used as streaming sources, we 
want to be aware of all
 
 To implement this feature, we need to implement the logic on the write and 
read path to let Hudi figure out the changed data when read. In some cases, we 
need to write extra data to help optimize CDC queries.
 
-## Scenarios
+## Scenario Illustration

Review Comment:
   Can we call out / illustrate the scenarios where insert and delete should produce separate CDC rows, and note that this is a requirement for this design?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] 15663671003 commented on issue #5765: [SUPPORT] throw "java.lang.NoSuchMethodError: org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadStatistics()"

2022-08-24 Thread GitBox


15663671003 commented on issue #5765:
URL: https://github.com/apache/hudi/issues/5765#issuecomment-1225336035

   > Version 0.12 has been released. Is this bug fixed?
   
   In spark-3.2.2 with hudi-0.12.0, it is not fixed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-4528) Diff tool to compare metadata across snapshots in a given time range

2022-08-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-4528:
-
Labels: pull-request-available  (was: )

> Diff tool to compare metadata across snapshots in a given time range
> 
>
> Key: HUDI-4528
> URL: https://issues.apache.org/jira/browse/HUDI-4528
> Project: Apache Hudi
>  Issue Type: Task
>  Components: cli
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.12.1
>
>
> A tool that diffs two snapshots at table and partition level and can give 
> info about what new file ids got created, deleted, updated and track other 
> changes that are captured in write stats. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] codope opened a new pull request, #6485: [HUDI-4528] Add diff tool to compare commit metadata

2022-08-24 Thread GitBox


codope opened a new pull request, #6485:
URL: https://github.com/apache/hudi/pull/6485

   ### Change Logs
   
   - Add a merge API to merge two timelines in `HoodieDefaultTimeline`. This enables merging the active and archived timelines.
   - Add partition level info to commits and compaction command.
   - Add diff command to compare metadata.
   - Add commands to list out all actions for file id through a range of 
commits.
   - Add unit tests.
   
   ### Impact
   
   Changes can be helpful while debugging timeline using CLI.
   
   **Risk level: none | low | medium | high**
   
   low
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] prasannarajaperumal commented on a diff in pull request #6256: [RFC-51][HUDI-3478] Update RFC: CDC support

2022-08-24 Thread GitBox


prasannarajaperumal commented on code in PR #6256:
URL: https://github.com/apache/hudi/pull/6256#discussion_r953468420


##
rfc/rfc-51/rfc-51.md:
##
@@ -148,20 +152,46 @@ hudi_cdc_table/
 
 Under a partition directory, the `.log` file with `CDCBlock` above will keep 
the changing data we have to materialize.
 
-There is an option to control what data is written to `CDCBlock`, that is 
`hoodie.table.cdc.supplemental.logging`. See the description of this config 
above.
+ Persisting CDC in MOR: Write-on-indexing vs Write-on-compaction
+
+2 design choices on when to persist CDC in MOR tables:
+
+Write-on-indexing allows CDC info to be persisted at the earliest, however, in 
case of Flink writer or Bucket
+indexing, `op` (I/U/D) data is not available at indexing.
+
+Write-on-compaction can always persist CDC info and achieve standardization of 
implementation logic across engines,
+however, some delays are added to the CDC query results. Based on the business 
requirements, Log Compaction (RFC-48) or
+scheduling more frequent compaction can be used to minimize the latency.
 
-Spark DataSource example:
+The semantics we propose to establish are: when base files are written, the 
corresponding CDC data is also persisted.
+
+- For Spark
+  - inserts are written to base files: the CDC data `op=I` will be persisted
+  - updates/deletes that written to log files are compacted into base files: 
the CDC data `op=U|D` will be persisted
+- For Flink
+  - inserts/updates/deletes that written to log files are compacted into base 
files: the CDC data `op=I|U|D` will be
+persisted
+
+In summary, we propose CDC data to be persisted synchronously upon base files 
generation. It is therefore
+write-on-indexing for Spark inserts (non-bucket index) and write-on-compaction 
for everything else.
+
+Note that it may also be necessary to provide capabilities for asynchronously 
persisting CDC data, in terms of a
+separate table service like `ChangeTrackingService`, which can be scheduled to 
fine-tune the CDC-persisting timings.

Review Comment:
   CDC Availability SLA, effectively decoupling it from compaction frequency.



##
rfc/rfc-51/rfc-51.md:
##
@@ -148,20 +152,27 @@ hudi_cdc_table/
 
 Under a partition directory, the `.log` file with `CDCBlock` above will keep 
the changing data we have to materialize.
 
-There is an option to control what data is written to `CDCBlock`, that is 
`hoodie.table.cdc.supplemental.logging`. See the description of this config 
above.
+ Write-on-indexing vs Write-on-compaction

Review Comment:
   @xushiyan @YannByron - Let's link the jira once it's created. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a diff in pull request #6438: [HUDI-4642] Adding support to hudi-cli to repair depcrated partition

2022-08-24 Thread GitBox


nsivabalan commented on code in PR #6438:
URL: https://github.com/apache/hudi/pull/6438#discussion_r953460424


##
hudi-cli/src/main/java/org/apache/hudi/cli/commands/RepairsCommand.java:
##
@@ -263,4 +264,30 @@ public String migratePartitionMeta(
 HoodieTableHeaderFields.HEADER_ACTION
 }, rows);
   }
+
+  @CliCommand(value = "repair deprecated partition",
+  help = "Repair deprecated partition (\"default\"). Re-writes data from 
the deprecated partition into " + 
PartitionPathEncodeUtils.DEFAULT_PARTITION_PATH)

Review Comment:
   sg. will prioritize landing this and then adding more enhancements. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #6480: [HUDI-4687] add show_invalid_parquet procedure

2022-08-24 Thread GitBox


hudi-bot commented on PR #6480:
URL: https://github.com/apache/hudi/pull/6480#issuecomment-1225315510

   
   ## CI report:
   
   * 908a242fc97a46ab57fd3f20f6481376a1595e78 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10914)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #6484: [HUDI-4703] use the historical schema to response time travel query

2022-08-24 Thread GitBox


hudi-bot commented on PR #6484:
URL: https://github.com/apache/hudi/pull/6484#issuecomment-1225315532

   
   ## CI report:
   
   * 5fa2c9cab3ed92e80292666043cbbd71cb24dc23 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10915)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-4703) use the corresponding schema (not the latest schema) to response the time travel query

2022-08-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-4703:
-
Labels: pull-request-available  (was: )

> use the corresponding schema (not the latest schema) to response the time 
> travel query 
> ---
>
> Key: HUDI-4703
> URL: https://issues.apache.org/jira/browse/HUDI-4703
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: spark
>Reporter: Yann Byron
>Priority: Major
>  Labels: pull-request-available
>
> https://github.com/apache/hudi/issues/6424



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] hudi-bot commented on pull request #6484: [HUDI-4703] use the historical schema to respond to time travel query

2022-08-24 Thread GitBox


hudi-bot commented on PR #6484:
URL: https://github.com/apache/hudi/pull/6484#issuecomment-1225310516

   
   ## CI report:
   
   * 5fa2c9cab3ed92e80292666043cbbd71cb24dc23 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #6480: [HUDI-4687] add show_invalid_parquet procedure

2022-08-24 Thread GitBox


hudi-bot commented on PR #6480:
URL: https://github.com/apache/hudi/pull/6480#issuecomment-1225310481

   
   ## CI report:
   
   * 9d161840463bb97d4872ce8a2c376cb9e0d00440 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10904)
 
   * 908a242fc97a46ab57fd3f20f6481376a1595e78 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-4704) bulk insert overwrite table will delete the table and then recreate a table

2022-08-24 Thread zouxxyy (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zouxxyy updated HUDI-4704:
--
Affects Version/s: 0.12.0

> bulk insert overwrite table will delete the table and then recreate a table
> ---
>
> Key: HUDI-4704
> URL: https://issues.apache.org/jira/browse/HUDI-4704
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: spark-sql, writer-core
>Affects Versions: 0.12.0
>Reporter: zouxxyy
>Priority: Major
>
> When hoodie.sql.bulk.insert.enable is enabled, executing insert overwrite 
> will delete the table and then recreate a table, so time travel cannot be 
> performed.
>  
> {code:java}
> create table hudi_cow_test_tbl (
>   id bigint,
>   name string,
>   ts bigint,
>   dt string,
>   hh string
> ) using hudi
> tblproperties (
>   type = 'cow',
>   primaryKey = 'id',
>   preCombineField = 'ts',
>   'hoodie.sql.insert.mode' = 'non-strict',
>   'hoodie.sql.bulk.insert.enable' = 'true'
> );
> insert into hudi_cow_test_tbl select 1, 'a1', 1001, '2021-12-09', '11';
> insert overwrite hudi_cow_test_tbl select 3, 'a3', 1001, '2021-12-09', '11';
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-4704) bulk insert overwrite table will delete the table and then recreate a table

2022-08-24 Thread zouxxyy (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zouxxyy updated HUDI-4704:
--
Component/s: spark-sql
 writer-core

> bulk insert overwrite table will delete the table and then recreate a table
> ---
>
> Key: HUDI-4704
> URL: https://issues.apache.org/jira/browse/HUDI-4704
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: spark-sql, writer-core
>Reporter: zouxxyy
>Priority: Major
>
> When hoodie.sql.bulk.insert.enable is enabled, executing insert overwrite 
> will delete the table and then recreate a table, so time travel cannot be 
> performed.
>  
> {code:java}
> create table hudi_cow_test_tbl (
>   id bigint,
>   name string,
>   ts bigint,
>   dt string,
>   hh string
> ) using hudi
> tblproperties (
>   type = 'cow',
>   primaryKey = 'id',
>   preCombineField = 'ts',
>   'hoodie.sql.insert.mode' = 'non-strict',
>   'hoodie.sql.bulk.insert.enable' = 'true'
> );
> insert into hudi_cow_test_tbl select 1, 'a1', 1001, '2021-12-09', '11';
> insert overwrite hudi_cow_test_tbl select 3, 'a3', 1001, '2021-12-09', '11';
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-4704) bulk insert overwrite table will delete the table and then recreate a table

2022-08-24 Thread zouxxyy (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zouxxyy reassigned HUDI-4704:
-

Assignee: (was: zouxxyy)

> bulk insert overwrite table will delete the table and then recreate a table
> ---
>
> Key: HUDI-4704
> URL: https://issues.apache.org/jira/browse/HUDI-4704
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: zouxxyy
>Priority: Major
>
> When hoodie.sql.bulk.insert.enable is enabled, executing insert overwrite 
> will delete the table and then recreate a table, so time travel cannot be 
> performed.
>  
> {code:java}
> create table hudi_cow_test_tbl (
>   id bigint,
>   name string,
>   ts bigint,
>   dt string,
>   hh string
> ) using hudi
> tblproperties (
>   type = 'cow',
>   primaryKey = 'id',
>   preCombineField = 'ts',
>   'hoodie.sql.insert.mode' = 'non-strict',
>   'hoodie.sql.bulk.insert.enable' = 'true'
> );
> insert into hudi_cow_test_tbl select 1, 'a1', 1001, '2021-12-09', '11';
> insert overwrite hudi_cow_test_tbl select 3, 'a3', 1001, '2021-12-09', '11';
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] Zouxxyy commented on issue #6452: [SUPPORT] When bulk insert is enabled, executing insert overwrite table will delete the table and then recreate it

2022-08-24 Thread GitBox


Zouxxyy commented on issue #6452:
URL: https://github.com/apache/hudi/issues/6452#issuecomment-1225307072

   I tried to solve the problem by just deleting `fs.delete(tablePath, true)` 
in `HoodieSparkSqlWriter`.
   
   But after that, when executing a query, the old data is not overwritten.
   
   By looking at the code, I found that the `WriteOperationType` of bulk insert 
overwrite is `BULK_INSERT`, not `INSERT_OVERWRITE`. In that case only a commit 
is executed; no replace is executed.
   
   I don't know a good way to solve this problem. Maybe add a new type 
`BULK_INSERT_OVERWRITE`? Or first determine whether the save mode is `OVERWRITE`, 
and then determine the type of insert operation (see the sketch below)?
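   
   For illustration, a minimal, hypothetical sketch of the second option (names 
are illustrative, not the actual `HoodieSparkSqlWriter` code), keying the 
operation type off the save mode:
   
   ```java
   import org.apache.hudi.common.model.WriteOperationType;
   import org.apache.spark.sql.SaveMode;
   
   // Hypothetical helper: choose the write operation so that an overwrite
   // still goes through the replace-commit flow even when bulk insert is on.
   public final class BulkInsertOperationResolver {
     static WriteOperationType resolve(SaveMode mode, boolean bulkInsertEnabled) {
       if (mode == SaveMode.Overwrite) {
         // Emits a replacecommit instead of deleting and recreating the table;
         // alternatively, a dedicated BULK_INSERT_OVERWRITE type could go here.
         return WriteOperationType.INSERT_OVERWRITE_TABLE;
       }
       return bulkInsertEnabled ? WriteOperationType.BULK_INSERT : WriteOperationType.INSERT;
     }
   }
   ```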


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] 15663671003 commented on issue #6483: [SUPPORT]

2022-08-24 Thread GitBox


15663671003 commented on issue #6483:
URL: https://github.com/apache/hudi/issues/6483#issuecomment-1225300932

   When `hoodie.datasource.hive_sync.mode` is set to "hms", Hudi can sync successfully.
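   
   For anyone hitting the same issue, a minimal sketch of a working configuration 
(Java writer API shown; `df` is an existing Dataset<Row>, and the metastore URI is 
a placeholder):
   
   ```java
   // Sketch: sync through the Hive metastore ("hms") instead of HiveServer2 JDBC.
   df.write().format("hudi")
       .option("hoodie.table.name", "hudi_0_12_0_test")
       .option("hoodie.datasource.hive_sync.enable", "true")
       .option("hoodie.datasource.hive_sync.mode", "hms")
       .option("hoodie.datasource.hive_sync.database", "user_test")
       .option("hoodie.datasource.hive_sync.table", "hudi_0_12_0_test")
       // Placeholder host; point this at your Hive metastore.
       .option("hoodie.datasource.hive_sync.metastore.uris", "thrift://<metastore-host>:9083")
       .mode("append")
       .save("/user/hive/warehouse/user_test.db/hudi_0_12_0_test");
   ```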


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] danny0405 commented on a diff in pull request #6000: [HUDI-4340] fix not parsable text DateTimeParseException in HoodieInstantTimeGenerator.parseDateFromInstantTime

2022-08-24 Thread GitBox


danny0405 commented on code in PR #6000:
URL: https://github.com/apache/hudi/pull/6000#discussion_r953405253


##
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieActiveTimeline.java:
##
@@ -74,16 +74,57 @@ public class HoodieActiveTimeline extends 
HoodieDefaultTimeline {
   REQUESTED_REPLACE_COMMIT_EXTENSION, INFLIGHT_REPLACE_COMMIT_EXTENSION, 
REPLACE_COMMIT_EXTENSION,
   REQUESTED_INDEX_COMMIT_EXTENSION, INFLIGHT_INDEX_COMMIT_EXTENSION, 
INDEX_COMMIT_EXTENSION,
   REQUESTED_SAVE_SCHEMA_ACTION_EXTENSION, 
INFLIGHT_SAVE_SCHEMA_ACTION_EXTENSION, SAVE_SCHEMA_ACTION_EXTENSION));
+
+  private static final Set<String> NOT_PARSABLE_TIMESTAMPS = new 
HashSet<String>(3) {{
+  add(HoodieTimeline.INIT_INSTANT_TS);
+  add(HoodieTimeline.METADATA_BOOTSTRAP_INSTANT_TS);
+  add(HoodieTimeline.FULL_BOOTSTRAP_INSTANT_TS);
+}};
+
   private static final Logger LOG = 
LogManager.getLogger(HoodieActiveTimeline.class);
   protected HoodieTableMetaClient metaClient;
 
   /**
* Parse the timestamp of an Instant and return a {@code Date}.
+   * Throw ParseException if timestamp not valid format as
+   *  {@link 
org.apache.hudi.common.table.timeline.HoodieInstantTimeGenerator#SECS_INSTANT_TIMESTAMP_FORMAT}.
+   *
+   * @param timestamp a timestamp String which follow pattern as
+   *  {@link 
org.apache.hudi.common.table.timeline.HoodieInstantTimeGenerator#SECS_INSTANT_TIMESTAMP_FORMAT}.
+   * @return Date of instant timestamp
*/
   public static Date parseDateFromInstantTime(String timestamp) throws 
ParseException {
 return HoodieInstantTimeGenerator.parseDateFromInstantTime(timestamp);
   }
 
+  /**
+   * The same format method as above, but this method will mute ParseException.
+   * If the gaven timestamp is invalid, then will return {@code Option.empty}.
+   * Or a corresponding Date value if these timestamp provided
+   *  {@link 
org.apache.hudi.common.table.timeline.HoodieTimeline#INIT_INSTANT_TS},
+   *  {@link 
org.apache.hudi.common.table.timeline.HoodieTimeline#METADATA_BOOTSTRAP_INSTANT_TS},
+   *  {@link 
org.apache.hudi.common.table.timeline.HoodieTimeline#FULL_BOOTSTRAP_INSTANT_TS}.
+   * This method is useful when parse timestamp for metrics
+   *
+   * @param timestamp a timestamp String which follow pattern as
+   *  {@link 
org.apache.hudi.common.table.timeline.HoodieInstantTimeGenerator#SECS_INSTANT_TIMESTAMP_FORMAT}.
+   * @return {@code Option<Date>} of instant timestamp, {@code Option.empty} 
if invalid timestamp
+   */
+  public static Option<Date> parseDateFromInstantTimeSafely(String timestamp) {
+Option<Date> parsedDate;
+try {
+  parsedDate = 
Option.of(HoodieInstantTimeGenerator.parseDateFromInstantTime(timestamp));
+} catch (ParseException e) {
+  if (NOT_PARSABLE_TIMESTAMPS.contains(timestamp)) {
+parsedDate = Option.of(new Date(Integer.parseInt(timestamp)));
+  } else {
+LOG.warn("Failed to parse timestamp " + timestamp + " because of " + 
e.getMessage());
+parsedDate = Option.empty();
+  }

Review Comment:
   ` because of` -> `:`
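   
   As an aside, a minimal usage sketch of the new helper (assuming it lands with 
this signature; the call site is hypothetical):
   
   ```java
   // Option here is org.apache.hudi.common.util.Option, not java.util.Optional;
   // the helper returns Option.empty() instead of throwing on bad input.
   Option<Date> parsed = HoodieActiveTimeline.parseDateFromInstantTimeSafely("20220824143341");
   if (parsed.isPresent()) {
     LOG.info("Instant timestamp parses to " + parsed.get());
   }
   ```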



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] danny0405 commented on a diff in pull request #6000: [HUDI-4340] fix not parsable text DateTimeParseException in HoodieInstantTimeGenerator.parseDateFromInstantTime

2022-08-24 Thread GitBox


danny0405 commented on code in PR #6000:
URL: https://github.com/apache/hudi/pull/6000#discussion_r953404878


##
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieActiveTimeline.java:
##
@@ -74,16 +74,57 @@ public class HoodieActiveTimeline extends 
HoodieDefaultTimeline {
   REQUESTED_REPLACE_COMMIT_EXTENSION, INFLIGHT_REPLACE_COMMIT_EXTENSION, 
REPLACE_COMMIT_EXTENSION,
   REQUESTED_INDEX_COMMIT_EXTENSION, INFLIGHT_INDEX_COMMIT_EXTENSION, 
INDEX_COMMIT_EXTENSION,
   REQUESTED_SAVE_SCHEMA_ACTION_EXTENSION, 
INFLIGHT_SAVE_SCHEMA_ACTION_EXTENSION, SAVE_SCHEMA_ACTION_EXTENSION));
+
+  private static final Set<String> NOT_PARSABLE_TIMESTAMPS = new 
HashSet<String>(3) {{
+  add(HoodieTimeline.INIT_INSTANT_TS);
+  add(HoodieTimeline.METADATA_BOOTSTRAP_INSTANT_TS);
+  add(HoodieTimeline.FULL_BOOTSTRAP_INSTANT_TS);
+}};
+
   private static final Logger LOG = 
LogManager.getLogger(HoodieActiveTimeline.class);
   protected HoodieTableMetaClient metaClient;
 
   /**
* Parse the timestamp of an Instant and return a {@code Date}.
+   * Throw ParseException if timestamp not valid format as
+   *  {@link 
org.apache.hudi.common.table.timeline.HoodieInstantTimeGenerator#SECS_INSTANT_TIMESTAMP_FORMAT}.
+   *
+   * @param timestamp a timestamp String which follow pattern as
+   *  {@link 
org.apache.hudi.common.table.timeline.HoodieInstantTimeGenerator#SECS_INSTANT_TIMESTAMP_FORMAT}.
+   * @return Date of instant timestamp
*/
   public static Date parseDateFromInstantTime(String timestamp) throws 
ParseException {
 return HoodieInstantTimeGenerator.parseDateFromInstantTime(timestamp);
   }
 
+  /**
+   * The same format method as above, but this method will mute ParseException.
+   * If the gaven timestamp is invalid, then will return {@code Option.empty}.
+   * Or a corresponding Date value if these timestamp provided
+   *  {@link 
org.apache.hudi.common.table.timeline.HoodieTimeline#INIT_INSTANT_TS},
+   *  {@link 
org.apache.hudi.common.table.timeline.HoodieTimeline#METADATA_BOOTSTRAP_INSTANT_TS},
+   *  {@link 
org.apache.hudi.common.table.timeline.HoodieTimeline#FULL_BOOTSTRAP_INSTANT_TS}.
+   * This method is useful when parse timestamp for metrics
+   *

Review Comment:
   `when parse` -> `when parsing`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] danny0405 commented on a diff in pull request #6000: [HUDI-4340] fix not parsable text DateTimeParseException in HoodieInstantTimeGenerator.parseDateFromInstantTime

2022-08-24 Thread GitBox


danny0405 commented on code in PR #6000:
URL: https://github.com/apache/hudi/pull/6000#discussion_r953404485


##
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieActiveTimeline.java:
##
@@ -74,16 +74,57 @@ public class HoodieActiveTimeline extends 
HoodieDefaultTimeline {
   REQUESTED_REPLACE_COMMIT_EXTENSION, INFLIGHT_REPLACE_COMMIT_EXTENSION, 
REPLACE_COMMIT_EXTENSION,
   REQUESTED_INDEX_COMMIT_EXTENSION, INFLIGHT_INDEX_COMMIT_EXTENSION, 
INDEX_COMMIT_EXTENSION,
   REQUESTED_SAVE_SCHEMA_ACTION_EXTENSION, 
INFLIGHT_SAVE_SCHEMA_ACTION_EXTENSION, SAVE_SCHEMA_ACTION_EXTENSION));
+
+  private static final Set<String> NOT_PARSABLE_TIMESTAMPS = new 
HashSet<String>(3) {{
+  add(HoodieTimeline.INIT_INSTANT_TS);
+  add(HoodieTimeline.METADATA_BOOTSTRAP_INSTANT_TS);
+  add(HoodieTimeline.FULL_BOOTSTRAP_INSTANT_TS);
+}};
+
   private static final Logger LOG = 
LogManager.getLogger(HoodieActiveTimeline.class);
   protected HoodieTableMetaClient metaClient;
 
   /**
* Parse the timestamp of an Instant and return a {@code Date}.
+   * Throw ParseException if timestamp not valid format as
+   *  {@link 
org.apache.hudi.common.table.timeline.HoodieInstantTimeGenerator#SECS_INSTANT_TIMESTAMP_FORMAT}.
+   *
+   * @param timestamp a timestamp String which follow pattern as
+   *  {@link 
org.apache.hudi.common.table.timeline.HoodieInstantTimeGenerator#SECS_INSTANT_TIMESTAMP_FORMAT}.
+   * @return Date of instant timestamp
*/
   public static Date parseDateFromInstantTime(String timestamp) throws 
ParseException {
 return HoodieInstantTimeGenerator.parseDateFromInstantTime(timestamp);
   }
 
+  /**
+   * The same format method as above, but this method will mute ParseException.
+   * If the gaven timestamp is invalid, then will return {@code Option.empty}.
+   * Or a corresponding Date value if these timestamp provided
+   *  {@link 
org.apache.hudi.common.table.timeline.HoodieTimeline#INIT_INSTANT_TS},
+   *  {@link 
org.apache.hudi.common.table.timeline.HoodieTimeline#METADATA_BOOTSTRAP_INSTANT_TS},
+   *  {@link 
org.apache.hudi.common.table.timeline.HoodieTimeline#FULL_BOOTSTRAP_INSTANT_TS}.

Review Comment:
   `Or a corresponding Date value if these timestamp provided`
   -> `Or a corresponding Date value if these timestamp strings are provided`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] 15663671003 opened a new issue, #6483: [SUPPORT]

2022-08-24 Thread GitBox


15663671003 opened a new issue, #6483:
URL: https://github.com/apache/hudi/issues/6483

   **Describe the problem you faced**
   
   Can't connect to Hive during sync.
   
   **Expected behavior**
   
   I installed Spark 3.2.2 in a CDH 6.2.1 environment and ran a job that writes to 
Hudi 0.12.0, but Hive synchronization fails. I am wondering whether Hudi 0.12.0 only 
supports Hive 3.x. What should I do? Please help.
   
   **Environment Description**
   
   * Hudi version : 0.12.0
   
   * Spark version : 3.2.2
   
   * Hive version : 2.1.1
   
   * Hadoop version : 3.0.0-cdh6.2.1
   
   * Storage (HDFS/S3/GCS..) : HDFS
   
   * Running on Docker? (yes/no) : no
   
   
   **Stacktrace**
   
   ```
   [root@gt4 test]# sudo -u admin 
/opt/spark-3.2.2-bin-3.0.0-cdh6.2.1/bin/pyspark --num-executors 200 
--executor-cores 1 --executor-memory 8g --driver-memory 4g --conf 
'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog'
 --conf 
'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' 
--jars /home/test/hudi-spark3.2-bundle_2.12-0.12.0.jar
   Python 3.6.8 (default, Nov 16 2020, 16:55:22)
   [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux
   Type "help", "copyright", "credits" or "license" for more information.
   Setting default log level to "WARN".
   To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).
   22/08/24 14:30:46 WARN conf.HiveConf: HiveConf of name 
hive.vectorized.use.checked.expressions does not exist
   22/08/24 14:30:46 WARN conf.HiveConf: HiveConf of name 
hive.strict.checks.no.partition.filter does not exist
   22/08/24 14:30:46 WARN conf.HiveConf: HiveConf of name 
hive.strict.checks.orderby.no.limit does not exist
   22/08/24 14:30:46 WARN conf.HiveConf: HiveConf of name 
hive.vectorized.input.format.excludes does not exist
   /opt/spark-3.2.2-bin-3.0.0-cdh6.2.1/python/pyspark/context.py:238: 
FutureWarning: Python 3.6 support is deprecated in Spark 3.2.
 FutureWarning
   Welcome to
   __
/ __/__  ___ _/ /__
   _\ \/ _ \/ _ `/ __/  '_/
  /__ / .__/\_,_/_/ /_/\_\   version 3.2.2
 /_/
   
   Using Python version 3.6.8 (default, Nov 16 2020 16:55:22)
   Spark context Web UI available at http://gt4.dwh.antiytip.com:4040
   Spark context available as 'sc' (master = yarn, app id = 
application_1660892079989_11440).
   SparkSession available as 'spark'.
   >>> def write():
   ... import datetime
   ... df = spark.createDataFrame(
   ... [
   ... [i, i, i, datetime.datetime.now().strftime("%Y%m%d%H%M%S"), 
'a', 'b', ]
   ... for i in range(1)
   ... ],
   ... ['k1', 'k2', 'v', 'cmp_key', 'pt1', 'pt2', ]
   ... )
   ... table_name = "hudi_0_12_0_test"
   ... db_name = "user_test"
   ... path = f"/user/hive/warehouse/{db_name}.db/{table_name}"
   ... hudi_options = {
   ... 'hoodie.table.name': table_name,
   ... 'hoodie.datasource.write.recordkey.field': "k1,k2",
   ... 'hoodie.datasource.write.table.name': table_name,
   ... 'hoodie.datasource.write.operation': "upsert",
   ... 'hoodie.datasource.write.precombine.field': "cmp_key",
   ... 'hoodie.datasource.write.table.type': "COPY_ON_WRITE",
   ... 'hoodie.upsert.shuffle.parallelism': 2000,
   ... 'hoodie.bulkinsert.shuffle.parallelism': 2000,
   ... 'hoodie.insert.shuffle.parallelism': 2000,
   ... 'hoodie.cleaner.policy': 'KEEP_LATEST_COMMITS',
   ... 'hoodie.cleaner.fileversions.retained': 6,
   ... 'hoodie.parquet.max.file.size': 1024*1024*100,
   ... 'hoodie.parquet.small.file.limit': 1024*1024*60,
   ... 'hoodie.parquet.compression.codec': 'snappy',
   ... 'hoodie.bloom.index.parallelism': 4321,
   ... 'hoodie.datasource.write.payload.class': 
"org.apache.hudi.common.model.DefaultHoodieRecordPayload",
   ... 'hoodie.datasource.hive_sync.enable': 'true',
   ... 'hoodie.datasource.hive_sync.database': db_name,
   ... 'hoodie.datasource.hive_sync.table': table_name,
   ... 'hoodie.datasource.hive_sync.jdbcurl': 
"jdbc:hive2://hive.dwhtest.com:1",
   ... 'hoodie.datasource.write.hive_style_partitioning': "true",
   ... 'hoodie.datasource.write.partitionpath.field': "pt1,pt2",
   ... 'hoodie.datasource.hive_sync.partition_extractor_class': 
'org.apache.hudi.hive.MultiPartKeysValueExtractor',
   ... 'hoodie.datasource.write.keygenerator.class': 
'org.apache.hudi.keygen.ComplexKeyGenerator'
   ... }
   ... df.write.format("hudi").options(
   ... **hudi_options
   ... ).save(path)
   ...
   >>> write()
   22/08/24 14:33:23 WARN metadata.HoodieBackedTableMetadata: Metadata table 
was not found at path 
/user/hive/warehouse/user_test.db/hudi_0_12_0_test/.hoodie/metadata
   22/08/24 14:33:41 ERROR 

[GitHub] [hudi] danny0405 commented on a diff in pull request #6000: [HUDI-4340] fix not parsable text DateTimeParseException in HoodieInstantTimeGenerator.parseDateFromInstantTime

2022-08-24 Thread GitBox


danny0405 commented on code in PR #6000:
URL: https://github.com/apache/hudi/pull/6000#discussion_r953403629


##
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieActiveTimeline.java:
##
@@ -74,16 +74,57 @@ public class HoodieActiveTimeline extends 
HoodieDefaultTimeline {
   REQUESTED_REPLACE_COMMIT_EXTENSION, INFLIGHT_REPLACE_COMMIT_EXTENSION, 
REPLACE_COMMIT_EXTENSION,
   REQUESTED_INDEX_COMMIT_EXTENSION, INFLIGHT_INDEX_COMMIT_EXTENSION, 
INDEX_COMMIT_EXTENSION,
   REQUESTED_SAVE_SCHEMA_ACTION_EXTENSION, 
INFLIGHT_SAVE_SCHEMA_ACTION_EXTENSION, SAVE_SCHEMA_ACTION_EXTENSION));
+
+  private static final Set<String> NOT_PARSABLE_TIMESTAMPS = new 
HashSet<String>(3) {{
+  add(HoodieTimeline.INIT_INSTANT_TS);
+  add(HoodieTimeline.METADATA_BOOTSTRAP_INSTANT_TS);
+  add(HoodieTimeline.FULL_BOOTSTRAP_INSTANT_TS);
+}};
+
   private static final Logger LOG = 
LogManager.getLogger(HoodieActiveTimeline.class);
   protected HoodieTableMetaClient metaClient;
 
   /**
* Parse the timestamp of an Instant and return a {@code Date}.
+   * Throw ParseException if timestamp not valid format as
+   *  {@link 
org.apache.hudi.common.table.timeline.HoodieInstantTimeGenerator#SECS_INSTANT_TIMESTAMP_FORMAT}.
+   *
+   * @param timestamp a timestamp String which follow pattern as
+   *  {@link 
org.apache.hudi.common.table.timeline.HoodieInstantTimeGenerator#SECS_INSTANT_TIMESTAMP_FORMAT}.
+   * @return Date of instant timestamp
*/
   public static Date parseDateFromInstantTime(String timestamp) throws 
ParseException {
 return HoodieInstantTimeGenerator.parseDateFromInstantTime(timestamp);
   }
 
+  /**
+   * The same format method as above, but this method will mute ParseException.
+   * If the gaven timestamp is invalid, then will return {@code Option.empty}.
+   * Or a corresponding Date value if these timestamp provided
+   *  {@link 
org.apache.hudi.common.table.timeline.HoodieTimeline#INIT_INSTANT_TS},
+   *  {@link 
org.apache.hudi.common.table.timeline.HoodieTimeline#METADATA_BOOTSTRAP_INSTANT_TS},
+   *  {@link 
org.apache.hudi.common.table.timeline.HoodieTimeline#FULL_BOOTSTRAP_INSTANT_TS}.

Review Comment:
   `If the gaven timestamp is invalid, then will return {@code Option.empty}.`
   -> `If the given timestamp is invalid, returns {@code Option.empty}.`
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] china-shang commented on issue #6479: [SUPPORT] How to query the previous SNAPSHOT in Hive

2022-08-24 Thread GitBox


china-shang commented on issue #6479:
URL: https://github.com/apache/hudi/issues/6479#issuecomment-1225264914

   > I guess it's still under development
   > 
   > https://issues.apache.org/jira/browse/HUDI-1460
   
   Oh... No... 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-4704) bulk insert overwrite table will delete the table and then recreate a table

2022-08-24 Thread zouxxyy (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zouxxyy updated HUDI-4704:
--
Description: 
When hoodie.sql.bulk.insert.enable is enabled, executing insert overwrite will 
delete the table and then recreate a table, so time travel cannot be performed.

 
{code:java}
create table hudi_cow_test_tbl (
  id bigint,
  name string,
  ts bigint,
  dt string,
  hh string
) using hudi
tblproperties (
  type = 'cow',
  primaryKey = 'id',
  preCombineField = 'ts',
  'hoodie.sql.insert.mode' = 'non-strict',
  'hoodie.sql.bulk.insert.enable' = 'true'
);

insert into hudi_cow_test_tbl select 1, 'a1', 1001, '2021-12-09', '11';

insert overwrite hudi_cow_test_tbl select 3, 'a3', 1001, '2021-12-09', '11';

{code}

  was:
When hoodie.sql.bulk.insert.enable is enabled, executing insert overwrite will 
delete the table and then recreate a table, so time travel cannot be performed.

 

1. create table hudi_cow_test_tbl (
  id bigint,
  name string,
  ts bigint,
  dt string,
  hh string
) using hudi
tblproperties (
  type = 'cow',
  primaryKey = 'id',
  preCombineField = 'ts',
  'hoodie.sql.insert.mode' = 'non-strict',
  'hoodie.sql.bulk.insert.enable' = 'true'
);

2. insert into hudi_cow_test_tbl select 1, 'a1', 1001, '2021-12-09', '11';

3. insert overwrite hudi_cow_test_tbl select 3, 'a3', 1001, '2021-12-09', '11';


> bulk insert overwrite table will delete the table and then recreate a table
> ---
>
> Key: HUDI-4704
> URL: https://issues.apache.org/jira/browse/HUDI-4704
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: zouxxyy
>Assignee: zouxxyy
>Priority: Major
>
> When hoodie.sql.bulk.insert.enable is enabled, executing insert overwrite 
> will delete the table and then recreate a table, so time travel cannot be 
> performed.
>  
> {code:java}
> create table hudi_cow_test_tbl (
>   id bigint,
>   name string,
>   ts bigint,
>   dt string,
>   hh string
> ) using hudi
> tblproperties (
>   type = 'cow',
>   primaryKey = 'id',
>   preCombineField = 'ts',
>   'hoodie.sql.insert.mode' = 'non-strict',
>   'hoodie.sql.bulk.insert.enable' = 'true'
> );
> insert into hudi_cow_test_tbl select 1, 'a1', 1001, '2021-12-09', '11';
> insert overwrite hudi_cow_test_tbl select 3, 'a3', 1001, '2021-12-09', '11';
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-4704) bulk insert overwrite table will delete the table and then recreate a table

2022-08-24 Thread zouxxyy (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zouxxyy updated HUDI-4704:
--
Description: 
When hoodie.sql.bulk.insert.enable is enabled, executing insert overwrite will 
delete the table and then recreate a table, so time travel cannot be performed.

 

1. create table hudi_cow_test_tbl (
  id bigint,
  name string,
  ts bigint,
  dt string,
  hh string
) using hudi
tblproperties (
  type = 'cow',
  primaryKey = 'id',
  preCombineField = 'ts',
  'hoodie.sql.insert.mode' = 'non-strict',
  'hoodie.sql.bulk.insert.enable' = 'true'
);

2. insert into hudi_cow_test_tbl select 1, 'a1', 1001, '2021-12-09', '11';

3. insert overwrite hudi_cow_test_tbl select 3, 'a3', 1001, '2021-12-09', '11';

  was:
When `hoodie.sql.bulk.insert.enable` is enabled, executing insert overwrite 
will delete the table and then recreate a table, so time travel cannot be 
performed.

 

1. create table hudi_cow_test_tbl (
  id bigint,
  name string,
  ts bigint,
  dt string,
  hh string
) using hudi
tblproperties (
  type = 'cow',
  primaryKey = 'id',
  preCombineField = 'ts',
  'hoodie.sql.insert.mode' = 'non-strict',
  'hoodie.sql.bulk.insert.enable' = 'true'
);

2. insert into hudi_cow_test_tbl select 1, 'a1', 1001, '2021-12-09', '11';

3. insert overwrite hudi_cow_test_tbl select 3, 'a3', 1001, '2021-12-09', '11';


> bulk insert overwrite table will delete the table and then recreate a table
> ---
>
> Key: HUDI-4704
> URL: https://issues.apache.org/jira/browse/HUDI-4704
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: zouxxyy
>Assignee: zouxxyy
>Priority: Major
>
> When hoodie.sql.bulk.insert.enable is enabled, executing insert overwrite 
> will delete the table and then recreate a table, so time travel cannot be 
> performed.
>  
> 1. create table hudi_cow_test_tbl (
>   id bigint,
>   name string,
>   ts bigint,
>   dt string,
>   hh string
> ) using hudi
> tblproperties (
>   type = 'cow',
>   primaryKey = 'id',
>   preCombineField = 'ts',
>   'hoodie.sql.insert.mode' = 'non-strict',
>   'hoodie.sql.bulk.insert.enable' = 'true'
> );
> 2. insert into hudi_cow_test_tbl select 1, 'a1', 1001, '2021-12-09', '11';
> 3. insert overwrite hudi_cow_test_tbl select 3, 'a3', 1001, '2021-12-09', 
> '11';



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-4704) bulk insert overwrite table will delete the table and then recreate a table

2022-08-24 Thread zouxxyy (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zouxxyy updated HUDI-4704:
--
Description: 
When `hoodie.sql.bulk.insert.enable` is enabled, executing insert overwrite 
will delete the table and then recreate a table, so time travel cannot be 
performed.

 

1. create table hudi_cow_test_tbl (
  id bigint,
  name string,
  ts bigint,
  dt string,
  hh string
) using hudi
tblproperties (
  type = 'cow',
  primaryKey = 'id',
  preCombineField = 'ts',
  'hoodie.sql.insert.mode' = 'non-strict',
  'hoodie.sql.bulk.insert.enable' = 'true'
);

2. insert into hudi_cow_test_tbl select 1, 'a1', 1001, '2021-12-09', '11';

3. insert overwrite hudi_cow_test_tbl select 3, 'a3', 1001, '2021-12-09', '11';

> bulk insert overwrite table will delete the table and then recreate a table
> ---
>
> Key: HUDI-4704
> URL: https://issues.apache.org/jira/browse/HUDI-4704
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: zouxxyy
>Assignee: zouxxyy
>Priority: Major
>
> When `hoodie.sql.bulk.insert.enable` is enabled, executing insert overwrite 
> will delete the table and then recreate a table, so time travel cannot be 
> performed.
>  
> 1. create table hudi_cow_test_tbl (
>   id bigint,
>   name string,
>   ts bigint,
>   dt string,
>   hh string
> ) using hudi
> tblproperties (
>   type = 'cow',
>   primaryKey = 'id',
>   preCombineField = 'ts',
>   'hoodie.sql.insert.mode' = 'non-strict',
>   'hoodie.sql.bulk.insert.enable' = 'true'
> );
> 2. insert into hudi_cow_test_tbl select 1, 'a1', 1001, '2021-12-09', '11';
> 3. insert overwrite hudi_cow_test_tbl select 3, 'a3', 1001, '2021-12-09', 
> '11';



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-4704) bulk insert overwrite table will delete the table and then recreate a table

2022-08-24 Thread zouxxyy (Jira)
zouxxyy created HUDI-4704:
-

 Summary: bulk insert overwrite table will delete the table and 
then recreate a table
 Key: HUDI-4704
 URL: https://issues.apache.org/jira/browse/HUDI-4704
 Project: Apache Hudi
  Issue Type: Bug
Reporter: zouxxyy






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-4704) bulk insert overwrite table will delete the table and then recreate a table

2022-08-24 Thread zouxxyy (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zouxxyy reassigned HUDI-4704:
-

Assignee: zouxxyy

> bulk insert overwrite table will delete the table and then recreate a table
> ---
>
> Key: HUDI-4704
> URL: https://issues.apache.org/jira/browse/HUDI-4704
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: zouxxyy
>Assignee: zouxxyy
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] nsivabalan commented on issue #6212: [SUPPORT] Hudi creates duplicate, redundant file during clustering

2022-08-24 Thread GitBox


nsivabalan commented on issue #6212:
URL: https://github.com/apache/hudi/issues/6212#issuecomment-1225242795

   Give me two days; I am going to take a stab at this and will update here. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


