Re: [PR] [HUDI-7819] Fix OptionsResolver#allowCommitOnEmptyBatch default value bug [hudi]

2024-05-30 Thread via GitHub


hudi-bot commented on PR #11370:
URL: https://github.com/apache/hudi/pull/11370#issuecomment-2141282627

   
   ## CI report:
   
   * dcf9a4a7947b75943814493f528b90b68ee2b9aa Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24160)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7819] Fix OptionsResolver#allowCommitOnEmptyBatch default value bug [hudi]

2024-05-30 Thread via GitHub


hudi-bot commented on PR #11370:
URL: https://github.com/apache/hudi/pull/11370#issuecomment-2141237293

   
   ## CI report:
   
   * dcf9a4a7947b75943814493f528b90b68ee2b9aa Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24160)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7819] Fix OptionsResolver#allowCommitOnEmptyBatch default value bug [hudi]

2024-05-30 Thread via GitHub


danny0405 commented on code in PR #11370:
URL: https://github.com/apache/hudi/pull/11370#discussion_r1621717283


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/OptionsResolver.java:
##
@@ -388,7 +388,7 @@ public static ConflictResolutionStrategy getConflictResolutionStrategy(Configura
   * Returns whether to commit even when current batch has no data, for flink defaults false
   */
  public static boolean allowCommitOnEmptyBatch(Configuration conf) {
-    return conf.getBoolean(HoodieWriteConfig.ALLOW_EMPTY_COMMIT.key(), false);
+    return conf.getBoolean(HoodieWriteConfig.ALLOW_EMPTY_COMMIT.key(), HoodieWriteConfig.ALLOW_EMPTY_COMMIT.defaultValue());

Review Comment:
   You can run ITTestHoodieDataSource in your local env and make it pass first.
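   As a quick local sanity check before the full IT run, a rough sketch (a hypothetical standalone class with assumed imports, not the actual ITTestHoodieDataSource case) could be:

```java
import org.apache.flink.configuration.Configuration;
import org.apache.hudi.config.HoodieWriteConfig;
import org.apache.hudi.configuration.OptionsResolver;

public class AllowEmptyCommitDefaultCheck {
  public static void main(String[] args) {
    // Nothing is set explicitly, so the resolver should fall back to the option's own default.
    Configuration conf = new Configuration();
    boolean resolved = OptionsResolver.allowCommitOnEmptyBatch(conf);
    boolean expected = HoodieWriteConfig.ALLOW_EMPTY_COMMIT.defaultValue();
    System.out.println("resolved=" + resolved + ", expected=" + expected);
  }
}
```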



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7819] Fix OptionsResolver#allowCommitOnEmptyBatch default value bug [hudi]

2024-05-30 Thread via GitHub


hudi-bot commented on PR #11370:
URL: https://github.com/apache/hudi/pull/11370#issuecomment-2141231803

   
   ## CI report:
   
   * dcf9a4a7947b75943814493f528b90b68ee2b9aa UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7817] Use Jackson Core instead of org.codehaus.jackson for JSON encoding [hudi]

2024-05-30 Thread via GitHub


yihua merged PR #11369:
URL: https://github.com/apache/hudi/pull/11369


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



(hudi) branch master updated: [HUDI-7817] Use Jackson Core instead of org.codehaus.jackson for JSON encoding (#11369)

2024-05-30 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 0e55f0900d8 [HUDI-7817] Use Jackson Core instead of 
org.codehaus.jackson for JSON encoding (#11369)
0e55f0900d8 is described below

commit 0e55f0900d80b64398d9e8d50b32e8e1680df9f0
Author: Y Ethan Guo 
AuthorDate: Thu May 30 21:39:23 2024 -0700

[HUDI-7817] Use Jackson Core instead of org.codehaus.jackson for JSON 
encoding (#11369)
---
 hudi-common/src/main/java/org/apache/hudi/avro/JsonEncoder.java | 8 ++++----
 style/checkstyle.xml                                             | 3 ++-
 style/scalastyle.xml                                             | 8 +-------
 3 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/hudi-common/src/main/java/org/apache/hudi/avro/JsonEncoder.java 
b/hudi-common/src/main/java/org/apache/hudi/avro/JsonEncoder.java
index 86d6a6ad9e2..01b44ead24f 100644
--- a/hudi-common/src/main/java/org/apache/hudi/avro/JsonEncoder.java
+++ b/hudi-common/src/main/java/org/apache/hudi/avro/JsonEncoder.java
@@ -19,6 +19,10 @@
 
 package org.apache.hudi.avro;
 
+import com.fasterxml.jackson.core.JsonEncoding;
+import com.fasterxml.jackson.core.JsonFactory;
+import com.fasterxml.jackson.core.JsonGenerator;
+import com.fasterxml.jackson.core.util.MinimalPrettyPrinter;
 import org.apache.avro.AvroTypeException;
 import org.apache.avro.Schema;
 import org.apache.avro.io.Encoder;
@@ -27,10 +31,6 @@ import org.apache.avro.io.parsing.JsonGrammarGenerator;
 import org.apache.avro.io.parsing.Parser;
 import org.apache.avro.io.parsing.Symbol;
 import org.apache.avro.util.Utf8;
-import org.codehaus.jackson.JsonEncoding;
-import org.codehaus.jackson.JsonFactory;
-import org.codehaus.jackson.JsonGenerator;
-import org.codehaus.jackson.util.MinimalPrettyPrinter;
 
 import java.io.IOException;
 import java.io.OutputStream;
diff --git a/style/checkstyle.xml b/style/checkstyle.xml
index 92883af6ff5..24fd704ba46 100644
--- a/style/checkstyle.xml
+++ b/style/checkstyle.xml
@@ -267,7 +267,8 @@
 
 
 
-
+
 
 
diff --git a/style/scalastyle.xml b/style/scalastyle.xml
index 463ceebef30..dd4ddb3b801 100644
--- a/style/scalastyle.xml
+++ b/style/scalastyle.xml
@@ -57,7 +57,7 @@
  
  
   
-   
+   
   
  
  
@@ -130,10 +130,4 @@
scala\..*
   
  
- 
-  
-   
-  
- 
-
 



Re: [PR] [HUDI-7810] Fix OptionsResolver#allowCommitOnEmptyBatch default value… [hudi]

2024-05-30 Thread via GitHub


usberkeley commented on PR #11359:
URL: https://github.com/apache/hudi/pull/11359#issuecomment-2141196482

   There are many conflicts between my local code and the remote branch. This is my
mistake. To keep the PR history clean, I opened a new PR:
https://github.com/apache/hudi/pull/11370


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7810] Fix OptionsResolver#allowCommitOnEmptyBatch default value… [hudi]

2024-05-30 Thread via GitHub


usberkeley closed pull request #11359: [HUDI-7810] Fix 
OptionsResolver#allowCommitOnEmptyBatch default value…
URL: https://github.com/apache/hudi/pull/11359


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-7819) Fix OptionsResolver#allowCommitOnEmptyBatch default value bug

2024-05-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7819:
-
Labels: pull-request-available  (was: )

> Fix OptionsResolver#allowCommitOnEmptyBatch default value bug
> -
>
> Key: HUDI-7819
> URL: https://issues.apache.org/jira/browse/HUDI-7819
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: bradley
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [HUDI-7819] Fix OptionsResolver#allowCommitOnEmptyBatch default value bug [hudi]

2024-05-30 Thread via GitHub


usberkeley opened a new pull request, #11370:
URL: https://github.com/apache/hudi/pull/11370

   ### Change Logs
   
   OptionsResolver#allowCommitOnEmptyBatch has a hardcoded default value of
"false", while ALLOW_EMPTY_COMMIT (hoodie.allow.empty.commit) defaults to
"true", so this function returns the wrong default value whenever the option
is not set explicitly.
   
   In addition, TestHoodieFlinkQuickstart was modified to avoid being affected
by empty commits (hoodie.allow.empty.commit=true).
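
   A minimal sketch of the mismatch (illustration only; the two return statements mirror the patch, the wrapper class is hypothetical):

```java
import org.apache.flink.configuration.Configuration;
import org.apache.hudi.config.HoodieWriteConfig;

public class EmptyCommitDefaultSketch {
  // Before the fix: hardcoded fallback, so an unset option resolves to false.
  static boolean before(Configuration conf) {
    return conf.getBoolean(HoodieWriteConfig.ALLOW_EMPTY_COMMIT.key(), false);
  }

  // After the fix: fall back to ALLOW_EMPTY_COMMIT's own default (true).
  static boolean after(Configuration conf) {
    return conf.getBoolean(HoodieWriteConfig.ALLOW_EMPTY_COMMIT.key(),
        HoodieWriteConfig.ALLOW_EMPTY_COMMIT.defaultValue());
  }
}
```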
   
   ### Impact
   
   none
   
   ### Risk level (write none, low medium or high below)
   
   none
   
   ### Documentation Update
   
   none
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Change Logs and Impact were stated clearly
   - [x] Adequate tests were added if applicable
   - [x] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7007] Add bloom_filters index support on read side [hudi]

2024-05-30 Thread via GitHub


KnightChess commented on code in PR #11043:
URL: https://github.com/apache/hudi/pull/11043#discussion_r1621639791


##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestBloomFiltersIndexSupport.scala:
##
@@ -0,0 +1,261 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.functional
+
+import org.apache.hudi.DataSourceWriteOptions._
+import org.apache.hudi.common.config.{HoodieMetadataConfig, TypedProperties}
+import org.apache.hudi.common.model.{FileSlice, HoodieTableType}
+import org.apache.hudi.common.table.{HoodieTableConfig, HoodieTableMetaClient}
+import org.apache.hudi.common.testutils.RawTripTestPayload.recordsToStrings
+import org.apache.hudi.config.HoodieWriteConfig
+import org.apache.hudi.metadata.HoodieMetadataFileSystemView
+import org.apache.hudi.testutils.HoodieSparkClientTestBase
+import org.apache.hudi.util.{JFunction, JavaConversions}
+import org.apache.hudi.{DataSourceReadOptions, DataSourceWriteOptions, HoodieFileIndex}
+import org.apache.spark.sql.catalyst.expressions.{AttributeReference, EqualTo, Expression, Literal}
+import org.apache.spark.sql.functions.{col, not}
+import org.apache.spark.sql.types.StringType
+import org.apache.spark.sql.{DataFrame, Row, SaveMode, SparkSession}
+import org.junit.jupiter.api.Assertions.{assertEquals, assertTrue}
+import org.junit.jupiter.api.{AfterEach, BeforeEach, Test}
+import org.junit.jupiter.params.ParameterizedTest
+import org.junit.jupiter.params.provider.EnumSource
+
+import java.util.concurrent.atomic.AtomicInteger
+import java.util.stream.Collectors
+import scala.collection.JavaConverters._
+import scala.collection.{JavaConverters, mutable}
+
+class TestBloomFiltersIndexSupport extends HoodieSparkClientTestBase {
+
+  val sqlTempTable = "hudi_tbl_bloom"
+  var spark: SparkSession = _
+  var instantTime: AtomicInteger = _
+  val metadataOpts: Map[String, String] = Map(
+    HoodieMetadataConfig.ENABLE.key -> "true",
+    HoodieMetadataConfig.ENABLE_METADATA_INDEX_BLOOM_FILTER.key -> "true",
+    HoodieMetadataConfig.BLOOM_FILTER_INDEX_FOR_COLUMNS.key -> "_row_key"
+  )
+  val commonOpts: Map[String, String] = Map(
+    "hoodie.insert.shuffle.parallelism" -> "4",
+    "hoodie.upsert.shuffle.parallelism" -> "4",
+    HoodieWriteConfig.TBL_NAME.key -> "hoodie_test",
+    RECORDKEY_FIELD.key -> "_row_key",
+    PARTITIONPATH_FIELD.key -> "partition",
+    PRECOMBINE_FIELD.key -> "timestamp",
+    HoodieTableConfig.POPULATE_META_FIELDS.key -> "true"
+  ) ++ metadataOpts
+  var mergedDfList: List[DataFrame] = List.empty
+
+  @BeforeEach
+  override def setUp(): Unit = {
+    initPath()
+    initSparkContexts()
+    initHoodieStorage()
+    initTestDataGenerator()
+
+    setTableName("hoodie_test")
+    initMetaClient()
+
+    instantTime = new AtomicInteger(1)
+
+    spark = sqlContext.sparkSession
+  }
+
+  @AfterEach
+  override def tearDown(): Unit = {
+    cleanupFileSystem()
+    cleanupSparkContexts()
+  }
+
+  @ParameterizedTest
+  @EnumSource(classOf[HoodieTableType])
+  def testIndexInitialization(tableType: HoodieTableType): Unit = {
+    val hudiOpts = commonOpts + (DataSourceWriteOptions.TABLE_TYPE.key -> tableType.name())
+    doWriteAndValidateBloomFilters(
+      hudiOpts,
+      operation = DataSourceWriteOptions.INSERT_OPERATION_OPT_VAL,
+      saveMode = SaveMode.Overwrite)
+  }
+
+  /**
+   * Test case to do a write with updates and then validate file pruning using bloom filters.
+   */
+  @Test
+  def testBloomFiltersIndexFilePruning(): Unit = {
+    var hudiOpts = commonOpts
+    hudiOpts = hudiOpts + (
+      DataSourceReadOptions.ENABLE_DATA_SKIPPING.key -> "true")
+
+    doWriteAndValidateBloomFilters(
+      hudiOpts,
+      operation = DataSourceWriteOptions.INSERT_OPERATION_OPT_VAL,
+      saveMode = SaveMode.Overwrite,
+      shouldValidate = false)
+    doWriteAndValidateBloomFilters(
+      hudiOpts,
+      operation = DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL,
+      saveMode = SaveMode.Append)
+
+    createTempTable(hudiOpts)
+    verifyQueryPredicate(hudiOpts)
+  }
+
+  private def createTempTable(hudiOpts: Map[String, String]): Unit = {
+    val readDf = 

Re: [PR] [HUDI-7810] Fix OptionsResolver#allowCommitOnEmptyBatch default value… [hudi]

2024-05-30 Thread via GitHub


hudi-bot commented on PR #11359:
URL: https://github.com/apache/hudi/pull/11359#issuecomment-2141151906

   
   ## CI report:
   
   * 4b149d9085498be66c6426b0c3fde90ddf382cec UNKNOWN
   * c8b14bd35eb233306750d8b31780d3da8ba2547d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24157)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7810] Fix OptionsResolver#allowCommitOnEmptyBatch default value… [hudi]

2024-05-30 Thread via GitHub


hudi-bot commented on PR #11359:
URL: https://github.com/apache/hudi/pull/11359#issuecomment-2141146409

   
   ## CI report:
   
   * 4b149d9085498be66c6426b0c3fde90ddf382cec UNKNOWN
   * 9ce101ca9d0c194af5b31b533c83fb21549ca8d3 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24146)
 
   * 0bc90bdc0865275eb0e3650a5bc82c3b3d65d11f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24156)
 
   * c8b14bd35eb233306750d8b31780d3da8ba2547d UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Closed] (HUDI-7810) Fix OptionsResolver#allowCommitOnEmptyBatch default value bug

2024-05-30 Thread bradley (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bradley closed HUDI-7810.
-
Resolution: Later

> Fix OptionsResolver#allowCommitOnEmptyBatch default value bug
> -
>
> Key: HUDI-7810
> URL: https://issues.apache.org/jira/browse/HUDI-7810
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: bradley
>Priority: Major
>  Labels: pull-request-available
>
> Fixed in PR: [https://github.com/apache/hudi/pull/11359]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-7819) Fix OptionsResolver#allowCommitOnEmptyBatch default value bug

2024-05-30 Thread bradley (Jira)
bradley created HUDI-7819:
-

 Summary: Fix OptionsResolver#allowCommitOnEmptyBatch default value 
bug
 Key: HUDI-7819
 URL: https://issues.apache.org/jira/browse/HUDI-7819
 Project: Apache Hudi
  Issue Type: Bug
Reporter: bradley






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-7810] Fix OptionsResolver#allowCommitOnEmptyBatch default value… [hudi]

2024-05-30 Thread via GitHub


hudi-bot commented on PR #11359:
URL: https://github.com/apache/hudi/pull/11359#issuecomment-2141115466

   
   ## CI report:
   
   * 4b149d9085498be66c6426b0c3fde90ddf382cec UNKNOWN
   * 9ce101ca9d0c194af5b31b533c83fb21549ca8d3 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24146)
 
   * 0bc90bdc0865275eb0e3650a5bc82c3b3d65d11f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24156)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7810] Fix OptionsResolver#allowCommitOnEmptyBatch default value… [hudi]

2024-05-30 Thread via GitHub


hudi-bot commented on PR #11359:
URL: https://github.com/apache/hudi/pull/11359#issuecomment-2141109747

   
   ## CI report:
   
   * 4b149d9085498be66c6426b0c3fde90ddf382cec UNKNOWN
   * 9ce101ca9d0c194af5b31b533c83fb21549ca8d3 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24146)
 
   * 0bc90bdc0865275eb0e3650a5bc82c3b3d65d11f UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR][TESTING][DNM] Validating 0.15.0 RC2 bundles [hudi]

2024-05-30 Thread via GitHub


yihua closed pull request #11340: [MINOR][TESTING][DNM] Validating 0.15.0 RC2 
bundles
URL: https://github.com/apache/hudi/pull/11340


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR][Test][DNM] Test Azure CI on branch-0.x [hudi]

2024-05-30 Thread via GitHub


yihua closed pull request #10766: [MINOR][Test][DNM] Test Azure CI on branch-0.x
URL: https://github.com/apache/hudi/pull/10766


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-7818) Flink Table planner not loading problem

2024-05-30 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-7818:
-
Sprint: Sprint 2023-04-26

> Flink Table planner not loading problem
> ---
>
> Key: HUDI-7818
> URL: https://issues.apache.org/jira/browse/HUDI-7818
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: writer-core
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
> Fix For: 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-7818) Flink Table planner not loading problem

2024-05-30 Thread Danny Chen (Jira)
Danny Chen created HUDI-7818:


 Summary: Flink Table planner not loading problem
 Key: HUDI-7818
 URL: https://issues.apache.org/jira/browse/HUDI-7818
 Project: Apache Hudi
  Issue Type: Improvement
  Components: writer-core
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 1.0.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-7817] Use Jackson Core instead of org.codehaus.jackson for JSON encoding [hudi]

2024-05-30 Thread via GitHub


hudi-bot commented on PR #11369:
URL: https://github.com/apache/hudi/pull/11369#issuecomment-2140985222

   
   ## CI report:
   
   * 1718840e241dd32dc4c11885ba2bf1311bf822ec Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24155)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT]How to improve the speed of Flink writing to hudi ? [hudi]

2024-05-30 Thread via GitHub


HuangZhenQiu commented on issue #8071:
URL: https://github.com/apache/hudi/issues/8071#issuecomment-2140979488

   @danny0405 
   Do we have any best practices (for COW and MOR) for Flink ingestion into Hudi?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7817] Use Jackson Core instead of org.codehaus.jackson for JSON encoding [hudi]

2024-05-30 Thread via GitHub


hudi-bot commented on PR #11369:
URL: https://github.com/apache/hudi/pull/11369#issuecomment-2140935788

   
   ## CI report:
   
   * 1718840e241dd32dc4c11885ba2bf1311bf822ec Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24155)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7816]: Provide SourceProfileSupplier option into the SnapshotLoadQuerySplitter [hudi]

2024-05-30 Thread via GitHub


hudi-bot commented on PR #11368:
URL: https://github.com/apache/hudi/pull/11368#issuecomment-2140935768

   
   ## CI report:
   
   * 1dde761d4147e9c1a94914759ca0bfd0f7d23ec7 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24154)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7817] Use Jackson Core instead of org.codehaus.jackson for JSON encoding [hudi]

2024-05-30 Thread via GitHub


hudi-bot commented on PR #11369:
URL: https://github.com/apache/hudi/pull/11369#issuecomment-2140928147

   
   ## CI report:
   
   * 1718840e241dd32dc4c11885ba2bf1311bf822ec UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7814] Exclude unused transitive dependencies that introduce vulnerabilities [hudi]

2024-05-30 Thread via GitHub


hudi-bot commented on PR #11364:
URL: https://github.com/apache/hudi/pull/11364#issuecomment-2140920017

   
   ## CI report:
   
   * ff1e3d8a934fe1a2c92e341be610516476bf5d7a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24153)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-7817) Use Jackson Core instead of org.codehaus.jackson for JSON encoding

2024-05-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7817:
-
Labels: pull-request-available  (was: )

> Use Jackson Core instead of org.codehaus.jackson for JSON encoding
> --
>
> Key: HUDI-7817
> URL: https://issues.apache.org/jira/browse/HUDI-7817
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> org.codehaus.jackson is an older version of Jackson Core 
> (com.fasterxml.jackson.core:jackson-core).  
> org.codehaus.jackson:jackson-mapper-asl has critical vulnerabilities which 
> should be avoided.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7817) Use Jackson Core instead of org.codehaus.jackson for JSON encoding

2024-05-30 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7817:

Description: org.codehaus.jackson is a older version of Jackson Core 
(com.fasterxml.jackson.core:jackson-core).  
org.codehaus.jackson:jackson-mapper-asl has critical vulnerabilities which 
should be avoided.  (was: org.codehaus.jackson is a older version of Jackson 
Core (com.fasterxml.jackson.core:jackson-core).  
org.codehaus.jackson:jackson-mapper-asl has critical vulnerabilities which 
should be avoid.)

> Use Jackson Core instead of org.codehaus.jackson for JSON encoding
> --
>
> Key: HUDI-7817
> URL: https://issues.apache.org/jira/browse/HUDI-7817
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>
> org.codehaus.jackson is an older version of Jackson Core 
> (com.fasterxml.jackson.core:jackson-core).  
> org.codehaus.jackson:jackson-mapper-asl has critical vulnerabilities which 
> should be avoided.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [HUDI-7817] Use Jackson Core instead of org.codehaus.jackson for JSON encoding [hudi]

2024-05-30 Thread via GitHub


yihua opened a new pull request, #11369:
URL: https://github.com/apache/hudi/pull/11369

   ### Change Logs
   
   `org.codehaus.jackson` is an older version of Jackson Core
(`com.fasterxml.jackson.core:jackson-core`).
`org.codehaus.jackson:jackson-mapper-asl` has critical vulnerabilities which
should be avoided.  This PR changes `JsonEncoder` to use Jackson Core and adds
rules to check for illegal imports of `org.codehaus.jackson`.
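
   For reference, a minimal sketch (not the Hudi `JsonEncoder` itself) of writing JSON through Jackson Core's `com.fasterxml.jackson.core` API, mirroring the import swap in this PR:

```java
import com.fasterxml.jackson.core.JsonEncoding;
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.core.util.MinimalPrettyPrinter;

import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class JacksonCoreSketch {
  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    // Same class names as before, now sourced from com.fasterxml.jackson.core
    // instead of the legacy org.codehaus.jackson package.
    JsonGenerator gen = new JsonFactory().createGenerator(out, JsonEncoding.UTF8);
    gen.setPrettyPrinter(new MinimalPrettyPrinter());
    gen.writeStartObject();
    gen.writeStringField("field", "value");
    gen.writeEndObject();
    gen.flush();
    System.out.println(out);  // {"field":"value"}
  }
}
```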
   
   ### Impact
   
   Unifies usage of JSON encoding.
   
   ### Risk level
   
   none
   
   ### Documentation Update
   
   none
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-7817) Use Jackson Core instead of org.codehaus.jackson for JSON encoding

2024-05-30 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7817:

Description: org.codehaus.jackson is a older version of Jackson Core 
(com.fasterxml.jackson.core:jackson-core).  
org.codehaus.jackson:jackson-mapper-asl has critical vulnerabilities which 
should be avoid.  (was: org.codehaus.jackson is a older version of )

> Use Jackson Core instead of org.codehaus.jackson for JSON encoding
> --
>
> Key: HUDI-7817
> URL: https://issues.apache.org/jira/browse/HUDI-7817
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>
> org.codehaus.jackson is a older version of Jackson Core 
> (com.fasterxml.jackson.core:jackson-core).  
> org.codehaus.jackson:jackson-mapper-asl has critical vulnerabilities which 
> should be avoid.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7817) Use Jackson Core instead of org.codehaus.jackson for JSON encoding

2024-05-30 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7817:

Description: org.codehaus.jackson is a older version of 

> Use Jackson Core instead of org.codehaus.jackson for JSON encoding
> --
>
> Key: HUDI-7817
> URL: https://issues.apache.org/jira/browse/HUDI-7817
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>
> org.codehaus.jackson is a older version of 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-7817) Use Jackson Core instead of org.codehaus.jackson for JSON encoding

2024-05-30 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo reassigned HUDI-7817:
---

Assignee: Ethan Guo

> Use Jackson Core instead of org.codehaus.jackson for JSON encoding
> --
>
> Key: HUDI-7817
> URL: https://issues.apache.org/jira/browse/HUDI-7817
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7817) Use Jackson Core instead of org.codehaus.jackson for JSON encoding

2024-05-30 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7817:

Fix Version/s: 1.0.0

> Use Jackson Core instead of org.codehaus.jackson for JSON encoding
> --
>
> Key: HUDI-7817
> URL: https://issues.apache.org/jira/browse/HUDI-7817
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-7817) Use Jackson Core instead of org.codehaus.jackson for JSON encoding

2024-05-30 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-7817:
---

 Summary: Use Jackson Core instead of org.codehaus.jackson for JSON 
encoding
 Key: HUDI-7817
 URL: https://issues.apache.org/jira/browse/HUDI-7817
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Ethan Guo






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-7816]: Provide SourceProfileSupplier option into the SnapshotLoadQuerySplitter [hudi]

2024-05-30 Thread via GitHub


hudi-bot commented on PR #11368:
URL: https://github.com/apache/hudi/pull/11368#issuecomment-2140868406

   
   ## CI report:
   
   * 1dde761d4147e9c1a94914759ca0bfd0f7d23ec7 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24154)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7814] Exclude unused transitive dependencies that introduce vulnerabilities [hudi]

2024-05-30 Thread via GitHub


hudi-bot commented on PR #11364:
URL: https://github.com/apache/hudi/pull/11364#issuecomment-2140858167

   
   ## CI report:
   
   * 3337f90b44d58d07c8a4055c9544f0e957d93226 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24138)
 
   * ff1e3d8a934fe1a2c92e341be610516476bf5d7a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24153)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7816]: Add SourceProfileSupplier option to SnapshotLoadQuerySplitter [hudi]

2024-05-30 Thread via GitHub


hudi-bot commented on PR #11368:
URL: https://github.com/apache/hudi/pull/11368#issuecomment-2140858256

   
   ## CI report:
   
   * 1dde761d4147e9c1a94914759ca0bfd0f7d23ec7 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7814] Exclude unused transitive dependencies that introduce vulnerabilities [hudi]

2024-05-30 Thread via GitHub


hudi-bot commented on PR #11364:
URL: https://github.com/apache/hudi/pull/11364#issuecomment-2140848183

   
   ## CI report:
   
   * 3337f90b44d58d07c8a4055c9544f0e957d93226 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24138)
 
   * ff1e3d8a934fe1a2c92e341be610516476bf5d7a UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-7816) Pass the source profile to the snapshot query splitter

2024-05-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7816:
-
Labels: pull-request-available  (was: )

> Pass the source profile to the snapshot query splitter
> --
>
> Key: HUDI-7816
> URL: https://issues.apache.org/jira/browse/HUDI-7816
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Rajesh Mahindra
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [HUDI-7816]: Add SourceProfileSupplier option to SnapshotLoadQuerySplitter [hudi]

2024-05-30 Thread via GitHub


mattwong949 opened a new pull request, #11368:
URL: https://github.com/apache/hudi/pull/11368

   ### Change Logs
   
   Expanding the interface of the SnapshotLoadQuerySplitter to accept a
SourceProfileSupplier option.
   
   ### Impact
   
   Some SnapshotLoadQuerySplitter implementations may want to consider a 
SourceProfileSupplier in their logic, allowing source estimations to be used 
when splitting queries.
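
   A purely illustrative sketch of the idea (hypothetical stand-in types, not Hudi's actual `SnapshotLoadQuerySplitter`/`SourceProfileSupplier` signatures): a splitter that caps its snapshot-load split by a source-profile estimate when one is supplied, and otherwise keeps its default.

```java
import java.util.Optional;

// Hypothetical stand-ins for the real Hudi interfaces.
interface SourceProfile {
  long maxBytesPerRound();
}

interface SourceProfileSupplier {
  SourceProfile getSourceProfile();
}

class ProfileAwareSplitter {
  private final Optional<SourceProfileSupplier> profileSupplier;

  ProfileAwareSplitter(Optional<SourceProfileSupplier> profileSupplier) {
    this.profileSupplier = profileSupplier;
  }

  // Use the source estimation when available, otherwise keep the default split size.
  long splitSizeBytes(long defaultSplitBytes) {
    return profileSupplier
        .map(s -> Math.min(defaultSplitBytes, s.getSourceProfile().maxBytesPerRound()))
        .orElse(defaultSplitBytes);
  }
}
```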
   
   ### Risk level (write none, low medium or high below)
   
   Low, small change to the API but no logic change within hudi itself.
   
   ### Documentation Update
   
   Updated javadocs for the modified interface
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (HUDI-7816) Pass the source profile to the snapshot query splitter

2024-05-30 Thread Rajesh Mahindra (Jira)
Rajesh Mahindra created HUDI-7816:
-

 Summary: Pass the source profile to the snapshot query splitter
 Key: HUDI-7816
 URL: https://issues.apache.org/jira/browse/HUDI-7816
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Rajesh Mahindra






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


(hudi) branch master updated (c758508b62f -> db7480820e3)

2024-05-30 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from c758508b62f [HUDI-7769] Fix Hudi CDC read on Spark 3.3.4 and 3.4.3 
(#11242)
 add db7480820e3 [MINOR] Fix GitHub CI concurrency (#11361)

No new revisions were added by this update.

Summary of changes:
 .github/workflows/bot.yml                          | 1 +
 .github/workflows/release_candidate_validation.yml | 4 ----
 2 files changed, 1 insertion(+), 4 deletions(-)



Re: [PR] [MINOR] Fix GitHub CI concurrency [hudi]

2024-05-30 Thread via GitHub


yihua merged PR #11361:
URL: https://github.com/apache/hudi/pull/11361


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7146] Integrate secondary index on reader path [hudi]

2024-05-30 Thread via GitHub


hudi-bot commented on PR #11162:
URL: https://github.com/apache/hudi/pull/11162#issuecomment-2140765294

   
   ## CI report:
   
   * 3c52961bdbcb210e4c7140f5939143cfda7adb50 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24151)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-5863] Fix HoodieMetadataFileSystemView serving stale view at the timeline server [hudi]

2024-05-30 Thread via GitHub


Gatsby-Lee commented on PR #8079:
URL: https://github.com/apache/hudi/pull/8079#issuecomment-2140708258

    


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



(hudi) branch branch-0.x updated: [MINOR] Fix GitHub CI concurrency (#11362)

2024-05-30 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch branch-0.x
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/branch-0.x by this push:
 new 70094deb391 [MINOR] Fix GitHub CI concurrency (#11362)
70094deb391 is described below

commit 70094deb391f612c13babd3cdf49dd88ebb0eec0
Author: Y Ethan Guo 
AuthorDate: Thu May 30 11:26:39 2024 -0700

[MINOR] Fix GitHub CI concurrency (#11362)
---
 .github/workflows/bot.yml                          | 1 +
 .github/workflows/release_candidate_validation.yml | 4 ----
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/.github/workflows/bot.yml b/.github/workflows/bot.yml
index 951eecdcc57..72200c4822d 100644
--- a/.github/workflows/bot.yml
+++ b/.github/workflows/bot.yml
@@ -25,6 +25,7 @@ on:
 
 concurrency:
   group: ${{ github.ref }}
+  cancel-in-progress: ${{ !contains(github.ref, 'master') && 
!contains(github.ref, 'branch-0.x') && !contains(github.ref, 'release-') }}
 
 env:
   MVN_ARGS: -e -ntp -B -V -Dgpg.skip -Djacoco.skip -Pwarn-log 
-Dorg.slf4j.simpleLogger.log.org.apache.maven.plugins.shade=warn 
-Dorg.slf4j.simpleLogger.log.org.apache.maven.plugins.dependency=warn 
-Dmaven.wagon.httpconnectionManager.ttlSeconds=25 
-Dmaven.wagon.http.retryHandler.count=5
diff --git a/.github/workflows/release_candidate_validation.yml 
b/.github/workflows/release_candidate_validation.yml
index a952ba782e5..b9b668cc80b 100644
--- a/.github/workflows/release_candidate_validation.yml
+++ b/.github/workflows/release_candidate_validation.yml
@@ -8,10 +8,6 @@ on:
 branches:
   - 'release-*'
 
-concurrency:
-  group: ${{ github.ref }}
-  cancel-in-progress: ${{ !contains(github.ref, 'master') }}
-
 env:
   MVN_ARGS: -e -ntp -B -V -Dgpg.skip -Djacoco.skip -Pwarn-log 
-Dorg.slf4j.simpleLogger.log.org.apache.maven.plugins.shade=warn 
-Dorg.slf4j.simpleLogger.log.org.apache.maven.plugins.dependency=warn 
-Dmaven.wagon.httpconnectionManager.ttlSeconds=25 
-Dmaven.wagon.http.retryHandler.count=5
   SPARK_COMMON_MODULES: 
hudi-spark-datasource/hudi-spark,hudi-spark-datasource/hudi-spark-common



Re: [PR] [MINOR] Fix GitHub CI concurrency [hudi]

2024-05-30 Thread via GitHub


yihua merged PR #11362:
URL: https://github.com/apache/hudi/pull/11362


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-7779) Guarding archival to not archive unintended commits

2024-05-30 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-7779:
--
Description: 
Archiving commits from the active timeline could lead to data consistency issues on 
rare occasions. We should come up with proper guards to ensure we do not perform 
such unintended archival. 

 

The major gap we want to guard against is:

if someone disabled the cleaner, archival should account for data consistency 
issues and ensure it bails out.

We have a base guarding condition, where archival will stop at the earliest 
commit to retain based on the latest clean commit metadata. But there are a few 
other scenarios that need to be accounted for. 

 

a. Keeping aside replace commits, let's dive into the specifics for regular commits 
and delta commits.

Say the user configured clean commits to 4 and the archival configs to 5 and 6. After 
t10, the cleaner is supposed to clean up all file versions created at or before t6. 
Say the cleaner did not run (for whatever reason) for the next 5 commits. 

    Archival will certainly be guarded until earliest commit to retain based on 
latest clean commits. 

Corner case to consider: 

A savepoint was added at, say, t3 and later removed, and the cleaner was still 
never re-enabled. Even though archival would have stopped at t3 (while the 
savepoint was present), once the savepoint is removed, if archival is executed, 
it could archive commit t3. Which means the file versions tracked at t3 are still 
not yet cleaned by the cleaner. 

Reasoning: 

We are good here w.r.t. data consistency. Until the cleaner runs next, these 
older file versions might be exposed to the end user. But time travel queries are 
not intended for already cleaned up commits, and hence this is not an issue. 
None of snapshot, time travel, or incremental queries will run into issues, 
as they are not supposed to poll for t3. 

At any later point, if the cleaner is re-enabled, it will take care of cleaning up 
the file versions tracked at the t3 commit. Just that, for the interim period, some 
older file versions might still be exposed to readers. 

 

b. The trickier part is when replace commits are involved. Since the replace 
commit metadata in the active timeline is what ensures the replaced file groups are 
ignored for reads, the cleaner is expected to clean them up fully before that 
metadata is archived. But are there chances that this could go wrong? 

Corner case to consider: let's add onto the above scenario, where t3 has a 
savepoint, and t4 is a replace commit which replaced file groups tracked in t3. 

The cleaner will skip cleaning up the files tracked by t3 (due to the presence of 
the savepoint), but will clean up t4, t5 and t6. So the earliest commit to retain 
will be pointing to t6. And say the savepoint for t3 is removed, but the cleaner is 
disabled. In this state of the timeline, if archival is executed (since t3's 
savepoint is removed), archival might archive t3 and t4.rc. This could lead 
to data duplicates, as both the replaced file groups and the new file groups from 
t4.rc would be exposed as valid file groups. 

 

In other words, to summarize the different scenarios: 

i. The replaced file group is never cleaned up. 
    - ECTR (earliest commit to retain) is less than this replace commit, so we are good. 
ii. The replaced file group is cleaned up. 
    - ECTR is greater than this replace commit, so it is safe to archive.
iii. The tricky case: ECTR moved ahead of this replace commit, but due to a savepoint 
the full clean-up did not happen. After the savepoint is removed and archival is 
executed, we should avoid archiving the replace commit of interest. This is the gap 
we do not account for as of now (see the sketch below).
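An illustrative sketch only (not Hudi's archival code) of the base guard discussed 
above: archival admits an instant only if it is strictly earlier than the earliest 
commit to retain reported by the latest clean metadata. Scenario (iii) is exactly 
where this check alone stops being sufficient once a savepoint is removed before 
the cleaner catches up.

{code:java}
public class ArchivalGuardSketch {
  // Hudi instant times are zero-padded timestamps, so lexicographic comparison
  // reflects timeline ordering.
  static boolean safeToArchive(String instantTime, String earliestCommitToRetain) {
    return instantTime.compareTo(earliestCommitToRetain) < 0;
  }
}
{code}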

 

We have 3 options for solving this.

Option A: 

Let the savepoint deletion flow take care of cleaning up the files it is tracking. 

cons:

Savepoint's responsibility is not removing any data files, so from a 
single-responsibility standpoint this may not be right. Also, this clean-up might 
need to do what the clean planner already does, i.e. build the file system view, 
understand whether something is already supposed to be cleaned up, and only then 
clean up the files that qualify. For example, if a file group has only one file 
slice, it should not be cleaned up, and there are more scenarios like this. 

 

Option B:

Since archival is the one which might cause data consistency issues, why not have 
archival do the clean-up itself? 

We need to account for concurrent cleans, failure and retry scenarios, etc. 
Also, we might need to build the file system view and then take a call on whether 
something needs to be cleaned up before archiving anything. 

Cons:

Again, the single-responsibility rule might be broken. It would be neat if the 
cleaner takes care of deleting data files and archival only takes care of 
deleting/archiving timeline files. 

 

Option C:

Similar to how the cleaner maintains EarliestCommitToRetain, let the cleaner track 
another piece of metadata named "EarliestCommitToArchive". Strictly speaking, earliest 
commit to 

Re: [PR] [HUDI-7146] Integrate secondary index on reader path [hudi]

2024-05-30 Thread via GitHub


hudi-bot commented on PR #11162:
URL: https://github.com/apache/hudi/pull/11162#issuecomment-2140514821

   
   ## CI report:
   
   * a602c9c4234062e66877fc4bf2c50f94f43767bc Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24142)
 
   * 3c52961bdbcb210e4c7140f5939143cfda7adb50 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24151)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7146] Integrate secondary index on reader path [hudi]

2024-05-30 Thread via GitHub


hudi-bot commented on PR #11162:
URL: https://github.com/apache/hudi/pull/11162#issuecomment-2140488986

   
   ## CI report:
   
   * a602c9c4234062e66877fc4bf2c50f94f43767bc Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24142)
 
   * 3c52961bdbcb210e4c7140f5939143cfda7adb50 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7567] Add schema evolution to the filegroup reader [hudi]

2024-05-30 Thread via GitHub


hudi-bot commented on PR #10957:
URL: https://github.com/apache/hudi/pull/10957#issuecomment-2140461564

   
   ## CI report:
   
   * c98242b22fb2518c0cc93c037df558037030500f UNKNOWN
   * 475a1bc220eaee04fa78ba46a922b434b8306047 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24150)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] Spark-Hudi: Unable to perform Hard delete using Pyspark on HUDI table from AWS Glue [hudi]

2024-05-30 Thread via GitHub


soumilshah1995 commented on issue #11349:
URL: https://github.com/apache/hudi/issues/11349#issuecomment-2140440102

   Good to hear that your issue is resolved. Cheers!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] Spark-Hudi: Unable to perform Hard delete using Pyspark on HUDI table from AWS Glue [hudi]

2024-05-30 Thread via GitHub


Ssv-21 commented on issue #11349:
URL: https://github.com/apache/hudi/issues/11349#issuecomment-2140322503

   Actually, I was using the native Glue-based Hudi. But after going through
your blog post, I tried using the Hudi 0.14.0 / Spark 3.3 bundle jar, and it
worked.
   I believe something is wrong with the Glue-based Hudi, and it is better to
provide the jars than to use the native version.
   
   Thank you, Soumil, for your suggestions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7567] Add schema evolution to the filegroup reader [hudi]

2024-05-30 Thread via GitHub


hudi-bot commented on PR #10957:
URL: https://github.com/apache/hudi/pull/10957#issuecomment-2140301377

   
   ## CI report:
   
   * c98242b22fb2518c0cc93c037df558037030500f UNKNOWN
   * 63737caa30a0ba2ccc66b05bbeb3005d185eb4b7 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24148)
 
   * 475a1bc220eaee04fa78ba46a922b434b8306047 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24150)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7567] Add schema evolution to the filegroup reader [hudi]

2024-05-30 Thread via GitHub


hudi-bot commented on PR #10957:
URL: https://github.com/apache/hudi/pull/10957#issuecomment-2140271817

   
   ## CI report:
   
   * c98242b22fb2518c0cc93c037df558037030500f UNKNOWN
   * 540d122ed1f6c9ee56730ec85fde9f0355b5d67a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23935)
 
   * 63737caa30a0ba2ccc66b05bbeb3005d185eb4b7 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24148)
 
   * 475a1bc220eaee04fa78ba46a922b434b8306047 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



(hudi) branch master updated: [HUDI-7769] Fix Hudi CDC read on Spark 3.3.4 and 3.4.3 (#11242)

2024-05-30 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new c758508b62f [HUDI-7769] Fix Hudi CDC read on Spark 3.3.4 and 3.4.3 
(#11242)
c758508b62f is described below

commit c758508b62f0617ac95e33a490dde62cc897ab3a
Author: Y Ethan Guo 
AuthorDate: Thu May 30 09:29:00 2024 -0700

[HUDI-7769] Fix Hudi CDC read on Spark 3.3.4 and 3.4.3 (#11242)
---
 .../src/main/scala/org/apache/hudi/cdc/CDCRelation.scala  | 8 
 1 file changed, 8 insertions(+)

diff --git 
a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/cdc/CDCRelation.scala
 
b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/cdc/CDCRelation.scala
index 311383a9c32..f298efc8ed4 100644
--- 
a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/cdc/CDCRelation.scala
+++ 
b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/cdc/CDCRelation.scala
@@ -27,6 +27,7 @@ import org.apache.hudi.common.table.{HoodieTableMetaClient, 
TableSchemaResolver}
 import org.apache.hudi.exception.HoodieException
 import org.apache.hudi.internal.schema.InternalSchema
 import org.apache.hudi.{AvroConversionUtils, DataSourceReadOptions, 
HoodieDataSourceHelper, HoodieTableSchema}
+
 import org.apache.spark.internal.Logging
 import org.apache.spark.rdd.RDD
 import org.apache.spark.sql.catalyst.InternalRow
@@ -50,6 +51,8 @@ class CDCRelation(
 options: Map[String, String]
 ) extends BaseRelation with PrunedFilteredScan with Logging {
 
+  imbueConfigs(sqlContext)
+
   val spark: SparkSession = sqlContext.sparkSession
 
   val (tableAvroSchema, _) = {
@@ -118,6 +121,11 @@ class CDCRelation(
 )
 cdcRdd.asInstanceOf[RDD[InternalRow]]
   }
+
+  def imbueConfigs(sqlContext: SQLContext): Unit = {
+// Disable vectorized reading for CDC relation
+
sqlContext.sparkSession.sessionState.conf.setConfString("spark.sql.parquet.enableVectorizedReader",
 "false")
+  }
 }
 
 object CDCRelation {
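
For orientation, here is a minimal sketch of how this code path is typically
exercised from Spark's Java API. The table path and begin instant are
placeholders; only the vectorized-reader setting is taken directly from the
commit above, and the read option keys are the usual Hudi incremental/CDC
options, so they should be double-checked against the docs for your version.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class HudiCdcReadSketch {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("hudi-cdc-read")
        .master("local[2]")
        .getOrCreate();

    // CDCRelation now disables this internally via imbueConfigs(); setting it
    // here is redundant after the fix and only documents the intent.
    spark.conf().set("spark.sql.parquet.enableVectorizedReader", "false");

    // "/tmp/hudi_trips_cow" and the begin instant are placeholders.
    Dataset<Row> cdc = spark.read().format("hudi")
        .option("hoodie.datasource.query.type", "incremental")
        .option("hoodie.datasource.query.incremental.format", "cdc")
        .option("hoodie.datasource.read.begin.instanttime", "20240529000000000")
        .load("/tmp/hudi_trips_cow");

    cdc.show(false);
    spark.stop();
  }
}
```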



Re: [PR] [HUDI-7769] Fix Hudi CDC read with legacy parquet file format on Spark [hudi]

2024-05-30 Thread via GitHub


yihua merged PR #11242:
URL: https://github.com/apache/hudi/pull/11242


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7567] Add schema evolution to the filegroup reader [hudi]

2024-05-30 Thread via GitHub


hudi-bot commented on PR #10957:
URL: https://github.com/apache/hudi/pull/10957#issuecomment-2140088833

   
   ## CI report:
   
   * c98242b22fb2518c0cc93c037df558037030500f UNKNOWN
   * 540d122ed1f6c9ee56730ec85fde9f0355b5d67a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23935)
 
   * 63737caa30a0ba2ccc66b05bbeb3005d185eb4b7 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24148)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7567] Add schema evolution to the filegroup reader [hudi]

2024-05-30 Thread via GitHub


hudi-bot commented on PR #10957:
URL: https://github.com/apache/hudi/pull/10957#issuecomment-2140060183

   
   ## CI report:
   
   * c98242b22fb2518c0cc93c037df558037030500f UNKNOWN
   * 540d122ed1f6c9ee56730ec85fde9f0355b5d67a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23935)
 
   * 63737caa30a0ba2ccc66b05bbeb3005d185eb4b7 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7810] Fix OptionsResolver#allowCommitOnEmptyBatch default value… [hudi]

2024-05-30 Thread via GitHub


hudi-bot commented on PR #11359:
URL: https://github.com/apache/hudi/pull/11359#issuecomment-2140031009

   
   ## CI report:
   
   * 4b149d9085498be66c6426b0c3fde90ddf382cec UNKNOWN
   * 9ce101ca9d0c194af5b31b533c83fb21549ca8d3 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24146)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Closed] (HUDI-7407) Add optional clean support to standalone compaction and clustering jobs

2024-05-30 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit closed HUDI-7407.
-
Resolution: Fixed

> Add optional clean support to standalone compaction and clustering jobs
> ---
>
> Key: HUDI-7407
> URL: https://issues.apache.org/jira/browse/HUDI-7407
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: table-service
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> Let's add a top-level config to the standalone compaction and clustering jobs
> to optionally clean.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-7810] Fix OptionsResolver#allowCommitOnEmptyBatch default value… [hudi]

2024-05-30 Thread via GitHub


hudi-bot commented on PR #11359:
URL: https://github.com/apache/hudi/pull/11359#issuecomment-2139791741

   
   ## CI report:
   
   * c8bf966468abfcab8121f7ba7a63f8098bbf965a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24122)
 
   * 4b149d9085498be66c6426b0c3fde90ddf382cec UNKNOWN
   * 9ce101ca9d0c194af5b31b533c83fb21549ca8d3 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24146)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7407] Making clean optional in standalone compaction and clustering jobs [hudi]

2024-05-30 Thread via GitHub


codope merged PR #10668:
URL: https://github.com/apache/hudi/pull/10668


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



(hudi) branch master updated: [HUDI-7407] Making clean optional in standalone compaction and clustering jobs (#10668)

2024-05-30 Thread codope
This is an automated email from the ASF dual-hosted git repository.

codope pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new f0c1a88f8d0 [HUDI-7407] Making clean optional in standalone compaction 
and clustering jobs (#10668)
f0c1a88f8d0 is described below

commit f0c1a88f8d0de9f06d2838c32cdc276444f8afa3
Author: Sivabalan Narayanan 
AuthorDate: Thu May 30 07:50:08 2024 -0700

[HUDI-7407] Making clean optional in standalone compaction and clustering 
jobs (#10668)

* Making clean optional in standalone compaction and clustering standlaone 
jobs
---
 .../apache/hudi/utilities/HoodieClusteringJob.java |  5 +++-
 .../org/apache/hudi/utilities/HoodieCompactor.java |  8 +++--
 .../hudi/utilities/multitable/CleanTask.java   |  1 +
 .../hudi/utilities/multitable/ClusteringTask.java  |  1 +
 .../hudi/utilities/multitable/CompactionTask.java  |  1 +
 .../offlinejob/TestHoodieClusteringJob.java| 34 +-
 .../offlinejob/TestHoodieCompactorJob.java | 28 ++
 7 files changed, 49 insertions(+), 29 deletions(-)

diff --git 
a/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieClusteringJob.java
 
b/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieClusteringJob.java
index 8e017152407..0a0b1f3b886 100644
--- 
a/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieClusteringJob.java
+++ 
b/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieClusteringJob.java
@@ -92,6 +92,8 @@ public class HoodieClusteringJob {
 public String sparkMemory = null;
 @Parameter(names = {"--retry", "-rt"}, description = "number of retries")
 public int retry = 0;
+@Parameter(names = {"--skip-clean", "-sc"}, description = "do not trigger 
clean after compaction", required = false)
+public Boolean skipClean = true;
 
 @Parameter(names = {"--schedule", "-sc"}, description = "Schedule 
clustering @desperate soon please use \"--mode schedule\" instead")
 public Boolean runSchedule = false;
@@ -131,6 +133,7 @@ public class HoodieClusteringJob {
   + "   --spark-master " + sparkMaster + ", \n"
   + "   --spark-memory " + sparkMemory + ", \n"
   + "   --retry " + retry + ", \n"
+  + "   --skipClean " + skipClean + ", \n"
   + "   --schedule " + runSchedule + ", \n"
   + "   --retry-last-failed-clustering-job " + 
retryLastFailedClusteringJob + ", \n"
   + "   --mode " + runningMode + ", \n"
@@ -297,7 +300,7 @@ public class HoodieClusteringJob {
   }
 
   private void clean(SparkRDDWriteClient client) {
-if (client.getConfig().isAutoClean()) {
+if (!cfg.skipClean && client.getConfig().isAutoClean()) {
   client.clean();
 }
   }
diff --git 
a/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieCompactor.java 
b/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieCompactor.java
index 42633ee5558..e8e94126118 100644
--- 
a/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieCompactor.java
+++ 
b/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieCompactor.java
@@ -94,6 +94,8 @@ public class HoodieCompactor {
 public String sparkMemory = null;
 @Parameter(names = {"--retry", "-rt"}, description = "number of retries", 
required = false)
 public int retry = 0;
+@Parameter(names = {"--skip-clean", "-sc"}, description = "do not trigger 
clean after compaction", required = false)
+public Boolean skipClean = true;
 @Parameter(names = {"--schedule", "-sc"}, description = "Schedule 
compaction", required = false)
 public Boolean runSchedule = false;
 @Parameter(names = {"--mode", "-m"}, description = "Set job mode: Set 
\"schedule\" means make a compact plan; "
@@ -124,6 +126,7 @@ public class HoodieCompactor {
   + "   --schema-file " + schemaFile + ", \n"
   + "   --spark-master " + sparkMaster + ", \n"
   + "   --spark-memory " + sparkMemory + ", \n"
+  + "   --skipClean " + skipClean + ", \n"
   + "   --retry " + retry + ", \n"
   + "   --schedule " + runSchedule + ", \n"
   + "   --mode " + runningMode + ", \n"
@@ -150,6 +153,7 @@ public class HoodieCompactor {
   && Objects.equals(sparkMaster, config.sparkMaster)
   && Objects.equals(sparkMemory, config.sparkMemory)
   && Objects.equals(retry, config.retry)
+  && Objects.equals(skipClean, config.skipClean)
   && Objects.equals(runSchedule, config.runSchedule)
   && Objects.equals(runningMode, config.runningMode)
   && Objects.equals(strategyClassName, config.strategyClassName)
@@ -160,7 +164,7 @@ public class HoodieCompactor {
 @Override
 public int hashCode() {
   return Objects.hash(basePath, tableName, compactionInstantTime, 
schemaFile,
-  sparkMaster, parallelism, sparkMemory, retry, 
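
A hedged sketch of how the new flag could be used when the clustering job is
driven programmatically rather than via spark-submit. The Config field names
come from the diff above, but the constructor and cluster(...) entry point are
assumptions about the utilities module; the base path and table name are
placeholders.

```java
import org.apache.hudi.utilities.HoodieClusteringJob;
import org.apache.spark.api.java.JavaSparkContext;

public class ClusteringJobSketch {
  public static void run(JavaSparkContext jsc) {
    HoodieClusteringJob.Config cfg = new HoodieClusteringJob.Config();
    cfg.basePath = "file:///tmp/hudi_table";  // placeholder
    cfg.tableName = "hudi_table";             // placeholder
    cfg.runningMode = "scheduleAndExecute";
    // New in this commit: cleaning after clustering is skipped by default
    // (skipClean = true). Flip it to false to keep the old behavior of
    // cleaning when auto-clean is enabled.
    cfg.skipClean = false;

    HoodieClusteringJob job = new HoodieClusteringJob(jsc, cfg);  // assumed ctor
    job.cluster(cfg.retry);                                       // assumed entry point
  }
}
```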

Re: [PR] [HUDI-7810] Fix OptionsResolver#allowCommitOnEmptyBatch default value… [hudi]

2024-05-30 Thread via GitHub


hudi-bot commented on PR #11359:
URL: https://github.com/apache/hudi/pull/11359#issuecomment-2139638319

   
   ## CI report:
   
   * c8bf966468abfcab8121f7ba7a63f8098bbf965a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24122)
 
   * 4b149d9085498be66c6426b0c3fde90ddf382cec UNKNOWN
   * 9ce101ca9d0c194af5b31b533c83fb21549ca8d3 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7810] Fix OptionsResolver#allowCommitOnEmptyBatch default value… [hudi]

2024-05-30 Thread via GitHub


usberkeley commented on code in PR #11359:
URL: https://github.com/apache/hudi/pull/11359#discussion_r1620794903


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/OptionsResolver.java:
##
@@ -370,7 +370,7 @@ public static ConflictResolutionStrategy 
getConflictResolutionStrategy(Configura
* Returns whether to commit even when current batch has no data, for flink 
defaults false
*/
   public static boolean allowCommitOnEmptyBatch(Configuration conf) {
-return conf.getBoolean(HoodieWriteConfig.ALLOW_EMPTY_COMMIT.key(), false);
+return conf.getBoolean(HoodieWriteConfig.ALLOW_EMPTY_COMMIT.key(), 
HoodieWriteConfig.ALLOW_EMPTY_COMMIT.defaultValue());

Review Comment:
   After correcting the default return value of 
OptionsResolver#allowCommitOnEmptyBatch to "true", 
StreamWriteOperatorCoordinator submits an empty commit or delta commit when a 
checkpoint completes. When the program then queries the latest commit, that 
commit is empty, so the returned result is also empty and the unit test 
eventually fails.
   
   Modification plan:
   When creating a Hudi table, set hoodie.allow.empty.commit = false (see the 
sketch after this message).
   
   Other solutions:
   We could change the default value of "hoodie.allow.empty.commit" itself, but 
I personally think that is not good enough, for this reason:
   Changing the default of "hoodie.allow.empty.commit" to "false" would 
contradict the documented and coded default of "true", and submitting an empty 
commit by default is important in Flink because it lets us track the entire 
life cycle. Therefore, I do not adopt this solution.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
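
A minimal sketch of the behavior described in the review comment above,
assuming the Flink Configuration API. The first call relies on the corrected
fallback to HoodieWriteConfig.ALLOW_EMPTY_COMMIT.defaultValue() (true); the
second shows how a test table can pin the option to false, per the
modification plan.

```java
import org.apache.flink.configuration.Configuration;
import org.apache.hudi.config.HoodieWriteConfig;
import org.apache.hudi.configuration.OptionsResolver;

public class EmptyCommitOptionSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();

    // Option not set: with the corrected fallback this resolves to the
    // HoodieWriteConfig default (true) instead of a hard-coded false.
    System.out.println(OptionsResolver.allowCommitOnEmptyBatch(conf)); // true

    // What the test tables would set explicitly, per the plan above.
    conf.setBoolean(HoodieWriteConfig.ALLOW_EMPTY_COMMIT.key(), false);
    System.out.println(OptionsResolver.allowCommitOnEmptyBatch(conf)); // false
  }
}
```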



Re: [PR] [HUDI-7407] Making clean optional in standalone compaction and clustering jobs [hudi]

2024-05-30 Thread via GitHub


hudi-bot commented on PR #10668:
URL: https://github.com/apache/hudi/pull/10668#issuecomment-2139636243

   
   ## CI report:
   
   * 5a6c7723f716d5719a8011150f73077ab1ba3a1f Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24145)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7810] Fix OptionsResolver#allowCommitOnEmptyBatch default value… [hudi]

2024-05-30 Thread via GitHub


usberkeley commented on code in PR #11359:
URL: https://github.com/apache/hudi/pull/11359#discussion_r1620734871


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/OptionsResolver.java:
##
@@ -370,7 +370,7 @@ public static ConflictResolutionStrategy 
getConflictResolutionStrategy(Configura
* Returns whether to commit even when current batch has no data, for flink 
defaults false
*/
   public static boolean allowCommitOnEmptyBatch(Configuration conf) {
-return conf.getBoolean(HoodieWriteConfig.ALLOW_EMPTY_COMMIT.key(), false);
+return conf.getBoolean(HoodieWriteConfig.ALLOW_EMPTY_COMMIT.key(), 
HoodieWriteConfig.ALLOW_EMPTY_COMMIT.defaultValue());

Review Comment:
   Why the original TestHoodieFlinkQuickstart used to pass but now fails:
   After correcting the default return value of 
OptionsResolver#allowCommitOnEmptyBatch to "true", 
StreamWriteOperatorCoordinator submits an empty commit or delta commit when a 
checkpoint completes. When the test then queries the latest commit, that 
commit is empty, so the returned result is also empty and the unit test fails.
   
   Modification plan:
   When creating a Hudi table, set hoodie.allow.empty.commit = false
   
   Other solutions:
   You can change the default value itself, but I personally think that is not 
good enough. The reason is:
   Changing hoodie.allow.empty.commit to false would contradict the documented 
and coded default of true, and submitting an empty commit by default is 
important in Flink because it lets us track the entire life cycle. Therefore, 
I do not adopt this solution.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7810] Fix OptionsResolver#allowCommitOnEmptyBatch default value… [hudi]

2024-05-30 Thread via GitHub


hudi-bot commented on PR #11359:
URL: https://github.com/apache/hudi/pull/11359#issuecomment-2139622234

   
   ## CI report:
   
   * c8bf966468abfcab8121f7ba7a63f8098bbf965a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24122)
 
   * 4b149d9085498be66c6426b0c3fde90ddf382cec UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



(hudi) annotated tag release-0.15.0-rc3 updated (d0df1d4a94d -> 987b4dd1741)

2024-05-30 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a change to annotated tag release-0.15.0-rc3
in repository https://gitbox.apache.org/repos/asf/hudi.git


*** WARNING: tag release-0.15.0-rc3 was modified! ***

from d0df1d4a94d (commit)
  to 987b4dd1741 (tag)
 tagging d0df1d4a94d13cfc061faaf1a9573c886811c104 (commit)
 replaces release-0.15.0-rc2
  by Y Ethan Guo
  on Thu May 30 06:57:38 2024 -0700

- Log -
release-0.15.0
-BEGIN PGP SIGNATURE-

iQIzBAABCAAdFiEEDE0xZCfsqnGiCtlma+HUVMkPXqUFAmZYhdIACgkQa+HUVMkP
XqWSBw/8CQjJg5FX+NzO9xqWwNXvki6vAVwF2IDHdcDRh3L10w3WGF8K4J/+aqGP
0UZgC9FlmX/pNQoAHJ5HmUl3lElDj1/K/1ek6wp7HGGuRVvPLOoCWmYnYXqH0apa
QNcdTxUG0sgDb0NL7us4eVCVpwgW45+e4NmCJONEMCFtn2MmnWROG1Anj/AaF6WY
WXAfh8N8zMnPTE2hopBRRLGCf9wrh8s8GqsL+qx6Jmp9rSYo+9xW/Xc0BflEF2Sl
Regg9wWzSN1ukqFf8dI7PA1qnBGhITMCIfurrcKQG5O0jG04lawCPT/VnX06UjRJ
3Q9zX7WpkgrCGi8t2uIOoVOrRvJPospytSNTRpPGAPjqz04d2QYsuzyhwXjfhXqK
5XuKU3Ps8wqhvGsU6rYbZnI41MVOSrwJHjNq1kq+YR7jD/tZIzYr2luMViIkkm+X
MQwvl8q/qLo3YxNGjUQrUZtgMlibWaDdsKCFBjDGPBPhS2gNMnnJ9a40U12pdVdr
R9y6aWPCuxCImu7PhTwA8GJIPZe4oXQU++0Tdm3ucwvWDuD2wReYfRHs2GezfLHN
keo220YqDjxWInJ6TLeHAM98ApBzgmB2lQNzTETR8zDnLCpOE90OfF0OpJQerU7Q
R+OMsT/ncoSw3ZZfEq91qxIV4d4fRMjffcSyLKs/bff1ESNCPJc=
=0Jvi
-END PGP SIGNATURE-
---


No new revisions were added by this update.

Summary of changes:



svn commit: r69471 - in /dev/hudi/hudi-0.15.0-rc3: ./ hudi-0.15.0-rc3.src.tgz hudi-0.15.0-rc3.src.tgz.asc hudi-0.15.0-rc3.src.tgz.sha512

2024-05-30 Thread yihua
Author: yihua
Date: Thu May 30 13:52:32 2024
New Revision: 69471

Log:
Add Apache Hudi 0.15.0 RC3 source release

Added:
dev/hudi/hudi-0.15.0-rc3/
dev/hudi/hudi-0.15.0-rc3/hudi-0.15.0-rc3.src.tgz   (with props)
dev/hudi/hudi-0.15.0-rc3/hudi-0.15.0-rc3.src.tgz.asc
dev/hudi/hudi-0.15.0-rc3/hudi-0.15.0-rc3.src.tgz.sha512

Added: dev/hudi/hudi-0.15.0-rc3/hudi-0.15.0-rc3.src.tgz
==
Binary file - no diff available.

Propchange: dev/hudi/hudi-0.15.0-rc3/hudi-0.15.0-rc3.src.tgz
--
svn:mime-type = application/octet-stream

Added: dev/hudi/hudi-0.15.0-rc3/hudi-0.15.0-rc3.src.tgz.asc
==
--- dev/hudi/hudi-0.15.0-rc3/hudi-0.15.0-rc3.src.tgz.asc (added)
+++ dev/hudi/hudi-0.15.0-rc3/hudi-0.15.0-rc3.src.tgz.asc Thu May 30 13:52:32 
2024
@@ -0,0 +1,16 @@
+-BEGIN PGP SIGNATURE-
+
+iQIzBAABCAAdFiEEiIqTQeYA64VQqs1e+xt1BPf3cMkFAmZYJ2cACgkQ+xt1BPf3
+cMmPyBAAmdcXwkeX3aTkvu/GwrgX7KOcfSSRzoNgPnQaLqv7Y8H+8M97O2auIoCY
+10ymaSaqEr9gVZ56fwx0q3YAQJKRdrc+jZ4m70OFVM602Gs0itDLT1SEn4c7LOK/
+YAsNFCBD+vfZRH3vUERDTMlmmHgOF48cnNw/SOdKTJOT/LYr+G9CrzWIQhTg44C2
+JElQNZj+3Sv1J0foVm2Fmsva7DB6JOYF3bpy0VZvqJRZgWBMc9Nuj3lcRlK0qGvl
+OZ5sr2T6czMt4CELj6wtSMOEL1knlc+luJLbrwueO2srRu3Kl/fhU/KYPmiaWqA9
+e+SUIJ5lJMUU/Dn+rnV4m9SDIOGcHnf+rJFC0C+0ALfIo76GAqYucF6ALfsEFgMn
+vAoOzJ0SZ2fIlqavG3U/0YaO92457Tqmsnr6ahsf9LoavUDLleRng0+OiKuzLa/H
+Ick1qQVDLSZrf3gfqIWmVldaovWBOo1A1jaFCGjz41b5CUsPbT8VFCGpMyuI49Ns
+LUnglCcXfLXzcuLxy3awhJp3YGYC8m3ombg/HtFGBIq/4XH8e8Q1FCdzgr2GSgdz
+2F7JvK9ruxq3JTqacPIJKTW4TwuQpQtWvARgWDP5cHGmTGprzF5DG2s2FvHr+PrV
++Fp3G+6B/RG08TO8oj41OdLJ5D2ImXF5VHzOyok4Ijo8SEWb9qE=
+=Nl7f
+-END PGP SIGNATURE-

Added: dev/hudi/hudi-0.15.0-rc3/hudi-0.15.0-rc3.src.tgz.sha512
==
--- dev/hudi/hudi-0.15.0-rc3/hudi-0.15.0-rc3.src.tgz.sha512 (added)
+++ dev/hudi/hudi-0.15.0-rc3/hudi-0.15.0-rc3.src.tgz.sha512 Thu May 30 13:52:32 
2024
@@ -0,0 +1 @@
+3bf244b3a396e66849b9eb6e96e5e2debd5aba282a469249741cb827a0cceb7d92235f6ffe276e80baacec8fe797ec0180fdfd1fa784804bdb5920c9f0d7e892
  hudi-0.15.0-rc3.src.tgz




Re: [PR] [HUDI-7407] Making clean optional in standalone compaction and clustering jobs [hudi]

2024-05-30 Thread via GitHub


hudi-bot commented on PR #10668:
URL: https://github.com/apache/hudi/pull/10668#issuecomment-2139516375

   
   ## CI report:
   
   * b24eafcc00d5cf4a27ae7f9d7e70b1bfc5a12b1a Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24111)
 
   * 5a6c7723f716d5719a8011150f73077ab1ba3a1f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24145)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7407] Making clean optional in standalone compaction and clustering jobs [hudi]

2024-05-30 Thread via GitHub


hudi-bot commented on PR #10668:
URL: https://github.com/apache/hudi/pull/10668#issuecomment-2139501982

   
   ## CI report:
   
   * b24eafcc00d5cf4a27ae7f9d7e70b1bfc5a12b1a Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24111)
 
   * 5a6c7723f716d5719a8011150f73077ab1ba3a1f UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7407] Making clean optional in standalone compaction and clustering jobs [hudi]

2024-05-30 Thread via GitHub


codope commented on code in PR #10668:
URL: https://github.com/apache/hudi/pull/10668#discussion_r1620652100


##
hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieClusteringJob.java:
##
@@ -92,6 +92,8 @@ public static class Config implements Serializable {
 public String sparkMemory = null;
 @Parameter(names = {"--retry", "-rt"}, description = "number of retries")
 public int retry = 0;
+@Parameter(names = {"--skip-clean", "-sc"}, description = "do not trigger 
clean after compaction", required = false)
+public Boolean skipClean = true;

Review Comment:
   Not changing; it should be fine as it's an offline job.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7007] Add bloom_filters index support on read side [hudi]

2024-05-30 Thread via GitHub


KnightChess commented on code in PR #11043:
URL: https://github.com/apache/hudi/pull/11043#discussion_r1620531072


##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestBloomFiltersIndexSupport.scala:
##
@@ -0,0 +1,261 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.functional
+
+import org.apache.hudi.DataSourceWriteOptions._
+import org.apache.hudi.common.config.{HoodieMetadataConfig, TypedProperties}
+import org.apache.hudi.common.model.{FileSlice, HoodieTableType}
+import org.apache.hudi.common.table.{HoodieTableConfig, HoodieTableMetaClient}
+import org.apache.hudi.common.testutils.RawTripTestPayload.recordsToStrings
+import org.apache.hudi.config.HoodieWriteConfig
+import org.apache.hudi.metadata.HoodieMetadataFileSystemView
+import org.apache.hudi.testutils.HoodieSparkClientTestBase
+import org.apache.hudi.util.{JFunction, JavaConversions}
+import org.apache.hudi.{DataSourceReadOptions, DataSourceWriteOptions, 
HoodieFileIndex}
+import org.apache.spark.sql.catalyst.expressions.{AttributeReference, EqualTo, 
Expression, Literal}
+import org.apache.spark.sql.functions.{col, not}
+import org.apache.spark.sql.types.StringType
+import org.apache.spark.sql.{DataFrame, Row, SaveMode, SparkSession}
+import org.junit.jupiter.api.Assertions.{assertEquals, assertTrue}
+import org.junit.jupiter.api.{AfterEach, BeforeEach, Test}
+import org.junit.jupiter.params.ParameterizedTest
+import org.junit.jupiter.params.provider.EnumSource
+
+import java.util.concurrent.atomic.AtomicInteger
+import java.util.stream.Collectors
+import scala.collection.JavaConverters._
+import scala.collection.{JavaConverters, mutable}
+
+class TestBloomFiltersIndexSupport extends HoodieSparkClientTestBase {
+
+  val sqlTempTable = "hudi_tbl_bloom"
+  var spark: SparkSession = _
+  var instantTime: AtomicInteger = _
+  val metadataOpts: Map[String, String] = Map(
+HoodieMetadataConfig.ENABLE.key -> "true",
+HoodieMetadataConfig.ENABLE_METADATA_INDEX_BLOOM_FILTER.key -> "true",
+HoodieMetadataConfig.BLOOM_FILTER_INDEX_FOR_COLUMNS.key -> "_row_key"
+  )
+  val commonOpts: Map[String, String] = Map(
+"hoodie.insert.shuffle.parallelism" -> "4",
+"hoodie.upsert.shuffle.parallelism" -> "4",
+HoodieWriteConfig.TBL_NAME.key -> "hoodie_test",
+RECORDKEY_FIELD.key -> "_row_key",
+PARTITIONPATH_FIELD.key -> "partition",
+PRECOMBINE_FIELD.key -> "timestamp",
+HoodieTableConfig.POPULATE_META_FIELDS.key -> "true"
+  ) ++ metadataOpts
+  var mergedDfList: List[DataFrame] = List.empty
+
+  @BeforeEach
+  override def setUp(): Unit = {
+initPath()
+initSparkContexts()
+initHoodieStorage()
+initTestDataGenerator()
+
+setTableName("hoodie_test")
+initMetaClient()
+
+instantTime = new AtomicInteger(1)
+
+spark = sqlContext.sparkSession
+  }
+
+  @AfterEach
+  override def tearDown(): Unit = {
+cleanupFileSystem()
+cleanupSparkContexts()
+  }
+
+  @ParameterizedTest
+  @EnumSource(classOf[HoodieTableType])
+  def testIndexInitialization(tableType: HoodieTableType): Unit = {
+val hudiOpts = commonOpts + (DataSourceWriteOptions.TABLE_TYPE.key -> 
tableType.name())
+doWriteAndValidateBloomFilters(
+  hudiOpts,
+  operation = DataSourceWriteOptions.INSERT_OPERATION_OPT_VAL,
+  saveMode = SaveMode.Overwrite)
+  }
+
+  /**
+   * Test case to do a write with updates and then validate file pruning using 
bloom filters.
+   */
+  @Test
+  def testBloomFiltersIndexFilePruning(): Unit = {
+var hudiOpts = commonOpts
+hudiOpts = hudiOpts + (
+  DataSourceReadOptions.ENABLE_DATA_SKIPPING.key -> "true")
+
+doWriteAndValidateBloomFilters(
+  hudiOpts,
+  operation = DataSourceWriteOptions.INSERT_OPERATION_OPT_VAL,
+  saveMode = SaveMode.Overwrite,
+  shouldValidate = false)
+doWriteAndValidateBloomFilters(
+  hudiOpts,
+  operation = DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL,
+  saveMode = SaveMode.Append)
+
+createTempTable(hudiOpts)
+verifyQueryPredicate(hudiOpts)
+  }
+
+  private def createTempTable(hudiOpts: Map[String, String]): Unit = {
+val readDf = 

[I] [SUPPORT] using spark's observe feature on dataframes saved by hudi is stuck [hudi]

2024-05-30 Thread via GitHub


szingerpeter opened a new issue, #11367:
URL: https://github.com/apache/hudi/issues/11367

   **Describe the problem you faced**
   
   When trying to use the 
[observe](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.observe.html)
 function on DataFrames saved by Hudi, the application gets stuck after saving 
the data, when trying to retrieve the observed statistics.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   ```
   from pyspark.sql import DataFrame, Observation
   from pyspark.sql import functions as F
   observation = Observation()
   
   df = spark.createDataFrame([[1, 1], [2, 2], [3, 3], [4, 4]])
   
   df = df.observe(observation, F.count(F.lit(1)).alias('row_count'))
   
   
df.write.format('csv').mode('overwrite').save('file:/opt/spark/work-dir/test_csv')
   
   observation.get # returns: {'row_count': 4}
   
   observation2 = Observation()
   df2 = spark.createDataFrame([[1, 1], [2, 2], [3, 3], [4, 4]])
   
   hudi_options = {
   'hoodie.table.name': 'test',
   'hoodie.datasource.write.recordkey.field': '_1',
   'hoodie.datasource.write.partitionpath.field': '',
   'hoodie.datasource.write.table.name': 'test',
   'hoodie.datasource.write.operation': 'insert_overwrite',
   'hoodie.datasource.write.precombine.field': '_2',
   }
   
   df2 = df2.observe(observation2, F.count(F.lit(1)).alias('row_count'))
   
   df.write.format("hudi").\
   options(**hudi_options).\
   mode("overwrite").\
   save('file:/opt/spark/work-dir/test')
   
   observation2.get # gets stuck
   
   ```
   
   Disclaimer: I know there are Hudi metrics and callbacks; however, I would 
like to add some more advanced quality checks to our applications.
   
   **Environment Description**
   
   * Hudi version : 0.13.0-amzn-0
   
   * Spark version : 3.3.2
   
   * Hive version : 3.1.3
   
   * Hadoop version : 3.3.3
   
   * Storage (HDFS/S3/GCS..) : HDFS/S3 both
   
   * Running on Docker? (yes/no) : both
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7146] Implement secondary index write path [hudi]

2024-05-30 Thread via GitHub


codope merged PR #11146:
URL: https://github.com/apache/hudi/pull/11146


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



(hudi) branch master updated: [HUDI-7146] Implement secondary index write path (#11146)

2024-05-30 Thread codope
This is an automated email from the ASF dual-hosted git repository.

codope pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new cd62c31f368 [HUDI-7146] Implement secondary index write path (#11146)
cd62c31f368 is described below

commit cd62c31f368d6939c246bd58b77887104c4ca776
Author: Sagar Sumit 
AuthorDate: Thu May 30 15:56:51 2024 +0530

[HUDI-7146] Implement secondary index write path (#11146)

Main changes in this PR are for secondary index write path:

New index type added in MetadataPartitionType
Initialization of the new index in HoodieBackedTableMetadataWriter
Util methods to support index creation and update in HoodieTableMetadataUtil
Changes to HoodieBackedTableMetadataWriter to handle update and deletes for 
secondary index.
New APIs in HoodieTableMetadata and their implementation in 
BaseTableMetadata and HoodieBackedTableMetadata to load secondary index.
Changes in HoodieMergedLogRecordScanner to merge secondary index payloads.
---
 .../org/apache/hudi/config/HoodieWriteConfig.java  |  18 +-
 .../org/apache/hudi/index/HoodieIndexUtils.java|   4 +-
 .../metadata/HoodieBackedTableMetadataWriter.java  | 215 -
 .../action/index/ScheduleIndexActionExecutor.java  |   2 +-
 .../BaseHoodieFunctionalIndexClient.java   |   2 +-
 .../apache/hudi/index/TestHoodieIndexUtils.java|  14 +-
 .../FlinkHoodieBackedTableMetadataWriter.java  |  15 +-
 .../JavaHoodieBackedTableMetadataWriter.java   |  15 +-
 .../SparkHoodieBackedTableMetadataWriter.java  |  27 ++-
 hudi-common/src/main/avro/HoodieMetadata.avsc  |  28 +++
 ...lIndexConfig.java => HoodieIndexingConfig.java} |  29 +--
 .../hudi/common/config/HoodieMetadataConfig.java   |  32 +++
 ...xDefinition.java => HoodieIndexDefinition.java} |  30 +--
 ...IndexMetadata.java => HoodieIndexMetadata.java} |  31 +--
 .../hudi/common/table/HoodieTableMetaClient.java   |  54 +++---
 .../common/table/log/HoodieFileSliceReader.java|   4 +-
 .../hudi/common/table/log/LogFileIterator.java |   0
 .../hudi/keygen/constant/KeyGeneratorOptions.java  |   7 +
 .../apache/hudi/metadata/BaseTableMetadata.java|  13 ++
 .../hudi/metadata/HoodieBackedTableMetadata.java   | 128 +---
 .../hudi/metadata/HoodieMetadataPayload.java   |  80 +++-
 .../hudi/metadata/HoodieTableMetadataUtil.java | 152 ++-
 .../hudi/metadata/MetadataPartitionType.java   |  32 ++-
 .../hudi/metadata/TestMetadataPartitionType.java   |  30 ++-
 .../hudi/HoodieSparkFunctionalIndexClient.java |  22 ++-
 .../scala/org/apache/hudi/DataSourceOptions.scala  |   7 +
 .../org/apache/hudi/FunctionalIndexSupport.scala   |   6 +-
 .../org/apache/hudi/HoodieSparkSqlWriter.scala |   1 +
 .../spark/sql/hudi/command/IndexCommands.scala |   2 +-
 .../hudi/functional/RecordLevelIndexTestBase.scala |  19 ++
 .../hudi/functional/SecondaryIndexTestBase.scala   |  65 +++
 .../functional}/TestFunctionalIndex.scala  |  25 +--
 .../functional/TestSecondaryIndexWithSql.scala |  98 ++
 33 files changed, 995 insertions(+), 212 deletions(-)

diff --git 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
index be32ad8ac34..86a412fac64 100644
--- 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
+++ 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
@@ -27,7 +27,7 @@ import org.apache.hudi.common.config.ConfigGroups;
 import org.apache.hudi.common.config.ConfigProperty;
 import org.apache.hudi.common.config.HoodieCommonConfig;
 import org.apache.hudi.common.config.HoodieConfig;
-import org.apache.hudi.common.config.HoodieFunctionalIndexConfig;
+import org.apache.hudi.common.config.HoodieIndexingConfig;
 import org.apache.hudi.common.config.HoodieMemoryConfig;
 import org.apache.hudi.common.config.HoodieMetadataConfig;
 import org.apache.hudi.common.config.HoodieMetaserverConfig;
@@ -801,7 +801,7 @@ public class HoodieWriteConfig extends HoodieConfig {
   private HoodieCommonConfig commonConfig;
   private HoodieStorageConfig storageConfig;
   private HoodieTimeGeneratorConfig timeGeneratorConfig;
-  private HoodieFunctionalIndexConfig functionalIndexConfig;
+  private HoodieIndexingConfig indexingConfig;
   private EngineType engineType;
 
   /**
@@ -1199,7 +1199,7 @@ public class HoodieWriteConfig extends HoodieConfig {
 this.storageConfig = 
HoodieStorageConfig.newBuilder().fromProperties(props).build();
 this.timeGeneratorConfig = 
HoodieTimeGeneratorConfig.newBuilder().fromProperties(props)
 .withDefaultLockProvider(!isLockRequired()).build();
-this.functionalIndexConfig = 

Re: [PR] [HUDI-7815] Multiple writer with bulkinsert getAllPendingClusteringPlans should refresh timeline [hudi]

2024-05-30 Thread via GitHub


hudi-bot commented on PR #11365:
URL: https://github.com/apache/hudi/pull/11365#issuecomment-2139068484

   
   ## CI report:
   
   * 8147454d905761bd2256aac273ef69aa1e56fba8 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24143)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7146] Integrate secondary index on reader path [hudi]

2024-05-30 Thread via GitHub


hudi-bot commented on PR #11162:
URL: https://github.com/apache/hudi/pull/11162#issuecomment-2139067855

   
   ## CI report:
   
   * a602c9c4234062e66877fc4bf2c50f94f43767bc Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24142)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] Hudi Sink Connector shows broker disconnected [hudi]

2024-05-30 Thread via GitHub


prabodh1194 commented on issue #9070:
URL: https://github.com/apache/hudi/issues/9070#issuecomment-2139020981

   But I am still facing a bunch of issues in the Java classpath.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7815] Multiple writer with bulkinsert getAllPendingClusteringPlans should refresh timeline [hudi]

2024-05-30 Thread via GitHub


xuzifu666 commented on code in PR #11365:
URL: https://github.com/apache/hudi/pull/11365#discussion_r1620230806


##
hudi-common/src/main/java/org/apache/hudi/common/util/ClusteringUtils.java:
##
@@ -69,7 +69,7 @@ public class ClusteringUtils {
   public static Stream> 
getAllPendingClusteringPlans(
   HoodieTableMetaClient metaClient) {
 List pendingReplaceInstants =
-
metaClient.getActiveTimeline().filterPendingReplaceTimeline().getInstants();
+
metaClient.reloadActiveTimeline().filterPendingReplaceTimeline().getInstants();

Review Comment:
   With multiple writers this is not done, and a heartbeat that is not set 
long enough can cause it. All subsequent jobs would then fail.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
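
A minimal sketch of the proposed change in isolation: reloading the active
timeline re-reads it from storage, so replace instants written by a concurrent
bulk-insert writer become visible before filtering. The method calls appear in
the diff under discussion; the import paths are written from memory and may
need adjusting.

```java
import java.util.List;

import org.apache.hudi.common.table.HoodieTableMetaClient;
import org.apache.hudi.common.table.timeline.HoodieInstant;

public class PendingClusteringSketch {
  public static List<HoodieInstant> pendingReplaceInstants(HoodieTableMetaClient metaClient) {
    // reloadActiveTimeline() refreshes the cached timeline, which is the
    // essence of this PR; getActiveTimeline() would reuse a possibly stale view.
    return metaClient.reloadActiveTimeline()
        .filterPendingReplaceTimeline()
        .getInstants();
  }
}
```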



Re: [PR] [HUDI-7815] Multiple writer with bulkinsert getAllPendingClusteringPlans should refresh timeline [hudi]

2024-05-30 Thread via GitHub


danny0405 commented on code in PR #11365:
URL: https://github.com/apache/hudi/pull/11365#discussion_r1620205175


##
hudi-common/src/main/java/org/apache/hudi/common/util/ClusteringUtils.java:
##
@@ -69,7 +69,7 @@ public class ClusteringUtils {
   public static Stream> 
getAllPendingClusteringPlans(
   HoodieTableMetaClient metaClient) {
 List pendingReplaceInstants =
-
metaClient.getActiveTimeline().filterPendingReplaceTimeline().getInstants();
+
metaClient.reloadActiveTimeline().filterPendingReplaceTimeline().getInstants();

Review Comment:
   It looks like all the invokers already have the refreshed timeline.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7146] Implement secondary index write path [hudi]

2024-05-30 Thread via GitHub


hudi-bot commented on PR #11146:
URL: https://github.com/apache/hudi/pull/11146#issuecomment-2138926167

   
   ## CI report:
   
   * 470bc5f44e7a6658a8717ef1b77e92afcdd90087 UNKNOWN
   * 43f73661f79eb87ac52d29fa153b996a15f29b99 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24141)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7007] Add bloom_filters index support on read side [hudi]

2024-05-30 Thread via GitHub


hudi-bot commented on PR #11043:
URL: https://github.com/apache/hudi/pull/11043#issuecomment-2138904248

   
   ## CI report:
   
   * 541b544049e68b3d22cdf0f5159fbd9b0005d345 UNKNOWN
   * 6ece7645a69b367901c71ab78dea15f39d69fca5 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24140)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[I] [SUPPORT] CVE problems in latest 0.14.1 [hudi]

2024-05-30 Thread via GitHub


Smith-Cruise opened a new issue, #11366:
URL: https://github.com/apache/hudi/issues/11366

   Jars with CVEs were introduced by `hudi-common` (as transitive dependencies 
of `hbase-server` and `hbase-client`).
   Could you let me know if the community plans to resolve these vulnerable 
dependencies?
   
   ```bash
   lib/hbase-protocol-shaded-2.4.18.jar
   ====================================
   Total: 49 (UNKNOWN: 0, LOW: 0, MEDIUM: 3, HIGH: 26, CRITICAL: 20)

   LIBRARY                                      | VULNERABILITY ID | SEVERITY | INSTALLED VERSION | FIXED VERSION | TITLE
   ---------------------------------------------+------------------+----------+-------------------+---------------+------------------------------------------
   com.fasterxml.jackson.core:jackson-databind  | CVE-2017-15095   | CRITICAL | 2.4.0             | 2.9.4, 2.8.11 | jackson-databind: Unsafe

   lib/htrace-core4-4.2.0-incubating.jar
   =====================================
   Total: 49 (UNKNOWN: 0, LOW: 0, MEDIUM: 3, HIGH: 26, CRITICAL: 20)

   LIBRARY                                      | VULNERABILITY ID | SEVERITY | INSTALLED VERSION | FIXED VERSION | TITLE
   ---------------------------------------------+------------------+----------+-------------------+---------------+------------------------------------------
   com.fasterxml.jackson.core:jackson-databind  | CVE-2017-15095   | CRITICAL | 2.4.0             | 2.9.4, 2.8.11 | jackson-databind: Unsafe deserialization
                                                |                  |          |                   |               | due to incomplete black list (incomplete
                                                |                  |          |                   |               | fix for CVE-2017-7525)...
                                                |                  |          |                   |               | -->avd.aquasec.com/nvd/cve-2017-15095
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7815] Multiple writer with bulkinsert getAllPendingClusteringPlans should refresh timeline [hudi]

2024-05-30 Thread via GitHub


hudi-bot commented on PR #11365:
URL: https://github.com/apache/hudi/pull/11365#issuecomment-2138822739

   
   ## CI report:
   
   * 8147454d905761bd2256aac273ef69aa1e56fba8 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24143)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7146] Integrate secondary index on reader path [hudi]

2024-05-30 Thread via GitHub


hudi-bot commented on PR #11162:
URL: https://github.com/apache/hudi/pull/11162#issuecomment-2138822123

   
   ## CI report:
   
   * 9d0e80222f6cc69b2dba6f4cdbfc642f31a95e52 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24124)
 
   * a602c9c4234062e66877fc4bf2c50f94f43767bc Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24142)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7146] Implement secondary index write path [hudi]

2024-05-30 Thread via GitHub


hudi-bot commented on PR #11146:
URL: https://github.com/apache/hudi/pull/11146#issuecomment-2138821957

   
   ## CI report:
   
   * 470bc5f44e7a6658a8717ef1b77e92afcdd90087 UNKNOWN
   * e8a4507886bc97b1819ea39788f2abd7385b8cf2 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24139)
 
   * 43f73661f79eb87ac52d29fa153b996a15f29b99 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24141)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7007] Add bloom_filters index support on read side [hudi]

2024-05-30 Thread via GitHub


hudi-bot commented on PR #11043:
URL: https://github.com/apache/hudi/pull/11043#issuecomment-2138821662

   
   ## CI report:
   
   * 541b544049e68b3d22cdf0f5159fbd9b0005d345 UNKNOWN
   * 87c15b2c23430d967749dede5e09d74a33dcce88 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24128)
 
   * 6ece7645a69b367901c71ab78dea15f39d69fca5 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24140)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] With autogenerated keys HoodieStreamer failing with error - ts(Part -ts) field not found in record [hudi]

2024-05-30 Thread via GitHub


Sarfaraz-214 commented on issue #10233:
URL: https://github.com/apache/hudi/issues/10233#issuecomment-2138816691

   Hi @nsivabalan 
   I am already using INSERT mode. Shared all the configs above.
   
   `hoodie.spark.sql.insert.into.operation=insert`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7815] Multiple writer with bulkinsert getAllPendingClusteringPlans should refresh timeline [hudi]

2024-05-30 Thread via GitHub


hudi-bot commented on PR #11365:
URL: https://github.com/apache/hudi/pull/11365#issuecomment-2138810841

   
   ## CI report:
   
   * 8147454d905761bd2256aac273ef69aa1e56fba8 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7146] Integrate secondary index on reader path [hudi]

2024-05-30 Thread via GitHub


hudi-bot commented on PR #11162:
URL: https://github.com/apache/hudi/pull/11162#issuecomment-2138810306

   
   ## CI report:
   
   * 9d0e80222f6cc69b2dba6f4cdbfc642f31a95e52 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24124)
 
   * a602c9c4234062e66877fc4bf2c50f94f43767bc UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7146] Implement secondary index write path [hudi]

2024-05-30 Thread via GitHub


hudi-bot commented on PR #11146:
URL: https://github.com/apache/hudi/pull/11146#issuecomment-2138810132

   
   ## CI report:
   
   * 470bc5f44e7a6658a8717ef1b77e92afcdd90087 UNKNOWN
   * e8a4507886bc97b1819ea39788f2abd7385b8cf2 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24139)
 
   * 43f73661f79eb87ac52d29fa153b996a15f29b99 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7007] Add bloom_filters index support on read side [hudi]

2024-05-30 Thread via GitHub


hudi-bot commented on PR #11043:
URL: https://github.com/apache/hudi/pull/11043#issuecomment-2138809912

   
   ## CI report:
   
   * 541b544049e68b3d22cdf0f5159fbd9b0005d345 UNKNOWN
   * 87c15b2c23430d967749dede5e09d74a33dcce88 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24128)
 
   * 6ece7645a69b367901c71ab78dea15f39d69fca5 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7146] Implement secondary index write path [hudi]

2024-05-30 Thread via GitHub


codope commented on code in PR #11146:
URL: https://github.com/apache/hudi/pull/11146#discussion_r1620048869


##
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieIndexDefinition.java:
##
@@ -45,14 +49,14 @@ public class HoodieFunctionalIndexDefinition implements Serializable {
   // Any other configuration or properties specific to the index
   private Map<String, String> indexOptions;
 
-  public HoodieFunctionalIndexDefinition() {
+  public HoodieIndexDefinition() {
   }
 
-  public HoodieFunctionalIndexDefinition(String indexName, String indexType, String indexFunction, List<String> sourceFields,
-                                         Map<String, String> indexOptions) {
+  public HoodieIndexDefinition(String indexName, String indexType, String indexFunction, List<String> sourceFields,
+                               Map<String, String> indexOptions) {
     this.indexName = indexName;
     this.indexType = indexType;
-    this.indexFunction = indexFunction;
+    this.indexFunction = nonEmpty(indexFunction) ? indexFunction : SPARK_IDENTITY;

Review Comment:
   On second thought, it should not be bound to a Spark-specific function. I will correct it.
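   
   To make the intent concrete, here is a minimal, self-contained sketch (not the actual patch in this PR) of an engine-agnostic fallback; the constant name IDENTITY_FUNCTION and the simplified fields are assumptions for illustration:
   
   import java.io.Serializable;
   import java.util.List;
   import java.util.Map;
   
   // Sketch only: default the index function to an engine-neutral identity
   // transform instead of a Spark-specific constant.
   public class HoodieIndexDefinitionSketch implements Serializable {
   
     private static final String IDENTITY_FUNCTION = "identity"; // hypothetical engine-neutral default
   
     private final String indexName;
     private final String indexType;
     private final String indexFunction;
     private final List<String> sourceFields;
     private final Map<String, String> indexOptions;
   
     public HoodieIndexDefinitionSketch(String indexName, String indexType, String indexFunction,
                                        List<String> sourceFields, Map<String, String> indexOptions) {
       this.indexName = indexName;
       this.indexType = indexType;
       // Fall back to a generic identity transform when no function is supplied,
       // rather than binding the default to a Spark-specific constant.
       this.indexFunction = (indexFunction != null && !indexFunction.isEmpty())
           ? indexFunction : IDENTITY_FUNCTION;
       this.sourceFields = sourceFields;
       this.indexOptions = indexOptions;
     }
   
     public String getIndexFunction() {
       return indexFunction;
     }
   }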



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] [HUDI-7815] Multiple writer with bulkinsert getAllPendingClusteringPl… [hudi]

2024-05-30 Thread via GitHub


xuzifu666 opened a new pull request, #11365:
URL: https://github.com/apache/hudi/pull/11365

   ### Change Logs
   
   When multiple writers run bulk insert concurrently, getAllPendingClusteringPlans should refresh the timeline before collecting pending clustering plans; otherwise file-system view initialization can fail with:
   Caused by: org.apache.hudi.exception.HoodieException: Error getting all file groups in pending clustering
     at org.apache.hudi.common.util.ClusteringUtils.getAllFileGroupsInPendingClusteringPlans(ClusteringUtils.java:135)
     at org.apache.hudi.common.table.view.AbstractTableFileSystemView.init(AbstractTableFileSystemView.java:113)
     at org.apache.hudi.common.table.view.HoodieTableFileSystemView.init(HoodieTableFileSystemView.java:108)
     at org.apache.hudi.common.table.view.HoodieTableFileSystemView.<init>(HoodieTableFileSystemView.java:102)
     at org.apache.hudi.common.table.view.HoodieTableFileSystemView.<init>(HoodieTableFileSystemView.java:93)
     at org.apache.hudi.metadata.HoodieMetadataFileSystemView.<init>(HoodieMetadataFileSystemView.java:44)
     at org.apache.hudi.common.table.view.FileSystemViewManager.createInMemoryFileSystemView(FileSystemViewManager.java:166)
     at org.apache.hudi.common.table.view.FileSystemViewManager.lambda$createViewManager$5fcdabfe$1(FileSystemViewManager.java:259)
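   
   For illustration only, a minimal sketch of the kind of refresh described above: reload the active timeline before collecting pending clustering plans, so a concurrent bulk-insert writer sees the latest instants. This is not the actual diff; the helper class name is hypothetical.
   
   import org.apache.hudi.common.table.HoodieTableMetaClient;
   import org.apache.hudi.common.table.timeline.HoodieTimeline;
   
   // Sketch only: refresh the timeline before reading pending clustering
   // (replacecommit) instants, so plans written by other writers after this
   // client was created are visible.
   public class PendingClusteringRefreshSketch {
   
     public static HoodieTimeline pendingClusteringTimeline(HoodieTableMetaClient metaClient) {
       // Reload first; a cached timeline may miss newly requested clustering plans.
       metaClient.reloadActiveTimeline();
       return metaClient.getActiveTimeline().filterPendingReplaceTimeline();
     }
   }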
   
   ### Impact
   
   low
   
   ### Risk level (write none, low medium or high below)
   
   low
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change. If not, put "none"._
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-7815) Multiple writer with bulkinsert getAllPendingClusteringPlans should refresh timeline

2024-05-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7815:
-
Labels: pull-request-available  (was: )

> Multiple writer with bulkinsert getAllPendingClusteringPlans should refresh 
> timeline
> 
>
> Key: HUDI-7815
> URL: https://issues.apache.org/jira/browse/HUDI-7815
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: spark-sql
>Reporter: xy
>Assignee: xy
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

