[jira] [Updated] (HUDI-7844) Fix HoodieSparkSqlTestBase to throw error upon test failure
[ https://issues.apache.org/jira/browse/HUDI-7844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7844: - Labels: pull-request-available (was: ) > Fix HoodieSparkSqlTestBase to throw error upon test failure > --- > > Key: HUDI-7844 > URL: https://issues.apache.org/jira/browse/HUDI-7844 > Project: Apache Hudi > Issue Type: Bug > Reporter: Ethan Guo > Assignee: Ethan Guo > Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Attachments: Screenshot 2024-06-07 at 22.27.21.png > > > This PR ([https://github.com/apache/hudi/pull/11162]) introduces the > following changes that make `HoodieSparkSqlTestBase` swallow test failures. > > !Screenshot 2024-06-07 at 22.27.21.png|width=873,height=397! > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[PR] [HUDI-7844] Fix HoodieSparkSqlTestBase to throw error upon test failure [hudi]
yihua opened a new pull request, #11416: URL: https://github.com/apache/hudi/pull/11416 ### Change Logs PR #11162 introduces the changes that make `HoodieSparkSqlTestBase` swallow test failures. This PR reverts the changes so that test failures are surfaced locally and in CI. ### Impact Makes sure test failures are surfaced in CI. ### Risk level none ### Documentation Update none ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
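The problem the revert addresses is easiest to see in a minimal sketch. The names below are invented for illustration (the real `HoodieSparkSqlTestBase` is a Scala test base in Hudi's test sources): a wrapper that catches and merely logs a test failure hides it from the test runner and from CI, whereas letting the failure propagate surfaces it.

```java
// Hypothetical sketch of the failure-swallowing anti-pattern this ticket
// describes, and the fix. Class and method names are illustrative only.
public class TestRunnerSketch {
    // Anti-pattern: catching Throwable and only logging means the test
    // framework (and CI) never sees the failure.
    static boolean runSwallowing(Runnable testBody) {
        try {
            testBody.run();
            return true;
        } catch (Throwable t) {
            System.err.println("Test failed: " + t.getMessage()); // failure swallowed here
            return false;
        }
    }

    // Fix: do not catch. Any AssertionError or exception propagates to the
    // test framework, which marks the test (and the CI build) as failed.
    static void runSurfacing(Runnable testBody) {
        testBody.run();
    }
}
```

The design point is simply that a test harness should never convert a thrown failure into a logged message; the runner's exit status is what CI observes.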
[jira] [Assigned] (HUDI-7844) Fix HoodieSparkSqlTestBase to throw error upon test failure
[ https://issues.apache.org/jira/browse/HUDI-7844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo reassigned HUDI-7844: --- Assignee: Ethan Guo > Fix HoodieSparkSqlTestBase to throw error upon test failure > --- > > Key: HUDI-7844 > URL: https://issues.apache.org/jira/browse/HUDI-7844 > Project: Apache Hudi > Issue Type: Bug > Reporter: Ethan Guo > Assignee: Ethan Guo > Priority: Major > Fix For: 1.0.0 > > Attachments: Screenshot 2024-06-07 at 22.27.21.png > > > This PR ([https://github.com/apache/hudi/pull/11162]) introduces the > following changes that make `HoodieSparkSqlTestBase` swallow test failures. > > !Screenshot 2024-06-07 at 22.27.21.png|width=873,height=397!
[jira] [Updated] (HUDI-7844) Fix HoodieSparkSqlTestBase to throw error upon test failure
[ https://issues.apache.org/jira/browse/HUDI-7844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7844: Description: This PR ([https://github.com/apache/hudi/pull/11162]) introduces the following changes that makes !Screenshot 2024-06-07 at 22.27.21.png|width=873,height=397! was: This PR (https://github.com/apache/hudi/pull/11162) introduces the following changes in > Fix HoodieSparkSqlTestBase to throw error upon test failure > --- > > Key: HUDI-7844 > URL: https://issues.apache.org/jira/browse/HUDI-7844 > Project: Apache Hudi > Issue Type: Bug > Reporter: Ethan Guo > Priority: Major > Fix For: 1.0.0 > > Attachments: Screenshot 2024-06-07 at 22.27.21.png > > > This PR ([https://github.com/apache/hudi/pull/11162]) introduces the > following changes that makes > !Screenshot 2024-06-07 at 22.27.21.png|width=873,height=397!
[jira] [Updated] (HUDI-7844) Fix HoodieSparkSqlTestBase to throw error upon test failure
[ https://issues.apache.org/jira/browse/HUDI-7844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7844: Description: This PR ([https://github.com/apache/hudi/pull/11162]) introduces the following changes that make `HoodieSparkSqlTestBase` swallow test failures. !Screenshot 2024-06-07 at 22.27.21.png|width=873,height=397! was: This PR ([https://github.com/apache/hudi/pull/11162]) introduces the following changes that makes !Screenshot 2024-06-07 at 22.27.21.png|width=873,height=397! > Fix HoodieSparkSqlTestBase to throw error upon test failure > --- > > Key: HUDI-7844 > URL: https://issues.apache.org/jira/browse/HUDI-7844 > Project: Apache Hudi > Issue Type: Bug > Reporter: Ethan Guo > Priority: Major > Fix For: 1.0.0 > > Attachments: Screenshot 2024-06-07 at 22.27.21.png > > > This PR ([https://github.com/apache/hudi/pull/11162]) introduces the > following changes that make `HoodieSparkSqlTestBase` swallow test failures. > > !Screenshot 2024-06-07 at 22.27.21.png|width=873,height=397!
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
hudi-bot commented on PR #9894: URL: https://github.com/apache/hudi/pull/9894#issuecomment-2155818031 ## CI report: * a70cfc6db41a781bb3b6c9c8a9138892f7a12687 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24291) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[jira] [Created] (HUDI-7844) Fix HoodieSparkSqlTestBase to throw error upon test failure
Ethan Guo created HUDI-7844: --- Summary: Fix HoodieSparkSqlTestBase to throw error upon test failure Key: HUDI-7844 URL: https://issues.apache.org/jira/browse/HUDI-7844 Project: Apache Hudi Issue Type: Bug Reporter: Ethan Guo Attachments: Screenshot 2024-06-07 at 22.27.21.png
[jira] [Updated] (HUDI-7844) Fix HoodieSparkSqlTestBase to throw error upon test failure
[ https://issues.apache.org/jira/browse/HUDI-7844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7844: Fix Version/s: 1.0.0 > Fix HoodieSparkSqlTestBase to throw error upon test failure > --- > > Key: HUDI-7844 > URL: https://issues.apache.org/jira/browse/HUDI-7844 > Project: Apache Hudi > Issue Type: Bug > Reporter: Ethan Guo > Priority: Major > Fix For: 1.0.0 > > Attachments: Screenshot 2024-06-07 at 22.27.21.png > > > This PR (https://github.com/apache/hudi/pull/11162) introduces the following > changes in
[jira] [Updated] (HUDI-7844) Fix HoodieSparkSqlTestBase to throw error upon test failure
[ https://issues.apache.org/jira/browse/HUDI-7844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7844: Description: This PR (https://github.com/apache/hudi/pull/11162) introduces the following changes in > Fix HoodieSparkSqlTestBase to throw error upon test failure > --- > > Key: HUDI-7844 > URL: https://issues.apache.org/jira/browse/HUDI-7844 > Project: Apache Hudi > Issue Type: Bug > Reporter: Ethan Guo > Priority: Major > Attachments: Screenshot 2024-06-07 at 22.27.21.png > > > This PR (https://github.com/apache/hudi/pull/11162) introduces the following > changes in
[jira] [Updated] (HUDI-7844) Fix HoodieSparkSqlTestBase to throw error upon test failure
[ https://issues.apache.org/jira/browse/HUDI-7844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7844: Attachment: Screenshot 2024-06-07 at 22.27.21.png > Fix HoodieSparkSqlTestBase to throw error upon test failure > --- > > Key: HUDI-7844 > URL: https://issues.apache.org/jira/browse/HUDI-7844 > Project: Apache Hudi > Issue Type: Bug > Reporter: Ethan Guo > Priority: Major > Fix For: 1.0.0 > > Attachments: Screenshot 2024-06-07 at 22.27.21.png > > > This PR (https://github.com/apache/hudi/pull/11162) introduces the following > changes in
[jira] [Updated] (HUDI-7843) Support record merge mode with partial updates
[ https://issues.apache.org/jira/browse/HUDI-7843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7843: Description: Right now, the new partial update support (with partial updates stored in the log block) works in Spark SQL MERGE INTO, and the merging logic, based on either the transaction time or the event time, is fully handled inside "HoodieSparkRecordMerger". It would be good to decouple the merging logic from the merger and use the record merge mode to control how to merge partial updates. (was: Right now the new partial update support with partial updates stored in the log block works in Spark SQL MERGE INTO and we assume that ) > Support record merge mode with partial updates > -- > > Key: HUDI-7843 > URL: https://issues.apache.org/jira/browse/HUDI-7843 > Project: Apache Hudi > Issue Type: New Feature > Reporter: Ethan Guo > Priority: Major > Fix For: 1.0.0 > > > Right now, the new partial update support (with partial updates stored in the > log block) works in Spark SQL MERGE INTO, and the merging logic, based on > either the transaction time or the event time, is fully handled inside > "HoodieSparkRecordMerger". It would be good to decouple the merging logic > from the merger and use the record merge mode to control how to merge partial > updates.
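The decoupling HUDI-7843 proposes can be sketched as follows. This is an assumption-level illustration, not Hudi's actual `HoodieRecordMerger` API: the ordering decision is driven purely by a merge-mode enum, and the partial update itself is an overlay of the later record's present fields onto the earlier record.

```java
// Illustrative sketch: merge-mode-driven ordering, decoupled from the merger.
// All names here are invented for the example.
import java.util.HashMap;
import java.util.Map;

public class MergeModeSketch {
    enum MergeMode { COMMIT_TIME_ORDERING, EVENT_TIME_ORDERING }

    static class Rec {
        final Map<String, Object> fields; // a null value means "field absent" (partial update)
        final long commitTime;            // transaction time of the write
        final long eventTime;             // ordering-field value from the data
        Rec(Map<String, Object> fields, long commitTime, long eventTime) {
            this.fields = fields;
            this.commitTime = commitTime;
            this.eventTime = eventTime;
        }
    }

    // Decide the ordering from the mode alone, then apply the partial update:
    // the later record's present (non-null) fields overwrite the earlier record's.
    static Rec mergePartial(Rec a, Rec b, MergeMode mode) {
        long ka = (mode == MergeMode.EVENT_TIME_ORDERING) ? a.eventTime : a.commitTime;
        long kb = (mode == MergeMode.EVENT_TIME_ORDERING) ? b.eventTime : b.commitTime;
        Rec earlier = (ka <= kb) ? a : b;
        Rec later = (ka <= kb) ? b : a;
        Map<String, Object> merged = new HashMap<>(earlier.fields);
        later.fields.forEach((k, v) -> { if (v != null) merged.put(k, v); });
        return new Rec(merged, Math.max(a.commitTime, b.commitTime),
            Math.max(a.eventTime, b.eventTime));
    }
}
```

With this shape, a custom merger only supplies the field-overlay step, while the mode (commit-time vs. event-time ordering) stays a table-level configuration.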
[jira] [Updated] (HUDI-7843) Support record merge mode with partial updates
[ https://issues.apache.org/jira/browse/HUDI-7843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7843: Fix Version/s: 1.0.0 > Support record merge mode with partial updates > -- > > Key: HUDI-7843 > URL: https://issues.apache.org/jira/browse/HUDI-7843 > Project: Apache Hudi > Issue Type: New Feature > Reporter: Ethan Guo > Priority: Major > Fix For: 1.0.0 > > > Right now the new partial update support with partial updates stored in the > log block works in Spark SQL MERGE INTO and we assume that
[jira] [Updated] (HUDI-7843) Support record merge mode with partial updates
[ https://issues.apache.org/jira/browse/HUDI-7843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7843: Description: Right now the new partial update support with partial updates stored in the log block works in Spark SQL MERGE INTO and we assume that > Support record merge mode with partial updates > -- > > Key: HUDI-7843 > URL: https://issues.apache.org/jira/browse/HUDI-7843 > Project: Apache Hudi > Issue Type: New Feature > Reporter: Ethan Guo > Priority: Major > > Right now the new partial update support with partial updates stored in the > log block works in Spark SQL MERGE INTO and we assume that
[jira] [Created] (HUDI-7843) Support record merge mode with partial updates
Ethan Guo created HUDI-7843: --- Summary: Support record merge mode with partial updates Key: HUDI-7843 URL: https://issues.apache.org/jira/browse/HUDI-7843 Project: Apache Hudi Issue Type: New Feature Reporter: Ethan Guo
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
hudi-bot commented on PR #9894: URL: https://github.com/apache/hudi/pull/9894#issuecomment-2155806588 ## CI report: * c2dec94b442920784b3914cc13b87294e734a477 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24272) * a70cfc6db41a781bb3b6c9c8a9138892f7a12687 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24291) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
hudi-bot commented on PR #9894: URL: https://github.com/apache/hudi/pull/9894#issuecomment-2155804768 ## CI report: * c2dec94b442920784b3914cc13b87294e734a477 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24272) * a70cfc6db41a781bb3b6c9c8a9138892f7a12687 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[jira] [Updated] (HUDI-7842) Update docs with the new record merge mode config
[ https://issues.apache.org/jira/browse/HUDI-7842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7842: Fix Version/s: 1.0.0 > Update docs with the new record merge mode config > - > > Key: HUDI-7842 > URL: https://issues.apache.org/jira/browse/HUDI-7842 > Project: Apache Hudi > Issue Type: Task > Reporter: Ethan Guo > Priority: Major > Fix For: 1.0.0 > > > We should educate users on the new record merge mode config introduced by > HUDI-6798 that simplifies configs controlling the merging behavior.
[jira] [Updated] (HUDI-7842) Update docs with the new record merge mode config
[ https://issues.apache.org/jira/browse/HUDI-7842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7842: Description: We should educate users on the new record merge mode config introduced by HUDI-6798 that simplifies configs controlling the merging behavior. > Update docs with the new record merge mode config > - > > Key: HUDI-7842 > URL: https://issues.apache.org/jira/browse/HUDI-7842 > Project: Apache Hudi > Issue Type: Task > Reporter: Ethan Guo > Priority: Major > > We should educate users on the new record merge mode config introduced by > HUDI-6798 that simplifies configs controlling the merging behavior.
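The docs update HUDI-7842 asks for would describe a single table config along these lines. The key and value names below follow the ticket's terminology ("record merge mode" with transaction-time vs. event-time ordering) and should be treated as assumptions, not as the authoritative config reference; the published Hudi documentation has the exact names.

```properties
# Assumed config key/values, illustrating what the new merge-mode docs would cover.
# Merge by commit (transaction) time: the latest write wins.
hoodie.record.merge.mode=COMMIT_TIME_ORDERING
# Or merge by event time: the record with the highest ordering-field value wins.
# hoodie.record.merge.mode=EVENT_TIME_ORDERING
```

The point of the simplification is that this one mode replaces the combination of payload-class and merger configs that previously controlled merging behavior.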
[jira] [Created] (HUDI-7842) Update docs with the new record merge mode config
Ethan Guo created HUDI-7842: --- Summary: Update docs with the new record merge mode config Key: HUDI-7842 URL: https://issues.apache.org/jira/browse/HUDI-7842 Project: Apache Hudi Issue Type: Task Reporter: Ethan Guo
(hudi) branch branch-0.x updated: [MINOR] use scala.math.abs instead of calcite abs (#11412)
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a commit to branch branch-0.x in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/branch-0.x by this push: new 041964ab711 [MINOR] use scala.math.abs instead of calcite abs (#11412) 041964ab711 is described below commit 041964ab71175739134c252bef870693d8bba14e Author: Shawn Chang <42792772+c...@users.noreply.github.com> AuthorDate: Fri Jun 7 19:51:53 2024 -0700 [MINOR] use scala.math.abs instead of calcite abs (#11412) Co-authored-by: Shawn Chang --- .../scala/org/apache/hudi/functional/TestParquetColumnProjection.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestParquetColumnProjection.scala b/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestParquetColumnProjection.scala index 0173c3f642a..c256cf32fb3 100644 --- a/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestParquetColumnProjection.scala +++ b/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestParquetColumnProjection.scala @@ -29,7 +29,6 @@ import org.apache.hudi.testutils.SparkClientFunctionalTestHarness.getSparkSqlCon import org.apache.hudi.{DataSourceReadOptions, DataSourceWriteOptions, DefaultSource, HoodieBaseRelation, HoodieSparkUtils, HoodieUnsafeRDD} import org.apache.avro.Schema -import org.apache.calcite.runtime.SqlFunctions.abs import org.apache.parquet.hadoop.util.counters.BenchmarkCounter import org.apache.spark.SparkConf import org.apache.spark.internal.Logging @@ -39,6 +38,7 @@ import org.junit.jupiter.api.Assertions.{assertEquals, assertFalse, assertTrue, import org.junit.jupiter.api.{Disabled, Tag, Test} import scala.collection.JavaConverters._ +import scala.math.abs @Tag("functional") class TestParquetColumnProjection extends SparkClientFunctionalTestHarness with Logging {
Re: [PR] [MINOR][branch-0.x] Remove calcite dependency [hudi]
danny0405 merged PR #11412: URL: https://github.com/apache/hudi/pull/11412
(hudi) branch master updated: [HUDI-7834] Create placeholder table versions and introduce new hoodie table property to track initial table version (#11406)
This is an automated email from the ASF dual-hosted git repository. yihua pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new a33b2a5e03f [HUDI-7834] Create placeholder table versions and introduce new hoodie table property to track initial table version (#11406) a33b2a5e03f is described below commit a33b2a5e03f434e3ce270a128be626ae9e9e78c9 Author: Balaji Varadarajan AuthorDate: Fri Jun 7 18:43:33 2024 -0700 [HUDI-7834] Create placeholder table versions and introduce new hoodie table property to track initial table version (#11406) Co-authored-by: Balaji Varadarajan Co-authored-by: Y Ethan Guo --- .../apache/hudi/cli/commands/RepairsCommand.java | 4 +++ .../upgrade/EightToSevenDowngradeHandler.java | 37 +++ .../table/upgrade/SevenToEightUpgradeHandler.java | 38 .../table/upgrade/SevenToSixDowngradeHandler.java | 40 + .../table/upgrade/SixToSevenUpgradeHandler.java| 42 ++ .../hudi/table/upgrade/UpgradeDowngrade.java | 8 + .../hudi/common/table/HoodieTableConfig.java | 16 + .../hudi/common/table/HoodieTableMetaClient.java | 3 ++ .../hudi/common/table/HoodieTableVersion.java | 10 -- .../common/table/TestHoodieTableMetaClient.java| 1 + .../RepairOverwriteHoodiePropsProcedure.scala | 5 ++- .../sql/hudi/procedure/TestRepairsProcedure.scala | 1 + .../TestUpgradeOrDowngradeProcedure.scala | 4 +-- 13 files changed, 203 insertions(+), 6 deletions(-) diff --git a/hudi-cli/src/main/java/org/apache/hudi/cli/commands/RepairsCommand.java b/hudi-cli/src/main/java/org/apache/hudi/cli/commands/RepairsCommand.java index 57ec8ccf57b..569136e0b50 100644 --- a/hudi-cli/src/main/java/org/apache/hudi/cli/commands/RepairsCommand.java +++ b/hudi-cli/src/main/java/org/apache/hudi/cli/commands/RepairsCommand.java @@ -161,6 +161,10 @@ public class RepairsCommand { newProps.load(fileInputStream); } Map oldProps = client.getTableConfig().propsMap(); +// Copy Initial Version from old-props to new-props +if (oldProps.containsKey(HoodieTableConfig.INITIAL_VERSION.key())) { + newProps.put(HoodieTableConfig.INITIAL_VERSION.key(), oldProps.get(HoodieTableConfig.INITIAL_VERSION.key())); +} HoodieTableConfig.create(client.getStorage(), client.getMetaPath(), newProps); // reload new props as checksum would have been added newProps = diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/EightToSevenDowngradeHandler.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/EightToSevenDowngradeHandler.java new file mode 100644 index 000..3bb22481681 --- /dev/null +++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/EightToSevenDowngradeHandler.java @@ -0,0 +1,37 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.table.upgrade; + +import org.apache.hudi.common.config.ConfigProperty; +import org.apache.hudi.common.engine.HoodieEngineContext; +import org.apache.hudi.config.HoodieWriteConfig; + +import java.util.Collections; +import java.util.Map; + +/** + * Version 7 is going to be placeholder version for bridge release 0.16.0. + * Version 8 is the placeholder version to track 1.x. + */ +public class EightToSevenDowngradeHandler implements DowngradeHandler { + @Override + public Map downgrade(HoodieWriteConfig config, HoodieEngineContext context, String instantTime, SupportsUpgradeDowngrade upgradeDowngradeHelper) { +return Collections.emptyMap(); + } +} diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/SevenToEightUpgradeHandler.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/SevenToEightUpgradeHandler.java new file mode 100644 index 000..9ed4f192786 --- /dev/null +++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/SevenToEightUpgradeHandler.java @@ -0,0 +1,38 @@ +/* + * Licensed to the
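The `RepairsCommand` hunk in the commit above boils down to a small property-preservation step, sketched here in isolation. The key name mirrors the commit's `HoodieTableConfig.INITIAL_VERSION` config but is written out as a plain string assumption for the example:

```java
// Sketch of the property-preservation logic the commit adds: when
// overwriting table properties (e.g. during a repair), carry forward the
// initial-version entry if the old properties had it, so the table does
// not lose the record of which version it was created at.
import java.util.HashMap;
import java.util.Map;

public class PreserveInitialVersion {
    // Assumed key name mirroring HoodieTableConfig.INITIAL_VERSION in the commit.
    static final String INITIAL_VERSION_KEY = "hoodie.table.initial.version";

    static void carryForward(Map<String, String> oldProps, Map<String, String> newProps) {
        if (oldProps.containsKey(INITIAL_VERSION_KEY)) {
            newProps.put(INITIAL_VERSION_KEY, oldProps.get(INITIAL_VERSION_KEY));
        }
    }
}
```

The design rationale: the current table version changes on upgrade/downgrade, so the initial version must be tracked as a separate, immutable property that survives property rewrites.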
Re: [PR] [HUDI-7834] Create placeholder table versions and introduce new hoodie table property to track initial table version [hudi]
yihua merged PR #11406: URL: https://github.com/apache/hudi/pull/11406
Re: [PR] [HUDI-7834] Create placeholder table versions and introduce new hoodie table property to track initial table version [hudi]
hudi-bot commented on PR #11406: URL: https://github.com/apache/hudi/pull/11406#issuecomment-2155749354 ## CI report: * da08a0b3c0524b46e70a4cbed8ab82eb5f84f24c UNKNOWN * 5ff06d980691f473a959e149377e7aa14eaf7a55 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24287) * 5ff06d9806 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [MINOR][branch-0.x] Remove calcite dependency [hudi]
hudi-bot commented on PR #11412: URL: https://github.com/apache/hudi/pull/11412#issuecomment-2155747004 ## CI report: * f6f9c59cde7928b625332163d3118630fb199c27 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24289) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7834] Create placeholder table versions and introduce new hoodie table property to track initial table version [hudi]
yihua commented on PR #11406: URL: https://github.com/apache/hudi/pull/11406#issuecomment-2155725862 Azure CI is green. (screenshot: https://github.com/apache/hudi/assets/2497195/dfb773d1-1753-4bcd-9f05-27037985bf0a)
Re: [PR] [MINOR][branch-0.x] Remove calcite dependency [hudi]
hudi-bot commented on PR #11412: URL: https://github.com/apache/hudi/pull/11412#issuecomment-2155725721 ## CI report: * 22f518dc886318f5e5af58765436b353a45c0f21 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24260) * f6f9c59cde7928b625332163d3118630fb199c27 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24289) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [MINOR][branch-0.x] Remove calcite dependency [hudi]
hudi-bot commented on PR #11412: URL: https://github.com/apache/hudi/pull/11412#issuecomment-2155722554 ## CI report: * 22f518dc886318f5e5af58765436b353a45c0f21 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24260) * f6f9c59cde7928b625332163d3118630fb199c27 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
(hudi) branch master updated: [HUDI-7840] Add position merging to the new file group reader (#11413)
This is an automated email from the ASF dual-hosted git repository. yihua pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 0d7cc87d687 [HUDI-7840] Add position merging to the new file group reader (#11413) 0d7cc87d687 is described below commit 0d7cc87d687bd235bac099e481535cb9f223b501 Author: Jon Vexler AuthorDate: Fri Jun 7 19:47:42 2024 -0400 [HUDI-7840] Add position merging to the new file group reader (#11413) Co-authored-by: Jonathan Vexler <=> Co-authored-by: Sagar Sumit --- .../SparkFileFormatInternalRowReaderContext.scala | 202 +++ .../hudi/common/engine/HoodieReaderContext.java| 22 +++ .../common/table/read/HoodieFileGroupReader.java | 6 +- .../HoodiePositionBasedFileGroupRecordBuffer.java | 4 +- .../read/HoodiePositionBasedSchemaHandler.java | 75 ...odieFileGroupReaderBasedParquetFileFormat.scala | 2 +- ...stSparkFileFormatInternalRowReaderContext.scala | 72 +++ ...stHoodiePositionBasedFileGroupRecordBuffer.java | 214 + .../functional/TestFiltersInFileGroupReader.java | 109 +++ .../read/TestHoodieFileGroupReaderOnSpark.scala| 2 +- .../TestSpark35RecordPositionMetadataColumn.scala | 143 ++ 11 files changed, 812 insertions(+), 39 deletions(-) diff --git a/hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/SparkFileFormatInternalRowReaderContext.scala b/hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/SparkFileFormatInternalRowReaderContext.scala index 640f1219fbf..715e2d9a9ab 100644 --- a/hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/SparkFileFormatInternalRowReaderContext.scala +++ b/hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/SparkFileFormatInternalRowReaderContext.scala @@ -22,10 +22,14 @@ package org.apache.hudi import org.apache.avro.Schema import org.apache.avro.generic.IndexedRecord import org.apache.hadoop.conf.Configuration +import org.apache.hudi.SparkFileFormatInternalRowReaderContext.{filterIsSafeForBootstrap, getAppliedRequiredSchema} +import org.apache.hudi.avro.AvroSchemaUtils import org.apache.hudi.common.engine.HoodieReaderContext import org.apache.hudi.common.fs.FSUtils +import org.apache.hudi.common.model.HoodieRecord +import org.apache.hudi.common.table.read.HoodiePositionBasedFileGroupRecordBuffer.ROW_INDEX_TEMPORARY_COLUMN_NAME import org.apache.hudi.common.util.ValidationUtils.checkState -import org.apache.hudi.common.util.collection.{ClosableIterator, CloseableMappingIterator} +import org.apache.hudi.common.util.collection.{CachingIterator, ClosableIterator, CloseableMappingIterator} import org.apache.hudi.io.storage.{HoodieSparkFileReaderFactory, HoodieSparkParquetReader} import org.apache.hudi.storage.{HoodieStorage, StorageConfiguration, StoragePath} import org.apache.hudi.util.CloseableInternalRowIterator @@ -37,7 +41,7 @@ import org.apache.spark.sql.execution.datasources.PartitionedFile import org.apache.spark.sql.execution.datasources.parquet.{ParquetFileFormat, SparkParquetReader} import org.apache.spark.sql.hudi.SparkAdapter import org.apache.spark.sql.sources.Filter -import org.apache.spark.sql.types.StructType +import org.apache.spark.sql.types.{LongType, MetadataBuilder, StructField, StructType} import org.apache.spark.sql.vectorized.{ColumnVector, ColumnarBatch} import scala.collection.mutable @@ -53,12 +57,20 @@ import scala.collection.mutable * not required for reading a file group with only log files. * @param recordKeyColumn column name for the recordkey * @param filters spark filters that might be pushed down into the reader + * @param requiredFilters filters that are required and should always be used, even in merging situations */ class SparkFileFormatInternalRowReaderContext(parquetFileReader: SparkParquetReader, recordKeyColumn: String, - filters: Seq[Filter]) extends BaseSparkInternalRowReaderContext { + filters: Seq[Filter], + requiredFilters: Seq[Filter]) extends BaseSparkInternalRowReaderContext { lazy val sparkAdapter: SparkAdapter = SparkAdapterSupport.sparkAdapter + private lazy val bootstrapSafeFilters: Seq[Filter] = filters.filter(filterIsSafeForBootstrap) ++ requiredFilters private val deserializerMap: mutable.Map[Schema, HoodieAvroDeserializer] = mutable.Map() + private lazy val allFilters = filters ++ requiredFilters + + override def supportsParquetRowIndex: Boolean = { +HoodieSparkUtils.gteqSpark3_5 + } override def getFileRecordIterator(filePath: StoragePath, start: Long, @@ -66,6 +78,10 @@ class
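A toy model of what "position merging" means, as a hedged illustration rather than Hudi's API: the reader tags each base-file row with its row index (cf. `ROW_INDEX_TEMPORARY_COLUMN_NAME` in the diff above), and log records carry the positions they update, so merging becomes a lookup by position instead of a join on record key.

```java
// Illustrative sketch of position-based merging. All names are invented;
// the real implementation lives in HoodiePositionBasedFileGroupRecordBuffer.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PositionMergeSketch {
    // Overlay updates keyed by row position onto the base rows; positions not
    // present in the update map pass through unchanged.
    static List<String> mergeByPosition(List<String> baseRows, Map<Long, String> updatesByPos) {
        List<String> out = new ArrayList<>(baseRows.size());
        for (long pos = 0; pos < baseRows.size(); pos++) {
            out.add(updatesByPos.getOrDefault(pos, baseRows.get((int) pos)));
        }
        return out;
    }
}
```

The appeal of this scheme is that a position lookup avoids decoding and comparing record keys for every base row, which is why the commit gates it on the reader being able to expose a row index (Spark 3.5+ in the diff's `supportsParquetRowIndex` check).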
Re: [PR] [HUDI-7840] Add position merging to the new file group reader [hudi]
yihua merged PR #11413: URL: https://github.com/apache/hudi/pull/11413 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7840] Add position merging to the new file group reader [hudi]
yihua commented on PR #11413: URL: https://github.com/apache/hudi/pull/11413#issuecomment-2155701487

Azure CI is green. (Screenshot: https://github.com/apache/hudi/assets/2497195/2bf57a53-50c2-4881-b09e-b9d8025c058a)
Re: [PR] [HUDI-7834] Create placeholder table versions and introduce new hoodie table property to track initial table version [hudi]
hudi-bot commented on PR #11406: URL: https://github.com/apache/hudi/pull/11406#issuecomment-2155699424

## CI report:

* da08a0b3c0524b46e70a4cbed8ab82eb5f84f24c UNKNOWN
* 5ff06d980691f473a959e149377e7aa14eaf7a55 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24287)

Bot commands: @hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7834] Create placeholder table versions and introduce new hoodie table property to track initial table version [hudi]
hudi-bot commented on PR #11406: URL: https://github.com/apache/hudi/pull/11406#issuecomment-2155695586

## CI report:

* da08a0b3c0524b46e70a4cbed8ab82eb5f84f24c UNKNOWN
* 901c7f94b1b56ac19867d5d0deab34eb35ebce2c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24276)
* 5ff06d980691f473a959e149377e7aa14eaf7a55 UNKNOWN

Bot commands: @hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7840] Add position merging to the new file group reader [hudi]
hudi-bot commented on PR #11413: URL: https://github.com/apache/hudi/pull/11413#issuecomment-2155691424

## CI report:

* d581b2726ba5047c9e72396820da81ecf1357266 UNKNOWN
* 0bf72cfded469e3cc1091827cbe4f2f3c16de830 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24285) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24284)

Bot commands: @hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7840] Add position merging to the new file group reader [hudi]
hudi-bot commented on PR #11413: URL: https://github.com/apache/hudi/pull/11413#issuecomment-2155661535

## CI report:

* d581b2726ba5047c9e72396820da81ecf1357266 UNKNOWN
* 6c15d7a0558284728296d74c9acbc6805230d9a2 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24283)
* 0bf72cfded469e3cc1091827cbe4f2f3c16de830 UNKNOWN

Bot commands: @hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7840] Add position merging to the new file group reader [hudi]
hudi-bot commented on PR #11413: URL: https://github.com/apache/hudi/pull/11413#issuecomment-2155656514

## CI report:

* d581b2726ba5047c9e72396820da81ecf1357266 UNKNOWN
* 4d4c0fdc03b72cfb7ad86172a75fcca439e42682 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24281)
* 6c15d7a0558284728296d74c9acbc6805230d9a2 UNKNOWN

Bot commands: @hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7840] Add position merging to fg reader [hudi]
hudi-bot commented on PR #11413: URL: https://github.com/apache/hudi/pull/11413#issuecomment-2155650961

## CI report:

* d581b2726ba5047c9e72396820da81ecf1357266 UNKNOWN
* 1ce1d753818efb6be00c20fb5a8dd141c7c47f00 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24280)
* 4d4c0fdc03b72cfb7ad86172a75fcca439e42682 UNKNOWN

Bot commands: @hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7840] Add position merging to fg reader [hudi]
yihua commented on code in PR #11413: URL: https://github.com/apache/hudi/pull/11413#discussion_r1631742303

## hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/SparkFileFormatInternalRowReaderContext.scala:

@@ -116,45 +143,154 @@ class SparkFileFormatInternalRowReaderContext(parquetFileReader: SparkParquetRea
                                  skeletonRequiredSchema: Schema,
                                  dataFileIterator: ClosableIterator[InternalRow],
                                  dataRequiredSchema: Schema): ClosableIterator[InternalRow] = {
-    doBootstrapMerge(skeletonFileIterator.asInstanceOf[ClosableIterator[Any]],
-      dataFileIterator.asInstanceOf[ClosableIterator[Any]])
+    doBootstrapMerge(skeletonFileIterator.asInstanceOf[ClosableIterator[Any]], skeletonRequiredSchema,
+      dataFileIterator.asInstanceOf[ClosableIterator[Any]], dataRequiredSchema)
   }

-  protected def doBootstrapMerge(skeletonFileIterator: ClosableIterator[Any], dataFileIterator: ClosableIterator[Any]): ClosableIterator[InternalRow] = {
-    new ClosableIterator[Any] {
-      val combinedRow = new JoinedRow()
+  private def doBootstrapMerge(skeletonFileIterator: ClosableIterator[Any],
+                               skeletonRequiredSchema: Schema,
+                               dataFileIterator: ClosableIterator[Any],
+                               dataRequiredSchema: Schema): ClosableIterator[InternalRow] = {
+    if (supportsPositionField()) {
+      assert(AvroSchemaUtils.containsFieldInSchema(skeletonRequiredSchema, ROW_INDEX_TEMPORARY_COLUMN_NAME))
+      assert(AvroSchemaUtils.containsFieldInSchema(dataRequiredSchema, ROW_INDEX_TEMPORARY_COLUMN_NAME))
+      val rowIndexColumn = new java.util.HashSet[String]()
+      rowIndexColumn.add(ROW_INDEX_TEMPORARY_COLUMN_NAME)
+      // always remove the row index column from the skeleton because the data file will also have the same column
+      val skeletonProjection = projectRecord(skeletonRequiredSchema,
+        AvroSchemaUtils.removeFieldsFromSchema(skeletonRequiredSchema, rowIndexColumn))

-      override def hasNext: Boolean = {
-        // If the iterators are out of sync it is probably due to filter pushdown
-        checkState(dataFileIterator.hasNext == skeletonFileIterator.hasNext,
-          "Bootstrap data-file iterator and skeleton-file iterator have to be in-sync!")
-        dataFileIterator.hasNext && skeletonFileIterator.hasNext
+      // If we need to do position based merging with log files we will leave the row index column at the end
+      val dataProjection = if (getHasLogFiles && getUseRecordPosition) {
+        getIdentityProjection
+      } else {
+        projectRecord(dataRequiredSchema,
+          AvroSchemaUtils.removeFieldsFromSchema(dataRequiredSchema, rowIndexColumn))
       }

-      override def next(): Any = {
-        (skeletonFileIterator.next(), dataFileIterator.next()) match {
-          case (s: ColumnarBatch, d: ColumnarBatch) =>
-            val numCols = s.numCols() + d.numCols()
-            val vecs: Array[ColumnVector] = new Array[ColumnVector](numCols)
-            for (i <- 0 until numCols) {
-              if (i < s.numCols()) {
-                vecs(i) = s.column(i)
+      // Always use internal row for positional merge because

Review Comment: So iterating through the rows is still needed for stitching? The filtering may still happen within the parquet page/batch since the page-level filtering is based on the column stats, if that is what you're talking about.
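The `checkState` being discussed guards the row-by-row stitching of the skeleton-file and data-file iterators. A self-contained sketch of that zip-with-check loop, simplified to `String` rows (the real reader joins `InternalRow`s or `ColumnarBatch`es and uses Hudi's `checkState`):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class BootstrapStitcher {
  // Joins each skeleton-file row (metadata columns) with the matching data-file
  // row (data columns). Both iterators must advance in lockstep; if a pushdown
  // filter removed rows from only one side, the iterators go out of sync and we
  // fail fast instead of mis-stitching rows.
  static List<String> stitch(Iterator<String> skeleton, Iterator<String> data) {
    List<String> merged = new ArrayList<>();
    while (skeleton.hasNext() || data.hasNext()) {
      if (skeleton.hasNext() != data.hasNext()) {
        throw new IllegalStateException(
            "Bootstrap data-file iterator and skeleton-file iterator have to be in-sync!");
      }
      merged.add(skeleton.next() + "|" + data.next());
    }
    return merged;
  }
}
```

This is why only "bootstrap safe" filters may be pushed down in the non-positional path: any filter that drops rows from one file alone trips the in-sync check.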
[jira] [Updated] (HUDI-7693) Allow Vectorized Reading for bootstrap in the new fg reader under some conditions
[ https://issues.apache.org/jira/browse/HUDI-7693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Vexler updated HUDI-7693: -- Description: Vectorized reading can be used for bootstrap if we don't need to do any merging. Additionally, it can be used if no filters are pushed down. With row index positions, some pushdown filtering could even be allowed (was: Vectorized reading can be used for bootstrap if we don't need to do any merging. Additionally, it can be used if no filters are pushed down.) > Allow Vectorized Reading for bootstrap in the new fg reader under some > conditions > - > > Key: HUDI-7693 > URL: https://issues.apache.org/jira/browse/HUDI-7693 > Project: Apache Hudi > Issue Type: Improvement > Components: spark, spark-sql >Reporter: Jonathan Vexler >Priority: Minor > > Vectorized reading can be used for bootstrap if we don't need to do any > merging. Additionally, it can be used if no filters are pushed down. With row > index positions, some pushdown filtering could even be allowed -- This message was sent by Atlassian Jira (v8.20.10#820010)
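The ticket's conditions collapse into a single predicate. A hedged sketch of that decision (the method and parameter names are invented for illustration; the real decision would live in the Spark reader-context code):

```java
public class BootstrapVectorizedReadCheck {
  // Vectorized (ColumnarBatch) reading of a bootstrap file group is ruled out
  // by record-level merging. Without merging, it is fine when no filters are
  // pushed down; with row-index positions available, some pushdown could be
  // tolerated, since surviving rows can still be matched across files by position.
  static boolean canUseVectorizedRead(boolean needsMerging,
                                      boolean hasPushedFilters,
                                      boolean hasRowIndexPositions) {
    if (needsMerging) {
      return false;
    }
    return !hasPushedFilters || hasRowIndexPositions;
  }
}
```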
Re: [PR] [HUDI-7840] Add position merging to fg reader [hudi]
jonvex commented on code in PR #11413: URL: https://github.com/apache/hudi/pull/11413#discussion_r1631738403

## hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/SparkFileFormatInternalRowReaderContext.scala:

(quotes the same `doBootstrapMerge` hunk as the previous review comment)

Review Comment: https://issues.apache.org/jira/browse/HUDI-7693
Re: [PR] [HUDI-7834] Create placeholder table versions. Introduce new hoodie table property to track initial table version when table was created. This is needed to identify if the table was originall
yihua commented on code in PR #11406: URL: https://github.com/apache/hudi/pull/11406#discussion_r1631737898

## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/SixToSevenUpgradeHandler.java:

@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.upgrade;
+
+import org.apache.hudi.common.config.ConfigProperty;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.config.HoodieWriteConfig;
+
+import java.util.Collections;
+import java.util.Map;
+
+/**
+ * Version 7 is going to be placeholder version for bridge release 0.16.0.
+ * Version 8 is the placeholder version to track 1.x.
+ */
+public class SixToSevenUpgradeHandler implements UpgradeHandler {
+  @Override
+  public Map<ConfigProperty, String> upgrade(HoodieWriteConfig config, HoodieEngineContext context,
+                                             String instantTime,
+                                             SupportsUpgradeDowngrade upgradeDowngradeHelper) {
+    return Collections.emptyMap();
+  }

Review Comment: Makes sense. Sounds good to me.
Re: [PR] [HUDI-7840] Add position merging to fg reader [hudi]
jonvex commented on code in PR #11413: URL: https://github.com/apache/hudi/pull/11413#discussion_r1631737729

## hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/TestHoodiePositionBasedFileGroupRecordBuffer.java:

@@ -0,0 +1,214 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi;
+
+import org.apache.hudi.common.config.HoodieStorageConfig;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.engine.HoodieReaderContext;
+import org.apache.hudi.common.model.DeleteRecord;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.model.HoodieRecordMerger;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.TableSchemaResolver;
+import org.apache.hudi.common.table.log.block.HoodieDeleteBlock;
+import org.apache.hudi.common.table.log.block.HoodieLogBlock;
+import org.apache.hudi.common.table.read.HoodiePositionBasedFileGroupRecordBuffer;
+import org.apache.hudi.common.table.read.HoodiePositionBasedSchemaHandler;
+import org.apache.hudi.common.table.read.TestHoodieFileGroupReaderOnSpark;
+import org.apache.hudi.common.testutils.HoodieTestDataGenerator;
+import org.apache.hudi.common.testutils.SchemaTestUtil;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.common.util.collection.ExternalSpillableMap;
+import org.apache.hudi.common.util.collection.Pair;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieValidationException;
+import org.apache.hudi.keygen.constant.KeyGeneratorOptions;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.IndexedRecord;
+import org.apache.spark.sql.catalyst.InternalRow;
+import org.junit.jupiter.api.Test;
+
+import java.io.IOException;
+import java.net.URISyntaxException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.engine.HoodieReaderContext.INTERNAL_META_RECORD_KEY;
+import static org.apache.hudi.common.model.WriteOperationType.INSERT;
+import static org.apache.hudi.common.testutils.HoodieTestUtils.createMetaClient;
+import static org.apache.hudi.common.testutils.RawTripTestPayload.recordsToStrings;
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertNotNull;
+import static org.junit.jupiter.api.Assertions.assertNull;
+import static org.junit.jupiter.api.Assertions.assertThrows;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+
+public class TestHoodiePositionBasedFileGroupRecordBuffer extends TestHoodieFileGroupReaderOnSpark {
+  private final HoodieTestDataGenerator dataGen = new HoodieTestDataGenerator(0xDEEF);
+  private HoodieTableMetaClient metaClient;
+  private Schema avroSchema;
+  private HoodiePositionBasedFileGroupRecordBuffer buffer;
+  private String partitionPath;
+
+  public void prepareBuffer(boolean useCustomMerger) throws Exception {
+    Map<String, String> writeConfigs = new HashMap<>();
+    writeConfigs.put(HoodieStorageConfig.LOGFILE_DATA_BLOCK_FORMAT.key(), "parquet");
+    writeConfigs.put(KeyGeneratorOptions.RECORDKEY_FIELD_NAME.key(), "_row_key");
+    writeConfigs.put(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME.key(), "partition_path");
+    writeConfigs.put("hoodie.datasource.write.precombine.field", "timestamp");
+    writeConfigs.put("hoodie.payload.ordering.field", "timestamp");
+    writeConfigs.put(HoodieTableConfig.HOODIE_TABLE_NAME_KEY, "hoodie_test");
+    writeConfigs.put("hoodie.insert.shuffle.parallelism", "4");
+    writeConfigs.put("hoodie.upsert.shuffle.parallelism", "4");
+    writeConfigs.put("hoodie.bulkinsert.shuffle.parallelism", "2");
+    writeConfigs.put("hoodie.delete.shuffle.parallelism", "1");
+    writeConfigs.put("hoodie.merge.small.file.group.candidates.limit", "0");
+    writeConfigs.put("hoodie.compact.inline", "false");
+
Re: [PR] [HUDI-7840] Add position merging to fg reader [hudi]
jonvex commented on code in PR #11413: URL: https://github.com/apache/hudi/pull/11413#discussion_r1631737122

## hudi-common/src/main/java/org/apache/hudi/common/engine/HoodieReaderContext.java:

@@ -301,9 +311,19 @@ public final UnaryOperator projectRecord(Schema from, Schema to) {
    * @return the record position in the base file.
    */
   public long extractRecordPosition(T record, Schema schema, String fieldName, long providedPositionIfNeeded) {
+    if (supportsParquetRowIndex()) {
+      Object position = getValue(record, schema, fieldName);
+      if (position != null) {
+        return (long) position;
+      }

Review Comment: sure
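The hunk above prefers the materialized row-index value and otherwise uses the position supplied by the caller. A minimal sketch of that fallback, with a plain `Long` standing in for the engine-specific `getValue` accessor (an assumption for illustration, not the exact Hudi code):

```java
public class RecordPositionExtractor {
  // Prefer the position read from the temporary parquet row-index column when
  // the reader supports it and the column is populated; otherwise fall back to
  // the position supplied alongside the record (e.g. from a log-block header).
  static long extractRecordPosition(Long rowIndexValue,
                                    boolean supportsParquetRowIndex,
                                    long providedPositionIfNeeded) {
    if (supportsParquetRowIndex && rowIndexValue != null) {
      return rowIndexValue;
    }
    return providedPositionIfNeeded;
  }
}
```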
Re: [PR] [HUDI-7840] Add position merging to fg reader [hudi]
yihua commented on code in PR #11413: URL: https://github.com/apache/hudi/pull/11413#discussion_r1631729256

## hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/TestHoodiePositionBasedFileGroupRecordBuffer.java:

(quotes the same section of the new test file as the comment above; the review comment text is truncated in the archive)
Re: [PR] [HUDI-7840] Add position merging to fg reader [hudi]
hudi-bot commented on PR #11413: URL: https://github.com/apache/hudi/pull/11413#issuecomment-215555

## CI report:

* d581b2726ba5047c9e72396820da81ecf1357266 UNKNOWN
* d00f2862fb8dd8a84fcc5aa1900e76577b8a9bf1 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24275)
* 1ce1d753818efb6be00c20fb5a8dd141c7c47f00 UNKNOWN

Bot commands: @hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7840] Add position merging to fg reader [hudi]
jonvex commented on code in PR #11413: URL: https://github.com/apache/hudi/pull/11413#discussion_r1631723794

## hudi-spark-datasource/hudi-spark-common/src/test/scala/org/apache/spark/execution/datasources/parquet/TestSparkFileFormatInternalRowReaderContext.scala:

@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.spark.execution.datasources.parquet
+
+import org.apache.hudi.SparkFileFormatInternalRowReaderContext
+import org.apache.hudi.SparkFileFormatInternalRowReaderContext.filterIsSafeForBootstrap
+import org.apache.hudi.common.model.HoodieRecord
+import org.apache.hudi.common.table.read.HoodiePositionBasedFileGroupRecordBuffer.ROW_INDEX_TEMPORARY_COLUMN_NAME
+import org.apache.hudi.testutils.SparkClientFunctionalTestHarness
+import org.apache.spark.sql.sources.{And, IsNotNull, Or}
+import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}
+import org.junit.jupiter.api.Assertions.{assertEquals, assertFalse, assertTrue}
+import org.junit.jupiter.api.Test
+
+class TestSparkFileFormatInternalRowReaderContext extends SparkClientFunctionalTestHarness {
+
+  @Test
+  def testBootstrapFilters(): Unit = {
+    val recordKeyField = HoodieRecord.HoodieMetadataField.RECORD_KEY_METADATA_FIELD.getFieldName
+    val commitTimeField = HoodieRecord.HoodieMetadataField.COMMIT_TIME_METADATA_FIELD.getFieldName
+
+    val recordKeyFilter = IsNotNull(recordKeyField)
+    assertTrue(filterIsSafeForBootstrap(recordKeyFilter))
+    val commitTimeFilter = IsNotNull(commitTimeField)
+    assertTrue(filterIsSafeForBootstrap(commitTimeFilter))
+
+    val dataFieldFilter = IsNotNull("someotherfield")
+    assertTrue(filterIsSafeForBootstrap(dataFieldFilter))
+
+    val legalComplexFilter = Or(recordKeyFilter, commitTimeFilter)
+    assertTrue(filterIsSafeForBootstrap(legalComplexFilter))
+
+    val illegalComplexFilter = Or(recordKeyFilter, dataFieldFilter)
+    assertFalse(filterIsSafeForBootstrap(illegalComplexFilter))
+
+    val illegalNestedFilter = And(legalComplexFilter, illegalComplexFilter)
+    assertFalse(filterIsSafeForBootstrap(illegalNestedFilter))
+
+    val legalNestedFilter = And(legalComplexFilter, recordKeyFilter)
+    assertTrue(filterIsSafeForBootstrap(legalNestedFilter))
+  }
+
+  @Test
+  def testGetAppliedRequiredSchema(): Unit = {
+    val fields = Array(
+      StructField("column_a", LongType, nullable = false),
+      StructField("column_b", StringType, nullable = false))
+    val requiredSchema = StructType(fields)
+
+    val appliedSchema: StructType = SparkFileFormatInternalRowReaderContext.getAppliedRequiredSchema(

Review Comment: TestFiltersInFileGroupReader has tests to ensure that filters are pushed down when they should be. I also set breakpoints to make sure the filtering was actually happening.
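`getAppliedRequiredSchema` under test appends a temporary row-index column to the required schema so every returned row carries its position in the base file. A schema-as-list sketch of the same idea; the column name matches Spark's temporary metadata row-index field, but the rest (method name, list-of-names representation instead of a `StructType`) is illustrative:

```java
import java.util.ArrayList;
import java.util.List;

public class AppliedRequiredSchema {
  // Spark's temporary metadata column that surfaces the parquet row index.
  static final String ROW_INDEX_TEMPORARY_COLUMN_NAME = "_tmp_metadata_row_index";

  // Append the row-index column (a non-nullable long in the real StructType)
  // to the fields the query asked for, unless it is already requested.
  static List<String> getAppliedRequiredFields(List<String> requiredFields,
                                               boolean shouldAddRowIndex) {
    List<String> applied = new ArrayList<>(requiredFields);
    if (shouldAddRowIndex && !applied.contains(ROW_INDEX_TEMPORARY_COLUMN_NAME)) {
      applied.add(ROW_INDEX_TEMPORARY_COLUMN_NAME);
    }
    return applied;
  }
}
```

In the test above, a required schema of `column_a`, `column_b` would come back with the row-index column appended as a third field.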
Re: [PR] [HUDI-7840] Add position merging to fg reader [hudi]
yihua commented on code in PR #11413: URL: https://github.com/apache/hudi/pull/11413#discussion_r1631715014 ## hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodiePositionBasedSchemaHandler.java: ## @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.hudi.common.table.read; + +import org.apache.hudi.common.engine.HoodieReaderContext; +import org.apache.hudi.common.table.HoodieTableConfig; +import org.apache.hudi.common.util.Option; +import org.apache.hudi.common.util.collection.Pair; +import org.apache.hudi.internal.schema.InternalSchema; + +import org.apache.avro.Schema; + +import java.util.Collections; +import java.util.List; + +import static org.apache.hudi.avro.AvroSchemaUtils.appendFieldsToSchemaDedupNested; + +/** + * This class is responsible for handling the schema for the file group reader that supports positional merge. 
+ */ +public class HoodiePositionBasedSchemaHandler extends HoodieFileGroupReaderSchemaHandler { + public HoodiePositionBasedSchemaHandler(HoodieReaderContext readerContext, + Schema dataSchema, + Schema requestedSchema, + Option internalSchemaOpt, + HoodieTableConfig hoodieTableConfig) { +super(readerContext, dataSchema, requestedSchema, internalSchemaOpt, hoodieTableConfig); + Review Comment: nit: remove empty line. ## hudi-spark-datasource/hudi-spark-common/src/test/scala/org/apache/spark/execution/datasources/parquet/TestSparkFileFormatInternalRowReaderContext.scala: ## @@ -0,0 +1,72 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.spark.execution.datasources.parquet + +import org.apache.hudi.SparkFileFormatInternalRowReaderContext +import org.apache.hudi.SparkFileFormatInternalRowReaderContext.filterIsSafeForBootstrap +import org.apache.hudi.common.model.HoodieRecord +import org.apache.hudi.common.table.read.HoodiePositionBasedFileGroupRecordBuffer.ROW_INDEX_TEMPORARY_COLUMN_NAME +import org.apache.hudi.testutils.SparkClientFunctionalTestHarness +import org.apache.spark.sql.sources.{And, IsNotNull, Or} +import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType} +import org.junit.jupiter.api.Assertions.{assertEquals, assertFalse, assertTrue} +import org.junit.jupiter.api.Test + +class TestSparkFileFormatInternalRowReaderContext extends SparkClientFunctionalTestHarness { + + @Test + def testBootstrapFilters(): Unit = { +val recordKeyField = HoodieRecord.HoodieMetadataField.RECORD_KEY_METADATA_FIELD.getFieldName +val commitTimeField = HoodieRecord.HoodieMetadataField.COMMIT_TIME_METADATA_FIELD.getFieldName + +val recordKeyFilter = IsNotNull(recordKeyField) +assertTrue(filterIsSafeForBootstrap(recordKeyFilter)) +val commitTimeFilter = IsNotNull(commitTimeField) +assertTrue(filterIsSafeForBootstrap(commitTimeFilter)) + +val dataFieldFilter = IsNotNull("someotherfield") +assertTrue(filterIsSafeForBootstrap(dataFieldFilter)) + +val legalComplexFilter = Or(recordKeyFilter, commitTimeFilter) +assertTrue(filterIsSafeForBootstrap(legalComplexFilter)) + +val illegalComplexFilter = Or(recordKeyFilter, dataFieldFilter) +assertFalse(filterIsSafeForBootstrap(illegalComplexFilter)) + +val
Re: [PR] [HUDI-7840] Add position merging to fg reader [hudi]
jonvex commented on code in PR #11413: URL: https://github.com/apache/hudi/pull/11413#discussion_r1631715749 ## hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/SparkFileFormatInternalRowReaderContext.scala: ## @@ -116,45 +143,154 @@ class SparkFileFormatInternalRowReaderContext(parquetFileReader: SparkParquetRea skeletonRequiredSchema: Schema, dataFileIterator: ClosableIterator[InternalRow], dataRequiredSchema: Schema): ClosableIterator[InternalRow] = { -doBootstrapMerge(skeletonFileIterator.asInstanceOf[ClosableIterator[Any]], - dataFileIterator.asInstanceOf[ClosableIterator[Any]]) +doBootstrapMerge(skeletonFileIterator.asInstanceOf[ClosableIterator[Any]], skeletonRequiredSchema, + dataFileIterator.asInstanceOf[ClosableIterator[Any]], dataRequiredSchema) } - protected def doBootstrapMerge(skeletonFileIterator: ClosableIterator[Any], dataFileIterator: ClosableIterator[Any]): ClosableIterator[InternalRow] = { -new ClosableIterator[Any] { - val combinedRow = new JoinedRow() + private def doBootstrapMerge(skeletonFileIterator: ClosableIterator[Any], + skeletonRequiredSchema: Schema, + dataFileIterator: ClosableIterator[Any], + dataRequiredSchema: Schema): ClosableIterator[InternalRow] = { +if (supportsPositionField()) { + assert(AvroSchemaUtils.containsFieldInSchema(skeletonRequiredSchema, ROW_INDEX_TEMPORARY_COLUMN_NAME)) + assert(AvroSchemaUtils.containsFieldInSchema(dataRequiredSchema, ROW_INDEX_TEMPORARY_COLUMN_NAME)) + val rowIndexColumn = new java.util.HashSet[String]() + rowIndexColumn.add(ROW_INDEX_TEMPORARY_COLUMN_NAME) + //always remove the row index column from the skeleton because the data file will also have the same column + val skeletonProjection = projectRecord(skeletonRequiredSchema, +AvroSchemaUtils.removeFieldsFromSchema(skeletonRequiredSchema, rowIndexColumn)) - override def hasNext: Boolean = { -//If the iterators are out of sync it is probably due to filter pushdown -checkState(dataFileIterator.hasNext == 
skeletonFileIterator.hasNext, - "Bootstrap data-file iterator and skeleton-file iterator have to be in-sync!") -dataFileIterator.hasNext && skeletonFileIterator.hasNext + //If we need to do position based merging with log files we will leave the row index column at the end + val dataProjection = if (getHasLogFiles && getUseRecordPosition) { +getIdentityProjection + } else { +projectRecord(dataRequiredSchema, + AvroSchemaUtils.removeFieldsFromSchema(dataRequiredSchema, rowIndexColumn)) } - override def next(): Any = { -(skeletonFileIterator.next(), dataFileIterator.next()) match { - case (s: ColumnarBatch, d: ColumnarBatch) => -val numCols = s.numCols() + d.numCols() -val vecs: Array[ColumnVector] = new Array[ColumnVector](numCols) -for (i <- 0 until numCols) { - if (i < s.numCols()) { -vecs(i) = s.column(i) + //Always use internal row for positional merge because + //we need to iterate row by row when merging + new CachingIterator[InternalRow] { +val combinedRow = new JoinedRow() + +//position column will always be at the end of the row +private def getPos(row: InternalRow): Long = { + row.getLong(row.numFields-1) +} + +private def getNextSkeleton: (InternalRow, Long) = { + val nextSkeletonRow = skeletonFileIterator.next().asInstanceOf[InternalRow] + (nextSkeletonRow, getPos(nextSkeletonRow)) +} + +private def getNextData: (InternalRow, Long) = { + val nextDataRow = dataFileIterator.next().asInstanceOf[InternalRow] + (nextDataRow, getPos(nextDataRow)) +} + +override def close(): Unit = { + skeletonFileIterator.close() + dataFileIterator.close() +} + +override protected def doHasNext(): Boolean = { + if (!dataFileIterator.hasNext || !skeletonFileIterator.hasNext) { +false + } else { +var nextSkeleton = getNextSkeleton +var nextData = getNextData +while (nextSkeleton._2 != nextData._2) { + if (nextSkeleton._2 > nextData._2) { +if (!dataFileIterator.hasNext) { + return false +} else { + nextData = getNextData +} } else { -vecs(i) = d.column(i - s.numCols()) +if 
(!skeletonFileIterator.hasNext) { + return false +} else { + nextSkeleton = getNextSkeleton +}
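The `doHasNext` loop in the diff above advances whichever iterator is behind until the skeleton row and the data row carry the same row index. A stripped-down sketch of that alignment logic, using a hypothetical `Row(pos, value)` record in place of Hudi's `InternalRow` plus trailing row-index column:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Stripped-down sketch of the position-aligned bootstrap merge, with a hypothetical
// Row(pos, value) record standing in for Hudi's InternalRow with a trailing row-index
// column. Both inputs are assumed sorted by position; when filter pushdown drops rows
// on one side, the loop skips past the corresponding positions on the other.
public class PositionAlignedMerge {
  record Row(long pos, String value) {}

  static List<String> merge(Iterator<Row> skeleton, Iterator<Row> data) {
    List<String> out = new ArrayList<>();
    if (!skeleton.hasNext() || !data.hasNext()) {
      return out;
    }
    Row s = skeleton.next();
    Row d = data.next();
    while (true) {
      if (s.pos() == d.pos()) {
        out.add(s.value() + "|" + d.value()); // stands in for JoinedRow(skeleton, data)
        if (!skeleton.hasNext() || !data.hasNext()) {
          break;
        }
        s = skeleton.next();
        d = data.next();
      } else if (s.pos() > d.pos()) {
        if (!data.hasNext()) {
          break; // data side exhausted; no more matches possible
        }
        d = data.next();
      } else {
        if (!skeleton.hasNext()) {
          break; // skeleton side exhausted
        }
        s = skeleton.next();
      }
    }
    return out;
  }
}
```

With skeleton positions {0, 1, 3} and data positions {1, 2, 3}, only positions 1 and 3 produce joined rows, which is why the old `checkState` that both iterators stay in lockstep no longer applies once filters are pushed down.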
Re: [PR] [HUDI-7840] Add position merging to fg reader [hudi]
jonvex commented on code in PR #11413: URL: https://github.com/apache/hudi/pull/11413#discussion_r1631715307 ## hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/SparkFileFormatInternalRowReaderContext.scala: ## @@ -116,45 +143,154 @@ class SparkFileFormatInternalRowReaderContext(parquetFileReader: SparkParquetRea skeletonRequiredSchema: Schema, dataFileIterator: ClosableIterator[InternalRow], dataRequiredSchema: Schema): ClosableIterator[InternalRow] = { -doBootstrapMerge(skeletonFileIterator.asInstanceOf[ClosableIterator[Any]], - dataFileIterator.asInstanceOf[ClosableIterator[Any]]) +doBootstrapMerge(skeletonFileIterator.asInstanceOf[ClosableIterator[Any]], skeletonRequiredSchema, + dataFileIterator.asInstanceOf[ClosableIterator[Any]], dataRequiredSchema) } - protected def doBootstrapMerge(skeletonFileIterator: ClosableIterator[Any], dataFileIterator: ClosableIterator[Any]): ClosableIterator[InternalRow] = { -new ClosableIterator[Any] { - val combinedRow = new JoinedRow() + private def doBootstrapMerge(skeletonFileIterator: ClosableIterator[Any], + skeletonRequiredSchema: Schema, + dataFileIterator: ClosableIterator[Any], + dataRequiredSchema: Schema): ClosableIterator[InternalRow] = { +if (supportsPositionField()) { + assert(AvroSchemaUtils.containsFieldInSchema(skeletonRequiredSchema, ROW_INDEX_TEMPORARY_COLUMN_NAME)) + assert(AvroSchemaUtils.containsFieldInSchema(dataRequiredSchema, ROW_INDEX_TEMPORARY_COLUMN_NAME)) + val rowIndexColumn = new java.util.HashSet[String]() + rowIndexColumn.add(ROW_INDEX_TEMPORARY_COLUMN_NAME) + //always remove the row index column from the skeleton because the data file will also have the same column + val skeletonProjection = projectRecord(skeletonRequiredSchema, +AvroSchemaUtils.removeFieldsFromSchema(skeletonRequiredSchema, rowIndexColumn)) - override def hasNext: Boolean = { -//If the iterators are out of sync it is probably due to filter pushdown -checkState(dataFileIterator.hasNext == 
skeletonFileIterator.hasNext, - "Bootstrap data-file iterator and skeleton-file iterator have to be in-sync!") -dataFileIterator.hasNext && skeletonFileIterator.hasNext + //If we need to do position based merging with log files we will leave the row index column at the end + val dataProjection = if (getHasLogFiles && getUseRecordPosition) { +getIdentityProjection + } else { +projectRecord(dataRequiredSchema, + AvroSchemaUtils.removeFieldsFromSchema(dataRequiredSchema, rowIndexColumn)) } - override def next(): Any = { -(skeletonFileIterator.next(), dataFileIterator.next()) match { - case (s: ColumnarBatch, d: ColumnarBatch) => -val numCols = s.numCols() + d.numCols() -val vecs: Array[ColumnVector] = new Array[ColumnVector](numCols) -for (i <- 0 until numCols) { - if (i < s.numCols()) { -vecs(i) = s.column(i) + //Always use internal row for positional merge because Review Comment: I think the filtering is actually done by batch as well, so I think we wouldn't need to iterate through the rows themselves -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7840] Add position merging to fg reader [hudi]
yihua commented on code in PR #11413: URL: https://github.com/apache/hudi/pull/11413#discussion_r1631711150 ## hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/SparkFileFormatInternalRowReaderContext.scala: ## @@ -116,45 +143,154 @@ class SparkFileFormatInternalRowReaderContext(parquetFileReader: SparkParquetRea skeletonRequiredSchema: Schema, dataFileIterator: ClosableIterator[InternalRow], dataRequiredSchema: Schema): ClosableIterator[InternalRow] = { -doBootstrapMerge(skeletonFileIterator.asInstanceOf[ClosableIterator[Any]], - dataFileIterator.asInstanceOf[ClosableIterator[Any]]) +doBootstrapMerge(skeletonFileIterator.asInstanceOf[ClosableIterator[Any]], skeletonRequiredSchema, + dataFileIterator.asInstanceOf[ClosableIterator[Any]], dataRequiredSchema) } - protected def doBootstrapMerge(skeletonFileIterator: ClosableIterator[Any], dataFileIterator: ClosableIterator[Any]): ClosableIterator[InternalRow] = { -new ClosableIterator[Any] { - val combinedRow = new JoinedRow() + private def doBootstrapMerge(skeletonFileIterator: ClosableIterator[Any], + skeletonRequiredSchema: Schema, + dataFileIterator: ClosableIterator[Any], + dataRequiredSchema: Schema): ClosableIterator[InternalRow] = { +if (supportsPositionField()) { + assert(AvroSchemaUtils.containsFieldInSchema(skeletonRequiredSchema, ROW_INDEX_TEMPORARY_COLUMN_NAME)) + assert(AvroSchemaUtils.containsFieldInSchema(dataRequiredSchema, ROW_INDEX_TEMPORARY_COLUMN_NAME)) + val rowIndexColumn = new java.util.HashSet[String]() + rowIndexColumn.add(ROW_INDEX_TEMPORARY_COLUMN_NAME) + //always remove the row index column from the skeleton because the data file will also have the same column + val skeletonProjection = projectRecord(skeletonRequiredSchema, +AvroSchemaUtils.removeFieldsFromSchema(skeletonRequiredSchema, rowIndexColumn)) - override def hasNext: Boolean = { -//If the iterators are out of sync it is probably due to filter pushdown -checkState(dataFileIterator.hasNext == skeletonFileIterator.hasNext, 
- "Bootstrap data-file iterator and skeleton-file iterator have to be in-sync!") -dataFileIterator.hasNext && skeletonFileIterator.hasNext + //If we need to do position based merging with log files we will leave the row index column at the end + val dataProjection = if (getHasLogFiles && getUseRecordPosition) { +getIdentityProjection + } else { +projectRecord(dataRequiredSchema, + AvroSchemaUtils.removeFieldsFromSchema(dataRequiredSchema, rowIndexColumn)) } - override def next(): Any = { -(skeletonFileIterator.next(), dataFileIterator.next()) match { - case (s: ColumnarBatch, d: ColumnarBatch) => -val numCols = s.numCols() + d.numCols() -val vecs: Array[ColumnVector] = new Array[ColumnVector](numCols) -for (i <- 0 until numCols) { - if (i < s.numCols()) { -vecs(i) = s.column(i) + //Always use internal row for positional merge because Review Comment: We can still iterate through rows within the `ColumnarBatch` in the vectorized processing. We can leave that as a follow-up. ## hudi-common/src/main/java/org/apache/hudi/common/engine/HoodieReaderContext.java: ## @@ -122,6 +123,15 @@ public void setNeedsBootstrapMerge(boolean needsBootstrapMerge) { this.needsBootstrapMerge = needsBootstrapMerge; } + // Getter and Setter for useRecordPosition + public boolean getUseRecordPosition() { +return useRecordPosition; + } Review Comment: Rename the getter and setter to sth like `shouldMergeUseRecordPosition` and `setMergeUseRecordPosition` so it indicates this is used for controlling the merging behavior. 
## hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/SparkFileFormatInternalRowReaderContext.scala: ## @@ -116,45 +143,154 @@ class SparkFileFormatInternalRowReaderContext(parquetFileReader: SparkParquetRea skeletonRequiredSchema: Schema, dataFileIterator: ClosableIterator[InternalRow], dataRequiredSchema: Schema): ClosableIterator[InternalRow] = { -doBootstrapMerge(skeletonFileIterator.asInstanceOf[ClosableIterator[Any]], - dataFileIterator.asInstanceOf[ClosableIterator[Any]]) +doBootstrapMerge(skeletonFileIterator.asInstanceOf[ClosableIterator[Any]], skeletonRequiredSchema, + dataFileIterator.asInstanceOf[ClosableIterator[Any]], dataRequiredSchema) } - protected def doBootstrapMerge(skeletonFileIterator: ClosableIterator[Any], dataFileIterator:
Re: [PR] [HUDI-7840] Add position merging to fg reader [hudi]
jonvex commented on code in PR #11413: URL: https://github.com/apache/hudi/pull/11413#discussion_r1631710563 ## hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/SparkFileFormatInternalRowReaderContext.scala: ## @@ -116,45 +143,154 @@ class SparkFileFormatInternalRowReaderContext(parquetFileReader: SparkParquetRea skeletonRequiredSchema: Schema, dataFileIterator: ClosableIterator[InternalRow], dataRequiredSchema: Schema): ClosableIterator[InternalRow] = { -doBootstrapMerge(skeletonFileIterator.asInstanceOf[ClosableIterator[Any]], - dataFileIterator.asInstanceOf[ClosableIterator[Any]]) +doBootstrapMerge(skeletonFileIterator.asInstanceOf[ClosableIterator[Any]], skeletonRequiredSchema, + dataFileIterator.asInstanceOf[ClosableIterator[Any]], dataRequiredSchema) } - protected def doBootstrapMerge(skeletonFileIterator: ClosableIterator[Any], dataFileIterator: ClosableIterator[Any]): ClosableIterator[InternalRow] = { -new ClosableIterator[Any] { - val combinedRow = new JoinedRow() + private def doBootstrapMerge(skeletonFileIterator: ClosableIterator[Any], + skeletonRequiredSchema: Schema, + dataFileIterator: ClosableIterator[Any], + dataRequiredSchema: Schema): ClosableIterator[InternalRow] = { +if (supportsPositionField()) { + assert(AvroSchemaUtils.containsFieldInSchema(skeletonRequiredSchema, ROW_INDEX_TEMPORARY_COLUMN_NAME)) + assert(AvroSchemaUtils.containsFieldInSchema(dataRequiredSchema, ROW_INDEX_TEMPORARY_COLUMN_NAME)) + val rowIndexColumn = new java.util.HashSet[String]() + rowIndexColumn.add(ROW_INDEX_TEMPORARY_COLUMN_NAME) + //always remove the row index column from the skeleton because the data file will also have the same column + val skeletonProjection = projectRecord(skeletonRequiredSchema, +AvroSchemaUtils.removeFieldsFromSchema(skeletonRequiredSchema, rowIndexColumn)) - override def hasNext: Boolean = { -//If the iterators are out of sync it is probably due to filter pushdown -checkState(dataFileIterator.hasNext == 
skeletonFileIterator.hasNext, - "Bootstrap data-file iterator and skeleton-file iterator have to be in-sync!") -dataFileIterator.hasNext && skeletonFileIterator.hasNext + //If we need to do position based merging with log files we will leave the row index column at the end + val dataProjection = if (getHasLogFiles && getUseRecordPosition) { +getIdentityProjection + } else { +projectRecord(dataRequiredSchema, + AvroSchemaUtils.removeFieldsFromSchema(dataRequiredSchema, rowIndexColumn)) } - override def next(): Any = { -(skeletonFileIterator.next(), dataFileIterator.next()) match { - case (s: ColumnarBatch, d: ColumnarBatch) => -val numCols = s.numCols() + d.numCols() -val vecs: Array[ColumnVector] = new Array[ColumnVector](numCols) -for (i <- 0 until numCols) { - if (i < s.numCols()) { -vecs(i) = s.column(i) + //Always use internal row for positional merge because + //we need to iterate row by row when merging + new CachingIterator[InternalRow] { +val combinedRow = new JoinedRow() + +//position column will always be at the end of the row +private def getPos(row: InternalRow): Long = { + row.getLong(row.numFields-1) +} + +private def getNextSkeleton: (InternalRow, Long) = { + val nextSkeletonRow = skeletonFileIterator.next().asInstanceOf[InternalRow] + (nextSkeletonRow, getPos(nextSkeletonRow)) +} + +private def getNextData: (InternalRow, Long) = { + val nextDataRow = dataFileIterator.next().asInstanceOf[InternalRow] + (nextDataRow, getPos(nextDataRow)) +} + +override def close(): Unit = { + skeletonFileIterator.close() + dataFileIterator.close() +} + +override protected def doHasNext(): Boolean = { + if (!dataFileIterator.hasNext || !skeletonFileIterator.hasNext) { +false + } else { +var nextSkeleton = getNextSkeleton +var nextData = getNextData +while (nextSkeleton._2 != nextData._2) { + if (nextSkeleton._2 > nextData._2) { +if (!dataFileIterator.hasNext) { + return false +} else { + nextData = getNextData +} } else { -vecs(i) = d.column(i - s.numCols()) +if 
(!skeletonFileIterator.hasNext) { + return false +} else { + nextSkeleton = getNextSkeleton +}
Re: [PR] [HUDI-6787] Implement the HoodieFileGroupReader API for Hive [hudi]
hudi-bot commented on PR #10422: URL: https://github.com/apache/hudi/pull/10422#issuecomment-2155580839

## CI report:

* 99517e23baa60a6a0602e9daf7f522f3c1dcfa1e UNKNOWN
* 33249cc712c6dcdde12efe8536579d3c9c5f8575 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24279)

Bot commands: @hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7840] Add position merging to fg reader [hudi]
yihua commented on code in PR #11413: URL: https://github.com/apache/hudi/pull/11413#discussion_r1631704549 ## hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/SparkFileFormatInternalRowReaderContext.scala: ## @@ -116,45 +143,154 @@ class SparkFileFormatInternalRowReaderContext(parquetFileReader: SparkParquetRea skeletonRequiredSchema: Schema, dataFileIterator: ClosableIterator[InternalRow], dataRequiredSchema: Schema): ClosableIterator[InternalRow] = { -doBootstrapMerge(skeletonFileIterator.asInstanceOf[ClosableIterator[Any]], - dataFileIterator.asInstanceOf[ClosableIterator[Any]]) +doBootstrapMerge(skeletonFileIterator.asInstanceOf[ClosableIterator[Any]], skeletonRequiredSchema, + dataFileIterator.asInstanceOf[ClosableIterator[Any]], dataRequiredSchema) } - protected def doBootstrapMerge(skeletonFileIterator: ClosableIterator[Any], dataFileIterator: ClosableIterator[Any]): ClosableIterator[InternalRow] = { -new ClosableIterator[Any] { - val combinedRow = new JoinedRow() + private def doBootstrapMerge(skeletonFileIterator: ClosableIterator[Any], + skeletonRequiredSchema: Schema, + dataFileIterator: ClosableIterator[Any], + dataRequiredSchema: Schema): ClosableIterator[InternalRow] = { +if (supportsPositionField()) { + assert(AvroSchemaUtils.containsFieldInSchema(skeletonRequiredSchema, ROW_INDEX_TEMPORARY_COLUMN_NAME)) + assert(AvroSchemaUtils.containsFieldInSchema(dataRequiredSchema, ROW_INDEX_TEMPORARY_COLUMN_NAME)) + val rowIndexColumn = new java.util.HashSet[String]() + rowIndexColumn.add(ROW_INDEX_TEMPORARY_COLUMN_NAME) + //always remove the row index column from the skeleton because the data file will also have the same column + val skeletonProjection = projectRecord(skeletonRequiredSchema, +AvroSchemaUtils.removeFieldsFromSchema(skeletonRequiredSchema, rowIndexColumn)) - override def hasNext: Boolean = { -//If the iterators are out of sync it is probably due to filter pushdown -checkState(dataFileIterator.hasNext == skeletonFileIterator.hasNext, 
- "Bootstrap data-file iterator and skeleton-file iterator have to be in-sync!") -dataFileIterator.hasNext && skeletonFileIterator.hasNext + //If we need to do position based merging with log files we will leave the row index column at the end + val dataProjection = if (getHasLogFiles && getUseRecordPosition) { +getIdentityProjection + } else { +projectRecord(dataRequiredSchema, + AvroSchemaUtils.removeFieldsFromSchema(dataRequiredSchema, rowIndexColumn)) } - override def next(): Any = { -(skeletonFileIterator.next(), dataFileIterator.next()) match { - case (s: ColumnarBatch, d: ColumnarBatch) => -val numCols = s.numCols() + d.numCols() -val vecs: Array[ColumnVector] = new Array[ColumnVector](numCols) -for (i <- 0 until numCols) { - if (i < s.numCols()) { -vecs(i) = s.column(i) + //Always use internal row for positional merge because + //we need to iterate row by row when merging + new CachingIterator[InternalRow] { +val combinedRow = new JoinedRow() + +//position column will always be at the end of the row +private def getPos(row: InternalRow): Long = { + row.getLong(row.numFields-1) +} + +private def getNextSkeleton: (InternalRow, Long) = { + val nextSkeletonRow = skeletonFileIterator.next().asInstanceOf[InternalRow] + (nextSkeletonRow, getPos(nextSkeletonRow)) +} + +private def getNextData: (InternalRow, Long) = { + val nextDataRow = dataFileIterator.next().asInstanceOf[InternalRow] + (nextDataRow, getPos(nextDataRow)) +} + +override def close(): Unit = { + skeletonFileIterator.close() + dataFileIterator.close() +} + +override protected def doHasNext(): Boolean = { + if (!dataFileIterator.hasNext || !skeletonFileIterator.hasNext) { +false + } else { +var nextSkeleton = getNextSkeleton +var nextData = getNextData +while (nextSkeleton._2 != nextData._2) { + if (nextSkeleton._2 > nextData._2) { +if (!dataFileIterator.hasNext) { + return false +} else { + nextData = getNextData +} } else { -vecs(i) = d.column(i - s.numCols()) +if (!skeletonFileIterator.hasNext) { + 
return false +} else { + nextSkeleton = getNextSkeleton +}
Re: [PR] [HUDI-6787] Implement the HoodieFileGroupReader API for Hive [hudi]
hudi-bot commented on PR #10422: URL: https://github.com/apache/hudi/pull/10422#issuecomment-2155513457

## CI report:

* 99517e23baa60a6a0602e9daf7f522f3c1dcfa1e UNKNOWN
* b29ff638867f3760156318bb58a7677c67a415dc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24274)
* 33249cc712c6dcdde12efe8536579d3c9c5f8575 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24279)

Bot commands: @hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-6787] Implement the HoodieFileGroupReader API for Hive [hudi]
hudi-bot commented on PR #10422: URL: https://github.com/apache/hudi/pull/10422#issuecomment-2155505446

## CI report:

* 99517e23baa60a6a0602e9daf7f522f3c1dcfa1e UNKNOWN
* b29ff638867f3760156318bb58a7677c67a415dc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24274)
* 33249cc712c6dcdde12efe8536579d3c9c5f8575 UNKNOWN

Bot commands: @hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7840] Add position merging to fg reader [hudi]
hudi-bot commented on PR #11413: URL: https://github.com/apache/hudi/pull/11413#issuecomment-2155498531

## CI report:

* d581b2726ba5047c9e72396820da81ecf1357266 UNKNOWN
* d00f2862fb8dd8a84fcc5aa1900e76577b8a9bf1 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24275)

Bot commands: @hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-6787] Implement the HoodieFileGroupReader API for Hive [hudi]
jonvex commented on code in PR #10422: URL: https://github.com/apache/hudi/pull/10422#discussion_r1631640824 ## hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieFileGroupReaderRecordReader.java: ## @@ -0,0 +1,294 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.hudi.hadoop; + +import org.apache.hudi.avro.HoodieAvroUtils; +import org.apache.hudi.common.config.HoodieCommonConfig; +import org.apache.hudi.common.config.HoodieReaderConfig; +import org.apache.hudi.common.fs.FSUtils; +import org.apache.hudi.common.model.BaseFile; +import org.apache.hudi.common.model.FileSlice; +import org.apache.hudi.common.model.HoodieBaseFile; +import org.apache.hudi.common.model.HoodieFileGroupId; +import org.apache.hudi.common.table.HoodieTableMetaClient; +import org.apache.hudi.common.table.TableSchemaResolver; +import org.apache.hudi.common.table.read.HoodieFileGroupReader; +import org.apache.hudi.common.table.timeline.HoodieInstant; +import org.apache.hudi.common.util.FileIOUtils; +import org.apache.hudi.common.util.Option; +import org.apache.hudi.common.util.StringUtils; +import org.apache.hudi.common.util.TablePathUtils; +import org.apache.hudi.common.util.collection.ExternalSpillableMap; +import org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader; +import org.apache.hudi.hadoop.realtime.RealtimeSplit; +import org.apache.hudi.hadoop.utils.HoodieRealtimeInputFormatUtils; +import org.apache.hudi.hadoop.utils.HoodieRealtimeRecordReaderUtils; + +import org.apache.avro.Schema; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hive.metastore.api.hive_metastoreConstants; +import org.apache.hadoop.hive.serde2.ColumnProjectionUtils; +import org.apache.hadoop.io.ArrayWritable; +import org.apache.hadoop.io.NullWritable; +import org.apache.hadoop.io.Writable; +import org.apache.hadoop.mapred.FileSplit; +import org.apache.hadoop.mapred.InputSplit; +import org.apache.hadoop.mapred.JobConf; +import org.apache.hadoop.mapred.RecordReader; +import org.apache.hadoop.mapred.Reporter; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Collections; +import java.util.HashMap; +import java.util.HashSet; +import 
java.util.List; +import java.util.Locale; +import java.util.Map; +import java.util.Set; +import java.util.function.UnaryOperator; +import java.util.stream.Collectors; +import java.util.stream.Stream; + +import static org.apache.hudi.common.config.HoodieCommonConfig.DISK_MAP_BITCASK_COMPRESSION_ENABLED; +import static org.apache.hudi.common.config.HoodieCommonConfig.SPILLABLE_DISK_MAP_TYPE; +import static org.apache.hudi.common.config.HoodieMemoryConfig.MAX_MEMORY_FOR_MERGE; +import static org.apache.hudi.common.config.HoodieMemoryConfig.SPILLABLE_MAP_BASE_PATH; + +public class HoodieFileGroupReaderRecordReader implements RecordReader { + + public interface HiveReaderCreator { +org.apache.hadoop.mapred.RecordReader getRecordReader( +final org.apache.hadoop.mapred.InputSplit split, +final org.apache.hadoop.mapred.JobConf job, +final org.apache.hadoop.mapred.Reporter reporter +) throws IOException; + } + + private final HiveHoodieReaderContext readerContext; + private final HoodieFileGroupReader fileGroupReader; + private final ArrayWritable arrayWritable; + private final NullWritable nullWritable = NullWritable.get(); + private final InputSplit inputSplit; + private final JobConf jobConfCopy; + private final UnaryOperator reverseProjection; + + public HoodieFileGroupReaderRecordReader(HiveReaderCreator readerCreator, + final InputSplit split, + final JobConf jobConf, + final Reporter reporter) throws IOException { +this.jobConfCopy = new JobConf(jobConf); +HoodieRealtimeInputFormatUtils.cleanProjectionColumnIds(jobConfCopy); +Set partitionColumns = new HashSet<>(getPartitionFieldNames(jobConfCopy)); +this.inputSplit = split; + +FileSplit fileSplit = (FileSplit) split; +String tableBasePath = getTableBasePath(split,
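`HoodieFileGroupReaderRecordReader` adapts the file group reader's pull-style iterator to Hadoop's old-style `RecordReader#next(key, value)` contract, which mutates a caller-supplied value buffer. A dependency-free sketch of that adapter shape (the names here are illustrative, not Hudi's actual signatures):

```java
import java.util.Iterator;

// Illustrative, dependency-free sketch of the adapter pattern: Hadoop's old-style
// RecordReader#next(key, value) fills a caller-supplied reusable buffer, so the
// wrapper pulls from the underlying reader's iterator and writes each record into
// that buffer, returning false once the input is exhausted.
public class IteratorRecordReader<T> {
  private final Iterator<T> source; // stands in for the file group reader's record iterator
  private long pos = 0;             // records consumed so far, for getPos()/progress

  public IteratorRecordReader(Iterator<T> source) {
    this.source = source;
  }

  // Mimics RecordReader#next: returns false at end of input, otherwise fills the holder.
  // The real reader copies into a reused ArrayWritable instead of a one-element array.
  public boolean next(T[] valueHolder) {
    if (!source.hasNext()) {
      return false;
    }
    valueHolder[0] = source.next();
    pos++;
    return true;
  }

  public long getPos() {
    return pos;
  }
}
```

Callers drive it with `while (reader.next(holder)) { ... }`, which mirrors how Hive consumes a `RecordReader`.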
Re: [PR] [HUDI-6787] Implement the HoodieFileGroupReader API for Hive [hudi]
jonvex commented on code in PR #10422: URL: https://github.com/apache/hudi/pull/10422#discussion_r1631634950

## hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/functional/TestHoodieSparkMergeOnReadTableCompaction.java:

@@ -146,43 +147,50 @@ public void testWriteDuringCompaction(String payloadClass) throws IOException {
   @ParameterizedTest
   @MethodSource("writeLogTest")
   public void testWriteLogDuringCompaction(boolean enableMetadataTable, boolean enableTimelineServer) throws IOException {
-    Properties props = getPropertiesForKeyGen(true);
-    HoodieWriteConfig config = HoodieWriteConfig.newBuilder()
-        .forTable("test-trip-table")
-        .withPath(basePath())
-        .withSchema(TRIP_EXAMPLE_SCHEMA)
-        .withParallelism(2, 2)
-        .withAutoCommit(true)
-        .withEmbeddedTimelineServerEnabled(enableTimelineServer)
-        .withMetadataConfig(HoodieMetadataConfig.newBuilder().enable(enableMetadataTable).build())
-        .withCompactionConfig(HoodieCompactionConfig.newBuilder()
-            .withMaxNumDeltaCommitsBeforeCompaction(1).build())
-        .withLayoutConfig(HoodieLayoutConfig.newBuilder()
-            .withLayoutType(HoodieStorageLayout.LayoutType.BUCKET.name())
-            .withLayoutPartitioner(SparkBucketIndexPartitioner.class.getName()).build())
-        .withIndexConfig(HoodieIndexConfig.newBuilder().fromProperties(props).withIndexType(HoodieIndex.IndexType.BUCKET).withBucketNum("1").build())
-        .build();
-    props.putAll(config.getProps());
-
-    metaClient = getHoodieMetaClient(HoodieTableType.MERGE_ON_READ, props);
-    client = getHoodieWriteClient(config);
-
-    final List<HoodieRecord> records = dataGen.generateInserts("001", 100);
-    JavaRDD<HoodieRecord> writeRecords = jsc().parallelize(records, 2);
+    try {
+      // disable for this test because it seems like we process mor in a different order?

Review Comment: https://issues.apache.org/jira/browse/HUDI-7610 Delete behavior is inconsistent and, imo, undefined. This is one of the advantages of unifying all the readers with the FGReader: we can remove the inconsistency between engines.
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7269] Fallback to key based merge if positions are missing from log block [hudi]
hudi-bot commented on PR #11415: URL: https://github.com/apache/hudi/pull/11415#issuecomment-2155410555 ## CI report: * 644a1d216307d8660ff7654c5273f2356974bcb8 UNKNOWN * bfea0d3a2dd9e6ba2d96c1d7d20a07e085883da6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24278) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7269] Fallback to key based merge if positions are missing from log block [hudi]
hudi-bot commented on PR #11415: URL: https://github.com/apache/hudi/pull/11415#issuecomment-2155393640 ## CI report: * 644a1d216307d8660ff7654c5273f2356974bcb8 UNKNOWN * 40932069f637e82d80731fe8625331d293fdc1e0 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24277) * bfea0d3a2dd9e6ba2d96c1d7d20a07e085883da6 UNKNOWN
Re: [PR] [HUDI-7834] Create placeholder table versions. Introduce new hoodie table property to track initial table version when table was created. This is needed to identify if the table was originall
hudi-bot commented on PR #11406: URL: https://github.com/apache/hudi/pull/11406#issuecomment-2155393533 ## CI report: * da08a0b3c0524b46e70a4cbed8ab82eb5f84f24c UNKNOWN * 901c7f94b1b56ac19867d5d0deab34eb35ebce2c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24276)
Re: [PR] [HUDI-7834] Create placeholder table versions. Introduce new hoodie table property to track initial table version when table was created. This is needed to identify if the table was originall
hudi-bot commented on PR #11406: URL: https://github.com/apache/hudi/pull/11406#issuecomment-2155337800 ## CI report: * e8a80e29e51c84a3403906d2acf0aeee24dedda4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24245) * da08a0b3c0524b46e70a4cbed8ab82eb5f84f24c UNKNOWN * 901c7f94b1b56ac19867d5d0deab34eb35ebce2c UNKNOWN
Re: [PR] [HUDI-7269] Fallback to key based merge if positions are missing from log block [hudi]
hudi-bot commented on PR #11415: URL: https://github.com/apache/hudi/pull/11415#issuecomment-2155337978 ## CI report: * 644a1d216307d8660ff7654c5273f2356974bcb8 UNKNOWN * 40932069f637e82d80731fe8625331d293fdc1e0 UNKNOWN
Re: [PR] [HUDI-7840] Add position merging to fg reader [hudi]
hudi-bot commented on PR #11413: URL: https://github.com/apache/hudi/pull/11413#issuecomment-2155337873 ## CI report: * d581b2726ba5047c9e72396820da81ecf1357266 UNKNOWN * 4e6335c7cfb18881776d572954558a41aa33b91d Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24273) * d00f2862fb8dd8a84fcc5aa1900e76577b8a9bf1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24275)
Re: [PR] [HUDI-7269] Fallback to key based merge if positions are missing from log block [hudi]
hudi-bot commented on PR #11415: URL: https://github.com/apache/hudi/pull/11415#issuecomment-2155328233 ## CI report: * 644a1d216307d8660ff7654c5273f2356974bcb8 UNKNOWN
Re: [PR] [HUDI-7840] Add position merging to fg reader [hudi]
hudi-bot commented on PR #11413: URL: https://github.com/apache/hudi/pull/11413#issuecomment-2155328163 ## CI report: * d581b2726ba5047c9e72396820da81ecf1357266 UNKNOWN * 4e6335c7cfb18881776d572954558a41aa33b91d Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24273) * d00f2862fb8dd8a84fcc5aa1900e76577b8a9bf1 UNKNOWN
Re: [PR] [HUDI-7834] Create placeholder table versions. Introduce new hoodie table property to track initial table version when table was created. This is needed to identify if the table was originall
hudi-bot commented on PR #11406: URL: https://github.com/apache/hudi/pull/11406#issuecomment-2155328082 ## CI report: * e8a80e29e51c84a3403906d2acf0aeee24dedda4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24245) * da08a0b3c0524b46e70a4cbed8ab82eb5f84f24c UNKNOWN
Re: [PR] [HUDI-7834] Create placeholder table versions. Introduce new hoodie table property to track initial table version when table was created. This is needed to identify if the table was originall
balaji-varadarajan commented on code in PR #11406: URL: https://github.com/apache/hudi/pull/11406#discussion_r1631564698

## hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java:

@@ -623,6 +624,8 @@ private static void initTableMetaClient(StorageConfiguration storageConf, Str }
    initializeBootstrapDirsIfNotExists(basePath, storage);
+   // When the table is initialized, set the initial version to be the current version.
+   props.put(INITIAL_VERSION.key(), String.valueOf(HoodieTableVersion.current().versionCode()));

Review Comment: Good point. Found one place in RepairsCommand and added the fix.

## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/SixToSevenUpgradeHandler.java:

@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.upgrade;
+
+import org.apache.hudi.common.config.ConfigProperty;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.config.HoodieWriteConfig;
+
+import java.util.Collections;
+import java.util.Map;
+
+/**
+ * Version 7 is going to be placeholder version for bridge release 0.16.0.
+ * Version 8 is the placeholder version to track 1.x.

Review Comment: Done

## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/SevenToSixDowngradeHandler.java:

@@ -0,0 +1,37 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.upgrade;
+
+import org.apache.hudi.common.config.ConfigProperty;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.config.HoodieWriteConfig;
+
+import java.util.Collections;
+import java.util.Map;
+
+/**
+ * Version 7 is going to be placeholder version for bridge release 0.16.0.
+ * Version 8 is the placeholder version to track 1.x.

Review Comment: Done
Re: [PR] [HUDI-7834] Create placeholder table versions. Introduce new hoodie table property to track initial table version when table was created. This is needed to identify if the table was originall
balaji-varadarajan commented on code in PR #11406: URL: https://github.com/apache/hudi/pull/11406#discussion_r1631549810

## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/SixToSevenUpgradeHandler.java:

@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.upgrade;
+
+import org.apache.hudi.common.config.ConfigProperty;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.config.HoodieWriteConfig;
+
+import java.util.Collections;
+import java.util.Map;
+
+/**
+ * Version 7 is going to be placeholder version for bridge release 0.16.0.
+ * Version 8 is the placeholder version to track 1.x.
+ */
+public class SixToSevenUpgradeHandler implements UpgradeHandler {
+  @Override
+  public Map<ConfigProperty, String> upgrade(HoodieWriteConfig config, HoodieEngineContext context,
+                                             String instantTime,
+                                             SupportsUpgradeDowngrade upgradeDowngradeHelper) {
+    return Collections.emptyMap();
+  }

Review Comment: We cannot determine the correct initial version during the upgrade path, as we are doing one version increment at a time. We can basically interpret the absence of INITIAL_VERSION as meaning that the table was created by some version of 0.x.
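The fallback described in the comment above, where a missing `hoodie.table.initial.version` is read as "created by some 0.x release", can be sketched as follows. This is a hypothetical helper with made-up version codes, not the actual `HoodieTableConfig` API:

```java
import java.util.Properties;

public class InitialVersionFallback {
  // Illustrative version code for a 0.x-era table; the real values
  // live in Hudi's HoodieTableVersion enum.
  static final int LEGACY_ZERO_X = 6;

  /**
   * Returns the recorded initial table version, or LEGACY_ZERO_X when the
   * property is absent, i.e. the table predates the property and must have
   * been created by some 0.x release.
   */
  static int initialVersion(Properties tableProps) {
    String v = tableProps.getProperty("hoodie.table.initial.version");
    return v != null ? Integer.parseInt(v) : LEGACY_ZERO_X;
  }

  public static void main(String[] args) {
    Properties freshTable = new Properties();
    freshTable.setProperty("hoodie.table.initial.version", "8");
    System.out.println(initialVersion(freshTable));       // 8
    System.out.println(initialVersion(new Properties())); // 6, i.e. a pre-1.x table
  }
}
```

The key design point is that the property is written only at table creation, so the upgrade/downgrade handlers never need to back-fill it; absence itself carries the information.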
[PR] [HUDI-7269] Fallback to key based merge if positions are missing from log block [hudi]
jonvex opened a new pull request, #11415: URL: https://github.com/apache/hudi/pull/11415 ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ ### Impact _Describe any public API or user-facing feature change or any performance impact._ ### Risk level (write none, low medium or high below) _If medium or high, explain what verification was done to mitigate the risks._ ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none"._ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
Re: [PR] [HUDI-7269] Fallback to key based merge if positions are missing from log block [hudi]
jonvex closed pull request #10991: [HUDI-7269] Fallback to key based merge if positions are missing from log block URL: https://github.com/apache/hudi/pull/10991
Re: [PR] [HUDI-7834] Create placeholder table versions. Introduce new hoodie table property to track initial table version when table was created. This is needed to identify if the table was originall
balaji-varadarajan commented on code in PR #11406: URL: https://github.com/apache/hudi/pull/11406#discussion_r1631546670

## hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java:

@@ -512,6 +519,15 @@ public HoodieTableVersion getTableVersion() { : VERSION.defaultValue(); }
+  /**
+   * @return the hoodie.table.initial.version from hoodie.properties file.
+   */
+  public HoodieTableVersion getTableInitialVersion() {
+    return contains(INITIAL_VERSION)
+        ? HoodieTableVersion.versionFromCode(getInt(INITIAL_VERSION))

Review Comment: INITIAL_VERSION is similar to VERSION in type.
[jira] [Updated] (HUDI-7841) RLI and secondary index should consider only pruned partitions for file skipping
[ https://issues.apache.org/jira/browse/HUDI-7841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-7841: -- Summary: RLI and secondary index should consider only pruned partitions for file skipping (was: RLI should consider only pruned partitions for file skipping) > RLI and secondary index should consider only pruned partitions for file > skipping > > > Key: HUDI-7841 > URL: https://issues.apache.org/jira/browse/HUDI-7841 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Sagar Sumit >Assignee: Lokesh Jain >Priority: Major > Fix For: 1.0.0 > > > Even though RLI scans only matching files, it tries to get those candidate > files by iterating over all files from file index. See - > [https://github.com/apache/hudi/blob/f4be74c29471fbd6afff472f8db292e6b1f16f05/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/RecordLevelIndexSupport.scala#L47] > Instead, it can use the `prunedPartitionsAndFileSlices` to only consider > pruned partitions whenever there is a partition predicate. -- This message was sent by Atlassian Jira (v8.20.10#820010)
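The improvement described in this issue amounts to intersecting the index's candidate files with the partition-pruned file listing, instead of iterating over every file in the file index. A minimal sketch with plain strings standing in for Hudi's partition paths and file-slice types; `candidatesInPrunedPartitions` and its arguments are hypothetical names for illustration only:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class PrunedCandidateFiles {
  /**
   * Keeps only the index-matched candidate files that live under a pruned
   * partition: partition predicate first, then index-based file skipping.
   */
  static Set<String> candidatesInPrunedPartitions(Set<String> candidateFilesFromIndex,
                                                  Map<String, List<String>> filesByPartition,
                                                  Set<String> prunedPartitions) {
    return filesByPartition.entrySet().stream()
        .filter(e -> prunedPartitions.contains(e.getKey())) // partition pruning
        .flatMap(e -> e.getValue().stream())
        .filter(candidateFilesFromIndex::contains)          // RLI/secondary-index match
        .collect(Collectors.toSet());
  }

  public static void main(String[] args) {
    Map<String, List<String>> files = new HashMap<>();
    files.put("2024/06/01", Arrays.asList("f1", "f2"));
    files.put("2024/06/02", Arrays.asList("f3"));
    Set<String> fromIndex = new HashSet<>(Arrays.asList("f2", "f3"));
    // f3 matches the index but sits in an unpruned partition, so it is dropped.
    System.out.println(candidatesInPrunedPartitions(
        fromIndex, files, Collections.singleton("2024/06/01"))); // [f2]
  }
}
```

With a partition predicate present, the work scales with the pruned partitions rather than with the whole table's file listing, which is the point of the proposed change.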
[jira] [Assigned] (HUDI-7841) RLI should consider only pruned partitions for file skipping
[ https://issues.apache.org/jira/browse/HUDI-7841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit reassigned HUDI-7841: - Assignee: Lokesh Jain > RLI should consider only pruned partitions for file skipping > > > Key: HUDI-7841 > URL: https://issues.apache.org/jira/browse/HUDI-7841 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Sagar Sumit >Assignee: Lokesh Jain >Priority: Major > Fix For: 1.0.0 > > > Even though RLI scans only matching files, it tries to get those candidate > files by iterating over all files from file index. See - > [https://github.com/apache/hudi/blob/f4be74c29471fbd6afff472f8db292e6b1f16f05/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/RecordLevelIndexSupport.scala#L47] > Instead, it can use the `prunedPartitionsAndFileSlices` to only consider > pruned partitions whenever there is a partition predicate.
Re: [PR] [HUDI-7840] Add position merging to fg reader [hudi]
hudi-bot commented on PR #11413: URL: https://github.com/apache/hudi/pull/11413#issuecomment-2155239340 ## CI report: * d581b2726ba5047c9e72396820da81ecf1357266 UNKNOWN * 4e6335c7cfb18881776d572954558a41aa33b91d Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24273)
Re: [PR] [HUDI-6787] Implement the HoodieFileGroupReader API for Hive [hudi]
hudi-bot commented on PR #10422: URL: https://github.com/apache/hudi/pull/10422#issuecomment-2155237282 ## CI report: * 99517e23baa60a6a0602e9daf7f522f3c1dcfa1e UNKNOWN * b29ff638867f3760156318bb58a7677c67a415dc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24274)
Re: [I] Intermittent stall of S3 PUT request for about 17 minutes [hudi]
gudladona commented on issue #11203: URL: https://github.com/apache/hudi/issues/11203#issuecomment-2155201212 Assessment and workaround provided here: https://github.com/aws/aws-sdk-java/issues/3110
[jira] [Commented] (HUDI-4705) Support Write-on-compaction mode when query cdc on MOR tables
[ https://issues.apache.org/jira/browse/HUDI-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853214#comment-17853214 ] Shiyan Xu commented on HUDI-4705: - [~lizhiqiang] [~biyan900...@gmail.com] to clarify, CDC for spark works on MOR, just that the implementation is using write-on-indexing strategy (ref: [https://github.com/apache/hudi/blob/master/rfc/rfc-51/rfc-51.md#persisting-cdc-in-mor-write-on-indexing-vs-write-on-compaction)] We want to unify the implementation as write-on-compaction, which allows flink writer to work too. (write-on-indexing strategy does not work for flink as explained in the RFC) > Support Write-on-compaction mode when query cdc on MOR tables > - > > Key: HUDI-4705 > URL: https://issues.apache.org/jira/browse/HUDI-4705 > Project: Apache Hudi > Issue Type: New Feature > Components: compaction, spark, table-service >Reporter: Yann Byron >Priority: Major > > For the case that query cdc on MOR tables, the initial implementation use the > `Write-on-indexing` way to extract the cdc data by merging the base file and > log files in-flight. > This ticket wants to support the `Write-on-compaction` way to get the cdc > data just by reading the persisted cdc files which are written at the > compaction operation.
Re: [PR] [HUDI-6787] Implement the HoodieFileGroupReader API for Hive [hudi]
hudi-bot commented on PR #10422: URL: https://github.com/apache/hudi/pull/10422#issuecomment-2155176602 ## CI report: * 99517e23baa60a6a0602e9daf7f522f3c1dcfa1e UNKNOWN * 18fbd92eec10c49025db364be79cc9dbfccee362 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24162) * b29ff638867f3760156318bb58a7677c67a415dc Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24274)
Re: [PR] [HUDI-6787] Implement the HoodieFileGroupReader API for Hive [hudi]
hudi-bot commented on PR #10422: URL: https://github.com/apache/hudi/pull/10422#issuecomment-2155165419 ## CI report: * 99517e23baa60a6a0602e9daf7f522f3c1dcfa1e UNKNOWN * 18fbd92eec10c49025db364be79cc9dbfccee362 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24162) * b29ff638867f3760156318bb58a7677c67a415dc UNKNOWN
Re: [I] [SUPPORT]Hudi Deltastreamer compaction is taking longer duration [hudi]
ad1happy2go commented on issue #11273: URL: https://github.com/apache/hudi/issues/11273#issuecomment-2155139922 @SuneethaYamani https://hudi.apache.org/docs/configurations/#hoodiemetadataenable
Re: [I] [SUPPORT] - Partial update of the MOR table after compaction with Hudi Streamer [hudi]
ad1happy2go commented on issue #11348: URL: https://github.com/apache/hudi/issues/11348#issuecomment-2155137100 @kirillklimenko We will look into it. Thanks for the details.
Re: [I] duplicated records when use insert overwrite [hudi]
ad1happy2go commented on issue #11358: URL: https://github.com/apache/hudi/issues/11358#issuecomment-2155135667 @njalan If the data you are inserting has duplicates, then insert overwrite will create duplicates in the table. Can you please share the timeline with us so we can look into it further?
Re: [PR] [HUDI-7840] Add position merging to fg reader [hudi]
hudi-bot commented on PR #11413: URL: https://github.com/apache/hudi/pull/11413#issuecomment-2155087634 ## CI report: * 1a1ca64bec2fb94acce596934dd636b77cb0aca7 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24264) * d581b2726ba5047c9e72396820da81ecf1357266 UNKNOWN * 4e6335c7cfb18881776d572954558a41aa33b91d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24273)
Re: [PR] [HUDI-7840] Add position merging to fg reader [hudi]
hudi-bot commented on PR #11413: URL: https://github.com/apache/hudi/pull/11413#issuecomment-2155074344 ## CI report: * 1a1ca64bec2fb94acce596934dd636b77cb0aca7 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24264) * d581b2726ba5047c9e72396820da81ecf1357266 UNKNOWN * 4e6335c7cfb18881776d572954558a41aa33b91d UNKNOWN
Re: [PR] [HUDI-7840] Add position merging to fg reader [hudi]
hudi-bot commented on PR #11413: URL: https://github.com/apache/hudi/pull/11413#issuecomment-2155060689 ## CI report: * 1a1ca64bec2fb94acce596934dd636b77cb0aca7 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24264) * d581b2726ba5047c9e72396820da81ecf1357266 UNKNOWN
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2154949218

## CI report:

* c2dec94b442920784b3914cc13b87294e734a477 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24272)
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2154856344

## CI report:

* 3e8bdc41e97141b94a9f60a3450f41ad342fa45e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24271)
* c2dec94b442920784b3914cc13b87294e734a477 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24272)
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2154841313

## CI report:

* 3e8bdc41e97141b94a9f60a3450f41ad342fa45e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24271)
* c2dec94b442920784b3914cc13b87294e734a477 UNKNOWN
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2154824717

## CI report:

* 3e8bdc41e97141b94a9f60a3450f41ad342fa45e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24271)
[jira] [Created] (HUDI-7841) RLI should consider only pruned partitions for file skipping
Sagar Sumit created HUDI-7841:
-

Summary: RLI should consider only pruned partitions for file skipping
Key: HUDI-7841
URL: https://issues.apache.org/jira/browse/HUDI-7841
Project: Apache Hudi
Issue Type: Improvement
Reporter: Sagar Sumit
Fix For: 1.0.0

Even though RLI scans only the matching files, it builds that candidate set by iterating over all files in the file index. See [https://github.com/apache/hudi/blob/f4be74c29471fbd6afff472f8db292e6b1f16f05/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/RecordLevelIndexSupport.scala#L47]

Instead, it can use `prunedPartitionsAndFileSlices` to consider only the pruned partitions whenever there is a partition predicate.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
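The improvement described in HUDI-7841 can be sketched roughly as follows. This is an illustrative example only, assuming a simplified model: the class and method names (`PrunedRliCandidates`, `candidateFiles`, `filesByPartition`) are hypothetical and are not Hudi's actual `RecordLevelIndexSupport` API.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch: instead of iterating over every file in the file index, hand the
// record-level index only the files from partitions that survived
// partition-predicate pruning.
class PrunedRliCandidates {

  static List<String> candidateFiles(Map<String, List<String>> filesByPartition,
                                     Set<String> prunedPartitions) {
    Stream<Map.Entry<String, List<String>>> entries =
        filesByPartition.entrySet().stream();
    if (!prunedPartitions.isEmpty()) {
      // A partition predicate was pushed down: restrict the lookup scope
      // before consulting the record-level index at all.
      entries = entries.filter(e -> prunedPartitions.contains(e.getKey()));
    }
    // Sorted only to make the result order deterministic for this sketch.
    return entries.flatMap(e -> e.getValue().stream())
                  .sorted()
                  .collect(Collectors.toList());
  }
}
```

With no partition predicate the candidate set falls back to all files, matching the current behavior; with a predicate, only files from the pruned partitions are considered.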
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
codope commented on code in PR #9894:
URL: https://github.com/apache/hudi/pull/9894#discussion_r1631121419

## hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java:

@@ -1382,5 +1398,35 @@ public HoodieTableMetaClient initTable(StorageConfiguration configuration, St
     throws IOException {
   return HoodieTableMetaClient.initTableAndGetMetaClient(configuration, basePath, build());
 }
+
+private void validateMergeConfigs() {

Review Comment: Where is this method used?

## hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java:

@@ -242,6 +249,11 @@ public HoodieFileGroupReaderIterator getClosableIterator() {
   return new HoodieFileGroupReaderIterator<>(this);
 }
+
+public static RecordMergeMode getRecordMergeMode(Properties props) {
+  String mergeMode = getStringWithAltKeys(props, HoodieCommonConfig.RECORD_MERGE_MODE, true).toUpperCase();

Review Comment: note: Setting `useDefaultValue` to true as many tests don't set record merge mode.
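The `useDefaultValue` behavior discussed in the review above follows a common config-resolution pattern: look up the primary key, fall back to alternate (legacy) keys, then to a default when nothing is set. The sketch below illustrates that pattern only; the key names, default value, and the helper's signature are made up for this example and do not reproduce Hudi's actual `getStringWithAltKeys` or `HoodieCommonConfig` definitions.

```java
import java.util.List;
import java.util.Properties;

// Illustrative config lookup with alternate keys and an optional default.
class AltKeyConfig {

  static String getStringWithAltKeys(Properties props, String key,
                                     List<String> altKeys,
                                     boolean useDefaultValue,
                                     String defaultValue) {
    String value = props.getProperty(key);
    // Try alternate (e.g. legacy) keys in order until one is set.
    for (int i = 0; value == null && i < altKeys.size(); i++) {
      value = props.getProperty(altKeys.get(i));
    }
    if (value == null && useDefaultValue) {
      // Tests that never set the key still get a usable value.
      value = defaultValue;
    }
    return value;
  }
}
```

Passing `useDefaultValue = true`, as the review note explains, means callers such as tests that never set the record merge mode still resolve to a well-defined value instead of failing on a missing key.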
Re: [PR] [HUDI-7390] fix: HoodieStreamer no longer works without --props being supplied [hudi]
hudi-bot commented on PR #11414:
URL: https://github.com/apache/hudi/pull/11414#issuecomment-2154749790

## CI report:

* 3ffd431d11a16bfb032e905eceb5374d901cb6ee Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24270)
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2154747202

## CI report:

* 083ea7ec0e0cb2f14fc47faff5d781a64cca3874 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24269)
* 3e8bdc41e97141b94a9f60a3450f41ad342fa45e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24271)
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2154733863

## CI report:

* 083ea7ec0e0cb2f14fc47faff5d781a64cca3874 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24269)
* 3e8bdc41e97141b94a9f60a3450f41ad342fa45e UNKNOWN
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2154721676

## CI report:

* 083ea7ec0e0cb2f14fc47faff5d781a64cca3874 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24269)
Re: [PR] [HUDI-7390] fix: HoodieStreamer no longer works without --props being supplied [hudi]
hudi-bot commented on PR #11414:
URL: https://github.com/apache/hudi/pull/11414#issuecomment-2154656824

## CI report:

* 3ffd431d11a16bfb032e905eceb5374d901cb6ee Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24270)
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2154654455

## CI report:

* f1ad4786aad397d5bad19d3cf68cbbb90c92d9ac Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24267)
* 083ea7ec0e0cb2f14fc47faff5d781a64cca3874 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24269)
Re: [PR] [HUDI-7390] fix: HoodieStreamer no longer works without --props being supplied [hudi]
hudi-bot commented on PR #11414:
URL: https://github.com/apache/hudi/pull/11414#issuecomment-2154644253

## CI report:

* 3ffd431d11a16bfb032e905eceb5374d901cb6ee UNKNOWN
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2154641685

## CI report:

* f1ad4786aad397d5bad19d3cf68cbbb90c92d9ac Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24267)
* 083ea7ec0e0cb2f14fc47faff5d781a64cca3874 UNKNOWN