[GitHub] [hudi] hudi-bot commented on pull request #9108: [HUDI-6462] Add Hudi client init callback interface
hudi-bot commented on PR #9108: URL: https://github.com/apache/hudi/pull/9108#issuecomment-1615557367 ## CI report: * 3fbcdb8f1f2c7504b8564ead1d065c1d862f83fb Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18246) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9108: [HUDI-6462] Add Hudi client init callback interface
hudi-bot commented on PR #9108: URL: https://github.com/apache/hudi/pull/9108#issuecomment-1615552202 ## CI report: * 3fbcdb8f1f2c7504b8564ead1d065c1d862f83fb UNKNOWN
[jira] [Updated] (HUDI-6462) Add write/base client init callback abstraction
[ https://issues.apache.org/jira/browse/HUDI-6462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-6462: - Labels: pull-request-available (was: ) > Add write/base client init callback abstraction > --- > > Key: HUDI-6462 > URL: https://issues.apache.org/jira/browse/HUDI-6462 > Project: Apache Hudi > Issue Type: New Feature > Components: writer-core >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Blocker > Labels: pull-request-available > Fix For: 0.14.0 > > > At the time of instantiation of the write/base client, user may want to do > additional processing such as sending metrics/logs/notification or adding > more properties to the write config. The write/base client init callback > abstraction allows such logic to be plugged into Hudi. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] yihua opened a new pull request, #9108: [HUDI-6462] Add Hudi client init callback interface
yihua opened a new pull request, #9108: URL: https://github.com/apache/hudi/pull/9108

### Change Logs

This PR adds the Hudi client init callback interface for running custom logic at the time of initialization of a Hudi client:

```
@PublicAPIClass(maturity = ApiMaturityLevel.EVOLVING)
public interface HoodieClientInitCallback {

  /**
   * A callback method in which the user can implement custom logic.
   * This method is called when a {@link BaseHoodieClient} is initialized.
   *
   * @param hoodieClient {@link BaseHoodieClient} instance.
   */
  @PublicAPIMethod(maturity = ApiMaturityLevel.EVOLVING)
  void call(BaseHoodieClient hoodieClient);
}
```

At the time of instantiation of the write or table service client, a user may want to do additional processing, such as sending metrics, logs, or notifications, or adding more properties to the write config. An implementation of the client init callback interface allows such logic to be plugged into Hudi.

A new config, `hoodie.client.init.callback.classes`, is added for plugging in callback implementations; the class list is comma-separated. New tests are added to verify the expected behavior.

### Impact

Adds new client init callback functionality to run custom logic at the time of initialization of a Hudi client.

### Risk level

none

### Documentation Update

Will update the Hudi docs.

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
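As described above, the callback classes are supplied as a comma-separated list in `hoodie.client.init.callback.classes`. A minimal sketch of how such a class list can be loaded and invoked is below. This is an illustration, not Hudi's actual implementation: the names `ClientInitCallback`, `LoggingInitCallback`, and `CallbackLoader` are hypothetical stand-ins, the callback parameter is a plain `Object` in place of `BaseHoodieClient` so the sketch compiles without Hudi on the classpath, and the reflective instantiation shown here is an assumption about how a class-name config would typically be resolved.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for HoodieClientInitCallback; the real interface
// takes a BaseHoodieClient rather than Object.
interface ClientInitCallback {
  void call(Object hoodieClient);
}

// Example callback: records that it ran. A real implementation might emit
// metrics or add properties to the write config.
class LoggingInitCallback implements ClientInitCallback {
  static boolean invoked = false;

  @Override
  public void call(Object hoodieClient) {
    invoked = true;
  }
}

public class CallbackLoader {
  // Instantiate each class named in the comma-separated list via reflection,
  // mirroring the shape of a "callback.classes" style config.
  static List<ClientInitCallback> loadCallbacks(String classList) throws Exception {
    List<ClientInitCallback> callbacks = new ArrayList<>();
    for (String className : classList.split(",")) {
      Class<?> clazz = Class.forName(className.trim());
      callbacks.add((ClientInitCallback) clazz.getDeclaredConstructor().newInstance());
    }
    return callbacks;
  }

  public static void main(String[] args) throws Exception {
    Object client = new Object(); // stands in for the client instance
    for (ClientInitCallback cb : loadCallbacks("LoggingInitCallback")) {
      cb.call(client);
    }
    System.out.println(LoggingInitCallback.invoked); // prints "true"
  }
}
```

The comma-separated form lets several independent callbacks (metrics, logging, config mutation) be chained without any of them knowing about the others.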
[GitHub] [hudi] hudi-bot commented on pull request #9083: [HUDI-6464] Spark SQL Merge Into for pkless tables
hudi-bot commented on PR #9083: URL: https://github.com/apache/hudi/pull/9083#issuecomment-1615506177 ## CI report: * 3a0bfb88049cf2c0f8afe5c925dbd76fa6f7cd89 UNKNOWN * 69f68c8ee2ed4cdae41cbf62a47a28b39ddcd57f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18245)
[GitHub] [hudi] hudi-bot commented on pull request #9107: [HUDI-6463] Fix deluge loggings of HoodieBackedTableMetadataWriter#getMetadataPartitionsToUpdate
hudi-bot commented on PR #9107: URL: https://github.com/apache/hudi/pull/9107#issuecomment-1615502789 ## CI report: * 3f4ef9bc84c59b038504f86acd3734eb2cc11bad Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18244)
[GitHub] [hudi] hudi-bot commented on pull request #9038: [HUDI-6423] Incremental cleaning should consider inflight compaction instant
hudi-bot commented on PR #9038: URL: https://github.com/apache/hudi/pull/9038#issuecomment-1615459950 ## CI report: * a65a29c0cf1c8feb9f39e168ba80c99ebcae1c5d UNKNOWN * 34f8823f48712c57058bc37c8936a276c1457557 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18187) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18193) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18188) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18243)
[GitHub] [hudi] nsivabalan commented on a diff in pull request #8774: [HUDI-6246] Fixing restore for compaction commit
nsivabalan commented on code in PR #8774: URL: https://github.com/apache/hudi/pull/8774#discussion_r1248439284 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/ListingBasedRollbackStrategy.java: ## @@ -117,21 +126,22 @@ public List getRollbackRequests(HoodieInstant instantToRo // If there is no delta commit present after the current commit (if compaction), no action, else we // need to make sure that a compaction commit rollback also deletes any log files written as part of the // succeeding deltacommit. - boolean higherDeltaCommits = + boolean hasHigherCompletedDeltaCommits = !activeTimeline.getDeltaCommitTimeline().filterCompletedInstants().findInstantsAfter(commit, 1) .empty(); - if (higherDeltaCommits) { -// Rollback of a compaction action with no higher deltacommit means that the compaction is scheduled + if (hasHigherCompletedDeltaCommits && !isCommitMetadataCompleted) { Review Comment: due to async compaction.
[jira] [Assigned] (HUDI-6464) Implement Spark SQL Merge Into for tables without primary key
[ https://issues.apache.org/jira/browse/HUDI-6464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Vexler reassigned HUDI-6464: - Fix Version/s: 0.14.0 Assignee: Jonathan Vexler > Implement Spark SQL Merge Into for tables without primary key > - > > Key: HUDI-6464 > URL: https://issues.apache.org/jira/browse/HUDI-6464 > Project: Apache Hudi > Issue Type: New Feature > Components: spark-sql >Reporter: Jonathan Vexler >Assignee: Jonathan Vexler >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > > Merge Into currently only matches on the primary key which pkless tables > don't have
[jira] [Updated] (HUDI-6464) Implement Spark SQL Merge Into for tables without primary key
[ https://issues.apache.org/jira/browse/HUDI-6464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Vexler updated HUDI-6464: -- Status: Patch Available (was: In Progress) > Implement Spark SQL Merge Into for tables without primary key > - > > Key: HUDI-6464 > URL: https://issues.apache.org/jira/browse/HUDI-6464 > Project: Apache Hudi > Issue Type: New Feature > Components: spark-sql >Reporter: Jonathan Vexler >Assignee: Jonathan Vexler >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > > Merge Into currently only matches on the primary key which pkless tables > don't have
[jira] [Updated] (HUDI-6464) Implement Spark SQL Merge Into for tables without primary key
[ https://issues.apache.org/jira/browse/HUDI-6464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Vexler updated HUDI-6464: -- Status: In Progress (was: Open) > Implement Spark SQL Merge Into for tables without primary key > - > > Key: HUDI-6464 > URL: https://issues.apache.org/jira/browse/HUDI-6464 > Project: Apache Hudi > Issue Type: New Feature > Components: spark-sql >Reporter: Jonathan Vexler >Assignee: Jonathan Vexler >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > > Merge Into currently only matches on the primary key which pkless tables > don't have
[GitHub] [hudi] jonvex commented on a diff in pull request #9083: [HUDI-6464] Spark SQL Merge Into for pkless tables
jonvex commented on code in PR #9083: URL: https://github.com/apache/hudi/pull/9083#discussion_r1248405665 ## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestMergeIntoTable2.scala: ## @@ -884,10 +887,9 @@ class TestMergeIntoTable2 extends HoodieSparkSqlTestBase { """.stripMargin ) checkAnswer(s"select id, name, price, ts, dt from $tableName")( -Seq(1, "a1", 10.1, 1000, "2021-03-21"), Seq(1, "a2", 10.2, 1002, "2021-03-21"), -Seq(3, "a3", 10.3, 1003, "2021-03-21"), -Seq(1, "a2", 10.2, 1002, "2021-03-21") Review Comment: Slight behavior change here. Previously we were doing an insert when matched because of no precombine key. Now we actually do an update.
[jira] [Updated] (HUDI-6464) Implement Spark SQL Merge Into for tables without primary key
[ https://issues.apache.org/jira/browse/HUDI-6464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-6464: - Labels: pull-request-available (was: ) > Implement Spark SQL Merge Into for tables without primary key > - > > Key: HUDI-6464 > URL: https://issues.apache.org/jira/browse/HUDI-6464 > Project: Apache Hudi > Issue Type: New Feature > Components: spark-sql >Reporter: Jonathan Vexler >Priority: Major > Labels: pull-request-available > > Merge Into currently only matches on the primary key which pkless tables > don't have
[GitHub] [hudi] hudi-bot commented on pull request #9083: [HUDI-6464] Spark SQL Merge Into for pkless tables
hudi-bot commented on PR #9083: URL: https://github.com/apache/hudi/pull/9083#issuecomment-1615381146 ## CI report: * 3a0bfb88049cf2c0f8afe5c925dbd76fa6f7cd89 UNKNOWN * ac4f2ce82babd0794dd73ec097ae79853978b5a5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18211) * 69f68c8ee2ed4cdae41cbf62a47a28b39ddcd57f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18245)
[jira] [Created] (HUDI-6464) Implement Spark SQL Merge Into for tables without primary key
Jonathan Vexler created HUDI-6464: - Summary: Implement Spark SQL Merge Into for tables without primary key Key: HUDI-6464 URL: https://issues.apache.org/jira/browse/HUDI-6464 Project: Apache Hudi Issue Type: New Feature Components: spark-sql Reporter: Jonathan Vexler Merge Into currently only matches on the primary key which pkless tables don't have
[GitHub] [hudi] hudi-bot commented on pull request #9083: PKLess Merge Into
hudi-bot commented on PR #9083: URL: https://github.com/apache/hudi/pull/9083#issuecomment-1615372261 ## CI report: * 3a0bfb88049cf2c0f8afe5c925dbd76fa6f7cd89 UNKNOWN * ac4f2ce82babd0794dd73ec097ae79853978b5a5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18211) * 69f68c8ee2ed4cdae41cbf62a47a28b39ddcd57f UNKNOWN
[GitHub] [hudi] danny0405 commented on a diff in pull request #9056: [HUDI-6456] [DOC] Add parquet blooms documentation
danny0405 commented on code in PR #9056: URL: https://github.com/apache/hudi/pull/9056#discussion_r1248375900 ## website/docs/configurations.md: ## @@ -20,6 +20,7 @@ hoodie.datasource.hive_sync.support_timestamp false It helps to have a central configuration file for your common cross job configurations/tunings, so all the jobs on your cluster can utilize it. It also works with Spark SQL DML/DDL, and helps avoid having to pass configs inside the SQL statements. By default, Hudi would load the configuration file under `/etc/hudi/conf` directory. You can specify a different configuration directory location by setting the `HUDI_CONF_DIR` environment variable. +- [**Parquet Configs**](#PARQUET_CONFIG): These configs makes it possible to bring native parquet features - [**Spark Datasource Configs**](#SPARK_DATASOURCE): These configs control the Hudi Spark Datasource, providing ability to define keys/partitioning, pick out the write operation, specify how to merge records or choosing query type to read. Review Comment: Should we put it under `Spark Datasource Configs` ?
[GitHub] [hudi] hudi-bot commented on pull request #9107: [HUDI-6463] Fix deluge loggings of HoodieBackedTableMetadataWriter#getMetadataPartitionsToUpdate
hudi-bot commented on PR #9107: URL: https://github.com/apache/hudi/pull/9107#issuecomment-1615342762 ## CI report: * 3f4ef9bc84c59b038504f86acd3734eb2cc11bad Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18244)
[GitHub] [hudi] hudi-bot commented on pull request #9038: [HUDI-6423] Incremental cleaning should consider inflight compaction instant
hudi-bot commented on PR #9038: URL: https://github.com/apache/hudi/pull/9038#issuecomment-1615339864 ## CI report: * a65a29c0cf1c8feb9f39e168ba80c99ebcae1c5d UNKNOWN * 34f8823f48712c57058bc37c8936a276c1457557 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18187) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18193) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18188) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18243)
[GitHub] [hudi] hudi-bot commented on pull request #9107: [HUDI-6463] Fix deluge loggings of HoodieBackedTableMetadataWriter#getMetadataPartitionsToUpdate
hudi-bot commented on PR #9107: URL: https://github.com/apache/hudi/pull/9107#issuecomment-1615339935 ## CI report: * 3f4ef9bc84c59b038504f86acd3734eb2cc11bad UNKNOWN
[jira] [Updated] (HUDI-6463) Fix deluge loggings of HoodieBackedTableMetadataWriter#getMetadataPartitionsToUpdate
[ https://issues.apache.org/jira/browse/HUDI-6463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-6463: - Labels: pull-request-available (was: ) > Fix deluge loggings of > HoodieBackedTableMetadataWriter#getMetadataPartitionsToUpdate > > > Key: HUDI-6463 > URL: https://issues.apache.org/jira/browse/HUDI-6463 > Project: Apache Hudi > Issue Type: Task > Components: writer-core >Reporter: Danny Chen >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > >
[GitHub] [hudi] danny0405 opened a new pull request, #9107: [HUDI-6463] Fix deluge loggings of HoodieBackedTableMetadataWriter#ge…
danny0405 opened a new pull request, #9107: URL: https://github.com/apache/hudi/pull/9107

…tMetadataPartitionsToUpdate

### Change Logs

There are too many verbose warning logs; fix them.

### Impact

none

### Risk level (write none, low medium or high below)

none

### Documentation Update

_Describe any necessary documentation update if there is any new feature, config, or user-facing change_

- _The config description must be updated if new configs are added or the default value of the configs are changed_
- _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
[jira] [Created] (HUDI-6463) Fix deluge loggings of HoodieBackedTableMetadataWriter#getMetadataPartitionsToUpdate
Danny Chen created HUDI-6463: Summary: Fix deluge loggings of HoodieBackedTableMetadataWriter#getMetadataPartitionsToUpdate Key: HUDI-6463 URL: https://issues.apache.org/jira/browse/HUDI-6463 Project: Apache Hudi Issue Type: Task Components: writer-core Reporter: Danny Chen Fix For: 0.14.0
[GitHub] [hudi] hudi-bot commented on pull request #9035: [HUDI-6416] Completion markers for handling execution engine (spark) …
hudi-bot commented on PR #9035: URL: https://github.com/apache/hudi/pull/9035#issuecomment-1615299463 ## CI report: * f0735271d079b8dfa76b6350505e9a4e38610d8a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18242)
[jira] [Updated] (HUDI-6462) Add write/base client init callback abstraction
[ https://issues.apache.org/jira/browse/HUDI-6462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6462: Description: At the time of instantiation of the write/base client, user may want to do additional processing such as sending metrics/logs/notification or adding more properties to the write config. The write/base client init callback abstraction allows such logic to be plugged into Hudi. (was: At the time of instantiation of the write client, user may want to do additional processing such as sending metrics/logs/notification or adding more properties to the write config. The write/base client init callback abstraction allows such logic to be plugged into Hudi.) > Add write/base client init callback abstraction > --- > > Key: HUDI-6462 > URL: https://issues.apache.org/jira/browse/HUDI-6462 > Project: Apache Hudi > Issue Type: New Feature > Components: writer-core >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Blocker > Fix For: 0.14.0 > > > At the time of instantiation of the write/base client, user may want to do > additional processing such as sending metrics/logs/notification or adding > more properties to the write config. The write/base client init callback > abstraction allows such logic to be plugged into Hudi.
[jira] [Updated] (HUDI-6462) Add write/base client init callback abstraction
[ https://issues.apache.org/jira/browse/HUDI-6462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6462: Description: At the time of instantiation of the write client, user may want to do additional processing such as sending metrics/logs/notification or adding more properties to the write config. The write/base client init callback abstraction allows such logic to be plugged into Hudi. (was: At the time of instantiation of the write client, user may want to do additional processing such as sending metrics/logs/notification or adding more properties to the write config. ) > Add write/base client init callback abstraction > --- > > Key: HUDI-6462 > URL: https://issues.apache.org/jira/browse/HUDI-6462 > Project: Apache Hudi > Issue Type: New Feature > Components: writer-core >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Blocker > Fix For: 0.14.0 > > > At the time of instantiation of the write client, user may want to do > additional processing such as sending metrics/logs/notification or adding > more properties to the write config. The write/base client init callback > abstraction allows such logic to be plugged into Hudi.
[GitHub] [hudi] hudi-bot commented on pull request #9035: [HUDI-6416] Completion markers for handling execution engine (spark) …
hudi-bot commented on PR #9035: URL: https://github.com/apache/hudi/pull/9035#issuecomment-1615249731 ## CI report: * d273d7fca86a899653508ae50316107ac3243d42 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18050) * f0735271d079b8dfa76b6350505e9a4e38610d8a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18242)
[jira] [Updated] (HUDI-6462) Add write/base client init callback abstraction
[ https://issues.apache.org/jira/browse/HUDI-6462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6462: Description: At the time of instantiation of the write client, user may want to do additional processing such as sending metrics/logs/notification or adding more properties to the write config. > Add write/base client init callback abstraction > --- > > Key: HUDI-6462 > URL: https://issues.apache.org/jira/browse/HUDI-6462 > Project: Apache Hudi > Issue Type: New Feature > Components: writer-core >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Blocker > Fix For: 0.14.0 > > > At the time of instantiation of the write client, user may want to do > additional processing such as sending metrics/logs/notification or adding > more properties to the write config.
[jira] [Assigned] (HUDI-6462) Add write/base client init callback abstraction
[ https://issues.apache.org/jira/browse/HUDI-6462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo reassigned HUDI-6462: --- Assignee: Ethan Guo > Add write/base client init callback abstraction > --- > > Key: HUDI-6462 > URL: https://issues.apache.org/jira/browse/HUDI-6462 > Project: Apache Hudi > Issue Type: New Feature > Components: writer-core >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Blocker > Fix For: 0.14.0 > >
[jira] [Updated] (HUDI-6462) Add write/base client init callback abstraction
[ https://issues.apache.org/jira/browse/HUDI-6462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6462: Fix Version/s: 0.14.0 > Add write/base client init callback abstraction > --- > > Key: HUDI-6462 > URL: https://issues.apache.org/jira/browse/HUDI-6462 > Project: Apache Hudi > Issue Type: New Feature >Reporter: Ethan Guo >Priority: Major > Fix For: 0.14.0 > >
[jira] [Updated] (HUDI-6462) Add write/base client init callback abstraction
[ https://issues.apache.org/jira/browse/HUDI-6462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6462: Priority: Blocker (was: Major) > Add write/base client init callback abstraction > --- > > Key: HUDI-6462 > URL: https://issues.apache.org/jira/browse/HUDI-6462 > Project: Apache Hudi > Issue Type: New Feature > Components: writer-core >Reporter: Ethan Guo >Priority: Blocker > Fix For: 0.14.0 > >
[jira] [Updated] (HUDI-6462) Add write/base client init callback abstraction
[ https://issues.apache.org/jira/browse/HUDI-6462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6462: Summary: Add write/base client init callback abstraction (was: Add write client init callback abstraction) > Add write/base client init callback abstraction > --- > > Key: HUDI-6462 > URL: https://issues.apache.org/jira/browse/HUDI-6462 > Project: Apache Hudi > Issue Type: New Feature >Reporter: Ethan Guo >Priority: Major >
[jira] [Updated] (HUDI-6462) Add write/base client init callback abstraction
[ https://issues.apache.org/jira/browse/HUDI-6462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6462: Story Points: 2 > Add write/base client init callback abstraction > --- > > Key: HUDI-6462 > URL: https://issues.apache.org/jira/browse/HUDI-6462 > Project: Apache Hudi > Issue Type: New Feature > Components: writer-core >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Blocker > Fix For: 0.14.0 > >
[jira] [Updated] (HUDI-6462) Add write/base client init callback abstraction
[ https://issues.apache.org/jira/browse/HUDI-6462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6462: Component/s: writer-core > Add write/base client init callback abstraction > --- > > Key: HUDI-6462 > URL: https://issues.apache.org/jira/browse/HUDI-6462 > Project: Apache Hudi > Issue Type: New Feature > Components: writer-core >Reporter: Ethan Guo >Priority: Major > Fix For: 0.14.0 > >
[jira] [Created] (HUDI-6462) Add write client init callback abstraction
Ethan Guo created HUDI-6462: --- Summary: Add write client init callback abstraction Key: HUDI-6462 URL: https://issues.apache.org/jira/browse/HUDI-6462 Project: Apache Hudi Issue Type: New Feature Reporter: Ethan Guo
[GitHub] [hudi] hudi-bot commented on pull request #9035: [HUDI-6416] Completion markers for handling execution engine (spark) …
hudi-bot commented on PR #9035: URL: https://github.com/apache/hudi/pull/9035#issuecomment-1615245617 ## CI report: * d273d7fca86a899653508ae50316107ac3243d42 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18050) * f0735271d079b8dfa76b6350505e9a4e38610d8a UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8837: [HUDI-6153] Changed the rollback mechanism for MDT to actual rollbacks rather than appending revert blocks.
hudi-bot commented on PR #8837: URL: https://github.com/apache/hudi/pull/8837#issuecomment-1615240130 ## CI report: * 3e22656f66687bb920ec82e6764bf083985df09c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18241)
[GitHub] [hudi] hudi-bot commented on pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.
hudi-bot commented on PR #9106: URL: https://github.com/apache/hudi/pull/9106#issuecomment-1615153382 ## CI report: * eb56e1be9ea831362a61adccec2ec2826c86d6a7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18240)
[GitHub] [hudi] nsivabalan commented on a diff in pull request #8837: [HUDI-6153] Changed the rollback mechanism for MDT to actual rollbacks rather than appending revert blocks.
nsivabalan commented on code in PR #8837: URL: https://github.com/apache/hudi/pull/8837#discussion_r1248215039 ## hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java: ## @@ -1871,7 +1865,11 @@ private void testTableOperationsImpl(HoodieSparkEngineContext engineContext, Hoo validateMetadata(client); // Restore - client.restoreToInstant("2021010100060", writeConfig.isMetadataTableEnabled()); + if (metaClient.getTableType() == COPY_ON_WRITE) { +assertThrows(HoodieRestoreException.class, () -> client.restoreToInstant("2021010100060", writeConfig.isMetadataTableEnabled())); Review Comment: @prashantwason : hey, can you help clarify, why do we expect this to fail just for COW table and not MOR ?
[GitHub] [hudi] hudi-bot commented on pull request #9087: [HUDI-6329] Write pipelines for table with consistent bucket index would detect whether clustering service occurs and automatically adjust the
hudi-bot commented on PR #9087: URL: https://github.com/apache/hudi/pull/9087#issuecomment-1615098734 ## CI report: * 1ff671477f3635ced1643f31de4d2c47acfb3244 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18238)
[GitHub] [hudi] nsivabalan commented on a diff in pull request #8837: [HUDI-6153] Changed the rollback mechanism for MDT to actual rollbacks rather than appending revert blocks.
nsivabalan commented on code in PR #8837: URL: https://github.com/apache/hudi/pull/8837#discussion_r1248170142 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java: ## @@ -973,52 +973,46 @@ public void update(HoodieRestoreMetadata restoreMetadata, String instantTime) { */ @Override public void update(HoodieRollbackMetadata rollbackMetadata, String instantTime) { -// The commit which is being rolled back on the dataset -final String commitInstantTime = rollbackMetadata.getCommitsRollback().get(0); -// Find the deltacommits since the last compaction -Option> deltaCommitsInfo = - CompactionUtils.getDeltaCommitsSinceLatestCompaction(metadataMetaClient.getActiveTimeline()); -if (!deltaCommitsInfo.isPresent()) { - LOG.info(String.format("Ignoring rollback of instant %s at %s since there are no deltacommits on MDT", commitInstantTime, instantTime)); - return; -} - -// This could be a compaction or deltacommit instant (See CompactionUtils.getDeltaCommitsSinceLatestCompaction) -HoodieInstant compactionInstant = deltaCommitsInfo.get().getValue(); -HoodieTimeline deltacommitsSinceCompaction = deltaCommitsInfo.get().getKey(); - -// The deltacommit that will be rolled back -HoodieInstant deltaCommitInstant = new HoodieInstant(false, HoodieTimeline.DELTA_COMMIT_ACTION, commitInstantTime); - -// The commit being rolled back should not be older than the latest compaction on the MDT. Compaction on MDT only occurs when all actions -// are completed on the dataset. Hence, this case implies a rollback of completed commit which should actually be handled using restore. -if (compactionInstant.getAction().equals(HoodieTimeline.COMMIT_ACTION)) { - final String compactionInstantTime = compactionInstant.getTimestamp(); - if (HoodieTimeline.LESSER_THAN_OR_EQUALS.test(commitInstantTime, compactionInstantTime)) { -throw new HoodieMetadataException(String.format("Commit being rolled back %s is older than the latest compaction %s. 
" -+ "There are %d deltacommits after this compaction: %s", commitInstantTime, compactionInstantTime, -deltacommitsSinceCompaction.countInstants(), deltacommitsSinceCompaction.getInstants())); +if (initialized && metadata != null) { + // The commit which is being rolled back on the dataset + final String commitInstantTime = rollbackMetadata.getCommitsRollback().get(0); + // Find the deltacommits since the last compaction + Option> deltaCommitsInfo = + CompactionUtils.getDeltaCommitsSinceLatestCompaction(metadataMetaClient.getActiveTimeline()); + if (!deltaCommitsInfo.isPresent() || deltaCommitsInfo.get().getKey().empty()) { +LOG.info(String.format("Ignoring rollback of instant %s at %s since there are no deltacommits on MDT", commitInstantTime, instantTime)); +return; } -} -if (deltacommitsSinceCompaction.containsInstant(deltaCommitInstant)) { - LOG.info("Rolling back MDT deltacommit " + commitInstantTime); - if (!getWriteClient().rollback(commitInstantTime, instantTime)) { -throw new HoodieMetadataException("Failed to rollback deltacommit at " + commitInstantTime); + // This could be a compaction or deltacommit instant (See CompactionUtils.getDeltaCommitsSinceLatestCompaction) + HoodieInstant compactionInstant = deltaCommitsInfo.get().getValue(); + HoodieTimeline deltacommitsSinceCompaction = deltaCommitsInfo.get().getKey(); + + // The deltacommit that will be rolled back + HoodieInstant deltaCommitInstant = new HoodieInstant(false, HoodieTimeline.DELTA_COMMIT_ACTION, commitInstantTime); + + // The commit being rolled back should not be older than the latest compaction on the MDT. Compaction on MDT only occurs when all actions + // are completed on the dataset. Hence, this case implies a rollback of completed commit which should actually be handled using restore. 
+ if (compactionInstant.getAction().equals(HoodieTimeline.COMMIT_ACTION)) { +final String compactionInstantTime = compactionInstant.getTimestamp(); +if (HoodieTimeline.LESSER_THAN_OR_EQUALS.test(commitInstantTime, compactionInstantTime)) { + throw new HoodieMetadataException(String.format("Commit being rolled back %s is older than the latest compaction %s. " + + "There are %d deltacommits after this compaction: %s", commitInstantTime, compactionInstantTime, + deltacommitsSinceCompaction.countInstants(), deltacommitsSinceCompaction.getInstants())); +} } -} else { - LOG.info(String.format("Ignoring rollback of instant %s at %s since there are no corresponding deltacommits on MDT", - commitInstantTime, instantTime)); -} -// Rollback of MOR table may end up adding a new log file. So we need to check
[GitHub] [hudi] hudi-bot commented on pull request #8837: [HUDI-6153] Changed the rollback mechanism for MDT to actual rollbacks rather than appending revert blocks.
hudi-bot commented on PR #8837: URL: https://github.com/apache/hudi/pull/8837#issuecomment-1615048617 ## CI report: * c401984679350ad245c1b60d4f889b8a18715169 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18220) * 3e22656f66687bb920ec82e6764bf083985df09c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18241)
[GitHub] [hudi] hudi-bot commented on pull request #9066: [HUDI-6452] Add MOR snapshot reader to integrate with query engines without using Hadoop APIs
hudi-bot commented on PR #9066: URL: https://github.com/apache/hudi/pull/9066#issuecomment-1615033363 ## CI report: * 60c1b8c5885fdda28e07f3ba79290f01dc60a9c4 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18196)
[GitHub] [hudi] hudi-bot commented on pull request #8837: [HUDI-6153] Changed the rollback mechanism for MDT to actual rollbacks rather than appending revert blocks.
hudi-bot commented on PR #8837: URL: https://github.com/apache/hudi/pull/8837#issuecomment-1615032413 ## CI report: * c401984679350ad245c1b60d4f889b8a18715169 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18220) * 3e22656f66687bb920ec82e6764bf083985df09c UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #9066: [HUDI-6452] Add MOR snapshot reader to integrate with query engines without using Hadoop APIs
hudi-bot commented on PR #9066: URL: https://github.com/apache/hudi/pull/9066#issuecomment-1615017356 ## CI report: * 60c1b8c5885fdda28e07f3ba79290f01dc60a9c4 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #9064: [HUDI-6450] Fix null strings handling in convertRowToJsonString
hudi-bot commented on PR #9064: URL: https://github.com/apache/hudi/pull/9064#issuecomment-1615017280 ## CI report: * b8418b74febf4551c0f79c7ebe71cf24916124e6 UNKNOWN * af87c98dd4c370bb40287013adcecd314e20b546 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18237)
[GitHub] [hudi] codope commented on pull request #9066: [HUDI-6452] Add MOR snapshot reader to integrate with query engines without using Hadoop APIs
codope commented on PR #9066: URL: https://github.com/apache/hudi/pull/9066#issuecomment-1615001639 CI after the latest commit - https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=18196&view=results
[GitHub] [hudi] codope commented on a diff in pull request #8837: [HUDI-6153] Changed the rollback mechanism for MDT to actual rollbacks rather than appending revert blocks.
codope commented on code in PR #8837: URL: https://github.com/apache/hudi/pull/8837#discussion_r1248137779 ## hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java: ## @@ -1871,7 +1865,11 @@ private void testTableOperationsImpl(HoodieSparkEngineContext engineContext, Hoo validateMetadata(client); // Restore - client.restoreToInstant("2021010100060", writeConfig.isMetadataTableEnabled()); + if (metaClient.getTableType() == COPY_ON_WRITE) { +assertThrows(HoodieRestoreException.class, () -> client.restoreToInstant("2021010100060", writeConfig.isMetadataTableEnabled())); + } else { +client.restoreToInstant("2021010100060", writeConfig.isMetadataTableEnabled()); + } Review Comment: There should not be a need to check this based on table type. Need to look into why this fails for COW.
[GitHub] [hudi] hudi-bot commented on pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.
hudi-bot commented on PR #9106: URL: https://github.com/apache/hudi/pull/9106#issuecomment-1614965807 ## CI report: * eb56e1be9ea831362a61adccec2ec2826c86d6a7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18240)
[GitHub] [hudi] hudi-bot commented on pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.
hudi-bot commented on PR #9106: URL: https://github.com/apache/hudi/pull/9106#issuecomment-1614956663 ## CI report: * eb56e1be9ea831362a61adccec2ec2826c86d6a7 UNKNOWN
[jira] [Assigned] (HUDI-6460) Fix Hbase Index for deletes
[ https://issues.apache.org/jira/browse/HUDI-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prashant Wason reassigned HUDI-6460: Assignee: Prashant Wason > Fix Hbase Index for deletes > --- > > Key: HUDI-6460 > URL: https://issues.apache.org/jira/browse/HUDI-6460 > Project: Apache Hudi > Issue Type: Improvement > Components: index >Reporter: sivabalan narayanan >Assignee: Prashant Wason >Priority: Major > > With the addition of delete support for RLI, > [https://github.com/apache/hudi/pull/9058/files] > the Hbase index needs some fixes. > The failing test is: > TestSparkHoodieHBaseIndex.testTagLocationAndPartitionPathUpdateWithExplicitRollback > > Root cause: > when update partition path is set to true, the same batch can contain both a > deleted record and a new insert record for a key. We send both records to hbase, so > for some keys inserts take precedence, while for others deletes take > precedence. > > We need to fix SparkHoodieHbaseIndex.updateLocation > to do one pass over WriteStatus and de-dup when we have two records > where one is deleted and the other is inserted. > There are also cases where only deletes are present, in which case the > deletes must still be routed to hbase.
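The de-duplication pass described in the root cause above can be sketched as follows. This is a minimal illustrative sketch, not the actual `SparkHoodieHbaseIndex.updateLocation` code: the `Mutation` class and `dedup` method are hypothetical names, and the only assumption carried over from the ticket is the precedence rule (within one batch, an insert for a key overrides a delete for the same key, while keys with only deletes are still propagated as deletions).

```java
import java.util.*;

// Hypothetical sketch of the one-pass de-dup over a batch of index mutations.
public class BatchDedupSketch {
    static final class Mutation {
        final String recordKey;
        final boolean isDelete;
        Mutation(String recordKey, boolean isDelete) {
            this.recordKey = recordKey;
            this.isDelete = isDelete;
        }
    }

    // Single pass: an insert for a key overrides any delete seen for that key;
    // an incoming delete never overrides an already-seen insert.
    static Map<String, Mutation> dedup(List<Mutation> batch) {
        Map<String, Mutation> byKey = new HashMap<>();
        for (Mutation m : batch) {
            Mutation existing = byKey.get(m.recordKey);
            if (existing == null || (existing.isDelete && !m.isDelete)) {
                byKey.put(m.recordKey, m);
            }
        }
        return byKey;
    }

    public static void main(String[] args) {
        List<Mutation> batch = Arrays.asList(
            new Mutation("k1", true),   // delete then re-insert: insert wins
            new Mutation("k1", false),
            new Mutation("k2", true));  // delete only: must still reach the index
        Map<String, Mutation> result = dedup(batch);
        System.out.println(!result.get("k1").isDelete); // true: insert won
        System.out.println(result.get("k2").isDelete);  // true: delete routed
    }
}
```

With this precedence rule, batches that interleave deletes and re-inserts resolve deterministically instead of depending on arrival order.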
[jira] [Updated] (HUDI-6118) Testing of MDT and RI code on HDFS
[ https://issues.apache.org/jira/browse/HUDI-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-6118: - Labels: pull-request-available (was: ) > Testing of MDT and RI code on HDFS > -- > > Key: HUDI-6118 > URL: https://issues.apache.org/jira/browse/HUDI-6118 > Project: Apache Hudi > Issue Type: Improvement > Components: metadata >Reporter: Prashant Wason >Assignee: Prashant Wason >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > > The current defaults are not optimal for large partitions like record index.
[GitHub] [hudi] prashantwason opened a new pull request, #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.
prashantwason opened a new pull request, #9106: URL: https://github.com/apache/hudi/pull/9106 [HUDI-6118] Some fixes to improve the MDT and record index code base. ### Change Logs 1. Print the MDT partition name instead of the enum toString in logs 2. Use fsView.loadAllPartitions() 3. When publishing size metrics for MDT, only consider partitions which have been initialized 4. Fixed job status names 5. Limited logs which were printing the entire list of partitions. This is very verbose for datasets with a large number of partitions 6. Added a config to reduce the max parallelism of record index initialization. 7. Changed defaults for MDT write configs to reasonable values 8. Added a config for MDT log block size. Larger blocks are preferred to reduce lookup time. 9. Fixed the size metrics for MDT. These metrics should be set instead of incremented. ### Impact Fixes issues for the recently committed RI and MDT changes ### Risk level (write none, low medium or high below) Low ### Documentation Update None ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
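Item 9 above (size metrics should be set, not incremented) can be illustrated with a small sketch. This is not Hudi's actual metrics API; the class and method names are invented to show why incrementing an absolute-size metric on every publish compounds stale values, while setting it keeps the latest sample.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative gauge store: "set" records the latest sample, "inc" compounds.
public class MetricsSketch {
    private final Map<String, Long> gauges = new HashMap<>();

    void setGauge(String name, long value) { gauges.put(name, value); }
    void incGauge(String name, long delta) { gauges.merge(name, delta, Long::sum); }
    long get(String name) { return gauges.getOrDefault(name, 0L); }

    public static void main(String[] args) {
        MetricsSketch m = new MetricsSketch();
        long[] observedSizes = {100, 120, 110}; // MDT size sampled at 3 publishes
        for (long s : observedSizes) m.incGauge("mdt.size.wrong", s);
        for (long s : observedSizes) m.setGauge("mdt.size.right", s);
        System.out.println(m.get("mdt.size.wrong")); // 330 -- compounded, drifts from reality
        System.out.println(m.get("mdt.size.right")); // 110 -- latest sample, correct
    }
}
```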
[jira] [Updated] (HUDI-6118) Testing of MDT and RI code on HDFS
[ https://issues.apache.org/jira/browse/HUDI-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prashant Wason updated HUDI-6118: - Summary: Testing of MDT and RI code on HDFS (was: Provide reasonable defaults for operation parallelism in MDT write configuration) > Testing of MDT and RI code on HDFS > -- > > Key: HUDI-6118 > URL: https://issues.apache.org/jira/browse/HUDI-6118 > Project: Apache Hudi > Issue Type: Improvement > Components: metadata >Reporter: Prashant Wason >Assignee: Prashant Wason >Priority: Major > Fix For: 0.14.0 > > > The current defaults are not optimal for large partitions like record index.
[GitHub] [hudi] BBency commented on issue #9094: Async Clustering failing with errors for MOR table
BBency commented on issue #9094: URL: https://github.com/apache/hudi/issues/9094#issuecomment-1614926417 Is there any other detail that you would want me to share? Any updates?
[jira] [Closed] (HUDI-6376) Support for DELETE keys in record index
[ https://issues.apache.org/jira/browse/HUDI-6376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan closed HUDI-6376. - Resolution: Fixed > Support for DELETE keys in record index > --- > > Key: HUDI-6376 > URL: https://issues.apache.org/jira/browse/HUDI-6376 > Project: Apache Hudi > Issue Type: Bug >Reporter: Prashant Wason >Assignee: Prashant Wason >Priority: Blocker > Labels: pull-request-available, release-0.14.0-blocker > Fix For: 0.14.0
[GitHub] [hudi] nsivabalan merged pull request #9058: [HUDI-6376] Support for deletes in HUDI Indexes including metadata table record index.
nsivabalan merged PR #9058: URL: https://github.com/apache/hudi/pull/9058
[hudi] branch master updated: [HUDI-6376] Support for deletes in HUDI Indexes including metadata table record index. (#9058)
This is an automated email from the ASF dual-hosted git repository. sivabalan pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 1d5f2f7c63d [HUDI-6376] Support for deletes in HUDI Indexes including metadata table record index. (#9058) 1d5f2f7c63d is described below commit 1d5f2f7c63de441b9f475dd7ba4cf1540e0f9c42 Author: Prashant Wason AuthorDate: Fri Jun 30 09:12:36 2023 -0700 [HUDI-6376] Support for deletes in HUDI Indexes including metadata table record index. (#9058) * [HUDI-6376] Support for deletes in HUDI Indexes including metadata table record index. - Co-authored-by: Shiyan Xu <2701446+xushi...@users.noreply.github.com> Co-authored-by: sivabalan --- .../java/org/apache/hudi/io/HoodieMergeHandle.java | 4 + .../functional/TestHoodieBackedMetadata.java | 94 ++ .../hudi/client/functional/TestHoodieIndex.java| 61 ++ .../index/hbase/TestSparkHoodieHBaseIndex.java | 3 +- .../org/apache/hudi/common/model/HoodieRecord.java | 13 ++- .../hudi/metadata/HoodieBackedTableMetadata.java | 6 +- .../hudi/metadata/HoodieMetadataPayload.java | 29 ++- 7 files changed, 202 insertions(+), 8 deletions(-) diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java index cfe11b1fd8d..8c4b0bc18d5 100644 --- a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java +++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java @@ -314,6 +314,10 @@ public class HoodieMergeHandle extends HoodieWriteHandle recordsWritten++; } else { recordsDeleted++; +// Clear the new location as the record was deleted +newRecord.unseal(); +newRecord.clearNewLocation(); +newRecord.seal(); } writeStatus.markSuccess(newRecord, recordMetadata); // deflate record payload after recording success. 
This will help users access payload as a diff --git a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java index 075afd61eb1..a1657c204b8 100644 --- a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java +++ b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java @@ -46,6 +46,7 @@ import org.apache.hudi.common.model.HoodieKey; import org.apache.hudi.common.model.HoodieLogFile; import org.apache.hudi.common.model.HoodieRecord; import org.apache.hudi.common.model.HoodieRecord.HoodieRecordType; +import org.apache.hudi.common.model.HoodieRecordGlobalLocation; import org.apache.hudi.common.model.HoodieRecordPayload; import org.apache.hudi.common.model.HoodieTableType; import org.apache.hudi.common.model.HoodieWriteStat; @@ -103,6 +104,7 @@ import org.apache.hudi.table.HoodieTable; import org.apache.hudi.table.action.HoodieWriteMetadata; import org.apache.hudi.table.upgrade.SparkUpgradeDowngradeHelper; import org.apache.hudi.table.upgrade.UpgradeDowngrade; +import org.apache.hudi.testutils.HoodieClientTestUtils; import org.apache.hudi.testutils.MetadataMergeWriteStatus; import org.apache.avro.Schema; @@ -3068,6 +3070,98 @@ public class TestHoodieBackedMetadata extends TestHoodieMetadataBase { validateMetadata(client); } + @Test + public void testDeleteWithRecordIndex() throws Exception { +init(HoodieTableType.COPY_ON_WRITE, true); +HoodieSparkEngineContext engineContext = new HoodieSparkEngineContext(jsc); +HoodieWriteConfig writeConfig = getWriteConfigBuilder(true, true, false) + .withMetadataConfig(HoodieMetadataConfig.newBuilder().withEnableRecordIndex(true).withMaxNumDeltaCommitsBeforeCompaction(1).build()) + 
.withIndexConfig(HoodieIndexConfig.newBuilder().withIndexType(HoodieIndex.IndexType.RECORD_INDEX).build()) +.build(); + +String firstCommitTime = HoodieActiveTimeline.createNewInstantTime(); +String secondCommitTime; +List allRecords; +List keysToDelete; +List recordsToDelete; + +// Initialize the dataset and add some commits. +try (SparkRDDWriteClient client = new SparkRDDWriteClient(engineContext, writeConfig)) { + // First commit + List firstBatchOfrecords = dataGen.generateInserts(firstCommitTime, 10); + client.startCommitWithTime(firstCommitTime); + client.insert(jsc.parallelize(firstBatchOfrecords, 1), firstCommitTime).collect(); + + // Records got inserted and RI is initialized + metaClient = HoodieTableMetaClient.reload(metaClient); + ass
[jira] [Created] (HUDI-6461) Fix deletion of entire record in MDT for col stats, bloom filter
sivabalan narayanan created HUDI-6461: - Summary: Fix deletion of entire record in MDT for col stats, bloom filter Key: HUDI-6461 URL: https://issues.apache.org/jira/browse/HUDI-6461 Project: Apache Hudi Issue Type: Improvement Components: metadata Reporter: sivabalan narayanan w/ RLI, we are introducing a proper way to delete an MDT record. [https://github.com/apache/hudi/pull/9058] We might have to follow similar logic for the other partitions as well to optimize them better. We should avoid relying on nested fields to deduce whether a record is deleted (e.g., ColumnStatsMetadata.isDeleted).
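The design point in HUDI-6461 can be sketched as follows. This is a hedged illustration only: `MetadataPayload` and its `preCombine` method are invented names, not Hudi's `HoodieMetadataPayload` API. It shows what an explicit, top-level delete marker buys: merge logic can tombstone a record without inspecting partition-specific nested fields such as a column-stats structure.

```java
// Hypothetical payload with a top-level, partition-type-independent delete flag.
public class PayloadSketch {
    static final class MetadataPayload {
        final String key;
        final boolean isDeleted; // explicit delete marker, same for all partitions
        MetadataPayload(String key, boolean isDeleted) {
            this.key = key;
            this.isDeleted = isDeleted;
        }
        // Merging a delete over an older record yields a tombstone directly;
        // no nested-field inspection is needed to decide deletion.
        MetadataPayload preCombine(MetadataPayload older) {
            return this.isDeleted ? new MetadataPayload(key, true) : this;
        }
    }

    public static void main(String[] args) {
        MetadataPayload existing = new MetadataPayload("col_stats/file1", false);
        MetadataPayload delete = new MetadataPayload("col_stats/file1", true);
        System.out.println(delete.preCombine(existing).isDeleted); // true
    }
}
```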
[GitHub] [hudi] nsivabalan commented on pull request #9058: [HUDI-6376] Support for deletes in HUDI Indexes including metadata table record index.
nsivabalan commented on PR #9058: URL: https://github.com/apache/hudi/pull/9058#issuecomment-1614873987 @danny0405 : I feel having isDeleted explicitly is more clear and comprehensible. So, will prefer to keep it that way. anyways, we have to fix all other partitions (col stats, etc) in a follow up patch. so lets tackle this in that patch.
[jira] [Updated] (HUDI-6459) Add Rollback and other tests for Record Level Index
[ https://issues.apache.org/jira/browse/HUDI-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-6459: - Labels: pull-request-available (was: ) > Add Rollback and other tests for Record Level Index > --- > > Key: HUDI-6459 > URL: https://issues.apache.org/jira/browse/HUDI-6459 > Project: Apache Hudi > Issue Type: Bug >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Labels: pull-request-available > > The Jira aims to add validation for rollback with record level index. The > validation is added in TestRecordLevelIndex test.
[GitHub] [hudi] lokeshj1703 opened a new pull request, #9105: [WIP] [HUDI-6459] Add Rollback and other tests for Record Level Index
lokeshj1703 opened a new pull request, #9105: URL: https://github.com/apache/hudi/pull/9105 ### Change Logs The Jira aims to add validation for rollback with record level index. The validation is added in TestRecordLevelIndex test. ### Impact NA ### Risk level (write none, low medium or high below) low ### Documentation Update NA ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
[jira] [Updated] (HUDI-6459) Add Rollback and other tests for Record Level Index
[ https://issues.apache.org/jira/browse/HUDI-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain updated HUDI-6459: -- Summary: Add Rollback and other tests for Record Level Index (was: Add Rollback test for Record Level Index) > Add Rollback and other tests for Record Level Index > --- > > Key: HUDI-6459 > URL: https://issues.apache.org/jira/browse/HUDI-6459 > Project: Apache Hudi > Issue Type: Bug >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > > The Jira aims to add validation for rollback with record level index. The > validation is added in TestRecordLevelIndex test.
[GitHub] [hudi] hudi-bot commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction
hudi-bot commented on PR #9007: URL: https://github.com/apache/hudi/pull/9007#issuecomment-1614815163 ## CI report: * c221efd733a444258780949b698830c2cef47931 UNKNOWN * 78b7acc447a6cdadccf1b0ca57e1cc634233c879 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18233)
[GitHub] [hudi] hudi-bot commented on pull request #9104: [HUDI-6445] Removing gc hints from test base
hudi-bot commented on PR #9104: URL: https://github.com/apache/hudi/pull/9104#issuecomment-1614803818

## CI report:

* 022113d3bfa7d479b935b193293fae2a295be46d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18234)
[GitHub] [hudi] hudi-bot commented on pull request #9080: [HUDI-6445] Making some of Spark DS tests as functional
hudi-bot commented on PR #9080: URL: https://github.com/apache/hudi/pull/9080#issuecomment-1614717741

## CI report:

* d28ff949a1dd43456fda75e5624848bb63e030f4 UNKNOWN
* b9dd8237e187586c5d05b46d4d4eee891822813e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18232)
[GitHub] [hudi] hudi-bot commented on pull request #9087: [HUDI-6329] Write pipelines for table with consistent bucket index would detect whether clustering service occurs and automatically adjust the
hudi-bot commented on PR #9087: URL: https://github.com/apache/hudi/pull/9087#issuecomment-1614637415

## CI report:

* 92e8459715422f3e72fb05a298e2b103330a7cce Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18229)
* 1ff671477f3635ced1643f31de4d2c47acfb3244 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18238)
[GitHub] [hudi] hudi-bot commented on pull request #9087: [HUDI-6329] Write pipelines for table with consistent bucket index would detect whether clustering service occurs and automatically adjust the
hudi-bot commented on PR #9087: URL: https://github.com/apache/hudi/pull/9087#issuecomment-1614587068

## CI report:

* 92e8459715422f3e72fb05a298e2b103330a7cce Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18229)
* 1ff671477f3635ced1643f31de4d2c47acfb3244 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #9064: [HUDI-6450] Fix null strings handling in convertRowToJsonString
hudi-bot commented on PR #9064: URL: https://github.com/apache/hudi/pull/9064#issuecomment-1614586903

## CI report:

* b8418b74febf4551c0f79c7ebe71cf24916124e6 UNKNOWN
* 9c6d2bf222b7247bc926302045123bad69157d39 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18198)
* af87c98dd4c370bb40287013adcecd314e20b546 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18237)
[GitHub] [hudi] hudi-bot commented on pull request #9087: [HUDI-6329] Write pipelines for table with consistent bucket index would detect whether clustering service occurs and automatically adjust the
hudi-bot commented on PR #9087: URL: https://github.com/apache/hudi/pull/9087#issuecomment-1614578428

## CI report:

* 92e8459715422f3e72fb05a298e2b103330a7cce Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18229)
[GitHub] [hudi] hudi-bot commented on pull request #9064: [HUDI-6450] Fix null strings handling in convertRowToJsonString
hudi-bot commented on PR #9064: URL: https://github.com/apache/hudi/pull/9064#issuecomment-1614578240

## CI report:

* b8418b74febf4551c0f79c7ebe71cf24916124e6 UNKNOWN
* 9c6d2bf222b7247bc926302045123bad69157d39 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18198)
* af87c98dd4c370bb40287013adcecd314e20b546 UNKNOWN
[GitHub] [hudi] parisni commented on a diff in pull request #9056: [HUDI-6456] [DOC] Add parquet blooms documentation
parisni commented on code in PR #9056: URL: https://github.com/apache/hudi/pull/9056#discussion_r1247776991

## website/docs/configurations.md:

@@ -197,7 +197,10 @@ Options useful for reading tables via `read.format.option(...)`

 ### Write Options {#Write-Options}

-You can pass down any of the WriteClient level configs directly using `options()` or `option(k,v)` methods.
+Hudi supports [parquet modular encryption](/docs/encryption) and [parquet bloom filters](/docs/parquet_bloom) through hadoop configurations.
+

Review Comment:
   added parquet_config heading
[jira] [Closed] (HUDI-6457) Keep JavaSizeBasedClusteringPlanStrategy and SparkSizeBasedClusteringPlanStrategy aligned
[ https://issues.apache.org/jira/browse/HUDI-6457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Danny Chen closed HUDI-6457.
----------------------------
    Resolution: Fixed

Fixed via master branch: a439ea0f449fb334f0823323651ec1512f4cd5df

> Keep JavaSizeBasedClusteringPlanStrategy and
> SparkSizeBasedClusteringPlanStrategy aligned
> --------------------------------------------
>
>                 Key: HUDI-6457
>                 URL: https://issues.apache.org/jira/browse/HUDI-6457
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: kwang
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 0.14.0
[hudi] branch master updated: [HUDI-6457] Keep JavaSizeBasedClusteringPlanStrategy and SparkSizeBasedClusteringPlanStrategy aligned (#9099)
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new a439ea0f449 [HUDI-6457] Keep JavaSizeBasedClusteringPlanStrategy and SparkSizeBasedClusteringPlanStrategy aligned (#9099)

a439ea0f449 is described below

commit a439ea0f449fb334f0823323651ec1512f4cd5df
Author: ksmou <135721692+ks...@users.noreply.github.com>
AuthorDate: Fri Jun 30 19:39:31 2023 +0800

    [HUDI-6457] Keep JavaSizeBasedClusteringPlanStrategy and SparkSizeBasedClusteringPlanStrategy aligned (#9099)
---
 .../JavaSizeBasedClusteringPlanStrategy.java | 53 +-
 1 file changed, 32 insertions(+), 21 deletions(-)

diff --git a/hudi-client/hudi-java-client/src/main/java/org/apache/hudi/client/clustering/plan/strategy/JavaSizeBasedClusteringPlanStrategy.java b/hudi-client/hudi-java-client/src/main/java/org/apache/hudi/client/clustering/plan/strategy/JavaSizeBasedClusteringPlanStrategy.java
index fe66cedb133..d8f0c5fc804 100644
--- a/hudi-client/hudi-java-client/src/main/java/org/apache/hudi/client/clustering/plan/strategy/JavaSizeBasedClusteringPlanStrategy.java
+++ b/hudi-client/hudi-java-client/src/main/java/org/apache/hudi/client/clustering/plan/strategy/JavaSizeBasedClusteringPlanStrategy.java
@@ -60,41 +60,52 @@ public class JavaSizeBasedClusteringPlanStrategy
   @Override
   protected Stream<HoodieClusteringGroup> buildClusteringGroupsForPartition(String partitionPath, List<FileSlice> fileSlices) {
+    HoodieWriteConfig writeConfig = getWriteConfig();
+
     List<Pair<List<FileSlice>, Integer>> fileSliceGroups = new ArrayList<>();
     List<FileSlice> currentGroup = new ArrayList<>();
+
+    // Sort fileSlices before dividing, which makes dividing more compact
+    List<FileSlice> sortedFileSlices = new ArrayList<>(fileSlices);
+    sortedFileSlices.sort((o1, o2) -> (int)
+        ((o2.getBaseFile().isPresent() ? o2.getBaseFile().get().getFileSize() : writeConfig.getParquetMaxFileSize())
+            - (o1.getBaseFile().isPresent() ? o1.getBaseFile().get().getFileSize() : writeConfig.getParquetMaxFileSize())));
+
     long totalSizeSoFar = 0;
-    HoodieWriteConfig writeConfig = getWriteConfig();
-    for (FileSlice currentSlice : fileSlices) {
-      // assume each filegroup size is ~= parquet.max.file.size
-      totalSizeSoFar += currentSlice.getBaseFile().isPresent() ? currentSlice.getBaseFile().get().getFileSize() : writeConfig.getParquetMaxFileSize();
+
+    for (FileSlice currentSlice : sortedFileSlices) {
+      long currentSize = currentSlice.getBaseFile().isPresent() ? currentSlice.getBaseFile().get().getFileSize() : writeConfig.getParquetMaxFileSize();
       // check if max size is reached and create new group, if needed.
-      if (totalSizeSoFar >= writeConfig.getClusteringMaxBytesInGroup() && !currentGroup.isEmpty()) {
+      if (totalSizeSoFar + currentSize > writeConfig.getClusteringMaxBytesInGroup() && !currentGroup.isEmpty()) {
         int numOutputGroups = getNumberOfOutputFileGroups(totalSizeSoFar, writeConfig.getClusteringTargetFileMaxBytes());
         LOG.info("Adding one clustering group " + totalSizeSoFar + " max bytes: "
-            + writeConfig.getClusteringMaxBytesInGroup() + " num input slices: " + currentGroup.size() + " output groups: " + numOutputGroups);
+            + writeConfig.getClusteringMaxBytesInGroup() + " num input slices: " + currentGroup.size() + " output groups: " + numOutputGroups);
         fileSliceGroups.add(Pair.of(currentGroup, numOutputGroups));
         currentGroup = new ArrayList<>();
         totalSizeSoFar = 0;
       }
+
+      // Add to the current file-group
       currentGroup.add(currentSlice);
-      // totalSizeSoFar could be 0 when new group was created in the previous conditional block.
-      // reset to the size of current slice, otherwise the number of output file group will become 0 even though current slice is present.
-      if (totalSizeSoFar == 0) {
-        totalSizeSoFar += currentSlice.getBaseFile().isPresent() ? currentSlice.getBaseFile().get().getFileSize() : writeConfig.getParquetMaxFileSize();
-      }
+      // assume each file group size is ~= parquet.max.file.size
+      totalSizeSoFar += currentSize;
     }
+
     if (!currentGroup.isEmpty()) {
-      int numOutputGroups = getNumberOfOutputFileGroups(totalSizeSoFar, writeConfig.getClusteringTargetFileMaxBytes());
-      LOG.info("Adding final clustering group " + totalSizeSoFar + " max bytes: "
-          + writeConfig.getClusteringMaxBytesInGroup() + " num input slices: " + currentGroup.size() + " output groups: " + numOutputGroups);
-      fileSliceGroups.add(Pair.of(currentGroup, numOutputGroups));
+      if (currentGroup.size() > 1 || writeConfig.shouldClusteringSingleGroup()) {
+        int numOutputGroups = getNumberOfOutputFileGroups(totalSizeSoFar, writeConfig.getClusteringTargetFileMaxBytes());
+        LOG.inf
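The patch above replaces the running-total check with an explicit sort-then-pack: slices are sorted by base-file size descending, then packed greedily into groups capped by `clustering.max.bytes.in.group`. A minimal standalone sketch of that grouping step, with plain sizes standing in for Hudi's `FileSlice` (hypothetical helper class, not Hudi code):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SizeBasedGrouping {
    // Greedy bin-packing of file sizes into clustering groups, mirroring the
    // patched JavaSizeBasedClusteringPlanStrategy: sort descending by size,
    // then close the current group whenever adding the next slice would
    // exceed the per-group byte cap.
    public static List<List<Long>> buildGroups(List<Long> fileSizes, long maxBytesPerGroup) {
        List<Long> sorted = new ArrayList<>(fileSizes);
        sorted.sort(Comparator.reverseOrder());

        List<List<Long>> groups = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long totalSoFar = 0;
        for (long size : sorted) {
            // check before adding, so a group never exceeds the cap
            if (totalSoFar + size > maxBytesPerGroup && !current.isEmpty()) {
                groups.add(current);
                current = new ArrayList<>();
                totalSoFar = 0;
            }
            current.add(size);
            totalSoFar += size;
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }

    public static void main(String[] args) {
        // 120 opens its own group; 90 + 40 + 10 fit under the 150-byte cap
        System.out.println(buildGroups(List.of(120L, 40L, 90L, 10L), 150L));
        // prints [[120], [90, 40, 10]]
    }
}
```

Checking `totalSizeSoFar + currentSize > cap` before adding (rather than after, as the old code did) removes the need for the awkward `totalSizeSoFar == 0` reset branch that the patch deletes.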
[GitHub] [hudi] danny0405 merged pull request #9099: [HUDI-6457]Keep JavaSizeBasedClusteringPlanStrategy and SparkSizeBase…
danny0405 merged PR #9099: URL: https://github.com/apache/hudi/pull/9099
[jira] [Closed] (HUDI-6458) Scheduling jobs should not fail when there is no completed commits
[ https://issues.apache.org/jira/browse/HUDI-6458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Danny Chen closed HUDI-6458.
----------------------------
    Resolution: Fixed

Fixed via master branch: a94db121b3aa05fd2243cb0a7794a2c20048065b

> Scheduling jobs should not fail when there is no completed commits
> ------------------------------------------------------------------
>
>                 Key: HUDI-6458
>                 URL: https://issues.apache.org/jira/browse/HUDI-6458
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: kwang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.14.0
[GitHub] [hudi] danny0405 merged pull request #9097: [HUDI-6458]Scheduling jobs should not fail when there is no completed commits
danny0405 merged PR #9097: URL: https://github.com/apache/hudi/pull/9097
[hudi] branch master updated: [HUDI-6458] Scheduling jobs should not fail when there is no completed commits (#9097)
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new a94db121b3a [HUDI-6458] Scheduling jobs should not fail when there is no completed commits (#9097)

a94db121b3a is described below

commit a94db121b3aa05fd2243cb0a7794a2c20048065b
Author: ksmou <135721692+ks...@users.noreply.github.com>
AuthorDate: Fri Jun 30 19:37:33 2023 +0800

    [HUDI-6458] Scheduling jobs should not fail when there is no completed commits (#9097)
---
 .../src/main/java/org/apache/hudi/utilities/HoodieCompactor.java | 4
 .../src/main/java/org/apache/hudi/utilities/UtilHelpers.java     | 3
 2 files changed, 7 deletions(-)

diff --git a/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieCompactor.java b/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieCompactor.java
index c1958e76e6b..603502affb6 100644
--- a/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieCompactor.java
+++ b/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieCompactor.java
@@ -30,7 +30,6 @@ import org.apache.hudi.common.table.timeline.HoodieTimeline;
 import org.apache.hudi.common.util.Option;
 import org.apache.hudi.common.util.StringUtils;
 import org.apache.hudi.config.HoodieCleanConfig;
-import org.apache.hudi.exception.HoodieException;
 import org.apache.hudi.table.action.HoodieWriteMetadata;
 import org.apache.hudi.table.action.compact.strategy.LogFileSizeBasedCompactionStrategy;
@@ -293,9 +292,6 @@ public class HoodieCompactor {
   private String getSchemaFromLatestInstant() throws Exception {
     TableSchemaResolver schemaUtil = new TableSchemaResolver(metaClient);
-    if (metaClient.getActiveTimeline().getCommitsTimeline().filterCompletedInstants().countInstants() == 0) {
-      throw new HoodieException("Cannot run compaction without any completed commits");
-    }
     Schema schema = schemaUtil.getTableAvroSchema(false);
     return schema.toString();
   }

diff --git a/hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java b/hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java
index 5c09cf71a2b..a0d241752c5 100644
--- a/hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java
+++ b/hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java
@@ -589,9 +589,6 @@ public class UtilHelpers {
   public static String getSchemaFromLatestInstant(HoodieTableMetaClient metaClient) throws Exception {
     TableSchemaResolver schemaResolver = new TableSchemaResolver(metaClient);
-    if (metaClient.getActiveTimeline().getCommitsTimeline().filterCompletedInstants().countInstants() == 0) {
-      throw new HoodieException("Cannot run clustering without any completed commits");
-    }
     Schema schema = schemaResolver.getTableAvroSchema(false);
     return schema.toString();
   }
[GitHub] [hudi] danny0405 commented on a diff in pull request #9064: [HUDI-6450] Fix null strings handling in convertRowToJsonString
danny0405 commented on code in PR #9064: URL: https://github.com/apache/hudi/pull/9064#discussion_r1247765082

## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/cdc/HoodieCDCRDD.scala:

@@ -561,7 +561,7 @@ class HoodieCDCRDD(
     originTableSchema.structTypeSchema.zipWithIndex.foreach {
       case (field, idx) =>
         if (field.dataType.isInstanceOf[StringType]) {
-          map(field.name) = record.getString(idx)
+          map(field.name) = Option(record.getUTF8String(idx)).map(_.toString).orNull
         } else {

Review Comment:
   Looks good to me ~
[GitHub] [hudi] zaza commented on a diff in pull request #9064: [HUDI-6450] Fix null strings handling in convertRowToJsonString
zaza commented on code in PR #9064: URL: https://github.com/apache/hudi/pull/9064#discussion_r1247764006

## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/cdc/HoodieCDCRDD.scala:

@@ -561,7 +561,7 @@ class HoodieCDCRDD(
     originTableSchema.structTypeSchema.zipWithIndex.foreach {
       case (field, idx) =>
         if (field.dataType.isInstanceOf[StringType]) {
-          map(field.name) = record.getString(idx)
+          map(field.name) = Option(record.getUTF8String(idx)).map(_.toString).orNull
         } else {

Review Comment:
   Is [this](https://github.com/apache/hudi/pull/9064/commits/af87c98dd4c370bb40287013adcecd314e20b546) better or would you like me to go further?
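For context on the pattern under review: `record.getString(idx)` on a Spark `InternalRow` dereferences the underlying UTF8 string and can throw a `NullPointerException` for null columns, whereas the fix wraps the nullable value in `Option` first. A Java analogue of the same null-safe conversion (illustrative helper only, not part of Hudi or Spark):

```java
import java.util.Optional;

public class NullSafeString {
    // Java equivalent of Scala's Option(x).map(_.toString).orNull:
    // returns the value's string form, or null if the value itself is null.
    public static String toStringOrNull(Object value) {
        return Optional.ofNullable(value).map(Object::toString).orElse(null);
    }

    public static void main(String[] args) {
        System.out.println(toStringOrNull("abc")); // abc
        System.out.println(toStringOrNull(null));  // null, instead of an NPE
    }
}
```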
[GitHub] [hudi] danny0405 commented on pull request #3391: [HUDI-83] Fix Timestamp/Date type read by Hive3
danny0405 commented on PR #3391: URL: https://github.com/apache/hudi/pull/3391#issuecomment-1614521616

Should be, but I think it is more related to how the timestamp type is synced: https://github.com/apache/hudi/pull/8867
[GitHub] [hudi] danny0405 commented on pull request #9048: [HUDI-6434] Fix illegalArgumentException when do read_optimized read in Flink
danny0405 commented on PR #9048: URL: https://github.com/apache/hudi/pull/9048#issuecomment-1614517556

That's true. Actually it is even more friendly for the Hive query engine too, just a little late for the 0.14.0 release because I'm scared of introducing a potential bug; we can make the first file slice with parquets once we have enough test cases in production for backing up the confidence.
[GitHub] [hudi] danny0405 commented on a diff in pull request #9064: [HUDI-6450] Fix null strings handling in convertRowToJsonString
danny0405 commented on code in PR #9064: URL: https://github.com/apache/hudi/pull/9064#discussion_r1247754441

## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/cdc/HoodieCDCRDD.scala:

@@ -561,7 +561,7 @@ class HoodieCDCRDD(
     originTableSchema.structTypeSchema.zipWithIndex.foreach {
       case (field, idx) =>
         if (field.dataType.isInstanceOf[StringType]) {
-          map(field.name) = record.getString(idx)
+          map(field.name) = Option(record.getUTF8String(idx)).map(_.toString).orNull
         } else {

Review Comment:
   I think we are cool, a basic tool test makes sense to me. It would be cool if we could make the tool a singleton, with no json mapper registering for each invocation.
[GitHub] [hudi] hudi-bot commented on pull request #9103: [MINOR]move hoodie hfile/orc reader/writer test cases from hudi-client-common to hudi-common
hudi-bot commented on PR #9103: URL: https://github.com/apache/hudi/pull/9103#issuecomment-1614506556

## CI report:

* f26d06b7eb099e698fe7058f3ffba327d4ae5c7f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18228)
[GitHub] [hudi] danny0405 commented on issue #9093: [SUPPORT] Is it allowed using Flink Table API sqlQuery() to read data from hudi tables?
danny0405 commented on issue #9093: URL: https://github.com/apache/hudi/issues/9093#issuecomment-1614496615

You should define the `bulk_insert` option while initializing the table with sql:

```sql
String createTabelSql = "create table dept(\n"
    + " dept_id BIGINT PRIMARY KEY NOT ENFORCED,\n"
    + " dept_name varchar(10),\n"
    + " ts timestamp(3)\n"
    + ")\n"
    + "with (\n"
    + " 'connector' = 'hudi',\n"
    + " 'path' = 'hdfs://localhost:9000/hudi/dept',\n"
    + " 'table.type' = 'MERGE_ON_READ'\n"
    + ")";
```

It's weird that you can't query the data, is there any exception thrown out?
[GitHub] [hudi] hudi-bot commented on pull request #9099: [HUDI-6457]Keep JavaSizeBasedClusteringPlanStrategy and SparkSizeBase…
hudi-bot commented on PR #9099: URL: https://github.com/apache/hudi/pull/9099#issuecomment-1614453233

## CI report:

* c61be845ddfc82ffcc107f8db437fc75d334eb58 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18226)
[GitHub] [hudi] gamblewin commented on issue #9093: [SUPPORT] Is it allowed using Flink Table API sqlQuery() to read data from hudi tables?
gamblewin commented on issue #9093: URL: https://github.com/apache/hudi/issues/9093#issuecomment-1614448235

@danny0405 Thx for replying.

1. Data is committed into the table, but cannot be queried by using `sTableEnv.sqlQuery(select * from dept)`.
   ![image](https://github.com/apache/hudi/assets/39117591/732b92ec-4de2-473c-a80a-8db48db13616)

2. If I use the sql way, which is inserting multiple rows in one sql and executing this sql, **is this way bulk insert or not?**

   ```java
   sEnv = StreamExecutionEnvironment.getExecutionEnvironment();
   sEnv.setRuntimeMode(RuntimeExecutionMode.BATCH); // set execution mode as batch
   sTableEnv = StreamTableEnvironment.create(sEnv);
   sEnv.setParallelism(1);
   sEnv.enableCheckpointing(3000);
   // SQL way: insert multiple rows in one sql without explicitly configuring write option as bulk insert
   sTableEnv.executeSql("insert into dept values (1, 'a', NOW()), (2, 'b', NOW())");
   ```

3. If the above sql way is not bulk insert, **is there any way I can bulk insert data by using sql?** I know that for query sql we can add options to set up some configurations, but I tried adding options to the insert data sql and it's not working.

   ```sql
   insert into dept values (1, 'a', NOW()), (2, 'b', NOW())
   /*+ options (
     'write.operation' = 'bulk_insert'
   )*/
   ```

4. I think what you really mean is using the streaming API to bulk insert data. In my understanding, bulk insert means inserting a batch of data at a time, but in the following code, **source data is an unbounded stream, how does the sink function split source data into different batches?**

   ```java
   Map options = new HashMap<>();
   // other option configurations ..
   options.put("write.operation", "bulk_insert");
   DataStream dataStream = sEnv.addSource(...);
   HoodiePipeline.Builder builder = HoodiePipeline.builder("dept")
       .column(...)
       .options(options);
   builder.sink(dataStream, false);
   ```
[GitHub] [hudi] parisni commented on pull request #9053: [HUDI-6369] Fix spacial curve with sample strategy fails when 0 or 1 rows only is incoming
parisni commented on PR #9053: URL: https://github.com/apache/hudi/pull/9053#issuecomment-1614447464

> Currently, we lack tests that cover the sortDataFrameBySampleSupportAllTypes function. It would be highly beneficial if you could include it as well.

Agreed, feel free to submit a patch, I am on vacation for a week
[GitHub] [hudi] parisni commented on a diff in pull request #9053: [HUDI-6369] Fix spacial curve with sample strategy fails when 0 or 1 rows only is incoming
parisni commented on code in PR #9053: URL: https://github.com/apache/hudi/pull/9053#discussion_r1247703419

## hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/sql/hudi/execution/RangeSample.scala:

@@ -316,6 +316,8 @@ object RangeSampleSort {
       HoodieClusteringConfig.LAYOUT_OPTIMIZE_BUILD_CURVE_SAMPLE_SIZE.defaultValue.toString).toInt
     val sample = new RangeSample(zOrderBounds, sampleRdd)
     val rangeBounds = sample.getRangeBounds()
+    if (rangeBounds.size <= 1)

Review Comment:
   yes, the test has a `height` column which is complex (array). But it didn't trigger an error; a simple column did.
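Background for the guard being discussed: range-based sort sampling produces a list of boundary values, and k sorted bounds split the data into k + 1 partitions. With 0 or 1 incoming rows, sampling yields at most one bound, so the split is degenerate and the patch short-circuits instead of sorting. A simplified sketch of bounds-to-partition mapping (not Hudi's actual `RangeSample` code):

```java
import java.util.Arrays;

public class RangePartitioner {
    // Maps a value to a partition index given sorted range bounds:
    // k bounds define k + 1 partitions. With an empty bounds array
    // (e.g. 0 or 1 sampled rows) every value lands in partition 0,
    // which is the degenerate case the Hudi patch guards against.
    public static int partitionFor(long[] sortedBounds, long value) {
        int idx = Arrays.binarySearch(sortedBounds, value);
        // binarySearch returns -(insertionPoint) - 1 when not found
        return idx >= 0 ? idx : -idx - 1;
    }

    public static void main(String[] args) {
        long[] bounds = {10L, 20L};
        System.out.println(partitionFor(bounds, 5));        // 0
        System.out.println(partitionFor(bounds, 15));       // 1
        System.out.println(partitionFor(bounds, 25));       // 2
        System.out.println(partitionFor(new long[0], 99));  // 0 (single partition)
    }
}
```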
[GitHub] [hudi] hudi-bot commented on pull request #9097: [HUDI-6458]Scheduling jobs should not fail when there is no completed commits
hudi-bot commented on PR #9097: URL: https://github.com/apache/hudi/pull/9097#issuecomment-1614383551

## CI report:

* db92d6d09635496b22c27e1375057fed504e6c70 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18225)
[GitHub] [hudi] ad1happy2go commented on issue #8965: [SUPPORT]NoSuchMethodError: org.apache.curator.CuratorZookeeperClient.startAdvancedTrace
ad1happy2go commented on issue #8965: URL: https://github.com/apache/hudi/issues/8965#issuecomment-1614381178

@nb Also, is this a deltastreamer job or a spark datasource writer? Can you also paste the code snippet so I can take a look into it.
[GitHub] [hudi] ad1happy2go commented on issue #8965: [SUPPORT]NoSuchMethodError: org.apache.curator.CuratorZookeeperClient.startAdvancedTrace
ad1happy2go commented on issue #8965: URL: https://github.com/apache/hudi/issues/8965#issuecomment-1614379041

@nb I tried to reproduce this issue, but zookeeper concurrency is working fine with Spark 3.1 and Hudi 0.13.0. I checked the stack trace and it looks like you are getting this exception while writing data. Is there any special information about your setup you can provide to help me triage this issue?
[hudi] branch master updated: [HUDI-6448] Improve upgrade/downgrade for table ver. 6 (#9063)
This is an automated email from the ASF dual-hosted git repository.

xushiyan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new f57248abb46 [HUDI-6448] Improve upgrade/downgrade for table ver. 6 (#9063)
f57248abb46 is described below

commit f57248abb465a923418129c18801ec1d64a15a5d
Author: Shiyan Xu <2701446+xushi...@users.noreply.github.com>
AuthorDate: Fri Jun 30 02:18:13 2023 -0700

    [HUDI-6448] Improve upgrade/downgrade for table ver. 6 (#9063)

    Co-authored-by: sivabalan
---
 .../table/upgrade/FiveToFourDowngradeHandler.java  |  4 +-
 .../table/upgrade/FiveToSixUpgradeHandler.java     | 20 --
 .../table/upgrade/FourToFiveUpgradeHandler.java    |  4 +-
 .../table/upgrade/OneToZeroDowngradeHandler.java   |  2 +-
 .../table/upgrade/SixToFiveDowngradeHandler.java   | 44 ++--
 .../table/upgrade/TwoToOneDowngradeHandler.java    |  2 +-
 .../functional/TestHoodieBackedMetadata.java       |  4 +-
 .../hudi/table/upgrade/TestUpgradeDowngrade.java   | 78 --
 .../hudi/common/table/HoodieTableConfig.java       |  4 ++
 .../hudi/common/table/HoodieTableVersion.java      |  2 +-
 .../TestUpgradeOrDowngradeProcedure.scala          |  4 +-
 11 files changed, 143 insertions(+), 25 deletions(-)

diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/FiveToFourDowngradeHandler.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/FiveToFourDowngradeHandler.java
index 51da9810f6a..e51f5496c2d 100644
--- a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/FiveToFourDowngradeHandler.java
+++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/FiveToFourDowngradeHandler.java
@@ -23,13 +23,13 @@
 import org.apache.hudi.common.config.ConfigProperty;
 import org.apache.hudi.common.engine.HoodieEngineContext;
 import org.apache.hudi.config.HoodieWriteConfig;
 
-import java.util.HashMap;
+import java.util.Collections;
 import java.util.Map;
 
 public class FiveToFourDowngradeHandler implements DowngradeHandler {
 
   @Override
   public Map downgrade(HoodieWriteConfig config, HoodieEngineContext context, String instantTime, SupportsUpgradeDowngrade upgradeDowngradeHelper) {
-    return new HashMap<>();
+    return Collections.emptyMap();
   }
 }

diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/FiveToSixUpgradeHandler.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/FiveToSixUpgradeHandler.java
index e3346c2f455..69086b394bf 100644
--- a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/FiveToSixUpgradeHandler.java
+++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/FiveToSixUpgradeHandler.java
@@ -18,7 +18,6 @@
 package org.apache.hudi.table.upgrade;
 
-import org.apache.hadoop.fs.Path;
 import org.apache.hudi.common.config.ConfigProperty;
 import org.apache.hudi.common.engine.HoodieEngineContext;
 import org.apache.hudi.common.table.HoodieTableMetaClient;
@@ -28,11 +27,13 @@
 import org.apache.hudi.common.table.timeline.HoodieTimeline;
 import org.apache.hudi.config.HoodieWriteConfig;
 import org.apache.hudi.exception.HoodieUpgradeDowngradeException;
 import org.apache.hudi.table.HoodieTable;
+
+import org.apache.hadoop.fs.Path;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
 import java.io.IOException;
-import java.util.HashMap;
+import java.util.Collections;
 import java.util.Map;
 
 /**
@@ -46,9 +47,18 @@ public class FiveToSixUpgradeHandler implements UpgradeHandler {
 
   @Override
   public Map upgrade(HoodieWriteConfig config, HoodieEngineContext context, String instantTime, SupportsUpgradeDowngrade upgradeDowngradeHelper) {
-    HoodieTable table = upgradeDowngradeHelper.getTable(config, context);
+    final HoodieTable table = upgradeDowngradeHelper.getTable(config, context);
+
+    deleteCompactionRequestedFileFromAuxiliaryFolder(table);
+
+    return Collections.emptyMap();
+  }
+
+  /**
+   * See HUDI-6040.
+   */
+  private void deleteCompactionRequestedFileFromAuxiliaryFolder(HoodieTable table) {
     HoodieTableMetaClient metaClient = table.getMetaClient();
-    // delete compaction file from .aux
     HoodieTimeline compactionTimeline = metaClient.getActiveTimeline().filterPendingCompactionTimeline()
         .filter(instant -> instant.getState() == HoodieInstant.State.REQUESTED);
     compactionTimeline.getInstantsAsStream().forEach(
@@ -65,6 +75,6 @@
       }
     );
-    return new HashMap<>();
   }
+
 }

diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/FourToFiveUpgradeHandler.java b/hudi-client/hudi-client-common/src/main/java/org/
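The diff above swaps `new HashMap<>()` for `Collections.emptyMap()` in handlers that have nothing to return. A minimal sketch of why the singleton is preferable (class and method names here are illustrative, not from the Hudi codebase):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class EmptyMapIdiom {

  // Before the refactor: allocates a fresh mutable map on every call,
  // even though callers never add entries to it.
  static Map<String, String> beforeRefactor() {
    return new HashMap<>();
  }

  // After the refactor: Collections.emptyMap() returns a shared immutable
  // singleton, so there is no per-call allocation and any accidental
  // mutation fails fast with UnsupportedOperationException.
  static Map<String, String> afterRefactor() {
    return Collections.emptyMap();
  }

  public static void main(String[] args) {
    System.out.println(beforeRefactor().isEmpty()); // true
    System.out.println(afterRefactor().isEmpty());  // true
    // emptyMap() always hands back the same instance.
    System.out.println(afterRefactor() == afterRefactor()); // true
  }
}
```

The trade-off is that the returned map is read-only; the idiom only applies when the empty result is treated as a value, not as a container the caller fills in.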
[GitHub] [hudi] hudi-bot commented on pull request #9063: [HUDI-6448] Improve upgrade/downgrade for table ver. 6
hudi-bot commented on PR #9063:
URL: https://github.com/apache/hudi/pull/9063#issuecomment-1614372389

## CI report:

* 045511c3843e115d0df5d97f5f38726b75c98be7 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18224)

Bot commands
@hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9098: [MINOR] Reverting disabled tests for multiwriter archival
hudi-bot commented on PR #9098:
URL: https://github.com/apache/hudi/pull/9098#issuecomment-1614315715

## CI report:

* 120a4bcce84c866dfff254294f2a20a54a7d0b1e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18223)
[GitHub] [hudi] hudi-bot commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction
hudi-bot commented on PR #9007:
URL: https://github.com/apache/hudi/pull/9007#issuecomment-1614305979

## CI report:

* f154ee335eb307e2bcffd895cfd95bfb1f417a1e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18218)
* c221efd733a444258780949b698830c2cef47931 UNKNOWN
* 78b7acc447a6cdadccf1b0ca57e1cc634233c879 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18233)
[GitHub] [hudi] hudi-bot commented on pull request #9104: [HUDI-6445] Removing gc hints from test base
hudi-bot commented on PR #9104:
URL: https://github.com/apache/hudi/pull/9104#issuecomment-1614295931

## CI report:

* 022113d3bfa7d479b935b193293fae2a295be46d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18234)
[GitHub] [hudi] hudi-bot commented on pull request #9080: [HUDI-6445] Making some of Spark DS tests as functional
hudi-bot commented on PR #9080:
URL: https://github.com/apache/hudi/pull/9080#issuecomment-1614254548

## CI report:

* d28ff949a1dd43456fda75e5624848bb63e030f4 UNKNOWN
* 645cc6e14e3bac64ddce26dcad6a51fd4aec3f51 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18174) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18173)
* b9dd8237e187586c5d05b46d4d4eee891822813e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18232)
[GitHub] [hudi] hudi-bot commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction
hudi-bot commented on PR #9007:
URL: https://github.com/apache/hudi/pull/9007#issuecomment-1614254342

## CI report:

* f154ee335eb307e2bcffd895cfd95bfb1f417a1e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18218)
* c221efd733a444258780949b698830c2cef47931 UNKNOWN
* 78b7acc447a6cdadccf1b0ca57e1cc634233c879 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #9104: [HUDI-6445] Removing gc hints from test base
hudi-bot commented on PR #9104:
URL: https://github.com/apache/hudi/pull/9104#issuecomment-1614246214

## CI report:

* 022113d3bfa7d479b935b193293fae2a295be46d UNKNOWN