Re: [PR] [HUDI-7975] Provide an API to create empty commit [hudi]
hudi-bot commented on PR #11606:
URL: https://github.com/apache/hudi/pull/11606#issuecomment-2219689162

## CI report:

* 7c2dc1d616944a7e24693e7710005c52fc446601 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24804)

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7975] Provide an API to create empty commit [hudi]
hudi-bot commented on PR #11606:
URL: https://github.com/apache/hudi/pull/11606#issuecomment-2219677617

## CI report:

* 7c2dc1d616944a7e24693e7710005c52fc446601 UNKNOWN

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
Re: [PR] build: add info for rust and python artifacts [hudi-rs]
codecov[bot] commented on PR #60:
URL: https://github.com/apache/hudi-rs/pull/60#issuecomment-2219668014

## [Codecov](https://app.codecov.io/gh/apache/hudi-rs/pull/60?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report

All modified and coverable lines are covered by tests :white_check_mark:

> Project coverage is 87.19%. Comparing base [(`78a558f`)](https://app.codecov.io/gh/apache/hudi-rs/commit/78a558f00c8a6c4556db5ee98f26369fd90fabcf?dropdown=coverage&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) to head [(`97fa31d`)](https://app.codecov.io/gh/apache/hudi-rs/commit/97fa31d7937583b56e0a900b6a50c58c40f44f6a?dropdown=coverage&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##             main      #60   +/-   ##
=======================================
  Coverage   87.19%   87.19%
=======================================
  Files          13       13
  Lines         687      687
=======================================
  Hits          599      599
  Misses         88       88
```

[:umbrella: View full report in Codecov by Sentry](https://app.codecov.io/gh/apache/hudi-rs/pull/60?dropdown=coverage&src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
:loudspeaker: Have feedback on the report? [Share it here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
Re: [PR] [HUDI-7915] Spark4 + Hadoop3 [hudi]
hudi-bot commented on PR #11539:
URL: https://github.com/apache/hudi/pull/11539#issuecomment-2219666454

## CI report:

* dac29c7e89201f0ced6d394bf6fd4a5c0622167b UNKNOWN
* 83fe235b703ba4fa1224b41eec2e19f27600671f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24799)
* c14015c3618d231bc439c0a4fb14ce2dff32de00 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24803)

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[PR] build: add info for rust and python artifacts [hudi-rs]
xushiyan opened a new pull request, #60:
URL: https://github.com/apache/hudi-rs/pull/60

- Make `datafusion` a feature to hudi crate
- Add `__version__` to python package
- Add more info for package repositories
[PR] [HUDI-3625][RFC-60] Add StorageStrategy to HoodieStorage [hudi]
CTTY opened a new pull request, #11607:
URL: https://github.com/apache/hudi/pull/11607

### Change Logs

This is part of RFC-60, Object Storage Storage Strategy: https://github.com/apache/hudi/blob/master/rfc/rfc-60/rfc-60.md. The end goal is to leverage the HoodieStorage layer to further separate Hudi logic from file I/O, allowing more flexibility in the physical location of files. This PR adds StorageStrategy to HoodieStorage, but StorageStrategy will NOT be used anywhere just yet.

### Impact

No impact.

### Risk level (write none, low medium or high below)

None

### Documentation Update

_Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none"._

- _The config description must be updated if new configs are added or the default value of the configs are changed_
- _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
[jira] [Updated] (HUDI-7975) Transfer extraMetadata to new commits when new data is not ingested to trigger table services on the dataset
[ https://issues.apache.org/jira/browse/HUDI-7975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-7975:
---------------------------------
    Labels: pull-request-available  (was: )

> Transfer extraMetadata to new commits when new data is not ingested to trigger table services on the dataset
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HUDI-7975
>                 URL: https://issues.apache.org/jira/browse/HUDI-7975
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: Surya Prasanna Yalla
>            Assignee: Surya Prasanna Yalla
>            Priority: Major
>              Labels: pull-request-available

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[PR] [HUDI-7975] Provide an API to create empty commit [hudi]
suryaprasanna opened a new pull request, #11606:
URL: https://github.com/apache/hudi/pull/11606

Summary:
By creating an empty commit, checkpoints from the commit files can be transferred to new instants. This change creates an empty commit by copying the extraMetadata from the last completed non-table-service commit in the timeline, whenever there is no new data to be written. Generally, at most one empty commit is created per day per dataset; the cadence for transferring metadata is configurable, so bloating of the commit timeline can be avoided.

### Change Logs

_Describe context and summary for this change. Highlight if any code was copied._

### Impact

_Describe any public API or user-facing feature change or any performance impact._

### Risk level (write none, low medium or high below)

_If medium or high, explain what verification was done to mitigate the risks._

### Documentation Update

_Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none"._

- _The config description must be updated if new configs are added or the default value of the configs are changed_
- _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
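The once-per-day cadence described in the summary can be sketched as a simple time gate. This is an illustrative sketch only; the function and parameter names below are hypothetical and are not the API added by this PR:

```python
from datetime import datetime, timedelta
from typing import Optional

def should_create_empty_commit(last_empty_commit: Optional[datetime],
                               now: datetime,
                               cadence: timedelta = timedelta(days=1)) -> bool:
    """Return True when no empty commit was created within the cadence window.

    Skipping empty commits inside the window keeps the commit timeline from bloating.
    """
    if last_empty_commit is None:
        return True
    return now - last_empty_commit >= cadence

# With a daily cadence, a second empty commit on the same day is skipped:
now = datetime(2024, 7, 10, 12, 0)
print(should_create_empty_commit(datetime(2024, 7, 10, 1, 0), now))  # False
print(should_create_empty_commit(datetime(2024, 7, 9, 11, 0), now))  # True
```

The same gate generalizes to any configured cadence (e.g. `timedelta(hours=6)`), which matches the PR's point that the transfer frequency should be tunable per dataset.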
[jira] [Created] (HUDI-7975) Transfer extraMetadata to new commits when new data is not ingested to trigger table services on the dataset
Surya Prasanna Yalla created HUDI-7975:
---------------------------------------

             Summary: Transfer extraMetadata to new commits when new data is not ingested to trigger table services on the dataset
                 Key: HUDI-7975
                 URL: https://issues.apache.org/jira/browse/HUDI-7975
             Project: Apache Hudi
          Issue Type: Improvement
            Reporter: Surya Prasanna Yalla
[jira] [Assigned] (HUDI-7975) Transfer extraMetadata to new commits when new data is not ingested to trigger table services on the dataset
[ https://issues.apache.org/jira/browse/HUDI-7975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Surya Prasanna Yalla reassigned HUDI-7975:
------------------------------------------

    Assignee: Surya Prasanna Yalla

> Transfer extraMetadata to new commits when new data is not ingested to trigger table services on the dataset
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HUDI-7975
>                 URL: https://issues.apache.org/jira/browse/HUDI-7975
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: Surya Prasanna Yalla
>            Assignee: Surya Prasanna Yalla
>            Priority: Major
>
Re: [PR] [HUDI-7915] Spark4 + Hadoop3 [hudi]
hudi-bot commented on PR #11539:
URL: https://github.com/apache/hudi/pull/11539#issuecomment-2219609595

## CI report:

* dac29c7e89201f0ced6d394bf6fd4a5c0622167b UNKNOWN
* 83fe235b703ba4fa1224b41eec2e19f27600671f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24799)
* c14015c3618d231bc439c0a4fb14ce2dff32de00 UNKNOWN

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [DOCS] fix: update home page title [hudi]
pintusoliya commented on PR #11530:
URL: https://github.com/apache/hudi/pull/11530#issuecomment-2219609458

> @pintusoliya would you able to run the website locally? pls screenshot your local run so we can see the outcome. also look like the CI is not passing

Uploaded a video, as a screenshot was not possible due to the hover effect.
Re: [PR] [HUDI-7974] Create empty clean commit at a cadence and make it configurable [hudi]
hudi-bot commented on PR #11605:
URL: https://github.com/apache/hudi/pull/11605#issuecomment-2219602875

## CI report:

* 2fc956794c1effc3dbd09b665eac1266503f407f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24802)

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7974] Create empty clean commit at a cadence and make it configurable [hudi]
hudi-bot commented on PR #11605:
URL: https://github.com/apache/hudi/pull/11605#issuecomment-2219546966

## CI report:

* 2fc956794c1effc3dbd09b665eac1266503f407f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24802)

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7974] Create empty clean commit at a cadence and make it configurable [hudi]
hudi-bot commented on PR #11605:
URL: https://github.com/apache/hudi/pull/11605#issuecomment-2219539346

## CI report:

* 2fc956794c1effc3dbd09b665eac1266503f407f UNKNOWN

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]
KnightChess closed pull request #11578: [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde…
URL: https://github.com/apache/hudi/pull/11578
Re: [I] [SUPPORT] Remote connection issue while testing locally Apache Hudi with Glue Image and LocalStack [hudi]
cannon-tp commented on issue #8691:
URL: https://github.com/apache/hudi/issues/8691#issuecomment-2219537023

Hey, @danfran I think setting hadoop properties in spark conf could be a problem. I faced the same, resolved it using the following code.

```python
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext, SparkConf
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.dynamicframe import DynamicFrame

conf = (SparkConf().setAppName("hudi-1")
        .set("spark.hadoop.fs.s3a.endpoint", "http://localstack:4566")
        .set("spark.hadoop.fs.s3a.connection.ssl.enabled", "false")
        .set("spark.hadoop.fs.s3a.multipart.size", "104857600")
        .set("spark.hadoop.fs.s3a.access.key", "test")
        .set("spark.hadoop.fs.s3a.secret.key", "test")
        .set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
        .set("spark.hadoop.fs.s3a.path.style.access", "true")
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        .set("spark.jars.packages", "org.apache.hudi:hudi-spark3.3-bundle_2.12:0.15.0,org.apache.hadoop:hadoop-aws:3.3.3")
        .set("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.hudi.catalog.HoodieCatalog")
        .set("spark.sql.extensions", "org.apache.spark.sql.hudi.HoodieSparkSessionExtension")
        .set("spark.sql.legacy.timeParserPolicy", "LEGACY")
        )

sc = SparkContext(conf=conf)
glueContext = GlueContext(sc)
spark = glueContext.spark_session
```
Re: [PR] [HUDI-7859] Rename instant files to be consistent with 0.x naming format when downgrade [hudi]
codope commented on code in PR #11545:
URL: https://github.com/apache/hudi/pull/11545#discussion_r1671596452

## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/EightToSevenDowngradeHandler.java:

```diff
@@ -20,18 +20,53 @@
 import org.apache.hudi.common.config.ConfigProperty;
 import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
 import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.exception.HoodieIOException;
+import org.apache.hudi.storage.StoragePath;
+import org.apache.hudi.table.HoodieTable;
+import java.io.IOException;
 import java.util.Collections;
+import java.util.List;
 import java.util.Map;
+
 /**
  * Version 7 is going to be placeholder version for bridge release 0.16.0.
  * Version 8 is the placeholder version to track 1.x.
  */
 public class EightToSevenDowngradeHandler implements DowngradeHandler {
   @Override
   public Map downgrade(HoodieWriteConfig config, HoodieEngineContext context, String instantTime, SupportsUpgradeDowngrade upgradeDowngradeHelper) {
+    final HoodieTable table = upgradeDowngradeHelper.getTable(config, context);
+    UpgradeDowngradeUtils.runCompaction(table, context, config, upgradeDowngradeHelper);
+    UpgradeDowngradeUtils.syncCompactionRequestedFileToAuxiliaryFolder(table);
+
+    HoodieTableMetaClient metaClient = HoodieTableMetaClient.builder().setConf(context.getStorageConf().newInstance()).setBasePath(config.getBasePath()).build();
+    List instants = metaClient.getActiveTimeline().getInstants();
+    if (!instants.isEmpty()) {
+      context.map(instants, instant -> {
+        if (!instant.getFileName().contains("_")) {
+          return false;
+        }
+        try {
+          // Rename the metadata file name from the ${instant_time}_${completion_time}.action[.state] format in version 1.x to the ${instant_time}.action[.state] format in version 0.x.
+          StoragePath fromPath = new StoragePath(metaClient.getMetaPath(), instant.getFileName());
+          StoragePath toPath = new StoragePath(metaClient.getMetaPath(), instant.getFileName().replaceAll("_\\d+", ""));
+          boolean success = metaClient.getStorage().rename(fromPath, toPath);
+          // TODO: We need to rename the action-related part of the metadata file name here when we bring separate action name for clustering/compaction in 1.x as well.
```

Review Comment:
Is there a separate ticket tracking this TODO?

## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/procedure/TestUpgradeOrDowngradeProcedure.scala:

```diff
@@ -142,6 +143,56 @@ class TestUpgradeOrDowngradeProcedure extends HoodieSparkProcedureTestBase {
     }
   }
 
+  test("Test downgrade table from version eight to version seven") {
+    withTempDir { tmp =>
+      val tableName = generateTableName
+      val tablePath = s"${tmp.getCanonicalPath}/$tableName"
+      // create table
+      spark.sql(
+        s"""
+           |create table $tableName (
+           |  id int,
+           |  name string,
+           |  price double,
+           |  ts long
+           |) using hudi
+           | location '$tablePath'
+           | options (
+           |  type = 'mor',
+           |  primaryKey = 'id',
+           |  preCombineField = 'ts'
+           | )
+       """.stripMargin)
+
+      spark.sql("set hoodie.compact.inline=true")
+      spark.sql("set hoodie.compact.inline.max.delta.commits=1")
+      spark.sql("set hoodie.clean.commits.retained = 2")
+      spark.sql("set hoodie.keep.min.commits = 3")
+      spark.sql("set hoodie.keep.min.commits = 4")
+      spark.sql(s"insert into $tableName values(1, 'a1', 10, 1000)")
+      spark.sql(s"insert into $tableName values(1, 'a1', 10, 1000)")
+      spark.sql(s"insert into $tableName values(1, 'a1', 10, 1000)")
+      spark.sql(s"insert into $tableName values(1, 'a1', 10, 1000)")
+      spark.sql(s"insert into $tableName values(1, 'a1', 10, 1000)")
+
+      var metaClient = createMetaClient(spark, tablePath)
+      // verify hoodie.table.version of the table is EIGHT
+      if (metaClient.getTableConfig.getTableVersion.versionCode().equals(HoodieTableVersion.EIGHT.versionCode())) {
+        // downgrade table from version eight to version seven
+        checkAnswer(s"""call downgrade_table(table => '$tableName', to_version => 'SEVEN')""")(Seq(true))
+        metaClient = HoodieTableMetaClient.reload(metaClient)
+        assertResult(HoodieTableVersion.SEVEN.versionCode) {
+          metaClient.getTableConfig.getTableVersion.versionCode()
+        }
+        // Verify whether the naming format of instant files is consistent with 0.x
+        metaClient.reloadActiveTimeline().getInstants.iterator().asScala.forall(f => !f.getFileName.contains("_"))
```

Review Comment:
Can we add a pat
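The rename in the diff above strips the completion-time suffix with `replaceAll("_\\d+", "")`, turning a 1.x instant file name into its 0.x form. The same transformation can be checked in isolation (a minimal sketch, assuming file names follow the two formats named in the code comment):

```python
import re

def to_0x_instant_name(filename: str) -> str:
    # 1.x format: ${instant_time}_${completion_time}.action[.state]
    # 0.x format: ${instant_time}.action[.state]
    return re.sub(r"_\d+", "", filename)

# Completed instant: the completion-time suffix is dropped.
print(to_0x_instant_name("20240710120000000_20240710120005000.commit"))
# -> 20240710120000000.commit

# Requested/inflight instants carry no underscore and pass through unchanged,
# matching the `contains("_")` guard in the downgrade handler.
print(to_0x_instant_name("20240710120000000.commit.requested"))
# -> 20240710120000000.commit.requested
```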
Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]
danny0405 commented on PR #11578:
URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219523986

> Both algorithms have drawbacks.

@xicm That's fine, the new algorithm looks simpler, there is no need to distinguish between different parallelisms.
[jira] [Created] (HUDI-7974) Create empty clean commit at a cadence and make it configurable
Surya Prasanna Yalla created HUDI-7974:
---------------------------------------

             Summary: Create empty clean commit at a cadence and make it configurable
                 Key: HUDI-7974
                 URL: https://issues.apache.org/jira/browse/HUDI-7974
             Project: Apache Hudi
          Issue Type: Improvement
            Reporter: Surya Prasanna Yalla
            Assignee: Surya Prasanna Yalla
[jira] [Updated] (HUDI-7974) Create empty clean commit at a cadence and make it configurable
[ https://issues.apache.org/jira/browse/HUDI-7974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-7974:
---------------------------------
    Labels: pull-request-available  (was: )

> Create empty clean commit at a cadence and make it configurable
> ---------------------------------------------------------------
>
>                 Key: HUDI-7974
>                 URL: https://issues.apache.org/jira/browse/HUDI-7974
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: Surya Prasanna Yalla
>            Assignee: Surya Prasanna Yalla
>            Priority: Major
>              Labels: pull-request-available
>
[PR] [HUDI-7974] Create empty clean commit at a cadence and make it configurable [hudi]
suryaprasanna opened a new pull request, #11605:
URL: https://github.com/apache/hudi/pull/11605

Summary:
This change fixes empty clean commit logic and also makes it configurable.

### Change Logs

_Describe context and summary for this change. Highlight if any code was copied._

### Impact

_Describe any public API or user-facing feature change or any performance impact._

### Risk level (write none, low medium or high below)

_If medium or high, explain what verification was done to mitigate the risks._

### Documentation Update

_Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none"._

- _The config description must be updated if new configs are added or the default value of the configs are changed_
- _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]
KnightChess commented on PR #11578:
URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219457551

Don't know why this check contains the docker module; other successful runs don't seem to contain it. Retriggering again.

![image](https://github.com/apache/hudi/assets/20125927/f70572ae-afd0-4e0d-b9c3-e5f4d343ca62)
Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]
hudi-bot commented on PR #11578:
URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219424072

## CI report:

* d9c0ce277a202dc66f56b40418b4746fdcb6e1b6 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24800)

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
Re: [I] [SUPPORT]Failed to update metadata(hudi 0.15.0) [hudi]
MrAladdin commented on issue #11587:
URL: https://github.com/apache/hudi/issues/11587#issuecomment-2219397124

> hey @MrAladdin : are you in hudi slack. we can connect and investigate faster. can you post a msg there and tag me (shivnarayan) and sagar (sagar sumit)

I'm really sorry, but due to certain reasons, I am unable to help you.
Re: [PR] [HUDI-6510] Support compilation on Java 17 [hudi]
hudi-bot commented on PR #11604:
URL: https://github.com/apache/hudi/pull/11604#issuecomment-2219384701

## CI report:

* b1b476f2cea0fb02c9665e818711c0892b686352 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24798)

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7915] Spark4 + Hadoop3 [hudi]
hudi-bot commented on PR #11539:
URL: https://github.com/apache/hudi/pull/11539#issuecomment-2219384101

## CI report:

* dac29c7e89201f0ced6d394bf6fd4a5c0622167b UNKNOWN
* 83fe235b703ba4fa1224b41eec2e19f27600671f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24799)

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
Re: [I] [SUPPORT]Failed to update metadata(hudi 0.15.0) [hudi]
MrAladdin commented on issue #11587:
URL: https://github.com/apache/hudi/issues/11587#issuecomment-2219370420

> also, do you think you can use global simple in the mean time while we try to find the root cause and get a fix out?

Because my business scenario involves a large number of upsert operations (public opinion data), other index types did not perform well in previous tests; only the dynamic bucket index and the record_index newly released in version 0.14 met the requirements. I have always wanted an index that doesn't require manual intervention or adjustment, so record_index has attracted my attention and interest. This test is mainly focused on the performance improvements of record_index in version 0.15, so I can wait for this to be fixed before conducting tests.

Actually, there is another issue mentioned in https://github.com/apache/hudi/issues/11567. When the amount of already stored data is huge, it is also a maddening issue. You can pay attention to this as well.
Re: [PR] [HUDI-7969] Fix data loss caused by concurrent write and clean [hudi]
Zouxxyy closed pull request #11600: [HUDI-7969] Fix data loss caused by concurrent write and clean
URL: https://github.com/apache/hudi/pull/11600
Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]
xicm commented on PR #11578:
URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219339643

> @xicm @KnightChess So we reach concensus the algorithm raised by @KnightChess is better? If that's true, let's fire a fix in a separate PR.

Both algorithms have drawbacks. For example, with parallelism = 10, bucketNumber = 5, and partitions = ["2021-01-01", "2021-01-03"]:

- old: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
- new: [2, 2, 2, 2, 2]

With parallelism = 20, bucketNumber = 5, and partitions = ["2021-01-01", "2021-01-03"]:

- old: [2, 2, 2, 2, 2]
- new: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

Each element in the array means how many data slices that TM processes.
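Per-task counts like the ones quoted above can be explored with a small histogram helper. The two assignment functions below are illustrative stand-ins for the trade-off under discussion (a round-robin spread over all (partition, bucket) slices versus a bucket-keyed assignment); they are not the exact Hudi implementations:

```python
from collections import Counter

def load_histogram(num_partitions, num_buckets, parallelism, assign):
    """Count how many (partition, bucket) slices land on each task
    under a given assignment function; return counts, largest first."""
    counts = Counter()
    for p in range(num_partitions):
        for b in range(num_buckets):
            counts[assign(p, b, num_buckets, parallelism)] += 1
    return sorted(counts.values(), reverse=True)

# Illustrative assignment functions (hypothetical, not the actual Hudi code):
def round_robin(p, b, num_buckets, parallelism):
    # spread every slice across the full parallelism
    return (p * num_buckets + b) % parallelism

def bucket_keyed(p, b, num_buckets, parallelism):
    # co-locate all partitions of a bucket on one task
    return b % parallelism

print(load_histogram(2, 5, 10, round_robin))   # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
print(load_histogram(2, 5, 10, bucket_keyed))  # [2, 2, 2, 2, 2]
```

With 2 partitions and 5 buckets the round-robin mapping keeps all 10 tasks busy, while the bucket-keyed mapping uses only 5 tasks but keeps each bucket on a single writer, which is the kind of trade-off the comment's arrays illustrate.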
(hudi) branch master updated: [HUDI-7968] Claiming rfc for robust spark writes (#11592)
This is an automated email from the ASF dual-hosted git repository. yihua pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new dbcd089b679 [HUDI-7968] Claiming rfc for robust spark writes (#11592) dbcd089b679 is described below commit dbcd089b679b9df5de763b115db1b0162a05ea6f Author: Sivabalan Narayanan AuthorDate: Tue Jul 9 19:04:16 2024 -0700 [HUDI-7968] Claiming rfc for robust spark writes (#11592) --- rfc/README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/rfc/README.md b/rfc/README.md index 2fdd3d8db49..1c6c927ac58 100644 --- a/rfc/README.md +++ b/rfc/README.md @@ -113,4 +113,5 @@ The list of all RFCs can be found here. | 75 | [Hudi-Native HFile Reader and Writer](./rfc-75/rfc-75.md) | `UNDER REVIEW` | | 76 | [Auto Record key generation](./rfc-76/rfc-76.md) | `IN PROGRESS` | | 77 | [Secondary Index](./rfc-77/rfc-77.md) | `UNDER REVIEW` | -| 78 | [Bridge release for 1.x](./rfc-78/rfc-78.md) | `IN PROGRESS` | \ No newline at end of file +| 78 | [Bridge release for 1.x](./rfc-78/rfc-78.md) | `IN PROGRESS` | +| 79 | [Robust handling of spark task retries and failures](./rfc-79/rfc-79.md) | `IN PROGRESS` | \ No newline at end of file
Re: [PR] [HUDI-7968] Claiming rfc for robust spark writes [hudi]
yihua merged PR #11592: URL: https://github.com/apache/hudi/pull/11592 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]
hudi-bot commented on PR #11578: URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219319141 ## CI report: * 724e93b42446df3bdbc5e66a898f3b21bac97f3d Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24773) * d9c0ce277a202dc66f56b40418b4746fdcb6e1b6 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24800) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7915] Spark4 + Hadoop3 [hudi]
hudi-bot commented on PR #11539: URL: https://github.com/apache/hudi/pull/11539#issuecomment-2219318998 ## CI report: * dac29c7e89201f0ced6d394bf6fd4a5c0622167b UNKNOWN * 13e49581c971458be6c84c60f69aa595bb7f73fc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24793) * 83fe235b703ba4fa1224b41eec2e19f27600671f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24799) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]
hudi-bot commented on PR #11578: URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219311546 ## CI report: * 724e93b42446df3bdbc5e66a898f3b21bac97f3d Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24773) * d9c0ce277a202dc66f56b40418b4746fdcb6e1b6 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7915] Spark4 + Hadoop3 [hudi]
hudi-bot commented on PR #11539: URL: https://github.com/apache/hudi/pull/11539#issuecomment-2219311406 ## CI report: * dac29c7e89201f0ced6d394bf6fd4a5c0622167b UNKNOWN * 13e49581c971458be6c84c60f69aa595bb7f73fc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24793) * 83fe235b703ba4fa1224b41eec2e19f27600671f UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]
danny0405 commented on PR #11578: URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219306922 @xicm @KnightChess So we have reached consensus that the algorithm raised by @KnightChess is better? If that's true, let's fire a fix in a separate PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7969] Fix data loss caused by concurrent write and clean [hudi]
danny0405 commented on code in PR #11600: URL: https://github.com/apache/hudi/pull/11600#discussion_r1671438127

## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java:

@@ -451,8 +451,12 @@ && noSubsequentReplaceCommit(earliestInstant.getTimestamp(), partitionPath)) {
   * IMPORTANT: {@code fsView.getAllFileGroups} does not return pending file groups for metadata table,
   * file listing must be used instead.
   */
- private boolean hasPendingFiles(String partitionPath) {
+ private boolean mayHavePendingFiles(String partitionPath) {
    try {
+     // As long as there are pending commits, never delete empty partitions, because they may write files to any partition.
+     if (!hoodieTable.getMetaClient().getCommitsTimeline().filterInflightsAndRequested().empty()) {
+       return true;

Review Comment: For streaming ingestion, there is almost always a pending instant on the timeline, so the follow-up logic may never be reached.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
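The reviewer's concern can be illustrated with a toy model of the guard in the diff above. This is a simplified sketch, not Hudi's `CleanPlanner`: timeline instants are reduced to an enum, and the guard treats a partition as possibly having pending files whenever any commit is not yet completed. On a streaming timeline, which nearly always carries an inflight instant, the guard therefore always fires, which is exactly why the follow-up logic would never be reached.

```java
import java.util.List;

public class PendingGuardSketch {
    // Simplified model of a timeline instant's state.
    enum State { REQUESTED, INFLIGHT, COMPLETED }

    // Hypothetical guard mirroring the reviewed change: report "may have pending
    // files" whenever ANY commit on the timeline is still requested or inflight.
    static boolean mayHavePendingFiles(List<State> commitStates) {
        return commitStates.stream().anyMatch(s -> s != State.COMPLETED);
    }

    public static void main(String[] args) {
        // Batch-style timeline: everything completed, so the empty partition may be cleaned.
        System.out.println(mayHavePendingFiles(List.of(State.COMPLETED, State.COMPLETED)));
        // Streaming-style timeline: an inflight instant is almost always present, so the
        // guard retains every empty partition -- the behavior danny0405 flags.
        System.out.println(mayHavePendingFiles(List.of(State.COMPLETED, State.INFLIGHT)));
    }
}
```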
Re: [PR] [HUDI-7859] Rename instant files to be consistent with 0.x naming format when downgrade [hudi]
watermelon12138 commented on PR #11545: URL: https://github.com/apache/hudi/pull/11545#issuecomment-2219305418 @danny0405 @codope Hi, all checks have passed. Could you help review the code? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]
xicm commented on PR #11578: URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219304171 > @xicm no, even after fixing the overflow problem the old algorithm will not be better; you can try the UT. I have tried it before. Oh, there was something wrong with my test case; the old algorithm also has drawbacks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]
KnightChess commented on PR #11578: URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219288802 @danny0405 I have tried this before; the result is that the new algorithm is better. I will fix it in a separate PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]
KnightChess commented on PR #11578: URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219285300 @xicm no, even after fixing the overflow problem the old algorithm will not be better; you can try the UT. I have tried it before. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
nsivabalan commented on code in PR #11514: URL: https://github.com/apache/hudi/pull/11514#discussion_r1671424828 ## rfc/rfc-78/rfc-78.md: ## @@ -0,0 +1,301 @@ + +# RFC-76: [Bridge release for 1.x] + +## Proposers + +- @nsivabalan +- @vbalaji + +## Approvers + - @yihua + - @codope + +## Status + +JIRA: https://issues.apache.org/jira/browse/HUDI-7882 + +> Please keep the status updated in `rfc/README.md`. + +## Abstract + +[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful +re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming +years. It introduces lot of differentiating features for Apache Hudi. Feel free to checkout the +[release page](https://hudi.apache.org/releases/release-1.0.0-beta1) for more info. We had beta1 and beta2 releases which was meant for +interested developers/users to give a spin on some of the advanced features. But as we are working towards 1.0 GA, we are proposing +a bridge release (0.16.0) for smoother migration for existing hudi users. + +## Objectives +Goal is to have a smooth migration experience for the users from 0.x to 1.0. We plan to have a 0.16.0 bridge release asking everyone to first migrate to 0.16.0 before they can upgrade to 1.x. + +A typical organization might have a medallion architecture deployed to run 1000s of Hudi pipelines i.e. bronze, silver and gold layer. +For this layout of pipelines, here is how a typical migration might look like(w/o a bridge release) + +a. Existing pipelines are in 0.15.x. (bronze, silver, gold) +b. Migrate gold pipelines to 1.x. +- We need to strictly migrate only gold to 1x. Bcoz, a 0.15.0 reader may not be able to read 1.x hudi tables. So, if we migrate any of silver pipelines to 1.x before migrating entire gold layer, we might end up in a situation, +where a 0.15.0 reader (gold) might end up reading 1.x table (silver). This might lead to failures. 
So, we have to follow certain order in which we migrate pipelines. +c. Once all of gold is migrated to 1.x, we can move all of silver to 1.x. +d. Once all of gold and silver pipelines are migrated to 1.x, finally we can move all of bronze to 1.x. + +In the end, we would have migrated all of existing hudi pipelines from 0.15.0 to 1.x. +But as you could see, we need some coordination with which we need to migrate. And in a very large organization, sometimes we may not have good control over downstream consumers. +Hence, coordinating entire migration workflow and orchestrating the same might be challenging. + +Hence to ease the migration workflow for 1.x, we are introducing 0.16.0 as a bridge release. + +Here are the objectives with this bridge release: + +- 1.x reader should be able to read 0.14.x to 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. +But for new features that was introduced in 1.x, we may not be able to support all of them. Will be calling out which new features may not work with 0.16.x reader. +- In this case, we explicitly request users to not turn on these features untill all readers are completely migrated to 1.x so as to not break any readers as applicable. + +Connecting back to our example above, lets see how the migration might look like for an existing user. + +a. Existing pipelines are in 0.15.x. (bronze, silver, gold) +b. Migrate pipelines to 0.16.0 (in any order. we do not have any constraints around which pipeline should be migrated first). +c. Ensure all pipelines are in 0.16.0 (both readers and writers) +d. Start migrating pipelines in a rolling fashion to 1.x. At this juncture, we could have few pipelines in 1.x and few pipelines in 0.16.0. but since 0.16.x +can read 1.x tables, we should be ok here. 
Just that do not enable new features like Non blocking concurrency control yet. +e. Migrate all of 0.16.0 to 1.x version. +f. Once all readers and writers are in 1.x, we are good to enable any new features (like NBCC) with 1.x tables. + +As you could see, company/org wide coordination to migrate gold before migrating silver or bronze is relaxed with the bridge release. Only requirement to keep a tab on, +is to ensure to migrate all pipelines completely to 0.16.x before starting to migrate to 1.x. + +So, here are the objectives of this RFC with the bridge release. +- 1.x reader should be able to read 0.14.x to 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. + But for new features that was introduced in 1.x, we may not be able to support all of them. Will be calling out which new features may
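The ordering constraint the RFC walks through boils down to one rule: a reader must never encounter a table written by a newer, incompatible version. The sketch below distills that rule as stated in the RFC text (a 0.15.x reader cannot read 1.x tables, while the 0.16.x bridge release can); the method and its version-string heuristic are illustrative assumptions, not a Hudi API.

```java
public class CompatSketch {
    // Hypothetical compatibility rule distilled from the RFC discussion:
    // all readers handle 0.x tables; only 1.x readers and the 0.16.x bridge
    // release can read 1.x tables.
    static boolean canRead(String readerVersion, String tableVersion) {
        if (!tableVersion.startsWith("1.")) {
            return true;
        }
        return readerVersion.startsWith("1.") || readerVersion.startsWith("0.16");
    }

    public static void main(String[] args) {
        // The failure mode the RFC avoids: a gold-layer 0.15.0 reader hitting a 1.x silver table.
        System.out.println(canRead("0.15.0", "1.0.0"));
        // Why bridge-first migration works: once every pipeline is on 0.16.x,
        // tables can be upgraded to 1.x in any order.
        System.out.println(canRead("0.16.0", "1.0.0"));
    }
}
```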
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
nsivabalan commented on code in PR #11514: URL: https://github.com/apache/hudi/pull/11514#discussion_r1671419636 ## rfc/rfc-78/rfc-78.md: ## @@ -0,0 +1,301 @@ + +# RFC-76: [Bridge release for 1.x] + +## Proposers + +- @nsivabalan +- @vbalaji + +## Approvers + - @yihua + - @codope + +## Status + +JIRA: https://issues.apache.org/jira/browse/HUDI-7882 + +> Please keep the status updated in `rfc/README.md`. + +## Abstract + +[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful +re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming +years. It introduces lot of differentiating features for Apache Hudi. Feel free to checkout the +[release page](https://hudi.apache.org/releases/release-1.0.0-beta1) for more info. We had beta1 and beta2 releases which was meant for +interested developers/users to give a spin on some of the advanced features. But as we are working towards 1.0 GA, we are proposing +a bridge release (0.16.0) for smoother migration for existing hudi users. + +## Objectives +Goal is to have a smooth migration experience for the users from 0.x to 1.0. We plan to have a 0.16.0 bridge release asking everyone to first migrate to 0.16.0 before they can upgrade to 1.x. + +A typical organization might have a medallion architecture deployed to run 1000s of Hudi pipelines i.e. bronze, silver and gold layer. +For this layout of pipelines, here is how a typical migration might look like(w/o a bridge release) + +a. Existing pipelines are in 0.15.x. (bronze, silver, gold) +b. Migrate gold pipelines to 1.x. +- We need to strictly migrate only gold to 1x. Bcoz, a 0.15.0 reader may not be able to read 1.x hudi tables. So, if we migrate any of silver pipelines to 1.x before migrating entire gold layer, we might end up in a situation, +where a 0.15.0 reader (gold) might end up reading 1.x table (silver). This might lead to failures. 
So, we have to follow certain order in which we migrate pipelines. +c. Once all of gold is migrated to 1.x, we can move all of silver to 1.x. +d. Once all of gold and silver pipelines are migrated to 1.x, finally we can move all of bronze to 1.x. + +In the end, we would have migrated all of existing hudi pipelines from 0.15.0 to 1.x. +But as you could see, we need some coordination with which we need to migrate. And in a very large organization, sometimes we may not have good control over downstream consumers. +Hence, coordinating entire migration workflow and orchestrating the same might be challenging. + +Hence to ease the migration workflow for 1.x, we are introducing 0.16.0 as a bridge release. + +Here are the objectives with this bridge release: + +- 1.x reader should be able to read 0.14.x to 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. +But for new features that was introduced in 1.x, we may not be able to support all of them. Will be calling out which new features may not work with 0.16.x reader. +- In this case, we explicitly request users to not turn on these features untill all readers are completely migrated to 1.x so as to not break any readers as applicable. + +Connecting back to our example above, lets see how the migration might look like for an existing user. + +a. Existing pipelines are in 0.15.x. (bronze, silver, gold) +b. Migrate pipelines to 0.16.0 (in any order. we do not have any constraints around which pipeline should be migrated first). +c. Ensure all pipelines are in 0.16.0 (both readers and writers) +d. Start migrating pipelines in a rolling fashion to 1.x. At this juncture, we could have few pipelines in 1.x and few pipelines in 0.16.0. but since 0.16.x +can read 1.x tables, we should be ok here. 
Just that do not enable new features like Non blocking concurrency control yet. +e. Migrate all of 0.16.0 to 1.x version. +f. Once all readers and writers are in 1.x, we are good to enable any new features (like NBCC) with 1.x tables. + +As you could see, company/org wide coordination to migrate gold before migrating silver or bronze is relaxed with the bridge release. Only requirement to keep a tab on, +is to ensure to migrate all pipelines completely to 0.16.x before starting to migrate to 1.x. + +So, here are the objectives of this RFC with the bridge release. +- 1.x reader should be able to read 0.14.x to 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. + But for new features that was introduced in 1.x, we may not be able to support all of them. Will be calling out which new features may
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
nsivabalan commented on code in PR #11514: URL: https://github.com/apache/hudi/pull/11514#discussion_r1671414686 ## rfc/rfc-78/rfc-78.md: ## @@ -0,0 +1,301 @@ + +# RFC-76: [Bridge release for 1.x] + +## Proposers + +- @nsivabalan +- @vbalaji + +## Approvers + - @yihua + - @codope + +## Status + +JIRA: https://issues.apache.org/jira/browse/HUDI-7882 + +> Please keep the status updated in `rfc/README.md`. + +## Abstract + +[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful +re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming +years. It introduces lot of differentiating features for Apache Hudi. Feel free to checkout the +[release page](https://hudi.apache.org/releases/release-1.0.0-beta1) for more info. We had beta1 and beta2 releases which was meant for +interested developers/users to give a spin on some of the advanced features. But as we are working towards 1.0 GA, we are proposing +a bridge release (0.16.0) for smoother migration for existing hudi users. + +## Objectives +Goal is to have a smooth migration experience for the users from 0.x to 1.0. We plan to have a 0.16.0 bridge release asking everyone to first migrate to 0.16.0 before they can upgrade to 1.x. + +A typical organization might have a medallion architecture deployed to run 1000s of Hudi pipelines i.e. bronze, silver and gold layer. +For this layout of pipelines, here is how a typical migration might look like(w/o a bridge release) + +a. Existing pipelines are in 0.15.x. (bronze, silver, gold) +b. Migrate gold pipelines to 1.x. +- We need to strictly migrate only gold to 1x. Bcoz, a 0.15.0 reader may not be able to read 1.x hudi tables. So, if we migrate any of silver pipelines to 1.x before migrating entire gold layer, we might end up in a situation, +where a 0.15.0 reader (gold) might end up reading 1.x table (silver). This might lead to failures. 
So, we have to follow certain order in which we migrate pipelines. +c. Once all of gold is migrated to 1.x, we can move all of silver to 1.x. +d. Once all of gold and silver pipelines are migrated to 1.x, finally we can move all of bronze to 1.x. + +In the end, we would have migrated all of existing hudi pipelines from 0.15.0 to 1.x. +But as you could see, we need some coordination with which we need to migrate. And in a very large organization, sometimes we may not have good control over downstream consumers. +Hence, coordinating entire migration workflow and orchestrating the same might be challenging. + +Hence to ease the migration workflow for 1.x, we are introducing 0.16.0 as a bridge release. + +Here are the objectives with this bridge release: + +- 1.x reader should be able to read 0.14.x to 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. +But for new features that was introduced in 1.x, we may not be able to support all of them. Will be calling out which new features may not work with 0.16.x reader. +- In this case, we explicitly request users to not turn on these features untill all readers are completely migrated to 1.x so as to not break any readers as applicable. + +Connecting back to our example above, lets see how the migration might look like for an existing user. + +a. Existing pipelines are in 0.15.x. (bronze, silver, gold) +b. Migrate pipelines to 0.16.0 (in any order. we do not have any constraints around which pipeline should be migrated first). +c. Ensure all pipelines are in 0.16.0 (both readers and writers) +d. Start migrating pipelines in a rolling fashion to 1.x. At this juncture, we could have few pipelines in 1.x and few pipelines in 0.16.0. but since 0.16.x +can read 1.x tables, we should be ok here. 
Just that do not enable new features like Non blocking concurrency control yet. +e. Migrate all of 0.16.0 to 1.x version. +f. Once all readers and writers are in 1.x, we are good to enable any new features (like NBCC) with 1.x tables. + +As you could see, company/org wide coordination to migrate gold before migrating silver or bronze is relaxed with the bridge release. Only requirement to keep a tab on, +is to ensure to migrate all pipelines completely to 0.16.x before starting to migrate to 1.x. + +So, here are the objectives of this RFC with the bridge release. +- 1.x reader should be able to read 0.14.x to 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. + But for new features that was introduced in 1.x, we may not be able to support all of them. Will be calling out which new features may
Re: [PR] [HUDI-2955] Support Hadoop3 [hudi]
hudi-bot commented on PR #11572: URL: https://github.com/apache/hudi/pull/11572#issuecomment-221989 ## CI report: * 2639f581f20ab0b8fddf22d0fcfeb54f164ec346 UNKNOWN * 301d9bc766ce39ffdfec634a790b9ba7aee51165 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24797) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6510] Support compilation on Java 17 [hudi]
hudi-bot commented on PR #11604: URL: https://github.com/apache/hudi/pull/11604#issuecomment-2219222801 ## CI report: * b1b476f2cea0fb02c9665e818711c0892b686352 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24798) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6510] Support compilation on Java 17 [hudi]
hudi-bot commented on PR #11604: URL: https://github.com/apache/hudi/pull/11604#issuecomment-2219197965 ## CI report: * b1b476f2cea0fb02c9665e818711c0892b686352 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-2955] Support Hadoop3 [hudi]
hudi-bot commented on PR #11572: URL: https://github.com/apache/hudi/pull/11572#issuecomment-2219197455 ## CI report: * 2639f581f20ab0b8fddf22d0fcfeb54f164ec346 UNKNOWN * 830fa27e599f91574af60b01837586a1d3f5764a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24794) * 301d9bc766ce39ffdfec634a790b9ba7aee51165 UNKNOWN
[jira] [Updated] (HUDI-7866) Pull commit metadata changes in bridge release.
[ https://issues.apache.org/jira/browse/HUDI-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-7866: -- Epic Link: (was: HUDI-7856) > Pull commit metadata changes in bridge release. > --- > > Key: HUDI-7866 > URL: https://issues.apache.org/jira/browse/HUDI-7866 > Project: Apache Hudi > Issue Type: Sub-task > Reporter: Sagar Sumit > Assignee: sivabalan narayanan > Priority: Major > Fix For: 0.16.0, 1.0.0 > > > In 1.0.0, we changed some commit metadata to be written in Avro. The scope of > this task is to ensure that the bridge release is able to read commit > metadata written by 1.0.0. > > The scope could be a lot more: commit metadata is parsed at a lot of ad-hoc > places, like compaction planning, clean execution, etc. So, we need to ensure we account for both > formats (JSON and Avro) with the 0.16.0 reader, since we do not know if the commit > metadata is from 0.16.0 or from 1.0. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
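The dual-format concern in HUDI-7866 (commit metadata may be JSON from 0.16.0 or Avro from 1.0) boils down to a format-detection step before parsing. A minimal sketch, assuming (hypothetically) that JSON commit metadata is a UTF-8 object starting with `{` while Avro-serialized metadata is binary; this is illustrative, not Hudi's actual reader code:

```rust
#[derive(Debug, PartialEq)]
enum MetadataFormat {
    Json, // 0.16.0-style commit metadata
    Avro, // 1.0-style commit metadata
}

// Heuristic sketch: skip leading whitespace; a '{' byte means a JSON object,
// anything else is assumed to be Avro-serialized binary.
fn detect_format(bytes: &[u8]) -> MetadataFormat {
    match bytes.iter().find(|b| !b.is_ascii_whitespace()) {
        Some(&b'{') => MetadataFormat::Json,
        _ => MetadataFormat::Avro,
    }
}

fn main() {
    assert_eq!(detect_format(b"{\"operationType\":\"INSERT\"}"), MetadataFormat::Json);
    assert_eq!(detect_format(&[0x4f, 0x62, 0x6a]), MetadataFormat::Avro);
    println!("ok");
}
```

A real reader would then dispatch to the JSON or Avro deserializer accordingly; the point is that the 0.16.0 reader must make this decision everywhere commit metadata is parsed.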
[jira] [Assigned] (HUDI-7866) Pull commit metadata changes in bridge release.
[ https://issues.apache.org/jira/browse/HUDI-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan reassigned HUDI-7866: - Assignee: Balaji Varadarajan (was: sivabalan narayanan) > Pull commit metadata changes in bridge release. > --- > > Key: HUDI-7866 > URL: https://issues.apache.org/jira/browse/HUDI-7866 > Project: Apache Hudi > Issue Type: Sub-task > Reporter: Sagar Sumit > Assignee: Balaji Varadarajan > Priority: Major > Fix For: 0.16.0, 1.0.0 > > > In 1.0.0, we changed some commit metadata to be written in Avro. The scope of > this task is to ensure that the bridge release is able to read commit > metadata written by 1.0.0. > > The scope could be a lot more: commit metadata is parsed at a lot of ad-hoc > places, like compaction planning, clean execution, etc. So, we need to ensure we account for both > formats (JSON and Avro) with the 0.16.0 reader, since we do not know if the commit > metadata is from 0.16.0 or from 1.0.
[jira] [Updated] (HUDI-7866) Pull commit metadata changes in bridge release.
[ https://issues.apache.org/jira/browse/HUDI-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-7866: -- Parent: HUDI-7882 Issue Type: Sub-task (was: Task) > Pull commit metadata changes in bridge release. > --- > > Key: HUDI-7866 > URL: https://issues.apache.org/jira/browse/HUDI-7866 > Project: Apache Hudi > Issue Type: Sub-task > Reporter: Sagar Sumit > Assignee: sivabalan narayanan > Priority: Major > Fix For: 0.16.0, 1.0.0 > > > In 1.0.0, we changed some commit metadata to be written in Avro. The scope of > this task is to ensure that the bridge release is able to read commit > metadata written by 1.0.0. > > The scope could be a lot more: commit metadata is parsed at a lot of ad-hoc > places, like compaction planning, clean execution, etc. So, we need to ensure we account for both > formats (JSON and Avro) with the 0.16.0 reader, since we do not know if the commit > metadata is from 0.16.0 or from 1.0.
[jira] [Updated] (HUDI-7973) Add table property to track list of columns being indexed in col stats
[ https://issues.apache.org/jira/browse/HUDI-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-7973: -- Fix Version/s: 1.0.0 > Add table property to track list of columns being indexed in col stats > --- > > Key: HUDI-7973 > URL: https://issues.apache.org/jira/browse/HUDI-7973 > Project: Apache Hudi > Issue Type: Sub-task > Components: metadata > Reporter: sivabalan narayanan > Priority: Major > Fix For: 1.0.0 > > > We need to add a new table property to track which columns are being indexed in > col stats. > If not a table property, it could be an aux folder or somewhere else, but we need to > store this state somewhere.
Re: [PR] [HUDI-6510] Support GHCI on Java 17 [hudi]
hudi-bot commented on PR #11573: URL: https://github.com/apache/hudi/pull/11573#issuecomment-2219172213 ## CI report: * 03fb589a59c05f9c6f5c4ee99c934bb0b67de617 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24795)
[jira] [Created] (HUDI-7973) Add table property to track list of columns being indexed in col stats
sivabalan narayanan created HUDI-7973: - Summary: Add table property to track list of columns being indexed in col stats Key: HUDI-7973 URL: https://issues.apache.org/jira/browse/HUDI-7973 Project: Apache Hudi Issue Type: Improvement Components: metadata Reporter: sivabalan narayanan We need to add a new table property to track which columns are being indexed in col stats. If not a table property, it could be an aux folder or somewhere else, but we need to store this state somewhere.
[jira] [Updated] (HUDI-7973) Add table property to track list of columns being indexed in col stats
[ https://issues.apache.org/jira/browse/HUDI-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-7973: -- Epic Link: (was: HUDI-7856) > Add table property to track list of columns being indexed in col stats > --- > > Key: HUDI-7973 > URL: https://issues.apache.org/jira/browse/HUDI-7973 > Project: Apache Hudi > Issue Type: Sub-task > Components: metadata > Reporter: sivabalan narayanan > Priority: Major > > We need to add a new table property to track which columns are being indexed in > col stats. > If not a table property, it could be an aux folder or somewhere else, but we need to > store this state somewhere.
[jira] [Updated] (HUDI-7973) Add table property to track list of columns being indexed in col stats
[ https://issues.apache.org/jira/browse/HUDI-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-7973: -- Epic Link: HUDI-7856 > Add table property to track list of columns being indexed in col stats > --- > > Key: HUDI-7973 > URL: https://issues.apache.org/jira/browse/HUDI-7973 > Project: Apache Hudi > Issue Type: Improvement > Components: metadata > Reporter: sivabalan narayanan > Priority: Major > > We need to add a new table property to track which columns are being indexed in > col stats. > If not a table property, it could be an aux folder or somewhere else, but we need to > store this state somewhere.
[jira] [Updated] (HUDI-7973) Add table property to track list of columns being indexed in col stats
[ https://issues.apache.org/jira/browse/HUDI-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-7973: -- Parent: HUDI-7882 Issue Type: Sub-task (was: Improvement) > Add table property to track list of columns being indexed in col stats > --- > > Key: HUDI-7973 > URL: https://issues.apache.org/jira/browse/HUDI-7973 > Project: Apache Hudi > Issue Type: Sub-task > Components: metadata > Reporter: sivabalan narayanan > Priority: Major > > We need to add a new table property to track which columns are being indexed in > col stats. > If not a table property, it could be an aux folder or somewhere else, but we need to > store this state somewhere.
Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]
danny0405 commented on PR #11578: URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219159512 > if we fix the overflow problem, the old algorithm is better. Let's fire a fix for it, and @KnightChess let's keep the Flink hashing algorithm the same as it is and we can improve it in a separate PR I think.
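The overflow problem referenced above is the classic negative-hash pitfall in bucket assignment: `hash % num_buckets` is negative for negative hashes, and `abs(hash)` itself overflows for the minimum integer value. Hudi's bucket index is implemented in Java, but the arithmetic is the same in any language; a minimal sketch in Rust (function names are illustrative, not Hudi's actual code):

```rust
// Naive bucket assignment: wrong for negative hashes, since -7 % 4 == -3
// in both Java and Rust, yielding an out-of-range bucket id.
fn bucket_id_naive(hash: i32, num_buckets: i32) -> i32 {
    hash % num_buckets
}

// Safe bucket assignment: Euclidean remainder is always in [0, num_buckets),
// including for i32::MIN, the value where `hash.abs()` would overflow.
fn bucket_id_safe(hash: i32, num_buckets: i32) -> i32 {
    hash.rem_euclid(num_buckets)
}

fn main() {
    assert_eq!(bucket_id_naive(-7, 4), -3); // invalid bucket -> skew/errors
    assert_eq!(bucket_id_safe(-7, 4), 1);
    assert_eq!(bucket_id_safe(i32::MIN, 4), 0);
    println!("ok");
}
```

Keeping the hashing function itself unchanged (as suggested for Flink) while fixing only the remainder step preserves the existing bucket layout for non-negative hashes.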
Re: [I] [SUPPORT] Metaserver read/write errors [hudi]
danny0405 commented on issue #9814: URL: https://github.com/apache/hudi/issues/9814#issuecomment-2219146486 cc @yihua for visibility.
Re: [PR] [HUDI-7692] Extract metadata record type to MetadataPartitionType enum [hudi]
danny0405 commented on code in PR #11597: URL: https://github.com/apache/hudi/pull/11597#discussion_r1671384805 ## hudi-common/src/main/java/org/apache/hudi/metadata/MetadataPartitionType.java: ## @@ -137,6 +148,10 @@ public String getFileIdPrefix() { return fileIdPrefix; } + public int getRecordType(String key) { +return recordType; Review Comment: The key is never used?
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
nsivabalan commented on code in PR #11514: URL: https://github.com/apache/hudi/pull/11514#discussion_r1671383928 ## rfc/rfc-78/rfc-78.md: ## @@ -0,0 +1,301 @@ + +# RFC-76: [Bridge release for 1.x] + +## Proposers + +- @nsivabalan +- @vbalaji + +## Approvers + - @yihua + - @codope + +## Status + +JIRA: https://issues.apache.org/jira/browse/HUDI-7882 + +> Please keep the status updated in `rfc/README.md`. + +## Abstract + +[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful +re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming +years. It introduces a lot of differentiating features for Apache Hudi. Feel free to check out the +[release page](https://hudi.apache.org/releases/release-1.0.0-beta1) for more info. We had beta1 and beta2 releases, which were meant for +interested developers/users to try out some of the advanced features. But as we are working towards 1.0 GA, we are proposing +a bridge release (0.16.0) for smoother migration for existing Hudi users. + +## Objectives +The goal is to have a smooth migration experience for users from 0.x to 1.0. We plan to have a 0.16.0 bridge release, asking everyone to first migrate to 0.16.0 before they can upgrade to 1.x. + +A typical organization might have a medallion architecture deployed to run 1000s of Hudi pipelines, i.e. bronze, silver and gold layers. +For this layout of pipelines, here is how a typical migration might look (w/o a bridge release): + +a. Existing pipelines are in 0.15.x (bronze, silver, gold). +b. Migrate gold pipelines to 1.x. +- We need to strictly migrate only gold to 1.x first, because a 0.15.0 reader may not be able to read 1.x Hudi tables. If we migrate any of the silver pipelines to 1.x before migrating the entire gold layer, we might end up in a situation +where a 0.15.0 reader (gold) reads a 1.x table (silver). This might lead to failures. So, we have to follow a certain order in which we migrate pipelines. +c. Once all of gold is migrated to 1.x, we can move all of silver to 1.x. +d. Once all of the gold and silver pipelines are migrated to 1.x, finally we can move all of bronze to 1.x. + +In the end, we would have migrated all of the existing Hudi pipelines from 0.15.0 to 1.x. +But as you can see, the migration requires coordination. And in a very large organization, we may not have good control over downstream consumers. +Hence, coordinating and orchestrating the entire migration workflow might be challenging. + +Hence, to ease the migration workflow for 1.x, we are introducing 0.16.0 as a bridge release. + +Here are the objectives with this bridge release: + +- The 1.x reader should be able to read 0.14.x to 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. +But for new features introduced in 1.x, we may not be able to support all of them. We will call out which new features may not work with the 0.16.x reader. +- In this case, we explicitly request users not to turn on these features until all readers are completely migrated to 1.x, so as to not break any readers, as applicable. + +Connecting back to our example above, let's see how the migration might look for an existing user: + +a. Existing pipelines are in 0.15.x (bronze, silver, gold). +b. Migrate pipelines to 0.16.0 (in any order; we do not have any constraints around which pipeline should be migrated first). +c. Ensure all pipelines are on 0.16.0 (both readers and writers). +d. Start migrating pipelines in a rolling fashion to 1.x. At this juncture, we could have a few pipelines in 1.x and a few pipelines in 0.16.0, but since 0.16.x +can read 1.x tables, we should be OK here. Just do not enable new features like non-blocking concurrency control yet. +e. Migrate all of the 0.16.0 pipelines to the 1.x version. +f. Once all readers and writers are on 1.x, we are good to enable any new features (like NBCC) with 1.x tables. + +As you can see, company/org-wide coordination to migrate gold before migrating silver or bronze is relaxed with the bridge release. The only requirement to keep a tab on +is to ensure all pipelines are completely migrated to 0.16.x before starting to migrate to 1.x. + +So, here are the objectives of this RFC with the bridge release: +- The 1.x reader should be able to read 0.14.x to 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. + But for new features introduced in 1.x, we may not be able to support all of them. We will call out which new features may
(hudi) branch master updated: [HUDI-7929] Fix file name in k8s example (#11603)
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new b0ee6152d99 [HUDI-7929] Fix file name in k8s example (#11603) b0ee6152d99 is described below commit b0ee6152d998e7ad75295b384ed0932b9f7e3c30 Author: Peter Huang AuthorDate: Tue Jul 9 17:16:53 2024 -0700 [HUDI-7929] Fix file name in k8s example (#11603) --- .../config/k8s/{flink-deployment.yml => flink-deployment.yaml}| 0 1 file changed, 0 insertions(+), 0 deletions(-) diff --git a/hudi-examples/hudi-examples-k8s/config/k8s/flink-deployment.yml b/hudi-examples/hudi-examples-k8s/config/k8s/flink-deployment.yaml similarity index 100% rename from hudi-examples/hudi-examples-k8s/config/k8s/flink-deployment.yml rename to hudi-examples/hudi-examples-k8s/config/k8s/flink-deployment.yaml
Re: [PR] [HUDI-7929] fix file name in k8s example [hudi]
danny0405 merged PR #11603: URL: https://github.com/apache/hudi/pull/11603
Re: [PR] refactor: enhance error handling with custom ConfigError type [hudi-rs]
xushiyan commented on code in PR #59: URL: https://github.com/apache/hudi-rs/pull/59#discussion_r1671383401 ## crates/core/src/config/mod.rs: ## @@ -18,14 +18,33 @@ */ use std::any::type_name; use std::collections::HashMap; +use std::error::Error; +use std::fmt; use std::sync::Arc; -use anyhow::Result; - pub mod internal; pub mod read; pub mod table; +#[derive(Debug)] +pub enum ConfigError { +NotFound, +ParseError(String), +Other(String), Review Comment: that's a nice improvement. I would suggest capturing the underlying error as the source: ParseError should capture std::num::ParseIntError, etc., and NotFound should capture which key (ConfigParser) it refers to. On a bigger scope, we should definitely standardize error types throughout hudi-core and the other hudi crates. I chose `anyhow` for fast iteration and to uncover error handling paths first, so all errors coming out of hudi are now anyhow::Error. I suggest replacing the anyhow dependency with well-defined custom error enums implemented with [thiserror](https://docs.rs/thiserror/latest/thiserror/) in the next release.
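The suggestion above (capture the underlying error as `source`, record the missing key in `NotFound`) can be sketched with plain std traits; the `thiserror` derive mentioned in the comment generates equivalent boilerplate. Names here are illustrative, not hudi-rs's actual API:

```rust
use std::error::Error;
use std::fmt;
use std::num::ParseIntError;

// Hand-rolled version of the proposed error enum; `thiserror` would
// generate the Display/Error/From impls from attributes.
#[derive(Debug)]
pub enum ConfigError {
    NotFound(String),          // which config key was missing
    ParseError(ParseIntError), // underlying parse failure kept as source
    Other(String),
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConfigError::NotFound(key) => write!(f, "config key not found: {key}"),
            ConfigError::ParseError(_) => write!(f, "failed to parse config value"),
            ConfigError::Other(msg) => write!(f, "{msg}"),
        }
    }
}

impl Error for ConfigError {
    // Expose the wrapped ParseIntError through the standard source() chain.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            ConfigError::ParseError(e) => Some(e),
            _ => None,
        }
    }
}

impl From<ParseIntError> for ConfigError {
    fn from(e: ParseIntError) -> Self {
        ConfigError::ParseError(e)
    }
}

// Hypothetical parser for an integer-valued config entry.
fn parse_num_buckets(raw: &str) -> Result<u32, ConfigError> {
    Ok(raw.parse::<u32>()?) // `?` converts via the From impl
}

fn main() {
    assert_eq!(parse_num_buckets("8").unwrap(), 8);
    let err = parse_num_buckets("eight").unwrap_err();
    assert!(err.source().is_some()); // underlying ParseIntError preserved
    let missing = ConfigError::NotFound("hoodie.table.name".to_string());
    assert!(missing.to_string().contains("hoodie.table.name"));
    println!("ok");
}
```

Unlike stringly-typed `anyhow::Error`, callers can match on the variants, and `source()` keeps the full causal chain for logging.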
[jira] [Updated] (HUDI-7888) Throw meaningful error when reading partial update or DV written in 1.x from 0.16.0 reader
[ https://issues.apache.org/jira/browse/HUDI-7888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-7888: -- Fix Version/s: 1.0.0 > Throw meaningful error when reading partial update or DV written in 1.x from > 0.16.0 reader > -- > > Key: HUDI-7888 > URL: https://issues.apache.org/jira/browse/HUDI-7888 > Project: Apache Hudi > Issue Type: Sub-task > Components: reader-core > Reporter: sivabalan narayanan > Assignee: Jonathan Vexler > Priority: Major > Fix For: 1.0.0 > > > If the 0.16.x reader is used to read a 1.x table with partial updates/merges > enabled, we need to throw a meaningful error to the end user.
[jira] [Updated] (HUDI-7888) Throw meaningful error when reading partial update or DV written in 1.x from 0.16.0 reader
[ https://issues.apache.org/jira/browse/HUDI-7888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-7888: -- Description: If the 0.16.x reader is used to read a 1.x table with partial updates/merges enabled, we need to throw a meaningful error to the end user. was: We wanted to support reading 1.x tables in the 0.16.0 reader. If a 1.x table does not have any backwards-incompatible new features enabled, we are good. If not (if someone has enabled the partial update feature or deletion vector support), we should parse and throw a meaningful error from the 0.16.0 reader. Let's also comb for any other additional features in 1.x and throw meaningful errors. > Throw meaningful error when reading partial update or DV written in 1.x from > 0.16.0 reader > -- > > Key: HUDI-7888 > URL: https://issues.apache.org/jira/browse/HUDI-7888 > Project: Apache Hudi > Issue Type: Sub-task > Components: reader-core > Reporter: sivabalan narayanan > Assignee: Jonathan Vexler > Priority: Major > > If the 0.16.x reader is used to read a 1.x table with partial updates/merges > enabled, we need to throw a meaningful error to the end user.
[jira] [Updated] (HUDI-7972) Add fallback for deletion vector in 0.16.x reader while reading 1.x tables
[ https://issues.apache.org/jira/browse/HUDI-7972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-7972: -- Parent: HUDI-7882 Issue Type: Sub-task (was: Improvement) > Add fallback for deletion vector in 0.16.x reader while reading 1.x tables > -- > > Key: HUDI-7972 > URL: https://issues.apache.org/jira/browse/HUDI-7972 > Project: Apache Hudi > Issue Type: Sub-task > Components: reader-core > Reporter: sivabalan narayanan > Priority: Major > Labels: 1.0-migration > Fix For: 1.0.0 > > > If the 0.16.x reader is used to read a 1.x table with deletion vectors, we should > fall back to using key-based merges instead of position-based merges.
[jira] [Created] (HUDI-7972) Add fallback for deletion vector in 0.16.x reader while reading 1.x tables
sivabalan narayanan created HUDI-7972: - Summary: Add fallback for deletion vector in 0.16.x reader while reading 1.x tables Key: HUDI-7972 URL: https://issues.apache.org/jira/browse/HUDI-7972 Project: Apache Hudi Issue Type: Improvement Components: reader-core Reporter: sivabalan narayanan If the 0.16.x reader is used to read a 1.x table with deletion vectors, we should fall back to using key-based merges instead of position-based merges.
[jira] [Assigned] (HUDI-7886) Make metadata payload from 1.x readable in 0.16.0
[ https://issues.apache.org/jira/browse/HUDI-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan reassigned HUDI-7886: Assignee: Lokesh Jain (was: Balaji Varadarajan) > Make metadata payload from 1.x readable in 0.16.0 > - > > Key: HUDI-7886 > URL: https://issues.apache.org/jira/browse/HUDI-7886 > Project: Apache Hudi > Issue Type: Sub-task > Components: metadata > Reporter: sivabalan narayanan > Assignee: Lokesh Jain > Priority: Major > > We wanted to support reading 1.x tables in the 0.16.0 reader. > > So, let's port over all metadata payload schema changes to 0.16.0.
[jira] [Updated] (HUDI-7865) Pull table properties changes in bridge release
[ https://issues.apache.org/jira/browse/HUDI-7865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-7865: -- Parent: HUDI-7882 Issue Type: Sub-task (was: Task) > Pull table properties changes in bridge release > --- > > Key: HUDI-7865 > URL: https://issues.apache.org/jira/browse/HUDI-7865 > Project: Apache Hudi > Issue Type: Sub-task > Reporter: Sagar Sumit > Assignee: Balaji Varadarajan > Priority: Major > Fix For: 0.16.0, 1.0.0 > > > In 1.0.0, we changed some table properties to have enums as values instead of > class names, and then added infer functions. The scope of this task is to > ensure that the bridge release is able to read hoodie.properties written > by 1.0.0. > a. Payload enum change reference - > [https://github.com/apache/hudi/pull/9590/files] > b. hoodie.record.merge.mode : ref links : #9894, #11439.
[jira] [Updated] (HUDI-7888) Throw meaningful error when reading partial update or DV written in 1.x from 0.16.0 reader
[ https://issues.apache.org/jira/browse/HUDI-7888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-7888: -- Epic Link: (was: HUDI-7856) > Throw meaningful error when reading partial update or DV written in 1.x from > 0.16.0 reader > -- > > Key: HUDI-7888 > URL: https://issues.apache.org/jira/browse/HUDI-7888 > Project: Apache Hudi > Issue Type: Sub-task > Components: reader-core > Reporter: sivabalan narayanan > Assignee: Jonathan Vexler > Priority: Major > > We wanted to support reading 1.x tables in the 0.16.0 reader. > > If a 1.x table does not have any backwards-incompatible new features enabled, we are good. If not (if someone has enabled the partial update > feature or deletion vector support), we should parse and throw a meaningful > error from the 0.16.0 reader. Let's also comb for any other additional features in > 1.x and throw meaningful errors.
[jira] [Updated] (HUDI-7865) Pull table properties changes in bridge release
[ https://issues.apache.org/jira/browse/HUDI-7865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-7865: -- Epic Link: (was: HUDI-7856) > Pull table properties changes in bridge release > --- > > Key: HUDI-7865 > URL: https://issues.apache.org/jira/browse/HUDI-7865 > Project: Apache Hudi > Issue Type: Sub-task > Reporter: Sagar Sumit > Assignee: Balaji Varadarajan > Priority: Major > Fix For: 0.16.0, 1.0.0 > > > In 1.0.0, we changed some table properties to have enums as values instead of > class names, and then added infer functions. The scope of this task is to > ensure that the bridge release is able to read hoodie.properties written > by 1.0.0. > a. Payload enum change reference - > [https://github.com/apache/hudi/pull/9590/files] > b. hoodie.record.merge.mode : ref links : #9894, #11439.
[jira] [Updated] (HUDI-7888) Throw meaningful error when reading partial update or DV written in 1.x from 0.16.0 reader
[ https://issues.apache.org/jira/browse/HUDI-7888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-7888: -- Parent: HUDI-7882 Issue Type: Sub-task (was: Improvement) > Throw meaningful error when reading partial update or DV written in 1.x from > 0.16.0 reader > -- > > Key: HUDI-7888 > URL: https://issues.apache.org/jira/browse/HUDI-7888 > Project: Apache Hudi > Issue Type: Sub-task > Components: reader-core > Reporter: sivabalan narayanan > Assignee: Jonathan Vexler > Priority: Major > > We wanted to support reading 1.x tables in the 0.16.0 reader. > > If a 1.x table does not have any backwards-incompatible new features enabled, we are good. If not (if someone has enabled the partial update > feature or deletion vector support), we should parse and throw a meaningful > error from the 0.16.0 reader. Let's also comb for any other additional features in > 1.x and throw meaningful errors.
[jira] [Updated] (HUDI-7972) Add fallback for deletion vector in 0.16.x reader while reading 1.x tables
[ https://issues.apache.org/jira/browse/HUDI-7972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-7972: -- Fix Version/s: 1.0.0 > Add fallback for deletion vector in 0.16.x reader while reading 1.x tables > -- > > Key: HUDI-7972 > URL: https://issues.apache.org/jira/browse/HUDI-7972 > Project: Apache Hudi > Issue Type: Improvement > Components: reader-core > Reporter: sivabalan narayanan > Priority: Major > Labels: 1.0-migration > Fix For: 1.0.0 > > > If the 0.16.x reader is used to read a 1.x table with deletion vectors, we should > fall back to using key-based merges instead of position-based merges.
[jira] [Updated] (HUDI-7972) Add fallback for deletion vector in 0.16.x reader while reading 1.x tables
[ https://issues.apache.org/jira/browse/HUDI-7972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-7972: -- Labels: 1.0-migration (was: ) > Add fallback for deletion vector in 0.16.x reader while reading 1.x tables > -- > > Key: HUDI-7972 > URL: https://issues.apache.org/jira/browse/HUDI-7972 > Project: Apache Hudi > Issue Type: Improvement > Components: reader-core > Reporter: sivabalan narayanan > Priority: Major > Labels: 1.0-migration > > > If the 0.16.x reader is used to read a 1.x table with deletion vectors, we should > fall back to using key-based merges instead of position-based merges.
[jira] [Assigned] (HUDI-7886) Make metadata payload from 1.x readable in 0.16.0
[ https://issues.apache.org/jira/browse/HUDI-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan reassigned HUDI-7886: Assignee: Balaji Varadarajan (was: Lokesh Jain) > Make metadata payload from 1.x readable in 0.16.0 > - > > Key: HUDI-7886 > URL: https://issues.apache.org/jira/browse/HUDI-7886 > Project: Apache Hudi > Issue Type: Sub-task > Components: metadata > Reporter: sivabalan narayanan > Assignee: Balaji Varadarajan > Priority: Major > > We wanted to support reading 1.x tables in the 0.16.0 reader. > > So, let's port over all metadata payload schema changes to 0.16.0.
[jira] [Updated] (HUDI-7887) Any log format header types changes need to be ported to 0.16.0 from 1.x
[ https://issues.apache.org/jira/browse/HUDI-7887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-7887: -- Epic Link: (was: HUDI-7856) > Any log format header types changes need to be ported to 0.16.0 from 1.x > > > Key: HUDI-7887 > URL: https://issues.apache.org/jira/browse/HUDI-7887 > Project: Apache Hudi > Issue Type: Sub-task > Components: reader-core > Reporter: sivabalan narayanan > Assignee: Jonathan Vexler > Priority: Major > > We wanted to support reading 1.x tables in the 0.16.0 reader. > > Port any new log header metadata types introduced in 1.x to 0.16.0.
[jira] [Updated] (HUDI-7887) Any log format header types changes need to be ported to 0.16.0 from 1.x
[ https://issues.apache.org/jira/browse/HUDI-7887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-7887: -- Parent: HUDI-7882 Issue Type: Sub-task (was: Improvement) > Any log format header types changes need to be ported to 0.16.0 from 1.x > > > Key: HUDI-7887 > URL: https://issues.apache.org/jira/browse/HUDI-7887 > Project: Apache Hudi > Issue Type: Sub-task > Components: reader-core > Reporter: sivabalan narayanan > Assignee: Jonathan Vexler > Priority: Major > > We wanted to support reading 1.x tables in the 0.16.0 reader. > > Port any new log header metadata types introduced in 1.x to 0.16.0.
[jira] [Updated] (HUDI-7886) Make metadata payload from 1.x readable in 0.16.0
[ https://issues.apache.org/jira/browse/HUDI-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-7886: -- Parent: HUDI-7882 Issue Type: Sub-task (was: Improvement) > Make metadata payload from 1.x readable in 0.16.0 > - > > Key: HUDI-7886 > URL: https://issues.apache.org/jira/browse/HUDI-7886 > Project: Apache Hudi > Issue Type: Sub-task > Components: metadata > Reporter: sivabalan narayanan > Assignee: Lokesh Jain > Priority: Major > > We wanted to support reading 1.x tables in the 0.16.0 reader. > > So, let's port over all metadata payload schema changes to 0.16.0.
[jira] [Updated] (HUDI-7886) Make metadata payload from 1.x readable in 0.16.0
[ https://issues.apache.org/jira/browse/HUDI-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-7886:
--------------------------------------
    Epic Link:  (was: HUDI-7856)

> Make metadata payload from 1.x readable in 0.16.0
> -------------------------------------------------
>
>                 Key: HUDI-7886
>                 URL: https://issues.apache.org/jira/browse/HUDI-7886
>             Project: Apache Hudi
>          Issue Type: Sub-task
>          Components: metadata
>            Reporter: sivabalan narayanan
>            Assignee: Lokesh Jain
>            Priority: Major
>
> We wanted to support reading 1.x tables in the 0.16.0 reader.
>
> So, let's port all metadata payload schema changes to 0.16.0.
[jira] [Updated] (HUDI-7971) Test and Certify 0.14.x to 0.16.x tables are readable in 1.x Hudi reader
[ https://issues.apache.org/jira/browse/HUDI-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-7971:
--------------------------------------
    Epic Link:  (was: HUDI-7856)

> Test and Certify 0.14.x to 0.16.x tables are readable in 1.x Hudi reader
> -------------------------------------------------------------------------
>
>                 Key: HUDI-7971
>                 URL: https://issues.apache.org/jira/browse/HUDI-7971
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: sivabalan narayanan
>            Priority: Major
>             Fix For: 1.0.0
>
> Let's ensure the 1.x reader is fully compatible with reading any 0.14.x to 0.16.x tables.
>
> Readers: 1.x
> # Spark SQL
> # Spark Datasource
> # Trino/Presto
> # Hive
> # Flink
>
> Writer: 0.16
>
> Table state:
> * COW
> * Pending clustering
> * Completed clustering
> * Failed writes with no rollbacks
> * Insert overwrite table/partition
> * Savepoint for time-travel query
> * MOR
> * Same as COW
> * Pending and completed async compaction (with log files and no base file)
> * Custom payloads (for MOR snapshot queries) (e.g. SQL Expression Payload)
> * Rollback formats - DELETE, rollback block
>
> Other knobs:
> # Metadata enabled/disabled
> # Column stats enabled/disabled and data-skipping enabled/disabled
> # RLI enabled with eq/IN queries
> # Non-partitioned dataset
> # CDC reads
> # Incremental reads
> # Time-travel query
>
> What to test?
> # Query results correctness
> # Performance: see the benefit of
> ## Partition pruning
> ## Metadata table - col stats, RLI
>
> Corner case testing:
> # Schema evolution with different file groups having different generations of schema
> # Dynamic partition pruning
> # Does column projection work correctly for log file reading?
[jira] [Updated] (HUDI-7971) Test and Certify 0.14.x to 0.16.x tables are readable in 1.x Hudi reader
[ https://issues.apache.org/jira/browse/HUDI-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-7971:
--------------------------------------
        Parent: HUDI-7882
    Issue Type: Sub-task  (was: Test)

> Test and Certify 0.14.x to 0.16.x tables are readable in 1.x Hudi reader
> -------------------------------------------------------------------------
>
>                 Key: HUDI-7971
>                 URL: https://issues.apache.org/jira/browse/HUDI-7971
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: sivabalan narayanan
>            Priority: Major
>             Fix For: 1.0.0
>
> Let's ensure the 1.x reader is fully compatible with reading any 0.14.x to 0.16.x tables.
>
> Readers: 1.x
> # Spark SQL
> # Spark Datasource
> # Trino/Presto
> # Hive
> # Flink
>
> Writer: 0.16
>
> Table state:
> * COW
> * Pending clustering
> * Completed clustering
> * Failed writes with no rollbacks
> * Insert overwrite table/partition
> * Savepoint for time-travel query
> * MOR
> * Same as COW
> * Pending and completed async compaction (with log files and no base file)
> * Custom payloads (for MOR snapshot queries) (e.g. SQL Expression Payload)
> * Rollback formats - DELETE, rollback block
>
> Other knobs:
> # Metadata enabled/disabled
> # Column stats enabled/disabled and data-skipping enabled/disabled
> # RLI enabled with eq/IN queries
> # Non-partitioned dataset
> # CDC reads
> # Incremental reads
> # Time-travel query
>
> What to test?
> # Query results correctness
> # Performance: see the benefit of
> ## Partition pruning
> ## Metadata table - col stats, RLI
>
> Corner case testing:
> # Schema evolution with different file groups having different generations of schema
> # Dynamic partition pruning
> # Does column projection work correctly for log file reading?
[jira] [Created] (HUDI-7971) Test and Certify 0.14.x to 0.16.x tables are readable in 1.x Hudi reader
sivabalan narayanan created HUDI-7971:
-----------------------------------------

             Summary: Test and Certify 0.14.x to 0.16.x tables are readable in 1.x Hudi reader
                 Key: HUDI-7971
                 URL: https://issues.apache.org/jira/browse/HUDI-7971
             Project: Apache Hudi
          Issue Type: Test
            Reporter: sivabalan narayanan

Let's ensure the 1.x reader is fully compatible with reading any 0.14.x to 0.16.x tables.

Readers: 1.x
# Spark SQL
# Spark Datasource
# Trino/Presto
# Hive
# Flink

Writer: 0.16

Table state:
* COW
* Pending clustering
* Completed clustering
* Failed writes with no rollbacks
* Insert overwrite table/partition
* Savepoint for time-travel query
* MOR
* Same as COW
* Pending and completed async compaction (with log files and no base file)
* Custom payloads (for MOR snapshot queries) (e.g. SQL Expression Payload)
* Rollback formats - DELETE, rollback block

Other knobs:
# Metadata enabled/disabled
# Column stats enabled/disabled and data-skipping enabled/disabled
# RLI enabled with eq/IN queries
# Non-partitioned dataset
# CDC reads
# Incremental reads
# Time-travel query

What to test?
# Query results correctness
# Performance: see the benefit of
## Partition pruning
## Metadata table - col stats, RLI

Corner case testing:
# Schema evolution with different file groups having different generations of schema
# Dynamic partition pruning
# Does column projection work correctly for log file reading?
[jira] [Updated] (HUDI-7971) Test and Certify 0.14.x to 0.16.x tables are readable in 1.x Hudi reader
[ https://issues.apache.org/jira/browse/HUDI-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-7971:
--------------------------------------
    Fix Version/s: 1.0.0

> Test and Certify 0.14.x to 0.16.x tables are readable in 1.x Hudi reader
> -------------------------------------------------------------------------
>
>                 Key: HUDI-7971
>                 URL: https://issues.apache.org/jira/browse/HUDI-7971
>             Project: Apache Hudi
>          Issue Type: Test
>            Reporter: sivabalan narayanan
>            Priority: Major
>             Fix For: 1.0.0
>
> Let's ensure the 1.x reader is fully compatible with reading any 0.14.x to 0.16.x tables.
>
> Readers: 1.x
> # Spark SQL
> # Spark Datasource
> # Trino/Presto
> # Hive
> # Flink
>
> Writer: 0.16
>
> Table state:
> * COW
> * Pending clustering
> * Completed clustering
> * Failed writes with no rollbacks
> * Insert overwrite table/partition
> * Savepoint for time-travel query
> * MOR
> * Same as COW
> * Pending and completed async compaction (with log files and no base file)
> * Custom payloads (for MOR snapshot queries) (e.g. SQL Expression Payload)
> * Rollback formats - DELETE, rollback block
>
> Other knobs:
> # Metadata enabled/disabled
> # Column stats enabled/disabled and data-skipping enabled/disabled
> # RLI enabled with eq/IN queries
> # Non-partitioned dataset
> # CDC reads
> # Incremental reads
> # Time-travel query
>
> What to test?
> # Query results correctness
> # Performance: see the benefit of
> ## Partition pruning
> ## Metadata table - col stats, RLI
>
> Corner case testing:
> # Schema evolution with different file groups having different generations of schema
> # Dynamic partition pruning
> # Does column projection work correctly for log file reading?
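The certification plan above enumerates readers, table states, and configuration knobs that all need to be combined. One way to keep such a matrix exhaustive is to generate the cases programmatically — a sketch only, with an illustrative subset of the dimensions listed in the ticket, not Hudi's actual test harness:

```python
from itertools import product

# Illustrative subset of the dimensions from the certification plan above.
readers = ["spark-sql", "spark-datasource", "trino", "hive", "flink"]
table_types = ["COW", "MOR"]
knobs = {
    "metadata_enabled": [True, False],
    "column_stats_enabled": [True, False],
    "partitioned": [True, False],
}

def build_matrix():
    """Yield one test case per combination of reader, table type, and knob settings."""
    knob_names = list(knobs)
    for reader, ttype, values in product(readers, table_types, product(*knobs.values())):
        yield {"reader": reader, "table_type": ttype, **dict(zip(knob_names, values))}

cases = list(build_matrix())
# 5 readers x 2 table types x 2^3 knob combinations = 80 cases
```

Generating the cross product up front makes it easy to see which reader/knob combinations a test run actually covered, instead of hand-picking combinations and silently missing corners.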
[PR] [HUDI-6510] Support compilation on Java 17 [hudi]
CTTY opened a new pull request, #11604:
URL: https://github.com/apache/hudi/pull/11604

### Change Logs

Make Hudi compilable with Java 17.

### Impact

No public-facing API changes.

### Risk level (write none, low, medium or high below)

low

### Documentation Update

_Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none"._

- _The config description must be updated if new configs are added or the default value of the configs are changed_
- _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
Re: [PR] [HUDI-6510] Support compilation on Java 17 [hudi]
hudi-bot commented on PR #11573:
URL: https://github.com/apache/hudi/pull/11573#issuecomment-2219027152

## CI report:

* 9ff2b8fef206bdb1a4e2b3dd61b6e4417db5e41f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24781)
* 03fb589a59c05f9c6f5c4ee99c934bb0b67de617 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24795)

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-2955] Support Hadoop3 [hudi]
hudi-bot commented on PR #11572:
URL: https://github.com/apache/hudi/pull/11572#issuecomment-2219027062

## CI report:

* 2639f581f20ab0b8fddf22d0fcfeb54f164ec346 UNKNOWN
* 830fa27e599f91574af60b01837586a1d3f5764a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24794)
(hudi-rs) branch main updated: docs: update CONTRIBUTING with minor changes (#58)
This is an automated email from the ASF dual-hosted git repository.

xushiyan pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/hudi-rs.git

The following commit(s) were added to refs/heads/main by this push:
     new 78a558f  docs: update CONTRIBUTING with minor changes (#58)
78a558f is described below

commit 78a558f00c8a6c4556db5ee98f26369fd90fabcf
Author: Sagar Sumit
AuthorDate: Wed Jul 10 05:10:56 2024 +0530

    docs: update CONTRIBUTING with minor changes (#58)

    Corrected typos and linked to source files for clarity.

    Co-authored-by: Shiyan Xu <2701446+xushi...@users.noreply.github.com>
---
 CONTRIBUTING.md | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 0698faa..5a451d6 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -25,25 +25,32 @@ platform. This guide will walk you through the process of making your first cont
 
 ## File an issue
 
 Testing and reporting bugs are also valueable contributions. Please follow
-the [issue template](https://github.com/apache/hudi-rs/issues/new?template=bug_report.md) to file bug reports.
+the [issue template](https://github.com/apache/hudi-rs/issues/new?template=bug_report.yml) to file bug reports.
 
 ## Prepare for development
 
 - Install Rust, e.g. as described [here](https://doc.rust-lang.org/cargo/getting-started/installation.html)
-- Have a compatible Python version installed (check `python/pyproject.toml` for current requirement)
+- Have a compatible Python version installed (check [`python/pyproject.toml`](./python/pyproject.toml) for current
+  requirement)
 
 ## Commonly used dev commands
 
-For most of the time, use dev commands specified in `python/Makefile`, it applies to both Python and Rust modules. You
-don't need to
-CD to the root directory and run `cargo` commands.
+For most of the time, use dev commands specified in [`python/Makefile`](./python/Makefile), it applies to both Python
+and Rust modules. You don't need to `cd` to the root directory and run `cargo` commands.
 
 To setup python virtual env, run
 
 ```shell
-make setup-env
+make setup-venv
 ```
 
+> [!NOTE]
+> This will run `python` command to setup the virtual environment. You can either change that to `python3.X`,
+> or simply alias `python` to your local `python3.X` installation, for example:
+> ```shell
+> echo "alias python=/Library/Frameworks/Python.framework/Versions/3.12/bin/python3" >> ~/.zshrc`
+> ```
+
 Once activate virtual env, build the project for development by
 
 ```shell
Re: [PR] docs: update CONTRIBUTING with minor changes [hudi-rs]
xushiyan merged PR #58:
URL: https://github.com/apache/hudi-rs/pull/58
Re: [PR] [HUDI-6510] Support compilation on Java 17 [hudi]
hudi-bot commented on PR #11573:
URL: https://github.com/apache/hudi/pull/11573#issuecomment-2219004260

## CI report:

* 9ff2b8fef206bdb1a4e2b3dd61b6e4417db5e41f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24781)
* 03fb589a59c05f9c6f5c4ee99c934bb0b67de617 UNKNOWN
Re: [PR] [HUDI-2955] Support Hadoop3 [hudi]
hudi-bot commented on PR #11572:
URL: https://github.com/apache/hudi/pull/11572#issuecomment-2219004107

## CI report:

* 2639f581f20ab0b8fddf22d0fcfeb54f164ec346 UNKNOWN
* f714012ecb37f584c7bd6d6656b93096f7f1cc10 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24748)
* 830fa27e599f91574af60b01837586a1d3f5764a UNKNOWN
Re: [PR] [HUDI-7929] fix file name in k8s example [hudi]
hudi-bot commented on PR #11603:
URL: https://github.com/apache/hudi/pull/11603#issuecomment-2218981621

## CI report:

* a275778fb2062747f9b4ada9344e6a8d26d8b438 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24792)
Re: [PR] [HUDI-7915] Spark4 + Hadoop3 [hudi]
hudi-bot commented on PR #11539:
URL: https://github.com/apache/hudi/pull/11539#issuecomment-2218980767

## CI report:

* dac29c7e89201f0ced6d394bf6fd4a5c0622167b UNKNOWN
* 13e49581c971458be6c84c60f69aa595bb7f73fc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24793)
[jira] [Updated] (HUDI-7882) Umbrella ticket for 1.x tables and 0.16.x compatibility
[ https://issues.apache.org/jira/browse/HUDI-7882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-7882:
--------------------------------------
    Description: 
We have 4 major goals with this umbrella ticket:

a. The 1.x reader should be capable of reading any 0.14.x to 0.16.x table for all query types.
b. 0.16.x should be capable of reading 1.x tables for most features.
c. Upgrade 0.16.x to 1.x.
d. Downgrade 1.x to 0.16.0.

We wanted to support reading 1.x tables in the 0.16.0 release, so this umbrella ticket tracks all of the required changes.

RFC in progress: https://github.com/apache/hudi/pull/11514

Changes required to be ported:

0. Creating the 0.16.0 branch
0.a https://issues.apache.org/jira/browse/HUDI-7860 - Completed.

1. Timeline
1.a Hoodie instant parsing should be able to read 1.x instants. https://issues.apache.org/jira/browse/HUDI-7883 - Sagar.
1.b Commit metadata parsing able to handle both JSON and Avro formats; scope might be non-trivial. https://issues.apache.org/jira/browse/HUDI-7866 - Siva.
1.c HoodieDefaultTimeline able to read both timelines based on table version. https://issues.apache.org/jira/browse/HUDI-7884 - Siva.
1.d Reading the LSM timeline using 0.16.0. https://issues.apache.org/jira/browse/HUDI-7890 - Siva.
1.e Ensure the 1.0 MDT timeline is readable by 0.16 - HUDI-7901

2. Table property changes
2.a Table property changes. https://issues.apache.org/jira/browse/HUDI-7885 https://issues.apache.org/jira/browse/HUDI-7865 - LJ

3. MDT table changes
3.a Record positions to RLI. https://issues.apache.org/jira/browse/HUDI-7877 - LJ
3.b MDT payload schema changes. https://issues.apache.org/jira/browse/HUDI-7886 - LJ

4. Log format changes
4.a All metadata header types porting. https://issues.apache.org/jira/browse/HUDI-7887 - Jon
4.b Meaningful error for incompatible features from 1.x. https://issues.apache.org/jira/browse/HUDI-7888 - Jon

5. Log file slice or grouping detection compatibility

6. Tests
6.a Tests to validate that 1.x tables can be read with 0.16.0. https://issues.apache.org/jira/browse/HUDI-7896 - Siva and Sagar.

7. Doc changes
7.a Call out unsupported features in the 0.16.0 reader when reading 1.x tables. https://issues.apache.org/jira/browse/HUDI-7889

  was:
We wanted to support reading 1.x tables in the 0.16.0 release, so this umbrella ticket tracks all of the required changes.

RFC in progress: https://github.com/apache/hudi/pull/11514

Changes required to be ported:

0. Creating the 0.16.0 branch
0.a https://issues.apache.org/jira/browse/HUDI-7860 - Completed.

1. Timeline
1.a Hoodie instant parsing should be able to read 1.x instants. https://issues.apache.org/jira/browse/HUDI-7883 - Sagar.
1.b Commit metadata parsing able to handle both JSON and Avro formats; scope might be non-trivial. https://issues.apache.org/jira/browse/HUDI-7866 - Siva.
1.c HoodieDefaultTimeline able to read both timelines based on table version. https://issues.apache.org/jira/browse/HUDI-7884 - Siva.
1.d Reading the LSM timeline using 0.16.0. https://issues.apache.org/jira/browse/HUDI-7890 - Siva.
1.e Ensure the 1.0 MDT timeline is readable by 0.16 - HUDI-7901

2. Table property changes
2.a Table property changes. https://issues.apache.org/jira/browse/HUDI-7885 https://issues.apache.org/jira/browse/HUDI-7865 - LJ

3. MDT table changes
3.a Record positions to RLI. https://issues.apache.org/jira/browse/HUDI-7877 - LJ
3.b MDT payload schema changes. https://issues.apache.org/jira/browse/HUDI-7886 - LJ

4. Log format changes
4.a All metadata header types porting. https://issues.apache.org/jira/browse/HUDI-7887 - Jon
4.b Meaningful error for incompatible features from 1.x. https://issues.apache.org/jira/browse/HUDI-7888 - Jon

5. Log file slice or grouping detection compatibility

6. Tests
6.a Tests to validate that 1.x tables can be read with 0.16.0. https://issues.apache.org/jira/browse/HUDI-7896 - Siva and Sagar.

7. Doc changes
7.a Call out unsupported features in the 0.16.0 reader when reading 1.x tables. https://issues.apache.org/jira/browse/HUDI-7889

> Umbrella ticket for 1.x tables and 0.16.x compatibility
> -------------------------------------------------------
>
>                 Key: HUDI-7882
>                 URL: https://issues.apache.org/jira/browse/HUDI-7882
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: reader-core
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.16.0, 1.0.0
>
> We have 4 major goals with this umbrella ticket:
> a. The 1.x reader should be capable of reading any 0.14.x to 0.16.x table for all query types.
> b. 0.16.x should be capable of reading 1.x tables for most features.
> c. Upgrade 0.16.x to 1.x.
> d. Downgrade 1.x to 0.16.0.
>
> We wanted to support re
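Item 1.b in the umbrella ticket above (HUDI-7866) requires commit metadata parsing to handle both the JSON serialization used by 0.x and the Avro serialization introduced in 1.x. One common way to bridge two serializations is to sniff the leading bytes before dispatching to a parser — Avro object container files start with the magic bytes `Obj\x01`, while JSON commit metadata starts with `{`. This is only an illustrative sketch of the technique, not Hudi's actual parsing code:

```python
AVRO_MAGIC = b"Obj\x01"  # Avro object container file magic bytes

def detect_commit_metadata_format(raw: bytes) -> str:
    """Guess whether serialized commit metadata is Avro or JSON.

    Illustrative only: a real reader would dispatch to the matching
    deserializer instead of returning a label.
    """
    if raw.startswith(AVRO_MAGIC):
        return "avro"
    if raw.lstrip().startswith(b"{"):
        return "json"
    raise ValueError("Unrecognized commit metadata format")
```

A bridge reader built this way stays agnostic to which writer version produced the instant file, which is exactly the property items 1.a-1.e are after.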
[jira] [Updated] (HUDI-7882) Umbrella ticket for 1.x tables and 0.16.x compatibility
[ https://issues.apache.org/jira/browse/HUDI-7882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-7882:
--------------------------------------
    Summary: Umbrella ticket for 1.x tables and 0.16.x compatibility  (was: Umbrella ticket 1.x tables and 0.16.x compatibility)

> Umbrella ticket for 1.x tables and 0.16.x compatibility
> -------------------------------------------------------
>
>                 Key: HUDI-7882
>                 URL: https://issues.apache.org/jira/browse/HUDI-7882
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: reader-core
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.16.0, 1.0.0
>
> We wanted to support reading 1.x tables in the 0.16.0 release, so this umbrella ticket tracks all of the required changes.
>
> RFC in progress: https://github.com/apache/hudi/pull/11514
>
> Changes required to be ported:
>
> 0. Creating the 0.16.0 branch
> 0.a https://issues.apache.org/jira/browse/HUDI-7860 - Completed.
>
> 1. Timeline
> 1.a Hoodie instant parsing should be able to read 1.x instants. https://issues.apache.org/jira/browse/HUDI-7883 - Sagar.
> 1.b Commit metadata parsing able to handle both JSON and Avro formats; scope might be non-trivial. https://issues.apache.org/jira/browse/HUDI-7866 - Siva.
> 1.c HoodieDefaultTimeline able to read both timelines based on table version. https://issues.apache.org/jira/browse/HUDI-7884 - Siva.
> 1.d Reading the LSM timeline using 0.16.0. https://issues.apache.org/jira/browse/HUDI-7890 - Siva.
> 1.e Ensure the 1.0 MDT timeline is readable by 0.16 - HUDI-7901
>
> 2. Table property changes
> 2.a Table property changes. https://issues.apache.org/jira/browse/HUDI-7885 https://issues.apache.org/jira/browse/HUDI-7865 - LJ
>
> 3. MDT table changes
> 3.a Record positions to RLI. https://issues.apache.org/jira/browse/HUDI-7877 - LJ
> 3.b MDT payload schema changes. https://issues.apache.org/jira/browse/HUDI-7886 - LJ
>
> 4. Log format changes
> 4.a All metadata header types porting. https://issues.apache.org/jira/browse/HUDI-7887 - Jon
> 4.b Meaningful error for incompatible features from 1.x. https://issues.apache.org/jira/browse/HUDI-7888 - Jon
>
> 5. Log file slice or grouping detection compatibility
>
> 6. Tests
> 6.a Tests to validate that 1.x tables can be read with 0.16.0. https://issues.apache.org/jira/browse/HUDI-7896 - Siva and Sagar.
>
> 7. Doc changes
> 7.a Call out unsupported features in the 0.16.0 reader when reading 1.x tables. https://issues.apache.org/jira/browse/HUDI-7889
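Item 1.c in the changes list above (HUDI-7884) calls for HoodieDefaultTimeline to pick a timeline implementation based on the table version recorded in `hoodie.properties`. A minimal sketch of that kind of version-based dispatch — the return labels and the version threshold are illustrative assumptions, not Hudi's real classes or constants:

```python
def load_timeline(table_props: dict) -> str:
    """Pick a timeline implementation from the table version in hoodie.properties.

    Sketch only: the names "OneDotXTimeline"/"LegacyTimeline" and the
    assumption that 1.x tables report a higher version number than 0.x
    tables are illustrative, not Hudi's actual API.
    """
    version = int(table_props.get("hoodie.table.version", 6))
    if version >= 8:  # assume 1.x tables record a newer table version
        return "OneDotXTimeline"
    return "LegacyTimeline"
```

Keeping the dispatch behind a single factory function means the rest of the reader never needs to know which timeline layout it is iterating over.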
[jira] [Updated] (HUDI-7882) Umbrella ticket 1.x tables and 0.16.x compatibility
[ https://issues.apache.org/jira/browse/HUDI-7882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-7882:
--------------------------------------
    Summary: Umbrella ticket 1.x tables and 0.16.x compatibility  (was: Umbrella ticket to track all changes required to support reading 1.x tables with 0.16.0)

> Umbrella ticket 1.x tables and 0.16.x compatibility
> ---------------------------------------------------
>
>                 Key: HUDI-7882
>                 URL: https://issues.apache.org/jira/browse/HUDI-7882
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: reader-core
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.16.0, 1.0.0
>
> We wanted to support reading 1.x tables in the 0.16.0 release, so this umbrella ticket tracks all of the required changes.
>
> RFC in progress: https://github.com/apache/hudi/pull/11514
>
> Changes required to be ported:
>
> 0. Creating the 0.16.0 branch
> 0.a https://issues.apache.org/jira/browse/HUDI-7860 - Completed.
>
> 1. Timeline
> 1.a Hoodie instant parsing should be able to read 1.x instants. https://issues.apache.org/jira/browse/HUDI-7883 - Sagar.
> 1.b Commit metadata parsing able to handle both JSON and Avro formats; scope might be non-trivial. https://issues.apache.org/jira/browse/HUDI-7866 - Siva.
> 1.c HoodieDefaultTimeline able to read both timelines based on table version. https://issues.apache.org/jira/browse/HUDI-7884 - Siva.
> 1.d Reading the LSM timeline using 0.16.0. https://issues.apache.org/jira/browse/HUDI-7890 - Siva.
> 1.e Ensure the 1.0 MDT timeline is readable by 0.16 - HUDI-7901
>
> 2. Table property changes
> 2.a Table property changes. https://issues.apache.org/jira/browse/HUDI-7885 https://issues.apache.org/jira/browse/HUDI-7865 - LJ
>
> 3. MDT table changes
> 3.a Record positions to RLI. https://issues.apache.org/jira/browse/HUDI-7877 - LJ
> 3.b MDT payload schema changes. https://issues.apache.org/jira/browse/HUDI-7886 - LJ
>
> 4. Log format changes
> 4.a All metadata header types porting. https://issues.apache.org/jira/browse/HUDI-7887 - Jon
> 4.b Meaningful error for incompatible features from 1.x. https://issues.apache.org/jira/browse/HUDI-7888 - Jon
>
> 5. Log file slice or grouping detection compatibility
>
> 6. Tests
> 6.a Tests to validate that 1.x tables can be read with 0.16.0. https://issues.apache.org/jira/browse/HUDI-7896 - Siva and Sagar.
>
> 7. Doc changes
> 7.a Call out unsupported features in the 0.16.0 reader when reading 1.x tables. https://issues.apache.org/jira/browse/HUDI-7889
[jira] [Assigned] (HUDI-7865) Pull table properties changes in bridge release
[ https://issues.apache.org/jira/browse/HUDI-7865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan reassigned HUDI-7865:
-----------------------------------------
    Assignee: Balaji Varadarajan  (was: Lokesh Jain)

> Pull table properties changes in bridge release
> -----------------------------------------------
>
>                 Key: HUDI-7865
>                 URL: https://issues.apache.org/jira/browse/HUDI-7865
>             Project: Apache Hudi
>          Issue Type: Task
>            Reporter: Sagar Sumit
>            Assignee: Balaji Varadarajan
>            Priority: Major
>             Fix For: 0.16.0, 1.0.0
>
> In 1.0.0, we changed some table properties to have enums as values instead of class names, and then added infer functions. The scope of this task is to ensure that the bridge release is able to read hoodie.properties written by 1.0.0.
> a. Payload enum change reference - https://github.com/apache/hudi/pull/9590/files
> b. hoodie.record.merge.mode: ref links: #9894, #11439.
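The ticket above describes table properties moving from payload class names to enum values, with infer functions so that older `hoodie.properties` files remain readable. A hedged sketch of such inference — the property names follow Hudi's conventions, but the mapping and fallback value here are illustrative assumptions, not the project's actual infer logic:

```python
# Hypothetical mapping from legacy payload class names to merge-mode enum values.
PAYLOAD_TO_MERGE_MODE = {
    "org.apache.hudi.common.model.OverwriteWithLatestAvroPayload": "COMMIT_TIME_ORDERING",
    "org.apache.hudi.common.model.DefaultHoodieRecordPayload": "EVENT_TIME_ORDERING",
}

def infer_merge_mode(props: dict) -> str:
    """Infer hoodie.record.merge.mode when only a legacy payload class is set.

    An explicitly written merge mode wins; otherwise fall back to the
    mapping above, treating unmapped payload classes as CUSTOM.
    """
    mode = props.get("hoodie.record.merge.mode")
    if mode:
        return mode
    payload = props.get("hoodie.datasource.write.payload.class", "")
    return PAYLOAD_TO_MERGE_MODE.get(payload, "CUSTOM")
```

The key design point for a bridge release is the precedence order: an explicit new-style property always overrides inference, so tables written by either version resolve to a consistent value.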