Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
nsivabalan commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1687133956

## rfc/rfc-78/rfc-78.md: ##
@@ -0,0 +1,339 @@
+
+# RFC-78: [Bridge release for 1.x]
+
+## Proposers
+
+- @nsivabalan
+- @vbalaji
+
+## Approvers
+ - @yihua
+ - @codope
+
+## Status
+
+JIRA: https://issues.apache.org/jira/browse/HUDI-7882
+
+> Please keep the status updated in `rfc/README.md`.
+
+## Abstract
+
+[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful
+re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming
+years. It introduces a lot of differentiating features for Apache Hudi. Feel free to check out the
+[release page](https://hudi.apache.org/releases/release-1.0.0-beta1) for more info. We had beta1 and beta2 releases, which were meant for
+interested developers/users to try out some of the advanced features. But as we are working towards 1.0 GA, we are proposing
+a bridge release (0.16.0) for smoother migration for existing Hudi users.
+
+## Objectives
+The goal is to give users a smooth migration experience from 0.x to 1.0. We plan to have a 0.16.0 bridge release and ask everyone to first migrate to 0.16.0 before they upgrade to 1.x.
+
+A typical organization might have a medallion architecture deployed to run thousands of Hudi pipelines, i.e. bronze, silver and gold layers.
+For this layout of pipelines, here is what a typical migration might look like without a bridge release:
+
+a. Existing pipelines are in 0.15.x. (bronze, silver, gold)
+b. Migrate gold pipelines to 1.x.
+- We need to strictly migrate only gold to 1.x first, because a 0.15.0 reader may not be able to read 1.x Hudi tables. If we migrate any of the silver pipelines to 1.x before migrating the entire gold layer,
+a 0.15.0 reader (gold) might end up reading a 1.x table (silver). This might lead to failures. So, we have to follow a certain order in which we migrate pipelines.
+c. Once all of gold is migrated to 1.x, we can move all of silver to 1.x.
+d. Once all of the gold and silver pipelines are migrated to 1.x, we can finally move all of bronze to 1.x.
+
+In the end, we would have migrated all of the existing Hudi pipelines from 0.15.0 to 1.x.
+But as you can see, the migration needs some coordination. And in a very large organization, we may not always have good control over downstream consumers.
+Hence, coordinating and orchestrating the entire migration workflow might be challenging.
+
+Hence, to ease the migration workflow for 1.x, we are introducing 0.16.0 as a bridge release.
+
+Here are the objectives with this bridge release:
+
+- A 1.x reader should be able to read 0.14.x to 0.16.x tables without any loss in functionality or data inconsistencies.
+- 0.16.x should be able to read 1.x tables, with some limitations. For features ported over from 0.x, there should be no loss in functionality.
+But for new features that were introduced in 1.x, we may not be able to support all of them. We will call out which new features may not work with a 0.16.x reader.
+- In this case, we explicitly request users not to turn on these features until all readers are completely migrated to 1.x, so as not to break any readers.
+
+Connecting back to our example above, let's see what the migration might look like for an existing user.
+
+a. Existing pipelines are in 0.15.x. (bronze, silver, gold)
+b. Migrate pipelines to 0.16.0 (in any order; there are no constraints around which pipeline should be migrated first).
+c. Ensure all pipelines are in 0.16.0 (both readers and writers).
+d. Start migrating pipelines to 1.x in a rolling fashion. At this juncture, we could have a few pipelines in 1.x and a few pipelines in 0.16.0, but since 0.16.x
+can read 1.x tables, we should be ok here. Just do not enable new features like non-blocking concurrency control (NBCC) yet.
+e. Migrate all of the remaining 0.16.0 pipelines to 1.x.
+f. Once all readers and writers are in 1.x, we are good to enable any new features (like NBCC) with 1.x tables.
+
+As you can see, the company/org-wide coordination to migrate gold before migrating silver or bronze is relaxed with the bridge release. The only requirement to keep tabs on
+is to ensure that all pipelines are completely migrated to 0.16.x before starting to migrate to 1.x.
+
+So, here are the objectives of this RFC with the bridge release.
+- A 1.x reader should be able to read 0.14.x to 0.16.x tables without any loss in functionality or data inconsistencies.
+- 0.16.x should be able to read 1.x tables, with some limitations. For features ported over from 0.x, there should be no loss in functionality.
+  But for new features that are being introduced in 1.x, we may not be able to support all of them. Will be calling out which new featur
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
nsivabalan commented on code in PR #11514: URL: https://github.com/apache/hudi/pull/11514#discussion_r1687094555 ## rfc/rfc-78/rfc-78.md: ## @@ -0,0 +1,339 @@ + +# RFC-76: [Bridge release for 1.x] + +## Proposers + +- @nsivabalan +- @vbalaji + +## Approvers + - @yihua + - @codope + +## Status + +JIRA: https://issues.apache.org/jira/browse/HUDI-7882 + +> Please keep the status updated in `rfc/README.md`. + +## Abstract + +[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful +re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming +years. It introduces lot of differentiating features for Apache Hudi. Feel free to checkout the +[release page](https://hudi.apache.org/releases/release-1.0.0-beta1) for more info. We had beta1 and beta2 releases which was meant for +interested developers/users to give a spin on some of the advanced features. But as we are working towards 1.0 GA, we are proposing +a bridge release (0.16.0) for smoother migration for existing hudi users. + +## Objectives +Goal is to have a smooth migration experience for the users from 0.x to 1.0. We plan to have a 0.16.0 bridge release asking everyone to first migrate to 0.16.0 before they can upgrade to 1.x. + +A typical organization might have a medallion architecture deployed to run 1000s of Hudi pipelines i.e. bronze, silver and gold layer. +For this layout of pipelines, here is how a typical migration might look like(w/o a bridge release) + +a. Existing pipelines are in 0.15.x. (bronze, silver, gold) +b. Migrate gold pipelines to 1.x. +- We need to strictly migrate only gold to 1x. Bcoz, a 0.15.0 reader may not be able to read 1.x hudi tables. So, if we migrate any of silver pipelines to 1.x before migrating entire gold layer, we might end up in a situation, +where a 0.15.0 reader (gold) might end up reading 1.x table (silver). This might lead to failures. 
So, we have to follow certain order in which we migrate pipelines. +c. Once all of gold is migrated to 1.x, we can move all of silver to 1.x. +d. Once all of gold and silver pipelines are migrated to 1.x, finally we can move all of bronze to 1.x. + +In the end, we would have migrated all of existing hudi pipelines from 0.15.0 to 1.x. +But as you could see, we need some coordination with which we need to migrate. And in a very large organization, sometimes we may not have good control over downstream consumers. +Hence, coordinating entire migration workflow and orchestrating the same might be challenging. + +Hence to ease the migration workflow for 1.x, we are introducing 0.16.0 as a bridge release. + +Here are the objectives with this bridge release: + +- 1.x reader should be able to read 0.14.x to 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. +But for new features that was introduced in 1.x, we may not be able to support all of them. Will be calling out which new features may not work with 0.16.x reader. +- In this case, we explicitly request users to not turn on these features untill all readers are completely migrated to 1.x so as to not break any readers as applicable. + +Connecting back to our example above, lets see how the migration might look like for an existing user. + +a. Existing pipelines are in 0.15.x. (bronze, silver, gold) +b. Migrate pipelines to 0.16.0 (in any order. we do not have any constraints around which pipeline should be migrated first). +c. Ensure all pipelines are in 0.16.0 (both readers and writers) +d. Start migrating pipelines in a rolling fashion to 1.x. At this juncture, we could have few pipelines in 1.x and few pipelines in 0.16.0. but since 0.16.x +can read 1.x tables, we should be ok here. 
Just that do not enable new features like Non blocking concurrency control yet. +e. Migrate all of 0.16.0 to 1.x version. +f. Once all readers and writers are in 1.x, we are good to enable any new features (like NBCC) with 1.x tables. + +As you could see, company/org wide coordination to migrate gold before migrating silver or bronze is relaxed with the bridge release. Only requirement to keep a tab on, +is to ensure to migrate all pipelines completely to 0.16.x before starting to migrate to 1.x. + +So, here are the objectives of this RFC with the bridge release. Review Comment: Above, I have given the justification for why a bridge release is needed. Here, I am listing the deliverables of this RFC. I am not sure if we can combine the two, but I have taken a stab at it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@h
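The ordering constraint discussed in the quoted RFC text can be sketched in a few lines. This is a minimal illustration and not Hudi code: `can_read`, `safe_to_upgrade`, the `(major, minor)` version tuples, and the assumption that a table's version always equals its writer's version are all hypothetical. The rules encode only what the draft states (1.x reads 0.14.x to 0.16.x tables; 0.16.x reads 1.x tables while 1.x-only features stay off; older 0.x readers may fail on 1.x tables).

```python
# Hypothetical model of the compatibility rules in the quoted RFC draft.
# Versions are (major, minor) tuples; none of these names exist in Hudi.

def can_read(reader, table):
    """Return True if a reader at `reader` can read a table written at `table`."""
    if reader >= (1, 0):
        # RFC objective: a 1.x reader reads 0.14.x to 0.16.x tables.
        # Assumption: it also reads 1.x tables; tables older than 0.14 are not covered.
        return table >= (0, 14)
    if reader >= (0, 16):
        # RFC objective: 0.16.x reads 1.x tables, provided 1.x-only
        # features (for example NBCC) are not enabled yet.
        return True
    # Readers on 0.15.x or older may fail on 1.x tables.
    # Assumption: they read any 0.x table.
    return table < (1, 0)


def safe_to_upgrade(versions, edges, pipeline, target):
    """Check that moving `pipeline` to `target` keeps every downstream read working.

    `versions` maps pipeline name to version; `edges` lists (writer, reader) pairs.
    Assumption: a table's version equals the version of the pipeline writing it.
    """
    proposed = {**versions, pipeline: target}
    return all(can_read(proposed[r], proposed[w]) for w, r in edges)


edges = [("bronze", "silver"), ("silver", "gold")]
on_015 = {"bronze": (0, 15), "silver": (0, 15), "gold": (0, 15)}
on_016 = {"bronze": (0, 16), "silver": (0, 16), "gold": (0, 16)}

# Without the bridge release, only the most downstream layer can move first.
silver_first_without_bridge = safe_to_upgrade(on_015, edges, "silver", (1, 0))
gold_first_without_bridge = safe_to_upgrade(on_015, edges, "gold", (1, 0))

# Once everything is on 0.16.x, any pipeline can move to 1.x.
silver_first_with_bridge = safe_to_upgrade(on_016, edges, "silver", (1, 0))
```

Under these assumptions, upgrading silver before gold is rejected when starting from 0.15.x but accepted when starting from 0.16.x, which is the relaxation the bridge release is meant to provide.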
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
nsivabalan commented on code in PR #11514: URL: https://github.com/apache/hudi/pull/11514#discussion_r1687091052 ## rfc/rfc-78/rfc-78.md: ## @@ -0,0 +1,339 @@ + +# RFC-76: [Bridge release for 1.x] + +## Proposers + +- @nsivabalan +- @vbalaji + +## Approvers + - @yihua + - @codope + +## Status + +JIRA: https://issues.apache.org/jira/browse/HUDI-7882 + +> Please keep the status updated in `rfc/README.md`. + +## Abstract + +[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful +re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming +years. It introduces lot of differentiating features for Apache Hudi. Feel free to checkout the +[release page](https://hudi.apache.org/releases/release-1.0.0-beta1) for more info. We had beta1 and beta2 releases which was meant for +interested developers/users to give a spin on some of the advanced features. But as we are working towards 1.0 GA, we are proposing +a bridge release (0.16.0) for smoother migration for existing hudi users. + +## Objectives +Goal is to have a smooth migration experience for the users from 0.x to 1.0. We plan to have a 0.16.0 bridge release asking everyone to first migrate to 0.16.0 before they can upgrade to 1.x. + +A typical organization might have a medallion architecture deployed to run 1000s of Hudi pipelines i.e. bronze, silver and gold layer. +For this layout of pipelines, here is how a typical migration might look like(w/o a bridge release) + +a. Existing pipelines are in 0.15.x. (bronze, silver, gold) +b. Migrate gold pipelines to 1.x. +- We need to strictly migrate only gold to 1x. Bcoz, a 0.15.0 reader may not be able to read 1.x hudi tables. So, if we migrate any of silver pipelines to 1.x before migrating entire gold layer, we might end up in a situation, +where a 0.15.0 reader (gold) might end up reading 1.x table (silver). This might lead to failures. 
So, we have to follow certain order in which we migrate pipelines. +c. Once all of gold is migrated to 1.x, we can move all of silver to 1.x. +d. Once all of gold and silver pipelines are migrated to 1.x, finally we can move all of bronze to 1.x. + +In the end, we would have migrated all of existing hudi pipelines from 0.15.0 to 1.x. +But as you could see, we need some coordination with which we need to migrate. And in a very large organization, sometimes we may not have good control over downstream consumers. +Hence, coordinating entire migration workflow and orchestrating the same might be challenging. + +Hence to ease the migration workflow for 1.x, we are introducing 0.16.0 as a bridge release. + +Here are the objectives with this bridge release: + +- 1.x reader should be able to read 0.14.x to 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. +But for new features that was introduced in 1.x, we may not be able to support all of them. Will be calling out which new features may not work with 0.16.x reader. +- In this case, we explicitly request users to not turn on these features untill all readers are completely migrated to 1.x so as to not break any readers as applicable. + +Connecting back to our example above, lets see how the migration might look like for an existing user. + +a. Existing pipelines are in 0.15.x. (bronze, silver, gold) +b. Migrate pipelines to 0.16.0 (in any order. we do not have any constraints around which pipeline should be migrated first). +c. Ensure all pipelines are in 0.16.0 (both readers and writers) +d. Start migrating pipelines in a rolling fashion to 1.x. At this juncture, we could have few pipelines in 1.x and few pipelines in 0.16.0. but since 0.16.x +can read 1.x tables, we should be ok here. 
Just that do not enable new features like Non blocking concurrency control yet. +e. Migrate all of 0.16.0 to 1.x version. +f. Once all readers and writers are in 1.x, we are good to enable any new features (like NBCC) with 1.x tables. + +As you could see, company/org wide coordination to migrate gold before migrating silver or bronze is relaxed with the bridge release. Only requirement to keep a tab on, +is to ensure to migrate all pipelines completely to 0.16.x before starting to migrate to 1.x. + +So, here are the objectives of this RFC with the bridge release. +- 1.x reader should be able to read 0.14.x to 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. + But for new features that are being introduced in 1.x, we may not be able to support all of them. Will be calling out which new featur
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
nsivabalan commented on code in PR #11514: URL: https://github.com/apache/hudi/pull/11514#discussion_r1687089711 ## rfc/rfc-78/rfc-78.md: ## @@ -0,0 +1,339 @@ + +# RFC-76: [Bridge release for 1.x] + +## Proposers + +- @nsivabalan +- @vbalaji + +## Approvers + - @yihua + - @codope + +## Status + +JIRA: https://issues.apache.org/jira/browse/HUDI-7882 + +> Please keep the status updated in `rfc/README.md`. + +## Abstract + +[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful +re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming +years. It introduces lot of differentiating features for Apache Hudi. Feel free to checkout the +[release page](https://hudi.apache.org/releases/release-1.0.0-beta1) for more info. We had beta1 and beta2 releases which was meant for +interested developers/users to give a spin on some of the advanced features. But as we are working towards 1.0 GA, we are proposing +a bridge release (0.16.0) for smoother migration for existing hudi users. + +## Objectives +Goal is to have a smooth migration experience for the users from 0.x to 1.0. We plan to have a 0.16.0 bridge release asking everyone to first migrate to 0.16.0 before they can upgrade to 1.x. + +A typical organization might have a medallion architecture deployed to run 1000s of Hudi pipelines i.e. bronze, silver and gold layer. +For this layout of pipelines, here is how a typical migration might look like(w/o a bridge release) + +a. Existing pipelines are in 0.15.x. (bronze, silver, gold) +b. Migrate gold pipelines to 1.x. +- We need to strictly migrate only gold to 1x. Bcoz, a 0.15.0 reader may not be able to read 1.x hudi tables. So, if we migrate any of silver pipelines to 1.x before migrating entire gold layer, we might end up in a situation, +where a 0.15.0 reader (gold) might end up reading 1.x table (silver). This might lead to failures. 
So, we have to follow certain order in which we migrate pipelines. +c. Once all of gold is migrated to 1.x, we can move all of silver to 1.x. +d. Once all of gold and silver pipelines are migrated to 1.x, finally we can move all of bronze to 1.x. + +In the end, we would have migrated all of existing hudi pipelines from 0.15.0 to 1.x. +But as you could see, we need some coordination with which we need to migrate. And in a very large organization, sometimes we may not have good control over downstream consumers. +Hence, coordinating entire migration workflow and orchestrating the same might be challenging. + +Hence to ease the migration workflow for 1.x, we are introducing 0.16.0 as a bridge release. + +Here are the objectives with this bridge release: + +- 1.x reader should be able to read 0.14.x to 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. +But for new features that was introduced in 1.x, we may not be able to support all of them. Will be calling out which new features may not work with 0.16.x reader. +- In this case, we explicitly request users to not turn on these features untill all readers are completely migrated to 1.x so as to not break any readers as applicable. + +Connecting back to our example above, lets see how the migration might look like for an existing user. + +a. Existing pipelines are in 0.15.x. (bronze, silver, gold) +b. Migrate pipelines to 0.16.0 (in any order. we do not have any constraints around which pipeline should be migrated first). +c. Ensure all pipelines are in 0.16.0 (both readers and writers) +d. Start migrating pipelines in a rolling fashion to 1.x. At this juncture, we could have few pipelines in 1.x and few pipelines in 0.16.0. but since 0.16.x +can read 1.x tables, we should be ok here. 
Just that do not enable new features like Non blocking concurrency control yet. +e. Migrate all of 0.16.0 to 1.x version. +f. Once all readers and writers are in 1.x, we are good to enable any new features (like NBCC) with 1.x tables. + +As you could see, company/org wide coordination to migrate gold before migrating silver or bronze is relaxed with the bridge release. Only requirement to keep a tab on, +is to ensure to migrate all pipelines completely to 0.16.x before starting to migrate to 1.x. + +So, here are the objectives of this RFC with the bridge release. +- 1.x reader should be able to read 0.14.x to 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. + But for new features that are being introduced in 1.x, we may not be able to support all of them. Will be calling out which new featur
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
nsivabalan commented on code in PR #11514: URL: https://github.com/apache/hudi/pull/11514#discussion_r1687081729 ## rfc/rfc-78/rfc-78.md: ## @@ -0,0 +1,339 @@ + +# RFC-76: [Bridge release for 1.x] + +## Proposers + +- @nsivabalan +- @vbalaji + +## Approvers + - @yihua + - @codope + +## Status + +JIRA: https://issues.apache.org/jira/browse/HUDI-7882 + +> Please keep the status updated in `rfc/README.md`. + +## Abstract + +[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful +re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming +years. It introduces lot of differentiating features for Apache Hudi. Feel free to checkout the +[release page](https://hudi.apache.org/releases/release-1.0.0-beta1) for more info. We had beta1 and beta2 releases which was meant for +interested developers/users to give a spin on some of the advanced features. But as we are working towards 1.0 GA, we are proposing +a bridge release (0.16.0) for smoother migration for existing hudi users. + +## Objectives +Goal is to have a smooth migration experience for the users from 0.x to 1.0. We plan to have a 0.16.0 bridge release asking everyone to first migrate to 0.16.0 before they can upgrade to 1.x. + +A typical organization might have a medallion architecture deployed to run 1000s of Hudi pipelines i.e. bronze, silver and gold layer. +For this layout of pipelines, here is how a typical migration might look like(w/o a bridge release) + +a. Existing pipelines are in 0.15.x. (bronze, silver, gold) +b. Migrate gold pipelines to 1.x. +- We need to strictly migrate only gold to 1x. Bcoz, a 0.15.0 reader may not be able to read 1.x hudi tables. So, if we migrate any of silver pipelines to 1.x before migrating entire gold layer, we might end up in a situation, +where a 0.15.0 reader (gold) might end up reading 1.x table (silver). This might lead to failures. 
So, we have to follow certain order in which we migrate pipelines. +c. Once all of gold is migrated to 1.x, we can move all of silver to 1.x. +d. Once all of gold and silver pipelines are migrated to 1.x, finally we can move all of bronze to 1.x. + +In the end, we would have migrated all of existing hudi pipelines from 0.15.0 to 1.x. +But as you could see, we need some coordination with which we need to migrate. And in a very large organization, sometimes we may not have good control over downstream consumers. +Hence, coordinating entire migration workflow and orchestrating the same might be challenging. + +Hence to ease the migration workflow for 1.x, we are introducing 0.16.0 as a bridge release. + +Here are the objectives with this bridge release: + +- 1.x reader should be able to read 0.14.x to 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. +But for new features that was introduced in 1.x, we may not be able to support all of them. Will be calling out which new features may not work with 0.16.x reader. +- In this case, we explicitly request users to not turn on these features untill all readers are completely migrated to 1.x so as to not break any readers as applicable. + +Connecting back to our example above, lets see how the migration might look like for an existing user. + +a. Existing pipelines are in 0.15.x. (bronze, silver, gold) +b. Migrate pipelines to 0.16.0 (in any order. we do not have any constraints around which pipeline should be migrated first). +c. Ensure all pipelines are in 0.16.0 (both readers and writers) +d. Start migrating pipelines in a rolling fashion to 1.x. At this juncture, we could have few pipelines in 1.x and few pipelines in 0.16.0. but since 0.16.x +can read 1.x tables, we should be ok here. 
Just that do not enable new features like Non blocking concurrency control yet. +e. Migrate all of 0.16.0 to 1.x version. +f. Once all readers and writers are in 1.x, we are good to enable any new features (like NBCC) with 1.x tables. + +As you could see, company/org wide coordination to migrate gold before migrating silver or bronze is relaxed with the bridge release. Only requirement to keep a tab on, +is to ensure to migrate all pipelines completely to 0.16.x before starting to migrate to 1.x. + +So, here are the objectives of this RFC with the bridge release. +- 1.x reader should be able to read 0.14.x to 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. + But for new features that are being introduced in 1.x, we may not be able to support all of them. Will be calling out which new featur
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
nsivabalan commented on code in PR #11514: URL: https://github.com/apache/hudi/pull/11514#discussion_r1687080897 ## rfc/rfc-78/rfc-78.md: ## @@ -0,0 +1,339 @@ + +# RFC-76: [Bridge release for 1.x] + +## Proposers + +- @nsivabalan +- @vbalaji + +## Approvers + - @yihua + - @codope + +## Status + +JIRA: https://issues.apache.org/jira/browse/HUDI-7882 + +> Please keep the status updated in `rfc/README.md`. + +## Abstract + +[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful +re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming +years. It introduces lot of differentiating features for Apache Hudi. Feel free to checkout the +[release page](https://hudi.apache.org/releases/release-1.0.0-beta1) for more info. We had beta1 and beta2 releases which was meant for +interested developers/users to give a spin on some of the advanced features. But as we are working towards 1.0 GA, we are proposing +a bridge release (0.16.0) for smoother migration for existing hudi users. + +## Objectives +Goal is to have a smooth migration experience for the users from 0.x to 1.0. We plan to have a 0.16.0 bridge release asking everyone to first migrate to 0.16.0 before they can upgrade to 1.x. + +A typical organization might have a medallion architecture deployed to run 1000s of Hudi pipelines i.e. bronze, silver and gold layer. +For this layout of pipelines, here is how a typical migration might look like(w/o a bridge release) + +a. Existing pipelines are in 0.15.x. (bronze, silver, gold) +b. Migrate gold pipelines to 1.x. +- We need to strictly migrate only gold to 1x. Bcoz, a 0.15.0 reader may not be able to read 1.x hudi tables. So, if we migrate any of silver pipelines to 1.x before migrating entire gold layer, we might end up in a situation, +where a 0.15.0 reader (gold) might end up reading 1.x table (silver). This might lead to failures. 
So, we have to follow certain order in which we migrate pipelines. +c. Once all of gold is migrated to 1.x, we can move all of silver to 1.x. +d. Once all of gold and silver pipelines are migrated to 1.x, finally we can move all of bronze to 1.x. + +In the end, we would have migrated all of existing hudi pipelines from 0.15.0 to 1.x. +But as you could see, we need some coordination with which we need to migrate. And in a very large organization, sometimes we may not have good control over downstream consumers. +Hence, coordinating entire migration workflow and orchestrating the same might be challenging. + +Hence to ease the migration workflow for 1.x, we are introducing 0.16.0 as a bridge release. + +Here are the objectives with this bridge release: + +- 1.x reader should be able to read 0.14.x to 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. +But for new features that was introduced in 1.x, we may not be able to support all of them. Will be calling out which new features may not work with 0.16.x reader. +- In this case, we explicitly request users to not turn on these features untill all readers are completely migrated to 1.x so as to not break any readers as applicable. + +Connecting back to our example above, lets see how the migration might look like for an existing user. + +a. Existing pipelines are in 0.15.x. (bronze, silver, gold) +b. Migrate pipelines to 0.16.0 (in any order. we do not have any constraints around which pipeline should be migrated first). +c. Ensure all pipelines are in 0.16.0 (both readers and writers) +d. Start migrating pipelines in a rolling fashion to 1.x. At this juncture, we could have few pipelines in 1.x and few pipelines in 0.16.0. but since 0.16.x +can read 1.x tables, we should be ok here. 
Just that do not enable new features like Non blocking concurrency control yet. +e. Migrate all of 0.16.0 to 1.x version. +f. Once all readers and writers are in 1.x, we are good to enable any new features (like NBCC) with 1.x tables. + +As you could see, company/org wide coordination to migrate gold before migrating silver or bronze is relaxed with the bridge release. Only requirement to keep a tab on, +is to ensure to migrate all pipelines completely to 0.16.x before starting to migrate to 1.x. + +So, here are the objectives of this RFC with the bridge release. +- 1.x reader should be able to read 0.14.x to 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. + But for new features that are being introduced in 1.x, we may not be able to support all of them. Will be calling out which new featur
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
nsivabalan commented on code in PR #11514: URL: https://github.com/apache/hudi/pull/11514#discussion_r1687080177 ## rfc/rfc-78/rfc-78.md: ## @@ -0,0 +1,339 @@ + +# RFC-76: [Bridge release for 1.x] + +## Proposers + +- @nsivabalan +- @vbalaji + +## Approvers + - @yihua + - @codope + +## Status + +JIRA: https://issues.apache.org/jira/browse/HUDI-7882 + +> Please keep the status updated in `rfc/README.md`. + +## Abstract + +[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful +re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming +years. It introduces lot of differentiating features for Apache Hudi. Feel free to checkout the +[release page](https://hudi.apache.org/releases/release-1.0.0-beta1) for more info. We had beta1 and beta2 releases which was meant for +interested developers/users to give a spin on some of the advanced features. But as we are working towards 1.0 GA, we are proposing +a bridge release (0.16.0) for smoother migration for existing hudi users. + +## Objectives +Goal is to have a smooth migration experience for the users from 0.x to 1.0. We plan to have a 0.16.0 bridge release asking everyone to first migrate to 0.16.0 before they can upgrade to 1.x. + +A typical organization might have a medallion architecture deployed to run 1000s of Hudi pipelines i.e. bronze, silver and gold layer. +For this layout of pipelines, here is how a typical migration might look like(w/o a bridge release) + +a. Existing pipelines are in 0.15.x. (bronze, silver, gold) +b. Migrate gold pipelines to 1.x. +- We need to strictly migrate only gold to 1x. Bcoz, a 0.15.0 reader may not be able to read 1.x hudi tables. So, if we migrate any of silver pipelines to 1.x before migrating entire gold layer, we might end up in a situation, +where a 0.15.0 reader (gold) might end up reading 1.x table (silver). This might lead to failures. 
So, we have to follow certain order in which we migrate pipelines. +c. Once all of gold is migrated to 1.x, we can move all of silver to 1.x. +d. Once all of gold and silver pipelines are migrated to 1.x, finally we can move all of bronze to 1.x. + +In the end, we would have migrated all of existing hudi pipelines from 0.15.0 to 1.x. +But as you could see, we need some coordination with which we need to migrate. And in a very large organization, sometimes we may not have good control over downstream consumers. +Hence, coordinating entire migration workflow and orchestrating the same might be challenging. + +Hence to ease the migration workflow for 1.x, we are introducing 0.16.0 as a bridge release. + +Here are the objectives with this bridge release: + +- 1.x reader should be able to read 0.14.x to 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. +But for new features that was introduced in 1.x, we may not be able to support all of them. Will be calling out which new features may not work with 0.16.x reader. +- In this case, we explicitly request users to not turn on these features untill all readers are completely migrated to 1.x so as to not break any readers as applicable. + +Connecting back to our example above, lets see how the migration might look like for an existing user. + +a. Existing pipelines are in 0.15.x. (bronze, silver, gold) +b. Migrate pipelines to 0.16.0 (in any order. we do not have any constraints around which pipeline should be migrated first). +c. Ensure all pipelines are in 0.16.0 (both readers and writers) +d. Start migrating pipelines in a rolling fashion to 1.x. At this juncture, we could have few pipelines in 1.x and few pipelines in 0.16.0. but since 0.16.x +can read 1.x tables, we should be ok here. 
Just that do not enable new features like Non blocking concurrency control yet. +e. Migrate all of 0.16.0 to 1.x version. +f. Once all readers and writers are in 1.x, we are good to enable any new features (like NBCC) with 1.x tables. + +As you could see, company/org wide coordination to migrate gold before migrating silver or bronze is relaxed with the bridge release. Only requirement to keep a tab on, +is to ensure to migrate all pipelines completely to 0.16.x before starting to migrate to 1.x. + +So, here are the objectives of this RFC with the bridge release. +- 1.x reader should be able to read 0.14.x to 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. + But for new features that are being introduced in 1.x, we may not be able to support all of them. Will be calling out which new featur
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
nsivabalan commented on code in PR #11514: URL: https://github.com/apache/hudi/pull/11514#discussion_r1681068587 ## rfc/rfc-78/rfc-78.md: ## @@ -0,0 +1,339 @@ + +# RFC-76: [Bridge release for 1.x] + +## Proposers + +- @nsivabalan +- @vbalaji + +## Approvers + - @yihua + - @codope + +## Status + +JIRA: https://issues.apache.org/jira/browse/HUDI-7882 + +> Please keep the status updated in `rfc/README.md`. + +## Abstract + +[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful +re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming +years. It introduces lot of differentiating features for Apache Hudi. Feel free to checkout the +[release page](https://hudi.apache.org/releases/release-1.0.0-beta1) for more info. We had beta1 and beta2 releases which was meant for +interested developers/users to give a spin on some of the advanced features. But as we are working towards 1.0 GA, we are proposing +a bridge release (0.16.0) for smoother migration for existing hudi users. + +## Objectives +Goal is to have a smooth migration experience for the users from 0.x to 1.0. We plan to have a 0.16.0 bridge release asking everyone to first migrate to 0.16.0 before they can upgrade to 1.x. + +A typical organization might have a medallion architecture deployed to run 1000s of Hudi pipelines i.e. bronze, silver and gold layer. +For this layout of pipelines, here is how a typical migration might look like(w/o a bridge release) + +a. Existing pipelines are in 0.15.x. (bronze, silver, gold) +b. Migrate gold pipelines to 1.x. +- We need to strictly migrate only gold to 1x. Bcoz, a 0.15.0 reader may not be able to read 1.x hudi tables. So, if we migrate any of silver pipelines to 1.x before migrating entire gold layer, we might end up in a situation, +where a 0.15.0 reader (gold) might end up reading 1.x table (silver). This might lead to failures. 
So, we have to follow certain order in which we migrate pipelines. +c. Once all of gold is migrated to 1.x, we can move all of silver to 1.x. +d. Once all of gold and silver pipelines are migrated to 1.x, finally we can move all of bronze to 1.x. Review Comment: Hey @danny0405, I am not sure what you intend by this comment. Here, I am mainly focusing on a happy-path migration, just to give a glimpse of what a typical migration looks like. I think the current content (L46 to L67) conveys that. I feel we don't need to discuss the downgrade scenario here. Let me know what you think.
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
vinothchandar commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1680074674

## rfc/rfc-78/rfc-78.md: ##
@@ -0,0 +1,339 @@
+
+# RFC-76: [Bridge release for 1.x]
+
+## Proposers
+
+- @nsivabalan
+- @vbalaji
+
+## Approvers
+ - @yihua
+ - @codope
+
+## Status
+
+JIRA: https://issues.apache.org/jira/browse/HUDI-7882
+
+> Please keep the status updated in `rfc/README.md`.
+
+## Abstract
+
+[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful
+re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming
+years. It introduces lot of differentiating features for Apache Hudi. Feel free to checkout the
+[release page](https://hudi.apache.org/releases/release-1.0.0-beta1) for more info. We had beta1 and beta2 releases which was meant for
+interested developers/users to give a spin on some of the advanced features. But as we are working towards 1.0 GA, we are proposing
+a bridge release (0.16.0) for smoother migration for existing hudi users.
+
+## Objectives
+Goal is to have a smooth migration experience for the users from 0.x to 1.0. We plan to have a 0.16.0 bridge release asking everyone to first migrate to 0.16.0 before they can upgrade to 1.x.
+
+A typical organization might have a medallion architecture deployed to run 1000s of Hudi pipelines i.e. bronze, silver and gold layer.
+For this layout of pipelines, here is how a typical migration might look like(w/o a bridge release)
+
+a. Existing pipelines are in 0.15.x. (bronze, silver, gold)
+b. Migrate gold pipelines to 1.x.
+- We need to strictly migrate only gold to 1x. Bcoz, a 0.15.0 reader may not be able to read 1.x hudi tables. So, if we migrate any of silver pipelines to 1.x before migrating entire gold layer, we might end up in a situation,
+where a 0.15.0 reader (gold) might end up reading 1.x table (silver). This might lead to failures. So, we have to follow certain order in which we migrate pipelines.
+c. Once all of gold is migrated to 1.x, we can move all of silver to 1.x.
+d. Once all of gold and silver pipelines are migrated to 1.x, finally we can move all of bronze to 1.x.
+
+In the end, we would have migrated all of existing hudi pipelines from 0.15.0 to 1.x.
+But as you could see, we need some coordination with which we need to migrate. And in a very large organization, sometimes we may not have good control over downstream consumers.
+Hence, coordinating entire migration workflow and orchestrating the same might be challenging.
+
+Hence to ease the migration workflow for 1.x, we are introducing 0.16.0 as a bridge release.
+
+Here are the objectives with this bridge release:
+
+- 1.x reader should be able to read 0.14.x to 0.16.x tables w/o any loss in functionality and no data inconsistencies.
+- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed.
+But for new features that was introduced in 1.x, we may not be able to support all of them. Will be calling out which new features may not work with 0.16.x reader.
+- In this case, we explicitly request users to not turn on these features untill all readers are completely migrated to 1.x so as to not break any readers as applicable.
+
+Connecting back to our example above, lets see how the migration might look like for an existing user.
+
+a. Existing pipelines are in 0.15.x. (bronze, silver, gold)
+b. Migrate pipelines to 0.16.0 (in any order. we do not have any constraints around which pipeline should be migrated first).
+c. Ensure all pipelines are in 0.16.0 (both readers and writers)
+d. Start migrating pipelines in a rolling fashion to 1.x. At this juncture, we could have few pipelines in 1.x and few pipelines in 0.16.0. but since 0.16.x
+can read 1.x tables, we should be ok here. Just that do not enable new features like Non blocking concurrency control yet.
+e. Migrate all of 0.16.0 to 1.x version.
+f. Once all readers and writers are in 1.x, we are good to enable any new features (like NBCC) with 1.x tables.
+
+As you could see, company/org wide coordination to migrate gold before migrating silver or bronze is relaxed with the bridge release. Only requirement to keep a tab on,
+is to ensure to migrate all pipelines completely to 0.16.x before starting to migrate to 1.x.
+
+So, here are the objectives of this RFC with the bridge release.
+- 1.x reader should be able to read 0.14.x to 0.16.x tables w/o any loss in functionality and no data inconsistencies.
+- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed.
+ But for new features that are being introduced in 1.x, we may not be able to support all of them. Will be calling out which new fea
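
The reader/table compatibility rules in the quoted RFC text can be sketched as a small check. This is only an illustration of the stated objectives, not Hudi code: the helper names (`release_line`, `can_read`, `broken_reads`) are hypothetical, each table is assumed to carry the version of the pipeline that writes it, pre-bridge readers are assumed unable to read 1.x tables (the RFC only says they "may not"), and the 0.16.x case assumes 1.x-only features such as NBCC remain disabled.

```python
def release_line(version: str) -> str:
    """Collapse a version such as '0.15.0' to its release line ('0.15'); any 1.* becomes '1.x'."""
    major, minor, *_ = version.split(".")
    return "1.x" if major == "1" else f"{major}.{minor}"


def can_read(reader: str, table: str) -> bool:
    """Apply the RFC's stated objectives (illustrative, not an official matrix)."""
    r, t = release_line(reader), release_line(table)
    if r == "1.x":
        # Objective: a 1.x reader reads 0.14.x to 0.16.x tables (and 1.x tables).
        return t in {"0.14", "0.15", "0.16", "1.x"}
    if r == "0.16":
        # Bridge release: reads 0.x tables and 1.x tables (assumes new 1.x features are off).
        return True
    # Assumption: readers older than the bridge release cannot read 1.x tables.
    return t != "1.x"


def broken_reads(versions: dict[str, str], reads: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return the (reader, upstream) pairs whose reader cannot read the upstream table."""
    return [(r, u) for r, u in reads if not can_read(versions[r], versions[u])]


# Medallion layout: silver reads bronze, gold reads silver.
reads = [("silver", "bronze"), ("gold", "silver")]

# Without the bridge release: upgrading silver before gold breaks the gold reader.
no_bridge = {"bronze": "0.15.0", "silver": "1.0.0", "gold": "0.15.0"}
print(broken_reads(no_bridge, reads))

# With every pipeline on 0.16.0 first, the same out-of-order upgrade is safe.
with_bridge = {"bronze": "0.16.0", "silver": "1.0.0", "gold": "0.16.0"}
print(broken_reads(with_bridge, reads))
```

Under these assumptions the first layout reports the gold-reads-silver edge as broken and the second reports none, which is the coordination problem the bridge release is meant to remove.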
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
nsivabalan commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1676778486

## rfc/rfc-78/rfc-78.md: ##
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
nsivabalan commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1676771229

## rfc/rfc-78/rfc-78.md: ##
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
nsivabalan commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1676770629

## rfc/rfc-78/rfc-78.md: ##
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
nsivabalan commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1676770241

## rfc/rfc-78/rfc-78.md: ##
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
danny0405 commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1675307482

## rfc/rfc-78/rfc-78.md: ##
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
danny0405 commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1675305938

## rfc/rfc-78/rfc-78.md: ##
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
danny0405 commented on code in PR #11514: URL: https://github.com/apache/hudi/pull/11514#discussion_r1675303077 ## rfc/rfc-78/rfc-78.md: ## @@ -0,0 +1,339 @@ + +# RFC-76: [Bridge release for 1.x] + +## Proposers + +- @nsivabalan +- @vbalaji + +## Approvers + - @yihua + - @codope + +## Status + +JIRA: https://issues.apache.org/jira/browse/HUDI-7882 + +> Please keep the status updated in `rfc/README.md`. + +## Abstract + +[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful +re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming +years. It introduces lot of differentiating features for Apache Hudi. Feel free to checkout the +[release page](https://hudi.apache.org/releases/release-1.0.0-beta1) for more info. We had beta1 and beta2 releases which was meant for +interested developers/users to give a spin on some of the advanced features. But as we are working towards 1.0 GA, we are proposing +a bridge release (0.16.0) for smoother migration for existing hudi users. + +## Objectives +Goal is to have a smooth migration experience for the users from 0.x to 1.0. We plan to have a 0.16.0 bridge release asking everyone to first migrate to 0.16.0 before they can upgrade to 1.x. + +A typical organization might have a medallion architecture deployed to run 1000s of Hudi pipelines i.e. bronze, silver and gold layer. +For this layout of pipelines, here is how a typical migration might look like(w/o a bridge release) + +a. Existing pipelines are in 0.15.x. (bronze, silver, gold) +b. Migrate gold pipelines to 1.x. +- We need to strictly migrate only gold to 1x. Bcoz, a 0.15.0 reader may not be able to read 1.x hudi tables. So, if we migrate any of silver pipelines to 1.x before migrating entire gold layer, we might end up in a situation, +where a 0.15.0 reader (gold) might end up reading 1.x table (silver). This might lead to failures. 
So, we have to follow certain order in which we migrate pipelines. +c. Once all of gold is migrated to 1.x, we can move all of silver to 1.x. +d. Once all of gold and silver pipelines are migrated to 1.x, finally we can move all of bronze to 1.x. + +In the end, we would have migrated all of existing hudi pipelines from 0.15.0 to 1.x. +But as you could see, we need some coordination with which we need to migrate. And in a very large organization, sometimes we may not have good control over downstream consumers. +Hence, coordinating entire migration workflow and orchestrating the same might be challenging. + +Hence to ease the migration workflow for 1.x, we are introducing 0.16.0 as a bridge release. + +Here are the objectives with this bridge release: + +- 1.x reader should be able to read 0.14.x to 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. +But for new features that was introduced in 1.x, we may not be able to support all of them. Will be calling out which new features may not work with 0.16.x reader. +- In this case, we explicitly request users to not turn on these features untill all readers are completely migrated to 1.x so as to not break any readers as applicable. + +Connecting back to our example above, lets see how the migration might look like for an existing user. + +a. Existing pipelines are in 0.15.x. (bronze, silver, gold) +b. Migrate pipelines to 0.16.0 (in any order. we do not have any constraints around which pipeline should be migrated first). +c. Ensure all pipelines are in 0.16.0 (both readers and writers) +d. Start migrating pipelines in a rolling fashion to 1.x. At this juncture, we could have few pipelines in 1.x and few pipelines in 0.16.0. but since 0.16.x +can read 1.x tables, we should be ok here. 
Just that do not enable new features like Non blocking concurrency control yet. +e. Migrate all of 0.16.0 to 1.x version. +f. Once all readers and writers are in 1.x, we are good to enable any new features (like NBCC) with 1.x tables. + +As you could see, company/org wide coordination to migrate gold before migrating silver or bronze is relaxed with the bridge release. Only requirement to keep a tab on, +is to ensure to migrate all pipelines completely to 0.16.x before starting to migrate to 1.x. + +So, here are the objectives of this RFC with the bridge release. +- 1.x reader should be able to read 0.14.x to 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. + But for new features that are being introduced in 1.x, we may not be able to support all of them. Will be calling out which new feature
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
danny0405 commented on code in PR #11514: URL: https://github.com/apache/hudi/pull/11514#discussion_r1675301492 ## rfc/rfc-78/rfc-78.md: ## @@ -0,0 +1,339 @@ + +# RFC-76: [Bridge release for 1.x] + +## Proposers + +- @nsivabalan +- @vbalaji + +## Approvers + - @yihua + - @codope + +## Status + +JIRA: https://issues.apache.org/jira/browse/HUDI-7882 + +> Please keep the status updated in `rfc/README.md`. + +## Abstract + +[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful +re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming +years. It introduces lot of differentiating features for Apache Hudi. Feel free to checkout the +[release page](https://hudi.apache.org/releases/release-1.0.0-beta1) for more info. We had beta1 and beta2 releases which was meant for +interested developers/users to give a spin on some of the advanced features. But as we are working towards 1.0 GA, we are proposing +a bridge release (0.16.0) for smoother migration for existing hudi users. + +## Objectives +Goal is to have a smooth migration experience for the users from 0.x to 1.0. We plan to have a 0.16.0 bridge release asking everyone to first migrate to 0.16.0 before they can upgrade to 1.x. + +A typical organization might have a medallion architecture deployed to run 1000s of Hudi pipelines i.e. bronze, silver and gold layer. +For this layout of pipelines, here is how a typical migration might look like(w/o a bridge release) + +a. Existing pipelines are in 0.15.x. (bronze, silver, gold) +b. Migrate gold pipelines to 1.x. +- We need to strictly migrate only gold to 1x. Bcoz, a 0.15.0 reader may not be able to read 1.x hudi tables. So, if we migrate any of silver pipelines to 1.x before migrating entire gold layer, we might end up in a situation, +where a 0.15.0 reader (gold) might end up reading 1.x table (silver). This might lead to failures. 
So, we have to follow certain order in which we migrate pipelines. +c. Once all of gold is migrated to 1.x, we can move all of silver to 1.x. +d. Once all of gold and silver pipelines are migrated to 1.x, finally we can move all of bronze to 1.x. + +In the end, we would have migrated all of existing hudi pipelines from 0.15.0 to 1.x. +But as you could see, we need some coordination with which we need to migrate. And in a very large organization, sometimes we may not have good control over downstream consumers. +Hence, coordinating entire migration workflow and orchestrating the same might be challenging. + +Hence to ease the migration workflow for 1.x, we are introducing 0.16.0 as a bridge release. + +Here are the objectives with this bridge release: + +- 1.x reader should be able to read 0.14.x to 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. +But for new features that was introduced in 1.x, we may not be able to support all of them. Will be calling out which new features may not work with 0.16.x reader. +- In this case, we explicitly request users to not turn on these features untill all readers are completely migrated to 1.x so as to not break any readers as applicable. + +Connecting back to our example above, lets see how the migration might look like for an existing user. + +a. Existing pipelines are in 0.15.x. (bronze, silver, gold) +b. Migrate pipelines to 0.16.0 (in any order. we do not have any constraints around which pipeline should be migrated first). +c. Ensure all pipelines are in 0.16.0 (both readers and writers) +d. Start migrating pipelines in a rolling fashion to 1.x. At this juncture, we could have few pipelines in 1.x and few pipelines in 0.16.0. but since 0.16.x +can read 1.x tables, we should be ok here. 
Just that do not enable new features like Non blocking concurrency control yet. +e. Migrate all of 0.16.0 to 1.x version. +f. Once all readers and writers are in 1.x, we are good to enable any new features (like NBCC) with 1.x tables. + +As you could see, company/org wide coordination to migrate gold before migrating silver or bronze is relaxed with the bridge release. Only requirement to keep a tab on, +is to ensure to migrate all pipelines completely to 0.16.x before starting to migrate to 1.x. + +So, here are the objectives of this RFC with the bridge release. +- 1.x reader should be able to read 0.14.x to 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. + But for new features that are being introduced in 1.x, we may not be able to support all of them. Will be calling out which new feature
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
danny0405 commented on code in PR #11514: URL: https://github.com/apache/hudi/pull/11514#discussion_r1675296125 ## rfc/rfc-78/rfc-78.md: ## @@ -0,0 +1,339 @@ + +# RFC-76: [Bridge release for 1.x] + +## Proposers + +- @nsivabalan +- @vbalaji + +## Approvers + - @yihua + - @codope + +## Status + +JIRA: https://issues.apache.org/jira/browse/HUDI-7882 + +> Please keep the status updated in `rfc/README.md`. + +## Abstract + +[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful +re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming +years. It introduces lot of differentiating features for Apache Hudi. Feel free to checkout the +[release page](https://hudi.apache.org/releases/release-1.0.0-beta1) for more info. We had beta1 and beta2 releases which was meant for +interested developers/users to give a spin on some of the advanced features. But as we are working towards 1.0 GA, we are proposing +a bridge release (0.16.0) for smoother migration for existing hudi users. + +## Objectives +Goal is to have a smooth migration experience for the users from 0.x to 1.0. We plan to have a 0.16.0 bridge release asking everyone to first migrate to 0.16.0 before they can upgrade to 1.x. + +A typical organization might have a medallion architecture deployed to run 1000s of Hudi pipelines i.e. bronze, silver and gold layer. +For this layout of pipelines, here is how a typical migration might look like(w/o a bridge release) + +a. Existing pipelines are in 0.15.x. (bronze, silver, gold) +b. Migrate gold pipelines to 1.x. +- We need to strictly migrate only gold to 1x. Bcoz, a 0.15.0 reader may not be able to read 1.x hudi tables. So, if we migrate any of silver pipelines to 1.x before migrating entire gold layer, we might end up in a situation, +where a 0.15.0 reader (gold) might end up reading 1.x table (silver). This might lead to failures. 
So, we have to follow certain order in which we migrate pipelines. +c. Once all of gold is migrated to 1.x, we can move all of silver to 1.x. +d. Once all of gold and silver pipelines are migrated to 1.x, finally we can move all of bronze to 1.x. + +In the end, we would have migrated all of existing hudi pipelines from 0.15.0 to 1.x. +But as you could see, we need some coordination with which we need to migrate. And in a very large organization, sometimes we may not have good control over downstream consumers. +Hence, coordinating entire migration workflow and orchestrating the same might be challenging. + +Hence to ease the migration workflow for 1.x, we are introducing 0.16.0 as a bridge release. + +Here are the objectives with this bridge release: + +- 1.x reader should be able to read 0.14.x to 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. +But for new features that was introduced in 1.x, we may not be able to support all of them. Will be calling out which new features may not work with 0.16.x reader. +- In this case, we explicitly request users to not turn on these features untill all readers are completely migrated to 1.x so as to not break any readers as applicable. + +Connecting back to our example above, lets see how the migration might look like for an existing user. + +a. Existing pipelines are in 0.15.x. (bronze, silver, gold) +b. Migrate pipelines to 0.16.0 (in any order. we do not have any constraints around which pipeline should be migrated first). +c. Ensure all pipelines are in 0.16.0 (both readers and writers) +d. Start migrating pipelines in a rolling fashion to 1.x. At this juncture, we could have few pipelines in 1.x and few pipelines in 0.16.0. but since 0.16.x +can read 1.x tables, we should be ok here. 
Just that do not enable new features like Non blocking concurrency control yet. +e. Migrate all of 0.16.0 to 1.x version. +f. Once all readers and writers are in 1.x, we are good to enable any new features (like NBCC) with 1.x tables. + +As you could see, company/org wide coordination to migrate gold before migrating silver or bronze is relaxed with the bridge release. Only requirement to keep a tab on, +is to ensure to migrate all pipelines completely to 0.16.x before starting to migrate to 1.x. + +So, here are the objectives of this RFC with the bridge release. +- 1.x reader should be able to read 0.14.x to 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. + But for new features that are being introduced in 1.x, we may not be able to support all of them. Will be calling out which new feature
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
danny0405 commented on code in PR #11514: URL: https://github.com/apache/hudi/pull/11514#discussion_r1675266281 ## rfc/rfc-78/rfc-78.md: ## @@ -0,0 +1,339 @@ + +# RFC-76: [Bridge release for 1.x] + +## Proposers + +- @nsivabalan +- @vbalaji + +## Approvers + - @yihua + - @codope + +## Status + +JIRA: https://issues.apache.org/jira/browse/HUDI-7882 + +> Please keep the status updated in `rfc/README.md`. + +## Abstract + +[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful +re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming +years. It introduces lot of differentiating features for Apache Hudi. Feel free to checkout the +[release page](https://hudi.apache.org/releases/release-1.0.0-beta1) for more info. We had beta1 and beta2 releases which was meant for +interested developers/users to give a spin on some of the advanced features. But as we are working towards 1.0 GA, we are proposing +a bridge release (0.16.0) for smoother migration for existing hudi users. + +## Objectives +Goal is to have a smooth migration experience for the users from 0.x to 1.0. We plan to have a 0.16.0 bridge release asking everyone to first migrate to 0.16.0 before they can upgrade to 1.x. + +A typical organization might have a medallion architecture deployed to run 1000s of Hudi pipelines i.e. bronze, silver and gold layer. +For this layout of pipelines, here is how a typical migration might look like(w/o a bridge release) + +a. Existing pipelines are in 0.15.x. (bronze, silver, gold) +b. Migrate gold pipelines to 1.x. +- We need to strictly migrate only gold to 1x. Bcoz, a 0.15.0 reader may not be able to read 1.x hudi tables. So, if we migrate any of silver pipelines to 1.x before migrating entire gold layer, we might end up in a situation, +where a 0.15.0 reader (gold) might end up reading 1.x table (silver). This might lead to failures. 
So, we have to follow certain order in which we migrate pipelines. +c. Once all of gold is migrated to 1.x, we can move all of silver to 1.x. +d. Once all of gold and silver pipelines are migrated to 1.x, finally we can move all of bronze to 1.x. Review Comment: As you mentioned, there might be issues after the upgrade, so the user may want to downgrade the 1.x table back to 0.x. That leaves us with a scenario where the old reader needs to read the new files (or do we just force a restore to the point where the old table's commits end)? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
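The ordering constraint in the quoted steps a. to d. (without a bridge release, every reader has to be on 1.x before any table it reads is) can be sketched as a small dependency sort. This is only an illustration: the `reads_from` mapping, the layer names used as pipeline names, and the `safe_upgrade_order` helper are hypothetical and are not part of Hudi or of the RFC.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical layout: each pipeline maps to the upstream layers it reads.
reads_from = {
    "gold": {"silver"},
    "silver": {"bronze"},
    "bronze": set(),
}


def safe_upgrade_order(reads_from):
    """Return an order in which every reader is upgraded before what it reads."""
    # TopologicalSorter expects node -> predecessors. A layer may move to 1.x
    # only after all of its readers have, so the readers are its predecessors.
    must_go_first = {name: set() for name in reads_from}
    for reader, sources in reads_from.items():
        for source in sources:
            must_go_first.setdefault(source, set()).add(reader)
    return list(TopologicalSorter(must_go_first).static_order())


order = safe_upgrade_order(reads_from)
print(order)  # ['gold', 'silver', 'bronze']
```

With thousands of pipelines and consumers outside the team's control, building and enforcing such a graph is the coordination cost the RFC text describes; the downgrade question raised in the review comment is a separate concern that this sketch does not model.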
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
nsivabalan commented on code in PR #11514: URL: https://github.com/apache/hudi/pull/11514#discussion_r1671424828 ## rfc/rfc-78/rfc-78.md: ## @@ -0,0 +1,301 @@ + +# RFC-76: [Bridge release for 1.x] + +## Proposers + +- @nsivabalan +- @vbalaji + +## Approvers + - @yihua + - @codope + +## Status + +JIRA: https://issues.apache.org/jira/browse/HUDI-7882 + +> Please keep the status updated in `rfc/README.md`. + +## Abstract + +[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful +re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming +years. It introduces lot of differentiating features for Apache Hudi. Feel free to checkout the +[release page](https://hudi.apache.org/releases/release-1.0.0-beta1) for more info. We had beta1 and beta2 releases which was meant for +interested developers/users to give a spin on some of the advanced features. But as we are working towards 1.0 GA, we are proposing +a bridge release (0.16.0) for smoother migration for existing hudi users. + +## Objectives +Goal is to have a smooth migration experience for the users from 0.x to 1.0. We plan to have a 0.16.0 bridge release asking everyone to first migrate to 0.16.0 before they can upgrade to 1.x. + +A typical organization might have a medallion architecture deployed to run 1000s of Hudi pipelines i.e. bronze, silver and gold layer. +For this layout of pipelines, here is how a typical migration might look like(w/o a bridge release) + +a. Existing pipelines are in 0.15.x. (bronze, silver, gold) +b. Migrate gold pipelines to 1.x. +- We need to strictly migrate only gold to 1x. Bcoz, a 0.15.0 reader may not be able to read 1.x hudi tables. So, if we migrate any of silver pipelines to 1.x before migrating entire gold layer, we might end up in a situation, +where a 0.15.0 reader (gold) might end up reading 1.x table (silver). This might lead to failures. 
So, we have to follow certain order in which we migrate pipelines. +c. Once all of gold is migrated to 1.x, we can move all of silver to 1.x. +d. Once all of gold and silver pipelines are migrated to 1.x, finally we can move all of bronze to 1.x. + +In the end, we would have migrated all of existing hudi pipelines from 0.15.0 to 1.x. +But as you could see, we need some coordination with which we need to migrate. And in a very large organization, sometimes we may not have good control over downstream consumers. +Hence, coordinating entire migration workflow and orchestrating the same might be challenging. + +Hence to ease the migration workflow for 1.x, we are introducing 0.16.0 as a bridge release. + +Here are the objectives with this bridge release: + +- 1.x reader should be able to read 0.14.x to 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. +But for new features that was introduced in 1.x, we may not be able to support all of them. Will be calling out which new features may not work with 0.16.x reader. +- In this case, we explicitly request users to not turn on these features untill all readers are completely migrated to 1.x so as to not break any readers as applicable. + +Connecting back to our example above, lets see how the migration might look like for an existing user. + +a. Existing pipelines are in 0.15.x. (bronze, silver, gold) +b. Migrate pipelines to 0.16.0 (in any order. we do not have any constraints around which pipeline should be migrated first). +c. Ensure all pipelines are in 0.16.0 (both readers and writers) +d. Start migrating pipelines in a rolling fashion to 1.x. At this juncture, we could have few pipelines in 1.x and few pipelines in 0.16.0. but since 0.16.x +can read 1.x tables, we should be ok here. 
Just that do not enable new features like Non blocking concurrency control yet. +e. Migrate all of 0.16.0 to 1.x version. +f. Once all readers and writers are in 1.x, we are good to enable any new features (like NBCC) with 1.x tables. + +As you could see, company/org wide coordination to migrate gold before migrating silver or bronze is relaxed with the bridge release. Only requirement to keep a tab on, +is to ensure to migrate all pipelines completely to 0.16.x before starting to migrate to 1.x. + +So, here are the objectives of this RFC with the bridge release. +- 1.x reader should be able to read 0.14.x to 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. + But for new features that was introduced in 1.x, we may not be able to support all of them. Will be calling out which new features may
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
nsivabalan commented on code in PR #11514: URL: https://github.com/apache/hudi/pull/11514#discussion_r1671419636 ## rfc/rfc-78/rfc-78.md: ## @@ -0,0 +1,301 @@ + +# RFC-76: [Bridge release for 1.x] + +## Proposers + +- @nsivabalan +- @vbalaji + +## Approvers + - @yihua + - @codope + +## Status + +JIRA: https://issues.apache.org/jira/browse/HUDI-7882 + +> Please keep the status updated in `rfc/README.md`. + +## Abstract + +[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful +re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming +years. It introduces lot of differentiating features for Apache Hudi. Feel free to checkout the +[release page](https://hudi.apache.org/releases/release-1.0.0-beta1) for more info. We had beta1 and beta2 releases which was meant for +interested developers/users to give a spin on some of the advanced features. But as we are working towards 1.0 GA, we are proposing +a bridge release (0.16.0) for smoother migration for existing hudi users. + +## Objectives +Goal is to have a smooth migration experience for the users from 0.x to 1.0. We plan to have a 0.16.0 bridge release asking everyone to first migrate to 0.16.0 before they can upgrade to 1.x. + +A typical organization might have a medallion architecture deployed to run 1000s of Hudi pipelines i.e. bronze, silver and gold layer. +For this layout of pipelines, here is how a typical migration might look like(w/o a bridge release) + +a. Existing pipelines are in 0.15.x. (bronze, silver, gold) +b. Migrate gold pipelines to 1.x. +- We need to strictly migrate only gold to 1x. Bcoz, a 0.15.0 reader may not be able to read 1.x hudi tables. So, if we migrate any of silver pipelines to 1.x before migrating entire gold layer, we might end up in a situation, +where a 0.15.0 reader (gold) might end up reading 1.x table (silver). This might lead to failures. 
So, we have to follow certain order in which we migrate pipelines. +c. Once all of gold is migrated to 1.x, we can move all of silver to 1.x. +d. Once all of gold and silver pipelines are migrated to 1.x, finally we can move all of bronze to 1.x. + +In the end, we would have migrated all of existing hudi pipelines from 0.15.0 to 1.x. +But as you could see, we need some coordination with which we need to migrate. And in a very large organization, sometimes we may not have good control over downstream consumers. +Hence, coordinating entire migration workflow and orchestrating the same might be challenging. + +Hence to ease the migration workflow for 1.x, we are introducing 0.16.0 as a bridge release. + +Here are the objectives with this bridge release: + +- 1.x reader should be able to read 0.14.x to 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. +But for new features that was introduced in 1.x, we may not be able to support all of them. Will be calling out which new features may not work with 0.16.x reader. +- In this case, we explicitly request users to not turn on these features untill all readers are completely migrated to 1.x so as to not break any readers as applicable. + +Connecting back to our example above, lets see how the migration might look like for an existing user. + +a. Existing pipelines are in 0.15.x. (bronze, silver, gold) +b. Migrate pipelines to 0.16.0 (in any order. we do not have any constraints around which pipeline should be migrated first). +c. Ensure all pipelines are in 0.16.0 (both readers and writers) +d. Start migrating pipelines in a rolling fashion to 1.x. At this juncture, we could have few pipelines in 1.x and few pipelines in 0.16.0. but since 0.16.x +can read 1.x tables, we should be ok here. 
Just that do not enable new features like Non blocking concurrency control yet. +e. Migrate all of 0.16.0 to 1.x version. +f. Once all readers and writers are in 1.x, we are good to enable any new features (like NBCC) with 1.x tables. + +As you could see, company/org wide coordination to migrate gold before migrating silver or bronze is relaxed with the bridge release. Only requirement to keep a tab on, +is to ensure to migrate all pipelines completely to 0.16.x before starting to migrate to 1.x. + +So, here are the objectives of this RFC with the bridge release. +- 1.x reader should be able to read 0.14.x to 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. + But for new features that was introduced in 1.x, we may not be able to support all of them. Will be calling out which new features may
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
nsivabalan commented on code in PR #11514: URL: https://github.com/apache/hudi/pull/11514#discussion_r1671419636 ## rfc/rfc-78/rfc-78.md: ## @@ -0,0 +1,301 @@ + +# RFC-76: [Bridge release for 1.x] + +## Proposers + +- @nsivabalan +- @vbalaji + +## Approvers + - @yihua + - @codope + +## Status + +JIRA: https://issues.apache.org/jira/browse/HUDI-7882 + +> Please keep the status updated in `rfc/README.md`. + +## Abstract + +[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful +re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming +years. It introduces a lot of differentiating features for Apache Hudi. Feel free to check out the +[release page](https://hudi.apache.org/releases/release-1.0.0-beta1) for more info. We had beta1 and beta2 releases which were meant for +interested developers/users to give some of the advanced features a spin. But as we are working towards 1.0 GA, we are proposing +a bridge release (0.16.0) for smoother migration for existing Hudi users. + +## Objectives +The goal is to have a smooth migration experience for users from 0.x to 1.0. We plan to have a 0.16.0 bridge release, asking everyone to first migrate to 0.16.0 before they can upgrade to 1.x. + +A typical organization might have a medallion architecture deployed to run 1000s of Hudi pipelines, i.e. bronze, silver and gold layers. +For this layout of pipelines, here is how a typical migration might look (w/o a bridge release): + +a. Existing pipelines are in 0.15.x. (bronze, silver, gold) +b. Migrate gold pipelines to 1.x. +- We need to strictly migrate only gold to 1.x first, because a 0.15.0 reader may not be able to read 1.x Hudi tables. So, if we migrate any of the silver pipelines to 1.x before migrating the entire gold layer, we might end up in a situation +where a 0.15.0 reader (gold) ends up reading a 1.x table (silver). This might lead to failures.
So, we have to follow a certain order in which we migrate pipelines. +c. Once all of gold is migrated to 1.x, we can move all of silver to 1.x. +d. Once all of gold and silver pipelines are migrated to 1.x, finally we can move all of bronze to 1.x. + +In the end, we would have migrated all of the existing Hudi pipelines from 0.15.0 to 1.x. +But as you can see, the migration needs coordination. And in a very large organization, we may not always have good control over downstream consumers. +Hence, coordinating and orchestrating the entire migration workflow might be challenging. + +Hence, to ease the migration workflow for 1.x, we are introducing 0.16.0 as a bridge release. + +Here are the objectives with this bridge release: + +- 1.x reader should be able to read 0.14.x to 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. +But for new features that were introduced in 1.x, we may not be able to support all of them. We will be calling out which new features may not work with a 0.16.x reader. +- In this case, we explicitly request users to not turn on these features until all readers are completely migrated to 1.x, so as to not break any readers. + +Connecting back to our example above, let's see how the migration might look for an existing user. + +a. Existing pipelines are in 0.15.x. (bronze, silver, gold) +b. Migrate pipelines to 0.16.0 (in any order; we do not have any constraints around which pipeline should be migrated first). +c. Ensure all pipelines are in 0.16.0 (both readers and writers). +d. Start migrating pipelines in a rolling fashion to 1.x. At this juncture, we could have a few pipelines in 1.x and a few pipelines in 0.16.0, but since 0.16.x +can read 1.x tables, we should be ok here.
Just do not enable new features like non-blocking concurrency control (NBCC) yet. +e. Migrate all of 0.16.0 to 1.x version. +f. Once all readers and writers are in 1.x, we are good to enable any new features (like NBCC) with 1.x tables. + +As you can see, the company/org wide coordination to migrate gold before migrating silver or bronze is relaxed with the bridge release. The only requirement to keep a tab on +is to migrate all pipelines completely to 0.16.x before starting to migrate to 1.x. + +So, here are the objectives of this RFC with the bridge release. +- 1.x reader should be able to read 0.14.x to 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. + But for new features that were introduced in 1.x, we may not be able to support all of them. Will be calling out which new features may
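The two gates in the bridge-release migration steps quoted above (everything on 0.16.x before any 1.x rollout, everything on 1.x before enabling features such as NBCC) can be sketched as a small pre-flight check. This is only an illustration: `Pipeline`, `can_start_1x_rollout` and `can_enable_1x_only_features` are hypothetical names, not part of Hudi, and the version strings are assumed to look like `X.Y.Z` with an optional `-suffix`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pipeline:
    # All fields are hypothetical inventory metadata, not a Hudi API.
    name: str
    layer: str          # "bronze", "silver" or "gold"
    hudi_version: str   # e.g. "0.15.0", "0.16.0", "1.0.0-beta1"


def major_minor(version: str) -> tuple[int, int]:
    """Parse 'X.Y.Z' (optionally with a '-suffix') into (X, Y)."""
    major, minor = version.split("-")[0].split(".")[:2]
    return int(major), int(minor)


def can_start_1x_rollout(pipelines: list[Pipeline]) -> bool:
    """Steps b/c: every reader and writer must be on 0.16.x or newer."""
    return all(major_minor(p.hudi_version) >= (0, 16) for p in pipelines)


def can_enable_1x_only_features(pipelines: list[Pipeline]) -> bool:
    """Step f: every reader and writer must be on 1.x before enabling e.g. NBCC."""
    return all(major_minor(p.hudi_version)[0] >= 1 for p in pipelines)
```

Note that neither check looks at `layer`: with the bridge release, the order in which bronze, silver and gold are migrated no longer matters, only whether the whole fleet has crossed each gate.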
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
nsivabalan commented on code in PR #11514: URL: https://github.com/apache/hudi/pull/11514#discussion_r1671414686 ## rfc/rfc-78/rfc-78.md: ## @@ -0,0 +1,301 @@
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
nsivabalan commented on code in PR #11514: URL: https://github.com/apache/hudi/pull/11514#discussion_r1671383928 ## rfc/rfc-78/rfc-78.md: ## @@ -0,0 +1,301 @@
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
nsivabalan commented on code in PR #11514: URL: https://github.com/apache/hudi/pull/11514#discussion_r1669278251 ## rfc/rfc-78/rfc-78.md: ## @@ -0,0 +1,220 @@ + +# RFC-76: [Bridge release for 1.x] + +## Proposers + +- @nsivabalan +- @vbalaji + +## Approvers + - @yihua + - @codope + +## Status + +JIRA: https://issues.apache.org/jira/browse/HUDI-7882 + +> Please keep the status updated in `rfc/README.md`. + +## Abstract + +[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful +re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming +years. It introduces lot of differentiating features for Apache Hudi. We released beta releases which was meant for +enthusiastic developers/users to give a try of advanced features. But as we are working towards 1.0 GA, we are proposing +a bridge release (0.16.0) for smoother migration for existing hudi users. + +## Objectives +Goal is to have a smooth migration experience for the users from 0.x to 1.0. We plan to have a 0.16.0 bridge release asking everyone to first migrate to 0.16.0 before they can upgrade to 1.x. + +- 1.x reader should be able to read 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. But for new features that was introduced in 1.x, we may not be able to support all of them. Will be calling out which new features may not work with 0.16.x reader. In this case, we explicitly request users to not turn on these features till readers are completely in 1.x. +- Document upgrade steps from 0.16.x to 1.x with limited user perceived latency. This will be auto upgrade, but document clearly what needs to be done. Review Comment: accommodated your suggestions. 
## rfc/rfc-78/rfc-78.md: ## @@ -0,0 +1,220 @@ + +# RFC-76: [Bridge release for 1.x] + +## Proposers + +- @nsivabalan +- @vbalaji + +## Approvers + - @yihua + - @codope + +## Status + +JIRA: https://issues.apache.org/jira/browse/HUDI-7882 + +> Please keep the status updated in `rfc/README.md`. + +## Abstract + +[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful +re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming +years. It introduces lot of differentiating features for Apache Hudi. We released beta releases which was meant for +enthusiastic developers/users to give a try of advanced features. But as we are working towards 1.0 GA, we are proposing +a bridge release (0.16.0) for smoother migration for existing hudi users. + +## Objectives +Goal is to have a smooth migration experience for the users from 0.x to 1.0. We plan to have a 0.16.0 bridge release asking everyone to first migrate to 0.16.0 before they can upgrade to 1.x. + +- 1.x reader should be able to read 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. But for new features that was introduced in 1.x, we may not be able to support all of them. Will be calling out which new features may not work with 0.16.x reader. In this case, we explicitly request users to not turn on these features till readers are completely in 1.x. +- Document upgrade steps from 0.16.x to 1.x with limited user perceived latency. This will be auto upgrade, but document clearly what needs to be done. +- Downgrade from 1.x to 0.16.x documented with call outs on any functionality. 
Review Comment: I have covered it under section titled "1.0 ➝ 0.16.0 downgrade" in this RFC below ## rfc/rfc-78/rfc-78.md: ## @@ -0,0 +1,220 @@ + +# RFC-76: [Bridge release for 1.x] + +## Proposers + +- @nsivabalan +- @vbalaji + +## Approvers + - @yihua + - @codope + +## Status + +JIRA: https://issues.apache.org/jira/browse/HUDI-7882 + +> Please keep the status updated in `rfc/README.md`. + +## Abstract + +[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful +re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming +years. It introduces lot of differentiating features for Apache Hudi. We released beta releases which was meant for +enthusiastic developers/users to give a try of advanced features. But as we are working towards 1.0 GA, we are proposing +a bridge release (0.16.0) for smoother migration for existing hudi users. + +## Objectives +Goal is to have a smooth migration experience for the users from 0.x to 1.0. We plan to have
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
danny0405 commented on code in PR #11514: URL: https://github.com/apache/hudi/pull/11514#discussion_r1665124619 ## rfc/rfc-78/rfc-78.md: ## @@ -0,0 +1,220 @@ + +# RFC-76: [Bridge release for 1.x] + +## Proposers + +- @nsivabalan +- @vbalaji + +## Approvers + - @yihua + - @codope + +## Status + +JIRA: https://issues.apache.org/jira/browse/HUDI-7882 + +> Please keep the status updated in `rfc/README.md`. + +## Abstract + +[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful +re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming +years. It introduces lot of differentiating features for Apache Hudi. We released beta releases which was meant for +enthusiastic developers/users to give a try of advanced features. But as we are working towards 1.0 GA, we are proposing +a bridge release (0.16.0) for smoother migration for existing hudi users. + +## Objectives +Goal is to have a smooth migration experience for the users from 0.x to 1.0. We plan to have a 0.16.0 bridge release asking everyone to first migrate to 0.16.0 before they can upgrade to 1.x. + +- 1.x reader should be able to read 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. But for new features that was introduced in 1.x, we may not be able to support all of them. Will be calling out which new features may not work with 0.16.x reader. In this case, we explicitly request users to not turn on these features till readers are completely in 1.x. +- Document upgrade steps from 0.16.x to 1.x with limited user perceived latency. This will be auto upgrade, but document clearly what needs to be done. +- Downgrade from 1.x to 0.16.x documented with call outs on any functionality. 
+ +### Considerations when choosing Migration strategy +- While migration is happening, we want to allow readers to continue reading data. This means we cannot employ a stop-the-world strategy when we are migrating. +None of the actions that we perform as part of the table upgrade should have the side effect of breaking snapshot isolation for readers. +- Also, users should have migrated to 0.16.x before upgrading to 1.x. We do not want to add read support for very old versions of Hudi in 1.x (e.g. 0.7.0). +- So, in an effort to bring everyone to the latest Hudi versions, the 1.x reader will have full read capabilities for 0.16.x, but for older Hudi versions, the 1.x reader may not have full read support. +The recommended guideline is to upgrade all readers and writers to 0.16.x, and then slowly start upgrading to 1.x (readers followed by writers). + +Before we dive in further, let's understand the format changes: + +## Format changes +### Table properties +- Payload class ➝ payload type. +- New metadata partitions could be added (optionally enabled). + +### MDT changes +- New MDT partitions are available in 1.x. MDT schema upgraded. +- RLI schema is upgraded to hold row position. + +### Timeline: +- [storage changes] Completed write commits have completion times in the file name. +- [storage changes] Completed and inflight write commits are in Avro format; they were JSON in 0.x. +- We are switching the action type for clustering from “replace commit” to “cluster”. +- Similarly, for completed compaction, we are switching from “commit” to “compaction” in an effort to standardize actions for a given write operation. +- [storage changes] Timeline ➝ LSM timeline. There is no archived timeline in 1.x. +- [In-memory changes] HoodieInstant changes due to the presence of completion time for completed HoodieInstants. + +### Filegroup/FileSlice changes: +- Log files contain delta commit time instead of base instant time. +- Log appends are disabled in 1.x.
In other words, each log block is always appended to a new log file. +- File Slice determination logic for log files changed (in 0.x, we have the base instant time in log files and it's straightforward. In 1.x, we find the completion time for a log file and find the base instant time (parsed from base files) which has the highest value less than the completion time of the log file). +- Log file ordering within a file slice (in 0.x, we use base instant time ➝ log file versions ➝ write token to order different log files. In 1.x, we will be using completion time to order). + +### Log format changes: +- We have added new header types in 1.x. (IS_PARTIAL) + +## Changes to be ported over to 0.16.x to support reading 1.x tables +### What will be supported +- For features introduced in 0.x, and tables written in 1.x, 0.16.0 reader should be able to provide consistent reads w/o any breakage. +### What will not be supported +- A 0.16 writer cannot write to a table that has been upgraded-to/created usin
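The 1.x file slice determination and log file ordering described in the quoted Filegroup/FileSlice section can be sketched as follows. This is a minimal illustration, not Hudi's implementation: `assign_log_files_to_slices` is a hypothetical helper, instant and completion times are assumed to be fixed-width strings that sort chronologically, and every log file is assumed to have at least one earlier base file in its file group.

```python
from bisect import bisect_left
from collections import defaultdict


def assign_log_files_to_slices(base_instant_times, log_files):
    """Group the log files of one file group into file slices, 1.x style.

    base_instant_times: instant times parsed from the base files.
    log_files: (log_file_name, completion_time) pairs.
    Returns {base_instant_time: [log_file_name, ...]}, with each list
    ordered by completion time.
    """
    bases = sorted(base_instant_times)
    slices = defaultdict(list)
    # Ordering within a slice follows completion time, not log version/write token.
    for name, completion_time in sorted(log_files, key=lambda lf: lf[1]):
        # Number of base instants strictly less than the completion time.
        idx = bisect_left(bases, completion_time)
        if idx == 0:
            # Simplifying assumption of this sketch: no log-only file slices.
            raise ValueError(f"no base instant precedes log file {name}")
        # Highest base instant time that is less than the completion time.
        slices[bases[idx - 1]].append(name)
    return dict(slices)
```

For example, with base instants `"001"` and `"005"`, a log file that completed at `"003"` lands in the `"001"` slice, while log files that completed at `"006"` and `"007"` land in the `"005"` slice in that order. In 0.x the same grouping would simply read the base instant time out of the log file name.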
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
danny0405 commented on code in PR #11514: URL: https://github.com/apache/hudi/pull/11514#discussion_r1665122005 ## rfc/rfc-78/rfc-78.md: ## @@ -0,0 +1,220 @@ + +# RFC-76: [Bridge release for 1.x] + +## Proposers + +- @nsivabalan +- @vbalaji + +## Approvers + - @yihua + - @codope + +## Status + +JIRA: https://issues.apache.org/jira/browse/HUDI-7882 + +> Please keep the status updated in `rfc/README.md`. + +## Abstract + +[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful +re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming +years. It introduces lot of differentiating features for Apache Hudi. We released beta releases which was meant for +enthusiastic developers/users to give a try of advanced features. But as we are working towards 1.0 GA, we are proposing +a bridge release (0.16.0) for smoother migration for existing hudi users. + +## Objectives +Goal is to have a smooth migration experience for the users from 0.x to 1.0. We plan to have a 0.16.0 bridge release asking everyone to first migrate to 0.16.0 before they can upgrade to 1.x. + +- 1.x reader should be able to read 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. But for new features that was introduced in 1.x, we may not be able to support all of them. Will be calling out which new features may not work with 0.16.x reader. In this case, we explicitly request users to not turn on these features till readers are completely in 1.x. +- Document upgrade steps from 0.16.x to 1.x with limited user perceived latency. This will be auto upgrade, but document clearly what needs to be done. +- Downgrade from 1.x to 0.16.x documented with call outs on any functionality. 
+ +### Considerations when choosing Migration strategy +- While migration is happening, we want to allow readers to continue reading data. This means we cannot employ a stop-the-world strategy when we are migrating. +None of the actions that we perform as part of the table upgrade should have the side effect of breaking snapshot isolation for readers. +- Also, users should have migrated to 0.16.x before upgrading to 1.x. We do not want to add read support for very old versions of Hudi in 1.x (e.g. 0.7.0). +- So, in an effort to bring everyone to the latest Hudi versions, the 1.x reader will have full read capabilities for 0.16.x, but for older Hudi versions, the 1.x reader may not have full read support. +The recommended guideline is to upgrade all readers and writers to 0.16.x, and then slowly start upgrading to 1.x (readers followed by writers). + +Before we dive in further, let's understand the format changes: + +## Format changes +### Table properties +- Payload class ➝ payload type. +- New metadata partitions could be added (optionally enabled). + +### MDT changes +- New MDT partitions are available in 1.x. MDT schema upgraded. +- RLI schema is upgraded to hold row position. + +### Timeline: +- [storage changes] Completed write commits have completion times in the file name. +- [storage changes] Completed and inflight write commits are in Avro format; they were JSON in 0.x. +- We are switching the action type for clustering from “replace commit” to “cluster”. +- Similarly, for completed compaction, we are switching from “commit” to “compaction” in an effort to standardize actions for a given write operation. +- [storage changes] Timeline ➝ LSM timeline. There is no archived timeline in 1.x. +- [In-memory changes] HoodieInstant changes due to the presence of completion time for completed HoodieInstants. Review Comment: We have not introduced completion-time-based incremental queries for Spark yet, but for the GA release we might need a compatible solution for migration.
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
danny0405 commented on code in PR #11514: URL: https://github.com/apache/hudi/pull/11514#discussion_r1665122268 ## rfc/rfc-78/rfc-78.md: ## @@ -0,0 +1,220 @@ + +# RFC-76: [Bridge release for 1.x] + +## Proposers + +- @nsivabalan +- @vbalaji + +## Approvers + - @yihua + - @codope + +## Status + +JIRA: https://issues.apache.org/jira/browse/HUDI-7882 + +> Please keep the status updated in `rfc/README.md`. + +## Abstract + +[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful +re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming +years. It introduces lot of differentiating features for Apache Hudi. We released beta releases which was meant for +enthusiastic developers/users to give a try of advanced features. But as we are working towards 1.0 GA, we are proposing +a bridge release (0.16.0) for smoother migration for existing hudi users. + +## Objectives +Goal is to have a smooth migration experience for the users from 0.x to 1.0. We plan to have a 0.16.0 bridge release asking everyone to first migrate to 0.16.0 before they can upgrade to 1.x. + +- 1.x reader should be able to read 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. But for new features that was introduced in 1.x, we may not be able to support all of them. Will be calling out which new features may not work with 0.16.x reader. In this case, we explicitly request users to not turn on these features till readers are completely in 1.x. +- Document upgrade steps from 0.16.x to 1.x with limited user perceived latency. This will be auto upgrade, but document clearly what needs to be done. +- Downgrade from 1.x to 0.16.x documented with call outs on any functionality. 
+ +### Considerations when choosing Migration strategy +- While migration is happening, we want to allow readers to continue reading data. This means we cannot employ a stop-the-world strategy when we are migrating. +None of the actions that we perform as part of the table upgrade should have the side effect of breaking snapshot isolation for readers. +- Also, users should have migrated to 0.16.x before upgrading to 1.x. We do not want to add read support for very old versions of Hudi in 1.x (e.g. 0.7.0). +- So, in an effort to bring everyone to the latest Hudi versions, the 1.x reader will have full read capabilities for 0.16.x, but for older Hudi versions, the 1.x reader may not have full read support. +The recommended guideline is to upgrade all readers and writers to 0.16.x, and then slowly start upgrading to 1.x (readers followed by writers). + +Before we dive in further, let's understand the format changes: + +## Format changes +### Table properties +- Payload class ➝ payload type. +- New metadata partitions could be added (optionally enabled). + +### MDT changes +- New MDT partitions are available in 1.x. MDT schema upgraded. +- RLI schema is upgraded to hold row position. + +### Timeline: +- [storage changes] Completed write commits have completion times in the file name. +- [storage changes] Completed and inflight write commits are in Avro format; they were JSON in 0.x. +- We are switching the action type for clustering from “replace commit” to “cluster”. +- Similarly, for completed compaction, we are switching from “commit” to “compaction” in an effort to standardize actions for a given write operation. +- [storage changes] Timeline ➝ LSM timeline. There is no archived timeline in 1.x. +- [In-memory changes] HoodieInstant changes due to the presence of completion time for completed HoodieInstants. + +### Filegroup/FileSlice changes: +- Log files contain delta commit time instead of base instant time. +- Log appends are disabled in 1.x.
In other words, each log block is always appended to a new log file. Review Comment: +1
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
danny0405 commented on code in PR #11514: URL: https://github.com/apache/hudi/pull/11514#discussion_r1665121084 ## rfc/rfc-78/rfc-78.md: ## @@ -0,0 +1,220 @@ + +# RFC-76: [Bridge release for 1.x] + +## Proposers + +- @nsivabalan +- @vbalaji + +## Approvers + - @yihua + - @codope + +## Status + +JIRA: https://issues.apache.org/jira/browse/HUDI-7882 + +> Please keep the status updated in `rfc/README.md`. + +## Abstract + +[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful +re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming +years. It introduces lot of differentiating features for Apache Hudi. We released beta releases which was meant for +enthusiastic developers/users to give a try of advanced features. But as we are working towards 1.0 GA, we are proposing +a bridge release (0.16.0) for smoother migration for existing hudi users. + +## Objectives +Goal is to have a smooth migration experience for the users from 0.x to 1.0. We plan to have a 0.16.0 bridge release asking everyone to first migrate to 0.16.0 before they can upgrade to 1.x. + +- 1.x reader should be able to read 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. But for new features that was introduced in 1.x, we may not be able to support all of them. Will be calling out which new features may not work with 0.16.x reader. In this case, we explicitly request users to not turn on these features till readers are completely in 1.x. +- Document upgrade steps from 0.16.x to 1.x with limited user perceived latency. This will be auto upgrade, but document clearly what needs to be done. +- Downgrade from 1.x to 0.16.x documented with call outs on any functionality. 
+ +### Considerations when choosing Migration strategy +- While migration is happening, we want to allow readers to continue reading data. This means we cannot employ a stop-the-world strategy when we are migrating. +None of the actions that we perform as part of the table upgrade should have the side effect of breaking snapshot isolation for readers. +- Also, users should have migrated to 0.16.x before upgrading to 1.x. We do not want to add read support for very old versions of Hudi in 1.x (e.g. 0.7.0). +- So, in an effort to bring everyone to the latest Hudi versions, the 1.x reader will have full read capabilities for 0.16.x, but for older Hudi versions, the 1.x reader may not have full read support. +The recommended guideline is to upgrade all readers and writers to 0.16.x, and then slowly start upgrading to 1.x (readers followed by writers). + +Before we dive in further, let's understand the format changes: + +## Format changes +### Table properties +- Payload class ➝ payload type. Review Comment: Might not be related, but should `hoodie.record.merge.mode` be a table config instead of a write config?
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
nsivabalan commented on PR #11514: URL: https://github.com/apache/hudi/pull/11514#issuecomment-2206976055 Here is a glimpse of the changes I had to make to the 0.x timeline to support 1.x table reads: https://github.com/apache/hudi/pull/11562 (this is just a draft/hacky PR, in case you want to take a peek).
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
nsivabalan commented on code in PR #11514: URL: https://github.com/apache/hudi/pull/11514#discussion_r1664474919 ## rfc/rfc-78/rfc-78.md: ## @@ -0,0 +1,220 @@ + +# RFC-76: [Bridge release for 1.x] + +## Proposers + +- @nsivabalan +- @vbalaji + +## Approvers + - @yihua + - @codope + +## Status + +JIRA: https://issues.apache.org/jira/browse/HUDI-7882 + +> Please keep the status updated in `rfc/README.md`. + +## Abstract + +[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful +re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming +years. It introduces lot of differentiating features for Apache Hudi. We released beta releases which was meant for +enthusiastic developers/users to give a try of advanced features. But as we are working towards 1.0 GA, we are proposing +a bridge release (0.16.0) for smoother migration for existing hudi users. + +## Objectives +Goal is to have a smooth migration experience for the users from 0.x to 1.0. We plan to have a 0.16.0 bridge release asking everyone to first migrate to 0.16.0 before they can upgrade to 1.x. + +- 1.x reader should be able to read 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. But for new features that was introduced in 1.x, we may not be able to support all of them. Will be calling out which new features may not work with 0.16.x reader. In this case, we explicitly request users to not turn on these features till readers are completely in 1.x. +- Document upgrade steps from 0.16.x to 1.x with limited user perceived latency. This will be auto upgrade, but document clearly what needs to be done. +- Downgrade from 1.x to 0.16.x documented with call outs on any functionality. 
+ +### Considerations when choosing Migration strategy +- While migration is happening, we want to allow readers to continue reading data. This means, we cannot employ a stop-the-world strategy when we are migrating. +All the actions that we are performing as part of table upgrade should not have any side-effects of breaking snapshot isolation for readers. +- Also, users should have migrated to 0.16.x before upgrading to 1.x. We do not want to add read support for very old versions of hudi in 1.x(for eg 0.7.0). +- So, in an effort to bring everyone to latest hudi versions, 1.x reader will have full read capabilities for 0.16.x, but for older hudi versions, 1.x reader may not have full reader support. +The reocmmended guideline is to upgrade all readers and writers to 0.16.x. and then slowly start upgrading to 1.x(readers followed by writers). + +Before we dive in further, lets understand the format changes: + +## Format changes +### Table properties +- Payload class ➝ payload type. +- New metadata partitions could be added (optionally enabled) + +### MDT changes +- New MDT partitions are available in 1.x. MDT schema upgraded. +- RLI schema is upgraded to hold row position + +### Timeline: +- [storage changes] Completed write commits have completed times in the file name. +- [storage changes] Completed and inflight write commits are in avro format which were json in 0.x. +- We are switching the action type for clustering from “replace commit” to “cluster”. +- Similarly, for completed compaction, we are switching from “commit” to “compaction” in an effort to standardize actions for a given write operation. +- [storage changes] Timeline ➝ LST timeline. There is no archived timeline in 1.x +- [In-memory changes] HoodieInstant changes due to presence of completion time for completed HoodieInstants. + +### Filegroup/FileSlice changes: +- Log files contain delta commit time instead of base instant time. +- Log appends are disabled in 1.x. 
In other words, each log block is already appended to a new log file. +- File Slice determination logic for log files changed (in 0.x, we have base instant time in log files and its straight forward. In 1.x, we find completion time for a log file and find the base instant time (parsed from base files) which has the highest value lesser than the completion time of the log file). +- Log file ordering within a file slice. (in 0.x, we use base instant time ➝l log file versions ➝ write token) to order diff log files. in 1.x, we will be using completion time to order). + +### Log format changes: +- We have added new header types in 1.x. (IS_PARTIAL) + +## Changes to be ported over 0.16.x to support reading 1.x tables +### What will be supported +- For features introduced in 0.x, and tables written in 1.x, 0.16.0 reader should be able to provide consistent reads w/o any breakage. +### What will not be supported Review Comment: sure.
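To make the 1.x file-slice rule quoted above concrete, here is a minimal Python sketch. It is not Hudi code: `base_instant_for_log_file` is a hypothetical helper, and it assumes instant times are fixed-width timestamp strings so that string order matches time order. It picks the greatest base instant time that is strictly less than the log file's completion time, which is how I read "highest value lesser than the completion time".

```python
from bisect import bisect_left
from typing import Optional, Sequence


def base_instant_for_log_file(
    base_instant_times: Sequence[str],
    log_completion_time: str,
) -> Optional[str]:
    """Pick the file slice (identified by base instant time) for a 1.x log file.

    base_instant_times: instant times parsed from the base files of one file group.
    log_completion_time: completion time of the log file's delta commit.
    Returns None when no base file precedes the log file.
    """
    ordered = sorted(base_instant_times)
    # bisect_left returns how many entries are strictly less than the completion time.
    idx = bisect_left(ordered, log_completion_time)
    return ordered[idx - 1] if idx > 0 else None
```

In 0.x this lookup is unnecessary because, per the RFC text, the base instant time is carried by the log file itself. The 1.x rule needs the completion time first, which is why a 0.16.x reader has to be able to resolve completion times before it can group log files into slices.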
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
nsivabalan commented on code in PR #11514: URL: https://github.com/apache/hudi/pull/11514#discussion_r1664463903 ## rfc/rfc-78/rfc-78.md: ## @@ -0,0 +1,220 @@ + +# RFC-76: [Bridge release for 1.x] + +## Proposers + +- @nsivabalan +- @vbalaji + +## Approvers + - @yihua + - @codope + +## Status + +JIRA: https://issues.apache.org/jira/browse/HUDI-7882 + +> Please keep the status updated in `rfc/README.md`. + +## Abstract + +[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful +re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming +years. It introduces lot of differentiating features for Apache Hudi. We released beta releases which was meant for +enthusiastic developers/users to give a try of advanced features. But as we are working towards 1.0 GA, we are proposing +a bridge release (0.16.0) for smoother migration for existing hudi users. + +## Objectives +Goal is to have a smooth migration experience for the users from 0.x to 1.0. We plan to have a 0.16.0 bridge release asking everyone to first migrate to 0.16.0 before they can upgrade to 1.x. + +- 1.x reader should be able to read 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. But for new features that was introduced in 1.x, we may not be able to support all of them. Will be calling out which new features may not work with 0.16.x reader. In this case, we explicitly request users to not turn on these features till readers are completely in 1.x. +- Document upgrade steps from 0.16.x to 1.x with limited user perceived latency. This will be auto upgrade, but document clearly what needs to be done. +- Downgrade from 1.x to 0.16.x documented with call outs on any functionality. 
+ +### Considerations when choosing Migration strategy +- While migration is happening, we want to allow readers to continue reading data. This means, we cannot employ a stop-the-world strategy when we are migrating. +All the actions that we are performing as part of table upgrade should not have any side-effects of breaking snapshot isolation for readers. +- Also, users should have migrated to 0.16.x before upgrading to 1.x. We do not want to add read support for very old versions of hudi in 1.x(for eg 0.7.0). +- So, in an effort to bring everyone to latest hudi versions, 1.x reader will have full read capabilities for 0.16.x, but for older hudi versions, 1.x reader may not have full reader support. +The reocmmended guideline is to upgrade all readers and writers to 0.16.x. and then slowly start upgrading to 1.x(readers followed by writers). + +Before we dive in further, lets understand the format changes: + +## Format changes +### Table properties +- Payload class ➝ payload type. +- New metadata partitions could be added (optionally enabled) + +### MDT changes +- New MDT partitions are available in 1.x. MDT schema upgraded. +- RLI schema is upgraded to hold row position + +### Timeline: +- [storage changes] Completed write commits have completed times in the file name. +- [storage changes] Completed and inflight write commits are in avro format which were json in 0.x. +- We are switching the action type for clustering from “replace commit” to “cluster”. +- Similarly, for completed compaction, we are switching from “commit” to “compaction” in an effort to standardize actions for a given write operation. +- [storage changes] Timeline ➝ LST timeline. There is no archived timeline in 1.x +- [In-memory changes] HoodieInstant changes due to presence of completion time for completed HoodieInstants. + +### Filegroup/FileSlice changes: +- Log files contain delta commit time instead of base instant time. +- Log appends are disabled in 1.x. 
In other words, each log block is already appended to a new log file. +- File Slice determination logic for log files changed (in 0.x, we have base instant time in log files and its straight forward. In 1.x, we find completion time for a log file and find the base instant time (parsed from base files) which has the highest value lesser than the completion time of the log file). +- Log file ordering within a file slice. (in 0.x, we use base instant time ➝l log file versions ➝ write token) to order diff log files. in 1.x, we will be using completion time to order). + +### Log format changes: +- We have added new header types in 1.x. (IS_PARTIAL) + +## Changes to be ported over 0.16.x to support reading 1.x tables +### What will be supported +- For features introduced in 0.x, and tables written in 1.x, 0.16.0 reader should be able to provide consistent reads w/o any breakage. +### What will not be supported +- A 0.16 writer cannot write to a table that has been upgraded-to/created usi
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
nsivabalan commented on code in PR #11514: URL: https://github.com/apache/hudi/pull/11514#discussion_r1664459466 ## rfc/rfc-78/rfc-78.md: ## @@ -0,0 +1,220 @@ + +# RFC-76: [Bridge release for 1.x] + +## Proposers + +- @nsivabalan +- @vbalaji + +## Approvers + - @yihua + - @codope + +## Status + +JIRA: https://issues.apache.org/jira/browse/HUDI-7882 + +> Please keep the status updated in `rfc/README.md`. + +## Abstract + +[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful +re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming +years. It introduces lot of differentiating features for Apache Hudi. We released beta releases which was meant for +enthusiastic developers/users to give a try of advanced features. But as we are working towards 1.0 GA, we are proposing +a bridge release (0.16.0) for smoother migration for existing hudi users. + +## Objectives +Goal is to have a smooth migration experience for the users from 0.x to 1.0. We plan to have a 0.16.0 bridge release asking everyone to first migrate to 0.16.0 before they can upgrade to 1.x. + +- 1.x reader should be able to read 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. But for new features that was introduced in 1.x, we may not be able to support all of them. Will be calling out which new features may not work with 0.16.x reader. In this case, we explicitly request users to not turn on these features till readers are completely in 1.x. +- Document upgrade steps from 0.16.x to 1.x with limited user perceived latency. This will be auto upgrade, but document clearly what needs to be done. +- Downgrade from 1.x to 0.16.x documented with call outs on any functionality. 
+ +### Considerations when choosing Migration strategy +- While migration is happening, we want to allow readers to continue reading data. This means, we cannot employ a stop-the-world strategy when we are migrating. +All the actions that we are performing as part of table upgrade should not have any side-effects of breaking snapshot isolation for readers. +- Also, users should have migrated to 0.16.x before upgrading to 1.x. We do not want to add read support for very old versions of hudi in 1.x(for eg 0.7.0). +- So, in an effort to bring everyone to latest hudi versions, 1.x reader will have full read capabilities for 0.16.x, but for older hudi versions, 1.x reader may not have full reader support. Review Comment: updated the details
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
vinothchandar commented on code in PR #11514: URL: https://github.com/apache/hudi/pull/11514#discussion_r1663256899 ## rfc/rfc-78/rfc-78.md: ## @@ -0,0 +1,220 @@ + +# RFC-76: [Bridge release for 1.x] + +## Proposers + +- @nsivabalan +- @vbalaji + +## Approvers + - @yihua + - @codope + +## Status + +JIRA: https://issues.apache.org/jira/browse/HUDI-7882 + +> Please keep the status updated in `rfc/README.md`. + +## Abstract + +[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful +re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming +years. It introduces lot of differentiating features for Apache Hudi. We released beta releases which was meant for +enthusiastic developers/users to give a try of advanced features. But as we are working towards 1.0 GA, we are proposing +a bridge release (0.16.0) for smoother migration for existing hudi users. + +## Objectives +Goal is to have a smooth migration experience for the users from 0.x to 1.0. We plan to have a 0.16.0 bridge release asking everyone to first migrate to 0.16.0 before they can upgrade to 1.x. + +- 1.x reader should be able to read 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. But for new features that was introduced in 1.x, we may not be able to support all of them. Will be calling out which new features may not work with 0.16.x reader. In this case, we explicitly request users to not turn on these features till readers are completely in 1.x. +- Document upgrade steps from 0.16.x to 1.x with limited user perceived latency. This will be auto upgrade, but document clearly what needs to be done. 
Review Comment: ```suggestion - Document steps for rolling upgrade from 0.16.x to 1.x , with minimal downtime ``` ## rfc/rfc-78/rfc-78.md: ## @@ -0,0 +1,220 @@ + +# RFC-76: [Bridge release for 1.x] + +## Proposers + +- @nsivabalan +- @vbalaji + +## Approvers + - @yihua + - @codope + +## Status + +JIRA: https://issues.apache.org/jira/browse/HUDI-7882 + +> Please keep the status updated in `rfc/README.md`. + +## Abstract + +[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful +re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming +years. It introduces lot of differentiating features for Apache Hudi. We released beta releases which was meant for +enthusiastic developers/users to give a try of advanced features. But as we are working towards 1.0 GA, we are proposing +a bridge release (0.16.0) for smoother migration for existing hudi users. + +## Objectives +Goal is to have a smooth migration experience for the users from 0.x to 1.0. We plan to have a 0.16.0 bridge release asking everyone to first migrate to 0.16.0 before they can upgrade to 1.x. + +- 1.x reader should be able to read 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. But for new features that was introduced in 1.x, we may not be able to support all of them. Will be calling out which new features may not work with 0.16.x reader. In this case, we explicitly request users to not turn on these features till readers are completely in 1.x. +- Document upgrade steps from 0.16.x to 1.x with limited user perceived latency. This will be auto upgrade, but document clearly what needs to be done. +- Downgrade from 1.x to 0.16.x documented with call outs on any functionality. 
+ +### Considerations when choosing Migration strategy +- While migration is happening, we want to allow readers to continue reading data. This means, we cannot employ a stop-the-world strategy when we are migrating. +All the actions that we are performing as part of table upgrade should not have any side-effects of breaking snapshot isolation for readers. +- Also, users should have migrated to 0.16.x before upgrading to 1.x. We do not want to add read support for very old versions of hudi in 1.x(for eg 0.7.0). +- So, in an effort to bring everyone to latest hudi versions, 1.x reader will have full read capabilities for 0.16.x, but for older hudi versions, 1.x reader may not have full reader support. Review Comment: may be good to document what works. but best to get everyone to 0.16 or those users can choose to take a downtime and do it directly? ## rfc/rfc-78/rfc-78.md: ## @@ -0,0 +1,220 @@ + +# RFC-76: [Bridge release for 1.x] + +## Proposers + +- @nsivabalan +- @vbalaji + +## Approvers + -
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
nsivabalan commented on code in PR #11514: URL: https://github.com/apache/hudi/pull/11514#discussion_r1663046076 ## rfc/rfc-78/rfc-78.md: ## @@ -0,0 +1,220 @@ + +# RFC-76: [Bridge release for 1.x] + +## Proposers + +- @nsivabalan +- @vbalaji + +## Approvers + - @yihua + - @codope + +## Status + +JIRA: https://issues.apache.org/jira/browse/HUDI-7882 + +> Please keep the status updated in `rfc/README.md`. + +## Abstract + +[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful +re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming +years. It introduces lot of differentiating features for Apache Hudi. We released beta releases which was meant for +enthusiastic developers/users to give a try of advanced features. But as we are working towards 1.0 GA, we are proposing +a bridge release (0.16.0) for smoother migration for existing hudi users. + +## Objectives +Goal is to have a smooth migration experience for the users from 0.x to 1.0. We plan to have a 0.16.0 bridge release asking everyone to first migrate to 0.16.0 before they can upgrade to 1.x. + +- 1.x reader should be able to read 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. But for new features that was introduced in 1.x, we may not be able to support all of them. Will be calling out which new features may not work with 0.16.x reader. In this case, we explicitly request users to not turn on these features till readers are completely in 1.x. +- Document upgrade steps from 0.16.x to 1.x with limited user perceived latency. This will be auto upgrade, but document clearly what needs to be done. +- Downgrade from 1.x to 0.16.x documented with call outs on any functionality. 
+ +### Considerations when choosing Migration strategy +- While migration is happening, we want to allow readers to continue reading data. This means, we cannot employ a stop-the-world strategy when we are migrating. +All the actions that we are performing as part of table upgrade should not have any side-effects of breaking snapshot isolation for readers. +- Also, users should have migrated to 0.16.x before upgrading to 1.x. We do not want to add read support for very old versions of hudi in 1.x(for eg 0.7.0). +- So, in an effort to bring everyone to latest hudi versions, 1.x reader will have full read capabilities for 0.16.x, but for older hudi versions, 1.x reader may not have full reader support. +The reocmmended guideline is to upgrade all readers and writers to 0.16.x. and then slowly start upgrading to 1.x(readers followed by writers). + +Before we dive in further, lets understand the format changes: + +## Format changes +### Table properties +- Payload class ➝ payload type. +- New metadata partitions could be added (optionally enabled) + +### MDT changes +- New MDT partitions are available in 1.x. MDT schema upgraded. +- RLI schema is upgraded to hold row position + +### Timeline: +- [storage changes] Completed write commits have completed times in the file name. +- [storage changes] Completed and inflight write commits are in avro format which were json in 0.x. +- We are switching the action type for clustering from “replace commit” to “cluster”. +- Similarly, for completed compaction, we are switching from “commit” to “compaction” in an effort to standardize actions for a given write operation. +- [storage changes] Timeline ➝ LST timeline. There is no archived timeline in 1.x +- [In-memory changes] HoodieInstant changes due to presence of completion time for completed HoodieInstants. + +### Filegroup/FileSlice changes: +- Log files contain delta commit time instead of base instant time. +- Log appends are disabled in 1.x. 
In other words, each log block is already appended to a new log file. +- File Slice determination logic for log files changed (in 0.x, we have base instant time in log files and its straight forward. In 1.x, we find completion time for a log file and find the base instant time (parsed from base files) which has the highest value lesser than the completion time of the log file). +- Log file ordering within a file slice. (in 0.x, we use base instant time ➝l log file versions ➝ write token) to order diff log files. in 1.x, we will be using completion time to order). + +### Log format changes: +- We have added new header types in 1.x. (IS_PARTIAL) + +## Changes to be ported over 0.16.x to support reading 1.x tables +### What will be supported +- For features introduced in 0.x, and tables written in 1.x, 0.16.0 reader should be able to provide consistent reads w/o any breakage. +### What will not be supported +- A 0.16 writer cannot write to a table that has been upgraded-to/created usi
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
nsivabalan commented on code in PR #11514: URL: https://github.com/apache/hudi/pull/11514#discussion_r1663043878 ## rfc/rfc-78/rfc-78.md: ## @@ -0,0 +1,220 @@ + +# RFC-76: [Bridge release for 1.x] + +## Proposers + +- @nsivabalan +- @vbalaji + +## Approvers + - @yihua + - @codope + +## Status + +JIRA: https://issues.apache.org/jira/browse/HUDI-7882 + +> Please keep the status updated in `rfc/README.md`. + +## Abstract + +[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful +re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming +years. It introduces lot of differentiating features for Apache Hudi. We released beta releases which was meant for +enthusiastic developers/users to give a try of advanced features. But as we are working towards 1.0 GA, we are proposing +a bridge release (0.16.0) for smoother migration for existing hudi users. + +## Objectives +Goal is to have a smooth migration experience for the users from 0.x to 1.0. We plan to have a 0.16.0 bridge release asking everyone to first migrate to 0.16.0 before they can upgrade to 1.x. + +- 1.x reader should be able to read 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. But for new features that was introduced in 1.x, we may not be able to support all of them. Will be calling out which new features may not work with 0.16.x reader. In this case, we explicitly request users to not turn on these features till readers are completely in 1.x. +- Document upgrade steps from 0.16.x to 1.x with limited user perceived latency. This will be auto upgrade, but document clearly what needs to be done. +- Downgrade from 1.x to 0.16.x documented with call outs on any functionality. 
+ +### Considerations when choosing Migration strategy +- While migration is happening, we want to allow readers to continue reading data. This means, we cannot employ a stop-the-world strategy when we are migrating. +All the actions that we are performing as part of table upgrade should not have any side-effects of breaking snapshot isolation for readers. +- Also, users should have migrated to 0.16.x before upgrading to 1.x. We do not want to add read support for very old versions of hudi in 1.x(for eg 0.7.0). +- So, in an effort to bring everyone to latest hudi versions, 1.x reader will have full read capabilities for 0.16.x, but for older hudi versions, 1.x reader may not have full reader support. +The reocmmended guideline is to upgrade all readers and writers to 0.16.x. and then slowly start upgrading to 1.x(readers followed by writers). + +Before we dive in further, lets understand the format changes: + +## Format changes +### Table properties +- Payload class ➝ payload type. +- New metadata partitions could be added (optionally enabled) + +### MDT changes +- New MDT partitions are available in 1.x. MDT schema upgraded. +- RLI schema is upgraded to hold row position + +### Timeline: +- [storage changes] Completed write commits have completed times in the file name. +- [storage changes] Completed and inflight write commits are in avro format which were json in 0.x. +- We are switching the action type for clustering from “replace commit” to “cluster”. +- Similarly, for completed compaction, we are switching from “commit” to “compaction” in an effort to standardize actions for a given write operation. +- [storage changes] Timeline ➝ LST timeline. There is no archived timeline in 1.x +- [In-memory changes] HoodieInstant changes due to presence of completion time for completed HoodieInstants. Review Comment: good point. we might need to solve this elegantly
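One way to picture the timeline action changes listed in the quoted hunk is a small translation step that lets 0.x-style reader code interpret 1.x instants. The Python sketch below is only an illustration under stated assumptions: `Instant` and `to_0x_action` are hypothetical names, and the action strings ("cluster", "compaction", "replacecommit", "commit") follow the RFC wording rather than being checked against Hudi's constants.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instant:
    """Simplified stand-in for a timeline instant (not Hudi's HoodieInstant)."""
    requested_time: str
    action: str
    completion_time: Optional[str] = None  # set only for completed instants in 1.x

    @property
    def completed(self) -> bool:
        return self.completion_time is not None


def to_0x_action(instant: Instant) -> str:
    """Translate a 1.x action name into the name a 0.x reader expects.

    Assumed mapping, taken from the RFC text:
      clustering:           "cluster"    -> "replacecommit"
      completed compaction: "compaction" -> "commit"
    Pending compactions and all other actions keep their names.
    """
    if instant.action == "cluster":
        return "replacecommit"
    if instant.action == "compaction" and instant.completed:
        return "commit"
    return instant.action
```

Keeping the completion time as an optional field mirrors the in-memory change called out for HoodieInstant: a 0.x-written instant simply has no completion time attached, so code shared between both table versions has to tolerate its absence.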
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
nsivabalan commented on code in PR #11514: URL: https://github.com/apache/hudi/pull/11514#discussion_r1663038794 ## rfc/rfc-78/rfc-78.md: ## @@ -0,0 +1,220 @@ + +# RFC-76: [Bridge release for 1.x] + +## Proposers + +- @nsivabalan +- @vbalaji + +## Approvers + - @yihua + - @codope + +## Status + +JIRA: https://issues.apache.org/jira/browse/HUDI-7882 + +> Please keep the status updated in `rfc/README.md`. + +## Abstract + +[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful +re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming +years. It introduces lot of differentiating features for Apache Hudi. We released beta releases which was meant for +enthusiastic developers/users to give a try of advanced features. But as we are working towards 1.0 GA, we are proposing +a bridge release (0.16.0) for smoother migration for existing hudi users. + +## Objectives +Goal is to have a smooth migration experience for the users from 0.x to 1.0. We plan to have a 0.16.0 bridge release asking everyone to first migrate to 0.16.0 before they can upgrade to 1.x. + +- 1.x reader should be able to read 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. But for new features that was introduced in 1.x, we may not be able to support all of them. Will be calling out which new features may not work with 0.16.x reader. In this case, we explicitly request users to not turn on these features till readers are completely in 1.x. +- Document upgrade steps from 0.16.x to 1.x with limited user perceived latency. This will be auto upgrade, but document clearly what needs to be done. +- Downgrade from 1.x to 0.16.x documented with call outs on any functionality. 
+ +### Considerations when choosing Migration strategy +- While migration is happening, we want to allow readers to continue reading data. This means, we cannot employ a stop-the-world strategy when we are migrating. +All the actions that we are performing as part of table upgrade should not have any side-effects of breaking snapshot isolation for readers. +- Also, users should have migrated to 0.16.x before upgrading to 1.x. We do not want to add read support for very old versions of hudi in 1.x(for eg 0.7.0). +- So, in an effort to bring everyone to latest hudi versions, 1.x reader will have full read capabilities for 0.16.x, but for older hudi versions, 1.x reader may not have full reader support. +The reocmmended guideline is to upgrade all readers and writers to 0.16.x. and then slowly start upgrading to 1.x(readers followed by writers). + +Before we dive in further, lets understand the format changes: + +## Format changes +### Table properties +- Payload class ➝ payload type. +- New metadata partitions could be added (optionally enabled) + +### MDT changes +- New MDT partitions are available in 1.x. MDT schema upgraded. +- RLI schema is upgraded to hold row position + +### Timeline: +- [storage changes] Completed write commits have completed times in the file name. +- [storage changes] Completed and inflight write commits are in avro format which were json in 0.x. +- We are switching the action type for clustering from “replace commit” to “cluster”. +- Similarly, for completed compaction, we are switching from “commit” to “compaction” in an effort to standardize actions for a given write operation. +- [storage changes] Timeline ➝ LST timeline. There is no archived timeline in 1.x +- [In-memory changes] HoodieInstant changes due to presence of completion time for completed HoodieInstants. + +### Filegroup/FileSlice changes: +- Log files contain delta commit time instead of base instant time. +- Log appends are disabled in 1.x. 
In other words, each log block is already appended to a new log file. +- File Slice determination logic for log files changed (in 0.x, we have base instant time in log files and its straight forward. In 1.x, we find completion time for a log file and find the base instant time (parsed from base files) which has the highest value lesser than the completion time of the log file). +- Log file ordering within a file slice. (in 0.x, we use base instant time ➝l log file versions ➝ write token) to order diff log files. in 1.x, we will be using completion time to order). + +### Log format changes: +- We have added new header types in 1.x. (IS_PARTIAL) + +## Changes to be ported over 0.16.x to support reading 1.x tables +### What will be supported +- For features introduced in 0.x, and tables written in 1.x, 0.16.0 reader should be able to provide consistent reads w/o any breakage. +### What will not be supported +- A 0.16 writer cannot write to a table that has been upgraded-to/created usi
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
nsivabalan commented on code in PR #11514: URL: https://github.com/apache/hudi/pull/11514#discussion_r1663037956 ## rfc/rfc-78/rfc-78.md: ## @@ -0,0 +1,220 @@ + +# RFC-76: [Bridge release for 1.x] + +## Proposers + +- @nsivabalan +- @vbalaji + +## Approvers + - @yihua + - @codope + +## Status + +JIRA: https://issues.apache.org/jira/browse/HUDI-7882 + +> Please keep the status updated in `rfc/README.md`. + +## Abstract + +[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful +re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming +years. It introduces lot of differentiating features for Apache Hudi. We released beta releases which was meant for +enthusiastic developers/users to give a try of advanced features. But as we are working towards 1.0 GA, we are proposing +a bridge release (0.16.0) for smoother migration for existing hudi users. + +## Objectives +Goal is to have a smooth migration experience for the users from 0.x to 1.0. We plan to have a 0.16.0 bridge release asking everyone to first migrate to 0.16.0 before they can upgrade to 1.x. + +- 1.x reader should be able to read 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. But for new features that was introduced in 1.x, we may not be able to support all of them. Will be calling out which new features may not work with 0.16.x reader. In this case, we explicitly request users to not turn on these features till readers are completely in 1.x. +- Document upgrade steps from 0.16.x to 1.x with limited user perceived latency. This will be auto upgrade, but document clearly what needs to be done. +- Downgrade from 1.x to 0.16.x documented with call outs on any functionality. 
+ +### Considerations when choosing Migration strategy +- While migration is happening, we want to allow readers to continue reading data. This means, we cannot employ a stop-the-world strategy when we are migrating. +All the actions that we are performing as part of table upgrade should not have any side-effects of breaking snapshot isolation for readers. +- Also, users should have migrated to 0.16.x before upgrading to 1.x. We do not want to add read support for very old versions of hudi in 1.x(for eg 0.7.0). +- So, in an effort to bring everyone to latest hudi versions, 1.x reader will have full read capabilities for 0.16.x, but for older hudi versions, 1.x reader may not have full reader support. +The reocmmended guideline is to upgrade all readers and writers to 0.16.x. and then slowly start upgrading to 1.x(readers followed by writers). + +Before we dive in further, lets understand the format changes: + +## Format changes +### Table properties +- Payload class ➝ payload type. +- New metadata partitions could be added (optionally enabled) + +### MDT changes +- New MDT partitions are available in 1.x. MDT schema upgraded. +- RLI schema is upgraded to hold row position + +### Timeline: +- [storage changes] Completed write commits have completed times in the file name. +- [storage changes] Completed and inflight write commits are in avro format which were json in 0.x. +- We are switching the action type for clustering from “replace commit” to “cluster”. +- Similarly, for completed compaction, we are switching from “commit” to “compaction” in an effort to standardize actions for a given write operation. +- [storage changes] Timeline ➝ LST timeline. There is no archived timeline in 1.x +- [In-memory changes] HoodieInstant changes due to presence of completion time for completed HoodieInstants. + +### Filegroup/FileSlice changes: +- Log files contain delta commit time instead of base instant time. +- Log appends are disabled in 1.x. 
In other words, each log block is already appended to a new log file. +- File Slice determination logic for log files changed (in 0.x, we have base instant time in log files and its straight forward. In 1.x, we find completion time for a log file and find the base instant time (parsed from base files) which has the highest value lesser than the completion time of the log file). +- Log file ordering within a file slice. (in 0.x, we use base instant time ➝l log file versions ➝ write token) to order diff log files. in 1.x, we will be using completion time to order). Review Comment: Or, anyway, we are adding a filterUncommittedLogs capability in the FSV for reading 1.x tables, so we should be good. But let's add tests for these.
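The filtering of uncommitted log files mentioned in the review comment above can be sketched roughly as follows. This is a minimal Python illustration, not Hudi code: `LogFile`, `filter_uncommitted_logs`, and the set of completed instant times are hypothetical names, and the sketch assumes (per the RFC text) that a 1.x log file name carries the delta commit time that wrote it.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class LogFile:
    """Hypothetical stand-in for a log file inside a file group."""
    path: str
    delta_commit_time: str  # instant time of the delta commit that wrote this file


def filter_uncommitted_logs(log_files: Iterable[LogFile],
                            completed_instants: Set[str]) -> List[LogFile]:
    """Keep only log files whose delta commit is completed on the timeline."""
    return [lf for lf in log_files if lf.delta_commit_time in completed_instants]


log_files = [
    LogFile("fg1_001.log", "001"),
    LogFile("fg1_002.log", "002"),  # written by a delta commit that never completed
    LogFile("fg1_003.log", "003"),
]
visible = filter_uncommitted_logs(log_files, completed_instants={"001", "003"})
```

With this shape, a log file left behind by a failed or rolled-back write never reaches the record reader, which is the behaviour the tests asked for above would need to pin down.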
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
nsivabalan commented on code in PR #11514: URL: https://github.com/apache/hudi/pull/11514#discussion_r1663036653 ## rfc/rfc-78/rfc-78.md: ## @@ -0,0 +1,220 @@ + +# RFC-76: [Bridge release for 1.x] + +## Proposers + +- @nsivabalan +- @vbalaji + +## Approvers + - @yihua + - @codope + +## Status + +JIRA: https://issues.apache.org/jira/browse/HUDI-7882 + +> Please keep the status updated in `rfc/README.md`. + +## Abstract + +[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful +re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming +years. It introduces lot of differentiating features for Apache Hudi. We released beta releases which was meant for +enthusiastic developers/users to give a try of advanced features. But as we are working towards 1.0 GA, we are proposing +a bridge release (0.16.0) for smoother migration for existing hudi users. + +## Objectives +Goal is to have a smooth migration experience for the users from 0.x to 1.0. We plan to have a 0.16.0 bridge release asking everyone to first migrate to 0.16.0 before they can upgrade to 1.x. + +- 1.x reader should be able to read 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. But for new features that was introduced in 1.x, we may not be able to support all of them. Will be calling out which new features may not work with 0.16.x reader. In this case, we explicitly request users to not turn on these features till readers are completely in 1.x. +- Document upgrade steps from 0.16.x to 1.x with limited user perceived latency. This will be auto upgrade, but document clearly what needs to be done. +- Downgrade from 1.x to 0.16.x documented with call outs on any functionality. 
+ +### Considerations when choosing Migration strategy +- While migration is happening, we want to allow readers to continue reading data. This means, we cannot employ a stop-the-world strategy when we are migrating. +All the actions that we are performing as part of table upgrade should not have any side-effects of breaking snapshot isolation for readers. +- Also, users should have migrated to 0.16.x before upgrading to 1.x. We do not want to add read support for very old versions of hudi in 1.x(for eg 0.7.0). +- So, in an effort to bring everyone to latest hudi versions, 1.x reader will have full read capabilities for 0.16.x, but for older hudi versions, 1.x reader may not have full reader support. +The reocmmended guideline is to upgrade all readers and writers to 0.16.x. and then slowly start upgrading to 1.x(readers followed by writers). + +Before we dive in further, lets understand the format changes: + +## Format changes +### Table properties +- Payload class ➝ payload type. +- New metadata partitions could be added (optionally enabled) + +### MDT changes +- New MDT partitions are available in 1.x. MDT schema upgraded. +- RLI schema is upgraded to hold row position + +### Timeline: +- [storage changes] Completed write commits have completed times in the file name. +- [storage changes] Completed and inflight write commits are in avro format which were json in 0.x. +- We are switching the action type for clustering from “replace commit” to “cluster”. +- Similarly, for completed compaction, we are switching from “commit” to “compaction” in an effort to standardize actions for a given write operation. +- [storage changes] Timeline ➝ LST timeline. There is no archived timeline in 1.x +- [In-memory changes] HoodieInstant changes due to presence of completion time for completed HoodieInstants. + +### Filegroup/FileSlice changes: +- Log files contain delta commit time instead of base instant time. +- Log appends are disabled in 1.x. 
In other words, each log block is already appended to a new log file. +- File Slice determination logic for log files changed (in 0.x, we have base instant time in log files and its straight forward. In 1.x, we find completion time for a log file and find the base instant time (parsed from base files) which has the highest value lesser than the completion time of the log file). +- Log file ordering within a file slice. (in 0.x, we use base instant time ➝l log file versions ➝ write token) to order diff log files. in 1.x, we will be using completion time to order). Review Comment: During the 1.x upgrade, we need to ensure we do not use the new way of rolling back log files for a failed write, because there could be a concurrent 0.16.x reader reading the table.
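The two log file orderings described in the quoted hunk can be contrasted with a small sketch. This is illustrative Python under stated assumptions, not the Hudi implementation: the `LogFile` fields and the `order_0x` / `order_1x` helpers are hypothetical, instant times are assumed to be fixed-width strings that compare correctly as text, and the completion time is assumed to have been resolved from the timeline already.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class LogFile:
    """Hypothetical log file descriptor carrying both 0.x and 1.x ordering inputs."""
    name: str
    base_instant_time: str  # 0.x: parsed from the log file name
    version: int            # 0.x: log file version
    write_token: str        # 0.x: write token
    completion_time: str    # 1.x: completion time of the writing delta commit


def order_0x(log_files: Iterable[LogFile]) -> List[LogFile]:
    """0.x style: base instant time, then log version, then write token."""
    return sorted(log_files,
                  key=lambda lf: (lf.base_instant_time, lf.version, lf.write_token))


def order_1x(log_files: Iterable[LogFile]) -> List[LogFile]:
    """1.x style: order purely by completion time."""
    return sorted(log_files, key=lambda lf: lf.completion_time)


a = LogFile("a.log", "001", 1, "1-0-1", completion_time="005")
b = LogFile("b.log", "001", 2, "1-0-1", completion_time="004")

names_0x = [lf.name for lf in order_0x([b, a])]
names_1x = [lf.name for lf in order_1x([a, b])]
```

In this made-up example the writer of `b.log` finished before the writer of `a.log`, so the two schemes disagree on the order, which is why a reader has to know which scheme applies to a given file slice.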
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
nsivabalan commented on code in PR #11514: URL: https://github.com/apache/hudi/pull/11514#discussion_r1663035029 ## rfc/rfc-78/rfc-78.md: ## @@ -0,0 +1,220 @@ + +# RFC-76: [Bridge release for 1.x] + +## Proposers + +- @nsivabalan +- @vbalaji + +## Approvers + - @yihua + - @codope + +## Status + +JIRA: https://issues.apache.org/jira/browse/HUDI-7882 + +> Please keep the status updated in `rfc/README.md`. + +## Abstract + +[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful +re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming +years. It introduces lot of differentiating features for Apache Hudi. We released beta releases which was meant for +enthusiastic developers/users to give a try of advanced features. But as we are working towards 1.0 GA, we are proposing +a bridge release (0.16.0) for smoother migration for existing hudi users. + +## Objectives +Goal is to have a smooth migration experience for the users from 0.x to 1.0. We plan to have a 0.16.0 bridge release asking everyone to first migrate to 0.16.0 before they can upgrade to 1.x. + +- 1.x reader should be able to read 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. But for new features that was introduced in 1.x, we may not be able to support all of them. Will be calling out which new features may not work with 0.16.x reader. In this case, we explicitly request users to not turn on these features till readers are completely in 1.x. +- Document upgrade steps from 0.16.x to 1.x with limited user perceived latency. This will be auto upgrade, but document clearly what needs to be done. +- Downgrade from 1.x to 0.16.x documented with call outs on any functionality. 
+ +### Considerations when choosing Migration strategy +- While migration is happening, we want to allow readers to continue reading data. This means, we cannot employ a stop-the-world strategy when we are migrating. +All the actions that we are performing as part of table upgrade should not have any side-effects of breaking snapshot isolation for readers. +- Also, users should have migrated to 0.16.x before upgrading to 1.x. We do not want to add read support for very old versions of hudi in 1.x(for eg 0.7.0). +- So, in an effort to bring everyone to latest hudi versions, 1.x reader will have full read capabilities for 0.16.x, but for older hudi versions, 1.x reader may not have full reader support. +The reocmmended guideline is to upgrade all readers and writers to 0.16.x. and then slowly start upgrading to 1.x(readers followed by writers). + +Before we dive in further, lets understand the format changes: + +## Format changes +### Table properties +- Payload class ➝ payload type. +- New metadata partitions could be added (optionally enabled) + +### MDT changes +- New MDT partitions are available in 1.x. MDT schema upgraded. +- RLI schema is upgraded to hold row position + +### Timeline: +- [storage changes] Completed write commits have completed times in the file name. +- [storage changes] Completed and inflight write commits are in avro format which were json in 0.x. +- We are switching the action type for clustering from “replace commit” to “cluster”. +- Similarly, for completed compaction, we are switching from “commit” to “compaction” in an effort to standardize actions for a given write operation. +- [storage changes] Timeline ➝ LST timeline. There is no archived timeline in 1.x +- [In-memory changes] HoodieInstant changes due to presence of completion time for completed HoodieInstants. + +### Filegroup/FileSlice changes: +- Log files contain delta commit time instead of base instant time. +- Log appends are disabled in 1.x. 
In other words, each log block is already appended to a new log file. Review Comment: sure.
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
nsivabalan commented on code in PR #11514: URL: https://github.com/apache/hudi/pull/11514#discussion_r1663032167 ## rfc/rfc-78/rfc-78.md: ## @@ -0,0 +1,220 @@ + +# RFC-76: [Bridge release for 1.x] + +## Proposers + +- @nsivabalan +- @vbalaji + +## Approvers + - @yihua + - @codope + +## Status + +JIRA: https://issues.apache.org/jira/browse/HUDI-7882 + +> Please keep the status updated in `rfc/README.md`. + +## Abstract + +[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful +re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming +years. It introduces lot of differentiating features for Apache Hudi. We released beta releases which was meant for +enthusiastic developers/users to give a try of advanced features. But as we are working towards 1.0 GA, we are proposing +a bridge release (0.16.0) for smoother migration for existing hudi users. + +## Objectives +Goal is to have a smooth migration experience for the users from 0.x to 1.0. We plan to have a 0.16.0 bridge release asking everyone to first migrate to 0.16.0 before they can upgrade to 1.x. + +- 1.x reader should be able to read 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. But for new features that was introduced in 1.x, we may not be able to support all of them. Will be calling out which new features may not work with 0.16.x reader. In this case, we explicitly request users to not turn on these features till readers are completely in 1.x. +- Document upgrade steps from 0.16.x to 1.x with limited user perceived latency. This will be auto upgrade, but document clearly what needs to be done. +- Downgrade from 1.x to 0.16.x documented with call outs on any functionality. 
+ +### Considerations when choosing Migration strategy +- While migration is happening, we want to allow readers to continue reading data. This means, we cannot employ a stop-the-world strategy when we are migrating. +All the actions that we are performing as part of table upgrade should not have any side-effects of breaking snapshot isolation for readers. +- Also, users should have migrated to 0.16.x before upgrading to 1.x. We do not want to add read support for very old versions of hudi in 1.x(for eg 0.7.0). +- So, in an effort to bring everyone to latest hudi versions, 1.x reader will have full read capabilities for 0.16.x, but for older hudi versions, 1.x reader may not have full reader support. +The reocmmended guideline is to upgrade all readers and writers to 0.16.x. and then slowly start upgrading to 1.x(readers followed by writers). + +Before we dive in further, lets understand the format changes: + +## Format changes +### Table properties +- Payload class ➝ payload type. +- New metadata partitions could be added (optionally enabled) + +### MDT changes +- New MDT partitions are available in 1.x. MDT schema upgraded. +- RLI schema is upgraded to hold row position + +### Timeline: +- [storage changes] Completed write commits have completed times in the file name. +- [storage changes] Completed and inflight write commits are in avro format which were json in 0.x. +- We are switching the action type for clustering from “replace commit” to “cluster”. +- Similarly, for completed compaction, we are switching from “commit” to “compaction” in an effort to standardize actions for a given write operation. +- [storage changes] Timeline ➝ LST timeline. There is no archived timeline in 1.x Review Comment: Are you talking about the 1.x reader or the 0.16.x reader?
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
nsivabalan commented on code in PR #11514: URL: https://github.com/apache/hudi/pull/11514#discussion_r1663030969 ## rfc/rfc-78/rfc-78.md: ## @@ -0,0 +1,220 @@ + +# RFC-76: [Bridge release for 1.x] + +## Proposers + +- @nsivabalan +- @vbalaji + +## Approvers + - @yihua + - @codope + +## Status + +JIRA: https://issues.apache.org/jira/browse/HUDI-7882 + +> Please keep the status updated in `rfc/README.md`. + +## Abstract + +[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful +re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming +years. It introduces lot of differentiating features for Apache Hudi. We released beta releases which was meant for +enthusiastic developers/users to give a try of advanced features. But as we are working towards 1.0 GA, we are proposing +a bridge release (0.16.0) for smoother migration for existing hudi users. + +## Objectives +Goal is to have a smooth migration experience for the users from 0.x to 1.0. We plan to have a 0.16.0 bridge release asking everyone to first migrate to 0.16.0 before they can upgrade to 1.x. + +- 1.x reader should be able to read 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. But for new features that was introduced in 1.x, we may not be able to support all of them. Will be calling out which new features may not work with 0.16.x reader. In this case, we explicitly request users to not turn on these features till readers are completely in 1.x. +- Document upgrade steps from 0.16.x to 1.x with limited user perceived latency. This will be auto upgrade, but document clearly what needs to be done. +- Downgrade from 1.x to 0.16.x documented with call outs on any functionality. 
+ +### Considerations when choosing Migration strategy +- While migration is happening, we want to allow readers to continue reading data. This means, we cannot employ a stop-the-world strategy when we are migrating. +All the actions that we are performing as part of table upgrade should not have any side-effects of breaking snapshot isolation for readers. +- Also, users should have migrated to 0.16.x before upgrading to 1.x. We do not want to add read support for very old versions of hudi in 1.x(for eg 0.7.0). +- So, in an effort to bring everyone to latest hudi versions, 1.x reader will have full read capabilities for 0.16.x, but for older hudi versions, 1.x reader may not have full reader support. +The reocmmended guideline is to upgrade all readers and writers to 0.16.x. and then slowly start upgrading to 1.x(readers followed by writers). + +Before we dive in further, lets understand the format changes: + +## Format changes +### Table properties +- Payload class ➝ payload type. Review Comment: If it's custom, where are the custom configs picked up from? It's a table property in 0.x, but what about 1.x? Let's validate the flows.
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
nsivabalan commented on code in PR #11514: URL: https://github.com/apache/hudi/pull/11514#discussion_r1663024868 ## rfc/rfc-78/rfc-78.md: ## @@ -0,0 +1,220 @@ + +# RFC-76: [Bridge release for 1.x] + +## Proposers + +- @nsivabalan +- @vbalaji + +## Approvers + - @yihua + - @codope + +## Status + +JIRA: https://issues.apache.org/jira/browse/HUDI-7882 + +> Please keep the status updated in `rfc/README.md`. + +## Abstract + +[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful +re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming +years. It introduces lot of differentiating features for Apache Hudi. We released beta releases which was meant for +enthusiastic developers/users to give a try of advanced features. But as we are working towards 1.0 GA, we are proposing +a bridge release (0.16.0) for smoother migration for existing hudi users. + +## Objectives +Goal is to have a smooth migration experience for the users from 0.x to 1.0. We plan to have a 0.16.0 bridge release asking everyone to first migrate to 0.16.0 before they can upgrade to 1.x. + +- 1.x reader should be able to read 0.16.x tables w/o any loss in functionality and no data inconsistencies. +- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. But for new features that was introduced in 1.x, we may not be able to support all of them. Will be calling out which new features may not work with 0.16.x reader. In this case, we explicitly request users to not turn on these features till readers are completely in 1.x. +- Document upgrade steps from 0.16.x to 1.x with limited user perceived latency. This will be auto upgrade, but document clearly what needs to be done. +- Downgrade from 1.x to 0.16.x documented with call outs on any functionality. 
+ +### Considerations when choosing Migration strategy +- While migration is happening, we want to allow readers to continue reading data. This means, we cannot employ a stop-the-world strategy when we are migrating. +All the actions that we are performing as part of table upgrade should not have any side-effects of breaking snapshot isolation for readers. +- Also, users should have migrated to 0.16.x before upgrading to 1.x. We do not want to add read support for very old versions of hudi in 1.x(for eg 0.7.0). +- So, in an effort to bring everyone to latest hudi versions, 1.x reader will have full read capabilities for 0.16.x, but for older hudi versions, 1.x reader may not have full reader support. +The reocmmended guideline is to upgrade all readers and writers to 0.16.x. and then slowly start upgrading to 1.x(readers followed by writers). + +Before we dive in further, lets understand the format changes: + +## Format changes +### Table properties +- Payload class ➝ payload type. +- New metadata partitions could be added (optionally enabled) + +### MDT changes +- New MDT partitions are available in 1.x. MDT schema upgraded. +- RLI schema is upgraded to hold row position + +### Timeline: +- [storage changes] Completed write commits have completed times in the file name. +- [storage changes] Completed and inflight write commits are in avro format which were json in 0.x. +- We are switching the action type for clustering from “replace commit” to “cluster”. +- Similarly, for completed compaction, we are switching from “commit” to “compaction” in an effort to standardize actions for a given write operation. +- [storage changes] Timeline ➝ LST timeline. There is no archived timeline in 1.x +- [In-memory changes] HoodieInstant changes due to presence of completion time for completed HoodieInstants. + +### Filegroup/FileSlice changes: +- Log files contain delta commit time instead of base instant time. +- Log appends are disabled in 1.x. 
In other words, each log block is already appended to a new log file. +- File Slice determination logic for log files changed (in 0.x, we have base instant time in log files and its straight forward. In 1.x, we find completion time for a log file and find the base instant time (parsed from base files) which has the highest value lesser than the completion time of the log file). +- Log file ordering within a file slice. (in 0.x, we use base instant time ➝l log file versions ➝ write token) to order diff log files. in 1.x, we will be using completion time to order). + +### Log format changes: +- We have added new header types in 1.x. (IS_PARTIAL) + +## Changes to be ported over 0.16.x to support reading 1.x tables +### What will be supported +- For features introduced in 0.x, and tables written in 1.x, 0.16.0 reader should be able to provide consistent reads w/o any breakage. +### What will not be supported +- A 0.16 writer cannot write to a table that has been upgraded-to/created usi
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
nsivabalan commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1663022140

## rfc/rfc-78/rfc-78.md: ##
@@ -0,0 +1,220 @@
+
+# RFC-76: [Bridge release for 1.x]
+
+## Proposers
+
+- @nsivabalan
+- @vbalaji
+
+## Approvers
+ - @yihua
+ - @codope
+
+## Status
+
+JIRA: https://issues.apache.org/jira/browse/HUDI-7882
+
+> Please keep the status updated in `rfc/README.md`.
+
+## Abstract
+
+[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful
+re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming
+years. It introduces a lot of differentiating features for Apache Hudi. We released beta releases which were meant for
+enthusiastic developers/users to try out the advanced features. But as we are working towards 1.0 GA, we are proposing
+a bridge release (0.16.0) for smoother migration for existing Hudi users.
+
+## Objectives
+The goal is to have a smooth migration experience for users moving from 0.x to 1.0. We plan to have a 0.16.0 bridge release, asking everyone to first migrate to 0.16.0 before they can upgrade to 1.x.
+
+- 1.x reader should be able to read 0.16.x tables w/o any loss in functionality and no data inconsistencies.
+- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed. But for new features that were introduced in 1.x, we may not be able to support all of them. We will call out which new features may not work with the 0.16.x reader. In this case, we explicitly request users to not turn on these features till readers are completely in 1.x.
+- Document upgrade steps from 0.16.x to 1.x with limited user perceived latency. This will be an auto upgrade, but document clearly what needs to be done.
+- Downgrade from 1.x to 0.16.x documented with call outs on any functionality.
+
+### Considerations when choosing Migration strategy
+- While migration is happening, we want to allow readers to continue reading data. This means we cannot employ a stop-the-world strategy when we are migrating.
+All the actions that we are performing as part of table upgrade should not have any side-effects of breaking snapshot isolation for readers.
+- Also, users should have migrated to 0.16.x before upgrading to 1.x. We do not want to add read support for very old versions of hudi in 1.x (e.g. 0.7.0).
+- So, in an effort to bring everyone to latest hudi versions, 1.x reader will have full read capabilities for 0.16.x, but for older hudi versions, 1.x reader may not have full reader support.
+The recommended guideline is to upgrade all readers and writers to 0.16.x, and then slowly start upgrading to 1.x (readers followed by writers).
+
+Before we dive in further, let's understand the format changes:
+
+## Format changes
+### Table properties
+- Payload class ➝ payload type.
+- New metadata partitions could be added (optionally enabled)
+
+### MDT changes
+- New MDT partitions are available in 1.x. MDT schema upgraded.
+- RLI schema is upgraded to hold row position
+
+### Timeline:
+- [storage changes] Completed write commits have completed times in the file name.
+- [storage changes] Completed and inflight write commits are in avro format which were json in 0.x.
+- We are switching the action type for clustering from “replace commit” to “cluster”.
+- Similarly, for completed compaction, we are switching from “commit” to “compaction” in an effort to standardize actions for a given write operation.
+- [storage changes] Timeline ➝ LSM timeline. There is no archived timeline in 1.x
+- [In-memory changes] HoodieInstant changes due to presence of completion time for completed HoodieInstants.
+
+### Filegroup/FileSlice changes:
+- Log files contain delta commit time instead of base instant time.
+- Log appends are disabled in 1.x. In other words, each log block is already appended to a new log file.
+- File Slice determination logic for log files changed (in 0.x, we have base instant time in log files and it's straightforward. In 1.x, we find completion time for a log file and find the base instant time (parsed from base files) which has the highest value less than the completion time of the log file).
+- Log file ordering within a file slice (in 0.x, we use base instant time ➝ log file versions ➝ write token to order different log files; in 1.x, we will be using completion time to order).
+
+### Log format changes:
+- We have added new header types in 1.x. (IS_PARTIAL)
+
+## Changes to be ported over 0.16.x to support reading 1.x tables
+### What will be supported
+- For features introduced in 0.x, and tables written in 1.x, 0.16.0 reader should be able to provide consistent reads w/o any breakage.
+### What will not be supported
+- A 0.16 writer cannot write to a table that has been upgraded-to/created usi
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
nsivabalan commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1663014814

## rfc/rfc-78/rfc-78.md: ##
@@ -191,15 +198,15 @@ We need to add back these older methods to HoodieDefaultTimeline, so that we do
- e. We need to port code changes which account for uncommitted log files. In 0.16.0, from FSV standpoint, all log files (including partially failed) are valid. We let the log record reader ignore the partially failed log files. But in 1.x, log files could be rolled back (deleted) by a concurrent rollback. So, the FSV should ensure it ignores the uncommitted log files.
- f. Looks like we only have to make changes/appends to a few methods in HoodieDefaultTimeline. But one option to potentially consider (if we see ourselves making a lot of changes to 0.16.0 HoodieDefaultTimeline in order to support reading 1.x tables): we could introduce Hoodie016xDefaultTimeline and Hoodie1xDefaultTimeline and use the delegate pattern to delegate to either of the timelines. Using the hoodie table version we could instantiate (internally to HoodieDefaultTimeline) either Hoodie016xDefaultTimeline or Hoodie1xDefaultTimeline. But for now, we don’t feel we need to take this route. Just calling it out as an option depending on the changes we have to make.
+- g. Since log file ordering logic will differ between 0.16.x and 1.x, and we have a table upgrade commit time, we could leverage that to use different log file ordering logic based on whether a file slice's base instant time is less or greater than the table upgrade commit time.

### FileSystemView changes
Once all timeline changes are incorporated, we need to account for FSV changes. The major change, as called out earlier, is the completion time based log files from the 1.x writer and the log file naming referring to delta commit time instead of base commit time. So, w/o any changes to FSV/HoodieFileGroup/HoodieFileSlice code snippets, our file slice deduction logic might be wrong. Each log file could be tagged as its own file slice since each has a different base commit time (that's how 0.16.x HoodieLogFile would deduce it). So, we might have to port over the CompletionTimeQueryView class and associated logic to 0.16.0. So, file slice deduction logic in 0.16.0 will be pretty much similar to the 1.x reader. But for the log file ordering for log reading purposes, we do not need to maintain parity with the 1.x reader as of yet (unless we make NBCC default with MDT).

Assuming the 1.x reader and 1.x FSV should be able to read data written in older hudi versions, we also have a potential option here to avoid making nit-picky changes, similar to the option called out earlier. We could instantiate two different FSVs depending on the table version. If the table version is 7 (0.16.0), we could instantiate maybe FSV_V0, and if the table version is 8 (1.0.0), we could instantiate FSV_V1. That way we don’t break/regress any of the 0.16.0 read functionality in the interest of supporting 1.x table reads. We should strive to cover all scenarios and not let any bugs creep in, but we are trying to see if we can keep the changes isolated so that battle tested code (FSV) is not touched or changed for the purpose of supporting 1.x table reads. If we run into any bugs with 1.x reads, we could ask users to not upgrade any of the writers to 1.x and stick with 0.16.0 until we have, say, 1.0.1 or something. But it would be really bad if we break 0.16.0 table reads in some edge case. Just calling this out as one of the safe options to upgrade.
- Pending exploration:
-How partially failed log files are ignored in 1.x. I see all log files are accounted for while building FSV.
+1. We removed special suffixes to MDT operations in 1.x. We need to test the flow and flesh out details, if anything is to be added to the 0.16.x reader.

Review Comment:
Need to understand whether the new commit time generation logic is foolproof. What if a concurrent ingestion in the data table coincidentally generates the same commit time?
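To make the file slice deduction concrete, here is a minimal Python sketch (not Hudi code) of the 1.x rule stated in the RFC: a log file belongs to the file slice whose base instant time is the highest one less than the log file's completion time. `LogFile`, `assign_to_file_slices`, and the short fixed-width timestamps are hypothetical stand-ins chosen for illustration; whatever gets ported along with CompletionTimeQueryView will have to handle cases this sketch deliberately skips, such as log files with no preceding base file.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class LogFile:
    """Hypothetical stand-in for a 1.x log file (not the Hudi HoodieLogFile class)."""
    name: str
    delta_commit_time: str          # instant time embedded in the 1.x log file name
    completion_time: Optional[str]  # None while the delta commit is not completed


def assign_to_file_slices(base_instant_times: List[str],
                          log_files: List[LogFile]) -> Dict[str, List[str]]:
    """Group log files under the greatest base instant time that is
    strictly less than the log file's completion time.

    Timestamps are compared as strings, which is only valid when they
    are fixed-width (an assumption of this sketch).
    """
    bases = sorted(base_instant_times)
    slices: Dict[str, List[str]] = {b: [] for b in bases}
    for lf in log_files:
        if lf.completion_time is None:
            continue  # uncommitted log file: ignored by the view
        # bisect_left returns how many base instants are < completion time
        idx = bisect_left(bases, lf.completion_time)
        if idx == 0:
            continue  # simplification: no base file precedes this log file
        slices[bases[idx - 1]].append(lf.name)
    return slices
```

For example, with base instants `"100"` and `"300"`, a log file with delta commit time `"250"` that completed at `"310"` lands in the `"300"` slice. Reading the instant time off the file name alone, as an unmodified 0.16.x reader would, cannot produce that grouping.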
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
nsivabalan commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1663011966

## rfc/rfc-78/rfc-78.md: ##
@@ -179,6 +183,9 @@ Let’s reiterate what we need to support w/ 0.16.0 reader.
On a high level, we need to ensure commit metadata in either format (avro or json) is supported. And “cluster” and completed “compaction”s need to be readable in 0.16.0 reader.
- But the challenging part is, for every commit metadata, we might have to deserialize to avro and on exception try json. We could deduce the format using the completion file name, but as per current code layering, deserialization methods do not know the file name (the method takes byte[]).
- Similarly for clustering commits, unless we have some kind of watermark, we have to keep considering replace commits as well in the FSV building logic to ensure we do not miss any clustering commits.
+- To be decided: We also need to use different LogFileComparators depending on the file slice's base instant time. If the file slice's base instant time is < table upgrade commit time, we use the older log file comparator to order log files. But if the file slice's base instant time is > table upgrade commit time, we have to use the new log file comparator (completion time). The tricky part is if a file slice contains a mix of log files.
+ This fix definitely needs to go into 1.x, but whether we want to port this change to 0.16.x or not is yet to be discussed and decided.
Let's zoom in a bit to see what will happen if a single file slice could contain a mix of log files using the 1.x reader (this is a basic requirement to support 0.16.x tables in 1.x).

Review Comment:
Which means the log file comparator logic from 1.x needs to be ported to the 0.16.x reader.
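The "deserialize to avro and on exception try json" idea from the diff above could look roughly like the sketch below. It is written in Python for brevity and uses no Hudi or Avro APIs: `decode_binary_stub` is a made-up stand-in for an Avro decoder (it only checks a fake `BIN1` prefix), and `decode_with_fallback` and `UnreadableCommitMetadata` are hypothetical names. The only point is the control flow: the method sees raw bytes, not the file name, so it tries the newer format first and falls back to the older one.

```python
import json
from typing import Callable, List, Sequence


class UnreadableCommitMetadata(Exception):
    """Raised when no decoder in the chain can parse the payload."""


def decode_with_fallback(payload: bytes,
                         decoders: Sequence[Callable[[bytes], dict]]) -> dict:
    """Try each decoder in order and return the first successful result."""
    errors: List[str] = []
    for decode in decoders:
        try:
            return decode(payload)
        except Exception as exc:  # a real reader would catch narrower types
            errors.append(f"{decode.__name__}: {exc}")
    raise UnreadableCommitMetadata("; ".join(errors))


def decode_binary_stub(payload: bytes) -> dict:
    # Stand-in for an Avro decoder: accepts only a made-up magic prefix.
    if not payload.startswith(b"BIN1"):
        raise ValueError("missing magic prefix")
    return {"format": "binary", "body": payload[4:].decode("utf-8")}


def decode_json(payload: bytes) -> dict:
    # 0.x style commit metadata is JSON text.
    parsed = json.loads(payload.decode("utf-8"))
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object")
    return {"format": "json", **parsed}
```

The cost called out in the RFC is visible here: every JSON payload pays for one failed decode attempt first, which is why deducing the format from the completion file name would be preferable if the layering allowed it.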
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
nsivabalan commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1663010943

## rfc/rfc-78/rfc-78.md: ##
@@ -179,6 +183,9 @@ Let’s reiterate what we need to support w/ 0.16.0 reader.
On a high level, we need to ensure commit metadata in either format (avro or json) is supported. And “cluster” and completed “compaction”s need to be readable in 0.16.0 reader.
- But the challenging part is, for every commit metadata, we might have to deserialize to avro and on exception try json. We could deduce the format using the completion file name, but as per current code layering, deserialization methods do not know the file name (the method takes byte[]).
- Similarly for clustering commits, unless we have some kind of watermark, we have to keep considering replace commits as well in the FSV building logic to ensure we do not miss any clustering commits.
+- To be decided: We also need to use different LogFileComparators depending on the file slice's base instant time. If the file slice's base instant time is < table upgrade commit time, we use the older log file comparator to order log files. But if the file slice's base instant time is > table upgrade commit time, we have to use the new log file comparator (completion time). The tricky part is if a file slice contains a mix of log files.
+ This fix definitely needs to go into 1.x, but whether we want to port this change to 0.16.x or not is yet to be discussed and decided.
Let's zoom in a bit to see what will happen if a single file slice could contain a mix of log files using the 1.x reader (this is a basic requirement to support 0.16.x tables in 1.x).

Review Comment:
We need to fix the 1.x reader to enforce completion time based log file ordering for a file slice. After the fix, from our understanding, the same logic should work for a file slice completely written in 0.x, because the completion time will match for all log files, and then we should use the log version to determine the ordering. We need to have a lot of tests covering all these scenarios.
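A small Python sketch of the two orderings discussed in this thread, under stated assumptions. None of the names below are Hudi APIs: `LogFile`, `legacy_key`, `completion_time_key`, and `order_log_files` are hypothetical. The legacy key follows the 0.x order from the RFC (log version, then write token, since the base instant time is constant within a slice); the new key orders by completion time and falls back to log version when completion times tie, as the review comment suggests. Treating a base instant time equal to the upgrade commit time as "new" is an assumption of the sketch, since the RFC only covers `<` and `>`.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class LogFile:
    """Hypothetical stand-in for a log file inside one file slice."""
    name: str
    log_version: int
    write_token: str
    completion_time: str  # fixed-width string, so string comparison is safe


def legacy_key(lf: LogFile) -> Tuple:
    # 0.x ordering within a slice: log version, then write token.
    return (lf.log_version, lf.write_token)


def completion_time_key(lf: LogFile) -> Tuple:
    # 1.x ordering: completion time first; log version breaks ties.
    return (lf.completion_time, lf.log_version, lf.write_token)


def order_log_files(base_instant_time: str,
                    upgrade_commit_time: str,
                    log_files: List[LogFile]) -> List[LogFile]:
    """Pick the ordering from the slice's base instant time relative to the
    table upgrade commit time (equality handled as 'new': an assumption)."""
    key: Callable[[LogFile], Tuple]
    if base_instant_time < upgrade_commit_time:
        key = legacy_key
    else:
        key = completion_time_key
    return sorted(log_files, key=key)
```

This does not settle the tricky case raised in the diff, a slice created before the upgrade that later receives 1.x log files. There the selection is made per slice, so such a slice would still be ordered by log version only, which is exactly the scenario the tests mentioned above need to cover.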
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]
nsivabalan commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1662991098

## rfc/rfc-78/rfc-78.md: ##
@@ -145,7 +147,9 @@ This will be an automatic upgrade for users when they start using 1.x hudi libra
- No changes to log reader.
- Check custom payload class in table properties and switch to payload type.
- Trigger compaction for latest file slices. We do not want a single file slice having a mix of log files from 0.x and log files from 1.x. So, we will trigger a full compaction
-of the table to ensure all latest file slices has just the base files.
+of the table to ensure all latest file slices has just the base files.
+ - Let's dissect and see what is needed to avoid requiring the full compaction. In general, we plan to add a table config to track the commit time (more on this later in this doc) when the upgrade was done.
+So, using the upgrade commit time, we should be able to use a different log file comparator to order log files within a given file slice.

Review Comment:
The 0.16.x reader is not going to order log files based on completion time and will only be ordering based on log version, even for 1.x tables. Which means, even for a single file slice having a mix of 0.x log files and 1.x log files, we should be good here.

File slice determination: HoodieLogFile.getBaseInstantTime() has to work for both kinds of log files (0.x log files and 1.x log files). If we ensure this is intact, we should be good. From skimming 1.x master, it should work OOB for 0.x log files, but let's test it out.