Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]

2024-07-13 Thread via GitHub


nsivabalan commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1676778486


##
rfc/rfc-78/rfc-78.md:
##
@@ -0,0 +1,339 @@
+
+# RFC-78: [Bridge release for 1.x]
+
+## Proposers
+
+- @nsivabalan
+- @vbalaji
+
+## Approvers
+- @yihua
+- @codope
+
+## Status
+
+JIRA: https://issues.apache.org/jira/browse/HUDI-7882
+
+> Please keep the status updated in `rfc/README.md`.
+
+## Abstract
+
+[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful
+re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming
+years. It introduces many differentiating features for Apache Hudi; see the
+[release page](https://hudi.apache.org/releases/release-1.0.0-beta1) for more info. The beta1 and beta2 releases were meant for
+interested developers and users to try out some of the advanced features. As we work towards 1.0 GA, we are proposing
+a bridge release (0.16.0) to smooth the migration for existing Hudi users.
+
+## Objectives
+The goal is a smooth migration experience for users moving from 0.x to 1.0. We plan a 0.16.0 bridge release and ask everyone to first migrate to 0.16.0 before upgrading to 1.x.
+
+A typical organization might have a medallion architecture (bronze, silver, and gold layers) running thousands of Hudi pipelines, where silver pipelines read bronze tables and gold pipelines read silver tables.
+For this layout, here is how a typical migration might look without a bridge release:
+
+a. Existing pipelines are on 0.15.x (bronze, silver, gold).
+b. Migrate gold pipelines to 1.x.
+- We must strictly migrate only gold to 1.x first, because a 0.15.0 reader may not be able to read 1.x Hudi tables. If we migrate any silver pipeline to 1.x before the entire gold layer is migrated, we might end up in a situation
+where a 0.15.0 reader (gold) reads a 1.x table (silver), which might lead to failures. So we have to migrate pipelines in a specific order.
+c. Once all of gold is migrated to 1.x, we can move all of silver to 1.x.
+d. Once all gold and silver pipelines are migrated to 1.x, we can finally move all of bronze to 1.x.
+
+In the end, we would have migrated all existing Hudi pipelines from 0.15.0 to 1.x.
+But as you can see, the migration requires coordination on the order in which pipelines move. In a very large organization, we may not have good control over downstream consumers.
+Hence, coordinating and orchestrating the entire migration workflow might be challenging.
+
+To ease the migration workflow to 1.x, we are introducing 0.16.0 as a bridge release.
+
+Here are the objectives of this bridge release:
+
+- A 1.x reader should be able to read 0.14.x to 0.16.x tables without any loss in functionality and without data inconsistencies.
+- 0.16.x should be able to read 1.x tables, with some limitations. For features ported over from 0.x, there should be no loss in functionality.
+But for new features introduced in 1.x, we may not be able to support all of them. We will call out which new features may not work with a 0.16.x reader.
+- For such features, we explicitly ask users not to turn them on until all readers are completely migrated to 1.x, so as not to break any readers.
+
+Connecting back to our example above, let's see how the migration might look for an existing user.
+
+a. Existing pipelines are on 0.15.x (bronze, silver, gold).
+b. Migrate pipelines to 0.16.0 in any order; there are no constraints on which pipeline should be migrated first.
+c. Ensure all pipelines, both readers and writers, are on 0.16.0.
+d. Start migrating pipelines to 1.x in a rolling fashion. At this juncture, some pipelines could be on 1.x and others on 0.16.0, but since 0.16.x
+can read 1.x tables, this is safe. Just do not enable new features such as non-blocking concurrency control (NBCC) yet.
+e. Migrate all remaining 0.16.0 pipelines to 1.x.
+f. Once all readers and writers are on 1.x, it is safe to enable any new features (such as NBCC) on 1.x tables.
+
+As you can see, the bridge release relaxes the company/org-wide coordination needed to migrate gold before silver or bronze. The only requirement to keep track of
+is that all pipelines must be migrated completely to 0.16.x before starting the migration to 1.x.
+
+So, here are the objectives of this RFC with the bridge release.
+- A 1.x reader should be able to read 0.14.x to 0.16.x tables without any loss in functionality and without data inconsistencies.
+- 0.16.x should be able to read 1.x tables, with some limitations. For features ported over from 0.x, there should be no loss in functionality.
+  But for new features introduced in 1.x, we may not be able to support all of them. We will call out which new
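
The ordering argument in the RFC can be checked mechanically. The following is a minimal, hypothetical Java model, not Hudi code: the `Version` enum, `canRead`, and `isSafe` are illustrative names, and the compatibility rules encode only what the RFC text states (1.x readers handle 0.14.x to 0.16.x tables; 0.16.x readers handle 1.x tables only while 1.x-only features such as NBCC stay disabled; a 0.15.x reader is assumed unable to read 1.x tables). It also assumes, as step b of the bridge plan implies, that 0.x-written tables are readable by any 0.15+ reader, and that each pipeline writes one table and reads only its upstream layer.

```java
import java.util.List;
import java.util.Map;

public class BridgeMigrationCheck {

    // Hypothetical release lines from the RFC's example (not Hudi's table versions).
    enum Version { V0_15, V0_16, V1_X }

    /**
     * Assumed compatibility rules, taken from the RFC's stated objectives:
     * - tables written by 0.x are readable by every reader in this model;
     * - a 1.x reader can read 1.x tables;
     * - a 0.16.x reader can read 1.x tables only while 1.x-only features are off;
     * - a 0.15.x reader cannot read 1.x tables.
     */
    static boolean canRead(Version reader, Version writer, boolean newFeaturesEnabled) {
        if (writer != Version.V1_X) {
            return true;
        }
        if (reader == Version.V1_X) {
            return true;
        }
        if (reader == Version.V0_16) {
            return !newFeaturesEnabled;
        }
        return false;
    }

    /** True if every pipeline can read every upstream table it consumes. */
    static boolean isSafe(Map<String, Version> versions,
                          Map<String, List<String>> upstreams,
                          boolean newFeaturesEnabled) {
        for (Map.Entry<String, List<String>> e : upstreams.entrySet()) {
            Version reader = versions.get(e.getKey());
            for (String upstream : e.getValue()) {
                if (!canRead(reader, versions.get(upstream), newFeaturesEnabled)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Medallion layout: silver reads bronze, gold reads silver.
        Map<String, List<String>> upstreams = Map.of(
            "bronze", List.of(),
            "silver", List.of("bronze"),
            "gold", List.of("silver"));

        // Without the bridge: silver moved to 1.x while gold is still on 0.15.x.
        Map<String, Version> noBridge = Map.of(
            "bronze", Version.V0_15, "silver", Version.V1_X, "gold", Version.V0_15);
        System.out.println("0.15 gold, 1.x silver: " + isSafe(noBridge, upstreams, false));

        // With the bridge: everything on 0.16.x first, then silver rolls to 1.x.
        Map<String, Version> bridge = Map.of(
            "bronze", Version.V0_16, "silver", Version.V1_X, "gold", Version.V0_16);
        System.out.println("0.16 gold, 1.x silver: " + isSafe(bridge, upstreams, false));
        System.out.println("same, new features on: " + isSafe(bridge, upstreams, true));
    }
}
```

Running `main` prints `false`, `true`, `false`: a 0.15.x gold layer reading a 1.x silver table is unsafe, the same rolling upgrade is safe once every pipeline is on 0.16.x first, and it becomes unsafe again if 1.x-only features are enabled early. The single `newFeaturesEnabled` flag is a simplification; a real rollout would presumably track feature enablement per table.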

Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]

2024-07-13 Thread via GitHub


nsivabalan commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1676771229


Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]

2024-07-13 Thread via GitHub


nsivabalan commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1676770629


Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]

2024-07-13 Thread via GitHub


nsivabalan commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1676770241


Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]

2024-07-11 Thread via GitHub


danny0405 commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1675307482


Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]

2024-07-11 Thread via GitHub


danny0405 commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1675305938


Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]

2024-07-11 Thread via GitHub


danny0405 commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1675303077


##
rfc/rfc-78/rfc-78.md:
##
@@ -0,0 +1,339 @@

Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]

2024-07-11 Thread via GitHub


danny0405 commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1675301492


##
rfc/rfc-78/rfc-78.md:
##
@@ -0,0 +1,339 @@

Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]

2024-07-11 Thread via GitHub


danny0405 commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1675296125


##
rfc/rfc-78/rfc-78.md:
##
@@ -0,0 +1,339 @@

Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]

2024-07-11 Thread via GitHub


danny0405 commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1675266281


##
rfc/rfc-78/rfc-78.md:
##
@@ -0,0 +1,339 @@
+
+# RFC-78: [Bridge release for 1.x]
+
+## Proposers
+
+- @nsivabalan
+- @vbalaji
+
+## Approvers
+ - @yihua
+ - @codope
+
+## Status
+
+JIRA: https://issues.apache.org/jira/browse/HUDI-7882
+
+> Please keep the status updated in `rfc/README.md`.
+
+## Abstract
+
+[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful
+re-imagination of the transactional database layer in Hudi, built to power continued innovation across the community in the coming
+years. It introduces many differentiating features for Apache Hudi. Feel free to check out the
+[release page](https://hudi.apache.org/releases/release-1.0.0-beta1) for more info. The beta1 and beta2 releases were meant for
+interested developers and users to try out some of the advanced features. As we work towards 1.0 GA, we are proposing
+a bridge release (0.16.0) to smooth the migration for existing Hudi users.
+
+## Objectives
+The goal is a smooth migration experience for users moving from 0.x to 1.0. We plan to ship a 0.16.0 bridge release and ask everyone to first migrate to 0.16.0 before upgrading to 1.x.
+
+A typical organization might have a medallion architecture (bronze, silver and gold layers) running thousands of Hudi pipelines.
+For this layout, here is how a typical migration might look without a bridge release:
+
+a. Existing pipelines are on 0.15.x (bronze, silver, gold).
+b. Migrate the gold pipelines to 1.x.
+- Only gold may be migrated to 1.x at this stage, because a 0.15.0 reader may not be able to read 1.x Hudi tables. If any silver pipeline is migrated to 1.x before the entire gold layer is,
+a 0.15.0 reader (gold) could end up reading a 1.x table (silver), which might lead to failures. So pipelines have to be migrated in a specific order.
+c. Once all of gold is on 1.x, move all of silver to 1.x.
+d. Once all gold and silver pipelines are on 1.x, finally move all of bronze to 1.x.

Review Comment:
Like you mentioned, there might be issues after the upgrade, so a user may want to downgrade the 1.x table back to 0.x. That gives us a scenario where the old reader needs to read the new files (or do we just force a restore to the point where the old table's commits end)?
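The ordering constraint in the quoted migration steps, and why a 0.16.0 bridge relaxes it, can be sketched as a read-compatibility check between each downstream reader and the upstream table it consumes. This is a hypothetical illustration, not Hudi code: the version labels, `can_read`, `broken_reads`, the two-edge medallion layout, and the assumption that 0.16.x still writes 0.x-format tables are all introduced here and are not taken from the RFC.

```python
# Hypothetical model (not Hudi code): release lines simplified to three labels.
V015, V016, V1X = "0.15", "0.16", "1.x"


def can_read(reader: str, writer: str, new_1x_features: bool = False) -> bool:
    """Can a pipeline on `reader` read a table written by a pipeline on `writer`?"""
    if writer in (V015, V016):
        # Assumption: 0.15.x and 0.16.x both write 0.x-format tables, which
        # every reader modelled here (including 1.x) can read.
        return True
    if reader == V1X:
        return True
    if reader == V016:
        # Bridge release reads 1.x tables unless 1.x-only features (e.g. NBCC) are on.
        return not new_1x_features
    # A 0.15.x reader may not be able to read a 1.x table.
    return False


# Hypothetical medallion layout: each downstream layer reads the previous layer's table.
EDGES = [("bronze", "silver"), ("silver", "gold")]


def broken_reads(versions: dict, new_1x_features: bool = False) -> list:
    """Return (upstream, downstream) pairs whose reader cannot read the upstream table."""
    return [
        (up, down)
        for up, down in EDGES
        if not can_read(versions[down], versions[up], new_1x_features)
    ]


if __name__ == "__main__":
    # Without a bridge: moving silver to 1.x before gold breaks the gold reader.
    print(broken_reads({"bronze": V015, "silver": V1X, "gold": V015}))  # [('silver', 'gold')]
    # With the bridge: a rolling upgrade in any order is fine while new features stay off.
    print(broken_reads({"bronze": V016, "silver": V1X, "gold": V016}))  # []
```

Setting `new_1x_features=True` in the bridge scenario reports `("silver", "gold")` again, which mirrors the advice to keep features such as NBCC disabled until every reader is on 1.x.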



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]

2024-07-09 Thread via GitHub


nsivabalan commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1671424828


##
rfc/rfc-78/rfc-78.md:
##
@@ -0,0 +1,301 @@

Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]

2024-07-09 Thread via GitHub


nsivabalan commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1671419636


##
rfc/rfc-78/rfc-78.md:
##
@@ -0,0 +1,301 @@

Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]

2024-07-09 Thread via GitHub


nsivabalan commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1671419636


##
rfc/rfc-78/rfc-78.md:
##
@@ -0,0 +1,301 @@
+
+# RFC-76: [Bridge release for 1.x]
+
+## Proposers
+
+- @nsivabalan
+- @vbalaji
+
+## Approvers
+ - @yihua
+ - @codope
+
+## Status
+
+JIRA: https://issues.apache.org/jira/browse/HUDI-7882
+
+> Please keep the status updated in `rfc/README.md`.
+
+## Abstract
+
+[Hudi 
1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md)
 is a powerful 
+re-imagination of the transactional database layer in Hudi to power continued 
innovation across the community in the coming 
+years. It introduces lot of differentiating features for Apache Hudi. Feel 
free to checkout the 
+[release page](https://hudi.apache.org/releases/release-1.0.0-beta1) for more 
info. We had beta1 and beta2 releases which was meant for 
+interested developers/users to give a spin on some of the  advanced features. 
But as we are working towards 1.0 GA, we are proposing 
+a bridge release (0.16.0) for smoother migration for existing hudi users. 
+
+## Objectives 
+Goal is to have a smooth migration experience for the users from 0.x to 1.0. 
We plan to have a 0.16.0 bridge release asking everyone to first migrate to 
0.16.0 before they can upgrade to 1.x. 
+
+A typical organization might have a medallion architecture deployed to run 
1000s of Hudi pipelines i.e. bronze, silver and gold layer. 
+For this layout of pipelines, here is how a typical migration might look 
like(w/o a bridge release)
+
+a. Existing pipelines are in 0.15.x. (bronze, silver, gold) 
+b. Migrate gold pipelines to 1.x. 
+- We need to strictly migrate only gold to 1x. Bcoz, a 0.15.0 reader may not 
be able to read 1.x hudi tables. So, if we migrate any of silver pipelines to 
1.x before migrating entire gold layer, we might end up in a situation, 
+where a 0.15.0 reader (gold) might end up reading 1.x table (silver). This 
might lead to failures. So, we have to follow certain order in which we migrate 
pipelines. 
+c. Once all of gold is migrated to 1.x, we can move all of silver to 1.x. 
+d. Once all of gold and silver pipelines are migrated to 1.x, finally we can 
move all of bronze to 1.x.
+
+In the end, we would have migrated all of existing hudi pipelines from 0.15.0 
to 1.x. 
+But as you could see, we need some coordination with which we need to migrate. 
And in a very large organization, sometimes we may not have good control over 
downstream consumers. 
+Hence, coordinating entire migration workflow and orchestrating the same might 
be challenging.
+
+Hence, to ease the migration workflow to 1.x, we are introducing 0.16.0 as a bridge release.
+
+Here are the objectives of this bridge release:
+
+- A 1.x reader should be able to read 0.14.x to 0.16.x tables without any loss in functionality and without data inconsistencies.
+- A 0.16.x reader should be able to read 1.x tables, with some limitations. For features carried over from 0.x, no loss in functionality should be guaranteed.
+However, some of the new features introduced in 1.x may not be supported. We will call out which new features may not work with a 0.16.x reader.
+- For such features, we explicitly request users not to turn them on until all readers have been completely migrated to 1.x, so as not to break any readers.
+
+Connecting back to our example above, let's see how the migration might look for an existing user.
+
+a. Existing pipelines are on 0.15.x (bronze, silver, gold).
+b. Migrate pipelines to 0.16.0 (in any order; there are no constraints on which pipeline should be migrated first).
+c. Ensure all pipelines are on 0.16.0 (both readers and writers).
+d. Start migrating pipelines to 1.x in a rolling fashion. At this juncture, some pipelines could be on 1.x and others on 0.16.0, but since 0.16.x
+can read 1.x tables, this is fine. Just do not enable new features like non-blocking concurrency control (NBCC) yet.
+e. Migrate all remaining 0.16.0 pipelines to 1.x.
+f. Once all readers and writers are on 1.x, it is safe to enable any new features (like NBCC) on 1.x tables.
+
+As you can see, the company/org-wide requirement to migrate gold before silver or bronze is relaxed with the bridge release. The only requirement to keep track of
+is to migrate all pipelines completely to 0.16.x before starting to migrate to 1.x.
+
+So, here are the objectives of this RFC with the bridge release.
+- A 1.x reader should be able to read 0.14.x to 0.16.x tables without any loss in functionality and without data inconsistencies.
+- A 0.16.x reader should be able to read 1.x tables, with some limitations. For features carried over from 0.x, no loss in functionality should be guaranteed.
+  However, some of the new features introduced in 1.x may not be supported. We will call out which new features

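The compatibility objectives and the migration steps (a to f) above can be expressed as a few checks. This is a minimal sketch, assuming versions are modelled as `(major, minor)` tuples and that pre-0.16 readers can read only 0.x tables. `reader_can_read`, `Pipeline`, `can_start_1x_rollout` and `can_enable_new_features` are hypothetical names, not Hudi APIs.

```python
from __future__ import annotations

from dataclasses import dataclass


def reader_can_read(reader: tuple[int, int], table: tuple[int, int]) -> bool:
    """Hypothetical read-compatibility check mirroring the objectives above."""
    if reader[0] == 1:
        # 1.x readers: 1.x tables plus tables written by 0.14.x to 0.16.x.
        return table[0] == 1 or (table[0] == 0 and 14 <= table[1] <= 16)
    if reader == (0, 16):
        # 0.16.x bridge reader: 0.x and 1.x tables. This assumes no 1.x-only
        # feature (e.g. NBCC) is enabled yet; see can_enable_new_features.
        return True
    # Assumption: older 0.x readers can only read 0.x tables.
    return table[0] == 0


@dataclass
class Pipeline:
    name: str
    version: tuple[int, int]  # Hudi version used by this pipeline


def can_start_1x_rollout(pipelines: list[Pipeline]) -> bool:
    # Step c: every reader and writer is on 0.16.x (or already on 1.x).
    return all(p.version == (0, 16) or p.version[0] == 1 for p in pipelines)


def can_enable_new_features(pipelines: list[Pipeline]) -> bool:
    # Step f: 1.x-only features are enabled only once everything runs 1.x.
    return all(p.version[0] == 1 for p in pipelines)


# Mid-rollout state from step d: silver already on 1.x, the rest on 0.16.
fleet = [Pipeline("bronze", (0, 16)), Pipeline("silver", (1, 0)), Pipeline("gold", (0, 16))]
print(can_start_1x_rollout(fleet), can_enable_new_features(fleet))  # prints: True False
```

The point of keeping the two checks separate is that the rollout gate (all on 0.16.x) and the feature gate (all on 1.x) are reached at different times.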
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]

2024-07-09 Thread via GitHub


nsivabalan commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1671414686


##
rfc/rfc-78/rfc-78.md:
##
@@ -0,0 +1,301 @@
+
+# RFC-78: [Bridge release for 1.x]
+
+## Proposers
+
+- @nsivabalan
+- @vbalaji
+
+## Approvers
+ - @yihua
+ - @codope
+
+## Status
+
+JIRA: https://issues.apache.org/jira/browse/HUDI-7882
+
+> Please keep the status updated in `rfc/README.md`.
+
+## Abstract
+
+[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful 
+re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming 
+years. It introduces a lot of differentiating features for Apache Hudi. Feel free to check out the 
+[release page](https://hudi.apache.org/releases/release-1.0.0-beta1) for more info. We had beta1 and beta2 releases, which were meant for 
+interested developers/users to give some of the advanced features a spin. But as we are working towards 1.0 GA, we are proposing 
+a bridge release (0.16.0) for a smoother migration for existing Hudi users. 
+
+## Objectives 
+The goal is to provide a smooth migration experience for users moving from 0.x to 1.0. We plan to have a 0.16.0 bridge release and ask everyone to first migrate to 0.16.0 before they upgrade to 1.x. 
+
+A typical organization might have a medallion architecture deployed to run 1000s of Hudi pipelines, i.e. bronze, silver and gold layers. 
+For this layout of pipelines, here is how a typical migration might look (without a bridge release):
+
+a. Existing pipelines are on 0.15.x (bronze, silver, gold). 
+b. Migrate gold pipelines to 1.x. 
+- We must strictly migrate only gold to 1.x first, because a 0.15.0 reader may not be able to read 1.x Hudi tables. If we migrate any of the silver pipelines to 1.x before migrating the entire gold layer, we might end up in a situation 
+where a 0.15.0 reader (gold) reads a 1.x table (silver), which might lead to failures. So, we have to follow a certain order in which we migrate pipelines. 
+c. Once all of gold is migrated to 1.x, we can move all of silver to 1.x. 
+d. Once all of the gold and silver pipelines are migrated to 1.x, we can finally move all of bronze to 1.x.
+
+In the end, we would have migrated all existing Hudi pipelines from 0.15.0 to 1.x. 
+But as you can see, the migration has to be coordinated in a specific order, and in a very large organization we may not have good control over downstream consumers. 
+Hence, coordinating and orchestrating the entire migration workflow might be challenging.
+

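Without a bridge release, the ordering constraint in the gold ➝ silver ➝ bronze steps above amounts to upgrading every consumer before the pipeline whose table it reads. A minimal sketch, assuming the pipeline dependencies are known up front as a map from each pipeline to the pipelines that read its output table (a hypothetical input; this is not something Hudi tracks), using Python's standard `graphlib.TopologicalSorter`:

```python
from __future__ import annotations

from graphlib import TopologicalSorter


def no_bridge_upgrade_order(consumers: dict[str, set[str]]) -> list[str]:
    """Order pipelines so every consumer is upgraded before its producer.

    `consumers` maps a pipeline to the pipelines that read its output table.
    TopologicalSorter treats the mapped values as predecessors, so consumers
    are emitted before the pipeline whose table they read.
    """
    return list(TopologicalSorter(consumers).static_order())


def is_safe_without_bridge(on_1x: set[str], consumers: dict[str, set[str]]) -> bool:
    # A pipeline may write 1.x tables only if every reader of its table is
    # already on 1.x, since a 0.15.x reader may fail to read a 1.x table.
    return all(consumers.get(p, set()) <= on_1x for p in on_1x)


# Medallion example: silver reads bronze, gold reads silver.
medallion = {"bronze": {"silver"}, "silver": {"gold"}, "gold": set()}
print(no_bridge_upgrade_order(medallion))  # ['gold', 'silver', 'bronze']
```

With the bridge release, this ordering check is no longer needed: the only gate is that every pipeline is on 0.16.x before any of them moves to 1.x.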
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]

2024-07-09 Thread via GitHub


nsivabalan commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1671383928



Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]

2024-07-08 Thread via GitHub


nsivabalan commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1669278251


##
rfc/rfc-78/rfc-78.md:
##
@@ -0,0 +1,220 @@
+
+# RFC-78: [Bridge release for 1.x]
+
+## Proposers
+
+- @nsivabalan
+- @vbalaji
+
+## Approvers
+ - @yihua
+ - @codope
+
+## Status
+
+JIRA: https://issues.apache.org/jira/browse/HUDI-7882
+
+> Please keep the status updated in `rfc/README.md`.
+
+## Abstract
+
+[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful 
+re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming 
+years. It introduces a lot of differentiating features for Apache Hudi. We have published beta releases, which were meant for 
+enthusiastic developers/users to try out the advanced features. But as we are working towards 1.0 GA, we are proposing 
+a bridge release (0.16.0) for a smoother migration for existing Hudi users. 
+
+## Objectives 
+The goal is to provide a smooth migration experience for users moving from 0.x to 1.0. We plan to have a 0.16.0 bridge release and ask everyone to first migrate to 0.16.0 before they upgrade to 1.x.
+
+- A 1.x reader should be able to read 0.16.x tables without any loss in functionality and without data inconsistencies.
+- A 0.16.x reader should be able to read 1.x tables, with some limitations. For features carried over from 0.x, no loss in functionality should be guaranteed. However, some of the new features introduced in 1.x may not be supported; we will call out which new features may not work with a 0.16.x reader. For such features, we explicitly request users not to turn them on until all readers are completely on 1.x.
+- Document the upgrade steps from 0.16.x to 1.x with limited user-perceived latency. The upgrade will be automatic, but we will clearly document what needs to be done.

Review Comment:
   accommodated your suggestions.



##
rfc/rfc-78/rfc-78.md:
##
@@ -0,0 +1,220 @@
+- Document the downgrade from 1.x to 0.16.x, with call-outs on any affected functionality.

Review Comment:
   I have covered it under the section titled "1.0 ➝ 0.16.0 downgrade" later in this RFC.




Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]

2024-07-03 Thread via GitHub


danny0405 commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1665124619


##
rfc/rfc-78/rfc-78.md:
##
@@ -0,0 +1,220 @@
+### Considerations when choosing a migration strategy
+- While the migration is happening, we want to allow readers to continue reading data. This means we cannot employ a stop-the-world strategy while migrating. 
+None of the actions performed as part of the table upgrade should have side effects that break snapshot isolation for readers.
+- Also, users should have migrated to 0.16.x before upgrading to 1.x. We do not want to add read support in 1.x for very old versions of Hudi (e.g. 0.7.0). 
+- So, in an effort to bring everyone to the latest Hudi versions, the 1.x reader will have full read capabilities for 0.16.x, but for older Hudi versions, the 1.x reader may not have full read support. 
+The recommended guideline is to upgrade all readers and writers to 0.16.x, and then slowly start upgrading to 1.x (readers followed by writers). 
+
+Before we dive in further, let's understand the format changes:
+
+## Format changes
+### Table properties
+- Payload class ➝ payload type.
+- New metadata partitions could be added (optionally enabled)
+
+### MDT changes
+- New MDT partitions are available in 1.x. MDT schema upgraded.
+- RLI schema is upgraded to hold row position
+
+### Timeline:
+- [storage changes] Completed write commits have their completion times in the file name.
+- [storage changes] Completed and inflight write commits are in Avro format; they were JSON in 0.x.
+- We are switching the action type for clustering from “replace commit” to “cluster”.
+- Similarly, for completed compaction, we are switching from “commit” to “compaction”, in an effort to standardize actions for a given write operation.
+- [storage changes] Timeline ➝ LSM timeline. There is no archived timeline in 1.x.
+- [In-memory changes] HoodieInstant changes due to the presence of completion time for completed HoodieInstants.
+
+### Filegroup/FileSlice changes:
+- Log files contain the delta commit time instead of the base instant time.
+- Log appends are disabled in 1.x. In other words, each log block is written to a new log file.
+- The file slice determination logic for log files has changed. In 0.x, the base instant time is in the log file name, so it is straightforward. In 1.x, we find the completion time of a log file and pick the base instant time (parsed from base files) with the highest value less than the completion time of the log file.
+- Log file ordering within a file slice has changed. In 0.x, we order log files by base instant time ➝ log file version ➝ write token; in 1.x, we order them by completion time.
+
+### Log format changes:
+- We have added new header types in 1.x. (IS_PARTIAL)
+
+## Changes to be ported over to 0.16.x to support reading 1.x tables
+### What will be supported
+- For features introduced in 0.x, a 0.16.0 reader should be able to provide consistent reads of tables written by 1.x without any breakage.
+### What will not be supported
+- A 0.16 writer cannot write to a table that has been upgraded-to/created 

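The 1.x file slice assignment and log ordering rules from the Filegroup/FileSlice changes can be sketched as follows. This is a minimal illustration, not Hudi's implementation: it assumes fixed-width timestamp strings (so lexicographic order matches time order), and `LogFile`, `file_slice_base_1x`, `order_logs_0x` and `order_logs_1x` are hypothetical names.

```python
from __future__ import annotations

import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class LogFile:
    instant_time: str     # 0.x: base instant time; 1.x: delta commit time
    version: int          # log file version (used for 0.x ordering)
    write_token: str
    completion_time: str  # completion time of the delta commit that wrote it


def file_slice_base_1x(log: LogFile, base_instants: list[str]) -> str | None:
    # 1.x: pick the largest base instant time strictly less than the log's
    # completion time; None means no base file precedes this log file.
    ordered = sorted(base_instants)
    idx = bisect.bisect_left(ordered, log.completion_time)
    return ordered[idx - 1] if idx > 0 else None


def order_logs_0x(logs: list[LogFile]) -> list[LogFile]:
    # 0.x: base instant time, then log file version, then write token.
    return sorted(logs, key=lambda f: (f.instant_time, f.version, f.write_token))


def order_logs_1x(logs: list[LogFile]) -> list[LogFile]:
    # 1.x: completion time only.
    return sorted(logs, key=lambda f: f.completion_time)


bases = ["20240101000000000", "20240102000000000"]
# Started before the second base instant but completed after it.
late = LogFile("20240101120000000", 1, "1-0-1", "20240102000000500")
early = LogFile("20240101130000000", 1, "1-0-1", "20240101130500000")
print(file_slice_base_1x(late, bases))  # 20240102000000000
```

The `late` log file shows why completion time matters: its delta commit started before the second base instant, yet it is assigned to the newer file slice because it completed afterwards.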
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]

2024-07-03 Thread via GitHub


danny0405 commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1665122005


##
rfc/rfc-78/rfc-78.md:
##
@@ -0,0 +1,220 @@
+### Timeline:
+- [storage changes] Completed write commits have their completion times in the file name.
+- [storage changes] Completed and inflight write commits are in Avro format; they were JSON in 0.x.
+- We are switching the action type for clustering from “replace commit” to “cluster”.
+- Similarly, for completed compaction, we are switching from “commit” to “compaction”, in an effort to standardize actions for a given write operation.
+- [storage changes] Timeline ➝ LSM timeline. There is no archived timeline in 1.x.
+- [In-memory changes] HoodieInstant changes due to the presence of completion time for completed HoodieInstants.

Review Comment:
   We have not introduced completion-time-based incremental queries for Spark yet, but for the GA release we might need a compatible solution for migration.
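To illustrate the completion-time change in the timeline, here is a minimal sketch that parses instant file names in both layouts and maps the renamed actions back to their 0.x names. The file-name patterns and action strings are assumptions based on the description in this RFC, and the sketch deliberately ignores special cases in the real timeline layout. `parse_instant_file` and `to_0x_action` are hypothetical helpers.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Assumed, simplified file-name layouts (not taken from the Hudi code base):
#   0.x completed: "<instant>.<action>"
#   1.x completed: "<requested>_<completion>.<action>"
#   pending:       "<instant>.<action>.requested" / "<instant>.<action>.inflight"
_INSTANT_FILE = re.compile(r"^(\d+)(?:_(\d+))?\.([a-z]+)(?:\.(requested|inflight))?$")

# Action renames described in the RFC (1.x name -> 0.x name); the actual
# action constants in 1.x may be spelled differently.
_ACTION_1X_TO_0X = {"cluster": "replacecommit", "compaction": "commit"}


@dataclass(frozen=True)
class Instant:
    requested_time: str
    completion_time: str | None  # only present in 1.x completed file names
    action: str
    state: str                   # "requested", "inflight" or "completed"


def parse_instant_file(name: str) -> Instant:
    match = _INSTANT_FILE.match(name)
    if match is None:
        raise ValueError(f"not a timeline instant file: {name}")
    requested, completion, action, state = match.groups()
    return Instant(requested, completion, action, state or "completed")


def to_0x_action(action: str) -> str:
    # Map a renamed 1.x action back to the action a 0.x-style reader knows.
    return _ACTION_1X_TO_0X.get(action, action)
```

For example, `parse_instant_file("20240101000000000_20240101000005000.deltacommit")` yields a completed instant whose completion time comes straight from the file name, whereas a 0.x name such as `20240101000000000.commit` yields `completion_time=None`.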



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]

2024-07-03 Thread via GitHub


danny0405 commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1665122268


##
rfc/rfc-78/rfc-78.md:
##
@@ -0,0 +1,220 @@
+### Filegroup/FileSlice changes:
+- Log files contain the delta commit time instead of the base instant time.
+- Log appends are disabled in 1.x. In other words, each log block is written to a new log file.

Review Comment:
   +1




Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]

2024-07-03 Thread via GitHub


danny0405 commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1665121084


##
rfc/rfc-78/rfc-78.md:
##
@@ -0,0 +1,220 @@
+
+# RFC-76: [Bridge release for 1.x]
+
+## Proposers
+
+- @nsivabalan
+- @vbalaji
+
+## Approvers
+ - @yihua
+ - @codope
+
+## Status
+
+JIRA: https://issues.apache.org/jira/browse/HUDI-7882
+
+> Please keep the status updated in `rfc/README.md`.
+
+## Abstract
+
+[Hudi 
1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md)
 is a powerful 
+re-imagination of the transactional database layer in Hudi to power continued 
innovation across the community in the coming 
+years. It introduces lot of differentiating features for Apache Hudi. We 
released beta releases which was meant for 
+enthusiastic developers/users to give a try of advanced features. But as we 
are working towards 1.0 GA, we are proposing 
+a bridge release (0.16.0) for smoother migration for existing hudi users. 
+
+## Objectives 
+Goal is to have a smooth migration experience for the users from 0.x to 1.0. 
We plan to have a 0.16.0 bridge release asking everyone to first migrate to 
0.16.0 before they can upgrade to 1.x.
+
+- 1.x reader should be able to read 0.16.x tables w/o any loss in 
functionality and no data inconsistencies.
+- 0.16.x should have read capability for 1.x tables w/ some limitations. For 
features ported over from 0.x, no loss in functionality should be guaranteed. 
But for new features that was introduced in 1.x, we may not be able to support 
all of them. Will be calling out which new features may not work with 0.16.x 
reader. In this case, we explicitly request users to not turn on these features 
till readers are completely in 1.x.
+- Document upgrade steps from 0.16.x to 1.x with limited user perceived 
latency. This will be auto upgrade, but document clearly what needs to be done.
+- Downgrade from 1.x to 0.16.x documented with call outs on any functionality.
+
+### Considerations when choosing a migration strategy
+- While migration is in progress, readers should be able to continue reading data. This means we cannot employ a stop-the-world strategy during migration.
+None of the actions we perform as part of the table upgrade may break snapshot isolation for readers.
+- Users should have migrated to 0.16.x before upgrading to 1.x. We do not want to add read support in 1.x for very old Hudi versions (e.g., 0.7.0).
+- So, in an effort to bring everyone to the latest Hudi versions, the 1.x reader will have full read capabilities for 0.16.x, but may not fully support reading tables from older Hudi versions.
+The recommended guideline is to upgrade all readers and writers to 0.16.x, and then gradually upgrade to 1.x (readers followed by writers).
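
The recommended rollout order (all readers and writers to 0.16.x first, then readers to 1.x, then writers, and only then 1.x-only features) can be viewed as a gate over the versions currently deployed. A minimal sketch under stated assumptions: `Version`, `RolloutStep`, and `RolloutGate.nextStep` are hypothetical, and versions are reduced to a major/minor pair.

```java
import java.util.List;

// Hypothetical simplified version: only major and minor matter for this gate.
record Version(int major, int minor) implements Comparable<Version> {
    static final Version BRIDGE = new Version(0, 16);
    static final Version ONE_X = new Version(1, 0);

    @Override
    public int compareTo(Version o) {
        return major != o.major ? Integer.compare(major, o.major) : Integer.compare(minor, o.minor);
    }

    boolean isAtLeast(Version other) {
        return compareTo(other) >= 0;
    }
}

// Hypothetical rollout steps, in the order recommended above.
enum RolloutStep { UPGRADE_ALL_TO_0_16, UPGRADE_READERS_TO_1X, UPGRADE_WRITERS_TO_1X, ENABLE_1X_ONLY_FEATURES }

final class RolloutGate {
    private RolloutGate() {}

    private static boolean allAtLeast(List<Version> versions, Version min) {
        return versions.stream().allMatch(v -> v.isAtLeast(min));
    }

    // Returns the next step that is safe to take given the deployed reader and writer versions.
    static RolloutStep nextStep(List<Version> readers, List<Version> writers) {
        if (!allAtLeast(readers, Version.BRIDGE) || !allAtLeast(writers, Version.BRIDGE)) {
            return RolloutStep.UPGRADE_ALL_TO_0_16;
        }
        if (!allAtLeast(readers, Version.ONE_X)) {
            return RolloutStep.UPGRADE_READERS_TO_1X; // readers go first
        }
        if (!allAtLeast(writers, Version.ONE_X)) {
            return RolloutStep.UPGRADE_WRITERS_TO_1X;
        }
        return RolloutStep.ENABLE_1X_ONLY_FEATURES; // every reader is on 1.x
    }
}
```

Gating feature enablement on every reader being on 1.x mirrors the request not to turn on 1.x-only features until readers are fully upgraded.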
+
+Before we dive further, let's understand the format changes:
+
+## Format changes
+### Table properties
+- Payload class ➝ payload type.

Review Comment:
   Might not be related, but shouldn't `hoodie.record.merge.mode` be a table config instead of a write config?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]

2024-07-03 Thread via GitHub


nsivabalan commented on PR #11514:
URL: https://github.com/apache/hudi/pull/11514#issuecomment-2206976055

   Here is a glimpse of the changes I had to make to the 0.x timeline to support reading 1.x tables:
   https://github.com/apache/hudi/pull/11562 
   This is just a draft/hacky PR, in case you want to take a peek.





Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]

2024-07-03 Thread via GitHub


nsivabalan commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1664474919


##
rfc/rfc-78/rfc-78.md:
##
@@ -0,0 +1,220 @@
+## Format changes
+### Table properties
+- Payload class ➝ payload type.
+- New metadata partitions could be added (optionally enabled)
+
+### MDT changes
+- New MDT partitions are available in 1.x, and the MDT schema is upgraded.
+- The RLI schema is upgraded to hold the row position.
+
+### Timeline:
+- [storage changes] Completed write commits have their completion times in the file name.
+- [storage changes] Completed and inflight write commits are in Avro format; they were in JSON in 0.x.
+- We are switching the action type for clustering from “replace commit” to “cluster”.
+- Similarly, for completed compaction, we are switching from “commit” to “compaction”, in an effort to standardize actions for a given write operation.
+- [storage changes] Timeline ➝ LSM timeline. There is no archived timeline in 1.x.
+- [In-memory changes] HoodieInstant changes due to the presence of completion time for completed HoodieInstants.
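
To make the timeline changes concrete, the sketch below shows one way a 0.16.x reader could parse a completed instant file name that may carry a completion time, normalize the renamed actions, and tell Avro metadata apart from 0.x JSON. The `<requestedTime>_<completionTime>.<action>` naming pattern is an assumption for illustration, the clustering/compaction flags are assumed to come from the commit metadata, and the Avro check assumes the metadata is written as an Avro object container file (which, per the Avro specification, starts with `Obj` followed by byte 1). `TimelineNames` and `CompletedInstant` are hypothetical helpers, not Hudi classes.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Parsed view of a completed instant file name (hypothetical helper type).
record CompletedInstant(String requestedTime, Optional<String> completionTime, String action) {}

final class TimelineNames {
    private TimelineNames() {}

    // Assumed layouts: 0.x "<requested>.<action>", 1.x "<requested>_<completion>.<action>".
    private static final Pattern COMPLETED = Pattern.compile("^(\\d+)(?:_(\\d+))?\\.([a-z]+)$");

    // Returns empty for requested/inflight files (e.g. "x.commit.requested") and unknown names.
    static Optional<CompletedInstant> parse(String fileName) {
        Matcher m = COMPLETED.matcher(fileName);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new CompletedInstant(m.group(1), Optional.ofNullable(m.group(2)), m.group(3)));
    }

    // Maps 0.x action names to the 1.x names described above. Whether an instant is a
    // clustering or a compaction is assumed to be known from its commit metadata.
    static String normalizeAction(String action, boolean isClustering, boolean isCompaction) {
        if (action.equals("replacecommit") && isClustering) {
            return "cluster";
        }
        if (action.equals("commit") && isCompaction) {
            return "compaction";
        }
        return action;
    }

    // Avro object container files begin with 'O', 'b', 'j', 1; 0.x JSON metadata does not.
    static boolean looksLikeAvroContainer(byte[] content) {
        return content.length >= 4 && content[0] == 'O' && content[1] == 'b'
            && content[2] == 'j' && content[3] == 1;
    }
}
```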
+
+### Filegroup/FileSlice changes:
+- Log files contain delta commit time instead of base instant time.
+- Log appends are disabled in 1.x. In other words, each log block is already 
appended to a new log file.
+- File Slice determination logic for log files changed (in 0.x, we have base 
instant time in log files and its straight forward. In 1.x, we find completion 
time for a log file and find the base instant time (parsed from base files) 
which has the highest value lesser than the completion time of the log file).
+- Log file ordering within a file slice. (in 0.x, we use base instant time ➝l 
log file versions ➝ write token) to order diff log files. in 1.x, we will be 
using completion time to order).
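
The 1.x slice-assignment rule above (the greatest base instant time strictly less than the log file's completion time) maps directly onto `NavigableSet.lower`, and the two orderings map onto `Comparator` chains. A minimal sketch, assuming fixed-width timestamp strings so that lexicographic order equals time order; `LogFileInfo` and `FileSliceRules` are hypothetical simplifications, not Hudi classes.

```java
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;
import java.util.Optional;

// Hypothetical simplified log file view. instantTimeInName is the base instant time
// in 0.x file names and the delta commit time in 1.x file names.
record LogFileInfo(String instantTimeInName, int version, String writeToken, String completionTime) {}

final class FileSliceRules {
    private FileSliceRules() {}

    // 0.x ordering: base instant time, then log file version, then write token.
    static final Comparator<LogFileInfo> ORDER_0X =
        Comparator.comparing(LogFileInfo::instantTimeInName)
            .thenComparingInt(LogFileInfo::version)
            .thenComparing(LogFileInfo::writeToken);

    // 1.x ordering: completion time only.
    static final Comparator<LogFileInfo> ORDER_1X = Comparator.comparing(LogFileInfo::completionTime);

    // 1.x slice assignment: greatest base instant time strictly less than the log file's
    // completion time; empty when no base file precedes the log file.
    static Optional<String> owningBaseInstant(NavigableSet<String> baseInstantTimes, String logCompletionTime) {
        return Optional.ofNullable(baseInstantTimes.lower(logCompletionTime));
    }

    // Returns a new list ordered with the 0.x or 1.x rule.
    static List<LogFileInfo> sorted(List<LogFileInfo> logFiles, boolean oneXLayout) {
        return logFiles.stream().sorted(oneXLayout ? ORDER_1X : ORDER_0X).toList();
    }
}
```

Keeping the base instant times of a file group in a sorted set makes each lookup logarithmic; how 1.x actually indexes completion times is not specified here.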
+
+### Log format changes:
+- We have added new header types in 1.x (IS_PARTIAL).
+
+## Changes to be ported to 0.16.x to support reading 1.x tables
+### What will be supported
+- For features that exist in 0.x, a 0.16.0 reader should provide consistent reads of tables written by 1.x without any breakage.
+### What will not be supported

Review Comment:
   sure. 




Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]

2024-07-03 Thread via GitHub


nsivabalan commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1664463903


##
rfc/rfc-78/rfc-78.md:
##
@@ -0,0 +1,220 @@
+- A 0.16 writer cannot write to a table that has been upgraded-to/created 

Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]

2024-07-03 Thread via GitHub


nsivabalan commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1664459466


##
rfc/rfc-78/rfc-78.md:
##
@@ -0,0 +1,220 @@
+- So, in an effort to bring everyone to the latest Hudi versions, the 1.x reader will have full read capabilities for 0.16.x, but may not fully support reading tables from older Hudi versions.

Review Comment:
   updated the details






Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]

2024-07-02 Thread via GitHub


vinothchandar commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1663256899


##
rfc/rfc-78/rfc-78.md:
##
@@ -0,0 +1,220 @@
+- Document the upgrade steps from 0.16.x to 1.x with limited user-perceived latency. The upgrade will be automatic, but we will clearly document what needs to be done.

Review Comment:
   ```suggestion
   - Document steps for a rolling upgrade from 0.16.x to 1.x, with minimal downtime
   ```



##
rfc/rfc-78/rfc-78.md:
##
@@ -0,0 +1,220 @@
+- So, in an effort to bring everyone to the latest Hudi versions, the 1.x reader will have full read capabilities for 0.16.x, but may not fully support reading tables from older Hudi versions.

Review Comment:
   It may be good to document what works, but it's best to get everyone to 0.16. Or can those users choose to take downtime and upgrade directly?




Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]

2024-07-02 Thread via GitHub


nsivabalan commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1663046076


##
rfc/rfc-78/rfc-78.md:
##
@@ -0,0 +1,220 @@
+- A 0.16 writer cannot write to a table that has been upgraded-to/created 

Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]

2024-07-02 Thread via GitHub


nsivabalan commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1663043878


##
rfc/rfc-78/rfc-78.md:
##
@@ -0,0 +1,220 @@
+- [In-memory changes] HoodieInstant changes due to the presence of completion time for completed HoodieInstants.

Review Comment:
   Good point. We might need to solve this elegantly.






Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]

2024-07-02 Thread via GitHub


nsivabalan commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1663038794


##
rfc/rfc-78/rfc-78.md:
##
@@ -0,0 +1,220 @@
+
+# RFC-76: [Bridge release for 1.x]
+
+## Proposers
+
+- @nsivabalan
+- @vbalaji
+
+## Approvers
+ - @yihua
+ - @codope
+
+## Status
+
+JIRA: https://issues.apache.org/jira/browse/HUDI-7882
+
+> Please keep the status updated in `rfc/README.md`.
+
+## Abstract
+
+[Hudi 
1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md)
 is a powerful 
+re-imagination of the transactional database layer in Hudi to power continued 
innovation across the community in the coming 
+years. It introduces lot of differentiating features for Apache Hudi. We 
released beta releases which was meant for 
+enthusiastic developers/users to give a try of advanced features. But as we 
are working towards 1.0 GA, we are proposing 
+a bridge release (0.16.0) for smoother migration for existing hudi users. 
+
+## Objectives 
+Goal is to have a smooth migration experience for the users from 0.x to 1.0. 
We plan to have a 0.16.0 bridge release asking everyone to first migrate to 
0.16.0 before they can upgrade to 1.x.
+
+- A 1.x reader should be able to read 0.16.x tables without any loss of functionality and without data inconsistencies.
+- A 0.16.x reader should be able to read 1.x tables, with some limitations. For features that already exist in 0.x, there should be no loss of functionality. For new features introduced in 1.x, however, we may not be able to support all of them. We will call out which new features may not work with a 0.16.x reader, and we explicitly ask users not to turn on these features until all readers are on 1.x.
+- Document the upgrade steps from 0.16.x to 1.x, keeping user-perceived latency limited. The upgrade will be automatic, but we will clearly document what needs to be done.
+- Document the downgrade from 1.x to 0.16.x, calling out any functionality caveats.
+
+### Considerations when choosing a migration strategy
+- While the migration is in progress, readers should be able to continue reading data. This means we cannot employ a stop-the-world strategy during migration.
+None of the actions performed as part of the table upgrade should break snapshot isolation for readers.
+- Users should also have migrated to 0.16.x before upgrading to 1.x. We do not want to add read support in 1.x for very old Hudi versions (e.g., 0.7.0).
+- So, to bring everyone onto the latest Hudi versions, a 1.x reader will have full read capabilities for 0.16.x tables, but may not fully support tables written by older Hudi versions.
+The recommended guideline is to upgrade all readers and writers to 0.16.x, and then gradually upgrade to 1.x (readers first, followed by writers).
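The recommended order above (all readers and writers on 0.16.x, then readers on 1.x, then writers on 1.x) can be expressed as a pre-flight check. This is a minimal sketch under stated assumptions: `rollout_deviations`, the `(major, minor)` version tuples, and the component names are hypothetical and not part of Hudi; the check only mirrors the guideline in the text.

```python
# Hypothetical helper (not a Hudi API): checks a deployment against the
# recommended rollout order from this RFC: everything on 0.16.x first,
# then readers on 1.x, then writers on 1.x.
# Versions are modeled as (major, minor) tuples for simplicity.
from typing import Dict, List, Tuple

Version = Tuple[int, int]
BRIDGE: Version = (0, 16)
ONE_X: Version = (1, 0)


def rollout_deviations(readers: Dict[str, Version],
                       writers: Dict[str, Version]) -> List[str]:
    """Return reasons why this state deviates from the recommended order."""
    problems = []
    for role, group in (("reader", readers), ("writer", writers)):
        for name, (major, minor) in group.items():
            if (major, minor) < BRIDGE:
                problems.append(f"{role} {name} is on {major}.{minor}; "
                                "move it to 0.16.x first")
    writers_on_1x = sorted(n for n, v in writers.items() if v >= ONE_X)
    readers_pre_1x = sorted(n for n, v in readers.items() if v < ONE_X)
    if writers_on_1x and readers_pre_1x:
        problems.append(f"writers {writers_on_1x} are on 1.x before readers "
                        f"{readers_pre_1x}; upgrade readers first")
    return problems


# Example: a 0.16.x reader combined with a 1.x writer deviates from the guideline.
print(rollout_deviations({"query-engine": (0, 16)}, {"ingest-job": (1, 0)}))
```

A 1.x writer alongside 0.16.x readers is reported as a deviation rather than a hard error, since the RFC expects 0.16.x readers to read 1.x tables, with limitations.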
+
+Before we dive in further, let's understand the format changes:
+
+## Format changes
+### Table properties
+- Payload class ➝ payload type.
+- New metadata partitions could be added (optionally enabled)
+
+### MDT changes
+- New MDT partitions are available in 1.x, and the MDT schema is upgraded.
+- The RLI schema is upgraded to hold the row position.
+
+### Timeline:
+- [storage changes] Completed write commits have completion times in their file names.
+- [storage changes] Completed and inflight write commits are in Avro format; they were JSON in 0.x.
+- We are switching the action type for clustering from “replace commit” to “cluster”.
+- Similarly, for completed compaction, we are switching from “commit” to “compaction” in an effort to standardize actions for a given write operation.
+- [storage changes] Timeline ➝ LSM timeline. There is no archived timeline in 1.x.
+- [In-memory changes] HoodieInstant changes to account for the completion time of completed instants.
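Because completion times move into the file name, a 0.16.x reader has to recognize both naming layouts. The sketch below assumes the layouts `<instantTime>.<action>` (0.x) and `<requestedTime>_<completionTime>.<action>` (1.x) for completed instants; `parse_completed_instant` and `CompletedInstant` are hypothetical names, and the exact 1.x layout should be confirmed against the 1.x code.

```python
# Hypothetical sketch (not Hudi's parser). ASSUMPTION: completed instants
# are named "<instantTime>.<action>" in 0.x and
# "<requestedTime>_<completionTime>.<action>" in 1.x.
from typing import NamedTuple, Optional


class CompletedInstant(NamedTuple):
    requested_time: str
    completion_time: Optional[str]  # None for 0.x-style names
    action: str


def parse_completed_instant(file_name: str) -> CompletedInstant:
    """Split a completed-instant file name into its parts."""
    stem, _, action = file_name.partition(".")
    if not stem or not action:
        raise ValueError(f"not an instant file name: {file_name!r}")
    requested, sep, completed = stem.partition("_")
    return CompletedInstant(requested, completed if sep else None, action)


print(parse_completed_instant("20240702101500123_20240702101530456.commit"))
print(parse_completed_instant("20240702101500123.deltacommit"))
```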
+
+### Filegroup/FileSlice changes:
+- Log files contain the delta commit time instead of the base instant time.
+- Log appends are disabled in 1.x. In other words, each log block is written to a new log file.
+- The file slice determination logic for log files has changed. In 0.x, log files carry the base instant time, so this is straightforward. In 1.x, we find the completion time of a log file and then pick the base instant time (parsed from base files) with the highest value less than that completion time.
+- Log file ordering within a file slice has changed. In 0.x, we order log files by base instant time ➝ log file version ➝ write token. In 1.x, we will order them by completion time.
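The two file-slice rules above can be sketched in a few lines. `owning_base_instant` follows the rule in the text (the highest base instant time below the log file's completion time); the class and function names, and the use of plain fixed-width timestamp strings for instant times, are illustrative assumptions rather than Hudi APIs.

```python
# Hypothetical sketch of the two file-slice rules described above. Instant
# times are fixed-width timestamp strings, so string comparison matches
# chronological order. Names here are illustrative, not Hudi APIs.
import bisect
from typing import List, NamedTuple, Optional


class LogFile0x(NamedTuple):
    # Field order doubles as the 0.x sort key:
    # base instant time -> log file version -> write token.
    base_instant_time: str
    version: int
    write_token: str


class LogFile1x(NamedTuple):
    delta_commit_time: str  # carried in the 1.x log file name
    completion_time: str    # looked up from the timeline


def order_logs_0x(logs: List[LogFile0x]) -> List[LogFile0x]:
    return sorted(logs)  # tuple ordering follows the field order


def order_logs_1x(logs: List[LogFile1x]) -> List[LogFile1x]:
    return sorted(logs, key=lambda log: log.completion_time)


def owning_base_instant(base_instants: List[str],
                        log: LogFile1x) -> Optional[str]:
    """Greatest base instant time strictly less than the log's completion
    time, or None if no base instant precedes it."""
    ordered = sorted(base_instants)
    idx = bisect.bisect_left(ordered, log.completion_time)
    return ordered[idx - 1] if idx > 0 else None
```

One consequence the sketch makes visible: when delta commits complete out of order, 1.x ordering can differ from the order of the delta commit times embedded in the log file names.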
+
+### Log format changes:
+- We have added new header types in 1.x (IS_PARTIAL).
+
+## Changes to be ported to 0.16.x to support reading 1.x tables
+### What will be supported
+- For features introduced in 0.x, a 0.16.0 reader should provide consistent reads of tables written by 1.x, without any breakage.
+### What will not be supported
+- A 0.16 writer cannot write to a table that has been upgraded-to/created 

Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]

2024-07-02 Thread via GitHub


nsivabalan commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1663037956


##
rfc/rfc-78/rfc-78.md:
##
@@ -0,0 +1,220 @@
+- Log file ordering within a file slice has changed. In 0.x, we order log files by base instant time ➝ log file version ➝ write token. In 1.x, we will order them by completion time.

Review Comment:
   Anyway, we are adding a filterUncommittedLogs capability to the FSV so that a 0.16.x reader can read 1.x tables. So we should be good, but let's add tests for these.
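The point in this comment (the 0.16.x file system view must skip log files from uncommitted writes and order the rest the 1.x way) can be sketched as follows. `filterUncommittedLogs` is the name used in the comment; the Python function below is a hypothetical stand-in, and treating "committed" as "has a completion time on the timeline" is an assumption.

```python
# Hypothetical sketch, not the FSV implementation. ASSUMPTION: a log file is
# committed iff its delta commit time has a completion time on the timeline.
from typing import Dict, List


def committed_logs_in_order(log_delta_commit_times: List[str],
                            completion_times: Dict[str, str]) -> List[str]:
    """Drop logs from inflight or failed delta commits, then order the rest
    by completion time (the 1.x rule)."""
    committed = [t for t in log_delta_commit_times if t in completion_times]
    return sorted(committed, key=completion_times.__getitem__)


# Delta commit 002 never completed; 003 completed before 001.
print(committed_logs_in_order(["001", "002", "003"],
                              {"001": "009", "003": "005"}))  # prints ['003', '001']
```

Cases like the one in the example (an incomplete delta commit, plus commits completing out of order) are natural candidates for the tests the comment asks for.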




Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]

2024-07-02 Thread via GitHub


nsivabalan commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1663036653


##
rfc/rfc-78/rfc-78.md:
##
@@ -0,0 +1,220 @@
+- Log file ordering within a file slice has changed. In 0.x, we order log files by base instant time ➝ log file version ➝ write token. In 1.x, we will order them by completion time.

Review Comment:
   During the 1.x upgrade, we need to ensure we do not use the new way of rolling back log files for a failed write, because there could be a concurrent 0.16.x reader reading the table.
   




Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]

2024-07-02 Thread via GitHub


nsivabalan commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1663035029


##
rfc/rfc-78/rfc-78.md:
##
@@ -0,0 +1,220 @@
+- Log appends are disabled in 1.x. In other words, each log block is written to a new log file.

Review Comment:
   sure. 






Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]

2024-07-02 Thread via GitHub


nsivabalan commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1663032167


##
rfc/rfc-78/rfc-78.md:
##
@@ -0,0 +1,220 @@
+- [storage changes] Timeline ➝ LSM timeline. There is no archived timeline in 1.x.

Review Comment:
   Are you talking about the 1.x reader or the 0.16.x reader?






Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]

2024-07-02 Thread via GitHub


nsivabalan commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1663030969


##
rfc/rfc-78/rfc-78.md:
##
@@ -0,0 +1,220 @@
+### Table properties
+- Payload class ➝ payload type.

Review Comment:
   If it's custom, where are the custom configs picked up from? It's a table property in 0.x, but what about in 1.x?
   Let's validate the flows.






Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]

2024-07-02 Thread via GitHub


nsivabalan commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1663024868


##
rfc/rfc-78/rfc-78.md:
##
@@ -0,0 +1,220 @@
+### What will not be supported
+- A 0.16 writer cannot write to a table that has been upgraded-to/created 

Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]

2024-07-02 Thread via GitHub


nsivabalan commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1663022140


##
rfc/rfc-78/rfc-78.md:
##
@@ -0,0 +1,220 @@
+
+# RFC-76: [Bridge release for 1.x]
+
+## Proposers
+
+- @nsivabalan
+- @vbalaji
+
+## Approvers
+ - @yihua
+ - @codope
+
+## Status
+
+JIRA: https://issues.apache.org/jira/browse/HUDI-7882
+
+> Please keep the status updated in `rfc/README.md`.
+
+## Abstract
+
+[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful
+re-imagination of the transactional database layer in Hudi, built to power continued innovation across the community in the coming
+years. It introduces many differentiating features to Apache Hudi. We published beta releases meant for
+enthusiastic developers and users to try out the advanced features. As we work towards 1.0 GA, we are proposing
+a bridge release (0.16.0) to give existing Hudi users a smoother migration path.
+
+## Objectives 
+The goal is a smooth migration experience for users moving from 0.x to 1.0. We plan a 0.16.0 bridge release and will ask everyone to first migrate to 0.16.0 before upgrading to 1.x.
+
+- The 1.x reader should be able to read 0.16.x tables without any loss in functionality and without data inconsistencies.
+- 0.16.x should have read capability for 1.x tables, with some limitations. For features ported over from 0.x, we should guarantee no loss in functionality. But for new features introduced in 1.x, we may not be able to support all of them. We will call out which new features may not work with the 0.16.x reader, and we explicitly ask users not to turn on those features until all readers are on 1.x.
+- Document the upgrade steps from 0.16.x to 1.x, with limited user-perceived latency. The upgrade will be automatic, but the documentation should clearly state what needs to be done.
+- Document the downgrade from 1.x to 0.16.x, with call-outs on any functionality impact.
+
+### Considerations when choosing a migration strategy
+- While the migration is happening, we want readers to continue reading data. This means we cannot employ a stop-the-world strategy while migrating.
+None of the actions we perform as part of the table upgrade should have side effects that break snapshot isolation for readers.
+- Also, users should have migrated to 0.16.x before upgrading to 1.x. We do not want to add read support in 1.x for very old versions of Hudi (e.g. 0.7.0).
+- So, in an effort to bring everyone to the latest Hudi versions, the 1.x reader will have full read capabilities for 0.16.x, but it may not have full read support for older Hudi versions.
+The recommended guideline is to upgrade all readers and writers to 0.16.x, and then gradually upgrade to 1.x (readers first, followed by writers).
+
+Before we dive in further, let's understand the format changes:
+
+## Format changes
+### Table properties
+- Payload class ➝ payload type.
+- New metadata partitions could be added (optionally enabled)
+
+### MDT changes
+- New MDT partitions are available in 1.x. The MDT schema is upgraded.
+- The RLI schema is upgraded to hold the row position.
+
+### Timeline:
+- [storage changes] Completed write commits have completion times in the file name.
+- [storage changes] Completed and inflight write commits are in avro format; they were in json in 0.x.
+- We are switching the action type for clustering from “replacecommit” to “cluster”.
+- Similarly, for completed compaction, we are switching from “commit” to “compaction”, in an effort to standardize actions for a given write operation.
+- [storage changes] Timeline ➝ LSM timeline. There is no archived timeline in 1.x.
+- [in-memory changes] HoodieInstant changes due to the presence of completion time for completed HoodieInstants.
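The timeline changes listed above include the completion time in the instant file name and the renamed actions. A 0.16.x reader could handle both roughly as below. This is a sketch under stated assumptions, not Hudi code. It assumes a completed 1.x instant is named `<requestedTime>_<completionTime>.<action>` and a completed 0.x instant is named `<requestedTime>.<action>`, and it handles completed instants only. `CompletedInstant` is a hypothetical name. Mapping “cluster” to “replacecommit” and completed “compaction” to “commit” is an illustrative choice for reusing existing 0.16.x code paths, not something this RFC specifies.

```java
import java.util.Optional;

// Hypothetical parser for completed instant file names (not Hudi code).
// Assumed layouts: 1.x "<requestedTime>_<completionTime>.<action>",
// 0.x "<requestedTime>.<action>". Requested/inflight files are out of scope.
record CompletedInstant(String requestedTime, Optional<String> completionTime, String action) {

    static CompletedInstant parse(String fileName) {
        int dot = fileName.indexOf('.');
        if (dot <= 0) {
            throw new IllegalArgumentException("Not an instant file name: " + fileName);
        }
        String times = fileName.substring(0, dot);
        String action = fileName.substring(dot + 1);
        int sep = times.indexOf('_');
        if (sep < 0) {
            // 0.x layout: the completion time is not part of the file name.
            return new CompletedInstant(times, Optional.empty(), action);
        }
        // 1.x layout: requested time and completion time separated by '_'.
        return new CompletedInstant(times.substring(0, sep), Optional.of(times.substring(sep + 1)), action);
    }

    // Illustrative mapping of renamed 1.x actions onto 0.x actions that an
    // existing 0.16.x code path already understands (an assumption).
    String legacyAction() {
        switch (action) {
            case "cluster":
                return "replacecommit";
            case "compaction":
                return "commit";
            default:
                return action;
        }
    }
}
```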
+
+### Filegroup/FileSlice changes:
+- Log files contain delta commit time instead of base instant time.
+- Log appends are disabled in 1.x. In other words, each log block is written to a new log file.
+- File slice determination logic for log files changed. In 0.x, log files carry the base instant time, so it's straightforward. In 1.x, we find the completion time of a log file and pick the base instant time (parsed from base files) with the largest value less than that completion time.
+- Log file ordering within a file slice changed. In 0.x, we use base instant time ➝ log file version ➝ write token to order different log files. In 1.x, we will be using completion time to order them.
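The 0.x ordering described above (base instant time, then log file version, then write token) maps directly onto a chained `Comparator`. `LogFileKey` and `LegacyLogFileOrdering` are hypothetical stand-ins for the relevant fields of `HoodieLogFile`. Write tokens are compared as plain strings here, which is a simplification.

```java
import java.util.Comparator;

// Hypothetical stand-in for the HoodieLogFile fields that matter for ordering.
record LogFileKey(String baseInstantTime, int logVersion, String writeToken) {}

final class LegacyLogFileOrdering {
    // 0.x ordering: base instant time, then log file version, then write token
    // (write tokens compared as plain strings in this sketch).
    static final Comparator<LogFileKey> COMPARATOR =
        Comparator.comparing(LogFileKey::baseInstantTime)
            .thenComparingInt(LogFileKey::logVersion)
            .thenComparing(LogFileKey::writeToken);
}
```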
+
+### Log format changes:
+- New header types are added in 1.x (IS_PARTIAL).
+
+## Changes to be ported to 0.16.x to support reading 1.x tables
+### What will be supported
+- For tables written in 1.x that use features introduced in 0.x, the 0.16.0 reader should provide consistent reads without any breakage.
+### What will not be supported
+- A 0.16 writer cannot write to a table that has been upgraded-to/created 


2024-07-02 Thread via GitHub


nsivabalan commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1663014814


##
rfc/rfc-78/rfc-78.md:
##
@@ -191,15 +198,15 @@ We need to add back these older methods to HoodieDefaultTimeline, so that we do
 - e. We need to port code changes that account for uncommitted log files. In 0.16.0, from the FSV standpoint, all log files (including partially failed ones) are valid, and we let the log record reader ignore the partially failed log files. But in 1.x, log files could be rolled back (deleted) by a concurrent rollback, so the FSV should ensure it ignores uncommitted log files.
 - f. It looks like we only have to make changes/appends to a few methods in HoodieDefaultTimeline. But if we find ourselves making many changes to the 0.16.0 HoodieDefaultTimeline in order to support reading 1.x tables, one option to consider is introducing Hoodie016xDefaultTimeline and Hoodie1xDefaultTimeline and using the delegate pattern: based on the Hudi table version, HoodieDefaultTimeline would internally instantiate and delegate to either Hoodie016xDefaultTimeline or Hoodie1xDefaultTimeline. For now, we don't think we need to take this route; we are just calling it out as an option, depending on the changes we end up making.
+- g. Since the log file ordering logic differs between 0.16.x and 1.x, and we have a table upgrade commit time, we could leverage it to pick the log file ordering logic based on whether a file slice's base instant time is less than or greater than the table upgrade commit time.
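Items e and g above can be sketched together for one file slice. First drop log files whose writing instant has not completed, then pick the ordering from the table upgrade commit time. This is a minimal sketch, not Hudi code, and `WrittenLogFile` and `SliceLogFiles` are hypothetical names. The filter assumes each log file name carries the instant that wrote it. Per the format changes in this RFC, that holds for 1.x log files but not for 0.x ones. A base instant time equal to the upgrade commit time is treated as post-upgrade here, since the RFC leaves that case unspecified.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical log file descriptor: the instant that wrote the file and its log version.
record WrittenLogFile(String writerInstantTime, int logVersion) {}

final class SliceLogFiles {

    // Item e: drop log files whose writing instant is not completed, so files
    // from failed or rolled-back delta commits never reach the log reader.
    static List<WrittenLogFile> committedOnly(List<WrittenLogFile> logFiles,
                                              Set<String> completedInstants) {
        return logFiles.stream()
            .filter(f -> completedInstants.contains(f.writerInstantTime()))
            .collect(Collectors.toList());
    }

    // Item g: choose the ordering for one file slice by comparing its base
    // instant time with the table upgrade commit time. Slices based before the
    // upgrade keep the 0.x ordering; all others use the 1.x ordering.
    static <T> Comparator<T> orderingFor(String sliceBaseInstant,
                                         String upgradeCommitTime,
                                         Comparator<T> legacyOrdering,
                                         Comparator<T> completionTimeOrdering) {
        return sliceBaseInstant.compareTo(upgradeCommitTime) < 0
            ? legacyOrdering
            : completionTimeOrdering;
    }
}
```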
 
 ### FileSystemView changes
 Once all timeline changes are incorporated, we need to account for FSV changes. The major change, as called out earlier, is completion-time-based log files from the 1.x writer, with log file names referring to the delta commit time instead of the base commit time. So, without any changes to the FSV/HoodieFileGroup/HoodieFileSlice code, our file slice deduction logic might be wrong: each log file could be tagged as its own file slice, since each has a different base commit time (that's how the 0.16.x HoodieLogFile would deduce it). So, we might have to port the CompletionTimeQueryView class and associated logic over to 0.16.0, which makes the file slice deduction logic in 0.16.0 pretty much the same as in the 1.x reader. For log file ordering during log reading, however, we do not need to maintain parity with the 1.x reader yet (unless we make NBCC the default with MDT).
 Assuming the 1.x reader and 1.x FSV are able to read data written by older Hudi versions, we also have a potential option here to avoid making nit-picky changes, similar to the option called out earlier.
 We could instantiate two different FSVs depending on the table version: if the table version is 7 (0.16.0), we could instantiate, say, FSV_V0, and if the table version is 8 (1.0.0), we could instantiate FSV_V1. That way, we don't break or regress any 0.16.0 read functionality in the interest of supporting 1.x table reads. We should strive to cover all scenarios and not let any bugs creep in, but we are trying to keep the changes isolated so that battle-tested code (FSV) is not touched or changed for the purpose of supporting 1.x table reads. If we run into any bugs with 1.x reads, we could ask users not to upgrade any of the writers to 1.x and to stick with 0.16.0 until we have, say, 1.0.1. But it would be really bad if we broke 0.16.0 table reads in some edge case. We are just calling this out as one of the safe options for the upgrade.
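The per-table-version option above, like the similar timeline option in item f, can be sketched as a delegate chosen once at construction. `FileSliceView`, `FileSliceViewV0`, `FileSliceViewV1` and `VersionedFileSliceView` are hypothetical names. The version numbers 7 and 8 come from the text above. Rejecting unknown newer versions is an assumption of this sketch.

```java
// Hypothetical read-path interface standing in for the file system view.
interface FileSliceView {
    String implementation();
}

// Existing, battle-tested 0.16.0 logic ("FSV_V0" in the text).
final class FileSliceViewV0 implements FileSliceView {
    @Override
    public String implementation() {
        return "FSV_V0";
    }
}

// Logic added only to read 1.x tables ("FSV_V1" in the text).
final class FileSliceViewV1 implements FileSliceView {
    @Override
    public String implementation() {
        return "FSV_V1";
    }
}

// Picks a delegate once, from the table version, and forwards every call to
// it, so 0.16.0 code paths stay untouched for tables at version 7 or older.
final class VersionedFileSliceView implements FileSliceView {
    private final FileSliceView delegate;

    VersionedFileSliceView(int tableVersion) {
        if (tableVersion <= 7) {        // 0.16.0 and older table versions
            delegate = new FileSliceViewV0();
        } else if (tableVersion == 8) { // 1.0.0
            delegate = new FileSliceViewV1();
        } else {
            throw new IllegalArgumentException("Unsupported table version: " + tableVersion);
        }
    }

    @Override
    public String implementation() {
        return delegate.implementation();
    }
}
```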
 
-
  Pending exploration:
-How partially failed log files are ignored in 1.x. I see all log files are 
accounted for while building FSV.
+1. We removed the special suffixes for MDT operations in 1.x. We need to test the flow and flesh out details on anything that needs to be added to the 0.16.x reader.

Review Comment:
   I understand the new commit time generation logic is foolproof. But what if a concurrent ingestion in the data table coincidentally generates the same commit time?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




2024-07-02 Thread via GitHub


nsivabalan commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1663011966


##
rfc/rfc-78/rfc-78.md:
##
@@ -179,6 +183,9 @@ Let’s reiterate what we need to support w/ 0.16.0 reader.
 On a high level, we need to ensure commit metadata in either format (avro or json) is supported, and that “cluster” and completed “compaction” instants are readable by the 0.16.0 reader.
 - But the challenging part is that, for every commit metadata, we might have to deserialize as avro and, on exception, try json. We could deduce the format from the completed instant's file name, but with the current code layering, the deserialization methods do not know the file name (they take a byte[]).
 - Similarly, for clustering commits, unless we have some kind of watermark, we have to keep considering replace commits in the FSV building logic as well, to ensure we do not miss any clustering commits.
+- To be decided: we also need to use different LogFileComparators depending on the file slice's base instant time. If the file slice's base instant time is < the table upgrade commit time, we use the older log file comparator to order log files, but if it is > the table upgrade commit time, we have to use the new (completion time based) log file comparator. The tricky part is a file slice that contains a mix of log files.
+ This fix definitely needs to go into 1.x, but whether to port this change to 0.16.x is yet to be discussed and decided. Let's zoom in on what will happen with the 1.x reader if a single file slice contains a mix of log files (this is a basic requirement to support 0.16.x tables in 1.x).

Review Comment:
   Which means the log file comparator logic from 1.x needs to be ported to the 0.16.x reader.
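The quoted hunk describes deserializing as avro first and falling back to json. That approach can be sketched as below. This is a minimal illustration, not Hudi code, and `MetadataDeserializer` and `CommitMetadataReader` are hypothetical names. The two deserializers are passed in because, as the hunk notes, the method only sees a byte[] and not the file name.

```java
import java.io.IOException;

// Hypothetical deserializer signature: bytes in, metadata out. Like the
// methods described above, it never sees the instant file name.
@FunctionalInterface
interface MetadataDeserializer<T> {
    T deserialize(byte[] bytes) throws IOException;
}

final class CommitMetadataReader {
    // Tries the 1.x (avro) format first and falls back to the 0.x (json)
    // format when avro decoding fails. If both fail, the json error is thrown
    // with the avro error attached as a suppressed exception.
    static <T> T readEitherFormat(byte[] bytes,
                                  MetadataDeserializer<T> avro,
                                  MetadataDeserializer<T> json) throws IOException {
        try {
            return avro.deserialize(bytes);
        } catch (IOException | RuntimeException avroFailure) {
            try {
                return json.deserialize(bytes);
            } catch (IOException | RuntimeException jsonFailure) {
                jsonFailure.addSuppressed(avroFailure);
                throw jsonFailure;
            }
        }
    }
}
```

The cost the hunk worries about shows up here as one failed avro decode for every 0.x instant. If the format could be passed down from the file name, the fallback would not be needed.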




2024-07-02 Thread via GitHub


nsivabalan commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1663010943


##
rfc/rfc-78/rfc-78.md:
##
@@ -179,6 +183,9 @@ Let’s reiterate what we need to support w/ 0.16.0 reader.
 On a high level, we need to ensure commit metadata in either format (avro or json) is supported, and that “cluster” and completed “compaction” instants are readable by the 0.16.0 reader.
 - But the challenging part is that, for every commit metadata, we might have to deserialize as avro and, on exception, try json. We could deduce the format from the completed instant's file name, but with the current code layering, the deserialization methods do not know the file name (they take a byte[]).
 - Similarly, for clustering commits, unless we have some kind of watermark, we have to keep considering replace commits in the FSV building logic as well, to ensure we do not miss any clustering commits.
+- To be decided: we also need to use different LogFileComparators depending on the file slice's base instant time. If the file slice's base instant time is < the table upgrade commit time, we use the older log file comparator to order log files, but if it is > the table upgrade commit time, we have to use the new (completion time based) log file comparator. The tricky part is a file slice that contains a mix of log files.
+ This fix definitely needs to go into 1.x, but whether to port this change to 0.16.x is yet to be discussed and decided. Let's zoom in on what will happen with the 1.x reader if a single file slice contains a mix of log files (this is a basic requirement to support 0.16.x tables in 1.x).

Review Comment:
   We need to fix the 1.x reader to enforce completion-time-based log file ordering within a file slice. After the fix, from our understanding, the same logic should work for a file slice written entirely in 0.x, because the completion time will match for all of its log files, and we then use the log version to determine the ordering.
   We need a lot of tests covering all these scenarios.
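The ordering proposed in this comment can be sketched as a comparator that sorts by completion time first and uses log version as the tie-breaker. `TimedLogFile` and `CompletionTimeOrdering` are hypothetical names. The sketch assumes the completion time has already been looked up from the timeline, and that completion times are fixed-width strings.

```java
import java.util.Comparator;

// Hypothetical stand-in for a log file: the completion time of the delta
// commit that wrote it (already looked up from the timeline) and its log version.
record TimedLogFile(String completionTime, int logVersion) {}

final class CompletionTimeOrdering {
    // Orders log files by completion time; files sharing a completion time
    // (the 0.x case described in the comment) fall back to log version order.
    static final Comparator<TimedLogFile> COMPARATOR =
        Comparator.comparing(TimedLogFile::completionTime)
            .thenComparingInt(TimedLogFile::logVersion);
}
```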
   




2024-07-02 Thread via GitHub


nsivabalan commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1662991098


##
rfc/rfc-78/rfc-78.md:
##
@@ -145,7 +147,9 @@ This will be an automatic upgrade for users when they start using 1.x hudi libra
 - No changes to the log reader.
 - Check for a custom payload class in the table properties and switch to the payload type.
 - Trigger compaction for the latest file slices. We do not want a single file slice to have a mix of 0.x and 1.x log files, so we will trigger a full compaction
-of the table to ensure all latest file slices have just the base files.
+of the table to ensure all latest file slices have just the base files.
+  - Let's dissect what it would take to avoid requiring the full compaction. In general, we plan to add a table config that tracks the commit time at which the upgrade was done (more on this later in this doc).
+So, using the upgrade commit time, we should be able to use a different log file comparator to order log files within a given file slice.

Review Comment:
   The 0.16.x reader is not going to order log files based on completion time; it will only order them by log version, even for 1.x tables.
   This means that even a single file slice with a mix of 0.x and 1.x log files should be fine here.
   
   File slice determination:
   HoodieLogFile.getBaseInstantTime() has to work for both 0.x and 1.x log files. If we ensure this stays intact, we should be good.
   From skimming 1.x master, it should work out of the box for 0.x log files, but let's test it out.
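A positional parse of the log file name illustrates why `getBaseInstantTime()` can work for both kinds of log files: the instant time sits in the same slot. This sketch assumes the layout `.<fileId>_<instantTime>.log.<version>_<writeToken>`. `LogFileName` is a hypothetical record, not `HoodieLogFile`. For a 1.x file, the parsed value is the delta commit time. It still needs the completion-time lookup described in the FSV section before it can be treated as a base instant time.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser, not HoodieLogFile itself. Assumed name layout:
// ".<fileId>_<instantTime>.log.<version>_<writeToken>", where <instantTime> is
// the base instant time for 0.x log files and the delta commit time for 1.x ones.
record LogFileName(String fileId, String instantTime, int version, String writeToken) {

    private static final Pattern LOG_FILE =
        Pattern.compile("^\\.(.+)_(\\d+)\\.log\\.(\\d+)(?:_(.+))?$");

    static LogFileName parse(String name) {
        Matcher m = LOG_FILE.matcher(name);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a log file name: " + name);
        }
        // group(4) is null when the name carries no write token.
        return new LogFileName(m.group(1), m.group(2), Integer.parseInt(m.group(3)), m.group(4));
    }
}
```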
   
   




2024-07-02 Thread via GitHub


nsivabalan commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1662991098


##
rfc/rfc-78/rfc-78.md:
##
@@ -145,7 +147,9 @@ This will be an automatic upgrade for users when they start using 1.x hudi libra
 - No changes to the log reader.
 - Check for a custom payload class in the table properties and switch to the payload type.
 - Trigger compaction for the latest file slices. We do not want a single file slice to have a mix of 0.x and 1.x log files, so we will trigger a full compaction
-of the table to ensure all latest file slices have just the base files.
+of the table to ensure all latest file slices have just the base files.
+  - Let's dissect what it would take to avoid requiring the full compaction. In general, we plan to add a table config that tracks the commit time at which the upgrade was done (more on this later in this doc).
+So, using the upgrade commit time, we should be able to use a different log file comparator to order log files within a given file slice.

Review Comment:
   The 0.16.x reader is not going to order log files based on completion time; it will only order them by log version, even for 1.x tables.
   This means that even a single file slice with a mix of 0.x and 1.x log files should be fine here.
   
   Pending:
   File slice determination:
   HoodieLogFile.getBaseInstantTime() has to work for both 0.x and 1.x log files. If we ensure this stays intact, we should be good.
   
   


