Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]

via GitHub Wed, 03 Jul 2024 22:03:59 -0700


danny0405 commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1665122005



##########
rfc/rfc-78/rfc-78.md:
##########
@@ -0,0 +1,220 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+# RFC-76: [Bridge release for 1.x]
+
+## Proposers
+
+- @nsivabalan
+- @vbalaji
+
+## Approvers
+ - @yihua
+ - @codope
+
+## Status
+
+JIRA: https://issues.apache.org/jira/browse/HUDI-7882
+
+> Please keep the status updated in `rfc/README.md`.
+
+## Abstract
+
+[Hudi 
1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md)
 is a powerful 
+re-imagination of the transactional database layer in Hudi to power continued 
innovation across the community in the coming 
+years. It introduces lot of differentiating features for Apache Hudi. We 
released beta releases which was meant for 
+enthusiastic developers/users to give a try of advanced features. But as we 
are working towards 1.0 GA, we are proposing 
+a bridge release (0.16.0) for smoother migration for existing hudi users. 
+
+## Objectives 
+Goal is to have a smooth migration experience for the users from 0.x to 1.0. 
We plan to have a 0.16.0 bridge release asking everyone to first migrate to 
0.16.0 before they can upgrade to 1.x.
+
+- 1.x reader should be able to read 0.16.x tables w/o any loss in 
functionality and no data inconsistencies.
+- 0.16.x should have read capability for 1.x tables w/ some limitations. For 
features ported over from 0.x, no loss in functionality should be guaranteed. 
But for new features that was introduced in 1.x, we may not be able to support 
all of them. Will be calling out which new features may not work with 0.16.x 
reader. In this case, we explicitly request users to not turn on these features 
till readers are completely in 1.x.
+- Document upgrade steps from 0.16.x to 1.x with limited user perceived 
latency. This will be auto upgrade, but document clearly what needs to be done.
+- Downgrade from 1.x to 0.16.x documented with call outs on any functionality.
+
+### Considerations when choosing Migration strategy
+- While migration is happening, we want to allow readers to continue reading 
data. This means, we cannot employ a stop-the-world strategy when we are 
migrating. 
+All the actions that we are performing as part of table upgrade should not 
have any side-effects of breaking snapshot isolation for readers.
+- Also, users should have migrated to 0.16.x before upgrading to 1.x. We do 
not want to add read support for very old versions of hudi in 1.x(for eg 
0.7.0). 
+- So, in an effort to bring everyone to latest hudi versions, 1.x reader will 
have full read capabilities for 0.16.x, but for older hudi versions, 1.x reader 
may not have full reader support. 
+The reocmmended guideline is to upgrade all readers and writers to 0.16.x. and 
then slowly start upgrading to 1.x(readers followed by writers). 
+
+Before we dive in further, lets understand the format changes:
+
+## Format changes
+### Table properties
+- Payload class ➝ payload type.
+- New metadata partitions could be added (optionally enabled)
+
+### MDT changes
+- New MDT partitions are available in 1.x. MDT schema upgraded.
+- RLI schema is upgraded to hold row position
+
+### Timeline:
+- [storage changes] Completed write commits have completed times in the file 
name.
+- [storage changes] Completed and inflight write commits are in avro format 
which were json in 0.x.
+- We are switching the action type for clustering from “replace commit” to 
“cluster”.
+- Similarly, for completed compaction, we are switching from “commit” to 
“compaction” in an effort to standardize actions for a given write operation.
+- [storage changes] Timeline ➝ LST timeline. There is no archived timeline in 
1.x
+- [In-memory changes] HoodieInstant changes due to presence of completion time 
for completed HoodieInstants.

Review Comment:
   We do not introduce the completion time based inc queries for Spark yet, but 
for the GA release, we might need to have a compatible solution for migrattion.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]

Reply via email to