errose28 commented on code in PR #9664:
URL: https://github.com/apache/ozone/pull/9664#discussion_r2920895756


##########
hadoop-hdds/docs/content/design/zdu-design.md:
##########
@@ -0,0 +1,535 @@
+---
+jira: HDDS-3331
+authors:
+- Stephen O'Donnell
+- Ethan Rose
+- Istvan Fajth
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Zero Downtime Upgrade (ZDU)
+
+## The Goal
+
+The goal of Zero Downtime Upgrade (ZDU) is to allow the software running an 
existing Ozone cluster to be upgraded while the cluster remains operational. 
There should be no gaps in service and the upgrade should be transparent to 
applications using the cluster.
+
+Ozone is already designed to be fault tolerant, so the rolling restart of SCM, 
OM and Datanodes is already possible without impacting users of the cluster. 
The challenge with ZDU is therefore related to wire and disk compatibility, as 
different components within the cluster can be running different software 
versions concurrently. This design will focus on how we solve the wire and disk 
compatibility issues.
+
+## Component Upgrade Order
+
+To simplify reasoning about components of different types running in different 
versions, we should reduce the number of possible version combinations allowed 
as much as possible. Clients are considered external to the Ozone cluster, 
therefore we cannot control their version. However, we already have a framework 
to handle client/server cross compatibility, so rolling upgrade only needs to 
focus on compatibility of internal components. For internal Ozone components, 
we can define and enforce an order that the components must be upgraded in. 
Consider the following Ozone service diagram:
+
+![Ozone connection diagram](zdu-image1.png)
+
+Here the arrows represent client to server interactions between components, 
with the arrow pointing from the client to the server. The red arrow is 
external clients interacting with Ozone. The shield means that the client needs 
to see a consistent API surface despite leader changes in mixed version 
clusters so that APIs do not seem to disappear and reappear based on the node 
serving the request. The orange lines represent client to server interactions 
for internal Ozone components. For components connected by this internal line, 
**we can control the order that they are upgraded such that the server is 
always newer and handles all compatibility issues**. This greatly reduces the 
matrix of possible versions we may see within Ozone and mostly eliminates the 
need for internal Ozone components to be aware of each other’s versions, as 
long as servers remain backwards compatible. This order is:
+
+1. Upgrade all SCMs to the new version  
+2. Upgrade Recon to the new version  
+3. Upgrade all Datanodes to the new version  
+4. Upgrade all OMs to the new version  

Review Comment:
   These upgrade/restart steps are done by an admin, possibly with an 
orchestration layer, so Ozone doesn't decide whether or not the decom nodes get 
upgraded. If they do, nothing about the decom/maintenance/recom process is 
expected to change, since ZDU means all existing operations are allowed 
throughout the upgrade and finalization process.
   
   Starting on line 227 we spec out how datanodes are handled relative to SCM, 
including the case where they are offline and come back later. Let me know if 
there are more questions in that area. Note that once SCM is finalized, any 
datanodes that later appear with the old software version will be fenced out 
until the admin upgrades them.
   
   The doc currently doesn't specify whether nodes undergoing decom or 
maintenance will be instructed to finalize by SCM. I think we should still send 
them the finalize commands so they don't block further upgrade steps 
unnecessarily. @sodonnel what do you think?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


