errose28 commented on code in PR #9664: URL: https://github.com/apache/ozone/pull/9664#discussion_r2920895756
########## hadoop-hdds/docs/content/design/zdu-design.md: ########## @@ -0,0 +1,535 @@ +--- +jira: HDDS-3331 +authors: +- Stephen O'Donnell +- Ethan Rose +- Istvan Fajth +--- + +<!-- + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. See accompanying LICENSE file. +--> + +# Zero Downtime Upgrade (ZDU) + +## The Goal + +The goal of Zero Downtime Upgrade (ZDU) is to allow the software running an existing Ozone cluster to be upgraded while the cluster remains operational. There should be no gaps in service and the upgrade should be transparent to applications using the cluster. + +Ozone is already designed to be fault tolerant, so the rolling restart of SCM, OM and Datanodes is already possible without impacting users of the cluster. The challenge with ZDU is therefore related to wire and disk compatibility, as different components within the cluster can be running different software versions concurrently. This design will focus on how we solve the wire and disk compatibility issues. + +## Component Upgrade Order + +To simplify reasoning about components of different types running in different versions, we should reduce the number of possible version combinations allowed as much as possible. Clients are considered external to the Ozone cluster, therefore we cannot control their version. However, we already have a framework to handle client/server cross compatibility, so rolling upgrade only needs to focus on compatibility of internal components. 
For internal Ozone components, we can define and enforce an order that the components must be upgraded in. Consider the following Ozone service diagram: + + + +Here the arrows represent client to server interactions between components, with the arrow pointing from the client to the server. The red arrow is external clients interacting with Ozone. The shield means that the client needs to see a consistent API surface despite leader changes in mixed version clusters so that APIs do not seem to disappear and reappear based on the node serving the request. The orange lines represent client to server interactions for internal Ozone components. For components connected by this internal line, **we can control the order that they are upgraded such that the server is always newer and handles all compatibility issues**. This greatly reduces the matrix of possible versions we may see within Ozone and mostly eliminates the need for internal Ozone components to be aware of each other’s versions, as long as servers remain backwards compatible. This order is: + +1. Upgrade all SCMs to the new version +2. Upgrade Recon to the new version +3. Upgrade all Datanodes to the new version +4. Upgrade all OMs to the new version
Review Comment:
These upgrade/restart steps are done by an admin, possibly with an orchestration layer, so Ozone doesn't decide whether or not the decom nodes get upgraded. If they are upgraded, nothing about the decom/maintenance/recom process is expected to change, since ZDU means all existing operations are allowed throughout the upgrade and finalization process. Starting on line 227 we spec out how datanodes are handled relative to SCM, including the case where they are offline and come back later. Let me know if there are more questions in that area. Note that once SCM is finalized, any datanodes that later appear with the old software version will be fenced out until the admin upgrades them.
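To make the fencing note above concrete, here is a minimal sketch of the rule as described in this comment. The class name, method name, and the use of plain integers for software versions are all hypothetical and are not taken from the Ozone codebase or the design doc; the real check may look quite different.

```java
// Hypothetical sketch only: none of these names exist in Ozone.
final class DatanodeFencingSketch {
    private DatanodeFencingSketch() { }

    /**
     * Returns true if SCM would accept a datanode, false if it would be fenced out.
     * Versions are modelled as plain ints (an assumption made for illustration).
     */
    static boolean isAdmitted(boolean scmFinalized, int scmVersion, int datanodeVersion) {
        if (!scmFinalized) {
            // Before SCM finalizes, datanodes on the old software version
            // are still allowed, so nothing is fenced here.
            return true;
        }
        // After SCM finalizes, a datanode on an older software version
        // is fenced out until the admin upgrades it.
        return datanodeVersion >= scmVersion;
    }
}
```

For example, `isAdmitted(true, 2, 1)` models an old datanode reappearing after SCM finalization and returns `false`, while the same datanode before finalization, `isAdmitted(false, 2, 1)`, returns `true`.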
The doc currently doesn't specify whether nodes undergoing decom or maintenance will be instructed to finalize by SCM. I think we should still send them the finalize commands so they don't block further upgrade steps unnecessarily. @sodonnel what do you think?

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
