ctubbsii commented on code in PR #5357:
URL: https://github.com/apache/accumulo/pull/5357#discussion_r1970694765


##########
server/manager/src/main/java/org/apache/accumulo/manager/upgrade/UpgradeProgressTracker.java:
##########
@@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.manager.upgrade;
+
+import static java.nio.charset.StandardCharsets.UTF_8;
+import static org.apache.accumulo.core.util.LazySingletons.GSON;
+
+import java.util.Objects;
+
+import org.apache.accumulo.core.Constants;
+import org.apache.accumulo.core.fate.zookeeper.ZooReaderWriter;
+import org.apache.accumulo.core.fate.zookeeper.ZooUtil.NodeExistsPolicy;
+import org.apache.accumulo.core.fate.zookeeper.ZooUtil.NodeMissingPolicy;
+import org.apache.accumulo.server.AccumuloDataVersion;
+import org.apache.accumulo.server.ServerContext;
+import org.apache.zookeeper.KeeperException;
+
+import com.google.common.base.Preconditions;
+
+public class UpgradeProgressTracker {
+
+  public static class ComponentVersions {
+    private int zooKeeperVersion;
+    private int rootVersion;
+    private int metadataVersion;

Review Comment:
   If we don't track a target version here, we could still check that any 
completed upgrade versions are not newer than what the current code supports.
   
   However, if we don't track a target version, then we can't be sure that 
there wasn't a partial upgrade attempt after the most recently recorded 
successful upgrade version, that is now no longer possible to recover from. So, 
the target version is kind of important... or at least some insight into 
whether the most recently recorded version is actually a terminal state at the 
end of any upgrade efforts.
   
   For example, let's say we fail while upgrading ZooKeeper, going from 2.1 to 
4.0, and it failed after we upgrade ZooKeeper to 3.1. If we don't track the 
target version, and we don't know for sure that 3.1 was actually the terminal 
condition we were trying to reach, then we can't really be sure that the 
failure didn't occur after making some changes from 3.1 to 4.0. If we then try 
to start up 3.1, it may believe it can complete the upgrade to 3.1, when the 
upgrade progress has already moved a little past 3.1 in an attempt to get it to 
4.0. So, we would need to track either a target version, or at least know that 
no additional work was done after the successful update to 3.1.
   
   We could increment the target version each time before starting an upgrade 
stage, though. That would ensure that a partial upgrade to 4.0 that is 
interrupted after getting as far as 3.1 could be completed to 3.1 by the 3.1 
code, if it hadn't yet started the 3.1 to 4.0 transition.
   
   For example, considering just the ZK upgrade loop going from version A to C:
   
   1. Bump ZK target version to B
   2. Upgrade ZK from A to B
   3. Set ZK completed version to B
   4. Bump ZK target version to C
   5. Upgrade ZK from B to C
   6. Set ZK completed version to C
   
   This means that if the upgrade was interrupted between step 3 and 4 using 
Accumulo version C, Accumulo version B could still complete an upgrade to B.
   
   I think what @keith-turner was proposing is more of a "final target 
version", in which case only version C could complete the upgrade that version 
C started, even if it hadn't gotten to B yet. We can call this algorithm I've 
outlined here the "next target version" model, instead of the "final target 
version" model, so you just increment the target to the next step. I don't know 
if it's worth it, but it could be helpful if version C had a problem that 
version B didn't, and version B was able to at least complete an upgrade to B, 
to get Accumulo back into a useful state.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to