ctubbsii commented on code in PR #5344:
URL: https://github.com/apache/accumulo/pull/5344#discussion_r1964331620


##########
server/manager/src/main/java/org/apache/accumulo/manager/upgrade/Upgrader12to13.java:
##########
@@ -363,8 +364,18 @@ private void 
validateEmptyZKWorkerServerPaths(ServerContext context) {
         List<String> children = zr.getChildren(zkRoot + serverPath);
         for (String child : children) {
           if (child.contains(":")) {
-            throw new IllegalStateException("Found server address at " + 
serverPath + "/" + child
-                + ". Was expecting either a resource group name or nothing. 
Stop any referenced servers.");
+            String childPath = zkRoot + serverPath + "/" + child;
+            if (zr.getChildren(childPath).isEmpty()) {
+              // child is likely host:port and is an empty directory. Since 
there
+              // is no lock here, then the server is likely down (or should 
be).
+              // Remove the entry and move on.
+              
context.getZooSession().asReaderWriter().recursiveDelete(childPath,
+                  NodeMissingPolicy.FAIL);

Review Comment:
   It might be okay to use NodeMissingPolicy.SKIP instead... if something else 
deleted the lock node in the interim, that might be okay?



##########
server/manager/src/main/java/org/apache/accumulo/manager/upgrade/Upgrader12to13.java:
##########
@@ -363,8 +364,18 @@ private void 
validateEmptyZKWorkerServerPaths(ServerContext context) {
         List<String> children = zr.getChildren(zkRoot + serverPath);
         for (String child : children) {
           if (child.contains(":")) {
-            throw new IllegalStateException("Found server address at " + 
serverPath + "/" + child
-                + ". Was expecting either a resource group name or nothing. 
Stop any referenced servers.");
+            String childPath = zkRoot + serverPath + "/" + child;
+            if (zr.getChildren(childPath).isEmpty()) {
+              // child is likely host:port and is an empty directory. Since 
there
+              // is no lock here, then the server is likely down (or should 
be).
+              // Remove the entry and move on.
+              
context.getZooSession().asReaderWriter().recursiveDelete(childPath,

Review Comment:
   I wonder if it would be safer if this upgrade code reserved its own lock on 
these empty nodes, to prevent other services from getting the lock while in 
this upgrade code.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to