ctubbsii commented on code in PR #5315:
URL: https://github.com/apache/accumulo/pull/5315#discussion_r1947456390


##########
minicluster/src/main/java/org/apache/accumulo/miniclusterImpl/MiniAccumuloClusterImpl.java:
##########
@@ -795,18 +792,18 @@ public synchronized void stop() throws IOException, InterruptedException {
     // is restarted, then the processes will start right away
     // and not wait for the old locks to be cleaned up.
     try {
-      new ZooZap().zap(getServerContext().getSiteConfiguration(), "-manager",
-          "-compaction-coordinators", "-tservers", "-compactors", "-sservers");
+      new ZooZap().zap(getServerContext().getZooSession(),
+          getServerContext().getSiteConfiguration(), "-manager", "-compaction-coordinators",
+          "-tservers", "-compactors", "-sservers");
     } catch (RuntimeException e) {
       log.error("Error zapping zookeeper locks", e);
     }
-    control.stop(ServerType.ZOOKEEPER, null);
 
     // Clear the location of the servers in ZooCache.
-    // When ZooKeeper was stopped in the previous method call,
-    // the local ZooKeeper watcher did not fire. If MAC is
-    // restarted, then ZooKeeper will start on the same port with
-    // the same data, but no Watchers will fire.
+    // If MAC is restarted, then ZooKeeper will start
+    // on the same port with the same data and the
+    // start of the server processes will wait until
+    // the pre-existing ephemeral locks time out.
     boolean startCalled = true;
     try {
       getServerContext().getZooKeeperRoot();

Review Comment:
   This whole block is sketchy. This seems to be relying on a side effect of ServerContext.getZooKeeperRoot() having already computed the root from the instanceId... but you could just call getInstanceId() instead to verify that it has been looked up.
   
   However, in order to even make it this far, the `getServerContext().getZooSession()` passed to the above ZooZap call will have already had to do all this... at least, it will have had to in order to use the context's ZooSession, which will have already been chrooted. That's why ZooZap takes config only... so it can create its own non-chrooted ZooSession that doesn't depend on whether the cluster has been initialized yet or not.
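   For illustration, a minimal sketch of that alternative check (hypothetical; the caught exception type is an assumption, not taken from the PR):
   
   ```java
   // Sketch: verify the instanceId lookup directly instead of relying on the
   // side effect of getZooKeeperRoot(). Exception handling here is assumed.
   boolean startCalled = true;
   try {
     getServerContext().getInstanceId(); // fails if the instance isn't initialized yet
   } catch (RuntimeException e) {
     startCalled = false;
   }
   ```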



##########
server/base/src/main/java/org/apache/accumulo/server/ServerContext.java:
##########
@@ -140,8 +140,15 @@ private ServerContext(ServerInfo info) {
    * Used during initialization to set the instance name and ID.
    */
   public static ServerContext initialize(SiteConfiguration siteConfig, String instanceName,
-      InstanceId instanceID) {
-    return new ServerContext(ServerInfo.initialize(siteConfig, instanceName, instanceID));
+      InstanceId instanceId) {
+    return new ServerContext(ServerInfo.initialize(siteConfig, instanceName, instanceId)) {
+
+      @Override
+      public InstanceId getInstanceID() {
+        return instanceId;
+      }
+
+    };

Review Comment:
   A similar thing is already being done in ServerInfo.initialize() today. Initialize doesn't create an unnecessary ZooSession to do this lookup, so there's nothing to fix here. It's already handled by ServerInfo.



##########
server/base/src/main/java/org/apache/accumulo/server/ServerInfo.java:
##########
@@ -57,18 +58,17 @@ public class ServerInfo implements ClientInfo {
  // set things up using the config file, the instanceId from HDFS, and ZK for the instanceName
   static ServerInfo fromServerConfig(SiteConfiguration siteConfig) {
     final Function<ServerInfo,String> instanceNameFromZk = si -> {
+      InstanceId iid = VolumeManager.getInstanceIDFromHdfs(
+          si.getServerDirs().getInstanceIdLocation(si.getVolumeManager().getFirst()),
+          si.getHadoopConf());

Review Comment:
   Merging this instanceId lookup into the function that looks up the instanceName, and removing the function that tracks the instanceId, creates an unnecessary need to look up the instanceId in ZooKeeper later.
   
   One of the goals of the previous iteration I did in here was to make sure that we don't do unnecessary lookups like that. The ServerInfo should already have the instanceId from the volume manager lookup, so when ServerContext asks ServerInfo for it, it should never need to go back to ZooKeeper, since it already has it. Here, it goes to the volume manager, but then doesn't remember it, so ZooKeeper must be consulted later.
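   A hedged sketch of the pattern being described, with hypothetical names (Accumulo itself can use Guava's memoizing suppliers): the value from the one-time lookup is remembered, so later reads never fall back to ZooKeeper.
   
   ```java
   import java.util.function.Supplier;
   
   // Illustrative only; these are not Accumulo's actual fields or methods.
   public class MemoizedInstanceId {
     private final Supplier<String> instanceId;
   
     public MemoizedInstanceId(Supplier<String> hdfsLookup) {
       this.instanceId = memoize(hdfsLookup); // wrap the expensive lookup
     }
   
     public String getInstanceId() {
       // first call performs the HDFS lookup; later calls reuse the cached value
       return instanceId.get();
     }
   
     private static <T> Supplier<T> memoize(Supplier<T> delegate) {
       return new Supplier<>() {
         private T value;
   
         @Override
         public synchronized T get() {
           if (value == null) {
             value = delegate.get();
           }
           return value;
         }
       };
     }
   }
   ```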



##########
core/src/main/java/org/apache/accumulo/core/clientImpl/ClientContext.java:
##########
@@ -230,6 +231,9 @@ public ClientContext(SingletonReservation reservation, ClientInfo info,
       return zk;
     });
 
+    this.instanceId =
+        memoize(() -> ZooUtil.getInstanceId(this.zooSession.get(), getInstanceName()));
+

Review Comment:
   This isn't compatible with the chroot effort. `this.zooSession` is going to be chrooted, but the lookup of the mapping between instanceName/instanceId lives outside the chroot hierarchy.
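   To make the layout concrete, a short sketch (the paths follow the Constants.ZROOT/Constants.ZINSTANCES conventions used elsewhere in this PR; the variable names are illustrative):
   
   ```java
   // The name->id mapping lives under /accumulo/instances, while a chrooted
   // session would be rooted at the per-instance tree /accumulo/<instanceId>.
   String instanceNamePath = "/accumulo/instances/" + instanceName; // data holds the instanceId
   String instanceRoot = "/accumulo/" + instanceId; // what a chrooted session sees as "/"
   // A session chrooted at instanceRoot cannot resolve instanceNamePath, so this
   // memoized lookup would need a separate, non-chrooted session.
   ```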



##########
server/base/src/main/java/org/apache/accumulo/server/util/ZooZap.java:
##########
@@ -94,13 +94,15 @@ public void execute(String[] args) throws Exception {
       if (siteConf.getBoolean(Property.INSTANCE_RPC_SASL_ENABLED)) {
         SecurityUtil.serverLogin(siteConf);
       }
-      zap(siteConf, args);
+      try (var zk = new ZooSession(getClass().getSimpleName(), siteConf)) {
+        zap(zk, siteConf, args);
+      }
     } finally {
       SingletonManager.setMode(Mode.CLOSED);
     }
   }
 
-  public void zap(SiteConfiguration siteConf, String... args) {
+  public void zap(ZooSession zk, SiteConfiguration siteConf, String... args) {

Review Comment:
   The two use cases should be:
   
   1. ZooZap.execute(mainArgs), or
   2. ZooZap.zap(serverContext, options)
   
   The execute version should do the options parsing, and the other should just be able to be called directly, with options suitable for code (like a Set of services to zap, rather than option strings like `-manager`).
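   A hypothetical sketch of that shape (the ZapService enum, the ServerContext stub, and all names here are illustrative, not existing Accumulo code):
   
   ```java
   import java.util.EnumSet;
   import java.util.Set;
   
   public class ZooZapSketch {
     enum ZapService { MANAGER, COMPACTION_COORDINATORS, TSERVERS, COMPACTORS, SSERVERS }
   
     interface ServerContext { /* would expose the properly chrooted ZooSession */ }
   
     // 1. command-line entry point: parses option strings like "-manager"
     public void execute(ServerContext context, String[] mainArgs) {
       Set<ZapService> services = EnumSet.noneOf(ZapService.class);
       for (String arg : mainArgs) {
         if ("-manager".equals(arg)) {
           services.add(ZapService.MANAGER);
         }
         // ... remaining options parsed the same way
       }
       zap(context, services);
     }
   
     // 2. programmatic entry point: typed options instead of option strings
     public void zap(ServerContext context, Set<ZapService> services) {
       // would use the context's ZooSession to delete the selected lock nodes
     }
   }
   ```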



##########
minicluster/src/main/java/org/apache/accumulo/miniclusterImpl/MiniAccumuloClusterImpl.java:
##########
@@ -795,18 +792,18 @@ public synchronized void stop() throws IOException, InterruptedException {
     // is restarted, then the processes will start right away
     // and not wait for the old locks to be cleaned up.
     try {
-      new ZooZap().zap(getServerContext().getSiteConfiguration(), "-manager",
-          "-compaction-coordinators", "-tservers", "-compactors", "-sservers");
+      new ZooZap().zap(getServerContext().getZooSession(),
+          getServerContext().getSiteConfiguration(), "-manager", "-compaction-coordinators",
+          "-tservers", "-compactors", "-sservers");
     } catch (RuntimeException e) {
       log.error("Error zapping zookeeper locks", e);
     }
-    control.stop(ServerType.ZOOKEEPER, null);
 
     // Clear the location of the servers in ZooCache.
-    // When ZooKeeper was stopped in the previous method call,
-    // the local ZooKeeper watcher did not fire. If MAC is
-    // restarted, then ZooKeeper will start on the same port with
-    // the same data, but no Watchers will fire.
+    // If MAC is restarted, then ZooKeeper will start
+    // on the same port with the same data and the
+    // start of the server processes will wait until
+    // the pre-existing ephemeral locks time out.

Review Comment:
   I think the old version of this comment is wrong, so it should be changed. However, I think these changes aren't quite correct either. It doesn't matter, for example, that ZooKeeper is restarting on the same port with the same data, only that the locks created by the Accumulo servers still exist until they time out. That's why ZooZap is called. ZooCache is only cleared here because it has no idea what ZooZap did, so we want to make sure the current process is updated.
   
   If my suggestion to make ZooZap take a context is applied here, then ZooZap can clear ZooCache itself, and there's no need for much of the rest of this code at all.
   
   Also, the comment above the try block is sufficient to explain that the locks need to be cleared after stopping the services. However, it has some typos in it. If ZooCache is cleared next, then the only thing this comment needs to say is something like:
   
    ```suggestion
       // Clear the zapped locations from ZooCache
    ```



##########
minicluster/src/main/java/org/apache/accumulo/miniclusterImpl/MiniAccumuloClusterControl.java:
##########
@@ -283,7 +283,8 @@ public synchronized void stop(ServerType server, String hostname) throws IOExcep
           try {
             cluster.stopProcessWithTimeout(managerProcess, 30, TimeUnit.SECONDS);
             try {
-              new ZooZap().zap(cluster.getServerContext().getSiteConfiguration(), "-manager");
+              new ZooZap().zap(cluster.getServerContext().getZooSession(),
+                  cluster.getServerContext().getSiteConfiguration(), "-manager");

Review Comment:
   I think it would be worth having a zap method that takes a context here, but not a ZooSession. The context will have the ZooSession, and it will be chrooted properly for ZooZap to work with, once the chroot stuff is done. It should not take a ZooSession directly, because it won't know the relative paths without knowing whether the passed in ZooSession is chrooted or not.
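   Under that hypothetical API (see the ZooZapSketch above), this call site could reduce to something like:
   
   ```java
   // hypothetical call, assuming the context-taking overload sketched earlier
   new ZooZap().zap(cluster.getServerContext(), EnumSet.of(ZapService.MANAGER));
   ```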



##########
minicluster/src/main/java/org/apache/accumulo/miniclusterImpl/MiniAccumuloClusterImpl.java:
##########
@@ -642,79 +642,76 @@ private void verifyUp() throws InterruptedException, IOException {
       waitForProcessStart(tsp, "TabletServer" + tsExpectedCount);
     }
 
-    String secret = getSiteConfiguration().get(Property.INSTANCE_SECRET);
     String instanceNamePath = Constants.ZROOT + Constants.ZINSTANCES + "/" + getInstanceName();
-    try (var zk = new ZooSession(MiniAccumuloClusterImpl.class.getSimpleName() + ".verifyUp()",
-        getZooKeepers(), 60000, secret)) {
-      var rdr = zk.asReader();
-      InstanceId instanceId = null;
-      for (int i = 0; i < numTries; i++) {
-        try {
-          // make sure it's up enough we can perform operations successfully
-          rdr.sync("/");
-          // wait for the instance to be created
-          instanceId = InstanceId.of(new String(rdr.getData(instanceNamePath), UTF_8));
-          break;
-        } catch (KeeperException e) {
-          log.warn("Error trying to read instance id from zookeeper: {}", 
e.getMessage());
-          log.debug("Unable to read instance id from zookeeper.", e);
-        }
-        Thread.sleep(1000);
-      }
-
-      if (instanceId == null) {
-        try {
-          log.warn("******* COULD NOT FIND INSTANCE ID - DUMPING ZK 
************");
-          log.warn("Connected to ZooKeeper: {}", getZooKeepers());
-          log.warn("Looking for instanceId at {}", instanceNamePath);
-          ZKUtil.visitSubTreeDFS(zk, Constants.ZROOT, false,
-              (rc, path, ctx, name) -> log.warn("{}", path));
-          log.warn("******* END ZK DUMP ************");
-        } catch (KeeperException e) {
-          log.error("Error dumping zk", e);
-        }
-        throw new IllegalStateException("Unable to find instance id from zookeeper.");
-      }
-
-      String rootPath = ZooUtil.getRoot(instanceId);
-      int tsActualCount = 0;
+    ZooSession zk = getServerContext().getZooSession();

Review Comment:
   This ZooSession instance must be separate because it is checking the instanceName/instanceId hierarchy, outside the (to-be) chrooted hierarchy that the context.getZooSession() will be. This is specific to 3.1, though. The main branch has a complete rewrite of verifyUp, so it doesn't need to do this directly. I actually think it'd be better to backport those changes to 3.1, if possible, or else it should just leave this ZooSession alone... or it should be rewritten to only check for things inside the chroot hierarchy, and not the instanceName/instanceId mapping, in which case it could use the ZooSession from the context.


