madrob commented on a change in pull request #1297: SOLR-14253 Replace various sleep calls with ZK waits
URL: https://github.com/apache/lucene-solr/pull/1297#discussion_r385776252
 
 

 ##########
 File path: solr/core/src/java/org/apache/solr/cloud/ZkController.java
 ##########
 @@ -1684,58 +1685,37 @@ private void doGetShardIdAndNodeNameProcess(CoreDescriptor cd) {
   }
 
   private void waitForCoreNodeName(CoreDescriptor descriptor) {
-    int retryCount = 320;
-    log.debug("look for our core node name");
-    while (retryCount-- > 0) {
-      final DocCollection docCollection = zkStateReader.getClusterState()
-          .getCollectionOrNull(descriptor.getCloudDescriptor().getCollectionName());
-      if (docCollection != null && docCollection.getSlicesMap() != null) {
-        final Map<String, Slice> slicesMap = docCollection.getSlicesMap();
-        for (Slice slice : slicesMap.values()) {
-          for (Replica replica : slice.getReplicas()) {
-            // TODO: for really large clusters, we could 'index' on this
-
-            String nodeName = replica.getStr(ZkStateReader.NODE_NAME_PROP);
-            String core = replica.getStr(ZkStateReader.CORE_NAME_PROP);
-
-            String msgNodeName = getNodeName();
-            String msgCore = descriptor.getName();
-
-            if (msgNodeName.equals(nodeName) && core.equals(msgCore)) {
-              descriptor.getCloudDescriptor()
-                  .setCoreNodeName(replica.getName());
-              getCoreContainer().getCoresLocator().persist(getCoreContainer(), descriptor);
-              return;
-            }
-          }
+    log.debug("waitForCoreNodeName >>> look for our core node name");
+    try {
+      zkStateReader.waitForState(descriptor.getCollectionName(), 320, TimeUnit.SECONDS, c -> {
+        String name = ClusterStateMutator.getAssignedCoreNodeName(c, getNodeName(), descriptor.getName());
+        if (name == null) {
+          return false;
         }
-      }
-      try {
-        Thread.sleep(1000);
-      } catch (InterruptedException e) {
-        Thread.currentThread().interrupt();
-      }
+        descriptor.getCloudDescriptor().setCoreNodeName(name);
+        return true;
+      });
+    } catch (TimeoutException | InterruptedException e) {
+      throw new SolrException(ErrorCode.SERVER_ERROR, "Timeout waiting for collection state", e);
 
 Review comment:
   A couple of logic changes here, yes. 1) Before, we would continue to retry on 
interrupt, i.e. the interruption would only count against the current attempt, 
not the whole method. That's probably wrong. 2) Before, we wouldn't fail if we 
never saw the result state. Also probably wrong, and I suspect that we would 
have ended up failing later when the core node name was missing?
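
   To make the two behavior changes concrete, here is a minimal, self-contained 
sketch. It is not the Solr code: the class and method names 
(WaitSemanticsSketch, pollWithSleep, waitOrFail) are hypothetical, and a plain 
polling helper stands in for zkStateReader.waitForState, which is watch-driven 
rather than polled. It only illustrates the control flow discussed above.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

// Hypothetical illustration only; none of these names exist in Solr.
class WaitSemanticsSketch {

  // Old shape: sleep-and-retry. An interrupt is caught inside the loop, so it
  // ends only the current sleep and the loop keeps going. Because the
  // interrupt flag is restored, later Thread.sleep calls throw immediately and
  // the remaining retries are burned without waiting.
  static boolean pollWithSleep(BooleanSupplier condition, int retries, long sleepMillis) {
    while (retries-- > 0) {
      if (condition.getAsBoolean()) {
        return true;
      }
      try {
        Thread.sleep(sleepMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // flag restored, loop continues
      }
    }
    // The method in the diff returned void at this point, so running out of
    // retries was silent. A boolean is returned here only to make it visible.
    return false;
  }

  // New shape: an interrupt or a timeout aborts the whole wait with an
  // exception that the caller has to handle.
  static void waitOrFail(BooleanSupplier condition, long timeoutMillis, long pollMillis)
      throws TimeoutException, InterruptedException {
    final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException("condition not met within " + timeoutMillis + " ms");
      }
      Thread.sleep(pollMillis); // InterruptedException propagates to the caller
    }
  }

  public static void main(String[] args) throws Exception {
    // The condition never becomes true in either call.
    System.out.println("old shape found state: " + pollWithSleep(() -> false, 3, 1));
    try {
      waitOrFail(() -> false, 20, 1);
    } catch (TimeoutException e) {
      System.out.println("new shape failed loudly: " + e.getMessage());
    }
  }
}
```

   In the old shape the caller cannot tell a successful wait from an exhausted 
one unless it checks the state again afterwards; in the new shape both the 
timeout and the interrupt surface as exceptions at the point of the wait.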
