risdenk commented on a change in pull request #705:
URL: https://github.com/apache/solr/pull/705#discussion_r815150824



##########
File path: solr/core/src/java/org/apache/solr/api/ContainerPluginsRegistry.java
##########
@@ -88,16 +88,16 @@ public boolean onChange(Map<String, Object> properties) {
     Phaser localPhaser = phaser; // volatile read
     if (localPhaser != null) {
       assert localPhaser.getRegisteredParties() == 1;
-      localPhaser.arrive(); // we should be the only ones registered, so this will advance phase each time
+      localPhaser
+          .arrive(); // we should be the only ones registered, so this will advance phase each time

Review comment:
       Fix this
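
A minimal, self-contained sketch (plain JDK, not the Solr class) of the single-party Phaser idiom this hunk's comment describes: with exactly one registered party, each arrive() advances the phase, so an unregistered thread can block on the previous phase number.

```java
import java.util.concurrent.Phaser;

public class SinglePartyPhaserDemo {
  public static void main(String[] args) throws InterruptedException {
    Phaser phaser = new Phaser(1); // exactly one registered party: the notifier
    int phase = phaser.getPhase();

    Thread waiter = new Thread(() -> {
      // awaitAdvance() blocks until the phase moves past 'phase'; it does not
      // require the waiting thread to be registered with the phaser.
      phaser.awaitAdvance(phase);
      System.out.println("observed phase advance");
    });
    waiter.start();

    phaser.arrive(); // sole registered party arriving => the phase advances
    waiter.join();
  }
}
```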

##########
File path: solr/core/src/java/org/apache/solr/cloud/DistributedApiAsyncTracker.java
##########
@@ -93,48 +95,57 @@ public DistributedApiAsyncTracker(SolrZkClient zkClient, String rootPath) {
     persistentIdsPath = rootPath + ZK_ASYNC_PERSISTENT;
     inFlightIdsPath = rootPath + ZK_ASYNC_INFLIGHT;
 
-    trackedAsyncTasks = new SizeLimitedDistributedMap(zkClient, persistentIdsPath, maxTrackedTasks, null);
+    trackedAsyncTasks =
+        new SizeLimitedDistributedMap(zkClient, persistentIdsPath, maxTrackedTasks, null);
     inFlightAsyncTasks = new InFlightJobs(zkClient, inFlightIdsPath);
   }
 
   /**
-   * After a successful call to this method, caller MUST eventually call {@link #setTaskCompleted} or {@link #cancelAsyncId}
-   * otherwise the task will forever be considered as in progress.
+   * After a successful call to this method, caller MUST eventually call {@link #setTaskCompleted}
+   * or {@link #cancelAsyncId} otherwise the task will forever be considered as in progress.
+   *
    * @param asyncId if {@code null} this method will do nothing.
-   * @return {@code true} if the asyncId was not already in use (or is {@code null}) and {@code false} if it is already
-   * in use and can't be allocated again.
+   * @return {@code true} if the asyncId was not already in use (or is {@code null}) and {@code
+   *     false} if it is already in use and can't be allocated again.
    */
   public boolean createNewAsyncJobTracker(String asyncId) {
     if (asyncId == null) {
       return true;
     }
     try {
-      // First create the persistent node, with no content. If that fails, it means the asyncId has been previously used
+      // First create the persistent node, with no content. If that fails, it means the asyncId has
+      // been previously used
       // and not yet cleared...
       if (!trackedAsyncTasks.putIfAbsent(asyncId, null)) {
         return false;
       }
 
-      // ...then create the transient node. If the corresponding ephemeral node already exists, it means the persistent node
-      // was removed (maybe trackedAsyncTasks grew too large? It has a max size then evicts). We cannot then track the new
+      // ...then create the transient node. If the corresponding ephemeral node already exists, it
+      // means the persistent node
+      // was removed (maybe trackedAsyncTasks grew too large? It has a max size then evicts). We
+      // cannot then track the new
       // provided asyncId, and have simply "revived" its persistent node...

Review comment:
       Fix this
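
A minimal sketch of the two-node tracking scheme this hunk's comments describe, written against the raw ZooKeeper client rather than Solr's SolrZkClient/SizeLimitedDistributedMap wrappers (paths and method name are hypothetical): a persistent marker claims the asyncId, then an ephemeral node marks it in flight.

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

class AsyncIdTrackerSketch {
  /** Returns false if the asyncId is already claimed (persistent node exists). */
  static boolean tryClaim(ZooKeeper zk, String persistentPath, String inFlightPath)
      throws KeeperException, InterruptedException {
    try {
      // Persistent node with no content claims the id; it survives node crashes.
      zk.create(persistentPath, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    } catch (KeeperException.NodeExistsException e) {
      return false; // asyncId previously used and not yet cleared
    }
    // Ephemeral node marks the task as in flight; it vanishes if the node dies.
    zk.create(inFlightPath, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
    return true;
  }
}
```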

##########
File path: solr/core/src/java/org/apache/solr/cloud/DistributedApiAsyncTracker.java
##########
@@ -226,15 +244,20 @@ public void cancelAsyncId(String asyncId) {
       return new Pair<>(RequestStatusState.RUNNING, null);
     }
 
-    // The task has failed, but there are two options: if response is null, it has failed because the node on which it was
-    // running has crashed. If it is not null, it has failed because the execution has failed. Because caller expects a non
+    // The task has failed, but there are two options: if response is null, it has failed because
+    // the node on which it was
+    // running has crashed. If it is not null, it has failed because the execution has failed.
+    // Because caller expects a non
     // null response in any case, let's make up one if needed...
     if (response == null) {
-      // Node crash has removed the ephemeral node, but the command did not complete execution (or didn't even start it, who
+      // Node crash has removed the ephemeral node, but the command did not complete execution (or
+      // didn't even start it, who
       // knows). We have a failure to report though so let's create a reasonable return response.

Review comment:
       Fix this
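
A self-contained sketch (hypothetical types standing in for Solr's RequestStatusState/OverseerSolrResponse pairing) of the fallback the comment describes: a null response means the executing node crashed, so a failure payload is fabricated rather than handing the caller null.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

class AsyncStatusSketch {
  enum State { RUNNING, COMPLETED, FAILED }

  // If the node died before writing a response, synthesize one so callers
  // always receive a non-null payload alongside the FAILED state.
  static Map.Entry<State, String> failureStatus(String asyncId, String response) {
    if (response == null) {
      response = "Operation (asyncId=" + asyncId + ") failed: executing node went down";
    }
    return new SimpleEntry<>(State.FAILED, response);
  }

  public static void main(String[] args) {
    System.out.println(failureStatus("abc123", null));
  }
}
```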

##########
File path: solr/core/src/java/org/apache/solr/cloud/ExclusiveSliceProperty.java
##########
@@ -129,17 +148,21 @@ private boolean collectCurrentPropStats() {
       for (Replica replica : slice.getReplicas()) {
         if (onlyActiveNodes && isActive(replica) == false) {
           if (StringUtils.isNotBlank(replica.getStr(property))) {
-            removeProp(slice, replica.getName()); // Note, we won't be committing this to ZK until later.
+            removeProp(
+                slice, replica.getName()); // Note, we won't be committing this to ZK until later.
           }

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/ExclusiveSliceProperty.java
##########
@@ -342,12 +387,15 @@ boolean balanceProperty() {
       }
     }
 
-    // At this point, nodesHostingProp should contain _only_ lists of replicas that belong to slices that do _not_
+    // At this point, nodesHostingProp should contain _only_ lists of replicas that belong to slices
+    // that do _not_
     // have any replica hosting the property. So let's assign them.

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/OverseerTaskProcessor.java
##########
@@ -283,18 +310,22 @@ public void run() {
             continue;
           }
 
-          // clear the blocked tasks, may get refilled below. Given blockedTasks can only get entries from heads and heads
-          // has at most MAX_BLOCKED_TASKS tasks, blockedTasks will never exceed MAX_BLOCKED_TASKS entries.
-          // Note blockedTasks can't be cleared too early as it is used in the excludedTasks Predicate above.
+          // clear the blocked tasks, may get refilled below. Given blockedTasks can only get
+          // entries from heads and heads
+          // has at most MAX_BLOCKED_TASKS tasks, blockedTasks will never exceed MAX_BLOCKED_TASKS
+          // entries.
+          // Note blockedTasks can't be cleared too early as it is used in the excludedTasks
+          // Predicate above.

Review comment:
       Fix this
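
A small sketch of the ordering constraint in the last comment line: the exclusion predicate reads blockedTasks live, so the map may only be cleared after the queue has been filtered with it (names are illustrative, not Solr's).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class ExcludedTasksSketch {
  public static void main(String[] args) {
    Map<String, String> blockedTasks = new LinkedHashMap<>();
    blockedTasks.put("42", "task-42");

    // Captures the map by reference: evaluated against its *current* contents.
    Predicate<String> excludedTasks = blockedTasks::containsKey;

    List<String> queue = new ArrayList<>(List.of("41", "42", "43"));
    queue.removeIf(excludedTasks); // filter first...
    blockedTasks.clear();          // ...then clearing is safe
    System.out.println(queue);     // [41, 43]
  }
}
```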

##########
File path: solr/core/src/java/org/apache/solr/cloud/ShardLeaderElectionContext.java
##########
@@ -133,8 +157,10 @@ void runLeaderProcess(boolean weAreReplacement, int pauseBeforeStart) throws Kee
       }
 
       if (isClosed) {
-        // Solr is shutting down or the ZooKeeper session expired while waiting for replicas. If the later,
-        // we cannot be sure we are still the leader, so we should bail out. The OnReconnect handler will
+        // Solr is shutting down or the ZooKeeper session expired while waiting for replicas. If the
+        // later,
+        // we cannot be sure we are still the leader, so we should bail out. The OnReconnect handler
+        // will
         // re-register the cores and handle a new leadership election.

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/RecoveryStrategy.java
##########
@@ -564,9 +588,11 @@ public final void doSyncOrReplicateRecovery(SolrCore core) throws Exception {
 
     final String ourUrl = ZkCoreNodeProps.getCoreUrl(baseUrl, coreName);
     Future<RecoveryInfo> replayFuture = null;
-    while (!successfulRecovery && !Thread.currentThread().isInterrupted() && !isClosed()) { // don't use interruption or
-                                                                                            // it will close channels
-                                                                                            // though
+    while (!successfulRecovery
+        && !Thread.currentThread().isInterrupted()
+        && !isClosed()) { // don't use interruption or
+      // it will close channels
+      // though

Review comment:
       Fix this
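
A compact sketch of the cooperative-stop pattern in this loop: recovery is cancelled with an atomic "closed" flag checked each iteration instead of Thread.interrupt(), since interrupting a thread blocked on network I/O would close its channels (class and method names are illustrative).

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class CooperativeStopSketch implements Runnable {
  private final AtomicBoolean closed = new AtomicBoolean(false);

  // Called by the shutdown path; deliberately does NOT interrupt the thread.
  public void close() {
    closed.set(true);
  }

  private boolean isClosed() {
    return closed.get();
  }

  @Override
  public void run() {
    boolean successfulRecovery = false;
    while (!successfulRecovery && !Thread.currentThread().isInterrupted() && !isClosed()) {
      successfulRecovery = attemptRecoveryOnce();
    }
  }

  private boolean attemptRecoveryOnce() {
    return true; // placeholder for one sync/replicate attempt
  }
}
```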

##########
File path: solr/core/src/java/org/apache/solr/cloud/DistributedClusterStateUpdater.java
##########
@@ -365,27 +411,37 @@ private void applyUpdate() throws KeeperException, InterruptedException {
 
       // Note we DO NOT track nor use the live nodes in the cluster state.
      // That may means the two abstractions (collection metadata vs. nodes) should be separated.
-      // For now trying to diverge as little as possible from existing data structures and code given the need to
-      // support both the old way (Overseer) and new way (distributed) of handling cluster state update.
+      // For now trying to diverge as little as possible from existing data structures and code
+      // given the need to
+      // support both the old way (Overseer) and new way (distributed) of handling cluster state
+      // update.
       final Set<String> liveNodes = Collections.emptySet();
 
-      // Per Replica States updates are done before all other updates and not subject to the number of attempts of CAS
-      // made here, given they have their own CAS strategy and implementation (see PerReplicaStatesOps.persist()).
+      // Per Replica States updates are done before all other updates and not subject to the number
+      // of attempts of CAS
+      // made here, given they have their own CAS strategy and implementation (see
+      // PerReplicaStatesOps.persist()).

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/DistributedClusterStateUpdater.java
##########
@@ -421,17 +480,26 @@ private void applyUpdate() throws KeeperException, InterruptedException {
         }
 
        // Get the latest version of the collection from the cluster state first.
-        // There is no notion of "cached" here (the boolean passed below) as we the updatedState is based on CollectionRef
-        DocCollection docCollection = updatedState.getCollectionOrNull(updater.getCollectionName(), true);
-
-        // If we did update per replica states and we're also updating state.json, update the content of state.json to reflect
-        // the changes made to replica states. Not strictly necessary (the state source of truth is in per replica states), but nice to have...
+        // There is no notion of "cached" here (the boolean passed below) as we the updatedState is
+        // based on CollectionRef
+        DocCollection docCollection =
+            updatedState.getCollectionOrNull(updater.getCollectionName(), true);
+
+        // If we did update per replica states and we're also updating state.json, update the
+        // content of state.json to reflect
+        // the changes made to replica states. Not strictly necessary (the state source of truth is
+        // in per replica states), but nice to have...

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/DistributedClusterStateUpdater.java
##########
@@ -705,36 +850,54 @@ public ClusterState getUpdatedClusterState() {
     }
 
     /**
-     * Using optimistic locking (and retries when needed) updates Zookeeper with the changes previously recorded by calls
-     * to {@link #record(MutatingCommand, ZkNodeProps)}.
+     * Using optimistic locking (and retries when needed) updates Zookeeper with the changes
+     * previously recorded by calls to {@link #record(MutatingCommand, ZkNodeProps)}.
      */
-    public void executeStateUpdates(SolrCloudManager scm, ZkStateReader zkStateReader) throws KeeperException, InterruptedException {
+    public void executeStateUpdates(SolrCloudManager scm, ZkStateReader zkStateReader)
+        throws KeeperException, InterruptedException {
       if (log.isDebugEnabled()) {
-        log.debug("Executing updates for collection " + collectionName + ", is creation=" + isCollectionCreation + ", " + mutations.size() + " recorded mutations.", new Exception("StackTraceOnly")); // nowarn
+        log.debug(
+            "Executing updates for collection "
+                + collectionName
+                + ", is creation="
+                + isCollectionCreation
+                + ", "
+                + mutations.size()
+                + " recorded mutations.",
+            new Exception("StackTraceOnly")); // nowarn
       }
       if (mutations.isEmpty()) {
-        final String err = "Internal bug. Unexpected empty set of mutations to apply for collection " + collectionName;
+        final String err =
+            "Internal bug. Unexpected empty set of mutations to apply for collection "
+                + collectionName;
         log.error(err);
         throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, err);
       }
 
-      RecordedMutationsPlayer mutationPlayer = new RecordedMutationsPlayer(scm, collectionName, isCollectionCreation, mutations);
+      RecordedMutationsPlayer mutationPlayer =
+          new RecordedMutationsPlayer(scm, collectionName, isCollectionCreation, mutations);
       ZkUpdateApplicator.applyUpdate(zkStateReader, mutationPlayer);
 
      // TODO update stats here for the various commands executed successfully or not?
-      // This would replace the stats about cluster state updates that the Collection API currently makes available using
-      // the OVERSEERSTATUS command, but obviously would be per node and will not have stats about queues (since there
-      // will be no queues). Would be useful in some tests though, for example TestSkipOverseerOperations.
-      // Probably better to rethink what types of stats are expected from a distributed system rather than trying to present
+      // This would replace the stats about cluster state updates that the Collection API currently
+      // makes available using
+      // the OVERSEERSTATUS command, but obviously would be per node and will not have stats about
+      // queues (since there
+      // will be no queues). Would be useful in some tests though, for example
+      // TestSkipOverseerOperations.
+      // Probably better to rethink what types of stats are expected from a distributed system
+      // rather than trying to present
      // those previously provided by a central server in the system (the Overseer).

Review comment:
       Fix this
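
A minimal generic sketch (hypothetical types, not Solr's MutatingCommand machinery) of the record-then-replay idea in this javadoc: mutations are recorded as pure state transformations and later replayed against the freshest state inside an optimistic-locking retry loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

class StateUpdateRecorder<S> {
  private final List<UnaryOperator<S>> mutations = new ArrayList<>();

  // Record a mutation without applying it yet.
  void record(UnaryOperator<S> mutation) {
    mutations.add(mutation);
  }

  // Replay all recorded mutations against a freshly fetched state; the caller
  // then attempts a versioned (CAS) write and, on conflict, refetches and replays.
  S replay(S freshState) {
    S s = freshState;
    for (UnaryOperator<S> m : mutations) {
      s = m.apply(s);
    }
    return s;
  }
}
```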

##########
File path: solr/core/src/java/org/apache/solr/cloud/DistributedApiAsyncTracker.java
##########
@@ -146,19 +157,21 @@ public void setTaskRunning(String asyncId) {
       throw new SolrException(SERVER_ERROR, "Error setting async task as 
running " + asyncId, ke);
     } catch (InterruptedException ie) {
       Thread.currentThread().interrupt();
-      throw new SolrException(SERVER_ERROR, "Interrupted setting async task as 
running " + asyncId, ie);
+      throw new SolrException(
+          SERVER_ERROR, "Interrupted setting async task as running " + 
asyncId, ie);
     }
   }
 
   /**
-   * Mark the completion (success or error) of an async task. The success or error is judged by the contents
-   * of the {@link OverseerSolrResponse}.
+   * Mark the completion (success or error) of an async task. The success or error is judged by the
+   * contents of the {@link OverseerSolrResponse}.
    */
  public void setTaskCompleted(String asyncId, OverseerSolrResponse solrResponse) {
    if (asyncId == null) {
      return;
    }
-    // First update the persistent node with the execution result, only then remove the transient node
+    // First update the persistent node with the execution result, only then remove the transient
+    // node
     // (otherwise a status check might report the task in error)

Review comment:
       Fix this
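
A raw-ZooKeeper sketch of the write ordering the comment calls out (paths and method name are hypothetical): the result is written to the persistent node before the in-flight ephemeral node is deleted, so a status check that lands between the two steps sees a completed result rather than a spurious failure.

```java
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

class TaskCompletionSketch {
  static void markCompleted(ZooKeeper zk, String persistentPath, String inFlightPath,
      byte[] serializedResponse) throws KeeperException, InterruptedException {
    // 1) Persist the execution result first (version -1 = unconditional write).
    zk.setData(persistentPath, serializedResponse, -1);
    // 2) Only then remove the transient "in flight" marker.
    zk.delete(inFlightPath, -1);
  }
}
```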

##########
File path: solr/core/src/java/org/apache/solr/cloud/DistributedApiAsyncTracker.java
##########
@@ -226,15 +244,20 @@ public void cancelAsyncId(String asyncId) {
       return new Pair<>(RequestStatusState.RUNNING, null);
     }
 
-    // The task has failed, but there are two options: if response is null, it has failed because the node on which it was
-    // running has crashed. If it is not null, it has failed because the execution has failed. Because caller expects a non
+    // The task has failed, but there are two options: if response is null, it has failed because
+    // the node on which it was
+    // running has crashed. If it is not null, it has failed because the execution has failed.
+    // Because caller expects a non
     // null response in any case, let's make up one if needed...

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/ExclusiveSliceProperty.java
##########
@@ -56,12 +56,14 @@
   private final DocCollection collection;
   private final String collectionName;
 
-  // Key structure. For each node, list all replicas on it regardless of whether they have the property or not.
+  // Key structure. For each node, list all replicas on it regardless of whether they have the
+  // property or not.
   private final Map<String, List<SliceReplica>> nodesHostingReplicas = new HashMap<>();
   // Key structure. For each node, a list of the replicas _currently_ hosting the property.
   private final Map<String, List<SliceReplica>> nodesHostingProp = new HashMap<>();
   Set<String> shardsNeedingHosts = new HashSet<>();
-  Map<String, Slice> changedSlices = new HashMap<>(); // Work on copies rather than the underlying cluster state.
+  Map<String, Slice> changedSlices =
+      new HashMap<>(); // Work on copies rather than the underlying cluster state.

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/ExclusiveSliceProperty.java
##########
@@ -325,7 +369,8 @@ boolean balanceProperty() {
     // So, remove a replica from the nodes that have too many
     removeOverallocatedReplicas();
 
-    // prune replicas belonging to a slice that have the property currently assigned from the list of replicas
+    // prune replicas belonging to a slice that have the property currently assigned from the list
+    // of replicas
     // that could host the property.

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/DistributedApiAsyncTracker.java
##########
@@ -93,48 +95,57 @@ public DistributedApiAsyncTracker(SolrZkClient zkClient, String rootPath) {
     persistentIdsPath = rootPath + ZK_ASYNC_PERSISTENT;
     inFlightIdsPath = rootPath + ZK_ASYNC_INFLIGHT;
 
-    trackedAsyncTasks = new SizeLimitedDistributedMap(zkClient, persistentIdsPath, maxTrackedTasks, null);
+    trackedAsyncTasks =
+        new SizeLimitedDistributedMap(zkClient, persistentIdsPath, maxTrackedTasks, null);
     inFlightAsyncTasks = new InFlightJobs(zkClient, inFlightIdsPath);
   }
 
   /**
-   * After a successful call to this method, caller MUST eventually call {@link #setTaskCompleted} or {@link #cancelAsyncId}
-   * otherwise the task will forever be considered as in progress.
+   * After a successful call to this method, caller MUST eventually call {@link #setTaskCompleted}
+   * or {@link #cancelAsyncId} otherwise the task will forever be considered as in progress.
+   *
    * @param asyncId if {@code null} this method will do nothing.
-   * @return {@code true} if the asyncId was not already in use (or is {@code null}) and {@code false} if it is already
-   * in use and can't be allocated again.
+   * @return {@code true} if the asyncId was not already in use (or is {@code null}) and {@code
+   *     false} if it is already in use and can't be allocated again.
    */
   public boolean createNewAsyncJobTracker(String asyncId) {
     if (asyncId == null) {
       return true;
     }
     try {
-      // First create the persistent node, with no content. If that fails, it means the asyncId has been previously used
+      // First create the persistent node, with no content. If that fails, it means the asyncId has
+      // been previously used
       // and not yet cleared...

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/ExclusiveSliceProperty.java
##########
@@ -208,7 +234,8 @@ private void balanceUnassignedReplicas() {
       SliceReplica srToChange = null;
       for (String slice : shardsNeedingHosts) {
         for (Map.Entry<String, List<SliceReplica>> ent : 
nodesHostingReplicas.entrySet()) {
-          // A little tricky. If we don't set this to something below, then it 
means all possible places to
+          // A little tricky. If we don't set this to something below, then it 
means all possible
+          // places to
           // put this property are full up, so just put it somewhere.

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/ExclusiveSliceProperty.java
##########
@@ -157,24 +180,28 @@ private boolean collectCurrentPropStats() {
     }
 
     // If the total number of already-hosted properties assigned to nodes
-    // that have potential to host leaders is equal to the slice count _AND_ none of the current nodes has more than
+    // that have potential to host leaders is equal to the slice count _AND_ none of the current
+    // nodes has more than
     // the max number of properties, there's nothing to do.

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/ExclusiveSliceProperty.java
##########
@@ -112,11 +128,14 @@ private boolean isActive(Replica replica) {
     return replica.getState() == Replica.State.ACTIVE;
   }
 
-  // Collect a list of all the nodes that _can_ host the indicated property. Along the way, also collect any of
-  // the replicas on that node that _already_ host the property as well as any slices that do _not_ have the
+  // Collect a list of all the nodes that _can_ host the indicated property. Along the way, also
+  // collect any of
+  // the replicas on that node that _already_ host the property as well as any slices that do _not_
+  // have the
   // property hosted.
   //
-  // Return true if anything node needs it's property reassigned. False if the property is already balanced for
+  // Return true if anything node needs it's property reassigned. False if the property is already
+  // balanced for
   // the collection.

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/OverseerTaskProcessor.java
##########
@@ -208,11 +217,17 @@ public void run() {
 
     // TODO: Make maxThreads configurable.
 
-    this.tpe = new ExecutorUtil.MDCAwareThreadPoolExecutor(5, MAX_PARALLEL_TASKS, 0L, TimeUnit.MILLISECONDS,
-        new SynchronousQueue<>(),
-        new SolrNamedThreadFactory("OverseerThreadFactory"));
-
-    // In OverseerCollectionMessageHandler, a new Session needs to be created for each new iteration over the tasks in the
+    this.tpe =
+        new ExecutorUtil.MDCAwareThreadPoolExecutor(
+            5,
+            MAX_PARALLEL_TASKS,
+            0L,
+            TimeUnit.MILLISECONDS,
+            new SynchronousQueue<>(),
+            new SolrNamedThreadFactory("OverseerThreadFactory"));
+
+    // In OverseerCollectionMessageHandler, a new Session needs to be created for each new iteration
+    // over the tasks in the
     // queue. Incrementing this id causes a new session to be created there.

Review comment:
       Fix this
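
A plain-JDK sketch of the pool configured in this hunk (Solr's ExecutorUtil.MDCAwareThreadPoolExecutor layers MDC propagation on top of ThreadPoolExecutor; the constant's value here is an assumption): a SynchronousQueue means tasks are handed straight to a thread and never queue.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class OverseerPoolSketch {
  static final int MAX_PARALLEL_TASKS = 100; // illustrative value

  static ExecutorService newPool() {
    return new ThreadPoolExecutor(
        5,                        // core threads
        MAX_PARALLEL_TASKS,       // hard ceiling on concurrent tasks
        0L, TimeUnit.MILLISECONDS,
        new SynchronousQueue<>(), // direct hand-off: no task queueing
        r -> new Thread(r, "OverseerThreadFactory"));
  }
}
```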

##########
File path: solr/core/src/java/org/apache/solr/cloud/OverseerTaskProcessor.java
##########
@@ -169,20 +173,24 @@ public void run() {
     LeaderStatus isLeader = amILeader();
     while (isLeader == LeaderStatus.DONT_KNOW) {
       log.debug("am_i_leader unclear {}", isLeader);
-      isLeader = amILeader();  // not a no, not a yes, try ask again
+      isLeader = amILeader(); // not a no, not a yes, try ask again
     }
 
     String oldestItemInWorkQueue = null;
-    // hasLeftOverItems - used for avoiding re-execution of async tasks that were processed by a previous Overseer.
-    // This variable is set in case there's any task found on the workQueue when the OCP starts up and
-    // the id for the queue tail is used as a marker to check for the task in completed/failed map in zk.
+    // hasLeftOverItems - used for avoiding re-execution of async tasks that were processed by a
+    // previous Overseer.
+    // This variable is set in case there's any task found on the workQueue when the OCP starts up
+    // and
+    // the id for the queue tail is used as a marker to check for the task in completed/failed map
+    // in zk.
     // Beyond the marker, all tasks can safely be assumed to have never been executed.

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/OverseerTaskProcessor.java
##########
@@ -169,20 +173,24 @@ public void run() {
     LeaderStatus isLeader = amILeader();
     while (isLeader == LeaderStatus.DONT_KNOW) {
       log.debug("am_i_leader unclear {}", isLeader);
-      isLeader = amILeader();  // not a no, not a yes, try ask again
+      isLeader = amILeader(); // not a no, not a yes, try ask again
     }
 
     String oldestItemInWorkQueue = null;
-    // hasLeftOverItems - used for avoiding re-execution of async tasks that were processed by a previous Overseer.
-    // This variable is set in case there's any task found on the workQueue when the OCP starts up and
-    // the id for the queue tail is used as a marker to check for the task in completed/failed map in zk.
+    // hasLeftOverItems - used for avoiding re-execution of async tasks that were processed by a
+    // previous Overseer.
+    // This variable is set in case there's any task found on the workQueue when the OCP starts up
+    // and
+    // the id for the queue tail is used as a marker to check for the task in completed/failed map
+    // in zk.
     // Beyond the marker, all tasks can safely be assumed to have never been executed.
     boolean hasLeftOverItems = true;
 
     try {
       oldestItemInWorkQueue = workQueue.getTailId();
     } catch (KeeperException e) {
-      // We don't need to handle this. This is just a fail-safe which comes in handy in skipping already processed
+      // We don't need to handle this. This is just a fail-safe which comes in handy in skipping
+      // already processed
       // async calls.

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/RecoveryStrategy.java
##########
@@ -332,27 +336,33 @@ final public void doRecovery(SolrCore core) throws Exception {
     }
   }
 
-  final private void doReplicateOnlyRecovery(SolrCore core) throws InterruptedException {
+  private final void doReplicateOnlyRecovery(SolrCore core) throws InterruptedException {
     final RTimer timer = new RTimer();
     boolean successfulRecovery = false;
 
     // if (core.getUpdateHandler().getUpdateLog() != null) {
-    // SolrException.log(log, "'replicate-only' recovery strategy should only be used if no update logs are present, but
+    // SolrException.log(log, "'replicate-only' recovery strategy should only be used if no update
+    // logs are present, but
     // this core has one: "
     // + core.getUpdateHandler().getUpdateLog());
     // return;
     // }
-    while (!successfulRecovery && !Thread.currentThread().isInterrupted() && !isClosed()) { // don't use interruption or
-                                                                                            // it will close channels
-                                                                                            // though
+    while (!successfulRecovery
+        && !Thread.currentThread().isInterrupted()
+        && !isClosed()) { // don't use interruption or
+      // it will close channels
+      // though

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/OverseerTaskProcessor.java
##########
@@ -238,41 +256,50 @@ public void run() {
 
           while (runningTasks.size() > MAX_PARALLEL_TASKS) {
             synchronized (waitLock) {
-              waitLock.wait(100);//wait for 100 ms or till a task is complete
+              waitLock.wait(100); // wait for 100 ms or till a task is complete
             }
             waited = true;
           }
 
-          if (waited)
-            cleanUpWorkQueue();
+          if (waited) cleanUpWorkQueue();
 
           ArrayList<QueueEvent> heads = new ArrayList<>(blockedTasks.size() + 
MAX_PARALLEL_TASKS);
           heads.addAll(blockedTasks.values());
 
-          //If we have enough items in the blocked tasks already, it makes
+          // If we have enough items in the blocked tasks already, it makes
           // no sense to read more items from the work queue. it makes sense
          // to clear out at least a few items in the queue before we read more items
           if (heads.size() < MAX_BLOCKED_TASKS) {
-            //instead of reading MAX_PARALLEL_TASKS items always, we should only fetch as much as we can execute
-            int toFetch = Math.min(MAX_BLOCKED_TASKS - heads.size(), MAX_PARALLEL_TASKS - runningTasks.size());
+            // instead of reading MAX_PARALLEL_TASKS items always, we should only fetch as much as
+            // we can execute
+            int toFetch =
+                Math.min(
+                    MAX_BLOCKED_TASKS - heads.size(), MAX_PARALLEL_TASKS - runningTasks.size());
            List<QueueEvent> newTasks = workQueue.peekTopN(toFetch, excludedTasks, 2000L);
             if (log.isDebugEnabled()) {
               log.debug("Got {} tasks from work-queue : [{}]", 
newTasks.size(), newTasks);
             }
             // heads has at most MAX_BLOCKED_TASKS tasks.
             heads.addAll(newTasks);
           } else {
-            // The sleep below slows down spinning when heads is full from previous work dispatch attempt below and no new
-            // tasks got executed (all executors are busy or all waiting tasks require locks currently held by executors).
+            // The sleep below slows down spinning when heads is full from previous work dispatch
+            // attempt below and no new
+            // tasks got executed (all executors are busy or all waiting tasks require locks
+            // currently held by executors).
             //
-            // When heads is not full but no progress was made (no new work got dispatched in the for loop below), slowing down
+            // When heads is not full but no progress was made (no new work got dispatched in the
+            // for loop below), slowing down
             // of the spinning is done by the wait time in the call to workQueue.peekTopN() above.
-            // (at least in theory because the method eventually called from there is ZkDistributedQueue.peekElements()
-            // and because it filters out entries that have just completed on a Runner thread in a different way than the
-            // predicate based filtering, it can return quickly without waiting the configured delay time. Therefore spinning
+            // (at least in theory because the method eventually called from there is
+            // ZkDistributedQueue.peekElements()
+            // and because it filters out entries that have just completed on a Runner thread in a
+            // different way than the
+            // predicate based filtering, it can return quickly without waiting the configured delay
+            // time. Therefore spinning
             // can be observed, likely something to clean up at some point).
             //
-            // If heads is not empty and new tasks appeared in the queue there's no delay, workQueue.peekTopN() above will
+            // If heads is not empty and new tasks appeared in the queue there's no delay,
+            // workQueue.peekTopN() above will
             // return immediately.

Review comment:
       Fix this
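
A tiny self-contained sketch of the fetch-sizing rule in this hunk: never fetch more than would fit under the blocked-task cap, and never more than there are free executor slots (the constant values are illustrative).

```java
public class FetchSizingSketch {
  static final int MAX_BLOCKED_TASKS = 1000;
  static final int MAX_PARALLEL_TASKS = 100;

  // headsSize = blocked tasks carried over; runningTasks = tasks currently executing.
  static int toFetch(int headsSize, int runningTasks) {
    return Math.min(MAX_BLOCKED_TASKS - headsSize, MAX_PARALLEL_TASKS - runningTasks);
  }

  public static void main(String[] args) {
    System.out.println(toFetch(990, 95)); // min(10, 5) = 5: only 5 executor slots free
  }
}
```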

##########
File path: solr/core/src/java/org/apache/solr/cloud/ExclusiveSliceProperty.java
##########
@@ -157,24 +180,28 @@ private boolean collectCurrentPropStats() {
     }
 
     // If the total number of already-hosted properties assigned to nodes
-    // that have potential to host leaders is equal to the slice count _AND_ none of the current nodes has more than
+    // that have potential to host leaders is equal to the slice count _AND_ none of the current
+    // nodes has more than
     // the max number of properties, there's nothing to do.
     origMaxPropPerNode = collection.getSlices().size() / allHosts.size();
 
    // Some nodes can have one more of the proeprty if the numbers aren't exactly even.
     origModulo = collection.getSlices().size() % allHosts.size();
     if (origModulo > 0) {
-      origMaxPropPerNode++;  // have to have some nodes with 1 more property.
+      origMaxPropPerNode++; // have to have some nodes with 1 more property.
     }
 
-    // We can say for sure that we need to rebalance if we don't have as many assigned properties as slices.
+    // We can say for sure that we need to rebalance if we don't have as many assigned properties as
+    // slices.
     if (assigned != collection.getSlices().size()) {
       return true;
     }
 
     // Make sure there are no more slices at the limit than the "leftovers"
-    // Let's say there's 7 slices and 3 nodes. We need to distribute the property as 3 on node1, 2 on node2 and 2 on node3
-    // (3, 2, 2) We need to be careful to not distribute them as 3, 3, 1. that's what this check is all about.
+    // Let's say there's 7 slices and 3 nodes. We need to distribute the property as 3 on node1, 2
+    // on node2 and 2 on node3
+    // (3, 2, 2) We need to be careful to not distribute them as 3, 3, 1. that's what this check is
+    // all about.

Review comment:
       Fix this
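
The per-node cap computed in this hunk, as a runnable worked example using the comment's own numbers (7 slices, 3 nodes): integer division plus a bump when there is a remainder yields the ceiling, i.e. a (3, 2, 2) spread rather than (3, 3, 1).

```java
public class PropCapSketch {
  public static void main(String[] args) {
    int slices = 7;
    int hosts = 3;

    int maxPropPerNode = slices / hosts; // 2
    int modulo = slices % hosts;         // 1 => some nodes must take one extra
    if (modulo > 0) {
      maxPropPerNode++;                  // cap becomes 3
    }
    // With a cap of 3, at most 'modulo' nodes may sit at the cap:
    // (3, 2, 2) is legal, (3, 3, 1) is not.
    System.out.println("cap=" + maxPropPerNode + ", nodesAllowedAtCap=" + modulo);
  }
}
```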

##########
File path: solr/core/src/java/org/apache/solr/cloud/ReplicateFromLeader.java
##########
@@ -80,29 +81,36 @@ public void startReplication(boolean switchTransactionLog) {
         pollIntervalStr = "00:00:01";
       }
       if (uinfo.autoCommmitMaxTime != -1) {
-        pollIntervalStr = toPollIntervalStr(uinfo.autoCommmitMaxTime/2);
+        pollIntervalStr = toPollIntervalStr(uinfo.autoCommmitMaxTime / 2);
       } else if (uinfo.autoSoftCommmitMaxTime != -1) {
-        pollIntervalStr = toPollIntervalStr(uinfo.autoSoftCommmitMaxTime/2);
+        pollIntervalStr = toPollIntervalStr(uinfo.autoSoftCommmitMaxTime / 2);
       }
-      log.info("Will start replication from leader with poll interval: {}", 
pollIntervalStr );
+      log.info("Will start replication from leader with poll interval: {}", 
pollIntervalStr);
 
       NamedList<Object> followerConfig = new NamedList<>();
       followerConfig.add("fetchFromLeader", Boolean.TRUE);
 
-      // don't commit on leader version zero for PULL replicas as PULL should only get its index state from leader
+      // don't commit on leader version zero for PULL replicas as PULL should only get its index
+      // state from leader
       boolean skipCommitOnLeaderVersionZero = switchTransactionLog;
       if (!skipCommitOnLeaderVersionZero) {
        CloudDescriptor cloudDescriptor = core.getCoreDescriptor().getCloudDescriptor();
        if (cloudDescriptor != null) {
          Replica replica =
-              cc.getZkController().getZkStateReader().getCollection(cloudDescriptor.getCollectionName())
-                  .getSlice(cloudDescriptor.getShardId()).getReplica(cloudDescriptor.getCoreNodeName());
+              cc.getZkController()
+                  .getZkStateReader()
+                  .getCollection(cloudDescriptor.getCollectionName())
+                  .getSlice(cloudDescriptor.getShardId())
+                  .getReplica(cloudDescriptor.getCoreNodeName());
           if (replica != null && replica.getType() == Replica.Type.PULL) {
-            skipCommitOnLeaderVersionZero = true; // only set this to true if we're a PULL replica, otherwise use value of switchTransactionLog
+            skipCommitOnLeaderVersionZero =
+                true; // only set this to true if we're a PULL replica, otherwise use value of
+            // switchTransactionLog

Review comment:
       Fix this
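
A hypothetical re-implementation of the toPollIntervalStr conversion used above, assuming the HH:mm:ss shape of the "00:00:01" default in this hunk: the poll interval is derived as half of the (soft) auto-commit max time.

```java
import java.util.concurrent.TimeUnit;

public class PollIntervalSketch {
  // Assumed format: hours:minutes:seconds, zero-padded, as in "00:00:01".
  static String toPollIntervalStr(long ms) {
    long h = TimeUnit.MILLISECONDS.toHours(ms);
    long m = TimeUnit.MILLISECONDS.toMinutes(ms) % 60;
    long s = TimeUnit.MILLISECONDS.toSeconds(ms) % 60;
    return String.format("%02d:%02d:%02d", h, m, s);
  }

  public static void main(String[] args) {
    long autoCommitMaxTime = 60_000; // a 60s autoCommit...
    System.out.println(toPollIntervalStr(autoCommitMaxTime / 2)); // ...polls every "00:00:30"
  }
}
```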

##########
File path: solr/core/src/java/org/apache/solr/api/ApiBag.java
##########
@@ -115,7 +122,8 @@ protected void attachValueToNode(PathTrie<Api>.Node node, Api o) {
         return;
       }
 
-      // If 'o' and 'node.obj' aren't both AnnotatedApi's then we can't aggregate the commands, so fallback to the
+      // If 'o' and 'node.obj' aren't both AnnotatedApi's then we can't aggregate the commands, so
+      // fallback to the
       // default behavior

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/RecoveryStrategy.java
##########
@@ -273,25 +276,23 @@ final private void replicate(String nodeName, SolrCore core, ZkNodeProps leaderp
         log.debug("Error in solrcloud_debug block", e);
       }
     }
-
   }
 
-  final private void commitOnLeader(String leaderUrl) throws SolrServerException,
-      IOException {
+  private final void commitOnLeader(String leaderUrl) throws SolrServerException, IOException {
    try (HttpSolrClient client = recoverySolrClientBuilder(leaderUrl).build()) {
      UpdateRequest ureq = new UpdateRequest();
      ureq.setParams(new ModifiableSolrParams());
      // ureq.getParams().set(DistributedUpdateProcessor.COMMIT_END_POINT, true);
-      // ureq.getParams().set(UpdateParams.OPEN_SEARCHER, onlyLeaderIndexes);// Why do we need to open searcher if
+      // ureq.getParams().set(UpdateParams.OPEN_SEARCHER, onlyLeaderIndexes);// Why do we need to
+      // open searcher if
       // "onlyLeaderIndexes"?

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/ShardLeaderElectionContextBase.java
##########
@@ -89,10 +95,15 @@ public void cancelElection() throws InterruptedException, KeeperException {
         // no problem
         try {
          // We need to be careful and make sure we *only* delete our own leader registration node.
-          // We do this by using a multi and ensuring the parent znode of the leader registration node
-          // matches the version we expect - there is a setData call that increments the parent's znode
+          // We do this by using a multi and ensuring the parent znode of the leader registration
+          // node
+          // matches the version we expect - there is a setData call that increments the parent's
+          // znode
           // version whenever a leader registers.

Review comment:
       Fix this
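
A raw-ZooKeeper sketch of the guarded delete the comment describes (paths and the version variable are illustrative): a multi transaction checks the parent znode's version and deletes the leader registration node atomically, so the delete aborts if another leader registered in between.

```java
import java.util.List;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Op;
import org.apache.zookeeper.ZooKeeper;

class GuardedLeaderNodeDelete {
  static void deleteOwnLeaderNode(ZooKeeper zk, String parentPath, int expectedParentVersion,
      String leaderRegistrationPath) throws KeeperException, InterruptedException {
    zk.multi(List.of(
        // Fails the whole transaction if the parent's version moved (someone
        // else registered as leader after us).
        Op.check(parentPath, expectedParentVersion),
        Op.delete(leaderRegistrationPath, -1)));
  }
}
```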

##########
File path: solr/core/src/java/org/apache/solr/cloud/RecoveryStrategy.java
##########
@@ -541,15 +563,17 @@ public final void doSyncOrReplicateRecovery(SolrCore core) throws Exception {
     }
 
     if (recoveringAfterStartup) {
-      // if we're recovering after startup (i.e. we have been down), then we need to know what the last versions were
+      // if we're recovering after startup (i.e. we have been down), then we need to know what the
+      // last versions were
       // when we went down. We may have received updates since then.

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/RecoveryStrategy.java
##########
@@ -332,27 +336,33 @@ final public void doRecovery(SolrCore core) throws Exception {
     }
   }
 
-  final private void doReplicateOnlyRecovery(SolrCore core) throws InterruptedException {
+  private final void doReplicateOnlyRecovery(SolrCore core) throws InterruptedException {
     final RTimer timer = new RTimer();
     boolean successfulRecovery = false;
 
     // if (core.getUpdateHandler().getUpdateLog() != null) {
-    // SolrException.log(log, "'replicate-only' recovery strategy should only be used if no update logs are present, but
+    // SolrException.log(log, "'replicate-only' recovery strategy should only be used if no update
+    // logs are present, but
     // this core has one: "
     // + core.getUpdateHandler().getUpdateLog());
     // return;
     // }
-    while (!successfulRecovery && !Thread.currentThread().isInterrupted() && !isClosed()) { // don't use interruption or
-                                                                                            // it will close channels
-                                                                                            // though
+    while (!successfulRecovery
+        && !Thread.currentThread().isInterrupted()
+        && !isClosed()) { // don't use interruption or
+      // it will close channels
+      // though
       try {
         CloudDescriptor cloudDesc = this.coreDescriptor.getCloudDescriptor();
-        ZkNodeProps leaderprops = zkStateReader.getLeaderRetry(cloudDesc.getCollectionName(), cloudDesc.getShardId());
+        ZkNodeProps leaderprops =
+            zkStateReader.getLeaderRetry(cloudDesc.getCollectionName(), cloudDesc.getShardId());
         final String leaderUrl = ZkCoreNodeProps.getCoreUrl(leaderprops);
         final String ourUrl = ZkCoreNodeProps.getCoreUrl(baseUrl, coreName);
 
-        boolean isLeader = ourUrl.equals(leaderUrl); // TODO: We can probably delete most of this code if we say this
-                                                     // strategy can only be used for pull replicas
+        boolean isLeader =
+            ourUrl.equals(
+                leaderUrl); // TODO: We can probably delete most of this code if we say this
+        // strategy can only be used for pull replicas

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/Overseer.java
##########
@@ -310,30 +336,44 @@ else if (LeaderStatus.YES != isLeader) {
                 byte[] data = head.second();
                 final ZkNodeProps message = ZkNodeProps.load(data);
                 if (log.isDebugEnabled()) {
-                  log.debug("processMessage: queueSize: {}, message = {}", 
stateUpdateQueue.getZkStats().getQueueLength(), message);
+                  log.debug(
+                      "processMessage: queueSize: {}, message = {}",
+                      stateUpdateQueue.getZkStats().getQueueLength(),
+                      message);
                 }
 
                 processedNodes.add(head.first());
                 fallbackQueueSize = processedNodes.size();
-                // force flush to ZK after each message because there is no fallback if workQueue items
+                // force flush to ZK after each message because there is no fallback if workQueue
+                // items
                 // are removed from workQueue but fail to be written to ZK
                 while (unprocessedMessages.size() > 0) {
                   clusterState = zkStateWriter.writePendingUpdates();
                   Message m = unprocessedMessages.remove(0);
                   clusterState = m.run(clusterState, Overseer.this);
                 }
                 // The callback always be called on this thread
-                clusterState = processQueueItem(message, clusterState, zkStateWriter, true, () -> {
-                  stateUpdateQueue.remove(processedNodes);
-                  processedNodes.clear();
-                });
+                clusterState =
+                    processQueueItem(
+                        message,
+                        clusterState,
+                        zkStateWriter,
+                        true,
+                        () -> {
+                          stateUpdateQueue.remove(processedNodes);
+                          processedNodes.clear();
+                        });
               }
               if (isClosed) break;
               // if an event comes in the next 100ms batch it together
-              queue = new LinkedList<>(stateUpdateQueue.peekElements(1000, 100, node -> !processedNodes.contains(node)));
+              queue =
+                  new LinkedList<>(
+                      stateUpdateQueue.peekElements(
+                          1000, 100, node -> !processedNodes.contains(node)));
             }
             fallbackQueueSize = processedNodes.size();
-            // we should force write all pending updates because the next iteration might sleep until there
+            // we should force write all pending updates because the next iteration might sleep
+            // until there
             // are more items in the main queue

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/Overseer.java
##########
@@ -627,49 +707,73 @@ public Overseer(HttpShardHandler shardHandler,
     this.zkController = zkController;
     this.stats = new Stats();
     this.config = config;
-    this.distributedClusterStateUpdater = new DistributedClusterStateUpdater(config.getDistributedClusterStateUpdates());
-
-    this.solrMetricsContext = new SolrMetricsContext(zkController.getCoreContainer().getMetricManager(), SolrInfoBean.Group.overseer.toString(), metricTag);
+    this.distributedClusterStateUpdater =
+        new DistributedClusterStateUpdater(config.getDistributedClusterStateUpdates());
+
+    this.solrMetricsContext =
+        new SolrMetricsContext(
+            zkController.getCoreContainer().getMetricManager(),
+            SolrInfoBean.Group.overseer.toString(),
+            metricTag);
   }
 
   public synchronized void start(String id) {
-    MDCLoggingContext.setNode(zkController == null ?
-        null :
-        zkController.getNodeName());
+    MDCLoggingContext.setNode(zkController == null ? null : zkController.getNodeName());
     this.id = id;
     closed = false;
     doClose();
     stats = new Stats();
     log.info("Overseer (id={}) starting", id);
     createOverseerNode(reader.getZkClient());
-    //launch cluster state updater thread
+    // launch cluster state updater thread
     ThreadGroup tg = new ThreadGroup("Overseer state updater.");
-    updaterThread = new OverseerThread(tg, new ClusterStateUpdater(reader, id, stats), "OverseerStateUpdate-" + id);
+    updaterThread =
+        new OverseerThread(
+            tg, new ClusterStateUpdater(reader, id, stats), "OverseerStateUpdate-" + id);
     updaterThread.setDaemon(true);
 
    ThreadGroup ccTg = new ThreadGroup("Overseer collection creation process.");

-    // Below is the only non test usage of the "cluster state update" queue even when distributed cluster state updates are enabled.
-    // That queue is used to tell the Overseer to quit. As long as we have an Overseer, we need to support this.
-    OverseerNodePrioritizer overseerPrioritizer = new OverseerNodePrioritizer(reader, this, adminPath, shardHandler.getShardHandlerFactory());
-    overseerCollectionConfigSetProcessor = new OverseerCollectionConfigSetProcessor(reader, id, shardHandler, adminPath, stats, Overseer.this, overseerPrioritizer, solrMetricsContext);
-    ccThread = new OverseerThread(ccTg, overseerCollectionConfigSetProcessor, "OverseerCollectionConfigSetProcessor-" + id);
+    // Below is the only non test usage of the "cluster state update" queue even when distributed
+    // cluster state updates are enabled.
+    // That queue is used to tell the Overseer to quit. As long as we have an Overseer, we need to
+    // support this.
+    OverseerNodePrioritizer overseerPrioritizer =

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/DistributedClusterStateUpdater.java
##########
@@ -365,27 +411,37 @@ private void applyUpdate() throws KeeperException, InterruptedException {
 
       // Note we DO NOT track nor use the live nodes in the cluster state.
      // That may means the two abstractions (collection metadata vs. nodes) should be separated.
-      // For now trying to diverge as little as possible from existing data structures and code given the need to
-      // support both the old way (Overseer) and new way (distributed) of handling cluster state update.
+      // For now trying to diverge as little as possible from existing data structures and code
+      // given the need to
+      // support both the old way (Overseer) and new way (distributed) of handling cluster state
+      // update.

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/Overseer.java
##########
@@ -1047,19 +1172,26 @@ private void createOverseerNode(final SolrZkClient zkClient) {
       throw new RuntimeException(e);
     }
   }
-  
+
   public ZkStateReader getZkStateReader() {
     return reader;
   }
 
  public void offerStateUpdate(byte[] data) throws KeeperException, InterruptedException {
-    // When cluster state update is distributed, the Overseer cluster state update queue should only ever receive QUIT messages.
+    // When cluster state update is distributed, the Overseer cluster state update queue should only
+    // ever receive QUIT messages.
     // These go to sendQuitToOverseer for execution path clarity.

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/DistributedClusterStateUpdater.java
##########
@@ -365,27 +411,37 @@ private void applyUpdate() throws KeeperException, InterruptedException {
 
       // Note we DO NOT track nor use the live nodes in the cluster state.
      // That may means the two abstractions (collection metadata vs. nodes) should be separated.
-      // For now trying to diverge as little as possible from existing data structures and code given the need to
-      // support both the old way (Overseer) and new way (distributed) of handling cluster state update.
+      // For now trying to diverge as little as possible from existing data structures and code
+      // given the need to
+      // support both the old way (Overseer) and new way (distributed) of handling cluster state
+      // update.
       final Set<String> liveNodes = Collections.emptySet();
 
-      // Per Replica States updates are done before all other updates and not subject to the number of attempts of CAS
-      // made here, given they have their own CAS strategy and implementation (see PerReplicaStatesOps.persist()).
+      // Per Replica States updates are done before all other updates and not subject to the number
+      // of attempts of CAS
+      // made here, given they have their own CAS strategy and implementation (see
+      // PerReplicaStatesOps.persist()).
       boolean firstAttempt = true;
 
-      // When there are multiple retries of state.json write and the cluster state gets updated over and over again with
+      // When there are multiple retries of state.json write and the cluster state gets updated over
+      // and over again with
      // the changes done in the per replica states, we avoid refetching those multiple times.
       PerReplicaStates fetchedPerReplicaStates = null;
 
-      // Later on (when Collection API commands are distributed) we will have to rely on the version of state.json
-      // to implement the replacement of Collection API locking. Then we should not blindly retry cluster state updates
-      // as we do here but instead intelligently fail (or retry completely) the Collection API call when seeing that
+      // Later on (when Collection API commands are distributed) we will have to rely on the version
+      // of state.json
+      // to implement the replacement of Collection API locking. Then we should not blindly retry
+      // cluster state updates
+      // as we do here but instead intelligently fail (or retry completely) the Collection API call
+      // when seeing that
       // state.json was changed by a concurrent command execution.
-      // The loop below is ok for distributing cluster state updates from Overseer to all nodes while Collection API
+      // The loop below is ok for distributing cluster state updates from Overseer to all nodes
+      // while Collection API
       // commands are still executed on the Overseer and manage their locking the old fashioned way.

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/DistributedClusterStateUpdater.java
##########
@@ -394,9 +450,12 @@ private void applyUpdate() throws KeeperException, InterruptedException {
           initialClusterState = fetchStateForCollection();
         }
 
-        // Apply the desired changes. Note that the cluster state passed to the chain of mutators is totally up to date
-        // (it's read from ZK just above). So assumptions made in the mutators (like SliceMutator.removeReplica() deleting
-        // the whole collection if it's not found) are ok. Actually in the removeReplica case, the collection will always
+        // Apply the desired changes. Note that the cluster state passed to the chain of mutators is
+        // totally up to date
+        // (it's read from ZK just above). So assumptions made in the mutators (like
+        // SliceMutator.removeReplica() deleting
+        // the whole collection if it's not found) are ok. Actually in the removeReplica case, the
+        // collection will always
         // exist otherwise the call to fetchStateForCollection() above would have failed.

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/DistributedClusterStateUpdater.java
##########
@@ -365,27 +411,37 @@ private void applyUpdate() throws KeeperException, InterruptedException {
 
       // Note we DO NOT track nor use the live nodes in the cluster state.
      // That may means the two abstractions (collection metadata vs. nodes) should be separated.
-      // For now trying to diverge as little as possible from existing data structures and code given the need to
-      // support both the old way (Overseer) and new way (distributed) of handling cluster state update.
+      // For now trying to diverge as little as possible from existing data structures and code
+      // given the need to
+      // support both the old way (Overseer) and new way (distributed) of handling cluster state
+      // update.
       final Set<String> liveNodes = Collections.emptySet();
 
-      // Per Replica States updates are done before all other updates and not subject to the number of attempts of CAS
-      // made here, given they have their own CAS strategy and implementation (see PerReplicaStatesOps.persist()).
+      // Per Replica States updates are done before all other updates and not subject to the number
+      // of attempts of CAS
+      // made here, given they have their own CAS strategy and implementation (see
+      // PerReplicaStatesOps.persist()).
       boolean firstAttempt = true;
 
-      // When there are multiple retries of state.json write and the cluster state gets updated over and over again with
+      // When there are multiple retries of state.json write and the cluster state gets updated over
+      // and over again with
      // the changes done in the per replica states, we avoid refetching those multiple times.

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/DistributedClusterStateUpdater.java
##########
@@ -441,58 +509,83 @@ private void applyUpdate() throws KeeperException, InterruptedException {
           return; // state.json updated successfully.
         } catch (KeeperException.BadVersionException bve) {
           if (updater.isCollectionCreation()) {
-            // Not expecting to see this exception when creating new state.json fails, so throwing it up the food chain.
+            // Not expecting to see this exception when creating new state.json fails, so throwing
+            // it up the food chain.
             throw bve;
           }
         }
-        // We've tried to update an existing state.json and got a BadVersionException. We'll try again a few times.
-        // When only two threads compete, no point in waiting: if we lost this time we'll get it next time right away.
-        // But if more threads compete, then waiting a bit (random delay) can improve our chances. The delay should in
-        // theory grow as the number of concurrent threads attempting updates increase, but we don't know that number, so
+        // We've tried to update an existing state.json and got a BadVersionException. We'll try
+        // again a few times.
+        // When only two threads compete, no point in waiting: if we lost this time we'll get it
+        // next time right away.
+        // But if more threads compete, then waiting a bit (random delay) can improve our chances.
+        // The delay should in
+        // theory grow as the number of concurrent threads attempting updates increase, but we don't
+        // know that number, so
         // doing exponential backoff instead.
-        // With "per replica states" collections, concurrent attempts of even just two threads are expected to be extremely rare.

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/DistributedClusterStateUpdater.java
##########
@@ -441,58 +509,83 @@ private void applyUpdate() throws KeeperException, InterruptedException {
           return; // state.json updated successfully.
         } catch (KeeperException.BadVersionException bve) {
           if (updater.isCollectionCreation()) {
-            // Not expecting to see this exception when creating new state.json fails, so throwing it up the food chain.
+            // Not expecting to see this exception when creating new state.json fails, so throwing
+            // it up the food chain.
             throw bve;
           }
         }
-        // We've tried to update an existing state.json and got a BadVersionException. We'll try again a few times.
-        // When only two threads compete, no point in waiting: if we lost this time we'll get it next time right away.
-        // But if more threads compete, then waiting a bit (random delay) can improve our chances. The delay should in
-        // theory grow as the number of concurrent threads attempting updates increase, but we don't know that number, so
+        // We've tried to update an existing state.json and got a BadVersionException. We'll try
+        // again a few times.
+        // When only two threads compete, no point in waiting: if we lost this time we'll get it
+        // next time right away.
+        // But if more threads compete, then waiting a bit (random delay) can improve our chances.
+        // The delay should in
+        // theory grow as the number of concurrent threads attempting updates increase, but we don't
+        // know that number, so
         // doing exponential backoff instead.
-        // With "per replica states" collections, concurrent attempts of even just two threads are expected to be extremely rare.
-        Thread.sleep(CollectionHandlingUtils.RANDOM.nextInt(attempt < 13 ? 1 << attempt : 1 << 13)); // max wait 2^13ms=8.192 sec
+        // With "per replica states" collections, concurrent attempts of even just two threads are
+        // expected to be extremely rare.
+        Thread.sleep(
+            CollectionHandlingUtils.RANDOM.nextInt(
+                attempt < 13 ? 1 << attempt : 1 << 13)); // max wait 2^13ms=8.192 sec
       }
 
-      // We made quite a few attempts but failed repeatedly. This is pretty bad but we can't loop trying forever.
-      // Offering a job to the Overseer wouldn't usually fail if the ZK queue can be written to (but the Overseer can then
+      // We made quite a few attempts but failed repeatedly. This is pretty bad but we can't loop
+      // trying forever.
+      // Offering a job to the Overseer wouldn't usually fail if the ZK queue can be written to (but
+      // the Overseer can then
       // loop forever attempting the update).
-      // We do want whoever called us to fail right away rather than to wait for a cluster change and timeout because it
-      // didn't happen. Likely need to review call by call what is the appropriate behaviour, especially once Collection
-      // API is distributed (because then the Collection API call will fail if the underlying cluster state update cannot
+      // We do want whoever called us to fail right away rather than to wait for a cluster change
+      // and timeout because it
+      // didn't happen. Likely need to review call by call what is the appropriate behaviour,
+      // especially once Collection
+      // API is distributed (because then the Collection API call will fail if the underlying
+      // cluster state update cannot
       // be done, and that's a desirable thing).

Review comment:
       Fix this

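For reference while reviewing: the retry loop in this hunk is a capped exponential backoff with random jitter. A minimal, self-contained sketch of the same pattern (class and method names here are illustrative, not the Solr code):

    import java.util.Random;

    public class BackoffSketch {
      private static final Random RANDOM = new Random();

      // Sleep before retry number `attempt` (0-based): the bound doubles each
      // attempt, capped at 2^13 ms = 8.192 s, and the random draw (jitter)
      // spreads competing threads apart instead of retrying in lockstep.
      static void sleepBeforeRetry(int attempt) throws InterruptedException {
        int bound = attempt < 13 ? 1 << attempt : 1 << 13;
        Thread.sleep(RANDOM.nextInt(bound));
      }
    }

The jitter matters more than the exact cap: without it, threads that collided once tend to collide again on the next attempt.
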
##########
File path: solr/core/src/java/org/apache/solr/cloud/DistributedClusterStateUpdater.java
##########
@@ -681,11 +820,17 @@ public void computeUpdates(ClusterState clusterState, SolrZkClient client) {
               perReplicaStateOps.add(zkcmd.ops);
             }
           } catch (Exception e) {
-            // Seems weird to skip rather than fail, but that's what Overseer is doing (see ClusterStateUpdater.processQueueItem()).
-            // Maybe in the new distributed update world we should make the caller fail? (something Overseer cluster state updater can't do)
-            // To be reconsidered once Collection API commands are distributed because then cluster updates are done synchronously and
+            // Seems weird to skip rather than fail, but that's what Overseer is doing (see
+            // ClusterStateUpdater.processQueueItem()).
+            // Maybe in the new distributed update world we should make the caller fail? (something
+            // Overseer cluster state updater can't do)
+            // To be reconsidered once Collection API commands are distributed because then cluster
+            // updates are done synchronously and
             // have the opportunity to make the Collection API call fail directly.

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/DistributedClusterStateUpdater.java
##########
@@ -747,36 +910,47 @@ public void executeStateUpdates(SolrCloudManager scm, ZkStateReade
     private List<PerReplicaStatesOps> replicaOpsList = null;
 
     /**
-     * Entry point to mark all replicas of all collections present on a single node as being DOWN (because the node is down)
+     * Entry point to mark all replicas of all collections present on a single node as being DOWN
+     * (because the node is down)
      */
     public static void executeNodeDownStateUpdate(String nodeName, ZkStateReader zkStateReader) {
-      // This code does a version of what NodeMutator.downNode() is doing. We can't assume we have a cache of the collections,
-      // so we're going to read all of them from ZK, fetch the state.json for each and if it has any replicas on the
+      // This code does a version of what NodeMutator.downNode() is doing. We can't assume we have a
+      // cache of the collections,
+      // so we're going to read all of them from ZK, fetch the state.json for each and if it has any
+      // replicas on the
       // failed node, do an update (conditional of course) of the state.json
 
-      // For Per Replica States collections there is still a need to read state.json, but the update of state.json is replaced
-      // by a few znode deletions and creations. Might be faster or slower overall, depending on the number of impacted
+      // For Per Replica States collections there is still a need to read state.json, but the update
+      // of state.json is replaced
+      // by a few znode deletions and creations. Might be faster or slower overall, depending on the
+      // number of impacted
       // replicas of such a collection and the total size of that collection's state.json.
 
-      // Note code here also has to duplicate some of the work done in ZkStateReader because ZkStateReader couples reading of
-      // the cluster state and maintaining a cached copy of the cluster state. Something likely to be refactored later (once
+      // Note code here also has to duplicate some of the work done in ZkStateReader because
+      // ZkStateReader couples reading of
+      // the cluster state and maintaining a cached copy of the cluster state. Something likely to
+      // be refactored later (once
       // Overseer is totally removed and Zookeeper access patterns become clearer).

Review comment:
       Fix this

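To make the flow described in these comments concrete, here is a hedged sketch of reading each collection's state.json from ZooKeeper and conditionally rewriting it. markReplicasDown() is a hypothetical helper, and this is not the actual Solr implementation (which goes through ZkStateReader and PerReplicaStatesOps):

    import java.util.List;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    public class NodeDownSketch {
      // Read every collection's state.json and, if it references the failed
      // node, rewrite it with a compare-and-set on the znode version.
      void nodeDown(ZooKeeper zk, String nodeName) throws Exception {
        List<String> collections = zk.getChildren("/collections", false);
        for (String coll : collections) {
          String path = "/collections/" + coll + "/state.json";
          Stat stat = new Stat();
          byte[] data = zk.getData(path, false, stat);
          byte[] updated = markReplicasDown(data, nodeName); // hypothetical
          if (updated != null) {
            try {
              // Conditional write: only succeeds if nobody changed state.json
              // between our read and this update.
              zk.setData(path, updated, stat.getVersion());
            } catch (KeeperException.BadVersionException e) {
              // Lost the race; a real implementation would re-read and retry.
            }
          }
        }
      }

      // Placeholder: parse the JSON, set replicas on nodeName to DOWN, return
      // null when the collection has no replica on that node.
      private byte[] markReplicasDown(byte[] stateJson, String nodeName) {
        return null;
      }
    }
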
##########
File path: solr/core/src/java/org/apache/solr/cloud/DistributedClusterStateUpdater.java
##########
@@ -441,58 +509,83 @@ private void applyUpdate() throws KeeperException, InterruptedException {
           return; // state.json updated successfully.
         } catch (KeeperException.BadVersionException bve) {
           if (updater.isCollectionCreation()) {
-            // Not expecting to see this exception when creating new state.json fails, so throwing it up the food chain.
+            // Not expecting to see this exception when creating new state.json fails, so throwing
+            // it up the food chain.
             throw bve;
           }
         }
-        // We've tried to update an existing state.json and got a BadVersionException. We'll try again a few times.
-        // When only two threads compete, no point in waiting: if we lost this time we'll get it next time right away.
-        // But if more threads compete, then waiting a bit (random delay) can improve our chances. The delay should in
-        // theory grow as the number of concurrent threads attempting updates increase, but we don't know that number, so
+        // We've tried to update an existing state.json and got a BadVersionException. We'll try
+        // again a few times.
+        // When only two threads compete, no point in waiting: if we lost this time we'll get it
+        // next time right away.
+        // But if more threads compete, then waiting a bit (random delay) can improve our chances.
+        // The delay should in
+        // theory grow as the number of concurrent threads attempting updates increase, but we don't
+        // know that number, so
         // doing exponential backoff instead.
-        // With "per replica states" collections, concurrent attempts of even just two threads are expected to be extremely rare.
-        Thread.sleep(CollectionHandlingUtils.RANDOM.nextInt(attempt < 13 ? 1 << attempt : 1 << 13)); // max wait 2^13ms=8.192 sec
+        // With "per replica states" collections, concurrent attempts of even just two threads are
+        // expected to be extremely rare.
+        Thread.sleep(
+            CollectionHandlingUtils.RANDOM.nextInt(
+                attempt < 13 ? 1 << attempt : 1 << 13)); // max wait 2^13ms=8.192 sec
       }
 
-      // We made quite a few attempts but failed repeatedly. This is pretty bad but we can't loop trying forever.
-      // Offering a job to the Overseer wouldn't usually fail if the ZK queue can be written to (but the Overseer can then
+      // We made quite a few attempts but failed repeatedly. This is pretty bad but we can't loop
+      // trying forever.
+      // Offering a job to the Overseer wouldn't usually fail if the ZK queue can be written to (but
+      // the Overseer can then
       // loop forever attempting the update).
-      // We do want whoever called us to fail right away rather than to wait for a cluster change and timeout because it
-      // didn't happen. Likely need to review call by call what is the appropriate behaviour, especially once Collection
-      // API is distributed (because then the Collection API call will fail if the underlying cluster state update cannot
+      // We do want whoever called us to fail right away rather than to wait for a cluster change
+      // and timeout because it
+      // didn't happen. Likely need to review call by call what is the appropriate behaviour,
+      // especially once Collection
+      // API is distributed (because then the Collection API call will fail if the underlying
+      // cluster state update cannot
       // be done, and that's a desirable thing).
-      throw new KeeperException.BadVersionException(ZkStateReader.getCollectionPath(updater.getCollectionName()));
+      throw new KeeperException.BadVersionException(
+          ZkStateReader.getCollectionPath(updater.getCollectionName()));
     }
 
     /**
-     * After the computing of the new {@link ClusterState} containing all needed updates to the collection based on what the
-     * {@link StateChangeCalculator} computed, this method does an update in ZK to the collection's {@code state.json}. It is the
-     * equivalent of Overseer's {@link ZkStateWriter#writePendingUpdates} (in its actions related to {@code state.json}
-     * as opposed to the per replica states).
-     * <p>
-     * Note that in a similar way to what happens in {@link ZkStateWriter#writePendingUpdates}, collection delete is handled
-     * as a special case. (see comment on {@link DistributedClusterStateUpdater.StateChangeRecorder.RecordedMutationsPlayer}
-     * on why the code has to be duplicated)<p>
+     * After the computing of the new {@link ClusterState} containing all needed updates to the
+     * collection based on what the {@link StateChangeCalculator} computed, this method does an
+     * update in ZK to the collection's {@code state.json}. It is the equivalent of Overseer's
+     * {@link ZkStateWriter#writePendingUpdates} (in its actions related to {@code state.json} as
+     * opposed to the per replica states).
+     *
+     * <p>Note that in a similar way to what happens in {@link ZkStateWriter#writePendingUpdates},
+     * collection delete is handled as a special case. (see comment on {@link
+     * DistributedClusterStateUpdater.StateChangeRecorder.RecordedMutationsPlayer} on why the code
+     * has to be duplicated)
      *
-     * <b>Note for the future:</b> Given this method is where the actual write to ZK is done, that's the place where we
-     * can rebuild a DocCollection with updated zk version. Eventually if we maintain a cache of recently used collections,
-     * we want to capture the updated collection and put it in the cache to avoid reading it again (unless it changed,
-     * the CAS will fail and we will refresh).<p>
+     * <p><b>Note for the future:</b> Given this method is where the actual write to ZK is done,
+     * that's the place where we can rebuild a DocCollection with updated zk version. Eventually if
+     * we maintain a cache of recently used collections, we want to capture the updated collection
+     * and put it in the cache to avoid reading it again (unless it changed, the CAS will fail and
+     * we will refresh).
      *
-     * This could serve as the basis for a strategy where each node does not need any view of all collections in the cluster
-     * but only a cache of recently used collections (possibly not even needing watches on them, but we'll discuss this later).
+     * <p>This could serve as the basis for a strategy where each node does not need any view of all
+     * collections in the cluster but only a cache of recently used collections (possibly not even
+     * needing watches on them, but we'll discuss this later).
      */
-    private void doStateDotJsonCasUpdate(ClusterState updatedState) throws KeeperException, InterruptedException {
+    private void doStateDotJsonCasUpdate(ClusterState updatedState)
+        throws KeeperException, InterruptedException {
       String jsonPath = ZkStateReader.getCollectionPath(updater.getCollectionName());
 
       // Collection delete
       if (!updatedState.hasCollection(updater.getCollectionName())) {
-        // We do not have a collection znode version to test we delete the right version of state.json. But this doesn't really matter:
-        // if we had one, and the delete failed (because state.json got updated in the meantime), we would re-read the collection
-        // state, update our version, run the CAS delete again and it will pass. Which means that one way or another, deletes are final.
-        // I hope nobody deletes a collection then creates a new one with the same name immediately (although the creation should fail
-        // if the znode still exists, so the creation would only succeed after the delete made it, and we're ok).
-        // With Overseer based updates the same behavior can be observed: a collection update is enqueued followed by the
+        // We do not have a collection znode version to test we delete the right version of
+        // state.json. But this doesn't really matter:
+        // if we had one, and the delete failed (because state.json got updated in the meantime), we
+        // would re-read the collection
+        // state, update our version, run the CAS delete again and it will pass. Which means that
+        // one way or another, deletes are final.
+        // I hope nobody deletes a collection then creates a new one with the same name immediately
+        // (although the creation should fail
+        // if the znode still exists, so the creation would only succeed after the delete made it,
+        // and we're ok).
+        // With Overseer based updates the same behavior can be observed: a collection update is
+        // enqueued followed by the
         // collection delete before the update was executed.

Review comment:
       Fix this

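The "deletes are final" argument above can be illustrated with a small conditional-delete loop; a sketch under the assumption of direct ZooKeeper access, not the actual Solr code:

    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    public class CasDeleteSketch {
      // Conditional delete that re-reads the version on conflict, so the
      // delete eventually wins no matter how many concurrent updates land.
      void deleteStateJson(ZooKeeper zk, String jsonPath) throws Exception {
        while (true) {
          Stat stat = zk.exists(jsonPath, false);
          if (stat == null) {
            return; // already gone
          }
          try {
            zk.delete(jsonPath, stat.getVersion());
            return; // delete made it
          } catch (KeeperException.BadVersionException e) {
            // state.json changed between read and delete: loop and retry with
            // the fresh version. One way or another, deletes are final.
          } catch (KeeperException.NoNodeException e) {
            return; // someone else deleted it; same outcome
          }
        }
      }
    }
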
##########
File path: solr/core/src/java/org/apache/solr/cloud/ZkDistributedLockFactory.java
##########
@@ -35,11 +35,15 @@
 
   protected DistributedLock doCreateLock(boolean isWriteLock, String lockPath) {
     try {
-      // TODO optimize by first attempting to create the ZkDistributedLock without calling makeLockPath() and only call it
-      //  if the lock creation fails. This will be less costly on high contention (and slightly more on low contention)
+      // TODO optimize by first attempting to create the ZkDistributedLock without calling
+      // makeLockPath() and only call it
+      //  if the lock creation fails. This will be less costly on high contention (and slightly more
+      // on low contention)

Review comment:
       Fix this

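A hedged sketch of what that TODO could look like: create the lock znode optimistically and only build the parent path when the create fails because the parent is missing. Names are illustrative; ZkDistributedLock's real internals differ:

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class OptimisticLockSketch {
      // Optimistically create the lock znode; only build the parent path when
      // the create fails because the parent chain is missing.
      String createLockNode(ZooKeeper zk, String lockPath) throws Exception {
        try {
          return tryCreate(zk, lockPath);
        } catch (KeeperException.NoNodeException e) {
          makeLockPath(zk, lockPath); // hypothetical parent-path creation
          return tryCreate(zk, lockPath);
        }
      }

      private String tryCreate(ZooKeeper zk, String lockPath) throws Exception {
        // Ephemeral sequential child, the standard ZooKeeper lock recipe.
        return zk.create(lockPath + "/lock-", new byte[0],
            ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
      }

      private void makeLockPath(ZooKeeper zk, String lockPath) throws Exception {
        // Placeholder: create each missing path segment as PERSISTENT,
        // ignoring NodeExistsException from concurrent creators.
      }
    }

On the contended path (parent already exists) this saves the makeLockPath() round trips; on the cold path it pays one extra failed create, which matches the trade-off stated in the TODO.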



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


