Re: [PR] SOLR-18179: Better highlight and expand upon our Cluster concepts in the Ref Guide [solr]

via GitHub Sun, 29 Mar 2026 14:24:19 -0700


gus-asf commented on code in PR #4246:
URL: https://github.com/apache/solr/pull/4246#discussion_r3006758621



##########
solr/solr-ref-guide/modules/getting-started/pages/solr-glossary.adoc:
##########
@@ -44,28 +44,37 @@ These control the inclusion or exclusion of keywords in a query by using operato
 [[SolrGlossary-C]]
 === C
 
-[[cluster]]Cluster::
+[[cluster]]xref:cluster-types.adoc[Cluster]::
 In Solr, a cluster is a set of Solr nodes operating in coordination with each other via <<zookeeper,ZooKeeper>>, and managed as a unit.
 A cluster may contain many collections.
-See also <<solrclouddef,SolrCloud>>.
+See also xref:cluster-types.adoc[] and <<solrclouddef,SolrCloud>>.
 
 [[collection]]Collection::
-In Solr, one or more <<document,Documents>> grouped together in a single logical index using a single configuration and Schema.
+The complete logical set of searchable documents that share a schema and configuration.
 +
-In <<solrclouddef,SolrCloud>> a collection may be divided up into multiple logical shards, which may in turn be distributed across many nodes.
+In <<solrclouddef,SolrCloud>>, a collection may be divided up into multiple logical <<shard,shards>>, which may in turn be distributed across many <<node,nodes>> for scalability and fault tolerance.
+Each collection encompasses all the shards and their <<replica,replicas>>.
 +
-Single-node installations and user-managed clusters use instead the concept of a <<core,Core>>.
-"Collection" is most frequently used in the SolrCloud context, but as it represents a "logical index", the term may be used to refer to individual cores in a user-managed cluster as well.
+Single-node installations and user-managed clusters do not manage collections as first-class entities; instead they work directly with individual <<core,cores>>.
 +
 [[defcommit]]Commit::
 To make document changes permanent in the index.
 In the case of added documents, they would be searchable after a _commit_.
 
 [[core]]Core::
-An individual Solr instance (represents a logical index).
-Multiple cores can run on a single node.
+In Solr's implementation, a core is the physical instance that represents a <<replica,Replica>>.

Review Comment:
   Same comment as in cluster-types.adoc
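To make the replica/core relationship concrete, here is a minimal Python sketch of the hierarchy that the glossary and cluster-types.adoc describe. The class names, the `build_collection` helper, the core naming pattern, and the round-robin node placement are hypothetical illustrations, not Solr APIs. The one point carried over from the text is that each replica is exactly one core, so a collection with S shards and R replicas per shard has S × R cores.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Replica:
    """One physical copy of a shard. In Solr, each replica is represented as a core."""
    core_name: str  # hypothetical naming pattern; real Solr core names may differ
    node: str


@dataclass
class Shard:
    """A logical slice of a collection; it exists physically only through its replicas."""
    name: str
    replicas: List[Replica] = field(default_factory=list)


@dataclass
class Collection:
    name: str
    shards: List[Shard] = field(default_factory=list)

    def core_count(self) -> int:
        # Every replica is exactly one core, so count replicas across all shards.
        return sum(len(shard.replicas) for shard in self.shards)


def build_collection(name: str, num_shards: int, replicas_per_shard: int,
                     nodes: List[str]) -> Collection:
    """Hypothetical helper: lay out shards and replicas, spreading cores over nodes."""
    collection = Collection(name)
    placed = 0
    for s in range(1, num_shards + 1):
        shard = Shard(f"shard{s}")
        for r in range(1, replicas_per_shard + 1):
            node = nodes[placed % len(nodes)]  # simple round-robin placement
            shard.replicas.append(Replica(f"{name}_shard{s}_replica{r}", node))
            placed += 1
        collection.shards.append(shard)
    return collection


if __name__ == "__main__":
    products = build_collection("products", num_shards=2, replicas_per_shard=2,
                                nodes=["node1", "node2"])
    # "A shard with 2 replicas" means 2 physical copies in total, not 1 + 2.
    print(products.core_count())  # 4
```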



##########
solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc:
##########
@@ -0,0 +1,158 @@
+= Solr Cluster Types
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+A Solr cluster is a group of servers that each run one or more Solr _nodes_.
+
+There are two general modes of operating a cluster of Solr nodes.
+One mode provides central coordination of the Solr nodes (<<SolrCloud Mode>>), while the other allows you to operate a cluster without this central coordination (<<User-Managed Mode>>).
+
+TIP: "User Managed" and "Single Node" are sometimes referred to as "Standalone", especially in source code.
+
+Both modes share general concepts, but ultimately differ in how those concepts are reflected in functionality and features.
+
+First let's cover a few general concepts and then outline the differences between the two modes.
+
+== Cluster Concepts
+
+=== Servers and Nodes
+
+A _server_ is the hardware or virtual machine that hosts Solr software.
+A _node_ is an instance of a running Solr process that services search and indexing requests.
+Large servers may run multiple Solr nodes, though typically one node per server is most common.
+
+=== Shards
+
+In both cluster modes, a logical collection of documents can be divided across nodes as _shards_.
+Each shard represents a logical slice of the overall collection and contains a subset of the documents.
+
+The number of shards determines the theoretical limit to the number of documents that can be stored.
+It also dictates the amount of parallelization possible for an individual search request.
+
+=== Replicas
+
+A shard is a logical concept—a slice of your collection.
+A _replica_ is the physical manifestation of that logical shard.
+It is the actual running instance that holds and serves the documents belonging to that shard.
+
+A shard must have at least one replica to exist physically.
+If you have one shard with one physical copy, you have one replica.
+If you add redundancy by creating additional copies of that shard, you have multiple replicas—each is equally a replica, including the first one.
+
+IMPORTANT: There is no "original shard" separate from its replicas.
+The replicas ARE how the shard exists.
+This is why we say "a shard with 2 replicas" has 2 total physical copies, not an original plus 2 additional copies.
+
+All replicas of the same shard contain the same subset of documents and share the same configuration.
+
+The number of replicas determines the level of fault tolerance the cluster has in the event of a node failure.
+It also dictates the theoretical limit on the number of concurrent search requests that can be processed under heavy load.
+
+=== Leaders and Followers
+
+Among the replicas for a given shard, one replica is designated as the _leader_.
+The leader serves as the source-of-truth for its shard.
+When document updates are made, they are first processed by the leader replica and then propagated to the other replicas (the exact mechanism varies by cluster mode).
+
+The replicas which are not leaders are called _followers_.
+
+=== Cores
+
+In Solr's implementation, each replica is represented as a _core_.
+The term "core" is primarily an internal implementation detail—when you create a replica, Solr creates a core to represent it.
+Multiple cores can be hosted on any one node.
+
+NOTE: The term "core" can be confusing because in everyday English it implies something central and singular, but in Solr it actually refers to one of potentially many replicas distributed across the cluster.
+In most contexts, thinking of "core" as synonymous with "replica" will help clarify discussions about Solr's architecture.
+
+=== Collections and Indexes
+
+A _collection_ is the complete logical set of searchable documents that share a schema and configuration.
+In SolrCloud mode (described below), a collection encompasses all the shards and their replicas.
+
+An _index_ refers to the physical data structures written to disk by Apache Lucene.
+Each core (replica) maintains exactly one Lucene index on disk, containing the actual inverted indexes, stored fields, and other data structures that enable search.
+
+This creates a clear hierarchy from logical concepts to physical storage:
+
+[source,text]
+----
+Collection (logical grouping of all searchable documents)
+  └─> Shard 1 (logical partition)
+  │     └─> Replica 1 / Core 1 (physical instance)
+  │     │     └─> Lucene Index (disk structures)
+  │     └─> Replica 2 / Core 2 (physical instance)
+  │           └─> Lucene Index (disk structures)
+  └─> Shard 2 (logical partition)
+        └─> Replica 1 / Core 3 (physical instance)
+        │     └─> Lucene Index (disk structures)
+        └─> Replica 2 / Core 4 (physical instance)
+              └─> Lucene Index (disk structures)
+----
+
+In this example, a collection is divided into 2 shards, each shard has 2 replicas for redundancy, and each replica maintains its own Lucene index on disk.
+
+== SolrCloud Mode
+
+SolrCloud mode (also called "SolrCloud") uses Apache ZooKeeper to provide the centralized cluster management that is its main feature.
+ZooKeeper tracks each node of the cluster and the state of each core on each node.
+
+In this mode, configuration files are stored in ZooKeeper and not on the file system of each node.
+When configuration changes are made, they must be uploaded to ZooKeeper, which in turn makes sure each node knows changes have been made.
+
+SolrCloud manages collections as first-class entities.
+A collection represents the entire group of shards and replicas that together provide access to a corpus of documents.
+Collections share the same configurations (schema, `solrconfig.xml`, etc.).
+This centralization of cluster management means that operations can be performed on the entire collection at one time.
+
+When changes are made to configurations, a single command to reload the collection will automatically reload each individual core (replica) that is a member of the collection.
+
+Sharding is handled automatically, simply by telling Solr during collection creation how many shards you'd like the collection to have.
+Document updates are then generally balanced between each shard automatically.
+Some degree of control over what documents are stored in which shards is also available, if needed.
+
+ZooKeeper also handles load balancing and failover.
+Incoming requests, either to index documents or for user queries, can be sent to any node of the cluster and ZooKeeper will route the request to an appropriate replica of each shard.
+
+In SolrCloud, the leader is flexible, with built-in mechanisms for automatic leader election in case the current leader fails.

Review Comment:
   "the leader replica within a shard is flexible" - let's keep the leader's domain crystal clear.

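The point above is that leadership is scoped to a single shard. A minimal Python model of that idea follows; `elect_leader`, `elect_all`, and the "keep the current leader, else promote the first live replica" rule are hypothetical simplifications. Solr's actual election is coordinated through ZooKeeper and is not reproduced here. What the sketch does show is that losing one shard's leader triggers a new election only among that shard's replicas, while every other shard keeps its leader.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Replica:
    name: str
    live: bool = True


def elect_leader(replicas: List[Replica], current: Optional[str]) -> Optional[str]:
    """Pick the leader for ONE shard, given only that shard's replicas.

    Simplified rule (an assumption, not Solr's algorithm): keep the current
    leader while it is live, otherwise promote the first live replica.
    """
    live = [r for r in replicas if r.live]
    if current is not None and any(r.name == current for r in live):
        return current
    return live[0].name if live else None


def elect_all(shards: Dict[str, List[Replica]],
              leaders: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    # Each shard runs its own election; no replica ever leads another shard.
    return {shard: elect_leader(reps, leaders.get(shard)) for shard, reps in shards.items()}


if __name__ == "__main__":
    shards = {
        "shard1": [Replica("core1"), Replica("core2")],
        "shard2": [Replica("core3"), Replica("core4")],
    }
    leaders = elect_all(shards, {})
    print(leaders)  # {'shard1': 'core1', 'shard2': 'core3'}

    shards["shard1"][0].live = False  # the node hosting core1 fails
    leaders = elect_all(shards, leaders)
    print(leaders)  # {'shard1': 'core2', 'shard2': 'core3'}
```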


##########
solr/solr-ref-guide/modules/getting-started/pages/solr-glossary.adoc:
##########
@@ -179,12 +201,17 @@ Logic and configuration parameters that tell Solr how to handle incoming "reques
 Logic and configuration parameters used by request handlers to process query requests.
 Examples of search components include faceting, highlighting, and "more like this" functionality.
 
+[[server]]Server::
+The hardware or virtual machine that hosts Solr software.
+A server may run one or more Solr <<node,Nodes>>.
+
 [[shard]]Shard::
-In SolrCloud, a logical partition of a single <<collection,Collection>>.
-Every shard consists of at least one physical <<replica,Replica>>, but there may be multiple Replicas distributed across multiple <<node,Nodes>> for fault tolerance.
+A logical slice of a <<collection,Collection>>.
+Each shard represents a logical partition containing a subset of the collection's documents.

Review Comment:
   ```suggestion
   Each shard represents a partition containing a subset of the collection's documents.
   ```



##########
solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc:
##########
@@ -0,0 +1,158 @@
+== SolrCloud Mode
+
+SolrCloud mode (also called "SolrCloud") uses Apache ZooKeeper to provide the centralized cluster management that is its main feature.
+ZooKeeper tracks each node of the cluster and the state of each core on each node.
+
+In this mode, configuration files are stored in ZooKeeper and not on the file system of each node.

Review Comment:
   I'd skip the "like most things" sentence. This will be dealt with in the section on user managed clusters.

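The collection-level operations described in this part of the patch (creating a collection from a configset stored in ZooKeeper, then reloading it after config changes) map onto the Collections API. The sketch below only builds request URLs and sends nothing. `action=CREATE` with `name`, `numShards`, `replicationFactor` and `collection.configName`, and `action=RELOAD` with `name`, are real Collections API parameters; the host, the collection name `products`, the configset name `products_conf`, and the `collections_api_url` helper are placeholders. It assumes the configset has already been uploaded to ZooKeeper.

```python
from urllib.parse import urlencode


def collections_api_url(base_url: str, action: str, **params: str) -> str:
    """Build a V1 Collections API URL: <base>/admin/collections?action=...&..."""
    query = urlencode({"action": action, **params})
    return f"{base_url.rstrip('/')}/admin/collections?{query}"


BASE = "http://localhost:8983/solr"  # placeholder; any node in the cluster will do

# Create a collection: Solr lays out 2 shards x 2 replicas (4 cores) itself,
# using a configset that already lives in ZooKeeper.
create_url = collections_api_url(
    BASE, "CREATE",
    name="products", numShards="2", replicationFactor="2",
    **{"collection.configName": "products_conf"},
)

# After uploading changed config files to ZooKeeper, one RELOAD call
# reloads every core (replica) belonging to the collection.
reload_url = collections_api_url(BASE, "RELOAD", name="products")

if __name__ == "__main__":
    print(create_url)
    print(reload_url)
```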


##########
solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc:
##########
@@ -0,0 +1,158 @@
+=== Shards
+
+In both cluster modes, a logical collection of documents can be divided across nodes as _shards_.
+Each shard represents a logical slice of the overall collection and contains a subset of the documents.

Review Comment:
   Or maybe:
   
   Shards slice a collection of documents into discrete non-overlapping subsets, and may be based on data values you specify or ranges of a hash on the document ID.

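The wording suggested above (non-overlapping subsets, possibly defined by ranges of a hash of the document ID) can be illustrated with a small Python sketch that splits the signed 32-bit hash space into contiguous ranges, one per shard, and routes an ID to the range containing its hash. Solr's own compositeId router uses MurmurHash3; this sketch substitutes `zlib.crc32` purely for brevity, so its assignments will not match a real Solr cluster. `shard_ranges`, `to_signed32`, and `route` are hypothetical names.

```python
import zlib
from typing import List, Tuple

INT_MIN, INT_MAX = -2**31, 2**31 - 1


def shard_ranges(num_shards: int) -> List[Tuple[int, int]]:
    """Split the signed 32-bit hash space into contiguous, non-overlapping ranges."""
    span = 2**32
    return [
        (INT_MIN + span * i // num_shards, INT_MIN + span * (i + 1) // num_shards - 1)
        for i in range(num_shards)
    ]


def to_signed32(x: int) -> int:
    # zlib.crc32 returns an unsigned value; reinterpret it as a signed 32-bit int.
    return x - 2**32 if x >= 2**31 else x


def route(doc_id: str, ranges: List[Tuple[int, int]]) -> int:
    """Return the index of the shard whose hash range contains the ID's hash."""
    h = to_signed32(zlib.crc32(doc_id.encode("utf-8")))
    for index, (lo, hi) in enumerate(ranges):
        if lo <= h <= hi:
            return index
    raise ValueError("ranges do not cover the full hash space")


if __name__ == "__main__":
    ranges = shard_ranges(2)
    print(ranges)  # [(-2147483648, -1), (0, 2147483647)]
    for doc_id in ["doc-1", "doc-2", "doc-3"]:
        # The same ID always hashes to the same shard, so updates replace it in place.
        print(doc_id, "->", f"shard{route(doc_id, ranges) + 1}")
```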


##########
solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc:
##########
@@ -0,0 +1,158 @@
+In SolrCloud, the leader is flexible, with built-in mechanisms for automatic leader election in case the current leader fails.
+This means another replica can become the leader, and from that point forward it is the source-of-truth for all other replicas of that shard.
+
+As long as one replica of each relevant shard is available, a user query or indexing request can still be satisfied when running in SolrCloud mode.
+
+== User-Managed Mode
+
+Solr's user-managed mode requires that cluster coordination activities that SolrCloud normally uses ZooKeeper for be performed manually or with local scripts.
+
+If the corpus of documents is too large for a single shard, the logic to create multiple shards is entirely left to the user.
+There are no automated or programmatic ways for Solr to create shards during indexing.
+
+Routing documents to shards is handled manually, either with a simple hashing system or a simple round-robin list of shards that sends each document to a different shard.

Review Comment:
   ```suggestion
   Routing documents to shards is handled manually, either with a hashing system (that you design and implement), assignment of documents to shards based on the value of a field (implicit routing), or a simple round-robin list of shards that sends each document to a different shard.
   ```
   (do we want to even mention the round robin case since it makes updates challenging/slow? Only valid case I can imagine for that is as an optimization for super high volume immutable event data with little or no text analysis where the cost of calculating a hash might become significant)

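The three routing options in the suggestion above can be compared in a short Python sketch. Everything here is hypothetical: in user-managed mode Solr provides none of this routing, so `hash_route`, `field_route`, and `RoundRobinRouter` stand in for logic you would write in your own indexing client. The last part of the example shows the concern raised about round-robin: re-indexing the same ID can land on a different shard, leaving the old version behind unless you also delete it from every other shard.

```python
import hashlib
from itertools import count
from typing import Dict


def hash_route(doc_id: str, num_shards: int) -> int:
    """A hashing scheme you design yourself: SHA-1 of the ID, modulo the shard count."""
    digest = hashlib.sha1(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


def field_route(doc: Dict[str, str], field_name: str, shard_for_value: Dict[str, int]) -> int:
    """Implicit routing: the value of one field decides the shard."""
    return shard_for_value[doc[field_name]]


class RoundRobinRouter:
    """Sends each document to the next shard in turn, ignoring its contents."""

    def __init__(self, num_shards: int) -> None:
        self._counter = count()
        self._num_shards = num_shards

    def route(self, doc_id: str) -> int:
        return next(self._counter) % self._num_shards


if __name__ == "__main__":
    # Deterministic: re-indexing doc-1 always targets the same shard.
    print(hash_route("doc-1", 3) == hash_route("doc-1", 3))  # True

    print(field_route({"id": "a1", "region": "eu"}, "region", {"us": 0, "eu": 1}))  # 1

    rr = RoundRobinRouter(3)
    first = rr.route("doc-1")   # shard 0
    rr.route("doc-2")           # shard 1
    update = rr.route("doc-1")  # shard 2: the stale copy is still in shard 0
    print(first, update)  # 0 2
```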


##########
solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc:
##########
@@ -0,0 +1,158 @@
+This creates a clear hierarchy from logical concepts to physical storage:
+
+[source,text]
+----
+Collection (logical grouping of all searchable documents)
+  └─> Shard 1 (logical partition)
+  │     └─> Replica 1 / Core 1 (physical instance)
+  │     │     └─> Lucene Index (disk structures)
+  │     └─> Replica 2 / Core 2 (physical instance)
+  │           └─> Lucene Index (disk structures)
+  └─> Shard 2 (logical partition)
+        └─> Replica 1 / Core 3 (physical instance)
+        │     └─> Lucene Index (disk structures)
+        └─> Replica 2 / Core 4 (physical instance)
+              └─> Lucene Index (disk structures)
+----
+

Review Comment:
   I like this tree. It represents the conceptual organization. We need one 
somewhere for the physical organization too, i.e. Cluster --> Node --> Replica.
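
A minimal Python sketch of the two trees, assuming a hypothetical placement map for the 2-shard, 2-replica example above. The collection name `books`, the node names, and the replica labels are invented for illustration and do not match what Solr generates. It shows that Collection --> Shard --> Replica and Cluster --> Node --> Replica are two groupings of the same set of replicas.

```python
from collections import defaultdict

# Hypothetical placement: (collection, shard, replica) -> node.
# Names are illustrative only, not what Solr actually generates.
PLACEMENT = {
    ("books", "shard1", "replica1"): "node1",
    ("books", "shard1", "replica2"): "node2",
    ("books", "shard2", "replica1"): "node2",
    ("books", "shard2", "replica2"): "node1",
}


def logical_view(placement):
    """Collection --> Shard --> [Replica]: the conceptual organization."""
    view = defaultdict(lambda: defaultdict(list))
    for collection, shard, replica in sorted(placement):
        view[collection][shard].append(replica)
    return {c: dict(shards) for c, shards in view.items()}


def physical_view(placement, cluster="cluster"):
    """Cluster --> Node --> [Replica]: where each replica actually runs."""
    nodes = defaultdict(list)
    for (collection, shard, replica), node in sorted(placement.items()):
        nodes[node].append(f"{collection}/{shard}/{replica}")
    return {cluster: dict(nodes)}


def print_tree(tree, indent=0):
    """Render a nested dict of lists as an indented text tree."""
    for key, value in tree.items():
        print("  " * indent + str(key))
        if isinstance(value, dict):
            print_tree(value, indent + 1)
        else:
            for leaf in value:
                print("  " * (indent + 1) + leaf)


if __name__ == "__main__":
    print_tree(logical_view(PLACEMENT))
    print_tree(physical_view(PLACEMENT))
```

In this placement each node hosts one replica of each shard, so the physical tree also shows at a glance that losing either node leaves every shard with a live replica.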



##########
solr/solr-ref-guide/modules/getting-started/pages/solr-glossary.adoc:
##########
@@ -96,6 +105,11 @@ The arrangement of search results into categories based on 
indexed terms.
 [[field]]Field::
 The content to be indexed/searched along with metadata defining how the 
content should be processed by Solr.
 
+[[follower]]Follower::
+A <<replica,Replica>> that is not the <<leader,Leader>> for its 
<<shard,Shard>>.

Review Comment:
   This is replica level in cloud and node level in standalone, which probably 
should be called out and clarified.
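
A toy Python model of that difference, under simplifying assumptions. In user-managed mode a follower pulls a copy of its leader's index when it sees a newer index generation (Solr does this through its replication handler, polling on a configured interval). In SolrCloud the leader forwards each update to the other replicas of its shard (the behavior of the default NRT replica type; TLOG and PULL replicas are not modeled). The `Replica` class and both functions are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical stand-in for one copy of an index: docs plus a generation counter."""
    name: str
    docs: dict = field(default_factory=dict)
    generation: int = 0

    def apply(self, doc_id, body):
        """Apply one update locally and bump the generation."""
        self.docs[doc_id] = body
        self.generation += 1


def cloud_update(leader, followers, doc_id, body):
    """SolrCloud-style push: the leader applies the update, then forwards it."""
    leader.apply(doc_id, body)
    for follower in followers:
        follower.apply(doc_id, body)


def user_managed_poll(leader, follower):
    """User-managed pull: copy the leader's index only if it is newer.

    Returns True when a copy was made. Real replication copies changed
    index files rather than a dict, and runs on a polling interval.
    """
    if leader.generation > follower.generation:
        follower.docs = dict(leader.docs)
        follower.generation = leader.generation
        return True
    return False
```

Between polls a user-managed follower can lag behind its leader, which is one practical difference the glossary entry could call out.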



##########
solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc:
##########
@@ -0,0 +1,158 @@
+= Solr Cluster Types
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+A Solr cluster is a group of servers that each run one or more Solr _nodes_.
+
+There are two general modes of operating a cluster of Solr nodes.
+One mode provides central coordination of the Solr nodes (<<SolrCloud Mode>>), 
while the other allows you to operate a cluster without this central 
coordination (<<User-Managed Mode>>).
+
+TIP: "User Managed" and "Single Node" are sometimes referred to as 
"Standalone", especially in source code.
+
+Both modes share general concepts, but ultimately differ in how those concepts 
are reflected in functionality and features.
+
+First let's cover a few general concepts and then outline the differences 
between the two modes.
+
+== Cluster Concepts
+
+=== Servers and Nodes
+
+A _server_ is the hardware or virtual machine that hosts Solr software.
+A _node_ is an instance of a running Solr process that services search and 
indexing requests.
+Large servers may run multiple Solr nodes, though typically one node per 
server is most common.
+
+=== Shards
+
+In both cluster modes, a logical collection of documents can be divided across 
nodes as _shards_.
+Each shard represents a logical slice of the overall collection and contains a 
subset of the documents.
+
+The number of shards determines the theoretical limit to the number of 
documents that can be stored.
+It also dictates the amount of parallelization possible for an individual 
search request.
+
+=== Replicas
+
+A shard is a logical concept—a slice of your collection.
+A _replica_ is the physical manifestation of that logical shard.
+It is the actual running instance that holds and serves the documents 
belonging to that shard.
+
+A shard must have at least one replica to exist physically.
+If you have one shard with one physical copy, you have one replica.
+If you add redundancy by creating additional copies of that shard, you have 
multiple replicas—each is equally a replica, including the first one.
+
+IMPORTANT: There is no "original shard" separate from its replicas.
+The replicas ARE how the shard exists.
+This is why we say "a shard with 2 replicas" has 2 total physical copies, not 
an original plus 2 additional copies.
+
+All replicas of the same shard contain the same subset of documents and share 
the same configuration.
+
+The number of replicas determines the level of fault tolerance the cluster has 
in the event of a node failure.
+It also dictates the theoretical limit on the number of concurrent search 
requests that can be processed under heavy load.
+
+=== Leaders and Followers
+
+Among the replicas for a given shard, one replica is designated as the 
_leader_.
+The leader serves as the source-of-truth for its shard.
+When document updates are made, they are first processed by the leader replica 
and then propagated to the other replicas (the exact mechanism varies by 
cluster mode).
+
+The replicas which are not leaders are called _followers_.
+
+=== Cores
+
+In Solr's implementation, each replica is represented as a _core_.
+The term "core" is primarily an internal implementation detail—when you create 
a replica, Solr creates a core to represent it.
+Multiple cores can be hosted on any one node.
+
+NOTE: The term "core" can be confusing because in everyday English it implies 
something central and singular, but in Solr it actually refers to one of 
potentially many replicas distributed across the cluster.
+In most contexts, thinking of "core" as synonymous with "replica" will help 
clarify discussions about Solr's architecture.
+
+=== Collections and Indexes
+
+A _collection_ is the complete logical set of searchable documents that share 
a schema and configuration.
+In SolrCloud mode (described below), a collection encompasses all the shards 
and their replicas.
+
+An _index_ refers to the physical data structures written to disk by Apache 
Lucene.
+Each core (replica) maintains exactly one Lucene index on disk, containing the 
actual inverted indexes, stored fields, and other data structures that enable 
search.

Review Comment:
   No need to use "core" here; just say replica.
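
Since each replica holds a single Lucene index containing every document in its shard, the Shards section's claim that the shard count sets a theoretical document limit can be made concrete. A minimal sketch, assuming Lucene's per-index cap `IndexWriter.MAX_DOCS` (`Integer.MAX_VALUE - 128`, i.e. 2,147,483,519 documents); real deployments hit performance limits long before this. The function names are hypothetical.

```python
# Lucene's hard per-index document cap: IndexWriter.MAX_DOCS = Integer.MAX_VALUE - 128.
LUCENE_MAX_DOCS = (2**31 - 1) - 128  # 2,147,483,519


def max_documents(num_shards: int) -> int:
    """Theoretical upper bound on documents for a collection with this many shards.

    Replicas do not raise the bound: every replica of a shard holds the same documents.
    """
    if num_shards < 1:
        raise ValueError("a collection needs at least one shard")
    return num_shards * LUCENE_MAX_DOCS


def min_shards_for(num_docs: int) -> int:
    """Smallest shard count whose theoretical bound can hold num_docs."""
    if num_docs < 0:
        raise ValueError("num_docs must be non-negative")
    return max(1, -(-num_docs // LUCENE_MAX_DOCS))  # ceiling division
```

For example, `min_shards_for(5_000_000_000)` returns 3, because two shards top out at 4,294,967,038 documents. Adding replicas does not change these numbers.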



##########
solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc:
##########
@@ -0,0 +1,158 @@
+= Solr Cluster Types
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+A Solr cluster is a group of servers that each run one or more Solr _nodes_.
+
+There are two general modes of operating a cluster of Solr nodes.
+One mode provides central coordination of the Solr nodes (<<SolrCloud Mode>>), 
while the other allows you to operate a cluster without this central 
coordination (<<User-Managed Mode>>).
+
+TIP: "User Managed" and "Single Node" are sometimes referred to as 
"Standalone", especially in source code.
+
+Both modes share general concepts, but ultimately differ in how those concepts 
are reflected in functionality and features.
+
+First let's cover a few general concepts and then outline the differences 
between the two modes.
+
+== Cluster Concepts
+
+=== Servers and Nodes
+
+A _server_ is the hardware or virtual machine that hosts Solr software.
+A _node_ is an instance of a running Solr process that services search and 
indexing requests.
+Large servers may run multiple Solr nodes, though typically one node per 
server is most common.
+
+=== Shards
+
+In both cluster modes, a logical collection of documents can be divided across 
nodes as _shards_.
+Each shard represents a logical slice of the overall collection and contains a 
subset of the documents.
+
+The number of shards determines the theoretical limit to the number of 
documents that can be stored.
+It also dictates the amount of parallelization possible for an individual 
search request.
+
+=== Replicas
+
+A shard is a logical concept—a slice of your collection.
+A _replica_ is the physical manifestation of that logical shard.
+It is the actual running instance that holds and serves the documents 
belonging to that shard.
+
+A shard must have at least one replica to exist physically.
+If you have one shard with one physical copy, you have one replica.
+If you add redundancy by creating additional copies of that shard, you have 
multiple replicas—each is equally a replica, including the first one.
+
+IMPORTANT: There is no "original shard" separate from its replicas.
+The replicas ARE how the shard exists.
+This is why we say "a shard with 2 replicas" has 2 total physical copies, not 
an original plus 2 additional copies.
+
+All replicas of the same shard contain the same subset of documents and share 
the same configuration.
+
+The number of replicas determines the level of fault tolerance the cluster has 
in the event of a node failure.
+It also dictates the theoretical limit on the number of concurrent search 
requests that can be processed under heavy load.
+
+=== Leaders and Followers
+
+Among the replicas for a given shard, one replica is designated as the 
_leader_.
+The leader serves as the source-of-truth for its shard.
+When document updates are made, they are first processed by the leader replica 
and then propagated to the other replicas (the exact mechanism varies by 
cluster mode).
+
+The replicas which are not leaders are called _followers_.
+
+=== Cores

Review Comment:
   I think I would write this section somewhat differently...
   
   > Historically the term "core" has _mostly_ been used as a synonym for 
replica, but it can be confusing because in everyday English it implies 
something central and singular. Since Solr may have many replicas distributed 
across the cluster, "replica" is the preferred term. "Core" is used mostly for 
historical reasons, in the code base and other places where renaming things 
would be disruptive.



##########
solr/solr-ref-guide/modules/getting-started/pages/solr-glossary.adoc:
##########
@@ -213,6 +240,12 @@ Synonyms generally are terms which are near to each other 
in meaning and may sub
 In a search engine implementation, synonyms may be abbreviations as well as 
words, or terms that are not consistently hyphenated.
 Examples of synonyms in this context would be "Inc." and "Incorporated" or 
"iPod" and "i-pod".
 
+[[standalone]]Standalone::
+An informal term referring to Solr deployments that do not use 
<<solrclouddef,SolrCloud>> mode.

Review Comment:
   ```suggestion
   An informal term referring to Solr deployments that do not use Apache 
ZooKeeper and thus do not provide the centralized configuration management that 
is available in <<solrclouddef,SolrCloud>> mode.
   ```



##########
solr/solr-ref-guide/modules/getting-started/pages/solr-glossary.adoc:
##########
@@ -163,7 +181,11 @@ The ability of a search engine to retrieve _all_ of the 
possible matches to a us
 The appropriateness of a document to the search conducted by the user.
 
 [[replica]]Replica::
-A <<core,Core>> that acts as a physical copy of a <<shard,Shard>> in a 
<<solrclouddef,SolrCloud>> <<collection,Collection>>.
+The physical manifestation of a logical <<shard,Shard>>.
+A replica is the actual running instance (represented as a <<core,Core>>) that 
holds and serves the documents belonging to that shard.

Review Comment:
   No need to mention core here; we should promote one favored name for each 
entity.



##########
solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc:
##########
@@ -0,0 +1,158 @@
+= Solr Cluster Types
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+A Solr cluster is a group of servers that each run one or more Solr _nodes_.
+
+There are two general modes of operating a cluster of Solr nodes.
+One mode provides central coordination of the Solr nodes (<<SolrCloud Mode>>), 
while the other allows you to operate a cluster without this central 
coordination (<<User-Managed Mode>>).
+
+TIP: "User Managed" and "Single Node" are sometimes referred to as 
"Standalone", especially in source code.
+
+Both modes share general concepts, but ultimately differ in how those concepts 
are reflected in functionality and features.
+
+First let's cover a few general concepts and then outline the differences 
between the two modes.
+
+== Cluster Concepts
+
+=== Servers and Nodes
+
+A _server_ is the hardware or virtual machine that hosts Solr software.
+A _node_ is an instance of a running Solr process that services search and 
indexing requests.
+Large servers may run multiple Solr nodes, though typically one node per 
server is most common.
+
+=== Shards
+
+In both cluster modes, a logical collection of documents can be divided across 
nodes as _shards_.

Review Comment:
   I don't think the word "logical" actually adds clarity.
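
The Shards section this comment sits in goes on to tie the shard count to the parallelism available to a single search request. A minimal Python sketch of that scatter/gather step, assuming hypothetical in-memory shards that already hold a relevance score per document for one query. Solr's real distributed search also runs a later phase to fetch stored fields for the winning ids, which is left out here.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each maps doc id -> relevance score for one query.
SHARDS = [
    {"a": 0.9, "b": 0.2},
    {"c": 0.7, "d": 0.4},
    {"e": 0.8},
]


def search_shard(shard, k):
    """Top-k (score, doc_id) pairs from a single shard."""
    return heapq.nlargest(k, ((score, doc_id) for doc_id, score in shard.items()))


def distributed_search(shards, k):
    """Scatter the query to every shard in parallel, then merge the partial top-k lists."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = list(pool.map(lambda shard: search_shard(shard, k), shards))
    merged = heapq.nlargest(k, (hit for partial in partials for hit in partial))
    return [doc_id for _score, doc_id in merged]
```

Each shard returns only its own top `k`, which is enough: a document in the global top `k` is necessarily in the top `k` of its own shard.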



##########
solr/solr-ref-guide/modules/getting-started/pages/solr-glossary.adoc:
##########
@@ -114,8 +132,8 @@ Since users search using terms they expect to be in 
documents, finding the term
 === L
 
 [[leader]]Leader::
-A single <<replica,Replica>> for each <<shard,Shard>> that takes charge of 
coordinating index updates (document additions or deletions) to other replicas 
in the same shard.
-This is a transient responsibility assigned to a node via an election, if the 
current Shard Leader goes down, a new node will automatically be elected to 
take its place.
+A single <<replica,Replica>> for each <<shard,Shard>> that serves as the 
source-of-truth and coordinates index updates (document additions or deletions) 
to the <<follower,follower>> replicas in the same shard.

Review Comment:
   Again, cloud/standalone differences.
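
One way to call out the difference: in user-managed mode the leader is simply whichever node the followers' replication configuration points at, while SolrCloud elects one per shard. A toy Python model of the election, loosely following the ZooKeeper recipe that SolrCloud's leader election is based on: each candidate creates an ephemeral, sequential node, the lowest sequence number leads, and when the leader's session ends its node disappears and the next-lowest candidate takes over. `ElectionSketch` and its methods are hypothetical, and a real election adds steps omitted here, such as the new leader syncing with the other replicas.

```python
import itertools


class ElectionSketch:
    """Toy model of leader election via ephemeral sequential nodes."""

    def __init__(self):
        self._next_seq = itertools.count()
        self._candidates = {}  # sequence number -> replica name

    def join(self, replica):
        """A replica joins the election; later joiners get higher numbers."""
        seq = next(self._next_seq)
        self._candidates[seq] = replica
        return seq

    def leader(self):
        """The candidate holding the lowest sequence number leads."""
        if not self._candidates:
            return None
        return self._candidates[min(self._candidates)]

    def session_expired(self, replica):
        """Ephemeral nodes vanish when their owner's session ends."""
        self._candidates = {
            seq: name for seq, name in self._candidates.items() if name != replica
        }
```

A replica that rejoins after a failure gets a new, higher sequence number, so it queues behind the current leader rather than reclaiming leadership.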



##########
solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc:
##########
@@ -0,0 +1,158 @@
+= Solr Cluster Types
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+A Solr cluster is a group of servers that each run one or more Solr _nodes_.
+
+There are two general modes of operating a cluster of Solr nodes.
+One mode provides central coordination of the Solr nodes (<<SolrCloud Mode>>), 
while the other allows you to operate a cluster without this central 
coordination (<<User-Managed Mode>>).
+
+TIP: "User Managed" and "Single Node" are sometimes referred to as 
"Standalone", especially in source code.
+
+Both modes share general concepts, but ultimately differ in how those concepts 
are reflected in functionality and features.
+
+First let's cover a few general concepts and then outline the differences 
between the two modes.
+
+== Cluster Concepts
+
+=== Servers and Nodes
+
+A _server_ is the hardware or virtual machine that hosts Solr software.
+A _node_ is an instance of a running Solr process that services search and 
indexing requests.
+Large servers may run multiple Solr nodes, though typically one node per 
server is most common.
+
+=== Shards
+
+In both cluster modes, a logical collection of documents can be divided across 
nodes as _shards_.
+Each shard represents a logical slice of the overall collection and contains a 
subset of the documents.
+
+The number of shards determines the theoretical limit to the number of 
documents that can be stored.
+It also dictates the amount of parallelization possible for an individual 
search request.
+
+=== Replicas
+
+A shard is a logical concept—a slice of your collection.
+A _replica_ is the physical manifestation of that logical shard.
+It is the actual running instance that holds and serves the documents 
belonging to that shard.
+
+A shard must have at least one replica to exist physically.
+If you have one shard with one physical copy, you have one replica.
+If you add redundancy by creating additional copies of that shard, you have 
multiple replicas—each is equally a replica, including the first one.
+
+IMPORTANT: There is no "original shard" separate from its replicas.
+The replicas ARE how the shard exists.
+This is why we say "a shard with 2 replicas" has 2 total physical copies, not 
an original plus 2 additional copies.
+
+All replicas of the same shard contain the same subset of documents and share 
the same configuration.
+
+The number of replicas determines the level of fault tolerance the cluster has 
in the event of a node failure.
+It also dictates the theoretical limit on the number of concurrent search 
requests that can be processed under heavy load.
+
+=== Leaders and Followers
+
+Among the replicas for a given shard, one replica is designated as the 
_leader_.
+The leader serves as the source-of-truth for its shard.
+When document updates are made, they are first processed by the leader replica 
and then propagated to the other replicas (the exact mechanism varies by 
cluster mode).
+
+The replicas which are not leaders are called _followers_.
+
+=== Cores
+
+In Solr's implementation, each replica is represented as a _core_.
+The term "core" is primarily an internal implementation detail—when you create 
a replica, Solr creates a core to represent it.
+Multiple cores can be hosted on any one node.
+
+NOTE: The term "core" can be confusing because in everyday English it implies 
something central and singular, but in Solr it actually refers to one of 
potentially many replicas distributed across the cluster.
+In most contexts, thinking of "core" as synonymous with "replica" will help 
clarify discussions about Solr's architecture.
+
+=== Collections and Indexes
+
+A _collection_ is the complete logical set of searchable documents that share 
a schema and configuration.
+In SolrCloud mode (described below), a collection encompasses all the shards 
and their replicas.
+
+An _index_ refers to the physical data structures written to disk by Apache 
Lucene.
+Each core (replica) maintains exactly one Lucene index on disk, containing the 
actual inverted indexes, stored fields, and other data structures that enable 
search.
+
+This creates a clear hierarchy from logical concepts to physical storage:
+
+[source,text]
+----
+Collection (logical grouping of all searchable documents)
+  └─> Shard 1 (logical partition)
+  │     └─> Replica 1 / Core 1 (physical instance)
+  │     │     └─> Lucene Index (disk structures)
+  │     └─> Replica 2 / Core 2 (physical instance)
+  │           └─> Lucene Index (disk structures)
+  └─> Shard 2 (logical partition)
+        └─> Replica 1 / Core 3 (physical instance)
+        │     └─> Lucene Index (disk structures)
+        └─> Replica 2 / Core 4 (physical instance)
+              └─> Lucene Index (disk structures)
+----
+

Review Comment:
   (maybe put cluster over the top of this one too)
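
On the fault-tolerance sentence in the Replicas section quoted above: extra replicas only protect a shard when they run on different nodes. A minimal Python sketch, assuming hypothetical placement maps (the collection `books` and nodes `node1` to `node3` are invented), that lists the shards left with no live replica after a given set of node failures. A request that needs such a shard cannot be fully served.

```python
# Hypothetical placements: (collection, shard, replica) -> node.
SPREAD = {
    ("books", "shard1", "replica1"): "node1",
    ("books", "shard1", "replica2"): "node2",
    ("books", "shard2", "replica1"): "node2",
    ("books", "shard2", "replica2"): "node3",
}
# Same replica count, but both copies of shard1 share node1.
COLOCATED = {
    ("books", "shard1", "replica1"): "node1",
    ("books", "shard1", "replica2"): "node1",
    ("books", "shard2", "replica1"): "node2",
    ("books", "shard2", "replica2"): "node3",
}


def unavailable_shards(placement, down_nodes):
    """Shards left with no replica on a live node, sorted for stable output."""
    all_shards = {key[:2] for key in placement}
    live_shards = {key[:2] for key, node in placement.items() if node not in down_nodes}
    return sorted(all_shards - live_shards)
```

Both maps have two replicas per shard, but only `SPREAD` survives the loss of any single node; `COLOCATED` loses `shard1` as soon as `node1` goes down.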



##########
solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc:
##########
@@ -0,0 +1,158 @@
+= Solr Cluster Types
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+A Solr cluster is a group of servers that each run one or more Solr _nodes_.
+
+There are two general modes of operating a cluster of Solr nodes.
+One mode provides central coordination of the Solr nodes (<<SolrCloud Mode>>), 
while the other allows you to operate a cluster without this central 
coordination (<<User-Managed Mode>>).
+
+TIP: "User Managed" and "Single Node" are sometimes referred to as 
"Standalone", especially in source code.
+
+Both modes share general concepts, but ultimately differ in how those concepts 
are reflected in functionality and features.
+
+First let's cover a few general concepts and then outline the differences 
between the two modes.
+
+== Cluster Concepts
+
+=== Servers and Nodes
+
+A _server_ is the hardware or virtual machine that hosts Solr software.
+A _node_ is an instance of a running Solr process that services search and 
indexing requests.
+Large servers may run multiple Solr nodes, though typically one node per 
server is most common.
+
+=== Shards
+
+In both cluster modes, a logical collection of documents can be divided across 
nodes as _shards_.
+Each shard represents a logical slice of the overall collection and contains a 
subset of the documents.
+
+The number of shards determines the theoretical limit to the number of 
documents that can be stored.
+It also dictates the amount of parallelization possible for an individual 
search request.
+
+=== Replicas
+
+A shard is a logical concept—a slice of your collection.
+A _replica_ is the physical manifestation of that logical shard.
+It is the actual running instance that holds and serves the documents 
belonging to that shard.
+
+A shard must have at least one replica to exist physically.
+If you have one shard with one physical copy, you have one replica.
+If you add redundancy by creating additional copies of that shard, you have 
multiple replicas—each is equally a replica, including the first one.
+
+IMPORTANT: There is no "original shard" separate from its replicas.
+The replicas ARE how the shard exists.
+This is why we say "a shard with 2 replicas" has 2 total physical copies, not 
an original plus 2 additional copies.
+
+All replicas of the same shard contain the same subset of documents and share 
the same configuration.
+
+The number of replicas determines the level of fault tolerance the cluster has 
in the event of a node failure.
+It also dictates the theoretical limit on the number of concurrent search 
requests that can be processed under heavy load.
+
+=== Leaders and Followers
+
+Among the replicas for a given shard, one replica is designated as the 
_leader_.
+The leader serves as the source-of-truth for its shard.
+When document updates are made, they are first processed by the leader replica 
and then propagated to the other replicas (the exact mechanism varies by 
cluster mode).
+
+The replicas which are not leaders are called _followers_.
+
+=== Cores
+
+In Solr's implementation, each replica is represented as a _core_.
+The term "core" is primarily an internal implementation detail—when you create 
a replica, Solr creates a core to represent it.
+Multiple cores can be hosted on any one node.
+
+NOTE: The term "core" can be confusing because in everyday English it implies 
something central and singular, but in Solr it actually refers to one of 
potentially many replicas distributed across the cluster.
+In most contexts, thinking of "core" as synonymous with "replica" will help 
clarify discussions about Solr's architecture.
+
+=== Collections and Indexes
+
+A _collection_ is the complete logical set of searchable documents that share 
a schema and configuration.
+In SolrCloud mode (described below), a collection encompasses all the shards 
and their replicas.
+
+An _index_ refers to the physical data structures written to disk by Apache 
Lucene.
+Each core (replica) maintains exactly one Lucene index on disk, containing the 
actual inverted indexes, stored fields, and other data structures that enable 
search.
+
+This creates a clear hierarchy from logical concepts to physical storage:
+
+[source,text]
+----
+Collection (logical grouping of all searchable documents)
+  └─> Shard 1 (logical partition)
+  │     └─> Replica 1 / Core 1 (physical instance)
+  │     │     └─> Lucene Index (disk structures)
+  │     └─> Replica 2 / Core 2 (physical instance)
+  │           └─> Lucene Index (disk structures)
+  └─> Shard 2 (logical partition)
+        └─> Replica 1 / Core 3 (physical instance)
+        │     └─> Lucene Index (disk structures)
+        └─> Replica 2 / Core 4 (physical instance)
+              └─> Lucene Index (disk structures)
+----
+
+In this example, a collection is divided into 2 shards, each shard has 2 
replicas for redundancy, and each replica maintains its own Lucene index on 
disk.
+
+== SolrCloud Mode
+
+SolrCloud mode (also called "SolrCloud") uses Apache ZooKeeper to provide the 
centralized cluster management that is its main feature.
+ZooKeeper tracks each node of the cluster and the state of each core on each 
node.
+
+In this mode, configuration files are stored in ZooKeeper and not on the file 
system of each node.
+When configuration changes are made, they must be uploaded to ZooKeeper, which 
in turn makes sure each node knows changes have been made.
+
+SolrCloud manages collections as first-class entities.
+A collection represents the entire group of shards and replicas that together 
provide access to a corpus of documents.
+All shards and replicas of a collection share the same configuration (schema, `solrconfig.xml`, etc.).
+This centralization of cluster management means that operations can be 
performed on the entire collection at one time.
+
+When changes are made to configurations, a single command to reload the 
collection will automatically reload each individual core (replica) that is a 
member of the collection.
+
+Sharding is handled automatically, simply by telling Solr during collection 
creation how many shards you'd like the collection to have.
+Document updates are then generally balanced across the shards automatically.
+Some degree of control over what documents are stored in which shards is also 
available, if needed.
+
+Load balancing and failover are also handled using the cluster state that 
ZooKeeper maintains.
+Incoming requests, either to index documents or for user queries, can be sent 
to any node of the cluster, and Solr will route the request to an appropriate 
replica of each shard.
+
+In SolrCloud, leadership is not fixed: built-in mechanisms automatically elect 
a new leader if the current leader fails.
+This means another replica can become the leader, and from that point forward 
it is the source-of-truth for all other replicas of that shard.
+
+As long as one replica of each relevant shard is available, a user query or 
indexing request can still be satisfied when running in SolrCloud mode.
+
+== User-Managed Mode
+
+Solr's user-managed mode requires that cluster coordination activities, which 
SolrCloud normally delegates to ZooKeeper, be performed manually or with local 
scripts.

Review Comment:
   "...thus they have no concept of collections, or shards and Zookeeper is not 
used as a centralized storage for any configuration or real-time state"
   
   I'm leaving out replica on purpose there, because there are followers, which 
can wind up looking sort of similar. Also, I think we can sell/clarify the 
benefit of ZooKeeper here too.
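
The SolrCloud Mode text quoted above says document updates are balanced across shards automatically, which is work a user-managed cluster leaves to the indexing client. A minimal Python sketch of hash-range routing in the spirit of SolrCloud's default `compositeId` router: the signed 32-bit hash space is split into one contiguous range per shard, and each document goes to the shard whose range contains the hash of its id. Assumptions: `zlib.crc32` stands in for the MurmurHash3 hash Solr actually uses, the ranges are plain equal splits, and ids with a `!` routing prefix are not handled, so the shard chosen here will not match a real cluster.

```python
import zlib

INT_MIN, INT_MAX = -(2**31), 2**31 - 1


def shard_ranges(num_shards):
    """Split the signed 32-bit hash space into contiguous, near-equal ranges."""
    if num_shards < 1:
        raise ValueError("need at least one shard")
    width = (INT_MAX - INT_MIN + 1) // num_shards
    ranges, start = [], INT_MIN
    for i in range(num_shards):
        # The last range absorbs any remainder so the whole space is covered.
        end = INT_MAX if i == num_shards - 1 else start + width - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def id_hash(doc_id):
    """Stand-in hash: CRC32 of the UTF-8 id, reinterpreted as a signed 32-bit int."""
    h = zlib.crc32(doc_id.encode("utf-8"))
    return h - 2**32 if h > INT_MAX else h


def route(doc_id, ranges):
    """Return the 1-based shard number whose hash range contains the id's hash."""
    h = id_hash(doc_id)
    for shard_number, (low, high) in enumerate(ranges, start=1):
        if low <= h <= high:
            return shard_number
    raise AssertionError("ranges do not cover the hash space")
```

Because the hash depends only on the id, any node that knows the shard ranges can compute where a document belongs, and re-sending the same id always reaches the same shard.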



##########
solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc:
##########
@@ -0,0 +1,158 @@
+= Solr Cluster Types
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+A Solr cluster is a group of servers that each run one or more Solr _nodes_.
+
+There are two general modes of operating a cluster of Solr nodes.
+One mode provides central coordination of the Solr nodes (<<SolrCloud Mode>>), while the other allows you to operate a cluster without this central coordination (<<User-Managed Mode>>).
+
+TIP: "User Managed" and "Single Node" are sometimes referred to as "Standalone", especially in source code.
+
+Both modes share general concepts, but ultimately differ in how those concepts are reflected in functionality and features.
+
+First let's cover a few general concepts and then outline the differences between the two modes.
+
+== Cluster Concepts
+
+=== Servers and Nodes
+
+A _server_ is the hardware or virtual machine that hosts Solr software.
+A _node_ is an instance of a running Solr process that services search and indexing requests.
+Large servers may run multiple Solr nodes, though typically one node per server is most common.
+
+=== Shards
+
+In both cluster modes, a logical collection of documents can be divided across nodes as _shards_.
+Each shard represents a logical slice of the overall collection and contains a subset of the documents.
+
+The number of shards determines the theoretical limit to the number of documents that can be stored.
+It also dictates the amount of parallelization possible for an individual search request.
+
+=== Replicas
+
+A shard is a logical concept—a slice of your collection.
+A _replica_ is the physical manifestation of that logical shard.
+It is the actual running instance that holds and serves the documents belonging to that shard.
+
+A shard must have at least one replica to exist physically.
+If you have one shard with one physical copy, you have one replica.
+If you add redundancy by creating additional copies of that shard, you have multiple replicas—each is equally a replica, including the first one.
+
+IMPORTANT: There is no "original shard" separate from its replicas.
+The replicas ARE how the shard exists.
+This is why we say "a shard with 2 replicas" has 2 total physical copies, not an original plus 2 additional copies.
+
+All replicas of the same shard contain the same subset of documents and share the same configuration.
+
+The number of replicas determines the level of fault tolerance the cluster has in the event of a node failure.
+It also dictates the theoretical limit on the number of concurrent search requests that can be processed under heavy load.
+
+=== Leaders and Followers
+
+Among the replicas for a given shard, one replica is designated as the _leader_.
+The leader serves as the source-of-truth for its shard.
+When document updates are made, they are first processed by the leader replica and then propagated to the other replicas (the exact mechanism varies by cluster mode).
+
+The replicas which are not leaders are called _followers_.
+
+=== Cores
+
+In Solr's implementation, each replica is represented as a _core_.
+The term "core" is primarily an internal implementation detail—when you create a replica, Solr creates a core to represent it.
+Multiple cores can be hosted on any one node.
+
+NOTE: The term "core" can be confusing because in everyday English it implies something central and singular, but in Solr it actually refers to one of potentially many replicas distributed across the cluster.
+In most contexts, thinking of "core" as synonymous with "replica" will help clarify discussions about Solr's architecture.
+
+=== Collections and Indexes
+
+A _collection_ is the complete logical set of searchable documents that share a schema and configuration.
+In SolrCloud mode (described below), a collection encompasses all the shards and their replicas.
+
+An _index_ refers to the physical data structures written to disk by Apache Lucene.
+Each core (replica) maintains exactly one Lucene index on disk, containing the actual inverted indexes, stored fields, and other data structures that enable search.
+
+This creates a clear hierarchy from logical concepts to physical storage:
+
+[source,text]
+----
+Collection (logical grouping of all searchable documents)
+  └─> Shard 1 (logical partition)
+  │     └─> Replica 1 / Core 1 (physical instance)
+  │     │     └─> Lucene Index (disk structures)
+  │     └─> Replica 2 / Core 2 (physical instance)
+  │           └─> Lucene Index (disk structures)
+  └─> Shard 2 (logical partition)
+        └─> Replica 1 / Core 3 (physical instance)
+        │     └─> Lucene Index (disk structures)
+        └─> Replica 2 / Core 4 (physical instance)
+              └─> Lucene Index (disk structures)
+----
+
+In this example, a collection is divided into 2 shards, each shard has 2 replicas for redundancy, and each replica maintains its own Lucene index on disk.
+
+== SolrCloud Mode
+
+SolrCloud mode (also called "SolrCloud") uses Apache ZooKeeper to provide the centralized cluster management that is its main feature.
+ZooKeeper tracks each node of the cluster and the state of each core on each node.
+
+In this mode, configuration files are stored in ZooKeeper and not on the file system of each node.
+When configuration changes are made, they must be uploaded to ZooKeeper, which in turn makes sure each node knows changes have been made.
+
+SolrCloud manages collections as first-class entities.
+A collection represents the entire group of shards and replicas that together provide access to a corpus of documents.
+Collections share the same configurations (schema, `solrconfig.xml`, etc.).
+This centralization of cluster management means that operations can be performed on the entire collection at one time.
+
+When changes are made to configurations, a single command to reload the collection will automatically reload each individual core (replica) that is a member of the collection.

Review Comment:
   Actually, "re-open" seems vague to me. "Reload" has a clear sense of out with the old, in with the new (for me at least). But maybe it should say:
   
   "...will automatically reload the configuration for each replica that is a member..."
   
   since it's not really messing with all the data, just the config?
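For reference, the "single command" in the quoted text corresponds to the Collections API `RELOAD` action. The sketch below only builds the request URL. The base URL `http://localhost:8983/solr` and the collection name `techproducts` are placeholder assumptions. Actually sending the request (for example with `urllib.request.urlopen`) is left out.

```python
from urllib.parse import urlencode


def reload_collection_url(base_url: str, collection: str) -> str:
    """Build a Collections API RELOAD request URL for one collection."""
    query = urlencode({"action": "RELOAD", "name": collection})
    return f"{base_url.rstrip('/')}/admin/collections?{query}"


if __name__ == "__main__":
    print(reload_collection_url("http://localhost:8983/solr", "techproducts"))
```

The request targets the collection as a whole rather than any individual core. That is the point the quoted paragraph makes about SolrCloud treating collections as first-class entities.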



##########
solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc:
##########
@@ -0,0 +1,158 @@
+=== Replicas
+
+All replicas of the same shard contain the same subset of documents and share the same configuration.

Review Comment:
   +1 and hyperlink "collection" to the section below
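As a companion to the hierarchy diagram in the `Collections and Indexes` section, here is a minimal sketch that models a collection as nested data. It assumes one core (and therefore one Lucene index) per replica, as the proposed text states. The class names and the `coreN` naming scheme are hypothetical and chosen to match the diagram; real SolrCloud core names look different.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    core_name: str  # each replica is backed by exactly one core / Lucene index


@dataclass
class Shard:
    name: str
    replicas: list = field(default_factory=list)


@dataclass
class Collection:
    name: str
    shards: list = field(default_factory=list)

    def core_count(self) -> int:
        """Total physical copies: the sum of replicas over all shards."""
        return sum(len(shard.replicas) for shard in self.shards)


def build_collection(name: str, num_shards: int, replicas_per_shard: int) -> Collection:
    """Create shards shard1..shardN, each with the requested number of replicas."""
    collection = Collection(name)
    core_id = 1
    for s in range(1, num_shards + 1):
        shard = Shard(f"shard{s}")
        for _ in range(replicas_per_shard):
            shard.replicas.append(Replica(f"core{core_id}"))
            core_id += 1
        collection.shards.append(shard)
    return collection
```

With 2 shards and 2 replicas per shard this yields 4 cores, matching the diagram. "A shard with 2 replicas" means 2 physical copies, not an original plus 2.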



##########
solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc:
##########
@@ -0,0 +1,158 @@
+=== Servers and Nodes
+
+A _server_ is the hardware or virtual machine that hosts Solr software.
+A _node_ is an instance of a running Solr process that services search and indexing requests.
+Large servers may run multiple Solr nodes, though typically one node per server is most common.

Review Comment:
   Or just be specific?
   
   In special cases where oversized **_pre-existing_** hardware must be utilized, a server might host two or more nodes. Note that such configurations are typically sub-optimal.
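Here is why several nodes on one server can be sub-optimal. Replicas that look redundant at the node level can still be lost together when their shared server fails. The sketch below takes two maps, both hypothetical illustrations rather than a Solr API: a placement map from each shard to the nodes holding its replicas, and a map from each node to its server.

```python
def shards_lost_on_server_failure(placement, node_to_server, failed_server):
    """Return shards whose every replica is on a node hosted by failed_server.

    placement: dict mapping shard name -> list of node names holding its replicas
    node_to_server: dict mapping node name -> server name
    """
    lost = [
        shard
        for shard, nodes in placement.items()
        if all(node_to_server[node] == failed_server for node in nodes)
    ]
    return sorted(lost)
```

Suppose `shard1` has replicas on `nodeA` and `nodeB`, and both nodes run on `server1`. Losing that one server then takes the whole shard offline. A shard whose replicas span two servers survives the same failure.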



##########
solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc:
##########
@@ -0,0 +1,158 @@
+=== Replicas
+
+A shard is a logical concept—a slice of your collection.
+A _replica_ is the physical manifestation of that logical shard.
+It is the actual running instance that holds and serves the documents belonging to that shard.

Review Comment:
   "likely" raises the question: when does it not have an update log? Can we clarify when, or omit it?
   
   Discussion of what a node does seems better placed in the node section? (with the word "replica" as a hyperlink to this section)
   
   "SolrCore" is a class in the code; details like that are developer documentation, not relevant to the user. No need to say anything other than "replica" here?
   
   I do like the idea of noting that there is one Lucene index per replica here, but it seems better (to me) to remain focused on the idea behind "replica", not the implementation.
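The Replicas section claims that the number of replicas sets a theoretical limit on concurrent searches. One way to picture this is requests for a shard being spread across that shard's replicas. This sketch uses plain round-robin as an assumption and is not meant to describe Solr's actual replica-selection logic.

```python
from collections import Counter
from itertools import cycle


def distribute_requests(replicas, num_requests):
    """Assign num_requests to replicas in round-robin order; return per-replica counts."""
    if not replicas:
        raise ValueError("a shard needs at least one replica to serve requests")
    picker = cycle(replicas)
    return Counter(next(picker) for _ in range(num_requests))
```

In this model, doubling the replicas roughly halves each replica's share of the load. That is the intuition behind the "theoretical limit" wording.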



##########
solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc:
##########
@@ -0,0 +1,158 @@
+== SolrCloud Mode
+
+Sharding is handled automatically, simply by telling Solr during collection creation how many shards you'd like the collection to have.
+Document updates are then generally balanced between each shard automatically.

Review Comment:
   This is optional... implicit routing can still be used with cloud. Perhaps this and the previous sentence about automatic sharding could be combined into:
   
   "Collections may also be configured to provide automatic routing of documents to shards by hashing document ids and automatically assigning ranges of the possible hash values to shards."
   
   Hopefully that captures/sells the value of the automation without overstating it as a requirement.
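The suggested wording (hashing document ids and assigning ranges of the hash space to shards) can be sketched as follows. The sketch assumes the signed 32-bit hash space is split into equal contiguous ranges. It uses `zlib.crc32` purely as a stand-in hash. Solr's `compositeId` router uses a MurmurHash3-based hash, so real shard assignments will differ.

```python
import zlib

MIN_HASH = -(2**31)
MAX_HASH = 2**31 - 1


def partition_range(num_shards):
    """Split the signed 32-bit hash space into num_shards contiguous (lo, hi) ranges."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    size = (MAX_HASH - MIN_HASH + 1) // num_shards
    ranges = []
    start = MIN_HASH
    for i in range(num_shards):
        # the last range absorbs any remainder so the whole space is covered
        end = MAX_HASH if i == num_shards - 1 else start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def to_signed32(value):
    """Reinterpret an unsigned 32-bit integer as signed."""
    return value - 2**32 if value >= 2**31 else value


def route(doc_id, ranges):
    """Return the index of the shard whose range contains the document's hash."""
    # Stand-in hash: CRC32, NOT the MurmurHash3-based hash Solr actually uses.
    h = to_signed32(zlib.crc32(doc_id.encode("utf-8")))
    for shard_index, (lo, hi) in enumerate(ranges):
        if lo <= h <= hi:
            return shard_index
    raise ValueError("hash outside all ranges")
```

The ranges are computed once, when the collection is created. Routing a document is then a local lookup, which is why this can be offered as an automatic option alongside implicit routing.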



##########
solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc:
##########
@@ -0,0 +1,158 @@
+== SolrCloud Mode
+
+Sharding is handled automatically, simply by telling Solr during collection creation how many shards you'd like the collection to have.
+Document updates are then generally balanced between each shard automatically.
+Some degree of control over what documents are stored in which shards is also available, if needed.
+
+ZooKeeper also handles load balancing and failover.
+Incoming requests, either to index documents or for user queries, can be sent to any node of the cluster and ZooKeeper will route the request to an appropriate replica of each shard.

Review Comment:
   Right. ZooKeeper merely records the range of hash values for each shard. Once Solr has read those values, ZooKeeper isn't (or at least shouldn't be!) consulted again unless the zk node containing the shard information was updated for some reason.
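The point here is that ZooKeeper only *records* the hash ranges, while Solr routes from its own copy. A hypothetical caching sketch illustrates this. `ShardRangeCache`, the `fetch_state` callable and the explicit `on_state_changed()` call are all assumptions. In SolrCloud, change notification roughly corresponds to a ZooKeeper watch rather than a method call like this.

```python
class ShardRangeCache:
    """Hypothetical cache: read shard hash ranges once, re-read only after a change notice."""

    def __init__(self, fetch_state):
        self._fetch_state = fetch_state  # callable returning {shard_name: (lo, hi)}
        self._ranges = None
        self.fetch_count = 0  # how many times the "ZooKeeper" state was actually read

    def _load(self):
        self._ranges = self._fetch_state()
        self.fetch_count += 1

    def on_state_changed(self):
        """Invalidate the cached ranges so the next lookup re-reads them."""
        self._ranges = None

    def shard_for_hash(self, h):
        """Route a hash using the cached ranges, loading them only if needed."""
        if self._ranges is None:
            self._load()
        for name, (lo, hi) in self._ranges.items():
            if lo <= h <= hi:
                return name
        raise KeyError(h)
```

Repeated lookups reuse the cached ranges. The state is fetched again only after `on_state_changed()` invalidates it, which mirrors the "consulted again only if updated" behavior described in the comment.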



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
