[https://issues.apache.org/jira/browse/SOLR-18179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18069381#comment-18069381]
Eric Pugh commented on SOLR-18179:
----------------------------------
I asked for some help to reconcile Gus's email with the text, and this is what I
got back:
Looking at the `cluster-types.adoc` document in light of this feedback, I can
see several areas where the terminology could be clarified to align with the
more precise definitions provided. Let me analyze the key issues:
## Major Issues Identified
### 1. **Shard/Replica Relationship is Backwards**
The current document says:
> "each shard can be copied as a _replica_"
This perpetuates the confusion that a shard exists independently of replicas.
According to the feedback, **a replica IS a physical manifestation of a
shard**, not a copy of it. A shard with one copy has **one replica**, not zero
replicas.
**Current problematic text (lines 38-42):**
```solr-epugh/solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc#L38-42
=== Replicas
In order to provide some failover for each shard, each shard can be copied as a
_replica_.
A replica has the same configuration as the shard and any other replicas for
the same index.
```
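The feedback's model can be sketched as a small data structure. The Python below is a hypothetical illustration (the class names, fields and the `build_collection` helper are not Solr APIs): a collection is split into logical shards, and each shard exists only through its replicas, so `replication_factor` counts every copy and a shard with a single copy has one replica, not zero.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A physical copy of one shard's data (hypothetical model, not a Solr class)."""
    name: str


@dataclass
class Shard:
    """A logical slice of a collection; it only exists through its replicas."""
    name: str
    replicas: list[Replica] = field(default_factory=list)


@dataclass
class Collection:
    """The logical set of searchable documents."""
    name: str
    shards: list[Shard] = field(default_factory=list)


def build_collection(name: str, num_shards: int, replication_factor: int) -> Collection:
    # replication_factor counts ALL copies of a shard, so it must be at least 1:
    # a shard with one copy has exactly one replica.
    if num_shards < 1 or replication_factor < 1:
        raise ValueError("num_shards and replication_factor must both be >= 1")
    shards = []
    for s in range(1, num_shards + 1):
        replicas = [Replica(name=f"shard{s}_replica{r}") for r in range(1, replication_factor + 1)]
        shards.append(Shard(name=f"shard{s}", replicas=replicas))
    return Collection(name=name, shards=shards)


single = build_collection("books", num_shards=1, replication_factor=1)
print(len(single.shards), len(single.shards[0].replicas))  # prints: 1 1
```

Rejecting `replication_factor=0` is the point of the sketch: there is no such thing as a shard without a physical copy.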
### 2. **"Index" vs "Collection" Confusion**
The document uses "index" frequently where "collection" would be clearer:
- Line 32: "a single logical index can be split"
- Line 42: "replicas for the same index"
- Line 45: "each replica would be a full copy of the entire index"
According to the feedback, "Collection" is better because it's clearly distinct
from what Lucene writes to disk.
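The Collections API already uses this vocabulary: the logical unit you create is a collection, and `numShards` and `replicationFactor` are parameters of that collection. The sketch below only builds a CREATE request URL with Python's standard library and sends nothing; the base URL `http://localhost:8983/solr` and the collection name `books` are placeholder assumptions.

```python
from urllib.parse import urlencode


def create_collection_url(base_url: str, name: str, num_shards: int, replication_factor: int) -> str:
    """Build a Collections API CREATE request URL (no request is sent)."""
    params = {
        "action": "CREATE",
        "name": name,  # the logical collection, not a Lucene index
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


print(create_collection_url("http://localhost:8983/solr", "books", 2, 1))
# http://localhost:8983/solr/admin/collections?action=CREATE&name=books&numShards=2&replicationFactor=1
```

Each replica created this way then writes its own Lucene index to disk, which is the narrower sense of "index" the feedback wants to reserve.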
### 3. **Core Definition Needs Context**
Lines 58-60 say:
```solr-epugh/solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc#L58-60
=== Cores
Each replica, whether it is a leader or a follower, is called a _core_.
Multiple cores can be hosted on any one node.
```
While this is more accurate than some uses, the feedback suggests that "core"
is the most problematic term in Solr's vocabulary and should be better
explained or contextualized as a legacy/implementation term.
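To connect "core" back to "replica": in SolrCloud each replica is hosted as a core, and core names typically spell out that relationship in the form `<collection>_<shard>_replica_<type><N>`, for example `gettingstarted_shard1_replica_n1`. The `core_name` and `parse_core_name` helpers below are hypothetical and only mirror that pattern; Solr assigns the numeric suffix itself and does not guarantee it is sequential per shard.

```python
def core_name(collection: str, shard: str, counter: int, replica_type: str = "n") -> str:
    """Hypothetical helper mirroring the <collection>_<shard>_replica_<type><N> pattern.

    Type letters: "n" = NRT, "t" = TLOG, "p" = PULL replicas.
    """
    if replica_type not in ("n", "t", "p"):
        raise ValueError(f"unknown replica type: {replica_type!r}")
    return f"{collection}_{shard}_replica_{replica_type}{counter}"


def parse_core_name(core: str) -> tuple[str, str, str]:
    """Split a core name back into its (collection, shard, replica) parts."""
    collection_and_shard, _, replica = core.rpartition("_replica_")
    # Split on the LAST underscore so collection names may contain underscores.
    collection, _, shard = collection_and_shard.rpartition("_")
    return collection, shard, replica


print(parse_core_name(core_name("gettingstarted", "shard1", 1)))
# ('gettingstarted', 'shard1', 'n1')
```

Read this way, a core name is just a replica's address: which collection, which shard, which copy.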
### 4. **Missing Definitions**
- **Server vs Node**: The document doesn't distinguish between the hardware/VM
(server) and the running process (node)
- **Corpus**: Not mentioned
- **Index (Lucene sense)**: Not clearly defined as "the disk representation"
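The server/node distinction shows up in the node names SolrCloud registers, which take the form `host:port_solr` (for example `localhost:8983_solr`). The node list below is made-up sample data resembling the two local nodes on ports 8983 and 7574 that the `bin/solr start -e cloud` example starts; grouping by host shows one server running several nodes. Treating the host string as "the server" is a simplification, since one machine can be reachable under several hostnames.

```python
from collections import defaultdict


def nodes_by_server(live_nodes: list[str]) -> dict[str, list[str]]:
    """Group SolrCloud node names ("host:port_context") by the host they run on."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for node in live_nodes:
        # rpartition splits on the last ":", dropping the "port_context" suffix.
        host, _, _port_and_context = node.rpartition(":")
        grouped[host].append(node)
    return dict(grouped)


sample_nodes = ["localhost:8983_solr", "localhost:7574_solr", "10.0.0.7:8983_solr"]
for server, nodes in nodes_by_server(sample_nodes).items():
    print(server, "->", nodes)
```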
### 5. **Confusing Statement about Replicas without Shards**
Lines 44-46:
```solr-epugh/solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc#L44-46
It's possible to have replicas without having created shards.
In this case, each replica would be a full copy of the entire index, instead of
being only a copy of a part of the entire index.
```
This is confusing because if you have one shard, you still have shards; there is
just one of them. This might perpetuate the "zero replica" confusion.
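A "non-sharded" collection is really a collection with `numShards=1`. The sketch below illustrates why every replica is then a full copy: documents are still routed to a shard, but with one shard every id lands in `shard1`. It divides a 32-bit hash space into equal ranges, similar in spirit to Solr's compositeId router, but it uses `zlib.crc32` as a stand-in hash instead of the MurmurHash3 that Solr uses, so assignments for more than one shard will not match a real cluster. The `shard_for` function is hypothetical.

```python
import zlib


def shard_for(doc_id: str, num_shards: int) -> str:
    """Map a document id to a shard by splitting the 32-bit hash space evenly.

    zlib.crc32 is only a stand-in here; Solr's compositeId router uses MurmurHash3.
    """
    if num_shards < 1:
        raise ValueError("a collection always has at least one shard")
    h = zlib.crc32(doc_id.encode("utf-8"))  # unsigned value in [0, 2**32)
    range_size = 2**32 // num_shards
    index = min(h // range_size, num_shards - 1)  # last shard absorbs any remainder
    return f"shard{index + 1}"


docs = ["book-1", "book-2", "book-3", "book-4"]
print({d: shard_for(d, 1) for d in docs})  # every id maps to "shard1"
```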
## Suggested Improvements
Would you like me to propose a revised version of this document that:
1. **Clarifies the shard/replica relationship** - emphasizing that replicas are
physical manifestations of logical shards
2. **Uses "collection" instead of "index"** where referring to the logical set
of searchable documents
3. **Reserves "index" for the Lucene disk representation**
4. **Adds a clearer "core" explanation** - positioning it as an
implementation/legacy term that essentially means "replica"
5. **Distinguishes server from node**
6. **Optionally adds "corpus"** if relevant to cluster types discussion
7. **Removes confusing language** about "replicas without shards"
> Update Concepts and Glossary Ref Guide pages around Cloud/Cluster concepts
> --------------------------------------------------------------------------
>
> Key: SOLR-18179
> URL: https://issues.apache.org/jira/browse/SOLR-18179
> Project: Solr
> Issue Type: Improvement
> Components: documentation
> Reporter: Eric Pugh
> Priority: Minor
>
> Following up on this message in a dev list thread
> [https://lists.apache.org/thread/rcm372jwzmnmpjno9ygto82z8hptvkw1] about the
> names of things, I noticed that we bury our "These are the cluster types"
> page deep in the Ref Guide, not up front. Yet choosing Cloud versus User
> Managed is a key decision right at the beginning, not an "and now we are going
> to prod" decision. At least with today's split in the APIs etc. ;)
> This PR is to work on getting consistency of terms across the Ref Guide.
> Bonus points if it helps with consistency in the new Solr UI, and maybe
> in the old Solr Admin.
> Less concerned about any naming ramifications in source code...