[ https://issues.apache.org/jira/browse/SOLR-18179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18069381#comment-18069381 ]

Eric Pugh commented on SOLR-18179:
----------------------------------

I asked for some help reconciling Gus's email with the text, and this is what I
got back:

 

Looking at the `cluster-types.adoc` document in light of this feedback, I can 
see several areas where the terminology could be clarified to align with the 
more precise definitions provided. Let me analyze the key issues:

## Major Issues Identified

### 1. **Shard/Replica Relationship is Backwards**
The current document says:
> "each shard can be copied as a _replica_"

This perpetuates the confusion that a shard exists independently of replicas. 
According to the feedback, **a replica IS a physical manifestation of a 
shard**, not a copy of it. A shard with one copy has **one replica**, not zero 
replicas.

**Current problematic text (`solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc`, lines 38-42):**
```asciidoc
=== Replicas

In order to provide some failover for each shard, each shard can be copied as a _replica_.
A replica has the same configuration as the shard and any other replicas for the same index.
```
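
To make the "a replica is the physical manifestation of a shard" reading concrete, here is a minimal Python sketch of the logical-versus-physical split. The classes `Collection`, `Shard`, and `Replica` and the `create_collection` helper are hypothetical illustrations, not Solr APIs; `num_shards` and `replication_factor` only loosely mirror the Collections API `numShards` and `replicationFactor` parameters. With `replication_factor=1`, each shard has exactly one replica, not zero.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A physical copy of a shard. In Solr each replica is backed by a core on some node."""
    shard_name: str
    node: str
    is_leader: bool = False


@dataclass
class Shard:
    """A logical slice of a collection; it only exists on disk through its replicas."""
    name: str
    replicas: list[Replica] = field(default_factory=list)


@dataclass
class Collection:
    """The logical, searchable set of documents, split into one or more shards."""
    name: str
    shards: list[Shard] = field(default_factory=list)


def create_collection(name: str, num_shards: int, replication_factor: int, nodes: list[str]) -> Collection:
    """Hypothetical helper: num_shards shards, each with replication_factor replicas."""
    if num_shards < 1 or replication_factor < 1:
        raise ValueError("need at least one shard, and at least one replica per shard")
    if not nodes:
        raise ValueError("need at least one node to place replicas on")
    shards = []
    slot = 0
    for s in range(1, num_shards + 1):
        shard = Shard(name=f"shard{s}")
        for r in range(replication_factor):
            # Round-robin placement across nodes. Marking the first replica as
            # leader is a simplification; SolrCloud elects shard leaders.
            node = nodes[slot % len(nodes)]
            slot += 1
            shard.replicas.append(Replica(shard.name, node, is_leader=(r == 0)))
        shards.append(shard)
    return Collection(name, shards)


# replication_factor=1: each shard has ONE replica (its only copy), not zero.
c = create_collection("techproducts", num_shards=2, replication_factor=1, nodes=["node1", "node2"])
for shard in c.shards:
    print(shard.name, len(shard.replicas))  # prints "shard1 1", then "shard2 1"
```

In this model `replication_factor` counts the total copies of each shard, which is why 1 is the minimum; a value of 0 would describe a shard with no data anywhere.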

### 2. **"Index" vs "Collection" Confusion**
The document uses "index" frequently where "collection" would be clearer:
- Line 32: "a single logical index can be split"
- Line 42: "replicas for the same index"
- Line 45: "each replica would be a full copy of the entire index"

According to the feedback, "Collection" is better because it's clearly distinct 
from what Lucene writes to disk.

### 3. **Core Definition Needs Context**
Lines 58-60 of `cluster-types.adoc` say:
```asciidoc
=== Cores

Each replica, whether it is a leader or a follower, is called a _core_.
Multiple cores can be hosted on any one node.
```

While this is more accurate than some uses, the feedback suggests that "core" 
is the most problematic term in Solr's vocabulary and should be better 
explained or contextualized as a legacy/implementation term.

### 4. **Missing Definitions**
- **Server vs Node**: The document doesn't distinguish between the hardware/VM 
(server) and the running process (node)
- **Corpus**: Not mentioned
- **Index (Lucene sense)**: Not clearly defined as "the disk representation"

### 5. **Confusing Statement about Replicas without Shards**
Lines 44-46 of `cluster-types.adoc`:
```asciidoc
It's possible to have replicas without having created shards.
In this case, each replica would be a full copy of the entire index, instead of being only a copy of a part of the entire index.
```

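This is confusing because a collection always has at least one shard: an
"unsharded" collection simply has one shard, and each of its replicas holds the
whole collection. Framing it as "replicas without shards" may perpetuate the
"zero replica" confusion.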
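
The same point shows up in how documents are routed to shards. The sketch below uses a hypothetical, deliberately simplified `route` function (CRC32 of the id modulo the shard count); Solr's default `compositeId` router instead hashes the id with MurmurHash3 and assigns each shard a hash range. It shows that an "unsharded" collection is just `num_shards=1`: every document lands in `shard1`, so each replica of that one shard holds the whole collection.

```python
import zlib


def route(doc_id: str, num_shards: int) -> str:
    """Hypothetical, simplified router: CRC32 of the id modulo num_shards.

    Solr's default compositeId router instead hashes ids with MurmurHash3 and
    assigns each shard a hash range; this only mimics the overall effect.
    """
    if num_shards < 1:
        raise ValueError("a collection always has at least one shard")
    return f"shard{zlib.crc32(doc_id.encode('utf-8')) % num_shards + 1}"


docs = [f"doc{i}" for i in range(10)]

# "Unsharded" really means num_shards=1: every document lands in shard1,
# so every replica of shard1 holds the entire collection.
print({route(d, 1) for d in docs})  # prints {'shard1'}

# With two shards, the documents are divided, and each replica holds only
# the part belonging to its shard.
print(sorted({route(d, 2) for d in docs}))
```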

## Suggested Improvements

Would you like me to propose a revised version of this document that:

1. **Clarifies the shard/replica relationship** - emphasizing that replicas are 
physical manifestations of logical shards
2. **Uses "collection" instead of "index"** where referring to the logical set 
of searchable documents
3. **Reserves "index" for the Lucene disk representation**
4. **Adds a clearer "core" explanation** - positioning it as an 
implementation/legacy term that essentially means "replica"
5. **Distinguishes server from node** 
6. **Optionally adds "corpus"** if relevant to the cluster types discussion
7. **Removes confusing language** about "replicas without shards"

> Update Concepts and Glossary Ref Guide pages around Cloud/Cluster concepts
> --------------------------------------------------------------------------
>
>                 Key: SOLR-18179
>                 URL: https://issues.apache.org/jira/browse/SOLR-18179
>             Project: Solr
>          Issue Type: Improvement
>          Components: documentation
>            Reporter: Eric Pugh
>            Priority: Minor
>
> Following up on this message in a dev list thread 
> [https://lists.apache.org/thread/rcm372jwzmnmpjno9ygto82z8hptvkw1] about the 
> names of things, I noticed that we bury our "These are the cluster types"
> page deep in the Ref Guide, not up front. Yet choosing Cloud versus User
> Managed is a key decision right at the beginning, not an "and we are going to
> prod" decision. At least with today's split in the APIs etc ;)
> This PR is to work on getting consistency of terms across the Ref Guide.  
> Bonus points if it helps with consistency in both the new Solr UI, and maybe 
> in the old Solr Admin.
> Less concerned about any naming ramifications in source code...



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
