[ 
https://issues.apache.org/jira/browse/SOLR-18190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18071346#comment-18071346
 ] 

Rahul Goswami commented on SOLR-18190:
--------------------------------------

[~dsmiley] Thanks for taking a look and for your inputs.

Reworded to "read-only" instead of "write freeze".

It is true that for a given shard with an older index, most or all segments 
may need to be rewritten, and I did initially consider relying on recovery 
for NRT non-leaders to catch up.

However, that would mean multiple replicas replicating the full index from the 
leader simultaneously, causing heavy I/O on the leader. The proposed approach 
parallelizes the upgrade: each NRT replica rewrites only its old-format 
segments locally with zero network I/O. Yes, TLOG/PULL replicas still converge 
via normal replication from the leader, but for NRT replicas we can leverage 
their local index-update mechanism to be more efficient.

Regarding the core-level upgrade API, I have now added a brief section 
explaining it: "Background: Core-Level Index Upgrade (UPGRADECOREINDEX)".

> Collection-Level Index Upgrade API in SolrCloud (UPGRADECOLLECTIONINDEX)
> ------------------------------------------------------------------------
>
>                 Key: SOLR-18190
>                 URL: https://issues.apache.org/jira/browse/SOLR-18190
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Rahul Goswami
>            Assignee: Rahul Goswami
>            Priority: Major
>
> *+Objective+*
> Expose index-upgrade functionality at collection scope in SolrCloud as a new 
> "UPGRADECOLLECTIONINDEX" Collections API with async support and 
> `REQUESTSTATUS` tracking.
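
As a rough illustration of the async submit-then-poll pattern described above, the sketch below only builds the two request URLs. `REQUESTSTATUS` with `requestid` and the `async` parameter are existing Collections API conventions; the `UPGRADECOLLECTIONINDEX` action is the one proposed here, and its `collection` parameter, the host, the collection name, and the request id are assumptions made for the example.

```python
from urllib.parse import urlencode


def collections_api_url(base_url, action, **params):
    """Build a Collections API URL; extra parameters are appended in sorted order."""
    query = urlencode([("action", action)] + sorted(params.items()))
    return f"{base_url}/admin/collections?{query}"


BASE = "http://localhost:8983/solr"  # assumed local node, illustration only

# Hypothetical: the proposed action, submitted asynchronously.
# "async" is a Python keyword, so it is passed via dict unpacking.
submit_url = collections_api_url(
    BASE,
    "UPGRADECOLLECTIONINDEX",
    collection="techproducts",       # assumed parameter name and value
    **{"async": "upgrade-1"},        # caller-chosen request id
)

# Existing API: poll the async request by its id.
status_url = collections_api_url(BASE, "REQUESTSTATUS", requestid="upgrade-1")
```

A client would issue the first request once and then poll the second until the request reports completion or failure.
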
> *+Approach+*
> _Make collection read-only_ + _hybrid local upgrade_ - The collection is set 
> to `readOnly` for the duration of the upgrade. Each replica type is upgraded 
> via its designed index-update mechanism. This means:
>  * Leader is upgraded using the recently introduced CoreAdmin 
> UPGRADECOREINDEX API
>  * Each NRT replica gets individually upgraded using the same 
> UPGRADECOREINDEX API 
>  * TLOG/PULL replicas converge via the usual replication mechanism. The 
> coordinator waits a bounded time for replicas to converge before declaring 
> success.
> Why not upgrade only the leader and rely on distributed forwarding to NRT 
> replicas? `DistributedZkUpdateProcessor` enforces the collection-level 
> `readOnly` on every node, including replicas receiving forwarded updates, so 
> forwarding is blocked while the collection is read-only.
>  
> +*Background: Core-Level Index Upgrade (UPGRADECOREINDEX)*+
> Solr's UPGRADECOREINDEX CoreAdmin command rewrites segments written by older 
> Lucene versions into the current format (as long as the fields are 
> stored=true or docValues=true). This makes it possible to use the same index 
> across multiple major versions without reindexing from the original data 
> source.
>   For each core, it:
>   1. Opens the existing index and sets 
> [LatestVersionMergePolicy|https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/index/LatestVersionMergePolicy.java]
>  on the IndexWriter to prevent older-format segments from merging with 
> latest-format segments.
>   2. Identifies segments written by an older Lucene major version 
> ([shouldUpgradeSegment()|https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/handler/admin/api/UpgradeCoreIndex.java#L250]
>  – any segment whose minVersion predates the current major version).
>   3. Reads every document from those old-format segments, reconstructing 
> SolrInputDocuments with all stored fields.
>   4. Re-adds each document through an update processor chain, which writes it 
> as a new current-format segment.
>   5. Commits and runs expungeDeletes to remove the now-obsolete old segments. 
> Also restores the original merge policy. 
>   Documents that already reside in current-format segments are untouched. The 
> operation is idempotent – re-running it skips segments that are already up to 
> date.
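
The selection rule in step 2 and the idempotency claim can be illustrated with a toy model. This is a sketch, not the logic in `UpgradeCoreIndex`: the `Segment` class, the function names, and the version number are hypothetical, and the real implementation works on Lucene segment metadata and re-adds documents through an update chain.

```python
from dataclasses import dataclass

CURRENT_MAJOR = 10  # assumed current Lucene major version, illustration only


@dataclass(frozen=True)
class Segment:
    name: str
    min_version_major: int  # oldest Lucene major version that contributed to the segment


def should_upgrade(segment, current_major=CURRENT_MAJOR):
    """A segment needs rewriting if its minVersion predates the current major."""
    return segment.min_version_major < current_major


def upgrade_core(segments, current_major=CURRENT_MAJOR):
    """Return (segments after upgrade, names of segments that were rewritten)."""
    old = [s for s in segments if should_upgrade(s, current_major)]
    if not old:
        return list(segments), []  # nothing to do: this is what makes re-runs cheap
    kept = [s for s in segments if not should_upgrade(s, current_major)]
    # Toy stand-in for "re-add every document": one new current-format segment.
    rewritten = Segment(name="rewritten", min_version_major=current_major)
    return kept + [rewritten], [s.name for s in old]


index = [Segment("_0", 9), Segment("_1", 10)]
after_first, rewritten_first = upgrade_core(index)
after_second, rewritten_second = upgrade_core(after_first)
```

In this model the first run rewrites only `_0` and leaves `_1` alone, and the second run finds nothing left to rewrite.
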
>  
> +*Operational Flow*+
> 1. Coordinator sets `readOnly=true` on the collection via `MODIFYCOLLECTION` 
> (blocks all external writes at `DistributedZkUpdateProcessor`).
> 2. For each shard (sequentially):
>   a. Identify the current leader 
>   b. Upgrade the leader via CoreAdmin `UPGRADECOREINDEX` action. The leader 
> uses a stripped-down chain (`LogUpdateProcessor` → `RunUpdateProcessor`, no 
> `DistributedUpdateProcessor`) to rewrite old segments locally. No version 
> reassignment, no distributed forwarding. The original `_version_` is 
> preserved both in the indexed document and on the `AddUpdateCommand` (for 
> tlog consistency). After rewriting, the original merge policy is restored 
> before commit.
>   c. Upgrade NRT non-leader replicas in parallel using the same mechanism. 
> Each NRT replica independently rewrites its own segments with zero network 
> I/O.
>   d. TLOG/PULL replicas converge via their normal background replication from 
> the now-upgraded leader.
>   e. Convergence polling: Poll all replicas with `checkOnly=true` until every 
> replica reports no old-format segments remaining. ("checkOnly" is a new 
> lightweight param introduced on the UPGRADECOREINDEX CoreAdmin API to check 
> for presence of any old-format segments)
> 3. Set `readOnly=false` only after all shards validate. On any failure, the 
> collection remains read-only for operator intervention. 
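
The flow above can be sketched as a small coordinator loop. This is a simplified model under stated assumptions, not the proposed implementation: `set_read_only`, `upgrade_core`, and `has_old_segments` are hypothetical callables standing in for `MODIFYCOLLECTION`, the `UPGRADECOREINDEX` call, and the `checkOnly=true` probe, and the timeout values are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def upgrade_collection(shards, set_read_only, upgrade_core, has_old_segments,
                       timeout_s=60.0, poll_interval_s=1.0):
    """shards maps shard name -> {"leader": core, "nrt": [cores], "other": [cores]}."""
    set_read_only(True)                      # step 1: block external writes
    for shard in shards.values():            # step 2: shards are handled sequentially
        upgrade_core(shard["leader"])        # 2b: leader first
        with ThreadPoolExecutor() as pool:   # 2c: NRT non-leaders in parallel
            # list() forces evaluation so a failed replica upgrade is re-raised here
            list(pool.map(upgrade_core, shard["nrt"]))
        # 2d/2e: TLOG/PULL replicas catch up on their own; poll until all are clean
        replicas = [shard["leader"], *shard["nrt"], *shard["other"]]
        deadline = time.monotonic() + timeout_s
        while any(has_old_segments(core) for core in replicas):
            if time.monotonic() >= deadline:
                raise TimeoutError("replicas did not converge; collection stays read-only")
            time.sleep(poll_interval_s)
    set_read_only(False)                     # step 3: reached only if every shard validated
```

Because the read-only flag is lifted only on the last line, any exception (a failed core upgrade or a convergence timeout) leaves the collection read-only, matching the failure behaviour described in step 3.
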
>  
> *+Limitations+*
>  - Nested documents: Not supported (existing limitation in 
> `UpgradeCoreIndex`). 
>  - Write downtime: The collection is unavailable for external writes for the 
> duration of the upgrade. _Note_: The CoreAdmin API doesn't currently have 
> this limitation in standalone mode. In SolrCloud mode, however, a leader 
> election can occur mid-upgrade and cluster stability and state correctness 
> are at stake, so blocking writes removes one variable and makes the design 
> simpler to reason about.
>  - Leader election resilience: If a leader election occurs during the 
> upgrade, progress may be lost and the command must be re-run. This should be 
> fine since the operation is designed to be idempotent.
>  - Co-located replica I/O: NRT replicas on the same node are upgraded in 
> parallel, which may cause I/O contention. Node-aware throttling is deferred 
> to a future version.


