[ 
https://issues.apache.org/jira/browse/SOLR-18190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rahul Goswami updated SOLR-18190:
---------------------------------
    Summary: Collection-Level Index Upgrade API in SolrCloud 
(UPGRADECOLLECTIONINDEX)  (was: Collection-Level Index Upgrade API in SolrCloud 
(UPRGRADECOLLECTIONINDEX))

> Collection-Level Index Upgrade API in SolrCloud (UPGRADECOLLECTIONINDEX)
> ------------------------------------------------------------------------
>
>                 Key: SOLR-18190
>                 URL: https://issues.apache.org/jira/browse/SOLR-18190
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Rahul Goswami
>            Assignee: Rahul Goswami
>            Priority: Major
>
> *+Objective+*
> Expose index-upgrade functionality at collection scope in SolrCloud as a new 
> "UPGRADECOLLECTIONINDEX" Collections API with async support and 
> `REQUESTSTATUS` tracking.
> +*Background: Core-Level Index Upgrade (UPGRADECOREINDEX)*+
> Solr's UPGRADECOREINDEX CoreAdmin command rewrites segments written by older 
> Lucene versions into the current format (provided the fields are 
> stored=true or docValues=true). This makes it possible to carry the same index 
> across multiple major versions without reindexing from the original 
> data source.
>   For each core, it:
>   1. Opens the existing index and sets 
> [LatestVersionMergePolicy|https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/index/LatestVersionMergePolicy.java]
>  on the IndexWriter to prevent older-format segments from merging with 
> latest-format segments.
>   2. Identifies segments written by an older Lucene major version 
> ([shouldUpgradeSegment()|https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/handler/admin/api/UpgradeCoreIndex.java#L250]
>  – any segment whose minVersion predates the current major version).
>   3. Reads every document from those old-format segments, reconstructing 
> SolrInputDocuments with all stored fields.
>   4. Re-adds each document through an update processor chain, which writes it 
> as a new current-format segment.
>   5. Commits and runs expungeDeletes to remove the now-obsolete old segments, 
> then restores the original merge policy. 
>   Documents that already reside in current-format segments are untouched. The 
> operation is idempotent – re-running it skips segments that are already up to 
> date.
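
The selection rule and the idempotence claim above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Solr's implementation: `Segment`, `should_upgrade_segment`, `upgrade_index`, and the version numbers are hypothetical stand-ins, and a "segment" here is just a named tuple of documents tagged with the major version that wrote it.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Segment:
    """Toy stand-in for an index segment (hypothetical, not a Lucene class)."""
    name: str
    min_version_major: int   # major version of the oldest format in the segment
    docs: Tuple[str, ...]


def should_upgrade_segment(seg: Segment, current_major: int) -> bool:
    # A segment qualifies when its minVersion predates the current major version.
    return seg.min_version_major < current_major


def upgrade_index(segments: List[Segment], current_major: int) -> List[Segment]:
    """Rewrite documents from old-format segments into one current-format segment."""
    old = [s for s in segments if should_upgrade_segment(s, current_major)]
    if not old:
        # Nothing to do: this is what makes a re-run a no-op.
        return list(segments)
    kept = [s for s in segments if not should_upgrade_segment(s, current_major)]
    rewritten_docs = tuple(d for s in old for d in s.docs)
    return kept + [Segment("_rewritten", current_major, rewritten_docs)]


# Assumed example: version 10 is "current", version 9 is "old".
index = [Segment("_0", 9, ("a", "b")), Segment("_1", 10, ("c",))]
upgraded = upgrade_index(index, current_major=10)
rerun = upgrade_index(upgraded, current_major=10)
```

In this model, current-format segments pass through untouched and a second run returns the same segment list, which mirrors the behaviour described for UPGRADECOREINDEX.
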
> +*Approach*+
>  * The leader is upgraded using the recently introduced CoreAdmin 
> UPGRADECOREINDEX API.
>  * UPGRADECOREINDEX is then called on each NRT replica to rewrite 
> any segments left over after the leader upgrade (the leader upgrade will 
> have forwarded writes to the NRT replicas, rewriting most, if not all, of their 
> older segments).
>  * TLOG/PULL replicas converge via the usual replication mechanism. 
> The coordinator waits a bounded time for replicas to converge before 
> declaring success.
>  
> +*Operational Flow (UPGRADECOLLECTIONINDEX SolrCloud Collections API)*+
> For each shard (sequentially):
>  #   Coordinator sets LatestVersionMergePolicy on the IndexWriter of each 
> replica
>  #   Identify the current leader and upgrade it via the CoreAdmin 
> `UPGRADECOREINDEX` action. This also causes the updates to be forwarded 
> to the replicas, thereby upgrading most, if not all, of the older segments in 
> each replica. 
>  #   Upgrade the NRT non-leader replicas in sequence using the same mechanism. We 
> expect very little version churn, since most of the older segments should already 
> have been rewritten by the leader upgrade in step 2.
>  #   TLOG/PULL replicas converge via their normal background replication from 
> the now-upgraded leader.
>  #   Convergence polling: Poll all replicas with `checkOnly=true` until every 
> replica reports no old-format segments remaining. ("checkOnly" is a new 
> lightweight param introduced on the UPGRADECOREINDEX CoreAdmin API to check 
> for the presence of any old-format segments.)
>  #   Restore the original merge policy on each replica.
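
The per-shard flow above could be simulated roughly as follows. This is a sketch under stated assumptions rather than the proposed implementation: `Replica`, `upgrade_core_index`, `replicate_from`, `upgrade_shard`, and `max_polls` are hypothetical names, the CoreAdmin call is replaced by an in-memory stand-in, and restoring the merge policy in a `finally` block is a design choice of this sketch, not something the description specifies.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    """In-memory stand-in for one replica of a shard (hypothetical)."""
    name: str
    kind: str                       # "NRT", "TLOG" or "PULL"
    old_segments: int               # old-format segments still present
    merge_policy: str = "original"

    def upgrade_core_index(self, check_only: bool = False) -> bool:
        """Stand-in for UPGRADECOREINDEX; returns True when no old segments remain."""
        if not check_only:
            self.old_segments = 0   # pretend every old segment was rewritten
        return self.old_segments == 0

    def replicate_from(self, leader: "Replica") -> None:
        # Simulated background replication for TLOG/PULL replicas.
        self.old_segments = leader.old_segments


def upgrade_shard(leader: Replica, others: List[Replica], max_polls: int = 5) -> bool:
    everyone = [leader] + others
    original = {r.name: r.merge_policy for r in everyone}
    for r in everyone:                              # step 1: pin the merge policy
        r.merge_policy = "LatestVersionMergePolicy"
    try:
        leader.upgrade_core_index()                 # step 2: upgrade the leader
        for r in others:                            # step 3: NRT replicas, in sequence
            if r.kind == "NRT":
                r.upgrade_core_index()
        for _ in range(max_polls):                  # steps 4-5: bounded convergence polling
            for r in others:
                if r.kind in ("TLOG", "PULL"):
                    r.replicate_from(leader)
            if all(r.upgrade_core_index(check_only=True) for r in everyone):
                return True
        return False
    finally:
        for r in everyone:                          # step 6: restore the merge policy
            r.merge_policy = original[r.name]


leader = Replica("core_r1", "NRT", old_segments=3)
others = [Replica("core_r2", "NRT", 1),
          Replica("core_r3", "TLOG", 3),
          Replica("core_r4", "PULL", 3)]
converged = upgrade_shard(leader, others)
converged_again = upgrade_shard(leader, others)
```

A real coordinator would sleep between polls and issue remote requests. The second call illustrates why a re-run after an interrupted upgrade is harmless in this model: replicas that are already up to date simply report convergence.
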
> *+Limitations+*
>  - Child/nested documents: Not supported (existing limitation in 
> `UpgradeCoreIndex`). 
>  - Leader election resilience: If a leader election occurs during the 
> upgrade, progress may be lost and the command must be re-run. This should be 
> acceptable because the operation is designed to be idempotent: the 
> state remains consistent, albeit at the cost of some rework.


