[
https://issues.apache.org/jira/browse/SOLR-18190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rahul Goswami updated SOLR-18190:
---------------------------------
Description:
*+Objective+*
Expose index-upgrade functionality at collection scope in SolrCloud as a new
"UPGRADECOLLECTIONINDEX" Collections API with async support and `REQUESTSTATUS`
tracking.
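As a rough illustration of the async submit and status-tracking pair, the sketch below only builds the two request URLs and sends nothing. `async`, `requestid`, and the `REQUESTSTATUS` action follow existing Collections API conventions. The `UPGRADECOLLECTIONINDEX` action name comes from this proposal. The `collection` parameter name, the host, and the request id are assumptions made for illustration.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumed host, for illustration only


def upgrade_collection_url(collection: str, request_id: str) -> str:
    """Build the proposed async UPGRADECOLLECTIONINDEX call (parameter names assumed)."""
    params = {
        "action": "UPGRADECOLLECTIONINDEX",  # action name proposed in this issue
        "collection": collection,            # assumed parameter name
        "async": request_id,                 # standard Collections API async id
    }
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


def request_status_url(request_id: str) -> str:
    """Build the REQUESTSTATUS call used to track the async request."""
    params = {"action": "REQUESTSTATUS", "requestid": request_id}
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"
```

A client would submit the first URL once and then poll the second with the same request id until the task reports a terminal state.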
*+Approach+*
_Write freeze_ *+* _hybrid local upgrade_ - The collection is set to `readOnly`
for the duration. Each replica type is upgraded via its designed index-update
mechanism, which means:
* Leader is upgraded using the recently introduced CoreAdmin UPGRADECOREINDEX
API
* Each NRT replica gets individually upgraded using the same UPGRADECOREINDEX
API
* TLOG/PULL replicas converge via the usual replication mechanism.
The coordinator waits a bounded amount of time for replicas to converge before
declaring success.
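The per-replica-type rule above can be written as a small dispatch function. This is a sketch of the rule as described in this issue, not the actual implementation; the enum, the function name, and the returned labels are hypothetical.

```python
from enum import Enum


class ReplicaType(Enum):
    NRT = "NRT"
    TLOG = "TLOG"
    PULL = "PULL"


def upgrade_mechanism(replica_type: ReplicaType, is_leader: bool) -> str:
    """Return how a replica reaches the upgraded index format under this proposal."""
    if is_leader or replica_type is ReplicaType.NRT:
        # Leaders and NRT replicas rewrite their own segments locally.
        return "UPGRADECOREINDEX"
    # Non-leader TLOG/PULL replicas fetch upgraded segments from the leader.
    return "REPLICATION"
```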
Why not upgrade only the leader and rely on distributed forwarding to NRT
replicas? `DistributedZkUpdateProcessor` enforces the collection-level
`readOnly` on every node, including replicas receiving forwarded updates, so
forwarding is blocked by the same write freeze that protects against external
writes.
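A toy model of that reasoning, assuming only what is stated above: the read-only check does not look at where an update came from, so a leader-forwarded update is rejected exactly like an external one. The function and exception names are hypothetical and do not mirror Solr's classes.

```python
class ReadOnlyCollectionError(Exception):
    """Raised when an update reaches a collection that is read-only."""


def process_update(collection_read_only: bool, forwarded_from_leader: bool) -> str:
    """Toy model of the collection-level readOnly check described above."""
    # The check ignores the update's origin: external and forwarded updates
    # are rejected alike while the collection is read-only.
    if collection_read_only:
        raise ReadOnlyCollectionError("collection is read-only")
    return "forwarded update applied" if forwarded_from_leader else "external update applied"
```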
+*Operational Flow*+
1. Coordinator sets `readOnly=true` on the collection via `MODIFYCOLLECTION`
(blocks all external writes at `DistributedZkUpdateProcessor`).
2. For each shard (sequentially):
a. Identify the current leader
b. Upgrade the leader via CoreAdmin `UPGRADECOREINDEX`. The leader uses a
stripped-down chain (`LogUpdateProcessor` → `RunUpdateProcessor`, no
`DistributedUpdateProcessor`) to rewrite old segments locally. No version
reassignment, no distributed forwarding. The original `_version_` is
preserved both in the indexed document and on the `AddUpdateCommand` (for tlog
consistency). After rewriting, the original merge policy is restored before
commit.
c. Upgrade NRT non-leader replicas in parallel using the same mechanism. Each
NRT replica independently rewrites its own segments with zero network I/O.
d. TLOG/PULL replicas converge via their normal background replication from
the now-upgraded leader.
e. Convergence polling: Poll all replicas with `checkOnly=true` until every
replica reports no old-format segments remaining. (`checkOnly` is a new
lightweight parameter on the UPGRADECOREINDEX CoreAdmin API that checks for
the presence of any old-format segments.)
3. Clear `readOnly` (set `readOnly=false`) only after all shards validate. On
any failure, the collection remains read-only for operator intervention.
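The flow above can be sketched against in-memory stand-ins. Everything here is hypothetical scaffolding: the `Replica` and `Collection` classes, the synchronous `replicate_from_leader` stand-in (real replication runs in the background), and the timeout values are assumptions, and no Solr API is called.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    type: str               # "NRT", "TLOG" or "PULL"
    is_leader: bool = False
    old_segments: int = 1   # count of old-format segments still present

    def upgrade_core_index(self) -> None:
        """Stand-in for CoreAdmin UPGRADECOREINDEX: rewrite old segments locally."""
        self.old_segments = 0

    def has_old_segments(self) -> bool:
        """Stand-in for UPGRADECOREINDEX with checkOnly=true."""
        return self.old_segments > 0


@dataclass
class Collection:
    shards: dict            # shard name -> list of Replica
    read_only: bool = False


def replicate_from_leader(replica: Replica, leader: Replica) -> None:
    """Synchronous stand-in for background replication from the leader."""
    replica.old_segments = leader.old_segments


def upgrade_collection(coll: Collection, poll_interval: float = 0.01,
                       timeout: float = 1.0) -> None:
    coll.read_only = True                                   # step 1
    for replicas in coll.shards.values():                   # step 2, sequential
        leader = next(r for r in replicas if r.is_leader)   # 2a
        leader.upgrade_core_index()                         # 2b
        followers = [r for r in replicas if r.type == "NRT" and not r.is_leader]
        with ThreadPoolExecutor() as pool:                  # 2c, in parallel
            list(pool.map(Replica.upgrade_core_index, followers))
        for r in replicas:                                  # 2d
            if r.type in ("TLOG", "PULL") and not r.is_leader:
                replicate_from_leader(r, leader)
        deadline = time.monotonic() + timeout               # 2e
        while any(r.has_old_segments() for r in replicas):
            if time.monotonic() >= deadline:
                # Failure path: read_only is deliberately left set.
                raise TimeoutError("replicas did not converge")
            time.sleep(poll_interval)
    coll.read_only = False                                  # step 3, success only
```

Note that the read-only flag is cleared on the last line only, so any exception raised earlier leaves the collection frozen, matching the "remains read-only for operator intervention" behavior.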
*+Limitations+*
- Nested documents: Not supported (existing limitation in `UpgradeCoreIndex`).
- Write downtime: The collection is unavailable for external writes for the
duration of the upgrade. _Note_: The CoreAdmin API doesn't currently have this
limitation in standalone mode. In SolrCloud mode, however, a leader election
can occur mid-upgrade, and blocking writes removes one variable, which makes
cluster stability and state correctness simpler to reason about.
- Leader election resilience: If a leader election occurs during the upgrade,
progress may be lost and the command must be re-run. This should be fine since
the operation is designed to be idempotent.
- Co-located replica IO: NRT replicas on the same node are upgraded in
parallel, which may cause IO contention. Node-aware throttling is deferred to
a future version.
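To make the idempotency claim concrete, here is a toy model in which each segment is reduced to a format version number. That representation and the function name are assumptions for illustration; real segments are also merged, not rewritten one for one. A second run over an already upgraded core finds nothing to rewrite.

```python
from typing import List, Tuple


def upgrade_core(segment_versions: List[int], current_version: int) -> Tuple[List[int], int]:
    """Rewrite only segments older than current_version.

    Returns the resulting segment versions and how many segments were rewritten.
    """
    rewritten = sum(1 for v in segment_versions if v < current_version)
    upgraded = [max(v, current_version) for v in segment_versions]
    return upgraded, rewritten
```

Because the work done depends only on which old-format segments remain, re-running the command after an interrupted attempt repeats only the unfinished part.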
> Collection-Level Index Upgrade API in SolrCloud (UPGRADECOLLECTIONINDEX)
> ------------------------------------------------------------------------
>
> Key: SOLR-18190
> URL: https://issues.apache.org/jira/browse/SOLR-18190
> Project: Solr
> Issue Type: Improvement
> Reporter: Rahul Goswami
> Assignee: Rahul Goswami
> Priority: Major