[
https://issues.apache.org/jira/browse/SOLR-18190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rahul Goswami updated SOLR-18190:
---------------------------------
Description:
# Collection-Level Index Upgrade in SolrCloud
## Problem
The existing `UPGRADECOREINDEX` CoreAdmin action rewrites old-format Lucene
segments in-place by reconstructing documents and re-adding them as
current-format segments. This operation is blocked in SolrCloud mode. There is
no way to upgrade the index format of a SolrCloud collection without full
reindexing from source.
## Goal
Expose index-upgrade functionality at collection scope in SolrCloud as a new
`UPGRADECOLLECTIONINDEX` Collections API command with async support and
`REQUESTSTATUS` tracking. The design must handle mixed replica types (NRT,
TLOG, PULL), prevent index corruption, and minimize operational disruption.
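As a rough illustration of the intended request shape, the sketch below only builds the two Collections API URLs and sends nothing. `REQUESTSTATUS` with `requestid` is the existing async-tracking call. The `collection` and `async` parameters on the proposed `UPGRADECOLLECTIONINDEX` command are assumptions modeled on other Collections API commands, and the host, collection name, and request id are placeholders.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # placeholder host and port


def collections_api_url(action, params):
    """Build a Collections API URL. No HTTP request is sent."""
    query = urlencode({"action": action, **params})
    return f"{SOLR_BASE}/admin/collections?{query}"


# Proposed command: parameter names here are assumptions, not a confirmed API.
submit_url = collections_api_url(
    "UPGRADECOLLECTIONINDEX",
    {"collection": "products", "async": "upgrade-1"},
)

# Existing command used to track an async request by its id.
status_url = collections_api_url("REQUESTSTATUS", {"requestid": "upgrade-1"})

print(submit_url)
print(status_url)
```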
## Design Decision: Write Freeze Required
Two approaches were evaluated:
**Approach A (Zero write downtime)** — Rejected. The upgrader replays documents
while external writes continue, relying on `_version_`-based optimistic
concurrency. This has a fatal flaw: **Delete-By-Query resurrection**. Code
analysis confirms that `UpdateLog.lookupVersion()` does not consult the
`deleteByQueries` list — it only checks the tlog map, the live index, and the
`oldDeletes` LRU (which is populated only by delete-by-id, never by DBQ). When
a document is deleted by DBQ and then replayed by the upgrader, `lookupVersion`
returns either `null` (doc not found) or a stale tlog entry, allowing the
re-add to succeed. The document is silently resurrected. This bug exists in
standalone mode as well but is less likely to trigger.
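The resurrection scenario can be reproduced with a toy model. This is not Solr's code: `ToyUpdateLog` and its attributes are hypothetical stand-ins that mirror only the lookup order described above (tlog map, live index, `oldDeletes`), with the delete-by-query list never consulted. Versions are plain positive integers for simplicity.

```python
class ToyUpdateLog:
    """Hypothetical stand-in for the version lookup described above."""

    def __init__(self):
        self.tlog_map = {}           # doc id -> version of recent adds
        self.index = {}              # doc id -> version in the live index
        self.old_deletes = {}        # doc id -> version, delete-by-id only
        self.delete_by_queries = []  # (predicate, version), never looked up

    def lookup_version(self, doc_id):
        # Mirrors the described order; delete_by_queries is not consulted.
        for source in (self.tlog_map, self.index, self.old_deletes):
            if doc_id in source:
                return source[doc_id]
        return None

    def add(self, doc_id, version):
        """Accept the add unless an equal or newer version is already known."""
        known = self.lookup_version(doc_id)
        if known is not None and known >= version:
            return False
        self.index[doc_id] = version
        return True

    def delete_by_id(self, doc_id, version):
        self.index.pop(doc_id, None)
        self.old_deletes[doc_id] = version

    def delete_by_query(self, predicate, version):
        self.delete_by_queries.append((predicate, version))
        for doc_id in [d for d in self.index if predicate(d)]:
            del self.index[doc_id]


# Case 1: the upgrader read doc "A" at version 100, then a DBQ removed it.
log = ToyUpdateLog()
log.add("A", 100)
log.delete_by_query(lambda doc_id: doc_id == "A", 200)
resurrected = log.add("A", 100)  # replay by the upgrader is accepted

# Case 2: same sequence, but the delete was a delete-by-id.
log2 = ToyUpdateLog()
log2.add("A", 100)
log2.delete_by_id("A", 200)
replay_accepted = log2.add("A", 100)  # old_deletes blocks the replay

print(resurrected, replay_accepted)
```

In this model the replay after a delete-by-id is rejected because the delete version is still visible to the lookup, while the replay after a DBQ finds nothing and goes through.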
**Approach B (Write freeze + hybrid local upgrade)**: Selected. The collection
is set to `readOnly` for the duration of the upgrade. Each replica type is
upgraded via the index-update mechanism it was designed for. Because no
external writes are accepted during the upgrade, the concurrency edge cases of
Approach A are eliminated.
An alternative within Approach B, upgrading only the leader and relying on
distributed forwarding to NRT replicas, was also rejected:
`DistributedZkUpdateProcessor` enforces the collection-level `readOnly` on
every node, including replicas receiving forwarded updates, so forwarding
would be blocked by the same write freeze that protects against external
writes.
## Operational Flow
1. Coordinator sets `readOnly=true` on the collection via `MODIFYCOLLECTION`
(blocks all external writes at `DistributedZkUpdateProcessor`).
2. For each shard (sequentially):
   a. Identify the current leader via `getLeaderRetry()`.
   b. **Upgrade the leader** via CoreAdmin `UPGRADECOREINDEX` with
      `cloudMode=true`. The leader uses a stripped-down chain
      (`LogUpdateProcessor` → `RunUpdateProcessor`, no
      `DistributedUpdateProcessor`) to rewrite old segments locally. No version
      reassignment, no distributed forwarding. The original `_version_` is
      preserved both in the indexed document and on the `AddUpdateCommand` (for
      tlog consistency). After rewriting, the original merge policy is restored
      before commit, followed by `expungeDeletes` to clean tombstone segments.
   c. **Upgrade NRT non-leader replicas** in parallel using the same mechanism.
      Each NRT replica independently rewrites its own segments with zero
      network I/O.
   d. **TLOG/PULL replicas** converge via their normal background replication
      from the now-upgraded leader.
   e. **Convergence polling**: Poll all replicas with `checkOnly=true` until
      every replica reports no old-format segments remaining (see Convergence
      Polling below).
3. Reset `readOnly` to `false` only after all shards pass validation. On any
failure, the collection remains read-only so that an operator can intervene.
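The control flow of these steps can be sketched as a small coordinator. This is a minimal sketch under stated assumptions, not the proposed implementation: `upgrade_collection`, its callback parameters, and the shard layout dictionary are all hypothetical, and the callbacks stand in for the `MODIFYCOLLECTION`, `UPGRADECOREINDEX`, and `checkOnly=true` calls. The demo uses in-memory fakes instead of a cluster.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def upgrade_collection(shards, set_read_only, upgrade_core, has_old_segments,
                       poll_interval=0.0, max_polls=10):
    """Hypothetical coordinator loop.

    shards maps a shard name to {"leader": core, "nrt": [...], "other": [...]},
    where "other" holds TLOG/PULL replicas that converge by replication.
    """
    set_read_only(True)  # step 1: freeze writes
    for shard, replicas in shards.items():  # step 2: shards in sequence
        upgrade_core(replicas["leader"])  # 2b: leader first
        with ThreadPoolExecutor() as pool:  # 2c: NRT replicas in parallel
            list(pool.map(upgrade_core, replicas["nrt"]))
        cores = [replicas["leader"], *replicas["nrt"], *replicas["other"]]
        for _ in range(max_polls):  # 2e: convergence polling
            if not any(has_old_segments(core) for core in cores):
                break
            time.sleep(poll_interval)
        else:
            # Raising here skips the reset below: collection stays read-only.
            raise RuntimeError(f"{shard} did not converge")
    set_read_only(False)  # step 3: unfreeze only after every shard passed


# In-memory fakes standing in for the real cluster calls.
old = {"s1_leader", "s1_nrt", "s1_pull"}      # cores with old-format segments
replicates_from = {"s1_pull": "s1_leader"}    # PULL replica -> its leader
events = []                                   # records readOnly transitions


def fake_upgrade(core):
    old.discard(core)


def fake_has_old(core):
    source = replicates_from.get(core)
    if source is not None and source not in old:
        old.discard(core)  # simulate background replication catching up
    return core in old


layout = {"shard1": {"leader": "s1_leader", "nrt": ["s1_nrt"],
                     "other": ["s1_pull"]}}
upgrade_collection(layout, events.append, fake_upgrade, fake_has_old)
print(events, old)
```

Because the failure path raises before the final `set_read_only(False)`, the sketch leaves the collection read-only on any error, matching the behavior described in step 3.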
> Collection-Level Index Upgrade API in SolrCloud (UPGRADECOLLECTIONINDEX)
> ------------------------------------------------------------------------
>
> Key: SOLR-18190
> URL: https://issues.apache.org/jira/browse/SOLR-18190
> Project: Solr
> Issue Type: Improvement
> Reporter: Rahul Goswami
> Priority: Major
>