mrproliu opened a new pull request, #1180:
URL: https://github.com/apache/skywalking-banyandb/pull/1180
## Summary
During lifecycle tier migration (hot → warm → cold), the migration
re-resolves every row's measure/stream schema from the registry to rebuild its
write request. If that schema was **deleted from the registry** — for example
when an upstream OAP metric is renamed or removed — the row can no longer be
written to any target, and the migration previously **aborted the whole
group**: the healthy data in that group never moved forward either.
This PR makes those **orphan** rows non-fatal. The migration now detects a
row whose schema is gone, skips it so the rest of the group still migrates, and
— by default — **archives** the row to a self-describing file so an operator
can still recover it. The orphan policy is configurable: `archive` (default)
keeps the data, `discard` drops it.
## Motivation
A real production migration failed with:
```
measure schema sw_metricsHour/meter_banyandb_instance_disk_usage_all_hour
not found in group snapshot
```
The metric had been renamed upstream and its schema removed from the
registry, but its on-disk data still lived in a source segment. Rather than
moving the rest of the (healthy) group forward, the migration aborted the
entire group. Orphan handling turns this localized, expected condition into a
recoverable, observable event instead of a hard stop.
## What changed
### Detection
- A row whose schema is absent from the registry is classified as an orphan
  (`errOrphanSchema`):
  - **Measure**: the replayer pre-fetches all measure schemas once at
    construction (`ListMeasure`); a name missing from that consistent snapshot
    is an orphan.
  - **Stream**: `loadSchema` resolves per subject via `GetStream`; only a
    genuine `schema.ErrGRPCResourceNotFound` is treated as orphan. **Any other
    error (network, closed registry, context cancellation) stays fatal**, so a
    transient registry failure is never mistaken for a droppable orphan.
- Orphan skips and series-index-gap skips (`errSkipSeries`) are distinct,
  mutually exclusive sentinels routed via a `skipError.kind`, and are tallied
  in separate counters.
### Policy: `--migration-orphan-policy` (`archive` | `discard`, default `archive`)
- **`archive`**: each orphan row is written as one JSON line,
**self-describing** — decoded from the part's own column types with no registry
schema needed. It carries the group, catalog, measure/stream name, source
location (stage/segment/shard/part), series id, entity, timestamp (RFC3339 +
epoch-nanos), tags, and — for measures — `version`, `indexed_tags`, and
`fields`; streams carry `element_id` instead and omit `version`. Per-part JSONL
is **gzip-compressed** on disk (highly repetitive rows compress ~37×). A
per-segment `manifest.json` indexes which deleted subjects were archived and
their row counts; it is written atomically (write-tmp + rename) so a crash
never leaves a truncated manifest. Re-replaying a part rewrites its file
idempotently (no double-counting on resume).
- **`discard`**: orphan rows are dropped and surfaced only in the report;
nothing is written to disk.
### Archive location: `--migration-orphan-archive-subdir` (relative, default `archive`)
The archive lives in a **relative subdirectory under each catalog's own root
path**, not a separate absolute directory:
```
<catalog-root-path>/<subdir>/<group>/seg-<segment-suffix>/shard-<id>/part-<part-id>.jsonl.gz
<catalog-root-path>/<subdir>/<group>/seg-<segment-suffix>/manifest.json
```
So measure orphans land under `<measure-root-path>/archive/...` and stream
orphans under `<stream-root-path>/archive/...`. The catalog is **not** a path
level (it is recorded inside every record and manifest instead). This means the
archive shares the durability of the volume that already holds the catalog data
— no separate path to provision, and no ephemeral `/tmp` default. The flag must
be relative (validated at startup).
### Source-segment retention decoupling
The migration deletes source segments after a successful run. This PR
separates two skip reasons:
- **series-index gap** (`errSkipSeries`): a series could not be
resolved/rebuilt, so its rows remain **only** in the source — the source
segment is **retained** (excluded from the post-migration delete set) to avoid
permanent data loss.
- **orphan** (`errOrphanSchema`): the rows are archived (or discarded) and
the schema is gone, so the source segment is **deleted normally**.
`excludeRetainedSuffixes` removes the retained segment suffixes from the
delete candidate list at the end of migration.
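
The filtering step can be pictured as below. This is only a sketch of the idea behind `excludeRetainedSuffixes`; the function name `excludeRetained`, its signature, and the set type are assumptions and may differ from the PR.

```go
package main

// excludeRetained (hypothetical signature) returns the delete candidates
// minus every suffix that must be retained, preserving the original order.
// Segments skipped for a series-index gap go into retained; segments that
// only produced orphans do not, so they are deleted normally.
func excludeRetained(candidates []string, retained map[string]struct{}) []string {
	kept := make([]string, 0, len(candidates))
	for _, suffix := range candidates {
		if _, ok := retained[suffix]; ok {
			continue
		}
		kept = append(kept, suffix)
	}
	return kept
}
```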
### Reporting
Orphan handling is **expected behavior, not a migration error**. Instead of
polluting the report's `errors` buckets (and inflating part-error counts), the
migration report now carries a dedicated `orphans` section:
```json
"orphans": {
"policy": "archive",
"measure": { "sw_metricsHour": { "meter_..._hour": 1234 } },
"stream": { "<group>": { "<stream>": 56 } }
}
```
Counts are tracked per deleted subject, persisted in progress, and
accumulate across resume cycles.
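
One way to picture the per-subject tallies and the report shape is the sketch below. The type and method names are hypothetical (the PR's own aggregation goes through `Progress.AddOrphanRows`), and only the JSON keys are taken from the example above.

```go
package main

import "encoding/json"

// orphanCounts maps group -> deleted subject -> archived or discarded rows.
type orphanCounts map[string]map[string]uint64

// add accumulates rows for one subject; calling it again after a resume
// keeps adding to the persisted total rather than replacing it.
func (c orphanCounts) add(group, subject string, rows uint64) {
	if c[group] == nil {
		c[group] = map[string]uint64{}
	}
	c[group][subject] += rows
}

// orphanReport mirrors the "orphans" section of the migration report.
type orphanReport struct {
	Policy  string       `json:"policy"`
	Measure orphanCounts `json:"measure,omitempty"`
	Stream  orphanCounts `json:"stream,omitempty"`
}

// marshalOrphans renders the section as compact JSON.
func marshalOrphans(r orphanReport) (string, error) {
	b, err := json.Marshal(r)
	return string(b), err
}
```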
### Safety
- A failed archive write under the `archive` policy is **fatal**: the part
aborts, the source segment is retained, and resume retries the whole part. An
orphan row is never silently dropped due to an archive I/O error.
- **Trace is not covered**: a trace group has a single schema, so a deleted
trace schema is a whole-group concern rather than a per-series orphan within a
surviving group.
## Configuration
| Flag | Description | Default |
| --- | --- | --- |
| `--migration-orphan-policy` | What to do with rows whose schema was deleted from the registry: `archive` or `discard` | `archive` |
| `--migration-orphan-archive-subdir` | Relative subdirectory, under each catalog's root path, where orphan rows are archived when policy is `archive` | `archive` |
## Reading the archive
The `manifest.json` files are plain text. The per-part data is
gzip-compressed JSON Lines:
```bash
# inspect one part's rows (gunzip -c; alternatively zcat on Linux, gzcat on macOS)
gunzip -c \
  <measure-root-path>/archive/<group>/seg-20260601/shard-0/part-000000000000003b.jsonl.gz \
  | jq .
# what was archived for a segment (counts per deleted subject)
jq '{total_rows, total_series, measures:[.measures[]|{measure,rows}]}' \
  .../seg-20260601/manifest.json
```
## Testing
**Unit tests** (`banyand/backup/lifecycle`):
- Archive write + gzip round-trip + manifest tallies (rows, distinct-series
union across parts).
- Idempotent resume (re-replay rewrites the part file and does not
double-count).
- `discard` policy writes nothing to disk.
- Archive write/open failure is fatal and retains the source segment.
- Orphan vs sidx-gap counter separation; per-subject orphan counts.
- **Transient registry error is NOT classified as orphan** (stream) — guards
the data-loss boundary.
- Report `orphans` section shape and `Progress.AddOrphanRows` aggregation.
- Path layout (`<root>/<group>/seg-.../...`, no catalog level).
**Distributed e2e** (`test/cases/lifecycle/orphan.go`, measure + stream):
Runs the real lifecycle command with the archive policy on the distributed
lifecycle cluster. Each spec creates a group with two resources (one to delete,
one to keep), writes rows that straddle the segment boundary, deletes one
schema, runs the migration, then asserts:
1. every archived record is correct (measure carries `fields`; stream
carries `element_id` and no fields), and the manifest row count matches;
2. the orphan source segment is deleted;
3. the kept resource migrated and is queryable on the warm stage.
A pre-migration registry-settle gate (poll until the kept resource is
resolvable and the deleted one is gone) avoids a flaky race where a
not-yet-propagated kept schema could be misread as orphan.
## Limitations and operations notes
- The archive directory is **not** cleaned up automatically; prune it once
the data is no longer needed.
- [ ] If this pull request closes/resolves/fixes an existing issue, replace the issue number. Fixes apache/skywalking#<issue number>.
- [ ] Update the [`CHANGES` log](https://github.com/apache/skywalking-banyandb/blob/main/CHANGES.md).