[PR] Consolidate IVF and HNSW vector indexes into segment columns.psf (opt-in via storeInSegmentFile) [pinot]

via GitHub Thu, 25 Jun 2026 14:36:51 -0700


raghavyadav01 opened a new pull request, #18852:
URL: https://github.com/apache/pinot/pull/18852


   ## Motivation
   
   Today a vector index is written as a per-column sidecar in the segment directory: a flat
   `.vector.ivfflat.index` / `.vector.ivfpq.index` file for IVF, or a Lucene HNSW directory for
   HNSW (plus the lazily-built `.vector.hnsw.mapping` file). Every vector column adds extra
   inodes per segment, and on large clusters with many vector columns the file count balloons.
   
   The text index already solved this by packing its Lucene directory into 
`columns.psf` (the
   segment's single-file index buffer) via the V2→V3 converter. This PR adds 
the same pattern
   to IVF and HNSW vector indexes — opt-in via a new `storeInSegmentFile` flag 
on
   `VectorIndexConfig`, default `false` so behaviour is unchanged.
   
   End-to-end verification (against MinIO via an S3-tier offline build, same 
vector data
   indexed twice — once with the flag, once without):
   
   | Form | Files per segment on S3 | Cold-mmap query | HNSW prune (numDocsScanned/total) |
   |---|---|---|---|
   | Sidecar (`storeInSegmentFile=false`) | 9 | ~22 ms | 10 / 400 |
   | Consolidated (`storeInSegmentFile=true`) | 4 | ~6 ms | 10 / 400 |
   
   The 5 Lucene HNSW files (`_0.cfe`, `_0.cfs`, `_0.si`, `segments_1`, `write.lock`) and the
   mapping file are gone in the consolidated form; their bytes now live in a single typed entry
   inside `columns.psf` (verified via `index_map`:
   `embedding.vector_index.startOffset=48, size=12034`).
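
To illustrate what a typed entry in `columns.psf` means for a reader, here is a minimal sketch that looks up an entry's `startOffset` and `size` from `index_map`-style text and returns a view over just those bytes. The class `IndexMapSliceSketch` and its method are hypothetical and use plain `java.nio`. This is not Pinot's `SingleFileIndexDirectory` code, and it assumes the offset/size pair is stored as two `key = value` lines.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.ByteBuffer;
import java.util.Properties;

/** Hypothetical helper; illustrates offset/size slicing only. */
final class IndexMapSliceSketch {
  private IndexMapSliceSketch() {
  }

  /**
   * Returns a view over the bytes of one entry inside a single-file buffer.
   *
   * @param segmentFile  the whole single-file index buffer
   * @param indexMapText text holding "<entryKey>.startOffset" and "<entryKey>.size" lines
   * @param entryKey     for example "embedding.vector_index"
   */
  static ByteBuffer slice(ByteBuffer segmentFile, String indexMapText, String entryKey)
      throws IOException {
    Properties props = new Properties();
    props.load(new StringReader(indexMapText));
    int start = Integer.parseInt(props.getProperty(entryKey + ".startOffset").trim());
    int size = Integer.parseInt(props.getProperty(entryKey + ".size").trim());

    // Work on a duplicate so the caller's position and limit are left untouched.
    ByteBuffer view = segmentFile.duplicate();
    view.limit(start + size);
    view.position(start);
    // The slice shares the underlying bytes; nothing is copied.
    return view.slice();
  }
}
```

With the values quoted above, such a slice would cover 12034 bytes starting at offset 48 of the single-file buffer.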
   
   ## What changes
   
   ### IVF (flat file)
   
   - **Readers consume `PinotDataBuffer`.** `IvfFlatVectorIndexReader`, 
`IvfOnDiskVectorIndexReader`,
     and `IvfPqVectorIndexReader` now read from a `PinotDataBuffer` slice. New 
`IvfCombinedBuffers`
     helper opens the buffer for both the consolidated path (slice owned by the 
segment directory)
     and the legacy path (mmap of the sidecar). Readers honour an `ownsBuffer` 
flag so borrowed
     slices (`columns.psf` views) are not closed by the reader's `close()`.
   - **Build path.** `IvfFlatVectorIndexCreator` and `IvfPqVectorIndexCreator` write a transient
     `.combined.index` file when the flag is on. The V2→V3 converter picks it up via the standard
     `copyIndexIfExists` loop and packs it into `columns.psf`.
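
The `ownsBuffer` contract described for the readers can be sketched as follows. `SketchBuffer` and `SketchIvfReader` are hypothetical stand-ins written for this illustration. They are not `PinotDataBuffer` or the real IVF readers, and the actual constructors may look different.

```java
import java.io.Closeable;

/** Hypothetical stand-in for a closable data buffer. */
final class SketchBuffer implements Closeable {
  private boolean closed;

  @Override
  public void close() {
    closed = true;
  }

  boolean isClosed() {
    return closed;
  }
}

/** Hypothetical reader that closes its buffer only when it owns it. */
final class SketchIvfReader implements Closeable {
  private final SketchBuffer buffer;
  private final boolean ownsBuffer;

  SketchIvfReader(SketchBuffer buffer, boolean ownsBuffer) {
    this.buffer = buffer;
    this.ownsBuffer = ownsBuffer;
  }

  @Override
  public void close() {
    // A borrowed slice (for example a view into columns.psf) belongs to the
    // segment directory, so the reader leaves it open. An owned buffer (for
    // example an mmap of a sidecar file) is released here.
    if (ownsBuffer) {
      buffer.close();
    }
  }
}
```

In this sketch, closing a reader created with `ownsBuffer=false` leaves the buffer open for its real owner, while a reader created with `ownsBuffer=true` releases it.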
   
   ### HNSW (Lucene directory)
   
   - **Build-time consolidation** via new `HnswVectorIndexCombined`: packs the 
Lucene HNSW
     directory's files plus the docId mapping (when present) into a single
     `.vector.hnsw.combined.index` file using the same `LUCENE_V2` layout the 
text index uses.
     Reuses `LuceneCombinedTextIndexConstants` — no duplicate format definition.
   - **Read path** via new `HnswVectorIndexBufferReader`: parses the combined 
buffer and returns
     a Lucene `Directory` built on `PinotBufferLuceneDirectory` + 
`PinotBufferIndexInput` (reused
     from the text-index package — `KnnVectorsReader` uses the same `Directory` 
interface, so no
     HNSW-specific Directory adapter is needed).
   - `HnswVectorIndexReader` gains a `PinotDataBuffer`-based constructor. The 
`DocIdTranslator`
     reads its mapping from inside the combined buffer when available, or 
rebuilds it in heap
     when not.
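
As a rough illustration of the pack/unpack idea, the sketch below concatenates several named files into one buffer and reads them back. The byte layout here (a file count, then name length, name, data length, and data per file) is invented for the example. It is not the `LUCENE_V2` layout, and `CombinedFileSketch` is a hypothetical class rather than `HnswVectorIndexCombined`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical pack/unpack helper using an invented layout. */
final class CombinedFileSketch {
  private CombinedFileSketch() {
  }

  /** Packs files as: count, then (nameLength, name, dataLength, data) for each file. */
  static ByteBuffer pack(Map<String, byte[]> files) {
    int total = Integer.BYTES;
    for (Map.Entry<String, byte[]> e : files.entrySet()) {
      total += Integer.BYTES + e.getKey().getBytes(StandardCharsets.UTF_8).length
          + Integer.BYTES + e.getValue().length;
    }
    ByteBuffer out = ByteBuffer.allocate(total);
    out.putInt(files.size());
    for (Map.Entry<String, byte[]> e : files.entrySet()) {
      byte[] name = e.getKey().getBytes(StandardCharsets.UTF_8);
      out.putInt(name.length).put(name);
      out.putInt(e.getValue().length).put(e.getValue());
    }
    out.flip();
    return out;
  }

  /** Reads the files back in the order they were packed. */
  static Map<String, byte[]> unpack(ByteBuffer combined) {
    // Duplicate so the caller's position is not advanced by the reads below.
    ByteBuffer in = combined.duplicate();
    Map<String, byte[]> files = new LinkedHashMap<>();
    int count = in.getInt();
    for (int i = 0; i < count; i++) {
      byte[] name = new byte[in.getInt()];
      in.get(name);
      byte[] data = new byte[in.getInt()];
      in.get(data);
      files.put(new String(name, StandardCharsets.UTF_8), data);
    }
    return files;
  }
}
```

A round trip through `pack` and `unpack` returns the same file names and bytes, which is the property a pack/unpack round-trip test would check.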
   
   ### Shared
   
   - **Bidirectional migration** in `VectorIndexHandler`: forward (sidecar → 
`columns.psf`) and
     reverse (`columns.psf` → sidecar) on a flag toggle, without rebuilding the 
index. For HNSW
     the absorb step first packs the Lucene directory into a transient combined 
file; the extract
     step unpacks the combined entry back into a Lucene directory.
   - **Crash recovery.** `absorbCombinedIntoColumnsPsf` detects a 
previously-crashed absorb via
     `hasIndexFor` + size check on the typed entry — no exception-message 
matching. Size mismatch
     refuses with `IOException` rather than guessing which copy is 
authoritative.
   - **Orphan cleanup.** A crash with `storeInSegmentFile=true` followed by a 
retry with the
     flag off leaves a stale combined file + `.inprogress` marker; the next 
build sweeps both
     before starting. Symmetric in the other direction. Works for both IVF 
(flat orphan) and HNSW
     (legacy Lucene directory orphan, recursive delete).
   - **HNSW extract ordering.** The unpack into the Lucene directory runs before `removeIndex`.
     A crash mid-unpack leaves the typed entry in `columns.psf` and the next handler run retries,
     so nothing is permanently lost.
   - **Mixed-state convert.** If a V1/V2 segment somehow has both a legacy 
sidecar and a
     combined file for the same column, the V2→V3 converter packs the combined 
file and drops
     the legacy sibling — combined wins.
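
The crash-recovery rule for the absorb step can be read as a small decision function. The sketch below is an interpretation rather than the code in `absorbCombinedIntoColumnsPsf`. The class, enum, and parameter names are hypothetical, and treating a size match as "already absorbed, only cleanup remains" is an assumption about what the size check is used for.

```java
import java.io.IOException;

/** Hypothetical decision helper for retrying an absorb after a crash. */
final class AbsorbRecoverySketch {
  enum Action { ABSORB, CLEAN_UP_LEFTOVER }

  private AbsorbRecoverySketch() {
  }

  /**
   * @param hasTypedEntry    whether columns.psf already holds a vector entry for the column
   * @param entrySize        size of that entry in bytes (ignored when there is no entry)
   * @param combinedFileSize size of the transient combined file still on disk
   */
  static Action decide(boolean hasTypedEntry, long entrySize, long combinedFileSize)
      throws IOException {
    if (!hasTypedEntry) {
      // Nothing was absorbed yet: run the normal absorb.
      return Action.ABSORB;
    }
    if (entrySize == combinedFileSize) {
      // Assumed meaning: an earlier run copied the bytes and crashed before
      // deleting the transient file, so only cleanup is left.
      return Action.CLEAN_UP_LEFTOVER;
    }
    // The two copies disagree: refuse instead of guessing which one is authoritative.
    throw new IOException("Vector index entry size " + entrySize
        + " does not match combined file size " + combinedFileSize);
  }
}
```

Deciding from the presence and size of the entry, rather than from the text of an exception, keeps the retry logic independent of error-message wording.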
   
   ### SPI / utility plumbing
   
   - `V1Constants.Indexes` gains 
`VECTOR_IVF_FLAT_COMBINED_INDEX_FILE_EXTENSION`,
     `VECTOR_IVF_PQ_COMBINED_INDEX_FILE_EXTENSION`, 
`VECTOR_HNSW_COMBINED_INDEX_FILE_EXTENSION`.
   - `SegmentDirectoryPaths.findVectorIndexIndexFile` probes both legacy and 
combined
     extensions for IVF and HNSW. Legacy wins on tie (stable behaviour for 
unchanged segments).
   - `VectorIndexUtils.hasCombinedFormVectorIndex` is a new predicate the 
converter uses;
     `hasVectorIndex` deliberately excludes the combined form so the converter 
knows to pack it
     rather than preserve it as a sibling.
   - `SingleFileIndexDirectory` symmetry fixes in `hasIndexFor`, 
`getColumnsWithIndex`,
     `removeIndex` — vector entries are now treated uniformly whether they live 
as a sidecar or
     as a `_columnEntries` slot.
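
The "legacy wins on tie" probe order can be sketched with plain `java.nio.file` calls. `VectorIndexFileProbeSketch` is a hypothetical helper, not `SegmentDirectoryPaths.findVectorIndexIndexFile`. The two extensions are passed in as parameters because the exact combined extension strings are not spelled out here.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

/** Hypothetical probe that prefers the legacy sidecar over the combined file. */
final class VectorIndexFileProbeSketch {
  private VectorIndexFileProbeSketch() {
  }

  static Optional<Path> find(Path segmentDir, String column, String legacyExt,
      String combinedExt) {
    // Check the legacy sidecar first so unchanged segments resolve exactly as before.
    Path legacy = segmentDir.resolve(column + legacyExt);
    if (Files.exists(legacy)) {
      return Optional.of(legacy);
    }
    // Fall back to the combined file only when no legacy sidecar is present.
    Path combined = segmentDir.resolve(column + combinedExt);
    return Files.exists(combined) ? Optional.of(combined) : Optional.empty();
  }
}
```

When both files exist for a column, this sketch returns the legacy path. It returns the combined path only when the legacy sidecar is absent, and an empty result when neither exists.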
   
   ## Backward compatibility
   
   - Default behaviour unchanged. `storeInSegmentFile=false` preserves the 
legacy sidecar /
     Lucene-directory layout exactly. Existing segments load and serve 
identically.
   - No segment-format or wire-protocol break. The `.combined.index` extensions 
are build-time
     transient and never appear in a V3 segment after the converter runs.
   - Mixed-version safe: a server running this code reads a segment built 
without the flag
     exactly as before; the new code only fires when the flag is set in the 
table config.
   
   ## Tests
   
   - **132 tests green** (130 unit + 2 integration). New classes:
     - `IvfFlatConsolidatedVectorTest` — end-to-end integration: build with 
flag=true, load,
       run `vectorSimilarity` query, including a `MAP<STRING,STRING>` column to 
exercise the
       S3A read path.
     - `IvfDiskFormatInspectionTest` — byte-level header dumps validate creator 
output
       matches the documented IVF format unchanged in both extensions.
     - `IvfReaderFailurePathTest` — mmap-count tracking proves borrowed buffers 
are not
       closed by the reader (`ownsBuffer=false`) and that constructor failures 
still release
       owned buffers.
     - `HnswVectorIndexCombinedTest` — pack/unpack round-trip plus an 
end-to-end HNSW build
       read via the buffer-backed `Directory`.
     - `VectorIndexHandlerTest` — bidirectional migration for both IVF and HNSW,
       crash-recovery, size-mismatch refusal, 
extract-refusal-when-sidecar-exists, orphan
       sweep in both directions.
   - Spotless / checkstyle / license clean.
   
   ## Related
   
   The buffer-backed Lucene `Directory` plumbing (`PinotBufferLuceneDirectory`,
   `PinotBufferIndexInput`, `LuceneCombinedTextIndexConstants`) is reused as-is 
from the
   text-index implementation — this PR adds zero new code in those generic 
helpers.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

