[jira] [Commented] (HDDS-14967) S3 Vector Support

Chu Cheng Li (Jira) Fri, 08 May 2026 10:21:52 -0700


    [ 
https://issues.apache.org/jira/browse/HDDS-14967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18079554#comment-18079554
 ]


Chu Cheng Li commented on HDDS-14967:
-------------------------------------

Ceph is also developing S3 vector support, but it appears to be built on top
of the Lance format or LanceDB.

https://github.com/ceph/ceph/pull/66066

> S3 Vector Support
> -----------------
>
>                 Key: HDDS-14967
>                 URL: https://issues.apache.org/jira/browse/HDDS-14967
>             Project: Apache Ozone
>          Issue Type: New Feature
>          Components: OM, s3gateway, SCM
>            Reporter: Chu Cheng Li
>            Assignee: Chu Cheng Li
>            Priority: Major
>
> h2. Background
> Amazon S3 Vectors introduces a new S3 API family for vector buckets,
> vector indexes, vector upserts, deletes, listing, and approximate nearest
> neighbor queries. AWS positions it as a low-cost, durable vector storage
> service with strong write consistency and sub-second query latency for
> infrequent queries.
> That API surface is a good match for Apache Ozone:
>  - Ozone already has a stateless `s3gateway` tier that can scale out
> horizontally.
>  - Ozone Manager (OM) already provides a strongly consistent metadata plane
> backed by Raft and RocksDB.
>  - SCM and DataNodes already provide a durable distributed block layer for
> large immutable artifacts.
> The public architecture discussions around object-storage-native vector
> systems are also instructive:
>  - AWS documents S3 Vectors as strongly consistent and built around vector
> buckets, vector indexes, float32 vectors, `cosine` / `euclidean`
> distance, and metadata filtering.
>  - turbopuffer documents a stateless compute layer, an object-storage WAL,
> NVMe / memory cache, and a split between recent unindexed data and
> asynchronously indexed data.
>  - The [SPFresh paper|https://arxiv.org/abs/2410.14452] shows that
> incremental updates on a centroid-based ANN index can avoid full
> rebuilds.
>  - The OpenData Vector RFCs describe a practical SPANN-style storage model
> with centroids in memory, posting lists on disk, exact metadata indexes,
> delete bitmaps, and background split / merge / reassign maintenance.
> Ozone can combine these ideas in a way that fits its own architecture:
>  - Keep the online compute, cache, query planning, and SPFresh build work in
> s3gateway.
>  - Use OM only for coordination, durability, and cross-gateway visibility.
>  - Use SCM and DataNodes only for the immutable flushed and compacted
> vector-storage artifacts.
> This should let Ozone provide an S3 Vectors-compatible API while improving
> on two areas that are especially important for production systems:
>  - Stronger read-your-own-write and query-session visibility across multiple
> gateways.
>  - Exact metadata filtering over both recent inline data and flushed data.
> h2. Goals
>  - Support the core Amazon S3 Vectors resource model:
>  -- vector buckets
>  -- vector indexes
>  -- _*PutVectors*_
>  -- _*DeleteVectors*_
>  -- _*GetVectors*_
>  -- _*ListVectors*_
>  -- _*QueryVectors*_
>  - Keep the hot write and query path gateway-centric.
>  - Use OM RocksDB as the durable inline write layer for recent updates.
>  - Use OM Raft log and RocksDB WAL as the durability mechanism for the
> inline path.
>  - Flush and compact inline data into immutable artifacts stored on Ozone's
> distributed block layer.
>  - Use SPFresh as the on-disk ANN index for flushed data.
>  - Support memory and local-NVMe cache in `s3gateway`.
>  - Support query-session visibility across multiple gateways without
> requiring gateway affinity.
>  - Support union reads from:
>  -- visible inline data in OM
>  -- visible flushed data in the distributed block layer
>  - Preserve strong default semantics for write visibility and listing.
>  - Leave room for an eventual-consistency mode as an Ozone extension for
> lower-latency warm queries.
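
The union-read goal above can be illustrated with a small sketch. It assumes inline entries shadow flushed entries for the same key and that a delete is recorded inline as a tombstone (`None`). The function names, the dict-based stores, and the brute-force scan are all hypothetical stand-ins; the flushed side would really be served by the ANN index, not a full scan.

```python
import heapq
import math

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def union_query(query, inline, flushed, top_k):
    """Merge recent inline data with flushed data and return the top_k keys.

    inline:  dict key -> vector, or None for a delete tombstone (assumption).
    flushed: dict key -> vector, a stand-in for ANN candidates (assumption).
    """
    candidates = {}
    for key, vec in flushed.items():
        if key not in inline:      # any inline entry shadows the flushed one
            candidates[key] = vec
    for key, vec in inline.items():
        if vec is not None:        # tombstones hide the key entirely
            candidates[key] = vec
    scored = ((euclidean(query, vec), key) for key, vec in candidates.items())
    return [key for _, key in heapq.nsmallest(top_k, scored)]

inline = {"a": [0.0, 0.0], "b": None}
flushed = {"a": [10.0, 10.0], "b": [0.1, 0.0], "c": [1.0, 0.0]}
result = union_query([0.0, 0.0], inline, flushed, top_k=2)
```

Here the newer inline value of `a` wins over the flushed one, and `b` is excluded because of its tombstone.
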
> h2. Non-Goals
>  - DataNode-native vector indexing or vector-aware SCM scheduling in the
> first phase.
>  - Hybrid BM25 + vector search in the first phase.
>  - Cross-index transactions.
>  - Full server-side SQL-style query planning.
>  - Rebuilding SCM or DN storage formats specifically for vectors in the
> first phase.
>  - Requiring a dedicated indexing service outside of `s3gateway`.
> h2. Use Cases
> h3. AWS S3 Vectors Compatibility
>  - Use the `s3vectors` API family with SigV4 authentication.
>  - Create vector buckets and indexes with the same high-level semantics as
> AWS.
>  - Upsert and delete vectors using float32 embeddings.
>  - Query vectors by ANN search with filterable and non-filterable metadata.
> h3. Read-Your-Own-Write Across Multiple Gateways
>  - A client writes vectors through gateway A.
>  - A follow-up query lands on gateway B.
>  - The query must still see the write without waiting for background flush.
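
One possible way to get this behavior, sketched under assumptions: the write returns a commit index from the metadata plane, and any gateway that later serves the client catches up to at least that index before answering. `MetadataStore`, `Gateway`, and the `min_index` parameter are hypothetical names for illustration, not part of the proposal or of any Ozone API.

```python
class MetadataStore:
    """Stand-in for OM: an ordered, strongly consistent log of writes."""

    def __init__(self):
        self.log = []  # list of (key, vector)

    def commit(self, key, vector):
        self.log.append((key, vector))
        return len(self.log)  # commit index of this write

    def read_since(self, index):
        return self.log[index:]


class Gateway:
    """Stateless gateway with a local view that may lag the store."""

    def __init__(self, store):
        self.store = store
        self.applied = 0  # number of log entries applied locally
        self.view = {}

    def put(self, key, vector):
        # The returned commit index acts as a visibility token for the client.
        return self.store.commit(key, vector)

    def catch_up(self, min_index):
        if self.applied < min_index:
            for key, vector in self.store.read_since(self.applied):
                self.view[key] = vector
                self.applied += 1

    def get(self, key, min_index=0):
        self.catch_up(min_index)
        return self.view.get(key)


store = MetadataStore()
gateway_a, gateway_b = Gateway(store), Gateway(store)
token = gateway_a.put("v1", [1.0, 2.0])
stale = gateway_b.get("v1")          # no token: B may not have caught up
fresh = gateway_b.get("v1", token)   # token forces B to catch up first
```
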
> h3. Repeatable Query Sessions
>  - A client performs paginated `ListVectors`.
>  - A client performs multiple `QueryVectors` calls in one logical search
> session.
>  - Each call should be able to reuse a stable snapshot token so that the
> visible state does not move underneath the client.
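
A minimal sketch of a stable snapshot token for paginated listing, assuming every write bumps a version counter and the token pins both the snapshot version and the last key returned. `VersionedIndex` and the `(snapshot, last_key)` token layout are hypothetical; a real token would be opaque to the client.

```python
class VersionedIndex:
    """Append-only history of puts and deletes, each tagged with a version."""

    def __init__(self):
        self.version = 0
        self.history = []  # (version, key, present)

    def put(self, key):
        self.version += 1
        self.history.append((self.version, key, True))

    def delete(self, key):
        self.version += 1
        self.history.append((self.version, key, False))

    def keys_at(self, snapshot):
        """Sorted keys that were live as of the given snapshot version."""
        live = {}
        for version, key, present in self.history:
            if version <= snapshot:
                live[key] = present
        return sorted(k for k, present in live.items() if present)

    def list_vectors(self, max_results, token=None):
        # A missing token starts a new session pinned to the current version.
        snapshot, after = token if token else (self.version, "")
        keys = [k for k in self.keys_at(snapshot) if k > after]
        page = keys[:max_results]
        next_token = (snapshot, page[-1]) if len(keys) > max_results else None
        return page, next_token


idx = VersionedIndex()
for key in ("a", "b", "c"):
    idx.put(key)
page1, token = idx.list_vectors(2)
idx.put("d")      # concurrent writes after the session started
idx.delete("c")
page2, end = idx.list_vectors(2, token)
latest, _ = idx.list_vectors(10)
```

The second page still returns `c` and omits `d`, because the session is pinned to the snapshot taken on the first call, while a fresh listing sees the newer state.
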
> h3. High-Rate Ingest Without DN Allocation on Every Write
>  - Small and medium upserts should become durable after OM Raft commit,
> without paying block allocation and DN replication on each request.
>  - Background flush should amortize the cost of block-layer writes.
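
The amortization idea can be sketched as a buffer that accepts each upsert individually and only produces an immutable artifact once a threshold is reached. `InlineBuffer` and `flush_threshold` are hypothetical; the sketch triggers the flush synchronously for simplicity, whereas the proposal describes a background flush, and it ignores Raft commit entirely.

```python
class InlineBuffer:
    """Collects recent upserts and flushes them in batches."""

    def __init__(self, flush_threshold):
        self.flush_threshold = flush_threshold
        self.pending = {}    # recent inline entries, keyed by vector key
        self.segments = []   # immutable flushed artifacts

    def upsert(self, key, vector):
        self.pending[key] = vector
        if len(self.pending) >= self.flush_threshold:
            self.flush()

    def flush(self):
        if self.pending:
            # One block-layer write covers the whole batch.
            self.segments.append(tuple(sorted(self.pending.items())))
            self.pending = {}


buf = InlineBuffer(flush_threshold=3)
for i in range(7):
    buf.upsert(f"v{i}", [float(i)])
```

Seven upserts here result in two flushed segments and one entry still pending, rather than seven separate block-layer writes.
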
> h3. Exact Metadata Filtering
>  - Filterable metadata should be indexed exactly.
>  - Queries should not rely solely on post-filtering.
>  - Filters should work across both recent inline data and flushed data.
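
To show why post-filtering alone is not enough, here is a small sketch comparing it with pre-filtering through an exact inverted index over metadata. All function names and the toy data are hypothetical, and the brute-force ranking stands in for an ANN search.

```python
from collections import defaultdict

def build_inverted_index(metadata):
    """Map each (field, value) pair to the set of keys that carry it."""
    index = defaultdict(set)
    for key, fields in metadata.items():
        for field, value in fields.items():
            index[(field, value)].add(key)
    return index

def distance(a, b):
    """Squared Euclidean distance; enough for ranking."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def prefiltered_query(query, vectors, index, field, value, top_k):
    # Restrict the candidate set first, then rank only matching vectors.
    allowed = index.get((field, value), set())
    ranked = sorted(allowed, key=lambda k: (distance(query, vectors[k]), k))
    return ranked[:top_k]

def postfiltered_query(query, vectors, metadata, field, value, top_k):
    # Rank everything, keep top_k, then drop non-matching results.
    ranked = sorted(vectors, key=lambda k: (distance(query, vectors[k]), k))
    return [k for k in ranked[:top_k] if metadata[k].get(field) == value]

vectors = {"a": [0.0], "b": [1.0], "c": [2.0], "d": [3.0]}
metadata = {
    "a": {"lang": "en"}, "b": {"lang": "en"},
    "c": {"lang": "fr"}, "d": {"lang": "fr"},
}
index = build_inverted_index(metadata)
pre = prefiltered_query([0.0], vectors, index, "lang", "fr", top_k=2)
post = postfiltered_query([0.0], vectors, metadata, "lang", "fr", top_k=2)
```

With this data the pre-filtered query returns both matching vectors, while the post-filtered query returns nothing because the two nearest neighbors fail the filter.
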
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

