GitHub user chrevanthreddy created a discussion: RFC-104 - Add native Vector 
Search Index type for Hudi Vector Type

This RFC proposes native vector similarity search support in Apache Hudi, 
enabling approximate nearest neighbor (ANN) queries on embedding columns stored 
in Hudi tables. The design extends the new VECTOR type in the main data table. 
A lightweight cluster-routing index in the Hudi metadata table (MDT) provides 
file-group pruning, while future hidden columns in the main Parquet files may 
store RaBitQ binary codes and scalars for fast within-file ANN scanning. 
Cluster assignment itself stays in MDT.
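The routing and within-file scan described above can be sketched in Python as follows. This assumes an IVF-style layout in which each file group belongs to one cluster. The names `route_file_groups`, `cluster_to_file_groups`, `binary_code`, `coarse_rank`, and `nprobe` are hypothetical illustrations, not Hudi APIs or the RFC's actual design. The sign-bit code is also a simplified stand-in for RaBitQ, which additionally applies a random rotation and keeps per-vector scalars for its distance estimator.

```python
def l2_sq(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def route_file_groups(query, centroids, cluster_to_file_groups, nprobe):
    """Return file groups owned by the `nprobe` clusters nearest to `query`.

    centroids: cluster id -> centroid vector (hypothetical routing data in MDT)
    cluster_to_file_groups: cluster id -> file-group ids (hypothetical)
    """
    nearest = sorted(centroids, key=lambda cid: l2_sq(query, centroids[cid]))[:nprobe]
    selected = []
    for cid in nearest:
        selected.extend(cluster_to_file_groups.get(cid, []))
    return selected


def binary_code(vector, centroid):
    """Sign bits of the residual (vector - centroid), packed into an int.

    Simplification: real RaBitQ also applies a random rotation and keeps
    per-vector scalars for its distance estimator.
    """
    code = 0
    for i, (x, c) in enumerate(zip(vector, centroid)):
        if x >= c:
            code |= 1 << i
    return code


def hamming(a, b):
    """Number of differing bits between two packed codes."""
    return bin(a ^ b).count("1")


def coarse_rank(query, centroid, row_codes, top):
    """Cheap within-file pre-filter: rank rows by Hamming distance of codes."""
    q = binary_code(query, centroid)
    return sorted(row_codes, key=lambda rid: hamming(q, row_codes[rid]))[:top]


centroids = {0: [0.0, 0.0], 1: [10.0, 10.0], 2: [-10.0, 5.0]}
cluster_to_file_groups = {0: ["fg-00", "fg-01"], 1: ["fg-02"], 2: ["fg-03"]}
print(route_file_groups([9.0, 8.0], centroids, cluster_to_file_groups, nprobe=1))
# ['fg-02']

# Rows stored in file group fg-02, coded relative to centroid 1.
rows = {"r1": [11.0, 9.0], "r2": [8.0, 7.0], "r3": [12.0, 12.0]}
row_codes = {rid: binary_code(v, centroids[1]) for rid, v in rows.items()}
print(coarse_rank([9.0, 8.0], centroids[1], row_codes, top=1))
# ['r2']
```

Only the file groups returned by the routing step would be opened. Inside them, the cheap Hamming ranking narrows the candidates before any exact distance computation on the full VECTOR column.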
https://github.com/chrevanthreddy/hudi/blob/f67ef0972f9a8a2928a92cf8b61ad2a81ab3cd72/rfc/rfc-104/rfc-104.md

https://github.com/chrevanthreddy/hudi/pull/1/changes#diff-e8be007ae70221ad49cd11b305f4dbe6d1d09b344549a877555a7eb376d16f2d

Starting off the discussion for the Vector Search Index. Numbers from initial 
testing can be added here to verify the effectiveness of the index, along with 
the cost of adding and maintaining it compared with brute-force vector search.
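One way such numbers could be collected is sketched below in Python. The sketch assumes in-memory vectors partitioned by nearest centroid. It reports recall@k of a pruned, IVF-style search against the exact brute-force top-k, and uses the fraction of vectors scanned as a rough proxy for query cost. The names `ivf_search`, `build_partitions`, and `nprobe` are illustrative only and are not part of Hudi. Index build and maintenance cost (for example, re-assigning clusters on upserts) would need to be measured separately.

```python
import random


def l2_sq(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_topk(query, vectors, k):
    """Exact top-k: scans every vector (the brute-force baseline)."""
    return sorted(range(len(vectors)), key=lambda i: l2_sq(query, vectors[i]))[:k]


def build_partitions(vectors, centroids):
    """Assign each vector id to its nearest centroid (one-off build cost)."""
    parts = {c: [] for c in range(len(centroids))}
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2_sq(v, centroids[c]))
        parts[nearest].append(i)
    return parts


def ivf_search(query, vectors, centroids, parts, k, nprobe):
    """Pruned search over the nprobe nearest partitions.

    Returns (top-k ids, number of vectors scanned).
    """
    probe = sorted(range(len(centroids)), key=lambda c: l2_sq(query, centroids[c]))[:nprobe]
    candidates = [i for c in probe for i in parts[c]]
    top = sorted(candidates, key=lambda i: l2_sq(query, vectors[i]))[:k]
    return top, len(candidates)


def recall_at_k(approx, exact):
    """Fraction of the exact top-k that the approximate search also found."""
    return len(set(approx) & set(exact)) / len(exact)


random.seed(7)
dim, n, k = 8, 2000, 10
vectors = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
centroids = random.sample(vectors, 16)  # crude centroids; k-means is more typical
parts = build_partitions(vectors, centroids)

query = [random.gauss(0, 1) for _ in range(dim)]
exact = brute_force_topk(query, vectors, k)
for nprobe in (1, 4, 16):
    approx, scanned = ivf_search(query, vectors, centroids, parts, k, nprobe)
    print(f"nprobe={nprobe:2d} recall@{k}={recall_at_k(approx, exact):.2f} "
          f"scanned={scanned / n:.0%}")
```

Probing every partition degenerates to brute force (recall 1.0, 100% scanned). The interesting trade-off is how quickly recall approaches 1.0 as `nprobe` grows while the scanned fraction stays small.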

GitHub link: https://github.com/apache/hudi/discussions/18500
