[https://issues.apache.org/jira/browse/CALCITE-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
wgcn007 updated CALCITE-7353:
-----------------------------
Description:
h2. Background
The exponential growth of AI and large-model applications has driven a surge in
demand for vector similarity search. Databases such as PostgreSQL, Redis, Doris
and Elasticsearch have already added vector retrieval support. Milvus, a
high-performance, cloud-native vector database designed for scalable
Approximate Nearest Neighbor (ANN) search, has been widely adopted. The goal
of this issue is to make Milvus more accessible by providing a full SQL
abstraction layer over it in the form of a Calcite adapter.
h2. Implementation
I have completed a demo implementation, validated for feasibility, in
[my repository|https://github.com/wg1026688210/calcite/commits/dev/]. It is a
Calcite adapter with operator push-down, so that computation executes on the
Milvus side wherever possible. Key capabilities include:
* Vector Search Push-down:
** MilvusVectorSearchRule matches Sort→Project→[Filter]→Scan patterns. When the
ORDER BY key is a vector distance function, it fuses the whole query
(filtering, projection, vector search, sorting and LIMIT) into a single
MilvusVectorSearch operator that Milvus executes. Search parameters can be
supplied through an SQL hint:
SELECT book_name, l2_distance(VectorFieldAutoTest, '[0.1, 0.2, 0.3, 0.4]') AS d
FROM milvus.test_vector_search /*+ MILVUS_OPTIONS(nprobe='100000') */
WHERE book_name <> '小王子'
ORDER BY d
LIMIT 3
* Predicate Push-down:
Converts scalar field comparison operators (=, <>, >, <, >=, <=, LIKE) and
logical operators (AND, OR, NOT) into Milvus expression strings for server-side
filtering:
SELECT * FROM milvus.vector_table WHERE id > 1
* Projection Push-down:
Pushes down projections of constants and column references, so that only the
fields a query needs are transferred:
SELECT book_name, 'content' FROM milvus.vector_table
* Vector UDF Fallback: Adds the distance UDFs L2_DISTANCE(), COSINE_DISTANCE()
and INNER_PRODUCT(). When a query contains operators that are not pushed down
(JOIN/UNION/AGG) or otherwise cannot be pushed down (unsupported UDFs,
least-similar vector queries), vector computation falls back to in-memory
execution, so such queries still run correctly. Example:
SELECT book_name, l2_distance(vector_field, '[...]') d
FROM milvus.vector_table
ORDER BY d DESC LIMIT 5 -- Find the least similar top N
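The vector search push-down described above hinges on two checks: the shape of the plan, and whether the ORDER BY direction asks for nearest neighbours. The following is not the code in the linked repository; it is a self-contained sketch under stated assumptions. Plans are modelled as a top-down list of operator names rather than Calcite RelNodes, and the class VectorSearchPushDownSketch with its methods canPushDown and searchParamsJson is hypothetical. The direction rule (ASC for L2, DESC for IP/COSINE) is taken from the description; turning MILVUS_OPTIONS hint values into a JSON parameter string is an assumption about how hints might be forwarded.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of the decision described for MilvusVectorSearchRule.
// A plan is modelled as a top-down list of operator names, not as Calcite RelNodes.
final class VectorSearchPushDownSketch {
  enum Metric { L2, IP, COSINE }

  private static final List<String> WITH_FILTER = Arrays.asList("Sort", "Project", "Filter", "Scan");
  private static final List<String> WITHOUT_FILTER = Arrays.asList("Sort", "Project", "Scan");

  // Metric behind a distance UDF, or null if ORDER BY is not on a vector distance.
  static Metric metricOf(String udfName) {
    switch (udfName.toUpperCase(Locale.ROOT)) {
      case "L2_DISTANCE":
        return Metric.L2;
      case "INNER_PRODUCT":
        return Metric.IP;
      case "COSINE_DISTANCE":
        return Metric.COSINE;
      default:
        return null;
    }
  }

  // Push down only Sort -> Project -> [Filter] -> Scan plans whose ORDER BY asks for the
  // nearest neighbours: ASC for L2, DESC for IP and COSINE. Anything else (for example
  // L2 ... DESC, a least-similar query) is left to in-memory execution.
  static boolean canPushDown(List<String> planTopDown, String orderByFunction, boolean descending) {
    if (!planTopDown.equals(WITH_FILTER) && !planTopDown.equals(WITHOUT_FILTER)) {
      return false;
    }
    Metric metric = metricOf(orderByFunction);
    if (metric == null) {
      return false;
    }
    return metric == Metric.L2 ? !descending : descending;
  }

  // Turns hint options such as MILVUS_OPTIONS(nprobe='100000') into a JSON search-parameter
  // string. Plain integers/decimals are emitted unquoted; keys and values are assumed to need
  // no JSON escaping.
  static String searchParamsJson(Map<String, String> hintOptions) {
    StringBuilder json = new StringBuilder("{");
    for (Map.Entry<String, String> e : hintOptions.entrySet()) {
      if (json.length() > 1) {
        json.append(',');
      }
      json.append('"').append(e.getKey()).append("\":");
      String value = e.getValue();
      json.append(value.matches("-?\\d+(\\.\\d+)?") ? value : "\"" + value + "\"");
    }
    return json.append('}').toString();
  }
}
```

Under this rule the least-similar example (`l2_distance(...) ... ORDER BY d DESC`) is rejected and evaluated in memory, which is the fallback behaviour described for the Vector UDF bullet.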
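For the predicate push-down bullet above, the translation from SQL conditions to a Milvus expression string can be pictured as a recursive walk over the predicate tree. This is a minimal sketch assuming a toy predicate tree (the hypothetical Cmp/And/Or/Not classes in MilvusFilterSketch) instead of Calcite's RexNode, which the real MilvusFilterRule would presumably operate on. It maps `=` to `==`, `<>` to `!=`, LIKE to `like` and AND/OR/NOT to `and`/`or`/`not`, which are Milvus boolean-expression operators; the string-quoting scheme is an assumption.

```java
import java.util.Locale;

// Hypothetical sketch of predicate push-down: a tiny SQL predicate tree (not Calcite's RexNode)
// rendered as a Milvus boolean expression string.
final class MilvusFilterSketch {
  interface Pred {}

  // column <op> literal, where op is one of =, <>, >, <, >=, <=, LIKE
  static final class Cmp implements Pred {
    final String op;
    final String field;
    final Object value;

    Cmp(String op, String field, Object value) {
      this.op = op;
      this.field = field;
      this.value = value;
    }
  }

  static final class And implements Pred {
    final Pred left;
    final Pred right;

    And(Pred left, Pred right) {
      this.left = left;
      this.right = right;
    }
  }

  static final class Or implements Pred {
    final Pred left;
    final Pred right;

    Or(Pred left, Pred right) {
      this.left = left;
      this.right = right;
    }
  }

  static final class Not implements Pred {
    final Pred operand;

    Not(Pred operand) {
      this.operand = operand;
    }
  }

  // Recursively renders the tree; composite nodes are parenthesised to keep precedence explicit.
  static String toMilvus(Pred p) {
    if (p instanceof Cmp) {
      Cmp c = (Cmp) p;
      return c.field + " " + operator(c.op) + " " + literal(c.value);
    } else if (p instanceof And) {
      And a = (And) p;
      return "(" + toMilvus(a.left) + " and " + toMilvus(a.right) + ")";
    } else if (p instanceof Or) {
      Or o = (Or) p;
      return "(" + toMilvus(o.left) + " or " + toMilvus(o.right) + ")";
    } else if (p instanceof Not) {
      return "not (" + toMilvus(((Not) p).operand) + ")";
    }
    throw new UnsupportedOperationException("cannot push down: " + p);
  }

  // SQL comparison operator -> Milvus operator; anything else is not pushed down.
  private static String operator(String sqlOp) {
    switch (sqlOp.toUpperCase(Locale.ROOT)) {
      case "=":
        return "==";
      case "<>":
        return "!=";
      case ">":
      case "<":
      case ">=":
      case "<=":
        return sqlOp;
      case "LIKE":
        return "like";
      default:
        throw new UnsupportedOperationException("operator not pushed down: " + sqlOp);
    }
  }

  // Strings become double-quoted literals with \ and " escaped; numbers and booleans print as-is.
  private static String literal(Object v) {
    if (v instanceof String) {
      String s = ((String) v).replace("\\", "\\\\").replace("\"", "\\\"");
      return "\"" + s + "\"";
    }
    return String.valueOf(v);
  }
}
```

Throwing on an unknown operator is one simple way to signal that a condition must stay in a Calcite-side filter instead of being pushed down.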
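Projection push-down, as in the `SELECT book_name, 'content' FROM milvus.vector_table` example above, amounts to requesting only the referenced columns from Milvus. A minimal sketch, assuming constants are filled in on the Calcite side rather than sent to Milvus and that a fetched entity is a map from field name to value; ProjectionPushDownSketch, Item, outputFields and project are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of projection push-down for column references and constants.
final class ProjectionPushDownSketch {
  // One SELECT-list item: a column reference (column != null) or a constant.
  static final class Item {
    final String column;
    final Object constant;

    private Item(String column, Object constant) {
      this.column = column;
      this.constant = constant;
    }

    static Item ofColumn(String name) {
      return new Item(name, null);
    }

    static Item ofConstant(Object value) {
      return new Item(null, value);
    }
  }

  // Fields to request from Milvus: each referenced column once, in SELECT order.
  static List<String> outputFields(List<Item> items) {
    LinkedHashSet<String> fields = new LinkedHashSet<>();
    for (Item item : items) {
      if (item.column != null) {
        fields.add(item.column);
      }
    }
    return new ArrayList<>(fields);
  }

  // Builds one output row from a fetched entity, filling in constants locally.
  static Object[] project(List<Item> items, Map<String, Object> entity) {
    Object[] row = new Object[items.size()];
    for (int i = 0; i < items.size(); i++) {
      Item item = items.get(i);
      row[i] = item.column != null ? entity.get(item.column) : item.constant;
    }
    return row;
  }
}
```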
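The in-memory fallback described in the Vector UDF bullet needs plain implementations of the three distance functions. This sketch assumes the query vector arrives as a string literal like '[0.1, 0.2, 0.3, 0.4]' (as in the examples) and that stored vectors are available as double[]; VectorUdfSketch and its methods are hypothetical. Two further assumptions: l2Distance takes the square root (dropping it would not change ORDER BY results), and COSINE_DISTANCE() is treated as cosine similarity, where larger means more similar, since that is what the DESC direction for COSINE in the push-down rule implies. leastSimilarByL2 mirrors the least-similar top-N example being answered without Milvus.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of in-memory vector distance UDFs used when push-down is not possible.
final class VectorUdfSketch {
  // Parses a vector literal such as '[0.1, 0.2, 0.3, 0.4]'.
  static double[] parseVector(String literal) {
    String body = literal.trim();
    if (!body.startsWith("[") || !body.endsWith("]")) {
      throw new IllegalArgumentException("not a vector literal: " + literal);
    }
    body = body.substring(1, body.length() - 1).trim();
    if (body.isEmpty()) {
      return new double[0];
    }
    String[] parts = body.split(",");
    double[] v = new double[parts.length];
    for (int i = 0; i < parts.length; i++) {
      v[i] = Double.parseDouble(parts[i].trim());
    }
    return v;
  }

  private static void checkDims(double[] a, double[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("dimension mismatch: " + a.length + " vs " + b.length);
    }
  }

  // Euclidean distance; smaller means more similar.
  static double l2Distance(double[] a, double[] b) {
    checkDims(a, b);
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  // Inner product; larger means more similar.
  static double innerProduct(double[] a, double[] b) {
    checkDims(a, b);
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  // Cosine similarity; larger means more similar (assumed semantics of COSINE_DISTANCE()).
  static double cosineSimilarity(double[] a, double[] b) {
    double normA = Math.sqrt(innerProduct(a, a));
    double normB = Math.sqrt(innerProduct(b, b));
    if (normA == 0 || normB == 0) {
      throw new IllegalArgumentException("cosine is undefined for a zero vector");
    }
    return innerProduct(a, b) / (normA * normB);
  }

  // In-memory fallback for ORDER BY l2_distance(...) DESC LIMIT n: keys of the n least similar rows.
  static List<String> leastSimilarByL2(Map<String, double[]> rows, double[] query, int n) {
    return rows.entrySet().stream()
        .sorted(Comparator.comparingDouble(
            (Map.Entry<String, double[]> e) -> l2Distance(e.getValue(), query)).reversed())
        .limit(n)
        .map(Map.Entry::getKey)
        .collect(Collectors.toList());
  }
}
```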
Core Components:
* Milvus Metadata Layer:
** MilvusSchema: Manages all collections in a Milvus database and discovers them
automatically
** MilvusTranslatableTable (Core): Table abstraction that bridges Milvus
collections and Calcite tables; provides field metadata and type mapping, and
creates MilvusTableScan nodes in toRel()
* Milvus SQL Operators & Rules Layer:
** MilvusTableScan: Scans Milvus data and stores SQL hints
** MilvusFilter / MilvusFilterRule: Implement predicate push-down by converting
SQL conditions into Milvus expressions
** MilvusProject / MilvusProjectRule: Push down projections of constants and
column references
** MilvusVectorSearch / MilvusVectorSearchRule (Core Optimization Rule):
Identify vector query patterns, check that the sort direction matches the
distance type (ASC for L2, DESC for IP/COSINE), and fuse the entire query into
a single operator for push-down
** MilvusToEnumerableConverterRule / MilvusToEnumerableConverter: Code
generation layer that converts Milvus physical operators into executable Java
code, generating table.vectorSearch() or table.scan() calls
** MilvusRel: Defines the Milvus adapter calling convention, the common
interface for all Milvus operators
* Milvus Query Execution Layer:
** MilvusSearchEnumerator / MilvusQueryEnumerator: Correspond to the Search
(vector retrieval) and Query (paginated scanning) operations in Milvus,
respectively.
* Vector UDF Layer:
** MilvusOperatorTable / MilvusVectorFunction / MilvusVectorUdfs: Implement the
registration, declaration and computation logic of the L2/Cosine/IP distance
functions
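MilvusQueryEnumerator is described as wrapping the Milvus Query operation with paginated scanning. One way to picture this is an iterator that requests fixed-size pages by offset and limit until a short page signals the end. This is a sketch only: PageFetcher stands in for the actual Milvus SDK call, PaginatedScanSketch and PagedIterator are hypothetical names, and a real Calcite enumerator would implement org.apache.calcite.linq4j.Enumerator rather than java.util.Iterator.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch of offset/limit paginated scanning.
final class PaginatedScanSketch {
  // Hypothetical stand-in for a Milvus Query call that accepts an offset and a limit.
  interface PageFetcher<T> {
    List<T> fetch(long offset, int limit);
  }

  // Iterates over all rows, fetching fixed-size pages lazily.
  static final class PagedIterator<T> implements Iterator<T> {
    private final PageFetcher<T> fetcher;
    private final int pageSize;
    private long offset = 0;
    private List<T> page = Collections.emptyList();
    private int pos = 0;
    private boolean exhausted = false;

    PagedIterator(PageFetcher<T> fetcher, int pageSize) {
      if (pageSize <= 0) {
        throw new IllegalArgumentException("pageSize must be positive");
      }
      this.fetcher = fetcher;
      this.pageSize = pageSize;
    }

    @Override
    public boolean hasNext() {
      // Fetch the next page only when the current one is used up.
      while (pos >= page.size()) {
        if (exhausted) {
          return false;
        }
        page = fetcher.fetch(offset, pageSize);
        pos = 0;
        offset += page.size();
        // A page shorter than requested means there is nothing left to fetch.
        exhausted = page.size() < pageSize;
      }
      return true;
    }

    @Override
    public T next() {
      if (!hasNext()) {
        throw new NoSuchElementException();
      }
      return page.get(pos++);
    }
  }
}
```

Stopping at the first page shorter than pageSize avoids an extra round trip in most cases; when the row count is an exact multiple of pageSize, one final empty page is fetched.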
h2. Use Cases
* Building a Milvus SQL Gateway Service: Provides a standard SQL interface for
Milvus data read/write and collection management, improving usability and
compatibility with existing SQL ecosystem tools.
* Integration with an Internally Developed Multimodal Compute Engine: Serves as
the vector search execution engine inside internal compute platforms,
extending the engine's functionality and improving vector retrieval
performance.
> Support Milvus Calcite Adapter
> ---------------------------------
>
> Key: CALCITE-7353
> URL: https://issues.apache.org/jira/browse/CALCITE-7353
> Project: Calcite
> Issue Type: New Feature
> Reporter: wgcn007
> Priority: Major
> Fix For: 1.42.0
>