Hi Shengkai,

thank you for proposing this FLIP. Also, thank you for considering my thoughts from FLIP-517, even though I haven't managed to finalize the discussion/voting yet.

It looks mostly good to me. However, I would like to discuss the semantics of the `on_time` parameter:

1) Proctime

I truly believe we should avoid the need for a `proctime` attribute. Teaching rowtime attributes to users is already painful enough; additionally teaching proctime would make it worse. For PTFs of FLIP-440, only rowtime attributes can be used in `f(on_time => ...)`, and we should do the same for future built-in PTFs. Omitting `on_time` can then simply mean processing time.

So users can naturally use the PTF with the mental model of LATERAL being a foreach loop in which each invocation happens instantly (in processing time).
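The foreach mental model can be made concrete with a toy Python sketch. It is not Flink code: `lateral_foreach` and `fake_search` are hypothetical names, and the lookup is a stand-in for a real vector index. It only shows that, without `on_time`, each input row triggers one immediate lookup and the output carries no time attribute.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

# A row is modeled as a plain dict of column name -> value.
Row = Dict[str, Any]


def lateral_foreach(rows: Iterable[Row],
                    search: Callable[[Row], List[Row]]) -> Iterator[Row]:
    """Toy model of a LATERAL call without `on_time`.

    Each input row triggers exactly one lookup that happens "now"
    (processing time). Every match is merged with its input row;
    no time attribute is added to the output.
    """
    for row in rows:
        for match in search(row):
            yield {**row, **match}


def fake_search(row: Row) -> List[Row]:
    # Hypothetical stand-in for a vector index lookup: two "neighbors" per query.
    qid = row["query_id"]
    return [{"doc_id": qid * 10}, {"doc_id": qid * 10 + 1}]


queries = [{"query_id": 1}, {"query_id": 2}]
results = list(lateral_foreach(queries, fake_search))
```

Each query row simply fans out into its matches, in input order, which is exactly what a user would expect from a loop.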

2) Rowtime

All PTFs should follow the SystemTypeInference:

https://github.com/apache/flink/blob/master/flink-table/flink-table-common/src/main/java/org/apache/flink/table/types/inference/SystemTypeInference.java#L239

It assumes that when an `on_time` parameter is passed, a `rowtime` column is appended to the result, which can be used in subsequent time-based operations. Can we add such a column to the output of VECTOR_SEARCH as well?
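A minimal sketch of the schema rule described above, written as plain Python rather than Flink's actual SystemTypeInference code. The function name `output_schema`, the example column names, and the choice to reuse the input time attribute's type for `rowtime` are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# A column is modeled as (name, SQL type string).
Column = Tuple[str, str]


def output_schema(result_columns: List[Column],
                  on_time: Optional[Column] = None) -> List[Column]:
    """Simplified model of the rule described above (not Flink's code).

    Without `on_time`, the output is just the function's result columns.
    With `on_time`, a `rowtime` column is appended so that downstream
    operators have a time attribute to work with. Reusing the input
    attribute's type is an assumption of this sketch.
    """
    schema = list(result_columns)
    if on_time is not None:
        _name, time_type = on_time
        schema.append(("rowtime", time_type))
    return schema


# Hypothetical VECTOR_SEARCH result columns, for illustration only.
search_result = [("doc_id", "BIGINT"), ("content", "STRING"), ("score", "DOUBLE")]

with_time = output_schema(search_result, on_time=("order_time", "TIMESTAMP_LTZ(3)"))
without_time = output_schema(search_result)
```

Without `on_time` the schema is unchanged; with it, a subsequent time-based operation would have a `rowtime` attribute to refer to.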

3) Naming

Just a general note, feel free to ignore: a function or operation should be named with a verb, not a noun, e.g., JOIN, SEARCH, SELECT. "Vector search" is a concept; the function should rather be called `SEARCH_VECTOR`. This was also explained in FLIP-517.

Thanks,
Timo


On 14.08.25 03:31, Shengkai Fang wrote:
Hi, all.

There has been no feedback for a while. I plan to close this FLIP tomorrow
unless there are further comments. Thank you all for the discussion.

Best,
Shengkai

Yash Anand <[email protected]> wrote on Thu, Jul 31, 2025 at 15:47:

Hi Shengkai,

Thanks for the FLIP; this will be a great addition to Flink's AI
capabilities. +1 for this feature.

Best,
Yash Anand

On Tue, Jul 29, 2025 at 7:23 PM Jacky Lau <[email protected]> wrote:

Hi Shengkai,

Thanks for the FLIP and enhancement for AI capabilities in Flink. +1 for
this feature.

Best,
Jacky Lau

Hao Li <[email protected]> wrote on Wed, Jul 30, 2025 at 01:03:

Hi Shengkai,

Thanks for the FLIP and enhancement for AI capabilities in Flink. +1.

Thanks,
Hao

On Tue, Jul 29, 2025 at 2:16 AM Shengkai Fang <[email protected]> wrote:

Hi,
I'd like to start a discussion of FLIP-540: Support VECTOR_SEARCH in
Flink SQL [1].

In FLIP-437/FLIP-525, Apache Flink has initially integrated Large
Language Model (LLM) capabilities, enabling semantic understanding and
real-time processing in streaming data pipelines. This integration has
been technically validated in scenarios such as log classification and
real-time question-answering systems. However, the current architecture
only allows Flink to use embedding models to convert unstructured data
(e.g., text, images) into high-dimensional vector features, which are
then persisted to downstream storage systems (e.g., Milvus, MongoDB). It
lacks real-time online querying and similarity analysis capabilities for
vector spaces. To address this limitation, we propose introducing the
VECTOR_SEARCH function in this FLIP, enabling users to perform streaming
vector similarity searches and real-time context retrieval (e.g.,
Retrieval-Augmented Generation, RAG) directly within Flink.

Looking forward to comments and suggestions for improvements!

Best,
Shengkai

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-540%3A+Support+VECTOR_SEARCH+in+Flink+SQL