Hi Shengkai,
thank you for proposing this FLIP. Also, thank you for considering my
thoughts from FLIP-517, even though I haven't managed to finalize the
discussion/voting yet.
It looks mostly good to me. However, I would like to discuss the
semantics of the `on_time` parameter:
1) Proctime
I truly believe we should avoid the need for a `proctime` attribute.
Teaching rowtime attributes to users is already painful enough, and
teaching proctime on top of that makes it worse. For the PTFs of
FLIP-440, only rowtime attributes can be used in f(on_time => ...), and
we should do the same for future built-in PTFs. Omitting `on_time` can
simply be treated as processing time.
Users can then use the PTF naturally, with the mental model of LATERAL
being a foreach loop in which each invocation happens instantly (in
processing time).
2) Rowtime
All PTFs should follow the SystemTypeInference:
https://github.com/apache/flink/blob/master/flink-table/flink-table-common/src/main/java/org/apache/flink/table/types/inference/SystemTypeInference.java#L239
It assumes that when an `on_time` parameter is passed, the result
appends a `rowtime` column that can be used in subsequent time-based
operations. Can we add such a column to the output of VECTOR_SEARCH as
well?
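As a rough illustration of points 1) and 2), here is a toy model in
plain Python. It is not Flink code: `apply_ptf`, `search`, `fake_search`,
and the dict-based rows are hypothetical names chosen for this sketch. It
assumes a PTF behaves like a foreach loop over input rows, that omitting
`on_time` adds no time column (each call simply happens in processing
time), and that passing `on_time` appends a `rowtime` column copied from
the named input column.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional

Row = Dict[str, object]


def apply_ptf(
    rows: Iterable[Row],
    search: Callable[[Row], List[Row]],
    on_time: Optional[str] = None,
) -> Iterator[Row]:
    """Toy foreach-loop model of a PTF call (hypothetical, not Flink API)."""
    for row in rows:  # one invocation per input row
        for match in search(row):
            out = {**row, **match}
            if on_time is not None:
                # Assumption: the output carries the input row's event time
                # in an appended `rowtime` column.
                out["rowtime"] = row[on_time]
            yield out


def fake_search(row: Row) -> List[Row]:
    """Stand-in for a vector lookup; returns two fixed matches."""
    return [{"match": "a"}, {"match": "b"}]


events = [{"id": 1, "ts": 100}, {"id": 2, "ts": 200}]
without_time = list(apply_ptf(events, fake_search))
with_time = list(apply_ptf(events, fake_search, on_time="ts"))
```

With `on_time="ts"`, every output row gains a `rowtime` field that a
later time-based step could use. Without it, the output has no time
column at all.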
3) Naming
Just a general note, feel free to ignore: a function or operation should
use a verb, not a noun, e.g. JOIN, SEARCH, SELECT. Vector search is a
concept, so the function should rather be called `SEARCH_VECTOR`. This
was also explained in FLIP-517.
Thanks,
Timo
On 14.08.25 03:31, Shengkai Fang wrote:
Hi, all.
There has been no feedback for a while. I plan to close this FLIP tomorrow
unless there are further comments. Thank you all for the discussion.
Best,
Shengkai
On Thu, Jul 31, 2025 at 3:47 PM Yash Anand <[email protected]> wrote:
Hi Shengkai,
Thanks for the FLIP. This will be a great addition to Flink's AI
capabilities. +1 for this feature.
Best,
Yash Anand
On Tue, Jul 29, 2025 at 7:23 PM Jacky Lau <[email protected]> wrote:
Hi Shengkai,
Thanks for the FLIP and the enhancement of AI capabilities in Flink. +1
for this feature.
Best,
Jacky Lau
On Wed, Jul 30, 2025 at 1:03 AM Hao Li <[email protected]> wrote:
Hi Shengkai,
Thanks for the FLIP and enhancement for AI capabilities in Flink. +1.
Thanks,
Hao
On Tue, Jul 29, 2025 at 2:16 AM Shengkai Fang <[email protected]> wrote:
Hi,
I'd like to start a discussion of FLIP-540: Support VECTOR_SEARCH in
Flink SQL [1].
In FLIP-437/FLIP-525, Apache Flink initially integrated Large Language
Model (LLM) capabilities, enabling semantic understanding and real-time
processing in streaming data pipelines. This integration has been
technically validated in scenarios such as log classification and
real-time question-answering systems. However, the current architecture
only allows Flink to use embedding models to convert unstructured data
(e.g., text, images) into high-dimensional vector features, which are
then persisted to downstream storage systems (e.g., Milvus, MongoDB). It
lacks real-time online querying and similarity analysis capabilities for
vector spaces. To address this limitation, we propose introducing the
VECTOR_SEARCH function in this FLIP, enabling users to perform streaming
vector similarity searches and real-time context retrieval (e.g.,
Retrieval-Augmented Generation, RAG) directly within Flink.
Looking forward to comments and suggestions for improvements!
Best,
Shengkai
[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-540%3A+Support+VECTOR_SEARCH+in+Flink+SQL