wenjin272 commented on PR #332: URL: https://github.com/apache/flink-agents/pull/332#issuecomment-3604787435
> > Hi, @alnzng. There's a design issue related to the vector store that I'd appreciate your help reviewing. > > As describe in the design doc #339, long-term memory of flink-agents is also based on vector store. Currently, I provide an implementation based on chroma. In this implementation, I directly use chroma client rather than flink-agents `BaseVectorStore`, because there are some long-term memory needed interface not provided in `BaseVectorStore`. > > @xintongsong believes that we can directly build upon the Flink-Agents `BaseVectorStore`. Thus, we can support using any already supported vector store as the backend for long-term memory. > > I think this make sense, but it requires add some interfaces to `BaseVectorStore`, which maybe look like: > > ``` > > def get_or_create_collection(self, name: str, metadata: Dict[str, Any]) -> Dict[str, Any]: > > """Get a collection, create if it doesn't already exist.""" > > > > def get_collection(self, name: str) -> None: > > """Get an existing collection.""" > > > > def update_collection(self, name: str, metadata: Dict[str, Any]) -> None: > > """Update an existing collection.""" > > > > def delete_collection(self, name: str) -> bool: > > """Delete a collection.""" > > > > def add(self, document: Document, collection_name: str | None = None) -> None: > > """Add a document to the collection.""" > > > > def update(self, document: Document, collection_name: str | None = None) -> None: > > """Update a document, can only update metadata.""" > > > > def delete(self, offset: int | None, limit: int | None, ids: List[int] | None = None, **kwargs: Any) -> bool: > > """Delete documents from collection.""" > > > > def get(self, offset: int | None, limit: int | None, collection_name: str | None = None, **kwargs: Any) -> List[Document]: > > """Get documents from collection.""" > > ``` > > > > > > > > > > > > > > > > > > > > > > > > These interface may not be achievable for each vector store, I will conduct research and refinement afterward. WDTY? > > Thanks for raising this design question @wenjin272! I agree with @xintongsong that building long-term memory on top of `BaseVectorStore` would be beneficial for consistency and reusability. > > The current `BaseVectorStore` is intentionally minimal, focusing on read-only semantic search. This design works well for RAG retrieval use cases but indeed lacks the CRUD operations needed for long-term memory management. Adding document-level CRUD operations to `BaseVectorStore` makes sense, as these operations are generally supported across all major vector stores. > > A few thoughts on the proposed methods: > > * add(), delete(), get() - Widely supported, should be included > * update() - I'm not certain about common use cases for updating existing long-term memory entries, but it shouldn't be > problematic to include. > > However, I have concerns about including collection management in the base interface. This concept is vector store specific and may cause integration issues. "Collection" is most common term (e.g. Chroma, Milvus, Qdrant, etc), but Pinecone and Weaviate use different terms(e.g. Class, Namespace, etc). > > My suggestion: Keep collection management implementation specific rather than in the base interface. Each vector store integration can expose collection management using its native terminology and patterns. > > This approach aligns with how `LlamaIndex` and `LangChain` handle vector stores. I'd recommend you reviewing these implementations, they provide good precedents for balancing abstraction with flexibility. > > * LlamaIndex BasePydanticVectorStore: > https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/vector_stores/types.py#L334 > * LangChain VectorStore: https://github.com/langchain-ai/langchain/blob/master/libs/core/langchain_core/vectorstores/base.py#L43 Thanks for your suggestion @alnzng. The reason I want to provide collection level operation in `BaseVectorStore` is `LongTermMemory` can dynamic create or delete `MemorySet`, which is correspond to a `Collection` in vector store. I have reviewed the vector store design in langChain, it indeed doesn't provide collection management in base interface. But if we keep collection management implementation specific rather than in the base interface, the `LongTermMemory` must be aware of what vector store implementation is used when create or get a `MemorySet`. I think all the vector store provide concept like `collection`, they may called `index`, `namespace` or `class`. The collection in `BaseVectorStore` is represent a flink-agents collection, and can be correspond to `collection` in milvus, chroma, quota , `index` in opensearchpy and `class` or `namespace` in pinecone or weaviate. Because collection is the most used name, so we use collection also. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
