wenjin272 commented on PR #332:
URL: https://github.com/apache/flink-agents/pull/332#issuecomment-3604787435

   > > Hi, @alnzng. There's a design issue related to the vector store that I'd 
appreciate your help reviewing.
   > > As describe in the design doc #339, long-term memory of flink-agents is 
also based on vector store. Currently, I provide an implementation based on 
chroma. In this implementation, I directly use chroma client rather than 
flink-agents `BaseVectorStore`, because there are some long-term memory needed 
interface not provided in `BaseVectorStore`.
   > > @xintongsong believes that we can directly build upon the Flink-Agents 
`BaseVectorStore`. Thus, we can support using any already supported vector 
store as the backend for long-term memory.
   > > I think this make sense, but it requires add some interfaces to 
`BaseVectorStore`, which maybe look like:
   > > ```
   > > def get_or_create_collection(self, name: str, metadata: Dict[str, Any]) 
-> Dict[str, Any]:
   > >     """Get a collection, create if it doesn't already exist."""
   > >     
   > > def get_collection(self, name: str) -> None:
   > >     """Get an existing collection."""
   > >     
   > > def update_collection(self, name: str, metadata: Dict[str, Any]) -> None:
   > >     """Update an existing collection."""
   > > 
   > > def delete_collection(self, name: str) -> bool:
   > >     """Delete a collection."""
   > >     
   > > def add(self, document: Document, collection_name: str | None = None) -> 
None:
   > >     """Add a document to the collection."""
   > > 
   > > def update(self, document: Document, collection_name: str | None = None) 
-> None:
   > >     """Update a document, can only update metadata."""
   > > 
   > > def delete(self, offset: int | None, limit: int | None, ids: List[int] | 
None = None, **kwargs: Any) -> bool:
   > >     """Delete documents from collection."""
   > > 
   > > def get(self, offset: int | None, limit: int | None, collection_name: 
str | None = None, **kwargs: Any) -> List[Document]:
   > >     """Get documents from collection."""
   > > ```
   > > 
   > > 
   > >     
   > >       
   > >     
   > > 
   > >       
   > >     
   > > 
   > >     
   > >   
   > > These interface may not be achievable for each vector store, I will 
conduct research and refinement afterward. WDTY?
   > 
   > Thanks for raising this design question @wenjin272! I agree with 
@xintongsong that building long-term memory on top of `BaseVectorStore` would 
be beneficial for consistency and reusability.
   > 
   > The current `BaseVectorStore` is intentionally minimal, focusing on 
read-only semantic search. This design works well for RAG retrieval use cases 
but indeed lacks the CRUD operations needed for long-term memory management. 
Adding document-level CRUD operations to `BaseVectorStore` makes sense, as 
these operations are generally supported across all major vector stores.
   > 
   > A few thoughts on the proposed methods:
   > 
   > * add(), delete(), get() - Widely supported, should be included
   > * update() - I'm not certain about common use cases for updating existing 
long-term memory entries, but it shouldn't be
   >   problematic to include.
   > 
   > However, I have concerns about including collection management in the base 
interface. This concept is vector store specific and may cause integration 
issues. "Collection" is most common term (e.g. Chroma, Milvus, Qdrant, etc), 
but Pinecone and Weaviate use different terms(e.g. Class, Namespace, etc).
   > 
   > My suggestion: Keep collection management implementation specific rather 
than in the base interface. Each vector store integration can expose collection 
management using its native terminology and patterns.
   > 
   > This approach aligns with how `LlamaIndex` and `LangChain` handle vector 
stores. I'd recommend you reviewing these implementations, they provide good 
precedents for balancing abstraction with flexibility.
   > 
   > * LlamaIndex BasePydanticVectorStore:
   >   
https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/vector_stores/types.py#L334
   > * LangChain VectorStore: 
https://github.com/langchain-ai/langchain/blob/master/libs/core/langchain_core/vectorstores/base.py#L43
   
   Thanks for your suggestion @alnzng. 
   
   The reason I want to provide collection level operation in `BaseVectorStore` 
is `LongTermMemory` can dynamic create or delete `MemorySet`, which is 
correspond to a `Collection` in vector store.
   
   I have reviewed the vector store design in langChain, it indeed doesn't 
provide collection management in base interface. But if we keep collection 
management implementation specific rather than in the base interface, the 
`LongTermMemory` must be aware of what vector store implementation is used when 
create or get a `MemorySet`.
   
   I think all the vector store provide concept like `collection`, they may 
called `index`, `namespace` or `class`. The collection in `BaseVectorStore` is 
represent a flink-agents collection, and can be correspond to `collection` in 
milvus, chroma, quota , `index` in opensearchpy and `class` or `namespace` in  
pinecone or weaviate. Because collection is the most used name, so we use 
collection also.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to