MrJs133 opened a new pull request, #243: URL: https://github.com/apache/incubator-hugegraph-ai/pull/243
Bug Report

After deleting vertices, running `update_vid_embedding` does **not** successfully remove the corresponding vectors.

---

### Initial State

* 13 vertices
* 13 `vid` embeddings

---

### After Clearing Graph Data

* 0 vertices

---

### After `update_vid_embedding`

* Output shows: “Removed 13 vectors”

However, calling `get_vector_index_info` still shows 13 `vid` embeddings.

---

### Problem Location

Function `get_vector_index_info`:

```python
def get_vector_index_info():
    chunk_vector_index = VectorIndex.from_index_file(
        os.path.join(resource_path, huge_settings.graph_name, "chunks")
    )
    graph_vid_vector_index = VectorIndex.from_index_file(
        os.path.join(resource_path, huge_settings.graph_name, "graph_vids")
    )
    return json.dumps({
        "embed_dim": chunk_vector_index.index.d,
        "vector_info": {
            "chunk_vector_num": chunk_vector_index.index.ntotal,
            "graph_vid_vector_num": graph_vid_vector_index.index.ntotal,
            "graph_properties_vector_num": len(chunk_vector_index.properties)
        }
    }, ensure_ascii=False, indent=2)
```

This logic is correct. It reads `graph_vid_vector_num` from `graph_vid_vector_index.index.ntotal`.

---

### `update_vid_embedding` Code Analysis

```python
past_vids = self.vid_index.properties
present_vids = context["vertices"]
removed_vids = set(past_vids) - set(present_vids)
removed_num = self.vid_index.remove(removed_vids)
added_vids = list(set(present_vids) - set(past_vids))
```

This correctly identifies vectors to be removed and added.

`self.vid_index.remove()` implementation:

```python
def remove(self, props: Union[Set[Any], List[Any]]) -> int:
    if isinstance(props, list):
        props = set(props)
    indices = []
    remove_num = 0
    for i, p in enumerate(self.properties):
        if p in props:
            indices.append(i)
            remove_num += 1
    self.index.remove_ids(np.array(indices))
    self.properties = [p for i, p in enumerate(self.properties) if i not in indices]
    return remove_num
```

This also seems correct.

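
The diff-and-remove logic analyzed above can be checked in isolation. The following is a minimal sketch, not the project's code: `diff_vids` and `remove_by_property` are hypothetical helper names, and plain Python lists stand in for the real `VectorIndex` and its FAISS-style index. It only shows that the set arithmetic and the position-based filtering select all 13 entries when the graph is empty.

```python
from typing import Any, Iterable, List, Set, Tuple


def diff_vids(
    past_vids: Iterable[Any], present_vids: Iterable[Any]
) -> Tuple[Set[Any], Set[Any]]:
    """Return (removed, added) vertex ids, mirroring the set arithmetic above."""
    past, present = set(past_vids), set(present_vids)
    return past - present, present - past


def remove_by_property(
    properties: List[Any], removed: Set[Any]
) -> Tuple[List[int], List[Any]]:
    """Return the positions to drop and the surviving properties, in order."""
    indices = [i for i, p in enumerate(properties) if p in removed]
    dropped = set(indices)  # set lookup avoids a linear scan per element
    kept = [p for i, p in enumerate(properties) if i not in dropped]
    return indices, kept


# Hypothetical data matching the report: 13 stored embeddings, empty graph.
past = [f"v{i}" for i in range(13)]
present: List[str] = []

removed, added = diff_vids(past, present)
indices, kept = remove_by_property(past, removed)
print(len(removed), len(added), len(indices), len(kept))  # 13 0 13 0
```

With these inputs every stored entry is selected for removal and nothing is added, which matches the "Removed 13 vectors" message in the report.
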
---

### Debug Output

```python
log.debug("before %s", self.vid_index.index.ntotal)
removed_num = self.vid_index.remove(removed_vids)
log.debug("after %s", self.vid_index.index.ntotal)
```

Output:

```
[05/20/25 13:51:20] DEBUG before 13
[05/20/25 13:51:20] DEBUG after 0
```

→ This confirms that the in-memory deletion is successful.

However, running `update_vid_embedding` a second time shows:

```
[05/20/25 13:53:23] DEBUG before 13
[05/20/25 13:53:23] DEBUG after 0
```

→ This confirms that the vector index file still contains 13 vectors (i.e., the deletion was not persisted).

This is **verified by loading the index from file**:

```python
self.index_dir = os.path.join(resource_path, huge_settings.graph_name, "graph_vids")
self.vid_index = VectorIndex.from_index_file(self.index_dir)
log.debug("after %s", self.vid_index.index.ntotal)
```

Note: the result of the deletion was never saved.

---

### Root Cause

In the full `BuildSemanticIndex.run()` implementation:

```python
removed_vids = set(past_vids) - set(present_vids)
removed_num = self.vid_index.remove(removed_vids)
added_vids = list(set(present_vids) - set(past_vids))
if added_vids:
    ...
    self.vid_index.add(...)
    self.vid_index.to_index_file(self.index_dir)
else:
    log.debug("No update vertices to build vector index.")
```

The call to `self.vid_index.to_index_file(self.index_dir)` only happens **if `added_vids` is non-empty**.

So if you're only removing embeddings (i.e., no new vertices), the deletion is never persisted to disk.

---

### Fix

```python
removed_num = self.vid_index.remove(removed_vids)
self.vid_index.to_index_file(self.index_dir)  # <-- Add this line
```

---

### Verification

* **Remove only**: works ✅
* **Add only**: works ✅
* **No change**: works ✅

Problem solved.
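
The root cause and the fix can be reproduced without HugeGraph or a real vector index. The sketch below is an illustration under assumptions: `ToyIndex`, `update`, `vectors_on_disk_after_clear`, and the `index.json` file name are all hypothetical, and a JSON list of properties stands in for the stored vectors. Only the control flow (remove, then save solely inside the "added" branch) mirrors the code quoted above.

```python
import json
import os
import tempfile
from typing import Iterable, List


class ToyIndex:
    """Hypothetical stand-in for VectorIndex that only tracks properties."""

    def __init__(self, properties: Iterable[str]):
        self.properties = list(properties)

    @classmethod
    def from_index_file(cls, index_dir: str) -> "ToyIndex":
        path = os.path.join(index_dir, "index.json")
        if not os.path.exists(path):
            return cls([])
        with open(path, encoding="utf-8") as f:
            return cls(json.load(f))

    def to_index_file(self, index_dir: str) -> None:
        os.makedirs(index_dir, exist_ok=True)
        path = os.path.join(index_dir, "index.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.properties, f)

    def remove(self, props: Iterable[str]) -> int:
        props = set(props)
        before = len(self.properties)
        self.properties = [p for p in self.properties if p not in props]
        return before - len(self.properties)


def update(index_dir: str, present_vids: List[str], persist_after_remove: bool) -> int:
    """Mimic the update flow; the flag toggles the one-line fix."""
    index = ToyIndex.from_index_file(index_dir)
    past = set(index.properties)
    removed_num = index.remove(past - set(present_vids))
    if persist_after_remove:
        index.to_index_file(index_dir)  # the added line from the fix
    added = sorted(set(present_vids) - past)
    if added:
        index.properties.extend(added)
        index.to_index_file(index_dir)  # original code saved only here
    return removed_num


def vectors_on_disk_after_clear(persist_after_remove: bool) -> int:
    """Store 13 entries, clear the graph, update, then reload from disk."""
    with tempfile.TemporaryDirectory() as index_dir:
        ToyIndex(f"v{i}" for i in range(13)).to_index_file(index_dir)
        update(index_dir, [], persist_after_remove)
        return len(ToyIndex.from_index_file(index_dir).properties)


print(vectors_on_disk_after_clear(False))  # 13: the removal was lost
print(vectors_on_disk_after_clear(True))   # 0: the removal was persisted
```

Saving unconditionally after `remove` also writes the file when nothing changed. Whether that extra write matters depends on index size; guarding the save on `removed_num` or `added_vids` being non-empty would be one possible variant, though that is not what this PR does.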
