Copilot commented on code in PR #240:
URL:
https://github.com/apache/incubator-hugegraph-ai/pull/240#discussion_r2107107539
##########
hugegraph-llm/src/hugegraph_llm/operators/index_op/semantic_id_query.py:
##########
@@ -75,18 +78,57 @@ def _exact_match_vids(self, keywords: List[str]) -> Tuple[List[str], List[str]]:
def _fuzzy_match_vids(self, keywords: List[str]) -> List[str]:
fuzzy_match_result = []
for keyword in keywords:
- keyword_vector = self.embedding.get_text_embedding(keyword)
- results = self.vector_index.search(keyword_vector, top_k=self.topk_per_keyword,
+ keyword_vector = self.embedding.get_texts_embeddings([keyword])
Review Comment:
Ensure that 'get_texts_embeddings' returns a non-empty list before accessing
its first element to avoid potential IndexError.
```suggestion
keyword_vector = self.embedding.get_texts_embeddings([keyword])
if not keyword_vector:  # Ensure the list is non-empty
    log.warning("No embeddings found for keyword: %s", keyword)
    continue
```
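For illustration only, the guard suggested above could also be factored into a small helper so that the empty-result check and the first-element access live in one place. This is a minimal sketch, not part of the PR: `first_embedding` is a hypothetical name, and it assumes the batch call returns a plain Python list of vectors (a NumPy array would need an explicit length check instead of a truthiness test).

```python
import logging
from typing import List, Optional, Sequence

log = logging.getLogger(__name__)


def first_embedding(
    vectors: Optional[Sequence[Sequence[float]]], keyword: str
) -> Optional[List[float]]:
    """Return the first vector of a batch result, or None if the batch is empty.

    Hypothetical helper: assumes `vectors` is a list-like of vectors (or None),
    as a batch embedding call for a single keyword might return.
    """
    if not vectors:
        # Nothing to index into; warn and let the caller skip this keyword.
        log.warning("No embeddings found for keyword: %s", keyword)
        return None
    return list(vectors[0])
```

Inside the loop, the caller would skip the keyword with `continue` whenever the helper returns `None`.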
##########
hugegraph-llm/src/hugegraph_llm/operators/index_op/build_semantic_index.py:
##########
@@ -90,4 +168,52 @@ def run(self, context: Dict[str, Any]) -> Dict[str, Any]:
"removed_vid_vector_num": removed_num,
"added_vid_vector_num": len(added_vids)
})
+
+ if context["index_labels"]:
+ present_prop_value_to_propset = self.get_present_props(context)
+ # log.debug("present_prop_value_to_propset: %s", present_prop_value_to_propset)
+ past_prop_value_to_propset = self.get_past_props()
+ # log.debug("past_prop_value_to_propset: %s", past_prop_value_to_propset)
+ to_add, to_update, to_remove, to_update_remove = self.diff_property_sets(
+ present_prop_value_to_propset,
+ past_prop_value_to_propset
+ )
+ log.debug("to_add: %s", to_add)
+ log.debug("to_update: %s", to_update)
+ log.debug("to_remove: %s", to_remove)
+ log.debug("to_update_remove: %s", to_update_remove)
+ log.info("Removing %s outdated property value", len(to_remove))
+ removed_props_num = self.prop_index.remove(to_remove)
+ if removed_props_num:
+ self.prop_index.to_index_file(self.index_dir_prop)
+ all_to_add = to_add + to_update
+ add_propsets = []
+ add_prop_values = []
+ for prop_value, propset in all_to_add:
+ add_propsets.append(propset)
+ add_prop_values.append(prop_value)
+ if add_prop_values:
+ if len(add_prop_values) > 100000:
Review Comment:
[nitpick] Consider refactoring the hardcoded property update limit (100000)
into a configurable parameter to improve maintainability and flexibility.
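One possible shape for this refactor, sketched under assumptions: the limit is read once from an environment variable and falls back to the current value of 100000. The names `DEFAULT_MAX_PROP_VALUES`, `resolve_max_prop_values`, and the variable `HYPOTHETICAL_MAX_PROP_VALUES` are invented for illustration and do not exist in the project; the project's own settings mechanism may be the better home for this value.

```python
import os

# Hypothetical default, matching the literal currently hardcoded in the diff.
DEFAULT_MAX_PROP_VALUES = 100_000


def resolve_max_prop_values(
    env_var: str = "HYPOTHETICAL_MAX_PROP_VALUES",
    default: int = DEFAULT_MAX_PROP_VALUES,
) -> int:
    """Read the limit from the environment, falling back to `default`.

    Unset, non-integer, or non-positive values all fall back to the default.
    """
    raw = os.environ.get(env_var)
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value > 0 else default


def exceeds_limit(values: list, limit: int) -> bool:
    """Same comparison as the hardcoded check, with the limit passed in."""
    return len(values) > limit
```

The check in `run` would then become something like `if exceeds_limit(add_prop_values, max_prop_values):`, leaving whatever handling follows it unchanged.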
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]