alessandrobenedetti commented on code in PR #12246:
URL: https://github.com/apache/lucene/pull/12246#discussion_r1180201182
##########
lucene/analysis/common/src/java/org/apache/lucene/analysis/synonym/word2vec/Word2VecSynonymProvider.java:
##########
@@ -64,7 +65,15 @@ public Word2VecSynonymProvider(Word2VecModel model) throws
IOException {
this.hnswGraph = builder.build(word2VecModel.copy());
}
- public List<TermAndBoost> getSynonyms(
+ /**
+ * Returns the list of synonyms of a provided term. This method is
synchronized because it uses
+ * the {@link org.apache.lucene.util.hnsw.OnHeapHnswGraph} that is not
thread-safe.
+ *
+ * @param term term to search to find synonyms
+ * @param maxSynonymsPerTerm limit of synonyms returned
+ * @param minAcceptedSimilarity lower similarity threshold to consider
another term as synonym
+ */
+ public synchronized List<TermAndBoost> getSynonyms(
Review Comment:
So, I spent some time this morning trying to think about a more elegant way
to communicate that currently, the OnHeap graph is not thread-safe.
The latest commit is just tentative, to discuss with you, if it's more
complicated than expected I can just revert it and leave the mitigation in the
Word2Vec stuff.
My main intent is to automatically get a benefit for the Word2Vec token
filter if anyone in the future contributes a thread-safe version of the graph.
If we put the synchronized in the word2vec and someone fixes the
thread-safety problem, we may never realize it, and word2bec remain
unnecessarily slower than it should be.
Maybe adding this synchronized block on the graph, may do the trick?
So if someone implements thread safety, then he/she changes that block?
I know synchronized may slow down a bit the things, so to be honest I am
looking for ideas :)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]