irvingzhang edited a comment on issue #1295: Lucene-9004: bug fix for searching 
the nearest one neighbor in higher layers
URL: https://github.com/apache/lucene-solr/pull/1295#issuecomment-592287771
 
 
   > I believe in practice that results. max size is always set to ef, so there 
shouldn't be any real issue. I agree that the interface doesn't make that 
plain; we should enforce this invariant by API contract
   
   I agree that the max size is always set to _ef_, but _ef_ has different 
values in different layers. 
   According to **Algorithm 5** of 
[papar](https://arxiv.org/pdf/1603.09320.pdf),  HNSW searches the nearest one 
(namely, _ef_=1) neighbor from top layer to the 1st layer,  and then finds the 
nearest _ef_ (_ef_=topK) neighbors from layer 0. In the implementation of 
Lucene HNSW, the actual size of result queue (Line 63, 
[HNSWGraphReader](https://github.com/apache/lucene-solr/blob/jira/lucene-9004-aknn-2/lucene/core/src/java/org/apache/lucene/util/hnsw/HNSWGraphReader.java))
 is set to _ef_=topK when searching from top layer to the 1st layer, result in 
finding more neighbors than expected. Even if the parameter _ef_ is set to 1 in 
Line 66, 
[HNSWGraphReader](https://github.com/apache/lucene-solr/blob/jira/lucene-9004-aknn-2/lucene/core/src/java/org/apache/lucene/util/hnsw/HNSWGraphReader.java),
 the code `if (dist < f.distance() || results.size() < ef)` (Line 87, 
[HNSWGraph](https://github.com/apache/lucene-solr/blob/jira/lucene-9004-aknn-2/lucene/core/src/java/org/apache/lucene/util/hnsw/HNSWGraph.java))
 allows insert more than 1 neighbor to the "results" when `dist < f.distance()` 
because the max size of "results" is _ef_=topK, which implies that actual size 
of "results" belongs to [1, topK].
   
   The simplest way to check this problem is to print the actual size of 
neighbors. For example, add "System.out.println(neighbors.size());" after 
"visitedCount += hnsw.searchLayer(query, neighbors, 1, l, vectorValues);" (Line 
66, 
[HNSWGraphReader](https://github.com/apache/lucene-solr/blob/jira/lucene-9004-aknn-2/lucene/core/src/java/org/apache/lucene/util/hnsw/HNSWGraphReader.java)),
 where the nearest one neighbor is expected, but the actual neighbor size would 
be range from 1~topK. Which also applies to 
[HNSWGraphWriter](https://github.com/apache/lucene-solr/blob/jira/lucene-9004-aknn-2/lucene/core/src/java/org/apache/lucene/util/hnsw/HNSWGraphWriter.java).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to