Michael Sokolov created LUCENE-9644:
---------------------------------------
Summary: HNSW diverse neighbor selection heuristic
Key: LUCENE-9644
URL: https://issues.apache.org/jira/browse/LUCENE-9644
Project: Lucene - Core
Issue Type: Improvement
Reporter: Michael Sokolov
This will replace the simple nearest neighbor selection with a criterion that
takes into account the distance of the neighbors from each other. It is seen to
provide dramatically improved recall on at least two datasets, and is what is
being used by our reference implementation, hnswlib. The basic idea is that
when selecting the nearest neighbors to associate with a new node added to the
graph, we filter using a diversity criterion. If a candidate neighbor is closer
to an already-added (closer to the new node) neighbor than it is to the new
node, then we pass over it, moving on to more-distant, but presumably more
diverse neighbors. The same criterion is also (re-) applied to the neighbor
nodes' neighbors, since we add the links bidirectionally.
## Results:
### GloVe/Wikipedia
#### baseline
||recall||latency ms||nDoc||fanout|| maxConn|| beamWidth|| visited|| index
ms||
|0.643| 0.77| 100000| 50| 32| 64| 1742| 22830|
|0.671| 0.95| 100000| 100| 32| 64| 2141| 0|
|0.704| 1.32| 100000| 200| 32| 64| 2923| 0|
|0.739| 2.04| 100000| 400| 32| 64| 4382| 0|
|0.470| 0.91| 1000000| 50| 32| 64| 2068| 337081|
|0.496| 1.21| 1000000| 100| 32| 64| 2548| 0|
|0.533| 1.77| 1000000| 200| 32| 64| 3479| 0|
|0.573| 2.58| 1000000| 400| 32| 64| 5257| 0|
#### diverse
||recall||latency ms||nDoc||fanout|| maxConn|| beamWidth|| visited|| index
ms||
|0.801| 0.57| 100000| 50| 32| 64| 593| 17985|
|0.840| 0.67| 100000| 100| 32| 64| 738| 0|
|0.883| 0.97| 100000| 200| 32| 64| 1018| 0|
|0.921| 1.36| 100000| 400| 32| 64| 1502| 0|
|0.723| 0.71| 1000000| 50| 32| 64| 860| 298383|
|0.761| 0.77| 1000000| 100| 32| 64| 1058| 0|
|0.806| 1.06| 1000000| 200| 32| 64| 1442| 0|
|0.854| 1.67| 1000000| 400| 32| 64| 2159| 0|
### Dataset from work:
#### baseline
||recall||latency ms||nDoc||fanout|| maxConn|| beamWidth|| visited|| index
ms||
|0.933| 1.41| 100000| 50| 32| 64| 1496| 35462|
|0.948| 1.39| 100000| 100| 32| 64| 1872| 0|
|0.961| 2.10| 100000| 200| 32| 64| 2591| 0|
|0.972| 3.04| 100000| 400| 32| 64| 3939| 0|
|0.827| 1.34| 1000000| 50| 32| 64| 1676| 535802|
|0.854| 1.76| 1000000| 100| 32| 64| 2056| 0|
|0.887| 2.47| 1000000| 200| 32| 64| 2761| 0|
|0.907| 3.75| 1000000| 400| 32| 64| 4129| 0|
#### diverse
||recall||latency ms||nDoc||fanout|| maxConn|| beamWidth|| visited|| index
ms||
|0.966| 1.18| 100000| 50| 32| 64| 1480| 37656|
|0.977| 1.46| 100000| 100| 32| 64| 1832| 0|
|0.988| 2.00| 100000| 200| 32| 64| 2472| 0|
|0.995| 3.14| 100000| 400| 32| 64| 3629| 0|
|0.944| 1.34| 1000000| 50| 32| 64| 1780| 526834|
|0.959| 1.71| 1000000| 100| 32| 64| 2222| 0|
|0.975| 2.30| 1000000| 200| 32| 64| 3041| 0|
|0.986| 3.56| 1000000| 400| 32| 64| 4543| 0|
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]