[jira] [Comment Edited] (LUCENE-9937) ann-benchmarks results for HNSW search

Julie Tibshirani (Jira) Wed, 11 Aug 2021 08:11:05 -0700


    [ 
https://issues.apache.org/jira/browse/LUCENE-9937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17397415#comment-17397415
 ]


Julie Tibshirani edited comment on LUCENE-9937 at 8/11/21, 3:10 PM:
--------------------------------------------------------------------

It turns out my benchmarks were way too generous to hnswlib!! I was running 
them in "--batch" mode, which sends all queries at once. I reran hnswlib in the 
default "serial" mode, and now see more similar results to you. Although I had 
disabled multi-threading, some parallelism must have snuck in. We're still 
behind in terms of recall, but the QPS gap is much less.

*Results on sift-128-euclidean*
 Parameters: M=16, efConstruction=500
{code:java}
Approach                 Recall  QPS
LuceneVectorsOnly()      1.000        6.764

LuceneHnsw(n_cands=10)   0.603     9045.574
LuceneHnsw(n_cands=50)   0.890     3789.169
LuceneHnsw(n_cands=100)  0.942     2333.714
LuceneHnsw(n_cands=500)  0.996      579.850 
LuceneHnsw(n_cands=800)  0.998      386.105

hnswlib(n_cands=10)      0.710    22628.980  
hnswlib(n_cands=50)      0.950     8640.336  
hnswlib(n_cands=100)     0.985     5026.611
hnswlib(n_cands=500)     1.000     1289.108
hnswlib(n_cands=800)     1.000      856.515
{code}
*Results on glove-100-angular*
 Parameters: M=32, efConstruction=500
{code:java}
Approach                Recall  QPS
LuceneVectorsOnly()     1.000      6.722

LuceneHnsw(n_cands=10)  0.507   5036.236
LuceneHnsw(n_cands=50)  0.760   2099.850
LuceneHnsw(n_cands=100) 0.833   1233.690
LuceneHnsw(n_cands=500) 0.941    309.077
LuceneHnsw(n_cands=800) 0.961    203.782

hnswlib(n_cands=10)     0.601  12877.861
hnswlib(n_cands=50)     0.833   4553.943
hnswlib(n_cands=100)    0.896   2618.098
hnswlib(n_cands=500)    0.981    634.301
hnswlib(n_cands=800)    0.991    412.333
{code}
I'm going to close out this issue since I think we've learned what we can from 
digging into these benchmarks. My thoughts on conclusions:
 * Lucene HNSW gives a great speed-up over the baseline at good recall.
 * We may be missing important algorithmic details from hnswlib that hurt 
recall and cause us to perform more computations.
 * Hot spots: decoding vectors from their on-disk format and performing 
distance calculations.
  


was (Author: julietibs):
It turns out my benchmarks were way too generous to hnswlib!! I was running 
them in "--batch" mode, which sends all queries at once. I reran hnswlib in the 
default "serial" mode, and now see more similar results to you. Although I had 
disabled multi-threading, some parallelism must have snuck in. We're still 
behind in terms of recall, but the QPS gap is much less.

*Results on sift-128-euclidean*
 Parameters: M=16, efConstruction=500
{code:java}
Approach                 Recall  QPS
LuceneVectorsOnly()     1.000         6.764

LuceneHnsw(n_cands=10)   0.603     9045.574
LuceneHnsw(n_cands=50)   0.890     3789.169
LuceneHnsw(n_cands=100)  0.942     2333.714
LuceneHnsw(n_cands=500)  0.996      579.850 
LuceneHnsw(n_cands=800)  0.998      386.105

hnswlib(n_cands=10)      0.710    22628.980  
hnswlib(n_cands=50)      0.950     8640.336  
hnswlib(n_cands=100)     0.985     5026.611
hnswlib(n_cands=500)     1.000     1289.108
hnswlib(n_cands=800)     1.000      856.515
{code}
*Results on glove-100-angular*
 Parameters: M=32, efConstruction=500
{code:java}
Approach                Recall  QPS
LuceneVectorsOnly()     1.000      6.722

LuceneHnsw(n_cands=10)  0.507   5036.236
LuceneHnsw(n_cands=50)  0.760   2099.850
LuceneHnsw(n_cands=100) 0.833   1233.690
LuceneHnsw(n_cands=500) 0.941    309.077
LuceneHnsw(n_cands=800) 0.961    203.782

hnswlib(n_cands=10)     0.601  12877.861
hnswlib(n_cands=50)     0.833   4553.943
hnswlib(n_cands=100)    0.896   2618.098
hnswlib(n_cands=500)    0.981    634.301
hnswlib(n_cands=800)    0.991    412.333
{code}
I'm going to close out this issue since I think we've learned what we can from 
digging into these benchmarks. My thoughts on conclusions:
 * Lucene HNSW gives a great speed-up over the baseline at good recall.
 * We may be missing important algorithmic details from hnswlib that hurt 
recall and cause us to perform more computations.
 * Hot spots: decoding vectors from their on-disk format and performing 
distance calculations.
  

> ann-benchmarks results for HNSW search
> --------------------------------------
>
>                 Key: LUCENE-9937
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9937
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Julie Tibshirani
>            Priority: Minor
>
> *NOTE: These benchmarks contained an error that made hnswlib perform too 
> well. See the last comment for correct results.*
> I hooked up our HNSW implementation to 
> [ann-benchmarks|https://github.com/erikbern/ann-benchmarks], a widely used 
> repo for benchmarking nearest neighbor search libraries against large 
> datasets. I found the results interesting and opened this issue to share and 
> discuss. My benchmarking code can be found 
> [here|https://github.com/jtibshirani/lucene/pull/1] – it's hacky and not easy 
> to commit but I’m happy to help anyone get set up with it.
> Approaches
>  * LuceneVectorsOnly: a baseline that only indexes vectors, and performs a 
> brute force scan to determine nearest neighbors
>  * LuceneHnsw: our HNSW implementation
>  * [hnswlib|https://github.com/nmslib/hnswlib]: a C++ HNSW implementation 
> from the author of the paper
> Datasets
>  * sift-128-euclidean: 1 million SIFT feature vectors, dimension 128, 
> comparing euclidean distance
>  * glove-100-angular: ~1.2 million GloVe word vectors, dimension 100, 
> comparing cosine similarity
> *Results on sift-128-euclidean*
>  Parameters: M=16, efConstruction=500
> {code:java}
> Approach                Recall  QPS
> LuceneVectorsOnly()     1.000      6.764
> LuceneHnsw(n_cands=10)  0.603   7736.968
> LuceneHnsw(n_cands=50)  0.890   3605.000
> LuceneHnsw(n_cands=100) 0.953   2237.429
> LuceneHnsw(n_cands=500) 0.996    570.900
> LuceneHnsw(n_cands=800) 0.998    379.589
> hnswlib(n_cands=10)     0.713  69662.756
> hnswlib(n_cands=50)     0.950  28021.582
> hnswlib(n_cands=100)    0.985  16108.538
> hnswlib(n_cands=500)    1.000   4115.886
> hnswlib(n_cands=800)    1.000   2729.680
> {code}
> *Results on glove-100-angular*
>  Parameters: M=32, efConstruction=500
> {code:java}
> Approach                Recall  QPS
> LuceneVectorsOnly()     1.000      6.722
> LuceneHnsw(n_cands=10)  0.507   5036.236
> LuceneHnsw(n_cands=50)  0.760   2099.850
> LuceneHnsw(n_cands=100) 0.833   1233.690
> LuceneHnsw(n_cands=500) 0.941    309.077
> LuceneHnsw(n_cands=800) 0.961    203.782
> hnswlib(n_cands=10)     0.597  43543.345
> hnswlib(n_cands=50)     0.832  14719.165
> hnswlib(n_cands=100)    0.897   8299.948
> hnswlib(n_cands=500)    0.981   1931.985
> hnswlib(n_cands=800)    0.991    881.752
> {code}
> Notes on benchmark:
>  * By default, the ann-benchmarks repo retrieves 10 nearest neighbors and 
> computes the recall against the true neighbors. The recall calculation has a 
> small 'fudge factor' that allows neighbors that are within a small epsilon of 
> the best distance. Queries are executed serially to obtain the QPS.
>  * I chose parameters where hnswlib performed well, then passed these same 
> parameters to Lucene HNSW. For index-time parameters, I set maxConn as M and 
> beamWidth as efConstruction. For search parameters, I set k to k, and fanout 
> as (num_cands - k) so that the beam search is of size num_cands. Note that 
> our default value for beamWidth is 16, which is really low – I wasn’t able to 
> obtain acceptable recall until I bumped it to closer to 500 to match the 
> hnswlib default.
>  * I force-merged to one segment before running searches since this gives the 
> best recall + QPS, and also to match hnswlib.
> Some observations:
>  * It'd be really nice to extend luceneutil to measure vector search recall 
> in addition to latency. That would help ensure we’re benchmarking a more 
> realistic scenario, instead of accidentally indexing/ searching at a very low 
> recall. Tracking recall would also guard against subtle, unintentional 
> changes to the algorithm. It's easy to make an accidental change while 
> refactoring, and with approximate algorithms, unit tests don't guard as well 
> against this.
>  * Lucene HNSW gives a great speed-up over the baseline without sacrificing 
> too much recall. But it doesn't perform as well as hnswlib in terms of both 
> recall and QPS. We wouldn’t expect the results to line up perfectly, since 
> Lucene doesn't actually implement HNSW – the current algorithm isn't actually 
> hierarchical and only uses a single graph layer. Does this difference might 
> indicate we're leaving performance 'on the table' by not using layers, which 
> (I don't think) adds much index time or space? Or are there other algorithmic 
> improvements would help close the gap?
>  * Setting beamWidth to 500 *really* slowed down indexing. I'll open a 
> separate issue with indexing speed results, keeping this one focused on 
> search.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[jira] [Comment Edited] (LUCENE-9937) ann-benchmarks results for HNSW search

Reply via email to