[ https://issues.apache.org/jira/browse/LUCENE-9937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17397415#comment-17397415 ]
Julie Tibshirani edited comment on LUCENE-9937 at 8/11/21, 3:10 PM: -------------------------------------------------------------------- It turns out my benchmarks were way too generous to hnswlib!! I was running them in "--batch" mode, which sends all queries at once. I reran hnswlib in the default "serial" mode, and now see more similar results to you. Although I had disabled multi-threading, some parallelism must have snuck in. We're still behind in terms of recall, but the QPS gap is much less. *Results on sift-128-euclidean* Parameters: M=16, efConstruction=500 {code:java} Approach Recall QPS LuceneVectorsOnly() 1.000 6.764 LuceneHnsw(n_cands=10) 0.603 9045.574 LuceneHnsw(n_cands=50) 0.890 3789.169 LuceneHnsw(n_cands=100) 0.942 2333.714 LuceneHnsw(n_cands=500) 0.996 579.850 LuceneHnsw(n_cands=800) 0.998 386.105 hnswlib(n_cands=10) 0.710 22628.980 hnswlib(n_cands=50) 0.950 8640.336 hnswlib(n_cands=100) 0.985 5026.611 hnswlib(n_cands=500) 1.000 1289.108 hnswlib(n_cands=800) 1.000 856.515 {code} *Results on glove-100-angular* Parameters: M=32, efConstruction=500 {code:java} Approach Recall QPS LuceneVectorsOnly() 1.000 6.722 LuceneHnsw(n_cands=10) 0.507 5036.236 LuceneHnsw(n_cands=50) 0.760 2099.850 LuceneHnsw(n_cands=100) 0.833 1233.690 LuceneHnsw(n_cands=500) 0.941 309.077 LuceneHnsw(n_cands=800) 0.961 203.782 hnswlib(n_cands=10) 0.601 12877.861 hnswlib(n_cands=50) 0.833 4553.943 hnswlib(n_cands=100) 0.896 2618.098 hnswlib(n_cands=500) 0.981 634.301 hnswlib(n_cands=800) 0.991 412.333 {code} I'm going to close out this issue since I think we've learned what we can from digging into these benchmarks. My thoughts on conclusions: * Lucene HNSW gives a great speed-up over the baseline at good recall. * We may be missing important algorithmic details from hnswlib that hurt recall and cause us to perform more computations. * Hot spots: decoding vectors from their on-disk format and performing distance calculations. was (Author: julietibs): It turns out my benchmarks were way too generous to hnswlib!! I was running them in "--batch" mode, which sends all queries at once. I reran hnswlib in the default "serial" mode, and now see more similar results to you. Although I had disabled multi-threading, some parallelism must have snuck in. We're still behind in terms of recall, but the QPS gap is much less. *Results on sift-128-euclidean* Parameters: M=16, efConstruction=500 {code:java} Approach Recall QPS LuceneVectorsOnly() 1.000 6.764 LuceneHnsw(n_cands=10) 0.603 9045.574 LuceneHnsw(n_cands=50) 0.890 3789.169 LuceneHnsw(n_cands=100) 0.942 2333.714 LuceneHnsw(n_cands=500) 0.996 579.850 LuceneHnsw(n_cands=800) 0.998 386.105 hnswlib(n_cands=10) 0.710 22628.980 hnswlib(n_cands=50) 0.950 8640.336 hnswlib(n_cands=100) 0.985 5026.611 hnswlib(n_cands=500) 1.000 1289.108 hnswlib(n_cands=800) 1.000 856.515 {code} *Results on glove-100-angular* Parameters: M=32, efConstruction=500 {code:java} Approach Recall QPS LuceneVectorsOnly() 1.000 6.722 LuceneHnsw(n_cands=10) 0.507 5036.236 LuceneHnsw(n_cands=50) 0.760 2099.850 LuceneHnsw(n_cands=100) 0.833 1233.690 LuceneHnsw(n_cands=500) 0.941 309.077 LuceneHnsw(n_cands=800) 0.961 203.782 hnswlib(n_cands=10) 0.601 12877.861 hnswlib(n_cands=50) 0.833 4553.943 hnswlib(n_cands=100) 0.896 2618.098 hnswlib(n_cands=500) 0.981 634.301 hnswlib(n_cands=800) 0.991 412.333 {code} I'm going to close out this issue since I think we've learned what we can from digging into these benchmarks. My thoughts on conclusions: * Lucene HNSW gives a great speed-up over the baseline at good recall. * We may be missing important algorithmic details from hnswlib that hurt recall and cause us to perform more computations. * Hot spots: decoding vectors from their on-disk format and performing distance calculations. > ann-benchmarks results for HNSW search > -------------------------------------- > > Key: LUCENE-9937 > URL: https://issues.apache.org/jira/browse/LUCENE-9937 > Project: Lucene - Core > Issue Type: Task > Reporter: Julie Tibshirani > Priority: Minor > > *NOTE: These benchmarks contained an error that made hnswlib perform too > well. See the last comment for correct results.* > I hooked up our HNSW implementation to > [ann-benchmarks|https://github.com/erikbern/ann-benchmarks], a widely used > repo for benchmarking nearest neighbor search libraries against large > datasets. I found the results interesting and opened this issue to share and > discuss. My benchmarking code can be found > [here|https://github.com/jtibshirani/lucene/pull/1] – it's hacky and not easy > to commit but I’m happy to help anyone get set up with it. > Approaches > * LuceneVectorsOnly: a baseline that only indexes vectors, and performs a > brute force scan to determine nearest neighbors > * LuceneHnsw: our HNSW implementation > * [hnswlib|https://github.com/nmslib/hnswlib]: a C++ HNSW implementation > from the author of the paper > Datasets > * sift-128-euclidean: 1 million SIFT feature vectors, dimension 128, > comparing euclidean distance > * glove-100-angular: ~1.2 million GloVe word vectors, dimension 100, > comparing cosine similarity > *Results on sift-128-euclidean* > Parameters: M=16, efConstruction=500 > {code:java} > Approach Recall QPS > LuceneVectorsOnly() 1.000 6.764 > LuceneHnsw(n_cands=10) 0.603 7736.968 > LuceneHnsw(n_cands=50) 0.890 3605.000 > LuceneHnsw(n_cands=100) 0.953 2237.429 > LuceneHnsw(n_cands=500) 0.996 570.900 > LuceneHnsw(n_cands=800) 0.998 379.589 > hnswlib(n_cands=10) 0.713 69662.756 > hnswlib(n_cands=50) 0.950 28021.582 > hnswlib(n_cands=100) 0.985 16108.538 > hnswlib(n_cands=500) 1.000 4115.886 > hnswlib(n_cands=800) 1.000 2729.680 > {code} > *Results on glove-100-angular* > Parameters: M=32, efConstruction=500 > {code:java} > Approach Recall QPS > LuceneVectorsOnly() 1.000 6.722 > LuceneHnsw(n_cands=10) 0.507 5036.236 > LuceneHnsw(n_cands=50) 0.760 2099.850 > LuceneHnsw(n_cands=100) 0.833 1233.690 > LuceneHnsw(n_cands=500) 0.941 309.077 > LuceneHnsw(n_cands=800) 0.961 203.782 > hnswlib(n_cands=10) 0.597 43543.345 > hnswlib(n_cands=50) 0.832 14719.165 > hnswlib(n_cands=100) 0.897 8299.948 > hnswlib(n_cands=500) 0.981 1931.985 > hnswlib(n_cands=800) 0.991 881.752 > {code} > Notes on benchmark: > * By default, the ann-benchmarks repo retrieves 10 nearest neighbors and > computes the recall against the true neighbors. The recall calculation has a > small 'fudge factor' that allows neighbors that are within a small epsilon of > the best distance. Queries are executed serially to obtain the QPS. > * I chose parameters where hnswlib performed well, then passed these same > parameters to Lucene HNSW. For index-time parameters, I set maxConn as M and > beamWidth as efConstruction. For search parameters, I set k to k, and fanout > as (num_cands - k) so that the beam search is of size num_cands. Note that > our default value for beamWidth is 16, which is really low – I wasn’t able to > obtain acceptable recall until I bumped it to closer to 500 to match the > hnswlib default. > * I force-merged to one segment before running searches since this gives the > best recall + QPS, and also to match hnswlib. > Some observations: > * It'd be really nice to extend luceneutil to measure vector search recall > in addition to latency. That would help ensure we’re benchmarking a more > realistic scenario, instead of accidentally indexing/ searching at a very low > recall. Tracking recall would also guard against subtle, unintentional > changes to the algorithm. It's easy to make an accidental change while > refactoring, and with approximate algorithms, unit tests don't guard as well > against this. > * Lucene HNSW gives a great speed-up over the baseline without sacrificing > too much recall. But it doesn't perform as well as hnswlib in terms of both > recall and QPS. We wouldn’t expect the results to line up perfectly, since > Lucene doesn't actually implement HNSW – the current algorithm isn't actually > hierarchical and only uses a single graph layer. Does this difference might > indicate we're leaving performance 'on the table' by not using layers, which > (I don't think) adds much index time or space? Or are there other algorithmic > improvements would help close the gap? > * Setting beamWidth to 500 *really* slowed down indexing. I'll open a > separate issue with indexing speed results, keeping this one focused on > search. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org