[ https://issues.apache.org/jira/browse/LUCENE-9937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Julie Tibshirani updated LUCENE-9937: ------------------------------------- Description: *NOTE: These benchmarks contained an error that made hnswlib perform too well. See the last comment for correct results.* I hooked up our HNSW implementation to [ann-benchmarks|https://github.com/erikbern/ann-benchmarks], a widely used repo for benchmarking nearest neighbor search libraries against large datasets. I found the results interesting and opened this issue to share and discuss. My benchmarking code can be found [here|https://github.com/jtibshirani/lucene/pull/1] – it's hacky and not easy to commit but I’m happy to help anyone get set up with it. Approaches * LuceneVectorsOnly: a baseline that only indexes vectors, and performs a brute force scan to determine nearest neighbors * LuceneHnsw: our HNSW implementation * [hnswlib|https://github.com/nmslib/hnswlib]: a C++ HNSW implementation from the author of the paper Datasets * sift-128-euclidean: 1 million SIFT feature vectors, dimension 128, comparing euclidean distance * glove-100-angular: ~1.2 million GloVe word vectors, dimension 100, comparing cosine similarity *Results on sift-128-euclidean* Parameters: M=16, efConstruction=500 {code:java} Approach Recall QPS LuceneVectorsOnly() 1.000 6.764 LuceneHnsw(n_cands=10) 0.603 7736.968 LuceneHnsw(n_cands=50) 0.890 3605.000 LuceneHnsw(n_cands=100) 0.953 2237.429 LuceneHnsw(n_cands=500) 0.996 570.900 LuceneHnsw(n_cands=800) 0.998 379.589 hnswlib(n_cands=10) 0.713 69662.756 hnswlib(n_cands=50) 0.950 28021.582 hnswlib(n_cands=100) 0.985 16108.538 hnswlib(n_cands=500) 1.000 4115.886 hnswlib(n_cands=800) 1.000 2729.680 {code} *Results on glove-100-angular* Parameters: M=32, efConstruction=500 {code:java} Approach Recall QPS LuceneVectorsOnly() 1.000 6.722 LuceneHnsw(n_cands=10) 0.507 5036.236 LuceneHnsw(n_cands=50) 0.760 2099.850 LuceneHnsw(n_cands=100) 0.833 1233.690 LuceneHnsw(n_cands=500) 0.941 309.077 LuceneHnsw(n_cands=800) 0.961 203.782 hnswlib(n_cands=10) 0.597 43543.345 hnswlib(n_cands=50) 0.832 14719.165 hnswlib(n_cands=100) 0.897 8299.948 hnswlib(n_cands=500) 0.981 1931.985 hnswlib(n_cands=800) 0.991 881.752 {code} Notes on benchmark: * By default, the ann-benchmarks repo retrieves 10 nearest neighbors and computes the recall against the true neighbors. The recall calculation has a small 'fudge factor' that allows neighbors that are within a small epsilon of the best distance. Queries are executed serially to obtain the QPS. * I chose parameters where hnswlib performed well, then passed these same parameters to Lucene HNSW. For index-time parameters, I set maxConn as M and beamWidth as efConstruction. For search parameters, I set k to k, and fanout as (num_cands - k) so that the beam search is of size num_cands. Note that our default value for beamWidth is 16, which is really low – I wasn’t able to obtain acceptable recall until I bumped it to closer to 500 to match the hnswlib default. * I force-merged to one segment before running searches since this gives the best recall + QPS, and also to match hnswlib. Some observations: * It'd be really nice to extend luceneutil to measure vector search recall in addition to latency. That would help ensure we’re benchmarking a more realistic scenario, instead of accidentally indexing/ searching at a very low recall. Tracking recall would also guard against subtle, unintentional changes to the algorithm. It's easy to make an accidental change while refactoring, and with approximate algorithms, unit tests don't guard as well against this. * Lucene HNSW gives a great speed-up over the baseline without sacrificing too much recall. But it doesn't perform as well as hnswlib in terms of both recall and QPS. We wouldn’t expect the results to line up perfectly, since Lucene doesn't actually implement HNSW – the current algorithm isn't actually hierarchical and only uses a single graph layer. Does this difference might indicate we're leaving performance 'on the table' by not using layers, which (I don't think) adds much index time or space? Or are there other algorithmic improvements would help close the gap? * Setting beamWidth to 500 *really* slowed down indexing. I'll open a separate issue with indexing speed results, keeping this one focused on search. was: I hooked up our HNSW implementation to [ann-benchmarks|https://github.com/erikbern/ann-benchmarks], a widely used repo for benchmarking nearest neighbor search libraries against large datasets. I found the results interesting and opened this issue to share and discuss. My benchmarking code can be found [here|https://github.com/jtibshirani/lucene/pull/1] – it's hacky and not easy to commit but I’m happy to help anyone get set up with it. Approaches * LuceneVectorsOnly: a baseline that only indexes vectors, and performs a brute force scan to determine nearest neighbors * LuceneHnsw: our HNSW implementation * [hnswlib|https://github.com/nmslib/hnswlib]: a C++ HNSW implementation from the author of the paper Datasets * sift-128-euclidean: 1 million SIFT feature vectors, dimension 128, comparing euclidean distance * glove-100-angular: ~1.2 million GloVe word vectors, dimension 100, comparing cosine similarity *Results on sift-128-euclidean* Parameters: M=16, efConstruction=500 {code:java} Approach Recall QPS LuceneVectorsOnly() 1.000 6.764 LuceneHnsw(n_cands=10) 0.603 7736.968 LuceneHnsw(n_cands=50) 0.890 3605.000 LuceneHnsw(n_cands=100) 0.953 2237.429 LuceneHnsw(n_cands=500) 0.996 570.900 LuceneHnsw(n_cands=800) 0.998 379.589 hnswlib(n_cands=10) 0.713 69662.756 hnswlib(n_cands=50) 0.950 28021.582 hnswlib(n_cands=100) 0.985 16108.538 hnswlib(n_cands=500) 1.000 4115.886 hnswlib(n_cands=800) 1.000 2729.680 {code} *Results on glove-100-angular* Parameters: M=32, efConstruction=500 {code:java} Approach Recall QPS LuceneVectorsOnly() 1.000 6.722 LuceneHnsw(n_cands=10) 0.507 5036.236 LuceneHnsw(n_cands=50) 0.760 2099.850 LuceneHnsw(n_cands=100) 0.833 1233.690 LuceneHnsw(n_cands=500) 0.941 309.077 LuceneHnsw(n_cands=800) 0.961 203.782 hnswlib(n_cands=10) 0.597 43543.345 hnswlib(n_cands=50) 0.832 14719.165 hnswlib(n_cands=100) 0.897 8299.948 hnswlib(n_cands=500) 0.981 1931.985 hnswlib(n_cands=800) 0.991 881.752 {code} Notes on benchmark: * By default, the ann-benchmarks repo retrieves 10 nearest neighbors and computes the recall against the true neighbors. The recall calculation has a small 'fudge factor' that allows neighbors that are within a small epsilon of the best distance. Queries are executed serially to obtain the QPS. * I chose parameters where hnswlib performed well, then passed these same parameters to Lucene HNSW. For index-time parameters, I set maxConn as M and beamWidth as efConstruction. For search parameters, I set k to k, and fanout as (num_cands - k) so that the beam search is of size num_cands. Note that our default value for beamWidth is 16, which is really low – I wasn’t able to obtain acceptable recall until I bumped it to closer to 500 to match the hnswlib default. * I force-merged to one segment before running searches since this gives the best recall + QPS, and also to match hnswlib. Some observations: * It'd be really nice to extend luceneutil to measure vector search recall in addition to latency. That would help ensure we’re benchmarking a more realistic scenario, instead of accidentally indexing/ searching at a very low recall. Tracking recall would also guard against subtle, unintentional changes to the algorithm. It's easy to make an accidental change while refactoring, and with approximate algorithms, unit tests don't guard as well against this. * Lucene HNSW gives a great speed-up over the baseline without sacrificing too much recall. But it doesn't perform as well as hnswlib in terms of both recall and QPS. We wouldn’t expect the results to line up perfectly, since Lucene doesn't actually implement HNSW – the current algorithm isn't actually hierarchical and only uses a single graph layer. Does this difference might indicate we're leaving performance 'on the table' by not using layers, which (I don't think) adds much index time or space? Or are there other algorithmic improvements would help close the gap? * Setting beamWidth to 500 *really* slowed down indexing. I'll open a separate issue with indexing speed results, keeping this one focused on search. > ann-benchmarks results for HNSW search > -------------------------------------- > > Key: LUCENE-9937 > URL: https://issues.apache.org/jira/browse/LUCENE-9937 > Project: Lucene - Core > Issue Type: Task > Reporter: Julie Tibshirani > Priority: Minor > > *NOTE: These benchmarks contained an error that made hnswlib perform too > well. See the last comment for correct results.* > I hooked up our HNSW implementation to > [ann-benchmarks|https://github.com/erikbern/ann-benchmarks], a widely used > repo for benchmarking nearest neighbor search libraries against large > datasets. I found the results interesting and opened this issue to share and > discuss. My benchmarking code can be found > [here|https://github.com/jtibshirani/lucene/pull/1] – it's hacky and not easy > to commit but I’m happy to help anyone get set up with it. > Approaches > * LuceneVectorsOnly: a baseline that only indexes vectors, and performs a > brute force scan to determine nearest neighbors > * LuceneHnsw: our HNSW implementation > * [hnswlib|https://github.com/nmslib/hnswlib]: a C++ HNSW implementation > from the author of the paper > Datasets > * sift-128-euclidean: 1 million SIFT feature vectors, dimension 128, > comparing euclidean distance > * glove-100-angular: ~1.2 million GloVe word vectors, dimension 100, > comparing cosine similarity > *Results on sift-128-euclidean* > Parameters: M=16, efConstruction=500 > {code:java} > Approach Recall QPS > LuceneVectorsOnly() 1.000 6.764 > LuceneHnsw(n_cands=10) 0.603 7736.968 > LuceneHnsw(n_cands=50) 0.890 3605.000 > LuceneHnsw(n_cands=100) 0.953 2237.429 > LuceneHnsw(n_cands=500) 0.996 570.900 > LuceneHnsw(n_cands=800) 0.998 379.589 > hnswlib(n_cands=10) 0.713 69662.756 > hnswlib(n_cands=50) 0.950 28021.582 > hnswlib(n_cands=100) 0.985 16108.538 > hnswlib(n_cands=500) 1.000 4115.886 > hnswlib(n_cands=800) 1.000 2729.680 > {code} > *Results on glove-100-angular* > Parameters: M=32, efConstruction=500 > {code:java} > Approach Recall QPS > LuceneVectorsOnly() 1.000 6.722 > LuceneHnsw(n_cands=10) 0.507 5036.236 > LuceneHnsw(n_cands=50) 0.760 2099.850 > LuceneHnsw(n_cands=100) 0.833 1233.690 > LuceneHnsw(n_cands=500) 0.941 309.077 > LuceneHnsw(n_cands=800) 0.961 203.782 > hnswlib(n_cands=10) 0.597 43543.345 > hnswlib(n_cands=50) 0.832 14719.165 > hnswlib(n_cands=100) 0.897 8299.948 > hnswlib(n_cands=500) 0.981 1931.985 > hnswlib(n_cands=800) 0.991 881.752 > {code} > Notes on benchmark: > * By default, the ann-benchmarks repo retrieves 10 nearest neighbors and > computes the recall against the true neighbors. The recall calculation has a > small 'fudge factor' that allows neighbors that are within a small epsilon of > the best distance. Queries are executed serially to obtain the QPS. > * I chose parameters where hnswlib performed well, then passed these same > parameters to Lucene HNSW. For index-time parameters, I set maxConn as M and > beamWidth as efConstruction. For search parameters, I set k to k, and fanout > as (num_cands - k) so that the beam search is of size num_cands. Note that > our default value for beamWidth is 16, which is really low – I wasn’t able to > obtain acceptable recall until I bumped it to closer to 500 to match the > hnswlib default. > * I force-merged to one segment before running searches since this gives the > best recall + QPS, and also to match hnswlib. > Some observations: > * It'd be really nice to extend luceneutil to measure vector search recall > in addition to latency. That would help ensure we’re benchmarking a more > realistic scenario, instead of accidentally indexing/ searching at a very low > recall. Tracking recall would also guard against subtle, unintentional > changes to the algorithm. It's easy to make an accidental change while > refactoring, and with approximate algorithms, unit tests don't guard as well > against this. > * Lucene HNSW gives a great speed-up over the baseline without sacrificing > too much recall. But it doesn't perform as well as hnswlib in terms of both > recall and QPS. We wouldn’t expect the results to line up perfectly, since > Lucene doesn't actually implement HNSW – the current algorithm isn't actually > hierarchical and only uses a single graph layer. Does this difference might > indicate we're leaving performance 'on the table' by not using layers, which > (I don't think) adds much index time or space? Or are there other algorithmic > improvements would help close the gap? > * Setting beamWidth to 500 *really* slowed down indexing. I'll open a > separate issue with indexing speed results, keeping this one focused on > search. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org