msokolov commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1561247988
I ran luceneutil with GloVe 300-dim floating-point (fp32) vectors over 1M
Wikipedia documents:
```
             Task  QPS baseline  StdDev  QPS candidate  StdDev             Pct diff  p-value
         PKLookup        196.01  (3.8%)         192.14  (3.8%)  -2.0% ( -9% -   5%)    0.099
    LowTermVector        213.57  (7.2%)         252.31  (3.6%)  18.1% (  6% -  31%)    0.000
 AndHighLowVector        185.28  (6.8%)         221.08  (3.5%)  19.3% (  8% -  31%)    0.000
 AndHighMedVector        125.91  (5.7%)         152.52  (2.5%)  21.1% ( 12% -  31%)    0.000
   HighTermVector        171.95  (7.3%)         208.94  (3.3%)  21.5% ( 10% -  34%)    0.000
AndHighHighVector        123.87  (5.0%)         151.81  (2.9%)  22.6% ( 14% -  32%)    0.000
    MedTermVector        119.07  (7.5%)         148.07  (2.8%)  24.4% ( 13% -  37%)    0.000
```
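As a reader's sanity check (this is not luceneutil output), the Pct diff column is just the relative QPS change computed from the baseline and candidate columns; for example, for the MedTermVector row above:

```python
# Sanity check of the "Pct diff" column: it is the relative QPS change,
# (candidate - baseline) / baseline * 100, shown here for the
# MedTermVector row of the table above.
baseline_qps = 119.07   # QPS baseline column
candidate_qps = 148.07  # QPS candidate column

pct_diff = (candidate_qps - baseline_qps) / baseline_qps * 100
print(f"MedTermVector: {pct_diff:+.1f}%")  # +24.4%, matching the table
```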
And with GloVe 100-dim 8-bit vectors:
```
             Task  QPS baseline   StdDev  QPS candidate   StdDev             Pct diff  p-value
         PKLookup        190.59   (7.4%)         193.25   (5.1%)   1.4% (-10% -  14%)    0.486
    LowTermVector        291.71  (24.0%)         341.91  (14.3%)  17.2% (-17% -  73%)    0.006
 AndHighMedVector        230.40  (22.6%)         274.26  (13.0%)  19.0% (-13% -  70%)    0.001
    MedTermVector        245.36  (22.7%)         292.35  (11.9%)  19.2% (-12% -  69%)    0.001
   HighTermVector        296.45  (25.6%)         357.02   (9.8%)  20.4% (-11% -  75%)    0.001
 AndHighLowVector        252.70  (23.2%)         308.05  (13.7%)  21.9% (-12% -  76%)    0.000
AndHighHighVector        150.54  (21.0%)         185.45  (13.4%)  23.2% ( -9% -  72%)    0.00
```
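For readers unfamiliar with the 8-bit setup: the idea is to store each fp32 vector component as a scaled int8. A minimal pure-Python sketch of symmetric scalar quantization (an illustration of the general idea only, not Lucene's or luceneutil's actual quantization code):

```python
# Illustrative sketch of symmetric scalar quantization: map each fp32
# component into [-127, 127] using a per-vector scale, so a vector can
# be stored in one byte per dimension. NOT Lucene's actual code.
def quantize_int8(v):
    scale = max(abs(x) for x in v) / 127.0  # largest component -> +/-127
    q = [max(-127, min(127, round(x / scale))) for x in v]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

v = [0.25, -1.5, 0.75, 3.0]
q, scale = quantize_int8(v)
err = max(abs(a - b) for a, b in zip(dequantize(q, scale), v))
print(q, scale, err)  # rounding error stays within one quantization step
```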
I also tried getting some vectors using a different model that produces
384-dim fp32 vectors (`all-MiniLM-L6-v2` from
https://www.sbert.net/docs/pretrained_models.html). The methodology here is a
bit suspect because we compute embedding vectors per word and then sum them
over larger docs, whereas these models are really designed to be run on larger
passages so they can make use of word context. Still, I think the performance
measurements should be valid.
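The per-word-sum approach described above can be sketched roughly as follows (`word_vectors` is a hypothetical stand-in for a word-embedding lookup table; this is plain illustrative Python, not the actual indexing code):

```python
# Rough sketch of the document-embedding approach described above:
# look up a vector per word, sum over the doc, then L2-normalize.
# `word_vectors` is a hypothetical stand-in for a real embedding table.
import math

word_vectors = {
    "apache": [0.1, 0.3, -0.2],
    "lucene": [0.4, -0.1, 0.2],
    "search": [0.0, 0.2, 0.5],
}

def embed_doc(text: str) -> list[float]:
    dim = len(next(iter(word_vectors.values())))
    acc = [0.0] * dim
    for word in text.lower().split():
        for i, x in enumerate(word_vectors.get(word, [0.0] * dim)):
            acc[i] += x
    norm = math.sqrt(sum(x * x for x in acc)) or 1.0  # avoid divide-by-zero
    return [x / norm for x in acc]

doc_vec = embed_doc("Apache Lucene search")
print(len(doc_vec))  # one fixed-dimension vector per document
```

This is also why the approach loses word context: the sum is order-independent, which is what the passage-level models are designed to avoid.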
```
             Task  QPS baseline   StdDev  QPS candidate   StdDev             Pct diff  p-value
         PKLookup        173.59   (8.5%)         176.41   (5.7%)   1.6% (-11% -  17%)    0.477
AndHighHighVector        309.15  (26.1%)         346.54  (18.1%)  12.1% (-25% -  76%)    0.089
    LowTermVector        305.52  (26.4%)         343.83  (15.9%)  12.5% (-23% -  74%)    0.069
    MedTermVector        312.58  (26.6%)         352.51  (18.5%)  12.8% (-25% -  78%)    0.078
   HighTermVector        300.84  (30.4%)         345.35  (18.8%)  14.8% (-26% -  92%)    0.064
 AndHighMedVector        303.15  (27.8%)         349.09  (18.2%)  15.2% (-24% -  84%)    0.041
 AndHighLowVector        233.11  (21.9%)         285.00  (12.5%)  22.3% ( -9% -  72%)    0.000
```
I was surprised this showed less improvement than the smaller vectors, but
there is a lot of noise in these benchmarks; I see the results vary quite a bit
from run to run (even averaging over 20 JVMs). I'm currently training up some
768-dim vectors using `all-mpnet-base-v`, and I'll see if I can get
measurements from KnnGraphTester, which should be more focused. These tests
were run at commit 609fc9b63f61954a7408faa1669e807a6bbf1da9, so maybe a few
commits back.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]