jpountz commented on PR #12489:
URL: https://github.com/apache/lucene/pull/12489#issuecomment-1712779097
Wikibigall results. Less space is spent on doc values this time, since I did not enable
indexing of facets. There is a more significant size reduction of postings this
time (-10.5%). This is consistent with the reproducibility paper, which
observed size reductions of 18% with partitioned Elias-Fano and 5% with
SVByte on the Wikipedia dataset. I would expect PFor to land somewhere in between:
it is better able to take advantage of small gaps between docs than SVByte,
but less so than partitioned Elias-Fano.
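Here is a rough size model for a single 128-doc block that makes the "small gaps" argument concrete. It compares a plain byte-aligned varint with a simplified patched frame-of-reference (PFor) encoder. The varint is only a stand-in for byte-oriented codecs such as SVByte, whose real layout differs. This is a sketch under stated assumptions. The helper names `to_gaps`, `varint_bytes` and `pfor_bytes` are hypothetical. The 40-bit cost per exception and the 1-byte header are made-up parameters. None of this reflects how Lucene's PFor code actually lays out bytes.

```python
import math

# Hypothetical per-exception cost: 1 byte for the position plus 32 bits for the value.
EXCEPTION_BITS = 8 + 32


def to_gaps(doc_ids):
    """Delta-encode sorted doc IDs; the first gap is relative to 0."""
    gaps, prev = [], 0
    for doc in doc_ids:
        gaps.append(doc - prev)
        prev = doc
    return gaps


def varint_bytes(gaps):
    """Size of a byte-aligned varint: 7 payload bits per byte, at least 1 byte per gap."""
    return sum(max(1, math.ceil(g.bit_length() / 7)) for g in gaps)


def pfor_bytes(gaps):
    """Size of a simplified patched frame-of-reference block (not Lucene's format).

    Every gap is bit-packed at a common width b; gaps needing more than b bits
    are stored separately as exceptions. The b minimizing the total is chosen,
    and 1 hypothetical header byte records it.
    """
    best_bits = None
    for b in range(max(g.bit_length() for g in gaps) + 1):
        exceptions = sum(1 for g in gaps if g.bit_length() > b)
        bits = len(gaps) * b + exceptions * EXCEPTION_BITS
        if best_bits is None or bits < best_bits:
            best_bits = bits
    return 1 + math.ceil(best_bits / 8)


if __name__ == "__main__":
    dense = list(range(1, 257, 2))                         # 128 docs: gap 1, then gaps of 2
    clustered = [i * 3 for i in range(127)] + [1_000_000]  # gaps of 3 plus one huge outlier
    for name, docs in [("dense", dense), ("clustered", clustered)]:
        g = to_gaps(docs)
        print(name, varint_bytes(g), pfor_bytes(g))
    # dense 128 33
    # clustered 130 38
```

When every gap is tiny, the varint still pays a full byte per doc (128 bytes), while the bit-packed block needs 2 bits per doc plus its header (33 bytes). A single large gap becomes one exception instead of widening every slot in the block. That is the intuition behind PFor beating byte-oriented codecs on dense postings.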
| File | before (MB) | after (MB) |
| --- | ---: | ---: |
| terms (tim) | 767 | 766 |
| postings (doc) | 2779 | 2489 |
| positions (pos) | 11356 | 10569 |
| points (kdd) | 100 | 99 |
| doc values (dvd) | 456 | 461 |
| stored fields (fdt) | 249 | 257 |
| norms (nvd) | 13 | 13 |
| total | 15734 | 14669 |
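The per-file relative changes can be recomputed from the table with the short sketch below. `SIZES_MB` and `pct_change` are illustrative names, not part of any tool. The inputs are rounded to the MB, so postings come out at -10.4% rather than the quoted -10.5%. The quoted figure was presumably computed from exact file sizes.

```python
# Sizes in MB copied from the table: (before, after). Rounded inputs, so
# percentages may differ slightly from ones computed on exact byte counts.
SIZES_MB = {
    "terms (tim)": (767, 766),
    "postings (doc)": (2779, 2489),
    "positions (pos)": (11356, 10569),
    "points (kdd)": (100, 99),
    "doc values (dvd)": (456, 461),
    "stored fields (fdt)": (249, 257),
    "norms (nvd)": (13, 13),
    "total": (15734, 14669),
}


def pct_change(before, after):
    """Relative change in percent; negative means the file got smaller."""
    return 100.0 * (after - before) / before


for name, (before, after) in SIZES_MB.items():
    print(f"{name:<20} {pct_change(before, after):+6.1f}%")
```

With these inputs, positions shrink by about 6.9% and the whole index by about 6.8%. Doc values and stored fields grow slightly (about +1.1% and +3.2%).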
Benchmarks still show slowdowns on phrase queries (and on some term queries such as MedTerm and LowTerm)
and speedups on conjunctions, though the differences are less spectacular than on wikimedium10m.
```
Task                   QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff                p-value
MedTerm                      652.41   (7.5%)                   493.97   (2.6%)  -24.3% ( -31% -  -15%)  0.000
HighPhrase                    30.86   (3.5%)                    23.85   (2.6%)  -22.7% ( -27% -  -17%)  0.000
LowPhrase                     51.09   (3.1%)                    42.38   (2.2%)  -17.1% ( -21% -  -12%)  0.000
LowTerm                     1057.76   (5.4%)                   881.22   (2.5%)  -16.7% ( -23% -   -9%)  0.000
MedPhrase                     82.18   (3.0%)                    71.88   (1.7%)  -12.5% ( -16% -   -8%)  0.000
HighTermMonthSort           6482.52   (4.5%)                  5739.50   (3.5%)  -11.5% ( -18% -   -3%)  0.000
PKLookup                     293.95   (3.2%)                   276.15   (3.7%)   -6.1% ( -12% -    0%)  0.000
MedSloppyPhrase                8.68   (2.7%)                     8.20   (2.9%)   -5.5% ( -10% -    0%)  0.000
OrHighLow                    578.06   (4.4%)                   550.49   (4.0%)   -4.8% ( -12% -    3%)  0.016
HighSloppyPhrase               7.43   (2.2%)                     7.10   (4.0%)   -4.4% ( -10% -    1%)  0.003
Fuzzy1                       244.70   (2.9%)                   238.49   (3.3%)   -2.5% (  -8% -    3%)  0.080
OrHighHigh                    39.76   (9.5%)                    39.21   (6.1%)   -1.4% ( -15% -   15%)  0.717
HighTerm                     370.57   (8.5%)                   367.09   (4.4%)   -0.9% ( -12% -   13%)  0.768
LowSloppyPhrase               13.68   (2.3%)                    13.71   (3.3%)    0.2% (  -5% -    5%)  0.868
Respell                      204.23   (1.8%)                   204.98   (2.0%)    0.4% (  -3% -    4%)  0.679
Prefix3                      225.23   (5.1%)                   226.74   (5.5%)    0.7% (  -9% -   11%)  0.786
Wildcard                     170.34   (4.0%)                   171.63   (3.4%)    0.8% (  -6% -    8%)  0.665
IntNRQ                        92.30  (11.9%)                    95.15  (10.2%)    3.1% ( -17% -   28%)  0.555
MedSpanNear                    5.79   (6.8%)                     5.99   (9.3%)    3.4% ( -11% -   20%)  0.378
OrHighMed                    104.41   (7.3%)                   107.99   (5.3%)    3.4% (  -8% -   17%)  0.253
HighSpanNear                   2.47   (4.2%)                     2.56   (4.1%)    3.7% (  -4% -   12%)  0.059
Fuzzy2                       139.96   (2.8%)                   146.77   (2.6%)    4.9% (   0% -   10%)  0.000
LowSpanNear                   42.96   (3.6%)                    45.21   (2.5%)    5.2% (   0% -   11%)  0.000
AndHighHigh                   33.24   (6.2%)                    36.20   (4.3%)    8.9% (  -1% -   20%)  0.000
AndHighMed                   131.84   (5.2%)                   144.31   (3.2%)    9.5% (   0% -   18%)  0.000
HighTermDayOfYearSort        186.67   (2.9%)                   208.78   (3.2%)   11.8% (   5% -   18%)  0.000
AndHighLow                   590.69   (3.2%)                   677.22   (2.2%)   14.6% (   9% -   20%)  0.000
```
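The sketch below checks the "slowdowns on phrases, speedups on conjunctions" reading of the benchmark. It keeps only rows whose p-value falls under a threshold and then averages one query family. The 0.05 threshold and the helper names `significant_changes` and `family_mean` are assumptions for illustration; the benchmark tool does not define them. The rows are transcribed from the output above. Values printed as 0.000 are rounded, so they are treated as below the threshold.

```python
# (task, pct diff, p-value) transcribed from the benchmark output.
RESULTS = [
    ("MedTerm", -24.3, 0.000), ("HighPhrase", -22.7, 0.000),
    ("LowPhrase", -17.1, 0.000), ("LowTerm", -16.7, 0.000),
    ("MedPhrase", -12.5, 0.000), ("HighTermMonthSort", -11.5, 0.000),
    ("PKLookup", -6.1, 0.000), ("MedSloppyPhrase", -5.5, 0.000),
    ("OrHighLow", -4.8, 0.016), ("HighSloppyPhrase", -4.4, 0.003),
    ("Fuzzy1", -2.5, 0.080), ("OrHighHigh", -1.4, 0.717),
    ("HighTerm", -0.9, 0.768), ("LowSloppyPhrase", 0.2, 0.868),
    ("Respell", 0.4, 0.679), ("Prefix3", 0.7, 0.786),
    ("Wildcard", 0.8, 0.665), ("IntNRQ", 3.1, 0.555),
    ("MedSpanNear", 3.4, 0.378), ("OrHighMed", 3.4, 0.253),
    ("HighSpanNear", 3.7, 0.059), ("Fuzzy2", 4.9, 0.000),
    ("LowSpanNear", 5.2, 0.000), ("AndHighHigh", 8.9, 0.000),
    ("AndHighMed", 9.5, 0.000), ("HighTermDayOfYearSort", 11.8, 0.000),
    ("AndHighLow", 14.6, 0.000),
]

ALPHA = 0.05  # assumed significance threshold, chosen for illustration


def significant_changes(results, alpha=ALPHA):
    """Split results with p-value below alpha into slowdowns and speedups."""
    slower = [(t, d) for t, d, p in results if p < alpha and d < 0]
    faster = [(t, d) for t, d, p in results if p < alpha and d > 0]
    return slower, faster


def family_mean(results, predicate):
    """Mean pct diff over the tasks whose name satisfies `predicate`."""
    diffs = [d for t, d, _ in results if predicate(t)]
    return sum(diffs) / len(diffs)


slower, faster = significant_changes(RESULTS)
print("slower:", [t for t, _ in slower])
print("faster:", [t for t, _ in faster])
print(f"AndHigh* mean: {family_mean(RESULTS, lambda t: t.startswith('AndHigh')):+.1f}%")
```

With this threshold, the significant slowdowns include all three exact-phrase tasks plus MedTerm and LowTerm. The significant speedups include all three AndHigh* conjunctions, which average about +11%.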