Corrected a dumb mistake in the LRU (insertion vs. access order); the results are not that much different, though:
Insertion-order LRU on linked hash map:

Cache size: 1000  Stemming en: average 928.8571428571429, all times = [921, 929, 953, 929, 925, 924, 921]
Cache size: 2000  Stemming en: average 817.8571428571429, all times = [818, 818, 818, 820, 819, 815, 817]
Cache size: 4000  Stemming en: average 706.0, all times = [705, 705, 708, 705, 706, 706, 707]
Cache size: 8000  Stemming en: average 583.0, all times = [583, 584, 582, 584, 582, 585, 581]

Access-order LRU:

Cache size: 1000  Stemming en: average 894.5714285714286, all times = [896, 894, 894, 897, 895, 895, 891]
Cache size: 2000  Stemming en: average 782.0, all times = [782, 783, 783, 784, 781, 781, 780]
Cache size: 4000  Stemming en: average 667.5714285714286, all times = [661, 666, 666, 666, 677, 668, 669]
Cache size: 8000  Stemming en: average 534.2857142857143, all times = [544, 532, 531, 532, 542, 529, 530]
Cache size: 30000 Stemming en: average 323.2857142857143, all times = [317, 330, 326, 325, 319, 325, 321]
Cache size: 50000 Stemming en: average 250.28571428571428, all times = [256, 249, 247, 250, 249, 256, 245]
Cache size: 53000 Stemming en: average 242.42857142857142, all times = [246, 239, 240, 245, 241, 245, 241]
Cache size: 55000 Stemming en: average 44.285714285714285, all times = [44, 45, 44, 44, 44, 45, 44]

There are exactly 53869 unique keys on input, and it's interesting to see how the times drop between a cache that "barely" fits the input and one that fully caches it. I suspect the JIT is just optimizing something away... It's all an intellectual exercise, though. I consider the initial drop in times even with small cache windows a nice, cheap win. Increasing the cache size leads to other problems that may sting later (GC activity, memory consumption across multiple parallel threads, etc.).

D.
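For reference, the insertion-vs-access-order distinction boils down to LinkedHashMap's third constructor argument. A minimal sketch of an access-order LRU cache (the class name `LruCache` is mine, not from the actual code):

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Minimal LRU cache sketch. With accessOrder=true, get() moves an
 * entry to the tail of the internal linked list, so the head is
 * always the least-recently-used entry; with accessOrder=false
 * (the default), eviction follows insertion order instead.
 */
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;

    public LruCache(int maxSize) {
        // Third argument is accessOrder; true enables true LRU eviction.
        super(16, 0.75f, true);
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the eldest (least-recently-used) entry once over capacity.
        return size() > maxSize;
    }
}
```

Passing `false` (or using the two-argument constructor) gives insertion-order eviction, which is the mistake mentioned above: a frequently-read key still gets evicted once it is old enough.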
