Corrected a dumb mistake in the LRU (insertion vs. access order); the results are not that much different, though:
Insertion-order LRU on linked hash map:

Cache size: 1000  Stemming en: average 928.8571428571429, all times = [921, 929, 953, 929, 925, 924, 921]
Cache size: 2000  Stemming en: average 817.8571428571429, all times = [818, 818, 818, 820, 819, 815, 817]
Cache size: 4000  Stemming en: average 706.0, all times = [705, 705, 708, 705, 706, 706, 707]
Cache size: 8000  Stemming en: average 583.0, all times = [583, 584, 582, 584, 582, 585, 581]

Access-order LRU:

Cache size: 1000  Stemming en: average 894.5714285714286, all times = [896, 894, 894, 897, 895, 895, 891]
Cache size: 2000  Stemming en: average 782.0, all times = [782, 783, 783, 784, 781, 781, 780]
Cache size: 4000  Stemming en: average 667.5714285714286, all times = [661, 666, 666, 666, 677, 668, 669]
Cache size: 8000  Stemming en: average 534.2857142857143, all times = [544, 532, 531, 532, 542, 529, 530]
Cache size: 30000 Stemming en: average 323.2857142857143, all times = [317, 330, 326, 325, 319, 325, 321]
Cache size: 50000 Stemming en: average 250.28571428571428, all times = [256, 249, 247, 250, 249, 256, 245]
Cache size: 53000 Stemming en: average 242.42857142857142, all times = [246, 239, 240, 245, 241, 245, 241]
Cache size: 55000 Stemming en: average 44.285714285714285, all times = [44, 45, 44, 44, 44, 45, 44]

There are exactly 53869 unique keys on input, and it's interesting to see how the times drop between a cache that "barely" fits the input and one that fully caches it. I suspect the JIT is just optimizing something away... It's all an intellectual exercise, though. I consider the initial drop in times even with small cache windows a nice, cheap win. Increasing the cache size leads to other problems that may sting later (GC activity, memory consumption across multiple parallel threads, etc.).

D.
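For reference, the insertion-vs-access-order distinction boils down to LinkedHashMap's third constructor argument. A minimal sketch of an access-order LRU cache (the class name `LruCache` is mine, not from the actual code):

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Minimal LRU cache sketch. With accessOrder=true, get() moves an
 * entry to the tail of the internal linked list, so the head is
 * always the least-recently-used entry; with accessOrder=false
 * (the default), eviction follows insertion order instead.
 */
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;

    public LruCache(int maxSize) {
        // Third argument is accessOrder; true enables true LRU eviction.
        super(16, 0.75f, true);
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the eldest (least-recently-used) entry once over capacity.
        return size() > maxSize;
    }
}
```

Passing `false` (or using the two-argument constructor) gives insertion-order eviction, which is the mistake mentioned above: a frequently-read key still gets evicted once it is old enough.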
