In fact I see a pronounced effect even with the smallish (10k) index! I should
also correct my earlier statement about FST50: that test was flawed because I
was confused about how these benchmarks work and had updated nightlyBench.py
rather than my localrun.py. After correcting that and comparing FST50 with
Memory, I see that FST50 does recover the lost perf in this benchmark; in
fact, across three runs it looks like a consistent improvement over Memory,
although these results are quite noisy, so that may not hold up.
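
For the record, the corrected localrun.py setup amounts to something like the
sketch below. I'm writing this from memory, so treat it as a sketch rather
than a copy-paste recipe: I'm assuming comp.newIndex accepts the same
idFieldPostingsFormat keyword that nightlyBench.py passes, and the
checkout/competitor names are just placeholders.

import sys
sys.path.append('src/python')
import competition

if __name__ == '__main__':
    # Assuming sourceData accepts the source name directly:
    sourceData = competition.sourceData('wikimedium10k')
    comp = competition.Competition()

    # Two indexes from the same checkout, differing only in the postings
    # format used for the primary-key ("id") field (idFieldPostingsFormat is
    # the keyword nightlyBench.py passes; I'm assuming newIndex forwards it):
    memIndex = comp.newIndex('trunk', sourceData, idFieldPostingsFormat='Memory')
    fstIndex = comp.newIndex('trunk', sourceData, idFieldPostingsFormat='FST50')

    comp.competitor('memory', 'trunk', index=memIndex)
    comp.competitor('fst50', 'trunk', index=fstIndex)
    comp.benchmark('memory_vs_fst50')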

Maybe we ought to update nightlyBench.py to use the FST50 postings format for
this test? I'm not sure what the test is trying to demonstrate, though: would
that be a "fair" comparison? At least it would be more faithful to the
original version of the chart. Also, please let me know if these benchmarking
discussions belong elsewhere; I see that luceneutil is not really part of the
Apache project per se, but I doubt it has its own mailing list :)

On Fri, Aug 24, 2018 at 3:17 AM Adrien Grand <jpou...@gmail.com> wrote:

> I don't think you need an index that is so large that the terms dictionary
> doesn't fit in the OS cache to reproduce the difference, but you might need
> a larger index indeed. On my end I use wikimedium10M or wikimediumall (and
> wikibigall if I need to test phrases) most of the time as I get more noise
> with smaller indices. I added an annotation; it should be picked up the
> next time the benchmarks run.
>
> I also pushed a change to take into account the fact that the default
> codec changed. However, I did not add backward-codecs.jar to the classpath,
> so you should rebuild the index that you use for benchmarking so that it
> uses the Lucene80 codec instead of Lucene70.
>
> On Fri, Aug 24, 2018 at 2:03 AM Michael Sokolov <msoko...@gmail.com>
> wrote:
>
>> I think the benchmarks need updating after LUCENE-8461. I got them
>> working again by replacing lucene70 with lucene80 everywhere except for the
>> DocValues formats, and adding the backward-codecs.jar to the benchmarks
>> build. I'm not sure that was really the right way to go about it, though.
>> After that I tried switching to FST50 for this PKLookup benchmark (see
>> below), but it did not recover the lost perf.
>>
>> diff --git a/src/python/nightlyBench.py b/src/python/nightlyBench.py
>> index b42fe84..5807e49 100644
>> --- a/src/python/nightlyBench.py
>> +++ b/src/python/nightlyBench.py
>> @@ -699,7 +699,7 @@ def run():
>> -                                  idFieldPostingsFormat='Lucene50',
>> +                                  idFieldPostingsFormat='FST50',
>>
>>
>> On Thu, Aug 23, 2018 at 5:52 PM Michael Sokolov <msoko...@gmail.com>
>> wrote:
>>
>>> OK, thanks. I guess this benchmark must be run on an index large enough
>>> that it doesn't already fit entirely in RAM anyway? When I ran it locally
>>> using the vanilla benchmark instructions, I believe the generated index was
>>> quite small (wikimedium10k). At any rate, I don't have any specific use
>>> case yet; I was just thinking about some possibilities related to primary
>>> key lookup and came across this anomaly. Perhaps at least it deserves an
>>> annotation on the benchmark graph.
>>>
>>
