I think the benchmark results will depend on the properties of the indexed terms. For English Wikipedia (luceneutil), the average word length is around 5 bytes, so this optimization may not do much.
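
To make that concrete, here is a rough sketch of how many main-loop iterations a block-wise hash performs for a given term length. It is not the code from the patch: the class and method names are hypothetical, and it assumes that whatever does not fill a whole block is handled in a separate tail step.

```java
// Hypothetical illustration only; not taken from Lucene or from the patch.
class HashLoopSketch {
    // Main-loop iterations needed for a term of `length` bytes when the
    // loop consumes `blockSize` bytes per iteration.
    static int fullBlocks(int length, int blockSize) {
        return length / blockSize;
    }

    // Bytes left for the tail step after the main loop (assumed design).
    static int tailBytes(int length, int blockSize) {
        return length % blockSize;
    }

    public static void main(String[] args) {
        // 5 is roughly the average word length mentioned above; the longer
        // lengths are arbitrary examples for comparison.
        int[] termLengths = {5, 16, 64};
        for (int len : termLengths) {
            System.out.printf(
                "len=%d: 4-byte loop %d iterations + %d tail bytes, "
                    + "8-byte loop %d iterations + %d tail bytes%n",
                len,
                fullBlocks(len, 4), tailBytes(len, 4),
                fullBlocks(len, 8), tailBytes(len, 8));
        }
    }
}
```

For a 5-byte term the 4-byte loop runs once with 1 tail byte, while the 8-byte loop does not run at all and all 5 bytes go through the tail step. Under these assumptions, any gain from the wider loop would only show up on corpora with longer terms.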

On Tue, Apr 25, 2023 at 1:58 AM Patrick Zhai <zhai7...@gmail.com> wrote:
>
> I did a quick run with your patch, but since I turned on the CMS as well as
> TieredMergePolicy I'm not sure how fair the comparison is. Here's the result:
> Candidate:
> Indexer: indexing done (890209 msec); total 33332620 docs
> Indexer: waitForMerges done (71622 msec)
> Indexer: finished (961877 msec)
> Baseline:
> Indexer: indexing done (909706 msec); total 33332620 docs
> Indexer: waitForMerges done (54775 msec)
> Indexer: finished (964528 msec)
>
> For more accurate comparison I guess it's better to use LogxxMergePolicy and
> turn off CMS? If you want to run it yourself you can find the lines I quoted
> from the log file.
>
> Patrick
>
> On Mon, Apr 24, 2023 at 12:34 PM Thomas Dullien
> <thomas.dull...@elastic.co.invalid> wrote:
>>
>> Hey all,
>>
>> I've been experimenting with fixing some low-hanging performance fruit in
>> the ElasticSearch codebase, and came across the fact that the MurmurHash
>> implementation that is used by ByteRef.hashCode() is reading 4 bytes per
>> loop iteration (which is likely an artifact from 32-bit architectures, which
>> are ever-less-important). I made a small fix to change the implementation to
>> read 8 bytes per loop iteration; I expected a very small impact (2-3% CPU or
>> so over an indexing run in ElasticSearch), but got a pretty nontrivial
>> throughput improvement over a few indexing benchmarks.
>>
>> I tried running Lucene-only benchmarks, and succeeded in running the example
>> from https://github.com/mikemccand/luceneutil - but I couldn't figure out
>> how to run indexing benchmarks and how to interpret the results.
>>
>> Could someone help me in running the benchmarks for the attached patch?
>>
>> Cheers,
>> Thomas
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org