GC improvement when skipping bytes in DataInput

Greg Miller Wed, 17 Feb 2021 13:15:22 -0800

Hi folks-

I work on a Lucene-based search system and we recently added Java Flight
Recorder to our benchmark tooling. When looking through results, we found
DataInput#skipBytes() to be a top contributor to garbage creation. We're
using Lucene84SkipReader and always skipping over Impacts in our use-case.
At first glance, it appeared pretty obvious that creating new instances of
the skipBuffer byte[] for each instance of DataInput was the culprit.


It looks like alternatives were discussed originally in LUCENE-5583
<https://issues.apache.org/jira/browse/LUCENE-5583>, one of which was a
thread-local implementation of the skip buffer (since it can't be a static
field without breaking delegating subclasses, like ChecksumIndexInput). At
the time, Uwe advised against a thread-local
<https://issues.apache.org/jira/browse/LUCENE-5583?focusedCommentId=13964258&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-13964258>
due to GC expense, but in our benchmarks, bringing in a thread-local
implementation reduced overall GC time by ~7%.
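
To make the idea concrete, here is a minimal sketch of a thread-local skip
buffer. It is not Lucene's actual code and not the patch we benchmarked: the
class names (SketchDataInput, ByteArraySketchInput), the 1024-byte buffer size,
and the unchecked exceptions are all assumptions made to keep the example
small and runnable.

```java
/** Hypothetical stand-in for a DataInput-style base class; not Lucene's actual code. */
abstract class SketchDataInput {
  // Assumed size, for illustration only.
  private static final int SKIP_BUFFER_SIZE = 1024;

  // One scratch buffer per thread, created lazily and shared by every
  // instance used on that thread, instead of one buffer per instance.
  private static final ThreadLocal<byte[]> SKIP_BUFFER =
      ThreadLocal.withInitial(() -> new byte[SKIP_BUFFER_SIZE]);

  /** Reads exactly len bytes into b, starting at offset. */
  abstract void readBytes(byte[] b, int offset, int len);

  /** Skips numBytes by reading them into the per-thread scratch buffer. */
  void skipBytes(long numBytes) {
    if (numBytes < 0) {
      throw new IllegalArgumentException("numBytes must be >= 0, got " + numBytes);
    }
    byte[] buf = SKIP_BUFFER.get();
    long skipped = 0;
    while (skipped < numBytes) {
      // Read at most one buffer's worth per iteration.
      int step = (int) Math.min(buf.length, numBytes - skipped);
      readBytes(buf, 0, step);
      skipped += step;
    }
  }
}

/** Toy concrete input over a byte array, only here to make the sketch runnable. */
final class ByteArraySketchInput extends SketchDataInput {
  private final byte[] data;
  private int pos;

  ByteArraySketchInput(byte[] data) {
    this.data = data;
  }

  int position() {
    return pos;
  }

  @Override
  void readBytes(byte[] b, int offset, int len) {
    if (len > data.length - pos) {
      throw new IllegalStateException("read past end of input");
    }
    System.arraycopy(data, pos, b, offset, len);
    pos += len;
  }
}
```

The skipped bytes still pass through readBytes(), so a delegating subclass
keeps seeing them, and because the buffer is per-thread rather than static,
two threads never write into the same array.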

I'd like to revisit this implementation decision and discuss ways in which
we can reduce this unnecessary garbage creation. It seems like moving to a
thread-local implementation is a win here, but I'd love to hear more
thoughts or alternative suggestions from the group. I'm new to this
community, so I'm not sure the best way to proceed. Should I open a Jira
issue as a next step? Thanks in advance!

Cheers,
-Greg
