Thanks for pointing me to the existing issue! I think I generally agree
that no threadlocal would be needed if the functionality of skipBytes() and
seek() were collapsed as per the issue you pointed me to. In the current
state though, concurrency causes problems for delegating implementations of
DataInput, specifically ChecksumIndexInput (as detailed in LUCENE-5583
<https://issues.apache.org/jira/browse/LUCENE-5583>).

I'll spend a little more time thinking about LUCENE-9480 and comment there.
Seems like a better approach to eliminate the need for reading in bytes
just to skip, assuming it can be made to play nice with checksums, etc.
Thanks again!

Cheers,
-Greg

On Wed, Feb 17, 2021 at 1:47 PM Robert Muir <rcm...@gmail.com> wrote:

> See this already-open issue:
> https://issues.apache.org/jira/browse/LUCENE-9480
>
> No threadlocals are necessary. DataInput/IndexInput are already intended
> for use by one thread.
>
> For e.g. mmapdirectory this should be "+=". no buffer is required.
>
> On Wed, Feb 17, 2021 at 4:15 PM Greg Miller <gsmil...@gmail.com> wrote:
>
>> Hi folks-
>>
>> I work on a Lucene-based search system and we recently added Java Flight
>> Recorder to our benchmark tooling. When looking through results, we found
>> DataInput#skipBytes() to be a top contributor to garbage creation. We're
>> using Lucene84SkipReader and always skipping over Impacts in our use-case.
>> At first glance, it appeared pretty obvious that creating new instances of
>> the skipBuffer byte[] for each instance of DataInput was the culprit.
>>
>> It looks like alternatives were discussed originally in LUCENE-5583
>> <https://issues.amazon.com/issues/LUCENE-4947>, one of which being a
>> thread-local implementation of the skip buffer (since it can't be a static
>> field without breaking delegating subclasses, like ChecksumIndexInput). At
>> the time, a thread-local was advised against
>> <https://issues.apache.org/jira/browse/LUCENE-5583?focusedCommentId=13964258&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-13964258>
>>  by
>> Uwe due to GC expense, but in our benchmarks, bringing in a thread-local
>> implementation reduced overall GC time by ~7%.
>>
>> I'd like to revisit this implementation decision and discuss ways in
>> which we can reduce this unnecessary garbage creation. It seems like moving
>> to a thread-local implementation is a win here, but I'd love to hear more
>> thoughts or alternative suggestions from the group. I'm new to this
>> community, so I'm not sure the best way to proceed. Should I open a Jira
>> issue as a next step? Thanks in advance!
>>
>> Cheers,
>> -Greg
>>
>

Reply via email to