On Wed, Feb 17, 2021 at 6:53 PM Greg Miller <gsmil...@gmail.com> wrote:

>
> Right, I am looking at the code but maybe we're talking about two
> different things, so let me clarify. I agree that there is no concurrency
> issue with the current code and I apologise if my point was confusing. The
> reason skipBytes was made an instance variable as opposed to a static one
> was to *avoid *creating a concurrency issue (which certainly would exist
> if it had been made static). Making it an instance variable is wasteful for
> GC though, no? My suggestion of moving to a threadlocal hits a "happy
> medium" where we're not allocating these silly buffers for each DataInput
> instance but making sure each thread has a separate one. Does this make
> more sense now?
>
>
Adding a threadlocal isn't a happy medium here. The first thing I see is
thread churn issues (apps that use non-fixed threadpools). See javadocs for
ClosableThreadLocal for more information. We can't even use that
CloseableThreadLocal hack here, because nobody calls close() on clones of
IndexInputs. So threadlocal would seriously make matters worse for some
apps using the library, all for something that should be a "+=" :)

I'm not trying to suggest perfection, just start by deprecating the slow
impl, make it abstract and "hoist" the responsibility of implementation
upwards to subclasses that are better prepared to handle them. For the
majority use case (e.g. IndexInput), your buffer trivially goes away since
you implemented it with seek(). The ones that are left can remain slow,
that's fine, you speed up 90% easily, and we can see what is needed to fix
the leftovers. But I really don't think these will be difficult either,
e.g. for ChecksumIndexInput we can probably leave it abstract and implement
the skipping directly on BufferedChecksumIndexInput (it already has a
buffer, so it should be able to use that one, and avoid 2 buffers like
today).

Reply via email to