On Wed, Feb 17, 2021 at 6:53 PM Greg Miller <gsmil...@gmail.com> wrote:
> > Right, I am looking at the code but maybe we're talking about two > different things, so let me clarify. I agree that there is no concurrency > issue with the current code and I apologise if my point was confusing. The > reason skipBytes was made an instance variable as opposed to a static one > was to *avoid *creating a concurrency issue (which certainly would exist > if it had been made static). Making it an instance variable is wasteful for > GC though, no? My suggestion of moving to a threadlocal hits a "happy > medium" where we're not allocating these silly buffers for each DataInput > instance but making sure each thread has a separate one. Does this make > more sense now? > > Adding a threadlocal isn't a happy medium here. The first thing I see is thread churn issues (apps that use non-fixed threadpools). See javadocs for ClosableThreadLocal for more information. We can't even use that CloseableThreadLocal hack here, because nobody calls close() on clones of IndexInputs. So threadlocal would seriously make matters worse for some apps using the library, all for something that should be a "+=" :) I'm not trying to suggest perfection, just start by deprecating the slow impl, make it abstract and "hoist" the responsibility of implementation upwards to subclasses that are better prepared to handle them. For the majority use case (e.g. IndexInput), your buffer trivially goes away since you implemented it with seek(). The ones that are left can remain slow, that's fine, you speed up 90% easily, and we can see what is needed to fix the leftovers. But I really don't think these will be difficult either, e.g. for ChecksumIndexInput we can probably leave it abstract and implement the skipping directly on BufferedChecksumIndexInput (it already has a buffer, so it should be able to use that one, and avoid 2 buffers like today).