+1 to explore Uwe's lambda idea. I really hate lambdas (you should too, if you know what happens behind the scenes), but this sounds like maybe an actual justified use of lambdas, taking advantage of the capture facility and not using them just to be "cute"
On Sat, Feb 20, 2021 at 6:17 AM Uwe Schindler <u...@thetaphi.de> wrote: > Hi, > > > > I was thinking about something similar when working on the Java 17 > replacement for MMapDirectory. My idea would still be late evaluation, but > using a lambda. Same way how you can do lazy evaluation in Log4j2 when > logging with debug/trace/… > > > > Instead of having “String resourceDescription“ as a field in each > Indexinput, we can use “Supplier<String> resourceDescription” and > initialize it like: () -> foo + bar + parent.toString +… > > > > Just because this might be an option to do, I fully agree with Mike Drob > and Robert Muir: The risk of memory leaks for such a small improvement is > too high. It also depends on the GC you use. If you use old > ConcMarkSweepGC, I agree this is an overhead, but nowadays, Lucene should > use G1GC. > > > > In addition, the strings are not for debugging, they are really useful > when something goes wrong, e.g. when I/O errors occur. > > > > Uwe > > > > ----- > > Uwe Schindler > > Achterdiek 19, D-28357 Bremen > > https://www.thetaphi.de > > eMail: u...@thetaphi.de > > > > *From:* Robert Muir <rcm...@gmail.com> > *Sent:* Saturday, February 20, 2021 11:56 AM > *To:* dev@lucene.apache.org > *Subject:* Re: GC cost of creating String resource description on > IndexInput clone > > > > The issue is that clone or not, they are both IndexInput.java. So if we go > with your proposal, then *sometimes* the code will have this reference and > *other times* it won't and the reference will be null. In that non-clone > case, where would its resource description (filename) come from? I predict > too many bugs. > > > > Making this part of the code complex would be the wrong tradeoff for 1% > performance. > > > > > > On Fri, Feb 19, 2021 at 4:58 PM Viral Gandhi <viral.dev...@gmail.com> > wrote: > > Hello everyone, > > We recently added Java Flight Recorder (JFR) to our internal benchmarking. > While looking through top heap allocations from JFR output, we noticed the > following to be in top 5 contributors to garbage creation. > > org.apache.lucene.store.IndexInput#getFullSliceDescription() > at org.apache.lucene.store.ByteBufferIndexInput#buildSlice() > at org.apache.lucene.store.ByteBufferIndexInput#slice() > at > org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl#slice() > at org.apache.lucene.store.IndexInput#randomAccessSlice() > > It seems that we construct a new String resource description for each > clone/slice of IndexInput instance. Resource description for a clone/slice > instance is a String concatenation of resource description from original > instance and current slice description. Also, clone/slice can happen > multiple times per query per segment. > > Can we avoid upfront String allocation by late-evaluating > IndexInput.toString() on the cloned instance? One approach could be to > hold a reference of the base IndexInput instance in the cloned instance. > Then while evaluating IndexInput.toString() on a cloned instance, we can > construct a resource description by concatenating base instance's > toString() with current resource description. My understanding is that > IndexInput.toString() is primarily used for debugging purposes hence we > can benefit from the late-evaluation. > > With this approach, we are seeing sustainable GC time reduction of ~6% and > gain of ~1% to red-line QPS (throughput). My intention to start this thread > is to collect feedback on this approach as well as to discuss any other > ideas. > > > > Thanks, > > Viral Gandhi > >