Hi,
I was thinking about something similar when working on the Java 17 replacement for MMapDirectory. My idea would still be late evaluation, but using a lambda. Same way how you can do lazy evaluation in Log4j2 when logging with debug/trace/… Instead of having “String resourceDescription“ as a field in each Indexinput, we can use “Supplier<String> resourceDescription” and initialize it like: () -> foo + bar + parent.toString +… Just because this might be an option to do, I fully agree with Mike Drob and Robert Muir: The risk of memory leaks for such a small improvement is too high. It also depends on the GC you use. If you use old ConcMarkSweepGC, I agree this is an overhead, but nowadays, Lucene should use G1GC. In addition, the strings are not for debugging, they are really useful when something goes wrong, e.g. when I/O errors occur. Uwe ----- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de From: Robert Muir <rcm...@gmail.com> Sent: Saturday, February 20, 2021 11:56 AM To: dev@lucene.apache.org Subject: Re: GC cost of creating String resource description on IndexInput clone The issue is that clone or not, they are both IndexInput.java. So if we go with your proposal, then *sometimes* the code will have this reference and *other times* it won't and the reference will be null. In that non-clone case, where would its resource description (filename) come from? I predict too many bugs. Making this part of the code complex would be the wrong tradeoff for 1% performance. On Fri, Feb 19, 2021 at 4:58 PM Viral Gandhi <viral.dev...@gmail.com <mailto:viral.dev...@gmail.com> > wrote: Hello everyone, We recently added Java Flight Recorder (JFR) to our internal benchmarking. While looking through top heap allocations from JFR output, we noticed the following to be in top 5 contributors to garbage creation. org.apache.lucene.store.IndexInput#getFullSliceDescription() at org.apache.lucene.store.ByteBufferIndexInput#buildSlice() at org.apache.lucene.store.ByteBufferIndexInput#slice() at org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl#slice() at org.apache.lucene.store.IndexInput#randomAccessSlice() It seems that we construct a new String resource description for each clone/slice of IndexInput instance. Resource description for a clone/slice instance is a String concatenation of resource description from original instance and current slice description. Also, clone/slice can happen multiple times per query per segment. Can we avoid upfront String allocation by late-evaluating IndexInput.toString() on the cloned instance? One approach could be to hold a reference of the base IndexInput instance in the cloned instance. Then while evaluating IndexInput.toString() on a cloned instance, we can construct a resource description by concatenating base instance's toString() with current resource description. My understanding is that IndexInput.toString() is primarily used for debugging purposes hence we can benefit from the late-evaluation. With this approach, we are seeing sustainable GC time reduction of ~6% and gain of ~1% to red-line QPS (throughput). My intention to start this thread is to collect feedback on this approach as well as to discuss any other ideas. Thanks, Viral Gandhi