Hi,

 

I was thinking about something similar when working on the Java 17 replacement 
for MMapDirectory. My idea would still be late evaluation, but using a lambda. 
Same way how you can do lazy evaluation in Log4j2 when logging with 
debug/trace/…

 

Instead of having “String resourceDescription“ as a field in each Indexinput, 
we can use “Supplier<String> resourceDescription” and initialize it like: () -> 
foo + bar + parent.toString +…

 

Just because this might be an option to do, I fully agree with Mike Drob and 
Robert Muir: The risk of memory leaks for such a small improvement is too high. 
It also depends on the GC you use. If you use old ConcMarkSweepGC, I agree this 
is an overhead, but nowadays, Lucene should use G1GC.

 

In addition, the strings are not for debugging, they are really useful when 
something goes wrong, e.g. when I/O errors occur.

 

Uwe

 

-----

Uwe Schindler

Achterdiek 19, D-28357 Bremen

https://www.thetaphi.de

eMail: u...@thetaphi.de

 

From: Robert Muir <rcm...@gmail.com> 
Sent: Saturday, February 20, 2021 11:56 AM
To: dev@lucene.apache.org
Subject: Re: GC cost of creating String resource description on IndexInput clone

 

The issue is that clone or not, they are both IndexInput.java. So if we go with 
your proposal, then *sometimes* the code will have this reference and *other 
times* it won't and the reference will be null. In that non-clone case, where 
would its resource description (filename) come from?  I predict too many bugs. 

 

Making this part of the code complex would be the wrong tradeoff for 1% 
performance.

 

 

On Fri, Feb 19, 2021 at 4:58 PM Viral Gandhi <viral.dev...@gmail.com 
<mailto:viral.dev...@gmail.com> > wrote:

Hello everyone,

We recently added Java Flight Recorder (JFR) to our internal benchmarking. 
While looking through top heap allocations from JFR output, we noticed the 
following to be in top 5 contributors to garbage creation.

org.apache.lucene.store.IndexInput#getFullSliceDescription()
  at org.apache.lucene.store.ByteBufferIndexInput#buildSlice()
   at org.apache.lucene.store.ByteBufferIndexInput#slice()
   at org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl#slice()
   at org.apache.lucene.store.IndexInput#randomAccessSlice()

It seems that we construct a new String resource description for each 
clone/slice of IndexInput instance. Resource description for a clone/slice 
instance is a String concatenation of resource description from original 
instance and current slice description. Also, clone/slice can happen multiple 
times per query per segment. 

Can we avoid upfront String allocation by late-evaluating IndexInput.toString() 
on the cloned instance? One approach could be to hold a reference of the base 
IndexInput instance in the cloned instance. Then while evaluating 
IndexInput.toString() on a cloned instance, we can construct a resource 
description by concatenating base instance's toString() with current resource 
description. My understanding is that IndexInput.toString() is primarily used 
for debugging purposes hence we can benefit from the late-evaluation. 

With this approach, we are seeing sustainable GC time reduction of ~6% and gain 
of ~1% to red-line QPS (throughput). My intention to start this thread is to 
collect feedback on this approach as well as to discuss any other ideas.

 

Thanks,

Viral Gandhi

Reply via email to