Hi Misha,

Please provide a pull request. Small, isolated improvements are easier for us to
review and parse than large changes, but all are welcome.
Also, a lot of (trained) eyes are looking at this code... very often, reports of
Lucene not performing well are caused by incorrect usage rather than by problems
in the implementation, so it would be good to share the entire context in which
the problem occurs.

Dawid

On Wed, Jan 21, 2026 at 11:13 PM Misha Dmitriev via java-user <
[email protected]> wrote:

> Hi Lucene community,
>
> At LinkedIn, we use Lucene in some important search apps. We recently found
> some problems with GC and memory footprint in one of them. We took a heap
> dump and analyzed it with JXRay (https://jxray.com). Unfortunately, we
> cannot share the entire JXRay analysis due to security restrictions, but we
> can share one excerpt from it, see below. It comes from section 11 of the
> JXRay report, “Bad Primitive Arrays”, which tells us how much memory is
> wasted due to empty or under-utilized primitive arrays. That section says
> that nearly 4G of memory (25.6% of the used heap) is wasted, and it turns
> out that most of that is due to byte[] arrays managed by the
> SegmentTermsEnumFrame class.
>
> To clarify: in the above screenshot, for example, 80% of all arrays pointed
> to by the suffixBytes field are just empty, i.e. they contain only zeroes,
> which likely means they have never been used. Of the remaining arrays, 3%
> are “trail-0s”, i.e. more than half of their trailing elements are zero,
> meaning they were only partially utilized. So only 17% of these arrays have
> been utilized more or less fully. The same is true for all other byte[]
> arrays managed by SegmentTermsEnumFrame. Note that from other sections of
> the heap dump it’s clear that the majority of these objects are garbage,
> i.e. they have already been used and discarded. Thus, at least 80% of the
> memory allocated for these byte[] arrays has never been used and was
> wasted. From separate memory allocation profiling, we estimated that these
> arrays are responsible for ~2G/sec of memory allocation. If they were
> allocated lazily rather than eagerly, i.e. just before they are actually
> used, we could potentially reduce their share of the memory allocation rate
> from 2G/sec to (1 - 0.8)*2 = 0.4 G/sec.
>
> A switch from eager to lazy allocation of some data structure is usually
> easy to implement. Let’s take a quick look at the *source code
> <https://fossies.org/linux/www/lucene-10.3.2-src.tgz/lucene-10.3.2/lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene90/blocktree/SegmentTermsEnumFrame.java>*.
> The suffixBytes array usage has the following pattern:
>
> // Eager construction with hardcoded size
> byte[] suffixBytes = new byte[128];
>
> … // Fast forward to the loadBlock() method
> …
> if (suffixBytes.length < numSuffixBytes) {
>   // If we need to read more than 128 bytes, increase the array…
>   // … or, more precisely, throw away the old array and allocate another one
>   suffixBytes = new byte[ArrayUtil.oversize(numSuffixBytes, 1)];
> }
>
> From this code, it’s clear that two negative things can happen:
>
> 1. suffixBytes may not be used at all (the loadBlock() method may not be
>    called, or may return early). The memory used by the array is then
>    completely wasted.
> 2. If numSuffixBytes happens to be greater than 128, the eagerly allocated
>    array is discarded. The memory used by it is wasted as well.
>
> And as our heap dump illustrates, these things likely happen very often.
> To address this problem, it would be sufficient to change the code as
> follows:
>
> // Avoid eager construction
> byte[] suffixBytes;
> …
> if (suffixBytes == null || suffixBytes.length < numSuffixBytes) {
>   // Allocate the array the first time it is actually needed…
>   // … or, if the existing array is too small, throw it away and allocate a
>   // bigger one
>   suffixBytes = new byte[ArrayUtil.oversize(numSuffixBytes, 1)];
> }
>
> Note that reducing the memory allocation rate primarily results in lower
> CPU usage and/or improved latency. That’s because each object allocation
> requires work from the JVM - updating pointers and setting all object bytes
> to zero. GCing these objects is also CPU-intensive and pauses application
> threads, which affects latency. However, once the memory allocation rate is
> reduced, it may also be possible to reduce the JVM heap size. So the
> ultimate win is going to be in both CPU and memory.
>
> Please let us know how we can proceed with this. The proposed change is
> trivial, so maybe it can be done quickly by some established Lucene
> contributor. If not, I guess I can make it myself and then hope that it
> goes through review and release in a reasonable time.
>
> Misha
>
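For illustration, here is a minimal, self-contained sketch of the lazy-allocation
pattern described above. LazySuffixReader is a hypothetical stand-in for
SegmentTermsEnumFrame, not actual Lucene code; it assumes lucene-core is on the
classpath so that org.apache.lucene.util.ArrayUtil is available.

    import org.apache.lucene.util.ArrayUtil;

    // Hypothetical stand-in for SegmentTermsEnumFrame; only the buffer
    // management relevant to this discussion is shown.
    class LazySuffixReader {

      // No eager "new byte[128]": the buffer stays null until loadBlock() runs.
      private byte[] suffixBytes;

      void loadBlock(int numSuffixBytes) {
        // Allocate on first use, or grow if the existing buffer is too small.
        // ArrayUtil.oversize() over-allocates a bit so repeated growth stays cheap.
        if (suffixBytes == null || suffixBytes.length < numSuffixBytes) {
          suffixBytes = new byte[ArrayUtil.oversize(numSuffixBytes, 1)];
        }
        // ... read numSuffixBytes bytes into suffixBytes here ...
      }
    }

With this shape, no byte[] is allocated at all if loadBlock() is never reached,
and if the first block already needs more than 128 bytes, only one array is
allocated instead of two.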
