Hi Misha,

Please provide a pull request. Small, isolated improvements are easier for
us to review and digest than large changes, but all are welcome.

Also, a lot of (trained) eyes are looking at this code... very often,
reports of Lucene not performing well are caused by incorrect usage rather
than problems within the implementation - it would be good to share the
full context in which the problem occurs.

Dawid

On Wed, Jan 21, 2026 at 11:13 PM Misha Dmitriev via java-user <
[email protected]> wrote:

> Hi Lucene community,
>
> At LinkedIn, we use Lucene in some important search apps. We recently
> found some problems with GC and memory footprint in one of them. We took a
> heap dump and analyzed it with JXRay (https://jxray.com). Unfortunately,
> we cannot share the entire JXRay analysis due to security restrictions,
> but we can share one excerpt from it; see below. It comes from section 11
> of the JXRay report, “Bad Primitive Arrays”, which shows how much memory
> is wasted on empty or under-utilized primitive arrays. That section says
> that nearly 4G of memory (25.6% of the used heap) is wasted, and it turns
> out that most of that is due to byte[] arrays managed by the
> SegmentTermsEnumFrame class.
>
> [Screenshot: excerpt from JXRay report section 11, “Bad Primitive Arrays”]
>
> To clarify: per the above screenshot, 80% of all arrays pointed to by the
> suffixBytes field are simply empty, i.e. contain only zeroes, which likely
> means that they have never been used. Of the remaining arrays, 3% are
> “trail-0s”, i.e. more than half of their trailing elements are zero,
> meaning they were only partially utilized. So only 17% of these arrays
> have been utilized more or less fully. The same is true for all other
> byte[] arrays managed by SegmentTermsEnumFrame. Note that other sections
> of the heap dump make it clear that the majority of these objects are
> garbage, i.e. they have already been used and discarded. Thus, at least
> 80% of the memory allocated for these byte[] arrays was never used and
> was wasted. From separate memory allocation profiling, we estimated that
> these arrays are responsible for ~2G/sec of memory allocation. If they
> were allocated lazily rather than eagerly, i.e. only just before they are
> actually used, we could potentially reduce their share of the allocation
> rate from 2G/sec to (1 - 0.8) * 2 = 0.4G/sec.
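>
> For clarity, here is roughly what those categories mean in code (a
> hypothetical sketch of the classification, not JXRay’s actual
> implementation):
>
> // Hypothetical classifier mirroring the report’s categories:
> // “empty” = all zeroes; “trail-0s” = more than half of the array’s
> // elements are trailing zeroes.
> static String classify(byte[] a) {
>   int lastNonZero = -1;
>   for (int i = a.length - 1; i >= 0; i--) {
>     if (a[i] != 0) {
>       lastNonZero = i;
>       break;
>     }
>   }
>   if (lastNonZero < 0) return "empty"; // never written to
>   int trailingZeros = a.length - 1 - lastNonZero;
>   return trailingZeros > a.length / 2 ? "trail-0s" : "utilized";
> }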
>
> A switch from eager to lazy allocation of a data structure is usually
> easy to implement. Let’s take a quick look at the *source code
> <https://fossies.org/linux/www/lucene-10.3.2-src.tgz/lucene-10.3.2/lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene90/blocktree/SegmentTermsEnumFrame.java>*.
> The suffixBytes array usage has the following pattern:
>
> // Eager construction with hardcoded size
> byte[] suffixBytes = new byte[128];
>
> …  // Fast forward to the loadBlock() method
> …
> if (suffixBytes.length < numSuffixBytes) {
>   // If we need to read more than 128 bytes, increase the array…
>   // … or more precisely, throw away the old array and allocate another one
>   suffixBytes = new byte[ArrayUtil.oversize(numSuffixBytes, 1)];
> }
>
> From this code, it’s clear that two negative things can happen:
>
>    1. suffixBytes may not be used at all (the loadBlock() method may not
>    be called, or may return early). The memory used by the array is then
>    completely wasted.
>    2. If numSuffixBytes happens to be greater than 128, the eagerly
>    allocated array will be discarded, and the memory used by it will be
>    wasted.
>
>
> And as our heap dump illustrates, these things likely happen very often.
> To address this problem, it would be sufficient to change the code as
> follows:
>
> // Avoid eager construction
> byte[] suffixBytes;
> …
> if (suffixBytes == null || suffixBytes.length < numSuffixBytes) {
>   // Allocate the array lazily, just before it is first used…
>   // … or, if the existing array is too small, replace it with a larger one
>   suffixBytes = new byte[ArrayUtil.oversize(numSuffixBytes, 1)];
> }
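>
> To illustrate the general pattern outside of Lucene, here is a minimal
> self-contained sketch (a hypothetical helper class, not the actual
> Lucene code):
>
> // Hypothetical holder demonstrating the lazy-growth pattern: nothing is
> // allocated until a caller actually needs the bytes, and a too-small
> // array is replaced only on demand.
> final class LazyBuffer {
>   private byte[] buf; // stays null until first use
>
>   byte[] ensureCapacity(int needed) {
>     if (buf == null || buf.length < needed) {
>       // Lucene would use ArrayUtil.oversize(needed, 1) for growth headroom
>       buf = new byte[needed];
>     }
>     return buf;
>   }
> }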
>
> Note that reducing the memory allocation rate primarily results in lower
> CPU usage and/or improved latency. That’s because each object allocation
> requires work from the JVM - bumping the allocation pointer and setting
> all object bytes to zero. GCing these objects is also CPU-intensive and
> pauses app threads, which affects latency. However, once the allocation
> rate is reduced, it may also become possible to shrink the JVM heap. So
> the ultimate win is in both CPU and memory.
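>
> If it helps, the allocation-rate effect is easy to measure directly.
> Below is a sketch that assumes a HotSpot JVM, where
> com.sun.management.ThreadMXBean exposes per-thread allocation counters:
>
> import java.lang.management.ManagementFactory;
> import com.sun.management.ThreadMXBean;
>
> public class AllocProbe {
>   public static void main(String[] args) {
>     // HotSpot-specific subinterface with per-thread allocation counters
>     ThreadMXBean tmx = (ThreadMXBean) ManagementFactory.getThreadMXBean();
>     long id = Thread.currentThread().getId();
>
>     long before = tmx.getThreadAllocatedBytes(id);
>     byte[][] frames = new byte[100_000][];
>     for (int i = 0; i < frames.length; i++) {
>       frames[i] = new byte[128]; // eager allocation, as in the current code
>     }
>     long after = tmx.getThreadAllocatedBytes(id);
>
>     System.out.println("Allocated ~" + (after - before) + " bytes for "
>         + frames.length + " eager 128-byte arrays");
>   }
> }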
>
> Please let us know how we can proceed with this. The proposed change is
> trivial, so perhaps it can be made quickly by an established Lucene
> contributor. If not, I guess I can make it myself and then hope that it
> goes through review and release in a reasonable time.
>
> Misha
>
>
