Hi again Dawid,

Forgive my ignorance, since I've never contributed to Lucene or Apache before. I created a git branch locally based on the Lucene main branch, made, built, checked, and committed my fix, and then tried to create a PR by pushing that branch, see below. Unfortunately, I get an error. I used a classic PAT as the password, so the problem seems to be not with the password itself, but with my not having some "access rights". Could you please take a look? I am using my GitHub login countmdm, email [email protected]
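Searching around, it seems that contributors may not have push access to apache/lucene directly, and should instead push the branch to their own fork and open the PR from there. Sketching what I think the workflow would be (the fork name and remote name below are my assumptions, not something I've verified):

```shell
# First create a fork of apache/lucene on GitHub (via the web UI),
# then add the fork as a remote and push the branch to it
# (assuming the fork would live at github.com/countmdm/lucene):
git remote add myfork https://github.com/countmdm/lucene.git
git push -u myfork optimize-STEF-main
# Then open a pull request from countmdm:optimize-STEF-main
# against apache/lucene's main branch in the GitHub UI.
```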
Misha

$ git push -u origin optimize-STEF-main
Username for 'https://github.com': countmdm
Password for 'https://[email protected]':
remote: Permission to apache/lucene.git denied to countmdm.
fatal: unable to access 'https://github.com/apache/lucene.git/': The requested URL returned error: 403

________________________________
From: Dawid Weiss <[email protected]>
Sent: Wednesday, January 21, 2026 10:21 PM
To: [email protected]
Cc: Misha Dmitriev <[email protected]>
Subject: Re: A deficiency in lucene code that affects memory footprint and GC

Hi Misha,

Please provide a pull request. Small, isolated improvements are easier for us to review and parse than large changes, but all are welcome. Also, a lot of (trained) eyes are looking at this code... very often, reports of Lucene not performing well are caused by incorrect usage rather than problems in the implementation. It would be good to share the entire context of when and where the problem is happening.

Dawid

On Wed, Jan 21, 2026 at 11:13 PM Misha Dmitriev via java-user <[email protected]> wrote:

Hi Lucene community,

At LinkedIn, we use Lucene in some important search applications. We recently found problems with GC and memory footprint in one of them. We took a heap dump and analyzed it with JXRay (https://jxray.com). Unfortunately, we cannot share the entire JXRay analysis due to security restrictions, but we can share one excerpt from it, see below. It comes from section 11 of the JXRay report, "Bad Primitive Arrays", which shows how much memory is wasted on empty or under-utilized primitive arrays. That section says that nearly 4 GB of memory (25.6% of the used heap) is wasted, and it turns out that most of that is due to byte[] arrays managed by the SegmentTermsEnumFrame class.

[screenshot omitted]

To clarify: in the above screenshot, 80% of all arrays pointed to by the suffixBytes field are empty, i.e. contain only zeroes, which likely means they have never been used. Of the remaining arrays, 3% are "trail-0s", i.e. more than half of their trailing elements are zero, meaning they were only partially utilized. So only 17% of these arrays have been utilized more or less fully. The same is true for all the other byte[] arrays managed by SegmentTermsEnumFrame. Note that from other sections of the heap dump it's clear that the majority of these objects are garbage, i.e. they have already been used and discarded. Thus, at least 80% of the memory allocated for these byte[] arrays was never used and was wasted.

From separate memory allocation profiling, we estimated that these arrays are responsible for ~2 GB/sec of memory allocation. If they were allocated lazily rather than eagerly, i.e. just before they are actually used, we could potentially reduce their share of the allocation rate from 2 GB/sec to (1 - 0.8) * 2 = 0.4 GB/sec. A switch from eager to lazy allocation of a data structure is usually easy to implement. Let's take a quick look at the source code <https://fossies.org/linux/www/lucene-10.3.2-src.tgz/lucene-10.3.2/lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene90/blocktree/SegmentTermsEnumFrame.java>. The suffixBytes array is used in the following pattern:

    // Eager construction with a hardcoded size
    byte[] suffixBytes = new byte[128];
    ...
    // Fast forward to the loadBlock() method
    ...
    if (suffixBytes.length < numSuffixBytes) {
      // If we need to read more than 128 bytes, grow the array...
      // ...or more precisely, throw away the old array and allocate another one
      suffixBytes = new byte[ArrayUtil.oversize(numSuffixBytes, 1)];
    }

From this code, it's clear that two negative things can happen:

1. suffixBytes may not be used at all (the loadBlock() method may not be called, or may return early). The memory used by the array is completely wasted.
2. If numSuffixBytes happens to be greater than 128, the eagerly allocated array is discarded. The memory used by it is wasted.

And as our heap dump illustrates, these things likely happen very often. To address this problem, it would be sufficient to change the code as follows:

    // Avoid eager construction
    byte[] suffixBytes;
    ...
    if (suffixBytes == null || suffixBytes.length < numSuffixBytes) {
      // Allocate the array lazily on first use, or replace it with
      // a bigger one if the current array is too small
      suffixBytes = new byte[ArrayUtil.oversize(numSuffixBytes, 1)];
    }

Note that reducing the memory allocation rate primarily reduces CPU usage and/or improves latency. That's because each object allocation requires work from the JVM: updating pointers and setting all of the object's bytes to zero. GCing these objects afterwards is also CPU-intensive and pauses application threads, which affects latency. However, once the allocation rate is reduced, it may also become possible to shrink the JVM heap. So the ultimate win would be in both CPU and memory.

Please let us know how we can proceed with this. The proposed change is trivial, so perhaps it can be made quickly by some established Lucene contributor. If not, I guess I can make it myself and then hope that it goes through review and release in reasonable time.

Misha
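P.S. To make the proposed lazy-allocation pattern concrete, here is a minimal, self-contained sketch. The Frame class, buffer() method, and oversize() helper below are illustrative stand-ins I made up for this example, not Lucene's actual code:

```java
// Illustrative sketch of lazy buffer allocation; not Lucene code.
public class LazyBufferDemo {

  // Stand-in for ArrayUtil.oversize(): grow with some headroom (~12.5%).
  static int oversize(int minSize) {
    return minSize + (minSize >>> 3);
  }

  static class Frame {
    // null until first needed; no eager "new byte[128]"
    private byte[] suffixBytes;

    // Returns a buffer of at least numSuffixBytes, allocating lazily on
    // first use and replacing the array only when it is too small.
    byte[] buffer(int numSuffixBytes) {
      if (suffixBytes == null || suffixBytes.length < numSuffixBytes) {
        suffixBytes = new byte[oversize(numSuffixBytes)];
      }
      return suffixBytes;
    }
  }

  public static void main(String[] args) {
    Frame frame = new Frame();
    byte[] b1 = frame.buffer(100); // first use: allocated now, sized for the request
    byte[] b2 = frame.buffer(64);  // fits in the existing buffer: no new allocation
    byte[] b3 = frame.buffer(500); // too small: replaced by a larger array
    System.out.println(b1.length >= 100); // true
    System.out.println(b1 == b2);         // true
    System.out.println(b3.length >= 500 && b3 != b2); // true
  }
}
```

If loadBlock() is never called, the Frame above allocates nothing at all, which is exactly the saving we are after.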
