You won't be able to push to apache/lucene, but if you create a fork, like Misha/lucene, then you can push to that, and git will ask if you want to create a PR at that point.
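For reference, the fork-based flow looks roughly like this. This is only a sketch: it assumes the fork is named countmdm/lucene and uses "fork" as the remote name, both of which are just examples.

$ git remote add fork https://github.com/countmdm/lucene.git
$ git push -u fork optimize-STEF-main

After a successful push of a new branch, the output from GitHub typically includes a link for opening a pull request against the upstream repository.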
On Thu, Jan 22, 2026, 8:30 PM Misha Dmitriev via java-user < [email protected]> wrote:

> Hi again Dawid,
>
> Forgive my ignorance, since I've never contributed to Lucene or Apache before. I created a git branch based on the Lucene main branch locally, made, built, checked, and committed my fix, and then tried to create a PR by pushing that branch, see below. Unfortunately, I got an error. I used a classic PAT as a password, so the problem seems to be not with the password itself, but with my not having some "access rights". Could you please take a look? I am using my GitHub login countmdm, email [email protected]
>
> Misha
>
> $ git push -u origin optimize-STEF-main
> Username for 'https://github.com': countmdm
> Password for 'https://[email protected]':
> remote: Permission to apache/lucene.git denied to countmdm.
> fatal: unable to access 'https://github.com/apache/lucene.git/': The requested URL returned error: 403
>
> ________________________________
> From: Dawid Weiss <[email protected]>
> Sent: Wednesday, January 21, 2026 10:21 PM
> To: [email protected] <[email protected]>
> Cc: Misha Dmitriev <[email protected]>
> Subject: Re: A deficiency in lucene code that affects memory footprint and GC
>
> Hi Misha,
>
> Please provide a pull request. Small, isolated improvements are easier for us to review and parse than large changes, but all are welcome.
>
> Also, a lot of (trained) eyes are looking at this code... very often the reports of Lucene not performing well are caused by wrong usage rather than problems within the implementation - it would be good to share the entire context of when and how the problem occurs.
>
> Dawid
>
> On Wed, Jan 21, 2026 at 11:13 PM Misha Dmitriev via java-user < [email protected]> wrote:
> Hi Lucene community,
>
> At LinkedIn, we use Lucene in some important search apps. We recently found some problems with GC and memory footprint in one of them. We took a heap dump and analyzed it with JXRay (https://jxray.com). Unfortunately, we cannot share the entire JXRay analysis due to security restrictions, but we can share one excerpt from it, see below. It comes from section 11 of the JXRay report, "Bad Primitive Arrays", which tells us how much memory is wasted due to empty or under-utilized primitive arrays. That section says that nearly 4 GB of memory (25.6% of the used heap) is wasted. And it turns out that most of that is due to byte[] arrays managed by the SegmentTermsEnumFrame class.
>
> [screenshot omitted]
>
> To clarify: the above screenshot shows that, for example, 80% of all arrays pointed to by the suffixBytes data field are just empty, i.e. contain only zeroes, which likely means they have never been used. Another 3% of these arrays are "trail-0s", i.e. more than half of their trailing elements are zero, meaning they were only partially utilized. So only 17% of these arrays have been utilized more or less fully. The same is true for all other byte[] arrays managed by SegmentTermsEnumFrame. Note that from other sections of the heap dump it's clear that the majority of these objects are garbage, i.e. they have already been used and discarded. Thus, at least 80% of the memory that was allocated for these byte[] arrays has never been used and was wasted. From separate memory allocation profiling, we estimated that these arrays are responsible for ~2 GB/sec of memory allocation. If they were allocated lazily rather than eagerly, i.e.
> just before they are actually used, we could potentially reduce their share of the memory allocation rate from ~2 GB/sec to (1 - 0.8) * 2 = 0.4 GB/sec.
>
> A switch from eager to lazy allocation of some data structure is usually easy to implement. Let's take a quick look at the source code: https://fossies.org/linux/www/lucene-10.3.2-src.tgz/lucene-10.3.2/lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene90/blocktree/SegmentTermsEnumFrame.java
> The suffixBytes array usage has the following pattern:
>
> // Eager construction with a hardcoded size
> byte[] suffixBytes = new byte[128];
>
> … // Fast forward to the loadBlock() method
> …
> if (suffixBytes.length < numSuffixBytes) {
>   // If we need to read more than 128 bytes, increase the array…
>   // …or, more precisely, throw away the old array and allocate another one
>   suffixBytes = new byte[ArrayUtil.oversize(numSuffixBytes, 1)];
> }
>
> From this code, it's clear that two negative things can happen:
>
> 1. suffixBytes may not be used at all (the loadBlock() method may not be called or may return early). The memory used by the array will be completely wasted.
> 2. If numSuffixBytes happens to be greater than 128, the eagerly allocated array will be discarded. The memory used by it will be wasted.
>
> And as our heap dump illustrates, these things likely happen very often. To address this problem, it would be sufficient to change the code as follows:
>
> // Avoid eager construction
> byte[] suffixBytes;
> …
> if (suffixBytes == null || suffixBytes.length < numSuffixBytes) {
>   // Allocate the array only when it is actually needed, or grow it
>   // (i.e. throw away the old array and allocate another one) if it is too small
>   suffixBytes = new byte[ArrayUtil.oversize(numSuffixBytes, 1)];
> }
>
> Note that reducing the memory allocation rate primarily results in reduced CPU usage and/or improved latency. That's because each object allocation requires work from the JVM: updating pointers and setting all of the object's bytes to zero. And then GCing these objects is also CPU-intensive and results in pausing app threads, which affects latency. However, once the memory allocation rate is reduced, it may be possible to also reduce the JVM heap size. So the ultimate win is going to be in both CPU and memory.
>
> Please let us know how we can proceed with this. The proposed change is trivial, and thus maybe it can be done quickly by some established Lucene contributor. If not, I guess I can make it myself and then hope that it goes through review and release in reasonable time.
>
> Misha
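To make the proposed pattern concrete, here is a minimal, self-contained Java sketch of lazy, grow-on-demand allocation. The class and method names are illustrative only, not the actual SegmentTermsEnumFrame fields, and the simple ~1.5x growth below merely stands in for what Lucene's ArrayUtil.oversize computes:

// Illustrative sketch: allocate the buffer only on first real use and grow it on demand
// (hypothetical names; not the actual Lucene code).
class LazyByteBuffer {
  private byte[] bytes; // stays null until first use, instead of an eager new byte[128]

  // Returns an array of at least 'needed' bytes, allocating or growing it lazily.
  byte[] ensureCapacity(int needed) {
    if (bytes == null || bytes.length < needed) {
      // Over-allocate by roughly 50% so repeated small growths don't reallocate every time.
      bytes = new byte[needed + (needed >> 1)];
    }
    return bytes;
  }
}

With this shape, a caller would request the buffer (e.g. ensureCapacity(numSuffixBytes)) only at the point where the bytes are actually read, which in the Lucene case is inside loadBlock(); frames whose loadBlock() never runs would then never allocate the array at all.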
