Great find! Let's have it committed.
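
Something along these lines is what I have in mind -- untested, and only the
allocations are shown; the rest of the method would stay as-is:

  public List<Long> getVersions(int n, long maxVersion) {
    // Cap the initial capacity so a large n doesn't force a huge up-front
    // allocation; both collections still grow on demand if the logs really
    // do contain close to n entries.
    int initialCapacity = Math.min(1024, n);
    List<Long> ret = new ArrayList<>(initialCapacity);
    LongSet set = new LongSet(initialCapacity);
    ...
  }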
On Tue, Jul 19, 2022 at 9:49 PM Bram Van Dam <bram.van...@intix.eu> wrote:

> Thanks for your reply, David. Given the apparent migration from
> Jira->Github, I didn't think that would get more response than the
> mailing list 😅
>
> We've been running a patched version of 7.7 with a smaller Versions
> arraylist for a while now, without any ill effects.
>
> - Bram
>
> On 13/07/2022 23.54, David Smiley wrote:
> > Makes sense Bram.
> >
> > I note that it's been over a month with no response. Just a suggestion --
> > try commenting on the pertinent JIRA because it will get the attention of
> > the last committer (and interested parties).
> > BTW we could cap the initial ArrayList size to, say, Math.min(1024, n).
> >
> > ~ David Smiley
> > Apache Lucene/Solr Search Developer
> > http://www.linkedin.com/in/davidwsmiley
> >
> >
> > On Thu, Jun 9, 2022 at 11:34 AM Bram Van Dam <bram.van...@intix.eu> wrote:
> >
> >> Howdy,
> >>
> >> We've noticed that enabling larger transaction logs causes the memory
> >> requirements for Solr to increase: Solr consumes large amounts of memory
> >> at startup.
> >>
> >> After procuring a heap dump, this seems to be because Solr initializes
> >> ArrayLists in UpdateLog::getVersions with size
> >> maxNbTransactionLogEntries. While this may be more efficient if your
> >> actual log files are close to maximum size, it wastes memory when the
> >> actual logs are small. This occurs frequently when you have a small
> >> number of shards which receive a lot of writes, and a lot of shards
> >> which receive few (or no) writes.
> >>
> >> We've seen cases where Solr needs an additional 10 GiB of memory during
> >> startup. It gets freed afterwards, but it does make startup painful.
> >>
> >> The fix for SOLR-15676 further increased the memory footprint by
> >> allocating a LongSet of the same size:
> >>
> >>   public List<Long> getVersions(int n, long maxVersion) {
> >>     List<Long> ret = new ArrayList<>(n);
> >>     LongSet set = new LongSet(n);
> >>
> >> The naïve fix would be to simply replace this new ArrayList<>(n) with
> >> new ArrayList<>(). ArrayList grows its capacity by 50% every time it's
> >> full, resulting in some extra garbage overhead and extra calls to array
> >> copy.
> >>
> >> A quick bit of napkin math shows that for 10M entries, the array will
> >> have to be reallocated 35 times. In our case, this is worth the extra
> >> overhead. In the general case, it might not be?
> >>
> >> Does anyone have any further insights?
> >>
> >> Thanks,
> >>
> >> - Bram
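
For what it's worth, the 35-reallocation figure checks out if we assume
ArrayList's default initial capacity of 10 and its roughly 1.5x growth rule
(newCapacity = oldCapacity + (oldCapacity >> 1)). Quick throwaway check in
jshell:

  int capacity = 10;                     // ArrayList's default capacity
  int reallocations = 0;
  while (capacity < 10_000_000) {
      capacity += capacity >> 1;         // ~50% growth per resize
      reallocations++;
  }
  System.out.println(reallocations);     // prints 35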