Great find! Let's have it committed.
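
Something along these lines is what I have in mind -- untested, and only the
allocations are shown; the rest of the method would stay as-is:

  public List<Long> getVersions(int n, long maxVersion) {
    // Cap the initial capacity so a large n doesn't force a huge up-front
    // allocation; both collections still grow on demand if the logs really
    // do contain close to n entries.
    int initialCapacity = Math.min(1024, n);
    List<Long> ret = new ArrayList<>(initialCapacity);
    LongSet set = new LongSet(initialCapacity);
    ...
  }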
On Tue, Jul 19, 2022 at 9:49 PM Bram Van Dam <bram.van...@intix.eu> wrote:

> Thanks for your reply, David. Given the apparent migration from
> Jira->Github, I didn't think that would get more response than the
> mailing list 😅
>
> We've been running a patched version of 7.7 with a smaller Versions
> arraylist for a while now, without any ill effects.
>
> - Bram
>
> On 13/07/2022 23.54, David Smiley wrote:
> > Makes sense Bram.
> >
> > I note that it's been over a month with no response. Just a suggestion --
> > try commenting on the pertinent JIRA because it will get the attention of
> > the last committer (and interested parties).
> > BTW we could cap the initial ArrayList size to, say, Math.min(1024, n).
> >
> > ~ David Smiley
> > Apache Lucene/Solr Search Developer
> > http://www.linkedin.com/in/davidwsmiley
> >
> >
> > On Thu, Jun 9, 2022 at 11:34 AM Bram Van Dam <bram.van...@intix.eu> wrote:
> >
> >> Howdy,
> >>
> >> We've noticed that enabling larger transaction logs causes the memory
> >> requirements for Solr to increase: Solr consumes large amounts of memory
> >> at startup.
> >>
> >> After procuring a heap dump, this seems to be because Solr initializes
> >> ArrayLists in UpdateLog::getVersions with size
> >> maxNbTransactionLogEntries. While this may be more efficient if your
> >> actual log files are close to maximum size, it wastes memory when the
> >> actual logs are small. This occurs frequently when you have a small
> >> number of shards which receive a lot of writes, and a lot of shards
> >> which receive few (or no) writes.
> >>
> >> We've seen cases where Solr needs an additional 10 GiB of memory during
> >> startup. It gets freed afterwards, but it does make startup painful.
> >>
> >> The fix for SOLR-15676 further increased the memory footprint by
> >> allocating a LongSet of the same size:
> >>
> >>   public List<Long> getVersions(int n, long maxVersion) {
> >>     List<Long> ret = new ArrayList<>(n);
> >>     LongSet set = new LongSet(n);
> >>
> >> The naïve fix would be to simply replace this new ArrayList<>(n) with
> >> new ArrayList<>(). ArrayList grows its capacity by 50% every time it's
> >> full, resulting in some extra garbage overhead and extra calls to array
> >> copy.
> >>
> >> A quick bit of napkin math shows that for 10M entries, the array will
> >> have to be reallocated 35 times. In our case, this is worth the extra
> >> overhead. In the general case, it might not be?
> >>
> >> Does anyone have any further insights?
> >>
> >> Thanks,
> >>
> >> - Bram
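
For what it's worth, the 35-reallocation figure checks out if we assume
ArrayList's default initial capacity of 10 and its roughly 1.5x growth rule
(newCapacity = oldCapacity + (oldCapacity >> 1)). Quick throwaway check in
jshell:

  int capacity = 10;                     // ArrayList's default capacity
  int reallocations = 0;
  while (capacity < 10_000_000) {
      capacity += capacity >> 1;         // ~50% growth per resize
      reallocations++;
  }
  System.out.println(reallocations);     // prints 35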