Howdy,

We've noticed that allowing larger transaction logs increases Solr's memory requirements: Solr consumes large amounts of memory at startup.

A heap dump shows that this is because Solr initializes the ArrayList in UpdateLog::getVersions with capacity maxNbTransactionLogEntries. While that may be more efficient when the actual log files are close to their maximum size, it wastes memory when the logs are small. This occurs frequently when a small number of shards receive a lot of writes and a lot of shards receive few (or no) writes.

We've seen cases where Solr needs an additional 10 GiB of memory during startup. The memory is freed afterwards, but it does make startup painful.

The fix for SOLR-15676 further increased the memory footprint by allocating a LongSet of the same size:

public List<Long> getVersions(int n, long maxVersion) {
  List<Long> ret = new ArrayList<>(n);
  LongSet set = new LongSet(n);
  // ...

The naïve fix would be to simply replace new ArrayList<>(n) with new ArrayList<>() and let the list grow on demand. ArrayList grows its capacity by 50% every time it fills up, which costs some extra garbage and some extra array copies.
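
Concretely, the change would look something like this (just a sketch: as far as I can tell LongSet only has a sized constructor, so the small starting capacity below is an arbitrary illustrative value, not a tuned one, and it assumes LongSet grows on demand like a normal hash set):

public List<Long> getVersions(int n, long maxVersion) {
  // Start at ArrayList's default capacity (10) and grow on demand
  // instead of pre-allocating n == maxNbTransactionLogEntries slots.
  List<Long> ret = new ArrayList<>();
  // LongSet requires an explicit size; 1024 is an arbitrary
  // illustrative starting point, not a tuned value.
  LongSet set = new LongSet(1024);
  // ...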

A quick bit of napkin math shows that for 10M entries, the backing array has to be reallocated 35 times (starting from the default capacity of 10 and growing by 50% each time). In our case that overhead is worth it; in the general case it might not be?
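
For reference, here's the napkin math as a standalone snippet (it assumes OpenJDK's ArrayList behaviour: default capacity 10, growth by oldCapacity + (oldCapacity >> 1)):

public class GrowthCount {
  public static void main(String[] args) {
    long capacity = 10;    // ArrayList's default initial capacity
    int reallocations = 0;
    while (capacity < 10_000_000) {
      capacity += capacity >> 1;  // grow by 50%, as OpenJDK's ArrayList does
      reallocations++;
    }
    System.out.println(reallocations);  // prints 35
  }
}

Each reallocation also copies the old array, so over the lifetime of the list roughly twice the final capacity in elements gets copied; that's the garbage overhead mentioned above.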

Does anyone have any further insights?

Thanks,

 - Bram
