Howdy,
We've noticed that enabling larger transaction logs increases Solr's
memory requirements: it consumes large amounts of memory at startup.
A heap dump points at the cause: Solr initializes the ArrayLists in
UpdateLog::getVersions with an initial capacity of
maxNbTransactionLogEntries. While this may be more efficient when the
actual log files are close to their maximum size, it wastes memory when
they are small. That happens frequently when a small number of shards
receive a lot of writes and a lot of shards receive few (or no) writes.
We've seen cases where Solr needs an additional 10GiB of memory during
startup. It gets freed afterwards, but it does make startup painful.
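For a sense of scale, here's a minimal standalone sketch (not Solr
code; the 10M entry count is a made-up stand-in for a large
maxNbTransactionLogEntries) showing that ArrayList's int constructor
allocates the full backing array immediately:

import java.util.ArrayList;
import java.util.List;

public class PresizeDemo {
  public static void main(String[] args) {
    Runtime rt = Runtime.getRuntime();
    System.gc(); // rough measurement only
    long before = rt.totalMemory() - rt.freeMemory();

    // The backing Object[10_000_000] is allocated right here, even
    // though the list stays empty: ~40 MiB with compressed oops,
    // ~80 MiB without. A pre-sized list pays this cost up front.
    List<Long> ret = new ArrayList<>(10_000_000);

    System.gc();
    long after = rt.totalMemory() - rt.freeMemory();
    System.out.printf("~%d MiB for an empty list (size=%d)%n",
        (after - before) >> 20, ret.size());
  }
}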
The fix for SOLR-15676 further increased the memory footprint by
allocating a LongSet of the same size.
public List<Long> getVersions(int n, long maxVersion) {
  // both collections are eagerly pre-sized to n, which can be as
  // large as maxNbTransactionLogEntries
  List<Long> ret = new ArrayList<>(n);
  LongSet set = new LongSet(n);
The naïve fix would be to simply replace new ArrayList<>(n) with
new ArrayList<>(). ArrayList then grows its capacity by 50% every time
it fills up, which results in some extra garbage and extra array
copies.
A quick bit of napkin math shows that for 10M entries, the array has
to be reallocated 35 times: growing by 50% from the default capacity
of 10, reaching 10^7 takes ceil(log1.5(10^6)) = 35 grows. In our case,
that's worth the extra overhead. In the general case, it might not be?
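That estimate can be sanity-checked with a few lines (a standalone
sketch using the JDK's actual growth rule,
newCapacity = oldCapacity + (oldCapacity >> 1), and the default
initial capacity of 10):

public class GrowthCount {
  public static void main(String[] args) {
    long cap = 10;        // ArrayList's default initial capacity
    int grows = 0;
    while (cap < 10_000_000L) {
      cap += cap >> 1;    // grow by 50% when full
      grows++;
    }
    System.out.println("reallocations: " + grows); // prints 35
  }
}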
Does anyone have any further insights?
Thanks,
- Bram