Thanks for your reply, David. Given the apparent migration from
Jira->GitHub, I didn't think commenting on the Jira issue would get more
response than the mailing list 😅
We've been running a patched version of 7.7 with a smaller versions
ArrayList for a while now, without any ill effects.
- Bram
On 13/07/2022 23.54, David Smiley wrote:
Makes sense, Bram.
I note that it's been over a month with no response. Just a suggestion --
try commenting on the pertinent JIRA because it will get the attention of
the last committer (and interested parties).
BTW we could cap the initial ArrayList size to, say, Math.min(1024,n)
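A minimal sketch of that cap, not the actual UpdateLog code: the class and method names below are hypothetical, and only the Math.min(1024, n) idea comes from the suggestion above. The list still grows on demand once it holds more than 1024 entries.

```java
import java.util.ArrayList;
import java.util.List;

public final class CappedVersionList {
    // Upper bound on the pre-allocated capacity (the 1024 from the suggestion above).
    static final int MAX_INITIAL_CAPACITY = 1024;

    /** Initial capacity to request: n itself for small n, otherwise the cap. */
    public static int initialCapacity(int n) {
        // Clamp negatives to 0, since ArrayList rejects a negative capacity.
        return Math.min(MAX_INITIAL_CAPACITY, Math.max(0, n));
    }

    /** Hypothetical stand-in for the allocation at the top of getVersions. */
    public static List<Long> newVersionList(int n) {
        return new ArrayList<>(initialCapacity(n));
    }

    public static void main(String[] args) {
        List<Long> versions = newVersionList(10_000_000);
        for (long v = 0; v < 5_000; v++) {
            versions.add(v); // grows past the initial 1024 slots as needed
        }
        System.out.println(versions.size());
    }
}
```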
~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley
On Thu, Jun 9, 2022 at 11:34 AM Bram Van Dam <bram.van...@intix.eu> wrote:
Howdy,
We've noticed that enabling larger transaction logs causes the memory
requirements for Solr to increase: Solr consumes large amounts of memory
at startup.
After procuring a heap dump, this seems to be because Solr initializes
ArrayLists in UpdateLog::getVersions with size
maxNbTransactionLogEntries. While this may be more efficient if your
actual log files are close to maximum size, this wastes memory when the
actual logs are small. This is something that occurs frequently when you
have a small number of shards which receive a lot of writes, and a lot
of shards which receive few (or no) writes.
We've seen cases where Solr needs an additional 10GiB of memory during
startup. It gets freed afterwards, but it does make startup painful.
The fix for SOLR-15676 further increased the memory footprint by
allocating a LongSet of the same size.
    public List<Long> getVersions(int n, long maxVersion) {
      List<Long> ret = new ArrayList<>(n);
      LongSet set = new LongSet(n);
The naïve fix would be to simply replace this init of new ArrayList<>(n)
with new ArrayList<>(). ArrayList grows its capacity by 50% every time
it's full, resulting in some extra garbage overhead and extra calls to
array copy.
A quick bit of napkin math shows that for 10M entries, the array will
have to be reallocated 35 times. In our case, this is worth the extra
overhead. In the general case, it might not be?
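The napkin math can be checked with a small simulation. This is only a sketch: it assumes the OpenJDK growth rule (default capacity 10, new capacity = old + old/2, rounded down) and does not touch a real ArrayList; the class and method names are made up for illustration.

```java
public final class ArrayListGrowth {
    /**
     * Counts how many times a backing array starting at initialCapacity
     * must grow by 50% before it can hold target elements.
     */
    public static int growthsNeeded(int initialCapacity, int target) {
        long capacity = initialCapacity;
        int growths = 0;
        while (capacity < target) {
            // Grow by half the current capacity, but always by at least 1
            // so that tiny capacities (0 or 1) still make progress.
            capacity += Math.max(1, capacity >> 1);
            growths++;
        }
        return growths;
    }

    public static void main(String[] args) {
        // 10 is the default capacity allocated on the first add().
        System.out.println(growthsNeeded(10, 10_000_000)); // prints 35
    }
}
```

Each of those growth steps copies the whole array, so the total copy work is a small constant multiple of the final size.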
Does anyone have any further insights?
Thanks,
- Bram
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
For additional commands, e-mail: dev-h...@solr.apache.org