Hi folks,

We caught an OOME in an Elasticsearch instance earlier this week that appears 
to be caused by a large merge filling up the heap with 
Lucene103BlockTreeTermsWriter$PendingBlock instances.  The offending segment 
has roughly 250,000,000 documents, and the TermsWriter was building the 
dictionary for an `_id` field whose 15-byte entries are randomly generated 
and so unlikely to share prefixes or suffixes.  The `pending` field on the 
TermsWriter was holding on to 2 GB of PendingBlock objects.
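
To illustrate the point about random IDs sharing little prefix, here is a 
minimal sketch in plain Java.  It is not Lucene code and does not reproduce 
the OOME; the class name, method names, sample size and seed are all made up 
for illustration.  It generates random 15-byte IDs, sorts them in unsigned 
byte order, and reports the average prefix length shared by neighbouring 
entries.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical helper for illustration only; nothing here is Lucene API.
class RandomIdPrefixes {

    // Number of leading bytes that a and b have in common.
    public static int sharedPrefix(byte[] a, byte[] b) {
        int limit = Math.min(a.length, b.length);
        int i = 0;
        while (i < limit && a[i] == b[i]) {
            i++;
        }
        return i;
    }

    // Average shared-prefix length between neighbouring IDs after sorting.
    public static double averageSharedPrefix(int count, int idLength, long seed) {
        Random random = new Random(seed);
        byte[][] ids = new byte[count][idLength];
        for (byte[] id : ids) {
            random.nextBytes(id);
        }
        // Sort as unsigned bytes, lexicographically.
        Arrays.sort(ids, (x, y) -> Arrays.compareUnsigned(x, y));

        long total = 0;
        for (int i = 1; i < count; i++) {
            total += sharedPrefix(ids[i - 1], ids[i]);
        }
        return (double) total / (count - 1);
    }

    public static void main(String[] args) {
        // Assumed sample size; the real segment is far larger.
        double average = averageSharedPrefix(100_000, 15, 42L);
        System.out.println("average shared prefix (bytes): " + average);
    }
}
```

On a sample like this I would expect the average to be small compared with 
the 15-byte entry length, but this only shows the shape of the data and says 
nothing about how the TermsWriter itself uses memory.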

It may be that this is an underpowered JVM for the amount of data being 
indexed, that this is a particularly adversarial data set, and that the new 
block tree structure in Lucene 10.3 has just tipped it over the edge.  However, 
there is a comment in the new `TrieBuilder` class reading `TODO make this trie 
builder a more memory efficient structure`, which suggests that we may be using 
more memory during merges than before.

At the moment I don’t have a good reproduction, hence an email to the dev list 
rather than a GitHub issue.  But I thought it worth raising, and if it starts 
to happen more frequently I will hopefully be able to write something that 
demonstrates it deterministically.

Note that the original issue[1] for the new block tree structure does have some 
comments about OOMs[2], but these seem to be happening earlier in the pipeline, 
so this looks like a different problem.

- Alan

[1] https://github.com/apache/lucene/pull/14333
[2] https://github.com/apache/lucene/pull/14447
