Hi folks,

We caught an OOME in an Elasticsearch instance earlier this week that appears to be caused by a large merge filling up the heap with Lucene103BlockTreeTermsWriter$PendingBlock instances. The offending segment has roughly 250,000,000 documents, and the TermsWriter was building the dictionary for an `_id` field whose entries are 15 bytes long and randomly generated, so they are unlikely to share prefixes or suffixes. The `pending` field on the TermsWriter was holding on to 2GB of PendingBlock objects.

It may be that this is an underpowered JVM for the amount of data being indexed, that this is a particularly adversarial data set, and that the new block tree structure in Lucene 10.3 has just tipped it over the edge. However, there is a comment in the new `TrieBuilder` class reading `TODO make this trie builder a more memory efficient structure`, which implies that we're using more memory during merges than before.

At the moment I don't have a good reproduction, hence an email to the dev list rather than a GitHub issue. But I thought it worth raising, and if it starts to happen more frequently I will hopefully be able to write something that demonstrates it deterministically.

Note that the original issue[1] for the new block tree structure does have some comments about OOMs[2], but these seem to be happening earlier in the pipeline, so this looks like a different problem.

- Alan

[1] https://github.com/apache/lucene/pull/14333
[2] https://github.com/apache/lucene/pull/14447
