LexBartnicki opened a new issue, #15883:
URL: https://github.com/apache/lucene/issues/15883
### Description
## Summary
After upgrading from Lucene ~9.10 (OpenSearch 2.15) to Lucene 9.12
(OpenSearch 2.19), nested aggregations exhibit a severe performance regression.
FixedBitSet parent/child filters appear to not be cached between queries,
causing them to be rebuilt on every request even for repeated identical queries.
## Reproduction
A `size: 0` aggregation query with a `nested` aggregation on a nested field
path (`variant.prices` in our case) takes ~3.5s per shard, with all time spent
in `build_leaf_collector`. Running the identical query a second time shows no
improvement, indicating bitsets are not being cached/reused.
Pre-upgrade (Lucene ~9.10), the same query on the same data completed in
~130ms average.
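To compare a first and a repeated run, the `build_leaf_collector` time can be summed per shard from each profiled response. The sketch below assumes the usual profile layout (`profile.shards[].aggregations[]`, with optional nested `children`); the helper names and the sample response are hypothetical placeholders, not data from this cluster.

```python
def sum_breakdown(nodes, key="build_leaf_collector"):
    """Recursively sum one breakdown timer over profiled aggregation nodes."""
    total = 0
    for node in nodes:
        total += node.get("breakdown", {}).get(key, 0)
        total += sum_breakdown(node.get("children", []), key)
    return total


def build_time_per_shard(response, key="build_leaf_collector"):
    """Map shard id -> total nanoseconds spent in `key` (assumed response shape)."""
    return {
        shard["id"]: sum_breakdown(shard.get("aggregations", []), key)
        for shard in response["profile"]["shards"]
    }


# Placeholder response with made-up numbers, only to show the expected shape.
sample_response = {
    "profile": {
        "shards": [
            {
                "id": "shard-0",
                "aggregations": [
                    {
                        "type": "SamplerAggregator",
                        "breakdown": {"build_leaf_collector": 10},
                        "children": [
                            {
                                "type": "NestedAggregator",
                                "breakdown": {"build_leaf_collector": 100},
                            }
                        ],
                    }
                ],
            }
        ]
    }
}

per_shard = build_time_per_shard(sample_response)
print(per_shard)
```

If the per-shard totals from the second run are about the same as from the first, that supports the reading that nothing is being reused between requests.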
## Profile evidence (single shard)
```json
{
"type": "NestedAggregator",
"description": "nested_price",
"time_in_nanos": 3597626115,
"breakdown": {
"build_leaf_collector": 3597502948,
"build_leaf_collector_count": 66,
"collect_count": 0,
"collect": 0
}
}
```
Key observations:
- 3.5s spent entirely in `build_leaf_collector`, across 66
`build_leaf_collector` calls
- Zero documents collected: the query matched nothing, yet bitset
construction still takes 3.5s
- No improvement on repeated execution, which suggests bitsets are rebuilt
from scratch each time
- `build_leaf_collector_count: 66` corresponds to segment count × 2 (parent +
child docs), consistent with FixedBitSet construction per segment
- `fixed_bit_set_memory_in_bytes` on this index is ~82MB across 3 primary
shards
- `index.load_fixed_bitset_filters_eagerly: true` (default)
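The per-call cost and the share of time spent in `build_leaf_collector` follow directly from the profile node above. A minimal sketch of that arithmetic, using only the numbers from the report (the `summarize` helper is a hypothetical name):

```python
import json

# Profile node copied verbatim from the report.
PROFILE_NODE = json.loads("""
{
  "type": "NestedAggregator",
  "description": "nested_price",
  "time_in_nanos": 3597626115,
  "breakdown": {
    "build_leaf_collector": 3597502948,
    "build_leaf_collector_count": 66,
    "collect_count": 0,
    "collect": 0
  }
}
""")


def summarize(node):
    """Derive the build share and mean per-call build time from one profile node."""
    breakdown = node["breakdown"]
    build_ns = breakdown["build_leaf_collector"]
    calls = breakdown["build_leaf_collector_count"]
    return {
        "build_share": build_ns / node["time_in_nanos"],
        "mean_build_ms": build_ns / calls / 1e6,
        "collect_count": breakdown["collect_count"],
    }


summary = summarize(PROFILE_NODE)
print(f"share of time in build_leaf_collector: {summary['build_share']:.4%}")
print(f"mean per build_leaf_collector call: {summary['mean_build_ms']:.1f} ms")
```

This works out to about 54.5 ms per call, with more than 99.99% of the aggregator's time in `build_leaf_collector`.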
## Index characteristics
- ~267 million documents
- 3 primary shards, 2 replicas
- ~90-145 segments across primaries (~30-48 per shard) — this is a
write-heavy index with continuous updates
- Nested field path: variant.prices
- The aggregation is a NestedAggregator → FilterAggregator → child
aggregations (terms, range), wrapped in a SamplerAggregator
## Expected behavior
FixedBitSet parent/child filters should be cached after first construction
and reused on subsequent queries against the same segments, as was the behavior
on Lucene ~9.10.
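As an illustration of the behavior expected here, the toy model below builds each filter bitset once per (segment, filter) pair and reuses it afterwards. It is not Lucene's or OpenSearch's implementation; `BitsetCache`, `fake_build`, and the segment count of 33 are assumptions, chosen so that two filters per segment give the 66 builds seen in the profile.

```python
class BitsetCache:
    """Toy cache: one bitset per (segment, filter) pair, built on first use."""

    def __init__(self, build_fn):
        self._build_fn = build_fn
        self._cache = {}
        self.builds = 0  # number of times a bitset was actually constructed

    def get(self, segment_key, filter_name):
        key = (segment_key, filter_name)
        if key not in self._cache:
            self._cache[key] = self._build_fn(segment_key, filter_name)
            self.builds += 1
        return self._cache[key]


def fake_build(segment_key, filter_name):
    # Stand-in for the expensive bitset construction; returns a placeholder.
    return frozenset()


def run_query(cache, segments):
    # Each segment needs a parent filter and a child filter.
    for segment in segments:
        cache.get(segment, "parent")
        cache.get(segment, "child")


segments = [f"seg-{i}" for i in range(33)]  # hypothetical segment count
cache = BitsetCache(fake_build)

run_query(cache, segments)
first_run_builds = cache.builds

run_query(cache, segments)
second_run_builds = cache.builds - first_run_builds

print(first_run_builds, second_run_builds)
```

With a working cache, the second run against unchanged segments triggers no builds; only new or merged segments would pay the construction cost again.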
## Actual behavior
FixedBitSet filters appear to be rebuilt on every query execution, costing
roughly 54ms per `build_leaf_collector` call (3.6s over 66 calls). On an index
with many segments, this dominates query latency even when zero documents match.
### Version and environment details
- Lucene version: 9.12 (bundled with OpenSearch 2.19)
- Previous working version: Lucene ~9.10 (bundled with OpenSearch 2.15)
- Platform: AWS OpenSearch Service (managed)
- OS: Amazon Linux (managed by AWS)
- FWIW, the instances use network-attached storage