LexBartnicki opened a new issue, #15883:
URL: https://github.com/apache/lucene/issues/15883

   ### Description
   
   ## Summary
   After upgrading from Lucene ~9.10 (OpenSearch 2.15) to Lucene 9.12 
(OpenSearch 2.19), nested aggregations exhibit a severe performance regression. 
FixedBitSet parent/child filters do not appear to be cached between queries, 
so they are rebuilt on every request, even for repeated identical queries.
   
   ## Reproduction
   A `size: 0` aggregation query with a `nested` aggregation on a nested field 
path (`variant.prices` in our case) takes ~3.5s per shard, with all time spent 
in `build_leaf_collector`. Running the identical query a second time shows no 
improvement, indicating bitsets are not being cached/reused.
   
   Pre-upgrade (Lucene ~9.10), the same query on the same data completed in 
~130ms average.
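
For anyone trying to reproduce this, the request has roughly the shape sketched below. Only `size: 0`, the `nested_price` aggregation name, the `variant.prices` path, and the nested → filter → terms/range structure under a sampler come from this report. The other aggregation names, the field names (`currency`, `store_id`, `amount`), the filter, the ranges, and `shard_size` are hypothetical placeholders, and placing the sampler outermost is an assumption.

```python
import json

# Only this path and the "nested_price" name come from the report.
NESTED_PATH = "variant.prices"


def build_request(shard_size=1000):
    """Return a search body shaped like the slow query (field names are hypothetical)."""
    child_aggs = {
        # Hypothetical child aggregations: one terms, one range.
        "by_store": {"terms": {"field": f"{NESTED_PATH}.store_id"}},
        "price_bands": {
            "range": {
                "field": f"{NESTED_PATH}.amount",
                "ranges": [{"to": 50}, {"from": 50}],
            }
        },
    }
    nested_agg = {
        "nested": {"path": NESTED_PATH},
        "aggs": {
            "in_currency": {
                # Hypothetical filter on a nested field.
                "filter": {"term": {f"{NESTED_PATH}.currency": "USD"}},
                "aggs": child_aggs,
            }
        },
    }
    return {
        "size": 0,        # aggregation only, no hits
        "profile": True,  # ask for the per-aggregator timing breakdown
        "aggs": {
            "sample": {
                "sampler": {"shard_size": shard_size},
                "aggs": {"nested_price": nested_agg},
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_request(), indent=2))
```

Sending the same body twice and comparing the `build_leaf_collector` timings in the two profile responses would show whether the second run benefits from any caching.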
   
   ## Profile evidence (single shard)
   ```json
     {
       "type": "NestedAggregator",
       "description": "nested_price",
       "time_in_nanos": 3597626115,
       "breakdown": {
         "build_leaf_collector": 3597502948,
         "build_leaf_collector_count": 66,
         "collect_count": 0,
         "collect": 0
       }
     }
   ```
   Key observations:
   - 3.5s spent entirely in `build_leaf_collector`, across 66 calls
   - Zero documents collected: the query matched nothing, yet bitset 
construction still takes 3.5s
   - No improvement on repeated execution, suggesting bitsets are rebuilt from 
scratch each time
   - `build_leaf_collector_count: 66` corresponds to segment count × 2 (parent + 
child docs), consistent with FixedBitSet construction per segment
   - `fixed_bit_set_memory_in_bytes` on this index is ~82MB across 3 primary 
shards
   - `index.load_fixed_bitset_filters_eagerly: true` (default)
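
The per-call cost can be derived directly from the profile output above. The sketch below is just arithmetic on the pasted numbers (the `summarize` helper is a hypothetical name). It works out to roughly 54.5 ms per `build_leaf_collector` call, with essentially all of the aggregator's time in that phase.

```python
import json

# Profile entry copied verbatim from the report above.
PROFILE = json.loads("""
{
  "type": "NestedAggregator",
  "description": "nested_price",
  "time_in_nanos": 3597626115,
  "breakdown": {
    "build_leaf_collector": 3597502948,
    "build_leaf_collector_count": 66,
    "collect_count": 0,
    "collect": 0
  }
}
""")


def summarize(profile):
    """Compute the share of time and the average cost of build_leaf_collector."""
    breakdown = profile["breakdown"]
    build_ns = breakdown["build_leaf_collector"]
    calls = breakdown["build_leaf_collector_count"]
    return {
        "build_share": build_ns / profile["time_in_nanos"],  # fraction of total time
        "ms_per_call": build_ns / calls / 1e6,               # nanoseconds -> milliseconds
        "collected": breakdown["collect_count"],
    }


if __name__ == "__main__":
    stats = summarize(PROFILE)
    print(f"build share: {stats['build_share']:.4%}")
    print(f"avg per call: {stats['ms_per_call']:.1f} ms")
    print(f"docs collected: {stats['collected']}")
```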
   
   ## Index characteristics
   
   - ~267 million documents
   - 3 primary shards, 2 replicas
   - ~90-145 segments across primaries (~30-48 per shard); this is a 
write-heavy index with continuous updates
   - Nested field path: `variant.prices`
   - The aggregation is a NestedAggregator → FilterAggregator → child 
aggregations (terms, range), wrapped in a SamplerAggregator
   
   ## Expected behavior
   
   FixedBitSet parent/child filters should be cached after first construction 
and reused on subsequent queries against the same segments, as was the behavior 
on Lucene ~9.10.
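
To make the expectation concrete, here is a toy model of per-segment bitset caching. It is not the Lucene or OpenSearch implementation. `SegmentBitsetCache`, `run_query`, and the segment and filter keys are all hypothetical names, and the 33-segment count is chosen only so that two bitsets per segment gives the 66 builds seen in the profile.

```python
class SegmentBitsetCache:
    """Toy cache: build a bitset once per (segment, filter) pair, then reuse it."""

    def __init__(self, builder):
        self._builder = builder  # callable(segment_key, filter_key) -> bitset
        self._cache = {}
        self.builds = 0          # number of times the expensive builder ran

    def get(self, segment_key, filter_key):
        key = (segment_key, filter_key)
        if key not in self._cache:
            self.builds += 1
            self._cache[key] = self._builder(segment_key, filter_key)
        return self._cache[key]

    def evict_segment(self, segment_key):
        """Drop entries for a segment that was merged away or closed."""
        for key in [k for k in self._cache if k[0] == segment_key]:
            del self._cache[key]


def run_query(cache, segments):
    """Simulate one nested aggregation: fetch parent and child bitsets per segment."""
    for segment in segments:
        cache.get(segment, "parent")
        cache.get(segment, "child")


if __name__ == "__main__":
    cache = SegmentBitsetCache(lambda segment, flt: frozenset())
    segments = [f"seg_{i}" for i in range(33)]
    run_query(cache, segments)
    builds_after_first = cache.builds
    run_query(cache, segments)
    print(builds_after_first, cache.builds)  # prints "66 66"
```

In this model only new or evicted segments cost anything on a repeated query. The reported behavior looks as if every lookup misses, so all 66 builds are paid again on each request.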
   
   ## Actual behavior
   
   FixedBitSet filters appear to be rebuilt on every query execution, at ~53ms 
per `build_leaf_collector` call (3.5s / 66 calls). On an index with many 
segments, this dominates query latency even when zero documents match.
   
   ### Version and environment details
   
   - Lucene version: 9.12 (bundled with OpenSearch 2.19)
   - Previous working version: Lucene ~9.10 (bundled with OpenSearch 2.15)
   - Platform: AWS OpenSearch Service (managed)
   - OS: Amazon Linux (managed by AWS)
   - FWIW, the instances use network-attached storage


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

