sjs004 opened a new issue, #15668: URL: https://github.com/apache/lucene/issues/15668
### Description

The `PlainHighlighter` via [WeightedSpanTermExtractor.java#L447](https://github.com/apache/lucene/blob/releases/lucene/10.3.2/lucene/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java#L447) is prone to crashing when rewriting modern Lucene query types. This happens because the internal `DelegatingLeafReader` used for highlighting throws `UnsupportedOperationException` when `getFieldInfos()` is called.

**The Recurrent Issue**

As Lucene adds more optimizations to query `rewrite()` methods (e.g., checking index statistics or DocValues existence), these queries begin to fail when processed by the highlighter.

This pattern was previously observed with `FieldExistsQuery` (fixed in https://github.com/apache/lucene/pull/12088 by explicitly ignoring it). I am now observing the same crash in [OpenSearch](https://github.com/opensearch-project/OpenSearch/issues/20496) with `SortedNumericDocValuesRangeQuery` (via `IndexOrDocValuesQuery`), which attempts to access `DocValuesSkipper` and subsequently `getFieldInfos()` during rewrite.

This creates a "whack-a-mole" situation where every new query optimization potentially breaks the highlighter.

**Example Failure**

When highlighting a boolean query containing a range filter:

```
Caused by: java.lang.UnsupportedOperationException
    at org.apache.lucene.search.highlight.WeightedSpanTermExtractor$DelegatingLeafReader.getFieldInfos(WeightedSpanTermExtractor.java:447)
    at org.apache.lucene.index.DocValuesSkipper.globalMinValue(DocValuesSkipper.java:137)
    at org.apache.lucene.document.SortedNumericDocValuesRangeQuery.rewrite(SortedNumericDocValuesRangeQuery.java:101)
```

**Proposal & Discussion**

I believe we should address this structurally rather than adding exceptions for every new query type.
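To make the trade-off concrete, here is a minimal, self-contained sketch. It does not use Lucene at all: `Reader`, `Query`, `ThrowingReader`, `ForwardingReader`, `MetadataAwareQuery` and `extract` are hypothetical stand-ins invented for illustration. They only mimic the shape of the problem: a rewrite that consults reader metadata fails against a wrapper that throws, succeeds against a wrapper that forwards the call, and is bypassed entirely by an ignore list.

```java
import java.util.Set;

// Hypothetical stand-ins, NOT Lucene classes: they only mimic the shape of the problem.
class HighlighterReaderSketch {

    /** Stand-in for a leaf reader that exposes per-field metadata. */
    interface Reader {
        Set<String> fieldInfos();
    }

    /** Stand-in for a query whose rewrite step may consult reader metadata. */
    interface Query {
        String rewrite(Reader reader);
    }

    /** Mimics the behavior described in the issue: metadata access is unsupported. */
    static final class ThrowingReader implements Reader {
        @Override
        public Set<String> fieldInfos() {
            throw new UnsupportedOperationException();
        }
    }

    /** Shape of a structural fix: forward the metadata call to the wrapped reader. */
    static final class ForwardingReader implements Reader {
        private final Reader in;

        ForwardingReader(Reader in) {
            this.in = in;
        }

        @Override
        public Set<String> fieldInfos() {
            return in.fieldInfos();
        }
    }

    /** A rewrite "optimization" that inspects metadata before deciding what to return. */
    static final class MetadataAwareQuery implements Query {
        private final String field;

        MetadataAwareQuery(String field) {
            this.field = field;
        }

        @Override
        public String rewrite(Reader reader) {
            return reader.fieldInfos().contains(field) ? "range(" + field + ")" : "match-none";
        }
    }

    /** Shape of a targeted fix: skip listed query types instead of rewriting them. */
    static String extract(Query query, Reader reader, Set<Class<?>> ignored) {
        if (ignored.contains(query.getClass())) {
            return "skipped";
        }
        return query.rewrite(reader);
    }

    public static void main(String[] args) {
        Reader real = () -> Set.of("price");
        Query query = new MetadataAwareQuery("price");
        Set<Class<?>> none = Set.of();
        Set<Class<?>> ignoreList = Set.of(MetadataAwareQuery.class);

        // Forwarding wrapper: the rewrite sees real metadata and succeeds.
        System.out.println(extract(query, new ForwardingReader(real), none));

        // Throwing wrapper plus ignore list: the query is never rewritten.
        System.out.println(extract(query, new ThrowingReader(), ignoreList));

        // Throwing wrapper without an ignore entry: the rewrite fails.
        try {
            extract(query, new ThrowingReader(), none);
        } catch (UnsupportedOperationException e) {
            System.out.println("rewrite failed: " + e);
        }
    }
}
```

In this toy model the ignore list has to grow with every new metadata-aware query type, while the forwarding wrapper does not. Whether forwarding is actually safe for the real `DelegatingLeafReader` is not something this sketch can show.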
**Option 1**: The Structural Fix - Modify `DelegatingLeafReader.getFieldInfos()` to return `FieldInfos` instead of throwing an exception.

PS: I am not very familiar with the Lucene codebase, so I am not sure whether this option is feasible or whether it would take a long time to fix.

**Option 2**: The Targeted Fix - Continue the pattern established in [PR #12088](https://github.com/apache/lucene/pull/12088) by adding `IndexOrDocValuesQuery` (and others) to the "ignore" list in the `WeightedSpanTermExtractor.extract()` method.

I am happy to submit a PR for option 2, and possibly for option 1 as well if the team can suggest a solution.

### Version and environment details

_No response_
