Re: timeAllowed discounting and QueryLimits

Mark Miller Mon, 22 Sep 2025 16:13:56 -0700

I forgot to post a message about it, but FYI, I pushed a Lucene 10 upgrade
branch a while back as well.

I don't claim it is 100% correct, I only spent a day on it, but all tests
do pass.

I also tried to identify where it differed from the upgrade that is in main
(at the time that upgrade work went in). Handling timeouts was one of the
differences. Here is a list I put together of all of the differences:

1) Timeouts: enforced via IndexSearcher.setTimeout + timedOut (Lucene 10
idiom)

- My branch sets an IndexSearcher timeout using a thread local strategy
(wired to QueryLimits) for timeAllowed and component searches (e.g.,
Grouping/CommandHandler), and checks timedOut() to flag partial results
consistently. The Lucene‑10 partition-based search path is also used so
QueryTimeout is consulted during scoring.
- In main, the setTimeout call in the main search path is commented out,
and the Lucene‑10 partition-based override is disabled; it instead relies
on ExitableDirectoryReader if a system property is set. Ramification:
timeAllowed is frequently inert, many queries won’t be interrupted inside
Lucene, and partial results are not consistently flagged unless
ExitableDirectoryReader is explicitly enabled.
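To make the thread-local strategy in item 1 concrete, here is a minimal, dependency-free sketch. It is not the code from either branch: the Timeout interface is only a stand-in for Lucene's QueryTimeout, and the class name, method names, and fake clock are hypothetical.

```java
import java.util.function.LongSupplier;

class ThreadLocalTimeoutSketch {
    /** Stand-in for Lucene's QueryTimeout: polled periodically during a search. */
    interface Timeout {
        boolean shouldExit();
    }

    /** Outcome of a search: hits collected, and whether it was cut short. */
    static final class Result {
        final int collected;
        final boolean partial;

        Result(int collected, boolean partial) {
            this.collected = collected;
            this.partial = partial;
        }
    }

    /** Per-thread deadline in clock ticks; null means no limit for this request. */
    private static final ThreadLocal<Long> DEADLINE = new ThreadLocal<>();

    static void setTimeAllowed(long now, long allowed) {
        DEADLINE.set(now + allowed);
    }

    static void clear() {
        DEADLINE.remove();
    }

    /** Builds a timeout that reads the calling thread's deadline on every check. */
    static Timeout timeout(LongSupplier clock) {
        return () -> {
            Long deadline = DEADLINE.get();
            return deadline != null && clock.getAsLong() >= deadline;
        };
    }

    /** Collects docs until the timeout fires, then reports the result as partial. */
    static Result search(int docs, Timeout timeout) {
        int collected = 0;
        for (int i = 0; i < docs; i++) {
            if (timeout.shouldExit()) {
                return new Result(collected, true);
            }
            collected++;
        }
        return new Result(collected, false);
    }

    public static void main(String[] args) {
        long[] tick = {0};
        LongSupplier fakeClock = () -> tick[0]++; // advances one tick per check
        setTimeAllowed(0, 5);
        try {
            Result r = search(10, timeout(fakeClock));
            System.out.println(r.collected + " hits, partial=" + r.partial);
        } finally {
            clear(); // always reset so the next request on this thread starts clean
        }
    }
}
```

The point of the sketch is the shape: the limit lives with the request thread, the search loop consults it while collecting, and the caller learns that the result is partial instead of silently running to completion.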

2) Partial results handling in field sort values (FSV)

- My branch handles FSV timeouts in place: doFieldSortValues catches
ExitableDirectoryReader.ExitingReaderException, logs the issue, marks the
response partial via rsp.setPartialResults(req), and returns an empty
sort_values block while letting the rest of the request finish
(solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java:569-575).

- main does not set the flag locally; it rethrows
the same exception and relies on SearchHandler.shortCircuitedResults to
intercept it later and short-circuit the response
(external/solr-main/solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java:575-579,
then SearchHandler.java:526-529 & 761-779). Ramification: in my branch the
timeout is contained to the FSV sub-phase (other components still run) and
the partial-results status is flagged exactly where the failure occurs,
whereas main invokes the global short-circuit path and aborts remaining work.
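The two policies in item 2 can be modeled without Solr on the classpath. This is only a sketch under stated assumptions: TimeLimitExceeded stands in for ExitableDirectoryReader.ExitingReaderException, Response is a hypothetical stand-in for the Solr response, "facets" is an arbitrary example of a later component, and the outer catch assumes the global short-circuit also flags partial results.

```java
import java.util.ArrayList;
import java.util.List;

class FsvTimeoutSketch {
    /** Stand-in for ExitableDirectoryReader.ExitingReaderException. */
    static final class TimeLimitExceeded extends RuntimeException {}

    /** Hypothetical response: which sections were written, plus the partial flag. */
    static final class Response {
        final List<String> sections = new ArrayList<>();
        boolean partialResults;
    }

    /** Contained policy: flag partial results locally and let the request continue. */
    static void sortValuesContained(Response rsp, Runnable computeSortValues) {
        try {
            computeSortValues.run();
            rsp.sections.add("sort_values");
        } catch (TimeLimitExceeded e) {
            rsp.partialResults = true;
            rsp.sections.add("sort_values(empty)");
        }
    }

    /** Rethrow policy: the exception propagates and the caller must intercept it. */
    static void sortValuesRethrow(Response rsp, Runnable computeSortValues) {
        computeSortValues.run();
        rsp.sections.add("sort_values");
    }

    /** Runs the FSV phase and then a later component under either policy. */
    static Response handle(boolean contained) {
        Response rsp = new Response();
        Runnable timesOut = () -> {
            throw new TimeLimitExceeded();
        };
        try {
            if (contained) {
                sortValuesContained(rsp, timesOut);
            } else {
                sortValuesRethrow(rsp, timesOut);
            }
            rsp.sections.add("facets"); // a later component, reached only if nothing propagated
        } catch (TimeLimitExceeded e) {
            rsp.partialResults = true; // assumed global short-circuit: remaining work is skipped
        }
        return rsp;
    }

    public static void main(String[] args) {
        System.out.println("contained: " + handle(true).sections);
        System.out.println("rethrow:   " + handle(false).sections);
    }
}
```

In this model both policies end with the partial flag set, but only the contained one still produces the later section.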

3) PathHierarchyTokenizer compatibility preserved

- My branch adds targeted handling in TextField.parseFieldQuery for
PathHierarchyTokenizer’s Lucene 10 positional change so hierarchical path
queries continue to work as before.
- main does not include this fix. Ramification: hierarchical search can
break (phrase-style matching fails), whereas it works here.

4) PreAnalyzedField policy

- My branch retains and adapts PreAnalyzedField and
PreAnalyzedUpdateProcessor to Lucene 10 (createFields + custom Field
subclass).
- main removes these features. Ramification: users can no longer use
pre‑analyzed input on main; it continues to work in my branch.

5) ConcurrentMergeScheduler (auto IO throttling) default

- My branch matches Lucene 10’s default (autoIOThrottle=false) and only
enables throttling if ioThrottle=true is explicitly configured.
- main enables throttling by default and disables it only when ioThrottle=false.
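The difference in item 5 only shows up when ioThrottle is left unset. A small sketch of the two resolution rules as described above; the class and method names are hypothetical, and the map stands in for the merge scheduler's configured arguments.

```java
import java.util.Map;

class IoThrottleDefaultSketch {
    /** Opt-in rule (my branch, as described): throttle only if ioThrottle=true is set. */
    static boolean optIn(Map<String, String> schedulerArgs) {
        return "true".equalsIgnoreCase(schedulerArgs.get("ioThrottle"));
    }

    /** Opt-out rule (main, as described): throttle unless ioThrottle=false is set. */
    static boolean optOut(Map<String, String> schedulerArgs) {
        return !"false".equalsIgnoreCase(schedulerArgs.get("ioThrottle"));
    }

    public static void main(String[] args) {
        Map<String, String> unset = Map.of();
        Map<String, String> on = Map.of("ioThrottle", "true");
        Map<String, String> off = Map.of("ioThrottle", "false");
        System.out.println("unset: optIn=" + optIn(unset) + " optOut=" + optOut(unset));
        System.out.println("true:  optIn=" + optIn(on) + " optOut=" + optOut(on));
        System.out.println("false: optIn=" + optIn(off) + " optOut=" + optOut(off));
    }
}
```

With an explicit true or false the two rules agree; with no setting, the opt-in rule leaves throttling off and the opt-out rule turns it on.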

6) Legacy Field.setTokenStream removed in Lucene 10

- My branch overrides LegacyField.setTokenStream to throw
UnsupportedOperationException with guidance on the supported migration path
(solr/core/src/java/org/apache/solr/legacy/LegacyField.java:83-89), so
callers discover the issue immediately.
- main leaves the legacy override calling
super.setTokenStream(tokenStream)
(external/solr-main/solr/core/src/java/org/apache/solr/legacy/LegacyField.java:83-89),
which still compiles because Lucene 10.2.2 retains that method.
Ramification: my branch fails fast and forces callers toward the Lucene 10
model (separate field subclass or paired fields), while main continues to
tolerate the now-discouraged API, so dependent code may postpone migrating
until the method eventually disappears.

7) RegExp complement toggle (migration knob)

- My branch exposes an allowDeprecatedRegexpComplement flag to ease
migration where complement was previously used; main does not include this
toggle. Ramification: greater operational flexibility here during migration
away from deprecated constructs.

8) Directory reference management precision

- My branch incRefs/releases the underlying Directory using the raw
reader (getRawReader().directory()), i.e., the Directory associated with
the reader before Uninverting/Exitable wrapping.
- main incRefs/releases via getIndexReader().directory(). Ramification:
using the raw reader ensures ref‑counts track the actual on‑disk Directory
reserved by DirectoryFactory and avoids ref‑count skew or premature release
when wrapper readers (Uninverting/Exitable) are added/removed; it also
avoids any surprises if a wrapper abstracts the Directory differently.

9) Max boolean clauses (per‑searcher vs global)

- My branch applies per‑searcher maxBooleanClauses (via
setMaxClauseCount on the searcher), allowing per‑collection limits that
respect each collection’s config.
- main sets a global limit at container startup. Ramification: mixed
collections share one global cap; a limit tuned for one collection can
over‑ or under‑constrain others, reducing isolation and potentially causing
avoidable failures.

10) In main, ltr/TestLTRQParserPlugin.java:151 is annotated
@Ignore("SOLR-17840"), while my branch's corresponding test file
(solr/modules/ltr/src/test/org/apache/solr/ltr/TestLTRQParserPlugin.java:1-200)
has no ignore and executes normally.

11) GraphQuery explicit Wildcard handling (determinization + rewrite)

- My branch uses explicit WildcardQuery construction in GraphQuery (sets
a determinize work limit and uses constant‑score rewrite), mirroring parser
behavior and bounding resource use.
- main uses the simpler constructor. Ramification: specifying a
determinize work limit prevents pathological wildcard patterns from
triggering expensive automaton determinization (CPU/memory spikes), and
constant‑score rewrite avoids term‑by‑term scoring expansion; together this
yields more predictable performance and consistent semantics with other
multi‑term queries.

12) SlowCompositeReaderWrapper#getDocValuesSkipper contract

- My branch throws UnsupportedOperationException (explicit “not
supported”) to prevent null surprises.
- main returns null. Ramification: clearer failure mode here reduces the
chance of latent NPEs and makes unsupported usage visible during testing.
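A tiny illustration of the failure-mode difference in item 12, using hypothetical stand-in interfaces rather than the real Lucene reader types. It assumes a caller that uses the returned skipper without a null check.

```java
class SkipperContractSketch {
    /** Hypothetical stand-in for a doc-values skipper. */
    interface Skipper {
        int maxDocId();
    }

    /** Hypothetical stand-in for the reader method under discussion. */
    interface Reader {
        Skipper getDocValuesSkipper(String field);
    }

    /** Returns null, leaving it to every caller to check. */
    static final Reader RETURNS_NULL = field -> null;

    /** Refuses explicitly, naming the unsupported operation. */
    static final Reader THROWS = field -> {
        throw new UnsupportedOperationException("doc values skipper not supported: " + field);
    };

    /** Reports which exception a caller without a null check ends up seeing. */
    static String failureSeenByCaller(Reader reader) {
        try {
            return "ok:" + reader.getDocValuesSkipper("price").maxDocId();
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(failureSeenByCaller(RETURNS_NULL));
        System.out.println(failureSeenByCaller(THROWS));
    }
}
```

With the null-returning reader the caller fails at the point of use with a NullPointerException; with the throwing reader it fails at the call itself, with a message that names the unsupported operation.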

13) ComplexPhraseQParserPlugin: unified multi‑term handling
(wildcard/prefix/automaton)

- My branch delegates wildcard/prefix construction to the reverse‑aware
Solr parser (so we don’t duplicate logic), then explicitly normalizes
Lucene 10 behavior inside complex phrases: sets a determinize work limit on
wildcard; enforces SCORING_BOOLEAN_REWRITE for wildcard, prefix, and
automaton; and enforces minimum prefix length.
- main delegates as well but only forces SCORING_BOOLEAN_REWRITE for
wildcard/automaton and does not normalize prefix rewrite inside complex
phrases. Ramification: here, complex phrases behave more consistently with
top‑level multi‑term queries (bounded determinization and explicit rewrite
semantics across wildcard, prefix, and automaton) which avoids unexpected
scoring/expansion behavior inside complex phrases.
- Additionally, in main wildcard handling is applied in two places
(delegate and ComplexPhrase override), which is functionally harmless but
redundant and harder to maintain; my branch centralizes the normalization
in one place.

14) RegExp complement migration flag

- My branch adds an allowDeprecatedRegexpComplement request parameter
that, when set, re-enables Lucene’s deprecated RegExp complement option by
OR-ing the RegExp.DEPRECATED_COMPLEMENT flag inside
SolrQueryParserBase.newRegexpQuery
(solr/core/src/java/org/apache/solr/parser/SolrQueryParserBase.java:600-614).
- main omits this toggle and always uses the default RegExp.ALL flags
(external/solr-main/solr/core/src/java/org/apache/solr/parser/SolrQueryParserBase.java:600-608).
Ramification: here operators can phase out complement usage gradually by
flipping the parameter per collection or per request, while main
forces an immediate migration away from complement.
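The flag handling in item 14 boils down to conditionally OR-ing one bit into the syntax flags. A sketch of that logic only: the constant values below are placeholders chosen for illustration, not Lucene's actual RegExp constants, and the class and method names are hypothetical.

```java
class RegexpFlagSketch {
    // Placeholder bit values for illustration; NOT the real RegExp constants.
    static final int ALL = 0x00ff;
    static final int DEPRECATED_COMPLEMENT = 0x10000;

    /** Default flags, plus the complement bit only when the request opts in. */
    static int syntaxFlags(boolean allowDeprecatedRegexpComplement) {
        return allowDeprecatedRegexpComplement ? (ALL | DEPRECATED_COMPLEMENT) : ALL;
    }

    /** True if the complement bit is present in the given flags. */
    static boolean complementEnabled(int flags) {
        return (flags & DEPRECATED_COMPLEMENT) != 0;
    }

    public static void main(String[] args) {
        System.out.println("default: " + complementEnabled(syntaxFlags(false)));
        System.out.println("opt-in:  " + complementEnabled(syntaxFlags(true)));
    }
}
```

Because the bit is added per call, the same parser can serve requests with and without complement support during a migration.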
