I forgot to post a message about it, but FYI, I pushed a Lucene 10 upgrade
branch a while back as well.

I don't claim it is 100% correct, since I only spent a day on it, but all
tests do pass.

I also tried to identify where it differed from the upgrade that is in main
(at the time that upgrade work went in). Handling timeouts was one of the
differences. Here is a list I put together of all of the differences:

1) Timeouts: enforced via IndexSearcher.setTimeout + timedOut (Lucene 10
idiom)

   - My branch sets an IndexSearcher timeout using a thread local strategy
   (wired to QueryLimits) for timeAllowed and component searches (e.g.,
   Grouping/CommandHandler), and checks timedOut() to flag partial results
   consistently. The Lucene‑10 partition-based search path is also used so
   QueryTimeout is consulted during scoring.
   - In main, the setTimeout call in the main search path is commented out,
   and the Lucene‑10 partition-based override is disabled; it instead relies
   on ExitableDirectoryReader if a system property is set. Ramification:
   timeAllowed is frequently inert, many queries won’t be interrupted inside
   Lucene, and partial results are not consistently flagged unless
   ExitableDirectoryReader is explicitly enabled.
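
To make the thread-local strategy a bit more concrete, here is a minimal sketch in plain Java. It is not the code from the branch: ExitCheck is a hypothetical stand-in for Lucene's QueryTimeout callback, and ThreadLocalDeadline is an illustrative name. The assumption is that one shared instance is handed to IndexSearcher.setTimeout, and each request thread sets its own timeAllowed budget before searching and clears it afterwards.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for Lucene's QueryTimeout callback (one exit check). */
interface ExitCheck {
  boolean shouldExit();
}

/** Illustrative thread-local deadline: each request thread carries its own budget. */
final class ThreadLocalDeadline implements ExitCheck {
  // Absolute deadline in nanoseconds for the current thread; null means "no limit".
  private static final ThreadLocal<Long> DEADLINE_NANOS = new ThreadLocal<>();

  /** Called at the start of a request with its timeAllowed value in milliseconds. */
  static void set(long timeAllowedMillis) {
    DEADLINE_NANOS.set(System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeAllowedMillis));
  }

  /** Called when the request finishes so the thread can be reused safely. */
  static void clear() {
    DEADLINE_NANOS.remove();
  }

  @Override
  public boolean shouldExit() {
    Long deadline = DEADLINE_NANOS.get();
    // Compare by subtraction so that nanoTime wrap-around is handled correctly.
    return deadline != null && System.nanoTime() - deadline >= 0;
  }
}
```

After the search returns, the caller would check timedOut() on the searcher and flag the response as partial if it is true.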


2) Partial results handling in field sort values (FSV)

   - My branch handles FSV timeouts in place: doFieldSortValues catches
   ExitableDirectoryReader.ExitingReaderException, logs the issue, marks the
   response partial via rsp.setPartialResults(req), and returns an empty
   sort_values block while letting the rest of the request finish
   (solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java:569-575).

   - main does not set the flag locally; it rethrows the same exception and
   relies on SearchHandler.shortCircuitedResults to intercept it later and
   short-circuit the response
   (external/solr-main/solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java:575-579,
   then SearchHandler.java:526-529 & 761-779). Ramification: in my branch the
   timeout is contained to the FSV sub-phase (other components still run) and
   the partial-results status is flagged exactly where the failure occurs,
   whereas main invokes the global short-circuit path and aborts remaining
   work.
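
The difference between the two styles can be sketched in plain Java. Everything here is hypothetical scaffolding (ExitingException stands in for ExitableDirectoryReader.ExitingReaderException, and Response is a toy object rather than Solr's response class); it only shows which phases still run in each style.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

final class FsvTimeoutSketch {
  /** Hypothetical stand-in for ExitableDirectoryReader.ExitingReaderException. */
  static final class ExitingException extends RuntimeException {}

  /** Toy response: a partial flag, the sort values, and the phases that ran. */
  static final class Response {
    boolean partialResults;
    List<Object> sortValues;
    final List<String> phasesRun = new ArrayList<>();
  }

  /** Contained style: catch at the FSV step, flag partial, return an empty block. */
  static void fsvContained(Response rsp, Supplier<List<Object>> compute) {
    try {
      rsp.sortValues = compute.get();
    } catch (ExitingException e) {
      rsp.partialResults = true;
      rsp.sortValues = new ArrayList<>();
    }
    rsp.phasesRun.add("fsv");
  }

  /** Rethrow style: the exception escapes and a caller higher up must handle it. */
  static void fsvRethrow(Response rsp, Supplier<List<Object>> compute) {
    rsp.sortValues = compute.get();
    rsp.phasesRun.add("fsv");
  }

  /** Runs FSV and then one later phase, to show what still executes. */
  static Response handle(boolean contained, Supplier<List<Object>> compute) {
    Response rsp = new Response();
    try {
      if (contained) {
        fsvContained(rsp, compute);
      } else {
        fsvRethrow(rsp, compute);
      }
      rsp.phasesRun.add("later-component");
    } catch (ExitingException e) {
      // Global short-circuit: flag partial results and skip the remaining phases.
      rsp.partialResults = true;
    }
    return rsp;
  }
}
```

Both styles end up with the partial flag set; they differ in whether the later phase gets to run.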


3) PathHierarchyTokenizer compatibility preserved

   - My branch adds targeted handling in TextField.parseFieldQuery for
   PathHierarchyTokenizer’s Lucene 10 positional change so hierarchical path
   queries continue to work as before.
   - main does not include this fix. Ramification: hierarchical search can
   break (phrase-style matching fails), whereas it works here.


4) PreAnalyzedField policy

   - My branch retains and adapts PreAnalyzedField and
   PreAnalyzedUpdateProcessor to Lucene 10 (createFields + custom Field
   subclass).
   - main removes these features. Ramification: users can no longer use
   pre‑analyzed input on main; it continues to work in my branch.


5) ConcurrentMergeScheduler (auto IO throttling) default

   - My branch matches Lucene 10’s default (autoIOThrottle=false) and only
   enables throttling if ioThrottle=true is explicitly configured.
   - main enables throttling by default and disables it only when
   ioThrottle=false is explicitly configured.
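
The two defaults differ only in how an unset ioThrottle is interpreted. A minimal sketch, assuming the setting arrives as a nullable Boolean (class and method names are illustrative, not taken from either branch):

```java
final class IoThrottleDefaults {
  /** Branch behavior as described: throttle only when ioThrottle=true is configured. */
  static boolean branchEnablesThrottle(Boolean ioThrottle) {
    return Boolean.TRUE.equals(ioThrottle);
  }

  /** main behavior as described: throttle unless ioThrottle=false is configured. */
  static boolean mainEnablesThrottle(Boolean ioThrottle) {
    return !Boolean.FALSE.equals(ioThrottle);
  }

  public static void main(String[] args) {
    // Print the decision for each possible configuration, including "unset" (null).
    for (Boolean configured : new Boolean[] {null, Boolean.TRUE, Boolean.FALSE}) {
      System.out.println("ioThrottle=" + configured
          + " branch=" + branchEnablesThrottle(configured)
          + " main=" + mainEnablesThrottle(configured));
    }
  }
}
```

The only configuration where the two disagree is the unset one, which is the common case for existing configs.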


6) Legacy Field.setTokenStream discouraged in Lucene 10

   - My branch overrides LegacyField.setTokenStream to throw
   UnsupportedOperationException with guidance on the supported migration path
   (solr/core/src/java/org/apache/solr/legacy/LegacyField.java:83-89), so
   callers discover the issue immediately.
   - main leaves the legacy override calling
   super.setTokenStream(tokenStream)
   (external/solr-main/solr/core/src/java/org/apache/solr/legacy/LegacyField.java:83-89),
   which still compiles because Lucene 10.2.2 retains that method.
   Ramification: my branch fails fast and forces callers toward the Lucene 10
   model (separate field subclass or paired fields), while main continues to
   tolerate the now-discouraged API, so dependent code may postpone migrating
   until the method eventually disappears.


7) RegExp complement toggle (migration knob)

   - My branch exposes an allowDeprecatedRegexpComplement flag to ease
   migration where complement was previously used; main does not include this
   toggle. Ramification: greater operational flexibility here during migration
   away from deprecated constructs.


8) Directory reference management precision

   - My branch incRefs/releases the underlying Directory using the raw
   reader (getRawReader().directory()), i.e., the Directory associated with
   the reader before Uninverting/Exitable wrapping.
   - main incRefs/releases via getIndexReader().directory(). Ramification:
   using the raw reader ensures ref‑counts track the actual on‑disk Directory
   reserved by DirectoryFactory and avoids ref‑count skew or premature release
   when wrapper readers (Uninverting/Exitable) are added/removed; it also
   avoids any surprises if a wrapper abstracts the Directory differently.


9) Max boolean clauses (per‑searcher vs global)

   - My branch applies per‑searcher maxBooleanClauses (via
   setMaxClauseCount on the searcher), allowing per‑collection limits that
   respect each collection’s config.
   - main sets a global limit at container startup. Ramification: mixed
   collections share one global cap; a limit tuned for one collection can
   over‑ or under‑constrain others, reducing isolation and potentially causing
   avoidable failures.


10) In main, ltr/TestLTRQParserPlugin.java:151 is annotated
@Ignore("SOLR-17840"), while my branch's corresponding test file
(solr/modules/ltr/src/test/org/apache/solr/ltr/TestLTRQParserPlugin.java:1-200)
has no ignore and executes normally.

11) GraphQuery explicit Wildcard handling (determinization + rewrite)

   - My branch uses explicit WildcardQuery construction in GraphQuery (sets
   a determinize work limit and uses constant‑score rewrite), mirroring parser
   behavior and bounding resource use.
   - main uses the simpler constructor. Ramification: specifying a
   determinize work limit prevents pathological wildcard patterns from
   triggering expensive automaton determinization (CPU/memory spikes), and
   constant‑score rewrite avoids term‑by‑term scoring expansion; together this
   yields more predictable performance and consistent semantics with other
   multi‑term queries.


12) SlowCompositeReaderWrapper#getDocValuesSkipper contract

   - My branch throws UnsupportedOperationException (explicit “not
   supported”) to prevent null surprises.
   - main returns null. Ramification: clearer failure mode here reduces the
   chance of latent NPEs and makes unsupported usage visible during testing.
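
A small illustration of the failure-mode difference, using a hypothetical Reader interface rather than the real SlowCompositeReaderWrapper. The caller below deliberately assumes a non-null result, which is the kind of code the explicit exception is meant to expose early.

```java
final class SkipperContractSketch {
  /** Hypothetical stand-in for a reader that may not support skippers. */
  interface Reader {
    Object getDocValuesSkipper(String field);
  }

  /** Null-returning style: the problem surfaces only when the result is used. */
  static final Reader RETURNS_NULL = field -> null;

  /** Throwing style: the problem surfaces at the call itself, with a message. */
  static final Reader THROWS = field -> {
    throw new UnsupportedOperationException(
        "doc values skipper not supported for field " + field);
  };

  /** A caller that assumes the skipper is always present. */
  static String describe(Reader reader, String field) {
    Object skipper = reader.getDocValuesSkipper(field);
    return skipper.toString();
  }
}
```

With RETURNS_NULL the caller fails with a NullPointerException at the point of use; with THROWS it fails at the call with a message naming the unsupported operation.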


13) ComplexPhraseQParserPlugin: unified multi‑term handling
(wildcard/prefix/automaton)

   - My branch delegates wildcard/prefix construction to the reverse‑aware
   Solr parser (so we don’t duplicate logic), then explicitly normalizes
   Lucene 10 behavior inside complex phrases: sets a determinize work limit on
   wildcard; enforces SCORING_BOOLEAN_REWRITE for wildcard, prefix, and
   automaton; and enforces minimum prefix length.
   - main delegates as well but only forces SCORING_BOOLEAN_REWRITE for
   wildcard/automaton and does not normalize prefix rewrite inside complex
   phrases. Ramification: here, complex phrases behave more consistently with
   top‑level multi‑term queries (bounded determinization and explicit rewrite
   semantics across wildcard, prefix, and automaton) which avoids unexpected
   scoring/expansion behavior inside complex phrases.
   - Additionally, in main wildcard handling is applied in two places
   (delegate and ComplexPhrase override), which is functionally harmless but
   redundant and harder to maintain; my branch centralizes the normalization
   in one place.


14) RegExp complement migration flag

   - My branch adds an allowDeprecatedRegexpComplement request parameter
   that, when set, re-enables Lucene’s deprecated RegExp complement option by
   OR-ing the RegExp.DEPRECATED_COMPLEMENT flag inside
   SolrQueryParserBase.newRegexpQuery
   (solr/core/src/java/org/apache/solr/parser/SolrQueryParserBase.java:600-614).

   - main omits this toggle and always uses the default RegExp.ALL flags
   (external/solr-main/solr/core/src/java/org/apache/solr/parser/SolrQueryParserBase.java:600-608).
   Ramification: in my branch operators can phase out complement usage
   gradually by flipping the parameter per collection or per request, while
   main forces an immediate migration away from complement.
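
The flag handling amounts to a conditional bitwise OR. A minimal sketch, with the caveat that the two constants below are placeholder values for illustration only (the real ones live in Lucene's RegExp class) and that syntaxFlags is a hypothetical helper, not the actual newRegexpQuery code:

```java
final class RegexpFlagSketch {
  // Placeholder bit values, assumed for this sketch; see Lucene's RegExp for the real ones.
  static final int ALL = 0x00ff;
  static final int DEPRECATED_COMPLEMENT = 0x10000;

  /** Returns the syntax flags to use, adding the complement bit only on request. */
  static int syntaxFlags(boolean allowDeprecatedRegexpComplement) {
    int flags = ALL;
    if (allowDeprecatedRegexpComplement) {
      flags |= DEPRECATED_COMPLEMENT;
    }
    return flags;
  }
}
```

With the parameter unset the flags are exactly the default set, so behavior matches main unless the toggle is explicitly enabled.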
