Mark, You make many excellent points, I’d like to see some of your changes in main. Are you planning to work on merging them? If not then I can at least create a bunch of tickets so that the diffs are easier to process.
Below I’ll comment mostly on the timeAllowed parts: > On 23 Sep 2025, at 01:11, Mark Miller <[email protected]> wrote: > > I forgot to post a message about it, but FYI, I pushed a Lucene 10 upgrade > branch a while back as well. > > I don't claim it is 100% correct, I only spent a day on it, but all tests > do pass. > > I also tried to identify where it differed from the upgrade that is in main > (at the time that upgrade work went in). Handling timeouts was one of the > differences. Here is a list I put together of all of the differences: > > 1) Timeouts: enforced via IndexSearcher.setTimeout + timedOut (Lucene 10 > idiom) > > - My branch sets an IndexSearcher timeout using a thread local strategy > (wired to QueryLimits) for timeAllowed and component searches (e.g., > Grouping/CommandHandler), and checks timedOut() to flag partial results > consistently. The Lucene‑10 partition-based search path is also used so > QueryTimeout is consulted during scoring. > - In main, the setTimeout call in the main search path is commented out, > and the Lucene‑10 partition-based override is disabled; it instead relies > on ExitableDirectoryReader if a system property is set. Ramification: > timeAllowed is frequently inert, many queries won’t be interrupted inside > Lucene, and partial results are not consistently flagged unless > ExitableDirectoryReader is explicitly enabled. > As of last week :) that’s no longer the case, I reinstated the setTimeout using the same trick as before, i.e. creating a new short-lived instance of IndexSearcher using the same IndexReader, which is (presumably) inexpensive, and using setTimeout on that instance. As to the other change: using CollectorManager is something we should definitely do - in a separate ticket. > > 2) Partial results handling in field sort values (FSV) > > - My branch handles FSV timeouts in place: doFieldSortValues catches > ExitableDirectoryReader.ExitingReaderException, logs the issue, marks the > response partial via rsp.setPartialResults(req), and returns an empty > sort_values block while letting the rest of the request finish > > (solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java:569-575). > > - /home/markmiller/solr-main does not set the flag locally; it rethrows > the same exception and relies on SearchHandler.shortCircuitedResults to > intercept it later and short-circuit the response > > (external/solr-main/solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java:575-579, > then SearchHandler.java:526-529 & 761-779). Ramification: here the > timeout is contained to the FSV sub-phase (other components still run) and > the partial-results status is flagged exactly where the failure occurs, > versus invoking the global short-circuit path and aborting remaining work. > +1 makes sense to change this. > > 3) PathHierarchyTokenizer compatibility preserved > > - My branch adds targeted handling in TextField.parseFieldQuery for > PathHierarchyTokenizer’s Lucene 10 positional change so hierarchical path > queries continue to work as before. > - main does not include this fix. Ramification: hierarchical search can > break (phrase-style matching fails), whereas it works here. > > > 4) PreAnalyzedField policy > > - This repo retains and adapts PreAnalyzedField and > PreAnalyzedUpdateProcessor to Lucene 10 (createFields + custom Field > subclass). > - main removes these features. Ramification: users can no longer use > pre‑analyzed input on main; it continue to work in my branch. As the original author of this field type I was sad to see it go … but it seemed that very few people used it anyway. --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
