parveen saini created SOLR-18117:
------------------------------------

             Summary: Docs suggestion: semantic and p99 validation checks to consider during major Solr/Lucene upgrades
                 Key: SOLR-18117
                 URL: https://issues.apache.org/jira/browse/SOLR-18117
             Project: Solr
          Issue Type: Improvement
          Components: documentation
            Reporter: parveen saini


h2. *Description*

I’m sharing a small set of upgrade validation checks that may be useful for 
documentation when teams perform major Solr/Lucene version upgrades.

During a Solr 5 to 8 upgrade in a latency-sensitive production system (Solr 
feeding downstream ML ranking), we observed behavior changes that were not 
configuration errors and did not surface in standard regression tests. Queries 
returned valid responses, but ranking behavior and p99 latency shifted in ways 
that only became visible under production-like traffic.

The underlying causes were subtle semantic and execution-path differences 
across versions. Nothing was failing in the traditional sense, which made the 
changes harder to detect without explicit side-by-side comparison.

This is not a bug report, and no code changes are proposed. The goal is simply 
to document validation steps that helped detect behavior changes during a major 
upgrade.
h2. *Validation checks that surfaced issues*

The following checks helped identify behavior differences that were not caught 
by typical configuration review or test suites:

*Rank churn under identical queries*
Compare top-N document identity and ordering across retries, not just score 
thresholds. Relative ordering drift can occur even when average scores appear 
stable.
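
A minimal sketch of such a comparison, assuming you have already collected the ordered document IDs for the same query from two runs (for example, one against the old cluster and one against the new, or two repeats against the same cluster, using fl=id). The helper name rank_churn and its output keys are hypothetical; it reports identity churn (documents entering or leaving the top-N) separately from ordering churn (documents present in both lists but at a different rank).

```python
from typing import Dict, List, Sequence


def rank_churn(baseline: Sequence[str], candidate: Sequence[str], n: int = 10) -> Dict[str, object]:
    """Compare the top-n document IDs returned for the same query by two runs."""
    old_top: List[str] = list(baseline[:n])
    new_top: List[str] = list(candidate[:n])
    old_set, new_set = set(old_top), set(new_top)

    # Identity churn: share of the top-n that both runs agree on.
    denom = max(len(old_top), len(new_top)) or 1
    overlap = len(old_set & new_set) / denom

    # Ordering churn: documents present in both lists but at a different rank.
    old_pos = {doc: i for i, doc in enumerate(old_top)}
    moved = sorted(doc for i, doc in enumerate(new_top) if doc in old_pos and old_pos[doc] != i)

    return {
        "overlap_at_n": overlap,
        "dropped": sorted(old_set - new_set),
        "added": sorted(new_set - old_set),
        "moved": moved,
    }
```

For example, rank_churn(["a", "b", "c", "d"], ["b", "a", "c", "e"], n=4) reports an overlap of 0.75, "d" dropped, "e" added, and "a" and "b" swapped, even though any score-threshold check on these results could still pass.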

*Negative intermediate scores affecting candidate inclusion*
Validate whether function queries or boosts that can produce negative values 
affect document inclusion earlier in the execution path under newer versions.
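
One way to make this check concrete, sketched under the assumption that you can fetch id and score (fl=id,score) for a function-query or boosted request from the old version, and the returned IDs for the same request from the new version. The helper name is hypothetical; it only flags documents that had a negative score before and are missing afterwards, which is a signal to inspect by hand rather than proof of a semantic change.

```python
from typing import Iterable, List, Mapping


def dropped_negative_candidates(
    baseline_scores: Mapping[str, float], candidate_ids: Iterable[str]
) -> List[str]:
    """IDs that scored below zero on the old version and are missing on the new one."""
    present = set(candidate_ids)
    # Only negatively scored baseline documents are of interest for this check.
    return sorted(
        doc for doc, score in baseline_scores.items() if score < 0 and doc not in present
    )
```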

*Query rewrite differences across versions*
Inspect rewritten query forms side-by-side. Internal rewrite behavior changes 
can alter retrieval semantics without producing explicit errors.
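
Solr's debugQuery=true parameter adds a "debug" section to the response that includes the parsed query, which makes a side-by-side rewrite comparison scriptable. The sketch below is a minimal illustration using only the Python standard library; the base URLs, collection name, and helper names are hypothetical, and it assumes the JSON response writer (wt=json).

```python
import json
import urllib.parse
import urllib.request
from typing import Optional, Tuple


def extract_parsed_query(body: dict) -> str:
    """Pull the rewritten query out of a Solr JSON response produced with debugQuery=true."""
    debug = body.get("debug", {})
    # Prefer the toString form; fall back to "parsedquery" if it is absent.
    return debug.get("parsedquery_toString") or debug.get("parsedquery", "")


def fetch_debug_body(base_url: str, collection: str, params: dict) -> dict:
    """Send one query to /select with debugQuery enabled and return the decoded JSON."""
    query = dict(params, debugQuery="true", wt="json")
    url = f"{base_url}/solr/{collection}/select?{urllib.parse.urlencode(query)}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def diff_parsed_queries(old_body: dict, new_body: dict) -> Optional[Tuple[str, str]]:
    """Return (old, new) parsed queries if they differ, else None."""
    old_q, new_q = extract_parsed_query(old_body), extract_parsed_query(new_body)
    return None if old_q == new_q else (old_q, new_q)
```

Usage would look like diff_parsed_queries(fetch_debug_body("http://old-solr:8983", "products", params), fetch_debug_body("http://new-solr:8983", "products", params)) with the same params dict for both. The string form of a query can also change cosmetically between versions, so mismatches are best reviewed by hand rather than treated as regressions automatically.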

*Downstream ML feature stability*
If Solr feeds an ML ranking layer, validate feature distributions and candidate 
set stability before retraining. Retrieval semantic changes may affect model 
behavior even if raw queries appear correct.
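
As an illustration only, the sketch below compares one feature's distribution between old-version and new-version samples using the population stability index (PSI), plus a Jaccard overlap for candidate sets. PSI and Jaccard are one possible choice of drift metrics, not something prescribed here; the function names, bin count, and epsilon floor are assumptions.

```python
import math
from bisect import bisect_right
from typing import Iterable, List, Sequence


def population_stability_index(
    baseline: Sequence[float], candidate: Sequence[float], bins: int = 10, eps: float = 1e-6
) -> float:
    """PSI of one feature between the old-version and new-version samples."""
    ordered = sorted(baseline)
    # Interior bin edges at baseline quantiles; duplicates collapse for discrete features.
    edges = sorted({ordered[len(ordered) * k // bins] for k in range(1, bins)})

    def shares(values: Sequence[float]) -> List[float]:
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[bisect_right(edges, v)] += 1
        # Floor empty bins at eps so the log term stays finite.
        return [max(c / len(values), eps) for c in counts]

    p, q = shares(baseline), shares(candidate)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Overlap of two candidate-ID sets (1.0 means identical)."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if (sa or sb) else 1.0
```

Identical samples give a PSI of 0, and the value grows as the new distribution shifts away from the old one; what counts as "too much" drift depends on how sensitive the downstream model is.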

*p99 latency shifts driven by response construction*
Measure tail latency and CPU cost in field loading and transformer execution 
paths. p99 may regress independently of average latency.
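
A small sketch of why mean and p99 should be checked separately: the helper below (names and the 10% tolerance are hypothetical) flags each regression on its own, so a run whose mean improves but whose tail worsens is still caught. It assumes you have per-request latency samples in milliseconds from the old and new versions.

```python
import statistics
from typing import Dict, Sequence


def tail_summary(latencies_ms: Sequence[float]) -> Dict[str, float]:
    """Mean, p50 and p99 of a latency sample (needs at least two values)."""
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {"mean": statistics.fmean(latencies_ms), "p50": cuts[49], "p99": cuts[98]}


def compare_tails(
    old_ms: Sequence[float], new_ms: Sequence[float], tolerance: float = 0.10
) -> Dict[str, object]:
    """Flag mean and p99 regressions separately, each against a relative tolerance."""
    old, new = tail_summary(old_ms), tail_summary(new_ms)
    return {
        "old": old,
        "new": new,
        "mean_regressed": new["mean"] > old["mean"] * (1 + tolerance),
        "p99_regressed": new["p99"] > old["p99"] * (1 + tolerance),
    }
```

For instance, with old samples of ninety-nine 10 ms requests plus one 20 ms request, and new samples of ninety-seven 9 ms requests plus three 40 ms requests, the mean drops slightly while p99 jumps from about 10 ms to 40 ms, and only p99_regressed is set.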

*Behavior under realistic concurrency*
Reproduce correctness and latency behavior under production-like concurrency 
and payload sizes. Some differences may only appear under sustained load.
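
A minimal load-replay sketch, assuming a hypothetical send_query callable that wraps your HTTP client and returns the top-N document IDs for one query. It replays each query several times across a thread pool, records per-request latency for the tail analysis, and reports queries whose results differed between repeats under load. Concurrency and repeat counts are placeholders to tune toward production traffic.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Tuple

# (query, latency in ms, returned document IDs)
Sample = Tuple[str, float, Tuple[str, ...]]


def run_under_load(
    send_query: Callable[[str], Sequence[str]],
    queries: Sequence[str],
    concurrency: int = 32,
    repeats: int = 5,
) -> List[Sample]:
    """Replay each query `repeats` times across `concurrency` worker threads."""

    def timed(q: str) -> Sample:
        start = time.perf_counter()
        ids = tuple(send_query(q))  # e.g. top-N document IDs for this query
        return q, (time.perf_counter() - start) * 1000.0, ids

    workload = [q for _ in range(repeats) for q in queries]
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(timed, workload))


def unstable_queries(samples: Sequence[Sample]) -> List[str]:
    """Queries whose returned IDs differed between repeats under load."""
    first_seen: Dict[str, Tuple[str, ...]] = {}
    unstable = set()
    for q, _, ids in samples:
        if first_seen.setdefault(q, ids) != ids:
            unstable.add(q)
    return sorted(unstable)
```

Threads are a reasonable fit here because the work is I/O-bound; for very high request rates a dedicated load-testing tool may be more realistic.
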
h2. *Reference*

A more detailed checklist with additional context is available here:

[https://github.com/parveensaini/solr-lucene-migration-correctness.git]

(The checklist is intended as a validation aid, not configuration guidance.)
h2. *Intent*
 * Documentation-oriented only
 * No code changes proposed
 * Not claiming universal applicability
 * Shared in case it helps teams detect subtle behavioral changes during major upgrades

Feedback on whether this would be useful as documentation is welcome.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
