[ 
https://issues.apache.org/jira/browse/SOLR-18236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18079891#comment-18079891
 ] 

David Smiley commented on SOLR-18236:
-------------------------------------

+I developed this problem statement and plan with Claude and I certainly 
reviewed everything.  Didn't validate every claim but overall makes sense with 
what I've seen.  I don't have conviction of all elements of the plan.  For 
example, I'd prefer the changes here for 10.1 with no need for waiting till 11.+
h2. 1. Proposed Vision

Three exception types, three distinct concepts. Each carries its weight; none 
lies about what it represents.
h3. IOException — local / transport failure

Reserved for genuine I/O at the transport layer that prevents a 
request-response exchange from completing (socket closed mid-read, EOF before 
headers, connection refused).

*Critical rule:* stop wrapping every IOException in SolrServerException. Let it 
propagate. This is the original, correct intent of the +throws IOException+ 
clause.
h3. SolrServerException — client-side orchestration / pre-flight failure

Failures that happen on the client side, _before or instead of_ a successful 
HTTP exchange:
 * No live servers in the LB pool
 * Time/processing limits exceeded inside the LB or async client
 * Retries exhausted in CloudSolrClient
 * Missing default collection (configuration / pre-flight validation)
 * Connection refused (logical "I couldn't reach a server"), URI syntax errors
 * Timeout waiting for headers (request never produced a server response)

Stays a checked {+}Exception{+}. Its semantics narrow: "I couldn't get a 
response from a Solr server."
h3. RemoteSolrException — server replied with an HTTP error

Re-parented under SolrServerException. Becomes {*}checked{*}. Carries: +code()+ 
(HTTP status), {+}getMetadata(){+}, {+}getDetails(){+}, 
{+}getRemoteErrorObject(){+}, {+}shouldSkipRetry(){+}.

Triggered iff the server returned a non-2xx status code. Connection-refused / 
timeout / parse-failure cases move out of this class into SolrServerException 
or IOException.
h3. Resulting SolrClient.request signature
{code:java}
public abstract NamedList<Object> request(SolrRequest<?> req, String coll)
    throws SolrServerException, IOException;
{code}
Unchanged signature. Javadoc now means what it says. A single +catch 
(SolrServerException)+ covers both client-orchestration failures and 
server-replied-with-error.
{noformat}
Throwable
 ├─ Exception
 │   ├─ IOException                         (transport)
 │   └─ SolrServerException                 (client orchestration; checked)
 │       └─ RemoteSolrException             (server returned HTTP error; 
checked)
 └─ RuntimeException
     └─ SolrException                       (shared error w/ HTTP-code shape; 
unchanged)
{noformat}
h3. SolrException — the shared HTTP-shaped error type

SolrException lives in +org.apache.solr.common+ (intentionally shared between 
server and SolrJ). There are ~88 +throw new SolrException+ sites inside the 
SolrJ build:
 * ~70 in +o.a.s.common.*+ shared utilities (SolrParams, Aliases, ClusterState, 
DocRouter, FacetParams, NamedList, Utils, etc.)
 * ~15 in +o.a.s.client.solrj.*+ proper (pre-flight validation, response parser 
failures, CloudSolrClient cluster-state checks)

*Decision: keep SolrException structurally unchanged.* SolrException and 
SolrServerException are parallel hierarchies:
 * *SolrException* — "I'm encountering an error; here's its HTTP-shape." 
Unchecked. Throwable from anywhere — server handlers, shared validators, 
pre-flight checks.
 * *SolrServerException* — "I, as a SolrClient, cannot deliver a response to my 
caller." Checked. Post-{+}request(){+} API contract.

RemoteSolrException bridges both by duplicating SolrException's 
+code/metadata/details+ accessors while subclassing SolrServerException.

*Mandatory: fix the missing javadoc on SolrException* (currently just {+}/** 
*/{+}). Document its role, cross-reference RemoteSolrException and 
SolrServerException, document {+}code(){+}, {+}getMetadata(){+}, 
{+}getDetails(){+}, MDC logging helpers.

*Out of scope (future cleanup):* The ~15 strict-client SolrException throws in 
+o.a.s.client.solrj.*+ (pre-flight → SolrServerException, parser failures → 
IOException) each flip unchecked → checked at caller signatures — separate 
breaking-change pass.
----
h2. 2. Changes
h3. 2a. Type hierarchy

*RemoteSolrException* — re-parent from +extends SolrException+ to {+}extends 
SolrServerException{+}. Loses inherited 
{+}code(){+}/{+}metadata{+}/{+}details{+} accessors; duplicate those fields and 
accessors directly on RemoteSolrException.

*SolrServerException* — add a constructor accepting a cause Throwable plus HTTP 
code (so RemoteSolrException can chain through).
h3. 2b. Stop wrapping IOException in SolrServerException
 * *HttpJettySolrClient.java:495-514* — drop the IOException → 
SolrServerException translation; let IOExceptions propagate. ConnectException: 
one explicit translation kept. Timeout: stays SolrServerException.
 * *HttpJdkSolrClient.java:182-260* — same audit. RuntimeException wrap at line 
199: keep (genuinely surprising). URISyntaxException at line 259: keep as 
SolrServerException (pre-flight).
 * *HttpSolrClientBase.processErrorsAndResponse:299-300* — IOException reading 
response body should propagate as IOException, not be re-thrown as 
RemoteSolrException.
 * *ConcurrentUpdateBaseSolrClient.java* — similar audit.

h3. 2c. Server-error path becomes checked

All +throw new RemoteSolrException(...)+ sites in HttpSolrClientBase and 
ConcurrentUpdateBaseSolrClient now throw a checked exception. 
+processErrorsAndResponse+ already declares {+}throws SolrServerException{+}, 
which now covers RemoteSolrException. No signature changes propagate further up 
the call chain.
h3. 2d. Javadoc rewrite

SolrClient.java — update every method's javadoc:
{noformat}
@throws IOException low-level transport failure (socket, parse)
@throws SolrServerException client-side failure (no live server, timeout, 
retries exhausted),
        OR server returned an HTTP error (a RemoteSolrException); inspect the 
type
        and call code() to distinguish.
{noformat}
Update class javadocs on SolrException, SolrServerException, 
RemoteSolrException.
h3. 2e. Files to modify
 # +solr/solrj/src/java/org/apache/solr/client/solrj/RemoteSolrException.java+ 
— re-parent, duplicate fields
 # +solr/solrj/src/java/org/apache/solr/client/solrj/SolrServerException.java+ 
— add constructors, javadoc
 # +solr/solrj/src/java/org/apache/solr/common/SolrException.java+ — write the 
missing class javadoc
 # +solr/solrj/src/java/org/apache/solr/client/solrj/SolrClient.java+ — javadoc 
on every throws-bearing method
 # 
+solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpSolrClientBase.java+ 
— IOException pass-through in processErrorsAndResponse
 # 
+solr/solrj-jetty/src/java/org/apache/solr/client/solrj/jetty/HttpJettySolrClient.java+
 — drop IOException wrapping in request
 # 
+solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpJdkSolrClient.java+ 
— same audit
 # 
+solr/solrj/src/java/org/apache/solr/client/solrj/impl/ConcurrentUpdateBaseSolrClient.java+
 — same
 # +solr/CHANGES.txt+ — migration entry

----
h2. 3. Backwards-Compatibility Concerns
h3. 3a. Source-compat for callers — mostly safe
||Pattern||Sites||Result after change||
|+catch (SolrServerException \| IOException e)+|52 in tree + unknown 
third-party|*No change* — RSE caught by SolrServerException arm|
|+catch (SolrServerException e)+ alone|11 in solrj|Now also catches RSE. 
Improvement: callers were discarding HTTP-code anyway.|
|+catch (RemoteSolrException e)+|45 in solrj|*No change* — still compiles, 
behavior unchanged|
|+catch (SolrException e)+ expecting to catch RSE|unknown|*Breaks* — RSE no 
longer extends SolrException|
h3. 3b. Source-compat for propagators — the actual breakage

Code that propagates RemoteSolrException through a method *without* a +throws 
SolrServerException+ declaration will fail to compile:
 * +Runnable.run()+ bodies issuing SolrJ calls
 * Lambda callbacks, async handlers, executor tasks
 * Methods with no +throws+ clause currently letting RSE escape unannounced

These sites must add a +throws+ clause, wrap, or catch. Audit pass needed 
across the codebase. Third-party callers need a migration note.
h3. 3c. Wire compatibility

The wire payload between server and SolrJ client is {*}unchanged{*}. 
RemoteSolrException is constructed locally from parsed response bytes. 
SolrException is unchanged — server-side error reporting (ResponseUtils, 
BinaryResponseWriter) is untouched.
h3. 3d. Migration plan — two majors

*Solr 10.x (non-breaking groundwork):*
 # Add new constructors on SolrServerException needed by RemoteSolrException.
 # Rewrite javadocs on SolrClient, RemoteSolrException, SolrServerException, 
SolrException.
 # Stop wrapping IOException in SolrServerException in the four impl files. 
(Observable but defensible; CHANGES.txt entry required.)
 # Internal audit: find all sites in solr/ that propagate RemoteSolrException 
without a +throws+ clause. Add explicit clauses or catch blocks.

*Solr 11.0 (structural break): _{color:#ffab00}or we just do this in 
10.x{color}_*
 # Re-parent {+}RemoteSolrException extends SolrServerException{+}. Duplicate 
code/metadata/details fields directly on RemoteSolrException.
 # CHANGES.txt + upgrade-notes entry: "RemoteSolrException is now checked. Code 
that previously relied on it being a RuntimeException to propagate without a 
+throws+ clause will no longer compile. Add +throws SolrServerException+ or 
catch."

h3. 3e. Risks not worth taking
 * *Don't* make SolrException extend SolrServerException — flips all of 
SolrException from unchecked to checked. Catastrophic for server-side handlers 
and +o.a.s.common+ validators.
 * *Don't* make RemoteSolrException extend IOException — calling a 404 an "I/O 
error" is the lie this cleanup is fixing.
 * *Don't* rename anything — stable public API; deprecation cycle for cosmetic 
value is not worth it.

h2. Verification
 * Test: request to a server returning 500 — assert RemoteSolrException is 
caught by +catch (SolrServerException e)+ and +e.code() == 500+
 * Test: request to an unreachable port — assert raw IOException propagates 
without SolrServerException wrapping
 * Audit: +grep -r "catch (SolrException" solr/+ — find sites that relied on 
RSE being a SolrException
 * Audit: search lambdas/Runnables propagating RSE implicitly

> SolrJ exception hiearchy needs work
> -----------------------------------
>
>                 Key: SOLR-18236
>                 URL: https://issues.apache.org/jira/browse/SOLR-18236
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrJ
>            Reporter: David Smiley
>            Priority: Major
>
> +SolrClient.request()+ declares +throws SolrServerException, IOException+ 
> with javadoc claiming:
>  * *IOException* — "communication error with the server"
>  * *SolrServerException* — "error on the server"
> Reality is the inverse on both axes:
> ||Scenario||Actual exception||Checked?||
> |Server returns non-2xx|RemoteSolrException (extends SolrException → 
> RuntimeException)|*unchecked*|
> |Response body read fails (true I/O error)|RemoteSolrException|*unchecked*|
> |Server timeout / connection refused|SolrServerException (wraps the 
> IOException)|checked|
> |No live servers / retries exhausted|SolrServerException|checked|
> |Raw IOException propagated as-is|almost never|—|
> h3. Contradictions
>  # "Error on the server" (4xx/5xx reply) surfaces as *unchecked* 
> RemoteSolrException — the javadoc says it should be SolrServerException. 
> Compiler cannot help callers handle the most common failure mode.
>  # "Communication error" rarely surfaces as IOException. Every concrete 
> SolrClient wraps IOException (ConnectException, socket timeout, etc.) into 
> SolrServerException. The +throws IOException+ clause is essentially dead.
>  # HttpSolrClientBase.processErrorsAndResponse catches a real I/O error 
> reading the response body and re-throws it as RemoteSolrException — calling a 
> transport failure a "remote error."
>  # 52 call sites just do +catch (SolrServerException | IOException e)+ and 
> treat them identically — the split has no observable value at most call sites.
>  # Yet 45 sites specifically catch RemoteSolrException to inspect +e.code()+ 
> — the HTTP-status-bearing exception is the genuinely useful concept, and it 
> is the one the type system hides.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to