[jira] [Created] (SOLR-17394) IndexFetcher should inspect HTTP status codes on its requests

Jason Gerlowski (Jira) Tue, 06 Aug 2024 07:08:04 -0700

Jason Gerlowski created SOLR-17394:
--------------------------------------

             Summary: IndexFetcher should inspect HTTP status codes on its 
requests
                 Key: SOLR-17394
                 URL: https://issues.apache.org/jira/browse/SOLR-17394
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: replication (java)
    Affects Versions: 9.6.1, main (10.0)
            Reporter: Jason Gerlowski

Typically, SolrJ will look at the HTTP status code of a response and it will
throw exceptions as appropriate (see
[here|https://github.com/apache/solr/blob/988e9e3c2666784638475f4cd4827bc2fe77c707/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpSolrClientBase.java#L187-L217]).
But it skips this logic if users have elected to parse the response
themselves by use of an "InputStreamResponseParser".

Solr's IndexFetcher uses this "InputStreamResponseParser" so that it can access
the binary index data in the HTTP response. But it doesn't check the status
code of responses as it should.

IndexFetcher will typically notice that the response is unexpected and can
retry and ultimately succeed, but this happens relatively late in the process.
And that delay can be very expensive in many cases. For instance:

When IndexFetcher gets a "filecontent" response, it expects the first few bytes
to indicate the size of the binary response. So it [reads these bytes and
instantiates a byte-array of the indicated
size|https://github.com/apache/solr/blob/988e9e3c2666784638475f4cd4827bc2fe77c707/solr/core/src/java/org/apache/solr/handler/IndexFetcher.java#L1836C1-L1847].
But if IndexFetcher happens to be reading a 404 response, the first few bytes
of the response will be the '*<*', '*h*', '*e*', and '*a*' characters from the
"*<head*>" tag that Solr uses to begin all its HTML errors. This leads to
IndexFetcher allocating a massive > 1GB byte-array! This can cause GC churn in
production and (for me at least) was causing test runs to frequently OOM on
certain machines.

We should have IndexFetcher (and other places that use
InputStreamResponseParser) check the response status code as soon as its
available and handle errors accordingly.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org

[jira] [Created] (SOLR-17394) IndexFetcher should inspect HTTP status codes on its requests

Reply via email to