[jira] [Updated] (SOLR-17394) IndexFetcher should inspect HTTP status codes on its requests

ASF GitHub Bot (Jira) Tue, 06 Aug 2024 12:53:53 -0700


     [ 
https://issues.apache.org/jira/browse/SOLR-17394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


ASF GitHub Bot updated SOLR-17394:
----------------------------------
    Labels: pull-request-available  (was: )

> IndexFetcher should inspect HTTP status codes on its requests
> -------------------------------------------------------------
>
>                 Key: SOLR-17394
>                 URL: https://issues.apache.org/jira/browse/SOLR-17394
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: replication (java)
>    Affects Versions: main (10.0), 9.6.1
>            Reporter: Jason Gerlowski
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Typically, SolrJ will look at the HTTP status code of a response and it will 
> throw exceptions as appropriate (see 
> [here|https://github.com/apache/solr/blob/988e9e3c2666784638475f4cd4827bc2fe77c707/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpSolrClientBase.java#L187-L217]).
>   But it skips this logic if users have elected to parse the response 
> themselves by use of an "InputStreamResponseParser".
> Solr's IndexFetcher uses this "InputStreamResponseParser" so that it can 
> access the binary index data in the HTTP response.  But it doesn't check the 
> status code of responses as it should.
> IndexFetcher will typically notice that the response is unexpected and can 
> retry and ultimately succeed, but this happens relatively late in the 
> process.  And that delay can be very expensive in many cases.  For instance:
> When IndexFetcher gets a "filecontent" response, it expects the first few 
> bytes to indicate the size of the binary response.  So it [reads these bytes 
> and instantiates a byte-array of the indicated 
> size|https://github.com/apache/solr/blob/988e9e3c2666784638475f4cd4827bc2fe77c707/solr/core/src/java/org/apache/solr/handler/IndexFetcher.java#L1836C1-L1847].
>   But if IndexFetcher happens to be reading a 404 response, the first few 
> bytes of the response will be the '*<*', '*h*', '*e*', and '*a*' characters 
> from the "*<head*>" tag that Solr uses to begin all its HTML errors.  This 
> leads to IndexFetcher allocating a massive > 1GB byte-array!  This can cause 
> GC churn in production and (for me at least) was causing test runs to 
> frequently OOM on certain machines.
> We should have IndexFetcher (and other places that use 
> InputStreamResponseParser) check the response status code as soon as its 
> available and handle errors accordingly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Updated] (SOLR-17394) IndexFetcher should inspect HTTP status codes on its requests

Reply via email to