[ https://issues.apache.org/jira/browse/SOLR-17394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated SOLR-17394:
----------------------------------
    Labels: pull-request-available  (was: )

> IndexFetcher should inspect HTTP status codes on its requests
> -------------------------------------------------------------
>
>                 Key: SOLR-17394
>                 URL: https://issues.apache.org/jira/browse/SOLR-17394
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>          Components: replication (java)
>    Affects Versions: main (10.0), 9.6.1
>            Reporter: Jason Gerlowski
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Typically, SolrJ will look at the HTTP status code of a response and it will
> throw exceptions as appropriate (see
> [here|https://github.com/apache/solr/blob/988e9e3c2666784638475f4cd4827bc2fe77c707/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpSolrClientBase.java#L187-L217]).
> But it skips this logic if users have elected to parse the response
> themselves by use of an "InputStreamResponseParser".
> Solr's IndexFetcher uses this "InputStreamResponseParser" so that it can
> access the binary index data in the HTTP response. But it doesn't check the
> status code of responses as it should.
> IndexFetcher will typically notice that the response is unexpected and can
> retry and ultimately succeed, but this happens relatively late in the
> process. And that delay can be very expensive in many cases. For instance:
> When IndexFetcher gets a "filecontent" response, it expects the first few
> bytes to indicate the size of the binary response. So it [reads these bytes
> and instantiates a byte-array of the indicated
> size|https://github.com/apache/solr/blob/988e9e3c2666784638475f4cd4827bc2fe77c707/solr/core/src/java/org/apache/solr/handler/IndexFetcher.java#L1836C1-L1847].
> But if IndexFetcher happens to be reading a 404 response, the first few
> bytes of the response will be the '<', 'h', 'e', and 'a' characters
> from the "<head>" tag that Solr uses to begin all its HTML errors. This
> leads to IndexFetcher allocating a massive (>1GB) byte-array! This can cause
> GC churn in production and (for me at least) was causing test runs to
> frequently OOM on certain machines.
> We should have IndexFetcher (and other places that use
> InputStreamResponseParser) check the response status code as soon as it's
> available and handle errors accordingly.
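
To make the failure mode above concrete, here is a minimal, self-contained sketch. It is not Solr's code: the class and method names (StatusCheckSketch, readDeclaredSize, readDeclaredSizeChecked) are hypothetical, and it assumes the size prefix is a 4-byte big-endian integer, which is consistent with the four characters mentioned in the description. It shows what "size" the start of an HTML error page decodes to, and one possible shape for a guard that looks at the status code before touching the body.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; none of these names exist in Solr.
class StatusCheckSketch {

  // Assumption: the payload starts with a 4-byte big-endian length.
  // DataInputStream.readInt() decodes exactly that.
  static int readDeclaredSize(InputStream in) throws IOException {
    return new DataInputStream(in).readInt();
  }

  // Hypothetical guard: reject any non-200 response before its body is
  // interpreted as binary index data.
  static int readDeclaredSizeChecked(int httpStatus, InputStream body) throws IOException {
    if (httpStatus != 200) {
      throw new IOException(
          "Unexpected HTTP status " + httpStatus + "; not reading body as index data");
    }
    return readDeclaredSize(body);
  }

  public static void main(String[] args) throws IOException {
    byte[] html = "<head><title>Error 404</title></head>".getBytes(StandardCharsets.US_ASCII);

    // Without a status check, '<' 'h' 'e' 'a' (0x3C 0x68 0x65 0x61) is read as a size.
    int bogusSize = readDeclaredSize(new ByteArrayInputStream(html));
    System.out.println(bogusSize); // 1013474657, i.e. just over 1 GB

    // With the check, the error surfaces before any buffer is sized from the body.
    try {
      readDeclaredSizeChecked(404, new ByteArrayInputStream(html));
    } catch (IOException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Under these assumptions, the 404 body is rejected before any allocation is attempted. How a real fix would surface the error (exception type, retry behaviour) is left open here.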