Luke Kot-Zaniewski created SOLR-18087:
-----------------------------------------

             Summary: HTTP/2 Struggles With Streaming Large Responses
                 Key: SOLR-18087
                 URL: https://issues.apache.org/jira/browse/SOLR-18087
             Project: Solr
          Issue Type: Bug
            Reporter: Luke Kot-Zaniewski
         Attachments: flow-control-stall.log, index-recovery-tests.md, 
stream-benchmark-results.md

There appear to be some severe HTTP/2 regressions since at least 9.8, most 
notably in the stream handler and in index recovery. The impact is, at the 
very least, slowness and, in some cases, outright response stalling. The 
stalling appears to be caused by HTTP/2's flow control. What these two very 
different workloads have in common is that they stream large responses. This 
means, among other things, that they may be more directly affected by 
HTTP/2's flow control mechanism.

In my testing I have tweaked the following parameters:
 # HTTP/1 vs HTTP/2 - as stated, HTTP/1 seems to be strictly better, i.e. 
faster and more stable.
 # shards per node - the greater the number of shards per node the more (large, 
simultaneous) responses share a single connection during inter-node 
communication. This has generally resulted in poorer performance.
 # maxConcurrentStreams - reducing this to, say, 1 can effectively circumvent 
multiplexing. Circumventing multiplexing does seem to improve index recovery 
somewhat (still slower than HTTP/1). On the other hand, this seems antithetical 
to the point of HTTP/2. It's also interesting this doesn't help 
 # initialSessionRecvWindow - This is the amount of buffer the client gets for 
each connection. It is shared by all of the responses multiplexed over that 
connection.
 # initialStreamRecvWindow - This is the amount of buffer each stream gets 
within a single HTTP/2 session. I've found that when this is too big relative 
to initialSessionRecvWindow it can lead to stalling because of flow control 
enforcement.
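
To illustrate how the last two parameters can interact, here is a toy model of 
HTTP/2 receive windows. It is not Solr or Jetty code, and it rests on 
assumptions that are not in the report above: the application reads one 
response at a time, credit is returned only for bytes the application has 
consumed, and the sender happens to service the stream being read last. The 
function name, its parameters, and the 16 KiB frame size are all hypothetical.

```python
def simulate(session_window, stream_window, num_streams, response_size,
             frame=16_384):
    """Toy HTTP/2 flow-control model (assumptions are listed in the lead-in).

    Returns ("completed" | "stalled", bytes consumed per stream).
    """
    session_credit = session_window
    stream_credit = [stream_window] * num_streams
    buffered = [0] * num_streams            # received but not yet read
    to_send = [response_size] * num_streams
    consumed = [0] * num_streams
    active = 0                              # stream the application is reading

    while active < num_streams:
        sent_any = False
        # Sender: assumed worst case, the stream being read is serviced last.
        order = [i for i in range(num_streams) if i != active] + [active]
        for i in order:
            n = min(frame, to_send[i], stream_credit[i], session_credit)
            if n > 0:
                to_send[i] -= n
                stream_credit[i] -= n
                session_credit -= n
                buffered[i] += n
                sent_any = True

        # Receiver: only the active stream is drained, and only consumed
        # bytes give credit back to the stream and session windows.
        n = buffered[active]
        buffered[active] = 0
        consumed[active] += n
        stream_credit[active] += n
        session_credit += n

        if consumed[active] == response_size:
            active += 1
        elif not sent_any and n == 0:
            # Unread streams hold the whole session window, so the stream
            # the application is waiting on can no longer receive data.
            return "stalled", consumed

    return "completed", consumed


# Stream window as large as the session window: unread streams can pin it all.
print(simulate(1_000_000, 1_000_000, 4, 2_000_000)[0])  # stalled
# Unread streams can pin at most 3 * 200_000 bytes: the session keeps headroom.
print(simulate(1_000_000, 200_000, 4, 2_000_000)[0])    # completed
```

In this model the stall appears once the streams that are not being read can 
together buffer at least a full session window. Real clients and servers 
schedule frames and return credit differently, so this only sketches the 
mechanism and is not a reproduction of the behavior reported here.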


I’m attaching summaries of my findings, some of which can be reproduced by 
running the appropriate benchmark in this branch: 
[https://github.com/kotman12/solr/tree/http2-shenanigans].

My next step is to solicit some feedback from the community. Absent anything 
else, I may try reproducing this in a pure Jetty example. I am beginning to 
think multiple large responses getting streamed simultaneously between the same 
client and server may be some kind of edge case the library doesn't handle 
well. It may have something to do with how Jetty's InputStreamResponseListener 
is implemented, although according to the docs it _should_ be compatible with 
HTTP/2.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
