Hi All,

I've raised this before during a meetup and on the dev Slack, but I'd like to raise it again after a more thorough review on my part. HTTP/2 seems to struggle with streaming large responses relative to HTTP/1.1. I was hoping the problem would "go away" with the latest versions, but running a very recent Solr main and Jetty 12 I can reproduce the same slowness and occasional stalling that we saw with Solr 9.x/Jetty 9.x.
After observing the same issues, I decided to do a deeper dive into HTTP/2 and Jetty's HTTP/2 API. I found a variety of levers for tuning flow control (one of the major architectural shifts from HTTP/1.1 to HTTP/2), but, TL;DR, none of them reliably improved performance. You can read a more detailed version of the analysis here: https://issues.apache.org/jira/browse/SOLR-18087 The tests I ran can hopefully be reproduced with the benchmarks I added here: https://github.com/apache/solr/pull/4079 Among its attachments, the linked JIRA ticket has one detailing the stream benchmark results, along with the exact JMH command that was run to produce each listed result.

A possible next step would be to reproduce this with a minimal Jetty example, without Solr in the mix. At a high level, we are streaming several large files concurrently over the same HTTP/2 connection, using Jetty's InputStreamResponseListener to expose each response as an InputStream (I've appended a couple of illustrative sketches at the bottom of this message). If we can demonstrate the degradation in a small standalone test, we could share it with the Jetty project to see whether there are optimizations we are missing, or additional flow-control knobs that should be exposed.

My current understanding is that HTTP/2 is a bigger win for smaller request/response traffic on connections that are often idle, where header compression and multiplexing help and flow control is less likely to be the bottleneck. For concurrent bulk streaming, HTTP-layer flow control seems to hurt performance compared with plain TCP, which is famously good at exactly this kind of workload.

One thing that has puzzled me is that no one else seems to be complaining about this :-). It's possible that our setup is unusual, i.e. the problem is exacerbated by multiple shards co-located on a single addressable node. We may also not be fully utilizing our network bandwidth with a single TCP connection, compounding the flow-control overhead (though my testing suggests flow control is a significant contributor in its own right).

I'd appreciate any thoughts the community may have about this issue. I'd also love to hear about your Solr topology (if you are able to share), i.e. how many shards you run in a single process, and whether those shards share a single address from the perspective of other nodes.

Thanks,
Luke
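---

Appendix: two rough sketches, for concreteness. First, the general kind of flow-control tuning I experimented with. This is a minimal sketch against Jetty 12's HTTP2Client, not the exact configuration from the benchmarks; the 16 MiB window size is an arbitrary illustrative value:

    import org.eclipse.jetty.client.HttpClient;
    import org.eclipse.jetty.http2.client.HTTP2Client;
    import org.eclipse.jetty.http2.client.transport.HttpClientTransportOverHTTP2;

    public class FlowControlTuning {
        public static HttpClient newTunedClient() throws Exception {
            HTTP2Client http2 = new HTTP2Client();
            // Raise the per-session and per-stream receive windows above the
            // defaults: larger windows let the server send more data before
            // it must stall waiting for a WINDOW_UPDATE from the client.
            http2.setInitialSessionRecvWindow(16 * 1024 * 1024);
            http2.setInitialStreamRecvWindow(16 * 1024 * 1024);

            HttpClient client = new HttpClient(new HttpClientTransportOverHTTP2(http2));
            client.start();
            return client;
        }
    }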
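Second, the general shape of the streaming pattern described above: several large responses read concurrently, multiplexed over one HTTP/2 connection, each exposed as an InputStream via InputStreamResponseListener. The URL, thread count, and timeouts are placeholders, and it reuses the hypothetical newTunedClient() helper from the previous sketch:

    import java.io.InputStream;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import org.eclipse.jetty.client.HttpClient;
    import org.eclipse.jetty.client.InputStreamResponseListener;
    import org.eclipse.jetty.client.Response;

    public class ConcurrentStreams {
        public static void main(String[] args) throws Exception {
            HttpClient client = FlowControlTuning.newTunedClient();
            ExecutorService pool = Executors.newFixedThreadPool(4);
            for (int i = 0; i < 4; i++) {
                String url = "https://somehost:8983/stream/" + i; // placeholder
                pool.submit(() -> {
                    // Requests to the same authority share one HTTP/2 connection,
                    // so these streams all compete under the same session window.
                    InputStreamResponseListener listener = new InputStreamResponseListener();
                    client.newRequest(url).send(listener);
                    Response response = listener.get(30, TimeUnit.SECONDS);
                    if (response.getStatus() == 200) {
                        try (InputStream in = listener.getInputStream()) {
                            byte[] buf = new byte[8192];
                            while (in.read(buf) != -1) {
                                // drain; real code would process the bytes
                            }
                        }
                    }
                    return null;
                });
            }
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.MINUTES);
            client.stop();
        }
    }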
