[
https://issues.apache.org/jira/browse/SOLR-18087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated SOLR-18087:
----------------------------------
Labels: pull-request-available (was: )
> HTTP/2 Struggles With Streaming Large Responses
> -----------------------------------------------
>
> Key: SOLR-18087
> URL: https://issues.apache.org/jira/browse/SOLR-18087
> Project: Solr
> Issue Type: Bug
> Reporter: Luke Kot-Zaniewski
> Priority: Major
> Labels: pull-request-available
> Attachments: flow-control-stall.log, index-recovery-tests.md,
> stream-benchmark-results.md
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> There appear to be some severe HTTP/2 regressions since at least 9.8, most
> notably with the stream handler as well as index recovery. The impact is at
> the very least slowness and in some cases outright response stalling. The
> response stalling appears to be caused by HTTP/2's flow control. The obvious
> thing these two very different workloads have in common is that they stream
> large responses. This means, among other things, that they may be more
> directly impacted by HTTP/2's flow control mechanism.
> In my testing I have tweaked the following parameters:
> # HTTP/1 vs HTTP/2 - as stated, HTTP/1 seems to be strictly better, that
> is, faster and more stable.
> # shards per node - the greater the number of shards per node the more
> (large, simultaneous) responses share a single connection during inter-node
> communication. This has generally resulted in poorer performance.
> # maxConcurrentStreams - reducing this to, say, 1 can effectively
> circumvent multiplexing. Circumventing multiplexing does seem to improve
> index recovery somewhat (still slower than HTTP/1). On the other hand, this
> seems antithetical to the point of HTTP/2. It's also interesting this doesn't
> help
> # initialSessionRecvWindow - This is the amount of buffer the client gets
> for each connection. This gets shared by the many responses that share the
> multiplexed connection.
> # initialStreamRecvWindow - This is the amount of buffer each stream gets
> within a single HTTP/2 session. I've found that when this is too big relative
> to initialSessionRecvWindow it can lead to stalling because of flow control
> enforcement.
> I’m attaching summaries of my findings, some of which can be reproduced by
> running the appropriate benchmark in this branch:
> https://github.com/kotman12/solr/tree/http2-shenanigans
> My next step is to solicit some feedback from the community. Absent anything
> else, I may try reproducing this in a pure Jetty example. I am beginning to
> think multiple large responses getting streamed simultaneously between the
> same client and server may be some kind of edge case the library doesn't
> handle well. It may have something to do with how Jetty's
> InputStreamResponseListener is implemented, although according to the docs it
> _should_ be compatible with HTTP/2.
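
The interaction between the per-stream and per-session receive windows
described above can be illustrated with a toy model. The sketch below is not
Jetty's or Solr's implementation, and it does not claim to reproduce the
reported stall. It only shows one way flow control can deadlock under stated
assumptions: the sender round-robins fixed-size chunks across streams, the
application drains one response at a time, and reading bytes immediately
returns credit to both windows (standing in for WINDOW_UPDATE). The function
name `simulate` and all of its parameters are hypothetical.

```python
def simulate(num_streams, session_window, stream_window, response_size, chunk=16):
    """Toy HTTP/2 flow-control model. Returns (stalled, total_bytes_consumed)."""
    session_credit = session_window              # bytes the sender may still put on the connection
    stream_credit = [stream_window] * num_streams
    buffered = [0] * num_streams                 # received but not yet read by the application
    to_send = [response_size] * num_streams      # bytes the sender still holds per response
    consumed = [0] * num_streams
    turn = 0       # sender's round-robin position, kept across rounds
    current = 0    # the one stream the application is draining right now

    while current < num_streams:
        progressed = False

        # Sender: keep rotating over streams until a full rotation sends nothing.
        idle = 0
        while idle < num_streams:
            s = turn
            turn = (turn + 1) % num_streams
            # A chunk is limited by BOTH the stream window and the session window.
            n = min(chunk, to_send[s], stream_credit[s], session_credit)
            if n > 0:
                to_send[s] -= n
                stream_credit[s] -= n
                session_credit -= n
                buffered[s] += n
                idle = 0
                progressed = True
            else:
                idle += 1

        # Application: reads only the current stream; reading releases credit.
        n = buffered[current]
        if n > 0:
            buffered[current] = 0
            consumed[current] += n
            stream_credit[current] += n
            session_credit += n
            progressed = True
        if consumed[current] == response_size:
            current += 1
            progressed = True

        if not progressed:
            # Unread streams hold the whole session window; nobody can move.
            return True, sum(consumed)

    return False, sum(consumed)


# Stream window equal to the session window: unread streams exhaust the session.
print(simulate(3, 64, 64, 200))   # (True, 32)
# Stream windows small enough that all of them fit inside the session window.
print(simulate(3, 64, 16, 200))   # (False, 600)
# A single stream per connection, comparable to maxConcurrentStreams=1.
print(simulate(1, 64, 64, 200))   # (False, 200)
```

In this model a stall becomes possible once the streams that are not being
read can together buffer the entire session window, which is why shrinking
the stream window relative to the session window, or limiting the connection
to one stream, avoids it. Whether the same mechanism explains the behaviour
seen in Solr is an open question.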