[
https://issues.apache.org/jira/browse/SOLR-18087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Luke Kot-Zaniewski updated SOLR-18087:
--------------------------------------
Description:
There appear to be some severe HTTP/2 regressions since at least 9.8, most notably in the stream handler as well as in index recovery. The impact is at the very least slowness, and in some cases outright response stalling. The stalling appears to be caused by HTTP/2's flow control. The obvious thing these two very different workloads have in common is that they stream large responses, which means, among other things, that they may be more directly impacted by HTTP/2's flow-control mechanism.
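For background on why large multiplexed responses are especially exposed: HTTP/2 gives the receiver two windows, one per stream and one per session that is shared by every stream on the connection, and a sender must stall whenever either is exhausted. A toy model of that accounting (names and numbers are mine, not Jetty's or Solr's; it ignores WINDOW_UPDATE replenishment and frame interleaving):

```java
// Toy model of HTTP/2 receive-window accounting. A sender may transmit at
// most min(stream window, remaining session window) bytes on each stream
// before it must stall and wait for WINDOW_UPDATE frames from the receiver.
final class FlowControlModel {
    // Returns how many of `streams` responses of `bytesPerStream` bytes
    // would stall before completing, given the two initial windows.
    static long stalledStreams(int streams, long streamWindow,
                               long sessionWindow, long bytesPerStream) {
        long sessionRemaining = sessionWindow;
        long stalled = 0;
        for (int i = 0; i < streams; i++) {
            long want = Math.min(bytesPerStream, streamWindow); // per-stream cap
            long sent = Math.min(want, sessionRemaining);       // shared session cap
            sessionRemaining -= sent;
            if (sent < bytesPerStream) stalled++; // blocked awaiting WINDOW_UPDATE
        }
        return stalled;
    }

    public static void main(String[] args) {
        // 8 concurrent 8 MiB responses, 8 MiB stream windows, 16 MiB session
        // window: only two streams fit in the session window, six stall.
        System.out.println(stalledStreams(8, 8L << 20, 16L << 20, 8L << 20));
    }
}
```

The `main` above prints 6: even though every stream window has room, the shared session window is what gates the connection once several large responses multiplex onto it.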
In my testing I have tweaked the following parameters:
# HTTP/1 vs. HTTP/2 - as stated, HTTP/1 seems to be strictly better, as in faster and more stable.
# shards per node - the greater the number of shards per node, the more (large, simultaneous) responses share a single connection during inter-node communication. This has generally resulted in poorer performance.
# maxConcurrentStreams - reducing this to, say, 1 can effectively circumvent multiplexing. Circumventing multiplexing does seem to improve index recovery somewhat (though it is still slower than HTTP/1). On the other hand, this seems antithetical to the point of HTTP/2. It's also interesting that this doesn't help more.
# initialSessionRecvWindow - the amount of buffer the client initially gets for each connection. It is shared by the many responses that share the multiplexed connection.
# initialStreamRecvWindow - the amount of buffer each stream initially gets within a single HTTP/2 session. I've found that when this is too big relative to initialSessionRecvWindow, it can lead to stalling because of flow-control enforcement.
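For anyone trying to reproduce this, the knobs above map onto Jetty's HTTP/2 APIs roughly as follows. This is a hedged sketch against Jetty 10.x (which I believe is what Solr bundles), not a drop-in Solr configuration; the class and method names wrapping the calls are mine:

```java
import org.eclipse.jetty.http2.client.HTTP2Client;
import org.eclipse.jetty.http2.server.HTTP2CServerConnectionFactory;
import org.eclipse.jetty.server.HttpConfiguration;

class Http2TuningSketch {
    static void configure() throws Exception {
        // Client side: the session window is shared by all multiplexed
        // responses on a connection; each stream starts with its own window.
        HTTP2Client http2Client = new HTTP2Client();
        http2Client.setInitialSessionRecvWindow(16 * 1024 * 1024); // per-connection
        http2Client.setInitialStreamRecvWindow(2 * 1024 * 1024);   // per-stream

        // Server side: maxConcurrentStreams bounds how many responses can
        // multiplex onto one connection; 1 effectively disables multiplexing.
        HTTP2CServerConnectionFactory h2c =
                new HTTP2CServerConnectionFactory(new HttpConfiguration());
        h2c.setMaxConcurrentStreams(1);

        http2Client.start();
    }
}
```

Keeping the stream window a small fraction of the session window (as sketched) is one way to avoid the stalling pattern described in point 5, since no single stream can then pin most of the shared session budget.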
I’m attaching summaries of my findings, some of which can be reproduced by
running the appropriate benchmark in this branch:
[https://github.com/kotman12/solr/tree/http2-shenanigans]
I may try reproducing this in a pure Jetty example. I am beginning to think that multiple large responses being streamed simultaneously between the same client and server may hit some kind of edge case in the library or the protocol itself. It may have something to do with how Jetty's InputStreamResponseListener is implemented, although according to the docs it _should_ be compatible with HTTP/2.
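On the InputStreamResponseListener angle: its documented usage pattern hands the response body to the application as a blocking InputStream, so the pace of application reads drives demand, and under HTTP/2 slow reads delay window replenishment. A sketch of that pattern (the URL, timeout, and wrapper method are placeholders of mine; this needs Jetty on the classpath and is untested here):

```java
import java.io.InputStream;
import java.util.concurrent.TimeUnit;

import org.eclipse.jetty.client.HttpClient;
import org.eclipse.jetty.client.api.Response;
import org.eclipse.jetty.client.util.InputStreamResponseListener;

class StreamingResponseSketch {
    static void readStreamed(HttpClient client, String url) throws Exception {
        InputStreamResponseListener listener = new InputStreamResponseListener();
        client.newRequest(url).send(listener);

        // Headers arrive first; the body is then pulled through the stream.
        Response response = listener.get(15, TimeUnit.SECONDS);
        try (InputStream body = listener.getInputStream()) {
            byte[] buf = new byte[8192];
            while (body.read(buf) != -1) {
                // Each read releases buffered content; under HTTP/2 this is
                // what eventually lets the client send WINDOW_UPDATE frames.
            }
        }
    }
}
```

If the listener buffers content faster than the application drains it, the unread bytes count against the stream and session receive windows, which would line up with the stalls observed above.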
Another option would be to
> HTTP/2 Struggles With Streaming Large Responses
> -----------------------------------------------
>
> Key: SOLR-18087
> URL: https://issues.apache.org/jira/browse/SOLR-18087
> Project: Solr
> Issue Type: Bug
> Reporter: Luke Kot-Zaniewski
> Priority: Major
> Labels: pull-request-available
> Attachments: flow-control-stall.log, index-recovery-tests.md,
> stream-benchmark-results.md
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)