Feature Request: Configurable "drain duration" between the two HTTP/2 GOAWAY
frames for graceful shutdown in Http2UpgradeHandler's checkPauseState()

Currently, Apache Tomcat's HTTP/2 implementation follows the recommendation
of RFC 9113 to allow at least (and in Tomcat's case "exactly") one
round-trip time (RTT) between sending the first GOAWAY with the maximum
stream id and the final GOAWAY with the last received stream id.
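For reference, the two frames differ only in their last-stream-id field. The
sketch below encodes a GOAWAY frame following the layout in RFC 9113
(Section 4.1 frame header, Section 6.8 GOAWAY payload). It is not Tomcat's
code; the class and method names (GoAwayFrames, goAway) and the example
stream id 41 are made up for illustration.

```java
import java.nio.ByteBuffer;

public class GoAwayFrames {
    // Frame type of GOAWAY (RFC 9113, Section 6.8).
    public static final byte TYPE_GOAWAY = 0x07;
    // NO_ERROR error code (RFC 9113, Section 7).
    public static final int NO_ERROR = 0x0;
    // Largest stream identifier (2^31 - 1), sent in the first GOAWAY.
    public static final int MAX_STREAM_ID = Integer.MAX_VALUE;

    /** Encodes a GOAWAY frame without additional debug data (hypothetical helper). */
    public static byte[] goAway(int lastStreamId, int errorCode) {
        int payloadLength = 8; // 4 bytes last-stream-id + 4 bytes error code
        ByteBuffer buf = ByteBuffer.allocate(9 + payloadLength); // big-endian by default
        // 24-bit payload length
        buf.put((byte) (payloadLength >>> 16));
        buf.put((byte) (payloadLength >>> 8));
        buf.put((byte) payloadLength);
        buf.put(TYPE_GOAWAY);
        buf.put((byte) 0); // flags: GOAWAY defines none
        buf.putInt(0);     // stream id 0: GOAWAY applies to the whole connection
        buf.putInt(lastStreamId & 0x7FFFFFFF); // reserved bit cleared
        buf.putInt(errorCode);
        return buf.array();
    }

    public static void main(String[] args) {
        // Phase 1: announce the shutdown without refusing any stream yet.
        byte[] first = goAway(MAX_STREAM_ID, NO_ERROR);
        // Phase 2 (after at least one RTT): the last stream id actually received.
        byte[] last = goAway(41, NO_ERROR);
        System.out.println(first.length + " " + last.length);
    }
}
```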

However, this duration may be too short when the client has already
buffered writes/requests for higher stream ids between receiving the first
GOAWAY and the final GOAWAY from Tomcat.

In this case, the client may either generate a stream reset error locally
due to a "refused stream" (REFUSED_STREAM) condition, or it may still send
the new stream, which Tomcat's HTTP/2 implementation will then reject.
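The rejection follows from the GOAWAY semantics in RFC 9113, Section 6.8:
streams with an identifier above the final GOAWAY's last-stream-id were not
processed and may be retried on a new connection. A minimal sketch of that
rule from the client's point of view; GoAwayStreamFate, Fate and fate() are
hypothetical names, and 41 is an arbitrary example id.

```java
public class GoAwayStreamFate {
    /** Hypothetical outcome of a client-initiated stream once the final GOAWAY is known. */
    public enum Fate { MAY_BE_PROCESSED, NOT_PROCESSED_RETRYABLE }

    /**
     * Streams above the GOAWAY's last-stream-id were not processed by the
     * server (RFC 9113, Section 6.8) and can be retried on a new connection.
     */
    public static Fate fate(int streamId, int lastStreamId) {
        // Client-initiated streams use odd, positive identifiers.
        if (streamId <= 0 || (streamId & 1) == 0) {
            throw new IllegalArgumentException("not a client stream id: " + streamId);
        }
        return streamId <= lastStreamId ? Fate.MAY_BE_PROCESSED : Fate.NOT_PROCESSED_RETRYABLE;
    }

    public static void main(String[] args) {
        int finalLastStreamId = 41; // example value from the server's final GOAWAY
        // A stream such as 43, buffered after the first GOAWAY but written after
        // the final one, falls above the limit and is not processed.
        for (int id : new int[] {39, 41, 43, 45}) {
            System.out.println(id + " -> " + fate(id, finalLastStreamId));
        }
    }
}
```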

In our concrete scenario, we are using Tomcat 11 (via Spring Boot 3.5.7)
behind an Envoy proxy, with HTTP/2 over cleartext (h2c, prior knowledge)
configured as the protocol between the two, so that Envoy talks h2c to
Tomcat.

During a graceful shutdown of the Tomcat server, Tomcat sends the two
GOAWAYs, often only a few microseconds apart. The client cannot react to
the final GOAWAY fast enough to stop sending requests on later streams
that it may already have buffered for writing.

This is exactly the situation for which Envoy introduced the
"drain_timeout" property (for its downstream clients):
https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/filters/network/http_connection_manager/v3/http_connection_manager.proto#envoy-v3-api-field-extensions-filters-network-http-connection-manager-v3-httpconnectionmanager-drain-timeout

This provides a "grace" period between sending the first GOAWAY with the
maximum stream id and the final GOAWAY with the last received stream id,
effectively giving the client some leeway to react to the first GOAWAY
and stop creating new streams on that connection.

Especially when the server is under high load, RTTs can fluctuate, and
asynchronous packet processing in the client can be delayed by even a few
nano- or microseconds.

It would be nice to have an additional property in Tomcat's HTTP/2
implementation that exactly mirrors Envoy's drain_timeout. Effectively,
it would need to be added in Http2UpgradeHandler.checkPauseState():

> if (pausedNanoTime + pingManager.getRoundTripTimeNano() + drainTimeout < System.nanoTime()) {

The default should be 0, of course, leaving the current behaviour untouched.
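To make the proposed condition concrete, here is a standalone sketch of the
check as a pure function. It is not Tomcat code: DrainCheck,
maySendFinalGoAway and drainTimeoutNanos are hypothetical names, and the
drain timeout is assumed to be converted to nanoseconds before use. The
sketch compares elapsed time (now - pausedNanoTime) rather than absolute
timestamps, which is the form the System.nanoTime() Javadoc recommends for
robustness against numerical overflow; otherwise it is equivalent to the
quoted expression.

```java
import java.util.concurrent.TimeUnit;

public class DrainCheck {
    /**
     * Returns true once the final GOAWAY (with the last received stream id)
     * may be sent. Hypothetical helper mirroring the proposed condition.
     *
     * @param pausedNanoTime    System.nanoTime() when the first GOAWAY was sent
     * @param roundTripNanos    measured RTT, e.g. from the ping manager
     * @param drainTimeoutNanos proposed extra grace period; 0 keeps today's behaviour
     * @param nowNanos          current System.nanoTime()
     */
    public static boolean maySendFinalGoAway(long pausedNanoTime, long roundTripNanos,
            long drainTimeoutNanos, long nowNanos) {
        // Elapsed-time comparison stays correct even if nanoTime values wrap.
        long elapsed = nowNanos - pausedNanoTime;
        return elapsed > roundTripNanos + drainTimeoutNanos;
    }

    public static void main(String[] args) {
        long paused = 1_000_000L;
        long rtt = TimeUnit.MICROSECONDS.toNanos(50);
        long drain = TimeUnit.MILLISECONDS.toNanos(100); // example 100 ms grace period
        long now = paused + TimeUnit.MICROSECONDS.toNanos(60);
        // Without a drain timeout, the final GOAWAY goes out after one RTT ...
        System.out.println(maySendFinalGoAway(paused, rtt, 0L, now));
        // ... with one, the server keeps waiting for the grace period to pass.
        System.out.println(maySendFinalGoAway(paused, rtt, drain, now));
    }
}
```

With drainTimeoutNanos = 0 the result is identical to the current
pausedNanoTime + RTT check, so the default leaves existing deployments
unaffected.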

Kind regards,

Kai Burjack
