shinrich opened a new issue #8163:
URL: https://github.com/apache/trafficserver/issues/8163


   While testing HTTP/2 to origin with another ATS box acting as the origin, 
the setting proxy.config.http2.stream_error_rate_threshold is causing problems. 
 The test scenario is below.  I am running the HTTP/2 to origin code on ATS1.  
ATS2 is running 9.0 with proxy.config.http2.stream_error_rate_threshold set to 
the default of 0.1.
   
   ```
   UA -> ATS1 -> ATS2 -> O
   ```
   
   In my tests, I see that one of the origins directed to ATS2 has a large 
increase in ERR_CONN_FAIL once we turn on HTTP/2 to origin.  Looking at 
diags.log on ATS2, I do not see any HTTP/2 error messages for the client IP 
corresponding to ATS1, so we must be hitting the failure case in 
rcv_data_frame in Http2ConnectionState.cc shown below.  The msg argument is 
null, so no message will show up in diags.log.
   
   ```
   return Http2Error(Http2ErrorClass::HTTP2_ERROR_CLASS_STREAM,
                     Http2ErrorCode::HTTP2_ERROR_STREAM_CLOSED, nullptr);
   ```
   In this case, ATS2 has received a data frame on a stream that has already 
been deleted.  This is analogous to sending a TCP reset when receiving a packet 
for a socket that has been closed.  It seems to happen pretty frequently in the 
HTTP/2 inbound case, hence the removal of the message.
   
   In my test case, all the communication to the problem domain is POST 
traffic.  And the final origin (O) is known to have resource problems, so 
sometimes the connection to O will fail due to a timeout.  In this scenario, 
ATS2 will return a 504 header immediately and not read the post body.
   
   With the handling early header fix (PR #7976), ATS1 will process the 
response header and give up on sending the post body.  However, some post body 
frames will be sent to ATS2 before the 504 response header is returned.  It is 
likely that a data frame and the response header will cross on the wire, 
resulting in a data frame arriving at ATS2 after the stream has been closed.
   
   Once enough errors happen, ATS2 will send a GOAWAY frame with no error code 
set.  On receiving it, we can either process the GOAWAY frame by immediately 
closing all active streams on the session (which in my scenarios was 5-20 
streams or more), or stop adding new streams to the session but leave the 
existing streams to finish.  Closing the active streams greatly increased the 
number of ERR_CONN_FAILs.  Leaving the streams greatly increased the number of 
ABORTS and TIMEOUTS.
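
To make the threshold behavior concrete, here is a minimal sketch. It is not the ATS implementation: `StreamErrorTracker`, its fields, and `should_send_goaway` are hypothetical names, and the real code may sample or weight errors differently. The sketch only assumes that the error rate is the fraction of streams on a session that ended with a stream error, and that this rate is compared against proxy.config.http2.stream_error_rate_threshold.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-session counters; not the actual ATS data structure.
struct StreamErrorTracker {
  uint64_t total_streams = 0; // streams opened on this session
  uint64_t stream_errors = 0; // streams that ended with a stream-level error

  // Fraction of streams that hit a stream error (0.0 before any stream exists).
  double error_rate() const {
    if (total_streams == 0) {
      return 0.0;
    }
    return static_cast<double>(stream_errors) / static_cast<double>(total_streams);
  }
};

// Returns true when the session's error rate exceeds the configured threshold,
// i.e. when this sketch would shut the session down with a GOAWAY.
inline bool should_send_goaway(const StreamErrorTracker &tracker, double threshold) {
  assert(threshold >= 0.0 && threshold <= 1.0); // threshold is a ratio
  return tracker.error_rate() > threshold;
}
```

Under these assumptions, a session with 20 streams of which 3 hit a late data frame has an error rate of 0.15. That trips the default threshold of 0.1 but not a threshold of 0.5.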
   
   I plan on putting up two PRs.  One increases the default for 
proxy.config.http2.stream_error_rate_threshold from 0.1 to 0.5 (which is 
what we are running on the ATS1 box; a different group manages the ATS2 box).  
The other excludes the STREAM_CLOSED reset from rcv_data_frame from the 
error count used to calculate the error rate.
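
The second change could look roughly like the sketch below. This is only an illustration of the idea, not the planned patch: `ErrorCounts`, `is_late_data_on_closed_stream`, and `record_stream_error` are hypothetical names, and the enums carry only the two frame types and two error codes needed here (numeric values as defined in RFC 7540).

```cpp
#include <cstdint>

// Subset of HTTP/2 frame types and error codes (values from RFC 7540).
enum class FrameType : uint8_t { Data = 0x0, Headers = 0x1 };
enum class ErrorCode : uint32_t { ProtocolError = 0x1, StreamClosed = 0x5 };

// Hypothetical counters; only `counted` would feed the error rate.
struct ErrorCounts {
  uint64_t counted = 0; // stream errors that count toward the rate
  uint64_t ignored = 0; // late DATA frames on already closed streams
};

// A DATA frame that arrives after the stream was closed is treated as a
// benign race (frames crossing on the wire), not as peer misbehavior.
inline bool is_late_data_on_closed_stream(FrameType frame, ErrorCode code) {
  return frame == FrameType::Data && code == ErrorCode::StreamClosed;
}

// Record a stream error, skipping the late-DATA case for rate purposes.
inline void record_stream_error(ErrorCounts &counts, FrameType frame, ErrorCode code) {
  if (is_late_data_on_closed_stream(frame, code)) {
    ++counts.ignored;
  } else {
    ++counts.counted;
  }
}
```

With this split, the crossing data frames from the POST scenario above would still be answered with a stream reset but would no longer push the session toward a GOAWAY.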

