shinrich opened a new issue #8079:
URL: https://github.com/apache/trafficserver/issues/8079


   In our environment, we started seeing crashes with this stack.
   
   ```
   #0  0x00002af8ef890387 in __GI_raise (sig=sig@entry=6) at 
../nptl/sysdeps/unix/sysv/linux/raise.c:55
   #1  0x00002af8ef891a78 in __GI_abort () at abort.c:90
   #2  0x00002af8ecdeb63b in ink_abort 
(message_format=message_format@entry=0x2af8ece0dc47 "%s:%d: failed assertion 
`%s`") at ../../../../../../_scm/trafficserver9.1/src/tscore/ink_error.cc:99
   #3  0x00002af8ecde7bb5 in _ink_assert (expression=expression@entry=0x7e4b6a 
"0", file=file@entry=0x7f9c50 
"../../../../../../_scm/trafficserver9.1/proxy/http/HttpSM.cc", 
line=line@entry=1418)
       at ../../../../../../_scm/trafficserver9.1/src/tscore/ink_assert.cc:37
   #4  0x000000000056d96d in 
HttpSM::state_common_wait_for_transform_read(HttpTransformInfo*, int 
(HttpSM::*)(int, void*), int, void*) () at 
../../../../../../_scm/trafficserver9.1/proxy/http/HttpSM.cc:1418
   #5  0x000000000056da74 in 
HttpSM::state_response_wait_for_transform_read(int, void*) () at 
../../../../../../_scm/trafficserver9.1/proxy/http/HttpSM.cc:1329
   #6  0x000000000057e73b in HttpSM::main_handler(int, void*) () at 
../../../../../../_scm/trafficserver9.1/proxy/http/HttpSM.cc:2742
   #7  0x000000000079840e in read_signal_and_update(int, UnixNetVConnection*) 
() at 
../../../../../../_scm/trafficserver9.1/iocore/net/UnixNetVConnection.cc:83
   #8  0x00000000007988bb in UnixNetVConnection::mainEvent(int, Event*) () at 
../../../../../../_scm/trafficserver9.1/iocore/net/UnixNetVConnection.cc:1160
   #9  0x000000000078d47b in InactivityCop::check_inactivity 
(this=0x2af90a8cb000, event=<optimized out>, e=<optimized out>) at 
../../../../../../_scm/trafficserver9.1/iocore/net/UnixNet.cc:100
   #10 0x00000000007da0dd in handleEvent (data=0x2af90a800060, event=2, 
this=0x2af90a8cb000) at 
../../../../../../_scm/trafficserver9.1/iocore/eventsystem/I_Continuation.h:219
   #11 handleEvent (data=0x2af90a800060, event=2, this=0x2af90a8cb000) at 
../../../../../../_scm/trafficserver9.1/iocore/eventsystem/I_Continuation.h:215
   #12 EThread::process_event(Event*, int) () at 
../../../../../../_scm/trafficserver9.1/iocore/eventsystem/UnixEThread.cc:164
   #13 0x00000000007dae04 in EThread::execute_regular 
(this=this@entry=0x2af8f53b0e00) at 
../../../../../../_scm/trafficserver9.1/iocore/eventsystem/UnixEThread.cc:273
   #14 0x00000000007db182 in execute (this=0x2af8f53b0e00) at 
../../../../../../_scm/trafficserver9.1/iocore/eventsystem/UnixEThread.cc:364
   #15 EThread::execute (this=0x2af8f53b0e00) at 
../../../../../../_scm/trafficserver9.1/iocore/eventsystem/UnixEThread.cc:342
   #16 0x00000000007d97ca in spawn_thread_internal (a=0x2af8f08e0d40) at 
../../../../../../_scm/trafficserver9.1/iocore/eventsystem/Thread.cc:92
   #17 0x00002af8eec22ea5 in start_thread (arg=0x2af8fc703700) at 
pthread_create.c:307
   #18 0x00002af8ef9588dd in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:111
   ```
   
   The end of the HttpSM history contains the following
   ```
   {location = {file = 0x7f9c50 
"../../../../../../_scm/trafficserver9.1/proxy/http/HttpSM.cc", 
           func = 0x7fd680 
<HttpSM::setup_server_transfer_to_transform()::__FUNCTION__> 
"setup_server_transfer_to_transform", line = 6666}, event = 34463, reentrancy = 
6}, {location = {
           file = 0x7f9c50 
"../../../../../../_scm/trafficserver9.1/proxy/http/HttpSM.cc", 
           func = 0x7fdaf0 <HttpSM::tunnel_handler_transform_write(int, 
HttpTunnelConsumer*)::__FUNCTION__> "tunnel_handler_transform_write", line = 
3954}, event = 103, reentrancy = 6}, {location = {
           file = 0x7f9c50 
"../../../../../../_scm/trafficserver9.1/proxy/http/HttpSM.cc", func = 0x7fdc80 
<HttpSM::tunnel_handler_server(int, HttpTunnelProducer*)::__FUNCTION__> 
"tunnel_handler_server", 
           line = 3063}, event = 102, reentrancy = 6}, {location = {file = 
0x7f9c50 "../../../../../../_scm/trafficserver9.1/proxy/http/HttpSM.cc", 
           func = 0x7fe000 <HttpSM::state_response_wait_for_transform_read(int, 
void*)::__FUNCTION__> "state_response_wait_for_transform_read", line = 1314}, 
event = 2301, reentrancy = 7}, {location = {
           file = 0x7f9c50 
"../../../../../../_scm/trafficserver9.1/proxy/http/HttpSM.cc", 
           func = 0x7fdfc0 
<HttpSM::state_common_wait_for_transform_read(HttpTransformInfo*, int 
(HttpSM::*)(int, void*), int, void*)::__FUNCTION__> 
"state_common_wait_for_transform_read", line = 1344}, 
         event = 2301, reentrancy = 7}, {location = {file = 0x7f9c50 
"../../../../../../_scm/trafficserver9.1/proxy/http/HttpSM.cc", 
           func = 0x7fe000 <HttpSM::state_response_wait_for_transform_read(int, 
void*)::__FUNCTION__> "state_response_wait_for_transform_read", line = 1314}, 
event = 106, reentrancy = 1}, {location = {
           file = 0x7f9c50 
"../../../../../../_scm/trafficserver9.1/proxy/http/HttpSM.cc", 
           func = 0x7fdfc0 
<HttpSM::state_common_wait_for_transform_read(HttpTransformInfo*, int 
(HttpSM::*)(int, void*), int, void*)::__FUNCTION__> 
"state_common_wait_for_transform_read", line = 1344}, 
         event = 106, reentrancy = 1}, 
   ```
   
   You see that the tunnel was set up.  The consumer got a write complete, the 
producer got a read complete and tunnel done was sent to the final handler.  
Then inactivity timeout (106) is sent to the default handler.  In our 
environment the outbound active timeout is set to 14 minutes.
   
   Digging into the core a bit more, we see that the problem requests were GET 
requests.  The respond body is 0 bytes long (correctly reported by the 
content-length).  In our environment a transform plugin (for most of the cases 
this was txn_box) was installed to adjust the response being sent to the user 
agent.  But from the history, the TRANSFORM_READ_READY is never sent.  Instead 
things timeout and the HttpSM is in a bad state for the particularly handler 
left.  (I will file a separate issue about not handling a 0 length body for a 
write transform).
   
   @SolidWallOfCode was able to reproduce this in a txn_box autest.  We will 
try to get a test together to exercise this scenario using one of the 
transforms in the trafficserver source tree (null or buffererd null).
   
   We were able to get around the assert crash by clearing the active timeout 
when ProxyTransaction::release is called on server_txn.  I think this issue 
arose when PR #7849.  We are running with that change in our 9.1.x branch.  
With that change, the server_txn is not immediately delivered to the 
HttpSessionManager when release() is called.  Rather the actual transfer of 
control is delayed until the HttpSM is deleted.
   
   The change is to apply the keepalive timeout and clear the active timeout 
when release() is called.  The active timeout makes no sense when the server 
session is in the keep alive pool.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to