masaori335 opened a new pull request, #12367:
URL: https://github.com/apache/trafficserver/pull/12367

   We faced a crash with Http2Stream is in the Half-Closed (local) state. It's 
highly possible that is introduced by recent HTTP/2 code change 
https://github.com/apache/trafficserver/commit/c3a40a79111818eb7be3a5805b2f7ebdb0fd59f2.
   
   ```
   #0  
std::__1::__cxx_atomic_fetch_add[abi:v160006]<int>(std::__1::__cxx_atomic_base_impl<int>*,
 int, std::__1::memory_order) (__a=0xed08c1e923056031, __delta=1, 
__order=std::__1::memory_order::seq_cst) at /bin/../include/c++/v1/atomic:1014
   #1  std::__1::__atomic_base<int, true>::fetch_add[abi:v160006](int, 
std::__1::memory_order) (this=0xed08c1e923056031, __op=1, 
__m=std::__1::memory_order::seq_cst) at /bin/../include/c++/v1/atomic:1649
   #2  std::__1::__atomic_base<int, true>::operator++[abi:v160006]() 
(this=0xed08c1e923056031) at /bin/../include/c++/v1/atomic:1686
   #3  RefCountObj::refcount_inc (this=0xed08c1e923056029) at 
/include/tscore/Ptr.h:58
   #4  Ptr<ProxyMutex>::Ptr (this=0x7feadc7fb380, src=...) at 
/include/tscore/Ptr.h:204
   #5  MutexTryLock::MutexTryLock (this=0x7feadc7fb380, am=..., 
t=0x7feae81c5400) at /include/iocore/eventsystem/Lock.h:590
   #6  Http2Stream::signal_read_event (this=0x7fd6c15fd200, event=100) at 
/src/proxy/http2/Http2Stream.cc:908
   #7  0x00005627de0da5f9 in Http2ConnectionState::rcv_data_frame 
(this=0x7f9686b15f20, frame=...) at /src/proxy/http2/Http2ConnectionState.cc:271
   #8  0x00005627de0e34d7 in Http2ConnectionState::rcv_frame 
(this=0x7f9686b15f20, frame=0x7feadc7fb4d8) at 
/src/proxy/http2/Http2ConnectionState.cc:1470
   #9  0x00005627de0d808e in Http2CommonSession::do_complete_frame_read 
(this=0x7f9686b15f18) at /src/proxy/http2/Http2CommonSession.cc:338
   #10 0x00005627de0d7381 in Http2CommonSession::do_process_frame_read 
(this=0x7f9686b15f18, vio=0x7fe15da833e0, inside_frame=<optimized out>) at 
/src/proxy/http2/Http2CommonSession.cc:400
   #11 0x00005627de0d70e3 in Http2CommonSession::state_start_frame_read 
(this=0x7f9686b15f18, event=100, edata=0x7fe15da833e0) at 
/src/proxy/http2/Http2CommonSession.cc:249
   #12 0x00005627de0f1a29 in Http2ClientSession::main_event_handler 
(this=0x7f9686b15c00, event=2259, edata=0x7fea1aa2d2c0) at 
/src/proxy/http2/Http2ClientSession.cc:192
   #13 0x00005627de2e9d75 in Continuation::handleEvent (this=<optimized out>, 
event=2259, data=0x7fea1aa2d2c0) at 
/include/iocore/eventsystem/Continuation.h:236
   #14 EThread::process_event (this=0x7feae81c5400, e=0x7fea1aa2d2c0, 
calling_code=2259) at /src/iocore/eventsystem/UnixEThread.cc:162
   #15 0x00005627de2ea66d in EThread::execute_regular (this=0x7feae81c5400) at 
/src/iocore/eventsystem/UnixEThread.cc:269
   #16 0x00005627de2eae34 in EThread::execute (this=0x7feae81c5400) at 
/src/iocore/eventsystem/UnixEThread.cc:348
   #17 0x00005627de2e8ca7 in spawn_thread_internal (a=0x7feaee0d5f50) at 
/src/iocore/eventsystem/Thread.cc:75
   #18 0x00007feaee48a19a in pthread_getattr_np@GLIBC_2.2.5 () from 
/lib64/libc.so.6
   #19 0x00007feaee50f210 in msgctl@@GLIBC_2.2.5 () from /lib64/libc.so.6
   #20 0x0000000000000000 in ?? ()
   ```
   
   # Problem
   It looks like this happens when a Http2Stream is in the half-closed (local) 
state and `Http2Stream::transaction_done()` is invoked. There is a race of 
setting a mutex `nullptr` vs checking it.
   
   On frame 6, a). `read_vio.cont->mutex.m_ptr` is invalid and b). `_sm` is 
`nullptr`. (Note: `read_vio.cont` is `HttpSM *` here)
   ```
   (gdb) frame 6
   #6  Http2Stream::signal_read_event (this=0x7fd6c15fd200, event=100) at 
/src/proxy/http2/Http2Stream.cc:908
   908    MUTEX_TRY_LOCK(lock, read_vio.cont->mutex, this_ethread());
   (gdb) p read_vio.cont->mutex
   $1 = {m_ptr = 0xed08c1e923056029}
   (gdb) p this->_sm
   $2 = (HttpSM *) 0x0
   ```
   
   In the beginning of `Http2Stream::signal_read_event`, there is a nullptr 
check of `this->read_vio.cont->mutex` but not for  `_sm`. 
   
https://github.com/apache/trafficserver/blob/27478a7ffea60dfc1d7ad27bf1cb4fd2413f0943/src/proxy/http2/Http2Stream.cc#L901-L907
   
   For the nullptr check of  `this->read_vio.cont->mutex`, this doesn't work in 
some cases, because it does race with `HttpSM::cleanup` freeing it and setting 
it `nullptr`. 
   
https://github.com/apache/trafficserver/blob/27478a7ffea60dfc1d7ad27bf1cb4fd2413f0943/include/tscore/Ptr.h#L256-L266
   
   # Fix 
   This PR does below
   
   1. Disable read_vio to avoid calling `Http2Stream::signal_read_event` after 
`Http2Stream::transaction_done` is called
   2. Add nullptr check of `_sm` in `Http2Stream::signal_read_event` for safety
   
   Same for `write_vio` and `signal_write_event` for consistency.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to