masaori335 opened a new pull request, #12367: URL: https://github.com/apache/trafficserver/pull/12367
We faced a crash with Http2Stream is in the Half-Closed (local) state. It's highly possible that is introduced by recent HTTP/2 code change https://github.com/apache/trafficserver/commit/c3a40a79111818eb7be3a5805b2f7ebdb0fd59f2. ``` #0 std::__1::__cxx_atomic_fetch_add[abi:v160006]<int>(std::__1::__cxx_atomic_base_impl<int>*, int, std::__1::memory_order) (__a=0xed08c1e923056031, __delta=1, __order=std::__1::memory_order::seq_cst) at /bin/../include/c++/v1/atomic:1014 #1 std::__1::__atomic_base<int, true>::fetch_add[abi:v160006](int, std::__1::memory_order) (this=0xed08c1e923056031, __op=1, __m=std::__1::memory_order::seq_cst) at /bin/../include/c++/v1/atomic:1649 #2 std::__1::__atomic_base<int, true>::operator++[abi:v160006]() (this=0xed08c1e923056031) at /bin/../include/c++/v1/atomic:1686 #3 RefCountObj::refcount_inc (this=0xed08c1e923056029) at /include/tscore/Ptr.h:58 #4 Ptr<ProxyMutex>::Ptr (this=0x7feadc7fb380, src=...) at /include/tscore/Ptr.h:204 #5 MutexTryLock::MutexTryLock (this=0x7feadc7fb380, am=..., t=0x7feae81c5400) at /include/iocore/eventsystem/Lock.h:590 #6 Http2Stream::signal_read_event (this=0x7fd6c15fd200, event=100) at /src/proxy/http2/Http2Stream.cc:908 #7 0x00005627de0da5f9 in Http2ConnectionState::rcv_data_frame (this=0x7f9686b15f20, frame=...) at /src/proxy/http2/Http2ConnectionState.cc:271 #8 0x00005627de0e34d7 in Http2ConnectionState::rcv_frame (this=0x7f9686b15f20, frame=0x7feadc7fb4d8) at /src/proxy/http2/Http2ConnectionState.cc:1470 #9 0x00005627de0d808e in Http2CommonSession::do_complete_frame_read (this=0x7f9686b15f18) at /src/proxy/http2/Http2CommonSession.cc:338 #10 0x00005627de0d7381 in Http2CommonSession::do_process_frame_read (this=0x7f9686b15f18, vio=0x7fe15da833e0, inside_frame=<optimized out>) at /src/proxy/http2/Http2CommonSession.cc:400 #11 0x00005627de0d70e3 in Http2CommonSession::state_start_frame_read (this=0x7f9686b15f18, event=100, edata=0x7fe15da833e0) at /src/proxy/http2/Http2CommonSession.cc:249 #12 0x00005627de0f1a29 in Http2ClientSession::main_event_handler (this=0x7f9686b15c00, event=2259, edata=0x7fea1aa2d2c0) at /src/proxy/http2/Http2ClientSession.cc:192 #13 0x00005627de2e9d75 in Continuation::handleEvent (this=<optimized out>, event=2259, data=0x7fea1aa2d2c0) at /include/iocore/eventsystem/Continuation.h:236 #14 EThread::process_event (this=0x7feae81c5400, e=0x7fea1aa2d2c0, calling_code=2259) at /src/iocore/eventsystem/UnixEThread.cc:162 #15 0x00005627de2ea66d in EThread::execute_regular (this=0x7feae81c5400) at /src/iocore/eventsystem/UnixEThread.cc:269 #16 0x00005627de2eae34 in EThread::execute (this=0x7feae81c5400) at /src/iocore/eventsystem/UnixEThread.cc:348 #17 0x00005627de2e8ca7 in spawn_thread_internal (a=0x7feaee0d5f50) at /src/iocore/eventsystem/Thread.cc:75 #18 0x00007feaee48a19a in pthread_getattr_np@GLIBC_2.2.5 () from /lib64/libc.so.6 #19 0x00007feaee50f210 in msgctl@@GLIBC_2.2.5 () from /lib64/libc.so.6 #20 0x0000000000000000 in ?? () ``` # Problem It looks like this happens when a Http2Stream is in the half-closed (local) state and `Http2Stream::transaction_done()` is invoked. There is a race of setting a mutex `nullptr` vs checking it. On frame 6, a). `read_vio.cont->mutex.m_ptr` is invalid and b). `_sm` is `nullptr`. (Note: `read_vio.cont` is `HttpSM *` here) ``` (gdb) frame 6 #6 Http2Stream::signal_read_event (this=0x7fd6c15fd200, event=100) at /src/proxy/http2/Http2Stream.cc:908 908 MUTEX_TRY_LOCK(lock, read_vio.cont->mutex, this_ethread()); (gdb) p read_vio.cont->mutex $1 = {m_ptr = 0xed08c1e923056029} (gdb) p this->_sm $2 = (HttpSM *) 0x0 ``` In the beginning of `Http2Stream::signal_read_event`, there is a nullptr check of `this->read_vio.cont->mutex` but not for `_sm`. https://github.com/apache/trafficserver/blob/27478a7ffea60dfc1d7ad27bf1cb4fd2413f0943/src/proxy/http2/Http2Stream.cc#L901-L907 For the nullptr check of `this->read_vio.cont->mutex`, this doesn't work in some cases, because it does race with `HttpSM::cleanup` freeing it and setting it `nullptr`. https://github.com/apache/trafficserver/blob/27478a7ffea60dfc1d7ad27bf1cb4fd2413f0943/include/tscore/Ptr.h#L256-L266 # Fix This PR does below 1. Disable read_vio to avoid calling `Http2Stream::signal_read_event` after `Http2Stream::transaction_done` is called 2. Add nullptr check of `_sm` in `Http2Stream::signal_read_event` for safety Same for `write_vio` and `signal_write_event` for consistency. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
