[ https://issues.apache.org/jira/browse/TS-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Susan Hinrichs updated TS-3522:
-------------------------------
    Attachment: inactivity_crash.diff

After looking at a few more cores, we have a theory about the scenario and a patch, attached here as inactivity_crash.diff.

Here is our scenario. A client connection comes in and a transaction completes successfully. The client has keep-alive set, so the VC and client session are saved, but the read and write structures are not cleared. Another request comes in. A read is set up immediately, but things stall before the write (server/cache response) is set up. There are many POSTs in this workload, so stalling during a longish POST is feasible. When the inactivity timer triggers, the write state for the first transaction is still lying around, pointing to the now-deleted first state machine.

The patch clears the write structure when the VC and client session are saved aside.

> Seg Fault due to inactivity_cop after lost continutation from write_signal_and_update
> -------------------------------------------------------------------------------------
>
>                 Key: TS-3522
>                 URL: https://issues.apache.org/jira/browse/TS-3522
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Network
>            Reporter: Steven Feltner
>            Assignee: Alan M. Carroll
>             Fix For: 6.0.0
>
>         Attachments: inactivity_crash.diff
>
>
> (gdb) bt full
> #0  0x00000000006ec51e in handleEvent (event=105, vc=0x2b1c900461e0) at ../../iocore/eventsystem/I_Continuation.h:146
> No locals.
> #1  write_signal_and_update (event=105, vc=0x2b1c900461e0) at UnixNetVConnection.cc:154
> No locals.
> #2  0x00000000006ec837 in UnixNetVConnection::mainEvent (this=0x2b1c900461e0, event=<value optimized out>, e=<value optimized out>) at UnixNetVConnection.cc:1089
>         wlock = {m = {m_ptr = 0x2b1c90117dd0}, lock_acquired = true}
>         signal_event = 105
>         next_activity_timeout_at = 0
>         t = 0x0
>         hlock = {m = {m_ptr = 0x1430c30}, lock_acquired = true}
>         rlock = {m = {m_ptr = 0x2b1c90117dd0}, lock_acquired = true}
>         signal_timeout = 0x2b1c6b9ddc30
>         reader_cont = 0x0
>         writer_cont = 0x2b1d28051d48
>         signal_timeout_at = 0x2b1c900463f8
> #3  0x00000000006e5061 in handleEvent (this=0x14519d0, event=<value optimized out>, e=0x15792d0) at ../../iocore/eventsystem/I_Continuation.h:146
> No locals.
> #4  InactivityCop::check_inactivity (this=0x14519d0, event=<value optimized out>, e=0x15792d0) at UnixNet.cc:80
>         vc = 0x2b1c900461e0
>         lock = {m = {m_ptr = 0x2b1c90117dd0}, lock_acquired = true}
>         now = 1428965697221995775
>         nh = 0x2b1c695bea30
>         __func__ = "check_inactivity"
> #5  0x000000000070f628 in handleEvent (this=0x2b1c695bb010, e=0x15792d0, calling_code=2) at I_Continuation.h:146
> No locals.
> #6  EThread::process_event (this=0x2b1c695bb010, e=0x15792d0, calling_code=2) at UnixEThread.cc:144
>         c_temp = 0x14519d0
>         lock = {m = {m_ptr = 0x1430c30}, lock_acquired = true}
> #7  0x00000000007101c1 in EThread::execute (this=0x2b1c695bb010) at UnixEThread.cc:223
>         done_one = true
>         e = <value optimized out>
>         NegativeQueue = {<DLL<Event, Event::Link_link>> = {head = 0x1579330}, tail = 0x1579330}
>         next_time = 1428963217761407178
> #8  0x000000000070ea52 in spawn_thread_internal (a=0x144a330) at Thread.cc:88
>         p = 0x144a330
> #9  0x000000383e8079d1 in start_thread () from /lib64/libpthread.so.0
> No symbol table info available.
> #10 0x000000383e0e88fd in clone () from /lib64/libc.so.6
> No symbol table info available.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
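
For readers following the theory in the comment above, here is a rough C++ model of the suspected sequence. It is only a sketch under stated assumptions: `StateMachine`, `IoState`, `Connection`, `park`, and `stale_signal_on_timeout` are hypothetical names invented for illustration, not the Traffic Server classes or the contents of inactivity_crash.diff. A "deleted" state machine is modelled with an `alive` flag so the example stays well defined instead of actually dereferencing freed memory.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; these are NOT the
// Traffic Server classes and do not mirror their real members.
struct StateMachine {
  bool alive = true;     // false models "already deleted"
  int events_seen = 0;
  void handle_event(int /*event*/) { ++events_seen; }
};

struct IoState {
  StateMachine *cont = nullptr; // continuation to signal on timeout
  void clear() { cont = nullptr; }
};

struct Connection {
  IoState read;
  IoState write;

  // Park a keep-alive connection between transactions. With clear_io
  // set, both I/O states are reset, which models the idea of the patch.
  void park(bool clear_io) {
    if (clear_io) {
      read.clear();
      write.clear();
    }
  }

  // Signal one side; returns true if the target was already "deleted".
  static bool notify(IoState &io, int event) {
    if (io.cont == nullptr) {
      return false; // nothing registered, nothing to signal
    }
    if (!io.cont->alive) {
      return true; // real code would touch freed memory here
    }
    io.cont->handle_event(event);
    return false;
  }

  // Inactivity timeout: signal whatever is registered on each side.
  bool inactivity_timeout(int event) {
    bool stale_read = notify(read, event);
    bool stale_write = notify(write, event);
    return stale_read || stale_write;
  }
};

// Replays the scenario; returns true if a stale continuation is hit.
bool stale_signal_on_timeout(bool clear_on_park) {
  Connection vc;
  StateMachine first, second;

  // Transaction 1: read and write are both set up, then it completes.
  vc.read.cont = &first;
  vc.write.cont = &first;
  first.alive = false;    // first state machine goes away
  vc.park(clear_on_park); // keep-alive: connection is saved aside

  // Transaction 2: the read is set up, but the write never is.
  vc.read.cont = &second;

  // 105 is the event value seen in the backtrace.
  return vc.inactivity_timeout(105);
}
```

Calling `stale_signal_on_timeout(false)` models the crash path, where the leftover write state still points at the first state machine. Calling `stale_signal_on_timeout(true)` models the patched behaviour, where only the second transaction's read side is signalled.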