[ 
https://issues.apache.org/jira/browse/TS-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-3522:
-------------------------------
    Attachment: inactivity_crash.diff

After looking at a few more cores, we have a theory on the scenario and a patch 
attached here as inactivity_crash.diff.

Here is our scenario.  A client connection comes in and a transaction 
completes successfully.  The client has keep-alive set, so the VC and client 
session are saved, but the read and write structures are not cleared.  Another 
request comes in.  A read is set up immediately, but things stall before the 
write (server/cache response) is set up.  There are many POSTs in this 
workload, so stalling during a longish POST is feasible.  When the 
inactivity timer triggers, the write state for the first transaction is still 
lying around, pointing to the now-deleted first state machine.

The patch clears the write structure at the point where the VC and client 
session are saved aside for keep-alive.
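
To make the theory concrete, here is a minimal, self-contained sketch of the 
scenario. It is not the Traffic Server code and not the contents of 
inactivity_crash.diff; every name in it (StateMachine, Vio, Connection, 
run_scenario, inactivity_timeout) is a hypothetical stand-in, and an "alive" 
flag stands in for actually deleting the state machine so the example stays 
well defined.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; not the real ATS classes.
struct StateMachine {
  bool alive = true; // false models "this state machine has been deleted"
};

struct Vio {
  StateMachine *cont = nullptr; // continuation to signal for this direction
};

struct Connection {
  Vio read;
  Vio write;
};

enum class TimeoutResult { NoWriter, SignalledLive, SignalledDead };

// Models the inactivity timeout signalling the write side.
// SignalledDead is the case that would crash in a real process.
TimeoutResult inactivity_timeout(const Connection &vc) {
  if (vc.write.cont == nullptr) {
    return TimeoutResult::NoWriter;
  }
  return vc.write.cont->alive ? TimeoutResult::SignalledLive
                              : TimeoutResult::SignalledDead;
}

// Replays the scenario described above, with or without clearing the
// write structure when the connection is saved aside for keep-alive.
TimeoutResult run_scenario(bool clear_write_on_save) {
  StateMachine first, second;
  Connection vc;

  // First transaction: read and write both point at the first state machine.
  vc.read.cont  = &first;
  vc.write.cont = &first;

  // Transaction completes; connection is saved aside for keep-alive.
  if (clear_write_on_save) {
    vc.write.cont = nullptr; // the idea behind the patch, as described
  }
  first.alive = false; // first state machine goes away

  // Second request: the read is set up, but the write is not yet.
  vc.read.cont = &second;
  assert(vc.read.cont != nullptr);

  // Things stall, and the inactivity timer fires.
  return inactivity_timeout(vc);
}
```

In this model, run_scenario(false) ends with the timeout signalling the dead 
first state machine, while run_scenario(true) finds no writer to signal.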


> Seg Fault due to inactivity_cop after lost continuation from 
> write_signal_and_update
> -------------------------------------------------------------------------------------
>
>                 Key: TS-3522
>                 URL: https://issues.apache.org/jira/browse/TS-3522
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Network
>            Reporter: Steven Feltner
>            Assignee: Alan M. Carroll
>             Fix For: 6.0.0
>
>         Attachments: inactivity_crash.diff
>
>
> (gdb) bt full
> #0  0x00000000006ec51e in handleEvent (event=105, vc=0x2b1c900461e0) at 
> ../../iocore/eventsystem/I_Continuation.h:146
> No locals.
> #1  write_signal_and_update (event=105, vc=0x2b1c900461e0) at 
> UnixNetVConnection.cc:154
> No locals.
> #2  0x00000000006ec837 in UnixNetVConnection::mainEvent (this=0x2b1c900461e0, 
> event=<value optimized out>, e=<value optimized out>) at 
> UnixNetVConnection.cc:1089
>         wlock = {m = {m_ptr = 0x2b1c90117dd0}, lock_acquired = true}
>         signal_event = 105
>         next_activity_timeout_at = 0
>         t = 0x0
>         hlock = {m = {m_ptr = 0x1430c30}, lock_acquired = true}
>         rlock = {m = {m_ptr = 0x2b1c90117dd0}, lock_acquired = true}
>         signal_timeout = 0x2b1c6b9ddc30
>         reader_cont = 0x0
>         writer_cont = 0x2b1d28051d48
>         signal_timeout_at = 0x2b1c900463f8
> #3  0x00000000006e5061 in handleEvent (this=0x14519d0, event=<value optimized 
> out>, e=0x15792d0) at ../../iocore/eventsystem/I_Continuation.h:146
> No locals.
> #4  InactivityCop::check_inactivity (this=0x14519d0, event=<value optimized 
> out>, e=0x15792d0) at UnixNet.cc:80
>         vc = 0x2b1c900461e0
>         lock = {m = {m_ptr = 0x2b1c90117dd0}, lock_acquired = true}
>         now = 1428965697221995775
>         nh = 0x2b1c695bea30
>         __func__ = "check_inactivity"
> #5  0x000000000070f628 in handleEvent (this=0x2b1c695bb010, e=0x15792d0, 
> calling_code=2) at I_Continuation.h:146
> No locals.
> #6  EThread::process_event (this=0x2b1c695bb010, e=0x15792d0, calling_code=2) 
> at UnixEThread.cc:144
>         c_temp = 0x14519d0
>         lock = {m = {m_ptr = 0x1430c30}, lock_acquired = true}
> #7  0x00000000007101c1 in EThread::execute (this=0x2b1c695bb010) at 
> UnixEThread.cc:223
>         done_one = true
>         e = <value optimized out>
>         NegativeQueue = {<DLL<Event, Event::Link_link>> = {head = 0x1579330}, 
> tail = 0x1579330}
>         next_time = 1428963217761407178
> #8  0x000000000070ea52 in spawn_thread_internal (a=0x144a330) at Thread.cc:88
>         p = 0x144a330
> #9  0x000000383e8079d1 in start_thread () from /lib64/libpthread.so.0
> No symbol table info available.
> #10 0x000000383e0e88fd in clone () from /lib64/libc.so.6
> No symbol table info available.



