[ 
https://issues.apache.org/jira/browse/TS-3486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-3486:
-------------------------------
    Attachment: ts-3266-complete.diff

ts-3266-complete.diff replaces the other ts-3266*.diff files on this bug and on 
ts-3266.

I ran this overnight with a dev build on a production box.  I switched over to 
a release build this morning and it has been running for 5 hours on that build.

I think the underlying issue is due to shared server sessions moving between 
threads.  If you are opening a new server session or you are doing thread-local 
server session sharing, the IO events for the client session and the server 
session will always be on the same thread, so we can be a bit lax in our 
concurrency control for events on the same state machine.

Once you introduce global server session sharing, the server session network 
events may be handled on different threads than user agent events.  In 
particular, this is problematic when bringing a session out of the server 
session pool.  Thread 1 may be re-activating the server session for a new State 
Machine while Thread 2 is handing a server initiated network event.

In this latest patch, we deal with this potential read condition by holding 
both the pool lock and the state machine lock while canceling the old pool read 
and setting up the new state machine read.  As long as the read.vio is updated 
atomically, then the parallel server network event will be protected by a 
consistent read.vio lock.

Eventually we may want to look at a means to migrate a server session to 
another thread (or reactivate it in another thread), as a means to simplify our 
locking model.  But I think this solution solves the immediate problem.

> Segfault in do_io_write with plugin (??)
> ----------------------------------------
>
>                 Key: TS-3486
>                 URL: https://issues.apache.org/jira/browse/TS-3486
>             Project: Traffic Server
>          Issue Type: Bug
>    Affects Versions: 5.2.0, 5.3.0
>            Reporter: Qiang Li
>            Assignee: Phil Sorber
>              Labels: crash
>             Fix For: sometime
>
>         Attachments: ts-3266-2.diff, ts-3266-complete.diff, 
> ts3486-ptrace.txt.gz
>
>
> {code}
> (gdb) bt
> #0  0x00000000005bdb8b in HttpServerSession::do_io_write (this=<value 
> optimized out>, c=0x2aaadccc4bf0, nbytes=576, buf=0x2aaafc2ffee8, owner=false)
>     at HttpServerSession.cc:104
> #1  0x00000000005acc1d in HttpSM::setup_server_send_request 
> (this=0x2aaadccc4bf0) at HttpSM.cc:5686
> #2  0x00000000005b3f85 in HttpSM::handle_api_return (this=0x2aaadccc4bf0) at 
> HttpSM.cc:1520
> #3  0x00000000005b45f8 in HttpSM::state_api_callout (this=0x2aaadccc4bf0, 
> event=60000, data=0x0) at HttpSM.cc:1455
> #4  0x00000000005b980b in HttpSM::state_api_callback (this=0x2aaadccc4bf0, 
> event=60000, data=0x0) at HttpSM.cc:1275
> #5  0x00000000004d7a1b in TSHttpTxnReenable (txnp=0x2aaadccc4bf0, 
> event=TS_EVENT_HTTP_CONTINUE) at InkAPI.cc:5614
> #6  0x00002ba118441c89 in cachefun (contp=<value optimized out>, event=<value 
> optimized out>, edata=0x2aaadccc4bf0) at main.cpp:1876
> #7  0x00000000005b4466 in HttpSM::state_api_callout (this=0x2aaadccc4bf0, 
> event=<value optimized out>, data=<value optimized out>) at HttpSM.cc:1381
> #8  0x00000000005b627d in HttpSM::do_http_server_open (this=0x2aaadccc4bf0, 
> raw=<value optimized out>) at HttpSM.cc:4639
> #9  0x00000000005baa04 in HttpSM::set_next_state (this=0x2aaadccc4bf0) at 
> HttpSM.cc:7021
> #10 0x00000000005b25a3 in HttpSM::state_cache_open_write 
> (this=0x2aaadccc4bf0, event=1108, data=0x2aab1c3b6800) at HttpSM.cc:2442
> #11 0x00000000005b5b28 in HttpSM::main_handler (this=0x2aaadccc4bf0, 
> event=1108, data=0x2aab1c3b6800) at HttpSM.cc:2554
> #12 0x000000000059338a in handleEvent (this=0x2aaadccc6618, event=<value 
> optimized out>, data=0x2aab1c3b6800) at 
> ../../iocore/eventsystem/I_Continuation.h:145
> #13 HttpCacheSM::state_cache_open_write (this=0x2aaadccc6618, event=<value 
> optimized out>, data=0x2aab1c3b6800) at HttpCacheSM.cc:167
> #14 0x0000000000697223 in handleEvent (this=0x2aab1c3b6800, event=<value 
> optimized out>) at ../../iocore/eventsystem/I_Continuation.h:145
> #15 CacheVC::callcont (this=0x2aab1c3b6800, event=<value optimized out>) at 
> ../../iocore/cache/P_CacheInternal.h:662
> #16 0x0000000000715940 in Cache::open_write (this=<value optimized out>, 
> cont=<value optimized out>, key=0x2ba0ff762d70, info=<value optimized out>, 
>     apin_in_cache=46914401429576, type=CACHE_FRAG_TYPE_HTTP, 
>     hostname=0x2aaadd281078 
> "www.mifangba.comhttpapi.phpwww.mifangba.comhttp://www.mifangba.com/api.php?op=count&id=4&modelid=12";,
>  host_len=16) at CacheWrite.cc:1788
> #17 0x00000000006e5765 in open_write (this=<value optimized out>, 
> cont=0x2aaadccc6618, expected_size=<value optimized out>, url=0x2aaadccc5310, 
>     cluster_cache_local=<value optimized out>, request=<value optimized out>, 
> old_info=0x0, pin_in_cache=0, type=CACHE_FRAG_TYPE_HTTP) at 
> P_CacheInternal.h:1093
> #18 CacheProcessor::open_write (this=<value optimized out>, 
> cont=0x2aaadccc6618, expected_size=<value optimized out>, url=0x2aaadccc5310, 
>     cluster_cache_local=<value optimized out>, request=<value optimized out>, 
> old_info=0x0, pin_in_cache=0, type=CACHE_FRAG_TYPE_HTTP) at Cache.cc:3622
> #19 0x00000000005936f0 in HttpCacheSM::open_write (this=0x2aaadccc6618, 
> url=<value optimized out>, request=<value optimized out>, old_info=<value 
> optimized out>, 
>     pin_in_cache=<value optimized out>, retry=<value optimized out>, 
> allow_multiple=false) at HttpCacheSM.cc:298
> #20 0x00000000005a022e in HttpSM::do_cache_prepare_action 
> (this=0x2aaadccc4bf0, c_sm=0x2aaadccc6618, object_read_info=0x0, retry=true, 
> allow_multiple=false) at HttpSM.cc:4511
> #21 0x00000000005babd9 in do_cache_prepare_write (this=0x2aaadccc4bf0) at 
> HttpSM.cc:4436
> #22 HttpSM::set_next_state (this=0x2aaadccc4bf0) at HttpSM.cc:7098
> #23 0x00000000005b3f5f in HttpSM::handle_api_return (this=0x2aaadccc4bf0) at 
> HttpSM.cc:1517
> #24 0x00000000005b45f8 in HttpSM::state_api_callout (this=0x2aaadccc4bf0, 
> event=0, data=0x0) at HttpSM.cc:1455
> #25 0x00000000005ba712 in HttpSM::set_next_state (this=0x2aaadccc4bf0) at 
> HttpSM.cc:6876
> #26 0x00000000005ba702 in HttpSM::set_next_state (this=0x2aaadccc4bf0) at 
> HttpSM.cc:6919
> #27 0x00000000005b3f5f in HttpSM::handle_api_return (this=0x2aaadccc4bf0) at 
> HttpSM.cc:1517
> #28 0x00000000005b45f8 in HttpSM::state_api_callout (this=0x2aaadccc4bf0, 
> event=60000, data=0x0) at HttpSM.cc:1455
> #29 0x00000000005b980b in HttpSM::state_api_callback (this=0x2aaadccc4bf0, 
> event=60000, data=0x0) at HttpSM.cc:1275
> #30 0x00000000004d7a1b in TSHttpTxnReenable (txnp=0x2aaadccc4bf0, 
> event=TS_EVENT_HTTP_CONTINUE) at InkAPI.cc:5614
> #31 0x00002ba11844215d in service_from_this (contp=<value optimized out>, 
> event=<value optimized out>, edata=0x2aaadccc4bf0) at main.cpp:1498
> #32 cachefun (contp=<value optimized out>, event=<value optimized out>, 
> edata=0x2aaadccc4bf0) at main.cpp:1811
> #33 0x00000000005b4466 in HttpSM::state_api_callout (this=0x2aaadccc4bf0, 
> event=<value optimized out>, data=<value optimized out>) at HttpSM.cc:1381
> #34 0x00000000005b980b in HttpSM::state_api_callback (this=0x2aaadccc4bf0, 
> event=60000, data=0x0) at HttpSM.cc:1275
> #35 0x00000000004d7a1b in TSHttpTxnReenable (txnp=0x2aaadccc4bf0, 
> event=TS_EVENT_HTTP_CONTINUE) at InkAPI.cc:5614
> #36 0x00002ba1070e3275 in refresh_hook (contp=<value optimized out>, 
> event=<value optimized out>, edata=0x2aaadccc4bf0) at main.cpp:533
> #37 0x00000000005b4466 in HttpSM::state_api_callout (this=0x2aaadccc4bf0, 
> event=<value optimized out>, data=<value optimized out>) at HttpSM.cc:1381
> #38 0x00000000005b7a68 in do_api_callout (this=0x2aaadccc4bf0, event=1103, 
> data=0xffffffffffffb050) at HttpSM.cc:442
> #39 setup_cache_lookup_complete_api (this=0x2aaadccc4bf0, event=1103, 
> data=0xffffffffffffb050) at HttpSM.cc:2451
> #40 HttpSM::state_cache_open_read (this=0x2aaadccc4bf0, event=1103, 
> data=0xffffffffffffb050) at HttpSM.cc:2512
> ---Type <return> to continue, or q <return> to quit---
> #41 0x00000000005b5b28 in HttpSM::main_handler (this=0x2aaadccc4bf0, 
> event=1103, data=0xffffffffffffb050) at HttpSM.cc:2554
> #42 0x0000000000593a42 in handleEvent (this=0x2aaadccc6618, event=1103, 
> data=0xffffffffffffb050) at ../../iocore/eventsystem/I_Continuation.h:145
> #43 HttpCacheSM::state_cache_open_read (this=0x2aaadccc6618, event=1103, 
> data=0xffffffffffffb050) at HttpCacheSM.cc:131
> #44 0x0000000000703cae in Cache::open_read (this=<value optimized out>, 
> cont=0x2aaadccc6618, key=<value optimized out>, request=0x2aaadccc52f8, 
> params=0x2aaadccc4cd0, 
>     type=<value optimized out>, hostname=0x2aaadd281078 
> "www.mifangba.comhttpapi.phpwww.mifangba.comhttp://www.mifangba.com/api.php?op=count&id=4&modelid=12";,
>  host_len=16)
>     at CacheRead.cc:149
> #45 0x00000000006e153d in open_read (this=<value optimized out>, 
> cont=0x2aaadccc6618, url=0x2aaadccc5310, cluster_cache_local=<value optimized 
> out>, request=0x2aaadccc52f8, 
>     params=0x2aaadccc4cd0, pin_in_cache=0, type=CACHE_FRAG_TYPE_HTTP) at 
> P_CacheInternal.h:1074
> #46 CacheProcessor::open_read (this=<value optimized out>, 
> cont=0x2aaadccc6618, url=0x2aaadccc5310, cluster_cache_local=<value optimized 
> out>, request=0x2aaadccc52f8, 
>     params=0x2aaadccc4cd0, pin_in_cache=0, type=CACHE_FRAG_TYPE_HTTP) at 
> Cache.cc:3598
> #47 0x00000000005934f4 in do_cache_open_read (this=<value optimized out>, 
> url=<value optimized out>, hdr=<value optimized out>, params=<value optimized 
> out>, 
>     pin_in_cache=<value optimized out>) at HttpCacheSM.cc:211
> #48 HttpCacheSM::open_read (this=<value optimized out>, url=<value optimized 
> out>, hdr=<value optimized out>, params=<value optimized out>, 
> pin_in_cache=<value optimized out>)
>     at HttpCacheSM.cc:243
> #49 0x00000000005a0533 in HttpSM::do_cache_lookup_and_read 
> (this=0x2aaadccc4bf0) at HttpSM.cc:4389
> #50 0x00000000005bac3f in HttpSM::set_next_state (this=0x2aaadccc4bf0) at 
> HttpSM.cc:6986
> #51 0x00000000005b3f5f in HttpSM::handle_api_return (this=0x2aaadccc4bf0) at 
> HttpSM.cc:1517
> #52 0x00000000005b45f8 in HttpSM::state_api_callout (this=0x2aaadccc4bf0, 
> event=0, data=0x0) at HttpSM.cc:1455
> #53 0x00000000005ba712 in HttpSM::set_next_state (this=0x2aaadccc4bf0) at 
> HttpSM.cc:6876
> #54 0x000000000059fa6e in HttpSM::state_remap_request (this=0x2aaadccc4bf0, 
> event=63002) at HttpSM.cc:3880
> #55 0x00000000005b5b28 in HttpSM::main_handler (this=0x2aaadccc4bf0, 
> event=63002, data=0x0) at HttpSM.cc:2554
> #56 0x000000000061b15a in handleEvent (this=0x2aaadcfc8080, event=1, 
> e=0x2aaaf9350d80) at ../../../iocore/eventsystem/I_Continuation.h:145
> #57 RemapPlugins::run_remap (this=0x2aaadcfc8080, event=1, e=0x2aaaf9350d80) 
> at RemapPlugins.cc:179
> #58 0x0000000000765165 in handleEvent (this=0x2ba0ff262010, e=0x2aaaf9350d80, 
> calling_code=1) at I_Continuation.h:145
> #59 EThread::process_event (this=0x2ba0ff262010, e=0x2aaaf9350d80, 
> calling_code=1) at UnixEThread.cc:128
> #60 0x00000000007659fb in EThread::execute (this=0x2ba0ff262010) at 
> UnixEThread.cc:179
> #61 0x00000000007645aa in spawn_thread_internal (a=0x2919020) at Thread.cc:85
> #62 0x00000035966079d1 in start_thread () from /lib64/libpthread.so.0
> #63 0x0000003595ee88fd in clone () from /lib64/libc.so.6
> (gdb) f 0
> #0  0x00000000005bdb8b in HttpServerSession::do_io_write (this=<value 
> optimized out>, c=0x2aaadccc4bf0, nbytes=576, buf=0x2aaafc2ffee8, owner=false)
>     at HttpServerSession.cc:104
> 104     return server_vc->do_io_write(c, nbytes, buf, owner);
> (gdb) l
> 99    }
> 100   
> 101   VIO *
> 102   HttpServerSession::do_io_write(Continuation *c, int64_t nbytes, 
> IOBufferReader *buf, bool owner)
> 103   {
> 104     return server_vc->do_io_write(c, nbytes, buf, owner);
> 105   }
> 106   
> 107   void
> 108   HttpServerSession::do_io_shutdown(ShutdownHowTo_t howto)
> (gdb) p server_vc
> value has been optimized out
> (gdb) f 1
> #1  0x00000000005acc1d in HttpSM::setup_server_send_request 
> (this=0x2aaadccc4bf0) at HttpSM.cc:5686
> 5686    server_entry->write_vio = server_entry->vc->do_io_write(this, 
> hdr_length, buf_start);
> (gdb) l
> 5681                                                 
> MIME_PRESENCE_WWW_AUTHENTICATE)) {
> 5682      DebugSM("http_ss", "Setting server session to private for 
> authorization header");
> 5683      set_server_session_private(true);
> 5684    }
> 5685    milestones.server_begin_write = ink_get_hrtime();
> 5686    server_entry->write_vio = server_entry->vc->do_io_write(this, 
> hdr_length, buf_start);
> 5687  }
> 5688  
> 5689  void
> 5690  HttpSM::setup_server_read_response_header()
> (gdb) p server_entry
> $1 = (HttpVCTableEntry *) 0x0
> (gdb) 
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to