[ 
https://issues.apache.org/jira/browse/TS-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudheer Vinukonda resolved TS-4160.
-----------------------------------
    Resolution: Fixed

> MLocRelease for txn buffers called too late in CPP API resulting buffer 
> corruption
> ----------------------------------------------------------------------------------
>
>                 Key: TS-4160
>                 URL: https://issues.apache.org/jira/browse/TS-4160
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: CPP API
>    Affects Versions: 6.1.0
>            Reporter: Sudheer Vinukonda
>            Assignee: Sudheer Vinukonda
>              Labels: yahoo
>             Fix For: 6.2.0
>
>
> CPP API currently acquires the buffer and heap handles for various kinds of 
> request/response objects (e.g cached_request_hdr_buf, cached_request_hdr_loc, 
> client_response_hdr_buf, client_response_hdr_loc etc) at the corresponding 
> hooks of a transaction, along with the url handles in the CPP API's 
> request/response objects, but keeps them until txn_close, before calling 
> MLocRelease on them finally.
> However, this is buggy, since the core doesn't seem to guarantee the life of 
> these handles until the end of the transaction. In particular, the 
> cached_request_hdr_buf/loc are released at the end of 
> tunnel_handler_cache_read(), which causes it to be reassigned to a different 
> txn and results in a crash when the original txn finally tries to release 
> them triggering the asserts from the MLocRelease C API. Another case is 
> internal redirect follow in the core, which destroys and recreates some of 
> these buffers.
> Note that the C API plugins typically call MLocRelease immediately after 
> acquiring/using them.
> Additionally, the order in which CPP API releases the handles is incorrect. 
> Transaction object contains the Request objects and Transaction object's 
> destructor first calls MLocRelease on the request heap handles, before which 
> the Request object's d'tor is triggered which calls the url handle in that 
> released request handle. This may not be a problem right now, since 
> MLocRelease for HTTP_OBJ and URL_OBJ are basically no-ops, but, the order of 
> release still breaks the implicit contract of the API, which requires the 
> child to be released before releasing the parent (which means, if the API's 
> internal implementation changes tomorrow, that can cause a problem).
> Here are the back traces for the issues mentioned above:
> {code}
> (gdb) bt
> #0  0x0000003c1b232625 in raise (sig=6) at 
> ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> #1  0x0000003c1b233e05 in abort () at abort.c:92
> #2  0x00002b149d517018 in ink_die_die_die (retval=1) at ink_error.cc:43
> #3  0x00002b149d5170e5 in ink_fatal_va(int, const char *, typedef 
> __va_list_tag __va_list_tag *) (return_code=1, message_format=0x2b149d525d78 
> "%s:%d: failed assert `%s`", ap=0x2b14a4806570) at ink_error.cc:65
> #4  0x00002b149d5171ae in ink_fatal (return_code=1, 
> message_format=0x2b149d525d78 "%s:%d: failed assert `%s`") at ink_error.cc:73
> #5  0x00002b149d515d20 in _ink_assert (expression=0x771185 "!\"invalid 
> mloc\"", file=0x770f1d "InkAPI.cc", line=1886) at ink_assert.cc:37
> #6  0x000000000050f809 in TSHandleMLocRelease (bufp=0x2b15e82357f0, 
> parent=0x2b15e40fca98, mloc=0x2b15e40fcd18) at InkAPI.cc:1886
> #7  0x00002b15759066fc in atscppapi::Request::~Request (this=0x2b15e0003318, 
> __in_chrg=<value optimized out>) at Request.cc:179
> #8  0x00002b1575902131 in ~TransactionState (this=0x2b15e0001320, 
> __in_chrg=<value optimized out>) at Transaction.cc:40
> #9  atscppapi::Transaction::~Transaction (this=0x2b15e0001320, 
> __in_chrg=<value optimized out>) at Transaction.cc:111
> #10 0x00002b1575901031 in (anonymous namespace)::handleTransactionEvents 
> (cont=<value optimized out>, event=<value optimized out>, 
> edata=0x2b15e8235460) at utils_internal.cc:87
> #11 0x000000000050d7f0 in INKContInternal::handle_event (this=0x20399b0, 
> event=60012, edata=0x2b15e8235460) at InkAPI.cc:1000
> #12 0x00000000004f8dee in Continuation::handleEvent (this=0x20399b0, 
> event=60012, data=0x2b15e8235460) at 
> ../iocore/eventsystem/I_Continuation.h:146
> #13 0x000000000050e037 in APIHook::invoke (this=0x202b170, event=60012, 
> edata=0x2b15e8235460) at InkAPI.cc:1219
> #14 0x00000000005cfb5d in HttpSM::state_api_callout (this=0x2b15e8235460, 
> event=0, data=0x0) at HttpSM.cc:1384
> #15 0x00000000005dba75 in HttpSM::do_api_callout_internal 
> (this=0x2b15e8235460) at HttpSM.cc:4896
> #16 0x00000000005e8b2a in HttpSM::do_api_callout (this=0x2b15e8235460) at 
> HttpSM.cc:437
> #17 0x00000000005e0ed0 in HttpSM::kill_this (this=0x2b15e8235460) at 
> HttpSM.cc:6615
> #18 0x00000000005d358f in HttpSM::main_handler (this=0x2b15e8235460, 
> event=2301, data=0x2b15e82360a0) at HttpSM.cc:2556
> #19 0x00000000004f8dee in Continuation::handleEvent (this=0x2b15e8235460, 
> event=2301, data=0x2b15e82360a0) at ../iocore/eventsystem/I_Continuation.h:146
> #20 0x000000000061cebf in HttpTunnel::main_handler (this=0x2b15e82360a0, 
> event=103, data=0x2b15fc01e2e8) at HttpTunnel.cc:1537
> #21 0x00000000004f8dee in Continuation::handleEvent (this=0x2b15e82360a0, 
> event=103, data=0x2b15fc01e2e8) at ../iocore/eventsystem/I_Continuation.h:146
> #22 0x000000000070eafa in CacheVC::calluser (this=0x2b15fc01e0a0, event=103) 
> at P_CacheInternal.h:642
> #23 0x00000000007174fb in CacheVC::openWriteMain (this=0x2b15fc01e0a0) at 
> CacheWrite.cc:1367
> #24 0x00000000004f8dee in Continuation::handleEvent (this=0x2b15fc01e0a0, 
> event=1, data=0x2b15e4041090) at ../iocore/eventsystem/I_Continuation.h:146
> #25 0x0000000000760bee in EThread::process_event (this=0x2b149ef4a010, 
> e=0x2b15e4041090, calling_code=1) at UnixEThread.cc:145
> #26 0x0000000000760dbc in EThread::execute (this=0x2b149ef4a010) at 
> UnixEThread.cc:196
> #27 0x0000000000760160 in spawn_thread_internal (a=0x1b4c790) at Thread.cc:88
> #28 0x00002b149dded9d1 in start_thread (arg=0x0) at pthread_create.c:301
> #29 0x0000003c1b2e88fd in clone () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
> {code}
> {code}
> (gdb) bt
> #0  0x0000003c1b232625 in raise (sig=6) at 
> ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> #1  0x0000003c1b233e05 in abort () at abort.c:92
> #2  0x00002b0af272e018 in ink_die_die_die (retval=1) at ink_error.cc:43
> #3  0x00002b0af272e0e5 in ink_fatal_va(int, const char *, typedef 
> __va_list_tag __va_list_tag *) (return_code=1, message_format=0x2b0af273cd78 
> "%s:%d: failed assert `%s`", ap=0x2b0af9d1b420) at ink_error.cc:65
> #4  0x00002b0af272e1ae in ink_fatal (return_code=1, 
> message_format=0x2b0af273cd78 "%s:%d: failed assert `%s`") at ink_error.cc:73
> #5  0x00002b0af272cd20 in _ink_assert (expression=0x771158 
> "sdk_sanity_check_mbuffer(bufp) == TS_SUCCESS", file=0x770f1d "InkAPI.cc", 
> line=1869) at ink_assert.cc:37
> #6  0x000000000050c948 in _TSReleaseAssert (text=0x771158 
> "sdk_sanity_check_mbuffer(bufp) == TS_SUCCESS", file=0x770f1d "InkAPI.cc", 
> line=1869) at InkAPI.cc:405
> #7  0x000000000050f78e in TSHandleMLocRelease (bufp=0x2b0c4400d068, 
> parent=0x0, mloc=0x2b0c942811c8) at InkAPI.cc:1869
> #8  0x00002b0bc98bf091 in atscppapi::Transaction::~Transaction 
> (this=0x2b0c64059580, __in_chrg=<value optimized out>) at Transaction.cc:105
> #9  0x00002b0bc98be031 in (anonymous namespace)::handleTransactionEvents 
> (cont=<value optimized out>, event=<value optimized out>, 
> edata=0x2b0c401c7480) at utils_internal.cc:87
> #10 0x000000000050d7f0 in INKContInternal::handle_event (this=0x1ba23c0, 
> event=60012, edata=0x2b0c401c7480) at InkAPI.cc:1000
> #11 0x00000000004f8dee in Continuation::handleEvent (this=0x1ba23c0, 
> event=60012, data=0x2b0c401c7480) at 
> ../iocore/eventsystem/I_Continuation.h:146
> #12 0x000000000050e037 in APIHook::invoke (this=0x1b93b80, event=60012, 
> edata=0x2b0c401c7480) at InkAPI.cc:1219
> #13 0x00000000005cfb5d in HttpSM::state_api_callout (this=0x2b0c401c7480, 
> event=0, data=0x0) at HttpSM.cc:1384
> #14 0x00000000005dba75 in HttpSM::do_api_callout_internal 
> (this=0x2b0c401c7480) at HttpSM.cc:4896
> #15 0x00000000005e8b2a in HttpSM::do_api_callout (this=0x2b0c401c7480) at 
> HttpSM.cc:437
> #16 0x00000000005e0ed0 in HttpSM::kill_this (this=0x2b0c401c7480) at 
> HttpSM.cc:6615
> #17 0x00000000005d358f in HttpSM::main_handler (this=0x2b0c401c7480, 
> event=2301, data=0x2b0c401c80c0) at HttpSM.cc:2556
> #18 0x00000000004f8dee in Continuation::handleEvent (this=0x2b0c401c7480, 
> event=2301, data=0x2b0c401c80c0) at ../iocore/eventsystem/I_Continuation.h:146
> #19 0x000000000061cebf in HttpTunnel::main_handler (this=0x2b0c401c80c0, 
> event=103, data=0x2b0c2800b898) at HttpTunnel.cc:1537
> #20 0x00000000004f8dee in Continuation::handleEvent (this=0x2b0c401c80c0, 
> event=103, data=0x2b0c2800b898) at ../iocore/eventsystem/I_Continuation.h:146
> #21 0x000000000073f958 in write_signal_and_update (event=103, 
> vc=0x2b0c2800b720) at UnixNetVConnection.cc:174
> #22 0x000000000073fb3e in write_signal_done (event=103, nh=0x2b0af8508cb0, 
> vc=0x2b0c2800b720) at UnixNetVConnection.cc:216
> #23 0x0000000000740d12 in write_to_net_io (nh=0x2b0af8508cb0, 
> vc=0x2b0c2800b720, thread=0x2b0af8505010) at UnixNetVConnection.cc:559
> #24 0x0000000000740579 in write_to_net (nh=0x2b0af8508cb0, vc=0x2b0c2800b720, 
> thread=0x2b0af8505010) at UnixNetVConnection.cc:414
> #25 0x0000000000739536 in NetHandler::mainNetEvent (this=0x2b0af8508cb0, 
> event=5, e=0x1701b70) at UnixNet.cc:534
> #26 0x00000000004f8dee in Continuation::handleEvent (this=0x2b0af8508cb0, 
> event=5, data=0x1701b70) at ../iocore/eventsystem/I_Continuation.h:146
> #27 0x0000000000760bee in EThread::process_event (this=0x2b0af8505010, 
> e=0x1701b70, calling_code=5) at UnixEThread.cc:145
> #28 0x00000000007610f8 in EThread::execute (this=0x2b0af8505010) at 
> UnixEThread.cc:269
> #29 0x0000000000760160 in spawn_thread_internal (a=0x162e500) at Thread.cc:88
> #30 0x00002b0af30049d1 in start_thread (arg=0x2b0af9d1c700) at 
> pthread_create.c:301
> #31 0x0000003c1b2e88fd in clone () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
> {code}
> Below are debug logs from load tests that prove the problem:
> {code}
> Step 1: Transaction 0x2b8430214d70 allocates 
> cached_request_hdr_loc=0x2b846c026a58
> [Jan 28 18:28:19.652] Server {0x2b82ecc0b700} DIAG: 
> (atscppapi.utils_internal.cc:176) [getTransaction()] existing transaction 
> object at 0x2b8444014450 for ats pointer 0x2b8430214d70, event 60005
> [Jan 28 18:28:19.652] Server {0x2b82ecc0b700} DIAG: 
> (atscppapi.utils_internal.cc:53) [handleTransactionEvents()] Got event 60005 
> on continuation 0x2d58950 for transaction (ats pointer 0x2b8430214d70, object 
> 0x2b8444014450)
> [Jan 28 18:28:19.652] Server {0x2b82ecc0b700} DEBUG: (atscppapi) 
> TSHttpTxnCachedReqGet txnp 0x2b8430214d70, new handle bufp 0x2b8434020878, 
> heap 0x2b846c0269d0, obj 0x2b846c026a58, magic 0xabcdfeed
> [Jan 28 18:28:19.652] Server {0x2b82ecc0b700} DIAG: 
> (atscppapi.Transaction.cc:494) [initCachedRequest()] Initializing cached 
> request, txn=0x2b8430214d70, state_=0x2b8444002130, 
> client_request_hdr_buf_=0x2b8430215080, 
> client_request_hdr_loc_=0x2b8468263f18, client_response_hdr_buf_=(nil), 
> client_response_hdr_loc_=(nil), server_request_hdr_buf_=(nil), 
> server_request_hdr_loc_=(nil), server_response_hdr_buf_=(nil), 
> server_response_hdr_loc_=(nil), cached_request_hdr_buf_=0x2b8434020878, 
> cached_request_hdr_loc_=0x2b846c026a58
> Step 2: Transaction 0x2b84301f8a30 allocates the same as 
> client_response_hdr_buf_=0x2b84301f8d80, while it's handle is still in the 
> previous transaction 0x2b8430214d70 in CPP API 
> [Jan 28 18:28:19.667] Server {0x2b82e7fdd700} DIAG: 
> (atscppapi.utils_internal.cc:53) [handleTransactionEvents()] Got event 60007 
> on continuation 0x2d58950 for transaction (ats pointer 0x2b84301f8a30, object 
> 0x2b830c00a750)
> [Jan 28 18:28:19.667] Server {0x2b82e7fdd700} DIAG: 
> (atscppapi.Transaction.cc:485) [initClientResponse()] Initializing client 
> response, txn=0x2b84301f8a30, state_=0x2b830c0017d0, 
> client_request_hdr_buf_=0x2b84301f8d40, 
> client_request_hdr_loc_=0x2b84641a73e8, 
> client_response_hdr_buf_=0x2b84301f8d80, 
> client_response_hdr_loc_=0x2b846c026a58, server_request_hdr_buf_=(nil), 
> server_request_hdr_loc_=(nil), server_response_hdr_buf_=(nil), 
> server_response_hdr_loc_=(nil), cached_request_hdr_buf_=0x2b8434010478, 
> cached_request_hdr_loc_=0x2b84181666e8
> Step 3: Transaction 0x2b8430214d70 now gets TXN_CLOSE event and tries to call 
> MLocRelease on 0x2b846c026a58's url handle which results in an assert failure 
> in MLocRelease API
> [Jan 28 18:28:19.678] Server {0x2b82ecc0b700} DIAG: 
> (atscppapi.utils_internal.cc:53) [handleTransactionEvents()] Got event 60012 
> on continuation 0x2d58950 for transaction (ats pointer 0x2b8430214d70, object 
> 0x2b8444014450)
> [Jan 28 18:28:19.678] Server {0x2b82ecc0b700} DIAG: 
> (atscppapi.Transaction.cc:88) [~Transaction()] Transaction 
> tshttptxn=0x2b8430214d70 destroying Transaction object 0x2b8444014450, 
> state_=0x2b8444002130, state_->client_request_hdr_buf_=0x2b8430215080, 
> state_->client_request_hdr_loc_=0x2b8468263f18, 
> state_->server_request_hdr_buf_=(nil), 
> state_->server_response_hdr_buf_=(nil), 
> state_->client_response_hdr_buf_=0x2b84302150c0, 
> state_->cached_request_hdr_buf_=0x2b8434020878, 
> state_->cached_response_hdr_buf_=0x2b8434020880
> [Jan 28 18:28:19.678] Server {0x2b82ecc0b700} DEBUG: (atscppapi) 
> TSHandleMLocRelease bufp=0x2b8430215080, parent=(nil), mloc=0x2b8468263f18, 
> obj type 3
> [Jan 28 18:28:19.678] Server {0x2b82ecc0b700} DIAG: 
> (atscppapi.Transaction.cc:100) [~Transaction()] Releasing client response
> [Jan 28 18:28:19.678] Server {0x2b82ecc0b700} DIAG: 
> (atscppapi.Transaction.cc:104) [~Transaction()] Releasing cached request
> [Jan 28 18:28:19.678] Server {0x2b82ecc0b700} DIAG: 
> (atscppapi.Transaction.cc:108) [~Transaction()] Releasing cached response
> [Jan 28 18:28:19.678] Server {0x2b82ecc0b700} DIAG: 
> (atscppapi.Request.cc:181) [~Request()] Destroying request object with no 
> destroy_buf 0x2b84440021d8 on hdr_buf=0x2b8434020878, hdr_loc=0x2b846c026a58, 
> url_loc=0x2b846c026cd8
> [Jan 28 18:28:19.678] Server {0x2b82ecc0b700} DEBUG: (atscppapi) 
> TSHandleMLocRelease bufp=0x2b8434020878, parent=0x2b846c026a58, 
> mloc=0x2b846c026cd8, obj type 5
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to