[jira] [Created] (TS-3784) Unpleasant debug assert when starting up a SpdyClientSession
Susan Hinrichs created TS-3784:
---
Summary: Unpleasant debug assert when starting up a SpdyClientSession
Key: TS-3784
URL: https://issues.apache.org/jira/browse/TS-3784
Project: Traffic Server
Issue Type: Bug
Components: SPDY
Reporter: Susan Hinrichs

Noticed this while trying to reproduce [~oknet]'s issue on TS-3667. I have a callback set on the SNI hook. It selects a new certificate and reenables the vc before returning. The stack is below. The assert fires because the current thread does not hold the read.vio mutex; in fact, no thread holds the read vio mutex. HttpClientSession and Http2ClientSession use the VC's mutex when setting up the vios, so when the do_io_reads occur the mutex is automatically already held. If I change SpdyClientSession to use the VC mutex instead of creating a new mutex, this assert does not get triggered. Not clear whether this is causing any real issues, but it seems cleaner to follow the mutex assignment strategy of the other protocols.

Here is the stack:
{code}
#0  0x00351e4328a5 in raise () from /lib64/libc.so.6
#1  0x00351e434085 in abort () from /lib64/libc.so.6
#2  0x77dda215 in ink_die_die_die () at ink_error.cc:43
#3  0x77dda2cc in ink_fatal_va(const char *, typedef __va_list_tag __va_list_tag *) (fmt=0x77deb298 "%s:%d: failed assert `%s`", ap=0x7fffef4a8530) at ink_error.cc:65
#4  0x77dda391 in ink_fatal (message_format=0x77deb298 "%s:%d: failed assert `%s`") at ink_error.cc:73
#5  0x77dd7f12 in _ink_assert (expression=0x826e48 "vio->mutex->thread_holding == this_ethread()", file=0x826a9e "UnixNetVConnection.cc", line=895) at ink_assert.cc:37
#6  0x0077b4d7 in UnixNetVConnection::set_enabled (this=0x7fffb801c540, vio=0x7fffb801c660) at UnixNetVConnection.cc:895
#7  0x0077ab94 in UnixNetVConnection::reenable (this=0x7fffb801c540, vio=0x7fffb801c660) at UnixNetVConnection.cc:788
#8  0x00509755 in VIO::reenable (this=0x7fffb801c660) at ../iocore/eventsystem/P_VIO.h:112
#9  0x0077a1da in UnixNetVConnection::do_io_read (this=0x7fffb801c540, c=0x7fffd402e3c0, nbytes=9223372036854775807, buf=0x16f9b30) at UnixNetVConnection.cc:628
#10 0x006393c1 in SpdyClientSession::start (this=0x7fffd402e3c0) at SpdyClientSession.cc:210
#11 0x0054a1fa in ProxyClientSession::handle_api_return (this=0x7fffd402e3c0, event=6) at ProxyClientSession.cc:167
#12 0x0054a142 in ProxyClientSession::do_api_callout (this=0x7fffd402e3c0, id=TS_HTTP_SSN_START_HOOK) at ProxyClientSession.cc:147
#13 0x00639303 in SpdyClientSession::new_connection (this=0x7fffd402e3c0, new_vc=0x7fffb801c540, iobuf=0x0, reader=0x0, backdoor=false) at SpdyClientSession.cc:195
#14 0x0063878e in SpdySessionAccept::mainEvent (this=0x16bc7a0, event=202, edata=0x7fffb801c540) at SpdySessionAccept.cc:48
#15 0x0050970e in Continuation::handleEvent (this=0x16bc7a0, event=202, data=0x7fffb801c540) at ../iocore/eventsystem/I_Continuation.h:146
#16 0x00763404 in send_plugin_event (plugin=0x16bc7a0, event=202, edata=0x7fffb801c540) at SSLNextProtocolAccept.cc:32
#17 0x00763b89 in SSLNextProtocolTrampoline::ioCompletionEvent (this=0x7fffd40008e0, event=102, edata=0x7fffb801c660) at SSLNextProtocolAccept.cc:99
#18 0x0050970e in Continuation::handleEvent (this=0x7fffd40008e0, event=102, data=0x7fffb801c660) at ../iocore/eventsystem/I_Continuation.h:146
#19 0x0077871e in read_signal_and_update (event=102, vc=0x7fffb801c540) at UnixNetVConnection.cc:145
#20 0x00778abe in read_signal_done (event=102, nh=0x7fffef9b2be0, vc=0x7fffb801c540) at UnixNetVConnection.cc:206
#21 0x0077baac in UnixNetVConnection::readSignalDone (this=0x7fffb801c540, event=102, nh=0x7fffef9b2be0) at UnixNetVConnection.cc:1006
#22 0x0075e559 in SSLNetVConnection::net_read_io (this=0x7fffb801c540, nh=0x7fffef9b2be0, lthread=0x7fffef9af010) at SSLNetVConnection.cc:543
#23 0x00770b52 in NetHandler::mainNetEvent (this=0x7fffef9b2be0, event=5, e=0x1153690) at UnixNet.cc:516
#24 0x0050970e in Continuation::handleEvent (this=0x7fffef9b2be0, event=5, data=0x1153690) at ../iocore/eventsystem/I_Continuation.h:146
#25 0x0079aefa in EThread::process_event (this=0x7fffef9af010, e=0x1153690, calling_code=5) at UnixEThread.cc:128
#26 0x0079b51b in EThread::execute (this=0x7fffef9af010) at UnixEThread.cc:252
#27 0x0079a414 in spawn_thread_internal (a=0x1532c60) at Thread.cc:86
#28 0x00351e807851 in start_thread () from /lib64/libpthread.so.0
#29 0x00351e4e890d in clone () from /lib64/libc.so.6
{code}
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TS-3784) Unpleasant debug assert when starting up a SpdyClientSession
[ https://issues.apache.org/jira/browse/TS-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Susan Hinrichs updated TS-3784:
---
Description: Noticed this while trying to reproduce [~oknet]'s issue on TS-3667. I have a callback set on the SNI hook. It selects a new certificate and reenables the vc before returning. The stack is unchanged from the original report. The assert fires because the current thread does not hold the read.vio mutex; in fact, no thread holds the read vio mutex. HttpClientSession and Http2ClientSession use the VC's mutex instead of creating a new mutex, so that shared mutex is used when setting up the vios, and when the do_io_reads occur the mutex is automatically already held. If I change SpdyClientSession to use the VC mutex instead of creating a new mutex, this assert does not get triggered. Not clear whether this is causing any real issues, but it seems cleaner to follow the mutex assignment strategy of the other protocols.
[jira] [Commented] (TS-3667) SSL Handshake read does not correctly handle EOF and error cases
[ https://issues.apache.org/jira/browse/TS-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635316#comment-14635316 ]
Susan Hinrichs commented on TS-3667:

BTW, in trying to reproduce this case I ran across an unsettling debug assert in the SPDY client logic. I'll file a separate issue for that.

SSL Handshake read does not correctly handle EOF and error cases
---
Key: TS-3667
URL: https://issues.apache.org/jira/browse/TS-3667
Project: Traffic Server
Issue Type: Bug
Components: SSL
Affects Versions: 5.2.0, 5.3.0
Reporter: Susan Hinrichs
Assignee: Susan Hinrichs
Fix For: 5.3.1, 6.0.0
Attachments: ts-3667.diff

Reported by [~esproul] and postwait. The return value of SSLNetVConnection::read_raw_data() is being ignored, so EOF and error cases are not terminated but instead spin until the inactivity timeout is reached, and EAGAIN does not cause the connection to be descheduled until more data is available. This results in higher CPU utilization and hitting the SSL_error() function much more often than necessary.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-3784) Unpleasant debug assert when starting up a SpdyClientSession
[ https://issues.apache.org/jira/browse/TS-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635343#comment-14635343 ]
Susan Hinrichs commented on TS-3784:

Here is the patch that eliminates the debug assert for me:
{code}
diff --git a/proxy/spdy/SpdyClientSession.cc b/proxy/spdy/SpdyClientSession.cc
index 2f8720e..fe5c732 100644
--- a/proxy/spdy/SpdyClientSession.cc
+++ b/proxy/spdy/SpdyClientSession.cc
@@ -94,7 +94,8 @@ SpdyClientSession::init(NetVConnection *netvc)
 {
   int r;

-  this->mutex = new_ProxyMutex();
+  //this->mutex = new_ProxyMutex();
+  this->mutex = netvc->mutex;
   this->vc = netvc;
   this->req_map.clear();
{code}
[jira] [Resolved] (TS-3788) SNI callbacks stall after TS-3667 fix
[ https://issues.apache.org/jira/browse/TS-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Susan Hinrichs resolved TS-3788.
Resolution: Fixed

SNI callbacks stall after TS-3667 fix
---
Key: TS-3788
URL: https://issues.apache.org/jira/browse/TS-3788
Project: Traffic Server
Issue Type: Bug
Components: SSL
Reporter: Susan Hinrichs
Assignee: Susan Hinrichs
Fix For: 6.1.0

Reported by [~oknet]; the main discussion is in TS-3667. Due to changes in the fix for TS-3667, EAGAIN would get checked before calling SSL_accept. If the SSL_accept state machine needed to write data, it would never get triggered and the handshake would stall.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (TS-3790) action=tunnel in ssl_multicert.config will cause crash
[ https://issues.apache.org/jira/browse/TS-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Susan Hinrichs resolved TS-3790.
Resolution: Fixed

action=tunnel in ssl_multicert.config will cause crash
---
Key: TS-3790
URL: https://issues.apache.org/jira/browse/TS-3790
Project: Traffic Server
Issue Type: Bug
Components: SSL
Reporter: Susan Hinrichs
Assignee: Susan Hinrichs
Fix For: 6.1.0
Attachments: ts-3790.diff

Enabled an old line in my ssl_multicert.config and accidentally tested the action=tunnel feature. It caused the traffic_server process to crash. The code was assuming that a handShakeBuffer must be present if we are deciding to do a blind tunnel, but that is only the case if the decision is made in the SNI callback. I'm going to attach a patch that fixes the problem.

Example line that will trigger the issue. Packets addressed to 1.2.3.4 will try to convert to a blind tunnel before any SSL handshake processing is attempted.
{code}
dest_ip=1.2.3.4 action=tunnel ssl_cert_name=servercert.pem ssl_key_name=privkey.pem
{code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-3775) ASAN crash while running regression test Cache_vol
[ https://issues.apache.org/jira/browse/TS-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14637609#comment-14637609 ]
Susan Hinrichs commented on TS-3775:

Sigh. Committing a number of smaller fixes. Looks like I mis-read my notes. Will update commits by hand.

ASAN crash while running regression test Cache_vol
---
Key: TS-3775
URL: https://issues.apache.org/jira/browse/TS-3775
Project: Traffic Server
Issue Type: Bug
Components: Cache
Reporter: Susan Hinrichs
Assignee: Susan Hinrichs
Attachments: ts-3775.diff

Seen while running master built with ASAN on FC 21. I have a patch which I'll attach and discuss in a comment.
{code}
REGRESSION TEST Cache_vol started
RPRINT Cache_vol: 1 128 Megabyte Volumes
RPRINT Cache_vol: Not enough space for 10 volume
RPRINT Cache_vol: Random Volumes after clearing the disks
RPRINT Cache_vol: volume=1 scheme=http size=128
RPRINT Cache_vol: Random Volumes without clearing the disks
RPRINT Cache_vol: volume=1 scheme=rtsp size=128
==4513==ERROR: AddressSanitizer: heap-use-after-free on address 0x6048e9e0 at pc 0x989546 bp 0x7fffef2a59b0 sp 0x7fffef2a59a0
READ of size 8 at 0x6048e9e0 thread T2 ([ET_NET 1])
    #0 0x989545 in cplist_update /home/shinrich/ats/iocore/cache/Cache.cc:2702
    #1 0x989545 in cplist_reconfigure() /home/shinrich/ats/iocore/cache/Cache.cc:2846
    #2 0x9d1186 in execute_and_verify(RegressionTest*) /home/shinrich/ats/iocore/cache/CacheHosting.cc:996
    #3 0x9d2229 in RegressionTest_Cache_vol(RegressionTest*, int, int*) /home/shinrich/ats/iocore/cache/CacheHosting.cc:842
    #4 0x76cb55f1 in start_test /home/shinrich/ats/lib/ts/Regression.cc:78
    #5 0x76cb55f1 in RegressionTest::run_some() /home/shinrich/ats/lib/ts/Regression.cc:126
    #6 0x76cb5b00 in RegressionTest::check_status() /home/shinrich/ats/lib/ts/Regression.cc:141
    #7 0x5404fb in RegressionCont::mainEvent(int, Event*) /home/shinrich/ats/proxy/Main.cc:1210
    #8 0xb6b771 in Continuation::handleEvent(int, void*) /home/shinrich/ats/iocore/eventsystem/I_Continuation.h:146
    #9 0xb6b771 in EThread::process_event(Event*, int) /home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:128
    #10 0xb6d3a6 in EThread::execute() /home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:207
    #11 0xb69da1 in spawn_thread_internal /home/shinrich/ats/iocore/eventsystem/Thread.cc:86
    #12 0x75e27529 in start_thread (/lib64/libpthread.so.0+0x7529)
    #13 0x7464922c in __clone (/lib64/libc.so.6+0x10022c)

0x6048e9e0 is located 16 bytes inside of 40-byte region [0x6048e9d0,0x6048e9f8)
freed by thread T2 ([ET_NET 1]) here:
    #0 0x76f5764f in operator delete(void*) (/lib64/libasan.so.1+0x5864f)
    #1 0x9c84ac in CacheDisk::delete_volume(int) /home/shinrich/ats/iocore/cache/CacheDisk.cc:330
    #2 0x989455 in cplist_update /home/shinrich/ats/iocore/cache/Cache.cc:2684
    #3 0x989455 in cplist_reconfigure() /home/shinrich/ats/iocore/cache/Cache.cc:2846
    #4 0x9d1186 in execute_and_verify(RegressionTest*) /home/shinrich/ats/iocore/cache/CacheHosting.cc:996
    #5 0x9d2229 in RegressionTest_Cache_vol(RegressionTest*, int, int*) /home/shinrich/ats/iocore/cache/CacheHosting.cc:842
    #6 0x76cb55f1 in start_test /home/shinrich/ats/lib/ts/Regression.cc:78
    #7 0x76cb55f1 in RegressionTest::run_some() /home/shinrich/ats/lib/ts/Regression.cc:126
    #8 0x76cb5b00 in RegressionTest::check_status() /home/shinrich/ats/lib/ts/Regression.cc:141
    #9 0x5404fb in RegressionCont::mainEvent(int, Event*) /home/shinrich/ats/proxy/Main.cc:1210
    #10 0xb6b771 in Continuation::handleEvent(int, void*) /home/shinrich/ats/iocore/eventsystem/I_Continuation.h:146
    #11 0xb6b771 in EThread::process_event(Event*, int) /home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:128
    #12 0xb6d3a6 in EThread::execute() /home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:207
    #13 0xb69da1 in spawn_thread_internal /home/shinrich/ats/iocore/eventsystem/Thread.cc:86
    #14 0x75e27529 in start_thread (/lib64/libpthread.so.0+0x7529)

previously allocated by thread T2 ([ET_NET 1]) here:
    #0 0x76f5714f in operator new(unsigned long) (/lib64/libasan.so.1+0x5814f)
    #1 0x9c770d in CacheDisk::create_volume(int, long, int) /home/shinrich/ats/iocore/cache/CacheDisk.cc:296
    #2 0x98347e in create_volume /home/shinrich/ats/iocore/cache/Cache.cc:3023
    #3 0x989b41 in cplist_reconfigure() /home/shinrich/ats/iocore/cache/Cache.cc:2877
    #4 0x9d1186 in execute_and_verify(RegressionTest*) /home/shinrich/ats/iocore/cache/CacheHosting.cc:996
    #5 0x9d2229 in
{code}
[jira] [Resolved] (TS-3654) ASAN heap-use-after-free in cache-hosting (regression)
[ https://issues.apache.org/jira/browse/TS-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Susan Hinrichs resolved TS-3654.
Resolution: Fixed

ASAN heap-use-after-free in cache-hosting (regression)
---
Key: TS-3654
URL: https://issues.apache.org/jira/browse/TS-3654
Project: Traffic Server
Issue Type: Improvement
Components: Cache
Reporter: Leif Hedstrom
Assignee: Susan Hinrichs
Fix For: 6.1.0

{code}
RPRINT Cache_vol: 1 128 Megabyte Volumes
RPRINT Cache_vol: Not enough space for 10 volume
RPRINT Cache_vol: Random Volumes after clearing the disks
RPRINT Cache_vol: volume=1 scheme=http size=128
RPRINT Cache_vol: Random Volumes without clearing the disks
RPRINT Cache_vol: volume=1 scheme=rtsp size=128
==3733==ERROR: AddressSanitizer: heap-use-after-free on address 0x604a2960 at pc 0xa7ce83 bp 0x7f3c7f946980 sp 0x7f3c7f946970
READ of size 8 at 0x604a2960 thread T3 ([ET_NET 2])
    #0 0xa7ce82 in cplist_update ../../../../iocore/cache/Cache.cc:3230
    #1 0xa7ce82 in cplist_reconfigure() ../../../../iocore/cache/Cache.cc:3374
    #2 0xac619e in execute_and_verify(RegressionTest*) ../../../../iocore/cache/CacheHosting.cc:994
    #3 0xac75f8 in RegressionTest_Cache_vol(RegressionTest*, int, int*) ../../../../iocore/cache/CacheHosting.cc:840
    #4 0x7f3c8480b4d2 in start_test ../../../../lib/ts/Regression.cc:77
    #5 0x7f3c8480b4d2 in RegressionTest::run_some() ../../../../lib/ts/Regression.cc:125
    #6 0x7f3c8480b9b6 in RegressionTest::check_status() ../../../../lib/ts/Regression.cc:140
    #7 0x57b5b4 in RegressionCont::mainEvent(int, Event*) ../../../proxy/Main.cc:1220
    #8 0xc8b86e in Continuation::handleEvent(int, void*) ../../../../iocore/eventsystem/I_Continuation.h:145
    #9 0xc8b86e in EThread::process_event(Event*, int) ../../../../iocore/eventsystem/UnixEThread.cc:128
    #10 0xc8da67 in EThread::execute() ../../../../iocore/eventsystem/UnixEThread.cc:207
    #11 0xc8a488 in spawn_thread_internal ../../../../iocore/eventsystem/Thread.cc:85
    #12 0x7f3c84392529 in start_thread (/lib64/libpthread.so.0+0x3813e07529)
    #13 0x381370022c in __clone (/lib64/libc.so.6+0x381370022c)

0x604a2960 is located 16 bytes inside of 40-byte region [0x604a2950,0x604a2978)
freed by thread T3 ([ET_NET 2]) here:
    #0 0x7f3c84aaf64f in operator delete(void*) (/lib64/libasan.so.1+0x5864f)
    #1 0xabbd16 in CacheDisk::delete_volume(int) ../../../../iocore/cache/CacheDisk.cc:330
    #2 0xa7bfe0 in cplist_update ../../../../iocore/cache/Cache.cc:3212
    #3 0xa7bfe0 in cplist_reconfigure() ../../../../iocore/cache/Cache.cc:3374
    #4 0xac619e in execute_and_verify(RegressionTest*) ../../../../iocore/cache/CacheHosting.cc:994
    #5 0xac75f8 in RegressionTest_Cache_vol(RegressionTest*, int, int*) ../../../../iocore/cache/CacheHosting.cc:840
    #6 0x7f3c8480b4d2 in start_test ../../../../lib/ts/Regression.cc:77
    #7 0x7f3c8480b4d2 in RegressionTest::run_some() ../../../../lib/ts/Regression.cc:125
    #8 0x7f3c8480b9b6 in RegressionTest::check_status() ../../../../lib/ts/Regression.cc:140
    #9 0x57b5b4 in RegressionCont::mainEvent(int, Event*) ../../../proxy/Main.cc:1220
    #10 0xc8b86e in Continuation::handleEvent(int, void*) ../../../../iocore/eventsystem/I_Continuation.h:145
    #11 0xc8b86e in EThread::process_event(Event*, int) ../../../../iocore/eventsystem/UnixEThread.cc:128
    #12 0xc8da67 in EThread::execute() ../../../../iocore/eventsystem/UnixEThread.cc:207
    #13 0xc8a488 in spawn_thread_internal ../../../../iocore/eventsystem/Thread.cc:85
    #14 0x7f3c84392529 in start_thread (/lib64/libpthread.so.0+0x3813e07529)

previously allocated by thread T3 ([ET_NET 2]) here:
    #0 0x7f3c84aaf14f in operator new(unsigned long) (/lib64/libasan.so.1+0x5814f)
    #1 0xaba5ca in CacheDisk::create_volume(int, long, int) ../../../../iocore/cache/CacheDisk.cc:296
    #2 0xa74f81 in create_volume ../../../../iocore/cache/Cache.cc:3551
    #3 0xa7ca20 in cplist_reconfigure() ../../../../iocore/cache/Cache.cc:3405
    #4 0xac619e in execute_and_verify(RegressionTest*) ../../../../iocore/cache/CacheHosting.cc:994
    #5 0xac75f8 in RegressionTest_Cache_vol(RegressionTest*, int, int*) ../../../../iocore/cache/CacheHosting.cc:840
    #6 0x7f3c8480b4d2 in start_test ../../../../lib/ts/Regression.cc:77
    #7 0x7f3c8480b4d2 in RegressionTest::run_some() ../../../../lib/ts/Regression.cc:125
    #8 0x7f3c8480b9b6 in RegressionTest::check_status() ../../../../lib/ts/Regression.cc:140
    #9 0x57b5b4 in RegressionCont::mainEvent(int, Event*) ../../../proxy/Main.cc:1220
    #10 0xc8b86e in
{code}
[jira] [Resolved] (TS-3784) Unpleasant debug assert when starting up a SpdyClientSession
[ https://issues.apache.org/jira/browse/TS-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Susan Hinrichs resolved TS-3784.
Resolution: Fixed

Unpleasant debug assert when starting up a SpdyClientSession
---
Key: TS-3784
URL: https://issues.apache.org/jira/browse/TS-3784
Project: Traffic Server
Issue Type: Bug
Components: SPDY
Reporter: Susan Hinrichs
Assignee: Susan Hinrichs
Fix For: 6.1.0
[jira] [Updated] (TS-3788) SNI callbacks stall after TS-3667 fix
[ https://issues.apache.org/jira/browse/TS-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Susan Hinrichs updated TS-3788:
---
Backport to Version: 5.3.2, 6.0.0

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-3784) Unpleasant debug assert when starting up a SpdyClientSession
[ https://issues.apache.org/jira/browse/TS-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14637616#comment-14637616 ] Susan Hinrichs commented on TS-3784: I mis-read my notes when setting up the commit notes on this one. Here is the commit for this one. Commit 6f66b7a18234a93e810d8ef2ce23144b9b3446f4 in trafficserver's branch refs/heads/master from shinrich [ https://git-wip-us.apache.org/repos/asf?p=trafficserver.git;h=6f66b7a ] TS-3775: Adjust the mutex assignment for SpdyClientSession to avoid unlocked read vio. Unpleasant debug assert in when starting up a SpdyClientSession --- Key: TS-3784 URL: https://issues.apache.org/jira/browse/TS-3784 Project: Traffic Server Issue Type: Bug Components: SPDY Reporter: Susan Hinrichs Assignee: Susan Hinrichs Fix For: 6.1.0 Noticed this while trying to reproduce [~oknet]'s issue on TS-3667. I have a callback set on the SNI hook. It selects a new certificate and reenables the vc before returning. The stack is below. The assert is because the current thread does not hold the read.vio mutex. In fact no thread holds the read vio mutex. For HttpClientSession and Http2ClientSession, they use the VC's mutex instead of creating new mutex. So that shared mutex is used when setting up the vio's, so when the do_io_reads occur the mutex is automatically already held. If I change SpdyClientSession to use the VC mutex instead of creating a new mutex, this assert does not get triggered. Not clear whether this is causing any real issues, but it seems cleaner to follow the mutex assignment strategy of the other protocols. 
Here is the stack {code}
#0  0x00351e4328a5 in raise () from /lib64/libc.so.6
#1  0x00351e434085 in abort () from /lib64/libc.so.6
#2  0x77dda215 in ink_die_die_die () at ink_error.cc:43
#3  0x77dda2cc in ink_fatal_va(const char *, typedef __va_list_tag __va_list_tag *) (fmt=0x77deb298 "%s:%d: failed assert `%s`", ap=0x7fffef4a8530) at ink_error.cc:65
#4  0x77dda391 in ink_fatal (message_format=0x77deb298 "%s:%d: failed assert `%s`") at ink_error.cc:73
#5  0x77dd7f12 in _ink_assert (expression=0x826e48 "vio->mutex->thread_holding == this_ethread()", file=0x826a9e "UnixNetVConnection.cc", line=895) at ink_assert.cc:37
#6  0x0077b4d7 in UnixNetVConnection::set_enabled (this=0x7fffb801c540, vio=0x7fffb801c660) at UnixNetVConnection.cc:895
#7  0x0077ab94 in UnixNetVConnection::reenable (this=0x7fffb801c540, vio=0x7fffb801c660) at UnixNetVConnection.cc:788
#8  0x00509755 in VIO::reenable (this=0x7fffb801c660) at ../iocore/eventsystem/P_VIO.h:112
#9  0x0077a1da in UnixNetVConnection::do_io_read (this=0x7fffb801c540, c=0x7fffd402e3c0, nbytes=9223372036854775807, buf=0x16f9b30) at UnixNetVConnection.cc:628
#10 0x006393c1 in SpdyClientSession::start (this=0x7fffd402e3c0) at SpdyClientSession.cc:210
#11 0x0054a1fa in ProxyClientSession::handle_api_return (this=0x7fffd402e3c0, event=6) at ProxyClientSession.cc:167
#12 0x0054a142 in ProxyClientSession::do_api_callout (this=0x7fffd402e3c0, id=TS_HTTP_SSN_START_HOOK) at ProxyClientSession.cc:147
#13 0x00639303 in SpdyClientSession::new_connection (this=0x7fffd402e3c0, new_vc=0x7fffb801c540, iobuf=0x0, reader=0x0, backdoor=false) at SpdyClientSession.cc:195
#14 0x0063878e in SpdySessionAccept::mainEvent (this=0x16bc7a0, event=202, edata=0x7fffb801c540) at SpdySessionAccept.cc:48
#15 0x0050970e in Continuation::handleEvent (this=0x16bc7a0, event=202, data=0x7fffb801c540) at ../iocore/eventsystem/I_Continuation.h:146
#16 0x00763404 in send_plugin_event (plugin=0x16bc7a0, event=202, edata=0x7fffb801c540) at SSLNextProtocolAccept.cc:32
#17 0x00763b89 in SSLNextProtocolTrampoline::ioCompletionEvent (this=0x7fffd40008e0, event=102, edata=0x7fffb801c660) at SSLNextProtocolAccept.cc:99
#18 0x0050970e in Continuation::handleEvent (this=0x7fffd40008e0, event=102, data=0x7fffb801c660) at ../iocore/eventsystem/I_Continuation.h:146
#19 0x0077871e in read_signal_and_update (event=102, vc=0x7fffb801c540) at UnixNetVConnection.cc:145
#20 0x00778abe in read_signal_done (event=102, nh=0x7fffef9b2be0, vc=0x7fffb801c540) at UnixNetVConnection.cc:206
#21 0x0077baac in UnixNetVConnection::readSignalDone (this=0x7fffb801c540, event=102, nh=0x7fffef9b2be0) at UnixNetVConnection.cc:1006
#22 0x0075e559 in SSLNetVConnection::net_read_io (this=0x7fffb801c540,
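The fix described in the report, sharing the netvc's mutex rather than allocating a fresh one, can be sketched in a few lines. This is a hedged illustration with stand-in types (Mutex, NetVC, SpdySession are placeholders, not the real ATS classes): the point is only that when the session, and therefore its VIOs, inherits the mutex the net thread already holds, the `thread_holding` assert in set_enabled cannot fire during startup.

```cpp
#include <cassert>
#include <memory>

// Simplified sketch of the locking issue, not actual ATS code.
struct Mutex {};

struct NetVC {
  std::shared_ptr<Mutex> mutex = std::make_shared<Mutex>();
};

struct SpdySession {
  std::shared_ptr<Mutex> mutex;

  // Original variant: a fresh mutex that no thread holds when the
  // do_io_read during session startup tries to reenable the read VIO.
  void new_connection_fresh_mutex(NetVC &vc) {
    (void)vc;
    mutex = std::make_shared<Mutex>();
  }

  // Fixed variant: share the netvc's mutex, as HttpClientSession and
  // Http2ClientSession do, so the lock the net thread already holds
  // also covers the VIOs set up from this session.
  void new_connection_shared_mutex(NetVC &vc) { mutex = vc.mutex; }
};
```

With the shared variant, the lock the net thread acquires before dispatching the accept event is the same object the read VIO points at, which is exactly the invariant the debug assert checks.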
[jira] [Comment Edited] (TS-3775) ASAN crash while running regression test Cache_vol
[ https://issues.apache.org/jira/browse/TS-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14637609#comment-14637609 ] Susan Hinrichs edited comment on TS-3775 at 7/22/15 9:04 PM: - Sigh. Committing a number of smaller fixes. Looks like I mis-read my notes. Will update commits by hand. Fix for this issue is not yet committed. was (Author: shinrich): Sigh. Committing a number of smaller fixes. Looks like I mis-read my notes. Will update commits by hand.
[jira] [Created] (TS-3788) SNI callbacks stall after TS-3667 fix
Susan Hinrichs created TS-3788: -- Summary: SNI callbacks stall after TS-3667 fix Key: TS-3788 URL: https://issues.apache.org/jira/browse/TS-3788 Project: Traffic Server Issue Type: Bug Components: SSL Reporter: Susan Hinrichs Reported by [~oknet]; the main discussion is in TS-3667. Due to changes in the fix for TS-3667, EAGAIN would get checked before calling SSL_accept. If the SSL_accept state machine needed to write data, it would never get triggered and the handshake would stall. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
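A rough simulation of the stall, with invented names (Handshake, run_buggy, run_fixed) rather than the real SSLNetVConnection code: the handshake state machine needs to emit a write before it can finish, but the read side only ever reports EAGAIN. If the EAGAIN check short-circuits the event handler before the SSL_accept equivalent is driven, the pending write is never produced and no further progress is possible.

```cpp
#include <cassert>

// Toy handshake state machine: it must perform one write before it can
// complete, mimicking SSL_accept wanting to send handshake data.
struct Handshake {
  bool wrote = false;
  bool done = false;
  void step() {
    if (!wrote)
      wrote = true;  // produce the write SSL_accept is waiting on
    else
      done = true;   // with the write out, the handshake can finish
  }
};

// Buggy ordering: bail on EAGAIN before giving the state machine a turn.
// Since no handshake bytes ever arrive, step() is never called.
bool run_buggy(Handshake &hs, int max_events) {
  for (int i = 0; i < max_events; ++i) {
    bool read_eagain = true;   // the read side always reports EAGAIN
    if (read_eagain)
      continue;                // rescheduled without driving the handshake
    hs.step();
  }
  return hs.done;
}

// Fixed ordering: drive the handshake first, then decide about EAGAIN.
bool run_fixed(Handshake &hs, int max_events) {
  for (int i = 0; i < max_events; ++i) {
    hs.step();                 // may produce the pending write
    if (hs.done)
      return true;
  }
  return hs.done;
}
```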
[jira] [Assigned] (TS-3788) SNI callbacks stall after TS-3667 fix
[ https://issues.apache.org/jira/browse/TS-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susan Hinrichs reassigned TS-3788: -- Assignee: Susan Hinrichs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (TS-3667) SSL Handshake read does not correctly handle EOF and error cases
[ https://issues.apache.org/jira/browse/TS-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susan Hinrichs closed TS-3667. -- Resolution: Fixed Opened a new bug, TS-3788, to track the problem noted by [~oknet]. SSL Handshake read does not correctly handle EOF and error cases --- Key: TS-3667 URL: https://issues.apache.org/jira/browse/TS-3667 Project: Traffic Server Issue Type: Bug Components: SSL Affects Versions: 5.2.0, 5.3.0 Reporter: Susan Hinrichs Assignee: Susan Hinrichs Fix For: 6.0.0, 5.3.1 Attachments: ts-3667.diff Reported by [~esproul] and postwait. The return value of SSLNetVConnection::read_raw_data() is being ignored. So EOF and error conditions do not terminate the connection; instead it spins until the inactivity timeout is reached. On EAGAIN the vc is not descheduled to wait until more data is available. This results in higher CPU utilization and the SSL_error() function being hit much more often than it needs to be. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
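The kind of return-value handling the fix calls for can be sketched as follows. handle_raw_read and the result codes here are illustrative stand-ins, not the actual SSLNetVConnection API: the point is that 0 (EOF) and negative error returns must terminate the connection instead of looping on it, while EAGAIN should deschedule the vc until the socket is readable again.

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical dispatch on the raw-read return value. In the bug, this
// value was dropped entirely, so EOF and hard errors spun until the
// inactivity timeout instead of closing, and EAGAIN kept the vc hot.
enum class ReadResult { CONTINUE, CLOSE_EOF, CLOSE_ERROR, WAIT_FOR_DATA };

ReadResult handle_raw_read(long r) {
  if (r == 0)
    return ReadResult::CLOSE_EOF;      // peer closed: terminate now
  if (r == -EAGAIN)
    return ReadResult::WAIT_FOR_DATA;  // deschedule until readable
  if (r < 0)
    return ReadResult::CLOSE_ERROR;    // real socket error: terminate
  return ReadResult::CONTINUE;         // got r bytes, keep reading
}
```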
[jira] [Commented] (TS-3667) SSL Handshake read does not correctly handle EOF and error cases
[ https://issues.apache.org/jira/browse/TS-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14637019#comment-14637019 ] Susan Hinrichs commented on TS-3667: Filed TS-3784 to track the locking debug assert. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TS-3775) ASAN crash while running regression test Cache_vol
Susan Hinrichs created TS-3775: -- Summary: ASAN crash while running regression test Cache_vol Key: TS-3775 URL: https://issues.apache.org/jira/browse/TS-3775 Project: Traffic Server Issue Type: Bug Components: Cache Reporter: Susan Hinrichs Seen while running master built with ASAN on FC 21. I have a patch which I'll attach and discuss in a comment.
{code}
REGRESSION TEST Cache_vol started
RPRINT Cache_vol: 1 128 Megabyte Volumes
RPRINT Cache_vol: Not enough space for 10 volume
RPRINT Cache_vol: Random Volumes after clearing the disks
RPRINT Cache_vol: volume=1 scheme=http size=128
RPRINT Cache_vol: Random Volumes without clearing the disks
RPRINT Cache_vol: volume=1 scheme=rtsp size=128
==4513==ERROR: AddressSanitizer: heap-use-after-free on address 0x6048e9e0 at pc 0x989546 bp 0x7fffef2a59b0 sp 0x7fffef2a59a0
READ of size 8 at 0x6048e9e0 thread T2 ([ET_NET 1])
    #0 0x989545 in cplist_update /home/shinrich/ats/iocore/cache/Cache.cc:2702
    #1 0x989545 in cplist_reconfigure() /home/shinrich/ats/iocore/cache/Cache.cc:2846
    #2 0x9d1186 in execute_and_verify(RegressionTest*) /home/shinrich/ats/iocore/cache/CacheHosting.cc:996
    #3 0x9d2229 in RegressionTest_Cache_vol(RegressionTest*, int, int*) /home/shinrich/ats/iocore/cache/CacheHosting.cc:842
    #4 0x76cb55f1 in start_test /home/shinrich/ats/lib/ts/Regression.cc:78
    #5 0x76cb55f1 in RegressionTest::run_some() /home/shinrich/ats/lib/ts/Regression.cc:126
    #6 0x76cb5b00 in RegressionTest::check_status() /home/shinrich/ats/lib/ts/Regression.cc:141
    #7 0x5404fb in RegressionCont::mainEvent(int, Event*) /home/shinrich/ats/proxy/Main.cc:1210
    #8 0xb6b771 in Continuation::handleEvent(int, void*) /home/shinrich/ats/iocore/eventsystem/I_Continuation.h:146
    #9 0xb6b771 in EThread::process_event(Event*, int) /home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:128
    #10 0xb6d3a6 in EThread::execute() /home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:207
    #11 0xb69da1 in spawn_thread_internal /home/shinrich/ats/iocore/eventsystem/Thread.cc:86
    #12 0x75e27529 in start_thread (/lib64/libpthread.so.0+0x7529)
    #13 0x7464922c in __clone (/lib64/libc.so.6+0x10022c)
0x6048e9e0 is located 16 bytes inside of 40-byte region [0x6048e9d0,0x6048e9f8)
freed by thread T2 ([ET_NET 1]) here:
    #0 0x76f5764f in operator delete(void*) (/lib64/libasan.so.1+0x5864f)
    #1 0x9c84ac in CacheDisk::delete_volume(int) /home/shinrich/ats/iocore/cache/CacheDisk.cc:330
    #2 0x989455 in cplist_update /home/shinrich/ats/iocore/cache/Cache.cc:2684
    #3 0x989455 in cplist_reconfigure() /home/shinrich/ats/iocore/cache/Cache.cc:2846
    #4 0x9d1186 in execute_and_verify(RegressionTest*) /home/shinrich/ats/iocore/cache/CacheHosting.cc:996
    #5 0x9d2229 in RegressionTest_Cache_vol(RegressionTest*, int, int*) /home/shinrich/ats/iocore/cache/CacheHosting.cc:842
    #6 0x76cb55f1 in start_test /home/shinrich/ats/lib/ts/Regression.cc:78
    #7 0x76cb55f1 in RegressionTest::run_some() /home/shinrich/ats/lib/ts/Regression.cc:126
    #8 0x76cb5b00 in RegressionTest::check_status() /home/shinrich/ats/lib/ts/Regression.cc:141
    #9 0x5404fb in RegressionCont::mainEvent(int, Event*) /home/shinrich/ats/proxy/Main.cc:1210
    #10 0xb6b771 in Continuation::handleEvent(int, void*) /home/shinrich/ats/iocore/eventsystem/I_Continuation.h:146
    #11 0xb6b771 in EThread::process_event(Event*, int) /home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:128
    #12 0xb6d3a6 in EThread::execute() /home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:207
    #13 0xb69da1 in spawn_thread_internal /home/shinrich/ats/iocore/eventsystem/Thread.cc:86
    #14 0x75e27529 in start_thread (/lib64/libpthread.so.0+0x7529)
previously allocated by thread T2 ([ET_NET 1]) here:
    #0 0x76f5714f in operator new(unsigned long) (/lib64/libasan.so.1+0x5814f)
    #1 0x9c770d in CacheDisk::create_volume(int, long, int) /home/shinrich/ats/iocore/cache/CacheDisk.cc:296
    #2 0x98347e in create_volume /home/shinrich/ats/iocore/cache/Cache.cc:3023
    #3 0x989b41 in cplist_reconfigure() /home/shinrich/ats/iocore/cache/Cache.cc:2877
    #4 0x9d1186 in execute_and_verify(RegressionTest*) /home/shinrich/ats/iocore/cache/CacheHosting.cc:996
    #5 0x9d2229 in RegressionTest_Cache_vol(RegressionTest*, int, int*) /home/shinrich/ats/iocore/cache/CacheHosting.cc:842
    #6 0x76cb55f1 in start_test /home/shinrich/ats/lib/ts/Regression.cc:78
    #7 0x76cb55f1 in RegressionTest::run_some() /home/shinrich/ats/lib/ts/Regression.cc:126
    #8 0x76cb5b00 in RegressionTest::check_status() /home/shinrich/ats/lib/ts/Regression.cc:141
    #9 0x5404fb in RegressionCont::mainEvent(int, Event*)
[jira] [Updated] (TS-3775) ASAN crash while running regression test Cache_vol
[ https://issues.apache.org/jira/browse/TS-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susan Hinrichs updated TS-3775: --- Attachment: ts-3775.diff ts-3775.diff NULLs out the disk_vols entry after it is deleted in CacheDisk::delete_volume. This method shifts the other elements down in the array to cover the deleted item and decrements header->num_volumes. But cplist_update uses gndisks as the upper bound when iterating over the disk_vols array, and that number does not get decremented. So if the deleted item is the last one in the disk_vols array, its slot does not get overwritten and the deleted object is accessed on the next call to cplist_update. While this fixes the immediate problem, since cplist_update does do a NULL check, there is probably a more artful solution. Having multiple values tracking the length of disk_vols seems bad.
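The interaction described in the attachment comment can be modeled in a few lines. Disk and DiskVol here are simplified stand-ins for the real cache structures, not ATS code: delete_volume shifts the later entries down and decrements its own count, and the one-line fix is to null the vacated tail slot so a caller iterating with a stale, larger bound reads NULL instead of a freed pointer.

```cpp
#include <cassert>

// Simplified model of the delete_volume / cplist_update problem.
struct DiskVol {
  int vol_number;
};

struct Disk {
  DiskVol *disk_vols[4] = {};
  int num_volumes = 0;  // stand-in for header->num_volumes

  void add(DiskVol *dv) { disk_vols[num_volumes++] = dv; }

  void delete_volume(int vol) {
    for (int i = 0; i < num_volumes; ++i) {
      if (disk_vols[i] && disk_vols[i]->vol_number == vol) {
        delete disk_vols[i];
        // Shift the remaining entries down over the deleted item.
        for (int j = i; j < num_volumes - 1; ++j)
          disk_vols[j] = disk_vols[j + 1];
        // The fix: clear the now-vacated tail slot so iteration with a
        // stale upper bound sees NULL, not the freed object.
        disk_vols[num_volumes - 1] = nullptr;
        --num_volumes;
        return;
      }
    }
  }
};
```

A caller doing a NULL check while iterating past num_volumes now skips the slot harmlessly; without the nulling, that same read is the heap-use-after-free ASAN reports.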
[jira] [Resolved] (TS-1007) SSN Close called before TXN Close
[ https://issues.apache.org/jira/browse/TS-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susan Hinrichs resolved TS-1007. Resolution: Fixed SSN Close called before TXN Close - Key: TS-1007 URL: https://issues.apache.org/jira/browse/TS-1007 Project: Traffic Server Issue Type: Bug Components: TS API Affects Versions: 3.0.1 Reporter: Nick Kew Assignee: Susan Hinrichs Labels: incompatible Fix For: 6.0.0 Where a plugin implements both SSN_CLOSE_HOOK and TXN_CLOSE_HOOK, the SSN_CLOSE_HOOK is called first of the two. This messes up normal cleanups! Details: Register a SSN_START event globally. In the SSN_START handler, add a TXN_START and a SSN_CLOSE hook. In the TXN_START handler, add a TXN_CLOSE hook. Stepping through, I see the order of events actually called, for the simple case of a one-off HTTP request with no keepalive: SSN_START, TXN_START, SSN_END, TXN_END. Whoops, SSN_END cleaned up the SSN context, leaving dangling pointers in the TXN! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
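The ordering contract can be stated as a tiny check. This is a toy model, not the TS plugin API: per-session state freed in the SSN_CLOSE handler must still be alive when TXN_CLOSE fires, so every TXN_CLOSE has to precede its session's SSN_CLOSE.

```cpp
#include <cassert>
#include <vector>

// Toy model of close-hook dispatch order for one session with one
// transaction; the names mirror the JIRA description, not real TS types.
enum Event { TXN_CLOSE, SSN_CLOSE };

// Order the bug produced: session torn down before its transaction.
std::vector<Event> close_order_buggy() { return {SSN_CLOSE, TXN_CLOSE}; }

// Order the fix guarantees: transaction closes first.
std::vector<Event> close_order_fixed() { return {TXN_CLOSE, SSN_CLOSE}; }

// True iff the per-session state still exists when TXN_CLOSE runs,
// i.e. TXN_CLOSE is dispatched before SSN_CLOSE.
bool txn_sees_live_session(const std::vector<Event> &order) {
  for (Event e : order) {
    if (e == SSN_CLOSE)
      return false;  // session context already freed: dangling pointers
    if (e == TXN_CLOSE)
      return true;
  }
  return false;
}
```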
[jira] [Updated] (TS-1007) SSN Close called before TXN Close
[ https://issues.apache.org/jira/browse/TS-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susan Hinrichs updated TS-1007: --- Fix Version/s: (was: 6.0.0) 6.1.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TS-1007) SSN Close called before TXN Close
[ https://issues.apache.org/jira/browse/TS-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susan Hinrichs updated TS-1007: --- Backport to Version: 6.0.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (TS-3775) ASAN crash while running regression test Cache_vol
[ https://issues.apache.org/jira/browse/TS-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susan Hinrichs reopened TS-3775: Assignee: Susan Hinrichs Haven't yet committed the diff.
[jira] [Issue Comment Deleted] (TS-3775) ASAN crash while running regression test Cache_vol
[ https://issues.apache.org/jira/browse/TS-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susan Hinrichs updated TS-3775: --- Comment: was deleted (was: Haven't yet committed the diff.)
[jira] [Closed] (TS-3775) ASAN crash while running regression test Cache_vol
[ https://issues.apache.org/jira/browse/TS-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Susan Hinrichs closed TS-3775.
------------------------------
    Resolution: Duplicate

ASAN crash while running regression test Cache_vol
--------------------------------------------------

                Key: TS-3775
                URL: https://issues.apache.org/jira/browse/TS-3775
            Project: Traffic Server
         Issue Type: Bug
         Components: Cache
           Reporter: Susan Hinrichs
           Assignee: Susan Hinrichs
        Attachments: ts-3775.diff

Seen while running master built with ASAN on FC 21. I have a patch which I'll attach and discuss in comment.

{code}
REGRESSION TEST Cache_vol started
RPRINT Cache_vol: 1 128 Megabyte Volumes
RPRINT Cache_vol: Not enough space for 10 volume
RPRINT Cache_vol: Random Volumes after clearing the disks
RPRINT Cache_vol: volume=1 scheme=http size=128
RPRINT Cache_vol: Random Volumes without clearing the disks
RPRINT Cache_vol: volume=1 scheme=rtsp size=128
==4513==ERROR: AddressSanitizer: heap-use-after-free on address 0x6048e9e0 at pc 0x989546 bp 0x7fffef2a59b0 sp 0x7fffef2a59a0
READ of size 8 at 0x6048e9e0 thread T2 ([ET_NET 1])
    #0 0x989545 in cplist_update /home/shinrich/ats/iocore/cache/Cache.cc:2702
    #1 0x989545 in cplist_reconfigure() /home/shinrich/ats/iocore/cache/Cache.cc:2846
    #2 0x9d1186 in execute_and_verify(RegressionTest*) /home/shinrich/ats/iocore/cache/CacheHosting.cc:996
    #3 0x9d2229 in RegressionTest_Cache_vol(RegressionTest*, int, int*) /home/shinrich/ats/iocore/cache/CacheHosting.cc:842
    #4 0x76cb55f1 in start_test /home/shinrich/ats/lib/ts/Regression.cc:78
    #5 0x76cb55f1 in RegressionTest::run_some() /home/shinrich/ats/lib/ts/Regression.cc:126
    #6 0x76cb5b00 in RegressionTest::check_status() /home/shinrich/ats/lib/ts/Regression.cc:141
    #7 0x5404fb in RegressionCont::mainEvent(int, Event*) /home/shinrich/ats/proxy/Main.cc:1210
    #8 0xb6b771 in Continuation::handleEvent(int, void*) /home/shinrich/ats/iocore/eventsystem/I_Continuation.h:146
    #9 0xb6b771 in EThread::process_event(Event*, int) /home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:128
    #10 0xb6d3a6 in EThread::execute() /home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:207
    #11 0xb69da1 in spawn_thread_internal /home/shinrich/ats/iocore/eventsystem/Thread.cc:86
    #12 0x75e27529 in start_thread (/lib64/libpthread.so.0+0x7529)
    #13 0x7464922c in __clone (/lib64/libc.so.6+0x10022c)

0x6048e9e0 is located 16 bytes inside of 40-byte region [0x6048e9d0,0x6048e9f8)
freed by thread T2 ([ET_NET 1]) here:
    #0 0x76f5764f in operator delete(void*) (/lib64/libasan.so.1+0x5864f)
    #1 0x9c84ac in CacheDisk::delete_volume(int) /home/shinrich/ats/iocore/cache/CacheDisk.cc:330
    #2 0x989455 in cplist_update /home/shinrich/ats/iocore/cache/Cache.cc:2684
    #3 0x989455 in cplist_reconfigure() /home/shinrich/ats/iocore/cache/Cache.cc:2846
    #4 0x9d1186 in execute_and_verify(RegressionTest*) /home/shinrich/ats/iocore/cache/CacheHosting.cc:996
    #5 0x9d2229 in RegressionTest_Cache_vol(RegressionTest*, int, int*) /home/shinrich/ats/iocore/cache/CacheHosting.cc:842
    #6 0x76cb55f1 in start_test /home/shinrich/ats/lib/ts/Regression.cc:78
    #7 0x76cb55f1 in RegressionTest::run_some() /home/shinrich/ats/lib/ts/Regression.cc:126
    #8 0x76cb5b00 in RegressionTest::check_status() /home/shinrich/ats/lib/ts/Regression.cc:141
    #9 0x5404fb in RegressionCont::mainEvent(int, Event*) /home/shinrich/ats/proxy/Main.cc:1210
    #10 0xb6b771 in Continuation::handleEvent(int, void*) /home/shinrich/ats/iocore/eventsystem/I_Continuation.h:146
    #11 0xb6b771 in EThread::process_event(Event*, int) /home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:128
    #12 0xb6d3a6 in EThread::execute() /home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:207
    #13 0xb69da1 in spawn_thread_internal /home/shinrich/ats/iocore/eventsystem/Thread.cc:86
    #14 0x75e27529 in start_thread (/lib64/libpthread.so.0+0x7529)

previously allocated by thread T2 ([ET_NET 1]) here:
    #0 0x76f5714f in operator new(unsigned long) (/lib64/libasan.so.1+0x5814f)
    #1 0x9c770d in CacheDisk::create_volume(int, long, int) /home/shinrich/ats/iocore/cache/CacheDisk.cc:296
    #2 0x98347e in create_volume /home/shinrich/ats/iocore/cache/Cache.cc:3023
    #3 0x989b41 in cplist_reconfigure() /home/shinrich/ats/iocore/cache/Cache.cc:2877
    #4 0x9d1186 in execute_and_verify(RegressionTest*) /home/shinrich/ats/iocore/cache/CacheHosting.cc:996
    #5 0x9d2229 in RegressionTest_Cache_vol(RegressionTest*, int, int*) /home/shinrich/ats/iocore/cache/CacheHosting.cc:842
{code}
[jira] [Reopened] (TS-3667) SSL Handhake read does not correctly handle EOF and error cases
[ https://issues.apache.org/jira/browse/TS-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Susan Hinrichs reopened TS-3667:
--------------------------------

Reopening to address the patch [~oknet] provides.

SSL Handhake read does not correctly handle EOF and error cases
---------------------------------------------------------------

                Key: TS-3667
                URL: https://issues.apache.org/jira/browse/TS-3667
            Project: Traffic Server
         Issue Type: Bug
         Components: SSL
   Affects Versions: 5.2.0, 5.3.0
           Reporter: Susan Hinrichs
           Assignee: Susan Hinrichs
            Fix For: 5.3.1, 6.0.0
        Attachments: ts-3667.diff

Reported by [~esproul] and postwait. The return value of SSLNetVConnection::read_raw_data() is being ignored, so EOF and error cases do not terminate the connection; instead it spins until the inactivity timeout is reached. Likewise, an EAGAIN result does not deschedule the read until more data is available. This results in higher CPU utilization and hitting the SSL_error() function far more often than necessary.

-- 
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (TS-3775) ASAN crash while running regression test Cache_vol
[ https://issues.apache.org/jira/browse/TS-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632421#comment-14632421 ]

Susan Hinrichs commented on TS-3775:
------------------------------------

Yes, looks like the same thing.

ASAN crash while running regression test Cache_vol
--------------------------------------------------

                Key: TS-3775
                URL: https://issues.apache.org/jira/browse/TS-3775
            Project: Traffic Server
         Issue Type: Bug
         Components: Cache
           Reporter: Susan Hinrichs
        Attachments: ts-3775.diff

Seen while running master built with ASAN on FC 21. I have a patch which I'll attach and discuss in comment.

{code}
REGRESSION TEST Cache_vol started
RPRINT Cache_vol: 1 128 Megabyte Volumes
RPRINT Cache_vol: Not enough space for 10 volume
RPRINT Cache_vol: Random Volumes after clearing the disks
RPRINT Cache_vol: volume=1 scheme=http size=128
RPRINT Cache_vol: Random Volumes without clearing the disks
RPRINT Cache_vol: volume=1 scheme=rtsp size=128
==4513==ERROR: AddressSanitizer: heap-use-after-free on address 0x6048e9e0 at pc 0x989546 bp 0x7fffef2a59b0 sp 0x7fffef2a59a0
READ of size 8 at 0x6048e9e0 thread T2 ([ET_NET 1])
    #0 0x989545 in cplist_update /home/shinrich/ats/iocore/cache/Cache.cc:2702
    #1 0x989545 in cplist_reconfigure() /home/shinrich/ats/iocore/cache/Cache.cc:2846
    #2 0x9d1186 in execute_and_verify(RegressionTest*) /home/shinrich/ats/iocore/cache/CacheHosting.cc:996
    #3 0x9d2229 in RegressionTest_Cache_vol(RegressionTest*, int, int*) /home/shinrich/ats/iocore/cache/CacheHosting.cc:842
    #4 0x76cb55f1 in start_test /home/shinrich/ats/lib/ts/Regression.cc:78
    #5 0x76cb55f1 in RegressionTest::run_some() /home/shinrich/ats/lib/ts/Regression.cc:126
    #6 0x76cb5b00 in RegressionTest::check_status() /home/shinrich/ats/lib/ts/Regression.cc:141
    #7 0x5404fb in RegressionCont::mainEvent(int, Event*) /home/shinrich/ats/proxy/Main.cc:1210
    #8 0xb6b771 in Continuation::handleEvent(int, void*) /home/shinrich/ats/iocore/eventsystem/I_Continuation.h:146
    #9 0xb6b771 in EThread::process_event(Event*, int) /home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:128
    #10 0xb6d3a6 in EThread::execute() /home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:207
    #11 0xb69da1 in spawn_thread_internal /home/shinrich/ats/iocore/eventsystem/Thread.cc:86
    #12 0x75e27529 in start_thread (/lib64/libpthread.so.0+0x7529)
    #13 0x7464922c in __clone (/lib64/libc.so.6+0x10022c)

0x6048e9e0 is located 16 bytes inside of 40-byte region [0x6048e9d0,0x6048e9f8)
freed by thread T2 ([ET_NET 1]) here:
    #0 0x76f5764f in operator delete(void*) (/lib64/libasan.so.1+0x5864f)
    #1 0x9c84ac in CacheDisk::delete_volume(int) /home/shinrich/ats/iocore/cache/CacheDisk.cc:330
    #2 0x989455 in cplist_update /home/shinrich/ats/iocore/cache/Cache.cc:2684
    #3 0x989455 in cplist_reconfigure() /home/shinrich/ats/iocore/cache/Cache.cc:2846
    #4 0x9d1186 in execute_and_verify(RegressionTest*) /home/shinrich/ats/iocore/cache/CacheHosting.cc:996
    #5 0x9d2229 in RegressionTest_Cache_vol(RegressionTest*, int, int*) /home/shinrich/ats/iocore/cache/CacheHosting.cc:842
    #6 0x76cb55f1 in start_test /home/shinrich/ats/lib/ts/Regression.cc:78
    #7 0x76cb55f1 in RegressionTest::run_some() /home/shinrich/ats/lib/ts/Regression.cc:126
    #8 0x76cb5b00 in RegressionTest::check_status() /home/shinrich/ats/lib/ts/Regression.cc:141
    #9 0x5404fb in RegressionCont::mainEvent(int, Event*) /home/shinrich/ats/proxy/Main.cc:1210
    #10 0xb6b771 in Continuation::handleEvent(int, void*) /home/shinrich/ats/iocore/eventsystem/I_Continuation.h:146
    #11 0xb6b771 in EThread::process_event(Event*, int) /home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:128
    #12 0xb6d3a6 in EThread::execute() /home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:207
    #13 0xb69da1 in spawn_thread_internal /home/shinrich/ats/iocore/eventsystem/Thread.cc:86
    #14 0x75e27529 in start_thread (/lib64/libpthread.so.0+0x7529)

previously allocated by thread T2 ([ET_NET 1]) here:
    #0 0x76f5714f in operator new(unsigned long) (/lib64/libasan.so.1+0x5814f)
    #1 0x9c770d in CacheDisk::create_volume(int, long, int) /home/shinrich/ats/iocore/cache/CacheDisk.cc:296
    #2 0x98347e in create_volume /home/shinrich/ats/iocore/cache/Cache.cc:3023
    #3 0x989b41 in cplist_reconfigure() /home/shinrich/ats/iocore/cache/Cache.cc:2877
    #4 0x9d1186 in execute_and_verify(RegressionTest*) /home/shinrich/ats/iocore/cache/CacheHosting.cc:996
    #5 0x9d2229 in RegressionTest_Cache_vol(RegressionTest*, int, int*) /home/shinrich/ats/iocore/cache/CacheHosting.cc:842
    #6
{code}
[jira] [Commented] (TS-3710) Crash in TLS with 6.0.0, related to the session cleanup additions
[ https://issues.apache.org/jira/browse/TS-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633650#comment-14633650 ]

Susan Hinrichs commented on TS-3710:
------------------------------------

I'd definitely try the do_io_read(NULL, 0, NULL) in ioCompletionEvent before the send_plugin_event calls. That should clear read.vio._cont before the trampoline is deleted. Based on the cores I looked at over the weekend, the problem continuation definitely looked like the SSLNextProtocolTrampoline. That was also the last patch [~zwoop] tried; while it might have slowed the problem down, it did not stop it completely.

Crash in TLS with 6.0.0, related to the session cleanup additions
-----------------------------------------------------------------

                Key: TS-3710
                URL: https://issues.apache.org/jira/browse/TS-3710
            Project: Traffic Server
         Issue Type: Bug
         Components: SSL
   Affects Versions: 5.3.0
           Reporter: Leif Hedstrom
           Assignee: Susan Hinrichs
           Priority: Critical
             Labels: yahoo
            Fix For: 6.1.0
        Attachments: ts-3710-2.diff, ts-3710-final-2.diff, ts-3710.diff

{code}
==9570==ERROR: AddressSanitizer: heap-use-after-free on address 0x60649f48 at pc 0xb9f969 bp 0x2b8dbc348920 sp 0x2b8dbc348918
READ of size 8 at 0x60649f48 thread T8 ([ET_NET 7])
    #0 0xb9f968 in Continuation::handleEvent(int, void*) ../../iocore/eventsystem/I_Continuation.h:145
    #1 0xb9f968 in read_signal_and_update /usr/local/src/trafficserver/iocore/net/UnixNetVConnection.cc:142
    #2 0xb9f968 in UnixNetVConnection::mainEvent(int, Event*) /usr/local/src/trafficserver/iocore/net/UnixNetVConnection.cc:1115
    #3 0xb7daf7 in Continuation::handleEvent(int, void*) ../../iocore/eventsystem/I_Continuation.h:145
    #4 0xb7daf7 in InactivityCop::check_inactivity(int, Event*) /usr/local/src/trafficserver/iocore/net/UnixNet.cc:102
    #5 0xc21ffe in Continuation::handleEvent(int, void*) /usr/local/src/trafficserver/iocore/eventsystem/I_Continuation.h:145
    #6 0xc21ffe in EThread::process_event(Event*, int) /usr/local/src/trafficserver/iocore/eventsystem/UnixEThread.cc:128
    #7 0xc241f7 in EThread::execute() /usr/local/src/trafficserver/iocore/eventsystem/UnixEThread.cc:207
    #8 0xc20c18 in spawn_thread_internal /usr/local/src/trafficserver/iocore/eventsystem/Thread.cc:85
    #9 0x2b8db3ff6df4 in start_thread (/lib64/libpthread.so.0+0x7df4)
    #10 0x2b8db585f1ac in __clone (/lib64/libc.so.6+0xf61ac)

0x60649f48 is located 8 bytes inside of 56-byte region [0x60649f40,0x60649f78)
freed by thread T8 ([ET_NET 7]) here:
    #0 0x2b8db1bf3117 in operator delete(void*) ../../.././libsanitizer/asan/asan_new_delete.cc:81
    #1 0xb5b20e in SSLNextProtocolTrampoline::ioCompletionEvent(int, void*) /usr/local/src/trafficserver/iocore/net/SSLNextProtocolAccept.cc:89
    #2 0xbb2eef in Continuation::handleEvent(int, void*) ../../iocore/eventsystem/I_Continuation.h:145
    #3 0xbb2eef in read_signal_and_update /usr/local/src/trafficserver/iocore/net/UnixNetVConnection.cc:142
    #4 0xbb2eef in read_signal_done /usr/local/src/trafficserver/iocore/net/UnixNetVConnection.cc:203
    #5 0xbb2eef in UnixNetVConnection::readSignalDone(int, NetHandler*) /usr/local/src/trafficserver/iocore/net/UnixNetVConnection.cc:957
    #6 0xb55d6d in SSLNetVConnection::net_read_io(NetHandler*, EThread*) /usr/local/src/trafficserver/iocore/net/SSLNetVConnection.cc:480
    #7 0xb748fc in NetHandler::mainNetEvent(int, Event*) /usr/local/src/trafficserver/iocore/net/UnixNet.cc:516
    #8 0xc24e89 in Continuation::handleEvent(int, void*) /usr/local/src/trafficserver/iocore/eventsystem/I_Continuation.h:145
    #9 0xc24e89 in EThread::process_event(Event*, int) /usr/local/src/trafficserver/iocore/eventsystem/UnixEThread.cc:128
    #10 0xc24e89 in EThread::execute() /usr/local/src/trafficserver/iocore/eventsystem/UnixEThread.cc:252
    #11 0xc20c18 in spawn_thread_internal /usr/local/src/trafficserver/iocore/eventsystem/Thread.cc:85
    #12 0x2b8db3ff6df4 in start_thread (/lib64/libpthread.so.0+0x7df4)

previously allocated by thread T8 ([ET_NET 7]) here:
    #0 0x2b8db1bf2c9f in operator new(unsigned long) ../../.././libsanitizer/asan/asan_new_delete.cc:50
    #1 0xb59f8b in SSLNextProtocolAccept::mainEvent(int, void*) /usr/local/src/trafficserver/iocore/net/SSLNextProtocolAccept.cc:134
    #2 0xb888e9 in Continuation::handleEvent(int, void*) ../../iocore/eventsystem/I_Continuation.h:145
    #3 0xb888e9 in NetAccept::acceptFastEvent(int, void*) /usr/local/src/trafficserver/iocore/net/UnixNetAccept.cc:466
    #4 0xc24e89 in Continuation::handleEvent(int, void*) /usr/local/src/trafficserver/iocore/eventsystem/I_Continuation.h:145
    #5 0xc24e89 in
{code}
[jira] [Commented] (TS-1007) SSN Close called before TXN Close
[ https://issues.apache.org/jira/browse/TS-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625200#comment-14625200 ]

Susan Hinrichs commented on TS-1007:
------------------------------------

I think TS-3612 will address the nested session open/session close cases of SPDY, H2, and similar future protocols.

SSN Close called before TXN Close
---------------------------------

                Key: TS-1007
                URL: https://issues.apache.org/jira/browse/TS-1007
            Project: Traffic Server
         Issue Type: Bug
         Components: TS API
   Affects Versions: 3.0.1
           Reporter: Nick Kew
           Assignee: Susan Hinrichs
             Labels: incompatible
            Fix For: 6.0.0

Where a plugin implements both SSN_CLOSE_HOOK and TXN_CLOSE_HOOK, the SSN_CLOSE_HOOK is called first of the two. This messes up normal cleanups!

Details:
- Register a SSN_START event globally
- In the SSN START, add a TXN_START and a SSN_CLOSE
- In the TXN START, add a TXN_CLOSE

Stepping through, I see the order of events actually called, for the simple case of a one-off HTTP request with no keepalive:
SSN_START
TXN_START
SSN_END
TXN_END

Whoops, SSN_END cleaned up the SSN context, leaving dangling pointers in the TXN!

-- 
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (TS-1007) SSN Close called before TXN Close
[ https://issues.apache.org/jira/browse/TS-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625025#comment-14625025 ]

Susan Hinrichs commented on TS-1007:
------------------------------------

I've made a fix so that the transaction close occurs before the session close. We moved the ua_session->do_io_close into HttpSM::kill_this(). We still get nested sessions for the SPDY and H2 cases. I'll file another bug to track that, since it isn't the same issue as this one.

SSN Close called before TXN Close
---------------------------------

                Key: TS-1007
                URL: https://issues.apache.org/jira/browse/TS-1007
            Project: Traffic Server
         Issue Type: Bug
         Components: TS API
   Affects Versions: 3.0.1
           Reporter: Nick Kew
           Assignee: Susan Hinrichs
             Labels: incompatible
            Fix For: 6.0.0

Where a plugin implements both SSN_CLOSE_HOOK and TXN_CLOSE_HOOK, the SSN_CLOSE_HOOK is called first of the two. This messes up normal cleanups!

Details:
- Register a SSN_START event globally
- In the SSN START, add a TXN_START and a SSN_CLOSE
- In the TXN START, add a TXN_CLOSE

Stepping through, I see the order of events actually called, for the simple case of a one-off HTTP request with no keepalive:
SSN_START
TXN_START
SSN_END
TXN_END

Whoops, SSN_END cleaned up the SSN context, leaving dangling pointers in the TXN!

-- 
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (TS-3746) We need to make proxy.config.ssl.client.verify.server overridable
[ https://issues.apache.org/jira/browse/TS-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625047#comment-14625047 ]

Susan Hinrichs commented on TS-3746:
------------------------------------

By the time you are taking an already existing session out of the pool, the certificate has been verified (or not). I suppose you could set up remap rules for the same domain that resolve to the same origin server domain with conflicting values for the verify setting; then whether the origin server certificate is verified depends on which remap rule initiated the connection. But if the user is really concerned about only verifying certs for one set of domains versus another, I wouldn't expect them to write such a conflicting set of remap rules.

Agreed, just a list of origins would be more straightforward in some sense, but since so much already hangs on the remap rules, that is the obvious place for it in the minds of many current ATS deployers.

[~persiaAziz] and [~davet] are testing a version using the override config approach. Should have a PR for review soon.

We need to make proxy.config.ssl.client.verify.server overridable
-----------------------------------------------------------------

                Key: TS-3746
                URL: https://issues.apache.org/jira/browse/TS-3746
            Project: Traffic Server
         Issue Type: New Feature
         Components: Configuration
           Reporter: Syeda Persia Aziz
             Labels: Yahoo
            Fix For: sometime

We need to make proxy.config.ssl.client.verify.server overridable. Some origin servers need validation to avoid MITM attacks while others don't.

-- 
This message was sent by Atlassian JIRA
(v6.3.4#6332)
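The per-remap override the comment describes could look like the following hypothetical remap.config fragment. This assumes the setting actually becomes overridable through the conf_remap plugin, which is what the version under test proposes; the exact key name may differ from what finally ships.

```
# Hypothetical: verify the origin certificate only for the first mapping.
map https://secure.example.com/ https://origin.example.com/ \
    @plugin=conf_remap.so @pparam=proxy.config.ssl.client.verify.server=1
map https://legacy.example.com/ https://old-origin.example.com/ \
    @plugin=conf_remap.so @pparam=proxy.config.ssl.client.verify.server=0
```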
[jira] [Updated] (TS-3746) We need to make proxy.config.ssl.client.verify.server overridable
[ https://issues.apache.org/jira/browse/TS-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Susan Hinrichs updated TS-3746:
-------------------------------
    Assignee: Dave Thompson

We need to make proxy.config.ssl.client.verify.server overridable
-----------------------------------------------------------------

                Key: TS-3746
                URL: https://issues.apache.org/jira/browse/TS-3746
            Project: Traffic Server
         Issue Type: New Feature
         Components: Configuration
           Reporter: Syeda Persia Aziz
           Assignee: Dave Thompson
             Labels: Yahoo
            Fix For: sometime

We need to make proxy.config.ssl.client.verify.server overridable. Some origin servers need validation to avoid MITM attacks while others don't.

-- 
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (TS-3667) SSL Handhake read does not correctly handle EOF and error cases
[ https://issues.apache.org/jira/browse/TS-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634109#comment-14634109 ]

Susan Hinrichs commented on TS-3667:
------------------------------------

[~oknet] how do things fail for you without this patch? I don't doubt that you have a problem, but from your fix it isn't immediately obvious to me what the fix was. Thanks.

SSL Handhake read does not correctly handle EOF and error cases
---------------------------------------------------------------

                Key: TS-3667
                URL: https://issues.apache.org/jira/browse/TS-3667
            Project: Traffic Server
         Issue Type: Bug
         Components: SSL
   Affects Versions: 5.2.0, 5.3.0
           Reporter: Susan Hinrichs
           Assignee: Susan Hinrichs
            Fix For: 5.3.1, 6.0.0
        Attachments: ts-3667.diff

Reported by [~esproul] and postwait. The return value of SSLNetVConnection::read_raw_data() is being ignored, so EOF and error cases do not terminate the connection; instead it spins until the inactivity timeout is reached. Likewise, an EAGAIN result does not deschedule the read until more data is available. This results in higher CPU utilization and hitting the SSL_error() function far more often than necessary.

-- 
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Closed] (TS-3746) We need to make proxy.config.ssl.client.verify.server overridable
[ https://issues.apache.org/jira/browse/TS-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Susan Hinrichs closed TS-3746.
------------------------------
       Resolution: Won't Fix
    Fix Version/s: (was: sometime)

See discussion on PR. We will not pursue this further here.

We need to make proxy.config.ssl.client.verify.server overridable
-----------------------------------------------------------------

                Key: TS-3746
                URL: https://issues.apache.org/jira/browse/TS-3746
            Project: Traffic Server
         Issue Type: New Feature
         Components: Configuration
           Reporter: Syeda Persia Aziz
           Assignee: Dave Thompson
             Labels: Yahoo

We need to make proxy.config.ssl.client.verify.server overridable. Some origin servers need validation to avoid MITM attacks while others don't.

-- 
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Resolved] (TS-3596) TSHttpTxnPluginTagGet() returns fetchSM over H2
[ https://issues.apache.org/jira/browse/TS-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Susan Hinrichs resolved TS-3596.
--------------------------------
    Resolution: Fixed

Fixed via TS-3476

TSHttpTxnPluginTagGet() returns fetchSM over H2
-----------------------------------------------

                Key: TS-3596
                URL: https://issues.apache.org/jira/browse/TS-3596
            Project: Traffic Server
         Issue Type: Bug
         Components: HTTP/2
           Reporter: Scott Beardsley
             Labels: yahoo
            Fix For: 6.1.0

This should probably return something else, right? Maybe HTTP2 instead? We would like a way to identify H2 requests from SPDY and/or H1.

-- 
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (TS-3777) TSHttpConnect and POST request does not fire TS_VCONN_READ_COMPLETE nor TS_VCONN_EOS
[ https://issues.apache.org/jira/browse/TS-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710072#comment-14710072 ]

Susan Hinrichs commented on TS-3777:
------------------------------------

Good point. Rearranged code to do so.

TSHttpConnect and POST request does not fire TS_VCONN_READ_COMPLETE nor TS_VCONN_EOS
------------------------------------------------------------------------------------

                Key: TS-3777
                URL: https://issues.apache.org/jira/browse/TS-3777
            Project: Traffic Server
         Issue Type: Bug
         Components: TS API
           Reporter: Daniel Vitor Morilha
           Assignee: Susan Hinrichs
             Labels: yahoo
            Fix For: 6.1.0
        Attachments: ts-3777-2.diff, ts-3777-3.diff, ts-3777-4.diff, ts-3777.diff

When using TSHttpConnect to connect to ATS itself (an internal vconnection), sending a POST request, and receiving a chunked response, ATS fires neither TS_VCONN_READ_COMPLETE nor TS_VCONN_EOS. Trying to close the vconnection from the plug-in after receiving the last chunk (\r\n0\r\n) results in the PluginVC repeating the following message:
{noformat}
[Jul 14 21:24:06.094] Server {0x77fbe800} DEBUG: (pvc_event) [0] Passive: Received event 1
{noformat}
I am glad to provide an example if that helps.

-- 
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Closed] (TS-3970) Core in PluginVC
[ https://issues.apache.org/jira/browse/TS-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Susan Hinrichs closed TS-3970.
------------------------------
    Resolution: Invalid

Turns out the problem was in the plugin. Their cleanup code called TSIOBufferDestroy() before TSIOBufferReaderFree() on a reader for that buffer. This revealed a use-after-free error that was found quickly with ASAN. With this error, it was possible to have the buffer reallocated and effectively have readers on the newly reallocated buffer cleared randomly.

> Core in PluginVC
> ----------------
>
>                 Key: TS-3970
>                 URL: https://issues.apache.org/jira/browse/TS-3970
>             Project: Traffic Server
>          Issue Type: Bug
>            Reporter: Susan Hinrichs
>           Assignee: Susan Hinrichs
>              Labels: crash
>             Fix For: 6.1.0
>
>         Attachments: ts-3970.diff
>
>
> One of our plugins moving from 5.0.1 to 5.3.x (plus 6.0 backports) started seeing the following stack trace with high frequency.
> {code}
> Program terminated with signal 11, Segmentation fault.
> #0  0x0054c232 in PluginVC::process_read_side (this=0x2b9ace2f3850, other_side_call=true) at PluginVC.cc:638
> in PluginVC.cc
> #0  0x0054c232 in PluginVC::process_read_side (this=0x2b9ace2f3850, other_side_call=true) at PluginVC.cc:638
> #1  0x0054be2a in PluginVC::process_write_side (this=0x2b9ace2f3a40, other_side_call=false) at PluginVC.cc:555
> #2  0x0054acdb in PluginVC::main_handler (this=0x2b9ace2f3a40, event=1, data=0x2b9b1e32e930) at PluginVC.cc:208
> #3  0x00510c84 in Continuation::handleEvent (this=0x2b9ace2f3a40, event=1, data=0x2b9b1e32e930) at ../iocore/eventsystem/I_Continuation.h:145
> #4  0x0079a2a6 in EThread::process_event (this=0x2b9ab63d9010, e=0x2b9b1e32e930, calling_code=1) at UnixEThread.cc:128
> #5  0x0079a474 in EThread::execute (this=0x2b9ab63d9010) at UnixEThread.cc:179
> #6  0x00799851 in spawn_thread_internal (a=0x2fea360) at Thread.cc:85
> #7  0x2b9ab457e9d1 in start_thread () from /lib64/libpthread.so.0
> #8  0x0030d38e88fd in clone () from /lib64/libc.so.6
> {code}
> The output buffer fetched by PluginVC::process_read_side was NULL.
> I think the reason this appears in 5.3 is due to the fix for TS-3522. Before that change only one do_io_read was made very early to set up the read from server. This bug fix delays the real read to later and pulls mbuf out of server_buffer_reader. In some cases for this plugin, the mbuf is NULL by the time we get there.
> I fixed the core by using server_session->read_buffer in the do_io_read instead of server_buffer_reader->mbuf. This seems to fix the problem.

-- 
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Assigned] (TS-3970) Core in PluginVC
[ https://issues.apache.org/jira/browse/TS-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Susan Hinrichs reassigned TS-3970:
----------------------------------

    Assignee: Susan Hinrichs

> Core in PluginVC
> ----------------
>
>                 Key: TS-3970
>                 URL: https://issues.apache.org/jira/browse/TS-3970
>             Project: Traffic Server
>          Issue Type: Bug
>            Reporter: Susan Hinrichs
>           Assignee: Susan Hinrichs
>
> One of our plugins moving from 5.0.1 to 5.3.x (plus 6.0 backports) started seeing the following stack trace with high frequency.
> {code}
> Program terminated with signal 11, Segmentation fault.
> #0  0x0054c232 in PluginVC::process_read_side (this=0x2b9ace2f3850, other_side_call=true) at PluginVC.cc:638
> in PluginVC.cc
> #0  0x0054c232 in PluginVC::process_read_side (this=0x2b9ace2f3850, other_side_call=true) at PluginVC.cc:638
> #1  0x0054be2a in PluginVC::process_write_side (this=0x2b9ace2f3a40, other_side_call=false) at PluginVC.cc:555
> #2  0x0054acdb in PluginVC::main_handler (this=0x2b9ace2f3a40, event=1, data=0x2b9b1e32e930) at PluginVC.cc:208
> #3  0x00510c84 in Continuation::handleEvent (this=0x2b9ace2f3a40, event=1, data=0x2b9b1e32e930) at ../iocore/eventsystem/I_Continuation.h:145
> #4  0x0079a2a6 in EThread::process_event (this=0x2b9ab63d9010, e=0x2b9b1e32e930, calling_code=1) at UnixEThread.cc:128
> #5  0x0079a474 in EThread::execute (this=0x2b9ab63d9010) at UnixEThread.cc:179
> #6  0x00799851 in spawn_thread_internal (a=0x2fea360) at Thread.cc:85
> #7  0x2b9ab457e9d1 in start_thread () from /lib64/libpthread.so.0
> #8  0x0030d38e88fd in clone () from /lib64/libc.so.6
> {code}
> The output buffer fetched by PluginVC::process_read_side was NULL.
> I think the reason this appears in 5.3 is due to the fix for TS-3522. Before that change only one do_io_read was made very early to set up the read from server. This bug fix delays the real read to later and pulls mbuf out of server_buffer_reader. In some cases for this plugin, the mbuf is NULL by the time we get there.
> I fixed the core by using server_session->read_buffer in the do_io_read instead of server_buffer_reader->mbuf. This seems to fix the problem.

-- 
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (TS-3970) Core in PluginVC
[ https://issues.apache.org/jira/browse/TS-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Susan Hinrichs updated TS-3970:
-------------------------------
    Attachment: ts-3970.diff

ts-3970.diff contains the code changes that fixed this crash on our build.

> Core in PluginVC
> ----------------
>
>                 Key: TS-3970
>                 URL: https://issues.apache.org/jira/browse/TS-3970
>             Project: Traffic Server
>          Issue Type: Bug
>            Reporter: Susan Hinrichs
>           Assignee: Susan Hinrichs
>
>         Attachments: ts-3970.diff
>
>
> One of our plugins moving from 5.0.1 to 5.3.x (plus 6.0 backports) started seeing the following stack trace with high frequency.
> {code}
> Program terminated with signal 11, Segmentation fault.
> #0  0x0054c232 in PluginVC::process_read_side (this=0x2b9ace2f3850, other_side_call=true) at PluginVC.cc:638
> in PluginVC.cc
> #0  0x0054c232 in PluginVC::process_read_side (this=0x2b9ace2f3850, other_side_call=true) at PluginVC.cc:638
> #1  0x0054be2a in PluginVC::process_write_side (this=0x2b9ace2f3a40, other_side_call=false) at PluginVC.cc:555
> #2  0x0054acdb in PluginVC::main_handler (this=0x2b9ace2f3a40, event=1, data=0x2b9b1e32e930) at PluginVC.cc:208
> #3  0x00510c84 in Continuation::handleEvent (this=0x2b9ace2f3a40, event=1, data=0x2b9b1e32e930) at ../iocore/eventsystem/I_Continuation.h:145
> #4  0x0079a2a6 in EThread::process_event (this=0x2b9ab63d9010, e=0x2b9b1e32e930, calling_code=1) at UnixEThread.cc:128
> #5  0x0079a474 in EThread::execute (this=0x2b9ab63d9010) at UnixEThread.cc:179
> #6  0x00799851 in spawn_thread_internal (a=0x2fea360) at Thread.cc:85
> #7  0x2b9ab457e9d1 in start_thread () from /lib64/libpthread.so.0
> #8  0x0030d38e88fd in clone () from /lib64/libc.so.6
> {code}
> The output buffer fetched by PluginVC::process_read_side was NULL.
> I think the reason this appears in 5.3 is due to the fix for TS-3522. Before that change only one do_io_read was made very early to set up the read from server. This bug fix delays the real read to later and pulls mbuf out of server_buffer_reader. In some cases for this plugin, the mbuf is NULL by the time we get there.
> I fixed the core by using server_session->read_buffer in the do_io_read instead of server_buffer_reader->mbuf. This seems to fix the problem.

-- 
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (TS-3970) Core in PluginVC
Susan Hinrichs created TS-3970:
----------------------------------

             Summary: Core in PluginVC
                 Key: TS-3970
                 URL: https://issues.apache.org/jira/browse/TS-3970
             Project: Traffic Server
          Issue Type: Bug
            Reporter: Susan Hinrichs

One of our plugins moving from 5.0.1 to 5.3.x (plus 6.0 backports) started seeing the following stack trace with high frequency.

{code}
Program terminated with signal 11, Segmentation fault.
#0  0x0054c232 in PluginVC::process_read_side (this=0x2b9ace2f3850, other_side_call=true) at PluginVC.cc:638
in PluginVC.cc
#0  0x0054c232 in PluginVC::process_read_side (this=0x2b9ace2f3850, other_side_call=true) at PluginVC.cc:638
#1  0x0054be2a in PluginVC::process_write_side (this=0x2b9ace2f3a40, other_side_call=false) at PluginVC.cc:555
#2  0x0054acdb in PluginVC::main_handler (this=0x2b9ace2f3a40, event=1, data=0x2b9b1e32e930) at PluginVC.cc:208
#3  0x00510c84 in Continuation::handleEvent (this=0x2b9ace2f3a40, event=1, data=0x2b9b1e32e930) at ../iocore/eventsystem/I_Continuation.h:145
#4  0x0079a2a6 in EThread::process_event (this=0x2b9ab63d9010, e=0x2b9b1e32e930, calling_code=1) at UnixEThread.cc:128
#5  0x0079a474 in EThread::execute (this=0x2b9ab63d9010) at UnixEThread.cc:179
#6  0x00799851 in spawn_thread_internal (a=0x2fea360) at Thread.cc:85
#7  0x2b9ab457e9d1 in start_thread () from /lib64/libpthread.so.0
#8  0x0030d38e88fd in clone () from /lib64/libc.so.6
{code}

The output buffer fetched by PluginVC::process_read_side was NULL.

I think the reason this appears in 5.3 is due to the fix for TS-3522. Before that change, only one do_io_read was made, very early, to set up the read from the server. This bug fix delays the real read to later and pulls mbuf out of server_buffer_reader. In some cases for this plugin, the mbuf is NULL by the time we get there.

I fixed the core by using server_session->read_buffer in the do_io_read instead of server_buffer_reader->mbuf. This seems to fix the problem.

-- 
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (TS-3072) Debug logging for a single connection in production traffic.
[ https://issues.apache.org/jira/browse/TS-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948975#comment-14948975 ] Susan Hinrichs commented on TS-3072: I ran some tests on my test harness machines. It is configured in proxy mode (no caching) making GET requests. 1KB objects are exchanged. Three requests per connection. No SSL. Single 1Gb interface. I ran my clients at not quite resource exhaustion to measure steady-state performance. Run cases: * Base: A build without client_ip debug code. diags.debug.enabled = 0 * New0: A build with client_ip debug code. diags.debug.enabled = 0 * New2-mismatch: A build with client_ip debug code. diags.debug.enabled = 2 and diags.debug.client_ip = IP address not involved in the test. When I also tried running with an IP matching one of my test clients and the http tag set, the client fell over. Base and New0 had very similar performance. About 56,600 rps. New2-mismatch was around 53,900 rps. So enabling the client_ip check even if nothing matches incurs around a 5% performance penalty. But this is only seen if the configuration is explicitly set from 0 to 2. In the depths of an investigation, this might be an acceptable performance penalty. > Debug logging for a single connection in production traffic. > > > Key: TS-3072 > URL: https://issues.apache.org/jira/browse/TS-3072 > Project: Traffic Server > Issue Type: Improvement > Components: Core, Logging >Affects Versions: 5.0.1 >Reporter: Sudheer Vinukonda >Assignee: Susan Hinrichs > Labels: Yahoo > Fix For: sometime > > Attachments: ts-3072.diff > > > Presently, when there's a production issue (e.g. TS-3049, TS-2983 etc), it is > really hard to isolate/debug with the high traffic. Turning on debug logs in > traffic is unfortunately not an option due to performance impacts. 
Even if > you took a performance hit and turned on the logs, it is just as hard to > separate out the logs for a single connection/transaction among the millions > of the logs output in a short period of time. > I think it would be good if there's a way to turn on debug logs in a > controlled manner in production environment. One simple option is to support > a config setting for example, with a client-ip, which when set, would turn on > debug logs for any connection made by just that one client. If needed, > instead of one client-ip, we may allow configuring up to 'n' (say, 5) > client-ips. > If there are other ideas, please comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
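The gating being benchmarked and proposed above can be sketched as follows. This is a hypothetical model, not the actual ATS diags API: the struct and function names (DebugConfig, should_debug) are illustrative, and only the two-step check (level first, then IP match) reflects the described design.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch of per-client-IP debug gating: debug output is
// emitted only when the diagnostic level is raised AND the connection's
// client IP matches the configured address. Illustrative names only.
struct DebugConfig {
  int enabled = 0;        // 0 = off (fast path), 2 = IP-filtered debugging
  std::string client_ip;  // empty means "no IP filter configured"
};

bool should_debug(const DebugConfig &cfg, const std::string &conn_client_ip) {
  if (cfg.enabled == 0)
    return false;  // fast path: the only cost when debugging is disabled
  if (cfg.enabled >= 2 && !cfg.client_ip.empty())
    return conn_client_ip == cfg.client_ip;  // log only the matching client
  return true;     // ordinary tag-based debugging
}
```

The benchmark numbers in the comment above correspond to this shape: with enabled = 0 the per-connection cost is a single integer compare, while enabled = 2 adds an IP comparison per connection even when nothing matches.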
[jira] [Assigned] (TS-3072) Debug logging for a single connection in production traffic.
[ https://issues.apache.org/jira/browse/TS-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susan Hinrichs reassigned TS-3072: -- Assignee: Susan Hinrichs > Debug logging for a single connection in production traffic. > > > Key: TS-3072 > URL: https://issues.apache.org/jira/browse/TS-3072 > Project: Traffic Server > Issue Type: Improvement > Components: Core, Logging >Affects Versions: 5.0.1 >Reporter: Sudheer Vinukonda >Assignee: Susan Hinrichs > Labels: Yahoo > Fix For: sometime > > Attachments: ts-3072.diff > > > Presently, when there's a production issue (e.g. TS-3049, TS-2983 etc), it is > really hard to isolate/debug with the high traffic. Turning on debug logs in > traffic is unfortunately not an option due to performance impacts. Even if > you took a performance hit and turned on the logs, it is just as hard to > separate out the logs for a single connection/transaction among the millions > of the logs output in a short period of time. > I think it would be good if there's a way to turn on debug logs in a > controlled manner in production environment. One simple option is to support > a config setting for example, with a client-ip, which when set, would turn on > debug logs for any connection made by just that one client. If needed, > instead of one client-ip, we may allow configuring up to 'n' (say, 5) > client-ips. > If there are other ideas, please comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-3072) Debug logging for a single connection in production traffic.
[ https://issues.apache.org/jira/browse/TS-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949174#comment-14949174 ] Susan Hinrichs commented on TS-3072: Poking at my test some more, the numbers are a bit higher. But based on my math, the single 1Gbps connection puts a hard limit of 65536 rps when running without caching and having each request exchange 1KB: max bytes sent in a second = 1024*1024*1024/8, and bytes sent in T transactions = 1024*T*2. Requiring 1024*1024*1024/8 > 1024*T*2 gives 1024*1024/16 > T, i.e. 65536 > T. > Debug logging for a single connection in production traffic. > > > Key: TS-3072 > URL: https://issues.apache.org/jira/browse/TS-3072 > Project: Traffic Server > Issue Type: Improvement > Components: Core, Logging >Affects Versions: 5.0.1 >Reporter: Sudheer Vinukonda >Assignee: Susan Hinrichs > Labels: Yahoo > Fix For: sometime > > Attachments: ts-3072.diff > > > Presently, when there's a production issue (e.g. TS-3049, TS-2983 etc), it is > really hard to isolate/debug with the high traffic. Turning on debug logs in > traffic is unfortunately not an option due to performance impacts. Even if > you took a performance hit and turned on the logs, it is just as hard to > separate out the logs for a single connection/transaction among the millions > of the logs output in a short period of time. > I think it would be good if there's a way to turn on debug logs in a > controlled manner in production environment. One simple option is to support > a config setting for example, with a client-ip, which when set, would turn on > debug logs for any connection made by just that one client. If needed, > instead of one client-ip, we may allow configuring up to 'n' (say, 5) > client-ips. > If there are other ideas, please comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
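The bandwidth bound in the comment above can be restated as a small compile-time calculation. It assumes, as in the test description, a 1 Gbps link and 1 KB exchanged in each direction per transaction; no real ATS code is involved.

```cpp
#include <cassert>

// The 1 Gbps hard limit from the comment above, as constants.
constexpr long max_bytes_per_sec = 1024L * 1024 * 1024 / 8; // 1 Gbps in bytes
constexpr long bytes_per_txn     = 1024 * 2;                // 1 KB each way
constexpr long max_txn_per_sec   = max_bytes_per_sec / bytes_per_txn;
```

Dividing 134,217,728 bytes/second by 2,048 bytes/transaction yields the 65,536 rps ceiling quoted above.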
[jira] [Commented] (TS-3894) Missing NULL checks in HttpSM::handle_server_setup_error
[ https://issues.apache.org/jira/browse/TS-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14947457#comment-14947457 ] Susan Hinrichs commented on TS-3894: My day of typos writing up git commit comments. This is the commit that belongs to this issue. commit b3fab36196dc143283364b56b0db802e4dd81bad Author: shinrich Date: Tue Oct 6 14:00:44 2015 -0500 TS-3984 - Missing NULL checks in HttpSM::handler_server_setup_error. > Missing NULL checks in HttpSM::handle_server_setup_error > > > Key: TS-3894 > URL: https://issues.apache.org/jira/browse/TS-3894 > Project: Traffic Server > Issue Type: Bug > Components: HTTP >Reporter: Susan Hinrichs >Assignee: Susan Hinrichs > Fix For: 6.1.0 > > > In error cases, there may not be a consumer when expected. Missing NULL > checks on the consumer variable c can result in crashes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (TS-3894) Missing NULL checks in HttpSM::handle_server_setup_error
[ https://issues.apache.org/jira/browse/TS-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susan Hinrichs resolved TS-3894. Resolution: Fixed > Missing NULL checks in HttpSM::handle_server_setup_error > > > Key: TS-3894 > URL: https://issues.apache.org/jira/browse/TS-3894 > Project: Traffic Server > Issue Type: Bug > Components: HTTP >Reporter: Susan Hinrichs >Assignee: Susan Hinrichs > Fix For: 6.1.0 > > > In error cases, there may not be a consumer when expected. Missing NULL > checks on the consumer variable c can result in crashes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (TS-3710) Crash in TLS with 6.0.0, related to the session cleanup additions
[ https://issues.apache.org/jira/browse/TS-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susan Hinrichs closed TS-3710. -- Resolution: Fixed > Crash in TLS with 6.0.0, related to the session cleanup additions > - > > Key: TS-3710 > URL: https://issues.apache.org/jira/browse/TS-3710 > Project: Traffic Server > Issue Type: Bug > Components: SSL >Affects Versions: 5.3.0 >Reporter: Leif Hedstrom >Assignee: Susan Hinrichs >Priority: Critical > Labels: yahoo > Fix For: 6.1.0 > > Attachments: ts-3710-2.diff, ts-3710-8-26-15.diff, > ts-3710-final-2.diff, ts-3710.diff > > > {code} > ==9570==ERROR: AddressSanitizer: heap-use-after-free on address > 0x60649f48 at pc 0xb9f969 bp 0x2b8dbc348920 sp 0x2b8dbc348918 > READ of size 8 at 0x60649f48 thread T8 ([ET_NET 7]) > #0 0xb9f968 in Continuation::handleEvent(int, void*) > ../../iocore/eventsystem/I_Continuation.h:145 > #1 0xb9f968 in read_signal_and_update > /usr/local/src/trafficserver/iocore/net/UnixNetVConnection.cc:142 > #2 0xb9f968 in UnixNetVConnection::mainEvent(int, Event*) > /usr/local/src/trafficserver/iocore/net/UnixNetVConnection.cc:1115 > #3 0xb7daf7 in Continuation::handleEvent(int, void*) > ../../iocore/eventsystem/I_Continuation.h:145 > #4 0xb7daf7 in InactivityCop::check_inactivity(int, Event*) > /usr/local/src/trafficserver/iocore/net/UnixNet.cc:102 > #5 0xc21ffe in Continuation::handleEvent(int, void*) > /usr/local/src/trafficserver/iocore/eventsystem/I_Continuation.h:145 > #6 0xc21ffe in EThread::process_event(Event*, int) > /usr/local/src/trafficserver/iocore/eventsystem/UnixEThread.cc:128 > #7 0xc241f7 in EThread::execute() > /usr/local/src/trafficserver/iocore/eventsystem/UnixEThread.cc:207 > #8 0xc20c18 in spawn_thread_internal > /usr/local/src/trafficserver/iocore/eventsystem/Thread.cc:85 > #9 0x2b8db3ff6df4 in start_thread (/lib64/libpthread.so.0+0x7df4) > #10 0x2b8db585f1ac in __clone (/lib64/libc.so.6+0xf61ac) > 0x60649f48 is located 8 bytes inside of 56-byte region > 
[0x60649f40,0x60649f78) > freed by thread T8 ([ET_NET 7]) here: > #0 0x2b8db1bf3117 in operator delete(void*) > ../../.././libsanitizer/asan/asan_new_delete.cc:81 > #1 0xb5b20e in SSLNextProtocolTrampoline::ioCompletionEvent(int, void*) > /usr/local/src/trafficserver/iocore/net/SSLNextProtocolAccept.cc:89 > #2 0xbb2eef in Continuation::handleEvent(int, void*) > ../../iocore/eventsystem/I_Continuation.h:145 > #3 0xbb2eef in read_signal_and_update > /usr/local/src/trafficserver/iocore/net/UnixNetVConnection.cc:142 > #4 0xbb2eef in read_signal_done > /usr/local/src/trafficserver/iocore/net/UnixNetVConnection.cc:203 > #5 0xbb2eef in UnixNetVConnection::readSignalDone(int, NetHandler*) > /usr/local/src/trafficserver/iocore/net/UnixNetVConnection.cc:957 > #6 0xb55d6d in SSLNetVConnection::net_read_io(NetHandler*, EThread*) > /usr/local/src/trafficserver/iocore/net/SSLNetVConnection.cc:480 > #7 0xb748fc in NetHandler::mainNetEvent(int, Event*) > /usr/local/src/trafficserver/iocore/net/UnixNet.cc:516 > #8 0xc24e89 in Continuation::handleEvent(int, void*) > /usr/local/src/trafficserver/iocore/eventsystem/I_Continuation.h:145 > #9 0xc24e89 in EThread::process_event(Event*, int) > /usr/local/src/trafficserver/iocore/eventsystem/UnixEThread.cc:128 > #10 0xc24e89 in EThread::execute() > /usr/local/src/trafficserver/iocore/eventsystem/UnixEThread.cc:252 > #11 0xc20c18 in spawn_thread_internal > /usr/local/src/trafficserver/iocore/eventsystem/Thread.cc:85 > #12 0x2b8db3ff6df4 in start_thread (/lib64/libpthread.so.0+0x7df4) > previously allocated by thread T8 ([ET_NET 7]) here: > #0 0x2b8db1bf2c9f in operator new(unsigned long) > ../../.././libsanitizer/asan/asan_new_delete.cc:50 > #1 0xb59f8b in SSLNextProtocolAccept::mainEvent(int, void*) > /usr/local/src/trafficserver/iocore/net/SSLNextProtocolAccept.cc:134 > #2 0xb888e9 in Continuation::handleEvent(int, void*) > ../../iocore/eventsystem/I_Continuation.h:145 > #3 0xb888e9 in NetAccept::acceptFastEvent(int, void*) > 
/usr/local/src/trafficserver/iocore/net/UnixNetAccept.cc:466 > #4 0xc24e89 in Continuation::handleEvent(int, void*) > /usr/local/src/trafficserver/iocore/eventsystem/I_Continuation.h:145 > #5 0xc24e89 in EThread::process_event(Event*, int) > /usr/local/src/trafficserver/iocore/eventsystem/UnixEThread.cc:128 > #6 0xc24e89 in EThread::execute() > /usr/local/src/trafficserver/iocore/eventsystem/UnixEThread.cc:252 > #7 0xc20c18 in spawn_thread_internal > /usr/local/src/trafficserver/iocore/eventsystem/Thread.cc:85 > #8 0x2b8db3ff6df4 in start_thread (/lib64/libpthread.so.0+0x7df4) > Thread
[jira] [Commented] (TS-3894) Missing NULL checks in HttpSM::handle_server_setup_error
[ https://issues.apache.org/jira/browse/TS-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945581#comment-14945581 ] Susan Hinrichs commented on TS-3894: We have been running with this change in production starting 9/4/2015. Have not seen this crash since. > Missing NULL checks in HttpSM::handle_server_setup_error > > > Key: TS-3894 > URL: https://issues.apache.org/jira/browse/TS-3894 > Project: Traffic Server > Issue Type: Bug > Components: HTTP >Reporter: Susan Hinrichs >Assignee: Susan Hinrichs > Fix For: 6.1.0 > > > In error cases, there may not be a consumer when expected. Missing NULL > checks on the consumer variable c can result in crashes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
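The guard described in TS-3894 ("the consumer variable c may be NULL in error cases") can be modeled minimally. The types and names below are illustrative stand-ins, not the real HttpSM internals; the only thing taken from the issue is the pattern of checking the consumer pointer before dereferencing it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a tunnel consumer.
struct Consumer {
  bool alive = true;
};

// Returns true if the consumer was handled, false if it was missing --
// the error-path case that previously dereferenced a NULL pointer.
bool handle_setup_error(Consumer *c) {
  if (c == nullptr)
    return false;    // the added NULL check: bail out instead of crashing
  c->alive = false;  // normal teardown of the consumer
  return true;
}
```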
[jira] [Resolved] (TS-3957) Core dump from SpdyClientSession::state_session_start
[ https://issues.apache.org/jira/browse/TS-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susan Hinrichs resolved TS-3957. Resolution: Fixed > Core dump from SpdyClientSession::state_session_start > - > > Key: TS-3957 > URL: https://issues.apache.org/jira/browse/TS-3957 > Project: Traffic Server > Issue Type: Bug > Components: SPDY >Reporter: Susan Hinrichs >Assignee: Susan Hinrichs > Labels: yahoo > Fix For: 6.1.0 > > > We see this in production on machines under swap, so the timings are very > distorted. > {code} > gdb) bt > #0 0x in ?? () > #1 0x0064a5dc in SpdyClientSession::state_session_start > (this=0x2b234fbe8030) > at SpdyClientSession.cc:211 > #2 0x00510e34 in Continuation::handleEvent (this=0x2b234fbe8030, > event=1, > data=0x2b23eda76630) at ../iocore/eventsystem/I_Continuation.h:145 > #3 0x0079a066 in EThread::process_event (this=0x2b21170a2010, > e=0x2b23eda76630, > calling_code=1) at UnixEThread.cc:128 > #4 0x0079a234 in EThread::execute (this=0x2b21170a2010) at > UnixEThread.cc:179 > #5 0x00799611 in spawn_thread_internal (a=0x12226a0) at Thread.cc:85 > #6 0x2b21153e19d1 in start_thread () from /lib64/libpthread.so.0 > #7 0x003827ee88fd in clone () from /lib64/libc.so.6 > {code} > After poking around on the core some more [~amc] and I determined that the vc > referenced by the SpdyClientSession was a freed object (the vtable pointer > was swizzled out to be the freelist next pointer). > We assume that the swapping is causing very odd event timing. We replaced > the schedule_immediate with a direct call that that seemed to solve our crash > in production. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TS-3957) Core dump from SpdyClientSession::state_session_start
[ https://issues.apache.org/jira/browse/TS-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susan Hinrichs updated TS-3957: --- Fix Version/s: 6.1.0 > Core dump from SpdyClientSession::state_session_start > - > > Key: TS-3957 > URL: https://issues.apache.org/jira/browse/TS-3957 > Project: Traffic Server > Issue Type: Bug > Components: SPDY >Reporter: Susan Hinrichs >Assignee: Susan Hinrichs > Labels: yahoo > Fix For: 6.1.0 > > > We see this in production on machines under swap, so the timings are very > distorted. > {code} > gdb) bt > #0 0x in ?? () > #1 0x0064a5dc in SpdyClientSession::state_session_start > (this=0x2b234fbe8030) > at SpdyClientSession.cc:211 > #2 0x00510e34 in Continuation::handleEvent (this=0x2b234fbe8030, > event=1, > data=0x2b23eda76630) at ../iocore/eventsystem/I_Continuation.h:145 > #3 0x0079a066 in EThread::process_event (this=0x2b21170a2010, > e=0x2b23eda76630, > calling_code=1) at UnixEThread.cc:128 > #4 0x0079a234 in EThread::execute (this=0x2b21170a2010) at > UnixEThread.cc:179 > #5 0x00799611 in spawn_thread_internal (a=0x12226a0) at Thread.cc:85 > #6 0x2b21153e19d1 in start_thread () from /lib64/libpthread.so.0 > #7 0x003827ee88fd in clone () from /lib64/libc.so.6 > {code} > After poking around on the core some more [~amc] and I determined that the vc > referenced by the SpdyClientSession was a freed object (the vtable pointer > was swizzled out to be the freelist next pointer). > We assume that the swapping is causing very odd event timing. We replaced > the schedule_immediate with a direct call that that seemed to solve our crash > in production. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (TS-3901) Leaking connections from HttpSessionManager
[ https://issues.apache.org/jira/browse/TS-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susan Hinrichs resolved TS-3901. Resolution: Fixed > Leaking connections from HttpSessionManager > --- > > Key: TS-3901 > URL: https://issues.apache.org/jira/browse/TS-3901 > Project: Traffic Server > Issue Type: Bug > Components: HTTP >Reporter: Susan Hinrichs >Assignee: Susan Hinrichs > Labels: yahoo > Fix For: 6.1.0 > > Attachments: ts-3901.diff > > > Observed in production. Got the following warnings in diags.log > "Connection leak from http keep-alive system" > Our connections to origin would increase and the number of connections in > CLOSE_WAIT were enormous. > I think the issue was when the origin URL was http with default port. That > URL was remapped to https with default port. The default port stored in > HttpServerSession->server_ip was not updated. > When the connection was closed or timed out of the session pool, it would be > looked up with port 443. But the session was stored via the server_ip value > with port 80 and would never match. > Relatively small change in HTTPHdr::_file_target_cache. > Running the fix in production to verify early results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
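The TS-3901 leak above comes down to a key mismatch: the session pool stored sessions under the pre-remap default port (80) but looked them up under the post-remap port (443). A hypothetical sketch of the corrected keying (stand-in types; not the actual HttpSessionManager code):

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the session-pool key. The fix is to derive the
// port from the scheme actually used to reach the origin (post-remap),
// so that store and lookup agree.
struct PoolKey {
  std::string ip;
  int port;
  bool operator==(const PoolKey &o) const {
    return ip == o.ip && port == o.port;
  }
};

int default_port(const std::string &scheme) {
  return scheme == "https" ? 443 : 80;
}

// Build the key from the final (post-remap) scheme. Using the pre-remap
// scheme here is the bug: sessions stored under port 80 are then looked
// up under port 443, never match, and leak in CLOSE_WAIT.
PoolKey make_key(const std::string &ip, const std::string &final_scheme) {
  return PoolKey{ip, default_port(final_scheme)};
}
```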
[jira] [Comment Edited] (TS-3894) Missing NULL checks in HttpSM::handle_server_setup_error
[ https://issues.apache.org/jira/browse/TS-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945581#comment-14945581 ] Susan Hinrichs edited comment on TS-3894 at 10/6/15 7:06 PM: - We have been running with this change in production starting 9/4/2015. Have not seen this crash since. The original crash stack was: {code} gdb) bt #0 0x005f5a45 in HttpSM::handle_server_setup_error (this=0x2bada4297f70, event=105, data=0x2bad410af588) at HttpSM.cc:5278 #1 0x005e98f9 in HttpSM::state_read_server_response_header (this=0x2bada4297f70, event=105, data=0x2bad410af588) at HttpSM.cc:1824 #2 0x005ec306 in HttpSM::main_handler (this=0x2bada4297f70, event=105, data=0x2bad410af588) at HttpSM.cc:2619 #3 0x00510de4 in Continuation::handleEvent (this=0x2bada4297f70, event=105, data=0x2bad410af588) at ../iocore/eventsystem/I_Continuation.h:145 #4 0x00778965 in read_signal_and_update (event=105, vc=0x2bad410af470) at UnixNetVConnection.cc:148 #5 0x0077bfdb in UnixNetVConnection::mainEvent (this=0x2bad410af470, event=1, e=0x17c5c90) at UnixNetVConnection.cc:1171 #6 0x00510de4 in Continuation::handleEvent (this=0x2bad410af470, event=1, data=0x17c5c90) at ../iocore/eventsystem/I_Continuation.h:145 #7 0x00772d47 in InactivityCop::check_inactivity (this=0x169b440, event=2, e=0x17c5c90) at UnixNet.cc:107 #8 0x00510de4 in Continuation::handleEvent (this=0x169b440, event=2, data=0x17c5c90) at ../iocore/eventsystem/I_Continuation.h:145 #9 0x007997ee in EThread::process_event (this=0x2baa860c4010, e=0x17c5c90, calling_code=2) at UnixEThread.cc:128 #10 0x00799b09 in EThread::execute (this=0x2baa860c4010) at UnixEThread.cc:207 #11 0x00798d99 in spawn_thread_internal (a=0x1691510) at Thread.cc:85 #12 0x2baa8491c9d1 in start_thread () from /lib64/libpthread.so.0 #13 0x0039522e88fd in clone () from /lib64/libc.so.6 {code} was (Author: shinrich): We have been running with this change in production starting 9/4/2015. Have not seen this crash since. 
> Missing NULL checks in HttpSM::handle_server_setup_error > > > Key: TS-3894 > URL: https://issues.apache.org/jira/browse/TS-3894 > Project: Traffic Server > Issue Type: Bug > Components: HTTP >Reporter: Susan Hinrichs >Assignee: Susan Hinrichs > Fix For: 6.1.0 > > > In error cases, there may not be a consumer when expected. Missing NULL > checks on the consumer variable c can result in crashes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TS-3710) Crash in TLS with 6.0.0, related to the session cleanup additions
[ https://issues.apache.org/jira/browse/TS-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susan Hinrichs updated TS-3710: --- Attachment: ts-3710-8-26-15.diff ts-3710-8-26.diff contains the changes we have been running in production since 8/26/2015. We haven't seen this crash on machines running with this build. This is very similar to the previous diffs. One slight difference is that we are canceling the read before the close case as well as the other cases. > Crash in TLS with 6.0.0, related to the session cleanup additions > - > > Key: TS-3710 > URL: https://issues.apache.org/jira/browse/TS-3710 > Project: Traffic Server > Issue Type: Bug > Components: SSL >Affects Versions: 5.3.0 >Reporter: Leif Hedstrom >Assignee: Susan Hinrichs >Priority: Critical > Labels: yahoo > Fix For: 6.1.0 > > Attachments: ts-3710-2.diff, ts-3710-8-26-15.diff, > ts-3710-final-2.diff, ts-3710.diff > > > {code} > ==9570==ERROR: AddressSanitizer: heap-use-after-free on address > 0x60649f48 at pc 0xb9f969 bp 0x2b8dbc348920 sp 0x2b8dbc348918 > READ of size 8 at 0x60649f48 thread T8 ([ET_NET 7]) > #0 0xb9f968 in Continuation::handleEvent(int, void*) > ../../iocore/eventsystem/I_Continuation.h:145 > #1 0xb9f968 in read_signal_and_update > /usr/local/src/trafficserver/iocore/net/UnixNetVConnection.cc:142 > #2 0xb9f968 in UnixNetVConnection::mainEvent(int, Event*) > /usr/local/src/trafficserver/iocore/net/UnixNetVConnection.cc:1115 > #3 0xb7daf7 in Continuation::handleEvent(int, void*) > ../../iocore/eventsystem/I_Continuation.h:145 > #4 0xb7daf7 in InactivityCop::check_inactivity(int, Event*) > /usr/local/src/trafficserver/iocore/net/UnixNet.cc:102 > #5 0xc21ffe in Continuation::handleEvent(int, void*) > /usr/local/src/trafficserver/iocore/eventsystem/I_Continuation.h:145 > #6 0xc21ffe in EThread::process_event(Event*, int) > /usr/local/src/trafficserver/iocore/eventsystem/UnixEThread.cc:128 > #7 0xc241f7 in EThread::execute() > 
/usr/local/src/trafficserver/iocore/eventsystem/UnixEThread.cc:207 > #8 0xc20c18 in spawn_thread_internal > /usr/local/src/trafficserver/iocore/eventsystem/Thread.cc:85 > #9 0x2b8db3ff6df4 in start_thread (/lib64/libpthread.so.0+0x7df4) > #10 0x2b8db585f1ac in __clone (/lib64/libc.so.6+0xf61ac) > 0x60649f48 is located 8 bytes inside of 56-byte region > [0x60649f40,0x60649f78) > freed by thread T8 ([ET_NET 7]) here: > #0 0x2b8db1bf3117 in operator delete(void*) > ../../.././libsanitizer/asan/asan_new_delete.cc:81 > #1 0xb5b20e in SSLNextProtocolTrampoline::ioCompletionEvent(int, void*) > /usr/local/src/trafficserver/iocore/net/SSLNextProtocolAccept.cc:89 > #2 0xbb2eef in Continuation::handleEvent(int, void*) > ../../iocore/eventsystem/I_Continuation.h:145 > #3 0xbb2eef in read_signal_and_update > /usr/local/src/trafficserver/iocore/net/UnixNetVConnection.cc:142 > #4 0xbb2eef in read_signal_done > /usr/local/src/trafficserver/iocore/net/UnixNetVConnection.cc:203 > #5 0xbb2eef in UnixNetVConnection::readSignalDone(int, NetHandler*) > /usr/local/src/trafficserver/iocore/net/UnixNetVConnection.cc:957 > #6 0xb55d6d in SSLNetVConnection::net_read_io(NetHandler*, EThread*) > /usr/local/src/trafficserver/iocore/net/SSLNetVConnection.cc:480 > #7 0xb748fc in NetHandler::mainNetEvent(int, Event*) > /usr/local/src/trafficserver/iocore/net/UnixNet.cc:516 > #8 0xc24e89 in Continuation::handleEvent(int, void*) > /usr/local/src/trafficserver/iocore/eventsystem/I_Continuation.h:145 > #9 0xc24e89 in EThread::process_event(Event*, int) > /usr/local/src/trafficserver/iocore/eventsystem/UnixEThread.cc:128 > #10 0xc24e89 in EThread::execute() > /usr/local/src/trafficserver/iocore/eventsystem/UnixEThread.cc:252 > #11 0xc20c18 in spawn_thread_internal > /usr/local/src/trafficserver/iocore/eventsystem/Thread.cc:85 > #12 0x2b8db3ff6df4 in start_thread (/lib64/libpthread.so.0+0x7df4) > previously allocated by thread T8 ([ET_NET 7]) here: > #0 0x2b8db1bf2c9f in operator new(unsigned long) > 
../../.././libsanitizer/asan/asan_new_delete.cc:50 > #1 0xb59f8b in SSLNextProtocolAccept::mainEvent(int, void*) > /usr/local/src/trafficserver/iocore/net/SSLNextProtocolAccept.cc:134 > #2 0xb888e9 in Continuation::handleEvent(int, void*) > ../../iocore/eventsystem/I_Continuation.h:145 > #3 0xb888e9 in NetAccept::acceptFastEvent(int, void*) > /usr/local/src/trafficserver/iocore/net/UnixNetAccept.cc:466 > #4 0xc24e89 in Continuation::handleEvent(int, void*) > /usr/local/src/trafficserver/iocore/eventsystem/I_Continuation.h:145 > #5 0xc24e89 in EThread::process_event(Event*, int) >
[jira] [Commented] (TS-3957) Core dump from SpdyClientSession::state_session_start
[ https://issues.apache.org/jira/browse/TS-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945533#comment-14945533 ] Susan Hinrichs commented on TS-3957: Our change was first partially put in production 9/4. We haven't seen any more crashes like this on that build. We have run into at least one resource storm that caused this problem originally. > Core dump from SpdyClientSession::state_session_start > - > > Key: TS-3957 > URL: https://issues.apache.org/jira/browse/TS-3957 > Project: Traffic Server > Issue Type: Bug > Components: SPDY >Reporter: Susan Hinrichs >Assignee: Susan Hinrichs > Labels: yahoo > > We see this in production on machines under swap, so the timings are very > distorted. > {code} > gdb) bt > #0 0x in ?? () > #1 0x0064a5dc in SpdyClientSession::state_session_start > (this=0x2b234fbe8030) > at SpdyClientSession.cc:211 > #2 0x00510e34 in Continuation::handleEvent (this=0x2b234fbe8030, > event=1, > data=0x2b23eda76630) at ../iocore/eventsystem/I_Continuation.h:145 > #3 0x0079a066 in EThread::process_event (this=0x2b21170a2010, > e=0x2b23eda76630, > calling_code=1) at UnixEThread.cc:128 > #4 0x0079a234 in EThread::execute (this=0x2b21170a2010) at > UnixEThread.cc:179 > #5 0x00799611 in spawn_thread_internal (a=0x12226a0) at Thread.cc:85 > #6 0x2b21153e19d1 in start_thread () from /lib64/libpthread.so.0 > #7 0x003827ee88fd in clone () from /lib64/libc.so.6 > {code} > After poking around on the core some more [~amc] and I determined that the vc > referenced by the SpdyClientSession was a freed object (the vtable pointer > was swizzled out to be the freelist next pointer). > We assume that the swapping is causing very odd event timing. We replaced > the schedule_immediate with a direct call that that seemed to solve our crash > in production. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
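The workaround described for TS-3957, replacing schedule_immediate with a direct call, can be sketched as below. These are simplified stand-in types, not the real SpdyClientSession or event system: the point is only that a direct call runs while the caller still knows the object is valid, whereas a scheduled event opens a window in which the backing vc can be freed before the handler fires.

```cpp
#include <cassert>

// Hypothetical stand-in for the session whose start handler was being
// scheduled as an immediate event.
struct Session {
  int started = 0;
  int state_session_start() { return ++started; }
};

// Before: something like eventProcessor.schedule_imm(session), so the
// handler runs later, possibly after the vc backing the session has
// been freed under resource pressure. After (direct call), there is no
// window between setup and start:
void start_session_directly(Session &s) {
  s.state_session_start();
}
```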
[jira] [Commented] (TS-3701) link Cache Promote Plugin document into index and fix spell in records.config.en.rst
[ https://issues.apache.org/jira/browse/TS-3701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945957#comment-14945957 ] Susan Hinrichs commented on TS-3701: Typoed the bug number in the commit. The commit above belongs to TS-3710. > link Cache Promote Plugin document into index and fix spell in > records.config.en.rst > > > Key: TS-3701 > URL: https://issues.apache.org/jira/browse/TS-3701 > Project: Traffic Server > Issue Type: Bug > Components: Docs >Reporter: Oknet Xu >Assignee: Jon Sime > Fix For: Docs > > > here is the patch: > {code} > diff --git a/doc/reference/configuration/records.config.en.rst > b/doc/reference/configuration/records.config.en.rst > index 2c7267b..5c203a6 100644 > --- a/doc/reference/configuration/records.config.en.rst > +++ b/doc/reference/configuration/records.config.en.rst > @@ -2017,7 +2017,7 @@ Logging Configuration > - ``log_name`` STRING [format] > The filename (ex. :ref:`squid log `). > > -- ``log_header_ STRING NULL > +- ``log_header`` STRING NULL > The file header text (ex. :ref:`squid log > `). > > The format can be either ``squid`` (Squid Format), ``common`` (Netscape > Common), ``extended`` (Netscape Extended), > diff --git a/doc/reference/plugins/index.en.rst > b/doc/reference/plugins/index.en.rst > index 0e43b87..722cc4c 100644 > --- a/doc/reference/plugins/index.en.rst > +++ b/doc/reference/plugins/index.en.rst > @@ -67,6 +67,7 @@ directory of the Apache Traffic Server source tree. 
> Experimental plugins can be >Background Fetch Plugin: allows you to proactively fetch content from > Origin in a way that it will fill the object into cache >Balancer Plugin: balances requests across multiple origin servers > >Buffer Upload Plugin: buffers POST data before connecting to the Origin > server > + Cache Promote Plugin: provides a means to control when an object should be > allowed to enter the cache >Combohandler Plugin: provides an intelligent way to combine multiple URLs > into a single URL, and have Apache Traffic Server combine the components into > one response >Epic Plugin: emits Traffic Server metrics in a format that is consumed tby > the Epic Network Monitoring System >ESI Plugin: implements the ESI specification > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-3710) Crash in TLS with 6.0.0, related to the session cleanup additions
[ https://issues.apache.org/jira/browse/TS-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945958#comment-14945958 ] Susan Hinrichs commented on TS-3710: Typoed the bug number in the commit comment. This commit belongs with this issue. Commit 1859562086b330eed6eda637f5f98a3431db5915 in trafficserver's branch refs/heads/master from shinrich [ https://git-wip-us.apache.org/repos/asf?p=trafficserver.git;h=1859562 ] TS-3701 - Crash in trampoline cleanup > Crash in TLS with 6.0.0, related to the session cleanup additions > - > > Key: TS-3710 > URL: https://issues.apache.org/jira/browse/TS-3710 > Project: Traffic Server > Issue Type: Bug > Components: SSL >Affects Versions: 5.3.0 >Reporter: Leif Hedstrom >Assignee: Susan Hinrichs >Priority: Critical > Labels: yahoo > Fix For: 6.1.0 > > Attachments: ts-3710-2.diff, ts-3710-8-26-15.diff, > ts-3710-final-2.diff, ts-3710.diff > > > {code} > ==9570==ERROR: AddressSanitizer: heap-use-after-free on address > 0x60649f48 at pc 0xb9f969 bp 0x2b8dbc348920 sp 0x2b8dbc348918 > READ of size 8 at 0x60649f48 thread T8 ([ET_NET 7]) > #0 0xb9f968 in Continuation::handleEvent(int, void*) > ../../iocore/eventsystem/I_Continuation.h:145 > #1 0xb9f968 in read_signal_and_update > /usr/local/src/trafficserver/iocore/net/UnixNetVConnection.cc:142 > #2 0xb9f968 in UnixNetVConnection::mainEvent(int, Event*) > /usr/local/src/trafficserver/iocore/net/UnixNetVConnection.cc:1115 > #3 0xb7daf7 in Continuation::handleEvent(int, void*) > ../../iocore/eventsystem/I_Continuation.h:145 > #4 0xb7daf7 in InactivityCop::check_inactivity(int, Event*) > /usr/local/src/trafficserver/iocore/net/UnixNet.cc:102 > #5 0xc21ffe in Continuation::handleEvent(int, void*) > /usr/local/src/trafficserver/iocore/eventsystem/I_Continuation.h:145 > #6 0xc21ffe in EThread::process_event(Event*, int) > /usr/local/src/trafficserver/iocore/eventsystem/UnixEThread.cc:128 > #7 0xc241f7 in EThread::execute() > 
/usr/local/src/trafficserver/iocore/eventsystem/UnixEThread.cc:207 > #8 0xc20c18 in spawn_thread_internal > /usr/local/src/trafficserver/iocore/eventsystem/Thread.cc:85 > #9 0x2b8db3ff6df4 in start_thread (/lib64/libpthread.so.0+0x7df4) > #10 0x2b8db585f1ac in __clone (/lib64/libc.so.6+0xf61ac) > 0x60649f48 is located 8 bytes inside of 56-byte region > [0x60649f40,0x60649f78) > freed by thread T8 ([ET_NET 7]) here: > #0 0x2b8db1bf3117 in operator delete(void*) > ../../.././libsanitizer/asan/asan_new_delete.cc:81 > #1 0xb5b20e in SSLNextProtocolTrampoline::ioCompletionEvent(int, void*) > /usr/local/src/trafficserver/iocore/net/SSLNextProtocolAccept.cc:89 > #2 0xbb2eef in Continuation::handleEvent(int, void*) > ../../iocore/eventsystem/I_Continuation.h:145 > #3 0xbb2eef in read_signal_and_update > /usr/local/src/trafficserver/iocore/net/UnixNetVConnection.cc:142 > #4 0xbb2eef in read_signal_done > /usr/local/src/trafficserver/iocore/net/UnixNetVConnection.cc:203 > #5 0xbb2eef in UnixNetVConnection::readSignalDone(int, NetHandler*) > /usr/local/src/trafficserver/iocore/net/UnixNetVConnection.cc:957 > #6 0xb55d6d in SSLNetVConnection::net_read_io(NetHandler*, EThread*) > /usr/local/src/trafficserver/iocore/net/SSLNetVConnection.cc:480 > #7 0xb748fc in NetHandler::mainNetEvent(int, Event*) > /usr/local/src/trafficserver/iocore/net/UnixNet.cc:516 > #8 0xc24e89 in Continuation::handleEvent(int, void*) > /usr/local/src/trafficserver/iocore/eventsystem/I_Continuation.h:145 > #9 0xc24e89 in EThread::process_event(Event*, int) > /usr/local/src/trafficserver/iocore/eventsystem/UnixEThread.cc:128 > #10 0xc24e89 in EThread::execute() > /usr/local/src/trafficserver/iocore/eventsystem/UnixEThread.cc:252 > #11 0xc20c18 in spawn_thread_internal > /usr/local/src/trafficserver/iocore/eventsystem/Thread.cc:85 > #12 0x2b8db3ff6df4 in start_thread (/lib64/libpthread.so.0+0x7df4) > previously allocated by thread T8 ([ET_NET 7]) here: > #0 0x2b8db1bf2c9f in operator new(unsigned long) > 
../../.././libsanitizer/asan/asan_new_delete.cc:50 > #1 0xb59f8b in SSLNextProtocolAccept::mainEvent(int, void*) > /usr/local/src/trafficserver/iocore/net/SSLNextProtocolAccept.cc:134 > #2 0xb888e9 in Continuation::handleEvent(int, void*) > ../../iocore/eventsystem/I_Continuation.h:145 > #3 0xb888e9 in NetAccept::acceptFastEvent(int, void*) > /usr/local/src/trafficserver/iocore/net/UnixNetAccept.cc:466 > #4 0xc24e89 in Continuation::handleEvent(int, void*) > /usr/local/src/trafficserver/iocore/eventsystem/I_Continuation.h:145 > #5 0xc24e89 in EThread::process_event(Event*, int) >
[jira] [Commented] (TS-3072) Debug logging for a single connection in production traffic.
[ https://issues.apache.org/jira/browse/TS-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14950607#comment-14950607 ] Susan Hinrichs commented on TS-3072: Ignore my previous performance numbers. I had some servers missing when I ran that. I ran a cache scenario on my 1GB network machine and 4 multi-threaded clients. The clients fetch a cached 512 byte item. I'm still testing three cases. In addition to rps changes, I noted the change in CPU % utilization from perf top for Diags::on and pthread_getspecific. Base (not including this code change): 163125 rps, 0.97% in Diags::on and 0.61% in pthread_getspecific. New (enable set to 0): 163169 rps, 1.19% in Diags::on and 0.58% in pthread_getspecific. New (enable set to 2): 162777 rps, 1.3% in Diags::on and 1.06% in pthread_getspecific. So the impact of enabling the Debug IP checking but not actually matching (and logging) seems pretty minimal. In these experiments, we spend roughly an extra 0.75% CPU and lose roughly 400 rps (a 0.2% reduction). No real impact in adding the code but leaving debug.enabled at 0. > Debug logging for a single connection in production traffic. > > > Key: TS-3072 > URL: https://issues.apache.org/jira/browse/TS-3072 > Project: Traffic Server > Issue Type: Improvement > Components: Core, Logging >Affects Versions: 5.0.1 >Reporter: Sudheer Vinukonda >Assignee: Susan Hinrichs > Labels: Yahoo > Fix For: sometime > > Attachments: ts-3072.diff > > > Presently, when there's a production issue (e.g. TS-3049, TS-2983 etc), it is > really hard to isolate/debug with the high traffic. Turning on debug logs in > traffic is unfortunately not an option due to performance impacts. Even if > you took a performance hit and turned on the logs, it is just as hard to > separate out the logs for a single connection/transaction among the millions > of the logs output in a short period of time. 
> I think it would be good if there's a way to turn on debug logs in a > controlled manner in production environment. One simple option is to support > a config setting for example, with a client-ip, which when set, would turn on > debug logs for any connection made by just that one client. If needed, > instead of one client-ip, we may allow configuring up to 'n' (say, 5) > client-ips. > If there are other ideas, please comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-3072) Debug logging for a single connection in production traffic.
[ https://issues.apache.org/jira/browse/TS-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14950359#comment-14950359 ] Susan Hinrichs commented on TS-3072: [~zwoop] thanks for reminding me of duplexing. My rps had risen above 65K and I was very confused. [~bcall] agreed that for real performance testing we should be working on machines with 10G interfaces. For the purposes of this issue though I don't think we really care about absolute performance. We just need an understanding of the penalty of enabling an IP-specific debug, and to verify that this code change doesn't affect performance if debug is turned off entirely. I'll spend some more time today sorting out my test setup, and post updated comparisons and add numbers for the cached case as well. > Debug logging for a single connection in production traffic. > > > Key: TS-3072 > URL: https://issues.apache.org/jira/browse/TS-3072 > Project: Traffic Server > Issue Type: Improvement > Components: Core, Logging >Affects Versions: 5.0.1 >Reporter: Sudheer Vinukonda >Assignee: Susan Hinrichs > Labels: Yahoo > Fix For: sometime > > Attachments: ts-3072.diff > > > Presently, when there's a production issue (e.g. TS-3049, TS-2983 etc), it is > really hard to isolate/debug with the high traffic. Turning on debug logs in > traffic is unfortunately not an option due to performance impacts. Even if > you took a performance hit and turned on the logs, it is just as hard to > separate out the logs for a single connection/transaction among the millions > of the logs output in a short period of time. > I think it would be good if there's a way to turn on debug logs in a > controlled manner in production environment. One simple option is to support > a config setting for example, with a client-ip, which when set, would turn on > debug logs for any connection made by just that one client. If needed, > instead of one client-ip, we may allow configuring up to 'n' (say, 5) > client-ips. 
> If there are other ideas, please comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-3072) Debug logging for a single connection in production traffic.
[ https://issues.apache.org/jira/browse/TS-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14950379#comment-14950379 ] Susan Hinrichs commented on TS-3072: [~jpe...@apache.org] let me ponder that some more. But here are my first thoughts. The transaction is not always (or easily) available from the VC level, which is where many useful debug messages lie. We could push the debug_override flag from the continuation down into the NetworkVC class. As it turns out I only ended up using the debug_override on the netVC's. So that would eliminate polluting the top Continuation class. It would be nice to not use thread local storage. The motivation to use thread local storage was to minimize code change, and ease the inclusion of future Debug messages into the conditional debug scheme. I'll look again to see if other data structures are always available to Diags at the point of debug decision making. One could add a plugin call to adjust the debug_override flag from the transaction object (assuming one could get access to the netvc from the transaction) or from the session object. Though I guess a tricky bit of doing a per-transaction debug the way I have things set up is debugging only one transaction but not the others on the same net VC (in the case of HTTP2 or SPDY). > Debug logging for a single connection in production traffic. > > > Key: TS-3072 > URL: https://issues.apache.org/jira/browse/TS-3072 > Project: Traffic Server > Issue Type: Improvement > Components: Core, Logging >Affects Versions: 5.0.1 >Reporter: Sudheer Vinukonda >Assignee: Susan Hinrichs > Labels: Yahoo > Fix For: sometime > > Attachments: ts-3072.diff > > > Presently, when there's a production issue (e.g. TS-3049, TS-2983 etc), it is > really hard to isolate/debug with the high traffic. Turning on debug logs in > traffic is unfortunately not an option due to performance impacts. 
Even if > you took a performance hit and turned on the logs, it is just as hard to > separate out the logs for a single connection/transaction among the millions > of the logs output in a short period of time. > I think it would be good if there's a way to turn on debug logs in a > controlled manner in production environment. One simple option is to support > a config setting for example, with a client-ip, which when set, would turn on > debug logs for any connection made by just that one client. If needed, > instead of one client-ip, we may allow configuring up to 'n' (say, 5) > client-ips. > If there are other ideas, please comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-3742) ATS advertises TLS ticket extension even when disabled
[ https://issues.apache.org/jira/browse/TS-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14618446#comment-14618446 ] Susan Hinrichs commented on TS-3742: Then your workaround for this issue until we can address it properly via TS-3371 is to add the following line to make an explicit default entry: dest_ip=* ssl_cert_name=certx.pem ssl_ticket_enabled=0. certx.pem can be one of your existing cert files, or a new key pair. The downside of this approach is that you will have a cert (probably bogus) for all SSL connection attempts. That may not be worth it just to clean up your ticket advertising. ATS advertises TLS ticket extension even when disabled -- Key: TS-3742 URL: https://issues.apache.org/jira/browse/TS-3742 Project: Traffic Server Issue Type: Bug Components: SSL Reporter: Susan Hinrichs Assignee: Susan Hinrichs Noted by [~hreindl]. Even if you have ssl_ticket_enabled=0 on the relevant line in ssl_multicert.config, the Server Hello message will still contain the ticket tls extension. The problem is the code is blindly resetting the ticket callback on the context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (TS-3742) ATS advertises TLS ticket extension even when disabled
[ https://issues.apache.org/jira/browse/TS-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susan Hinrichs closed TS-3742. -- Resolution: Won't Fix ATS advertises TLS ticket extension even when disabled -- Key: TS-3742 URL: https://issues.apache.org/jira/browse/TS-3742 Project: Traffic Server Issue Type: Bug Components: SSL Reporter: Susan Hinrichs Assignee: Susan Hinrichs Noted by [~hreindl]. Even if you have ssl_ticket_enabled=0 on the relevant line in ssl_multicert.config, the Server Hello message will still contain the ticket tls extension. The problem is the code is blindly resetting the ticket callback on the context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-3742) ATS advertises TLS ticket extension even when disabled
[ https://issues.apache.org/jira/browse/TS-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616676#comment-14616676 ] Susan Hinrichs commented on TS-3742: My original observation on the cause of this issue is incorrect. The real problem is that whether tickets are enabled or not is controlled by the default entry in ssl_multicert.config, or by the built-in default which is created if no '*' entry is present in ssl_multicert.config. The code dutifully sets or clears SSL_OP_NO_TICKET for each SSL_CTX based on the ssl_ticket_enabled flag (which is on by default). But by the time the code updates the SSL_CTX for the active SSL object in the SNI callback, the state about the tickets already seems to be set in the SSL object. I tried calling SSL_clear_options and SSL_set_options to make the SSL object have the same value as the SSL_CTX object with respect to the SSL_OP_NO_TICKET flag, but it did not change whether the server hello advertised tickets or not. It kept to the same state as was set on the original default SSL_CTX. So there seems to be no code change that will enable tickets by default but disable them for a particular entry (or vice versa). As it stands, the ssl_ticket_enabled on the default entry controls whether tickets are advertised. If there is no default entry, the built-in default will have tickets enabled. The solution seems to be to implement TS-3371 and provide a global enable/disable for tickets. My tests were done with openssl 1.0.1f. Things may vary between different versions of openssl. ATS advertises TLS ticket extension even when disabled -- Key: TS-3742 URL: https://issues.apache.org/jira/browse/TS-3742 Project: Traffic Server Issue Type: Bug Components: SSL Reporter: Susan Hinrichs Assignee: Susan Hinrichs Noted by [~hreindl]. Even if you have ssl_ticket_enabled=0 on the relevant line in ssl_multicert.config, the Server Hello message will still contain the ticket tls extension. 
The problem is the code is blindly resetting the ticket callback on the context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-3683) Add a tag to log SSL Session/Ticket HIT as well as TCP connection reused
[ https://issues.apache.org/jira/browse/TS-3683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620419#comment-14620419 ] Susan Hinrichs commented on TS-3683: Sorry, I had a git log mishap when pushing the commit. Instead of the nice single git log entry, the push went up as four commits. {code} da04362227ef91b27aa7d02e9238f1ceae68689d f3e13664ab20f60cb4bd2ffef1eb7d6a374a1698 5a4350e6067ac868e54538467ec83a9413853143 71752c741ac8b49d432dd4b13f5ea2a7f176b37e {code} Add a tag to log SSL Session/Ticket HIT as well as TCP connection reused Key: TS-3683 URL: https://issues.apache.org/jira/browse/TS-3683 Project: Traffic Server Issue Type: Improvement Components: Logging Reporter: François Pesce Assignee: Alan M. Carroll Labels: yahoo Fix For: 6.1.0 These tags would be useful for performance metrics collection: %cqtr The TCP reused status; indicates if this request went through an already established connection. %cqssr The SSL session/ticket reused status; indicates if this request hit the SSL session/ticket and avoided a full SSL handshake. both of them would display respectively 0 or 1 , if resp. not reused or reused. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TS-3683) Add a tag to log SSL Session/Ticket HIT as well as TCP connection reused
[ https://issues.apache.org/jira/browse/TS-3683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susan Hinrichs updated TS-3683: --- Assignee: François Pesce (was: Alan M. Carroll) Add a tag to log SSL Session/Ticket HIT as well as TCP connection reused Key: TS-3683 URL: https://issues.apache.org/jira/browse/TS-3683 Project: Traffic Server Issue Type: Improvement Components: Logging Reporter: François Pesce Assignee: François Pesce Labels: yahoo Fix For: 6.1.0 These tags would be useful for performance metrics collection: %cqtr The TCP reused status; indicates if this request went through an already established connection. %cqssr The SSL session/ticket reused status; indicates if this request hit the SSL session/ticket and avoided a full SSL handshake. both of them would display respectively 0 or 1 , if resp. not reused or reused. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-3596) TSHttpTxnPluginTagGet() returns fetchSM over H2
[ https://issues.apache.org/jira/browse/TS-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620471#comment-14620471 ] Susan Hinrichs commented on TS-3596: With [~es]'s fix for TS-3476, PluginGetTag will return http/2 for the Http/2 case. [~jpe...@apache.org], agreed this is weird because Http/2 is not a plugin, but it is currently implemented by the plugin framework. We're abusing that to quickly get access to the ultimate protocol for logging. Hopefully with a fix for TS-3612, this can all get cleaned up. TSHttpTxnPluginTagGet() returns fetchSM over H2 - Key: TS-3596 URL: https://issues.apache.org/jira/browse/TS-3596 Project: Traffic Server Issue Type: Bug Components: HTTP/2 Reporter: Scott Beardsley Labels: yahoo Fix For: 6.1.0 This should probably return something else, right? Maybe HTTP2 instead? We would like a way to identify H2 requests from SPDY and/or H1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TS-3293) Need to review various protocol accept objects and make them more widely available
[ https://issues.apache.org/jira/browse/TS-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susan Hinrichs updated TS-3293: --- Assignee: Dave Thompson (was: Susan Hinrichs) Need to review various protocol accept objects and make them more widely available -- Key: TS-3293 URL: https://issues.apache.org/jira/browse/TS-3293 Project: Traffic Server Issue Type: Bug Reporter: Susan Hinrichs Assignee: Dave Thompson Fix For: 6.1.0 This came up most recently in propagating tr-pass information for TS-3292 The early configuration is being duplicated in too many objects. The information is being propagated differently for HTTP and SSL (who knows what is happening with SPDY). We should take a step back to review and unify this information. Alan took a first pass on this review with his Early Intervention talk from the Fall 2014 summit https://www.dropbox.com/s/4vw91czj41rdxjo/ATS-Early-Intervention.pptx?dl=0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TS-3656) Activating follow redirection in send server response hook does not work for post
[ https://issues.apache.org/jira/browse/TS-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susan Hinrichs updated TS-3656: --- Fix Version/s: (was: 6.0.0) 6.1.0 Activating follow redirection in send server response hook does not work for post - Key: TS-3656 URL: https://issues.apache.org/jira/browse/TS-3656 Project: Traffic Server Issue Type: Bug Components: HTTP Reporter: Susan Hinrichs Assignee: Susan Hinrichs Fix For: 6.1.0 If you have a plugin on the TS_HTTP_SEND_RESPONSE_HDR_HOOK that calls TSHttpTxnFollowRedirect(txn, 1), redirecting a POST request will fail. In the less bad case, the POST request will be redirected to the new location, but the POST data will be lost. In the worse case, ATS will crash. The issue is that the post_redirect buffers are freed early on. One could delay the post_redirect deallocation until later in the transaction. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-3486) Segfault in do_io_write with plugin (??)
[ https://issues.apache.org/jira/browse/TS-3486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14622152#comment-14622152 ] Susan Hinrichs commented on TS-3486: I think the plan was to get a 5.3.2 out pretty quickly with this fix and one other. I don't think we back point to already released point releases. Segfault in do_io_write with plugin (??) Key: TS-3486 URL: https://issues.apache.org/jira/browse/TS-3486 Project: Traffic Server Issue Type: Bug Affects Versions: 5.2.0, 5.3.0 Reporter: Qiang Li Assignee: Susan Hinrichs Labels: crash Fix For: 6.0.0 Attachments: ts-3266-2.diff, ts-3266-complete.diff, ts3486-ptrace.txt.gz {code} (gdb) bt #0 0x005bdb8b in HttpServerSession::do_io_write (this=value optimized out, c=0x2aaadccc4bf0, nbytes=576, buf=0x2aaafc2ffee8, owner=false) at HttpServerSession.cc:104 #1 0x005acc1d in HttpSM::setup_server_send_request (this=0x2aaadccc4bf0) at HttpSM.cc:5686 #2 0x005b3f85 in HttpSM::handle_api_return (this=0x2aaadccc4bf0) at HttpSM.cc:1520 #3 0x005b45f8 in HttpSM::state_api_callout (this=0x2aaadccc4bf0, event=6, data=0x0) at HttpSM.cc:1455 #4 0x005b980b in HttpSM::state_api_callback (this=0x2aaadccc4bf0, event=6, data=0x0) at HttpSM.cc:1275 #5 0x004d7a1b in TSHttpTxnReenable (txnp=0x2aaadccc4bf0, event=TS_EVENT_HTTP_CONTINUE) at InkAPI.cc:5614 #6 0x2ba118441c89 in cachefun (contp=value optimized out, event=value optimized out, edata=0x2aaadccc4bf0) at main.cpp:1876 #7 0x005b4466 in HttpSM::state_api_callout (this=0x2aaadccc4bf0, event=value optimized out, data=value optimized out) at HttpSM.cc:1381 #8 0x005b627d in HttpSM::do_http_server_open (this=0x2aaadccc4bf0, raw=value optimized out) at HttpSM.cc:4639 #9 0x005baa04 in HttpSM::set_next_state (this=0x2aaadccc4bf0) at HttpSM.cc:7021 #10 0x005b25a3 in HttpSM::state_cache_open_write (this=0x2aaadccc4bf0, event=1108, data=0x2aab1c3b6800) at HttpSM.cc:2442 #11 0x005b5b28 in HttpSM::main_handler (this=0x2aaadccc4bf0, event=1108, 
data=0x2aab1c3b6800) at HttpSM.cc:2554 #12 0x0059338a in handleEvent (this=0x2aaadccc6618, event=value optimized out, data=0x2aab1c3b6800) at ../../iocore/eventsystem/I_Continuation.h:145 #13 HttpCacheSM::state_cache_open_write (this=0x2aaadccc6618, event=value optimized out, data=0x2aab1c3b6800) at HttpCacheSM.cc:167 #14 0x00697223 in handleEvent (this=0x2aab1c3b6800, event=value optimized out) at ../../iocore/eventsystem/I_Continuation.h:145 #15 CacheVC::callcont (this=0x2aab1c3b6800, event=value optimized out) at ../../iocore/cache/P_CacheInternal.h:662 #16 0x00715940 in Cache::open_write (this=value optimized out, cont=value optimized out, key=0x2ba0ff762d70, info=value optimized out, apin_in_cache=46914401429576, type=CACHE_FRAG_TYPE_HTTP, hostname=0x2aaadd281078 www.mifangba.comhttpapi.phpwww.mifangba.comhttp://www.mifangba.com/api.php?op=countid=4modelid=12;, host_len=16) at CacheWrite.cc:1788 #17 0x006e5765 in open_write (this=value optimized out, cont=0x2aaadccc6618, expected_size=value optimized out, url=0x2aaadccc5310, cluster_cache_local=value optimized out, request=value optimized out, old_info=0x0, pin_in_cache=0, type=CACHE_FRAG_TYPE_HTTP) at P_CacheInternal.h:1093 #18 CacheProcessor::open_write (this=value optimized out, cont=0x2aaadccc6618, expected_size=value optimized out, url=0x2aaadccc5310, cluster_cache_local=value optimized out, request=value optimized out, old_info=0x0, pin_in_cache=0, type=CACHE_FRAG_TYPE_HTTP) at Cache.cc:3622 #19 0x005936f0 in HttpCacheSM::open_write (this=0x2aaadccc6618, url=value optimized out, request=value optimized out, old_info=value optimized out, pin_in_cache=value optimized out, retry=value optimized out, allow_multiple=false) at HttpCacheSM.cc:298 #20 0x005a022e in HttpSM::do_cache_prepare_action (this=0x2aaadccc4bf0, c_sm=0x2aaadccc6618, object_read_info=0x0, retry=true, allow_multiple=false) at HttpSM.cc:4511 #21 0x005babd9 in do_cache_prepare_write (this=0x2aaadccc4bf0) at HttpSM.cc:4436 #22 
HttpSM::set_next_state (this=0x2aaadccc4bf0) at HttpSM.cc:7098 #23 0x005b3f5f in HttpSM::handle_api_return (this=0x2aaadccc4bf0) at HttpSM.cc:1517 #24 0x005b45f8 in HttpSM::state_api_callout (this=0x2aaadccc4bf0, event=0, data=0x0) at HttpSM.cc:1455 #25 0x005ba712 in HttpSM::set_next_state (this=0x2aaadccc4bf0) at HttpSM.cc:6876 #26 0x005ba702 in HttpSM::set_next_state (this=0x2aaadccc4bf0) at HttpSM.cc:6919 #27 0x005b3f5f in HttpSM::handle_api_return
[jira] [Updated] (TS-1007) SSN Close called before TXN Close
[ https://issues.apache.org/jira/browse/TS-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susan Hinrichs updated TS-1007: --- Assignee: Susan Hinrichs (was: Alan M. Carroll) SSN Close called before TXN Close - Key: TS-1007 URL: https://issues.apache.org/jira/browse/TS-1007 Project: Traffic Server Issue Type: Bug Components: TS API Affects Versions: 3.0.1 Reporter: Nick Kew Assignee: Susan Hinrichs Labels: incompatible Fix For: 6.0.0 Where a plugin implements both SSN_CLOSE_HOOK and TXN_CLOSE_HOOK, the SSN_CLOSE_HOOK is called first of the two. This messes up normal cleanups! Details: Register a SSN_START event globally In the SSN START, add a TXN_START and a SSN_CLOSE In the TXN START, add a TXN_CLOSE Stepping through, I see the order of events actually called, for the simple case of a one-off HTTP request with no keepalive: SSN_START TXN_START SSN_END TXN_END Whoops, SSN_END cleaned up the SSN context, leaving dangling pointers in the TXN! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-3746) We need to make proxy.config.ssl.client.verify.server overridable
[ https://issues.apache.org/jira/browse/TS-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14622785#comment-14622785 ] Susan Hinrichs commented on TS-3746: Are you asking why you don't just verify all certificates from all origins? That is what I would prefer from a security perspective. But from an organizational perspective, not everyone is ready to bet connectivity on all the verifying certs being distributed appropriately. Actually the override can be set from within a transaction, since this is the connection from ATS to the origin server, which would only happen within the context of a transaction. We need to make proxy.config.ssl.client.verify.server overridable - Key: TS-3746 URL: https://issues.apache.org/jira/browse/TS-3746 Project: Traffic Server Issue Type: New Feature Components: Configuration Reporter: Syeda Persia Aziz Labels: Yahoo Fix For: sometime We need to make proxy.config.ssl.client.verify.server overridable. Some origin servers need validation to avoid MITM attacks while others don't. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-1007) SSN Close called before TXN Close
[ https://issues.apache.org/jira/browse/TS-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14622820#comment-14622820 ] Susan Hinrichs commented on TS-1007: Research notes. Case 1, HTTP1.1 over TCP: SSN Start, TXN Start, TXN Close, SSN Close All is well! Case 2, HTTP 1.1 over SSL: SSN Start, TXN Start, TXN Close, TXN Start, SSN Close, TXN Close Looks like the keep_alive logic is being triggered by a read ready to set up a new TXN Case 3, HTTP 1.1 over H2: SSN Start X, SSN Start Y, TXN Start, SSN Close Y, TXN Close, SSN Close X Two Sessions, not so good. Need to recompile to get the SPDY case set up. SSN Close called before TXN Close - Key: TS-1007 URL: https://issues.apache.org/jira/browse/TS-1007 Project: Traffic Server Issue Type: Bug Components: TS API Affects Versions: 3.0.1 Reporter: Nick Kew Assignee: Susan Hinrichs Labels: incompatible Fix For: 6.0.0 Where a plugin implements both SSN_CLOSE_HOOK and TXN_CLOSE_HOOK, the SSN_CLOSE_HOOK is called first of the two. This messes up normal cleanups! Details: Register a SSN_START event globally In the SSN START, add a TXN_START and a SSN_CLOSE In the TXN START, add a TXN_CLOSE Stepping through, I see the order of events actually called, for the simple case of a one-off HTTP request with no keepalive: SSN_START TXN_START SSN_END TXN_END Whoops, SSN_END cleaned up the SSN context, leaving dangling pointers in the TXN! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (TS-1007) SSN Close called before TXN Close
[ https://issues.apache.org/jira/browse/TS-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14622820#comment-14622820 ] Susan Hinrichs edited comment on TS-1007 at 7/10/15 8:19 PM: - Research notes. Case 1, HTTP1.1 over TCP: SSN Start, TXN Start, TXN Close, SSN Close All is well! Case 2, HTTP 1.1 over SSL: SSN Start, TXN Start, TXN Close, SSN Close Also works. Case 2, HTTP 1.1 over SSL with a redirect (i.e. two request from the client over the same connection): SSN Start, TXN Start, TXN Close, TXN Start, SSN Close, TXN Close Problems. Case 3, HTTP 1.1 over H2: SSN Start X, SSN Start Y, TXN Start, SSN Close Y, TXN Close, SSN Close X Two Sessions, not so good. Need to recompile to get the SPDY case set up. was (Author: shinrich): Research notes. Case 1, HTTP1.1 over TCP: SSN Start, TXN Start, TXN Close, SSN Close All is well! Case 2, HTTP 1.1 over SSL: SSN Start, TXN Start, TXN Close, TXN Start, SSN Close, TXN Close Look like the keep_alive logic is being triggered by a read ready to set up a new TXN Case 3, HTTP 1.1 over H2: SSN Start X, SSN Start Y, TXN Start, SSN Close Y, TXN Close, SSN Close X Two Sessions, not so good. Need to recompile to get the SPDY case set up. SSN Close called before TXN Close - Key: TS-1007 URL: https://issues.apache.org/jira/browse/TS-1007 Project: Traffic Server Issue Type: Bug Components: TS API Affects Versions: 3.0.1 Reporter: Nick Kew Assignee: Susan Hinrichs Labels: incompatible Fix For: 6.0.0 Where a plugin implements both SSN_CLOSE_HOOK and TXN_CLOSE_HOOK, the SSN_CLOSE_HOOK is called first of the two. This messes up normal cleanups! 
Details: Register a SSN_START event globally In the SSN START, add a TXN_START and a SSN_CLOSE In the TXN START, add a TXN_CLOSE Stepping through, I see the order of events actually called, for the simple case of a one-off HTTP request with no keepalive: SSN_START TXN_START SSN_END TXN_END Whoops, SSN_END cleaned up the SSN context, leaving dangling pointers in the TXN! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (TS-3871) VC Migration Can Lose Events
[ https://issues.apache.org/jira/browse/TS-3871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susan Hinrichs reassigned TS-3871: -- Assignee: Susan Hinrichs VC Migration Can Lose Events Key: TS-3871 URL: https://issues.apache.org/jira/browse/TS-3871 Project: Traffic Server Issue Type: Bug Components: HTTP Reporter: Susan Hinrichs Assignee: Susan Hinrichs Found this in my stress testing. Sometimes the POST or GET response is completely empty. No header and no body. The packet capture shows that ATS closes the connection 70 seconds after the last POST or GET of the connection was received. This corresponds to the proxy.config.http.keep_alive_no_activity_timeout_in on my test box. I moved from global pool to local pool and the problem went away. I eventually tracked it down to a problem in the epoll update. ep.start() during the migration would fail sometimes with EEXIST error. This means that the file descriptor is already associated with the epoll. If we are migrating from thread A to thread B this should not be the case. Unless we went from thread B to thread A and back to thread B without cleaning up the original thread B epoll. If this is happening, then multiple threads will be processing network events which seems like a recipe for disaster and dropped events. Originally, I left the ep.stop() which clears the epoll on the original thread's epoll structure to be done by the original thread. But under stress that seems to be a bad idea. Too much drift. With some more research, it appears that the epoll calls are thread safe. http://linux.derkeiler.com/Mailing-Lists/Kernel/2006-03/msg00084.html I rearranged the code to do both the ep.stop() and ep.start() in the same migrating target thread, and my stress test had no more problems. I've run this patch on a production machine for over 12 hours with no crashes and no performance discrepancies. We will be expanding this testing. 
To repeat, this is not a problem we saw in production, but only in my "make it fall over" stress test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TS-3871) VC Migration Can Lose Events
Susan Hinrichs created TS-3871: -- Summary: VC Migration Can Lose Events Key: TS-3871 URL: https://issues.apache.org/jira/browse/TS-3871 Project: Traffic Server Issue Type: Bug Components: HTTP Reporter: Susan Hinrichs Found this in my stress testing. Sometimes the POST or GET response is completely empty. No header and no body. The packet capture shows that ATS closes the connection 70 seconds after the last POST or GET of the connection was received. This corresponds to the proxy.config.http.keep_alive_no_activity_timeout_in on my test box. I moved from global pool to local pool and the problem went away. I eventually tracked it down to a problem in the epoll update. ep.start() during the migration would fail sometimes with EEXIST error. This means that the file descriptor is already associated with the epoll. If we are migrating from thread A to thread B this should not be the case. Unless we went from thread B to thread A and back to thread B without cleaning up the original thread B epoll. If this is happening, then multiple threads will be processing network events which seems like a recipe for disaster and dropped events. Originally, I left the ep.stop() which clears the epoll on the original thread's epoll structure to be done by the original thread. But under stress that seems to be a bad idea. Too much drift. With some more research, it appears that the epoll calls are thread safe. http://linux.derkeiler.com/Mailing-Lists/Kernel/2006-03/msg00084.html I rearranged the code to do both the ep.stop() and ep.start() in the same migrating target thread, and my stress test had no more problems. I've run this patch on a production machine for over 12 hours with no crashes and no performance discrepancies. We will be expanding this testing. To repeat, this is not a problem we saw in production, but only in my "make it fall over" stress test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (TS-3777) TSHttpConnect and POST request does not fire TS_VCONN_READ_COMPLETE nor TS_VCONN_EOS
[ https://issues.apache.org/jira/browse/TS-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susan Hinrichs closed TS-3777. -- Resolution: Fixed TSHttpConnect and POST request does not fire TS_VCONN_READ_COMPLETE nor TS_VCONN_EOS Key: TS-3777 URL: https://issues.apache.org/jira/browse/TS-3777 Project: Traffic Server Issue Type: Bug Components: TS API Reporter: Daniel Vitor Morilha Assignee: Susan Hinrichs Labels: yahoo Fix For: 6.1.0 Attachments: ts-3777-2.diff, ts-3777-3.diff, ts-3777-4.diff, ts-3777.diff When using TSHttpConnect to connect to ATS itself (internal vconnection), sending a POST request and receiving a CHUNKED response, ATS fires neither TS_VCONN_READ_COMPLETE nor TS_VCONN_EOS. Trying to close the vconnection from the plug-in after receiving the last chunk (\r\n0\r\n) results in the PluginVC repeating the following message: {noformat} [Jul 14 21:24:06.094] Server {0x77fbe800} DEBUG: (pvc_event) [0] Passive: Received event 1 {noformat} I am glad to provide an example if that helps. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TS-3871) VC Migration Can Lose Events
[ https://issues.apache.org/jira/browse/TS-3871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susan Hinrichs updated TS-3871: --- Attachment: ts-3871.diff VC Migration Can Lose Events Key: TS-3871 URL: https://issues.apache.org/jira/browse/TS-3871 Project: Traffic Server Issue Type: Bug Components: HTTP Reporter: Susan Hinrichs Assignee: Susan Hinrichs Attachments: ts-3871.diff Found this in my stress testing. Sometimes the POST or GET response is completely empty. No header and no body. The packet capture shows that ATS closes the connection 70 seconds after the last POST or GET of the connection was received. This corresponds to the proxy.config.http.keep_alive_no_activity_timeout_in on my test box. I moved from global pool to local pool and the problem went away. I eventually tracked it down to a problem in the epoll update. ep.start() during the migration would fail sometimes with EEXIST error. This means that the file descriptor is already associated with the epoll. If we are migrating from thread A to thread B this should not be the case. Unless we went from thread B to thread A and back to thread B without cleaning up the original thread B epoll. If this is happening, then multiple threads will be processing network events which seems like a recipe for disaster and dropped events. Originally, I left the ep.stop() which clears the epoll on the original thread's epoll structure to be done by the original thread. But under stress that seems to be a bad idea. Too much drift. With some more research, it appears that the epoll calls are thread safe. http://linux.derkeiler.com/Mailing-Lists/Kernel/2006-03/msg00084.html I rearranged the code to do both the ep.stop() and ep.start() in the same migrating target thread, and my stress test had no more problems. I've run this patch on a production machine for over 12 hours with no crashes and no performance discrepancies. We will be expanding this testing. 
To repeat, this is not a problem we saw in production, but only in my "make it fall over" stress test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-3486) Segfault in do_io_write with plugin (??)
[ https://issues.apache.org/jira/browse/TS-3486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14727739#comment-14727739 ] Susan Hinrichs commented on TS-3486: server_session_sharing_pool is overridable in 5.3, but not in master Should be able to change the reference to sm->t_state.txn_conf->server_session_sharing_pool > Segfault in do_io_write with plugin (??) > > > Key: TS-3486 > URL: https://issues.apache.org/jira/browse/TS-3486 > Project: Traffic Server > Issue Type: Bug >Affects Versions: 5.2.0, 5.3.0 >Reporter: Qiang Li >Assignee: Susan Hinrichs > Labels: crash > Fix For: 6.0.0 > > Attachments: ts-3266-2.diff, ts-3266-complete.diff, > ts3486-ptrace.txt.gz > > > {code} > (gdb) bt > #0 0x005bdb8b in HttpServerSession::do_io_write (this= optimized out>, c=0x2aaadccc4bf0, nbytes=576, buf=0x2aaafc2ffee8, owner=false) > at HttpServerSession.cc:104 > #1 0x005acc1d in HttpSM::setup_server_send_request > (this=0x2aaadccc4bf0) at HttpSM.cc:5686 > #2 0x005b3f85 in HttpSM::handle_api_return (this=0x2aaadccc4bf0) at > HttpSM.cc:1520 > #3 0x005b45f8 in HttpSM::state_api_callout (this=0x2aaadccc4bf0, > event=6, data=0x0) at HttpSM.cc:1455 > #4 0x005b980b in HttpSM::state_api_callback (this=0x2aaadccc4bf0, > event=6, data=0x0) at HttpSM.cc:1275 > #5 0x004d7a1b in TSHttpTxnReenable (txnp=0x2aaadccc4bf0, > event=TS_EVENT_HTTP_CONTINUE) at InkAPI.cc:5614 > #6 0x2ba118441c89 in cachefun (contp=, event= optimized out>, edata=0x2aaadccc4bf0) at main.cpp:1876 > #7 0x005b4466 in HttpSM::state_api_callout (this=0x2aaadccc4bf0, > event=, data=) at HttpSM.cc:1381 > #8 0x005b627d in HttpSM::do_http_server_open (this=0x2aaadccc4bf0, > raw=) at HttpSM.cc:4639 > #9 0x005baa04 in HttpSM::set_next_state (this=0x2aaadccc4bf0) at > HttpSM.cc:7021 > #10 0x005b25a3 in HttpSM::state_cache_open_write > (this=0x2aaadccc4bf0, event=1108, data=0x2aab1c3b6800) at HttpSM.cc:2442 > #11 0x005b5b28 in HttpSM::main_handler (this=0x2aaadccc4bf0, > event=1108, data=0x2aab1c3b6800) at 
HttpSM.cc:2554 > #12 0x0059338a in handleEvent (this=0x2aaadccc6618, event= optimized out>, data=0x2aab1c3b6800) at > ../../iocore/eventsystem/I_Continuation.h:145 > #13 HttpCacheSM::state_cache_open_write (this=0x2aaadccc6618, event= optimized out>, data=0x2aab1c3b6800) at HttpCacheSM.cc:167 > #14 0x00697223 in handleEvent (this=0x2aab1c3b6800, event= optimized out>) at ../../iocore/eventsystem/I_Continuation.h:145 > #15 CacheVC::callcont (this=0x2aab1c3b6800, event=) at > ../../iocore/cache/P_CacheInternal.h:662 > #16 0x00715940 in Cache::open_write (this=, > cont=, key=0x2ba0ff762d70, info=, > apin_in_cache=46914401429576, type=CACHE_FRAG_TYPE_HTTP, > hostname=0x2aaadd281078 > "www.mifangba.comhttpapi.phpwww.mifangba.comhttp://www.mifangba.com/api.php?op=count=4=12;, > host_len=16) at CacheWrite.cc:1788 > #17 0x006e5765 in open_write (this=, > cont=0x2aaadccc6618, expected_size=, url=0x2aaadccc5310, > cluster_cache_local=, request=, > old_info=0x0, pin_in_cache=0, type=CACHE_FRAG_TYPE_HTTP) at > P_CacheInternal.h:1093 > #18 CacheProcessor::open_write (this=, > cont=0x2aaadccc6618, expected_size=, url=0x2aaadccc5310, > cluster_cache_local=, request=, > old_info=0x0, pin_in_cache=0, type=CACHE_FRAG_TYPE_HTTP) at Cache.cc:3622 > #19 0x005936f0 in HttpCacheSM::open_write (this=0x2aaadccc6618, > url=, request=, old_info= optimized out>, > pin_in_cache=, retry=, > allow_multiple=false) at HttpCacheSM.cc:298 > #20 0x005a022e in HttpSM::do_cache_prepare_action > (this=0x2aaadccc4bf0, c_sm=0x2aaadccc6618, object_read_info=0x0, retry=true, > allow_multiple=false) at HttpSM.cc:4511 > #21 0x005babd9 in do_cache_prepare_write (this=0x2aaadccc4bf0) at > HttpSM.cc:4436 > #22 HttpSM::set_next_state (this=0x2aaadccc4bf0) at HttpSM.cc:7098 > #23 0x005b3f5f in HttpSM::handle_api_return (this=0x2aaadccc4bf0) at > HttpSM.cc:1517 > #24 0x005b45f8 in HttpSM::state_api_callout (this=0x2aaadccc4bf0, > event=0, data=0x0) at HttpSM.cc:1455 > #25 0x005ba712 in HttpSM::set_next_state 
(this=0x2aaadccc4bf0) at > HttpSM.cc:6876 > #26 0x005ba702 in HttpSM::set_next_state (this=0x2aaadccc4bf0) at > HttpSM.cc:6919 > #27 0x005b3f5f in HttpSM::handle_api_return (this=0x2aaadccc4bf0) at > HttpSM.cc:1517 > #28 0x005b45f8 in HttpSM::state_api_callout (this=0x2aaadccc4bf0, > event=6, data=0x0) at HttpSM.cc:1455 > #29 0x005b980b in HttpSM::state_api_callback (this=0x2aaadccc4bf0, > event=6, data=0x0) at HttpSM.cc:1275 > #30 0x004d7a1b in TSHttpTxnReenable (txnp=0x2aaadccc4bf0, >
[jira] [Created] (TS-3894) Missing NULL checks in HttpSM::handle_server_setup_error
Susan Hinrichs created TS-3894: -- Summary: Missing NULL checks in HttpSM::handle_server_setup_error Key: TS-3894 URL: https://issues.apache.org/jira/browse/TS-3894 Project: Traffic Server Issue Type: Bug Components: HTTP Reporter: Susan Hinrichs In error cases, there may not be a consumer when expected. Missing NULL checks on the consumer variable c can result in crashes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (TS-3894) Missing NULL checks in HttpSM::handle_server_setup_error
[ https://issues.apache.org/jira/browse/TS-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susan Hinrichs reassigned TS-3894: -- Assignee: Susan Hinrichs > Missing NULL checks in HttpSM::handle_server_setup_error > > > Key: TS-3894 > URL: https://issues.apache.org/jira/browse/TS-3894 > Project: Traffic Server > Issue Type: Bug > Components: HTTP >Reporter: Susan Hinrichs >Assignee: Susan Hinrichs > > In error cases, there may not be a consumer when expected. Missing NULL > checks on the consumer variable c can result in crashes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-3871) VC Migration Can Lose Events
[ https://issues.apache.org/jira/browse/TS-3871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731423#comment-14731423 ] Susan Hinrichs commented on TS-3871: Have been running with this fix in production for approximately a week. > VC Migration Can Lose Events > > > Key: TS-3871 > URL: https://issues.apache.org/jira/browse/TS-3871 > Project: Traffic Server > Issue Type: Bug > Components: HTTP >Reporter: Susan Hinrichs >Assignee: Susan Hinrichs > Fix For: 6.1.0 > > Attachments: ts-3871.diff > > > Found this in my stress testing. Sometimes the POST or GET response is > completely empty. No header and no body. The packet capture shows that ATS > closes the connection 70 seconds after the last POST or GET of the connection > was received. This corresponds to the > proxy.config.http.keep_alive_no_activity_timeout_in on my test box. > I moved from global pool to local pool and the problem went away. > I eventually tracked it down to a problem in the epoll update. ep.start() > during the migration would fail sometimes with EEXIST error. This means that > the file descriptor is already associated with the epoll. If we are > migrating from thread A to thread B this should not be the case. Unless we > when from thread B to thread A and back to thread B without cleaning up the > original thread B epoll. If this is happening, then multiple threads will be > processing network events which seems like a recipe for disaster and dropped > events. > Originally, I left the ep.stop() which clears the epoll on the original > thread's epoll structure to be done by the original thread. But under stress > that seems to be a bad idea. Too much drift. With some more research, it > appears that the epoll calls are thread safe. > http://linux.derkeiler.com/Mailing-Lists/Kernel/2006-03/msg00084.html > I rearranged the code to do both the ep.stop() and ep.start() in the same > migrating target thread, and my stress test had no more problems. 
> I've run this patch on a production machine for over 12 hours with no crashes > and no performance discrepancies. We will be expanding this testing. > To repeat, this is not a problem we saw in production, but only in my "make > it fall over" stress test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-3486) Segfault in do_io_write with plugin (??)
[ https://issues.apache.org/jira/browse/TS-3486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729021#comment-14729021 ] Susan Hinrichs commented on TS-3486: Should be fine. The change of the server_session_sharing_pool from overridable to not was a clean-up side effect of the great server_session_sharing specification debate (specifically TS-3712). For this crash fix, we need to determine whether the pool is global or per thread. That it is read from the override config in 5.3 is not a big issue as long as we are reading it from the correct location for the build. > Segfault in do_io_write with plugin (??) > > > Key: TS-3486 > URL: https://issues.apache.org/jira/browse/TS-3486 > Project: Traffic Server > Issue Type: Bug >Affects Versions: 5.2.0, 5.3.0 >Reporter: Qiang Li >Assignee: Susan Hinrichs > Labels: crash > Fix For: 5.3.2, 6.0.0 > > Attachments: ts-3266-2.diff, ts-3266-complete.diff, > ts3486-ptrace.txt.gz > > > {code} > (gdb) bt > #0 0x005bdb8b in HttpServerSession::do_io_write (this= optimized out>, c=0x2aaadccc4bf0, nbytes=576, buf=0x2aaafc2ffee8, owner=false) > at HttpServerSession.cc:104 > #1 0x005acc1d in HttpSM::setup_server_send_request > (this=0x2aaadccc4bf0) at HttpSM.cc:5686 > #2 0x005b3f85 in HttpSM::handle_api_return (this=0x2aaadccc4bf0) at > HttpSM.cc:1520 > #3 0x005b45f8 in HttpSM::state_api_callout (this=0x2aaadccc4bf0, > event=6, data=0x0) at HttpSM.cc:1455 > #4 0x005b980b in HttpSM::state_api_callback (this=0x2aaadccc4bf0, > event=6, data=0x0) at HttpSM.cc:1275 > #5 0x004d7a1b in TSHttpTxnReenable (txnp=0x2aaadccc4bf0, > event=TS_EVENT_HTTP_CONTINUE) at InkAPI.cc:5614 > #6 0x2ba118441c89 in cachefun (contp=, event= optimized out>, edata=0x2aaadccc4bf0) at main.cpp:1876 > #7 0x005b4466 in HttpSM::state_api_callout (this=0x2aaadccc4bf0, > event=, data=) at HttpSM.cc:1381 > #8 0x005b627d in HttpSM::do_http_server_open (this=0x2aaadccc4bf0, > raw=) at HttpSM.cc:4639 > #9 0x005baa04 in HttpSM::set_next_state 
(this=0x2aaadccc4bf0) at > HttpSM.cc:7021 > #10 0x005b25a3 in HttpSM::state_cache_open_write > (this=0x2aaadccc4bf0, event=1108, data=0x2aab1c3b6800) at HttpSM.cc:2442 > #11 0x005b5b28 in HttpSM::main_handler (this=0x2aaadccc4bf0, > event=1108, data=0x2aab1c3b6800) at HttpSM.cc:2554 > #12 0x0059338a in handleEvent (this=0x2aaadccc6618, event= optimized out>, data=0x2aab1c3b6800) at > ../../iocore/eventsystem/I_Continuation.h:145 > #13 HttpCacheSM::state_cache_open_write (this=0x2aaadccc6618, event= optimized out>, data=0x2aab1c3b6800) at HttpCacheSM.cc:167 > #14 0x00697223 in handleEvent (this=0x2aab1c3b6800, event= optimized out>) at ../../iocore/eventsystem/I_Continuation.h:145 > #15 CacheVC::callcont (this=0x2aab1c3b6800, event=) at > ../../iocore/cache/P_CacheInternal.h:662 > #16 0x00715940 in Cache::open_write (this=, > cont=, key=0x2ba0ff762d70, info=, > apin_in_cache=46914401429576, type=CACHE_FRAG_TYPE_HTTP, > hostname=0x2aaadd281078 > "www.mifangba.comhttpapi.phpwww.mifangba.comhttp://www.mifangba.com/api.php?op=count=4=12;, > host_len=16) at CacheWrite.cc:1788 > #17 0x006e5765 in open_write (this=, > cont=0x2aaadccc6618, expected_size=, url=0x2aaadccc5310, > cluster_cache_local=, request=, > old_info=0x0, pin_in_cache=0, type=CACHE_FRAG_TYPE_HTTP) at > P_CacheInternal.h:1093 > #18 CacheProcessor::open_write (this=, > cont=0x2aaadccc6618, expected_size=, url=0x2aaadccc5310, > cluster_cache_local=, request=, > old_info=0x0, pin_in_cache=0, type=CACHE_FRAG_TYPE_HTTP) at Cache.cc:3622 > #19 0x005936f0 in HttpCacheSM::open_write (this=0x2aaadccc6618, > url=, request=, old_info= optimized out>, > pin_in_cache=, retry=, > allow_multiple=false) at HttpCacheSM.cc:298 > #20 0x005a022e in HttpSM::do_cache_prepare_action > (this=0x2aaadccc4bf0, c_sm=0x2aaadccc6618, object_read_info=0x0, retry=true, > allow_multiple=false) at HttpSM.cc:4511 > #21 0x005babd9 in do_cache_prepare_write (this=0x2aaadccc4bf0) at > HttpSM.cc:4436 > #22 HttpSM::set_next_state 
(this=0x2aaadccc4bf0) at HttpSM.cc:7098 > #23 0x005b3f5f in HttpSM::handle_api_return (this=0x2aaadccc4bf0) at > HttpSM.cc:1517 > #24 0x005b45f8 in HttpSM::state_api_callout (this=0x2aaadccc4bf0, > event=0, data=0x0) at HttpSM.cc:1455 > #25 0x005ba712 in HttpSM::set_next_state (this=0x2aaadccc4bf0) at > HttpSM.cc:6876 > #26 0x005ba702 in HttpSM::set_next_state (this=0x2aaadccc4bf0) at > HttpSM.cc:6919 > #27 0x005b3f5f in HttpSM::handle_api_return (this=0x2aaadccc4bf0) at > HttpSM.cc:1517 > #28 0x005b45f8 in
[jira] [Created] (TS-3901) Leaking connections from HttpSessionManager
Susan Hinrichs created TS-3901: -- Summary: Leaking connections from HttpSessionManager Key: TS-3901 URL: https://issues.apache.org/jira/browse/TS-3901 Project: Traffic Server Issue Type: Bug Components: HTTP Reporter: Susan Hinrichs Observed in production. Got the following warnings in diags.log: "Connection leak from http keep-alive system" Our connections to origin would increase and the number of connections in CLOSE_WAIT was enormous. I think the issue was when the origin URL was http with default port. That URL was remapped to https with default port. The default port stored in HttpServerSession->server_ip was not updated. When the connection was closed or timed out of the session pool, it would be looked up with port 443. But the session was stored via the server_ip value with port 80 and would never match. Relatively small change in HTTPHdr::_fill_target_cache. Running the fix in production to verify early results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (TS-3871) VC Migration Can Lose Events
[ https://issues.apache.org/jira/browse/TS-3871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susan Hinrichs resolved TS-3871. Resolution: Fixed > VC Migration Can Lose Events > > > Key: TS-3871 > URL: https://issues.apache.org/jira/browse/TS-3871 > Project: Traffic Server > Issue Type: Bug > Components: HTTP >Reporter: Susan Hinrichs >Assignee: Susan Hinrichs > Fix For: 6.1.0 > > Attachments: ts-3871.diff > > > Found this in my stress testing. Sometimes the POST or GET response is > completely empty. No header and no body. The packet capture shows that ATS > closes the connection 70 seconds after the last POST or GET of the connection > was received. This corresponds to the > proxy.config.http.keep_alive_no_activity_timeout_in on my test box. > I moved from global pool to local pool and the problem went away. > I eventually tracked it down to a problem in the epoll update. ep.start() > during the migration would fail sometimes with EEXIST error. This means that > the file descriptor is already associated with the epoll. If we are > migrating from thread A to thread B this should not be the case. Unless we > when from thread B to thread A and back to thread B without cleaning up the > original thread B epoll. If this is happening, then multiple threads will be > processing network events which seems like a recipe for disaster and dropped > events. > Originally, I left the ep.stop() which clears the epoll on the original > thread's epoll structure to be done by the original thread. But under stress > that seems to be a bad idea. Too much drift. With some more research, it > appears that the epoll calls are thread safe. > http://linux.derkeiler.com/Mailing-Lists/Kernel/2006-03/msg00084.html > I rearranged the code to do both the ep.stop() and ep.start() in the same > migrating target thread, and my stress test had no more problems. > I've run this patch on a production machine for over 12 hours with no crashes > and no performance discrepancies. 
We will be expanding this testing. > To repeat, this is not a problem we saw in production, but only in my "make > it fall over" stress test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (TS-3901) Leaking connections from HttpSessionManager
[ https://issues.apache.org/jira/browse/TS-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susan Hinrichs reassigned TS-3901: -- Assignee: Susan Hinrichs > Leaking connections from HttpSessionManager > --- > > Key: TS-3901 > URL: https://issues.apache.org/jira/browse/TS-3901 > Project: Traffic Server > Issue Type: Bug > Components: HTTP >Reporter: Susan Hinrichs >Assignee: Susan Hinrichs > > Observed in production. Got the following warnings in diags.log > "Connection leak from http keep-alive system" > Our connections to origin would increase and the number of connections in > CLOSE_WAIT were enormous. > I think the issue was when the origin URL was http with default port. That > URL was remapped to https with default port. The default port stored in > HttpServerSession->server_ip was not updated. > When the connection was closed or timed out of the session pool, it would be > looked up with port 443. But the session was stored via the server_ip value > with port 80 and would never match. > Relatively small change in HTTPHdr::_file_target_cache. > Running the fix in production to verify early results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (TS-3905) proxy.config.http.keep_alive_no_activity_timeout_out is not used
[ https://issues.apache.org/jira/browse/TS-3905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susan Hinrichs closed TS-3905. -- Resolution: Duplicate > proxy.config.http.keep_alive_no_activity_timeout_out is not used > > > Key: TS-3905 > URL: https://issues.apache.org/jira/browse/TS-3905 > Project: Traffic Server > Issue Type: Bug > Components: HTTP >Reporter: Susan Hinrichs >Assignee: Susan Hinrichs > Fix For: 6.1.0 > > > The keep_alive_no_activity_timeout_in is set correctly on the > HttpClientSession when the transaction releases it. The client session is > then hanging out until the next transaction appears, and the > keep_alive_no_activity_timeout_in should apply instead of the > transaction_no_activity_timeout_in. > For the server session side, the keep_alive_no_activity_timeout_out and > transaction_no_activity_timeout_out should apply. The > keep_alive_no_activity_timeout_out does get set correctly when the server > session is attached to the client session to timeout via the > HttpClientSession::attach_server_session_method(). > But in ServerSessionPool::releaseSession, the following is called > {code} > ss->get_netvc()->set_inactivity_timeout(ss->get_netvc()->get_inactivity_timeout()); > {code} > My reading is that this will reset the inactivity timeout of the server > session to whatever it was last set to. Instead it should set the inactivity > timeout to keep_alive_no_activity_timeout_out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-3905) proxy.config.http.keep_alive_no_activity_timeout_out is not used
[ https://issues.apache.org/jira/browse/TS-3905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742160#comment-14742160 ] Susan Hinrichs commented on TS-3905: [~zwoop] you are right. TS-3312 addresses this issue. I filed this bug based on code inspection. It didn't appear that we were using the keep_alive_timeout_out parameter. But reviewing this bug, I see that we are using that parameter although in kind of an odd path to enable parameter override. I'll verify it runs for my scenario Monday, but I assume if it meets LinkedIn's needs it will also work for me, so I'm closing as a duplicate. > proxy.config.http.keep_alive_no_activity_timeout_out is not used > > > Key: TS-3905 > URL: https://issues.apache.org/jira/browse/TS-3905 > Project: Traffic Server > Issue Type: Bug > Components: HTTP >Reporter: Susan Hinrichs >Assignee: Susan Hinrichs > Fix For: 6.1.0 > > > The keep_alive_no_activity_timeout_in is set correctly on the > HttpClientSession when the transaction releases it. The client session is > then hanging out until the next transaction appears, and the > keep_alive_no_activity_timeout_in should apply instead of the > transaction_no_activity_timeout_in. > For the server session side, the keep_alive_no_activity_timeout_out and > transaction_no_activity_timeout_out should apply. The > keep_alive_no_activity_timeout_out does get set correctly when the server > session is attached to the client session to timeout via the > HttpClientSession::attach_server_session_method(). > But in ServerSessionPool::releaseSession, the following is called > {code} > ss->get_netvc()->set_inactivity_timeout(ss->get_netvc()->get_inactivity_timeout()); > {code} > My reading is that this will reset the inactivity timeout of the server > session to whatever it was last set to. Instead it should set the inactivity > timeout to keep_alive_no_activity_timeout_out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (TS-3898) Connection to the origin can allocate 1MB of iobuffers
[ https://issues.apache.org/jira/browse/TS-3898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susan Hinrichs reassigned TS-3898: -- Assignee: Susan Hinrichs (was: kang li) > Connection to the origin can allocate 1MB of iobuffers > -- > > Key: TS-3898 > URL: https://issues.apache.org/jira/browse/TS-3898 > Project: Traffic Server > Issue Type: Improvement > Components: HTTP >Affects Versions: 5.3.0, 6.0.0 >Reporter: Bryan Call >Assignee: Susan Hinrichs > Labels: yahoo > Fix For: 6.1.0 > > > When connecting to an origin there can be 1MB of iobuffers allocated. This > happens under TLS and non-TLS. Seems like it happens when the origin doesn't > supply a content-length. More investigation is needed. > Configuration: > {code} > [bcall@homer trafficserver]$ tail -1 /usr/local/etc/trafficserver/remap.config > map / https://www.flickr.com > {code} > Client: > {code} > [bcall@homer trafficserver]$ curl -D - -k https://127.0.0.1:4443/ > [bcall@homer trafficserver]$ sudo kill -SIGUSR1 $(pidof traffic_server){code} > Server: > {code} > allocated |in-use | type size | free list name > |||-- > 1048576 | 0 | 32768 | > memory/ioBufAllocator[8] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (TS-3898) Connection to the origin can allocate 1MB of iobuffers
[ https://issues.apache.org/jira/browse/TS-3898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susan Hinrichs reassigned TS-3898: -- Assignee: Susan Hinrichs > Connection to the origin can allocate 1MB of iobuffers > -- > > Key: TS-3898 > URL: https://issues.apache.org/jira/browse/TS-3898 > Project: Traffic Server > Issue Type: Improvement > Components: HTTP >Affects Versions: 5.3.0, 6.0.0 >Reporter: Bryan Call >Assignee: Susan Hinrichs > Labels: yahoo > Fix For: 6.1.0 > > > When connecting to an origin there can be 1MB of iobuffers allocated. This > happens under TLS and non-TLS. Seems like it happens when the origin doesn't > supply a content-length. More investigation is needed. > Configuration: > {code} > [bcall@homer trafficserver]$ tail -1 /usr/local/etc/trafficserver/remap.config > map / https://www.flickr.com > {code} > Client: > {code} > [bcall@homer trafficserver]$ curl -D - -k https://127.0.0.1:4443/ > [bcall@homer trafficserver]$ sudo kill -SIGUSR1 $(pidof traffic_server){code} > Server: > {code} > allocated |in-use | type size | free list name > |||-- > 1048576 | 0 | 32768 | > memory/ioBufAllocator[8] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TS-3898) Connection to the origin can allocate 1MB of iobuffers
[ https://issues.apache.org/jira/browse/TS-3898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susan Hinrichs updated TS-3898: --- Assignee: Bryan Call (was: Susan Hinrichs) > Connection to the origin can allocate 1MB of iobuffers > -- > > Key: TS-3898 > URL: https://issues.apache.org/jira/browse/TS-3898 > Project: Traffic Server > Issue Type: Improvement > Components: HTTP >Affects Versions: 5.3.0, 6.0.0 >Reporter: Bryan Call >Assignee: Bryan Call > Labels: yahoo > Fix For: 6.1.0 > > > When connecting to an origin there can be 1MB of iobuffers allocated. This > happens under TLS and non-TLS. Seems like it happens when the origin doesn't > supply a content-length. More investigation is needed. > Configuration: > {code} > [bcall@homer trafficserver]$ tail -1 /usr/local/etc/trafficserver/remap.config > map / https://www.flickr.com > {code} > Client: > {code} > [bcall@homer trafficserver]$ curl -D - -k https://127.0.0.1:4443/ > [bcall@homer trafficserver]$ sudo kill -SIGUSR1 $(pidof traffic_server){code} > Server: > {code} > allocated |in-use | type size | free list name > |||-- > 1048576 | 0 | 32768 | > memory/ioBufAllocator[8] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TS-3898) Connection to the origin can allocate 1MB of iobuffers
[ https://issues.apache.org/jira/browse/TS-3898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susan Hinrichs updated TS-3898: --- Assignee: kang li (was: Susan Hinrichs) > Connection to the origin can allocate 1MB of iobuffers > -- > > Key: TS-3898 > URL: https://issues.apache.org/jira/browse/TS-3898 > Project: Traffic Server > Issue Type: Improvement > Components: HTTP >Affects Versions: 5.3.0, 6.0.0 >Reporter: Bryan Call >Assignee: kang li > Labels: yahoo > Fix For: 6.1.0 > > > When connecting to an origin there can be 1MB of iobuffers allocated. This > happens under TLS and non-TLS. Seems like it happens when the origin doesn't > supply a content-length. More investigation is needed. > Configuration: > {code} > [bcall@homer trafficserver]$ tail -1 /usr/local/etc/trafficserver/remap.config > map / https://www.flickr.com > {code} > Client: > {code} > [bcall@homer trafficserver]$ curl -D - -k https://127.0.0.1:4443/ > [bcall@homer trafficserver]$ sudo kill -SIGUSR1 $(pidof traffic_server){code} > Server: > {code} > allocated |in-use | type size | free list name > |||-- > 1048576 | 0 | 32768 | > memory/ioBufAllocator[8] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TS-3909) SSLNextProtocolTrampoline heap-use-after-free
[ https://issues.apache.org/jira/browse/TS-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susan Hinrichs updated TS-3909: --- Attachment: ts-3909.diff The patch in ts-3909.diff has been useful in reducing (eliminating?) the crash resulting from this ASAN in production. As I recall we tried a similar patch for ts-3710, but it did not eliminate the ASAN. > SSLNextProtocolTrampoline heap-use-after-free > - > > Key: TS-3909 > URL: https://issues.apache.org/jira/browse/TS-3909 > Project: Traffic Server > Issue Type: Bug > Components: SSL >Affects Versions: 6.0.0 >Reporter: Bryan Call >Assignee: Susan Hinrichs > Fix For: 6.0.0 > > Attachments: ts-3909.diff > > > {code} > ==6232==ERROR: AddressSanitizer: heap-use-after-free on address > 0x606000538880 at pc 0x9c851c bp 0x2ac88a2d4880 sp 0x2ac88a2d4878 > READ of size 8 at 0x606000538880 thread T24 ([ET_NET 23]) > #0 0x9c851b in SSLNextProtocolTrampoline::ioCompletionEvent(int, void*) > /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/SSLNextProtocolAccept.cc:108 > #1 0x531046 in Continuation::handleEvent(int, void*) > /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/eventsystem/I_Continuation.h:146 > #2 0x9f4040 in read_signal_and_update > /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixNetVConnection.cc:145 > #3 0x9f46f4 in read_signal_done > /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixNetVConnection.cc:206 > #4 0x9fa8a1 in UnixNetVConnection::readSignalDone(int, NetHandler*) > /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixNetVConnection.cc:1006 > #5 0x9bdd96 in SSLNetVConnection::net_read_io(NetHandler*, EThread*) > /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/SSLNetVConnection.cc:542 > #6 0x9e1a02 in NetHandler::mainNetEvent(int, Event*) > /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixNet.cc:516 > #7 0x531046 in Continuation::handleEvent(int, void*) > 
/home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/eventsystem/I_Continuation.h:146 > #8 0xa405e4 in EThread::process_event(Event*, int) > /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/eventsystem/UnixEThread.cc:128 > #9 0xa411fc in EThread::execute() > /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/eventsystem/UnixEThread.cc:252 > #10 0xa3ebbd in spawn_thread_internal > /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/eventsystem/Thread.cc:86 > #11 0x2ac87d9badf4 in start_thread (/lib64/libpthread.so.0+0x7df4) > #12 0x2ac87e74b1ac in __clone (/lib64/libc.so.6+0xf61ac) > 0x606000538880 is located 0 bytes inside of 56-byte region > [0x606000538880,0x6060005388b8) > freed by thread T24 ([ET_NET 23]) here: > #0 0x2ac87acd6127 in operator delete(void*) > ../../.././libsanitizer/asan/asan_new_delete.cc:81 > #1 0x9c8613 in SSLNextProtocolTrampoline::~SSLNextProtocolTrampoline() > /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/SSLNextProtocolAccept.cc:66 > #2 0x9c83ea in SSLNextProtocolTrampoline::ioCompletionEvent(int, void*) > /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/SSLNextProtocolAccept.cc:89 > #3 0x531046 in Continuation::handleEvent(int, void*) > /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/eventsystem/I_Continuation.h:146 > #4 0x9f4040 in read_signal_and_update > /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixNetVConnection.cc:145 > #5 0x9fbe75 in UnixNetVConnection::mainEvent(int, Event*) > /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixNetVConnection.cc:1175 > #6 0x531046 in Continuation::handleEvent(int, void*) > /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/eventsystem/I_Continuation.h:146 > #7 0x9e35e4 in NetHandler::_close_vc(UnixNetVConnection*, long, int&, > int&, int&, int&) > /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixNet.cc:678 > #8 0x9e2c01 in NetHandler::manage_keep_alive_queue() > 
/home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixNet.cc:634 > #9 0x9e3882 in NetHandler::add_to_keep_alive_queue(UnixNetVConnection*) > /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixNet.cc:699 > #10 0x9ddb48 in UnixNetVConnection::add_to_keep_alive_queue() > /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixConnection.cc:397 > #11 0x759044 in SpdyClientSession::init(NetVConnection*) > /home/bcall/ytrafficserver-6.0.x/trafficserver/proxy/spdy/SpdyClientSession.cc:116 > #12 0x7598da in SpdyClientSession::new_connection(NetVConnection*, > MIOBuffer*, IOBufferReader*, bool) > /home/bcall/ytrafficserver-6.0.x/trafficserver/proxy/spdy/SpdyClientSession.cc:193 > #13 0x7582dc in SpdySessionAccept::mainEvent(int, void*) >
[jira] [Commented] (TS-3905) proxy.config.http.keep_alive_no_activity_timeout_out is not used
[ https://issues.apache.org/jira/browse/TS-3905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14741497#comment-14741497 ] Susan Hinrichs commented on TS-3905: Actually as I look into get_inactivity_timeout some more, I question the validity of the code above in all cases. The three instances of that line in HttpSessionManager.cc should be reviewed. > proxy.config.http.keep_alive_no_activity_timeout_out is not used > > > Key: TS-3905 > URL: https://issues.apache.org/jira/browse/TS-3905 > Project: Traffic Server > Issue Type: Bug > Components: HTTP >Reporter: Susan Hinrichs >Assignee: Susan Hinrichs > > The keep_alive_no_activity_timeout_in is set correctly on the > HttpClientSession when the transaction releases it. The client session is > then hanging out until the next transaction appears, and the > keep_alive_no_activity_timeout_in should apply instead of the > transaction_no_activity_timeout_in. > For the server session side, the keep_alive_no_activity_timeout_out and > transaction_no_activity_timeout_out should apply. The > keep_alive_no_activity_timeout_out does get set correctly when the server > session is attached to the client session to timeout via the > HttpClientSession::attach_server_session_method(). > But in ServerSessionPool::releaseSession, the following is called > {code} > ss->get_netvc()->set_inactivity_timeout(ss->get_netvc()->get_inactivity_timeout()); > {code} > My reading is that this will reset the inactivity timeout of the server > session to whatever it was last set to. Instead it should set the inactivity > timeout to keep_alive_no_activity_timeout_out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TS-3905) proxy.config.http.keep_alive_no_activity_timeout_out is not used
Susan Hinrichs created TS-3905: -- Summary: proxy.config.http.keep_alive_no_activity_timeout_out is not used Key: TS-3905 URL: https://issues.apache.org/jira/browse/TS-3905 Project: Traffic Server Issue Type: Bug Components: HTTP Reporter: Susan Hinrichs The keep_alive_no_activity_timeout_in is set correctly on the HttpClientSession when the transaction releases it. The client session is then hanging out until the next transaction appears, and the keep_alive_no_activity_timeout_in should apply instead of the transaction_no_activity_timeout_in. For the server session side, the keep_alive_no_activity_timeout_out and transaction_no_activity_timeout_out should apply. The keep_alive_no_activity_timeout_out does get set correctly when the server session is attached to the client session to timeout via the HttpClientSession::attach_server_session_method(). But in ServerSessionPool::releaseSession, the following is called {code} ss->get_netvc()->set_inactivity_timeout(ss->get_netvc()->get_inactivity_timeout()); {code} My reading is that this will reset the inactivity timeout of the server session to whatever it was last set to. Instead it should set the inactivity timeout to keep_alive_no_activity_timeout_out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (TS-3905) proxy.config.http.keep_alive_no_activity_timeout_out is not used
[ https://issues.apache.org/jira/browse/TS-3905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susan Hinrichs reassigned TS-3905: -- Assignee: Susan Hinrichs > proxy.config.http.keep_alive_no_activity_timeout_out is not used > > > Key: TS-3905 > URL: https://issues.apache.org/jira/browse/TS-3905 > Project: Traffic Server > Issue Type: Bug > Components: HTTP >Reporter: Susan Hinrichs >Assignee: Susan Hinrichs > > The keep_alive_no_activity_timeout_in is set correctly on the > HttpClientSession when the transaction releases it. The client session is > then hanging out until the next transaction appears, and the > keep_alive_no_activity_timeout_in should apply instead of the > transaction_no_activity_timeout_in. > For the server session side, the keep_alive_no_activity_timeout_out and > transaction_no_activity_timeout_out should apply. The > keep_alive_no_activity_timeout_out does get set correctly when the server > session is attached to the client session to timeout via the > HttpClientSession::attach_server_session_method(). > But in ServerSessionPool::releaseSession, the following is called > {code} > ss->get_netvc()->set_inactivity_timeout(ss->get_netvc()->get_inactivity_timeout()); > {code} > My reading is that this will reset the inactivity timeout of the server > session to whatever it was last set to. Instead it should set the inactivity > timeout to keep_alive_no_activity_timeout_out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-3910) SSLNetVConnection and add_to_active_queue heap-use-after-free
[ https://issues.apache.org/jira/browse/TS-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14744385#comment-14744385 ] Susan Hinrichs commented on TS-3910: The vc is freed from read_signal_and_update by calling close_UnixNetVConnection directly. Should probably call vc->do_io_close to clear vio's. But I think that would only delay the problem. The use stack is interesting. The vc first referenced in frame 10 is not the freed vc. Rather the HttpClientSession::client_vc is the freed vc. I cannot immediately see how vc != HttpClientSession::client_vc when the HttpClientSession was stored as the _cont for the read_vio associated with vc. But that must be what is happening. > SSLNetVConnection and add_to_active_queue heap-use-after-free > - > > Key: TS-3910 > URL: https://issues.apache.org/jira/browse/TS-3910 > Project: Traffic Server > Issue Type: Bug > Components: Network, SSL >Affects Versions: 6.0.0 >Reporter: Bryan Call > Fix For: 6.0.0 > > > {code} > ==15615==ERROR: AddressSanitizer: heap-use-after-free on address > 0x618000be6288 at pc 0x9e756d bp 0x2b14e4f317d0 sp 0x2b14e4f317c8 > WRITE of size 8 at 0x618000be6288 thread T6 ([ET_NET 5]) > #0 0x9e756c in DLLUnixNetVConnection::Link_active_queue_link>::insert(UnixNetVConnection*, > UnixNetVConnection*) (/home/y/bin64/traffic_server+0x9e756c) > #1 0x9e6b98 in Queue UnixNetVConnection::Link_active_queue_link>::insert(UnixNetVConnection*, > UnixNetVConnection*) (/home/y/bin64/traffic_server+0x9e6b98) > #2 0x9e5fe2 in Queue UnixNetVConnection::Link_active_queue_link>::enqueue(UnixNetVConnection*) > (/home/y/bin64/traffic_server+0x9e5fe2) > #3 0x9e3cc8 in NetHandler::add_to_active_queue(UnixNetVConnection*) > /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixNet.cc:733 > #4 0x9ddbe8 in UnixNetVConnection::add_to_active_queue() > /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixConnection.cc:409 > #5 0x64b34c in HttpClientSession::new_transaction() > 
/home/bcall/ytrafficserver-6.0.x/trafficserver/proxy/http/HttpClientSession.cc:124 > #6 0x64e27d in HttpClientSession::state_keep_alive(int, void*) > /home/bcall/ytrafficserver-6.0.x/trafficserver/proxy/http/HttpClientSession.cc:415 > #7 0x531046 in Continuation::handleEvent(int, void*) > /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/eventsystem/I_Continuation.h:146 > #8 0x9f4040 in read_signal_and_update > /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixNetVConnection.cc:145 > #9 0x9fa8c3 in UnixNetVConnection::readSignalAndUpdate(int) > /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixNetVConnection.cc:1013 > #10 0x9be342 in SSLNetVConnection::net_read_io(NetHandler*, EThread*) > /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/SSLNetVConnection.cc:605 > #11 0x9e1a02 in NetHandler::mainNetEvent(int, Event*) > /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixNet.cc:516 > #12 0x531046 in Continuation::handleEvent(int, void*) > /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/eventsystem/I_Continuation.h:146 > #13 0xa405e4 in EThread::process_event(Event*, int) > /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/eventsystem/UnixEThread.cc:128 > #14 0xa411fc in EThread::execute() > /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/eventsystem/UnixEThread.cc:252 > #15 0xa3ebbd in spawn_thread_internal > /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/eventsystem/Thread.cc:86 > #16 0x2b14dce95df4 in start_thread (/lib64/libpthread.so.0+0x7df4) > #17 0x2b14ddc261ac in __clone (/lib64/libc.so.6+0xf61ac) > 0x618000be6288 is located 520 bytes inside of 880-byte region > [0x618000be6080,0x618000be63f0) > freed by thread T6 ([ET_NET 5]) here: > #0 0x2b14da1b01d7 in __interceptor_free > ../../.././libsanitizer/asan/asan_malloc_linux.cc:62 > #1 0x2b14db0ab3b2 in ats_memalign_free > /home/bcall/ytrafficserver-6.0.x/trafficserver/lib/ts/ink_memory.cc:139 > #2 0x2b14db0abf60 in ink_freelist_free > 
/home/bcall/ytrafficserver-6.0.x/trafficserver/lib/ts/ink_queue.cc:292 > #3 0x9c7226 in > ClassAllocator::free(SSLNetVConnection*) > (/home/y/bin64/traffic_server+0x9c7226) > #4 0x9c1a72 in SSLNetVConnection::free(EThread*) > /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/SSLNetVConnection.cc:936 > #5 0x9f3f81 in close_UnixNetVConnection(UnixNetVConnection*, EThread*) > /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixNetVConnection.cc:134 > #6 0x9f42f6 in read_signal_and_update > /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixNetVConnection.cc:164 > #7 0x9f46f4 in
[jira] [Commented] (TS-3910) SSLNetVConnection and add_to_active_queue heap-use-after-free
[ https://issues.apache.org/jira/browse/TS-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745635#comment-14745635 ] Susan Hinrichs commented on TS-3910: One path that would allow close_UnixNetVConnection to free the VC but not take the VC out of the active_queue is if nh was NULL during the call to close_UnixNetVConnection. Off-hand, I don't see how that could happen. If the client_vc is being closed from the keep-alive pool, that implies it was fully set up. It might be useful to put a release assert of nh != NULL into the code and see where this occurs. Or add some logic to the assert to check whether the active_queue_link is set or not. > SSLNetVConnection and add_to_active_queue heap-use-after-free > - > > Key: TS-3910 > URL: https://issues.apache.org/jira/browse/TS-3910 > Project: Traffic Server > Issue Type: Bug > Components: Network, SSL >Affects Versions: 6.0.0 >Reporter: Bryan Call > Fix For: 6.0.0 > > > {code} > ==15615==ERROR: AddressSanitizer: heap-use-after-free on address > 0x618000be6288 at pc 0x9e756d bp 0x2b14e4f317d0 sp 0x2b14e4f317c8 > WRITE of size 8 at 0x618000be6288 thread T6 ([ET_NET 5]) > #0 0x9e756c in DLLUnixNetVConnection::Link_active_queue_link>::insert(UnixNetVConnection*, > UnixNetVConnection*) (/home/y/bin64/traffic_server+0x9e756c) > #1 0x9e6b98 in Queue UnixNetVConnection::Link_active_queue_link>::insert(UnixNetVConnection*, > UnixNetVConnection*) (/home/y/bin64/traffic_server+0x9e6b98) > #2 0x9e5fe2 in Queue UnixNetVConnection::Link_active_queue_link>::enqueue(UnixNetVConnection*) > (/home/y/bin64/traffic_server+0x9e5fe2) > #3 0x9e3cc8 in NetHandler::add_to_active_queue(UnixNetVConnection*) > /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixNet.cc:733 > #4 0x9ddbe8 in UnixNetVConnection::add_to_active_queue() > /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixConnection.cc:409 > #5 0x64b34c in HttpClientSession::new_transaction() > 
/home/bcall/ytrafficserver-6.0.x/trafficserver/proxy/http/HttpClientSession.cc:124 > #6 0x64e27d in HttpClientSession::state_keep_alive(int, void*) > /home/bcall/ytrafficserver-6.0.x/trafficserver/proxy/http/HttpClientSession.cc:415 > #7 0x531046 in Continuation::handleEvent(int, void*) > /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/eventsystem/I_Continuation.h:146 > #8 0x9f4040 in read_signal_and_update > /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixNetVConnection.cc:145 > #9 0x9fa8c3 in UnixNetVConnection::readSignalAndUpdate(int) > /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixNetVConnection.cc:1013 > #10 0x9be342 in SSLNetVConnection::net_read_io(NetHandler*, EThread*) > /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/SSLNetVConnection.cc:605 > #11 0x9e1a02 in NetHandler::mainNetEvent(int, Event*) > /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixNet.cc:516 > #12 0x531046 in Continuation::handleEvent(int, void*) > /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/eventsystem/I_Continuation.h:146 > #13 0xa405e4 in EThread::process_event(Event*, int) > /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/eventsystem/UnixEThread.cc:128 > #14 0xa411fc in EThread::execute() > /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/eventsystem/UnixEThread.cc:252 > #15 0xa3ebbd in spawn_thread_internal > /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/eventsystem/Thread.cc:86 > #16 0x2b14dce95df4 in start_thread (/lib64/libpthread.so.0+0x7df4) > #17 0x2b14ddc261ac in __clone (/lib64/libc.so.6+0xf61ac) > 0x618000be6288 is located 520 bytes inside of 880-byte region > [0x618000be6080,0x618000be63f0) > freed by thread T6 ([ET_NET 5]) here: > #0 0x2b14da1b01d7 in __interceptor_free > ../../.././libsanitizer/asan/asan_malloc_linux.cc:62 > #1 0x2b14db0ab3b2 in ats_memalign_free > /home/bcall/ytrafficserver-6.0.x/trafficserver/lib/ts/ink_memory.cc:139 > #2 0x2b14db0abf60 in ink_freelist_free > 
/home/bcall/ytrafficserver-6.0.x/trafficserver/lib/ts/ink_queue.cc:292 > #3 0x9c7226 in > ClassAllocator::free(SSLNetVConnection*) > (/home/y/bin64/traffic_server+0x9c7226) > #4 0x9c1a72 in SSLNetVConnection::free(EThread*) > /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/SSLNetVConnection.cc:936 > #5 0x9f3f81 in close_UnixNetVConnection(UnixNetVConnection*, EThread*) > /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixNetVConnection.cc:134 > #6 0x9f42f6 in read_signal_and_update > /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixNetVConnection.cc:164 > #7 0x9f46f4 in read_signal_done >
[jira] [Updated] (TS-3072) Debug logging for a single connection in production traffic.
[ https://issues.apache.org/jira/browse/TS-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susan Hinrichs updated TS-3072: --- Attachment: ts-3072.diff Re-activating the discussion. We recently deployed what [~sudheerv] suggested in production while tracking yet another tedious user-specific crash. I've attached the patch in ts-3072.diff. It is a surprisingly small code change. We changed how debug.enabled is interpreted to minimize the performance impact if one is not using the debug.client_ip feature. The client-ip value is only tested if debug.enabled is set to 2. Regular full debugging happens with debug.enabled set to 1. Nothing is checked if debug.enabled is set to 0. It was incredibly useful while tracking down our most recent fire. We didn't have to anticipate the need for a plugin. We were able to change the client_ip setting without restarting ATS. [~amc] has ideas for generalizing this technique to "taint" VC's for other more detailed tracking/debugging/monitoring. > Debug logging for a single connection in production traffic. > > > Key: TS-3072 > URL: https://issues.apache.org/jira/browse/TS-3072 > Project: Traffic Server > Issue Type: Improvement > Components: Core, Logging >Affects Versions: 5.0.1 >Reporter: Sudheer Vinukonda > Labels: Yahoo > Fix For: sometime > > Attachments: ts-3072.diff > > > Presently, when there's a production issue (e.g. TS-3049, TS-2983 etc), it is > really hard to isolate/debug with the high traffic. Turning on debug logs in > traffic is unfortunately not an option due to performance impacts. Even if > you took a performance hit and turned on the logs, it is just as hard to > separate out the logs for a single connection/transaction among the millions > of the logs output in a short period of time. > I think it would be good if there's a way to turn on debug logs in a > controlled manner in production environment. 
One simple option is to support > a config setting for example, with a client-ip, which when set, would turn on > debug logs for any connection made by just that one client. If needed, > instead of one client-ip, we may allow configuring up to 'n' (say, 5) > client-ips. > If there are other ideas, please comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TS-3957) Core dump from SpdyClientSession::state_session_start
[ https://issues.apache.org/jira/browse/TS-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susan Hinrichs updated TS-3957: --- Labels: yahoo (was: ) > Core dump from SpdyClientSession::state_session_start > - > > Key: TS-3957 > URL: https://issues.apache.org/jira/browse/TS-3957 > Project: Traffic Server > Issue Type: Bug > Components: SPDY >Reporter: Susan Hinrichs >Assignee: Susan Hinrichs > Labels: yahoo > > We see this in production on machines under swap, so the timings are very > distorted. > {code} > gdb) bt > #0 0x in ?? () > #1 0x0064a5dc in SpdyClientSession::state_session_start > (this=0x2b234fbe8030) > at SpdyClientSession.cc:211 > #2 0x00510e34 in Continuation::handleEvent (this=0x2b234fbe8030, > event=1, > data=0x2b23eda76630) at ../iocore/eventsystem/I_Continuation.h:145 > #3 0x0079a066 in EThread::process_event (this=0x2b21170a2010, > e=0x2b23eda76630, > calling_code=1) at UnixEThread.cc:128 > #4 0x0079a234 in EThread::execute (this=0x2b21170a2010) at > UnixEThread.cc:179 > #5 0x00799611 in spawn_thread_internal (a=0x12226a0) at Thread.cc:85 > #6 0x2b21153e19d1 in start_thread () from /lib64/libpthread.so.0 > #7 0x003827ee88fd in clone () from /lib64/libc.so.6 > {code} > After poking around on the core some more [~amc] and I determined that the vc > referenced by the SpdyClientSession was a freed object (the vtable pointer > was swizzled out to be the freelist next pointer). > We assume that the swapping is causing very odd event timing. We replaced > the schedule_immediate with a direct call that that seemed to solve our crash > in production. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TS-3957) Core dump from SpdyClientSession::state_session_start
Susan Hinrichs created TS-3957: -- Summary: Core dump from SpdyClientSession::state_session_start Key: TS-3957 URL: https://issues.apache.org/jira/browse/TS-3957 Project: Traffic Server Issue Type: Bug Components: SPDY Reporter: Susan Hinrichs We see this in production on machines under swap, so the timings are very distorted. {code} (gdb) bt #0 0x in ?? () #1 0x0064a5dc in SpdyClientSession::state_session_start (this=0x2b234fbe8030) at SpdyClientSession.cc:211 #2 0x00510e34 in Continuation::handleEvent (this=0x2b234fbe8030, event=1, data=0x2b23eda76630) at ../iocore/eventsystem/I_Continuation.h:145 #3 0x0079a066 in EThread::process_event (this=0x2b21170a2010, e=0x2b23eda76630, calling_code=1) at UnixEThread.cc:128 #4 0x0079a234 in EThread::execute (this=0x2b21170a2010) at UnixEThread.cc:179 #5 0x00799611 in spawn_thread_internal (a=0x12226a0) at Thread.cc:85 #6 0x2b21153e19d1 in start_thread () from /lib64/libpthread.so.0 #7 0x003827ee88fd in clone () from /lib64/libc.so.6 {code} After poking around on the core some more [~amc] and I determined that the vc referenced by the SpdyClientSession was a freed object (the vtable pointer was swizzled out to be the freelist next pointer). We assume that the swapping is causing very odd event timing. We replaced the schedule_immediate with a direct call, and that seemed to solve our crash in production. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (TS-3957) Core dump from SpdyClientSession::state_session_start
[ https://issues.apache.org/jira/browse/TS-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susan Hinrichs reassigned TS-3957: -- Assignee: Susan Hinrichs > Core dump from SpdyClientSession::state_session_start > - > > Key: TS-3957 > URL: https://issues.apache.org/jira/browse/TS-3957 > Project: Traffic Server > Issue Type: Bug > Components: SPDY >Reporter: Susan Hinrichs >Assignee: Susan Hinrichs > Labels: yahoo > > We see this in production on machines under swap, so the timings are very > distorted. > {code} > gdb) bt > #0 0x in ?? () > #1 0x0064a5dc in SpdyClientSession::state_session_start > (this=0x2b234fbe8030) > at SpdyClientSession.cc:211 > #2 0x00510e34 in Continuation::handleEvent (this=0x2b234fbe8030, > event=1, > data=0x2b23eda76630) at ../iocore/eventsystem/I_Continuation.h:145 > #3 0x0079a066 in EThread::process_event (this=0x2b21170a2010, > e=0x2b23eda76630, > calling_code=1) at UnixEThread.cc:128 > #4 0x0079a234 in EThread::execute (this=0x2b21170a2010) at > UnixEThread.cc:179 > #5 0x00799611 in spawn_thread_internal (a=0x12226a0) at Thread.cc:85 > #6 0x2b21153e19d1 in start_thread () from /lib64/libpthread.so.0 > #7 0x003827ee88fd in clone () from /lib64/libc.so.6 > {code} > After poking around on the core some more [~amc] and I determined that the vc > referenced by the SpdyClientSession was a freed object (the vtable pointer > was swizzled out to be the freelist next pointer). > We assume that the swapping is causing very odd event timing. We replaced > the schedule_immediate with a direct call that that seemed to solve our crash > in production. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-3072) Debug logging for a single connection in production traffic.
[ https://issues.apache.org/jira/browse/TS-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945245#comment-14945245 ] Susan Hinrichs commented on TS-3072: Only ad hoc performance comparison so far. I'll run a sequence of tests on the stress test box. > Debug logging for a single connection in production traffic. > > > Key: TS-3072 > URL: https://issues.apache.org/jira/browse/TS-3072 > Project: Traffic Server > Issue Type: Improvement > Components: Core, Logging >Affects Versions: 5.0.1 >Reporter: Sudheer Vinukonda > Labels: Yahoo > Fix For: sometime > > Attachments: ts-3072.diff > > > Presently, when there's a production issue (e.g. TS-3049, TS-2983 etc), it is > really hard to isolate/debug with the high traffic. Turning on debug logs in > traffic is unfortunately not an option due to performance impacts. Even if > you took a performance hit and turned on the logs, it is just as hard to > separate out the logs for a single connection/transaction among the millions > of the logs output in a short period of time. > I think it would be good if there's a way to turn on debug logs in a > controlled manner in production environment. One simple option is to support > a config setting for example, with a client-ip, which when set, would turn on > debug logs for any connection made by just that one client. If needed, > instead of one client-ip, we may allow configuring up to 'n' (say, 5) > client-ips. > If there are other ideas, please comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TS-315) Add switch to disable config file generation/runtime behavior changing
[ https://issues.apache.org/jira/browse/TS-315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susan Hinrichs updated TS-315: -- Labels: A yahoo (was: A) > Add switch to disable config file generation/runtime behavior changing > -- > > Key: TS-315 > URL: https://issues.apache.org/jira/browse/TS-315 > Project: Traffic Server > Issue Type: Improvement > Components: Configuration >Reporter: Miles Libbey >Assignee: Bryan Call >Priority: Minor > Labels: A, yahoo > Fix For: sometime > > > (was yahoo bug 1863676) > Original description > by Michael S. Fischer 2 years ago at 2008-04-09 09:52 > In production, in order to improve site stability, it is imperative that TS > never accidentally overwrite its own > configuration files. > For this reason, we'd like to request a switch be added to TS, preferably via > the command line, that disables all > automatic configuration file generation or other runtime behavioral changes > initiated by any form of IPC other than > 'traffic_line -x' (including the web interface, etc.) > > > Comment 1 > by Bjornar Sandvik 2 years ago at 2008-04-09 09:57:17 > A very crucial request, in my opinion. If TS needs to be able to read > command-line config changes on the fly, these > changes should be stored in another config file (for example > remap.config.local instead of remap.config). We have a > patch config package that overwrites 4 of the config files under > /home/conf/ts/, and with all packages > we'd like to think that the content of these files can't change outside our > control. > > Comment 2 > by Bryan Call 2 years ago at 2008-04-09 11:02:46 > traffic_line -x doesn't modify the configuration, it reloads the > configuration files. If we want to have an option for > this it would be good to have it as an option configuration file (CONFIG > proxy.config.write_protect INT 1). > It would be an equivalent of write protecting floppies (ahh the memories)... > > > Comment 3 > by Michael S. 
Fischer 2 years ago at 2008-04-09 11:09:09 > I don't think it would be a good idea to have this in the configuration file, > as it would introduce a chicken/egg > problem. > > > Comment 4 > by Leif Hedstrom 19 months ago at 2008-08-27 12:43:17 > So I'm not 100% positive that this isn't just a bad interaction. Now, it's > only > triggered when trafficserver is running, but usually what ends up happening > is that we get a records.config which > looks like it's the default config that comes with the trafficserver package. > It's possible it's all one and the same issue, or we might have two issues. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)