[jira] [Created] (TS-3784) Unpleasant debug assert when starting up a SpdyClientSession

2015-07-21 Thread Susan Hinrichs (JIRA)
Susan Hinrichs created TS-3784:
--

 Summary: Unpleasant debug assert when starting up a SpdyClientSession
 Key: TS-3784
 URL: https://issues.apache.org/jira/browse/TS-3784
 Project: Traffic Server
  Issue Type: Bug
  Components: SPDY
Reporter: Susan Hinrichs


Noticed this while trying to reproduce [~oknet]'s issue on TS-3667.

I have a callback set on the SNI hook.  It selects a new certificate and 
reenables the vc before returning.  The stack is below.  The assert is because 
the current thread does not hold the read.vio mutex.  In fact no thread holds 
the read vio mutex.

HttpClientSession and Http2ClientSession use the VC's mutex when setting up 
the VIOs, so the mutex is already held when the do_io_read calls occur. 

If I change SpdyClientSession to use the VC mutex instead of creating a new 
mutex, this assert does not get triggered.  Not clear whether this is causing 
any real issues, but it seems cleaner to follow the mutex assignment strategy 
of the other protocols.
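
For reference, the check that fires is the thread-holding assertion in 
UnixNetVConnection::set_enabled (frame #6 in the stack below).  A minimal 
sketch of the idea, with simplified stand-ins rather than the real ATS 
classes: a session that allocates its own mutex leaves the read VIO guarded 
by a lock nobody holds on this path, while sharing the netvc's mutex means 
the lock is already held when do_io_read() reenables the VIO.

{code}
// Hedged sketch with simplified stand-ins; these are not the real ATS classes.
#include <cassert>
#include <memory>
#include <thread>

struct ProxyMutex {
  std::thread::id thread_holding{}; // which thread currently holds the lock
};

struct VIO {
  std::shared_ptr<ProxyMutex> mutex; // copied from the continuation in do_io_read()
};

// Paraphrase of the check in UnixNetVConnection::set_enabled (frame #6 below).
void set_enabled(const VIO &vio) {
  assert(vio.mutex->thread_holding == std::this_thread::get_id());
}

int main() {
  auto vc_mutex = std::make_shared<ProxyMutex>();
  vc_mutex->thread_holding = std::this_thread::get_id(); // net thread holds the VC lock

  // SpdyClientSession today: a brand-new mutex that no thread has locked yet.
  VIO spdy_vio{std::make_shared<ProxyMutex>()};
  (void)spdy_vio; // set_enabled(spdy_vio) would trip the assert

  // Sharing the VC's mutex into the session: the lock is already held here.
  VIO shared_vio{vc_mutex};
  set_enabled(shared_vio); // passes
  return 0;
}
{code}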

Here is the stack
{code}
#0  0x00351e4328a5 in raise () from /lib64/libc.so.6
#1  0x00351e434085 in abort () from /lib64/libc.so.6
#2  0x77dda215 in ink_die_die_die () at ink_error.cc:43
#3  0x77dda2cc in ink_fatal_va(const char *, typedef __va_list_tag __va_list_tag *) (
    fmt=0x77deb298 "%s:%d: failed assert `%s`", ap=0x7fffef4a8530) at ink_error.cc:65
#4  0x77dda391 in ink_fatal (message_format=0x77deb298 "%s:%d: failed assert `%s`")
    at ink_error.cc:73
#5  0x77dd7f12 in _ink_assert (
    expression=0x826e48 "vio->mutex->thread_holding == this_ethread() && thread",
    file=0x826a9e "UnixNetVConnection.cc", line=895) at ink_assert.cc:37
#6  0x0077b4d7 in UnixNetVConnection::set_enabled (this=0x7fffb801c540, 
vio=0x7fffb801c660)
at UnixNetVConnection.cc:895
#7  0x0077ab94 in UnixNetVConnection::reenable (this=0x7fffb801c540, 
vio=0x7fffb801c660)
at UnixNetVConnection.cc:788
#8  0x00509755 in VIO::reenable (this=0x7fffb801c660) at 
../iocore/eventsystem/P_VIO.h:112
#9  0x0077a1da in UnixNetVConnection::do_io_read (this=0x7fffb801c540, 
c=0x7fffd402e3c0, 
nbytes=9223372036854775807, buf=0x16f9b30) at UnixNetVConnection.cc:628
#10 0x006393c1 in SpdyClientSession::start (this=0x7fffd402e3c0) at 
SpdyClientSession.cc:210
#11 0x0054a1fa in ProxyClientSession::handle_api_return 
(this=0x7fffd402e3c0, event=6)
at ProxyClientSession.cc:167
#12 0x0054a142 in ProxyClientSession::do_api_callout 
(this=0x7fffd402e3c0, 
id=TS_HTTP_SSN_START_HOOK) at ProxyClientSession.cc:147
#13 0x00639303 in SpdyClientSession::new_connection 
(this=0x7fffd402e3c0, 
new_vc=0x7fffb801c540, iobuf=0x0, reader=0x0, backdoor=false) at 
SpdyClientSession.cc:195
#14 0x0063878e in SpdySessionAccept::mainEvent (this=0x16bc7a0, 
event=202, 
edata=0x7fffb801c540) at SpdySessionAccept.cc:48
#15 0x0050970e in Continuation::handleEvent (this=0x16bc7a0, event=202, 
data=0x7fffb801c540)
at ../iocore/eventsystem/I_Continuation.h:146
#16 0x00763404 in send_plugin_event (plugin=0x16bc7a0, event=202, 
edata=0x7fffb801c540)
at SSLNextProtocolAccept.cc:32
#17 0x00763b89 in SSLNextProtocolTrampoline::ioCompletionEvent 
(this=0x7fffd40008e0, 
event=102, edata=0x7fffb801c660) at SSLNextProtocolAccept.cc:99
#18 0x0050970e in Continuation::handleEvent (this=0x7fffd40008e0, 
event=102, 
data=0x7fffb801c660) at ../iocore/eventsystem/I_Continuation.h:146
#19 0x0077871e in read_signal_and_update (event=102, vc=0x7fffb801c540)
at UnixNetVConnection.cc:145
#20 0x00778abe in read_signal_done (event=102, nh=0x7fffef9b2be0, 
vc=0x7fffb801c540)
at UnixNetVConnection.cc:206
#21 0x0077baac in UnixNetVConnection::readSignalDone 
(this=0x7fffb801c540, event=102, 
nh=0x7fffef9b2be0) at UnixNetVConnection.cc:1006
#22 0x0075e559 in SSLNetVConnection::net_read_io (this=0x7fffb801c540, 
nh=0x7fffef9b2be0, 
lthread=0x7fffef9af010) at SSLNetVConnection.cc:543
#23 0x00770b52 in NetHandler::mainNetEvent (this=0x7fffef9b2be0, 
event=5, e=0x1153690)
at UnixNet.cc:516
#24 0x0050970e in Continuation::handleEvent (this=0x7fffef9b2be0, 
event=5, data=0x1153690)
at ../iocore/eventsystem/I_Continuation.h:146
#25 0x0079aefa in EThread::process_event (this=0x7fffef9af010, 
e=0x1153690, calling_code=5)
at UnixEThread.cc:128
#26 0x0079b51b in EThread::execute (this=0x7fffef9af010) at 
UnixEThread.cc:252
#27 0x0079a414 in spawn_thread_internal (a=0x1532c60) at Thread.cc:86
#28 0x00351e807851 in start_thread () from /lib64/libpthread.so.0
#29 0x00351e4e890d in clone () from /lib64/libc.so.6
{code}




[jira] [Updated] (TS-3784) Unpleasant debug assert when starting up a SpdyClientSession

2015-07-21 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-3784:
---
Description: 
Noticed this while trying to reproduce [~oknet]'s issue on TS-3667.

I have a callback set on the SNI hook.  It selects a new certificate and 
reenables the vc before returning.  The stack is below.  The assert is because 
the current thread does not hold the read.vio mutex.  In fact no thread holds 
the read vio mutex.

HttpClientSession and Http2ClientSession use the VC's mutex instead of 
creating a new one.  That shared mutex is used when setting up the VIOs, so 
the mutex is already held when the do_io_read calls occur. 

If I change SpdyClientSession to use the VC mutex instead of creating a new 
mutex, this assert does not get triggered.  Not clear whether this is causing 
any real issues, but it seems cleaner to follow the mutex assignment strategy 
of the other protocols.

Here is the stack
{code}
#0  0x00351e4328a5 in raise () from /lib64/libc.so.6
#1  0x00351e434085 in abort () from /lib64/libc.so.6
#2  0x77dda215 in ink_die_die_die () at ink_error.cc:43
#3  0x77dda2cc in ink_fatal_va(const char *, typedef __va_list_tag __va_list_tag *) (
    fmt=0x77deb298 "%s:%d: failed assert `%s`", ap=0x7fffef4a8530) at ink_error.cc:65
#4  0x77dda391 in ink_fatal (message_format=0x77deb298 "%s:%d: failed assert `%s`")
    at ink_error.cc:73
#5  0x77dd7f12 in _ink_assert (
    expression=0x826e48 "vio->mutex->thread_holding == this_ethread() && thread",
    file=0x826a9e "UnixNetVConnection.cc", line=895) at ink_assert.cc:37
#6  0x0077b4d7 in UnixNetVConnection::set_enabled (this=0x7fffb801c540, 
vio=0x7fffb801c660)
at UnixNetVConnection.cc:895
#7  0x0077ab94 in UnixNetVConnection::reenable (this=0x7fffb801c540, 
vio=0x7fffb801c660)
at UnixNetVConnection.cc:788
#8  0x00509755 in VIO::reenable (this=0x7fffb801c660) at 
../iocore/eventsystem/P_VIO.h:112
#9  0x0077a1da in UnixNetVConnection::do_io_read (this=0x7fffb801c540, 
c=0x7fffd402e3c0, 
nbytes=9223372036854775807, buf=0x16f9b30) at UnixNetVConnection.cc:628
#10 0x006393c1 in SpdyClientSession::start (this=0x7fffd402e3c0) at 
SpdyClientSession.cc:210
#11 0x0054a1fa in ProxyClientSession::handle_api_return 
(this=0x7fffd402e3c0, event=6)
at ProxyClientSession.cc:167
#12 0x0054a142 in ProxyClientSession::do_api_callout 
(this=0x7fffd402e3c0, 
id=TS_HTTP_SSN_START_HOOK) at ProxyClientSession.cc:147
#13 0x00639303 in SpdyClientSession::new_connection 
(this=0x7fffd402e3c0, 
new_vc=0x7fffb801c540, iobuf=0x0, reader=0x0, backdoor=false) at 
SpdyClientSession.cc:195
#14 0x0063878e in SpdySessionAccept::mainEvent (this=0x16bc7a0, 
event=202, 
edata=0x7fffb801c540) at SpdySessionAccept.cc:48
#15 0x0050970e in Continuation::handleEvent (this=0x16bc7a0, event=202, 
data=0x7fffb801c540)
at ../iocore/eventsystem/I_Continuation.h:146
#16 0x00763404 in send_plugin_event (plugin=0x16bc7a0, event=202, 
edata=0x7fffb801c540)
at SSLNextProtocolAccept.cc:32
#17 0x00763b89 in SSLNextProtocolTrampoline::ioCompletionEvent 
(this=0x7fffd40008e0, 
event=102, edata=0x7fffb801c660) at SSLNextProtocolAccept.cc:99
#18 0x0050970e in Continuation::handleEvent (this=0x7fffd40008e0, 
event=102, 
data=0x7fffb801c660) at ../iocore/eventsystem/I_Continuation.h:146
#19 0x0077871e in read_signal_and_update (event=102, vc=0x7fffb801c540)
at UnixNetVConnection.cc:145
#20 0x00778abe in read_signal_done (event=102, nh=0x7fffef9b2be0, 
vc=0x7fffb801c540)
at UnixNetVConnection.cc:206
#21 0x0077baac in UnixNetVConnection::readSignalDone 
(this=0x7fffb801c540, event=102, 
nh=0x7fffef9b2be0) at UnixNetVConnection.cc:1006
#22 0x0075e559 in SSLNetVConnection::net_read_io (this=0x7fffb801c540, 
nh=0x7fffef9b2be0, 
lthread=0x7fffef9af010) at SSLNetVConnection.cc:543
#23 0x00770b52 in NetHandler::mainNetEvent (this=0x7fffef9b2be0, 
event=5, e=0x1153690)
at UnixNet.cc:516
#24 0x0050970e in Continuation::handleEvent (this=0x7fffef9b2be0, 
event=5, data=0x1153690)
at ../iocore/eventsystem/I_Continuation.h:146
#25 0x0079aefa in EThread::process_event (this=0x7fffef9af010, 
e=0x1153690, calling_code=5)
at UnixEThread.cc:128
#26 0x0079b51b in EThread::execute (this=0x7fffef9af010) at 
UnixEThread.cc:252
#27 0x0079a414 in spawn_thread_internal (a=0x1532c60) at Thread.cc:86
#28 0x00351e807851 in start_thread () from /lib64/libpthread.so.0
#29 0x00351e4e890d in clone () from /lib64/libc.so.6
{code}

  was:
Noticed this while trying to reproduce [~oknet]'s issue on TS-3667.

I have a callback set on the SNI hook.  It selects a new certificate 

[jira] [Commented] (TS-3667) SSL Handshake read does not correctly handle EOF and error cases

2015-07-21 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635316#comment-14635316
 ] 

Susan Hinrichs commented on TS-3667:


BTW in trying to reproduce this case I ran across an unsettling debug assert in 
the SPDY client logic.  I'll file a separate issue for that.

 SSL Handshake read does not correctly handle EOF and error cases
 ---

 Key: TS-3667
 URL: https://issues.apache.org/jira/browse/TS-3667
 Project: Traffic Server
  Issue Type: Bug
  Components: SSL
Affects Versions: 5.2.0, 5.3.0
Reporter: Susan Hinrichs
Assignee: Susan Hinrichs
 Fix For: 5.3.1, 6.0.0

 Attachments: ts-3667.diff


 Reported by [~esproul] and postwait.
 The return value of SSLNetVConnection::read_raw_data() is being ignored.  So 
 connections that hit EOF or an error are not terminated, but rather spin until 
 the inactivity timeout is reached, and on EAGAIN the connection is not 
 descheduled to wait for more data to become available.
 This results in higher CPU utilization and hitting the SSL_error() function 
 much more often than necessary.
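
For illustration, a minimal sketch of the handling the description calls for.  
The return convention here (bytes read, 0 on EOF, -errno on failure) is an 
assumption for the sketch, not the attached ts-3667.diff.

{code}
// Hedged sketch only: the return convention (bytes read, 0 on EOF, -errno on
// failure) is an assumption for illustration, not the attached ts-3667.diff.
#include <cerrno>
#include <cstdint>

enum class HandshakeIo { FEED, WAIT, EOS, FAIL };

HandshakeIo classify_raw_read(int64_t r) {
  if (r > 0)
    return HandshakeIo::FEED; // hand the bytes to the handshake state machine
  if (r == 0)
    return HandshakeIo::EOS;  // peer closed: terminate instead of respinning
  if (r == -EAGAIN || r == -EWOULDBLOCK)
    return HandshakeIo::WAIT; // deschedule until the socket is readable again
  return HandshakeIo::FAIL;   // real error: tear the connection down
}

int main() {
  // Ignoring the EOS/FAIL cases is what makes the handshake spin until the
  // inactivity timeout; classifying them lets the caller stop immediately.
  return classify_raw_read(0) == HandshakeIo::EOS ? 0 : 1;
}
{code}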





[jira] [Commented] (TS-3784) Unpleasant debug assert when starting up a SpdyClientSession

2015-07-21 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635343#comment-14635343
 ] 

Susan Hinrichs commented on TS-3784:


Here is the patch that eliminates the debug assert for me

{code}
diff --git a/proxy/spdy/SpdyClientSession.cc b/proxy/spdy/SpdyClientSession.cc
index 2f8720e..fe5c732 100644
--- a/proxy/spdy/SpdyClientSession.cc
+++ b/proxy/spdy/SpdyClientSession.cc
@@ -94,7 +94,8 @@ SpdyClientSession::init(NetVConnection *netvc)
 {
   int r;
 
-  this->mutex = new_ProxyMutex();
+  //this->mutex = new_ProxyMutex();
+  this->mutex = netvc->mutex;
   this->vc = netvc;
   this->req_map.clear();
 
{code}

 Unpleasant debug assert when starting up a SpdyClientSession
 ---

 Key: TS-3784
 URL: https://issues.apache.org/jira/browse/TS-3784
 Project: Traffic Server
  Issue Type: Bug
  Components: SPDY
Reporter: Susan Hinrichs

 Noticed this while trying to reproduce [~oknet]'s issue on TS-3667.
 I have a callback set on the SNI hook.  It selects a new certificate and 
 reenables the vc before returning.  The stack is below.  The assert is 
 because the current thread does not hold the read.vio mutex.  In fact no 
 thread holds the read vio mutex.
 HttpClientSession and Http2ClientSession use the VC's mutex when setting up 
 the VIOs, so the mutex is already held when the do_io_read calls occur. 
 If I change SpdyClientSession to use the VC mutex instead of creating a new 
 mutex, this assert does not get triggered.  Not clear whether this is causing 
 any real issues, but it seems cleaner to follow the mutex assignment strategy 
 of the other protocols.
 Here is the stack
 {code}
 #0  0x00351e4328a5 in raise () from /lib64/libc.so.6
 #1  0x00351e434085 in abort () from /lib64/libc.so.6
 #2  0x77dda215 in ink_die_die_die () at ink_error.cc:43
 #3  0x77dda2cc in ink_fatal_va(const char *, typedef __va_list_tag __va_list_tag *) (
     fmt=0x77deb298 "%s:%d: failed assert `%s`", ap=0x7fffef4a8530) at ink_error.cc:65
 #4  0x77dda391 in ink_fatal (message_format=0x77deb298 "%s:%d: failed assert `%s`")
     at ink_error.cc:73
 #5  0x77dd7f12 in _ink_assert (
     expression=0x826e48 "vio->mutex->thread_holding == this_ethread() && thread",
     file=0x826a9e "UnixNetVConnection.cc", line=895) at ink_assert.cc:37
 #6  0x0077b4d7 in UnixNetVConnection::set_enabled 
 (this=0x7fffb801c540, vio=0x7fffb801c660)
 at UnixNetVConnection.cc:895
 #7  0x0077ab94 in UnixNetVConnection::reenable (this=0x7fffb801c540, 
 vio=0x7fffb801c660)
 at UnixNetVConnection.cc:788
 #8  0x00509755 in VIO::reenable (this=0x7fffb801c660) at 
 ../iocore/eventsystem/P_VIO.h:112
 #9  0x0077a1da in UnixNetVConnection::do_io_read 
 (this=0x7fffb801c540, c=0x7fffd402e3c0, 
 nbytes=9223372036854775807, buf=0x16f9b30) at UnixNetVConnection.cc:628
 #10 0x006393c1 in SpdyClientSession::start (this=0x7fffd402e3c0) at 
 SpdyClientSession.cc:210
 #11 0x0054a1fa in ProxyClientSession::handle_api_return 
 (this=0x7fffd402e3c0, event=6)
 at ProxyClientSession.cc:167
 #12 0x0054a142 in ProxyClientSession::do_api_callout 
 (this=0x7fffd402e3c0, 
 id=TS_HTTP_SSN_START_HOOK) at ProxyClientSession.cc:147
 #13 0x00639303 in SpdyClientSession::new_connection 
 (this=0x7fffd402e3c0, 
 new_vc=0x7fffb801c540, iobuf=0x0, reader=0x0, backdoor=false) at 
 SpdyClientSession.cc:195
 #14 0x0063878e in SpdySessionAccept::mainEvent (this=0x16bc7a0, 
 event=202, 
 edata=0x7fffb801c540) at SpdySessionAccept.cc:48
 #15 0x0050970e in Continuation::handleEvent (this=0x16bc7a0, 
 event=202, data=0x7fffb801c540)
 at ../iocore/eventsystem/I_Continuation.h:146
 #16 0x00763404 in send_plugin_event (plugin=0x16bc7a0, event=202, 
 edata=0x7fffb801c540)
 at SSLNextProtocolAccept.cc:32
 #17 0x00763b89 in SSLNextProtocolTrampoline::ioCompletionEvent 
 (this=0x7fffd40008e0, 
 event=102, edata=0x7fffb801c660) at SSLNextProtocolAccept.cc:99
 #18 0x0050970e in Continuation::handleEvent (this=0x7fffd40008e0, 
 event=102, 
 data=0x7fffb801c660) at ../iocore/eventsystem/I_Continuation.h:146
 #19 0x0077871e in read_signal_and_update (event=102, 
 vc=0x7fffb801c540)
 at UnixNetVConnection.cc:145
 #20 0x00778abe in read_signal_done (event=102, nh=0x7fffef9b2be0, 
 vc=0x7fffb801c540)
 at UnixNetVConnection.cc:206
 #21 0x0077baac in UnixNetVConnection::readSignalDone 
 (this=0x7fffb801c540, event=102, 
 nh=0x7fffef9b2be0) at UnixNetVConnection.cc:1006
 #22 0x0075e559 in SSLNetVConnection::net_read_io 
 (this=0x7fffb801c540, nh=0x7fffef9b2be0, 
 

[jira] [Resolved] (TS-3788) SNI callbacks stall after TS-3667 fix

2015-07-22 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-3788.

Resolution: Fixed

 SNI callbacks stall after TS-3667 fix
 -

 Key: TS-3788
 URL: https://issues.apache.org/jira/browse/TS-3788
 Project: Traffic Server
  Issue Type: Bug
  Components: SSL
Reporter: Susan Hinrichs
Assignee: Susan Hinrichs
 Fix For: 6.1.0


 Reported by [~oknet]; the main discussion is in TS-3667.  Due to changes in 
 the fix for TS-3667, EAGAIN would get checked before calling SSL_accept.  If 
 the SSL_accept state machine needed to write data, the write would never get 
 triggered and the handshake would stall.





[jira] [Resolved] (TS-3790) action=tunnel in ssl_multicert.config will cause crash

2015-07-22 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-3790.

Resolution: Fixed

 action=tunnel in ssl_multicert.config will cause crash
 --

 Key: TS-3790
 URL: https://issues.apache.org/jira/browse/TS-3790
 Project: Traffic Server
  Issue Type: Bug
  Components: SSL
Reporter: Susan Hinrichs
Assignee: Susan Hinrichs
 Fix For: 6.1.0

 Attachments: ts-3790.diff


 Enabled an old line in my ssl_multicert.config and accidentally tested the 
 action=tunnel feature.  It caused the traffic_server process to crash.  The 
 code was assuming that a handShakeBuffer must be present if we are deciding 
 to do a blind tunnel, but that is only the case if the decision is made in 
 the SNI callback.  I'm going to attach a patch that fixes the problem.
 Example line that will trigger the issue: packets addressed to 1.2.3.4 will be 
 converted to a blind tunnel before any SSL handshake processing is attempted.
 {code}
 dest_ip=1.2.3.4 action=tunnel ssl_cert_name=servercert.pem 
 ssl_key_name=privkey.pem
 {code}
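
For illustration, a hedged sketch of the null guard the description implies; 
the types and member names below are stand-ins, not the attached ts-3790.diff.

{code}
// Hedged sketch of the null guard the description implies; the types and
// member names are illustrative stand-ins, not the attached ts-3790.diff.
#include <cstdio>

struct MIOBuffer {}; // stand-in for the buffered handshake bytes

struct SSLVC {
  MIOBuffer *handShakeBuffer = nullptr; // only allocated once handshake bytes arrive
};

void convert_to_blind_tunnel(SSLVC &vc) {
  // With action=tunnel matched by dest_ip, we can get here before any
  // handshake processing, so the buffer may legitimately be absent.
  if (vc.handShakeBuffer != nullptr) {
    std::puts("replay the buffered handshake bytes into the tunnel");
  } else {
    std::puts("nothing buffered yet; just switch the connection to a tunnel");
  }
}

int main() {
  SSLVC early_decision;                    // dest_ip match, pre-handshake
  convert_to_blind_tunnel(early_decision); // must not crash
  return 0;
}
{code}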





[jira] [Commented] (TS-3775) ASAN crash while running regression test Cache_vol

2015-07-22 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14637609#comment-14637609
 ] 

Susan Hinrichs commented on TS-3775:


Sigh.  Committing a number of smaller fixes.  Looks like I mis-read my notes.  
Will update commits by hand.


 ASAN crash while running regression test Cache_vol
 --

 Key: TS-3775
 URL: https://issues.apache.org/jira/browse/TS-3775
 Project: Traffic Server
  Issue Type: Bug
  Components: Cache
Reporter: Susan Hinrichs
Assignee: Susan Hinrichs
 Attachments: ts-3775.diff


 Seen while running master built with ASAN on FC 21.  I have a patch which 
 I'll attach and discuss in comment.
 {code}
 REGRESSION TEST Cache_vol started
 RPRINT Cache_vol: 1 128 Megabyte Volumes
 RPRINT Cache_vol: Not enough space for 10 volume
 RPRINT Cache_vol: Random Volumes after clearing the disks
 RPRINT Cache_vol: volume=1 scheme=http size=128
 RPRINT Cache_vol: Random Volumes without clearing the disks
 RPRINT Cache_vol: volume=1 scheme=rtsp size=128
 =
 ==4513==ERROR: AddressSanitizer: heap-use-after-free on address 
 0x6048e9e0 at pc 0x989546 bp 0x7fffef2a59b0 sp 0x7fffef2a59a0
 READ of size 8 at 0x6048e9e0 thread T2 ([ET_NET 1])
 #0 0x989545 in cplist_update /home/shinrich/ats/iocore/cache/Cache.cc:2702
 #1 0x989545 in cplist_reconfigure() 
 /home/shinrich/ats/iocore/cache/Cache.cc:2846
 #2 0x9d1186 in execute_and_verify(RegressionTest*) 
 /home/shinrich/ats/iocore/cache/CacheHosting.cc:996
 #3 0x9d2229 in RegressionTest_Cache_vol(RegressionTest*, int, int*) 
 /home/shinrich/ats/iocore/cache/CacheHosting.cc:842
 #4 0x76cb55f1 in start_test /home/shinrich/ats/lib/ts/Regression.cc:78
 #5 0x76cb55f1 in RegressionTest::run_some() 
 /home/shinrich/ats/lib/ts/Regression.cc:126
 #6 0x76cb5b00 in RegressionTest::check_status() 
 /home/shinrich/ats/lib/ts/Regression.cc:141
 #7 0x5404fb in RegressionCont::mainEvent(int, Event*) 
 /home/shinrich/ats/proxy/Main.cc:1210
 #8 0xb6b771 in Continuation::handleEvent(int, void*) 
 /home/shinrich/ats/iocore/eventsystem/I_Continuation.h:146
 #9 0xb6b771 in EThread::process_event(Event*, int) 
 /home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:128
 #10 0xb6d3a6 in EThread::execute() 
 /home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:207
 #11 0xb69da1 in spawn_thread_internal 
 /home/shinrich/ats/iocore/eventsystem/Thread.cc:86
 #12 0x75e27529 in start_thread (/lib64/libpthread.so.0+0x7529)
 #13 0x7464922c in __clone (/lib64/libc.so.6+0x10022c)
 0x6048e9e0 is located 16 bytes inside of 40-byte region 
 [0x6048e9d0,0x6048e9f8)
 freed by thread T2 ([ET_NET 1]) here:
 #0 0x76f5764f in operator delete(void*) (/lib64/libasan.so.1+0x5864f)
 #1 0x9c84ac in CacheDisk::delete_volume(int) 
 /home/shinrich/ats/iocore/cache/CacheDisk.cc:330
 #2 0x989455 in cplist_update /home/shinrich/ats/iocore/cache/Cache.cc:2684
 #3 0x989455 in cplist_reconfigure() 
 /home/shinrich/ats/iocore/cache/Cache.cc:2846
 #4 0x9d1186 in execute_and_verify(RegressionTest*) 
 /home/shinrich/ats/iocore/cache/CacheHosting.cc:996
 #5 0x9d2229 in RegressionTest_Cache_vol(RegressionTest*, int, int*) 
 /home/shinrich/ats/iocore/cache/CacheHosting.cc:842
 #6 0x76cb55f1 in start_test /home/shinrich/ats/lib/ts/Regression.cc:78
 #7 0x76cb55f1 in RegressionTest::run_some() 
 /home/shinrich/ats/lib/ts/Regression.cc:126
 #8 0x76cb5b00 in RegressionTest::check_status() 
 /home/shinrich/ats/lib/ts/Regression.cc:141
 #9 0x5404fb in RegressionCont::mainEvent(int, Event*) 
 /home/shinrich/ats/proxy/Main.cc:1210
 #10 0xb6b771 in Continuation::handleEvent(int, void*) 
 /home/shinrich/ats/iocore/eventsystem/I_Continuation.h:146
 #11 0xb6b771 in EThread::process_event(Event*, int) 
 /home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:128
 #12 0xb6d3a6 in EThread::execute() 
 /home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:207
 #13 0xb69da1 in spawn_thread_internal 
 /home/shinrich/ats/iocore/eventsystem/Thread.cc:86
 #14 0x75e27529 in start_thread (/lib64/libpthread.so.0+0x7529)
 previously allocated by thread T2 ([ET_NET 1]) here:
 #0 0x76f5714f in operator new(unsigned long) 
 (/lib64/libasan.so.1+0x5814f)
 #1 0x9c770d in CacheDisk::create_volume(int, long, int) 
 /home/shinrich/ats/iocore/cache/CacheDisk.cc:296
 #2 0x98347e in create_volume /home/shinrich/ats/iocore/cache/Cache.cc:3023
 #3 0x989b41 in cplist_reconfigure() 
 /home/shinrich/ats/iocore/cache/Cache.cc:2877
 #4 0x9d1186 in execute_and_verify(RegressionTest*) 
 /home/shinrich/ats/iocore/cache/CacheHosting.cc:996
 #5 0x9d2229 in 

[jira] [Resolved] (TS-3654) ASAN heap-use-after-free in cache-hosting (regression)

2015-07-22 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-3654.

Resolution: Fixed

 ASAN heap-use-after-free in cache-hosting (regression)
 --

 Key: TS-3654
 URL: https://issues.apache.org/jira/browse/TS-3654
 Project: Traffic Server
  Issue Type: Improvement
  Components: Cache
Reporter: Leif Hedstrom
Assignee: Susan Hinrichs
 Fix For: 6.1.0


 {code}
 RPRINT Cache_vol: 1 128 Megabyte Volumes
 RPRINT Cache_vol: Not enough space for 10 volume
 RPRINT Cache_vol: Random Volumes after clearing the disks
 RPRINT Cache_vol: volume=1 scheme=http size=128
 RPRINT Cache_vol: Random Volumes without clearing the disks
 RPRINT Cache_vol: volume=1 scheme=rtsp size=128
 =
 ==3733==ERROR: AddressSanitizer: heap-use-after-free on address 
 0x604a2960 at pc 0xa7ce83 bp 0x7f3c7f946980 sp 0x7f3c7f946970
 READ of size 8 at 0x604a2960 thread T3 ([ET_NET 2])
 #0 0xa7ce82 in cplist_update ../../../../iocore/cache/Cache.cc:3230
 #1 0xa7ce82 in cplist_reconfigure() ../../../../iocore/cache/Cache.cc:3374
 #2 0xac619e in execute_and_verify(RegressionTest*) 
 ../../../../iocore/cache/CacheHosting.cc:994
 #3 0xac75f8 in RegressionTest_Cache_vol(RegressionTest*, int, int*) 
 ../../../../iocore/cache/CacheHosting.cc:840
 #4 0x7f3c8480b4d2 in start_test ../../../../lib/ts/Regression.cc:77
 #5 0x7f3c8480b4d2 in RegressionTest::run_some() 
 ../../../../lib/ts/Regression.cc:125
 #6 0x7f3c8480b9b6 in RegressionTest::check_status() 
 ../../../../lib/ts/Regression.cc:140
 #7 0x57b5b4 in RegressionCont::mainEvent(int, Event*) 
 ../../../proxy/Main.cc:1220
 #8 0xc8b86e in Continuation::handleEvent(int, void*) 
 ../../../../iocore/eventsystem/I_Continuation.h:145
 #9 0xc8b86e in EThread::process_event(Event*, int) 
 ../../../../iocore/eventsystem/UnixEThread.cc:128
 #10 0xc8da67 in EThread::execute() 
 ../../../../iocore/eventsystem/UnixEThread.cc:207
 #11 0xc8a488 in spawn_thread_internal 
 ../../../../iocore/eventsystem/Thread.cc:85
 #12 0x7f3c84392529 in start_thread (/lib64/libpthread.so.0+0x3813e07529)
 #13 0x381370022c in __clone (/lib64/libc.so.6+0x381370022c)
 0x604a2960 is located 16 bytes inside of 40-byte region 
 [0x604a2950,0x604a2978)
 freed by thread T3 ([ET_NET 2]) here:
 #0 0x7f3c84aaf64f in operator delete(void*) (/lib64/libasan.so.1+0x5864f)
 #1 0xabbd16 in CacheDisk::delete_volume(int) 
 ../../../../iocore/cache/CacheDisk.cc:330
 #2 0xa7bfe0 in cplist_update ../../../../iocore/cache/Cache.cc:3212
 #3 0xa7bfe0 in cplist_reconfigure() ../../../../iocore/cache/Cache.cc:3374
 #4 0xac619e in execute_and_verify(RegressionTest*) 
 ../../../../iocore/cache/CacheHosting.cc:994
 #5 0xac75f8 in RegressionTest_Cache_vol(RegressionTest*, int, int*) 
 ../../../../iocore/cache/CacheHosting.cc:840
 #6 0x7f3c8480b4d2 in start_test ../../../../lib/ts/Regression.cc:77
 #7 0x7f3c8480b4d2 in RegressionTest::run_some() 
 ../../../../lib/ts/Regression.cc:125
 #8 0x7f3c8480b9b6 in RegressionTest::check_status() 
 ../../../../lib/ts/Regression.cc:140
 #9 0x57b5b4 in RegressionCont::mainEvent(int, Event*) 
 ../../../proxy/Main.cc:1220
 #10 0xc8b86e in Continuation::handleEvent(int, void*) 
 ../../../../iocore/eventsystem/I_Continuation.h:145
 #11 0xc8b86e in EThread::process_event(Event*, int) 
 ../../../../iocore/eventsystem/UnixEThread.cc:128
 #12 0xc8da67 in EThread::execute() 
 ../../../../iocore/eventsystem/UnixEThread.cc:207
 #13 0xc8a488 in spawn_thread_internal 
 ../../../../iocore/eventsystem/Thread.cc:85
 #14 0x7f3c84392529 in start_thread (/lib64/libpthread.so.0+0x3813e07529)
 previously allocated by thread T3 ([ET_NET 2]) here:
 #0 0x7f3c84aaf14f in operator new(unsigned long) 
 (/lib64/libasan.so.1+0x5814f)
 #1 0xaba5ca in CacheDisk::create_volume(int, long, int) 
 ../../../../iocore/cache/CacheDisk.cc:296
 #2 0xa74f81 in create_volume ../../../../iocore/cache/Cache.cc:3551
 #3 0xa7ca20 in cplist_reconfigure() ../../../../iocore/cache/Cache.cc:3405
 #4 0xac619e in execute_and_verify(RegressionTest*) 
 ../../../../iocore/cache/CacheHosting.cc:994
 #5 0xac75f8 in RegressionTest_Cache_vol(RegressionTest*, int, int*) 
 ../../../../iocore/cache/CacheHosting.cc:840
 #6 0x7f3c8480b4d2 in start_test ../../../../lib/ts/Regression.cc:77
 #7 0x7f3c8480b4d2 in RegressionTest::run_some() 
 ../../../../lib/ts/Regression.cc:125
 #8 0x7f3c8480b9b6 in RegressionTest::check_status() 
 ../../../../lib/ts/Regression.cc:140
 #9 0x57b5b4 in RegressionCont::mainEvent(int, Event*) 
 ../../../proxy/Main.cc:1220
 #10 0xc8b86e in 

[jira] [Resolved] (TS-3784) Unpleasant debug assert when starting up a SpdyClientSession

2015-07-22 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-3784.

Resolution: Fixed

 Unpleasant debug assert when starting up a SpdyClientSession
 ---

 Key: TS-3784
 URL: https://issues.apache.org/jira/browse/TS-3784
 Project: Traffic Server
  Issue Type: Bug
  Components: SPDY
Reporter: Susan Hinrichs
Assignee: Susan Hinrichs
 Fix For: 6.1.0


 Noticed this while trying to reproduce [~oknet]'s issue on TS-3667.
 I have a callback set on the SNI hook.  It selects a new certificate and 
 reenables the vc before returning.  The stack is below.  The assert is 
 because the current thread does not hold the read.vio mutex.  In fact no 
 thread holds the read vio mutex.
 HttpClientSession and Http2ClientSession use the VC's mutex instead of 
 creating a new one.  That shared mutex is used when setting up the VIOs, so 
 the mutex is already held when the do_io_read calls occur. 
 If I change SpdyClientSession to use the VC mutex instead of creating a new 
 mutex, this assert does not get triggered.  Not clear whether this is causing 
 any real issues, but it seems cleaner to follow the mutex assignment strategy 
 of the other protocols.
 Here is the stack
 {code}
 #0  0x00351e4328a5 in raise () from /lib64/libc.so.6
 #1  0x00351e434085 in abort () from /lib64/libc.so.6
 #2  0x77dda215 in ink_die_die_die () at ink_error.cc:43
 #3  0x77dda2cc in ink_fatal_va(const char *, typedef __va_list_tag __va_list_tag *) (
     fmt=0x77deb298 "%s:%d: failed assert `%s`", ap=0x7fffef4a8530) at ink_error.cc:65
 #4  0x77dda391 in ink_fatal (message_format=0x77deb298 "%s:%d: failed assert `%s`")
     at ink_error.cc:73
 #5  0x77dd7f12 in _ink_assert (
     expression=0x826e48 "vio->mutex->thread_holding == this_ethread() && thread",
     file=0x826a9e "UnixNetVConnection.cc", line=895) at ink_assert.cc:37
 #6  0x0077b4d7 in UnixNetVConnection::set_enabled 
 (this=0x7fffb801c540, vio=0x7fffb801c660)
 at UnixNetVConnection.cc:895
 #7  0x0077ab94 in UnixNetVConnection::reenable (this=0x7fffb801c540, 
 vio=0x7fffb801c660)
 at UnixNetVConnection.cc:788
 #8  0x00509755 in VIO::reenable (this=0x7fffb801c660) at 
 ../iocore/eventsystem/P_VIO.h:112
 #9  0x0077a1da in UnixNetVConnection::do_io_read 
 (this=0x7fffb801c540, c=0x7fffd402e3c0, 
 nbytes=9223372036854775807, buf=0x16f9b30) at UnixNetVConnection.cc:628
 #10 0x006393c1 in SpdyClientSession::start (this=0x7fffd402e3c0) at 
 SpdyClientSession.cc:210
 #11 0x0054a1fa in ProxyClientSession::handle_api_return 
 (this=0x7fffd402e3c0, event=6)
 at ProxyClientSession.cc:167
 #12 0x0054a142 in ProxyClientSession::do_api_callout 
 (this=0x7fffd402e3c0, 
 id=TS_HTTP_SSN_START_HOOK) at ProxyClientSession.cc:147
 #13 0x00639303 in SpdyClientSession::new_connection 
 (this=0x7fffd402e3c0, 
 new_vc=0x7fffb801c540, iobuf=0x0, reader=0x0, backdoor=false) at 
 SpdyClientSession.cc:195
 #14 0x0063878e in SpdySessionAccept::mainEvent (this=0x16bc7a0, 
 event=202, 
 edata=0x7fffb801c540) at SpdySessionAccept.cc:48
 #15 0x0050970e in Continuation::handleEvent (this=0x16bc7a0, 
 event=202, data=0x7fffb801c540)
 at ../iocore/eventsystem/I_Continuation.h:146
 #16 0x00763404 in send_plugin_event (plugin=0x16bc7a0, event=202, 
 edata=0x7fffb801c540)
 at SSLNextProtocolAccept.cc:32
 #17 0x00763b89 in SSLNextProtocolTrampoline::ioCompletionEvent 
 (this=0x7fffd40008e0, 
 event=102, edata=0x7fffb801c660) at SSLNextProtocolAccept.cc:99
 #18 0x0050970e in Continuation::handleEvent (this=0x7fffd40008e0, 
 event=102, 
 data=0x7fffb801c660) at ../iocore/eventsystem/I_Continuation.h:146
 #19 0x0077871e in read_signal_and_update (event=102, 
 vc=0x7fffb801c540)
 at UnixNetVConnection.cc:145
 #20 0x00778abe in read_signal_done (event=102, nh=0x7fffef9b2be0, 
 vc=0x7fffb801c540)
 at UnixNetVConnection.cc:206
 #21 0x0077baac in UnixNetVConnection::readSignalDone 
 (this=0x7fffb801c540, event=102, 
 nh=0x7fffef9b2be0) at UnixNetVConnection.cc:1006
 #22 0x0075e559 in SSLNetVConnection::net_read_io 
 (this=0x7fffb801c540, nh=0x7fffef9b2be0, 
 lthread=0x7fffef9af010) at SSLNetVConnection.cc:543
 #23 0x00770b52 in NetHandler::mainNetEvent (this=0x7fffef9b2be0, 
 event=5, e=0x1153690)
 at UnixNet.cc:516
 #24 0x0050970e in Continuation::handleEvent (this=0x7fffef9b2be0, 
 event=5, data=0x1153690)
 at ../iocore/eventsystem/I_Continuation.h:146
 #25 0x0079aefa in EThread::process_event 

[jira] [Updated] (TS-3788) SNI callbacks stall after TS-3667 fix

2015-07-22 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-3788:
---
Backport to Version: 5.3.2, 6.0.0

 SNI callbacks stall after TS-3667 fix
 -

 Key: TS-3788
 URL: https://issues.apache.org/jira/browse/TS-3788
 Project: Traffic Server
  Issue Type: Bug
  Components: SSL
Reporter: Susan Hinrichs
Assignee: Susan Hinrichs
 Fix For: 6.1.0


 Reported by [~oknet]; the main discussion is in TS-3667.  Due to changes in 
 the fix for TS-3667, EAGAIN would get checked before calling SSL_accept.  If 
 the SSL_accept state machine needed to write data, the write would never get 
 triggered and the handshake would stall.





[jira] [Commented] (TS-3784) Unpleasant debug assert when starting up a SpdyClientSession

2015-07-22 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14637616#comment-14637616
 ] 

Susan Hinrichs commented on TS-3784:


I mis-read my notes when setting up the commit notes on this one.  Here is the 
commit for this issue.

Commit 6f66b7a18234a93e810d8ef2ce23144b9b3446f4 in trafficserver's branch 
refs/heads/master from shinrich
[ https://git-wip-us.apache.org/repos/asf?p=trafficserver.git;h=6f66b7a ]
TS-3775: Adjust the mutex assignment for SpdyClientSession to avoid unlocked 
read vio.

 Unpleasant debug assert when starting up a SpdyClientSession
 ---

 Key: TS-3784
 URL: https://issues.apache.org/jira/browse/TS-3784
 Project: Traffic Server
  Issue Type: Bug
  Components: SPDY
Reporter: Susan Hinrichs
Assignee: Susan Hinrichs
 Fix For: 6.1.0


 Noticed this while trying to reproduce [~oknet]'s issue on TS-3667.
 I have a callback set on the SNI hook.  It selects a new certificate and 
 reenables the vc before returning.  The stack is below.  The assert is 
 because the current thread does not hold the read.vio mutex.  In fact no 
 thread holds the read vio mutex.
 HttpClientSession and Http2ClientSession use the VC's mutex instead of 
 creating a new one.  That shared mutex is used when setting up the VIOs, so 
 the mutex is already held when the do_io_read calls occur. 
 If I change SpdyClientSession to use the VC mutex instead of creating a new 
 mutex, this assert does not get triggered.  Not clear whether this is causing 
 any real issues, but it seems cleaner to follow the mutex assignment strategy 
 of the other protocols.
 Here is the stack
 {code}
 #0  0x00351e4328a5 in raise () from /lib64/libc.so.6
 #1  0x00351e434085 in abort () from /lib64/libc.so.6
 #2  0x77dda215 in ink_die_die_die () at ink_error.cc:43
 #3  0x77dda2cc in ink_fatal_va(const char *, typedef __va_list_tag __va_list_tag *) (
     fmt=0x77deb298 "%s:%d: failed assert `%s`", ap=0x7fffef4a8530) at ink_error.cc:65
 #4  0x77dda391 in ink_fatal (message_format=0x77deb298 "%s:%d: failed assert `%s`")
     at ink_error.cc:73
 #5  0x77dd7f12 in _ink_assert (
     expression=0x826e48 "vio->mutex->thread_holding == this_ethread() && thread",
     file=0x826a9e "UnixNetVConnection.cc", line=895) at ink_assert.cc:37
 #6  0x0077b4d7 in UnixNetVConnection::set_enabled 
 (this=0x7fffb801c540, vio=0x7fffb801c660)
 at UnixNetVConnection.cc:895
 #7  0x0077ab94 in UnixNetVConnection::reenable (this=0x7fffb801c540, 
 vio=0x7fffb801c660)
 at UnixNetVConnection.cc:788
 #8  0x00509755 in VIO::reenable (this=0x7fffb801c660) at 
 ../iocore/eventsystem/P_VIO.h:112
 #9  0x0077a1da in UnixNetVConnection::do_io_read 
 (this=0x7fffb801c540, c=0x7fffd402e3c0, 
 nbytes=9223372036854775807, buf=0x16f9b30) at UnixNetVConnection.cc:628
 #10 0x006393c1 in SpdyClientSession::start (this=0x7fffd402e3c0) at 
 SpdyClientSession.cc:210
 #11 0x0054a1fa in ProxyClientSession::handle_api_return 
 (this=0x7fffd402e3c0, event=6)
 at ProxyClientSession.cc:167
 #12 0x0054a142 in ProxyClientSession::do_api_callout 
 (this=0x7fffd402e3c0, 
 id=TS_HTTP_SSN_START_HOOK) at ProxyClientSession.cc:147
 #13 0x00639303 in SpdyClientSession::new_connection 
 (this=0x7fffd402e3c0, 
 new_vc=0x7fffb801c540, iobuf=0x0, reader=0x0, backdoor=false) at 
 SpdyClientSession.cc:195
 #14 0x0063878e in SpdySessionAccept::mainEvent (this=0x16bc7a0, 
 event=202, 
 edata=0x7fffb801c540) at SpdySessionAccept.cc:48
 #15 0x0050970e in Continuation::handleEvent (this=0x16bc7a0, 
 event=202, data=0x7fffb801c540)
 at ../iocore/eventsystem/I_Continuation.h:146
 #16 0x00763404 in send_plugin_event (plugin=0x16bc7a0, event=202, 
 edata=0x7fffb801c540)
 at SSLNextProtocolAccept.cc:32
 #17 0x00763b89 in SSLNextProtocolTrampoline::ioCompletionEvent 
 (this=0x7fffd40008e0, 
 event=102, edata=0x7fffb801c660) at SSLNextProtocolAccept.cc:99
 #18 0x0050970e in Continuation::handleEvent (this=0x7fffd40008e0, 
 event=102, 
 data=0x7fffb801c660) at ../iocore/eventsystem/I_Continuation.h:146
 #19 0x0077871e in read_signal_and_update (event=102, 
 vc=0x7fffb801c540)
 at UnixNetVConnection.cc:145
 #20 0x00778abe in read_signal_done (event=102, nh=0x7fffef9b2be0, 
 vc=0x7fffb801c540)
 at UnixNetVConnection.cc:206
 #21 0x0077baac in UnixNetVConnection::readSignalDone 
 (this=0x7fffb801c540, event=102, 
 nh=0x7fffef9b2be0) at UnixNetVConnection.cc:1006
 #22 0x0075e559 in SSLNetVConnection::net_read_io 
 (this=0x7fffb801c540, 

[jira] [Comment Edited] (TS-3775) ASAN crash while running regression test Cache_vol

2015-07-22 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14637609#comment-14637609
 ] 

Susan Hinrichs edited comment on TS-3775 at 7/22/15 9:04 PM:
-

Sigh.  Committing a number of smaller fixes.  Looks like I mis-read my notes.  
Will update commits by hand.  Fix for this issue is not yet committed.



was (Author: shinrich):
Sigh.  Committing a number of smaller fixes.  Looks like I mis-read my notes.  
Will update commits by hand.


 ASAN crash while running regression test Cache_vol
 --

 Key: TS-3775
 URL: https://issues.apache.org/jira/browse/TS-3775
 Project: Traffic Server
  Issue Type: Bug
  Components: Cache
Reporter: Susan Hinrichs
Assignee: Susan Hinrichs
 Attachments: ts-3775.diff


 Seen while running master built with ASAN on FC 21.  I have a patch which 
 I'll attach and discuss in comment.
 {code}
 REGRESSION TEST Cache_vol started
 RPRINT Cache_vol: 1 128 Megabyte Volumes
 RPRINT Cache_vol: Not enough space for 10 volume
 RPRINT Cache_vol: Random Volumes after clearing the disks
 RPRINT Cache_vol: volume=1 scheme=http size=128
 RPRINT Cache_vol: Random Volumes without clearing the disks
 RPRINT Cache_vol: volume=1 scheme=rtsp size=128
 =
 ==4513==ERROR: AddressSanitizer: heap-use-after-free on address 
 0x6048e9e0 at pc 0x989546 bp 0x7fffef2a59b0 sp 0x7fffef2a59a0
 READ of size 8 at 0x6048e9e0 thread T2 ([ET_NET 1])
 #0 0x989545 in cplist_update /home/shinrich/ats/iocore/cache/Cache.cc:2702
 #1 0x989545 in cplist_reconfigure() 
 /home/shinrich/ats/iocore/cache/Cache.cc:2846
 #2 0x9d1186 in execute_and_verify(RegressionTest*) 
 /home/shinrich/ats/iocore/cache/CacheHosting.cc:996
 #3 0x9d2229 in RegressionTest_Cache_vol(RegressionTest*, int, int*) 
 /home/shinrich/ats/iocore/cache/CacheHosting.cc:842
 #4 0x76cb55f1 in start_test /home/shinrich/ats/lib/ts/Regression.cc:78
 #5 0x76cb55f1 in RegressionTest::run_some() 
 /home/shinrich/ats/lib/ts/Regression.cc:126
 #6 0x76cb5b00 in RegressionTest::check_status() 
 /home/shinrich/ats/lib/ts/Regression.cc:141
 #7 0x5404fb in RegressionCont::mainEvent(int, Event*) 
 /home/shinrich/ats/proxy/Main.cc:1210
 #8 0xb6b771 in Continuation::handleEvent(int, void*) 
 /home/shinrich/ats/iocore/eventsystem/I_Continuation.h:146
 #9 0xb6b771 in EThread::process_event(Event*, int) 
 /home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:128
 #10 0xb6d3a6 in EThread::execute() 
 /home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:207
 #11 0xb69da1 in spawn_thread_internal 
 /home/shinrich/ats/iocore/eventsystem/Thread.cc:86
 #12 0x75e27529 in start_thread (/lib64/libpthread.so.0+0x7529)
 #13 0x7464922c in __clone (/lib64/libc.so.6+0x10022c)
 0x6048e9e0 is located 16 bytes inside of 40-byte region 
 [0x6048e9d0,0x6048e9f8)
 freed by thread T2 ([ET_NET 1]) here:
 #0 0x76f5764f in operator delete(void*) (/lib64/libasan.so.1+0x5864f)
 #1 0x9c84ac in CacheDisk::delete_volume(int) 
 /home/shinrich/ats/iocore/cache/CacheDisk.cc:330
 #2 0x989455 in cplist_update /home/shinrich/ats/iocore/cache/Cache.cc:2684
 #3 0x989455 in cplist_reconfigure() 
 /home/shinrich/ats/iocore/cache/Cache.cc:2846
 #4 0x9d1186 in execute_and_verify(RegressionTest*) 
 /home/shinrich/ats/iocore/cache/CacheHosting.cc:996
 #5 0x9d2229 in RegressionTest_Cache_vol(RegressionTest*, int, int*) 
 /home/shinrich/ats/iocore/cache/CacheHosting.cc:842
 #6 0x76cb55f1 in start_test /home/shinrich/ats/lib/ts/Regression.cc:78
 #7 0x76cb55f1 in RegressionTest::run_some() 
 /home/shinrich/ats/lib/ts/Regression.cc:126
 #8 0x76cb5b00 in RegressionTest::check_status() 
 /home/shinrich/ats/lib/ts/Regression.cc:141
 #9 0x5404fb in RegressionCont::mainEvent(int, Event*) 
 /home/shinrich/ats/proxy/Main.cc:1210
 #10 0xb6b771 in Continuation::handleEvent(int, void*) 
 /home/shinrich/ats/iocore/eventsystem/I_Continuation.h:146
 #11 0xb6b771 in EThread::process_event(Event*, int) 
 /home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:128
 #12 0xb6d3a6 in EThread::execute() 
 /home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:207
 #13 0xb69da1 in spawn_thread_internal 
 /home/shinrich/ats/iocore/eventsystem/Thread.cc:86
 #14 0x75e27529 in start_thread (/lib64/libpthread.so.0+0x7529)
 previously allocated by thread T2 ([ET_NET 1]) here:
 #0 0x76f5714f in operator new(unsigned long) 
 (/lib64/libasan.so.1+0x5814f)
 #1 0x9c770d in CacheDisk::create_volume(int, long, int) 
 /home/shinrich/ats/iocore/cache/CacheDisk.cc:296
 #2 0x98347e in create_volume /home/shinrich/ats/iocore/cache/Cache.cc:3023

[jira] [Created] (TS-3788) SNI callbacks stall after TS-3667 fix

2015-07-22 Thread Susan Hinrichs (JIRA)
Susan Hinrichs created TS-3788:
--

 Summary: SNI callbacks stall after TS-3667 fix
 Key: TS-3788
 URL: https://issues.apache.org/jira/browse/TS-3788
 Project: Traffic Server
  Issue Type: Bug
  Components: SSL
Reporter: Susan Hinrichs


Reported by [~oknet]; the main discussion is in TS-3667.  Due to changes in the 
fix for TS-3667, EAGAIN would get checked before calling SSL_accept.  If the 
SSL_accept state machine needed to write data, the write would never get 
triggered and the handshake would stall.
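
For illustration, a hedged sketch of the ordering this calls for, written with 
plain OpenSSL calls; it is not the actual SSLNetVConnection change.

{code}
// Hedged sketch of the ordering fix described above, using plain OpenSSL
// calls; this is not the actual SSLNetVConnection change.
#include <openssl/ssl.h>

enum class HsResult { DONE, WANT_READ, WANT_WRITE, FAILED };

// Let SSL_accept run first (it may need to write, e.g. the ServerHello)
// before deciding to give up because the last raw read returned EAGAIN.
HsResult continue_server_handshake(SSL *ssl) {
  int ret = SSL_accept(ssl);
  if (ret == 1)
    return HsResult::DONE;
  switch (SSL_get_error(ssl, ret)) {
  case SSL_ERROR_WANT_READ:
    return HsResult::WANT_READ;  // now it is safe to wait for more bytes
  case SSL_ERROR_WANT_WRITE:
    return HsResult::WANT_WRITE; // must schedule a write or the handshake stalls
  default:
    return HsResult::FAILED;
  }
}
{code}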





[jira] [Assigned] (TS-3788) SNI callbacks stall after TS-3667 fix

2015-07-22 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs reassigned TS-3788:
--

Assignee: Susan Hinrichs

 SNI callbacks stall after TS-3667 fix
 -

 Key: TS-3788
 URL: https://issues.apache.org/jira/browse/TS-3788
 Project: Traffic Server
  Issue Type: Bug
  Components: SSL
Reporter: Susan Hinrichs
Assignee: Susan Hinrichs

 Reported by [~oknet]; the main discussion is in TS-3667.  Due to changes in 
 the fix for TS-3667, EAGAIN would get checked before calling SSL_accept.  If 
 the SSL_accept state machine needed to write data, the write would never get 
 triggered and the handshake would stall.





[jira] [Closed] (TS-3667) SSL Handshake read does not correctly handle EOF and error cases

2015-07-22 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs closed TS-3667.
--
Resolution: Fixed

Opened a new bug TS-3788 to track the problem noted by [~oknet]

 SSL Handshake read does not correctly handle EOF and error cases
 ---

 Key: TS-3667
 URL: https://issues.apache.org/jira/browse/TS-3667
 Project: Traffic Server
  Issue Type: Bug
  Components: SSL
Affects Versions: 5.2.0, 5.3.0
Reporter: Susan Hinrichs
Assignee: Susan Hinrichs
 Fix For: 6.0.0, 5.3.1

 Attachments: ts-3667.diff


 Reported by [~esproul] and postwait.
 The return value of SSLNetVConnection::read_raw_data() is being ignored.  So 
 connections that hit EOF or an error are not terminated, but rather spin until 
 the inactivity timeout is reached, and on EAGAIN the connection is not 
 descheduled to wait for more data to become available.
 This results in higher CPU utilization and hitting the SSL_error() function 
 much more often than necessary.





[jira] [Commented] (TS-3667) SSL Handshake read does not correctly handle EOF and error cases

2015-07-22 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14637019#comment-14637019
 ] 

Susan Hinrichs commented on TS-3667:


Filed TS-3784 to track the locking debug assert.

 SSL Handshake read does not correctly handle EOF and error cases
 ---

 Key: TS-3667
 URL: https://issues.apache.org/jira/browse/TS-3667
 Project: Traffic Server
  Issue Type: Bug
  Components: SSL
Affects Versions: 5.2.0, 5.3.0
Reporter: Susan Hinrichs
Assignee: Susan Hinrichs
 Fix For: 5.3.1, 6.0.0

 Attachments: ts-3667.diff


 Reported by [~esproul] and postwait.
 The return value of SSLNetVConnection::read_raw_data() is being ignored.  So 
 connections that hit EOF or an error are not terminated, but rather spin until 
 the inactivity timeout is reached, and on EAGAIN the connection is not 
 descheduled to wait for more data to become available.
 This results in higher CPU utilization and hitting the SSL_error() function 
 much more often than necessary.





[jira] [Created] (TS-3775) ASAN crash while running regression test Cache_vol

2015-07-17 Thread Susan Hinrichs (JIRA)
Susan Hinrichs created TS-3775:
--

 Summary: ASAN crash while running regression test Cache_vol
 Key: TS-3775
 URL: https://issues.apache.org/jira/browse/TS-3775
 Project: Traffic Server
  Issue Type: Bug
  Components: Cache
Reporter: Susan Hinrichs


Seen while running master built with ASAN on FC 21.  I have a patch which I'll 
attach and discuss in comment.

{code}
REGRESSION TEST Cache_vol started
RPRINT Cache_vol: 1 128 Megabyte Volumes
RPRINT Cache_vol: Not enough space for 10 volume
RPRINT Cache_vol: Random Volumes after clearing the disks
RPRINT Cache_vol: volume=1 scheme=http size=128
RPRINT Cache_vol: Random Volumes without clearing the disks
RPRINT Cache_vol: volume=1 scheme=rtsp size=128
=
==4513==ERROR: AddressSanitizer: heap-use-after-free on address 0x6048e9e0 
at pc 0x989546 bp 0x7fffef2a59b0 sp 0x7fffef2a59a0
READ of size 8 at 0x6048e9e0 thread T2 ([ET_NET 1])
#0 0x989545 in cplist_update /home/shinrich/ats/iocore/cache/Cache.cc:2702
#1 0x989545 in cplist_reconfigure() 
/home/shinrich/ats/iocore/cache/Cache.cc:2846
#2 0x9d1186 in execute_and_verify(RegressionTest*) 
/home/shinrich/ats/iocore/cache/CacheHosting.cc:996
#3 0x9d2229 in RegressionTest_Cache_vol(RegressionTest*, int, int*) 
/home/shinrich/ats/iocore/cache/CacheHosting.cc:842
#4 0x76cb55f1 in start_test /home/shinrich/ats/lib/ts/Regression.cc:78
#5 0x76cb55f1 in RegressionTest::run_some() 
/home/shinrich/ats/lib/ts/Regression.cc:126
#6 0x76cb5b00 in RegressionTest::check_status() 
/home/shinrich/ats/lib/ts/Regression.cc:141
#7 0x5404fb in RegressionCont::mainEvent(int, Event*) 
/home/shinrich/ats/proxy/Main.cc:1210
#8 0xb6b771 in Continuation::handleEvent(int, void*) 
/home/shinrich/ats/iocore/eventsystem/I_Continuation.h:146
#9 0xb6b771 in EThread::process_event(Event*, int) 
/home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:128
#10 0xb6d3a6 in EThread::execute() 
/home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:207
#11 0xb69da1 in spawn_thread_internal 
/home/shinrich/ats/iocore/eventsystem/Thread.cc:86
#12 0x75e27529 in start_thread (/lib64/libpthread.so.0+0x7529)
#13 0x7464922c in __clone (/lib64/libc.so.6+0x10022c)

0x6048e9e0 is located 16 bytes inside of 40-byte region 
[0x6048e9d0,0x6048e9f8)
freed by thread T2 ([ET_NET 1]) here:
#0 0x76f5764f in operator delete(void*) (/lib64/libasan.so.1+0x5864f)
#1 0x9c84ac in CacheDisk::delete_volume(int) 
/home/shinrich/ats/iocore/cache/CacheDisk.cc:330
#2 0x989455 in cplist_update /home/shinrich/ats/iocore/cache/Cache.cc:2684
#3 0x989455 in cplist_reconfigure() 
/home/shinrich/ats/iocore/cache/Cache.cc:2846
#4 0x9d1186 in execute_and_verify(RegressionTest*) 
/home/shinrich/ats/iocore/cache/CacheHosting.cc:996
#5 0x9d2229 in RegressionTest_Cache_vol(RegressionTest*, int, int*) 
/home/shinrich/ats/iocore/cache/CacheHosting.cc:842
#6 0x76cb55f1 in start_test /home/shinrich/ats/lib/ts/Regression.cc:78
#7 0x76cb55f1 in RegressionTest::run_some() 
/home/shinrich/ats/lib/ts/Regression.cc:126
#8 0x76cb5b00 in RegressionTest::check_status() 
/home/shinrich/ats/lib/ts/Regression.cc:141
#9 0x5404fb in RegressionCont::mainEvent(int, Event*) 
/home/shinrich/ats/proxy/Main.cc:1210
#10 0xb6b771 in Continuation::handleEvent(int, void*) 
/home/shinrich/ats/iocore/eventsystem/I_Continuation.h:146
#11 0xb6b771 in EThread::process_event(Event*, int) 
/home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:128
#12 0xb6d3a6 in EThread::execute() 
/home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:207
#13 0xb69da1 in spawn_thread_internal 
/home/shinrich/ats/iocore/eventsystem/Thread.cc:86
#14 0x75e27529 in start_thread (/lib64/libpthread.so.0+0x7529)

previously allocated by thread T2 ([ET_NET 1]) here:
#0 0x76f5714f in operator new(unsigned long) 
(/lib64/libasan.so.1+0x5814f)
#1 0x9c770d in CacheDisk::create_volume(int, long, int) 
/home/shinrich/ats/iocore/cache/CacheDisk.cc:296
#2 0x98347e in create_volume /home/shinrich/ats/iocore/cache/Cache.cc:3023
#3 0x989b41 in cplist_reconfigure() 
/home/shinrich/ats/iocore/cache/Cache.cc:2877
#4 0x9d1186 in execute_and_verify(RegressionTest*) 
/home/shinrich/ats/iocore/cache/CacheHosting.cc:996
#5 0x9d2229 in RegressionTest_Cache_vol(RegressionTest*, int, int*) 
/home/shinrich/ats/iocore/cache/CacheHosting.cc:842
#6 0x76cb55f1 in start_test /home/shinrich/ats/lib/ts/Regression.cc:78
#7 0x76cb55f1 in RegressionTest::run_some() 
/home/shinrich/ats/lib/ts/Regression.cc:126
#8 0x76cb5b00 in RegressionTest::check_status() 
/home/shinrich/ats/lib/ts/Regression.cc:141
#9 0x5404fb in RegressionCont::mainEvent(int, Event*) 

[jira] [Updated] (TS-3775) ASAN crash while running regression test Cache_vol

2015-07-17 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-3775:
---
Attachment: ts-3775.diff

ts-3775.diff NULL's out the disk_vol entry after it is deleted in 
CacheDisk::delete_volume.

This method shifts the remaining elements of the array down to cover the 
deleted item and decrements header->num_volumes.  But cplist_update uses 
gndisks as the upper bound when iterating over the disk_vols array, and that 
bound does not get decremented.

So if the item deleted is the last item in the disk_vol array it does not get 
overwritten and the deleted object will get accessed in the next call to 
cplist_update.

While this fixes the immediate problem since cplist_update does do a null 
check, there is probably a more artful solution.  Having multiple values 
tracking the length of disk_vol seems bad.
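
For illustration, a hedged sketch of the fix and the iteration pattern 
described above; the class and field names mimic the comment (disk_vols, 
num_volumes) but are stand-ins, not the attached ts-3775.diff.

{code}
// Hedged sketch of the fix and iteration pattern described above; the class
// and field names mimic the comment but are stand-ins, not ts-3775.diff.
struct DiskVol {
  int vol_number;
};

constexpr int MAX_VOLS = 8;

struct CacheDisk {
  DiskVol *disk_vols[MAX_VOLS] = {};
  int num_volumes = 0; // what header->num_volumes tracks

  void delete_volume(int i) {
    delete disk_vols[i];
    for (int j = i; j + 1 < num_volumes; ++j) // shift the rest down
      disk_vols[j] = disk_vols[j + 1];
    --num_volumes;
    disk_vols[num_volumes] = nullptr; // the fix: clear the now-stale last slot
  }
};

// A cplist_update-style scan that walks a larger, fixed bound (like gndisks),
// so it depends on the null check rather than on num_volumes.
int count_live(const CacheDisk &d) {
  int live = 0;
  for (int i = 0; i < MAX_VOLS; ++i)
    if (d.disk_vols[i] != nullptr)
      ++live;
  return live;
}

int main() {
  CacheDisk d;
  for (int i = 0; i < 3; ++i) {
    d.disk_vols[d.num_volumes] = new DiskVol{i};
    ++d.num_volumes;
  }
  d.delete_volume(2);                 // delete the *last* entry
  return count_live(d) == 2 ? 0 : 1;  // no freed pointer is ever dereferenced
}
{code}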

 ASAN crash while running regression test Cache_vol
 --

 Key: TS-3775
 URL: https://issues.apache.org/jira/browse/TS-3775
 Project: Traffic Server
  Issue Type: Bug
  Components: Cache
Reporter: Susan Hinrichs
 Attachments: ts-3775.diff


 Seen while running master built with ASAN on FC 21.  I have a patch which 
 I'll attach and discuss in comment.
 {code}
 REGRESSION TEST Cache_vol started
 RPRINT Cache_vol: 1 128 Megabyte Volumes
 RPRINT Cache_vol: Not enough space for 10 volume
 RPRINT Cache_vol: Random Volumes after clearing the disks
 RPRINT Cache_vol: volume=1 scheme=http size=128
 RPRINT Cache_vol: Random Volumes without clearing the disks
 RPRINT Cache_vol: volume=1 scheme=rtsp size=128
 =
 ==4513==ERROR: AddressSanitizer: heap-use-after-free on address 
 0x6048e9e0 at pc 0x989546 bp 0x7fffef2a59b0 sp 0x7fffef2a59a0
 READ of size 8 at 0x6048e9e0 thread T2 ([ET_NET 1])
 #0 0x989545 in cplist_update /home/shinrich/ats/iocore/cache/Cache.cc:2702
 #1 0x989545 in cplist_reconfigure() 
 /home/shinrich/ats/iocore/cache/Cache.cc:2846
 #2 0x9d1186 in execute_and_verify(RegressionTest*) 
 /home/shinrich/ats/iocore/cache/CacheHosting.cc:996
 #3 0x9d2229 in RegressionTest_Cache_vol(RegressionTest*, int, int*) 
 /home/shinrich/ats/iocore/cache/CacheHosting.cc:842
 #4 0x76cb55f1 in start_test /home/shinrich/ats/lib/ts/Regression.cc:78
 #5 0x76cb55f1 in RegressionTest::run_some() 
 /home/shinrich/ats/lib/ts/Regression.cc:126
 #6 0x76cb5b00 in RegressionTest::check_status() 
 /home/shinrich/ats/lib/ts/Regression.cc:141
 #7 0x5404fb in RegressionCont::mainEvent(int, Event*) 
 /home/shinrich/ats/proxy/Main.cc:1210
 #8 0xb6b771 in Continuation::handleEvent(int, void*) 
 /home/shinrich/ats/iocore/eventsystem/I_Continuation.h:146
 #9 0xb6b771 in EThread::process_event(Event*, int) 
 /home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:128
 #10 0xb6d3a6 in EThread::execute() 
 /home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:207
 #11 0xb69da1 in spawn_thread_internal 
 /home/shinrich/ats/iocore/eventsystem/Thread.cc:86
 #12 0x75e27529 in start_thread (/lib64/libpthread.so.0+0x7529)
 #13 0x7464922c in __clone (/lib64/libc.so.6+0x10022c)
 0x6048e9e0 is located 16 bytes inside of 40-byte region 
 [0x6048e9d0,0x6048e9f8)
 freed by thread T2 ([ET_NET 1]) here:
 #0 0x76f5764f in operator delete(void*) (/lib64/libasan.so.1+0x5864f)
 #1 0x9c84ac in CacheDisk::delete_volume(int) 
 /home/shinrich/ats/iocore/cache/CacheDisk.cc:330
 #2 0x989455 in cplist_update /home/shinrich/ats/iocore/cache/Cache.cc:2684
 #3 0x989455 in cplist_reconfigure() 
 /home/shinrich/ats/iocore/cache/Cache.cc:2846
 #4 0x9d1186 in execute_and_verify(RegressionTest*) 
 /home/shinrich/ats/iocore/cache/CacheHosting.cc:996
 #5 0x9d2229 in RegressionTest_Cache_vol(RegressionTest*, int, int*) 
 /home/shinrich/ats/iocore/cache/CacheHosting.cc:842
 #6 0x76cb55f1 in start_test /home/shinrich/ats/lib/ts/Regression.cc:78
 #7 0x76cb55f1 in RegressionTest::run_some() 
 /home/shinrich/ats/lib/ts/Regression.cc:126
 #8 0x76cb5b00 in RegressionTest::check_status() 
 /home/shinrich/ats/lib/ts/Regression.cc:141
 #9 0x5404fb in RegressionCont::mainEvent(int, Event*) 
 /home/shinrich/ats/proxy/Main.cc:1210
 #10 0xb6b771 in Continuation::handleEvent(int, void*) 
 /home/shinrich/ats/iocore/eventsystem/I_Continuation.h:146
 #11 0xb6b771 in EThread::process_event(Event*, int) 
 /home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:128
 #12 0xb6d3a6 in EThread::execute() 
 /home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:207
 #13 0xb69da1 in spawn_thread_internal 
 /home/shinrich/ats/iocore/eventsystem/Thread.cc:86
 #14 0x75e27529 in start_thread (/lib64/libpthread.so.0+0x7529)
 previously 

[jira] [Resolved] (TS-1007) SSN Close called before TXN Close

2015-07-17 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-1007.

Resolution: Fixed

 SSN Close called before TXN Close
 -

 Key: TS-1007
 URL: https://issues.apache.org/jira/browse/TS-1007
 Project: Traffic Server
  Issue Type: Bug
  Components: TS API
Affects Versions: 3.0.1
Reporter: Nick Kew
Assignee: Susan Hinrichs
  Labels: incompatible
 Fix For: 6.0.0


 Where a plugin implements both SSN_CLOSE_HOOK and TXN_CLOSE_HOOK, the 
 SSN_CLOSE_HOOK is called first of the two.  This messes up normal cleanups!
 Details:
   Register a SSN_START event globally
   In the SSN START, add a TXN_START and a SSN_CLOSE
   In the TXN START, add a TXN_CLOSE
 Stepping through, I see the order of events actually called, for the simple 
 case of a one-off HTTP request with no keepalive:
 SSN_START
 TXN_START
 SSN_END
 TXN_END
 Whoops, SSN_END cleaned up the SSN context, leaving dangling pointers in the 
 TXN!
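
For illustration, a minimal sketch of the registration pattern the steps above 
describe, written against the public TS C API (TSPluginRegister and error 
handling omitted); it is not the reporter's actual plugin.

{code}
/* Hedged sketch of the registration pattern the steps above describe, against
 * the public TS C API; TSPluginRegister and error handling are omitted, and
 * this is not the reporter's actual plugin. */
#include <ts/ts.h>

static int
hook_handler(TSCont contp, TSEvent event, void *edata)
{
  switch (event) {
  case TS_EVENT_HTTP_SSN_START: {
    TSHttpSsn ssnp = (TSHttpSsn)edata;
    TSHttpSsnHookAdd(ssnp, TS_HTTP_TXN_START_HOOK, contp);
    TSHttpSsnHookAdd(ssnp, TS_HTTP_SSN_CLOSE_HOOK, contp);
    TSHttpSsnReenable(ssnp, TS_EVENT_HTTP_CONTINUE);
    break;
  }
  case TS_EVENT_HTTP_TXN_START: {
    TSHttpTxn txnp = (TSHttpTxn)edata;
    TSHttpTxnHookAdd(txnp, TS_HTTP_TXN_CLOSE_HOOK, contp);
    TSHttpTxnReenable(txnp, TS_EVENT_HTTP_CONTINUE);
    break;
  }
  case TS_EVENT_HTTP_TXN_CLOSE:
    /* per-transaction cleanup belongs here ... */
    TSHttpTxnReenable((TSHttpTxn)edata, TS_EVENT_HTTP_CONTINUE);
    break;
  case TS_EVENT_HTTP_SSN_CLOSE:
    /* ... but with the bug, this session cleanup ran first, leaving the
     * still-pending TXN_CLOSE handler with dangling per-session state. */
    TSHttpSsnReenable((TSHttpSsn)edata, TS_EVENT_HTTP_CONTINUE);
    break;
  default:
    break;
  }
  return 0;
}

void
TSPluginInit(int argc, const char *argv[])
{
  (void)argc;
  (void)argv;
  TSCont contp = TSContCreate(hook_handler, NULL);
  TSHttpHookAdd(TS_HTTP_SSN_START_HOOK, contp);
}
{code}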





[jira] [Updated] (TS-1007) SSN Close called before TXN Close

2015-07-17 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-1007:
---
Fix Version/s: (was: 6.0.0)
   6.1.0

 SSN Close called before TXN Close
 -

 Key: TS-1007
 URL: https://issues.apache.org/jira/browse/TS-1007
 Project: Traffic Server
  Issue Type: Bug
  Components: TS API
Affects Versions: 3.0.1
Reporter: Nick Kew
Assignee: Susan Hinrichs
  Labels: incompatible
 Fix For: 6.1.0


 Where a plugin implements both SSN_CLOSE_HOOK and TXN_CLOSE_HOOK, the 
 SSN_CLOSE_HOOK is called first of the two.  This messes up normal cleanups!
 Details:
   Register a SSN_START event globally
   In the SSN START, add a TXN_START and a SSN_CLOSE
   In the TXN START, add a TXN_CLOSE
 Stepping through, I see the order of events actually called, for the simple 
 case of a one-off HTTP request with no keepalive:
 SSN_START
 TXN_START
 SSN_END
 TXN_END
 Whoops, SSN_END cleaned up the SSN context, leaving dangling pointers in the 
 TXN!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-1007) SSN Close called before TXN Close

2015-07-17 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-1007:
---
Backport to Version: 6.0.0

 SSN Close called before TXN Close
 -

 Key: TS-1007
 URL: https://issues.apache.org/jira/browse/TS-1007
 Project: Traffic Server
  Issue Type: Bug
  Components: TS API
Affects Versions: 3.0.1
Reporter: Nick Kew
Assignee: Susan Hinrichs
  Labels: incompatible
 Fix For: 6.1.0


 Where a plugin implements both SSN_CLOSE_HOOK and TXN_CLOSE_HOOK, the 
 SSN_CLOSE_HOOK is called first of the two.  This messes up normal cleanups!
 Details:
   Register a SSN_START event globally
   In the SSN START, add a TXN_START and a SSN_CLOSE
   In the TXN START, add a TXN_CLOSE
 Stepping through, I see the order of events actually called, for the simple 
 case of a one-off HTTP request with no keepalive:
 SSN_START
 TXN_START
 SSN_END
 TXN_END
 Whoops, SSN_END cleaned up the SSN context, leaving dangling pointers in the 
 TXN!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (TS-3775) ASAN crash while running regression test Cache_vol

2015-07-18 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs reopened TS-3775:

  Assignee: Susan Hinrichs

Haven't yet committed the diff.

 ASAN crash while running regression test Cache_vol
 --

 Key: TS-3775
 URL: https://issues.apache.org/jira/browse/TS-3775
 Project: Traffic Server
  Issue Type: Bug
  Components: Cache
Reporter: Susan Hinrichs
Assignee: Susan Hinrichs
 Attachments: ts-3775.diff


 Seen while running master built with ASAN on FC 21.  I have a patch which 
 I'll attach and discuss in comment.
 {code}
 REGRESSION TEST Cache_vol started
 RPRINT Cache_vol: 1 128 Megabyte Volumes
 RPRINT Cache_vol: Not enough space for 10 volume
 RPRINT Cache_vol: Random Volumes after clearing the disks
 RPRINT Cache_vol: volume=1 scheme=http size=128
 RPRINT Cache_vol: Random Volumes without clearing the disks
 RPRINT Cache_vol: volume=1 scheme=rtsp size=128
 =
 ==4513==ERROR: AddressSanitizer: heap-use-after-free on address 
 0x6048e9e0 at pc 0x989546 bp 0x7fffef2a59b0 sp 0x7fffef2a59a0
 READ of size 8 at 0x6048e9e0 thread T2 ([ET_NET 1])
 #0 0x989545 in cplist_update /home/shinrich/ats/iocore/cache/Cache.cc:2702
 #1 0x989545 in cplist_reconfigure() 
 /home/shinrich/ats/iocore/cache/Cache.cc:2846
 #2 0x9d1186 in execute_and_verify(RegressionTest*) 
 /home/shinrich/ats/iocore/cache/CacheHosting.cc:996
 #3 0x9d2229 in RegressionTest_Cache_vol(RegressionTest*, int, int*) 
 /home/shinrich/ats/iocore/cache/CacheHosting.cc:842
 #4 0x76cb55f1 in start_test /home/shinrich/ats/lib/ts/Regression.cc:78
 #5 0x76cb55f1 in RegressionTest::run_some() 
 /home/shinrich/ats/lib/ts/Regression.cc:126
 #6 0x76cb5b00 in RegressionTest::check_status() 
 /home/shinrich/ats/lib/ts/Regression.cc:141
 #7 0x5404fb in RegressionCont::mainEvent(int, Event*) 
 /home/shinrich/ats/proxy/Main.cc:1210
 #8 0xb6b771 in Continuation::handleEvent(int, void*) 
 /home/shinrich/ats/iocore/eventsystem/I_Continuation.h:146
 #9 0xb6b771 in EThread::process_event(Event*, int) 
 /home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:128
 #10 0xb6d3a6 in EThread::execute() 
 /home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:207
 #11 0xb69da1 in spawn_thread_internal 
 /home/shinrich/ats/iocore/eventsystem/Thread.cc:86
 #12 0x75e27529 in start_thread (/lib64/libpthread.so.0+0x7529)
 #13 0x7464922c in __clone (/lib64/libc.so.6+0x10022c)
 0x6048e9e0 is located 16 bytes inside of 40-byte region 
 [0x6048e9d0,0x6048e9f8)
 freed by thread T2 ([ET_NET 1]) here:
 #0 0x76f5764f in operator delete(void*) (/lib64/libasan.so.1+0x5864f)
 #1 0x9c84ac in CacheDisk::delete_volume(int) 
 /home/shinrich/ats/iocore/cache/CacheDisk.cc:330
 #2 0x989455 in cplist_update /home/shinrich/ats/iocore/cache/Cache.cc:2684
 #3 0x989455 in cplist_reconfigure() 
 /home/shinrich/ats/iocore/cache/Cache.cc:2846
 #4 0x9d1186 in execute_and_verify(RegressionTest*) 
 /home/shinrich/ats/iocore/cache/CacheHosting.cc:996
 #5 0x9d2229 in RegressionTest_Cache_vol(RegressionTest*, int, int*) 
 /home/shinrich/ats/iocore/cache/CacheHosting.cc:842
 #6 0x76cb55f1 in start_test /home/shinrich/ats/lib/ts/Regression.cc:78
 #7 0x76cb55f1 in RegressionTest::run_some() 
 /home/shinrich/ats/lib/ts/Regression.cc:126
 #8 0x76cb5b00 in RegressionTest::check_status() 
 /home/shinrich/ats/lib/ts/Regression.cc:141
 #9 0x5404fb in RegressionCont::mainEvent(int, Event*) 
 /home/shinrich/ats/proxy/Main.cc:1210
 #10 0xb6b771 in Continuation::handleEvent(int, void*) 
 /home/shinrich/ats/iocore/eventsystem/I_Continuation.h:146
 #11 0xb6b771 in EThread::process_event(Event*, int) 
 /home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:128
 #12 0xb6d3a6 in EThread::execute() 
 /home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:207
 #13 0xb69da1 in spawn_thread_internal 
 /home/shinrich/ats/iocore/eventsystem/Thread.cc:86
 #14 0x75e27529 in start_thread (/lib64/libpthread.so.0+0x7529)
 previously allocated by thread T2 ([ET_NET 1]) here:
 #0 0x76f5714f in operator new(unsigned long) 
 (/lib64/libasan.so.1+0x5814f)
 #1 0x9c770d in CacheDisk::create_volume(int, long, int) 
 /home/shinrich/ats/iocore/cache/CacheDisk.cc:296
 #2 0x98347e in create_volume /home/shinrich/ats/iocore/cache/Cache.cc:3023
 #3 0x989b41 in cplist_reconfigure() 
 /home/shinrich/ats/iocore/cache/Cache.cc:2877
 #4 0x9d1186 in execute_and_verify(RegressionTest*) 
 /home/shinrich/ats/iocore/cache/CacheHosting.cc:996
 #5 0x9d2229 in RegressionTest_Cache_vol(RegressionTest*, int, int*) 
 /home/shinrich/ats/iocore/cache/CacheHosting.cc:842
  

[jira] [Issue Comment Deleted] (TS-3775) ASAN crash while running regression test Cache_vol

2015-07-18 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-3775:
---
Comment: was deleted

(was: Haven't yet committed the diff.)

 ASAN crash while running regression test Cache_vol
 --

 Key: TS-3775
 URL: https://issues.apache.org/jira/browse/TS-3775
 Project: Traffic Server
  Issue Type: Bug
  Components: Cache
Reporter: Susan Hinrichs
Assignee: Susan Hinrichs
 Attachments: ts-3775.diff


 Seen while running master built with ASAN on FC 21.  I have a patch which 
 I'll attach and discuss in comment.
 {code}
 REGRESSION TEST Cache_vol started
 RPRINT Cache_vol: 1 128 Megabyte Volumes
 RPRINT Cache_vol: Not enough space for 10 volume
 RPRINT Cache_vol: Random Volumes after clearing the disks
 RPRINT Cache_vol: volume=1 scheme=http size=128
 RPRINT Cache_vol: Random Volumes without clearing the disks
 RPRINT Cache_vol: volume=1 scheme=rtsp size=128
 =
 ==4513==ERROR: AddressSanitizer: heap-use-after-free on address 
 0x6048e9e0 at pc 0x989546 bp 0x7fffef2a59b0 sp 0x7fffef2a59a0
 READ of size 8 at 0x6048e9e0 thread T2 ([ET_NET 1])
 #0 0x989545 in cplist_update /home/shinrich/ats/iocore/cache/Cache.cc:2702
 #1 0x989545 in cplist_reconfigure() 
 /home/shinrich/ats/iocore/cache/Cache.cc:2846
 #2 0x9d1186 in execute_and_verify(RegressionTest*) 
 /home/shinrich/ats/iocore/cache/CacheHosting.cc:996
 #3 0x9d2229 in RegressionTest_Cache_vol(RegressionTest*, int, int*) 
 /home/shinrich/ats/iocore/cache/CacheHosting.cc:842
 #4 0x76cb55f1 in start_test /home/shinrich/ats/lib/ts/Regression.cc:78
 #5 0x76cb55f1 in RegressionTest::run_some() 
 /home/shinrich/ats/lib/ts/Regression.cc:126
 #6 0x76cb5b00 in RegressionTest::check_status() 
 /home/shinrich/ats/lib/ts/Regression.cc:141
 #7 0x5404fb in RegressionCont::mainEvent(int, Event*) 
 /home/shinrich/ats/proxy/Main.cc:1210
 #8 0xb6b771 in Continuation::handleEvent(int, void*) 
 /home/shinrich/ats/iocore/eventsystem/I_Continuation.h:146
 #9 0xb6b771 in EThread::process_event(Event*, int) 
 /home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:128
 #10 0xb6d3a6 in EThread::execute() 
 /home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:207
 #11 0xb69da1 in spawn_thread_internal 
 /home/shinrich/ats/iocore/eventsystem/Thread.cc:86
 #12 0x75e27529 in start_thread (/lib64/libpthread.so.0+0x7529)
 #13 0x7464922c in __clone (/lib64/libc.so.6+0x10022c)
 0x6048e9e0 is located 16 bytes inside of 40-byte region 
 [0x6048e9d0,0x6048e9f8)
 freed by thread T2 ([ET_NET 1]) here:
 #0 0x76f5764f in operator delete(void*) (/lib64/libasan.so.1+0x5864f)
 #1 0x9c84ac in CacheDisk::delete_volume(int) 
 /home/shinrich/ats/iocore/cache/CacheDisk.cc:330
 #2 0x989455 in cplist_update /home/shinrich/ats/iocore/cache/Cache.cc:2684
 #3 0x989455 in cplist_reconfigure() 
 /home/shinrich/ats/iocore/cache/Cache.cc:2846
 #4 0x9d1186 in execute_and_verify(RegressionTest*) 
 /home/shinrich/ats/iocore/cache/CacheHosting.cc:996
 #5 0x9d2229 in RegressionTest_Cache_vol(RegressionTest*, int, int*) 
 /home/shinrich/ats/iocore/cache/CacheHosting.cc:842
 #6 0x76cb55f1 in start_test /home/shinrich/ats/lib/ts/Regression.cc:78
 #7 0x76cb55f1 in RegressionTest::run_some() 
 /home/shinrich/ats/lib/ts/Regression.cc:126
 #8 0x76cb5b00 in RegressionTest::check_status() 
 /home/shinrich/ats/lib/ts/Regression.cc:141
 #9 0x5404fb in RegressionCont::mainEvent(int, Event*) 
 /home/shinrich/ats/proxy/Main.cc:1210
 #10 0xb6b771 in Continuation::handleEvent(int, void*) 
 /home/shinrich/ats/iocore/eventsystem/I_Continuation.h:146
 #11 0xb6b771 in EThread::process_event(Event*, int) 
 /home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:128
 #12 0xb6d3a6 in EThread::execute() 
 /home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:207
 #13 0xb69da1 in spawn_thread_internal 
 /home/shinrich/ats/iocore/eventsystem/Thread.cc:86
 #14 0x75e27529 in start_thread (/lib64/libpthread.so.0+0x7529)
 previously allocated by thread T2 ([ET_NET 1]) here:
 #0 0x76f5714f in operator new(unsigned long) 
 (/lib64/libasan.so.1+0x5814f)
 #1 0x9c770d in CacheDisk::create_volume(int, long, int) 
 /home/shinrich/ats/iocore/cache/CacheDisk.cc:296
 #2 0x98347e in create_volume /home/shinrich/ats/iocore/cache/Cache.cc:3023
 #3 0x989b41 in cplist_reconfigure() 
 /home/shinrich/ats/iocore/cache/Cache.cc:2877
 #4 0x9d1186 in execute_and_verify(RegressionTest*) 
 /home/shinrich/ats/iocore/cache/CacheHosting.cc:996
 #5 0x9d2229 in RegressionTest_Cache_vol(RegressionTest*, int, int*) 
 /home/shinrich/ats/iocore/cache/CacheHosting.cc:842
   

[jira] [Closed] (TS-3775) ASAN crash while running regression test Cache_vol

2015-07-18 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs closed TS-3775.
--
Resolution: Duplicate

 ASAN crash while running regression test Cache_vol
 --

 Key: TS-3775
 URL: https://issues.apache.org/jira/browse/TS-3775
 Project: Traffic Server
  Issue Type: Bug
  Components: Cache
Reporter: Susan Hinrichs
Assignee: Susan Hinrichs
 Attachments: ts-3775.diff


 Seen while running master built with ASAN on FC 21.  I have a patch which 
 I'll attach and discuss in comment.
 {code}
 REGRESSION TEST Cache_vol started
 RPRINT Cache_vol: 1 128 Megabyte Volumes
 RPRINT Cache_vol: Not enough space for 10 volume
 RPRINT Cache_vol: Random Volumes after clearing the disks
 RPRINT Cache_vol: volume=1 scheme=http size=128
 RPRINT Cache_vol: Random Volumes without clearing the disks
 RPRINT Cache_vol: volume=1 scheme=rtsp size=128
 =
 ==4513==ERROR: AddressSanitizer: heap-use-after-free on address 
 0x6048e9e0 at pc 0x989546 bp 0x7fffef2a59b0 sp 0x7fffef2a59a0
 READ of size 8 at 0x6048e9e0 thread T2 ([ET_NET 1])
 #0 0x989545 in cplist_update /home/shinrich/ats/iocore/cache/Cache.cc:2702
 #1 0x989545 in cplist_reconfigure() 
 /home/shinrich/ats/iocore/cache/Cache.cc:2846
 #2 0x9d1186 in execute_and_verify(RegressionTest*) 
 /home/shinrich/ats/iocore/cache/CacheHosting.cc:996
 #3 0x9d2229 in RegressionTest_Cache_vol(RegressionTest*, int, int*) 
 /home/shinrich/ats/iocore/cache/CacheHosting.cc:842
 #4 0x76cb55f1 in start_test /home/shinrich/ats/lib/ts/Regression.cc:78
 #5 0x76cb55f1 in RegressionTest::run_some() 
 /home/shinrich/ats/lib/ts/Regression.cc:126
 #6 0x76cb5b00 in RegressionTest::check_status() 
 /home/shinrich/ats/lib/ts/Regression.cc:141
 #7 0x5404fb in RegressionCont::mainEvent(int, Event*) 
 /home/shinrich/ats/proxy/Main.cc:1210
 #8 0xb6b771 in Continuation::handleEvent(int, void*) 
 /home/shinrich/ats/iocore/eventsystem/I_Continuation.h:146
 #9 0xb6b771 in EThread::process_event(Event*, int) 
 /home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:128
 #10 0xb6d3a6 in EThread::execute() 
 /home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:207
 #11 0xb69da1 in spawn_thread_internal 
 /home/shinrich/ats/iocore/eventsystem/Thread.cc:86
 #12 0x75e27529 in start_thread (/lib64/libpthread.so.0+0x7529)
 #13 0x7464922c in __clone (/lib64/libc.so.6+0x10022c)
 0x6048e9e0 is located 16 bytes inside of 40-byte region 
 [0x6048e9d0,0x6048e9f8)
 freed by thread T2 ([ET_NET 1]) here:
 #0 0x76f5764f in operator delete(void*) (/lib64/libasan.so.1+0x5864f)
 #1 0x9c84ac in CacheDisk::delete_volume(int) 
 /home/shinrich/ats/iocore/cache/CacheDisk.cc:330
 #2 0x989455 in cplist_update /home/shinrich/ats/iocore/cache/Cache.cc:2684
 #3 0x989455 in cplist_reconfigure() 
 /home/shinrich/ats/iocore/cache/Cache.cc:2846
 #4 0x9d1186 in execute_and_verify(RegressionTest*) 
 /home/shinrich/ats/iocore/cache/CacheHosting.cc:996
 #5 0x9d2229 in RegressionTest_Cache_vol(RegressionTest*, int, int*) 
 /home/shinrich/ats/iocore/cache/CacheHosting.cc:842
 #6 0x76cb55f1 in start_test /home/shinrich/ats/lib/ts/Regression.cc:78
 #7 0x76cb55f1 in RegressionTest::run_some() 
 /home/shinrich/ats/lib/ts/Regression.cc:126
 #8 0x76cb5b00 in RegressionTest::check_status() 
 /home/shinrich/ats/lib/ts/Regression.cc:141
 #9 0x5404fb in RegressionCont::mainEvent(int, Event*) 
 /home/shinrich/ats/proxy/Main.cc:1210
 #10 0xb6b771 in Continuation::handleEvent(int, void*) 
 /home/shinrich/ats/iocore/eventsystem/I_Continuation.h:146
 #11 0xb6b771 in EThread::process_event(Event*, int) 
 /home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:128
 #12 0xb6d3a6 in EThread::execute() 
 /home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:207
 #13 0xb69da1 in spawn_thread_internal 
 /home/shinrich/ats/iocore/eventsystem/Thread.cc:86
 #14 0x75e27529 in start_thread (/lib64/libpthread.so.0+0x7529)
 previously allocated by thread T2 ([ET_NET 1]) here:
 #0 0x76f5714f in operator new(unsigned long) 
 (/lib64/libasan.so.1+0x5814f)
 #1 0x9c770d in CacheDisk::create_volume(int, long, int) 
 /home/shinrich/ats/iocore/cache/CacheDisk.cc:296
 #2 0x98347e in create_volume /home/shinrich/ats/iocore/cache/Cache.cc:3023
 #3 0x989b41 in cplist_reconfigure() 
 /home/shinrich/ats/iocore/cache/Cache.cc:2877
 #4 0x9d1186 in execute_and_verify(RegressionTest*) 
 /home/shinrich/ats/iocore/cache/CacheHosting.cc:996
 #5 0x9d2229 in RegressionTest_Cache_vol(RegressionTest*, int, int*) 
 /home/shinrich/ats/iocore/cache/CacheHosting.cc:842
 #6 0x76cb55f1 in start_test 

[jira] [Reopened] (TS-3667) SSL Handshake read does not correctly handle EOF and error cases

2015-07-18 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs reopened TS-3667:


Reopening to address the patch [~oknet] provides.

 SSL Handshake read does not correctly handle EOF and error cases
 ---

 Key: TS-3667
 URL: https://issues.apache.org/jira/browse/TS-3667
 Project: Traffic Server
  Issue Type: Bug
  Components: SSL
Affects Versions: 5.2.0, 5.3.0
Reporter: Susan Hinrichs
Assignee: Susan Hinrichs
 Fix For: 5.3.1, 6.0.0

 Attachments: ts-3667.diff


 Reported by [~esproul] and postwait.
 The return value of SSLNetVConnection::read_raw_data() is being ignored.  So 
 EOF and error cases are not terminated; instead they spin until the inactivity 
 timeout is reached, and on EAGAIN the connection is not descheduled to wait 
 until more data is available.
 This results in higher CPU utilization and hitting the SSL_error() function 
 far more often than necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3775) ASAN crash while running regression test Cache_vol

2015-07-18 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632421#comment-14632421
 ] 

Susan Hinrichs commented on TS-3775:


Yes, looks like the same thing.  

 ASAN crash while running regression test Cache_vol
 --

 Key: TS-3775
 URL: https://issues.apache.org/jira/browse/TS-3775
 Project: Traffic Server
  Issue Type: Bug
  Components: Cache
Reporter: Susan Hinrichs
 Attachments: ts-3775.diff


 Seen while running master built with ASAN on FC 21.  I have a patch which 
 I'll attach and discuss in comment.
 {code}
 REGRESSION TEST Cache_vol started
 RPRINT Cache_vol: 1 128 Megabyte Volumes
 RPRINT Cache_vol: Not enough space for 10 volume
 RPRINT Cache_vol: Random Volumes after clearing the disks
 RPRINT Cache_vol: volume=1 scheme=http size=128
 RPRINT Cache_vol: Random Volumes without clearing the disks
 RPRINT Cache_vol: volume=1 scheme=rtsp size=128
 =
 ==4513==ERROR: AddressSanitizer: heap-use-after-free on address 
 0x6048e9e0 at pc 0x989546 bp 0x7fffef2a59b0 sp 0x7fffef2a59a0
 READ of size 8 at 0x6048e9e0 thread T2 ([ET_NET 1])
 #0 0x989545 in cplist_update /home/shinrich/ats/iocore/cache/Cache.cc:2702
 #1 0x989545 in cplist_reconfigure() 
 /home/shinrich/ats/iocore/cache/Cache.cc:2846
 #2 0x9d1186 in execute_and_verify(RegressionTest*) 
 /home/shinrich/ats/iocore/cache/CacheHosting.cc:996
 #3 0x9d2229 in RegressionTest_Cache_vol(RegressionTest*, int, int*) 
 /home/shinrich/ats/iocore/cache/CacheHosting.cc:842
 #4 0x76cb55f1 in start_test /home/shinrich/ats/lib/ts/Regression.cc:78
 #5 0x76cb55f1 in RegressionTest::run_some() 
 /home/shinrich/ats/lib/ts/Regression.cc:126
 #6 0x76cb5b00 in RegressionTest::check_status() 
 /home/shinrich/ats/lib/ts/Regression.cc:141
 #7 0x5404fb in RegressionCont::mainEvent(int, Event*) 
 /home/shinrich/ats/proxy/Main.cc:1210
 #8 0xb6b771 in Continuation::handleEvent(int, void*) 
 /home/shinrich/ats/iocore/eventsystem/I_Continuation.h:146
 #9 0xb6b771 in EThread::process_event(Event*, int) 
 /home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:128
 #10 0xb6d3a6 in EThread::execute() 
 /home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:207
 #11 0xb69da1 in spawn_thread_internal 
 /home/shinrich/ats/iocore/eventsystem/Thread.cc:86
 #12 0x75e27529 in start_thread (/lib64/libpthread.so.0+0x7529)
 #13 0x7464922c in __clone (/lib64/libc.so.6+0x10022c)
 0x6048e9e0 is located 16 bytes inside of 40-byte region 
 [0x6048e9d0,0x6048e9f8)
 freed by thread T2 ([ET_NET 1]) here:
 #0 0x76f5764f in operator delete(void*) (/lib64/libasan.so.1+0x5864f)
 #1 0x9c84ac in CacheDisk::delete_volume(int) 
 /home/shinrich/ats/iocore/cache/CacheDisk.cc:330
 #2 0x989455 in cplist_update /home/shinrich/ats/iocore/cache/Cache.cc:2684
 #3 0x989455 in cplist_reconfigure() 
 /home/shinrich/ats/iocore/cache/Cache.cc:2846
 #4 0x9d1186 in execute_and_verify(RegressionTest*) 
 /home/shinrich/ats/iocore/cache/CacheHosting.cc:996
 #5 0x9d2229 in RegressionTest_Cache_vol(RegressionTest*, int, int*) 
 /home/shinrich/ats/iocore/cache/CacheHosting.cc:842
 #6 0x76cb55f1 in start_test /home/shinrich/ats/lib/ts/Regression.cc:78
 #7 0x76cb55f1 in RegressionTest::run_some() 
 /home/shinrich/ats/lib/ts/Regression.cc:126
 #8 0x76cb5b00 in RegressionTest::check_status() 
 /home/shinrich/ats/lib/ts/Regression.cc:141
 #9 0x5404fb in RegressionCont::mainEvent(int, Event*) 
 /home/shinrich/ats/proxy/Main.cc:1210
 #10 0xb6b771 in Continuation::handleEvent(int, void*) 
 /home/shinrich/ats/iocore/eventsystem/I_Continuation.h:146
 #11 0xb6b771 in EThread::process_event(Event*, int) 
 /home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:128
 #12 0xb6d3a6 in EThread::execute() 
 /home/shinrich/ats/iocore/eventsystem/UnixEThread.cc:207
 #13 0xb69da1 in spawn_thread_internal 
 /home/shinrich/ats/iocore/eventsystem/Thread.cc:86
 #14 0x75e27529 in start_thread (/lib64/libpthread.so.0+0x7529)
 previously allocated by thread T2 ([ET_NET 1]) here:
 #0 0x76f5714f in operator new(unsigned long) 
 (/lib64/libasan.so.1+0x5814f)
 #1 0x9c770d in CacheDisk::create_volume(int, long, int) 
 /home/shinrich/ats/iocore/cache/CacheDisk.cc:296
 #2 0x98347e in create_volume /home/shinrich/ats/iocore/cache/Cache.cc:3023
 #3 0x989b41 in cplist_reconfigure() 
 /home/shinrich/ats/iocore/cache/Cache.cc:2877
 #4 0x9d1186 in execute_and_verify(RegressionTest*) 
 /home/shinrich/ats/iocore/cache/CacheHosting.cc:996
 #5 0x9d2229 in RegressionTest_Cache_vol(RegressionTest*, int, int*) 
 /home/shinrich/ats/iocore/cache/CacheHosting.cc:842
 #6 

[jira] [Commented] (TS-3710) Crash in TLS with 6.0.0, related to the session cleanup additions

2015-07-20 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633650#comment-14633650
 ] 

Susan Hinrichs commented on TS-3710:


I'd definitely try the do_io_read(NULL, 0, NULL) in ioCompletionEvent before the 
send_plugin_event calls.  That should clear the read.vio._cont before the 
trampoline is deleted.  Based on the cores I looked at over the weekend, it 
definitely looked like the problem continuation was the 
SSLNextProtocolTrampoline.
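
A minimal sketch of what I mean (names recovered from the stacks, not quoted 
from the real SSLNextProtocolAccept.cc):
{code}
// Inside SSLNextProtocolTrampoline::ioCompletionEvent(), before the hand-off:
NetVConnection *netvc = static_cast<VIO *>(edata)->vc_server;

// Detach the trampoline from the read VIO so read.vio._cont no longer points
// at an object that is deleted a few lines later.
netvc->do_io_read(NULL, 0, NULL);

// ...then the existing send_plugin_event(...) call and `delete this` follow.
{code}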

 Although that was the last patch that [~zwoop] tried; while it might have 
slowed down the problem, it did not stop it completely.

 Crash in TLS with 6.0.0, related to the session cleanup additions
 -

 Key: TS-3710
 URL: https://issues.apache.org/jira/browse/TS-3710
 Project: Traffic Server
  Issue Type: Bug
  Components: SSL
Affects Versions: 5.3.0
Reporter: Leif Hedstrom
Assignee: Susan Hinrichs
Priority: Critical
  Labels: yahoo
 Fix For: 6.1.0

 Attachments: ts-3710-2.diff, ts-3710-final-2.diff, ts-3710.diff


 {code}
 ==9570==ERROR: AddressSanitizer: heap-use-after-free on address 
 0x60649f48 at pc 0xb9f969 bp 0x2b8dbc348920 sp 0x2b8dbc348918
 READ of size 8 at 0x60649f48 thread T8 ([ET_NET 7])
 #0 0xb9f968 in Continuation::handleEvent(int, void*) 
 ../../iocore/eventsystem/I_Continuation.h:145
 #1 0xb9f968 in read_signal_and_update 
 /usr/local/src/trafficserver/iocore/net/UnixNetVConnection.cc:142
 #2 0xb9f968 in UnixNetVConnection::mainEvent(int, Event*) 
 /usr/local/src/trafficserver/iocore/net/UnixNetVConnection.cc:1115
 #3 0xb7daf7 in Continuation::handleEvent(int, void*) 
 ../../iocore/eventsystem/I_Continuation.h:145
 #4 0xb7daf7 in InactivityCop::check_inactivity(int, Event*) 
 /usr/local/src/trafficserver/iocore/net/UnixNet.cc:102
 #5 0xc21ffe in Continuation::handleEvent(int, void*) 
 /usr/local/src/trafficserver/iocore/eventsystem/I_Continuation.h:145
 #6 0xc21ffe in EThread::process_event(Event*, int) 
 /usr/local/src/trafficserver/iocore/eventsystem/UnixEThread.cc:128
 #7 0xc241f7 in EThread::execute() 
 /usr/local/src/trafficserver/iocore/eventsystem/UnixEThread.cc:207
 #8 0xc20c18 in spawn_thread_internal 
 /usr/local/src/trafficserver/iocore/eventsystem/Thread.cc:85
 #9 0x2b8db3ff6df4 in start_thread (/lib64/libpthread.so.0+0x7df4)
 #10 0x2b8db585f1ac in __clone (/lib64/libc.so.6+0xf61ac)
 0x60649f48 is located 8 bytes inside of 56-byte region 
 [0x60649f40,0x60649f78)
 freed by thread T8 ([ET_NET 7]) here:
 #0 0x2b8db1bf3117 in operator delete(void*) 
 ../../.././libsanitizer/asan/asan_new_delete.cc:81
 #1 0xb5b20e in SSLNextProtocolTrampoline::ioCompletionEvent(int, void*) 
 /usr/local/src/trafficserver/iocore/net/SSLNextProtocolAccept.cc:89
 #2 0xbb2eef in Continuation::handleEvent(int, void*) 
 ../../iocore/eventsystem/I_Continuation.h:145
 #3 0xbb2eef in read_signal_and_update 
 /usr/local/src/trafficserver/iocore/net/UnixNetVConnection.cc:142
 #4 0xbb2eef in read_signal_done 
 /usr/local/src/trafficserver/iocore/net/UnixNetVConnection.cc:203
 #5 0xbb2eef in UnixNetVConnection::readSignalDone(int, NetHandler*) 
 /usr/local/src/trafficserver/iocore/net/UnixNetVConnection.cc:957
 #6 0xb55d6d in SSLNetVConnection::net_read_io(NetHandler*, EThread*) 
 /usr/local/src/trafficserver/iocore/net/SSLNetVConnection.cc:480
 #7 0xb748fc in NetHandler::mainNetEvent(int, Event*) 
 /usr/local/src/trafficserver/iocore/net/UnixNet.cc:516
 #8 0xc24e89 in Continuation::handleEvent(int, void*) 
 /usr/local/src/trafficserver/iocore/eventsystem/I_Continuation.h:145
 #9 0xc24e89 in EThread::process_event(Event*, int) 
 /usr/local/src/trafficserver/iocore/eventsystem/UnixEThread.cc:128
 #10 0xc24e89 in EThread::execute() 
 /usr/local/src/trafficserver/iocore/eventsystem/UnixEThread.cc:252
 #11 0xc20c18 in spawn_thread_internal 
 /usr/local/src/trafficserver/iocore/eventsystem/Thread.cc:85
 #12 0x2b8db3ff6df4 in start_thread (/lib64/libpthread.so.0+0x7df4)
 previously allocated by thread T8 ([ET_NET 7]) here:
 #0 0x2b8db1bf2c9f in operator new(unsigned long) 
 ../../.././libsanitizer/asan/asan_new_delete.cc:50
 #1 0xb59f8b in SSLNextProtocolAccept::mainEvent(int, void*) 
 /usr/local/src/trafficserver/iocore/net/SSLNextProtocolAccept.cc:134
 #2 0xb888e9 in Continuation::handleEvent(int, void*) 
 ../../iocore/eventsystem/I_Continuation.h:145
 #3 0xb888e9 in NetAccept::acceptFastEvent(int, void*) 
 /usr/local/src/trafficserver/iocore/net/UnixNetAccept.cc:466
 #4 0xc24e89 in Continuation::handleEvent(int, void*) 
 /usr/local/src/trafficserver/iocore/eventsystem/I_Continuation.h:145
 #5 0xc24e89 in 

[jira] [Commented] (TS-1007) SSN Close called before TXN Close

2015-07-13 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625200#comment-14625200
 ] 

Susan Hinrichs commented on TS-1007:


I think TS-3612 will address the nested session open/session close cases of 
SPDY, H2 and similar future protocols.

 SSN Close called before TXN Close
 -

 Key: TS-1007
 URL: https://issues.apache.org/jira/browse/TS-1007
 Project: Traffic Server
  Issue Type: Bug
  Components: TS API
Affects Versions: 3.0.1
Reporter: Nick Kew
Assignee: Susan Hinrichs
  Labels: incompatible
 Fix For: 6.0.0


 Where a plugin implements both SSN_CLOSE_HOOK and TXN_CLOSE_HOOK, the 
 SSN_CLOSE_HOOK is called first of the two.  This messes up normal cleanups!
 Details:
   Register a SSN_START event globally
   In the SSN START, add a TXN_START and a SSN_CLOSE
   In the TXN START, add a TXN_CLOSE
 Stepping through, I see the order of events actually called, for the simple 
 case of a one-off HTTP request with no keepalive:
 SSN_START
 TXN_START
 SSN_END
 TXN_END
 Whoops, SSN_END cleaned up the SSN context, leaving dangling pointers in the 
 TXN!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-1007) SSN Close called before TXN Close

2015-07-13 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625025#comment-14625025
 ] 

Susan Hinrichs commented on TS-1007:


I've made a fix so that the transaction close occurs before the session close.  
We moved the ua_session->do_io_close() into HttpSM::kill_this().  
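
In outline the change is just an ordering move; a minimal sketch (the 
placeholder do_txn_close_hooks() stands in for the existing TXN_CLOSE 
processing in HttpSM, it is not a real function name):
{code}
void
HttpSM::kill_this()
{
  // Transaction teardown, including the TXN_CLOSE hook callout, runs first.
  do_txn_close_hooks(); // hypothetical placeholder for the existing logic

  // Closing the user-agent session afterwards means SSN_CLOSE is now
  // delivered after TXN_CLOSE instead of before it.
  if (ua_session) {
    ua_session->do_io_close();
    ua_session = NULL;
  }
}
{code}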

We still get nested sessions for the SPDY and H2 cases.  I'll file another bug 
to track that since it isn't the same issue as this one.  

 SSN Close called before TXN Close
 -

 Key: TS-1007
 URL: https://issues.apache.org/jira/browse/TS-1007
 Project: Traffic Server
  Issue Type: Bug
  Components: TS API
Affects Versions: 3.0.1
Reporter: Nick Kew
Assignee: Susan Hinrichs
  Labels: incompatible
 Fix For: 6.0.0


 Where a plugin implements both SSN_CLOSE_HOOK and TXN_CLOSE_HOOK, the 
 SSN_CLOSE_HOOK is called first of the two.  This messes up normal cleanups!
 Details:
   Register a SSN_START event globally
   In the SSN START, add a TXN_START and a SSN_CLOSE
   In the TXN START, add a TXN_CLOSE
 Stepping through, I see the order of events actually called, for the simple 
 case of a one-off HTTP request with no keepalive:
 SSN_START
 TXN_START
 SSN_END
 TXN_END
 Whoops, SSN_END cleaned up the SSN context, leaving dangling pointers in the 
 TXN!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3746) We need to make proxy.config.ssl.client.verify.server overridable

2015-07-13 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625047#comment-14625047
 ] 

Susan Hinrichs commented on TS-3746:


By the time you are taking an already existing session out of the pool, the 
certificate has been verified (or not).  

I guess you could set up remap rules for the same domain that resolve to the 
same origin server domain with conflicting values for the verify setting. So whether 
the origin server certificate is verified depends on which remap rule initiated 
the connection.
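
For example, two hypothetical remap rules to the same origin could disagree 
once the setting is overridable (conf_remap-style overrides; the example.com 
names are made up):
{code}
map https://secure.example.com/ https://origin.example.com/ @plugin=conf_remap.so @pparam=proxy.config.ssl.client.verify.server=1
map https://legacy.example.com/ https://origin.example.com/ @plugin=conf_remap.so @pparam=proxy.config.ssl.client.verify.server=0
{code}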

But if the user is really concerned about only verifying certs for one set of 
domains vs another, I wouldn't think he would write such a conflicting set of 
remap rules.

Agreed, just a list of origins would be more straightforward in some sense, but 
since so much already hangs on the remap rules, that is the obvious 
place for it in the minds of many current ATS deployers.

[~persiaAziz] and [~davet] are testing a version using the override config 
approach. Should have a PR for review soon. 

 We need to make proxy.config.ssl.client.verify.server overridable
 -

 Key: TS-3746
 URL: https://issues.apache.org/jira/browse/TS-3746
 Project: Traffic Server
  Issue Type: New Feature
  Components: Configuration
Reporter: Syeda Persia Aziz
  Labels: Yahoo
 Fix For: sometime


 We need to make proxy.config.ssl.client.verify.server overridable. Some 
 origin servers need validation to avoid MITM attacks while others don't.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-3746) We need to make proxy.config.ssl.client.verify.server overridable

2015-07-13 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-3746:
---
Assignee: Dave Thompson

 We need to make proxy.config.ssl.client.verify.server overridable
 -

 Key: TS-3746
 URL: https://issues.apache.org/jira/browse/TS-3746
 Project: Traffic Server
  Issue Type: New Feature
  Components: Configuration
Reporter: Syeda Persia Aziz
Assignee: Dave Thompson
  Labels: Yahoo
 Fix For: sometime


 We need to make proxy.config.ssl.client.verify.server overridable. Some 
 origin servers need validation to avoid MITM attacks while others don't.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3667) SSL Handshake read does not correctly handle EOF and error cases

2015-07-20 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634109#comment-14634109
 ] 

Susan Hinrichs commented on TS-3667:


[~oknet] how do things fail for you without this patch?  I don't doubt that you 
have a problem, but from your patch it isn't immediately obvious to me what it 
fixes.  Thanks.

 SSL Handshake read does not correctly handle EOF and error cases
 ---

 Key: TS-3667
 URL: https://issues.apache.org/jira/browse/TS-3667
 Project: Traffic Server
  Issue Type: Bug
  Components: SSL
Affects Versions: 5.2.0, 5.3.0
Reporter: Susan Hinrichs
Assignee: Susan Hinrichs
 Fix For: 5.3.1, 6.0.0

 Attachments: ts-3667.diff


 Reported by [~esproul] and postwait.
 The return value of SSLNetVConnection::read_raw_data() is being ignored.  So 
 EOF and error cases are not terminated; instead they spin until the inactivity 
 timeout is reached, and on EAGAIN the connection is not descheduled to wait 
 until more data is available.
 This results in higher CPU utilization and hitting the SSL_error() function 
 far more often than necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (TS-3746) We need to make proxy.config.ssl.client.verify.server overridable

2015-07-20 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs closed TS-3746.
--
   Resolution: Won't Fix
Fix Version/s: (was: sometime)

See discussion on PR.  We will not pursue this further here.

 We need to make proxy.config.ssl.client.verify.server overridable
 -

 Key: TS-3746
 URL: https://issues.apache.org/jira/browse/TS-3746
 Project: Traffic Server
  Issue Type: New Feature
  Components: Configuration
Reporter: Syeda Persia Aziz
Assignee: Dave Thompson
  Labels: Yahoo

 We need to make proxy.config.ssl.client.verify.server overridable. Some 
 origin servers need validation to avoid MITM attacks while others don't.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TS-3596) TSHttpTxnPluginTagGet() returns fetchSM over H2

2015-07-14 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-3596.

Resolution: Fixed

Fixed via TS-3476

 TSHttpTxnPluginTagGet() returns fetchSM over H2
 -

 Key: TS-3596
 URL: https://issues.apache.org/jira/browse/TS-3596
 Project: Traffic Server
  Issue Type: Bug
  Components: HTTP/2
Reporter: Scott Beardsley
  Labels: yahoo
 Fix For: 6.1.0


 This should probably return something else, right? Maybe HTTP2 instead? We 
 would like a way to identify H2 requests from SPDY and/or H1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3777) TSHttpConnect and POST request does not fire TS_VCONN_READ_COMPLETE nor TS_VCONN_EOS

2015-08-24 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14710072#comment-14710072
 ] 

Susan Hinrichs commented on TS-3777:


Good point.  Rearranged code to do so.

 TSHttpConnect and POST request does not fire TS_VCONN_READ_COMPLETE nor 
 TS_VCONN_EOS
 

 Key: TS-3777
 URL: https://issues.apache.org/jira/browse/TS-3777
 Project: Traffic Server
  Issue Type: Bug
  Components: TS API
Reporter: Daniel Vitor Morilha
Assignee: Susan Hinrichs
  Labels: yahoo
 Fix For: 6.1.0

 Attachments: ts-3777-2.diff, ts-3777-3.diff, ts-3777-4.diff, 
 ts-3777.diff


 When using TSHttpConnect to connect to ATS itself (internal vconnection), 
 sending a POST request and receiving a chunked response, ATS fires 
 neither TS_VCONN_READ_COMPLETE nor TS_VCONN_EOS.
 Trying to close the vconnection from the plug-in after receiving the last 
 chunk (\r\n0\r\n) results into the PluginVC repeating the following message:
 {noformat}
 [Jul 14 21:24:06.094] Server {0x77fbe800} DEBUG: (pvc_event) [0] Passive: 
 Received event 1
 {noformat}
 I am glad to provide an example if that helps.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (TS-3970) Core in PluginVC

2015-11-11 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs closed TS-3970.
--
Resolution: Invalid

Turns out the problem was in the plugin.  Its cleanup code called 
TSIOBufferDestroy() before TSIOBufferReaderFree() on a reader for that buffer.  

This use-after-free error was found quickly with ASAN.  With this error, it was 
possible for the buffer to be reallocated and for readers on the newly 
reallocated buffer to be effectively cleared at random.
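
For anyone hitting the same symptom from a plugin, the safe teardown order is 
readers first, then the buffer. A minimal sketch using the public C API:
{code}
#include <ts/ts.h>

static void
destroy_buffer(TSIOBuffer buf, TSIOBufferReader reader)
{
  // Free the reader before destroying its buffer. Destroying the buffer first
  // leaves a stale reader; once the buffer memory is reallocated, using or
  // freeing that reader corrupts whatever now lives there.
  TSIOBufferReaderFree(reader);
  TSIOBufferDestroy(buf);
}
{code}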

> Core in PluginVC
> 
>
> Key: TS-3970
> URL: https://issues.apache.org/jira/browse/TS-3970
> Project: Traffic Server
>  Issue Type: Bug
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
>  Labels: crash
> Fix For: 6.1.0
>
> Attachments: ts-3970.diff
>
>
> One of our plugins moving from 5.0.1 to 5.3.x (plus 6.0 backports) started 
> seeing the following stack trace with high frequency.
> {code}
> Program terminated with signal 11, Segmentation fault.
> #0 0x0054c232 in PluginVC::process_read_side (this=0x2b9ace2f3850, 
> other_side_call=true) at PluginVC.cc:638
> in PluginVC.cc
> #0 0x0054c232 in PluginVC::process_read_side (this=0x2b9ace2f3850, 
> other_side_call=true) at PluginVC.cc:638
> #1 0x0054be2a in PluginVC::process_write_side (this=0x2b9ace2f3a40, 
> other_side_call=false) at PluginVC.cc:555
> #2 0x0054acdb in PluginVC::main_handler (this=0x2b9ace2f3a40, 
> event=1, data=0x2b9b1e32e930) at PluginVC.cc:208
> #3 0x00510c84 in Continuation::handleEvent (this=0x2b9ace2f3a40, 
> event=1, data=0x2b9b1e32e930) at ../iocore/eventsystem/I_Continuation.h:145
> #4 0x0079a2a6 in EThread::process_event (this=0x2b9ab63d9010, 
> e=0x2b9b1e32e930, calling_code=1) at UnixEThread.cc:128
> #5 0x0079a474 in EThread::execute (this=0x2b9ab63d9010) at 
> UnixEThread.cc:179
> #6 0x00799851 in spawn_thread_internal (a=0x2fea360) at Thread.cc:85
> #7 0x2b9ab457e9d1 in start_thread () from /lib64/libpthread.so.0
> #8 0x0030d38e88fd in clone () from /lib64/libc.so.6
> {code}
> The output buffer fetched by PluginVC::process_read_side was NULL.
> I think the reason this appears in 5.3 is due to the fix for TS-3522.  
> Before that change only one do_io_read was made very early to set up the read 
> from server.  This bug fix delays the real read to later and pulls mbuf out 
> of server_buffer_reader. In some cases for this plugin, the mbuf is NULL by 
> the time we get there.
> I fixed the core by using server_session->read_buffer in the do_io_read 
> instead of server_buffer_reader->mbuf.  This seems to fix the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (TS-3970) Core in PluginVC

2015-10-15 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs reassigned TS-3970:
--

Assignee: Susan Hinrichs

> Core in PluginVC
> 
>
> Key: TS-3970
> URL: https://issues.apache.org/jira/browse/TS-3970
> Project: Traffic Server
>  Issue Type: Bug
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
>
> One of our plugins moving from 5.0.1 to 5.3.x (plus 6.0 backports) started 
> seeing the following stack trace with high frequency.
> {code}
> Program terminated with signal 11, Segmentation fault.
> #0 0x0054c232 in PluginVC::process_read_side (this=0x2b9ace2f3850, 
> other_side_call=true) at PluginVC.cc:638
> in PluginVC.cc
> #0 0x0054c232 in PluginVC::process_read_side (this=0x2b9ace2f3850, 
> other_side_call=true) at PluginVC.cc:638
> #1 0x0054be2a in PluginVC::process_write_side (this=0x2b9ace2f3a40, 
> other_side_call=false) at PluginVC.cc:555
> #2 0x0054acdb in PluginVC::main_handler (this=0x2b9ace2f3a40, 
> event=1, data=0x2b9b1e32e930) at PluginVC.cc:208
> #3 0x00510c84 in Continuation::handleEvent (this=0x2b9ace2f3a40, 
> event=1, data=0x2b9b1e32e930) at ../iocore/eventsystem/I_Continuation.h:145
> #4 0x0079a2a6 in EThread::process_event (this=0x2b9ab63d9010, 
> e=0x2b9b1e32e930, calling_code=1) at UnixEThread.cc:128
> #5 0x0079a474 in EThread::execute (this=0x2b9ab63d9010) at 
> UnixEThread.cc:179
> #6 0x00799851 in spawn_thread_internal (a=0x2fea360) at Thread.cc:85
> #7 0x2b9ab457e9d1 in start_thread () from /lib64/libpthread.so.0
> #8 0x0030d38e88fd in clone () from /lib64/libc.so.6
> {code}
> The output buffer fetched by PluginVC::process_read_side was NULL.
> I think the reason this appears in 5.3 is due to the fix for TS-3522.  
> Before that change only one do_io_read was made very early to set up the read 
> from server.  This bug fix delays the real read to later and pulls mbuf out 
> of server_buffer_reader. In some cases for this plugin, the mbuf is NULL by 
> the time we get there.
> I fixed the core by using server_session->read_buffer in the do_io_read 
> instead of server_buffer_reader->mbuf.  This seems to fix the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-3970) Core in PluginVC

2015-10-15 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-3970:
---
Attachment: ts-3970.diff

ts-3970.diff contains the code changes that fixed this crash on our build.

> Core in PluginVC
> 
>
> Key: TS-3970
> URL: https://issues.apache.org/jira/browse/TS-3970
> Project: Traffic Server
>  Issue Type: Bug
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
> Attachments: ts-3970.diff
>
>
> One of our plugins moving from 5.0.1 to 5.3.x (plus 6.0 backports) started 
> seeing the following stack trace with high frequency.
> {code}
> Program terminated with signal 11, Segmentation fault.
> #0 0x0054c232 in PluginVC::process_read_side (this=0x2b9ace2f3850, 
> other_side_call=true) at PluginVC.cc:638
> in PluginVC.cc
> #0 0x0054c232 in PluginVC::process_read_side (this=0x2b9ace2f3850, 
> other_side_call=true) at PluginVC.cc:638
> #1 0x0054be2a in PluginVC::process_write_side (this=0x2b9ace2f3a40, 
> other_side_call=false) at PluginVC.cc:555
> #2 0x0054acdb in PluginVC::main_handler (this=0x2b9ace2f3a40, 
> event=1, data=0x2b9b1e32e930) at PluginVC.cc:208
> #3 0x00510c84 in Continuation::handleEvent (this=0x2b9ace2f3a40, 
> event=1, data=0x2b9b1e32e930) at ../iocore/eventsystem/I_Continuation.h:145
> #4 0x0079a2a6 in EThread::process_event (this=0x2b9ab63d9010, 
> e=0x2b9b1e32e930, calling_code=1) at UnixEThread.cc:128
> #5 0x0079a474 in EThread::execute (this=0x2b9ab63d9010) at 
> UnixEThread.cc:179
> #6 0x00799851 in spawn_thread_internal (a=0x2fea360) at Thread.cc:85
> #7 0x2b9ab457e9d1 in start_thread () from /lib64/libpthread.so.0
> #8 0x0030d38e88fd in clone () from /lib64/libc.so.6
> {code}
> The output buffer fetched by PluginVC::process_read_side was NULL.
> I think the reason this appears in 5.3 is due to the fix for TS-3522.  
> Before that change only one do_io_read was made very early to set up the read 
> from server.  This bug fix delays the real read to later and pulls mbuf out 
> of server_buffer_reader. In some cases for this plugin, the mbuf is NULL by 
> the time we get there.
> I fixed the core by using server_session->read_buffer in the do_io_read 
> instead of server_buffer_reader->mbuf.  This seems to fix the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TS-3970) Core in PluginVC

2015-10-15 Thread Susan Hinrichs (JIRA)
Susan Hinrichs created TS-3970:
--

 Summary: Core in PluginVC
 Key: TS-3970
 URL: https://issues.apache.org/jira/browse/TS-3970
 Project: Traffic Server
  Issue Type: Bug
Reporter: Susan Hinrichs


One of our plugins moving from 5.0.1 to 5.3.x (plus 6.0 backports) started 
seeing the following stack trace with high frequency.

{code}
Program terminated with signal 11, Segmentation fault.
#0 0x0054c232 in PluginVC::process_read_side (this=0x2b9ace2f3850, 
other_side_call=true) at PluginVC.cc:638
in PluginVC.cc
#0 0x0054c232 in PluginVC::process_read_side (this=0x2b9ace2f3850, 
other_side_call=true) at PluginVC.cc:638
#1 0x0054be2a in PluginVC::process_write_side (this=0x2b9ace2f3a40, 
other_side_call=false) at PluginVC.cc:555
#2 0x0054acdb in PluginVC::main_handler (this=0x2b9ace2f3a40, event=1, 
data=0x2b9b1e32e930) at PluginVC.cc:208
#3 0x00510c84 in Continuation::handleEvent (this=0x2b9ace2f3a40, 
event=1, data=0x2b9b1e32e930) at ../iocore/eventsystem/I_Continuation.h:145
#4 0x0079a2a6 in EThread::process_event (this=0x2b9ab63d9010, 
e=0x2b9b1e32e930, calling_code=1) at UnixEThread.cc:128
#5 0x0079a474 in EThread::execute (this=0x2b9ab63d9010) at 
UnixEThread.cc:179
#6 0x00799851 in spawn_thread_internal (a=0x2fea360) at Thread.cc:85
#7 0x2b9ab457e9d1 in start_thread () from /lib64/libpthread.so.0
#8 0x0030d38e88fd in clone () from /lib64/libc.so.6
{code}

The output buffer fetched by PluginVC::process_read_side was NULL.

I think the reason this appears in 5.3 is due to the fix for TS-3522.  Before 
that change only one do_io_read was made very early to set up the read from 
server.  This bug fix delays the real read to later and pulls mbuf out of 
server_buffer_reader. In some cases for this plugin, the mbuf is NULL by the 
time we get there.

I fixed the core by using server_session->read_buffer in the do_io_read instead 
of server_buffer_reader->mbuf.  This seems to fix the problem.
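
In outline the fix looks like this sketch (variable names approximate the 
HttpSM call site rather than quote it):
{code}
// Before: the MIOBuffer was pulled back out of the reader; by this point the
// plugin could have torn the buffer down, leaving mbuf NULL.
//   server_entry->read_vio = server_session->do_io_read(this, to_read, server_buffer_reader->mbuf);

// After: hand do_io_read the session's own read buffer directly.
server_entry->read_vio = server_session->do_io_read(this, to_read, server_session->read_buffer);
{code}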



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3072) Debug logging for a single connection in production traffic.

2015-10-08 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948975#comment-14948975
 ] 

Susan Hinrichs commented on TS-3072:


I ran some tests on my test harness machines.  It is configured in proxy mode 
(no caching), making GET requests.  1KB objects are exchanged, three requests 
per connection, no SSL, single 1Gb interface.  I ran my clients just short of 
resource exhaustion to measure steady-state performance. 

Run cases:
* Base: A build without client_ip debug code.  diags.debug.enabled = 0
* New0: A build with client_ip debug code.  diags.debug.enabled = 0
* New2-mismatch: A build with client_ip debug code.  diags.debug.enabled = 2 
and diags.debug.client_ip = an IP address not involved in the test (see the sketch below).
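
The New* builds were exercised with settings along these lines (names follow 
the attached ts-3072.diff; treat the exact records as illustrative):
{code}
CONFIG proxy.config.diags.debug.enabled INT 2
CONFIG proxy.config.diags.debug.tags STRING http
CONFIG proxy.config.diags.debug.client_ip STRING 192.0.2.17
{code}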

I also tried running with an IP matching one of my test clients and the http tag 
set; the client fell over.

Base and New0 had very similar performance.  About 56,600 rps.
New2-mismatch was around 53,900 rps.

So enabling the client_ip check incurs around a 5% performance penalty even if 
nothing matches.  But this is only seen if the configuration is explicitly 
set from 0 to 2.  In the depths of an investigation, this might be an 
acceptable performance penalty.



> Debug logging for a single connection in production traffic.
> 
>
> Key: TS-3072
> URL: https://issues.apache.org/jira/browse/TS-3072
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Core, Logging
>Affects Versions: 5.0.1
>Reporter: Sudheer Vinukonda
>Assignee: Susan Hinrichs
>  Labels: Yahoo
> Fix For: sometime
>
> Attachments: ts-3072.diff
>
>
> Presently, when there's a production issue (e.g. TS-3049, TS-2983 etc), it is 
> really hard to isolate/debug with the high traffic. Turning on debug logs in 
> traffic is unfortunately not an option due to performance impacts. Even if 
> you took a performance hit and turned on the logs, it is just as hard to 
> separate out the logs for a single connection/transaction among the millions 
> of the logs output in a short period of time.
> I think it would be good if there's a way to turn on debug logs in a 
> controlled manner in production environment. One simple option is to support 
> a config setting for example, with a client-ip, which when set, would turn on 
> debug logs for any connection made by just that one client. If needed, 
> instead of one client-ip, we may allow configuring up to 'n' (say, 5) 
> client-ips. 
> If there are other ideas, please comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (TS-3072) Debug logging for a single connection in production traffic.

2015-10-08 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs reassigned TS-3072:
--

Assignee: Susan Hinrichs

> Debug logging for a single connection in production traffic.
> 
>
> Key: TS-3072
> URL: https://issues.apache.org/jira/browse/TS-3072
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Core, Logging
>Affects Versions: 5.0.1
>Reporter: Sudheer Vinukonda
>Assignee: Susan Hinrichs
>  Labels: Yahoo
> Fix For: sometime
>
> Attachments: ts-3072.diff
>
>
> Presently, when there's a production issue (e.g. TS-3049, TS-2983 etc), it is 
> really hard to isolate/debug with the high traffic. Turning on debug logs in 
> traffic is unfortunately not an option due to performance impacts. Even if 
> you took a performance hit and turned on the logs, it is just as hard to 
> separate out the logs for a single connection/transaction among the millions 
> of the logs output in a short period of time.
> I think it would be good if there's a way to turn on debug logs in a 
> controlled manner in production environment. One simple option is to support 
> a config setting for example, with a client-ip, which when set, would turn on 
> debug logs for any connection made by just that one client. If needed, 
> instead of one client-ip, we may allow configuring up to 'n' (say, 5) 
> client-ips. 
> If there are other ideas, please comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3072) Debug logging for a single connection in production traffic.

2015-10-08 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949174#comment-14949174
 ] 

Susan Hinrichs commented on TS-3072:


Poking at my test some more, the numbers are a bit higher.  But based on my 
math, the single 1Gbps connection puts a hard limit of 65536 rps when running 
without caching and having each request exchange 1KB.

Max bytes sent in a second = 1024*1024*1024/8 > number of bytes sent in T 
transactions = 1024*T*2 

1024*1024/16 > T
65536 > T


> Debug logging for a single connection in production traffic.
> 
>
> Key: TS-3072
> URL: https://issues.apache.org/jira/browse/TS-3072
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Core, Logging
>Affects Versions: 5.0.1
>Reporter: Sudheer Vinukonda
>Assignee: Susan Hinrichs
>  Labels: Yahoo
> Fix For: sometime
>
> Attachments: ts-3072.diff
>
>
> Presently, when there's a production issue (e.g. TS-3049, TS-2983 etc), it is 
> really hard to isolate/debug with the high traffic. Turning on debug logs in 
> traffic is unfortunately not an option due to performance impacts. Even if 
> you took a performance hit and turned on the logs, it is just as hard to 
> separate out the logs for a single connection/transaction among the millions 
> of the logs output in a short period of time.
> I think it would be good if there's a way to turn on debug logs in a 
> controlled manner in production environment. One simple option is to support 
> a config setting for example, with a client-ip, which when set, would turn on 
> debug logs for any connection made by just that one client. If needed, 
> instead of one client-ip, we may allow configuring up to 'n' (say, 5) 
> client-ips. 
> If there are other ideas, please comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3894) Missing NULL checks in HttpSM::handle_server_setup_error

2015-10-07 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14947457#comment-14947457
 ] 

Susan Hinrichs commented on TS-3894:


My day of typos writing up git commit comments.  This is the commit that 
belongs to this issue.

commit b3fab36196dc143283364b56b0db802e4dd81bad
Author: shinrich 
Date:   Tue Oct 6 14:00:44 2015 -0500

TS-3984 - Missing NULL checks in HttpSM::handler_server_setup_error.


> Missing NULL checks in HttpSM::handle_server_setup_error
> 
>
> Key: TS-3894
> URL: https://issues.apache.org/jira/browse/TS-3894
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
> Fix For: 6.1.0
>
>
> In error cases, there may not be a consumer when expected.  Missing NULL 
> checks on the consumer variable c can result in crashes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TS-3894) Missing NULL checks in HttpSM::handle_server_setup_error

2015-10-07 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-3894.

Resolution: Fixed

> Missing NULL checks in HttpSM::handle_server_setup_error
> 
>
> Key: TS-3894
> URL: https://issues.apache.org/jira/browse/TS-3894
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
> Fix For: 6.1.0
>
>
> In error cases, there may not be a consumer when expected.  Missing NULL 
> checks on the consumer variable c can result in crashes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (TS-3710) Crash in TLS with 6.0.0, related to the session cleanup additions

2015-10-07 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs closed TS-3710.
--
Resolution: Fixed

> Crash in TLS with 6.0.0, related to the session cleanup additions
> -
>
> Key: TS-3710
> URL: https://issues.apache.org/jira/browse/TS-3710
> Project: Traffic Server
>  Issue Type: Bug
>  Components: SSL
>Affects Versions: 5.3.0
>Reporter: Leif Hedstrom
>Assignee: Susan Hinrichs
>Priority: Critical
>  Labels: yahoo
> Fix For: 6.1.0
>
> Attachments: ts-3710-2.diff, ts-3710-8-26-15.diff, 
> ts-3710-final-2.diff, ts-3710.diff
>
>
> {code}
> ==9570==ERROR: AddressSanitizer: heap-use-after-free on address 
> 0x60649f48 at pc 0xb9f969 bp 0x2b8dbc348920 sp 0x2b8dbc348918
> READ of size 8 at 0x60649f48 thread T8 ([ET_NET 7])
> #0 0xb9f968 in Continuation::handleEvent(int, void*) 
> ../../iocore/eventsystem/I_Continuation.h:145
> #1 0xb9f968 in read_signal_and_update 
> /usr/local/src/trafficserver/iocore/net/UnixNetVConnection.cc:142
> #2 0xb9f968 in UnixNetVConnection::mainEvent(int, Event*) 
> /usr/local/src/trafficserver/iocore/net/UnixNetVConnection.cc:1115
> #3 0xb7daf7 in Continuation::handleEvent(int, void*) 
> ../../iocore/eventsystem/I_Continuation.h:145
> #4 0xb7daf7 in InactivityCop::check_inactivity(int, Event*) 
> /usr/local/src/trafficserver/iocore/net/UnixNet.cc:102
> #5 0xc21ffe in Continuation::handleEvent(int, void*) 
> /usr/local/src/trafficserver/iocore/eventsystem/I_Continuation.h:145
> #6 0xc21ffe in EThread::process_event(Event*, int) 
> /usr/local/src/trafficserver/iocore/eventsystem/UnixEThread.cc:128
> #7 0xc241f7 in EThread::execute() 
> /usr/local/src/trafficserver/iocore/eventsystem/UnixEThread.cc:207
> #8 0xc20c18 in spawn_thread_internal 
> /usr/local/src/trafficserver/iocore/eventsystem/Thread.cc:85
> #9 0x2b8db3ff6df4 in start_thread (/lib64/libpthread.so.0+0x7df4)
> #10 0x2b8db585f1ac in __clone (/lib64/libc.so.6+0xf61ac)
> 0x60649f48 is located 8 bytes inside of 56-byte region 
> [0x60649f40,0x60649f78)
> freed by thread T8 ([ET_NET 7]) here:
> #0 0x2b8db1bf3117 in operator delete(void*) 
> ../../.././libsanitizer/asan/asan_new_delete.cc:81
> #1 0xb5b20e in SSLNextProtocolTrampoline::ioCompletionEvent(int, void*) 
> /usr/local/src/trafficserver/iocore/net/SSLNextProtocolAccept.cc:89
> #2 0xbb2eef in Continuation::handleEvent(int, void*) 
> ../../iocore/eventsystem/I_Continuation.h:145
> #3 0xbb2eef in read_signal_and_update 
> /usr/local/src/trafficserver/iocore/net/UnixNetVConnection.cc:142
> #4 0xbb2eef in read_signal_done 
> /usr/local/src/trafficserver/iocore/net/UnixNetVConnection.cc:203
> #5 0xbb2eef in UnixNetVConnection::readSignalDone(int, NetHandler*) 
> /usr/local/src/trafficserver/iocore/net/UnixNetVConnection.cc:957
> #6 0xb55d6d in SSLNetVConnection::net_read_io(NetHandler*, EThread*) 
> /usr/local/src/trafficserver/iocore/net/SSLNetVConnection.cc:480
> #7 0xb748fc in NetHandler::mainNetEvent(int, Event*) 
> /usr/local/src/trafficserver/iocore/net/UnixNet.cc:516
> #8 0xc24e89 in Continuation::handleEvent(int, void*) 
> /usr/local/src/trafficserver/iocore/eventsystem/I_Continuation.h:145
> #9 0xc24e89 in EThread::process_event(Event*, int) 
> /usr/local/src/trafficserver/iocore/eventsystem/UnixEThread.cc:128
> #10 0xc24e89 in EThread::execute() 
> /usr/local/src/trafficserver/iocore/eventsystem/UnixEThread.cc:252
> #11 0xc20c18 in spawn_thread_internal 
> /usr/local/src/trafficserver/iocore/eventsystem/Thread.cc:85
> #12 0x2b8db3ff6df4 in start_thread (/lib64/libpthread.so.0+0x7df4)
> previously allocated by thread T8 ([ET_NET 7]) here:
> #0 0x2b8db1bf2c9f in operator new(unsigned long) 
> ../../.././libsanitizer/asan/asan_new_delete.cc:50
> #1 0xb59f8b in SSLNextProtocolAccept::mainEvent(int, void*) 
> /usr/local/src/trafficserver/iocore/net/SSLNextProtocolAccept.cc:134
> #2 0xb888e9 in Continuation::handleEvent(int, void*) 
> ../../iocore/eventsystem/I_Continuation.h:145
> #3 0xb888e9 in NetAccept::acceptFastEvent(int, void*) 
> /usr/local/src/trafficserver/iocore/net/UnixNetAccept.cc:466
> #4 0xc24e89 in Continuation::handleEvent(int, void*) 
> /usr/local/src/trafficserver/iocore/eventsystem/I_Continuation.h:145
> #5 0xc24e89 in EThread::process_event(Event*, int) 
> /usr/local/src/trafficserver/iocore/eventsystem/UnixEThread.cc:128
> #6 0xc24e89 in EThread::execute() 
> /usr/local/src/trafficserver/iocore/eventsystem/UnixEThread.cc:252
> #7 0xc20c18 in spawn_thread_internal 
> /usr/local/src/trafficserver/iocore/eventsystem/Thread.cc:85
> #8 0x2b8db3ff6df4 in start_thread (/lib64/libpthread.so.0+0x7df4)
> Thread 

[jira] [Commented] (TS-3894) Missing NULL checks in HttpSM::handle_server_setup_error

2015-10-06 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945581#comment-14945581
 ] 

Susan Hinrichs commented on TS-3894:


We have been running with this change in production since 9/4/2015 and have 
not seen this crash since.

> Missing NULL checks in HttpSM::handle_server_setup_error
> 
>
> Key: TS-3894
> URL: https://issues.apache.org/jira/browse/TS-3894
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
> Fix For: 6.1.0
>
>
> In error cases, there may not be a consumer when expected.  Missing NULL 
> checks on the consumer variable c can result in crashes.
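For illustration, a minimal sketch of the kind of guard described above, using 
stand-in types rather than the real HttpSM/HttpTunnel declarations; the 
consumer name c follows the issue text, everything else here is assumed.

{code}
#include <cstdio>

// Stand-in for the tunnel consumer; the real type lives in the HTTP tunnel code.
struct TunnelConsumer {
  bool alive = true;
};

// Stand-in error handler showing the guard: in error paths the consumer lookup
// can come back empty, so dereference c only after a NULL check.
void handle_server_setup_error(TunnelConsumer *c)
{
  if (c == nullptr) {            // the check that was missing
    std::puts("no consumer for this producer; skip consumer teardown");
    return;
  }
  c->alive = false;              // safe to touch the consumer now
}

int main()
{
  handle_server_setup_error(nullptr);   // exercises the guarded error path
}
{code}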



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TS-3957) Core dump from SpdyClientSession::state_session_start

2015-10-06 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-3957.

Resolution: Fixed

> Core dump from SpdyClientSession::state_session_start
> -
>
> Key: TS-3957
> URL: https://issues.apache.org/jira/browse/TS-3957
> Project: Traffic Server
>  Issue Type: Bug
>  Components: SPDY
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
>  Labels: yahoo
> Fix For: 6.1.0
>
>
> We see this in production on machines under swap, so the timings are very 
> distorted.
> {code}
> gdb) bt
> #0  0x in ?? ()
> #1  0x0064a5dc in SpdyClientSession::state_session_start 
> (this=0x2b234fbe8030)
> at SpdyClientSession.cc:211
> #2  0x00510e34 in Continuation::handleEvent (this=0x2b234fbe8030, 
> event=1, 
> data=0x2b23eda76630) at ../iocore/eventsystem/I_Continuation.h:145
> #3  0x0079a066 in EThread::process_event (this=0x2b21170a2010, 
> e=0x2b23eda76630, 
> calling_code=1) at UnixEThread.cc:128
> #4  0x0079a234 in EThread::execute (this=0x2b21170a2010) at 
> UnixEThread.cc:179
> #5  0x00799611 in spawn_thread_internal (a=0x12226a0) at Thread.cc:85
> #6  0x2b21153e19d1 in start_thread () from /lib64/libpthread.so.0
> #7  0x003827ee88fd in clone () from /lib64/libc.so.6
> {code}
> After poking around on the core some more [~amc] and I determined that the vc 
> referenced by the SpdyClientSession was a freed object (the vtable pointer 
> was swizzled out to be the freelist next pointer).
> We assume that the swapping is causing very odd event timing.  We replaced 
> the schedule_immediate with a direct call, which seemed to solve our crash 
> in production.
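A minimal sketch of the kind of change described above, replacing a scheduled 
immediate event with a direct call.  The types below are stand-ins; the real 
code involves EThread::schedule_imm() and SpdyClientSession::state_session_start(), 
which are only mimicked here.

{code}
#include <cstdio>
#include <functional>
#include <vector>

// Stand-in event queue; the real code schedules onto an EThread.
std::vector<std::function<void()>> event_queue;

struct Session {
  bool vc_valid = true;   // in the crash, the underlying vc was already freed

  void state_session_start() { std::printf("start, vc_valid=%d\n", (int)vc_valid); }

  void new_connection_scheduled() {
    // Old behavior: queue an immediate event; under heavy swapping, the vc
    // referenced by the session may be gone by the time this runs.
    event_queue.push_back([this] { state_session_start(); });
  }

  void new_connection_direct() {
    // New behavior per the comment: call the start handler directly while
    // the vc is known to be alive, avoiding the scheduling window entirely.
    state_session_start();
  }
};

int main() {
  Session s;
  s.new_connection_direct();          // no scheduling window
  s.new_connection_scheduled();
  for (auto &e : event_queue) e();    // later dispatch, possibly much later
}
{code}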



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-3957) Core dump from SpdyClientSession::state_session_start

2015-10-06 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-3957:
---
Fix Version/s: 6.1.0

> Core dump from SpdyClientSession::state_session_start
> -
>
> Key: TS-3957
> URL: https://issues.apache.org/jira/browse/TS-3957
> Project: Traffic Server
>  Issue Type: Bug
>  Components: SPDY
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
>  Labels: yahoo
> Fix For: 6.1.0
>
>
> We see this in production on machines under swap, so the timings are very 
> distorted.
> {code}
> gdb) bt
> #0  0x in ?? ()
> #1  0x0064a5dc in SpdyClientSession::state_session_start 
> (this=0x2b234fbe8030)
> at SpdyClientSession.cc:211
> #2  0x00510e34 in Continuation::handleEvent (this=0x2b234fbe8030, 
> event=1, 
> data=0x2b23eda76630) at ../iocore/eventsystem/I_Continuation.h:145
> #3  0x0079a066 in EThread::process_event (this=0x2b21170a2010, 
> e=0x2b23eda76630, 
> calling_code=1) at UnixEThread.cc:128
> #4  0x0079a234 in EThread::execute (this=0x2b21170a2010) at 
> UnixEThread.cc:179
> #5  0x00799611 in spawn_thread_internal (a=0x12226a0) at Thread.cc:85
> #6  0x2b21153e19d1 in start_thread () from /lib64/libpthread.so.0
> #7  0x003827ee88fd in clone () from /lib64/libc.so.6
> {code}
> After poking around on the core some more [~amc] and I determined that the vc 
> referenced by the SpdyClientSession was a freed object (the vtable pointer 
> was swizzled out to be the freelist next pointer).
> We assume that the swapping is causing very odd event timing.  We replaced 
> the schedule_immediate with a direct call, which seemed to solve our crash 
> in production.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TS-3901) Leaking connections from HttpSessionManager

2015-10-06 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-3901.

Resolution: Fixed

> Leaking connections from HttpSessionManager
> ---
>
> Key: TS-3901
> URL: https://issues.apache.org/jira/browse/TS-3901
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
>  Labels: yahoo
> Fix For: 6.1.0
>
> Attachments: ts-3901.diff
>
>
> Observed in production.  Got the following warnings in diags.log
> "Connection leak from http keep-alive system"
> Our connections to origin would increase and the number of connections in 
> CLOSE_WAIT were enormous.
> I think the issue was when the origin URL was http with default port.  That 
> URL was remapped to https with default port.  The default port stored in 
> HttpServerSession->server_ip was not updated.  
> When the connection was closed or timed out of the session pool, it would be 
> looked up with port 443.   But the session was stored via the server_ip value 
> with port 80 and would never match.
> Relatively small change in HTTPHdr::_file_target_cache. 
> Running the fix in production to verify early results.
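A minimal sketch of the mismatch described above, with a std::map standing in 
for the keep-alive session pool; the IP address and the "refresh the port" fix 
shown are purely illustrative.

{code}
#include <cstdio>
#include <map>
#include <string>
#include <utility>

// Stand-in for the session pool, keyed by (ip, port) the way server_ip is used.
using Key = std::pair<std::string, int>;
std::map<Key, std::string> session_pool;

int main() {
  // Client asked for http://origin/ (default port 80); remap rewrote the
  // scheme to https, but the stored key kept port 80.
  session_pool[{"192.0.2.1", 80}] = "kept-alive origin session";

  // At close/timeout the pool is searched with the remapped default, 443,
  // so the stored entry is never found and the connection leaks.
  bool found = session_pool.count({"192.0.2.1", 443}) != 0;
  std::printf("lookup with port 443: %s\n", found ? "hit" : "miss (leak)");

  // Sketch of the fix: keep the cached target's port consistent with the
  // remapped scheme before the session is stored.
  session_pool.clear();
  session_pool[{"192.0.2.1", 443}] = "kept-alive origin session";
  found = session_pool.count({"192.0.2.1", 443}) != 0;
  std::printf("after refreshing the port: %s\n", found ? "hit" : "miss");
}
{code}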



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (TS-3894) Missing NULL checks in HttpSM::handle_server_setup_error

2015-10-06 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945581#comment-14945581
 ] 

Susan Hinrichs edited comment on TS-3894 at 10/6/15 7:06 PM:
-

We have been running with this change in production since 9/4/2015 and have 
not seen this crash since.

The original crash stack was:

{code}
gdb) bt
#0  0x005f5a45 in HttpSM::handle_server_setup_error 
(this=0x2bada4297f70, event=105, 
data=0x2bad410af588) at HttpSM.cc:5278
#1  0x005e98f9 in HttpSM::state_read_server_response_header 
(this=0x2bada4297f70, 
event=105, data=0x2bad410af588) at HttpSM.cc:1824
#2  0x005ec306 in HttpSM::main_handler (this=0x2bada4297f70, event=105, 
data=0x2bad410af588) at HttpSM.cc:2619
#3  0x00510de4 in Continuation::handleEvent (this=0x2bada4297f70, 
event=105, 
data=0x2bad410af588) at ../iocore/eventsystem/I_Continuation.h:145
#4  0x00778965 in read_signal_and_update (event=105, vc=0x2bad410af470)
at UnixNetVConnection.cc:148
#5  0x0077bfdb in UnixNetVConnection::mainEvent (this=0x2bad410af470, 
event=1, 
e=0x17c5c90) at UnixNetVConnection.cc:1171
#6  0x00510de4 in Continuation::handleEvent (this=0x2bad410af470, 
event=1, data=0x17c5c90)
at ../iocore/eventsystem/I_Continuation.h:145
#7  0x00772d47 in InactivityCop::check_inactivity (this=0x169b440, 
event=2, e=0x17c5c90)
at UnixNet.cc:107
#8  0x00510de4 in Continuation::handleEvent (this=0x169b440, event=2, 
data=0x17c5c90)
at ../iocore/eventsystem/I_Continuation.h:145
#9  0x007997ee in EThread::process_event (this=0x2baa860c4010, 
e=0x17c5c90, 
calling_code=2) at UnixEThread.cc:128
#10 0x00799b09 in EThread::execute (this=0x2baa860c4010) at 
UnixEThread.cc:207
#11 0x00798d99 in spawn_thread_internal (a=0x1691510) at Thread.cc:85
#12 0x2baa8491c9d1 in start_thread () from /lib64/libpthread.so.0
#13 0x0039522e88fd in clone () from /lib64/libc.so.6
{code}



was (Author: shinrich):
We have been running with this change in production starting 9/4/2015.  Have 
not seen this crash since.

> Missing NULL checks in HttpSM::handle_server_setup_error
> 
>
> Key: TS-3894
> URL: https://issues.apache.org/jira/browse/TS-3894
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
> Fix For: 6.1.0
>
>
> In error cases, there may not be a consumer when expected.  Missing NULL 
> checks on the consumer variable c can result in crashes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-3710) Crash in TLS with 6.0.0, related to the session cleanup additions

2015-10-06 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-3710:
---
Attachment: ts-3710-8-26-15.diff

ts-3710-8-26-15.diff contains the changes we have been running in production 
since 8/26/2015.  We haven't seen this crash on machines running with this build.

This is very similar to the previous diffs.  One slight difference is that we 
are canceling the read before the close case as well as in the other cases.
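A rough sketch of one way a read can be canceled before a close at the VC 
level.  The class below is a stand-in, not the real NetVConnection, and this 
is only an illustration of the idea, not the attached diff.

{code}
#include <cstdio>

// Stand-in for a net vconnection; only the two calls discussed are modeled.
struct VConnection {
  bool read_active = true;

  // Passing a null continuation cancels the pending read, so no further read
  // events can be delivered to a continuation that is about to go away.
  void do_io_read(void *cont, long nbytes, void * /* buf */) {
    if (cont == nullptr && nbytes == 0) read_active = false;
  }

  void do_io_close() {
    std::printf("closing, read_active=%d\n", (int)read_active);
  }
};

int main() {
  VConnection vc;
  vc.do_io_read(nullptr, 0, nullptr);  // cancel the read first...
  vc.do_io_close();                    // ...then close
}
{code}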

> Crash in TLS with 6.0.0, related to the session cleanup additions
> -
>
> Key: TS-3710
> URL: https://issues.apache.org/jira/browse/TS-3710
> Project: Traffic Server
>  Issue Type: Bug
>  Components: SSL
>Affects Versions: 5.3.0
>Reporter: Leif Hedstrom
>Assignee: Susan Hinrichs
>Priority: Critical
>  Labels: yahoo
> Fix For: 6.1.0
>
> Attachments: ts-3710-2.diff, ts-3710-8-26-15.diff, 
> ts-3710-final-2.diff, ts-3710.diff
>
>
> {code}
> ==9570==ERROR: AddressSanitizer: heap-use-after-free on address 
> 0x60649f48 at pc 0xb9f969 bp 0x2b8dbc348920 sp 0x2b8dbc348918
> READ of size 8 at 0x60649f48 thread T8 ([ET_NET 7])
> #0 0xb9f968 in Continuation::handleEvent(int, void*) 
> ../../iocore/eventsystem/I_Continuation.h:145
> #1 0xb9f968 in read_signal_and_update 
> /usr/local/src/trafficserver/iocore/net/UnixNetVConnection.cc:142
> #2 0xb9f968 in UnixNetVConnection::mainEvent(int, Event*) 
> /usr/local/src/trafficserver/iocore/net/UnixNetVConnection.cc:1115
> #3 0xb7daf7 in Continuation::handleEvent(int, void*) 
> ../../iocore/eventsystem/I_Continuation.h:145
> #4 0xb7daf7 in InactivityCop::check_inactivity(int, Event*) 
> /usr/local/src/trafficserver/iocore/net/UnixNet.cc:102
> #5 0xc21ffe in Continuation::handleEvent(int, void*) 
> /usr/local/src/trafficserver/iocore/eventsystem/I_Continuation.h:145
> #6 0xc21ffe in EThread::process_event(Event*, int) 
> /usr/local/src/trafficserver/iocore/eventsystem/UnixEThread.cc:128
> #7 0xc241f7 in EThread::execute() 
> /usr/local/src/trafficserver/iocore/eventsystem/UnixEThread.cc:207
> #8 0xc20c18 in spawn_thread_internal 
> /usr/local/src/trafficserver/iocore/eventsystem/Thread.cc:85
> #9 0x2b8db3ff6df4 in start_thread (/lib64/libpthread.so.0+0x7df4)
> #10 0x2b8db585f1ac in __clone (/lib64/libc.so.6+0xf61ac)
> 0x60649f48 is located 8 bytes inside of 56-byte region 
> [0x60649f40,0x60649f78)
> freed by thread T8 ([ET_NET 7]) here:
> #0 0x2b8db1bf3117 in operator delete(void*) 
> ../../.././libsanitizer/asan/asan_new_delete.cc:81
> #1 0xb5b20e in SSLNextProtocolTrampoline::ioCompletionEvent(int, void*) 
> /usr/local/src/trafficserver/iocore/net/SSLNextProtocolAccept.cc:89
> #2 0xbb2eef in Continuation::handleEvent(int, void*) 
> ../../iocore/eventsystem/I_Continuation.h:145
> #3 0xbb2eef in read_signal_and_update 
> /usr/local/src/trafficserver/iocore/net/UnixNetVConnection.cc:142
> #4 0xbb2eef in read_signal_done 
> /usr/local/src/trafficserver/iocore/net/UnixNetVConnection.cc:203
> #5 0xbb2eef in UnixNetVConnection::readSignalDone(int, NetHandler*) 
> /usr/local/src/trafficserver/iocore/net/UnixNetVConnection.cc:957
> #6 0xb55d6d in SSLNetVConnection::net_read_io(NetHandler*, EThread*) 
> /usr/local/src/trafficserver/iocore/net/SSLNetVConnection.cc:480
> #7 0xb748fc in NetHandler::mainNetEvent(int, Event*) 
> /usr/local/src/trafficserver/iocore/net/UnixNet.cc:516
> #8 0xc24e89 in Continuation::handleEvent(int, void*) 
> /usr/local/src/trafficserver/iocore/eventsystem/I_Continuation.h:145
> #9 0xc24e89 in EThread::process_event(Event*, int) 
> /usr/local/src/trafficserver/iocore/eventsystem/UnixEThread.cc:128
> #10 0xc24e89 in EThread::execute() 
> /usr/local/src/trafficserver/iocore/eventsystem/UnixEThread.cc:252
> #11 0xc20c18 in spawn_thread_internal 
> /usr/local/src/trafficserver/iocore/eventsystem/Thread.cc:85
> #12 0x2b8db3ff6df4 in start_thread (/lib64/libpthread.so.0+0x7df4)
> previously allocated by thread T8 ([ET_NET 7]) here:
> #0 0x2b8db1bf2c9f in operator new(unsigned long) 
> ../../.././libsanitizer/asan/asan_new_delete.cc:50
> #1 0xb59f8b in SSLNextProtocolAccept::mainEvent(int, void*) 
> /usr/local/src/trafficserver/iocore/net/SSLNextProtocolAccept.cc:134
> #2 0xb888e9 in Continuation::handleEvent(int, void*) 
> ../../iocore/eventsystem/I_Continuation.h:145
> #3 0xb888e9 in NetAccept::acceptFastEvent(int, void*) 
> /usr/local/src/trafficserver/iocore/net/UnixNetAccept.cc:466
> #4 0xc24e89 in Continuation::handleEvent(int, void*) 
> /usr/local/src/trafficserver/iocore/eventsystem/I_Continuation.h:145
> #5 0xc24e89 in EThread::process_event(Event*, int) 
> 

[jira] [Commented] (TS-3957) Core dump from SpdyClientSession::state_session_start

2015-10-06 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945533#comment-14945533
 ] 

Susan Hinrichs commented on TS-3957:


Our change was first partially put in production 9/4.  We haven't seen any more 
crashes like this on that build.  We have since run into at least one resource 
storm of the kind that originally caused this problem.

> Core dump from SpdyClientSession::state_session_start
> -
>
> Key: TS-3957
> URL: https://issues.apache.org/jira/browse/TS-3957
> Project: Traffic Server
>  Issue Type: Bug
>  Components: SPDY
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
>  Labels: yahoo
>
> We see this in production on machines under swap, so the timings are very 
> distorted.
> {code}
> gdb) bt
> #0  0x in ?? ()
> #1  0x0064a5dc in SpdyClientSession::state_session_start 
> (this=0x2b234fbe8030)
> at SpdyClientSession.cc:211
> #2  0x00510e34 in Continuation::handleEvent (this=0x2b234fbe8030, 
> event=1, 
> data=0x2b23eda76630) at ../iocore/eventsystem/I_Continuation.h:145
> #3  0x0079a066 in EThread::process_event (this=0x2b21170a2010, 
> e=0x2b23eda76630, 
> calling_code=1) at UnixEThread.cc:128
> #4  0x0079a234 in EThread::execute (this=0x2b21170a2010) at 
> UnixEThread.cc:179
> #5  0x00799611 in spawn_thread_internal (a=0x12226a0) at Thread.cc:85
> #6  0x2b21153e19d1 in start_thread () from /lib64/libpthread.so.0
> #7  0x003827ee88fd in clone () from /lib64/libc.so.6
> {code}
> After poking around on the core some more [~amc] and I determined that the vc 
> referenced by the SpdyClientSession was a freed object (the vtable pointer 
> was swizzled out to be the freelist next pointer).
> We assume that the swapping is causing very odd event timing.  We replaced 
> the schedule_immediate with a direct call, which seemed to solve our crash 
> in production.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3701) link Cache Promote Plugin document into index and fix spell in records.config.en.rst

2015-10-06 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945957#comment-14945957
 ] 

Susan Hinrichs commented on TS-3701:


Typoed the bug number in the commit.  The commit above belongs to TS-3710.


> link Cache Promote Plugin document into index and fix spell in 
> records.config.en.rst
> 
>
> Key: TS-3701
> URL: https://issues.apache.org/jira/browse/TS-3701
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Docs
>Reporter: Oknet Xu
>Assignee: Jon Sime
> Fix For: Docs
>
>
> here is the patch:
> {code}
> diff --git a/doc/reference/configuration/records.config.en.rst 
> b/doc/reference/configuration/records.config.en.rst
> index 2c7267b..5c203a6 100644
> --- a/doc/reference/configuration/records.config.en.rst
> +++ b/doc/reference/configuration/records.config.en.rst
> @@ -2017,7 +2017,7 @@ Logging Configuration
>  - ``log_name`` STRING [format]
>  The filename (ex. :ref:`squid log `).
>  
> -- ``log_header_ STRING NULL
> +- ``log_header`` STRING NULL
>  The file header text (ex. :ref:`squid log 
> `).
>  
>  The format can be either ``squid`` (Squid Format), ``common`` (Netscape 
> Common),  ``extended`` (Netscape Extended),
> diff --git a/doc/reference/plugins/index.en.rst 
> b/doc/reference/plugins/index.en.rst
> index 0e43b87..722cc4c 100644
> --- a/doc/reference/plugins/index.en.rst
> +++ b/doc/reference/plugins/index.en.rst
> @@ -67,6 +67,7 @@ directory of the Apache Traffic Server source tree. 
> Experimental plugins can be
>Background Fetch Plugin: allows you to proactively fetch content from 
> Origin in a way that it will fill the object into cache 
>Balancer Plugin: balances requests across multiple origin servers 
> 
>Buffer Upload Plugin: buffers POST data before connecting to the Origin 
> server 
> +  Cache Promote Plugin: provides a means to control when an object should be 
> allowed to enter the cache 
>Combohandler Plugin: provides an intelligent way to combine multiple URLs 
> into a single URL, and have Apache Traffic Server combine the components into 
> one response 
>Epic Plugin: emits Traffic Server metrics in a format that is consumed tby 
> the Epic Network Monitoring System 
>ESI Plugin: implements the ESI specification 
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3710) Crash in TLS with 6.0.0, related to the session cleanup additions

2015-10-06 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945958#comment-14945958
 ] 

Susan Hinrichs commented on TS-3710:


Typoed the bug number in the commit comment.  This commit belongs with this 
issue.

Commit 1859562086b330eed6eda637f5f98a3431db5915 in trafficserver's branch 
refs/heads/master from shinrich
[ https://git-wip-us.apache.org/repos/asf?p=trafficserver.git;h=1859562 ]
TS-3701 - Crash in trampoline cleanup

> Crash in TLS with 6.0.0, related to the session cleanup additions
> -
>
> Key: TS-3710
> URL: https://issues.apache.org/jira/browse/TS-3710
> Project: Traffic Server
>  Issue Type: Bug
>  Components: SSL
>Affects Versions: 5.3.0
>Reporter: Leif Hedstrom
>Assignee: Susan Hinrichs
>Priority: Critical
>  Labels: yahoo
> Fix For: 6.1.0
>
> Attachments: ts-3710-2.diff, ts-3710-8-26-15.diff, 
> ts-3710-final-2.diff, ts-3710.diff
>
>
> {code}
> ==9570==ERROR: AddressSanitizer: heap-use-after-free on address 
> 0x60649f48 at pc 0xb9f969 bp 0x2b8dbc348920 sp 0x2b8dbc348918
> READ of size 8 at 0x60649f48 thread T8 ([ET_NET 7])
> #0 0xb9f968 in Continuation::handleEvent(int, void*) 
> ../../iocore/eventsystem/I_Continuation.h:145
> #1 0xb9f968 in read_signal_and_update 
> /usr/local/src/trafficserver/iocore/net/UnixNetVConnection.cc:142
> #2 0xb9f968 in UnixNetVConnection::mainEvent(int, Event*) 
> /usr/local/src/trafficserver/iocore/net/UnixNetVConnection.cc:1115
> #3 0xb7daf7 in Continuation::handleEvent(int, void*) 
> ../../iocore/eventsystem/I_Continuation.h:145
> #4 0xb7daf7 in InactivityCop::check_inactivity(int, Event*) 
> /usr/local/src/trafficserver/iocore/net/UnixNet.cc:102
> #5 0xc21ffe in Continuation::handleEvent(int, void*) 
> /usr/local/src/trafficserver/iocore/eventsystem/I_Continuation.h:145
> #6 0xc21ffe in EThread::process_event(Event*, int) 
> /usr/local/src/trafficserver/iocore/eventsystem/UnixEThread.cc:128
> #7 0xc241f7 in EThread::execute() 
> /usr/local/src/trafficserver/iocore/eventsystem/UnixEThread.cc:207
> #8 0xc20c18 in spawn_thread_internal 
> /usr/local/src/trafficserver/iocore/eventsystem/Thread.cc:85
> #9 0x2b8db3ff6df4 in start_thread (/lib64/libpthread.so.0+0x7df4)
> #10 0x2b8db585f1ac in __clone (/lib64/libc.so.6+0xf61ac)
> 0x60649f48 is located 8 bytes inside of 56-byte region 
> [0x60649f40,0x60649f78)
> freed by thread T8 ([ET_NET 7]) here:
> #0 0x2b8db1bf3117 in operator delete(void*) 
> ../../.././libsanitizer/asan/asan_new_delete.cc:81
> #1 0xb5b20e in SSLNextProtocolTrampoline::ioCompletionEvent(int, void*) 
> /usr/local/src/trafficserver/iocore/net/SSLNextProtocolAccept.cc:89
> #2 0xbb2eef in Continuation::handleEvent(int, void*) 
> ../../iocore/eventsystem/I_Continuation.h:145
> #3 0xbb2eef in read_signal_and_update 
> /usr/local/src/trafficserver/iocore/net/UnixNetVConnection.cc:142
> #4 0xbb2eef in read_signal_done 
> /usr/local/src/trafficserver/iocore/net/UnixNetVConnection.cc:203
> #5 0xbb2eef in UnixNetVConnection::readSignalDone(int, NetHandler*) 
> /usr/local/src/trafficserver/iocore/net/UnixNetVConnection.cc:957
> #6 0xb55d6d in SSLNetVConnection::net_read_io(NetHandler*, EThread*) 
> /usr/local/src/trafficserver/iocore/net/SSLNetVConnection.cc:480
> #7 0xb748fc in NetHandler::mainNetEvent(int, Event*) 
> /usr/local/src/trafficserver/iocore/net/UnixNet.cc:516
> #8 0xc24e89 in Continuation::handleEvent(int, void*) 
> /usr/local/src/trafficserver/iocore/eventsystem/I_Continuation.h:145
> #9 0xc24e89 in EThread::process_event(Event*, int) 
> /usr/local/src/trafficserver/iocore/eventsystem/UnixEThread.cc:128
> #10 0xc24e89 in EThread::execute() 
> /usr/local/src/trafficserver/iocore/eventsystem/UnixEThread.cc:252
> #11 0xc20c18 in spawn_thread_internal 
> /usr/local/src/trafficserver/iocore/eventsystem/Thread.cc:85
> #12 0x2b8db3ff6df4 in start_thread (/lib64/libpthread.so.0+0x7df4)
> previously allocated by thread T8 ([ET_NET 7]) here:
> #0 0x2b8db1bf2c9f in operator new(unsigned long) 
> ../../.././libsanitizer/asan/asan_new_delete.cc:50
> #1 0xb59f8b in SSLNextProtocolAccept::mainEvent(int, void*) 
> /usr/local/src/trafficserver/iocore/net/SSLNextProtocolAccept.cc:134
> #2 0xb888e9 in Continuation::handleEvent(int, void*) 
> ../../iocore/eventsystem/I_Continuation.h:145
> #3 0xb888e9 in NetAccept::acceptFastEvent(int, void*) 
> /usr/local/src/trafficserver/iocore/net/UnixNetAccept.cc:466
> #4 0xc24e89 in Continuation::handleEvent(int, void*) 
> /usr/local/src/trafficserver/iocore/eventsystem/I_Continuation.h:145
> #5 0xc24e89 in EThread::process_event(Event*, int) 
> 

[jira] [Commented] (TS-3072) Debug logging for a single connection in production traffic.

2015-10-09 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14950607#comment-14950607
 ] 

Susan Hinrichs commented on TS-3072:


Ignore my previous performance numbers.  I had some servers missing when I ran 
those tests.

I ran a cache scenario on my 1Gb network machine and 4 multi-threaded clients.  
The clients fetch a cached 512-byte item.  I'm still testing three cases.  In 
addition to rps changes, I noted the change in CPU % utilization from perf top 
for Diags::on and pthread_get_specific.

Base (not including this code change): 163125 rps.  0.97% in Diags::on and 
0.61% in pthread_get_specific.

New (enable set to 0): 163169 rps.  1.19% in Diags::on and 0.58% in 
pthread_get_specific.

New (enable set to 2): 162777 rps.  1.3% in Diags::on and 1.06% in 
pthread_get_specific.

So the impact of enabling the Debug IP checking but not actually matching (and 
logging) seems pretty minimal.  In these experiments, we spend roughly an extra 
0.75% of CPU and lose roughly 400 rps (about a 0.2% reduction).

There is no real impact from adding the code but leaving debug.enabled at 0.
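A minimal sketch of the kind of gate being measured: a cheap enabled check 
first, then a client-IP comparison before any formatting work.  The names 
debug_enabled and the hard-coded client IP are illustrative stand-ins for the 
records.config settings under test, not the actual knobs.

{code}
#include <arpa/inet.h>
#include <cstdarg>
#include <cstdio>

// 0 = off, 1 = classic tag debugging, 2 = only for a matching client IP.
static int debug_enabled = 2;
static in_addr_t debug_client_ip = inet_addr("203.0.113.7");

// Emits the message only when the gate passes; the expensive vsnprintf work
// happens after the cheap comparisons, which is why enable=2 with no match
// costs so little in the measurements above.
static void ip_debug(in_addr_t client_ip, const char *tag, const char *fmt, ...)
{
  if (debug_enabled == 0) return;
  if (debug_enabled == 2 && client_ip != debug_client_ip) return;

  char buf[1024];
  va_list ap;
  va_start(ap, fmt);
  vsnprintf(buf, sizeof(buf), fmt, ap);
  va_end(ap);
  std::fprintf(stderr, "DEBUG <%s> %s\n", tag, buf);
}

int main()
{
  ip_debug(inet_addr("198.51.100.2"), "http", "filtered out");    // no output
  ip_debug(inet_addr("203.0.113.7"), "http", "request %d", 42);   // printed
}
{code}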


> Debug logging for a single connection in production traffic.
> 
>
> Key: TS-3072
> URL: https://issues.apache.org/jira/browse/TS-3072
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Core, Logging
>Affects Versions: 5.0.1
>Reporter: Sudheer Vinukonda
>Assignee: Susan Hinrichs
>  Labels: Yahoo
> Fix For: sometime
>
> Attachments: ts-3072.diff
>
>
> Presently, when there's a production issue (e.g. TS-3049, TS-2983 etc), it is 
> really hard to isolate/debug with the high traffic. Turning on debug logs in 
> traffic is unfortunately not an option due to performance impacts. Even if 
> you took a performance hit and turned on the logs, it is just as hard to 
> separate out the logs for a single connection/transaction among the millions 
> of the logs output in a short period of time.
> I think it would be good if there's a way to turn on debug logs in a 
> controlled manner in production environment. One simple option is to support 
> a config setting for example, with a client-ip, which when set, would turn on 
> debug logs for any connection made by just that one client. If needed, 
> instead of one client-ip, we may allow configuring up to 'n' (say, 5) 
> client-ips. 
> If there are other ideas, please comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3072) Debug logging for a single connection in production traffic.

2015-10-09 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14950359#comment-14950359
 ] 

Susan Hinrichs commented on TS-3072:


[~zwoop] thanks for reminding me of duplexing.  My throughput had risen above 
65K rps and I was very confused.

[~bcall] agreed that for real performance testing we should be working on 
machines with 10G interfaces.  For the purposes of this issue, though, I don't 
think we really care about absolute performance.  We just need to understand 
the penalty of enabling an IP-specific debug, and to verify that this code 
change doesn't affect performance if debug is turned off entirely. 

I'll spend some more time today sorting out my test setup, and post updated 
comparisons and add numbers for the cached case as well.

> Debug logging for a single connection in production traffic.
> 
>
> Key: TS-3072
> URL: https://issues.apache.org/jira/browse/TS-3072
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Core, Logging
>Affects Versions: 5.0.1
>Reporter: Sudheer Vinukonda
>Assignee: Susan Hinrichs
>  Labels: Yahoo
> Fix For: sometime
>
> Attachments: ts-3072.diff
>
>
> Presently, when there's a production issue (e.g. TS-3049, TS-2983 etc), it is 
> really hard to isolate/debug with the high traffic. Turning on debug logs in 
> traffic is unfortunately not an option due to performance impacts. Even if 
> you took a performance hit and turned on the logs, it is just as hard to 
> separate out the logs for a single connection/transaction among the millions 
> of the logs output in a short period of time.
> I think it would be good if there's a way to turn on debug logs in a 
> controlled manner in production environment. One simple option is to support 
> a config setting for example, with a client-ip, which when set, would turn on 
> debug logs for any connection made by just that one client. If needed, 
> instead of one client-ip, we may allow configuring up to 'n' (say, 5) 
> client-ips. 
> If there are other ideas, please comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3072) Debug logging for a single connection in production traffic.

2015-10-09 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14950379#comment-14950379
 ] 

Susan Hinrichs commented on TS-3072:


[~jpe...@apache.org] let me ponder that some more.  But here are my first 
thoughts.

The transaction is not always (or easily) available from the VC level, which is 
where many useful debug messages lie.  We could push the debug_override flag 
from the continuation down into the NetworkVC class.  As it turns out, I only 
ended up using the debug_override on the netVCs, so that would eliminate 
polluting the top Continuation class.

It would be nice to not use thread local storage.  The motivation for using 
thread local storage was to minimize code change and ease the inclusion of 
future Debug messages into the conditional debug scheme.  I'll look again to 
see if other data structures are always available to Diags at the point where 
the debug decision is made.

One could add a plugin call to adjust the debug_override flag from the 
transaction object (assuming one could get access to the netvc from the 
transaction) or from the session object.  Though I guess a tricky bit of doing 
per-transaction debug the way I have things set up is that it debugs only one 
transaction but not the others on the same net VC (in the case of HTTP/2 or 
SPDY).  
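A minimal sketch of the alternative being pondered: carrying a debug_override 
flag on the net VC (set once at accept or from a plugin call) instead of 
consulting thread-local storage at every Debug site.  The class names are 
stand-ins, not the Traffic Server declarations.

{code}
#include <cstdio>

// Stand-in for the net vconnection carrying the per-connection override.
struct NetVC {
  bool debug_override = false;   // set when the client IP matches, or by a plugin
};

// Stand-in transaction that can reach its underlying net VC.
struct Txn {
  NetVC *vc = nullptr;
};

// Debug site: no thread-local lookup, just a flag on the object already in hand.
static void vc_debug(const NetVC *vc, const char *msg)
{
  if (vc && vc->debug_override) std::printf("DEBUG %s\n", msg);
}

int main()
{
  NetVC vc;
  Txn txn{&vc};

  vc_debug(txn.vc, "before override");   // silent
  txn.vc->debug_override = true;         // e.g. a plugin call flips the flag
  vc_debug(txn.vc, "after override");    // printed
}
{code}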

> Debug logging for a single connection in production traffic.
> 
>
> Key: TS-3072
> URL: https://issues.apache.org/jira/browse/TS-3072
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Core, Logging
>Affects Versions: 5.0.1
>Reporter: Sudheer Vinukonda
>Assignee: Susan Hinrichs
>  Labels: Yahoo
> Fix For: sometime
>
> Attachments: ts-3072.diff
>
>
> Presently, when there's a production issue (e.g. TS-3049, TS-2983 etc), it is 
> really hard to isolate/debug with the high traffic. Turning on debug logs in 
> traffic is unfortunately not an option due to performance impacts. Even if 
> you took a performance hit and turned on the logs, it is just as hard to 
> separate out the logs for a single connection/transaction among the millions 
> of the logs output in a short period of time.
> I think it would be good if there's a way to turn on debug logs in a 
> controlled manner in production environment. One simple option is to support 
> a config setting for example, with a client-ip, which when set, would turn on 
> debug logs for any connection made by just that one client. If needed, 
> instead of one client-ip, we may allow configuring up to 'n' (say, 5) 
> client-ips. 
> If there are other ideas, please comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3742) ATS advertises TLS ticket extension even when disabled

2015-07-08 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14618446#comment-14618446
 ] 

Susan Hinrichs commented on TS-3742:


Then your workaround for this issue, until we can address it properly via 
TS-3371, is to add the following line to make an explicit default:

dest_ip=* ssl_cert_name=certx.pem ssl_ticket_enabled=0

certx.pem can be one of your existing cert files, or a new key pair.  The 
downside of this approach is that you will have a cert (probably bogus) for all 
SSL connection attempts.  That may not be worth it just to clean up your ticket 
advertising.
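For concreteness, a sketch of what the workaround looks like in 
ssl_multicert.config; certx.pem and the dest_ip value are placeholders, and the 
explicit dest_ip=* line is the part that carries ssl_ticket_enabled=0.

{code}
# existing per-host entries stay as they are
dest_ip=192.0.2.10  ssl_cert_name=example.com.pem

# explicit catch-all default; its ssl_ticket_enabled value is the one that
# actually controls whether the ticket extension is advertised
dest_ip=*           ssl_cert_name=certx.pem  ssl_ticket_enabled=0
{code}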

 ATS advertises TLS ticket extension even when disabled
 --

 Key: TS-3742
 URL: https://issues.apache.org/jira/browse/TS-3742
 Project: Traffic Server
  Issue Type: Bug
  Components: SSL
Reporter: Susan Hinrichs
Assignee: Susan Hinrichs

 Noted by [~hreindl].  Even if you have ssl_ticket_enabled=0 on the relevant 
 line in ssl_multicert.config, the Server Hello message will still contain the 
 ticket tls extension.
 The problem is the code is blindly resetting the ticket callback on the 
 context.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (TS-3742) ATS advertises TLS ticket extension even when disabled

2015-07-07 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs closed TS-3742.
--
Resolution: Won't Fix

 ATS advertises TLS ticket extension even when disabled
 --

 Key: TS-3742
 URL: https://issues.apache.org/jira/browse/TS-3742
 Project: Traffic Server
  Issue Type: Bug
  Components: SSL
Reporter: Susan Hinrichs
Assignee: Susan Hinrichs

 Noted by [~hreindl].  Even if you have ssl_ticket_enabled=0 on the relevant 
 line in ssl_multicert.config, the Server Hello message will still contain the 
 ticket tls extension.
 The problem is the code is blindly resetting the ticket callback on the 
 context.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3742) ATS advertises TLS ticket extension even when disabled

2015-07-07 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616676#comment-14616676
 ] 

Susan Hinrichs commented on TS-3742:


My original observation on the cause of this issue is incorrect.  The real 
problem is that whether tickets are enabled or not is controlled by the default 
entry in ssl_multicert.config, or by the built-in default which is created if 
no '*' entry is present in ssl_multicert.config.

The code dutifully sets or clears SSL_OP_NO_TICKET for each SSL_CTX based on 
the ssl_ticket_enabled flag (which is on by default).  But by the time the code 
updates the SSL_CTX for the active SSL object in the SNI callback, the state 
about the tickets already seems to be set in the SSL object.  I tried calling 
SSL_clear_options and SSL_set_options to make the SSL object have the same 
value as the SSL_CTX object with respect to the SSL_OP_NO_TICKET flag, but it 
did not change whether the server hello advertised tickets or not.  It kept to 
the same state as was set on the original default SSL_CTX.

So there seems to be no code change that will enable tickets by default but 
disable them for a particular entry (or vice versa).  As it stands, the 
ssl_ticket_enabled on the default entry controls whether tickets are 
advertised.  If there is no default entry, the built-in default will have 
tickets enabled.

The solution seems to be to implement TS-3371 and provide a global 
enable/disable for tickets.

My tests were done with openssl 1.0.1f.  Things may vary between different 
versions of openssl.
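A sketch of the experiment described: in the SNI callback, after swapping in 
the per-name SSL_CTX, copy its SSL_OP_NO_TICKET setting onto the SSL object 
with SSL_set_options/SSL_clear_options.  As noted above, this did not change 
what the server hello advertised with openssl 1.0.1f.  The callback below is 
illustrative, not the Traffic Server implementation, and ctx_for_name() is a 
hypothetical lookup helper.

{code}
#include <openssl/ssl.h>

// Stand-in for the certificate lookup done against ssl_multicert.config.
extern SSL_CTX *ctx_for_name(const char *servername);

static int
sni_callback(SSL *ssl, int * /* alert */, void * /* arg */)
{
  const char *name = SSL_get_servername(ssl, TLSEXT_NAMETYPE_host_name);
  SSL_CTX *nctx    = name ? ctx_for_name(name) : nullptr;
  if (nctx == nullptr) {
    return SSL_TLSEXT_ERR_NOACK;
  }

  SSL_set_SSL_CTX(ssl, nctx); // swap in the per-name context

  // Try to make the SSL object agree with the chosen context about tickets.
  if (SSL_CTX_get_options(nctx) & SSL_OP_NO_TICKET) {
    SSL_set_options(ssl, SSL_OP_NO_TICKET);
  } else {
    SSL_clear_options(ssl, SSL_OP_NO_TICKET);
  }
  // Observation from the comment above: by this point the ticket decision has
  // already been baked in from the default SSL_CTX, so this has no effect.
  return SSL_TLSEXT_ERR_OK;
}
{code}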

 ATS advertises TLS ticket extension even when disabled
 --

 Key: TS-3742
 URL: https://issues.apache.org/jira/browse/TS-3742
 Project: Traffic Server
  Issue Type: Bug
  Components: SSL
Reporter: Susan Hinrichs
Assignee: Susan Hinrichs

 Noted by [~hreindl].  Even if you have ssl_ticket_enabled=0 on the relevant 
 line in ssl_multicert.config, the Server Hello message will still contain the 
 ticket tls extension.
 The problem is the code is blindly resetting the ticket callback on the 
 context.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3683) Add a tag to log SSL Session/Ticket HIT as well as TCP connection reused

2015-07-09 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620419#comment-14620419
 ] 

Susan Hinrichs commented on TS-3683:


Sorry, I had a git log mishap when pushing the commit.  Instead of the nice 
single git log entry, the push went up as four commits.

{code}
da04362227ef91b27aa7d02e9238f1ceae68689d
f3e13664ab20f60cb4bd2ffef1eb7d6a374a1698
5a4350e6067ac868e54538467ec83a9413853143
71752c741ac8b49d432dd4b13f5ea2a7f176b37e
{code}

 Add a tag to log SSL Session/Ticket HIT as well as TCP connection reused
 

 Key: TS-3683
 URL: https://issues.apache.org/jira/browse/TS-3683
 Project: Traffic Server
  Issue Type: Improvement
  Components: Logging
Reporter: François Pesce
Assignee: Alan M. Carroll
  Labels: yahoo
 Fix For: 6.1.0


 These tags would be useful for performance metrics collection:
 %cqtr The TCP reused status; indicates if this request went through an 
 already established connection.
 %cqssr The SSL session/ticket reused status; indicates if this request hit 
 the SSL session/ticket and avoided a full SSL handshake.
 Both of them would display 0 or 1 respectively, for not reused or 
 reused.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-3683) Add a tag to log SSL Session/Ticket HIT as well as TCP connection reused

2015-07-09 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-3683:
---
Assignee: François Pesce  (was: Alan M. Carroll)

 Add a tag to log SSL Session/Ticket HIT as well as TCP connection reused
 

 Key: TS-3683
 URL: https://issues.apache.org/jira/browse/TS-3683
 Project: Traffic Server
  Issue Type: Improvement
  Components: Logging
Reporter: François Pesce
Assignee: François Pesce
  Labels: yahoo
 Fix For: 6.1.0


 These tags would be useful for performance metrics collection:
 %cqtr The TCP reused status; indicates if this request went through an 
 already established connection.
 %cqssr The SSL session/ticket reused status; indicates if this request hit 
 the SSL session/ticket and avoided a full SSL handshake.
 Both of them would display 0 or 1 respectively, for not reused or 
 reused.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3596) TSHttpTxnPluginTagGet() returns fetchSM over H2

2015-07-09 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620471#comment-14620471
 ] 

Susan Hinrichs commented on TS-3596:


With [~es]'s fix for TS-3476, PluginGetTag will return http/2 for the HTTP/2 
case.

[~jpe...@apache.org], agreed this is weird because HTTP/2 is not a plugin, but 
it is currently implemented via the plugin framework.  We're abusing that to 
quickly get access to the ultimate protocol for protocol logging.  Hopefully 
with a fix for TS-3612, this can all get cleaned up.

 TSHttpTxnPluginTagGet() returns fetchSM over H2
 -

 Key: TS-3596
 URL: https://issues.apache.org/jira/browse/TS-3596
 Project: Traffic Server
  Issue Type: Bug
  Components: HTTP/2
Reporter: Scott Beardsley
  Labels: yahoo
 Fix For: 6.1.0


 This should probably return something else, right? Maybe HTTP2 instead? We 
 would like a way to identify H2 requests from SPDY and/or H1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-3293) Need to review various protocol accept objects and make them more widely available

2015-07-09 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-3293:
---
Assignee: Dave Thompson  (was: Susan Hinrichs)

 Need to review various protocol accept objects and make them more widely 
 available
 --

 Key: TS-3293
 URL: https://issues.apache.org/jira/browse/TS-3293
 Project: Traffic Server
  Issue Type: Bug
Reporter: Susan Hinrichs
Assignee: Dave Thompson
 Fix For: 6.1.0


 This came up most recently in propagating tr-pass information for TS-3292.
 The early configuration is being duplicated in too many objects.  The 
 information is being propagated differently for HTTP and SSL (who knows what 
 is happening with SPDY).  We should take a step back to review and unify this 
 information.  
 Alan took a first pass on this review with his Early Intervention talk from 
 the Fall 2014 summit 
 https://www.dropbox.com/s/4vw91czj41rdxjo/ATS-Early-Intervention.pptx?dl=0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-3656) Activating follow redirection in send server response hook does not work for post

2015-07-10 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-3656:
---
Fix Version/s: (was: 6.0.0)
   6.1.0

 Activating follow redirection in send server response hook does not work for 
 post
 -

 Key: TS-3656
 URL: https://issues.apache.org/jira/browse/TS-3656
 Project: Traffic Server
  Issue Type: Bug
  Components: HTTP
Reporter: Susan Hinrichs
Assignee: Susan Hinrichs
 Fix For: 6.1.0


 If you have a plugin on the TS_HTTP_SEND_RESPONSE_HDR_HOOK that calls 
 TSHttpTxnFollowRedirect(txn, 1), redirecting a POST request will fail.
 In the not-so-bad case, the POST request will be redirected to the new 
 location, but the POST data will be lost.
 In the worse case, ATS will crash.
 The issue is that the post_redirect buffers are freed early on.  One could 
 delay the post_redirect deallocation until later in the transaction.
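A minimal sketch of the triggering setup from the description above: a global 
continuation on TS_HTTP_SEND_RESPONSE_HDR_HOOK that turns on redirect 
following.  Error handling and plugin registration are omitted for brevity; 
this is only an illustration of the scenario, not the fix.

{code}
#include <ts/ts.h>

// Handler for the send-response hook: enable redirect following, which is
// the combination that loses (or crashes on) buffered POST data per above.
static int
follow_handler(TSCont /* contp */, TSEvent event, void *edata)
{
  TSHttpTxn txnp = static_cast<TSHttpTxn>(edata);
  if (event == TS_EVENT_HTTP_SEND_RESPONSE_HDR) {
    TSHttpTxnFollowRedirect(txnp, 1);
  }
  TSHttpTxnReenable(txnp, TS_EVENT_HTTP_CONTINUE);
  return 0;
}

void
TSPluginInit(int /* argc */, const char ** /* argv */)
{
  TSHttpHookAdd(TS_HTTP_SEND_RESPONSE_HDR_HOOK,
                TSContCreate(follow_handler, nullptr));
}
{code}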



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3486) Segfault in do_io_write with plugin (??)

2015-07-10 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14622152#comment-14622152
 ] 

Susan Hinrichs commented on TS-3486:


 I think the plan was to get a 5.3.2 out pretty quickly with this fix and one 
 other.  I don't think we backport to already-released point releases.

 Segfault in do_io_write with plugin (??)
 

 Key: TS-3486
 URL: https://issues.apache.org/jira/browse/TS-3486
 Project: Traffic Server
  Issue Type: Bug
Affects Versions: 5.2.0, 5.3.0
Reporter: Qiang Li
Assignee: Susan Hinrichs
  Labels: crash
 Fix For: 6.0.0

 Attachments: ts-3266-2.diff, ts-3266-complete.diff, 
 ts3486-ptrace.txt.gz


 {code}
 (gdb) bt
 #0  0x005bdb8b in HttpServerSession::do_io_write (this=value 
 optimized out, c=0x2aaadccc4bf0, nbytes=576, buf=0x2aaafc2ffee8, owner=false)
 at HttpServerSession.cc:104
 #1  0x005acc1d in HttpSM::setup_server_send_request 
 (this=0x2aaadccc4bf0) at HttpSM.cc:5686
 #2  0x005b3f85 in HttpSM::handle_api_return (this=0x2aaadccc4bf0) at 
 HttpSM.cc:1520
 #3  0x005b45f8 in HttpSM::state_api_callout (this=0x2aaadccc4bf0, 
 event=6, data=0x0) at HttpSM.cc:1455
 #4  0x005b980b in HttpSM::state_api_callback (this=0x2aaadccc4bf0, 
 event=6, data=0x0) at HttpSM.cc:1275
 #5  0x004d7a1b in TSHttpTxnReenable (txnp=0x2aaadccc4bf0, 
 event=TS_EVENT_HTTP_CONTINUE) at InkAPI.cc:5614
 #6  0x2ba118441c89 in cachefun (contp=value optimized out, event=value 
 optimized out, edata=0x2aaadccc4bf0) at main.cpp:1876
 #7  0x005b4466 in HttpSM::state_api_callout (this=0x2aaadccc4bf0, 
 event=value optimized out, data=value optimized out) at HttpSM.cc:1381
 #8  0x005b627d in HttpSM::do_http_server_open (this=0x2aaadccc4bf0, 
 raw=value optimized out) at HttpSM.cc:4639
 #9  0x005baa04 in HttpSM::set_next_state (this=0x2aaadccc4bf0) at 
 HttpSM.cc:7021
 #10 0x005b25a3 in HttpSM::state_cache_open_write 
 (this=0x2aaadccc4bf0, event=1108, data=0x2aab1c3b6800) at HttpSM.cc:2442
 #11 0x005b5b28 in HttpSM::main_handler (this=0x2aaadccc4bf0, 
 event=1108, data=0x2aab1c3b6800) at HttpSM.cc:2554
 #12 0x0059338a in handleEvent (this=0x2aaadccc6618, event=value 
 optimized out, data=0x2aab1c3b6800) at 
 ../../iocore/eventsystem/I_Continuation.h:145
 #13 HttpCacheSM::state_cache_open_write (this=0x2aaadccc6618, event=value 
 optimized out, data=0x2aab1c3b6800) at HttpCacheSM.cc:167
 #14 0x00697223 in handleEvent (this=0x2aab1c3b6800, event=value 
 optimized out) at ../../iocore/eventsystem/I_Continuation.h:145
 #15 CacheVC::callcont (this=0x2aab1c3b6800, event=value optimized out) at 
 ../../iocore/cache/P_CacheInternal.h:662
 #16 0x00715940 in Cache::open_write (this=value optimized out, 
 cont=value optimized out, key=0x2ba0ff762d70, info=value optimized out, 
 apin_in_cache=46914401429576, type=CACHE_FRAG_TYPE_HTTP, 
 hostname=0x2aaadd281078 
 www.mifangba.comhttpapi.phpwww.mifangba.comhttp://www.mifangba.com/api.php?op=countid=4modelid=12;,
  host_len=16) at CacheWrite.cc:1788
 #17 0x006e5765 in open_write (this=value optimized out, 
 cont=0x2aaadccc6618, expected_size=value optimized out, url=0x2aaadccc5310, 
 cluster_cache_local=value optimized out, request=value optimized out, 
 old_info=0x0, pin_in_cache=0, type=CACHE_FRAG_TYPE_HTTP) at 
 P_CacheInternal.h:1093
 #18 CacheProcessor::open_write (this=value optimized out, 
 cont=0x2aaadccc6618, expected_size=value optimized out, url=0x2aaadccc5310, 
 cluster_cache_local=value optimized out, request=value optimized out, 
 old_info=0x0, pin_in_cache=0, type=CACHE_FRAG_TYPE_HTTP) at Cache.cc:3622
 #19 0x005936f0 in HttpCacheSM::open_write (this=0x2aaadccc6618, 
 url=value optimized out, request=value optimized out, old_info=value 
 optimized out, 
 pin_in_cache=value optimized out, retry=value optimized out, 
 allow_multiple=false) at HttpCacheSM.cc:298
 #20 0x005a022e in HttpSM::do_cache_prepare_action 
 (this=0x2aaadccc4bf0, c_sm=0x2aaadccc6618, object_read_info=0x0, retry=true, 
 allow_multiple=false) at HttpSM.cc:4511
 #21 0x005babd9 in do_cache_prepare_write (this=0x2aaadccc4bf0) at 
 HttpSM.cc:4436
 #22 HttpSM::set_next_state (this=0x2aaadccc4bf0) at HttpSM.cc:7098
 #23 0x005b3f5f in HttpSM::handle_api_return (this=0x2aaadccc4bf0) at 
 HttpSM.cc:1517
 #24 0x005b45f8 in HttpSM::state_api_callout (this=0x2aaadccc4bf0, 
 event=0, data=0x0) at HttpSM.cc:1455
 #25 0x005ba712 in HttpSM::set_next_state (this=0x2aaadccc4bf0) at 
 HttpSM.cc:6876
 #26 0x005ba702 in HttpSM::set_next_state (this=0x2aaadccc4bf0) at 
 HttpSM.cc:6919
 #27 0x005b3f5f in HttpSM::handle_api_return 

[jira] [Updated] (TS-1007) SSN Close called before TXN Close

2015-07-10 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-1007:
---
Assignee: Susan Hinrichs  (was: Alan M. Carroll)

 SSN Close called before TXN Close
 -

 Key: TS-1007
 URL: https://issues.apache.org/jira/browse/TS-1007
 Project: Traffic Server
  Issue Type: Bug
  Components: TS API
Affects Versions: 3.0.1
Reporter: Nick Kew
Assignee: Susan Hinrichs
  Labels: incompatible
 Fix For: 6.0.0


 Where a plugin implements both SSN_CLOSE_HOOK and TXN_CLOSE_HOOK, the 
 SSN_CLOSE_HOOK is called first of the two.  This messes up normal cleanups!
 Details:
   Register a SSN_START event globally
   In the SSN START, add a TXN_START and a SSN_CLOSE
   In the TXN START, add a TXN_CLOSE
 Stepping through, I see the order of events actually called, for the simple 
 case of a one-off HTTP request with no keepalive:
 SSN_START
 TXN_START
 SSN_END
 TXN_END
 Whoops, SSN_END cleaned up the SSN context, leaving dangling pointers in the 
 TXN!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3746) We need to make proxy.config.ssl.client.verify.server overridable

2015-07-10 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14622785#comment-14622785
 ] 

Susan Hinrichs commented on TS-3746:


Are you asking why you don't just verify all certificates from all origins?  
That is what I would prefer from a security perspective.  But from an 
organizational perspective, not everyone is ready to bet connectivity on all 
the verifying certs being distributed appropriately.

Actually the override can be set from within a transaction, since this is the 
connection from ATS to the origin server which would only happen within the 
context of a transaction.
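If the setting became transaction-overridable, a plugin could flip it per 
origin along these lines.  TSHttpTxnConfigIntSet is the existing override API; 
the config key below is a hypothetical placeholder, since the whole point of 
this issue is that no such overridable key exists yet.

{code}
#include <ts/ts.h>

// Hypothetical overridable key; assumed name, not (yet) in the TS API.
#define TS_CONFIG_SSL_CLIENT_VERIFY_SERVER_HYPOTHETICAL \
  static_cast<TSOverridableConfigKey>(-1)

static int
verify_handler(TSCont /* contp */, TSEvent event, void *edata)
{
  TSHttpTxn txnp = static_cast<TSHttpTxn>(edata);
  if (event == TS_EVENT_HTTP_READ_REQUEST_HDR) {
    // Decide per transaction whether the origin certificate must verify.
    TSHttpTxnConfigIntSet(txnp, TS_CONFIG_SSL_CLIENT_VERIFY_SERVER_HYPOTHETICAL, 1);
  }
  TSHttpTxnReenable(txnp, TS_EVENT_HTTP_CONTINUE);
  return 0;
}

void
TSPluginInit(int /* argc */, const char ** /* argv */)
{
  TSHttpHookAdd(TS_HTTP_READ_REQUEST_HDR_HOOK,
                TSContCreate(verify_handler, nullptr));
}
{code}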

 We need to make proxy.config.ssl.client.verify.server overridable
 -

 Key: TS-3746
 URL: https://issues.apache.org/jira/browse/TS-3746
 Project: Traffic Server
  Issue Type: New Feature
  Components: Configuration
Reporter: Syeda Persia Aziz
  Labels: Yahoo
 Fix For: sometime


 We need to make proxy.config.ssl.client.verify.server overridable. Some 
 origin servers need validation to avoid MITM attacks while others don't.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-1007) SSN Close called before TXN Close

2015-07-10 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14622820#comment-14622820
 ] 

Susan Hinrichs commented on TS-1007:


Research notes.

Case 1, HTTP1.1 over TCP: SSN Start, TXN Start, TXN Close, SSN Close
All is well!

Case 2, HTTP 1.1 over SSL: SSN Start, TXN Start, TXN Close, TXN Start, SSN 
Close, TXN Close
Looks like the keep_alive logic is being triggered by a read-ready to set up a 
new TXN

Case 3, HTTP 1.1 over H2: SSN Start X, SSN Start Y, TXN Start, SSN Close Y, TXN 
Close, SSN Close X
Two Sessions, not so good.

Need to recompile to get the SPDY case set up.
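A minimal sketch of the hook registration used for these observations, 
following the recipe in the issue description below (global SSN_START; 
TXN_START and SSN_CLOSE added per session; TXN_CLOSE added per transaction).  
Plugin registration and error handling are left out.

{code}
#include <ts/ts.h>
#include <cstdio>

static int
order_handler(TSCont contp, TSEvent event, void *edata)
{
  switch (event) {
  case TS_EVENT_HTTP_SSN_START: {
    TSHttpSsn ssnp = static_cast<TSHttpSsn>(edata);
    TSHttpSsnHookAdd(ssnp, TS_HTTP_TXN_START_HOOK, contp);
    TSHttpSsnHookAdd(ssnp, TS_HTTP_SSN_CLOSE_HOOK, contp);
    std::fprintf(stderr, "SSN Start\n");
    TSHttpSsnReenable(ssnp, TS_EVENT_HTTP_CONTINUE);
    break;
  }
  case TS_EVENT_HTTP_SSN_CLOSE:
    std::fprintf(stderr, "SSN Close\n");
    TSHttpSsnReenable(static_cast<TSHttpSsn>(edata), TS_EVENT_HTTP_CONTINUE);
    break;
  case TS_EVENT_HTTP_TXN_START: {
    TSHttpTxn txnp = static_cast<TSHttpTxn>(edata);
    TSHttpTxnHookAdd(txnp, TS_HTTP_TXN_CLOSE_HOOK, contp);
    std::fprintf(stderr, "TXN Start\n");
    TSHttpTxnReenable(txnp, TS_EVENT_HTTP_CONTINUE);
    break;
  }
  case TS_EVENT_HTTP_TXN_CLOSE:
    std::fprintf(stderr, "TXN Close\n");
    TSHttpTxnReenable(static_cast<TSHttpTxn>(edata), TS_EVENT_HTTP_CONTINUE);
    break;
  default:
    break;
  }
  return 0;
}

void
TSPluginInit(int /* argc */, const char ** /* argv */)
{
  TSHttpHookAdd(TS_HTTP_SSN_START_HOOK, TSContCreate(order_handler, nullptr));
}
{code}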

 SSN Close called before TXN Close
 -

 Key: TS-1007
 URL: https://issues.apache.org/jira/browse/TS-1007
 Project: Traffic Server
  Issue Type: Bug
  Components: TS API
Affects Versions: 3.0.1
Reporter: Nick Kew
Assignee: Susan Hinrichs
  Labels: incompatible
 Fix For: 6.0.0


 Where a plugin implements both SSN_CLOSE_HOOK and TXN_CLOSE_HOOK, the 
 SSN_CLOSE_HOOK is called first of the two.  This messes up normal cleanups!
 Details:
   Register a SSN_START event globally
   In the SSN START, add a TXN_START and a SSN_CLOSE
   In the TXN START, add a TXN_CLOSE
 Stepping through, I see the order of events actually called, for the simple 
 case of a one-off HTTP request with no keepalive:
 SSN_START
 TXN_START
 SSN_END
 TXN_END
 Whoops, SSN_END cleaned up the SSN context, leaving dangling pointers in the 
 TXN!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (TS-1007) SSN Close called before TXN Close

2015-07-10 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14622820#comment-14622820
 ] 

Susan Hinrichs edited comment on TS-1007 at 7/10/15 8:19 PM:
-

Research notes.

Case 1, HTTP1.1 over TCP: SSN Start, TXN Start, TXN Close, SSN Close
All is well!

Case 2, HTTP 1.1 over SSL: SSN Start, TXN Start, TXN Close, SSN Close
Also works.

Case 2, HTTP 1.1 over SSL with a redirect (i.e. two requests from the client 
over the same connection): SSN Start, TXN Start, TXN Close, TXN Start, SSN 
Close, TXN Close
Problems.

Case 3, HTTP 1.1 over H2: SSN Start X, SSN Start Y, TXN Start, SSN Close Y, TXN 
Close, SSN Close X
Two Sessions, not so good.

Need to recompile to get the SPDY case set up.


was (Author: shinrich):
Research notes.

Case 1, HTTP1.1 over TCP: SSN Start, TXN Start, TXN Close, SSN Close
All is well!

Case 2, HTTP 1.1 over SSL: SSN Start, TXN Start, TXN Close, TXN Start, SSN 
Close, TXN Close
Look like the keep_alive logic is being triggered by a read ready to set up a 
new TXN

Case 3, HTTP 1.1 over H2: SSN Start X, SSN Start Y, TXN Start, SSN Close Y, TXN 
Close, SSN Close X
Two Sessions, not so good.

Need to recompile to get the SPDY case set up.

 SSN Close called before TXN Close
 -

 Key: TS-1007
 URL: https://issues.apache.org/jira/browse/TS-1007
 Project: Traffic Server
  Issue Type: Bug
  Components: TS API
Affects Versions: 3.0.1
Reporter: Nick Kew
Assignee: Susan Hinrichs
  Labels: incompatible
 Fix For: 6.0.0


 Where a plugin implements both SSN_CLOSE_HOOK and TXN_CLOSE_HOOK, the 
 SSN_CLOSE_HOOK is called first of the two.  This messes up normal cleanups!
 Details:
   Register a SSN_START event globally
   In the SSN START, add a TXN_START and a SSN_CLOSE
   In the TXN START, add a TXN_CLOSE
 Stepping through, I see the order of events actually called, for the simple 
 case of a one-off HTTP request with no keepalive:
 SSN_START
 TXN_START
 SSN_END
 TXN_END
 Whoops, SSN_END cleaned up the SSN context, leaving dangling pointers in the 
 TXN!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (TS-3871) VC Migration Can Lose Events

2015-08-27 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs reassigned TS-3871:
--

Assignee: Susan Hinrichs

 VC Migration Can Lose Events
 

 Key: TS-3871
 URL: https://issues.apache.org/jira/browse/TS-3871
 Project: Traffic Server
  Issue Type: Bug
  Components: HTTP
Reporter: Susan Hinrichs
Assignee: Susan Hinrichs

 Found this in my stress testing.  Sometimes the POST or GET response is 
 completely empty.  No header and no body.  The packet capture shows that ATS 
 closes the connection 70 seconds after the last POST or GET of the connection 
 was received.  This corresponds to the 
 proxy.config.http.keep_alive_no_activity_timeout_in on my test box.
 I moved from global pool to local pool and the problem went away.
 I eventually tracked it down to a problem in the epoll update.  ep.start() 
 during the migration would fail sometimes with EEXIST error.  This means that 
 the file descriptor is already associated with the epoll.  If we are 
 migrating from thread A to thread B this should not be the case.  Unless we 
  went from thread B to thread A and back to thread B without cleaning up the 
 original thread B epoll.  If this is happening, then multiple threads will be 
 processing network events which seems like a recipe for disaster and dropped 
 events.
 Originally, I left the ep.stop() which clears the epoll on the original 
 thread's epoll structure to be done by the original thread.  But under stress 
 that seems to be a bad idea.  Too much drift.  With some more research, it 
 appears that the epoll calls are thread safe.
 http://linux.derkeiler.com/Mailing-Lists/Kernel/2006-03/msg00084.html
 I rearranged the code to do both the ep.stop() and ep.start() in the same 
 migrating target thread, and my stress test had no more problems.
 I've run this patch on a production machine for over 12 hours with no crashes 
 and no performance discrepancies.  We will be expanding this testing.
  To repeat, this is not a problem we saw in production, but only in my "make 
  it fall over" stress test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TS-3871) VC Migration Can Lose Events

2015-08-27 Thread Susan Hinrichs (JIRA)
Susan Hinrichs created TS-3871:
--

 Summary: VC Migration Can Lose Events
 Key: TS-3871
 URL: https://issues.apache.org/jira/browse/TS-3871
 Project: Traffic Server
  Issue Type: Bug
  Components: HTTP
Reporter: Susan Hinrichs


Found this in my stress testing.  Sometimes the POST or GET response is 
completely empty.  No header and no body.  The packet capture shows that ATS 
closes the connection 70 seconds after the last POST or GET of the connection 
was received.  This corresponds to the 
proxy.config.http.keep_alive_no_activity_timeout_in on my test box.

I moved from global pool to local pool and the problem went away.

I eventually tracked it down to a problem in the epoll update.  ep.start() 
during the migration would fail sometimes with EEXIST error.  This means that 
the file descriptor is already associated with the epoll.  If we are migrating 
from thread A to thread B this should not be the case.  Unless we when from 
thread B to thread A and back to thread B without cleaning up the original 
thread B epoll.  If this is happening, then multiple threads will be processing 
network events which seems like a recipe for disaster and dropped events.

Originally, I left the ep.stop() which clears the epoll on the original 
thread's epoll structure to be done by the original thread.  But under stress 
that seems to be a bad idea.  Too much drift.  With some more research, it 
appears that the epoll calls are thread safe.

http://linux.derkeiler.com/Mailing-Lists/Kernel/2006-03/msg00084.html

I rearranged the code to do both the ep.stop() and ep.start() in the same 
migrating target thread, and my stress test had no more problems.

I've run this patch on a production machine for over 12 hours with no crashes 
and no performance discrepancies.  We will be expanding this testing.

To repeat, this is not a problem we saw in production, but only in my "make it 
fall over" stress test.
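
For illustration, a minimal sketch of the ordering described above, written against raw epoll_ctl(2) rather than ATS's EventIO wrapper (ep.start()/ep.stop()); the helper name and event mask are assumptions, not the actual ts-3871.diff. The point is that epoll_ctl is thread safe, so the migrating target thread can do both steps itself:

{code}
#include <sys/epoll.h>

// Hypothetical sketch: the *target* thread first removes the fd from the
// originating thread's epoll set (what ep.stop() does) and only then adds it
// to its own (what ep.start() does), closing the window that produced the
// EEXIST failures.
void migrate_fd(int old_epfd, int new_epfd, int fd, void *vc_ptr)
{
  epoll_ctl(old_epfd, EPOLL_CTL_DEL, fd, nullptr);

  epoll_event ev{};
  ev.events   = EPOLLIN | EPOLLOUT | EPOLLET;  // assumed mask for a net vc
  ev.data.ptr = vc_ptr;
  epoll_ctl(new_epfd, EPOLL_CTL_ADD, fd, &ev);
}
{code}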



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (TS-3777) TSHttpConnect and POST request does not fire TS_VCONN_READ_COMPLETE nor TS_VCONN_EOS

2015-08-27 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs closed TS-3777.
--
Resolution: Fixed

 TSHttpConnect and POST request does not fire TS_VCONN_READ_COMPLETE nor 
 TS_VCONN_EOS
 

 Key: TS-3777
 URL: https://issues.apache.org/jira/browse/TS-3777
 Project: Traffic Server
  Issue Type: Bug
  Components: TS API
Reporter: Daniel Vitor Morilha
Assignee: Susan Hinrichs
  Labels: yahoo
 Fix For: 6.1.0

 Attachments: ts-3777-2.diff, ts-3777-3.diff, ts-3777-4.diff, 
 ts-3777.diff


 When using TSHttpConnect to connect to ATS itself (internal vconnection), 
 sending a POST request, and receiving a CHUNKED response, ATS fires neither 
 TS_VCONN_READ_COMPLETE nor TS_VCONN_EOS.
 Trying to close the vconnection from the plug-in after receiving the last 
 chunk (\r\n0\r\n) results in the PluginVC repeating the following message:
 {noformat}
 [Jul 14 21:24:06.094] Server {0x77fbe800} DEBUG: (pvc_event) [0] Passive: 
 Received event 1
 {noformat}
 I am glad to provide an example if that helps.
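
For context, a hedged sketch of the read-side event handling such a plugin typically installs on the vconnection returned by TSHttpConnect (handler and variable names are illustrative, not the reporter's code); the READ_COMPLETE / EOS branches are the ones this report says never fire:

{code}
static int
read_handler(TSCont contp, TSEvent event, void *edata)
{
  (void)contp;
  TSVIO vio = static_cast<TSVIO>(edata);  // VCONN events deliver the VIO

  switch (event) {
  case TS_EVENT_VCONN_READ_READY:
    // consume the chunked response bytes, then re-enable to keep reading
    TSVIOReenable(vio);
    break;
  case TS_EVENT_VCONN_READ_COMPLETE:
  case TS_EVENT_VCONN_EOS:
    // expected once the chunked body ends; per this bug, never delivered
    TSVConnClose(TSVIOVConnGet(vio));
    break;
  default:
    break;
  }
  return 0;
}
{code}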



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-3871) VC Migration Can Lose Events

2015-08-27 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-3871:
---
Attachment: ts-3871.diff

 VC Migration Can Lose Events
 

 Key: TS-3871
 URL: https://issues.apache.org/jira/browse/TS-3871
 Project: Traffic Server
  Issue Type: Bug
  Components: HTTP
Reporter: Susan Hinrichs
Assignee: Susan Hinrichs
 Attachments: ts-3871.diff


 Found this in my stress testing.  Sometimes the POST or GET response is 
 completely empty.  No header and no body.  The packet capture shows that ATS 
 closes the connection 70 seconds after the last POST or GET of the connection 
 was received.  This corresponds to the 
 proxy.config.http.keep_alive_no_activity_timeout_in on my test box.
 I moved from global pool to local pool and the problem went away.
 I eventually tracked it down to a problem in the epoll update.  ep.start() 
 during the migration would fail sometimes with EEXIST error.  This means that 
 the file descriptor is already associated with the epoll.  If we are 
 migrating from thread A to thread B this should not be the case.  Unless we 
 went from thread B to thread A and back to thread B without cleaning up the 
 original thread B epoll.  If this is happening, then multiple threads will be 
 processing network events which seems like a recipe for disaster and dropped 
 events.
 Originally, I left the ep.stop() which clears the epoll on the original 
 thread's epoll structure to be done by the original thread.  But under stress 
 that seems to be a bad idea.  Too much drift.  With some more research, it 
 appears that the epoll calls are thread safe.
 http://linux.derkeiler.com/Mailing-Lists/Kernel/2006-03/msg00084.html
 I rearranged the code to do both the ep.stop() and ep.start() in the same 
 migrating target thread, and my stress test had no more problems.
 I've run this patch on a production machine for over 12 hours with no crashes 
 and no performance discrepancies.  We will be expanding this testing.
 To repeat, this is not a problem we saw in production, but only in my "make 
 it fall over" stress test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3486) Segfault in do_io_write with plugin (??)

2015-09-02 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14727739#comment-14727739
 ] 

Susan Hinrichs commented on TS-3486:


 server_session_sharing_pool is overridable in 5.3, but not in master

 Should be able to change the reference to 
sm->t_state.txn_conf->server_session_sharing_pool

> Segfault in do_io_write with plugin (??)
> 
>
> Key: TS-3486
> URL: https://issues.apache.org/jira/browse/TS-3486
> Project: Traffic Server
>  Issue Type: Bug
>Affects Versions: 5.2.0, 5.3.0
>Reporter: Qiang Li
>Assignee: Susan Hinrichs
>  Labels: crash
> Fix For: 6.0.0
>
> Attachments: ts-3266-2.diff, ts-3266-complete.diff, 
> ts3486-ptrace.txt.gz
>
>
> {code}
> (gdb) bt
> #0  0x005bdb8b in HttpServerSession::do_io_write (this= optimized out>, c=0x2aaadccc4bf0, nbytes=576, buf=0x2aaafc2ffee8, owner=false)
> at HttpServerSession.cc:104
> #1  0x005acc1d in HttpSM::setup_server_send_request 
> (this=0x2aaadccc4bf0) at HttpSM.cc:5686
> #2  0x005b3f85 in HttpSM::handle_api_return (this=0x2aaadccc4bf0) at 
> HttpSM.cc:1520
> #3  0x005b45f8 in HttpSM::state_api_callout (this=0x2aaadccc4bf0, 
> event=6, data=0x0) at HttpSM.cc:1455
> #4  0x005b980b in HttpSM::state_api_callback (this=0x2aaadccc4bf0, 
> event=6, data=0x0) at HttpSM.cc:1275
> #5  0x004d7a1b in TSHttpTxnReenable (txnp=0x2aaadccc4bf0, 
> event=TS_EVENT_HTTP_CONTINUE) at InkAPI.cc:5614
> #6  0x2ba118441c89 in cachefun (contp=, event= optimized out>, edata=0x2aaadccc4bf0) at main.cpp:1876
> #7  0x005b4466 in HttpSM::state_api_callout (this=0x2aaadccc4bf0, 
> event=, data=) at HttpSM.cc:1381
> #8  0x005b627d in HttpSM::do_http_server_open (this=0x2aaadccc4bf0, 
> raw=) at HttpSM.cc:4639
> #9  0x005baa04 in HttpSM::set_next_state (this=0x2aaadccc4bf0) at 
> HttpSM.cc:7021
> #10 0x005b25a3 in HttpSM::state_cache_open_write 
> (this=0x2aaadccc4bf0, event=1108, data=0x2aab1c3b6800) at HttpSM.cc:2442
> #11 0x005b5b28 in HttpSM::main_handler (this=0x2aaadccc4bf0, 
> event=1108, data=0x2aab1c3b6800) at HttpSM.cc:2554
> #12 0x0059338a in handleEvent (this=0x2aaadccc6618, event= optimized out>, data=0x2aab1c3b6800) at 
> ../../iocore/eventsystem/I_Continuation.h:145
> #13 HttpCacheSM::state_cache_open_write (this=0x2aaadccc6618, event= optimized out>, data=0x2aab1c3b6800) at HttpCacheSM.cc:167
> #14 0x00697223 in handleEvent (this=0x2aab1c3b6800, event= optimized out>) at ../../iocore/eventsystem/I_Continuation.h:145
> #15 CacheVC::callcont (this=0x2aab1c3b6800, event=) at 
> ../../iocore/cache/P_CacheInternal.h:662
> #16 0x00715940 in Cache::open_write (this=, 
> cont=, key=0x2ba0ff762d70, info=, 
> apin_in_cache=46914401429576, type=CACHE_FRAG_TYPE_HTTP, 
> hostname=0x2aaadd281078 
> "www.mifangba.comhttpapi.phpwww.mifangba.comhttp://www.mifangba.com/api.php?op=count=4=12;,
>  host_len=16) at CacheWrite.cc:1788
> #17 0x006e5765 in open_write (this=, 
> cont=0x2aaadccc6618, expected_size=, url=0x2aaadccc5310, 
> cluster_cache_local=, request=, 
> old_info=0x0, pin_in_cache=0, type=CACHE_FRAG_TYPE_HTTP) at 
> P_CacheInternal.h:1093
> #18 CacheProcessor::open_write (this=, 
> cont=0x2aaadccc6618, expected_size=, url=0x2aaadccc5310, 
> cluster_cache_local=, request=, 
> old_info=0x0, pin_in_cache=0, type=CACHE_FRAG_TYPE_HTTP) at Cache.cc:3622
> #19 0x005936f0 in HttpCacheSM::open_write (this=0x2aaadccc6618, 
> url=, request=, old_info= optimized out>, 
> pin_in_cache=, retry=, 
> allow_multiple=false) at HttpCacheSM.cc:298
> #20 0x005a022e in HttpSM::do_cache_prepare_action 
> (this=0x2aaadccc4bf0, c_sm=0x2aaadccc6618, object_read_info=0x0, retry=true, 
> allow_multiple=false) at HttpSM.cc:4511
> #21 0x005babd9 in do_cache_prepare_write (this=0x2aaadccc4bf0) at 
> HttpSM.cc:4436
> #22 HttpSM::set_next_state (this=0x2aaadccc4bf0) at HttpSM.cc:7098
> #23 0x005b3f5f in HttpSM::handle_api_return (this=0x2aaadccc4bf0) at 
> HttpSM.cc:1517
> #24 0x005b45f8 in HttpSM::state_api_callout (this=0x2aaadccc4bf0, 
> event=0, data=0x0) at HttpSM.cc:1455
> #25 0x005ba712 in HttpSM::set_next_state (this=0x2aaadccc4bf0) at 
> HttpSM.cc:6876
> #26 0x005ba702 in HttpSM::set_next_state (this=0x2aaadccc4bf0) at 
> HttpSM.cc:6919
> #27 0x005b3f5f in HttpSM::handle_api_return (this=0x2aaadccc4bf0) at 
> HttpSM.cc:1517
> #28 0x005b45f8 in HttpSM::state_api_callout (this=0x2aaadccc4bf0, 
> event=6, data=0x0) at HttpSM.cc:1455
> #29 0x005b980b in HttpSM::state_api_callback (this=0x2aaadccc4bf0, 
> event=6, data=0x0) at HttpSM.cc:1275
> #30 0x004d7a1b in TSHttpTxnReenable (txnp=0x2aaadccc4bf0, 
> 

[jira] [Created] (TS-3894) Missing NULL checks in HttpSM::handle_server_setup_error

2015-09-04 Thread Susan Hinrichs (JIRA)
Susan Hinrichs created TS-3894:
--

 Summary: Missing NULL checks in HttpSM::handle_server_setup_error
 Key: TS-3894
 URL: https://issues.apache.org/jira/browse/TS-3894
 Project: Traffic Server
  Issue Type: Bug
  Components: HTTP
Reporter: Susan Hinrichs


In error cases, there may not be a consumer when expected.  Missing NULL checks 
on the consumer variable c can result in crashes.
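
A hedged sketch of the kind of guard implied (accessor names are taken from how HttpSM looks up tunnel consumers elsewhere and are an assumption; this is not the committed patch):

{code}
// In HttpSM::handle_server_setup_error: do not assume a consumer exists.
HttpTunnelConsumer *c = tunnel.get_consumer(server_entry->vc);
if (c == nullptr) {
  // The error arrived before a tunnel consumer was set up; skip the
  // consumer-specific cleanup instead of dereferencing a null pointer.
} else {
  // ... existing handling that uses c ...
}
{code}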



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (TS-3894) Missing NULL checks in HttpSM::handle_server_setup_error

2015-09-04 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs reassigned TS-3894:
--

Assignee: Susan Hinrichs

> Missing NULL checks in HttpSM::handle_server_setup_error
> 
>
> Key: TS-3894
> URL: https://issues.apache.org/jira/browse/TS-3894
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
>
> In error cases, there may not be a consumer when expected.  Missing NULL 
> checks on the consumer variable c can result in crashes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3871) VC Migration Can Lose Events

2015-09-04 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731423#comment-14731423
 ] 

Susan Hinrichs commented on TS-3871:


Have been running with this fix in production for approximately a week.

> VC Migration Can Lose Events
> 
>
> Key: TS-3871
> URL: https://issues.apache.org/jira/browse/TS-3871
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
> Fix For: 6.1.0
>
> Attachments: ts-3871.diff
>
>
> Found this in my stress testing.  Sometimes the POST or GET response is 
> completely empty.  No header and no body.  The packet capture shows that ATS 
> closes the connection 70 seconds after the last POST or GET of the connection 
> was received.  This corresponds to the 
> proxy.config.http.keep_alive_no_activity_timeout_in on my test box.
> I moved from global pool to local pool and the problem went away.
> I eventually tracked it down to a problem in the epoll update.  ep.start() 
> during the migration would fail sometimes with EEXIST error.  This means that 
> the file descriptor is already associated with the epoll.  If we are 
> migrating from thread A to thread B this should not be the case.  Unless we 
> went from thread B to thread A and back to thread B without cleaning up the 
> original thread B epoll.  If this is happening, then multiple threads will be 
> processing network events which seems like a recipe for disaster and dropped 
> events.
> Originally, I left the ep.stop() which clears the epoll on the original 
> thread's epoll structure to be done by the original thread.  But under stress 
> that seems to be a bad idea.  Too much drift.  With some more research, it 
> appears that the epoll calls are thread safe.
> http://linux.derkeiler.com/Mailing-Lists/Kernel/2006-03/msg00084.html
> I rearranged the code to do both the ep.stop() and ep.start() in the same 
> migrating target thread, and my stress test had no more problems.
> I've run this patch on a production machine for over 12 hours with no crashes 
> and no performance discrepancies.  We will be expanding this testing.
> To repeat, this is not a problem we saw in production, but only in my "make 
> it fall over" stress test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3486) Segfault in do_io_write with plugin (??)

2015-09-03 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729021#comment-14729021
 ] 

Susan Hinrichs commented on TS-3486:


Should be fine.  The change of server_session_sharing_pool from overridable to 
non-overridable was a clean-up side effect of the great server_session_sharing 
specification debate (specifically TS-3712).  

For this crash fix, we need to determine whether the pool is global or per 
thread.  That it is read from the override config in 5.3 is not a big issue as 
long as we are reading it from the correct location for the build.
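
A hedged sketch of that check (the enum value is the public TS API one; reading it off txn_conf is exactly what the earlier comment suggests):

{code}
// Determine which pool this transaction should use, honoring the (5.3)
// overridable setting.
bool use_global_pool =
  (sm->t_state.txn_conf->server_session_sharing_pool == TS_SERVER_SESSION_SHARING_POOL_GLOBAL);

if (use_global_pool) {
  // look the server session up in / release it to the process-wide pool
} else {
  // use the pool owned by the current ET_NET thread
}
{code}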

> Segfault in do_io_write with plugin (??)
> 
>
> Key: TS-3486
> URL: https://issues.apache.org/jira/browse/TS-3486
> Project: Traffic Server
>  Issue Type: Bug
>Affects Versions: 5.2.0, 5.3.0
>Reporter: Qiang Li
>Assignee: Susan Hinrichs
>  Labels: crash
> Fix For: 5.3.2, 6.0.0
>
> Attachments: ts-3266-2.diff, ts-3266-complete.diff, 
> ts3486-ptrace.txt.gz
>
>
> {code}
> (gdb) bt
> #0  0x005bdb8b in HttpServerSession::do_io_write (this= optimized out>, c=0x2aaadccc4bf0, nbytes=576, buf=0x2aaafc2ffee8, owner=false)
> at HttpServerSession.cc:104
> #1  0x005acc1d in HttpSM::setup_server_send_request 
> (this=0x2aaadccc4bf0) at HttpSM.cc:5686
> #2  0x005b3f85 in HttpSM::handle_api_return (this=0x2aaadccc4bf0) at 
> HttpSM.cc:1520
> #3  0x005b45f8 in HttpSM::state_api_callout (this=0x2aaadccc4bf0, 
> event=6, data=0x0) at HttpSM.cc:1455
> #4  0x005b980b in HttpSM::state_api_callback (this=0x2aaadccc4bf0, 
> event=6, data=0x0) at HttpSM.cc:1275
> #5  0x004d7a1b in TSHttpTxnReenable (txnp=0x2aaadccc4bf0, 
> event=TS_EVENT_HTTP_CONTINUE) at InkAPI.cc:5614
> #6  0x2ba118441c89 in cachefun (contp=, event= optimized out>, edata=0x2aaadccc4bf0) at main.cpp:1876
> #7  0x005b4466 in HttpSM::state_api_callout (this=0x2aaadccc4bf0, 
> event=, data=) at HttpSM.cc:1381
> #8  0x005b627d in HttpSM::do_http_server_open (this=0x2aaadccc4bf0, 
> raw=) at HttpSM.cc:4639
> #9  0x005baa04 in HttpSM::set_next_state (this=0x2aaadccc4bf0) at 
> HttpSM.cc:7021
> #10 0x005b25a3 in HttpSM::state_cache_open_write 
> (this=0x2aaadccc4bf0, event=1108, data=0x2aab1c3b6800) at HttpSM.cc:2442
> #11 0x005b5b28 in HttpSM::main_handler (this=0x2aaadccc4bf0, 
> event=1108, data=0x2aab1c3b6800) at HttpSM.cc:2554
> #12 0x0059338a in handleEvent (this=0x2aaadccc6618, event= optimized out>, data=0x2aab1c3b6800) at 
> ../../iocore/eventsystem/I_Continuation.h:145
> #13 HttpCacheSM::state_cache_open_write (this=0x2aaadccc6618, event= optimized out>, data=0x2aab1c3b6800) at HttpCacheSM.cc:167
> #14 0x00697223 in handleEvent (this=0x2aab1c3b6800, event= optimized out>) at ../../iocore/eventsystem/I_Continuation.h:145
> #15 CacheVC::callcont (this=0x2aab1c3b6800, event=) at 
> ../../iocore/cache/P_CacheInternal.h:662
> #16 0x00715940 in Cache::open_write (this=, 
> cont=, key=0x2ba0ff762d70, info=, 
> apin_in_cache=46914401429576, type=CACHE_FRAG_TYPE_HTTP, 
> hostname=0x2aaadd281078 
> "www.mifangba.comhttpapi.phpwww.mifangba.comhttp://www.mifangba.com/api.php?op=count=4=12;,
>  host_len=16) at CacheWrite.cc:1788
> #17 0x006e5765 in open_write (this=, 
> cont=0x2aaadccc6618, expected_size=, url=0x2aaadccc5310, 
> cluster_cache_local=, request=, 
> old_info=0x0, pin_in_cache=0, type=CACHE_FRAG_TYPE_HTTP) at 
> P_CacheInternal.h:1093
> #18 CacheProcessor::open_write (this=, 
> cont=0x2aaadccc6618, expected_size=, url=0x2aaadccc5310, 
> cluster_cache_local=, request=, 
> old_info=0x0, pin_in_cache=0, type=CACHE_FRAG_TYPE_HTTP) at Cache.cc:3622
> #19 0x005936f0 in HttpCacheSM::open_write (this=0x2aaadccc6618, 
> url=, request=, old_info= optimized out>, 
> pin_in_cache=, retry=, 
> allow_multiple=false) at HttpCacheSM.cc:298
> #20 0x005a022e in HttpSM::do_cache_prepare_action 
> (this=0x2aaadccc4bf0, c_sm=0x2aaadccc6618, object_read_info=0x0, retry=true, 
> allow_multiple=false) at HttpSM.cc:4511
> #21 0x005babd9 in do_cache_prepare_write (this=0x2aaadccc4bf0) at 
> HttpSM.cc:4436
> #22 HttpSM::set_next_state (this=0x2aaadccc4bf0) at HttpSM.cc:7098
> #23 0x005b3f5f in HttpSM::handle_api_return (this=0x2aaadccc4bf0) at 
> HttpSM.cc:1517
> #24 0x005b45f8 in HttpSM::state_api_callout (this=0x2aaadccc4bf0, 
> event=0, data=0x0) at HttpSM.cc:1455
> #25 0x005ba712 in HttpSM::set_next_state (this=0x2aaadccc4bf0) at 
> HttpSM.cc:6876
> #26 0x005ba702 in HttpSM::set_next_state (this=0x2aaadccc4bf0) at 
> HttpSM.cc:6919
> #27 0x005b3f5f in HttpSM::handle_api_return (this=0x2aaadccc4bf0) at 
> HttpSM.cc:1517
> #28 0x005b45f8 in 

[jira] [Created] (TS-3901) Leaking connections from HttpSessionManager

2015-09-10 Thread Susan Hinrichs (JIRA)
Susan Hinrichs created TS-3901:
--

 Summary: Leaking connections from HttpSessionManager
 Key: TS-3901
 URL: https://issues.apache.org/jira/browse/TS-3901
 Project: Traffic Server
  Issue Type: Bug
  Components: HTTP
Reporter: Susan Hinrichs


Observed in production.  Got the following warnings in diags.log

"Connection leak from http keep-alive system"

Our connections to origin would increase and the number of connections in 
CLOSE_WAIT were enormous.

I think the issue was when the origin URL was http with default port.  That URL 
was remapped to https with default port.  The default port stored in 
HttpServerSession->server_ip was not updated.  

When the connection was closed or timed out of the session pool, it would be 
looked up with port 443.   But the session was stored via the server_ip value 
with port 80 and would never match.

Relatively small change in HTTPHdr::_file_target_cache. 

Running the fix in production to verify early results.
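
A self-contained illustration of the port mismatch (hypothetical names, not the actual change to HTTPHdr::_file_target_cache): the default port has to be re-derived from the scheme that is current after remap, otherwise the pool key is built with 80 while the lookup uses 443:

{code}
#include <cstdint>
#include <string>

struct PoolKey {
  std::string host;
  uint16_t    port;  // stored alongside server_ip in the session pool
};

static uint16_t default_port_for(const std::string &scheme)
{
  return scheme == "https" ? 443 : 80;
}

// explicit_port == 0 means the URL carried no port; fall back to the default of
// the *post-remap* scheme so the store and the lookup agree.
PoolKey make_pool_key(const std::string &scheme, const std::string &host, uint16_t explicit_port)
{
  uint16_t port = (explicit_port != 0) ? explicit_port : default_port_for(scheme);
  return PoolKey{host, port};
}
{code}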



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TS-3871) VC Migration Can Lose Events

2015-09-10 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-3871.

Resolution: Fixed

> VC Migration Can Lose Events
> 
>
> Key: TS-3871
> URL: https://issues.apache.org/jira/browse/TS-3871
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
> Fix For: 6.1.0
>
> Attachments: ts-3871.diff
>
>
> Found this in my stress testing.  Sometimes the POST or GET response is 
> completely empty.  No header and no body.  The packet capture shows that ATS 
> closes the connection 70 seconds after the last POST or GET of the connection 
> was received.  This corresponds to the 
> proxy.config.http.keep_alive_no_activity_timeout_in on my test box.
> I moved from global pool to local pool and the problem went away.
> I eventually tracked it down to a problem in the epoll update.  ep.start() 
> during the migration would fail sometimes with EEXIST error.  This means that 
> the file descriptor is already associated with the epoll.  If we are 
> migrating from thread A to thread B this should not be the case.  Unless we 
> went from thread B to thread A and back to thread B without cleaning up the 
> original thread B epoll.  If this is happening, then multiple threads will be 
> processing network events which seems like a recipe for disaster and dropped 
> events.
> Originally, I left the ep.stop() which clears the epoll on the original 
> thread's epoll structure to be done by the original thread.  But under stress 
> that seems to be a bad idea.  Too much drift.  With some more research, it 
> appears that the epoll calls are thread safe.
> http://linux.derkeiler.com/Mailing-Lists/Kernel/2006-03/msg00084.html
> I rearranged the code to do both the ep.stop() and ep.start() in the same 
> migrating target thread, and my stress test had no more problems.
> I've run this patch on a production machine for over 12 hours with no crashes 
> and no performance discrepancies.  We will be expanding this testing.
> To repeat, this is not a problem we saw in production, but only in my "make 
> it fall over" stress test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (TS-3901) Leaking connections from HttpSessionManager

2015-09-10 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs reassigned TS-3901:
--

Assignee: Susan Hinrichs

> Leaking connections from HttpSessionManager
> ---
>
> Key: TS-3901
> URL: https://issues.apache.org/jira/browse/TS-3901
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
>
> Observed in production.  Got the following warnings in diags.log
> "Connection leak from http keep-alive system"
> Our connections to origin would increase and the number of connections in 
> CLOSE_WAIT were enormous.
> I think the issue was when the origin URL was http with default port.  That 
> URL was remapped to https with default port.  The default port stored in 
> HttpServerSession->server_ip was not updated.  
> When the connection was closed or timed out of the session pool, it would be 
> looked up with port 443.   But the session was stored via the server_ip value 
> with port 80 and would never match.
> Relatively small change in HTTPHdr::_file_target_cache. 
> Running the fix in production to verify early results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (TS-3905) proxy.config.http.keep_alive_no_activity_timeout_out is not used

2015-09-12 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs closed TS-3905.
--
Resolution: Duplicate

> proxy.config.http.keep_alive_no_activity_timeout_out is not used
> 
>
> Key: TS-3905
> URL: https://issues.apache.org/jira/browse/TS-3905
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
> Fix For: 6.1.0
>
>
> The keep_alive_no_activity_timeout_in is set correctly on the 
> HttpClientSession when the transaction releases it.  The client session is 
> then hanging out until the next transaction appears, and the 
> keep_alive_no_activity_timeout_in should apply instead of the 
> transaction_no_activity_timeout_in.
> For the server session side, the keep_alive_no_activity_timeout_out and 
> transaction_no_activity_timeout_out should apply.  The 
> keep_alive_no_activity_timeout_out does get set correctly when the server 
> session is attached to the client session, via the 
> HttpClientSession::attach_server_session() method.
> But in ServerSessionPool::releaseSession, the following is called
> {code}
> ss->get_netvc()->set_inactivity_timeout(ss->get_netvc()->get_inactivity_timeout());
> {code}
> My reading is that this will reset the inactivity timeout of the server 
> session to whatever it was last set to.  Instead it should set the inactivity 
> timeout to keep_alive_no_activity_timeout_out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3905) proxy.config.http.keep_alive_no_activity_timeout_out is not used

2015-09-12 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742160#comment-14742160
 ] 

Susan Hinrichs commented on TS-3905:


[~zwoop] you are right.  TS-3312 addresses this issue.  I filed this bug based 
on code inspection.  It didn't appear that we were using the 
keep_alive_timeout_out parameter.  But reviewing this bug, I see that we are 
using that parameter although in kind of an odd path to enable parameter 
override.

I'll verify it runs for my scenario Monday, but I assume if it meets LinkedIn's 
needs it will also work for me, so I'm closing as a duplicate.

> proxy.config.http.keep_alive_no_activity_timeout_out is not used
> 
>
> Key: TS-3905
> URL: https://issues.apache.org/jira/browse/TS-3905
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
> Fix For: 6.1.0
>
>
> The keep_alive_no_activity_timeout_in is set correctly on the 
> HttpClientSession when the transaction releases it.  The client session is 
> then hanging out until the next transaction appears, and the 
> keep_alive_no_activity_timeout_in should apply instead of the 
> transaction_no_activity_timeout_in.
> For the server session side, the keep_alive_no_activity_timeout_out and 
> transaction_no_activity_timeout_out should apply.  The 
> keep_alive_no_activity_timeout_out does get set correctly when the server 
> session is attached to the client session, via the 
> HttpClientSession::attach_server_session() method.
> But in ServerSessionPool::releaseSession, the following is called
> {code}
> ss->get_netvc()->set_inactivity_timeout(ss->get_netvc()->get_inactivity_timeout());
> {code}
> My reading is that this will reset the inactivity timeout of the server 
> session to whatever it was last set to.  Instead it should set the inactivity 
> timeout to keep_alive_no_activity_timeout_out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (TS-3898) Connection to the origin can allocate 1MB of iobuffers

2015-09-14 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs reassigned TS-3898:
--

Assignee: Susan Hinrichs  (was: kang li)

> Connection to the origin can allocate 1MB of iobuffers
> --
>
> Key: TS-3898
> URL: https://issues.apache.org/jira/browse/TS-3898
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: HTTP
>Affects Versions: 5.3.0, 6.0.0
>Reporter: Bryan Call
>Assignee: Susan Hinrichs
>  Labels: yahoo
> Fix For: 6.1.0
>
>
> When connecting to an origin there can be 1MB of iobuffers allocated.  This 
> happens under TLS and non-TLS. Seems like it happens when the origin doesn't 
> supply a content-length.  More investigation is needed.
> Configuration:
> {code}
> [bcall@homer trafficserver]$ tail -1 /usr/local/etc/trafficserver/remap.config
> map / https://www.flickr.com
> {code}
> Client:
> {code}
> [bcall@homer trafficserver]$ curl -D - -k https://127.0.0.1:4443/
> [bcall@homer trafficserver]$ sudo kill -SIGUSR1 $(pidof traffic_server){code}
> Server:
> {code}
>  allocated  |in-use  | type size  |   free list name
> --------------------|--------------------|------------|----------------------------------
> 1048576 |  0 |  32768 | 
> memory/ioBufAllocator[8]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (TS-3898) Connection to the origin can allocate 1MB of iobuffers

2015-09-14 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs reassigned TS-3898:
--

Assignee: Susan Hinrichs

> Connection to the origin can allocate 1MB of iobuffers
> --
>
> Key: TS-3898
> URL: https://issues.apache.org/jira/browse/TS-3898
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: HTTP
>Affects Versions: 5.3.0, 6.0.0
>Reporter: Bryan Call
>Assignee: Susan Hinrichs
>  Labels: yahoo
> Fix For: 6.1.0
>
>
> When connecting to an origin there can be 1MB of iobuffers allocated.  This 
> happens under TLS and non-TLS. Seems like it happens when the origin doesn't 
> supply a content-length.  More investigation is needed.
> Configuration:
> {code}
> [bcall@homer trafficserver]$ tail -1 /usr/local/etc/trafficserver/remap.config
> map / https://www.flickr.com
> {code}
> Client:
> {code}
> [bcall@homer trafficserver]$ curl -D - -k https://127.0.0.1:4443/
> [bcall@homer trafficserver]$ sudo kill -SIGUSR1 $(pidof traffic_server){code}
> Server:
> {code}
>  allocated  |in-use  | type size  |   free list name
> --------------------|--------------------|------------|----------------------------------
> 1048576 |  0 |  32768 | 
> memory/ioBufAllocator[8]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-3898) Connection to the origin can allocate 1MB of iobuffers

2015-09-14 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-3898:
---
Assignee: Bryan Call  (was: Susan Hinrichs)

> Connection to the origin can allocate 1MB of iobuffers
> --
>
> Key: TS-3898
> URL: https://issues.apache.org/jira/browse/TS-3898
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: HTTP
>Affects Versions: 5.3.0, 6.0.0
>Reporter: Bryan Call
>Assignee: Bryan Call
>  Labels: yahoo
> Fix For: 6.1.0
>
>
> When connecting to an origin there can be 1MB of iobuffers allocated.  This 
> happens under TLS and non-TLS. Seems like it happens when the origin doesn't 
> supply a content-length.  More investigation is needed.
> Configuration:
> {code}
> [bcall@homer trafficserver]$ tail -1 /usr/local/etc/trafficserver/remap.config
> map / https://www.flickr.com
> {code}
> Client:
> {code}
> [bcall@homer trafficserver]$ curl -D - -k https://127.0.0.1:4443/
> [bcall@homer trafficserver]$ sudo kill -SIGUSR1 $(pidof traffic_server){code}
> Server:
> {code}
>  allocated  |in-use  | type size  |   free list name
> --------------------|--------------------|------------|----------------------------------
> 1048576 |  0 |  32768 | 
> memory/ioBufAllocator[8]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-3898) Connection to the origin can allocate 1MB of iobuffers

2015-09-14 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-3898:
---
Assignee: kang li  (was: Susan Hinrichs)

> Connection to the origin can allocate 1MB of iobuffers
> --
>
> Key: TS-3898
> URL: https://issues.apache.org/jira/browse/TS-3898
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: HTTP
>Affects Versions: 5.3.0, 6.0.0
>Reporter: Bryan Call
>Assignee: kang li
>  Labels: yahoo
> Fix For: 6.1.0
>
>
> When connecting to an origin there can be 1MB of iobuffers allocated.  This 
> happens under TLS and non-TLS. Seems like it happens when the origin doesn't 
> supply a content-length.  More investigation is needed.
> Configuration:
> {code}
> [bcall@homer trafficserver]$ tail -1 /usr/local/etc/trafficserver/remap.config
> map / https://www.flickr.com
> {code}
> Client:
> {code}
> [bcall@homer trafficserver]$ curl -D - -k https://127.0.0.1:4443/
> [bcall@homer trafficserver]$ sudo kill -SIGUSR1 $(pidof traffic_server){code}
> Server:
> {code}
>  allocated  |in-use  | type size  |   free list name
> --------------------|--------------------|------------|----------------------------------
> 1048576 |  0 |  32768 | 
> memory/ioBufAllocator[8]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-3909) SSLNextProtocolTrampoline heap-use-after-free

2015-09-14 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-3909:
---
Attachment: ts-3909.diff

The patch in ts-3909.diff has been useful in reducing (possibly eliminating) the 
crash corresponding to this ASAN report in production.

As I recall, we tried a similar patch for ts-3710, but it did not eliminate the 
ASAN report.

> SSLNextProtocolTrampoline heap-use-after-free
> -
>
> Key: TS-3909
> URL: https://issues.apache.org/jira/browse/TS-3909
> Project: Traffic Server
>  Issue Type: Bug
>  Components: SSL
>Affects Versions: 6.0.0
>Reporter: Bryan Call
>Assignee: Susan Hinrichs
> Fix For: 6.0.0
>
> Attachments: ts-3909.diff
>
>
> {code}
> ==6232==ERROR: AddressSanitizer: heap-use-after-free on address 
> 0x606000538880 at pc 0x9c851c bp 0x2ac88a2d4880 sp 0x2ac88a2d4878
> READ of size 8 at 0x606000538880 thread T24 ([ET_NET 23])
> #0 0x9c851b in SSLNextProtocolTrampoline::ioCompletionEvent(int, void*) 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/SSLNextProtocolAccept.cc:108
> #1 0x531046 in Continuation::handleEvent(int, void*) 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/eventsystem/I_Continuation.h:146
> #2 0x9f4040 in read_signal_and_update 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixNetVConnection.cc:145
> #3 0x9f46f4 in read_signal_done 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixNetVConnection.cc:206
> #4 0x9fa8a1 in UnixNetVConnection::readSignalDone(int, NetHandler*) 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixNetVConnection.cc:1006
> #5 0x9bdd96 in SSLNetVConnection::net_read_io(NetHandler*, EThread*) 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/SSLNetVConnection.cc:542
> #6 0x9e1a02 in NetHandler::mainNetEvent(int, Event*) 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixNet.cc:516
> #7 0x531046 in Continuation::handleEvent(int, void*) 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/eventsystem/I_Continuation.h:146
> #8 0xa405e4 in EThread::process_event(Event*, int) 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/eventsystem/UnixEThread.cc:128
> #9 0xa411fc in EThread::execute() 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/eventsystem/UnixEThread.cc:252
> #10 0xa3ebbd in spawn_thread_internal 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/eventsystem/Thread.cc:86
> #11 0x2ac87d9badf4 in start_thread (/lib64/libpthread.so.0+0x7df4)
> #12 0x2ac87e74b1ac in __clone (/lib64/libc.so.6+0xf61ac)
> 0x606000538880 is located 0 bytes inside of 56-byte region 
> [0x606000538880,0x6060005388b8)
> freed by thread T24 ([ET_NET 23]) here:
> #0 0x2ac87acd6127 in operator delete(void*) 
> ../../.././libsanitizer/asan/asan_new_delete.cc:81
> #1 0x9c8613 in SSLNextProtocolTrampoline::~SSLNextProtocolTrampoline() 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/SSLNextProtocolAccept.cc:66
> #2 0x9c83ea in SSLNextProtocolTrampoline::ioCompletionEvent(int, void*) 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/SSLNextProtocolAccept.cc:89
> #3 0x531046 in Continuation::handleEvent(int, void*) 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/eventsystem/I_Continuation.h:146
> #4 0x9f4040 in read_signal_and_update 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixNetVConnection.cc:145
> #5 0x9fbe75 in UnixNetVConnection::mainEvent(int, Event*) 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixNetVConnection.cc:1175
> #6 0x531046 in Continuation::handleEvent(int, void*) 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/eventsystem/I_Continuation.h:146
> #7 0x9e35e4 in NetHandler::_close_vc(UnixNetVConnection*, long, int&, 
> int&, int&, int&) 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixNet.cc:678
> #8 0x9e2c01 in NetHandler::manage_keep_alive_queue() 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixNet.cc:634
> #9 0x9e3882 in NetHandler::add_to_keep_alive_queue(UnixNetVConnection*) 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixNet.cc:699
> #10 0x9ddb48 in UnixNetVConnection::add_to_keep_alive_queue() 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixConnection.cc:397
> #11 0x759044 in SpdyClientSession::init(NetVConnection*) 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/proxy/spdy/SpdyClientSession.cc:116
> #12 0x7598da in SpdyClientSession::new_connection(NetVConnection*, 
> MIOBuffer*, IOBufferReader*, bool) 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/proxy/spdy/SpdyClientSession.cc:193
> #13 0x7582dc in SpdySessionAccept::mainEvent(int, void*) 
> 

[jira] [Commented] (TS-3905) proxy.config.http.keep_alive_no_activity_timeout_out is not used

2015-09-11 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14741497#comment-14741497
 ] 

Susan Hinrichs commented on TS-3905:


Actually as I look into get_inactivity_timeout some more, I question the 
validity of the code above in all cases.  The three instances of that line in 
HttpSessionManager.cc should be reviewed.

> proxy.config.http.keep_alive_no_activity_timeout_out is not used
> 
>
> Key: TS-3905
> URL: https://issues.apache.org/jira/browse/TS-3905
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
>
> The keep_alive_no_activity_timeout_in is set correctly on the 
> HttpClientSession when the transaction releases it.  The client session is 
> then hanging out until the next transaction appears, and the 
> keep_alive_no_activity_timeout_in should apply instead of the 
> transaction_no_activity_timeout_in.
> For the server session side, the keep_alive_no_activity_timeout_out and 
> transaction_no_activity_timeout_out should apply.  The 
> keep_alive_no_activity_timeout_out does get set correctly when the server 
> session is attached to the client session, via the 
> HttpClientSession::attach_server_session() method.
> But in ServerSessionPool::releaseSession, the following is called
> {code}
> ss->get_netvc()->set_inactivity_timeout(ss->get_netvc()->get_inactivity_timeout());
> {code}
> My reading is that this will reset the inactivity timeout of the server 
> session to whatever it was last set to.  Instead it should set the inactivity 
> timeout to keep_alive_no_activity_timeout_out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TS-3905) proxy.config.http.keep_alive_no_activity_timeout_out is not used

2015-09-11 Thread Susan Hinrichs (JIRA)
Susan Hinrichs created TS-3905:
--

 Summary: proxy.config.http.keep_alive_no_activity_timeout_out is 
not used
 Key: TS-3905
 URL: https://issues.apache.org/jira/browse/TS-3905
 Project: Traffic Server
  Issue Type: Bug
  Components: HTTP
Reporter: Susan Hinrichs


The keep_alive_no_activity_timeout_in is set correctly on the HttpClientSession 
when the transaction releases it.  The client session is then hanging out until 
the next transaction appears, and the keep_alive_no_activity_timeout_in should 
apply instead of the transaction_no_activity_timeout_in.

For the server session side, the keep_alive_no_activity_timeout_out and 
transaction_no_activity_timeout_out should apply.  The 
keep_alive_no_activity_timeout_out does get set correctly when the server 
session is attached to the client session, via the 
HttpClientSession::attach_server_session() method.

But in ServerSessionPool::releaseSession, the following is called

{code}
ss->get_netvc()->set_inactivity_timeout(ss->get_netvc()->get_inactivity_timeout());
{code}

My reading is that this will reset the inactivity timeout of the server session 
to whatever it was last set to.  Instead it should set the inactivity timeout 
to keep_alive_no_activity_timeout_out.
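
A hedged sketch of the substitution the last paragraph suggests (where the configured value is read from is left as an assumption):

{code}
// ServerSessionPool::releaseSession: instead of echoing back whatever timeout
// is currently active,
//   ss->get_netvc()->set_inactivity_timeout(ss->get_netvc()->get_inactivity_timeout());
// apply the keep-alive no-activity timeout while the session sits in the pool.
ink_hrtime ka_timeout = HRTIME_SECONDS(keep_alive_no_activity_timeout_out);  // configured value
ss->get_netvc()->set_inactivity_timeout(ka_timeout);
{code}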




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (TS-3905) proxy.config.http.keep_alive_no_activity_timeout_out is not used

2015-09-11 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs reassigned TS-3905:
--

Assignee: Susan Hinrichs

> proxy.config.http.keep_alive_no_activity_timeout_out is not used
> 
>
> Key: TS-3905
> URL: https://issues.apache.org/jira/browse/TS-3905
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
>
> The keep_alive_no_activity_timeout_in is set correctly on the 
> HttpClientSession when the transaction releases it.  The client session is 
> then hanging out until the next transaction appears, and the 
> keep_alive_no_activity_timeout_in should apply instead of the 
> transaction_no_activity_timeout_in.
> For the server session side, the keep_alive_no_activity_timeout_out and 
> transaction_no_activity_timeout_out should apply.  The 
> keep_alive_no_activity_timeout_out does get set correctly when the server 
> session is attached to the client session, via the 
> HttpClientSession::attach_server_session() method.
> But in ServerSessionPool::releaseSession, the following is called
> {code}
> ss->get_netvc()->set_inactivity_timeout(ss->get_netvc()->get_inactivity_timeout());
> {code}
> My reading is that this will reset the inactivity timeout of the server 
> session to whatever it was last set to.  Instead it should set the inactivity 
> timeout to keep_alive_no_activity_timeout_out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3910) SSLNetVConnection and add_to_active_queue heap-use-after-free

2015-09-14 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14744385#comment-14744385
 ] 

Susan Hinrichs commented on TS-3910:


The vc is freed from read_signal_and_update by calling close_UnixNetVConnection 
directly.  Should probably call vc->do_io_close to clear the VIOs.  But I think 
that would only delay the problem.

The use stack is interesting.  The vc first referenced in frame 10 is not the 
freed vc.  Rather, the HttpClientSession::client_vc is the freed vc.  I cannot 
immediately see how vc != HttpClientSession::client_vc when the 
HttpClientSession was stored as the _cont for the read_vio associated with vc.  
But that must be what is happening.


> SSLNetVConnection and add_to_active_queue heap-use-after-free
> -
>
> Key: TS-3910
> URL: https://issues.apache.org/jira/browse/TS-3910
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Network, SSL
>Affects Versions: 6.0.0
>Reporter: Bryan Call
> Fix For: 6.0.0
>
>
> {code}
> ==15615==ERROR: AddressSanitizer: heap-use-after-free on address 
> 0x618000be6288 at pc 0x9e756d bp 0x2b14e4f317d0 sp 0x2b14e4f317c8
> WRITE of size 8 at 0x618000be6288 thread T6 ([ET_NET 5])
> #0 0x9e756c in DLL<UnixNetVConnection, UnixNetVConnection::Link_active_queue_link>::insert(UnixNetVConnection*, 
> UnixNetVConnection*) (/home/y/bin64/traffic_server+0x9e756c)
> #1 0x9e6b98 in Queue<UnixNetVConnection, UnixNetVConnection::Link_active_queue_link>::insert(UnixNetVConnection*, 
> UnixNetVConnection*) (/home/y/bin64/traffic_server+0x9e6b98)
> #2 0x9e5fe2 in Queue<UnixNetVConnection, UnixNetVConnection::Link_active_queue_link>::enqueue(UnixNetVConnection*) 
> (/home/y/bin64/traffic_server+0x9e5fe2)
> #3 0x9e3cc8 in NetHandler::add_to_active_queue(UnixNetVConnection*) 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixNet.cc:733
> #4 0x9ddbe8 in UnixNetVConnection::add_to_active_queue() 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixConnection.cc:409
> #5 0x64b34c in HttpClientSession::new_transaction() 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/proxy/http/HttpClientSession.cc:124
> #6 0x64e27d in HttpClientSession::state_keep_alive(int, void*) 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/proxy/http/HttpClientSession.cc:415
> #7 0x531046 in Continuation::handleEvent(int, void*) 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/eventsystem/I_Continuation.h:146
> #8 0x9f4040 in read_signal_and_update 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixNetVConnection.cc:145
> #9 0x9fa8c3 in UnixNetVConnection::readSignalAndUpdate(int) 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixNetVConnection.cc:1013
> #10 0x9be342 in SSLNetVConnection::net_read_io(NetHandler*, EThread*) 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/SSLNetVConnection.cc:605
> #11 0x9e1a02 in NetHandler::mainNetEvent(int, Event*) 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixNet.cc:516
> #12 0x531046 in Continuation::handleEvent(int, void*) 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/eventsystem/I_Continuation.h:146
> #13 0xa405e4 in EThread::process_event(Event*, int) 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/eventsystem/UnixEThread.cc:128
> #14 0xa411fc in EThread::execute() 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/eventsystem/UnixEThread.cc:252
> #15 0xa3ebbd in spawn_thread_internal 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/eventsystem/Thread.cc:86
> #16 0x2b14dce95df4 in start_thread (/lib64/libpthread.so.0+0x7df4)
> #17 0x2b14ddc261ac in __clone (/lib64/libc.so.6+0xf61ac)
> 0x618000be6288 is located 520 bytes inside of 880-byte region 
> [0x618000be6080,0x618000be63f0)
> freed by thread T6 ([ET_NET 5]) here:
> #0 0x2b14da1b01d7 in __interceptor_free 
> ../../.././libsanitizer/asan/asan_malloc_linux.cc:62
> #1 0x2b14db0ab3b2 in ats_memalign_free 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/lib/ts/ink_memory.cc:139
> #2 0x2b14db0abf60 in ink_freelist_free 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/lib/ts/ink_queue.cc:292
> #3 0x9c7226 in 
> ClassAllocator<SSLNetVConnection>::free(SSLNetVConnection*) 
> (/home/y/bin64/traffic_server+0x9c7226)
> #4 0x9c1a72 in SSLNetVConnection::free(EThread*) 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/SSLNetVConnection.cc:936
> #5 0x9f3f81 in close_UnixNetVConnection(UnixNetVConnection*, EThread*) 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixNetVConnection.cc:134
> #6 0x9f42f6 in read_signal_and_update 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixNetVConnection.cc:164
> #7 0x9f46f4 in 

[jira] [Commented] (TS-3910) SSLNetVConnection and add_to_active_queue heap-use-after-free

2015-09-15 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745635#comment-14745635
 ] 

Susan Hinrichs commented on TS-3910:


One path that would allow close_UnixNetVConnection to free the VC but not 
take the VC out of the active_queue is if nh was NULL during the call to 
close_UnixNetVConnection.

Off-hand, I don't see how that could happen.  If the client_vc is being closed 
from the keep-alive pool, that implies it was fully set up.  It might be useful 
to put a release assert of nh != NULL into the code and see where this occurs.  
Or add some logic to the assert to check whether the active_queue_link is set 
or not.
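
A hedged sketch of the diagnostic suggested above (placement inside close_UnixNetVConnection is an assumption):

{code}
// Fail loudly if the NetHandler back-pointer is gone at close time, since that
// path would free the vc without pulling it off the active queue.
ink_release_assert(vc->nh != nullptr);
{code}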

> SSLNetVConnection and add_to_active_queue heap-use-after-free
> -
>
> Key: TS-3910
> URL: https://issues.apache.org/jira/browse/TS-3910
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Network, SSL
>Affects Versions: 6.0.0
>Reporter: Bryan Call
> Fix For: 6.0.0
>
>
> {code}
> ==15615==ERROR: AddressSanitizer: heap-use-after-free on address 
> 0x618000be6288 at pc 0x9e756d bp 0x2b14e4f317d0 sp 0x2b14e4f317c8
> WRITE of size 8 at 0x618000be6288 thread T6 ([ET_NET 5])
> #0 0x9e756c in DLL<UnixNetVConnection, UnixNetVConnection::Link_active_queue_link>::insert(UnixNetVConnection*, 
> UnixNetVConnection*) (/home/y/bin64/traffic_server+0x9e756c)
> #1 0x9e6b98 in Queue<UnixNetVConnection, UnixNetVConnection::Link_active_queue_link>::insert(UnixNetVConnection*, 
> UnixNetVConnection*) (/home/y/bin64/traffic_server+0x9e6b98)
> #2 0x9e5fe2 in Queue<UnixNetVConnection, UnixNetVConnection::Link_active_queue_link>::enqueue(UnixNetVConnection*) 
> (/home/y/bin64/traffic_server+0x9e5fe2)
> #3 0x9e3cc8 in NetHandler::add_to_active_queue(UnixNetVConnection*) 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixNet.cc:733
> #4 0x9ddbe8 in UnixNetVConnection::add_to_active_queue() 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixConnection.cc:409
> #5 0x64b34c in HttpClientSession::new_transaction() 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/proxy/http/HttpClientSession.cc:124
> #6 0x64e27d in HttpClientSession::state_keep_alive(int, void*) 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/proxy/http/HttpClientSession.cc:415
> #7 0x531046 in Continuation::handleEvent(int, void*) 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/eventsystem/I_Continuation.h:146
> #8 0x9f4040 in read_signal_and_update 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixNetVConnection.cc:145
> #9 0x9fa8c3 in UnixNetVConnection::readSignalAndUpdate(int) 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixNetVConnection.cc:1013
> #10 0x9be342 in SSLNetVConnection::net_read_io(NetHandler*, EThread*) 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/SSLNetVConnection.cc:605
> #11 0x9e1a02 in NetHandler::mainNetEvent(int, Event*) 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixNet.cc:516
> #12 0x531046 in Continuation::handleEvent(int, void*) 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/eventsystem/I_Continuation.h:146
> #13 0xa405e4 in EThread::process_event(Event*, int) 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/eventsystem/UnixEThread.cc:128
> #14 0xa411fc in EThread::execute() 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/eventsystem/UnixEThread.cc:252
> #15 0xa3ebbd in spawn_thread_internal 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/eventsystem/Thread.cc:86
> #16 0x2b14dce95df4 in start_thread (/lib64/libpthread.so.0+0x7df4)
> #17 0x2b14ddc261ac in __clone (/lib64/libc.so.6+0xf61ac)
> 0x618000be6288 is located 520 bytes inside of 880-byte region 
> [0x618000be6080,0x618000be63f0)
> freed by thread T6 ([ET_NET 5]) here:
> #0 0x2b14da1b01d7 in __interceptor_free 
> ../../.././libsanitizer/asan/asan_malloc_linux.cc:62
> #1 0x2b14db0ab3b2 in ats_memalign_free 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/lib/ts/ink_memory.cc:139
> #2 0x2b14db0abf60 in ink_freelist_free 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/lib/ts/ink_queue.cc:292
> #3 0x9c7226 in 
> ClassAllocator<SSLNetVConnection>::free(SSLNetVConnection*) 
> (/home/y/bin64/traffic_server+0x9c7226)
> #4 0x9c1a72 in SSLNetVConnection::free(EThread*) 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/SSLNetVConnection.cc:936
> #5 0x9f3f81 in close_UnixNetVConnection(UnixNetVConnection*, EThread*) 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixNetVConnection.cc:134
> #6 0x9f42f6 in read_signal_and_update 
> /home/bcall/ytrafficserver-6.0.x/trafficserver/iocore/net/UnixNetVConnection.cc:164
> #7 0x9f46f4 in read_signal_done 
> 

[jira] [Updated] (TS-3072) Debug logging for a single connection in production traffic.

2015-10-05 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-3072:
---
Attachment: ts-3072.diff

Re-activating the discussion.  We recently deployed what [~sudheerv] suggested 
in production while tracking yet another tedious user-specific crash.  

I've attached the patch in ts-3072.diff.  It is a surprisingly small code 
change.  We changed how debug.enabled is interpreted to minimize the 
performance impact if one is not using the debug.client_ip feature.  The 
client-ip value is only tested if debug.enabled is set to 2.  Regular full 
debugging happens with debug.enabled set to 1.  Nothing is checked if 
debug.enabled is set to 0.

It was incredibly useful while tracking down our most recent fire.  We didn't 
have to anticipate the need for a plugin.  We were able to change the client_ip 
setting without restarting ATS.  

[~amc] has ideas for generalizing this technique to "taint" VC's for other more 
detailed tracking/debugging/monitoring.
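
For reference, a minimal sketch of the gating described (simplified to IPv4 and free-standing names; not the actual ts-3072.diff):

{code}
#include <cstdint>

// debug.enabled == 1: regular full debugging.
// debug.enabled == 2: emit debug output only for the configured client IP.
// debug.enabled == 0: nothing is checked, so the fast path stays unchanged.
bool should_emit_debug(int debug_enabled, uint32_t client_ip, uint32_t debug_client_ip)
{
  if (debug_enabled == 1) {
    return true;
  }
  if (debug_enabled == 2) {
    return client_ip == debug_client_ip;
  }
  return false;
}
{code}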

> Debug logging for a single connection in production traffic.
> 
>
> Key: TS-3072
> URL: https://issues.apache.org/jira/browse/TS-3072
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Core, Logging
>Affects Versions: 5.0.1
>Reporter: Sudheer Vinukonda
>  Labels: Yahoo
> Fix For: sometime
>
> Attachments: ts-3072.diff
>
>
> Presently, when there's a production issue (e.g. TS-3049, TS-2983 etc), it is 
> really hard to isolate/debug with the high traffic. Turning on debug logs in 
> traffic is unfortunately not an option due to performance impacts. Even if 
> you took a performance hit and turned on the logs, it is just as hard to 
> separate out the logs for a single connection/transaction among the millions 
> of the logs output in a short period of time.
> I think it would be good if there's a way to turn on debug logs in a 
> controlled manner in production environment. One simple option is to support 
> a config setting for example, with a client-ip, which when set, would turn on 
> debug logs for any connection made by just that one client. If needed, 
> instead of one client-ip, we may allow configuring up to 'n' (say, 5) 
> client-ips. 
> If there are other ideas, please comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-3957) Core dump from SpdyClientSession::state_session_start

2015-10-05 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-3957:
---
Labels: yahoo  (was: )

> Core dump from SpdyClientSession::state_session_start
> -
>
> Key: TS-3957
> URL: https://issues.apache.org/jira/browse/TS-3957
> Project: Traffic Server
>  Issue Type: Bug
>  Components: SPDY
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
>  Labels: yahoo
>
> We see this in production on machines under swap, so the timings are very 
> distorted.
> {code}
> gdb) bt
> #0  0x in ?? ()
> #1  0x0064a5dc in SpdyClientSession::state_session_start 
> (this=0x2b234fbe8030)
> at SpdyClientSession.cc:211
> #2  0x00510e34 in Continuation::handleEvent (this=0x2b234fbe8030, 
> event=1, 
> data=0x2b23eda76630) at ../iocore/eventsystem/I_Continuation.h:145
> #3  0x0079a066 in EThread::process_event (this=0x2b21170a2010, 
> e=0x2b23eda76630, 
> calling_code=1) at UnixEThread.cc:128
> #4  0x0079a234 in EThread::execute (this=0x2b21170a2010) at 
> UnixEThread.cc:179
> #5  0x00799611 in spawn_thread_internal (a=0x12226a0) at Thread.cc:85
> #6  0x2b21153e19d1 in start_thread () from /lib64/libpthread.so.0
> #7  0x003827ee88fd in clone () from /lib64/libc.so.6
> {code}
> After poking around on the core some more [~amc] and I determined that the vc 
> referenced by the SpdyClientSession was a freed object (the vtable pointer 
> was swizzled out to be the freelist next pointer).
> We assume that the swapping is causing very odd event timing.  We replaced 
> the schedule_immediate with a direct call, and that seemed to solve our crash 
> in production.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TS-3957) Core dump from SpdyClientSession::state_session_start

2015-10-05 Thread Susan Hinrichs (JIRA)
Susan Hinrichs created TS-3957:
--

 Summary: Core dump from SpdyClientSession::state_session_start
 Key: TS-3957
 URL: https://issues.apache.org/jira/browse/TS-3957
 Project: Traffic Server
  Issue Type: Bug
  Components: SPDY
Reporter: Susan Hinrichs


We see this in production on machines under swap, so the timings are very 
distorted.

{code}
gdb) bt
#0  0x in ?? ()
#1  0x0064a5dc in SpdyClientSession::state_session_start 
(this=0x2b234fbe8030)
at SpdyClientSession.cc:211
#2  0x00510e34 in Continuation::handleEvent (this=0x2b234fbe8030, 
event=1, 
data=0x2b23eda76630) at ../iocore/eventsystem/I_Continuation.h:145
#3  0x0079a066 in EThread::process_event (this=0x2b21170a2010, 
e=0x2b23eda76630, 
calling_code=1) at UnixEThread.cc:128
#4  0x0079a234 in EThread::execute (this=0x2b21170a2010) at 
UnixEThread.cc:179
#5  0x00799611 in spawn_thread_internal (a=0x12226a0) at Thread.cc:85
#6  0x2b21153e19d1 in start_thread () from /lib64/libpthread.so.0
#7  0x003827ee88fd in clone () from /lib64/libc.so.6
{code}

After poking around on the core some more [~amc] and I determined that the vc 
referenced by the SpdyClientSession was a freed object (the vtable pointer was 
swizzled out to be the freelist next pointer).

We assume that the swapping is causing very odd event timing.  We replaced the 
schedule_immediate with a direct call, and that seemed to solve our crash in 
production.
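
A hedged before/after sketch of that workaround (the generic eventsystem constant is used here; the actual call site is in SpdyClientSession):

{code}
// Before: session start is deferred through the event system, so the callback
// can run after the underlying NetVConnection has already been freed.
//   this_ethread()->schedule_imm(this);
//
// After (workaround): call the start handler directly while the vc is known valid.
this->handleEvent(EVENT_IMMEDIATE, nullptr);
{code}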



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (TS-3957) Core dump from SpdyClientSession::state_session_start

2015-10-05 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs reassigned TS-3957:
--

Assignee: Susan Hinrichs

> Core dump from SpdyClientSession::state_session_start
> -
>
> Key: TS-3957
> URL: https://issues.apache.org/jira/browse/TS-3957
> Project: Traffic Server
>  Issue Type: Bug
>  Components: SPDY
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
>  Labels: yahoo
>
> We see this in production on machines under swap, so the timings are very 
> distorted.
> {code}
> (gdb) bt
> #0  0x in ?? ()
> #1  0x0064a5dc in SpdyClientSession::state_session_start 
> (this=0x2b234fbe8030)
> at SpdyClientSession.cc:211
> #2  0x00510e34 in Continuation::handleEvent (this=0x2b234fbe8030, 
> event=1, 
> data=0x2b23eda76630) at ../iocore/eventsystem/I_Continuation.h:145
> #3  0x0079a066 in EThread::process_event (this=0x2b21170a2010, 
> e=0x2b23eda76630, 
> calling_code=1) at UnixEThread.cc:128
> #4  0x0079a234 in EThread::execute (this=0x2b21170a2010) at 
> UnixEThread.cc:179
> #5  0x00799611 in spawn_thread_internal (a=0x12226a0) at Thread.cc:85
> #6  0x2b21153e19d1 in start_thread () from /lib64/libpthread.so.0
> #7  0x003827ee88fd in clone () from /lib64/libc.so.6
> {code}
> After poking around in the core some more, [~amc] and I determined that the 
> vc referenced by the SpdyClientSession was a freed object (the vtable 
> pointer had been swizzled out to be the freelist next pointer).
> We assume that the swapping is causing very odd event timing.  We replaced 
> the schedule_immediate with a direct call, and that seemed to solve our 
> crash in production.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3072) Debug logging for a single connection in production traffic.

2015-10-06 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945245#comment-14945245
 ] 

Susan Hinrichs commented on TS-3072:


Only an ad hoc performance comparison so far.  I'll run a sequence of tests on 
the stress test box.

> Debug logging for a single connection in production traffic.
> 
>
> Key: TS-3072
> URL: https://issues.apache.org/jira/browse/TS-3072
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Core, Logging
>Affects Versions: 5.0.1
>Reporter: Sudheer Vinukonda
>  Labels: Yahoo
> Fix For: sometime
>
> Attachments: ts-3072.diff
>
>
> Presently, when there's a production issue (e.g. TS-3049, TS-2983, etc.), it 
> is really hard to isolate and debug under high traffic. Turning on debug 
> logs in production is unfortunately not an option due to the performance 
> impact. Even if you took the performance hit and turned on the logs, it is 
> just as hard to separate out the logs for a single connection/transaction 
> among the millions of log lines output in a short period of time.
> I think it would be good if there were a way to turn on debug logs in a 
> controlled manner in a production environment. One simple option is to 
> support a config setting, for example a client-ip, which, when set, would 
> turn on debug logs for any connection made by just that one client. If 
> needed, instead of one client-ip, we could allow configuring up to 'n' 
> (say, 5) client-ips.
> If there are other ideas, please comment.
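
As a thought experiment on the client-ip idea in the quoted description, here 
is a hypothetical sketch of how the gating check might look; it is not the 
attached ts-3072.diff, and the helper name and the source of the configured 
address are assumptions.

{code}
// Hypothetical helper: enable debug output only for one configured client.
// The setting that supplies `configured` (e.g. a client-ip record) is an
// assumption; only the address bytes are compared, the port is ignored.
#include <cstring>
#include <netinet/in.h>
#include <sys/socket.h>

static bool
debug_enabled_for_client(const sockaddr *client, const sockaddr *configured)
{
  if (client == nullptr || configured == nullptr) {
    return false; // feature off: keep the normal, cheap non-debug path
  }
  if (client->sa_family != configured->sa_family) {
    return false;
  }
  if (client->sa_family == AF_INET) {
    auto a = reinterpret_cast<const sockaddr_in *>(client);
    auto b = reinterpret_cast<const sockaddr_in *>(configured);
    return a->sin_addr.s_addr == b->sin_addr.s_addr;
  }
  if (client->sa_family == AF_INET6) {
    auto a = reinterpret_cast<const sockaddr_in6 *>(client);
    auto b = reinterpret_cast<const sockaddr_in6 *>(configured);
    return std::memcmp(&a->sin6_addr, &b->sin6_addr, sizeof(in6_addr)) == 0;
  }
  return false;
}
{code}

A session would evaluate this once at accept time and cache the boolean, so 
the steady-state cost per log call stays a single flag test.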



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-315) Add switch to disable config file generation/runtime behavior changing

2015-10-05 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-315:
--
Labels: A yahoo  (was: A)

> Add switch to disable config file generation/runtime behavior changing
> --
>
> Key: TS-315
> URL: https://issues.apache.org/jira/browse/TS-315
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Miles Libbey
>Assignee: Bryan Call
>Priority: Minor
>  Labels: A, yahoo
> Fix For: sometime
>
>
> (was yahoo bug 1863676)
> Original description
> by Michael S. Fischer  2 years ago at 2008-04-09 09:52
> In production, in order to improve site stability, it is imperative that TS 
> never accidentally overwrites its own configuration files.
> For this reason, we'd like to request that a switch be added to TS, 
> preferably via the command line, that disables all automatic configuration 
> file generation and any other runtime behavioral changes initiated by any 
> form of IPC other than 'traffic_line -x' (including the web interface, etc.).
>   
>  
> Comment 1
>  by Bjornar Sandvik 2 years ago at 2008-04-09 09:57:17
> A very crucial request, in my opinion. If TS needs to be able to read 
> command-line config changes on the fly, these
> changes should be stored in another config file (for example 
> remap.config.local instead of remap.config). We have a
> patch config package that overwrites 4 of the config files under 
> /home/conf/ts/, and as with all packages, we'd like to think that the 
> content of these files can't change outside our control.
>
> Comment 2
>  by Bryan Call  2 years ago at 2008-04-09 11:02:46
> traffic_line -x doesn't modify the configuration, it reloads the 
> configuration files.  If we want to have an option for this, it would be 
> good to have it as an option in the configuration file (CONFIG 
> proxy.config.write_protect INT 1).
> It would be an equivalent of write protecting floppies (ahh the memories)...
>   
>  
> Comment 3
>  by Michael S. Fischer  2 years ago at 2008-04-09 11:09:09
> I don't think it would be a good idea to have this in the configuration file, 
> as it would introduce a chicken/egg
> problem.
>   
>  
> Comment 4
>  by Leif Hedstrom 19 months ago at 2008-08-27 12:43:17
> So I'm not 100% positive that this isn't just a bad interaction. Now, it's 
> only
> triggered when trafficserver is running, but usually what ends up happening 
> is that we get a records.config which
> looks like it's the default config that comes with the trafficserver package.
> It's possible it's all one and the same issue, or we might have two issues.
>   
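
A minimal sketch of the write-protect idea from Comment 2, assuming a flag 
such as proxy.config.write_protect is read at startup; the helper names are 
illustrative, not an existing ATS API.

{code}
// Hypothetical guard around automatic config regeneration.  With write
// protection enabled, the server would still re-read files on an explicit
// `traffic_line -x`, but would never write them back to disk on its own.
#include <cstdio>

static bool
config_write_allowed(int write_protect_flag)
{
  return write_protect_flag == 0;
}

static void
maybe_rewrite_config(const char *path, int write_protect_flag)
{
  if (!config_write_allowed(write_protect_flag)) {
    std::fprintf(stderr, "write_protect is set: leaving %s untouched\n", path);
    return;
  }
  // ... regenerate and write the file here ...
}
{code}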



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


<    7   8   9   10   11   12   13   14   15   16   >