[jira] [Comment Edited] (TS-3105) Combination of fixes for TS-3084 and TS-3073 causing asserts and segfaults on 5.1 and beyond

2014-11-07 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202264#comment-14202264
 ] 

Susan Hinrichs edited comment on TS-3105 at 11/7/14 11:42 PM:
--

ts-3105-master-9.patch should be functionally equivalent to 
ts-3105-master-7.patch.  Cleaned up comments and Warnings.  This is the version 
in my github branch.


was (Author: shinrich):
ts-3105-master-8.patch should be functionally equivalent to 
ts-3105-master-7.patch.  Cleaned up comments and Warnings.  This is the version 
in my github branch.

 Combination of fixes for TS-3084 and TS-3073 causing asserts and segfaults on 
 5.1 and beyond
 

 Key: TS-3105
 URL: https://issues.apache.org/jira/browse/TS-3105
 Project: Traffic Server
  Issue Type: Bug
Reporter: Susan Hinrichs
Assignee: Susan Hinrichs
 Fix For: 5.2.0

 Attachments: ts-3073-and-3084-and-3105-against-510.patch, 
 ts-3105-master-7.patch, ts-3105-master-9.patch


 These two patches were run in a production environment on top of 5.0.1 
 without problem for several weeks.  Now running with these patches on top of 
 5.1 causes either an assert or a segfault.  Another person has reported the 
 same segfault when running master in a production environment.
 In the assert, the handler_state of the producers is 0 (UNKNOWN) rather than 
 the expected terminal state.  I'm assuming that either we are being directed 
 into the terminal state by a connection that terminates too quickly, or an 
 event has hung around for too long and is being executed against the state 
 machine after it has been recycled.
 The event is HTTP_TUNNEL_EVENT_DONE.
 The assert stack trace is
 FATAL: HttpSM.cc:2632: failed assert `0`
 /z/bin/traffic_server - STACK TRACE:
 /z/lib/libtsutil.so.5(+0x25197)[0x2b8bd08dc197]
 /z/lib/libtsutil.so.5(+0x23def)[0x2b8bd08dadef]
 /z/bin/traffic_server(HttpSM::tunnel_handler_post_or_put(HttpTunnelProducer*)+0xcd)[0x5982ad]
 /z/bin/traffic_server(HttpSM::tunnel_handler_post(int, void*)+0x86)[0x5a32d6]
 /z/bin/traffic_server(HttpSM::main_handler(int, void*)+0xd8)[0x5a1e18]
 /z/bin/traffic_server(HttpTunnel::main_handler(int, void*)+0xee)[0x5dd6ae]
 /z/bin/traffic_server(write_to_net_io(NetHandler*, UnixNetVConnection*, EThread*)+0x136e)[0x721d1e]
 /z/bin/traffic_server(NetHandler::mainNetEvent(int, Event*)+0x28c)[0x7162fc]
 /z/bin/traffic_server(EThread::process_event(Event*, int)+0x91)[0x744df1]
 /z/bin/traffic_server(EThread::execute()+0x4fc)[0x7458ac]
 /z/bin/traffic_server[0x7440ca]
 /lib64/libpthread.so.0(+0x7034)[0x2b8bd1ee4034]
 /lib64/libc.so.6(clone+0x6d)[0x2b8bd2c2875d]
 The segfault stack trace is 
 /z/bin/traffic_server - STACK TRACE: 
 /lib64/libpthread.so.0(+0xf280)[0x2abccd0d8280]
 /z/bin/traffic_server(HttpSM::tunnel_handler_ua(int, HttpTunnelConsumer*)+0x122)[0x591462]
 /z/bin/traffic_server(HttpTunnel::consumer_handler(int, HttpTunnelConsumer*)+0x9e)[0x5dd15e]
 /z/bin/traffic_server(HttpTunnel::main_handler(int, void*)+0x117)[0x5dd6d7]
 /z/bin/traffic_server(UnixNetVConnection::mainEvent(int, Event*)+0x3f0)[0x725190]
 /z/bin/traffic_server(InactivityCop::check_inactivity(int, Event*)+0x275)[0x716b75]
 /z/bin/traffic_server(EThread::process_event(Event*, int)+0x91)[0x744df1]
 /z/bin/traffic_server(EThread::execute()+0x2fb)[0x7456ab]
 /z/bin/traffic_server[0x7440ca]
 /lib64/libpthread.so.0(+0x7034)[0x2abccd0d0034]
 /lib64/libc.so.6(clone+0x6d)[0x2abccde1475d]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (TS-3105) Combination of fixes for TS-3084 and TS-3073 causing asserts and segfaults on 5.1 and beyond

2014-11-04 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195703#comment-14195703
 ] 

Susan Hinrichs edited comment on TS-3105 at 11/4/14 11:44 PM:
--

ts-3105-master-7.patch contains the previous two fixes against the master branch.


was (Author: shinrich):
ts-3105-master-4.patch contains the previous two fixes against the master branch.

 Combination of fixes for TS-3084 and TS-3073 causing asserts and segfaults on 
 5.1 and beyond
 

 Key: TS-3105
 URL: https://issues.apache.org/jira/browse/TS-3105
 Project: Traffic Server
  Issue Type: Bug
Reporter: Susan Hinrichs
Assignee: Susan Hinrichs
 Fix For: 5.2.0

 Attachments: ts-3073-and-3084-and-3105-against-510.patch, 
 ts-3105-master-6.patch, ts-3105-master-7.patch




[jira] [Comment Edited] (TS-3105) Combination of fixes for TS-3084 and TS-3073 causing asserts and segfaults on 5.1 and beyond

2014-11-03 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189535#comment-14189535
 ] 

Susan Hinrichs edited comment on TS-3105 at 11/3/14 9:44 PM:
-

ts-3105-master-6  Trying to prevent tunnel_handler_ua from being called twice.

Update:  This patch has been run in a production environment for 3 hours so 
far.  

The key change between this patch and the previous patch was a change in 
HttpTunnel::consumer_handler.  In the case of a final event (e.g. write 
complete, eos, error, timeout), the original code was setting the final 
callback to execute, but this caused assertion failures in 
HttpSM::tunnel_handler_post_or_put() if p->handler_state was 0.  Earlier 
versions of the patch would avoid setting the callback flag if 
p->handler_state was 0.  This avoided the assert, but apparently caused the 
state machine to leak.

This patch updated the logic to set the p->handler_state flag if it was not 
already set.  It chooses a value based on the event and the c->vc_type.
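
As a rough illustration of the logic described above (this is not the actual patch), the sketch below fills in an unset producer handler_state when the consumer sees a final event.  Only the names handler_state and vc_type come from the comment.  The enum values, the struct layout, and the mapping from event and vc_type to a state are assumptions made up for this example.

```cpp
#include <cassert>

// Hypothetical stand-ins; these are NOT the Traffic Server definitions.
enum class Event { WriteReady, WriteComplete, Eos, Error, Timeout };
enum class VcType { Unknown, HttpServer, HttpClient };
enum class HandlerState { Unknown = 0, PostSuccess, PostServerFail, PostUaFail };

struct Producer {
  HandlerState handler_state = HandlerState::Unknown;
};

struct Consumer {
  VcType vc_type = VcType::Unknown;
  Producer *producer = nullptr;
};

// In this toy model every event except WriteReady ends the consumer's work.
bool is_final_event(Event e) {
  return e != Event::WriteReady;
}

// Assumed mapping: a clean write completion counts as success; any other
// final event is blamed on the side that the consumer's VC belongs to.
HandlerState pick_state(Event e, VcType t) {
  if (e == Event::WriteComplete) {
    return HandlerState::PostSuccess;
  }
  return t == VcType::HttpServer ? HandlerState::PostServerFail
                                 : HandlerState::PostUaFail;
}

// Returns true when the final callback should run.  Before it does, an
// unset producer state is filled in so the callback never sees Unknown (0).
bool on_consumer_event(Event e, Consumer &c) {
  if (!is_final_event(e)) {
    return false;
  }
  Producer *p = c.producer;
  if (p != nullptr && p->handler_state == HandlerState::Unknown) {
    p->handler_state = pick_state(e, c.vc_type);
  }
  return true;
}
```

The point of the design is that the final callback always runs (so the state machine is not leaked), while a state that was already set by an earlier path is left untouched.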


was (Author: shinrich):
ts-3105-master-6  Trying to prevent tunnel_handler_ua from being called twice.

Update:  This patch has been run in a production environment for 3 hours so 
far.  

The key change between this patch and the previous patch was a changed in 
HttpTunnel::consumer_handler.  In the case of a final event (e.g. write 
complete, eos, error, timeout), the original code was setting the final 
callback to execute, but this caused assertion failures in 
HttpSM::tunnel_handler_post_or_put() if p->handler_state was 0.  Earlier 
versions of the patch would avoid setting the callback flag if 
p->handler_state was 0.  This avoided the assert, but apparently caused the 
state machine to leak.

This patch updated the logic to set the p->handler_state flag if it was not 
already set.  It chooses a value based on the event and the c->vc_type.

 Combination of fixes for TS-3084 and TS-3073 causing asserts and segfaults on 
 5.1 and beyond
 

 Key: TS-3105
 URL: https://issues.apache.org/jira/browse/TS-3105
 Project: Traffic Server
  Issue Type: Bug
Reporter: Susan Hinrichs
Assignee: Susan Hinrichs
 Fix For: 5.2.0

 Attachments: ts-3073-and-3084-and-3105-against-510.patch, 
 ts-3105-master-6.patch



[jira] [Comment Edited] (TS-3105) Combination of fixes for TS-3084 and TS-3073 causing asserts and segfaults on 5.1 and beyond

2014-11-03 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195298#comment-14195298
 ] 

Susan Hinrichs edited comment on TS-3105 at 11/3/14 11:35 PM:
--

Last Friday while working on the patches for 5.1, ran into the following 
issues.  

VC_EVENT_EOS was being delivered to consumer_handler in some cases during a 
post workload.  It looks like there were two cases for this.

1. The consumer's associated VC is for the HttpServerSession.  The POST 
response is very short (one packet).  It is delivered before the second server 
response tunnel is set up.  Since there is no producer matching the VC, the 
event is instead delivered to the consumer for the first tunnel.  Fixed this 
by changing the do_io_read in HttpSM::attach_server_session to read no bytes.  
This is sufficient to redirect error and timeout events to the new VC handler, 
but it won't start reading anything until the server response tunnel is in 
place and a second do_io_read is issued in 
HttpSM::setup_server_read_response_header.  With this change the events from 
the second tunnel will be delivered to the second tunnel's producer.

Was able to see this failure case by doing POST-based file uploads against 
test.websafedeposit.net.  It didn't fail every time, but frequently enough to 
debug.

While poking around in this logic, noticed that the call to do_io_read in 
HttpSM::attach_client_session was passing a length of 0, but a non-null 
buffer.  Changed the third argument to NULL.

2. In the second case, a RESET is performed and is delivered as a 
VC_EVENT_EOS.  I was exercising this by sending a reset on the client side.  
This means that the EOS delivered to the consumer_handler should indeed be 
treated as an error case.  Exercised this by writing a test client that issues 
a RESET after part of the POST.
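
The test client mentioned in case 2 is not attached to this comment.  One common way to build such a client, shown here only as a sketch, is to send a POST header that promises more body bytes than are actually written and then close the socket with SO_LINGER set to a zero timeout, which makes a TCP socket close with a RST instead of a FIN on Linux and BSD-derived stacks.  The function names, the all-'x' body, and the request layout are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <sys/socket.h>
#include <unistd.h>

// Build a POST request that announces content_length body bytes but carries
// only the first sent_bytes of them (the body is just a run of 'x').
std::string partial_post(const std::string &host, const std::string &path,
                         std::size_t content_length, std::size_t sent_bytes) {
  if (sent_bytes > content_length) {
    sent_bytes = content_length;
  }
  std::string req = "POST " + path + " HTTP/1.1\r\n";
  req += "Host: " + host + "\r\n";
  req += "Content-Length: " + std::to_string(content_length) + "\r\n\r\n";
  req.append(sent_bytes, 'x');
  return req;
}

// Request an abortive close: with l_onoff = 1 and l_linger = 0, close() on a
// connected TCP socket discards unsent data and resets the connection.
bool set_abortive_close(int fd) {
  struct linger lg;
  lg.l_onoff  = 1;
  lg.l_linger = 0;
  return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg)) == 0;
}

// Write the truncated request on an already connected socket, then close it
// so that the peer sees a reset part-way through the POST body.
bool send_then_reset(int fd, const std::string &req) {
  bool ok = set_abortive_close(fd);
  std::size_t off = 0;
  while (ok && off < req.size()) {
    ssize_t n = send(fd, req.data() + off, req.size() - off, 0);
    if (n <= 0) {
      ok = false;
    } else {
      off += static_cast<std::size_t>(n);
    }
  }
  close(fd);
  return ok;
}
```

A caller would connect a TCP socket to the proxy under test and pass it to send_then_reset() together with, for example, partial_post(host, "/upload", 100000, 1000).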

Need to move these fixes to the master patch.
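
The effect of the zero-byte do_io_read from case 1 can be pictured with a small toy model.  MockVC below is a made-up class, not the Traffic Server VConnection API; it only assumes that a connection has one read-side handler, that registering a handler redirects later events to it, and that a byte count of zero means "do not start reading yet".

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy connection with a single read-side handler (hypothetical, for
// illustration only).
struct MockVC {
  using Handler = std::function<void(const std::string &)>;

  Handler handler;
  int64_t nbytes   = 0; // bytes the current reader asked for
  int64_t buffered = 0; // bytes that arrived while nobody was reading

  // Register who receives events and how much they want to read.  Zero
  // bytes redirects error/EOS events without starting a read.
  void do_io_read(Handler h, int64_t n) {
    handler = std::move(h);
    nbytes  = n;
    if (handler && nbytes > 0 && buffered > 0) {
      buffered = 0;
      handler("READ_READY");
    }
  }

  // Data from the peer: signalled only if a real read is outstanding.
  void on_data(int64_t n) {
    if (handler && nbytes > 0) {
      handler("READ_READY");
    } else {
      buffered += n;
    }
  }

  // EOS always goes to whichever handler is currently registered.
  void on_eos() {
    if (handler) {
      handler("EOS");
    }
  }
};
```

In this model, a short response that arrives between the zero-byte registration and the second do_io_read is held back, and an EOS in that window reaches the new handler rather than the one that owned the connection before.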



was (Author: shinrich):
Last Friday while working on the patches for 5.1, ran into the following 
issues.  

VC_EVENT_EOS was being delivered to consumer_handler in some cases during a 
post workload.  It looks like there were two cases for this.

1. The consumer's associated VC is for the HttpServerSession.  The post 
response is very short (one packet).  It is delivered before the second server 
response tunnel is set up.  Since there is no producer matching the VC, the 
event is instead delivered to the consumer for the first tunnel.  Fixed this 
by changing the do_io_read in HttpSM::attach_server_session to read no bytes.  
This is sufficient to redirect error and timeout events to the new VC handler, 
but it won't start reading anything until the server response tunnel is in 
place and a second do_io_read is issued in 
HttpSM::setup_server_read_response_header.  With this change the events from 
the second tunnel will be delivered to the second tunnel's producer.

While poking around in this logic, noticed that the call to do_io_read in 
HttpSM::attach_client_session was passing a length of 0, but a non-null 
buffer.  Changed the third argument to NULL.

2. In the second case, a RESET is performed and is delivered as a 
VC_EVENT_EOS.  I was exercising this by sending a Reset on the client side.  
This means that the EOS delivered to the consumer_handler should indeed be 
treated as an error case.



 Combination of fixes for TS-3084 and TS-3073 causing asserts and segfaults on 
 5.1 and beyond
 

 Key: TS-3105
 URL: https://issues.apache.org/jira/browse/TS-3105
 Project: Traffic Server
  Issue Type: Bug
Reporter: Susan Hinrichs
Assignee: Susan Hinrichs
 Fix For: 5.2.0

 Attachments: ts-3073-and-3084-and-3105-against-510.patch, 
 ts-3105-master-6.patch



[jira] [Comment Edited] (TS-3105) Combination of fixes for TS-3084 and TS-3073 causing asserts and segfaults on 5.1 and beyond

2014-10-30 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189535#comment-14189535
 ] 

Susan Hinrichs edited comment on TS-3105 at 10/30/14 3:20 PM:
--

ts-3105-master-6  Trying to prevent tunnel_handler_ua from being called twice.

Update:  This patch has been run in a production environment for 3 hours so 
far.  

The key change between this patch and the previous patch was a change in 
HttpTunnel::consumer_handler.  In the case of a final event (e.g. write 
complete, eos, error, timeout), the original code was setting the final 
callback to execute, but this caused assertion failures in 
HttpSM::tunnel_handler_post_or_put() if p->handler_state was 0.  Earlier 
versions of the patch would avoid setting the callback flag if 
p->handler_state was 0.  This avoided the assert, but apparently caused the 
state machine to leak.

This patch updated the logic to set the p->handler_state flag if it was not 
already set.  It chooses a value based on the event and the c->vc_type.


was (Author: shinrich):
ts-3105-master-6  Trying to prevent tunnel_handler_ua from being called twice.

 Combination of fixes for TS-3084 and TS-3073 causing asserts and segfaults on 
 5.1 and beyond
 

 Key: TS-3105
 URL: https://issues.apache.org/jira/browse/TS-3105
 Project: Traffic Server
  Issue Type: Bug
Reporter: Susan Hinrichs
Assignee: Susan Hinrichs
 Fix For: 5.2.0

 Attachments: ts-3073-and-3084-and-3105-against-510.patch, 
 ts-3105-master-6.patch




[jira] [Comment Edited] (TS-3105) Combination of fixes for TS-3084 and TS-3073 causing asserts and segfaults on 5.1 and beyond

2014-10-29 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188862#comment-14188862
 ] 

Susan Hinrichs edited comment on TS-3105 at 10/29/14 7:38 PM:
--

Found one more path where p->handler_state might be unset, causing the assert 
in tunnel_handler_post_or_put.  Addressed by ts-3105-master-4.patch.


was (Author: shinrich):
Found one more path where p->handler_state might be unset, causing the assert 
in tunnel_handler_post_or_put

 Combination of fixes for TS-3084 and TS-3073 causing asserts and segfaults on 
 5.1 and beyond
 

 Key: TS-3105
 URL: https://issues.apache.org/jira/browse/TS-3105
 Project: Traffic Server
  Issue Type: Bug
Reporter: Susan Hinrichs
Assignee: Susan Hinrichs
 Fix For: 5.2.0

 Attachments: ts-3073-and-3084-and-3105-against-510.patch, 
 ts-3105-master-4.patch

