[
https://issues.apache.org/jira/browse/TS-3105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195374#comment-14195374
]
Susan Hinrichs commented on TS-3105:
------------------------------------
Fixed another problem last Friday. This error was a direct result of the fix
for TS-3084 (or TS-3073).
The problem showed up with a series of POSTs over a keep-alive connection. POST 1
is short, and the response is in the first packet and already in the buffer by
the time the tunnel starts processing. POST 2 uses the same connection as POST 1.
The problem was in HttpTunnel::producer_run. The original logic would set
half_close_flag on the connection because there was no more data to fetch. But
this is not accurate, since there may be further transactions sharing this same
connection.
Fixed this by not setting the half_close_flag in this case, and by defensively
adding a line to clear the half_close flag in new_transaction.
These fixes still need to be moved to the master patch.
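To make the keep-alive case above concrete, here is a rough sketch of the
two changes. It is not the actual Traffic Server code: ClientSession,
Producer, bytes_to_fetch, producer_run_old, producer_run_fixed and
half_closed_after_first_post are hypothetical stand-ins, and only the names
half_close and new_transaction echo the description above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a client connection that can carry several
// keep-alive transactions. Not the real Traffic Server session class.
struct ClientSession {
  bool half_close   = false;
  int  transactions = 0;

  // Called when a new transaction starts on this connection.
  void new_transaction() {
    half_close = false; // defensive clear, so a stale flag cannot leak over
    ++transactions;
  }
};

// Hypothetical stand-in for the producer side of a tunnel.
struct Producer {
  std::size_t bytes_to_fetch; // data still to be read for this transaction
};

// Assumed shape of the old logic: nothing left to fetch is treated as
// "the connection is half closed".
void producer_run_old(ClientSession &session, const Producer &p) {
  if (p.bytes_to_fetch == 0) {
    session.half_close = true;
  }
}

// Assumed shape of the fixed logic: having nothing left to fetch for this
// transaction says nothing about the connection, so the flag is untouched.
void producer_run_fixed(ClientSession &session, const Producer &p) {
  (void)session;
  (void)p;
}

// Runs POST 1 with everything already buffered and reports whether the
// connection is marked half closed before POST 2 arrives.
bool half_closed_after_first_post(bool use_fixed_logic) {
  ClientSession session;
  session.new_transaction();
  assert(session.transactions == 1);

  Producer post1{0}; // short POST: nothing more to fetch
  if (use_fixed_logic) {
    producer_run_fixed(session, post1);
  } else {
    producer_run_old(session, post1);
  }
  return session.half_close;
}
```

With the old logic the connection is left marked half closed when POST 2
shows up on it; with the fixed logic it is not, and new_transaction resets
the flag either way.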
> Combination of fixes for TS-3084 and TS-3073 causing asserts and segfaults on
> 5.1 and beyond
> --------------------------------------------------------------------------------------------
>
> Key: TS-3105
> URL: https://issues.apache.org/jira/browse/TS-3105
> Project: Traffic Server
> Issue Type: Bug
> Reporter: Susan Hinrichs
> Assignee: Susan Hinrichs
> Fix For: 5.2.0
>
> Attachments: ts-3073-and-3084-and-3105-against-510.patch,
> ts-3105-master-6.patch
>
>
> These two patches were run in a production environment on top of 5.0.1
> without problem for several weeks. Now running with these patches on top of
> 5.1 causes either an assert or a segfault. Another person has reported the
> same segfault when running master in a production environment.
> In the assert, the handler_state of the producers is 0 (UNKNOWN) rather than
> the expected terminal state. I'm assuming that either we are being
> directed into the terminal state from a connection that terminates too
> quickly, or an event has hung around for too long and is being executed
> against the state machine after it has been recycled.
> The event is HTTP_TUNNEL_EVENT_DONE.
> The assert stack trace is
> FATAL: HttpSM.cc:2632: failed assert `0`
> /z/bin/traffic_server - STACK TRACE:
> /z/lib/libtsutil.so.5(+0x25197)[0x2b8bd08dc197]
> /z/lib/libtsutil.so.5(+0x23def)[0x2b8bd08dadef]
> /z/bin/traffic_server(HttpSM::tunnel_handler_post_or_put(HttpTunnelProducer*)+0xcd)[0x5982ad]
> /z/bin/traffic_server(HttpSM::tunnel_handler_post(int, void*)+0x86)[0x5a32d6]
> /z/bin/traffic_server(HttpSM::main_handler(int, void*)+0xd8)[0x5a1e18]
> /z/bin/traffic_server(HttpTunnel::main_handler(int, void*)+0xee)[0x5dd6ae]
> /z/bin/traffic_server(write_to_net_io(NetHandler*, UnixNetVConnection*, EThread*)+0x136e)[0x721d1e]
> /z/bin/traffic_server(NetHandler::mainNetEvent(int, Event*)+0x28c)[0x7162fc]
> /z/bin/traffic_server(EThread::process_event(Event*, int)+0x91)[0x744df1]
> /z/bin/traffic_server(EThread::execute()+0x4fc)[0x7458ac]
> /z/bin/traffic_server[0x7440ca]
> /lib64/libpthread.so.0(+0x7034)[0x2b8bd1ee4034]
> /lib64/libc.so.6(clone+0x6d)[0x2b8bd2c2875d]
> The segfault stack trace is
> /z/bin/traffic_server - STACK TRACE:
> /lib64/libpthread.so.0(+0xf280)[0x2abccd0d8280]
> /z/bin/traffic_server(HttpSM::tunnel_handler_ua(int, HttpTunnelConsumer*)+0x122)[0x591462]
> /z/bin/traffic_server(HttpTunnel::consumer_handler(int, HttpTunnelConsumer*)+0x9e)[0x5dd15e]
> /z/bin/traffic_server(HttpTunnel::main_handler(int, void*)+0x117)[0x5dd6d7]
> /z/bin/traffic_server(UnixNetVConnection::mainEvent(int, Event*)+0x3f0)[0x725190]
> /z/bin/traffic_server(InactivityCop::check_inactivity(int, Event*)+0x275)[0x716b75]
> /z/bin/traffic_server(EThread::process_event(Event*, int)+0x91)[0x744df1]
> /z/bin/traffic_server(EThread::execute()+0x2fb)[0x7456ab]
> /z/bin/traffic_server[0x7440ca]
> /lib64/libpthread.so.0(+0x7034)[0x2abccd0d0034]
> /lib64/libc.so.6(clone+0x6d)[0x2abccde1475d]
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)