Susan Hinrichs created TS-3871:
----------------------------------

             Summary: VC Migration Can Lose Events
                 Key: TS-3871
                 URL: https://issues.apache.org/jira/browse/TS-3871
             Project: Traffic Server
          Issue Type: Bug
          Components: HTTP
            Reporter: Susan Hinrichs


Found this in my stress testing.  Sometimes the POST or GET response is 
completely empty: no header and no body.  The packet capture shows that ATS 
closes the connection 70 seconds after the last POST or GET on the connection 
was received.  This matches the value of 
proxy.config.http.keep_alive_no_activity_timeout_in on my test box.

When I switched from the global pool to the local pool, the problem went away.

I eventually tracked it down to a problem in the epoll update.  ep.start() 
during the migration sometimes failed with EEXIST.  This means that the file 
descriptor was already registered with that epoll instance.  If we are 
migrating from thread A to thread B, this should not be the case, unless we 
went from thread B to thread A and back to thread B without removing the 
descriptor from thread B's epoll the first time.  If that is happening, then 
multiple threads will be processing network events for the same connection, 
which seems like a recipe for disaster and dropped events.
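A minimal C sketch of how that EEXIST can arise, assuming ep.start() boils down 
to epoll_ctl(EPOLL_CTL_ADD) on the target thread's epoll fd and ep.stop() to 
EPOLL_CTL_DEL.  A pipe stands in for the client socket, and the helper names 
(ep_add, simulate_stale_migration) are hypothetical, not ATS code.  Linux only.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: register fd for read readiness on epfd.
 * Returns 0 on success, or the errno reported by epoll_ctl. */
static int ep_add(int epfd, int fd)
{
    struct epoll_event ev = {0};
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == 0 ? 0 : errno;
}

/* Simulate a connection that migrates B -> A -> B without ever being
 * removed from thread B's epoll set (no EPOLL_CTL_DEL anywhere).
 * Returns the errno of the final ADD, or -1 if setup failed. */
int simulate_stale_migration(void)
{
    int ep_a = epoll_create1(0);   /* stands in for thread A's epoll */
    int ep_b = epoll_create1(0);   /* stands in for thread B's epoll */
    int fds[2];
    if (ep_a < 0 || ep_b < 0 || pipe(fds) != 0)
        return -1;                 /* setup failed */

    int sock = fds[0];             /* stands in for the client socket */
    int err;
    if (ep_add(ep_b, sock) != 0    /* connection starts on thread B */
        || ep_add(ep_a, sock) != 0) /* moves to A; B's entry is left behind */
        err = -1;
    else
        err = ep_add(ep_b, sock);  /* moves back to B: stale entry -> EEXIST */

    close(fds[0]);
    close(fds[1]);
    close(ep_a);
    close(ep_b);
    return err;
}
```

Calling simulate_stale_migration() returns EEXIST.  Note that the kernel 
happily lets one fd sit in several epoll sets at once, so while B's stale 
entry lingers both A and B receive readiness events for the socket; only the 
attempt to re-add it to B exposes the problem.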

Originally, I left the ep.stop(), which removes the descriptor from the 
original thread's epoll structure, to be performed by the original thread.  
But under stress that seems to be a bad idea: there is too much drift.  After 
some more research, it appears that the epoll calls are thread safe:

http://linux.derkeiler.com/Mailing-Lists/Kernel/2006-03/msg00084.html

I rearranged the code to do both the ep.stop() and the ep.start() on the 
migration's target thread, and my stress test had no more problems.
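The rearranged ordering can be sketched like this, again assuming 
ep.stop()/ep.start() reduce to EPOLL_CTL_DEL/EPOLL_CTL_ADD.  struct vc_poll, 
migrate_on_target() and bounce_between_threads() are hypothetical 
simplifications, not the actual net VC code; each migration runs on a freshly 
created pthread playing the part of the target thread, and the sketch relies 
on epoll_ctl being safe to call on another thread's epoll fd.  Build with 
-pthread.

```c
#include <errno.h>
#include <pthread.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical, simplified polling state for one connection. */
struct vc_poll {
    int fd;    /* the connection's socket */
    int epfd;  /* epoll fd it is currently registered with, or -1 */
};

/* Runs on the TARGET thread: remove fd from the old thread's epoll
 * ("ep.stop()") and add it to the target's ("ep.start()") back to back,
 * so the old registration never outlives the new one.
 * Returns 0 on success or an errno value. */
int migrate_on_target(struct vc_poll *vc, int target_epfd)
{
    if (vc->epfd >= 0) {
        /* ENOENT just means it was already removed; treat as done. */
        if (epoll_ctl(vc->epfd, EPOLL_CTL_DEL, vc->fd, NULL) != 0 && errno != ENOENT)
            return errno;
        vc->epfd = -1;
    }
    struct epoll_event ev = {0};
    ev.events = EPOLLIN;
    ev.data.ptr = vc;
    if (epoll_ctl(target_epfd, EPOLL_CTL_ADD, vc->fd, &ev) != 0)
        return errno;
    vc->epfd = target_epfd;
    return 0;
}

struct migrate_job {
    struct vc_poll *vc;
    int target_epfd;
    int result;
};

static void *migrate_thread(void *arg)
{
    struct migrate_job *job = arg;
    job->result = migrate_on_target(job->vc, job->target_epfd);
    return NULL;
}

/* Bounce a pipe fd between two epoll sets (B -> A -> B -> ...), each move
 * done on its own thread. Returns 0 if every migration succeeded. */
int bounce_between_threads(int rounds)
{
    int ep[2] = { epoll_create1(0), epoll_create1(0) };
    int fds[2];
    if (ep[0] < 0 || ep[1] < 0 || pipe(fds) != 0)
        return -1;

    struct vc_poll vc = { .fd = fds[0], .epfd = -1 };
    int rc = 0;
    for (int i = 0; i < rounds && rc == 0; i++) {
        struct migrate_job job = { &vc, ep[i % 2], 0 };
        pthread_t t;
        if (pthread_create(&t, NULL, migrate_thread, &job) != 0) {
            rc = -1;
            break;
        }
        pthread_join(t, NULL);
        rc = job.result;
    }

    close(fds[0]);
    close(fds[1]);
    close(ep[0]);
    close(ep[1]);
    return rc;
}
```

Because the DEL and the ADD happen on the same thread, one right after the 
other, the fd is in at most one epoll set at any time, and a B -> A -> B round 
trip no longer trips over a stale entry in B.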

I've run this patch on a production machine for over 12 hours with no crashes 
and no performance discrepancies.  We will be expanding this testing.

To repeat, this is not a problem we saw in production, but only in my "make it 
fall over" stress test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
