[ https://issues.apache.org/jira/browse/TS-4717?focusedWorklogId=26209&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-26209 ]
ASF GitHub Bot logged work on TS-4717:
--------------------------------------

                Author: ASF GitHub Bot
            Created on: 05/Aug/16 21:28
            Start Date: 05/Aug/16 21:28
    Worklog Time Spent: 10m
      Work Description: GitHub user shinrich opened a pull request:

    https://github.com/apache/trafficserver/pull/842

    TS-4717: Http2 stack explosion.

    Added a common state_process_frame_read method that loops over reading
    frames while there is data available. The original state_start_frame_read
    and state_complete_frame_read call into state_process_frame_read, so the
    event handling cases still work.

    I have been running a version of this code on two of our production boxes
    for a day. We haven't had a load surge event, so I doubt we have seen a
    case that would have caused the stack explosion. But the performance and
    error stats seem similar to their peers, so I don't think I have messed up
    the normal operating case.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/shinrich/trafficserver ts-4717

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/trafficserver/pull/842.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #842

----
commit a166cf0335672abd3514f43a081b4fce045725f2
Author: Susan Hinrichs <shinr...@ieee.org>
Date:   2016-08-05T14:29:53Z

    TS-4717: Http2 stack explosion.

----

Issue Time Tracking
-------------------

            Worklog Id:     (was: 26209)
            Time Spent: 10m
    Remaining Estimate: 0h

> Http2 stack explosion
> ---------------------
>
>                 Key: TS-4717
>                 URL: https://issues.apache.org/jira/browse/TS-4717
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: HTTP/2
>            Reporter: Susan Hinrichs
>            Assignee: Susan Hinrichs
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> We see this periodically with high traffic loads. ATS crashes with 7000+
> frames on the stack. The bulk of the frames are the following frame
> sequence.
> {code}
> #117 0x00000000005159c8 in Continuation::handleEvent (this=0x2b0bdd101b90, event=100, data=0x2b0bad0c7cf0)
>     at ../iocore/eventsystem/I_Continuation.h:150
> #118 0x000000000064c05d in Http2ClientSession::state_start_frame_read (this=0x2b0bdd101b90, event=100, edata=0x2b0bad0c7cf0)
>     at Http2ClientSession.cc:451
> #119 0x000000000064b0af in Http2ClientSession::main_event_handler (this=0x2b0bdd101b90, event=100, edata=0x2b0bad0c7cf0)
>     at Http2ClientSession.cc:292
> #120 0x00000000005159c8 in Continuation::handleEvent (this=0x2b0bdd101b90, event=100, data=0x2b0bad0c7cf0)
>     at ../iocore/eventsystem/I_Continuation.h:150
> #121 0x000000000064c386 in Http2ClientSession::state_complete_frame_read (this=0x2b0bdd101b90, event=100, edata=0x2b0bad0c7cf0)
>     at Http2ClientSession.cc:483
> #122 0x000000000064b0af in Http2ClientSession::main_event_handler (this=0x2b0bdd101b90, event=100, edata=0x2b0bad0c7cf0)
>     at Http2ClientSession.cc:292
> #123 0x00000000005159c8 in Continuation::handleEvent (this=0x2b0bdd101b90, event=100, data=0x2b0bad0c7cf0)
>     at ../iocore/eventsystem/I_Continuation.h:150
> #124 0x000000000064c05d in Http2ClientSession::state_start_frame_read (this=0x2b0bdd101b90, event=100, edata=0x2b0bad0c7cf0)
>     at Http2ClientSession.cc:451
> {code}
> We had cherry-picked the fix for TS-4209 to correctly enforce the
> concurrent stream limit. But in the latest crash of this type, it looks like
> we are pulling small items from cache, so the stream lives and dies on the
> stack. The concurrent active connection count never reaches the limit.
> I am going to try to change the
> state_start_frame_read/state_complete_frame_read logic from recursing
> handlers to a loop.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
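
For illustration only, here is a minimal sketch of the recursion-to-loop
change described in the issue. It is not the code from the pull request.
FrameSession, recursive_start, recursive_complete, process_frames and
make_session are hypothetical names, and frame parsing is reduced to
popping entries off a queue. The sketch only shows why stack depth grows
with the number of buffered frames when two handlers call each other, and
why it stays flat when a single handler loops while data is available.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical stand-in for a session with frames already buffered.
// None of these names come from Http2ClientSession.cc.
struct FrameSession {
  std::deque<int> pending;      // one entry per buffered frame
  std::size_t frames_done = 0;  // frames fully processed
  std::size_t depth = 0;        // current handler nesting depth
  std::size_t max_depth = 0;    // deepest nesting observed

  // Track nesting so the two shapes can be compared.
  void enter() { if (++depth > max_depth) max_depth = depth; }
  void leave() { assert(depth > 0); --depth; }

  // Recursive shape: "start" hands off to "complete", which calls
  // back into "start" for the next frame. Every buffered frame adds
  // two more handler activations to the stack.
  void recursive_start() {
    enter();
    if (!pending.empty()) {
      recursive_complete();
    }
    leave();
  }

  void recursive_complete() {
    enter();
    pending.pop_front();  // consume one frame
    ++frames_done;
    recursive_start();    // re-enter for the next frame
    leave();
  }

  // Loop shape: a single handler drains every buffered frame, so the
  // nesting depth does not depend on how many frames are waiting.
  void process_frames() {
    enter();
    while (!pending.empty()) {
      pending.pop_front();
      ++frames_done;
    }
    leave();
  }
};

// Build a session with n frames already buffered.
inline FrameSession make_session(std::size_t n) {
  FrameSession s;
  s.pending.assign(n, 0);
  return s;
}
```

Calling recursive_start() on make_session(100) processes all 100 frames
but nests 201 handler calls deep. Calling process_frames() on an identical
session processes the same 100 frames at a nesting depth of 1.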