[ https://issues.apache.org/jira/browse/TS-4161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Leif Hedstrom updated TS-4161:
------------------------------
    Labels: crash  (was: )

> ProcessManager prone to stack-overflow
> --------------------------------------
>
>                 Key: TS-4161
>                 URL: https://issues.apache.org/jira/browse/TS-4161
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Manager
>            Reporter: Gancho Tenev
>            Assignee: Gancho Tenev
>              Labels: crash
>             Fix For: 6.2.0
>
>
> ProcessManager::pollLMConnection() can get "stuck" in a loop while handling
> a large number of messages in a row from the same socket.
>
> Since alloca() is used to allocate buffers on the stack for each message read
> from the socket, and those buffers are not released until the function
> returns, getting "stuck" in the loop can lead to a stack overflow. FWIW, the
> same could happen if the message length is big enough (accidentally or on
> purpose).
>
> It can be reproduced easily by setting:
>   proxy.config.lm.pserver_timeout_secs: 0
>   proxy.config.lm.pserver_timeout_msecs: 0
> in records.config and running ./bin/traffic_manager.
>
> ATS crashes with a segfault in a weird place (while trying to allocate with
> malloc()). If you inspect the core, you will see that it got "stuck" in the
> loop before it crashed by overflowing the stack (it kept allocating buffers
> on the stack with alloca() until it crashed).
>
> It is worth considering replacing alloca() with a VLA (which "releases" its
> memory when it goes out of scope on each iteration of the loop) or with
> ats_malloc(), which is supposedly less time-efficient but handles bigger
> messages without any risk of stack overflow.
>
> IMO, adding a message size limit check is good practice, especially with the
> current implementation.
>
> If the code gets "stuck" in the while loop while reading a large number of
> messages in a row from the same socket, then the port configured by
> proxy.config.process_manager.mgmt_port becomes unavailable (connection
> refused). Adding a limit on the number of messages that can be processed in
> a row would be a good idea.
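A minimal sketch of the three mitigations suggested above (a per-iteration buffer instead of alloca(), a message size check before allocating, and a cap on messages handled per call). This is not ATS code: the socket is replaced by a hypothetical in-memory queue, and `poll_connection()`, `FakeSocket`, `kMaxMessageSize` and `kMaxMessagesPerPoll` are illustrative names with arbitrary values. `std::vector` stands in for ats_malloc() as the heap allocation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical limits for illustration; not actual ATS settings.
constexpr std::size_t kMaxMessageSize = 64 * 1024;  // reject larger messages
constexpr int kMaxMessagesPerPoll = 16;             // yield after this many

// A framed message as it might arrive: a length header plus a payload.
struct PendingMessage {
  std::size_t declared_len;
  std::string payload;
};

// Hypothetical stand-in for the management socket.
using FakeSocket = std::deque<PendingMessage>;

struct PollResult {
  int handled = 0;
  int rejected = 0;
  bool more_pending = false;  // caller should poll again later
};

PollResult poll_connection(FakeSocket &sock) {
  PollResult result;
  // Cap the number of messages per call so one busy peer cannot keep us
  // in this loop indefinitely (and starve other work, e.g. accepts).
  while (!sock.empty() && result.handled + result.rejected < kMaxMessagesPerPoll) {
    PendingMessage msg = std::move(sock.front());
    sock.pop_front();

    // Validate the length header before allocating anything.
    if (msg.declared_len > kMaxMessageSize) {
      ++result.rejected;
      continue;
    }

    // Heap buffer scoped to this iteration: it is freed at the end of each
    // pass, whereas alloca() memory stays on the stack until the function returns.
    std::vector<char> buf(msg.declared_len);
    assert(buf.size() <= kMaxMessageSize);
    std::copy_n(msg.payload.data(), std::min(msg.payload.size(), buf.size()), buf.data());
    ++result.handled;  // a real handler would dispatch buf here
  }
  result.more_pending = !sock.empty();
  return result;
}
```

For example, with 20 small messages and one oversized message queued, the first call handles 16 and reports `more_pending`; the second handles the remaining 4 and rejects the oversized one. A VLA would also reclaim its storage on every iteration, but VLAs are a C99 feature rather than standard C++ (GCC and Clang accept them as an extension), and a large enough length would still overflow the stack, so the size check matters in either case.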
>
> I stumbled upon this while running TSQA regression tests, where TSQA kept
> complaining that the management port was not available and ATS kept
> crashing.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)