[ 
https://issues.apache.org/jira/browse/TS-4161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leif Hedstrom updated TS-4161:
------------------------------
    Labels: crash  (was: )

> ProcessManager prone to stack-overflow
> --------------------------------------
>
>                 Key: TS-4161
>                 URL: https://issues.apache.org/jira/browse/TS-4161
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Manager
>            Reporter: Gancho Tenev
>            Assignee: Gancho Tenev
>              Labels: crash
>             Fix For: 6.2.0
>
>
> ProcessManager::pollLMConnection() can get "stuck" in a loop while handling 
> big number of messages in a raw from the same socket. 
> Since alloca() is used to allocate buffers on the stack for each message read 
> from the socket, and those buffers are not released until the function 
> returns, getting "stuck" in the loop can lead to stack-overflow, fwiw same 
> could happen if the message length is big enough (accidentally or on purpose).
> It can be reproduced easily by setting up:
>             proxy.config.lm.pserver_timeout_secs: 0
>             proxy.config.lm.pserver_timeout_msecs: 0
> in records.config and running ./bin/traffic_manager. 
> ATS crashes with a segfault in a weird place (while trying to allocate with 
> malloc()). If you inspect the core you would see that it got "stuck" in the 
> loop before it crashed over-flowing the stack (kept allocating buffers on the 
> stack with alloca() until it crashed).
> It is worth considering replacing the alloca() with VLA (which "releases" 
> memory when out of scope on each iteration of the loop) or using ats_malloc() 
> which is supposedly less time-efficient but would be better to handle bigger 
> messages without worrying about stack-overflow. 
> IMO adding a message size limit check is a good practice especially with the 
> current implementation.
> If the code gets "stuck" in the while loop while reading big number of 
> messages in a row from the same socket then the port configured by 
> proxy.config.process_manager.mgmt_port becomes unavailable (connection 
> refused). Adding a limit of messages that can be processed in a row should be 
> a good idea.
> I stumbled up on this while running TSQA regression tests where TSQA kept 
> complaining that the management port is not available and the ATS kept 
> crashing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to