Hi Sachin,

On Thu, Feb 11, 2021 at 03:11:09AM +0530, Sachin Shetty wrote:
> Hi,
> 
> We have a lua block that connects to memcache when a request arrives
> 
> """
> function get_from_gds(host, port, key)    local sock = core.tcp()
> sock:settimeout(20)    local result = DOMAIN_NOT_FOUND    local
> status, error = sock:connect(host, port)    if not status then
> core.Alert(GDS_LOG_PREFIX .. "GDS_ERROR: Error in connecting:" .. key
> .. ":" .. port .. ":" .. error)        return GDS_ERROR, "Error: " ..
> error    end    sock:send(key .. "\r\n")    while true do        local
> s, status, partial = sock:receive("*l")        if not s then
>  core.Alert(GDS_LOG_PREFIX .. "GDS_ERROR: Error reading:" .. key ..
> ":" .. port .. ":" .. status)            return GDS_ERROR, status
>   end        if s == "END" then break end        result = s    end
> sock:close()    return resultend
> 
> -- Comment: get_proxy calls get_from_gds
> 
> core.register_action("get_proxy", { "http-req" }, get_proxy)
> """
> The value is cached in a haproxy map so we don't make a memcache
> connection for every request.
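> 
> For reference, get_proxy does roughly the following (a simplified
> sketch; the map path and the GDS_HOST/GDS_PORT names here are
> illustrative, not our exact config):
> 
> """
> -- look the key up in the haproxy map first, and only fall back
> -- to memcache on a miss, caching the result back into the map
> local gds_map = Map.new("/etc/haproxy/gds.map", Map._str)
> 
> function get_proxy(txn)
>     local key = txn.sf:req_fhdr("host")
>     local value = gds_map:lookup(key)
>     if value == nil then
>         value = get_from_gds(GDS_HOST, GDS_PORT, key)
>         core.set_map("/etc/haproxy/gds.map", key, value)
>     end
>     txn:set_var("txn.gds_backend", value)
> end
> """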
> 
> If we reload haproxy at peak traffic, that invalidates the map, and
> the resulting surge causes quite a few memcache connections to fail.
> The error returned is "Can't connect".
> 
> We see the following messages in dmesg
> 
> [  +0.006924] haproxy[14258]: segfault at 0 ip 00007f117fba94c4 sp
> 00007f1179eefe08 error 4 in liblua-5.3.so[7f117fba1000+37000]
> 
> HA-Proxy version 2.0.18-be8b761 2020/09/30 - https://haproxy.org/

Unfortunately, this is not enough to figure out the cause; you'll need
to enable core dumps and run the core through gdb to get a more usable
backtrace. Please take this opportunity to update, as I'm seeing 117
patches merged into 2.0 after your version, some of which affect Lua
and others thread safety. One of them is even related to Lua+maps.
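
In case it helps, the usual sequence is roughly this (the binary and
core paths are just examples, adjust them to your system):

  # in the shell or service unit that starts haproxy
  ulimit -c unlimited

  # once it crashes, open the core in gdb and dump all threads
  gdb /usr/sbin/haproxy /path/to/core
  (gdb) thread apply all bt full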

Note that if this is not urgent on your side, we have a few more fixes
pending to be backported to 2.0 that will warrant yet another release.
However, none of them seems related to your issue (though you're of
course welcome to retest with the latest 2.0 snapshot).

Willy
