Hello,

We've found an issue in haproxy-1.8.26 when using agent checks together
with the weighted least connections algorithm in multithreaded mode.  It
appears that next_eweight in struct server can be modified by another
thread while fwlc_srv_reposition is executing.  If next_eweight is set to
zero at that point, a division by zero occurs at line 59 of
src/lb_fwlc.c, in fwlc_queue_srv.
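
To make the suspected interleaving concrete, here is a minimal Python model of it. This is not HAProxy's code: `Server`, `reposition`, the `interleave` hook, `agent_sets_zero` and the `EWEIGHT_SCALE` constant are all hypothetical names chosen for illustration. The only assumption taken from the report is that the queue key is computed by dividing by next_eweight.

```python
# Toy model of the suspected race; not HAProxy source code.
EWEIGHT_SCALE = 4096  # hypothetical scale factor, chosen only for illustration


class Server:
    def __init__(self, next_eweight):
        self.served = 0                  # connections currently served
        self.next_eweight = next_eweight
        self.key = None                  # position in an imaginary tree


def reposition(srv, interleave=None):
    """Check the weight, then divide by it, with no lock in between."""
    if srv.next_eweight == 0:
        return None                      # weight-0 servers are not queued here
    if interleave is not None:
        interleave(srv)                  # stands in for another thread running now
    srv.key = srv.served * EWEIGHT_SCALE // srv.next_eweight
    return srv.key


def agent_sets_zero(srv):
    """What an agent check reporting 0% would do to the weight in this model."""
    srv.next_eweight = 0
```

In this model, `reposition(srv)` on its own succeeds, whereas `reposition(srv, interleave=agent_sets_zero)` raises ZeroDivisionError, which is the Python analogue of the SIGFPE in the backtrace.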

I notice that in haproxy-2.0.18 this section of code is protected by
HA_SPINLOCKs and I've been unable to replicate this issue in that version.
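
As a rough illustration of why holding a lock around this section would close the window, here is a small Python sketch. `threading.Lock` stands in for the HA_SPINLOCKs mentioned above; the class, the method names and the scale constant are hypothetical and do not mirror the 2.0.18 source.

```python
import threading

EWEIGHT_SCALE = 4096  # hypothetical scale factor, for illustration only


class LockedServer:
    def __init__(self, next_eweight):
        self.lock = threading.Lock()
        self.served = 1
        self.next_eweight = next_eweight
        self.key = None

    def set_weight(self, weight):
        # The writer (e.g. an agent check) takes the same lock as the reader.
        with self.lock:
            self.next_eweight = weight

    def reposition(self):
        # The check and the division happen under one lock acquisition,
        # so the weight cannot change between them.
        with self.lock:
            if self.next_eweight == 0:
                return None
            self.key = self.served * EWEIGHT_SCALE // self.next_eweight
            return self.key


def stress(srv, rounds=20000):
    """Flip the weight between 0 and 1 in one thread while repositioning here."""
    def flipper():
        for i in range(rounds):
            srv.set_weight(i % 2)

    t = threading.Thread(target=flipper)
    t.start()
    results = {srv.reposition() for _ in range(rounds)}
    t.join()
    return results
```

Running `stress(LockedServer(1))` should complete without a ZeroDivisionError, since every read of the weight is serialized against every write.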

I've written an agent, bad_agent.py (attached), which provokes this
condition by randomly reporting a weight of either 0% or 1%.  I've also
attached a minimal configuration, cfg, which seems sufficient to trigger
the issue.  With both running, the following command quickly crashes the
haproxy process:

    ab -n 5000000 -c 500 http://192.168.92.1:8080/
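
Before starting the load test it can help to confirm that the agent really alternates between the two replies. The sketch below is one way to do that; the helper names `read_agent_reply` and `sample_agent` are hypothetical, and it assumes the agent closes the connection after sending its reply, as bad_agent.py does.

```python
import socket
from collections import Counter


def read_agent_reply(host, port, timeout=2.0):
    """Connect to the agent, read until it closes, return the stripped reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        chunks = []
        while True:
            data = sock.recv(64)
            if not data:          # empty read: the agent closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode().strip()


def sample_agent(host, port, samples=20):
    """Tally the replies seen over several separate connections."""
    return Counter(read_agent_reply(host, port) for _ in range(samples))
```

With bad_agent.py running on the same machine, `sample_agent("127.0.0.1", 3333)` should return a tally whose only keys are "0%" and "1%" (the host is an assumption; port 3333 is the one the agent listens on).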

I include links to a coredump and the binary that generated it
(unstripped).  The backtrace of thread 1 follows.

GNU gdb (Debian 8.2.1-2+b3) 8.2.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./haproxy...done.
[New LWP 11307]
[New LWP 11308]

warning: Unexpected size of section `.reg-xstate/11307' in core file.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `./haproxy -db -- cfg'.
Program terminated with signal SIGFPE, Arithmetic exception.

warning: Unexpected size of section `.reg-xstate/11307' in core file.
#0  0x000056120ec24a90 in fwlc_queue_srv (s=0x56120f989e80) at src/lb_fwlc.c:59
59                      fwlc_queue_srv(s);
[Current thread is 1 (Thread 0x7f5871e723c0 (LWP 11307))]
(gdb) set pagination off
(gdb) bt full
#0  0x000056120ec24a90 in fwlc_queue_srv (s=0x56120f989e80) at src/lb_fwlc.c:59
No locals.
#1  fwlc_srv_reposition (s=0x56120f989e80) at src/lb_fwlc.c:59
No locals.
#2  0x000056120ebf5390 in connect_server (s=s@entry=0x56120fa02650) at src/backend.c:1234
        count = <optimized out>
        cli_conn = 0x7f586c504e70
        srv_conn = 0x7f586c099020
        srv_cs = <optimized out>
        old_cs = <optimized out>
        srv = <optimized out>
        reuse = <optimized out>
        err = <optimized out>
#3  0x000056120eb990f8 in sess_update_stream_int (s=0x56120fa02650) at src/stream.c:886
        conn_err = <optimized out>
        srv = 0x56120f989e80
        si = 0x56120fa028c0
        req = 0x56120fa02660
        srv = <optimized out>
        si = <optimized out>
        req = <optimized out>
        conn_err = <optimized out>
#4  process_stream (t=<optimized out>) at src/stream.c:2234
        srv = <optimized out>
        s = 0x56120fa02650
        sess = <optimized out>
        rqf_last = <optimized out>
        rpf_last = 2147483648
        rq_prod_last = <optimized out>
        rq_cons_last = <optimized out>
        rp_cons_last = <optimized out>
        rp_prod_last = <optimized out>
        req_ana_back = <optimized out>
        req = 0x56120fa02660
        res = 0x56120fa026a0
        si_f = 0x56120fa02898
        si_b = 0x56120fa028c0
#5  0x000056120ec2007d in process_runnable_tasks () at src/task.c:317
        t = <optimized out>
        i = <optimized out>
        max_processed = <optimized out>
        local_tasks = {0x56120fa029b0, 0x7f586c04d340, 0x500000004, 0xb2, 0x0, 0x56120ec2527b <mux_pt_wake+91>, 0x500000004, 0xb2, 0x0, 0x7f586c529180, 0x2c80, 0x56120ec15513 <conn_fd_handler+259>, 0x0, 0xffffffff00000000, 0x7fff32892a50, 0x7fff32892a50}
        local_tasks_count = <optimized out>
        final_tasks_count = 0
#6  0x000056120ebce414 in run_poll_loop () at src/haproxy.c:2499
        next = <optimized out>
        exp = <optimized out>
        next = <optimized out>
        exp = <optimized out>
#7  run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:2569
        ptif = <optimized out>
        ptdf = <optimized out>
        start_lock = 0
#8  0x000056120eb4212f in main (argc=<optimized out>, argv=<optimized out>) at src/haproxy.c:3172
        tids = <optimized out>
        threads = 0x56120f9861c0
        i = <optimized out>
        old_sig = {__val = {2048, 48, 72057589742960643, 94635570839576, 31, 80, 18446744073709544224, 0, 206158430211, 0, 0, 472446402651, 511101108348, 0, 140017846675184, 140017846656064}}
        blocked_sig = {__val = {18446744067199990583, 18446744073709551615 <repeats 15 times>}}
        err = <optimized out>
        retry = <optimized out>
        limit = {rlim_cur = 4027, rlim_max = 4027}
        errmsg = "\000pression support (neither USE_ZLIB nor
\000X%-x\342/6\375\000\000\000\000\000\000\000(-\211\062\377\177\000\000\350+\211\062\377\177\000\000\001\303\307\016\022V\000\000\201\000\000\000\000\000\000\000H,\211\062\377\177\000\000`C\225\017"
        pidfd = <optimized out>
(gdb)

core-1.8.26 <https://drive.google.com/file/d/18Y1Fm3nkmg9-6N-UqqdCzkynaMIuLANF/view?usp=drive_web>
haproxy-1.8.26 <https://drive.google.com/file/d/159chxrfQR1AmSGCpO5mbrhGjLxbGNCIQ/view?usp=drive_web>

-- 
Peter Statham

Loadbalancer.org Ltd.
www.loadbalancer.org <https://www.loadbalancer.org/?gclid=ES2017>

+1 888 867 9504 / +44 (0)330 380 1064
peter.stat...@loadbalancer.org

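#!/usr/bin/python3
"""bad_agent.py: agent-check responder that randomly reports 0% or 1%."""
from random import randint
from socketserver import BaseRequestHandler, TCPServer


class Handler(BaseRequestHandler):
    def handle(self):
        # Reply with "0%" or "1%" at random, terminated by CRLF.
        self.request.sendall('{}%\r\n'.format(randint(0, 1)).encode())


# Listen on all interfaces, TCP port 3333, until interrupted.
with TCPServer(("0.0.0.0", 3333), Handler) as srv:
    srv.serve_forever()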

Attachment: cfg
Description: Binary data
