Hello,

We've found an issue when using agent checks together with the weighted least connections algorithm in multithreaded mode. It appears that next_eweight in struct server can be modified by another thread while fwlc_srv_reposition is executing. If next_eweight is set to zero, a division by zero occurs on line 59 of src/lb_fwlc.c, in fwlc_queue_srv.
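The suspected interleaving can be modeled in a few lines of Python. This is only a sketch under assumptions: the Server class, the two queue_key_* functions and the SCALE constant are made up for illustration and are not HAProxy's code, and the "check, then read again" shape of the racy version is a guess at how a zero could reach the division. Each read of next_eweight returns the next value from a scripted list, which stands in for another thread writing the field between two reads.

```python
# Hypothetical model of the suspected race; none of this is HAProxy code.
SCALE = 4096  # arbitrary scale factor, chosen only for illustration


class Server:
    """Stand-in for struct server. Every read of next_eweight returns the
    next scripted value, emulating writes made by an agent-check thread."""

    def __init__(self, served, weights):
        self.served = served
        self._weights = iter(weights)

    @property
    def next_eweight(self):
        return next(self._weights)


def queue_key_racy(srv):
    # The weight is checked, then read a second time for the division.
    # Another thread may have set it to zero between the two reads.
    if srv.next_eweight == 0:
        return None  # a zero-weight server would not be queued
    return SCALE * srv.served // srv.next_eweight


def queue_key_snapshot(srv):
    # Read the weight once and use that single value for both steps.
    weight = srv.next_eweight
    if weight == 0:
        return None
    return SCALE * srv.served // weight


# First read sees 1, second read sees 0: the weight flipped in between.
try:
    queue_key_racy(Server(served=3, weights=[1, 0]))
    racy_result = "no error"
except ZeroDivisionError:
    racy_result = "division by zero"

snapshot_result = queue_key_snapshot(Server(served=3, weights=[1, 0]))
print(racy_result, snapshot_result)
```

Taking a single snapshot only avoids the arithmetic fault in this model. It does not make the rest of the repositioning consistent with concurrent weight updates, which is what a lock around the whole section would address.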
I notice that in haproxy-2.0.18 this section of code is protected by HA_SPINLOCKs and I've been unable to replicate this issue in that version.

I've written an agent (attached), bad_agent.py, which provokes this condition by switching randomly between 1 and 0 percent. I also include a minimal configuration, cfg (also attached), which seems sufficient to cause the issue.

With these two running, “ab -n 5000000 -c 500 http://192.168.92.1:8080/” will quickly crash the haproxy process.

I include links to a coredump and the binary that generated it (unstripped). The backtrace of thread 1 follows.

GNU gdb (Debian 8.2.1-2+b3) 8.2.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./haproxy...done.
[New LWP 11307]
[New LWP 11308]

warning: Unexpected size of section `.reg-xstate/11307' in core file.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `./haproxy -db -- cfg'.
Program terminated with signal SIGFPE, Arithmetic exception.

warning: Unexpected size of section `.reg-xstate/11307' in core file.
#0  0x000056120ec24a90 in fwlc_queue_srv (s=0x56120f989e80) at src/lb_fwlc.c:59
59      fwlc_queue_srv(s);
[Current thread is 1 (Thread 0x7f5871e723c0 (LWP 11307))]
(gdb) set pagination off
(gdb) bt full
#0  0x000056120ec24a90 in fwlc_queue_srv (s=0x56120f989e80) at src/lb_fwlc.c:59
No locals.
#1  fwlc_srv_reposition (s=0x56120f989e80) at src/lb_fwlc.c:59
No locals.
#2  0x000056120ebf5390 in connect_server (s=s@entry=0x56120fa02650) at src/backend.c:1234
        count = <optimized out>
        cli_conn = 0x7f586c504e70
        srv_conn = 0x7f586c099020
        srv_cs = <optimized out>
        old_cs = <optimized out>
        srv = <optimized out>
        reuse = <optimized out>
        err = <optimized out>
#3  0x000056120eb990f8 in sess_update_stream_int (s=0x56120fa02650) at src/stream.c:886
        conn_err = <optimized out>
        srv = 0x56120f989e80
        si = 0x56120fa028c0
        req = 0x56120fa02660
        srv = <optimized out>
        si = <optimized out>
        req = <optimized out>
        conn_err = <optimized out>
#4  process_stream (t=<optimized out>) at src/stream.c:2234
        srv = <optimized out>
        s = 0x56120fa02650
        sess = <optimized out>
        rqf_last = <optimized out>
        rpf_last = 2147483648
        rq_prod_last = <optimized out>
        rq_cons_last = <optimized out>
        rp_cons_last = <optimized out>
        rp_prod_last = <optimized out>
        req_ana_back = <optimized out>
        req = 0x56120fa02660
        res = 0x56120fa026a0
        si_f = 0x56120fa02898
        si_b = 0x56120fa028c0
#5  0x000056120ec2007d in process_runnable_tasks () at src/task.c:317
        t = <optimized out>
        i = <optimized out>
        max_processed = <optimized out>
        local_tasks = {0x56120fa029b0, 0x7f586c04d340, 0x500000004, 0xb2, 0x0, 0x56120ec2527b <mux_pt_wake+91>, 0x500000004, 0xb2, 0x0, 0x7f586c529180, 0x2c80, 0x56120ec15513 <conn_fd_handler+259>, 0x0, 0xffffffff00000000, 0x7fff32892a50, 0x7fff32892a50}
        local_tasks_count = <optimized out>
        final_tasks_count = 0
#6  0x000056120ebce414 in run_poll_loop () at src/haproxy.c:2499
        next = <optimized out>
        exp = <optimized out>
        next = <optimized out>
        exp = <optimized out>
#7  run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:2569
        ptif = <optimized out>
        ptdf = <optimized out>
        start_lock = 0
#8  0x000056120eb4212f in main (argc=<optimized out>, argv=<optimized out>) at src/haproxy.c:3172
        tids = <optimized out>
        threads = 0x56120f9861c0
        i = <optimized out>
        old_sig = {__val = {2048, 48, 72057589742960643, 94635570839576, 31, 80, 18446744073709544224, 0, 206158430211, 0, 0, 472446402651, 511101108348, 0, 140017846675184, 140017846656064}}
        blocked_sig = {__val = {18446744067199990583, 18446744073709551615 <repeats 15 times>}}
        err = <optimized out>
        retry = <optimized out>
        limit = {rlim_cur = 4027, rlim_max = 4027}
        errmsg = "\000pression support (neither USE_ZLIB nor \000X%-x\342/6\375\000\000\000\000\000\000\000(-\211\062\377\177\000\000\350+\211\062\377\177\000\000\001\303\307\016\022V\000\000\201\000\000\000\000\000\000\000H,\211\062\377\177\000\000`C\225\017"
        pidfd = <optimized out>
(gdb)

core-1.8.26 <https://drive.google.com/file/d/18Y1Fm3nkmg9-6N-UqqdCzkynaMIuLANF/view?usp=drive_web>
haproxy-1.8.26 <https://drive.google.com/file/d/159chxrfQR1AmSGCpO5mbrhGjLxbGNCIQ/view?usp=drive_web>

--
Peter Statham
Loadbalancer.org Ltd.
www.loadbalancer.org
+1 888 867 9504 / +44 (0)330 380 1064
peter.stat...@loadbalancer.org
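#!/usr/bin/python3
from socketserver import TCPServer, BaseRequestHandler
from random import randint


class Handler(BaseRequestHandler):
    def handle(self):
        # Reply to each agent check with "0%" or "1%", chosen at random,
        # so the server's weight keeps flipping between zero and non-zero.
        self.request.sendall('{}%\r\n'.format(randint(0, 1)).encode())


# Listen on every interface, TCP port 3333, until interrupted.
with TCPServer(("0.0.0.0", 3333), Handler) as srv:
    srv.serve_forever()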
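To check what the agent sends without involving haproxy, a small probe like the one below can help. It is a sketch under assumptions: query_agent and sample_agent are hypothetical helper names, and the Handler is copied here (bound to an OS-chosen port on 127.0.0.1) only so the example is self-contained. Each connection should yield either "0%" or "1%".

```python
#!/usr/bin/python3
import socket
import threading
from random import randint
from socketserver import TCPServer, BaseRequestHandler


class Handler(BaseRequestHandler):
    # Same behaviour as the agent above: reply "0%" or "1%" at random.
    def handle(self):
        self.request.sendall('{}%\r\n'.format(randint(0, 1)).encode())


def query_agent(host, port, timeout=2.0):
    """Connect once, read until the agent closes, return the reply without CRLF."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(64)
            if not data:
                break
            chunks.append(data)
    return b''.join(chunks).decode().strip()


def sample_agent(samples=20):
    """Start a throwaway copy of the agent and collect a few replies."""
    # Port 0 asks the OS for a free port, so this does not clash with
    # an agent already listening on 3333.
    with TCPServer(("127.0.0.1", 0), Handler) as srv:
        port = srv.server_address[1]
        thread = threading.Thread(target=srv.serve_forever, daemon=True)
        thread.start()
        try:
            return [query_agent("127.0.0.1", port) for _ in range(samples)]
        finally:
            srv.shutdown()


if __name__ == "__main__":
    print(sample_agent())
```

To probe an agent that is already running as shown above, call query_agent("127.0.0.1", 3333) directly instead of sample_agent().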