On Tue, Jun 12, 2018 at 04:00:25PM +0200, William Dauchy wrote:
> Hello William L,
>
> On Fri, Jun 08, 2018 at 04:31:30PM +0200, William Lallemand wrote:
> > That's great news!
> >
> > Here's the new patches. It shouldn't change anything to the fix, it only
> > changes the sigprocmask to pthread_sigmask.
>
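For the record, the reason for that change: POSIX leaves sigprocmask()
unspecified in a multithreaded process, while pthread_sigmask() is well
defined and only changes the calling thread's mask. A minimal sketch of the
idea (not the actual patch):

#include <pthread.h>
#include <signal.h>

static void block_worker_signals(void)
{
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    sigaddset(&set, SIGUSR2);

    /* sigprocmask() here would be unspecified with threads;
     * pthread_sigmask() blocks the signals for this thread only. */
    pthread_sigmask(SIG_BLOCK, &set, NULL);
}

int main(void)
{
    block_worker_signals();
    return 0;
}
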
> In fact, I now have a different but similar issue.
>
:(
> root 18547 3.2 1.3 986660 898844 ? Ss Jun08 182:12
> /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf
> 2063 1903 1763 1445 14593 29663 4203 18290 -x /var/lib/haproxy/stats
> haproxy 14593 299 1.3 1251216 920480 ? Rsl Jun11 5882:01 \_
> /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf
> 14582 14463 -x /var/lib/haproxy/stats
> haproxy 18290 299 1.4 1265028 935288 ? Ssl Jun11 3425:51 \_
> /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf
> 18281 18271 18261 14593 -x /var/lib/haproxy/stats
> haproxy 29663 99.9 1.4 1258024 932796 ? Ssl Jun11 1063:08 \_
> /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf
> 29653 29644 18290 14593 -x /var/lib/haproxy/stats
> haproxy 4203 99.9 1.4 1258804 933216 ? Ssl Jun11 1009:27 \_
> /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf
> 4194 4182 18290 29663 14593 -x /var/lib/haproxy/stats
> haproxy 1445 25.9 1.4 1261680 929516 ? Ssl 13:51 0:42 \_
> /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf
> 1436 29663 4203 18290 14593 -x /var/lib/haproxy/stats
> haproxy 1763 18.9 1.4 1260500 931516 ? Ssl 13:52 0:15 \_
> /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf
> 1445 14593 29663 4203 18290 -x /var/lib/haproxy/stats
> haproxy 1903 25.0 1.4 1261472 931064 ? Ssl 13:53 0:14 \_
> /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf
> 1763 1445 14593 29663 4203 18290 -x /var/lib/haproxy/stats
> haproxy 2063 52.5 1.4 1259568 927916 ? Ssl 13:53 0:19 \_
> /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf
> 1903 1763 1445 14593 29663 4203 18290 -x /var/lib/haproxy/stats
> haproxy 2602 62.0 1.4 1262220 928776 ? Rsl 13:54 0:02 \_
> /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf
> 2063 1903 1763 1445 14593 29663 4203 18290 -x /var/lib/haproxy/stats
>
>
Those processes are still using a lot of CPU...
> # cat /proc/14593/status | grep Sig
> SigQ: 0/257120
> SigPnd: 0000000000000000
> SigBlk: 0000000000000800
> SigIgn: 0000000000001800
> SigCgt: 0000000180300205
>
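Side note on decoding those masks: bit N-1 set means signal N is in the mask,
so SigBlk 0x800 is SIGUSR2 blocked (consistent with the rt_sigreturn mask
below), SigIgn 0x1800 is SIGUSR2 + SIGPIPE ignored, and SigCgt 0x180300205
includes bit 9, i.e. a handler is installed for SIGUSR1. A throwaway decoder
if you want to check other processes:

/* Usage: ./sigdecode 0000000180300205 */
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    unsigned long long mask;
    int sig;

    if (argc != 2)
        return 1;

    mask = strtoull(argv[1], NULL, 16);
    /* bit N-1 set in the /proc/<pid>/status mask means signal N */
    for (sig = 1; sig <= 64; sig++)
        if (mask & (1ULL << (sig - 1)))
            printf("%2d %s\n", sig,
                   sig < NSIG ? strsignal(sig) : "(realtime)");
    return 0;
}
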
> kill -USR1 14593 has no effect:
>
> # strace -ffff -p 14593
> strace: Process 14593 attached with 3 threads
> strace: [ Process PID=14595 runs in x32 mode. ]
This part is particularly interesting: I suppose you are not actually running
in x32, right? If I remember correctly, strace decides this from the x32 bit
(__X32_SYSCALL_BIT) in the syscall number, so a corrupted syscall number
could explain it.
I had this problem at some point but was never able to reproduce it...
We might find something interesting by looking further...
> [pid 14593] --- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=18547,
> si_uid=0} ---
> [pid 14593] rt_sigaction(SIGUSR1, {0x558357660020, [USR1],
> SA_RESTORER|SA_RESTART, 0x7f0e87671270}, {0x558357660020, [USR1],
> SA_RESTORER|SA_RESTART, 0x7f0e87671270}, 8) = 0
> [pid 14593] rt_sigreturn({mask=[USR2]}) = 7
At least you managed to strace the process while it was seen as an x32 one;
that wasn't the case for me.
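For what it's worth, the rt_sigaction() between the delivery and the
rt_sigreturn() is the pattern you get when a handler re-arms itself with
signal(): glibc's signal() sets SA_RESTART and puts the signal itself in
sa_mask, which matches the flags in your trace. A hypothetical reproducer of
that pattern (not haproxy's code):

#include <signal.h>
#include <unistd.h>

static void usr1_handler(int sig)
{
    /* Re-registering from inside the handler shows up in strace as an
     * rt_sigaction(SIGUSR1, ...) call right before rt_sigreturn(). */
    signal(sig, usr1_handler);
}

int main(void)
{
    signal(SIGUSR1, usr1_handler);
    for (;;)
        pause();
}
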
>
> however, the unix socket is on the correct process:
>
> # lsof | grep "haproxy/stats" ; ps auxwwf | grep haproxy
> haproxy 2602 haproxy 5u unix 0xffff880f902e8000 0t0
> 3333061798 /var/lib/haproxy/stats.18547.tmp
> haproxy 2602 2603 haproxy 5u unix 0xffff880f902e8000 0t0
> 3333061798 /var/lib/haproxy/stats.18547.tmp
> haproxy 2602 2604 haproxy 5u unix 0xffff880f902e8000 0t0
> 3333061798 /var/lib/haproxy/stats.18547.tmp
> haproxy 2602 2605 haproxy 5u unix 0xffff880f902e8000 0t0
> 3333061798 /var/lib/haproxy/stats.18547.tmp
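That's coherent with the seamless reload: the new process connects to the
socket given with -x and fetches the listening FDs from the old process as
SCM_RIGHTS ancillary data. A rough sketch of what the receiving side of such
a transfer looks like (hypothetical, not the actual haproxy code):

#include <sys/socket.h>
#include <sys/uio.h>
#include <string.h>

/* Receive one file descriptor over a connected unix socket. */
int recv_listener_fd(int sock)
{
    char data;
    struct iovec iov = { .iov_base = &data, .iov_len = 1 };
    union {
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } u;
    struct msghdr msg = {
        .msg_iov = &iov,
        .msg_iovlen = 1,
        .msg_control = u.buf,
        .msg_controllen = sizeof(u.buf),
    };
    struct cmsghdr *cmsg;
    int fd = -1;

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;

    /* The fd arrives as SCM_RIGHTS ancillary data, not payload. */
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg && cmsg->cmsg_level == SOL_SOCKET &&
        cmsg->cmsg_type == SCM_RIGHTS)
        memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));

    return fd;
}
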
>
> So it means it does not cause any issue for the provisioner, which talks
> to the correct process; however, there are leftover processes.
Are they still delivering traffic?
> Should I start a different thread for that issue?
>
That's not necessary, thanks.
> It seems harder to reproduce; I got the issue ~2 days after pushing it back.
>
> Thanks,
>
I'll try to reproduce this again...
--
William Lallemand