On 18.08.2021 04:14, Robert Mueller wrote:
> 
>> First, thanks for the patch.
>>
>> While the reuseport could cure (or hide if you will) the unbalancing you
>> see it makes sense to get better understanding what exactly is going on.
>>  So far we haven't seen such weird behaviour ourself neither received
>> reports about such uneven connections distribution among nginx workers.
>>
>> Any chances you have accept_mutex and/or multi_accept?  Any other ideas?
> 
> Unfortunately I'm not 100% sure what's causing it, but it's pretty easy for 
> us to reproduce even on our development machines. Just to show there's no 
> accept_mutex or multi_accept in our config.
> 
> ```
> # grep accept /etc/nginx/mail.conf
> # 
> ```
> 
> And here's what a cut down version of our config looks like.
> 
> ```
> worker_processes  auto;
> worker_shutdown_timeout 5m;
> 
> events {
>     use epoll;
>     worker_connections  65536;
> }
> ...
> mail {
>     auth_http http://unix:/var/run/nginx/mail_auth.sock:/nginx/;
>     imap_client_buffer  16k;
>     imap_capabilities "IMAP4" "IMAP4rev1" "LITERAL+" "ENABLE" "UIDPLUS" 
> "SASL-IR" "NAMESPACE" "CONDSTORE" "SORT" "LIST-EXTENDED" "QRESYNC" "MOVE" 
> "SPECIAL-USE" "CREATE-SPECIAL-USE" "IDLE";
>     ssl_session_cache shared:sslcache:50m;
>     ssl_session_timeout 30m;
> 
>     server {
>       listen 10.a.b.c:993 ssl reuseport;
>       auth_http_header "ServerHostname" "imap.foo";
>       ssl_prefer_server_ciphers on;
>       ssl_protocols ...
>       ssl_ciphers ...;
>       ssl_certificate      ...;
>       ssl_certificate_key  ...;
>       protocol imap;
>       proxy on;
>       proxy_timeout  1h;
>     }
> ```
> 
> With that on a development machine which has 4 vcpus we see:
> 
> ```
> # ps auxw | grep nginx | grep mail
> root        3839  0.0  0.0  68472  1372 ?        Ss   08:16   0:00 nginx: 
> master process /usr/local/nginx/sbin/nginx -c /etc/nginx/mail.conf
> nobody      3841  0.0  0.0  95732  3572 ?        S    08:16   0:01 nginx: 
> worker process /usr/local/nginx/sbin/nginx -c /etc/nginx/mail.conf
> nobody      3842  0.0  0.0  95732  3284 ?        S    08:16   0:01 nginx: 
> worker process /usr/local/nginx/sbin/nginx -c /etc/nginx/mail.conf
> nobody      3843  0.0  0.0  95796  4096 ?        S    08:16   0:01 nginx: 
> worker process /usr/local/nginx/sbin/nginx -c /etc/nginx/mail.conf
> nobody      3846  0.0  0.0  95732  3092 ?        S    08:16   0:01 nginx: 
> worker process /usr/local/nginx/sbin/nginx -c /etc/nginx/mail.conf
> ```
> 
> Now lets just create 1000 SSL connections and see how they get distributed 
> between those procs.
> 
> ```
> # perl -e 'use IO::Socket::SSL; for (1..1000) { push @s, 
> IO::Socket::SSL->new("imap.foo:993"); } print "done\n"; sleep 1000;'
> done
> ^Z
> [3]+  Stopped
> # for i in 3841 3842 3843 3846; do echo "$i - " `ls /proc/$i/fd | wc -l`; done
> 3841 -  335
> 3842 -  295
> 3843 -  293
> 3846 -  320
> ```
> 
> Reasonably even.
> 
> Now lets change `listen 10.a.b.c:993 ssl reuseport` to `listen 10.a.b.c:993 
> ssl` and restart.
> 
> ```
> # ps auxw | grep nginx | grep mail
> root      559885  0.0  0.0  68472  3104 ?        Ss   21:01   0:00 nginx: 
> master process /usr/local/nginx/sbin/nginx -c /etc/nginx/mail.conf
> nobody    559886  0.0  0.3  95620 30448 ?        S    21:01   0:00 nginx: 
> worker process /usr/local/nginx/sbin/nginx -c /etc/nginx/mail.conf
> nobody    559887  0.0  0.3  95620 30448 ?        S    21:01   0:00 nginx: 
> worker process /usr/local/nginx/sbin/nginx -c /etc/nginx/mail.conf
> nobody    559888  0.0  0.3  95620 30448 ?        S    21:01   0:00 nginx: 
> worker process /usr/local/nginx/sbin/nginx -c /etc/nginx/mail.conf
> nobody    559889  0.0  0.3  95620 30448 ?        S    21:01   0:00 nginx: 
> worker process /usr/local/nginx/sbin/nginx -c /etc/nginx/mail.conf
> # perl -e 'use IO::Socket::SSL; for (1..1000) { push @s, 
> IO::Socket::SSL->new("imap.foo:993"); } print "done\n"; sleep 1000;'
> done
> ^Z
> [5]+  Stopped
> # for i in 559886 559887 559888 559889; do echo "$i - " `ls /proc/$i/fd | wc 
> -l`; done
> 559886 -  1054
> 559887 -  57
> 559888 -  60
> 559889 -  57
> ```
> 
> And as you can see, a completely uneven distribution of connections between 
> processes! This doesn't just occur on our development machines either (e.g. 
> it's not related to the source IP or anything), it occurs on production 
> systems with connections arriving from real world customers and clients 
> scattered around the world.
> 
> This is a fairly standard debian buster distribution, though we use a back 
> ported newer kernel, and a recent version of nginx.
> 
> ```
> # uname -a
> Linux xyz 5.10.0-0.bpo.4-amd64 #1 SMP Debian 5.10.19-1~bpo10+1 (2021-03-13) 
> x86_64 GNU/Linux
> # /usr/local/nginx/sbin/nginx -v
> nginx version: nginx/1.20.1
> ```
> 
> As you can see, without the reuseport option, this causes severe scalability 
> problems for us.
> 
> Even without that though, it would just be nice to have some more consistency 
> of the `listen` options between http/stream/mail modules as well.
> 
This looks weird.

We'll try to reproduce this in our lab.  Thanks for the detailed script.

Maxim

-- 
Maxim Konovalov
_______________________________________________
nginx-devel mailing list
[email protected]
http://mailman.nginx.org/mailman/listinfo/nginx-devel

Reply via email to