On 2023/11/11 15:19, Stuart Henderson wrote:
> Excellent, that is very helpful.
> 
> Here's a simpler nginx.conf to reproduce. Note that the leak goes away
> if you don't use Connection: Upgrade.
> 
> Simple test tool:
> 
> pkg_add http_load
> echo http://127.0.0.1:8123/ > /tmp/urls
> http_load -rate 100 -seconds 10 /tmp/urls
> 
> 
> --------
> worker_processes  4;
> worker_rlimit_nofile 1024;
> events {
>       worker_connections  800;
> }
> 
> http {
>       server {
>               listen       127.0.0.1:8123;
>               server_name  localhost;
>               root         /var/www/htdocs;
> 
>               location / {
>                       proxy_pass https://sym.spacehopper.org/;

Err, actually don't use that one; pick something local :-)

>                       proxy_http_version 1.1;
> 
>                       proxy_set_header Connection "Upgrade";
> 
>                       #proxy_ssl_protocols TLSv1.2;
>                       proxy_ssl_protocols TLSv1.3;
>               }
>       }
> }
> --------
> 
> 
> On 2023/11/11 14:20, Tobias Fiebig wrote:
> > 
> > Moin, I ran through the experiments I had suggested. As you assumed, this is
> > indeed related to outbound TLS1.3 connections, specifically:
> > 
> > +---------------------------------------------------------------------+
> > | Config (syspatched OpenBSD 7.4)                         | Memleak?  |
> > +---------------------------------------------------------+-----------+
> > | From pkg, TLS1.3 for in and outbound                    | Yes       |
> > | From pkg, no TLS1.3 for inbound                         | Yes       |
> > | From pkg, no TLS1.3 for outbound                        | No        |
> > |                                                         |           |
> > | From ports, TLS1.3 for in and outbound, sub_http module | Yes       |
> > | From ports, no TLS1.3 for inbound, sub_http module      | Yes       |
> > | From ports, no TLS1.3 for outbound, sub_http module     | No        |
> > |                                                         |           |
> > | Selfbuild, TLS1.3 for in and outbound, sub_http module  | Yes       |
> > | Selfbuild, no TLS1.3 for inbound, sub_http module       | Yes       |
> > | Selfbuild, no TLS1.3 for outbound, sub_http module      | No        |
> > +---------------------------------------------------------+-----------+
> > 
> > I found a minimal configuration that reliably triggers the issue for a
> > standard nginx-1.24.0p0 from packages. You can find it here:
> > https://rincewind.home.aperture-labs.org/~tfiebig/malloc/reproduction/
> > 
> > The leak becomes visible within a few seconds of starting exec.py, which
> > sends a constant load of ~10 concurrent requests: the difference in
> > memory consumption shows up between two machines that are identical
> > apart from using TLS1.3 for outbound proxy connections.
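
A minimal sketch of the kind of fixed-concurrency load described above, in Python since the reproduction is driven by a Python script. It is hypothetical: it is NOT the exec.py from the reproduction directory (its contents are not in this thread), and the names `fetch` and `run_load` and the default of 10 workers are illustrative assumptions. It keeps at most `concurrency` requests in flight rather than pacing to a fixed request rate.

```python
# Hypothetical load driver: NOT the exec.py from the reproduction directory.
import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch(url: str, timeout: float = 10.0) -> int:
    """Fetch one URL and return its HTTP status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()  # drain the body so the request completes fully
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. a 502 from the proxy still counts as a reply


def run_load(url, total, concurrency=10, fetcher=fetch):
    """Send `total` requests with at most `concurrency` in flight.

    Returns (number of 200 responses, elapsed seconds).
    Connection errors (urllib.error.URLError) are not caught and propagate.
    """
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        statuses = list(pool.map(lambda _: fetcher(url), range(total)))
    elapsed = time.monotonic() - start
    ok = sum(1 for status in statuses if status == 200)
    return ok, elapsed
```

For example, `ok, secs = run_load("http://127.0.0.1:8123/", total=1000)` against the local listener from the config above, followed by `print(f"Got {ok} in {secs:.0f} seconds ({ok / secs:.2f}/s)")`, gives output in the same shape as the per-host figures below. The `fetcher` parameter only exists so the driver can be exercised without a server.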
> > 
> > The leak appears to grow linearly with the absolute number of requests
> > (~20 MB per 1k requests).
> > 
> > I also see notably higher CPU utilization (~3-5x) for the same load when
> > TLS1.3 is enabled for outbound connections. This may be due to missing
> > CPU instructions for the algorithms used in TLS1.3, but I figured it was
> > still worth noting. For example, running 100k requests against the test
> > systems gave:
> >  
> > https://rincewind.home.aperture-labs.org/~tfiebig/malloc/cpu_utilization_tls13.png
> > 
> > The resulting request rates were:
> > 
> > Non-leaking hosts:
> > #1 o74n1240-self-revp-noprx13.dus01.as59645.net: 
> >     Got 100000 in 309 seconds (323.62/s)
> > #1 o74n1240-pkg-revp-noprx13.dus01.as59645.net: 
> >     Got 100000 in 310 seconds (322.58/s)
> > #1 o74n1240-ports-revp-noprx13.dus01.as59645.net: 
> >     Got 100000 in 312 seconds (320.51/s)
> > 
> > Leaking hosts:
> > #1 o74n1240-self-revp-def.dus01.as59645.net: 
> >     Got 100000 in 346 seconds (289.02/s)
> > #1 o74n1240-ports-revp-nohttp13.dus01.as59645.net: 
> >     Got 100000 in 348 seconds (287.36/s)
> > #1 o74n1240-self-revp-nohttp13.dus01.as59645.net: 
> >     Got 100000 in 348 seconds (287.36/s)
> > #1 o74n1240-ports-revp-def.dus01.as59645.net: 
> >     Got 100000 in 351 seconds (284.90/s)
> > #1 o74n1240-pkg-revp-nohttp13.dus01.as59645.net: 
> >     Got 100000 in 377 seconds (265.25/s)
> > #1 o74n1240-pkg-revp-def.dus01.as59645.net: 
> >     Got 100000 in 382 seconds (261.78/s)
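
A quick check of these figures: each rate is simply 100,000 divided by the duration. The short labels below are abbreviations of the hostnames above, and comparing unweighted group means is an illustrative choice, not something the report itself does.

```python
# Seconds taken for 100,000 requests, per host, from the run above.
REQUESTS = 100_000
non_leaking = {"self-noprx13": 309, "pkg-noprx13": 310, "ports-noprx13": 312}
leaking = {
    "self-def": 346, "ports-nohttp13": 348, "self-nohttp13": 348,
    "ports-def": 351, "pkg-nohttp13": 377, "pkg-def": 382,
}


def rate(seconds: float) -> float:
    """Requests per second for one host."""
    return REQUESTS / seconds


def mean_rate(group: dict) -> float:
    """Unweighted mean of the per-host request rates."""
    return sum(rate(s) for s in group.values()) / len(group)


fast = mean_rate(non_leaking)
slow = mean_rate(leaking)
drop = 1 - slow / fast  # fractional throughput loss of the leaking group
print(f"non-leaking {fast:.2f}/s, leaking {slow:.2f}/s, drop {drop:.1%}")
```

With these numbers the leaking group averages roughly 13% fewer requests per second, and its spread (261.78/s to 289.02/s) is much wider than that of the non-leaking group (320.51/s to 323.62/s).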
> > 
> > After this run, the leaking instances were at ~2.9 GB of active memory,
> > while the non-leaking ones were at ~0.8 GB.
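
A worked check that these totals agree with the per-1k-requests estimate given earlier. It assumes decimal units (1 GB = 10^9 bytes; the report does not say which) and that both host groups started from the same baseline, so the whole 2.1 GB gap is attributed to the leak. The linear extrapolation relies on the linearity observed above.

```python
# Assumptions: decimal units, and identical memory baselines on both groups.
GB = 10**9
MB = 10**6
REQUESTS = 100_000
leaking_bytes = 2.9 * GB
non_leaking_bytes = 0.8 * GB

# Bytes leaked per request, and the same figure scaled to 1k requests.
per_request = (leaking_bytes - non_leaking_bytes) / REQUESTS
per_1k_mb = per_request * 1000 / MB  # compare with the ~20 MB per 1k estimate


def projected_leak_mb(n_requests: int) -> float:
    """Linear extrapolation of leaked memory (MB) after n requests."""
    return n_requests * per_request / MB


# http_load -rate 100 -seconds 10 issues about 100 * 10 = 1000 requests.
print(f"{per_request:.0f} bytes/request, {per_1k_mb:.1f} MB per 1k, "
      f"~{projected_leak_mb(100 * 10):.0f} MB for one http_load run")
```

This comes out at about 21 KB per request, i.e. ~21 MB per 1k requests, consistent with the ~20 MB per 1k observed during the shorter runs.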
> > 
> > Do you have any suggestions for what else I could test to better identify
> > what is causing this? If the config I referenced does not reproduce
> > this for you, I can also provide access to these test machines
> > (ephemeral boxes; no prod on there. ;-))
> > 
> > With best regards,
> > Tobias
> 
