Re: [PATCH] Improve performance when starting nginx with a lot of locations

2023-10-17 Thread Maxim Dounin
Hello!

On Tue, Oct 17, 2023 at 03:25:41PM +0400, Sergey Kandaurov wrote:

> > On 11 Oct 2023, at 02:56, Maxim Dounin  wrote:
> > 
> > Hello!
> > 
> > On Thu, Oct 05, 2023 at 10:51:26AM +0900, Yusuke Nojima wrote:
> > 
> >> Thank you for your comment!
> >> 
> >>> Could you please provide some more details about the use case,
> >>> such as how locations are used, and what is the approximate number
> >>> of locations being used?
> >> 
> >> Our team provides development environments to our company's engineers and 
> >> QA.
> >> In this environment, engineers and QA can freely create VMs and deploy
> >> applications on them.
> >> 
> >> Our nginx has the role of routing requests from the internet to all
> >> applications deployed in this environment.
> >> Additionally, it allows setting IP address restrictions, BASIC
> >> authentication, TLS client authentication, and other configurations
> >> for each application.
> >> 
> >> To implement these requirements, we generate a location for each 
> >> application.
> >> Currently, there are approximately 40,000 locations in our environment.
> > 
> > Thank you for the details.  Such configuration looks somewhat 
> > sub-optimal, but understandable for a development / test 
> > environment.  And certainly 40k locations is a lot for the sorting 
> > algorithm currently used.
> > 
> >>> Further, since each location contains configuration for
> >>> all modules, such configurations are expected to require a lot of
> >>> memory
> >> 
> >> Each of our nginx processes was consuming 5GB of memory in terms of
> >> resident size.
> >> This is not a problem as our servers have sufficient memory.
> >> 
> >>> Rather, I would suggest recursive top-bottom merge sort implementation
> >>> instead, which is much simpler and uses stack as temporary storage
> >>> (so it'll naturally die if there will be a queue which requires
> >>> more space for sorting than we have).
> 
> For the record, in my tests on M1, sorting 26^3 locations fits into
> a 32k stack size (a 16k stack size renders the environment unusable).
> Judging by this (unscientific) test, running out of stack should
> not be a practical issue.

The recursive function uses

ngx_queue_t  *q, tail;

on stack, that is, 3 pointers, so it would need a recursion depth of 
more than 1000 to overflow a 32k stack, which is not expected to 
happen unless there are more than 2^1000 locations (and it is 
certainly not possible to have more than 2^64 locations on 64-bit 
platforms).

For 26^3 locations, sorting would use a maximum recursion depth of 
15, and so would require something like 360 bytes of stack (which is 
certainly less than what other nginx functions require).
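
For illustration, a stand-alone sketch of that arithmetic (not part of 
the patch; it assumes a 64-bit platform and counts only the 24 bytes 
of locals per frame, ignoring return addresses and saved registers):

#include <math.h>
#include <stdio.h>

int
main(void)
{
    size_t  n = 26 * 26 * 26;                  /* 17576 locations */
    size_t  depth = (size_t) ceil(log2((double) n));
    size_t  locals = 3 * sizeof(void *);       /* q + tail per frame */

    /* prints "depth 15, ~360 bytes of locals" */
    printf("depth %zu, ~%zu bytes of locals\n", depth, depth * locals);

    return 0;
}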

> >>> 
> >>> Please take a look if it works for you:
> >> 
> >> I think this implementation is simple and easy to understand.
> >> Although the number of traversals of the list will increase compared
> >> to bottom-up, it will not affect the order.
> >> I believe this will provide sufficient optimization in terms of speed.
> > 
> > Thanks for looking.  In my limited testing, it is slightly faster 
> > than your bottom-up implementation (and significantly faster than 
> > the existing insertion sort when many locations are used).
> > 
> > Below is the full patch (code unchanged), I'll commit it as soon 
> > as some other nginx developer reviews it.
> > 
> > # HG changeset patch
> > # User Maxim Dounin 
> > # Date 1696977468 -10800
> > #  Wed Oct 11 01:37:48 2023 +0300
> > # Node ID b891840852ee5cc823eee1769d092ab50928919f
> > # Parent  cdda286c0f1b4b10f30d4eb6a63fefb9b8708ecc
> > Core: changed ngx_queue_sort() to use merge sort.
> > 
> > This improves nginx startup times significantly when using very large number
> > of locations due computational complexity of the sorting algorithm being
> 
> due to

Fixed, thnx.

> > used (insertion sort is O(n*n) on average, while merge sort is O(n*log(n))).
> 
> nitpicking: E2MANYPARENS
> 
> Using a colon might look better (I don't mind though):
> used: insertion sort is O(n*n) on average, while merge sort is O(n*log(n)).

Thanks, changed.

> > In particular, in a test configuration with 20k locations total startup
> > time is reduced from 8 seconds to 0.9 seconds.
> > 
> > Prodded by Yusuke Nojima,
> > https://mailman.nginx.org/pipermail/nginx-devel/2023-September/NUL3Y2FPPFSHMPTFTL65KXSXNTX3NQMK.html
> 
> I like the change, please commit.
> 
> The thing to keep in mind is that it pessimizes the best case
> of sorted locations, which is O(n) with the insertion sort.
> Though given that both old and new algorithms give relatively
> good numbers for the best case (and it is hard to get a
> noticeable startup delay with a very large number of locations
> in practice using merge sort), especially compared to the worst
> case of sorting perfectly reverse-sorted locations with the
> insertion sort, I believe this is acceptable.

Sure, theoretically insertion sort's best case is slightly better, 
but I don't think it matters: if at all, the differ

Re: [PATCH] Improve performance when starting nginx with a lot of locations

2023-10-17 Thread Sergey Kandaurov

> On 11 Oct 2023, at 02:56, Maxim Dounin  wrote:
> 
> Hello!
> 
> On Thu, Oct 05, 2023 at 10:51:26AM +0900, Yusuke Nojima wrote:
> 
>> Thank you for your comment!
>> 
>>> Could you please provide some more details about the use case,
>>> such as how locations are used, and what is the approximate number
>>> of locations being used?
>> 
>> Our team provides development environments to our company's engineers and QA.
>> In this environment, engineers and QA can freely create VMs and deploy
>> applications on them.
>> 
>> Our nginx has the role of routing requests from the internet to all
>> applications deployed in this environment.
>> Additionally, it allows setting IP address restrictions, BASIC
>> authentication, TLS client authentication, and other configurations
>> for each application.
>> 
>> To implement these requirements, we generate a location for each application.
>> Currently, there are approximately 40,000 locations in our environment.
> 
> Thank you for the details.  Such configuration looks somewhat 
> sub-optimal, but understandable for a development / test 
> environment.  And certainly 40k locations is a lot for the sorting 
> algorithm currently used.
> 
>>> Further, since each location contains configuration for
>>> all modules, such configurations are expected to require a lot of
>>> memory
>> 
>> Each of our nginx processes was consuming 5GB of memory in terms of
>> resident size.
>> This is not a problem as our servers have sufficient memory.
>> 
>>> Rather, I would suggest recursive top-bottom merge sort implementation
>>> instead, which is much simpler and uses stack as temporary storage
>>> (so it'll naturally die if there will be a queue which requires
>>> more space for sorting than we have).

For the record, in my tests on M1, sorting 26^3 locations fits into
a 32k stack size (a 16k stack size renders the environment unusable).
Judging by this (unscientific) test, running out of stack should
not be a practical issue.

>>> 
>>> Please take a look if it works for you:
>> 
>> I think this implementation is simple and easy to understand.
>> Although the number of traversals of the list will increase compared
>> to bottom-up, it will not affect the order.
>> I believe this will provide sufficient optimization in terms of speed.
> 
> Thanks for looking.  In my limited testing, it is slightly faster 
> than your bottom-up implementation (and significantly faster than 
> the existing insertion sort when many locations are used).
> 
> Below is the full patch (code unchanged), I'll commit it as soon 
> as some other nginx developer reviews it.
> 
> # HG changeset patch
> # User Maxim Dounin 
> # Date 1696977468 -10800
> #  Wed Oct 11 01:37:48 2023 +0300
> # Node ID b891840852ee5cc823eee1769d092ab50928919f
> # Parent  cdda286c0f1b4b10f30d4eb6a63fefb9b8708ecc
> Core: changed ngx_queue_sort() to use merge sort.
> 
> This improves nginx startup times significantly when using very large number
> of locations due computational complexity of the sorting algorithm being

due to

> used (insertion sort is O(n*n) on average, while merge sort is O(n*log(n))).

nitpicking: E2MANYPARENS

Using a colon might look better (I don't mind though):
used: insertion sort is O(n*n) on average, while merge sort is O(n*log(n)).

> In particular, in a test configuration with 20k locations total startup
> time is reduced from 8 seconds to 0.9 seconds.
> 
> Prodded by Yusuke Nojima,
> https://mailman.nginx.org/pipermail/nginx-devel/2023-September/NUL3Y2FPPFSHMPTFTL65KXSXNTX3NQMK.html

I like the change, please commit.

The thing to keep in mind is that it pessimizes the best case
of sorted locations, which is O(n) with the insertion sort.
Though given that both old and new algorithms give relatively
good numbers for the best case (and it is hard to get a
noticeable startup delay with a very large number of locations
in practice using merge sort), especially compared to the worst
case of sorting perfectly reverse-sorted locations with the
insertion sort, I believe this is acceptable.
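
As a back-of-the-envelope illustration (rough numbers, not a
measurement): for already-sorted input, insertion sort needs n - 1
comparisons, while n * log2(n) is a rough upper bound for merge sort,
and both are negligible at the sizes discussed in this thread.

#include <math.h>
#include <stdio.h>

int
main(void)
{
    double  n = 40000.0;    /* roughly the number of locations discussed */

    /* best case: input already sorted */
    printf("insertion sort: ~%.0f comparisons\n", n - 1);

    /* n * log2(n) is a rough upper bound for merge sort */
    printf("merge sort:     ~%.0f comparisons\n", n * log2(n));

    return 0;
}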

> 
> diff --git a/src/core/ngx_queue.c b/src/core/ngx_queue.c
> --- a/src/core/ngx_queue.c
> +++ b/src/core/ngx_queue.c
> @@ -9,6 +9,10 @@
> #include <ngx_core.h>
> 
> 
> +static void ngx_queue_merge(ngx_queue_t *queue, ngx_queue_t *tail,
> +ngx_int_t (*cmp)(const ngx_queue_t *, const ngx_queue_t *));
> +
> +
> /*
>  * find the middle queue element if the queue has odd number of elements
>  * or the first element of the queue's second part otherwise
> @@ -45,13 +49,13 @@ ngx_queue_middle(ngx_queue_t *queue)
> }
> 
> 
> -/* the stable insertion sort */
> +/* the stable merge sort */
> 
> void
> ngx_queue_sort(ngx_queue_t *queue,
> ngx_int_t (*cmp)(const ngx_queue_t *, const ngx_queue_t *))
> {
> -ngx_queue_t  *q, *prev, *next;
> +ngx_queue_t  *q, tail;
> 
> q = ngx_queue_head(queue);
> 
> @@ -59,22 +63,44 @@ ngx_queue_sort(ngx_queue_t *queue,
> return;
> }
> 
> -for (q = ngx_queue_next(q); q != ngx_queue_sentin

Re: [PATCH] Improve performance when starting nginx with a lot of locations

2023-10-11 Thread Yusuke Nojima
Thank you for your comments and implementation!
I am looking forward to the acceleration of nginx startup with this patch.

Wed, Oct 11, 2023, 7:56 Maxim Dounin :
>
> Hello!
>
> On Thu, Oct 05, 2023 at 10:51:26AM +0900, Yusuke Nojima wrote:
>
> > Thank you for your comment!
> >
> > > Could you please provide some more details about the use case,
> > > such as how locations are used, and what is the approximate number
> > > of locations being used?
> >
> > Our team provides development environments to our company's engineers and 
> > QA.
> > In this environment, engineers and QA can freely create VMs and deploy
> > applications on them.
> >
> > Our nginx has the role of routing requests from the internet to all
> > applications deployed in this environment.
> > Additionally, it allows setting IP address restrictions, BASIC
> > authentication, TLS client authentication, and other configurations
> > for each application.
> >
> > To implement these requirements, we generate a location for each 
> > application.
> > Currently, there are approximately 40,000 locations in our environment.
>
> Thank you for the details.  Such configuration looks somewhat
> sub-optimal, but understandable for a development / test
> environment.  And certainly 40k locations is a lot for the sorting
> algorithm currently used.
>
> > > Further, since each location contains configuration for
> > > all modules, such configurations are expected to require a lot of
> > > memory
> >
> > Each of our nginx processes was consuming 5GB of memory in terms of
> > resident size.
> > This is not a problem as our servers have sufficient memory.
> >
> > > Rather, I would suggest recursive top-bottom merge sort implementation
> > > instead, which is much simpler and uses stack as temporary storage
> > > (so it'll naturally die if there will be a queue which requires
> > > more space for sorting than we have).
> > >
> > > Please take a look if it works for you:
> >
> > I think this implementation is simple and easy to understand.
> > Although the number of traversals of the list will increase compared
> > to bottom-up, it will not affect the order.
> > I believe this will provide sufficient optimization in terms of speed.
>
> Thanks for looking.  In my limited testing, it is slightly faster
> than your bottom-up implementation (and significantly faster than
> the existing insertion sort when many locations are used).
>
> Below is the full patch (code unchanged), I'll commit it as soon
> as some other nginx developer reviews it.
>
> # HG changeset patch
> # User Maxim Dounin 
> # Date 1696977468 -10800
> #  Wed Oct 11 01:37:48 2023 +0300
> # Node ID b891840852ee5cc823eee1769d092ab50928919f
> # Parent  cdda286c0f1b4b10f30d4eb6a63fefb9b8708ecc
> Core: changed ngx_queue_sort() to use merge sort.
>
> This improves nginx startup times significantly when using very large number
> of locations due computational complexity of the sorting algorithm being
> used (insertion sort is O(n*n) on average, while merge sort is O(n*log(n))).
> In particular, in a test configuration with 20k locations total startup
> time is reduced from 8 seconds to 0.9 seconds.
>
> Prodded by Yusuke Nojima,
> https://mailman.nginx.org/pipermail/nginx-devel/2023-September/NUL3Y2FPPFSHMPTFTL65KXSXNTX3NQMK.html
>
> diff --git a/src/core/ngx_queue.c b/src/core/ngx_queue.c
> --- a/src/core/ngx_queue.c
> +++ b/src/core/ngx_queue.c
> @@ -9,6 +9,10 @@
>  #include <ngx_core.h>
>
>
> +static void ngx_queue_merge(ngx_queue_t *queue, ngx_queue_t *tail,
> +ngx_int_t (*cmp)(const ngx_queue_t *, const ngx_queue_t *));
> +
> +
>  /*
>   * find the middle queue element if the queue has odd number of elements
>   * or the first element of the queue's second part otherwise
> @@ -45,13 +49,13 @@ ngx_queue_middle(ngx_queue_t *queue)
>  }
>
>
> -/* the stable insertion sort */
> +/* the stable merge sort */
>
>  void
>  ngx_queue_sort(ngx_queue_t *queue,
>  ngx_int_t (*cmp)(const ngx_queue_t *, const ngx_queue_t *))
>  {
> -ngx_queue_t  *q, *prev, *next;
> +ngx_queue_t  *q, tail;
>
>  q = ngx_queue_head(queue);
>
> @@ -59,22 +63,44 @@ ngx_queue_sort(ngx_queue_t *queue,
>  return;
>  }
>
> -for (q = ngx_queue_next(q); q != ngx_queue_sentinel(queue); q = next) {
> +q = ngx_queue_middle(queue);
> +
> +ngx_queue_split(queue, q, &tail);
> +
> +ngx_queue_sort(queue, cmp);
> +ngx_queue_sort(&tail, cmp);
> +
> +ngx_queue_merge(queue, &tail, cmp);
> +}
>
> -prev = ngx_queue_prev(q);
> -next = ngx_queue_next(q);
>
> -ngx_queue_remove(q);
> +static void
> +ngx_queue_merge(ngx_queue_t *queue, ngx_queue_t *tail,
> +ngx_int_t (*cmp)(const ngx_queue_t *, const ngx_queue_t *))
> +{
> +ngx_queue_t  *q1, *q2;
> +
> +q1 = ngx_queue_head(queue);
> +q2 = ngx_queue_head(tail);
>
> -do {
> -if (cmp(prev, q) <= 0) {
> -break;
> -}
> +for ( ;; ) {
> +if (q1 == ngx_queue_sentinel(queue)) 

Re: [PATCH] Improve performance when starting nginx with a lot of locations

2023-10-10 Thread Maxim Dounin
Hello!

On Thu, Oct 05, 2023 at 10:51:26AM +0900, Yusuke Nojima wrote:

> Thank you for your comment!
> 
> > Could you please provide some more details about the use case,
> > such as how locations are used, and what is the approximate number
> > of locations being used?
> 
> Our team provides development environments to our company's engineers and QA.
> In this environment, engineers and QA can freely create VMs and deploy
> applications on them.
> 
> Our nginx has the role of routing requests from the internet to all
> applications deployed in this environment.
> Additionally, it allows setting IP address restrictions, BASIC
> authentication, TLS client authentication, and other configurations
> for each application.
> 
> To implement these requirements, we generate a location for each application.
> Currently, there are approximately 40,000 locations in our environment.

Thank you for the details.  Such configuration looks somewhat 
sub-optimal, but understandable for a development / test 
environment.  And certainly 40k locations is a lot for the sorting 
algorithm currently used.

> > Further, since each location contains configuration for
> > all modules, such configurations are expected to require a lot of
> > memory
> 
> Each of our nginx processes was consuming 5GB of memory in terms of
> resident size.
> This is not a problem as our servers have sufficient memory.
> 
> > Rather, I would suggest recursive top-bottom merge sort implementation
> > instead, which is much simpler and uses stack as temporary storage
> > (so it'll naturally die if there will be a queue which requires
> > more space for sorting than we have).
> >
> > Please take a look if it works for you:
> 
> I think this implementation is simple and easy to understand.
> Although the number of traversals of the list will increase compared
> to bottom-up, it will not affect the order.
> I believe this will provide sufficient optimization in terms of speed.

Thanks for looking.  In my limited testing, it is slightly faster 
than your bottom-up implementation (and significantly faster than 
the existing insertion sort when many locations are used).

Below is the full patch (code unchanged), I'll commit it as soon 
as some other nginx developer reviews it.

# HG changeset patch
# User Maxim Dounin 
# Date 1696977468 -10800
#  Wed Oct 11 01:37:48 2023 +0300
# Node ID b891840852ee5cc823eee1769d092ab50928919f
# Parent  cdda286c0f1b4b10f30d4eb6a63fefb9b8708ecc
Core: changed ngx_queue_sort() to use merge sort.

This improves nginx startup times significantly when using very large number
of locations due computational complexity of the sorting algorithm being
used (insertion sort is O(n*n) on average, while merge sort is O(n*log(n))).
In particular, in a test configuration with 20k locations total startup
time is reduced from 8 seconds to 0.9 seconds.

Prodded by Yusuke Nojima,
https://mailman.nginx.org/pipermail/nginx-devel/2023-September/NUL3Y2FPPFSHMPTFTL65KXSXNTX3NQMK.html

diff --git a/src/core/ngx_queue.c b/src/core/ngx_queue.c
--- a/src/core/ngx_queue.c
+++ b/src/core/ngx_queue.c
@@ -9,6 +9,10 @@
 #include <ngx_core.h>
 
 
+static void ngx_queue_merge(ngx_queue_t *queue, ngx_queue_t *tail,
+ngx_int_t (*cmp)(const ngx_queue_t *, const ngx_queue_t *));
+
+
 /*
  * find the middle queue element if the queue has odd number of elements
  * or the first element of the queue's second part otherwise
@@ -45,13 +49,13 @@ ngx_queue_middle(ngx_queue_t *queue)
 }
 
 
-/* the stable insertion sort */
+/* the stable merge sort */
 
 void
 ngx_queue_sort(ngx_queue_t *queue,
 ngx_int_t (*cmp)(const ngx_queue_t *, const ngx_queue_t *))
 {
-ngx_queue_t  *q, *prev, *next;
+ngx_queue_t  *q, tail;
 
 q = ngx_queue_head(queue);
 
@@ -59,22 +63,44 @@ ngx_queue_sort(ngx_queue_t *queue,
 return;
 }
 
-for (q = ngx_queue_next(q); q != ngx_queue_sentinel(queue); q = next) {
+q = ngx_queue_middle(queue);
+
+ngx_queue_split(queue, q, &tail);
+
+ngx_queue_sort(queue, cmp);
+ngx_queue_sort(&tail, cmp);
+
+ngx_queue_merge(queue, &tail, cmp);
+}
 
-prev = ngx_queue_prev(q);
-next = ngx_queue_next(q);
 
-ngx_queue_remove(q);
+static void
+ngx_queue_merge(ngx_queue_t *queue, ngx_queue_t *tail,
+ngx_int_t (*cmp)(const ngx_queue_t *, const ngx_queue_t *))
+{
+ngx_queue_t  *q1, *q2;
+
+q1 = ngx_queue_head(queue);
+q2 = ngx_queue_head(tail);
 
-do {
-if (cmp(prev, q) <= 0) {
-break;
-}
+for ( ;; ) {
+if (q1 == ngx_queue_sentinel(queue)) {
+ngx_queue_add(queue, tail);
+break;
+}
+
+if (q2 == ngx_queue_sentinel(tail)) {
+break;
+}
 
-prev = ngx_queue_prev(prev);
+if (cmp(q1, q2) <= 0) {
+q1 = ngx_queue_next(q1);
+continue;
+}
 
-} while (prev != ngx_queue_sentinel(queue));
+ngx_queue_remove(q2);
+   

Re: [PATCH] Improve performance when starting nginx with a lot of locations

2023-10-05 Thread Yusuke Nojima
> BASIC auth is known to be CPU consuming as well, since a hash is calculated
> on every HTTP request.
> What is the ratio of authenticated requests? Do you see high CPU consumption?

First of all, this patch is related to the startup time of nginx and
not to the processing time per request.
The processing speed per request is acceptable for our use case.

In our environment, requests with BASIC authentication are very rare.
Most users prefer the simpler IP address restriction method.


Thu, Oct 5, 2023, 15:17 Илья Шипицин :
>
>
>
> Thu, Oct 5, 2023 at 03:51, Yusuke Nojima :
>>
>> Thank you for your comment!
>>
>> > Could you please provide some more details about the use case,
>> > such as how locations are used, and what is the approximate number
>> > of locations being used?
>>
>> Our team provides development environments to our company's engineers and QA.
>> In this environment, engineers and QA can freely create VMs and deploy
>> applications on them.
>>
>> Our nginx has the role of routing requests from the internet to all
>> applications deployed in this environment.
>> Additionally, it allows setting IP address restrictions, BASIC
>> authentication, TLS client authentication, and other configurations
>> for each application.
>
>
> BASIC auth is known to be CPU consuming as well, since a hash is calculated
> on every HTTP request.
> What is the ratio of authenticated requests? Do you see high CPU consumption?
>
>>
>>
>> To implement these requirements, we generate a location for each application.
>> Currently, there are approximately 40,000 locations in our environment.
>>
>> > Further, since each location contains configuration for
>> > all modules, such configurations are expected to require a lot of
>> > memory
>>
>> Each of our nginx processes was consuming 5GB of memory in terms of
>> resident size.
>> This is not a problem as our servers have sufficient memory.
>>
>> > Rather, I would suggest recursive top-bottom merge sort implementation
>> > instead, which is much simpler and uses stack as temporary storage
>> > (so it'll naturally die if there will be a queue which requires
>> > more space for sorting than we have).
>> >
>> > Please take a look if it works for you:
>>
>> I think this implementation is simple and easy to understand.
>> Although the number of traversals of the list will increase compared
>> to bottom-up, it will not affect the order.
>> I believe this will provide sufficient optimization in terms of speed.
>>
>>
>> Sat, Sep 30, 2023, 12:38 Maxim Dounin :
>> >
>> > Hello!
>> >
>> > On Fri, Sep 22, 2023 at 03:58:41PM +0900, Yusuke Nojima wrote:
>> >
>> > > # HG changeset patch
>> > > # User Yusuke Nojima 
>> > > # Date 1679555707 -32400
>> > > #  Thu Mar 23 16:15:07 2023 +0900
>> > > # Node ID 6aac98fb135e47ca9cf7ad7d780cf4a10e9aa55c
>> > > # Parent  8771d35d55d0a2b1cefaab04401d6f837f5a05a2
>> > > Improve performance when starting nginx with a lot of locations
>> > >
>> > > Our team has a configuration file with a very large number of
>> > > locations, and we found that starting nginx with this file takes an
>> > > unacceptable amount of time. After investigating the issue, we
>> > > discovered that the root cause of the long startup time is the sorting
>> > > of the location list.
>> >
>> > Interesting.
>> >
>> > Could you please provide some more details about the use case,
>> > such as how locations are used, and what is the approximate number
>> > of locations being used?
>> >
>> > In my practice, it is extremely uncommon to use more than 1k-10k
>> > prefix locations (and even these numbers are huge for normal
>> > setups).  Further, since each location contains configuration for
>> > all modules, such configurations are expected to require a lot of
>> > memory (currently about 5-10KB per empty location, so about
>> > 50-100MB per 10k locations, and 0.5-1G per 100k locations).
>> > Accordingly, using other approaches such as map (assuming exact
>> > match is actually needed) might be beneficial regardless of the
>> > sorting costs.
>> >
>> > Nevertheless, swapping the sorting algorithm to a faster one looks
>> > like an obvious improvement.
>> >
>> > >
>> > > Currently, the sorting algorithm used in nginx is insertion sort,
>> > > which requires O(n^2) time for n locations. We have modified the
>> > > sorting algorithm to use merge sort instead, which has a time
>> > > complexity of O(n log n).
>> > >
>> > > We have tested the modified code using micro-benchmarks and confirmed
>> > > that the new algorithm improves nginx startup time significantly
>> > > (shown below). We believe that this change would be valuable for other
>> > > users who are experiencing similar issues.
>> > >
>> > >
>> > > Table: nginx startup time in seconds
>> > >
>> > > n current patched
>> > > 2000 0.033 0.018
>> > > 4000 0.047 0.028
>> > > 6000 0.062 0.038
>> > > 8000 0.079 0.050
>> > > 10000 0.091 0.065
>> > > 12000 0.367 0.081
>> > > 14000 0.683 0.086
>> > > 16000 0.899 0.097
>> > > 18000 1.145 0.110

Re: [PATCH] Improve performance when starting nginx with a lot of locations

2023-10-04 Thread Илья Шипицин
Thu, Oct 5, 2023 at 03:51, Yusuke Nojima :

> Thank you for your comment!
>
> > Could you please provide some more details about the use case,
> > such as how locations are used, and what is the approximate number
> > of locations being used?
>
> Our team provides development environments to our company's engineers and
> QA.
> In this environment, engineers and QA can freely create VMs and deploy
> applications on them.
>
> Our nginx has the role of routing requests from the internet to all
> applications deployed in this environment.
> Additionally, it allows setting IP address restrictions, BASIC
> authentication, TLS client authentication, and other configurations
> for each application.
>

BASIC auth is known to be CPU consuming as well, since a hash is calculated
on every HTTP request.
What is the ratio of authenticated requests? Do you see high CPU consumption?


>
> To implement these requirements, we generate a location for each
> application.
> Currently, there are approximately 40,000 locations in our environment.
>
> > Further, since each location contains configuration for
> > all modules, such configurations are expected to require a lot of
> > memory
>
> Each of our nginx processes was consuming 5GB of memory in terms of
> resident size.
> This is not a problem as our servers have sufficient memory.
>
> > Rather, I would suggest recursive top-bottom merge sort implementation
> > instead, which is much simpler and uses stack as temporary storage
> > (so it'll naturally die if there will be a queue which requires
> > more space for sorting than we have).
> >
> > Please take a look if it works for you:
>
> I think this implementation is simple and easy to understand.
> Although the number of traversals of the list will increase compared
> to bottom-up, it will not affect the order.
> I believe this will provide sufficient optimization in terms of speed.
>
>
> Sat, Sep 30, 2023, 12:38 Maxim Dounin :
> >
> > Hello!
> >
> > On Fri, Sep 22, 2023 at 03:58:41PM +0900, Yusuke Nojima wrote:
> >
> > > # HG changeset patch
> > > # User Yusuke Nojima 
> > > # Date 1679555707 -32400
> > > #  Thu Mar 23 16:15:07 2023 +0900
> > > # Node ID 6aac98fb135e47ca9cf7ad7d780cf4a10e9aa55c
> > > # Parent  8771d35d55d0a2b1cefaab04401d6f837f5a05a2
> > > Improve performance when starting nginx with a lot of locations
> > >
> > > Our team has a configuration file with a very large number of
> > > locations, and we found that starting nginx with this file takes an
> > > unacceptable amount of time. After investigating the issue, we
> > > discovered that the root cause of the long startup time is the sorting
> > > of the location list.
> >
> > Interesting.
> >
> > Could you please provide some more details about the use case,
> > such as how locations are used, and what is the approximate number
> > of locations being used?
> >
> > In my practice, it is extremely uncommon to use more than 1k-10k
> > prefix locations (and even these numbers are huge for normal
> > setups).  Further, since each location contains configuration for
> > all modules, such configurations are expected to require a lot of
> > memory (currently about 5-10KB per empty location, so about
> > 50-100MB per 10k locations, and 0.5-1G per 100k locations).
> > Accordingly, using other approaches such as map (assuming exact
> > match is actually needed) might be beneficial regardless of the
> > sorting costs.
> >
> > Nevertheless, swapping the sorting algorithm to a faster one looks
> > like an obvious improvement.
> >
> > >
> > > Currently, the sorting algorithm used in nginx is insertion sort,
> > > which requires O(n^2) time for n locations. We have modified the
> > > sorting algorithm to use merge sort instead, which has a time
> > > complexity of O(n log n).
> > >
> > > We have tested the modified code using micro-benchmarks and confirmed
> > > that the new algorithm improves nginx startup time significantly
> > > (shown below). We believe that this change would be valuable for other
> > > users who are experiencing similar issues.
> > >
> > >
> > > Table: nginx startup time in seconds
> > >
> > > n current patched
> > > 2000 0.033 0.018
> > > 4000 0.047 0.028
> > > 6000 0.062 0.038
> > > 8000 0.079 0.050
> > > 10000 0.091 0.065
> > > 12000 0.367 0.081
> > > 14000 0.683 0.086
> > > 16000 0.899 0.097
> > > 18000 1.145 0.110
> > > 20000 1.449 0.122
> > > 22000 1.650 0.137
> > > 24000 2.155 0.151
> > > 26000 3.096 0.155
> > > 28000 3.711 0.168
> > > 30000 3.539 0.184
> > > 32000 3.980 0.193
> > > 34000 4.543 0.208
> > > 36000 4.349 0.217
> > > 38000 5.021 0.229
> > > 40000 4.918 0.245
> > > 42000 4.835 0.256
> > > 44000 5.159 0.262
> > > 46000 5.802 0.331
> > > 48000 6.205 0.295
> > > 50000 5.701 0.308
> > > 52000 5.992 0.335
> > > 54000 6.561 0.323
> > > 56000 6.856 0.333
> > > 58000 6.515 0.347
> > > 60000 7.051 0.359
> > > 62000 6.956 0.377
> > > 64000 7.376 0.376
> > > 66000 7.506 0.404
> > > 68000 7.292 0.407
> > > 70000 7.422 0.461
> > > 72000 10.090 0.443
> >

Re: [PATCH] Improve performance when starting nginx with a lot of locations

2023-10-04 Thread Yusuke Nojima
Thank you for your comment!

> Could you please provide some more details about the use case,
> such as how locations are used, and what is the approximate number
> of locations being used?

Our team provides development environments to our company's engineers and QA.
In this environment, engineers and QA can freely create VMs and deploy
applications on them.

Our nginx has the role of routing requests from the internet to all
applications deployed in this environment.
Additionally, it allows setting IP address restrictions, BASIC
authentication, TLS client authentication, and other configurations
for each application.

To implement these requirements, we generate a location for each application.
Currently, there are approximately 40,000 locations in our environment.

> Further, since each location contains configuration for
> all modules, such configurations are expected to require a lot of
> memory

Each of our nginx processes was consuming 5GB of memory in terms of
resident size.
This is not a problem as our servers have sufficient memory.

> Rather, I would suggest recursive top-bottom merge sort implementation
> instead, which is much simpler and uses stack as temporary storage
> (so it'll naturally die if there will be a queue which requires
> more space for sorting than we have).
>
> Please take a look if it works for you:

I think this implementation is simple and easy to understand.
Although the number of traversals of the list will increase compared
to bottom-up, it will not affect the order.
I believe this will provide sufficient optimization in terms of speed.


Sat, Sep 30, 2023, 12:38 Maxim Dounin :
>
> Hello!
>
> On Fri, Sep 22, 2023 at 03:58:41PM +0900, Yusuke Nojima wrote:
>
> > # HG changeset patch
> > # User Yusuke Nojima 
> > # Date 1679555707 -32400
> > #  Thu Mar 23 16:15:07 2023 +0900
> > # Node ID 6aac98fb135e47ca9cf7ad7d780cf4a10e9aa55c
> > # Parent  8771d35d55d0a2b1cefaab04401d6f837f5a05a2
> > Improve performance when starting nginx with a lot of locations
> >
> > Our team has a configuration file with a very large number of
> > locations, and we found that starting nginx with this file takes an
> > unacceptable amount of time. After investigating the issue, we
> > discovered that the root cause of the long startup time is the sorting
> > of the location list.
>
> Interesting.
>
> Could you please provide some more details about the use case,
> such as how locations are used, and what is the approximate number
> of locations being used?
>
> In my practice, it is extremely uncommon to use more than 1k-10k
> prefix locations (and even these numbers are huge for normal
> setups).  Further, since each location contains configuration for
> all modules, such configurations are expected to require a lot of
> memory (currently about 5-10KB per empty location, so about
> 50-100MB per 10k locations, and 0.5-1G per 100k locations).
> Accordingly, using other approaches such as map (assuming exact
> match is actually needed) might be beneficial regardless of the
> sorting costs.
>
> Nevertheless, swapping the sorting algorithm to a faster one looks
> like an obvious improvement.
>
> >
> > Currently, the sorting algorithm used in nginx is insertion sort,
> > which requires O(n^2) time for n locations. We have modified the
> > sorting algorithm to use merge sort instead, which has a time
> > complexity of O(n log n).
> >
> > We have tested the modified code using micro-benchmarks and confirmed
> > that the new algorithm improves nginx startup time significantly
> > (shown below). We believe that this change would be valuable for other
> > users who are experiencing similar issues.
> >
> >
> > Table: nginx startup time in seconds
> >
> > n current patched
> > 2000 0.033 0.018
> > 4000 0.047 0.028
> > 6000 0.062 0.038
> > 8000 0.079 0.050
> > 10000 0.091 0.065
> > 12000 0.367 0.081
> > 14000 0.683 0.086
> > 16000 0.899 0.097
> > 18000 1.145 0.110
> > 20000 1.449 0.122
> > 22000 1.650 0.137
> > 24000 2.155 0.151
> > 26000 3.096 0.155
> > 28000 3.711 0.168
> > 30000 3.539 0.184
> > 32000 3.980 0.193
> > 34000 4.543 0.208
> > 36000 4.349 0.217
> > 38000 5.021 0.229
> > 40000 4.918 0.245
> > 42000 4.835 0.256
> > 44000 5.159 0.262
> > 46000 5.802 0.331
> > 48000 6.205 0.295
> > 50000 5.701 0.308
> > 52000 5.992 0.335
> > 54000 6.561 0.323
> > 56000 6.856 0.333
> > 58000 6.515 0.347
> > 60000 7.051 0.359
> > 62000 6.956 0.377
> > 64000 7.376 0.376
> > 66000 7.506 0.404
> > 68000 7.292 0.407
> > 70000 7.422 0.461
> > 72000 10.090 0.443
> > 74000 18.505 0.463
> > 76000 11.857 0.462
> > 78000 9.752 0.470
> > 80000 12.485 0.481
> > 82000 11.027 0.498
> > 84000 9.804 0.523
> > 86000 8.482 0.515
> > 88000 9.838 0.560
> > 90000 12.341 0.546
> > 92000 13.881 0.648
> > 94000 8.309 0.635
> > 96000 8.854 0.650
> > 98000 12.871 0.674
> > 100000 8.261 0.698
>
> This probably can be reduced to something like 3-5 data points.
>
> >
> > diff -r 8771d35d55d0 -r 6aac98fb135e src/core/ngx_queue.c
> > --- a/src/core/ngx_q

Re: [PATCH] Improve performance when starting nginx with a lot of locations

2023-09-29 Thread Maxim Dounin
Hello!

On Fri, Sep 22, 2023 at 03:58:41PM +0900, Yusuke Nojima wrote:

> # HG changeset patch
> # User Yusuke Nojima 
> # Date 1679555707 -32400
> #  Thu Mar 23 16:15:07 2023 +0900
> # Node ID 6aac98fb135e47ca9cf7ad7d780cf4a10e9aa55c
> # Parent  8771d35d55d0a2b1cefaab04401d6f837f5a05a2
> Improve performance when starting nginx with a lot of locations
> 
> Our team has a configuration file with a very large number of
> locations, and we found that starting nginx with this file takes an
> unacceptable amount of time. After investigating the issue, we
> discovered that the root cause of the long startup time is the sorting
> of the location list.

Interesting.

Could you please provide some more details about the use case, 
such as how locations are used, and what is the approximate number 
of locations being used?

In my practice, it is extremely uncommon to use more than 1k-10k 
prefix locations (and even these numbers are huge for normal 
setups).  Further, since each location contains configuration for 
all modules, such configurations are expected to require a lot of 
memory (currently about 5-10KB per empty location, so about 
50-100MB per 10k locations, and 0.5-1G per 100k locations).  
Accordingly, using other approaches such as map (assuming exact 
match is actually needed) might be beneficial regardless of the 
sorting costs.

Nevertheless, swapping the sorting algorithm to a faster one looks 
like an obvious improvement.

> 
> Currently, the sorting algorithm used in nginx is insertion sort,
> which requires O(n^2) time for n locations. We have modified the
> sorting algorithm to use merge sort instead, which has a time
> complexity of O(n log n).
> 
> We have tested the modified code using micro-benchmarks and confirmed
> that the new algorithm improves nginx startup time significantly
> (shown below). We believe that this change would be valuable for other
> users who are experiencing similar issues.
> 
> 
> Table: nginx startup time in seconds
> 
> n current patched
> 2000 0.033 0.018
> 4000 0.047 0.028
> 6000 0.062 0.038
> 8000 0.079 0.050
> 10000 0.091 0.065
> 12000 0.367 0.081
> 14000 0.683 0.086
> 16000 0.899 0.097
> 18000 1.145 0.110
> 20000 1.449 0.122
> 22000 1.650 0.137
> 24000 2.155 0.151
> 26000 3.096 0.155
> 28000 3.711 0.168
> 30000 3.539 0.184
> 32000 3.980 0.193
> 34000 4.543 0.208
> 36000 4.349 0.217
> 38000 5.021 0.229
> 40000 4.918 0.245
> 42000 4.835 0.256
> 44000 5.159 0.262
> 46000 5.802 0.331
> 48000 6.205 0.295
> 50000 5.701 0.308
> 52000 5.992 0.335
> 54000 6.561 0.323
> 56000 6.856 0.333
> 58000 6.515 0.347
> 60000 7.051 0.359
> 62000 6.956 0.377
> 64000 7.376 0.376
> 66000 7.506 0.404
> 68000 7.292 0.407
> 70000 7.422 0.461
> 72000 10.090 0.443
> 74000 18.505 0.463
> 76000 11.857 0.462
> 78000 9.752 0.470
> 80000 12.485 0.481
> 82000 11.027 0.498
> 84000 9.804 0.523
> 86000 8.482 0.515
> 88000 9.838 0.560
> 90000 12.341 0.546
> 92000 13.881 0.648
> 94000 8.309 0.635
> 96000 8.854 0.650
> 98000 12.871 0.674
> 100000 8.261 0.698

This probably can be reduced to something like 3-5 data points.

> 
> diff -r 8771d35d55d0 -r 6aac98fb135e src/core/ngx_queue.c
> --- a/src/core/ngx_queue.c Fri Mar 10 07:43:50 2023 +0300
> +++ b/src/core/ngx_queue.c Thu Mar 23 16:15:07 2023 +0900
> @@ -45,36 +45,103 @@
>  }
> 
> 
> -/* the stable insertion sort */
> +/* merge queue2 into queue1. queue2 becomes empty after merge. */
> +
> +static void
> +ngx_queue_merge(ngx_queue_t *queue1, ngx_queue_t *queue2,
> +ngx_int_t (*cmp)(const ngx_queue_t *, const ngx_queue_t *))
> +{
> +ngx_queue_t *p1, *p2;

Nitpicking: there are various style issues here and in other places.

> +
> +p1 = ngx_queue_head(queue1);
> +p2 = ngx_queue_head(queue2);
> +
> +while (p1 != ngx_queue_sentinel(queue1)
> +&& p2 != ngx_queue_sentinel(queue2)) {
> +
> +if (cmp(p1, p2) > 0) {
> +ngx_queue_t *next, *prev;
> +
> +next = ngx_queue_next(p2);
> +ngx_queue_remove(p2);
> +prev = ngx_queue_prev(p1);
> +ngx_queue_insert_after(prev, p2);
> +p2 = next;

Nitpicking: there is no need to preserve "next" here, since p2 is 
always the head of queue2, and the next element can be 
obtained by ngx_queue_head(queue2).

Also, instead of obtaining "prev" and using 
ngx_queue_insert_after() it would be easier to use 
ngx_queue_insert_before().  It is not currently defined, but it is 
trivial to define one: it is an alias to ngx_queue_insert_tail(), 
much like ngx_queue_insert_after() is an alias to 
ngx_queue_insert_head().
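
For illustration, such an alias could be as simple as the sketch below
(mirroring how ngx_queue_insert_after() is defined in ngx_queue.h; this
is not part of the posted patch):

/* insert x immediately before h, by analogy with ngx_queue_insert_after() */
#define ngx_queue_insert_before   ngx_queue_insert_tail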

> +} else {
> +p1 = ngx_queue_next(p1);
> +}
> +}
> +if (p2 != ngx_queue_sentinel(queue2)) {
> +ngx_queue_add(queue1, queue2);
> +ngx_queue_init(queue2);
> +}
> +}
> +
> +
> +/* move all elements from src to dest. dest should be empty before call. */
> +
> +static void
> +ngx_queue_move(ngx_queue_t *dest, ngx_queue_t *src)
> +{
> +*dest = *src;
>

[PATCH] Improve performance when starting nginx with a lot of locations

2023-09-21 Thread Yusuke Nojima
# HG changeset patch
# User Yusuke Nojima 
# Date 1679555707 -32400
#  Thu Mar 23 16:15:07 2023 +0900
# Node ID 6aac98fb135e47ca9cf7ad7d780cf4a10e9aa55c
# Parent  8771d35d55d0a2b1cefaab04401d6f837f5a05a2
Improve performance when starting nginx with a lot of locations

Our team has a configuration file with a very large number of
locations, and we found that starting nginx with this file takes an
unacceptable amount of time. After investigating the issue, we
discovered that the root cause of the long startup time is the sorting
of the location list.

Currently, the sorting algorithm used in nginx is insertion sort,
which requires O(n^2) time for n locations. We have modified the
sorting algorithm to use merge sort instead, which has a time
complexity of O(n log n).

We have tested the modified code using micro-benchmarks and confirmed
that the new algorithm improves nginx startup time significantly
(shown below). We believe that this change would be valuable for other
users who are experiencing similar issues.


Table: nginx startup time in seconds

n current patched
2000 0.033 0.018
4000 0.047 0.028
6000 0.062 0.038
8000 0.079 0.050
10000 0.091 0.065
12000 0.367 0.081
14000 0.683 0.086
16000 0.899 0.097
18000 1.145 0.110
20000 1.449 0.122
22000 1.650 0.137
24000 2.155 0.151
26000 3.096 0.155
28000 3.711 0.168
30000 3.539 0.184
32000 3.980 0.193
34000 4.543 0.208
36000 4.349 0.217
38000 5.021 0.229
40000 4.918 0.245
42000 4.835 0.256
44000 5.159 0.262
46000 5.802 0.331
48000 6.205 0.295
50000 5.701 0.308
52000 5.992 0.335
54000 6.561 0.323
56000 6.856 0.333
58000 6.515 0.347
60000 7.051 0.359
62000 6.956 0.377
64000 7.376 0.376
66000 7.506 0.404
68000 7.292 0.407
70000 7.422 0.461
72000 10.090 0.443
74000 18.505 0.463
76000 11.857 0.462
78000 9.752 0.470
80000 12.485 0.481
82000 11.027 0.498
84000 9.804 0.523
86000 8.482 0.515
88000 9.838 0.560
90000 12.341 0.546
92000 13.881 0.648
94000 8.309 0.635
96000 8.854 0.650
98000 12.871 0.674
100000 8.261 0.698
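
For a rough sense of scale (an illustration only, not part of the
benchmark above), comparing n*n with n*log2(n) at a few of the table's
sizes shows why the current sort falls behind past roughly 10000
locations:

#include <math.h>
#include <stdio.h>

int
main(void)
{
    static const double  sizes[] = { 2000, 20000, 100000 };
    size_t               i;

    for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
        printf("n=%6.0f  n*n=%.1e  n*log2(n)=%.1e\n",
               sizes[i], sizes[i] * sizes[i], sizes[i] * log2(sizes[i]));
    }

    return 0;
}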

diff -r 8771d35d55d0 -r 6aac98fb135e src/core/ngx_queue.c
--- a/src/core/ngx_queue.c Fri Mar 10 07:43:50 2023 +0300
+++ b/src/core/ngx_queue.c Thu Mar 23 16:15:07 2023 +0900
@@ -45,36 +45,103 @@
 }


-/* the stable insertion sort */
+/* merge queue2 into queue1. queue2 becomes empty after merge. */
+
+static void
+ngx_queue_merge(ngx_queue_t *queue1, ngx_queue_t *queue2,
+ngx_int_t (*cmp)(const ngx_queue_t *, const ngx_queue_t *))
+{
+ngx_queue_t *p1, *p2;
+
+p1 = ngx_queue_head(queue1);
+p2 = ngx_queue_head(queue2);
+
+while (p1 != ngx_queue_sentinel(queue1)
+&& p2 != ngx_queue_sentinel(queue2)) {
+
+if (cmp(p1, p2) > 0) {
+ngx_queue_t *next, *prev;
+
+next = ngx_queue_next(p2);
+ngx_queue_remove(p2);
+prev = ngx_queue_prev(p1);
+ngx_queue_insert_after(prev, p2);
+p2 = next;
+} else {
+p1 = ngx_queue_next(p1);
+}
+}
+if (p2 != ngx_queue_sentinel(queue2)) {
+ngx_queue_add(queue1, queue2);
+ngx_queue_init(queue2);
+}
+}
+
+
+/* move all elements from src to dest. dest should be empty before call. */
+
+static void
+ngx_queue_move(ngx_queue_t *dest, ngx_queue_t *src)
+{
+*dest = *src;
+ngx_queue_init(src);
+
+if (dest->next == src) {
+dest->next = dest;
+} else {
+dest->next->prev = dest;
+}
+if (dest->prev == src) {
+dest->prev = dest;
+} else {
+dest->prev->next = dest;
+}
+}
+
+
+/* the stable merge sort */

 void
 ngx_queue_sort(ngx_queue_t *queue,
 ngx_int_t (*cmp)(const ngx_queue_t *, const ngx_queue_t *))
 {
-ngx_queue_t  *q, *prev, *next;
+ngx_queue_t merged[64], *p, *last;

-q = ngx_queue_head(queue);
-
-if (q == ngx_queue_last(queue)) {
+if (ngx_queue_head(queue) == ngx_queue_last(queue)) {
 return;
 }

-for (q = ngx_queue_next(q); q != ngx_queue_sentinel(queue); q = next) {
+last = merged;

-prev = ngx_queue_prev(q);
-next = ngx_queue_next(q);
+while (!ngx_queue_empty(queue)) {
+/*
+ * Loop invariant:
+ *   merged[i] must have exactly 0 or 2^i elements in sorted order.
+ * For each iteration, we take one element from the given queue and
+ * insert it into merged without violating the invariant condition.
+ */

-ngx_queue_remove(q);
+ngx_queue_t carry, *h;
+
+h = ngx_queue_head(queue);
+ngx_queue_remove(h);
+ngx_queue_init(&carry);
+ngx_queue_insert_head(&carry, h);

-do {
-if (cmp(prev, q) <= 0) {
-break;
-}
+for (p = merged; p != last && !ngx_queue_empty(p); p++) {
+ngx_queue_merge(p, &carry, cmp);
+ngx_queue_move(&carry, p);
+}
+if (p == last) {
+ngx_queue_init(last);
+last++;
+}
+ngx_queue_move(p, &carry);
+}

-