Re: [PATCH 5 of 6] Upstream: allow any worker to resolve upstream servers

2023-02-09 Thread J Carter


On 09/02/2023 16:45, Aleksei Bavshin wrote:

On 2/5/2023 7:01 PM, J Carter wrote:

Hi Aleksei,

Why not permanently assign the task of resolving a given upstream 
server group (all servers/peers within it) to a single worker?


It seems that this approach would resolve the SRV issues, and remove 
the need for the shared queue of tasks.


The load would still be spread evenly in the most realistic scenarios - 
those with many upstream server groups of few servers each, as opposed to 
a few upstream server groups with many servers.


The intent of the change was exactly the opposite: to avoid any permanent 
assignment of periodic tasks to a worker and to allow other processes to 
resume resolving if the original assignee exits, whether normally or 
abnormally. I'm not even doing enough for that -- I should've kept 
in-progress tasks at the end of the queue with expires = resolver timeout 
+ a small constant, and retried from another process when the timeout is 
reached, but the idea was abandoned for a minuscule improvement of 
insertion time. I expect to be asked to reconsider, as patch 6/6 does not 
cover all the possible situations where we want to recover a stale task.


Makes sense.

A permanent assignment of a whole upstream would also require notifying 
other processes that the upstream is no longer assigned if the worker 
exits, or consistently recovering that assignment over a restart of a 
single worker (e.g. after a crash - not a regular situation, but one we 
should take into account nonetheless).


It's a good point; what I had in mind was rendezvous hashing plus a 
notification sent to all workers when a fellow worker dies - the next 
worker in the rendezvous 'list' would simply assume the dead worker's 
upstreams while the new one inits, and hand them back once the 
replacement worker is up (this would still use some locks).


Or to keep it simple, just wait for the dead worker's replacement to 
reinit, and pick up the former's stale upstreams.


And the benefit is not quite obvious - I mentioned that resolving SRVs 
with a lot of records may take longer to update the list of peers, but 
the situation with contention is not expected to change significantly* if 
we pin these tasks to a single worker, as another worker may be doing the 
same for another upstream. Most importantly, this isn't even a 
bottleneck. It only slightly exacerbates an existing problem with certain 
balancers that already suffer from the overuse of locks, in a 
configuration that was specifically crafted to amplify and highlight the 
difference and is far from the most realistic scenarios.

* Pending verification on a performance test stand.


Well, the benefit is that it would avoid the disadvantage you listed 
and remove at least one other contended lock during normal operations 
(the priority queue). But fair enough - it makes sense to profile it in a 
wide range of scenarios first, to see whether any of those are legitimate 
worries.



___
nginx-devel mailing list
nginx-devel@nginx.org
https://mailman.nginx.org/mailman/listinfo/nginx-devel


Re: [PATCH 5 of 6] Upstream: allow any worker to resolve upstream servers

2023-02-09 Thread Aleksei Bavshin

On 2/5/2023 7:01 PM, J Carter wrote:

Hi Aleksei,

Why not permanently assign the task of resolving a given upstream server 
group (all servers/peers within it) to a single worker?


It seems that this approach would resolve the SRV issues, and remove the 
need for the shared queue of tasks.


The load would still be spread evenly in the most realistic scenarios - 
those with many upstream server groups of few servers each, as opposed to 
a few upstream server groups with many servers.


The intent of the change was exactly the opposite: to avoid any permanent 
assignment of periodic tasks to a worker and to allow other processes to 
resume resolving if the original assignee exits, whether normally or 
abnormally. I'm not even doing enough for that -- I should've kept 
in-progress tasks at the end of the queue with expires = resolver timeout 
+ a small constant, and retried from another process when the timeout is 
reached, but the idea was abandoned for a minuscule improvement of 
insertion time. I expect to be asked to reconsider, as patch 6/6 does not 
cover all the possible situations where we want to recover a stale task.
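
To illustrate the idea I abandoned, a plain-C sketch (the struct, field 
and function names are only illustrative, not what the patch uses):

#include <stdint.h>

typedef struct {
    uint64_t  expires;      /* when the task becomes due (ms) */
    unsigned  in_progress;  /* set by the worker that picked it up */
} resolve_task_t;

/* Instead of taking an in-progress task out of the shared queue, keep it
 * at the tail with a recovery deadline; if the owning worker exits
 * before completing the resolution, any other worker that later finds
 * the deadline expired may retry the task. */
static void
task_mark_in_progress(resolve_task_t *task, uint64_t now_ms,
    uint64_t resolver_timeout_ms)
{
    task->in_progress = 1;
    task->expires = now_ms + resolver_timeout_ms + 1000;  /* + small constant */

    /* ... re-insert the task at the tail of the shared queue here ... */
}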


A permanent assignment of a whole upstream would also require notifying 
other processes that the upstream is no longer assigned if the worker 
exits, or consistently recovering that assignment over a restart of a 
single worker (e.g. after a crash - not a regular situation, but one we 
should take into account nonetheless). And the benefit is not quite 
obvious - I mentioned that resolving SRVs with a lot of records may take 
longer to update the list of peers, but the situation with contention is 
not expected to change significantly* if we pin these tasks to a single 
worker, as another worker may be doing the same for another upstream.
Most importantly, this isn't even a bottleneck. It only slightly 
exacerbates an existing problem with certain balancers that already 
suffer from the overuse of locks, in a configuration that was 
specifically crafted to amplify and highlight the difference and is far 
from the most realistic scenarios.


* Pending verification on a performance test stand.


Re: [PATCH 5 of 6] Upstream: allow any worker to resolve upstream servers

2023-02-05 Thread J Carter

Hi Aleksei,

Why not permanently assign the task of resolving a given upstream server 
group (all servers/peers within it) to a single worker?


It seems that this approach would resolve the SRV issues, and remove the 
need for the shared queue of tasks.


The load would still be spread evenly in the most realistic scenarios - 
those with many upstream server groups of few servers each, as opposed to 
a few upstream server groups with many servers.



On 01/02/2023 01:37, Aleksei Bavshin wrote:

# HG changeset patch
# User Aleksei Bavshin 
# Date 1670883784 28800
#  Mon Dec 12 14:23:04 2022 -0800
# Node ID f8eb6b94d8f46008eb5f2f1dbc747750d5755506
# Parent  cfae397f1ea87a35c41febab6162fe5142aa767b
Upstream: allow any worker to resolve upstream servers.

This change addresses one of the limitations of the current re-resolve
code, the dependency on worker 0.  Now, each worker is able to pick a
resolve task from a shared priority queue.

The single-worker implementation relies on the fact that each peer is
assigned to a specific worker and no other process may access its data.
Thus, it's safe to keep `peer->host.event` in the shared zone and modify
it as necessary.  That assumption becomes invalid once we allow any free
worker to update the peer.  Now, all the workers have to know when the
previous resolution result expires and maintain their own timers.  A
single shared event structure is no longer sufficient.

The obvious solution is to make timer events local to a worker by moving
them up to the nearest object in local memory, the upstream.  From the
upstream timer event handler we can walk the list of the peers and pick
those that are expired and not already owned by another process.

To reduce the time spent under a lock we can keep a priority queue of
pending tasks, sorted by expiration time.  Each worker is able to take an
expired server from the head of the queue, perform the name resolution
and put the peer back with a new expiration time.
A per-upstream or per-zone rbtree was considered as a store for pending
tasks, but it would not bring any performance benefit until the upstream
contains a fairly large number of servers.  Per-zone queues also require
more intricate locking.

The benefits of the change are obvious: we're no longer tied to the
lifetime of the first worker process and the distribution of the tasks
is more even.  There are certain disadvantages though:

- An SRV record may resolve into multiple A/AAAA records with different
   TTLs kept in a worker-local cache of a resolver.  The next query in
   the same worker will reuse all the cache entries that are still valid.
   With the task distribution implemented, any worker may schedule the
   next update of a peer and thus we lose the benefit of a local cache.

- The change introduces an additional short lock on the list of peers
   and allows existing long locks to be acquired from different
   processes.  For example, it's possible that different workers will
   resolve large SRV records from the same upstream and attempt to update
   the list of peers at the same time.

diff --git a/src/http/modules/ngx_http_upstream_zone_module.c 
b/src/http/modules/ngx_http_upstream_zone_module.c
--- a/src/http/modules/ngx_http_upstream_zone_module.c
+++ b/src/http/modules/ngx_http_upstream_zone_module.c
@@ -10,6 +10,9 @@
  #include 
  
  
+#define NGX_UPSTREAM_RESOLVE_NO_WORKER  (ngx_uint_t) -1
+
+
  static char *ngx_http_upstream_zone(ngx_conf_t *cf, ngx_command_t *cmd,
  void *conf);
  static ngx_int_t ngx_http_upstream_init_zone(ngx_shm_zone_t *shm_zone,
@@ -40,6 +43,13 @@ static ngx_command_t  ngx_http_upstream_
  static ngx_int_t ngx_http_upstream_zone_init_worker(ngx_cycle_t *cycle);
  static void ngx_http_upstream_zone_resolve_timer(ngx_event_t *event);
  static void ngx_http_upstream_zone_resolve_handler(ngx_resolver_ctx_t *ctx);
+static void ngx_http_upstream_zone_resolve_queue_insert(ngx_queue_t *queue,
+ngx_http_upstream_host_t *host);
+static void ngx_http_upstream_zone_start_resolve(
+ngx_http_upstream_srv_conf_t *uscf, ngx_http_upstream_host_t *host);
+static void ngx_http_upstream_zone_schedule_resolve(
+ngx_http_upstream_srv_conf_t *uscf, ngx_http_upstream_host_t *host,
+ngx_msec_t timer);
  
  
  static ngx_http_module_t  ngx_http_upstream_zone_module_ctx = {

@@ -231,6 +241,8 @@ ngx_http_upstream_zone_copy_peers(ngx_sl
  peers->shpool = shpool;
  peers->config = config;
  
+ngx_queue_init(&peers->resolve_queue);
+
  for (peerp = &peers->peer; *peerp; peerp = &peer->next) {
  /* pool is unlocked */
  peer = ngx_http_upstream_zone_copy_peer(peers, *peerp);
@@ -248,6 +260,9 @@ ngx_http_upstream_zone_copy_peers(ngx_sl
  return NULL;
  }
  
+ngx_http_upstream_rr_peer_ref(peers, peer);
+ngx_queue_insert_tail(&peers->resolve_queue, &peer->host->queue);
+
  *peerp = peer;
  peer->id = (*peers->config)++;
  }
@@ -268,6 +283,8 @@ ngx_http_upstream_zo

[PATCH 5 of 6] Upstream: allow any worker to resolve upstream servers

2023-01-31 Thread Aleksei Bavshin
# HG changeset patch
# User Aleksei Bavshin 
# Date 1670883784 28800
#  Mon Dec 12 14:23:04 2022 -0800
# Node ID f8eb6b94d8f46008eb5f2f1dbc747750d5755506
# Parent  cfae397f1ea87a35c41febab6162fe5142aa767b
Upstream: allow any worker to resolve upstream servers.

This change addresses one of the limitations of the current re-resolve
code, the dependency on worker 0.  Now, each worker is able to pick a
resolve task from a shared priority queue.

The single-worker implementation relies on the fact that each peer is
assigned to a specific worker and no other process may access its data.
Thus, it's safe to keep `peer->host.event` in the shared zone and modify
it as necessary.  That assumption becomes invalid once we allow any free
worker to update the peer.  Now, all the workers have to know when the
previous resolution result expires and maintain their own timers.  A
single shared event structure is no longer sufficient.

The obvious solution is to make timer events local to a worker by moving
them up to the nearest object in local memory, the upstream.  From the
upstream timer event handler we can walk the list of the peers and pick
those that are expired and not already owned by another process.
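
A plain-C sketch of that "expired and not already owned" check; the
helper and the `worker` field are illustrative, and only the idea of a
"no worker" sentinel is taken from the NGX_UPSTREAM_RESOLVE_NO_WORKER
define in the diff below:

#include <stdint.h>

#define NO_WORKER  ((unsigned) -1)

typedef struct {
    uint64_t  expires;  /* when the cached resolution result goes stale */
    unsigned  worker;   /* slot of the resolving worker, or NO_WORKER */
} host_task_t;

/* called with the shared zone lock held */
static int
host_try_claim(host_task_t *host, uint64_t now, unsigned my_slot)
{
    if (host->worker != NO_WORKER || host->expires > now) {
        return 0;   /* owned by someone else or not due yet */
    }

    host->worker = my_slot;   /* this worker now owns the resolution */
    return 1;
}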

To reduce the time spent under a lock we can keep a priority queue of
pending tasks, sorted by expiration time.  Each worker is able to take an
expired server from the head of the queue, perform the name resolution
and put the peer back with a new expiration time.
A per-upstream or per-zone rbtree was considered as a store for pending
tasks, but it would not bring any performance benefit until the upstream
contains a fairly large number of servers.  Per-zone queues also require
more intricate locking.
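
For reference, the ordered insert behind such a queue fits in a few
lines of plain C; the types and names below are illustrative, while the
patch declares ngx_http_upstream_zone_resolve_queue_insert() for this,
operating on an ngx_queue_t:

#include <stddef.h>
#include <stdint.h>

typedef struct task_s {
    struct task_s  *next;
    uint64_t        expires;   /* absolute time of the next re-resolve */
} task_t;

/* Keep the list sorted by expiry so the head is always the task that is
 * due first; a worker only needs to look at the head while holding the
 * lock. */
static void
task_insert(task_t **head, task_t *t)
{
    task_t  **p = head;

    while (*p != NULL && (*p)->expires <= t->expires) {
        p = &(*p)->next;
    }

    t->next = *p;
    *p = t;
}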

The benefits of the change are obvious: we're no longer tied to the
lifetime of the first worker process and the distribution of the tasks
is more even.  There are certain disadvantages though:

- An SRV record may resolve into multiple A/AAAA records with different
  TTLs kept in a worker-local cache of a resolver.  The next query in the
  same worker will reuse all the cache entries that are still valid.
  With the task distribution implemented, any worker may schedule the
  next update of a peer and thus we lose the benefit of a local cache.

- The change introduces an additional short lock on the list of peers
  and allows existing long locks to be acquired from different processes.
  For example, it's possible that different workers will resolve large
  SRV records from the same upstream and attempt to update the list of
  peers at the same time.

diff --git a/src/http/modules/ngx_http_upstream_zone_module.c 
b/src/http/modules/ngx_http_upstream_zone_module.c
--- a/src/http/modules/ngx_http_upstream_zone_module.c
+++ b/src/http/modules/ngx_http_upstream_zone_module.c
@@ -10,6 +10,9 @@
 #include 
 
 
+#define NGX_UPSTREAM_RESOLVE_NO_WORKER  (ngx_uint_t) -1
+
+
 static char *ngx_http_upstream_zone(ngx_conf_t *cf, ngx_command_t *cmd,
 void *conf);
 static ngx_int_t ngx_http_upstream_init_zone(ngx_shm_zone_t *shm_zone,
@@ -40,6 +43,13 @@ static ngx_command_t  ngx_http_upstream_
 static ngx_int_t ngx_http_upstream_zone_init_worker(ngx_cycle_t *cycle);
 static void ngx_http_upstream_zone_resolve_timer(ngx_event_t *event);
 static void ngx_http_upstream_zone_resolve_handler(ngx_resolver_ctx_t *ctx);
+static void ngx_http_upstream_zone_resolve_queue_insert(ngx_queue_t *queue,
+ngx_http_upstream_host_t *host);
+static void ngx_http_upstream_zone_start_resolve(
+ngx_http_upstream_srv_conf_t *uscf, ngx_http_upstream_host_t *host);
+static void ngx_http_upstream_zone_schedule_resolve(
+ngx_http_upstream_srv_conf_t *uscf, ngx_http_upstream_host_t *host,
+ngx_msec_t timer);
 
 
 static ngx_http_module_t  ngx_http_upstream_zone_module_ctx = {
@@ -231,6 +241,8 @@ ngx_http_upstream_zone_copy_peers(ngx_sl
 peers->shpool = shpool;
 peers->config = config;
 
+ngx_queue_init(&peers->resolve_queue);
+
 for (peerp = &peers->peer; *peerp; peerp = &peer->next) {
 /* pool is unlocked */
 peer = ngx_http_upstream_zone_copy_peer(peers, *peerp);
@@ -248,6 +260,9 @@ ngx_http_upstream_zone_copy_peers(ngx_sl
 return NULL;
 }
 
+ngx_http_upstream_rr_peer_ref(peers, peer);
+ngx_queue_insert_tail(&peers->resolve_queue, &peer->host->queue);
+
 *peerp = peer;
 peer->id = (*peers->config)++;
 }
@@ -268,6 +283,8 @@ ngx_http_upstream_zone_copy_peers(ngx_sl
 backup->shpool = shpool;
 backup->config = config;
 
+ngx_queue_init(&backup->resolve_queue);
+
 for (peerp = &backup->peer; *peerp; peerp = &peer->next) {
 /* pool is unlocked */
 peer = ngx_http_upstream_zone_copy_peer(backup, *peerp);
@@ -285,6 +302,9 @@ ngx_http_upstream_zone_copy_peers(ngx_sl
 return NULL;
 }
 
+ngx_http_upstream_rr_peer_ref(backup, peer);
+ngx_queue_insert_tail(&backup->resolve_queue, &peer->host->queue);
+
 *peerp = pe