Re: [PATCH 5 of 6] Upstream: allow any worker to resolve upstream servers
On 09/02/2023 16:45, Aleksei Bavshin wrote:
> On 2/5/2023 7:01 PM, J Carter wrote:
>> Hi Aleksei,
>>
>> Why not permanently assign the task of resolving a given upstream server
>> group (all servers/peers within it) to a single worker? It seems that
>> this approach would resolve the SRV issues, and remove the need for the
>> shared queue of tasks.
>>
>> The load would still be spread evenly in the most realistic scenarios -
>> many upstream server groups of few servers, as opposed to few upstream
>> server groups of many servers.
>
> The intent of the change was exactly the opposite: to avoid any permanent
> assignment of periodic tasks to a worker and to allow other processes to
> resume resolving if the original assignee exits, whether normally or
> abnormally. I'm not even doing enough for that -- I should've kept
> in-progress tasks at the end of the queue with expires = resolver timeout
> + a small constant, and retried from another process when the timeout is
> reached, but the idea was abandoned for a minuscule improvement of
> insertion time. I expect to be asked to reconsider, as patch 6/6 does not
> cover all the possible situations where we want to recover a stale task.

Makes sense.

> A permanent assignment of a whole upstream would also require notifying
> other processes that the upstream is no longer assigned if the worker
> exits, or consistently recovering that assignment over a restart of a
> single worker (e.g. after a crash - not a regular situation, but one we
> should take into account nonetheless).

It's a good point. In my mind I had rendezvous hashing, plus a notification
sent to all workers when a fellow worker dies - the next worker in the
rendezvous 'list' would simply assume the dead worker's upstreams while the
replacement inits, and hand them back once the replacement worker is up
(this would still use some locks). Or, to keep it simple, just wait for the
dead worker's replacement to reinit and pick up the former's stale
upstreams.

> And the benefit is not quite obvious - I mentioned that resolving SRVs
> with a lot of records may take longer to update the list of peers, but
> the situation with contention is not expected to change significantly* if
> we pin these tasks to a single worker, as another worker may be doing the
> same for another upstream. Most importantly, this isn't even a
> bottleneck. It only slightly exacerbates an existing problem with certain
> balancers that already suffer from the overuse of locks, in a
> configuration that was specifically crafted to amplify and highlight the
> difference and is far from the most realistic scenarios.
>
> * Pending verification on a performance test stand.

Well, the benefit is that it would prevent the disadvantage you listed, and
remove at least one other lock contended during normal operation (the
priority queue). But fair enough - it makes sense to profile a wide range
of scenarios first to see whether any of those are legitimate worries.

___
nginx-devel mailing list
nginx-devel@nginx.org
https://mailman.nginx.org/mailman/listinfo/nginx-devel
Re: [PATCH 5 of 6] Upstream: allow any worker to resolve upstream servers
On 2/5/2023 7:01 PM, J Carter wrote:
> Hi Aleksei,
>
> Why not permanently assign the task of resolving a given upstream server
> group (all servers/peers within it) to a single worker? It seems that
> this approach would resolve the SRV issues, and remove the need for the
> shared queue of tasks.
>
> The load would still be spread evenly in the most realistic scenarios -
> many upstream server groups of few servers, as opposed to few upstream
> server groups of many servers.

The intent of the change was exactly the opposite: to avoid any permanent
assignment of periodic tasks to a worker and to allow other processes to
resume resolving if the original assignee exits, whether normally or
abnormally. I'm not even doing enough for that -- I should've kept
in-progress tasks at the end of the queue with expires = resolver timeout +
a small constant, and retried from another process when the timeout is
reached, but the idea was abandoned for a minuscule improvement of
insertion time. I expect to be asked to reconsider, as patch 6/6 does not
cover all the possible situations where we want to recover a stale task.

A permanent assignment of a whole upstream would also require notifying
other processes that the upstream is no longer assigned if the worker
exits, or consistently recovering that assignment over a restart of a
single worker (e.g. after a crash - not a regular situation, but one we
should take into account nonetheless).

And the benefit is not quite obvious - I mentioned that resolving SRVs with
a lot of records may take longer to update the list of peers, but the
situation with contention is not expected to change significantly* if we
pin these tasks to a single worker, as another worker may be doing the same
for another upstream. Most importantly, this isn't even a bottleneck. It
only slightly exacerbates an existing problem with certain balancers that
already suffer from the overuse of locks, in a configuration that was
specifically crafted to amplify and highlight the difference and is far
from the most realistic scenarios.

* Pending verification on a performance test stand.
Re: [PATCH 5 of 6] Upstream: allow any worker to resolve upstream servers
Hi Aleksei,

Why not permanently assign the task of resolving a given upstream server
group (all servers/peers within it) to a single worker? It seems that this
approach would resolve the SRV issues, and remove the need for the shared
queue of tasks.

The load would still be spread evenly in the most realistic scenarios -
many upstream server groups of few servers, as opposed to few upstream
server groups of many servers.

On 01/02/2023 01:37, Aleksei Bavshin wrote:
> # HG changeset patch
> # User Aleksei Bavshin
> # Date 1670883784 28800
> #      Mon Dec 12 14:23:04 2022 -0800
> # Node ID f8eb6b94d8f46008eb5f2f1dbc747750d5755506
> # Parent  cfae397f1ea87a35c41febab6162fe5142aa767b
> Upstream: allow any worker to resolve upstream servers.
>
> This change addresses one of the limitations of the current re-resolve
> code, the dependency on worker 0. Now, each worker is able to pick a
> resolve task from a shared priority queue.
>
> The single worker implementation relies on the fact that each peer is
> assigned to a specific worker and no other process may access its data.
> Thus, it's safe to keep `peer->host.event` in the shared zone and modify
> it as necessary. That assumption becomes invalid once we allow any free
> worker to update the peer. Now, all the workers have to know when the
> previous resolution result expires and maintain their own timers. A
> single shared event structure is no longer sufficient.
>
> The obvious solution is to make timer events local to a worker by moving
> them up to the nearest object in local memory, the upstream. From the
> upstream timer event handler we can walk the list of the peers and pick
> those that are expired and not already owned by another process. To
> reduce the time spent under a lock we can keep a priority queue of
> pending tasks, sorted by expiration time. Each worker is able to get an
> expired server from the head of the queue, perform the name resolution
> and put the peer back with a new expiration time.
>
> A per-upstream or per-zone rbtree was considered as a store for pending
> tasks, but there won't be any performance benefit until a certain large
> number of servers in the upstream. Per-zone queues also require more
> intricate locking.
>
> The benefits of the change are obvious: we're no longer tied to the
> lifetime of the first worker process and the distribution of the tasks
> is more even. There are certain disadvantages though:
>
> - An SRV record may resolve into multiple A/AAAA records with different
>   TTLs kept in a worker-local cache of a resolver. The next query in the
>   same worker will reuse all the cache entries that are still valid.
>   With the task distribution implemented, any worker may schedule the
>   next update of a peer and thus we lose the benefit of a local cache.
>
> - The change introduces an additional short lock on the list of peers
>   and allows existing long locks to be acquired from different
>   processes. For example, it's possible that different workers will
>   resolve large SRV records from the same upstream and attempt to update
>   the list of peers at the same time.
>
> diff --git a/src/http/modules/ngx_http_upstream_zone_module.c b/src/http/modules/ngx_http_upstream_zone_module.c
> --- a/src/http/modules/ngx_http_upstream_zone_module.c
> +++ b/src/http/modules/ngx_http_upstream_zone_module.c
> @@ -10,6 +10,9 @@
>  #include <ngx_http.h>
>
>
> +#define NGX_UPSTREAM_RESOLVE_NO_WORKER  (ngx_uint_t) -1
> +
> +
>  static char *ngx_http_upstream_zone(ngx_conf_t *cf, ngx_command_t *cmd,
>      void *conf);
>  static ngx_int_t ngx_http_upstream_init_zone(ngx_shm_zone_t *shm_zone,
> @@ -40,6 +43,13 @@ static ngx_command_t  ngx_http_upstream_
>  static ngx_int_t ngx_http_upstream_zone_init_worker(ngx_cycle_t *cycle);
>  static void ngx_http_upstream_zone_resolve_timer(ngx_event_t *event);
>  static void ngx_http_upstream_zone_resolve_handler(ngx_resolver_ctx_t *ctx);
> +static void ngx_http_upstream_zone_resolve_queue_insert(ngx_queue_t *queue,
> +    ngx_http_upstream_host_t *host);
> +static void ngx_http_upstream_zone_start_resolve(
> +    ngx_http_upstream_srv_conf_t *uscf, ngx_http_upstream_host_t *host);
> +static void ngx_http_upstream_zone_schedule_resolve(
> +    ngx_http_upstream_srv_conf_t *uscf, ngx_http_upstream_host_t *host,
> +    ngx_msec_t timer);
>
>
>  static ngx_http_module_t  ngx_http_upstream_zone_module_ctx = {
> @@ -231,6 +241,8 @@ ngx_http_upstream_zone_copy_peers(ngx_sl
>      peers->shpool = shpool;
>      peers->config = config;
>
> +    ngx_queue_init(&peers->resolve_queue);
> +
>      for (peerp = &peers->peer; *peerp; peerp = &peer->next) {
>          /* pool is unlocked */
>          peer = ngx_http_upstream_zone_copy_peer(peers, *peerp);
> @@ -248,6 +260,9 @@ ngx_http_upstream_zone_copy_peers(ngx_sl
>              return NULL;
>          }
>
> +        ngx_http_upstream_rr_peer_ref(peers, peer);
> +        ngx_queue_insert_tail(&peers->resolve_queue, &peer->host->queue);
> +
>          *peerp = peer;
>          peer->id = (*peers->config)++;
>      }
> @@ -268,6 +283,8 @@ ngx_http_upstream_zo
[PATCH 5 of 6] Upstream: allow any worker to resolve upstream servers
# HG changeset patch
# User Aleksei Bavshin
# Date 1670883784 28800
#      Mon Dec 12 14:23:04 2022 -0800
# Node ID f8eb6b94d8f46008eb5f2f1dbc747750d5755506
# Parent  cfae397f1ea87a35c41febab6162fe5142aa767b
Upstream: allow any worker to resolve upstream servers.

This change addresses one of the limitations of the current re-resolve
code, the dependency on worker 0. Now, each worker is able to pick a
resolve task from a shared priority queue.

The single worker implementation relies on the fact that each peer is
assigned to a specific worker and no other process may access its data.
Thus, it's safe to keep `peer->host.event` in the shared zone and modify it
as necessary. That assumption becomes invalid once we allow any free worker
to update the peer. Now, all the workers have to know when the previous
resolution result expires and maintain their own timers. A single shared
event structure is no longer sufficient.

The obvious solution is to make timer events local to a worker by moving
them up to the nearest object in local memory, the upstream. From the
upstream timer event handler we can walk the list of the peers and pick
those that are expired and not already owned by another process. To reduce
the time spent under a lock we can keep a priority queue of pending tasks,
sorted by expiration time. Each worker is able to get an expired server
from the head of the queue, perform the name resolution and put the peer
back with a new expiration time.

A per-upstream or per-zone rbtree was considered as a store for pending
tasks, but there won't be any performance benefit until a certain large
number of servers in the upstream. Per-zone queues also require more
intricate locking.

The benefits of the change are obvious: we're no longer tied to the
lifetime of the first worker process and the distribution of the tasks is
more even. There are certain disadvantages though:

- An SRV record may resolve into multiple A/AAAA records with different
  TTLs kept in a worker-local cache of a resolver. The next query in the
  same worker will reuse all the cache entries that are still valid. With
  the task distribution implemented, any worker may schedule the next
  update of a peer and thus we lose the benefit of a local cache.

- The change introduces an additional short lock on the list of peers and
  allows existing long locks to be acquired from different processes. For
  example, it's possible that different workers will resolve large SRV
  records from the same upstream and attempt to update the list of peers
  at the same time.

diff --git a/src/http/modules/ngx_http_upstream_zone_module.c b/src/http/modules/ngx_http_upstream_zone_module.c
--- a/src/http/modules/ngx_http_upstream_zone_module.c
+++ b/src/http/modules/ngx_http_upstream_zone_module.c
@@ -10,6 +10,9 @@
 #include <ngx_http.h>


+#define NGX_UPSTREAM_RESOLVE_NO_WORKER  (ngx_uint_t) -1
+
+
 static char *ngx_http_upstream_zone(ngx_conf_t *cf, ngx_command_t *cmd,
     void *conf);
 static ngx_int_t ngx_http_upstream_init_zone(ngx_shm_zone_t *shm_zone,
@@ -40,6 +43,13 @@ static ngx_command_t  ngx_http_upstream_
 static ngx_int_t ngx_http_upstream_zone_init_worker(ngx_cycle_t *cycle);
 static void ngx_http_upstream_zone_resolve_timer(ngx_event_t *event);
 static void ngx_http_upstream_zone_resolve_handler(ngx_resolver_ctx_t *ctx);
+static void ngx_http_upstream_zone_resolve_queue_insert(ngx_queue_t *queue,
+    ngx_http_upstream_host_t *host);
+static void ngx_http_upstream_zone_start_resolve(
+    ngx_http_upstream_srv_conf_t *uscf, ngx_http_upstream_host_t *host);
+static void ngx_http_upstream_zone_schedule_resolve(
+    ngx_http_upstream_srv_conf_t *uscf, ngx_http_upstream_host_t *host,
+    ngx_msec_t timer);


 static ngx_http_module_t  ngx_http_upstream_zone_module_ctx = {
@@ -231,6 +241,8 @@ ngx_http_upstream_zone_copy_peers(ngx_sl
     peers->shpool = shpool;
     peers->config = config;

+    ngx_queue_init(&peers->resolve_queue);
+
     for (peerp = &peers->peer; *peerp; peerp = &peer->next) {
         /* pool is unlocked */
         peer = ngx_http_upstream_zone_copy_peer(peers, *peerp);
@@ -248,6 +260,9 @@ ngx_http_upstream_zone_copy_peers(ngx_sl
             return NULL;
         }

+        ngx_http_upstream_rr_peer_ref(peers, peer);
+        ngx_queue_insert_tail(&peers->resolve_queue, &peer->host->queue);
+
         *peerp = peer;
         peer->id = (*peers->config)++;
     }
@@ -268,6 +283,8 @@ ngx_http_upstream_zone_copy_peers(ngx_sl
     backup->shpool = shpool;
     backup->config = config;

+    ngx_queue_init(&backup->resolve_queue);
+
     for (peerp = &backup->peer; *peerp; peerp = &peer->next) {
         /* pool is unlocked */
         peer = ngx_http_upstream_zone_copy_peer(backup, *peerp);
@@ -285,6 +302,9 @@ ngx_http_upstream_zone_copy_peers(ngx_sl
             return NULL;
         }

+        ngx_http_upstream_rr_peer_ref(backup, peer);
+        ngx_queue_insert_tail(&backup->resolve_queue, &peer->host->queue);
+
         *peerp = pe