On Tue, Apr 18, 2017 at 02:39:27PM +0200, B.R. via nginx wrote:
> 'down' should not translate into any kind of attempt, so nothing should
> really appear for the servers in that static state.
> For 'unavailable' servers, for the most part the content of the variables
> should be the same.
>
> Starting from the example I provided, here is what I expected to see:
> - $upstream_addr: <IP address 1>:<port>, <IP address 2>:<port>,
>   <IP address 3>:<port>, <IP address 4>:<port>, <IP address 5>:<port>,
>   <IP address 6>:<port>
> - $upstream_response_time: 0.000, 0.000, 0.000, 0.000, 0.001, 0.000
>
> That, associated with the 502 HTTP response status, is sufficient to
> interpret the log entry as: the request failed to find a proper backend
> after having attempted communication with the 6 specified active
> backends. It is pretty straightforward.
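[Editor's note: the one-to-one pairing of fields discussed above corresponds to a log_format along these lines; a sketch only, since the thread does not show the actual access_log configuration, and the format name and log path here are made up.]

```nginx
# Illustrative log_format matching the "$status $body_bytes_sent
# $request_time $upstream_addr $upstream_response_time" entries quoted
# in this thread; the name "upstream_debug" and the path are hypothetical.
log_format upstream_debug '$status $body_bytes_sent $request_time '
                          '$upstream_addr $upstream_response_time';

server {
    listen 80;
    access_log /var/log/nginx/upstream_debug.log upstream_debug;
}
```

With six attempted backends, $upstream_addr and $upstream_response_time each expand to six comma-separated entries, giving the pairing B.R. expects above.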
And what about the next request, when all of the servers are either "down"
or "unavailable"?

I'm just trying to give you a simple example of when the value of the
$upstream_addr variable has a special value equal to the upstream group
name, and why it can be useful.

> If you want to add something that explicitly states that the whole
> upstream group is down, this should go to the error log.
> At the very least, if the current way of working is kept, the grammar of
> the content of the $upstream_* variables should be specified.
>
> Does that not seem reasonable?
> ---
> *B. R.*
>
> On Mon, Apr 17, 2017 at 6:09 PM, Ruslan Ermilov <r...@nginx.com> wrote:
>
> > On Sat, Apr 15, 2017 at 03:55:20AM +0200, B.R. via nginx wrote:
> > > Let me be clear here:
> > > I got 6 active servers (not marked down), and the logs show 1 attempt
> > > on each. They all failed for a known reason, and there is no problem
> > > there. Subsequently, the whole pool was 'down' and the response was
> > > 502. Everything perfectly normal so far.
> > >
> > > What is unclear is the feature (as you classified it) of having a
> > > fake node named after the pool appearing in the list of tried
> > > upstream servers. It brings confusion more than anything else: having
> > > a 502 response + the list of all tried (and failed) nodes
> > > corresponding with the list of active nodes is more than enough to
> > > describe what happened.
> > > The name of the upstream group does not correspond to any real asset;
> > > it is a purely virtual classification. It thus makes no sense at all
> > > to me to have it appear as a 7th 'node' in the list... and how do you
> > > interpret its response time (where you also got a 7th item in the
> > > list)?
> > > Moreover, it is confusing, since proxy_pass handles domain names and
> > > one could believe nginx treated the upstream group name as such.
> >
> > Without the six attempts, if all of the servers are unreachable (either
> > "down" or "unavailable" because they have failed previously) at the
> > time the request starts, what do you expect to see in $upstream_*?
> >
> > > On Fri, Apr 14, 2017 at 10:21 AM, Ruslan Ermilov <r...@nginx.com> wrote:
> > >
> > > > On Fri, Apr 14, 2017 at 09:41:36AM +0200, B.R. via nginx wrote:
> > > > > Hello,
> > > > >
> > > > > Reading from the upstream docs
> > > > > <https://nginx.org/en/docs/http/ngx_http_upstream_module.html#upstream>,
> > > > > on upstream pool exhaustion, every backend should be tried once,
> > > > > and then, if all fail, the response should be crafted based on the
> > > > > one from the last server attempt.
> > > > > So far so good.
> > > > >
> > > > > I recently faced a server farm which implements a dull nightly
> > > > > restart of every node, without sequencing it, resulting in the
> > > > > possibility of having all nodes offline at the same time.
> > > > >
> > > > > However, I collected log entries which did not match what I
> > > > > expected. For 6 backend nodes, I got:
> > > > > - log format: $status $body_bytes_sent $request_time
> > > > >   $upstream_addr $upstream_response_time
> > > > > - log entry: 502 568 0.001 <IP address 1>:<port>,
> > > > >   <IP address 2>:<port>, <IP address 3>:<port>,
> > > > >   <IP address 4>:<port>, <IP address 5>:<port>,
> > > > >   <IP address 6>:<port>, php-fpm 0.000, 0.000, 0.000, 0.000,
> > > > >   0.001, 0.000, 0.000
> > > > > I got 7 entries for $upstream_addr & $upstream_response_time,
> > > > > instead of the expected 6.
> > > > >
> > > > > Here are the interesting parts of the configuration:
> > > > > upstream php-fpm {
> > > > >     server <machine 1>:<port> down;
> > > > >     server <machine 2>:<port> down;
> > > > >     [...]
> > > > >     server <machine N-5>:<port>;
> > > > >     server <machine N-4>:<port>;
> > > > >     server <machine N-3>:<port>;
> > > > >     server <machine N-2>:<port>;
> > > > >     server <machine N-1>:<port>;
> > > > >     server <machine N>:<port>;
> > > > >     keepalive 128;
> > > > > }
> > > > >
> > > > > server {
> > > > >     set $fpm_pool "php-fpm$fpm_pool_ID";
> > > > >     [...]
> > > > >     location ~ \.php$ {
> > > > >         [...]
> > > > >         fastcgi_read_timeout 600;
> > > > >         fastcgi_keep_conn on;
> > > > >         fastcgi_index index.php;
> > > > >
> > > > >         include fastcgi_params;
> > > > >         fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
> > > > >         [...]
> > > > >         fastcgi_pass $fpm_pool;
> > > > >     }
> > > > > }
> > > > >
> > > > > The question is:
> > > > > php-fpm being an upstream group name, how come it has been tried
> > > > > as a domain name in the end?
> > > > > Stated otherwise, is this because the upstream group is considered
> > > > > 'down', thus somehow removed from the possibilities, with nginx
> > > > > trying the name one last time as a domain name to see if something
> > > > > answers?
> > > > > This 7th request is definitely strange from my point of view. Is
> > > > > it a bug or a feature?
> > > >
> > > > A feature.
> > > >
> > > > Most $upstream_* variables are vectored ones, and the number of
> > > > entries in their values corresponds to the number of tries made to
> > > > select a peer. When a peer cannot be selected at all (as in your
> > > > case), the status is 502 and the name equals the upstream group name.
> > > >
> > > > There could be several reasons why none of the peers can be
> > > > selected. For example, some peers are marked "down", and other peers
> > > > were failing and are now in the "unavailable" state.
> > > >
> > > > The number of tries is limited by the number of servers in the
> > > > group, unless further restricted by proxy_next_upstream_tries.
> > > > In your case, since there are two "down" servers, and the other
> > > > servers are unavailable, you reach the situation where a peer cannot
> > > > be selected. If you comment out the two "down" servers, and try a
> > > > few requests in a row when all servers are physically unavailable,
> > > > the first log entry will list all of the attempted servers, and then
> > > > for the next 10 seconds (in the default config) you'll see only the
> > > > upstream group name and 502 in $upstream_status, until the servers
> > > > become available again (see max_fails/fail_timeout).
> > > >
> > > > Hope this makes things a little bit clearer.
> > > > _______________________________________________
> > > > nginx mailing list
> > > > nginx@nginx.org
> > > > http://mailman.nginx.org/mailman/listinfo/nginx
> >
> > --
> > Ruslan Ermilov
> > Assume stupidity not malice
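[Editor's note: the 10-second window Ruslan describes is the per-server availability accounting, tunable with max_fails/fail_timeout. A minimal sketch with illustrative addresses; the values shown are the documented defaults, not taken from the original poster's configuration.]

```nginx
upstream php-fpm {
    # One failed attempt (max_fails=1, the default) marks a server
    # "unavailable" for fail_timeout (default 10s). While every server
    # in the group is unavailable (or marked "down"), a peer cannot be
    # selected, the request fails with 502, and $upstream_addr contains
    # only the group name, "php-fpm".
    server 192.0.2.1:9000 max_fails=1 fail_timeout=10s;
    server 192.0.2.2:9000 max_fails=1 fail_timeout=10s;
}
```

Raising fail_timeout lengthens the window during which only the group name appears in the logs; setting max_fails=0 disables the accounting so the server is always retried.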