Nginx upstream delays

Wiebe Cazemier Mon, 09 Mar 2015 08:37:03 -0700

Hello, 

I have a question about sporadic long upstream response times I'm seeing on 
(two of) our Nginx servers. It's kind of hard to show and quantify, but I'll do 
my best.


One is a Django Gunicorn server. We included the upstream response time in the 
Nginx access log and wrote a script to analyze them. What we see, is that on 
the login page of a website (a page that does almost nothing) 95%-99% of 'GET 
/customer/login/' requests are processed within about 50 ms. The other few 
percent can take several seconds. Sometimes even 5s. Our Munin graphs show no 
correlation in disk latency, cpu time, memory use, etc. 

I also added an access log to Gunicorn, so that I can see how long Gunicorn 
takes to process requests that Nginx thinks take long. Gunicorn has 8 workers. 
It can be seen that there is actually no delay in Gunicorn. For instance, Nginx 
sees this (the long upstream response time is marked red, 3.042s): 




11.22.33.44 - - [06/Mar/2015:10:27:46 +0100] "GET /customer/login/ HTTP/1.1" 
200 8310 0.061 0.121 "-" "Echoping/6.0.2" 
11.22.33.44 - - [06/Mar/2015:10:27:48 +0100] "GET /customer/login/ HTTP/1.1" 
200 8310 0.035 0.092 "-" "Echoping/6.0.2" 
11.22.33.44 - - [ 06/Mar/2015:10:27:52 +0100] "GET /customer/login/ HTTP/1.1" 
200 8310 3.042 3.098 "-" "Echoping/6.0.2" 
11.22.33.44 - - [06/Mar/2015:10:27:53 +0100] "GET /customer/login/ HTTP/1.1" 
200 8310 0.051 0.108 "-" "Echoping/6.0.2" 
11.22.33.44 - - [06/Mar/2015:10:27:54 +0100] "GET /customer/login/ HTTP/1.1" 
200 8310 0.038 0.096 "-" "Echoping/6.0.2" 
x.x.x.x - - [06/Mar/2015:10:27:58 +0100] "POST 
/customer/login/?next=/customer/home/ HTTP/1.1" 302 5 0.123 0.123 




But then the corresponding Gunicorn logs shows normal response times (the 
figure after 'None', in µs) (Corresponding line marked blue): 

<blockquote>


11.22.33.44 - - [06/Mar/2015:10:27:41] "GET /customer/login/ HTTP/1.0" 200 None 
41686 "-" "Echoping/6.0.2" 
11.22.33.44 - - [06/Mar/2015:10:27:42] "GET /customer/login/ HTTP/1.0" 200 None 
27629 "-" "Echoping/6.0.2" 
11.22.33.44 - - [06/Mar/2015:10:27:43] "GET /customer/login/ HTTP/1.0" 200 None 
28143 "-" "Echoping/6.0.2" 
11.22.33.44 - - [06/Mar/2015:10:27:44] "GET /customer/login/ HTTP/1.0" 200 None 
41846 "-" "Echoping/6.0.2" 
11.22.33.44 - - [06/Mar/2015:10:27:45] "GET /customer/login/ HTTP/1.0" 200 None 
30192 "-" "Echoping/6.0.2" 
11.22.33.44 - - [06/Mar/2015:10:27:46] "GET /customer/login/ HTTP/1.0" 200 None 
59382 "-" "Echoping/6.0.2" 
11.22.33.44 - - [06/Mar/2015:10:27:48] "GET /customer/login/ HTTP/1.0" 200 None 
33308 "-" "Echoping/6.0.2" 
11.22.33.44 - - [ 06/Mar/2015:10:27:52 ] "GET /customer/login/ HTTP/1.0" 200 
None 39849 "-" "Echoping/6.0.2" 
11.22.33.44 - - [06/Mar/2015:10:27:53] "GET /customer/login/ HTTP/1.0" 200 None 
48321 "-" "Echoping/6.0.2" 
11.22.33.44 - - [06/Mar/2015:10:27:54] "GET /customer/login/ HTTP/1.0" 200 None 
36484 "-" "Echoping/6.0.2" 
x.x.x.x - - [06/Mar/2015:10:27:58] "POST /customer/login/?next=/customer/home/ 
HTTP/1.0" 302 None 122295 
y.y.y.y - - [06/Mar/2015:10:28:02] "GET /customer/login/?next=/customer/home/ 
HTTP/1.0" 200 None 97824 
y.y.y.y - - [06/Mar/2015:10:28:03] "GET /customer/login/?next=/customer/home/ 
HTTP/1.0" 200 None 78162 
11.22.33.44 - - [06/Mar/2015:10:28:26] "GET /customer/login/ HTTP/1.0" 200 None 
38350 "-" "Echoping/6.0.2" 
11.22.33.44 - - [06/Mar/2015:10:28:27] "GET /customer/login/ HTTP/1.0" 200 None 
31076 "-" "Echoping/6.0.2" 
11.22.33.44 - - [06/Mar/2015:10:28:28] "GET /customer/login/ HTTP/1.0" 200 None 
28536 "-" "Echoping/6.0.2" 
11.22.33.44 - - [06/Mar/2015:10:28:30] "GET /customer/login/ HTTP/1.0" 200 None 
30981 "-" "Echoping/6.0.2" 
11.22.33.44 - - [06/Mar/2015:10:28:31] "GET /customer/login/ HTTP/1.0" 200 None 
29920 "-" "Echoping/6.0.2" 

</blockquote>


As I said, there are currently 8 workers. I already increased them from 4. The 
log above shows that there are enough seconds between each request that 8 
workers should be able to handle it. I also created a MySQL slow log, which 
doesn't show the delays. MySQL is always fast. 

Another server we have is Nginx with PHP-FPM (with 150 PHP children in the 
pool), no database access. On one particular recent log of a few hundred 
thousand entries, 99% of requests is done in 129ms. But one response even took 
3170ms. Its PHP proxy settings are: 


<blockquote>

location ~ \.php$ { 
fastcgi_split_path_info ^(.+\.php)(/.+)$; 
# NOTE: You should have "cgi.fix_pathinfo = 0;" in php.ini 

# With php5-fpm: 
fastcgi_pass unix:/var/run/php5-fpm.sock; 
fastcgi_index index.php; 
include fastcgi_params; 
} 

</blockquote>


It seems something in the communication between Nginx and the service behind it 
slows down sometimes, but I can't figure out what it might be. Any idea what it 
might be or how to diagnose it better? 

Regards, 

Wiebe Cazemier

_______________________________________________
nginx mailing list
[email protected]
http://mailman.nginx.org/mailman/listinfo/nginx

Nginx upstream delays

Reply via email to