sri krishna created MESOS-8731:
----------------------------------

             Summary: mesos master APIs become latent
                 Key: MESOS-8731
                 URL: https://issues.apache.org/jira/browse/MESOS-8731
             Project: Mesos
          Issue Type: Bug
          Components: master
    Affects Versions: 1.5.0, 1.4.0
            Reporter: sri krishna


Over a period of time, one of the UI API calls to the master becomes latent.
A request that normally takes less than a second takes up to 20 seconds
during peak. Much of the dev team accesses the UI for logs.
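To see when the latency starts to climb, a single request can be timed from
outside the UI. The following is only a minimal sketch: the helper names
fetch_body and time_request are hypothetical, and the 60 second timeout is an
assumption. It returns the body length as well, since ab counts a response
whose length differs from the first one as a "Length" failure.

```python
import time
import urllib.request


def fetch_body(url, timeout=60.0):
    """GET the URL and return the raw response body (timeout is an assumption)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def time_request(url, fetch=fetch_body):
    """Return (elapsed_seconds, body_length) for one request.

    `fetch` is injectable so the timing logic can be exercised without a
    live master; by default it performs a real HTTP GET.
    """
    start = time.perf_counter()
    body = fetch(url)
    elapsed = time.perf_counter() - start
    return elapsed, len(body)
```

Calling time_request("http://mesos-master1.mesos.bla.net:5050/metrics/snapshot")
at intervals and logging the result would show how the response time drifts
between master re-elections.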

Below are my observations :

In mesos "0.28.1-2.0.20.ubuntu1404"

################################################################

# ab -n 1000 -c 10 "http://mesos-master1.mesos.bla.net:5050/metrics/snapshot?jsonp=angular.callbacks._4g"
This is ApacheBench, Version 2.3 <$Revision: 1528965 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking mesos-master1.mesos.bla.net (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests


Server Software:
Server Hostname: mesos-master1.mesos.bla.net
Server Port: 5050

Document Path: /metrics/snapshot?jsonp=angular.callbacks._4g
Document Length: 3197 bytes

Concurrency Level: 10
Time taken for tests: 501.010 seconds
Complete requests: 1000
Failed requests: 954
 (Connect: 0, Receive: 0, Length: 954, Exceptions: 0)
Total transferred: 3304510 bytes
HTML transferred: 3195510 bytes
Requests per second: 2.00 [#/sec] (mean)
Time per request: 5010.104 [ms] (mean)
Time per request: 501.010 [ms] (mean, across all concurrent requests)
Transfer rate: 6.44 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0     0    0.0      0     0
Processing:   321  4987  286.4   5007  5508
Waiting:      321  4987  286.4   5007  5508
Total:        321  4988  286.4   5007  5508

Percentage of the requests served within a certain time (ms)
 50% 5007
 66% 5007
 75% 5008
 80% 5008
 90% 5008
 95% 5009
 98% 5010
 99% 5506
 100% 5508 (longest request)

################################################################

 

In Mesos 1.4 and 1.5 (versions 1.4.0-2.0.1 and 1.5.0-2.0.1) the response
time of these APIs is quite high.

################################################################

# ab -n 1000 -c 10 "http://mesos-master3.stage.bla.net:5050/metrics/snapshot?jsonp=angular.callbacks._4g"
This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking mesos-master3.stage.bla.net (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
^C

Server Software:
Server Hostname: mesos-master3.stage.bla.net
Server Port: 5050

Document Path: /metrics/snapshot?jsonp=angular.callbacks._4g
Document Length: 6596 bytes

Concurrency Level: 10
Time taken for tests: 1405.182 seconds
Complete requests: 582
Failed requests: 580
 (Connect: 0, Receive: 0, Length: 580, Exceptions: 0)
Total transferred: 3909986 bytes
HTML transferred: 3846548 bytes
Requests per second: 0.41 [#/sec] (mean)
Time per request: 24144.024 [ms] (mean)
Time per request: 2414.402 [ms] (mean, across all concurrent requests)
Transfer rate: 2.72 [Kbytes/sec] received

Connection Times (ms)
                min   mean[+/-sd] median    max
Connect:          0      0    0.0      0      0
Processing:   15284  24058 2600.7  23937  31740
Waiting:      15284  24058 2600.7  23937  31740
Total:        15284  24059 2600.7  23938  31740

Percentage of the requests served within a certain time (ms)
 50% 23938
 66% 25074
 75% 25729
 80% 26465
 90% 27605
 95% 28215
 98% 29685
 99% 30595
 100% 31740 (longest request)

################################################################
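The headline figures of the two runs above can be recomputed from the raw
counts that ab reports, which makes the comparison easier to read. This is a
small sketch of that arithmetic; the function name ab_summary is
hypothetical, and the inputs are copied from the two outputs above.

```python
def ab_summary(complete, seconds, concurrency):
    """Recompute ab's mean figures from its raw counts.

    Returns (requests_per_second, mean_ms_per_request), where the second
    value corresponds to ab's first "Time per request ... (mean)" line.
    """
    rps = complete / seconds
    mean_ms = concurrency * seconds / complete * 1000.0
    return rps, mean_ms


# 0.28.1 run: 1000 complete requests in 501.010 s at concurrency 10
old_rps, old_ms = ab_summary(1000, 501.010, 10)

# 1.4/1.5 run: 582 complete requests in 1405.182 s at concurrency 10
new_rps, new_ms = ab_summary(582, 1405.182, 10)

# How many times slower the mean request is on the newer master
slowdown = new_ms / old_ms
```

With these inputs the mean time per request is roughly 4.8 times higher on
the 1.4/1.5 master than on 0.28.1.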

I think this is causing other APIs such as "/master/slaves/" and "/metrics"
to become latent.

At this point we are forcing a re-election of the master to bring the times
down. What can I do to bring these times down? The load on the box is quite
low: the load average does not exceed 2 on an 8-core box.

Let me know if any further info is required. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
