sri krishna created MESOS-8731:
----------------------------------

             Summary: mesos master APIs become latent
                 Key: MESOS-8731
                 URL: https://issues.apache.org/jira/browse/MESOS-8731
             Project: Mesos
          Issue Type: Bug
          Components: master
    Affects Versions: 1.5.0, 1.4.0
            Reporter: sri krishna

Over a period of time, one of the UI API calls to the master becomes latent. A request that normally takes less than a second takes up to 20 seconds during peak. A lot of the dev team access the UI for logs.

Below are my observations:

In Mesos "0.28.1-2.0.20.ubuntu1404":

################################################################
# ab -n 1000 -c 10 "http://mesos-master1.mesos.bla.net:5050/metrics/snapshot?jsonp=angular.callbacks._4g"
This is ApacheBench, Version 2.3 <$Revision: 1528965 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking mesos-master1.mesos.bla.net (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests

Server Software:
Server Hostname:        mesos-master1.mesos.bla.net
Server Port:            5050

Document Path:          /metrics/snapshot?jsonp=angular.callbacks._4g
Document Length:        3197 bytes

Concurrency Level:      10
Time taken for tests:   501.010 seconds
Complete requests:      1000
Failed requests:        954
   (Connect: 0, Receive: 0, Length: 954, Exceptions: 0)
Total transferred:      3304510 bytes
HTML transferred:       3195510 bytes
Requests per second:    2.00 [#/sec] (mean)
Time per request:       5010.104 [ms] (mean)
Time per request:       501.010 [ms] (mean, across all concurrent requests)
Transfer rate:          6.44 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0     0    0.0      0      0
Processing:   321  4987  286.4   5007   5508
Waiting:      321  4987  286.4   5007   5508
Total:        321  4988  286.4   5007   5508

Percentage of the requests served within a certain time (ms)
  50%   5007
  66%   5007
  75%   5008
  80%   5008
  90%   5008
  95%   5009
  98%   5010
  99%   5506
 100%   5508 (longest request)
################################################################

In Mesos 1.4 and 1.5 (versions 1.4.0-2.0.1 and 1.5.0-2.0.1) the response time of these APIs is quite high.

################################################################
# ab -n 1000 -c 10 "http://mesos-master3.stage.bla.net:5050/metrics/snapshot?jsonp=angular.callbacks._4g"
This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking mesos-master3.stage.bla.net (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
^C

Server Software:
Server Hostname:        mesos-master3.stage.bla.net
Server Port:            5050

Document Path:          /metrics/snapshot?jsonp=angular.callbacks._4g
Document Length:        6596 bytes

Concurrency Level:      10
Time taken for tests:   1405.182 seconds
Complete requests:      582
Failed requests:        580
   (Connect: 0, Receive: 0, Length: 580, Exceptions: 0)
Total transferred:      3909986 bytes
HTML transferred:       3846548 bytes
Requests per second:    0.41 [#/sec] (mean)
Time per request:       24144.024 [ms] (mean)
Time per request:       2414.402 [ms] (mean, across all concurrent requests)
Transfer rate:          2.72 [Kbytes/sec] received

Connection Times (ms)
              min   mean[+/-sd] median    max
Connect:        0      0    0.0      0      0
Processing: 15284  24058 2600.7  23937  31740
Waiting:    15284  24058 2600.7  23937  31740
Total:      15284  24059 2600.7  23938  31740

Percentage of the requests served within a certain time (ms)
  50%  23938
  66%  25074
  75%  25729
  80%  26465
  90%  27605
  95%  28215
  98%  29685
  99%  30595
 100%  31740 (longest request)
################################################################

I think this is causing the other APIs like "/master/slaves" and "/metrics" to become latent. At this point we are forcing a re-election of the master to bring the times down.

What can I do to bring these times down? The load on the box is quite low. The load average does not cross 2 on an 8 core box.

Let me know if any further info is required.
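
For anyone comparing the two ab runs above, here is a minimal Python sketch that recomputes ab's derived figures from the raw counters in the report (concurrency level, time taken, complete requests). The function name `ab_summary` and the variable names are made up for illustration and are not part of ab or Mesos. The results can differ from ab's printed values in the last digit because the report shows the elapsed time rounded to milliseconds. Note also that the second run was interrupted at 582 requests and targets a different host, so the ratio is only a rough indication.

```python
def ab_summary(concurrency, elapsed_s, completed):
    """Recompute ab's derived figures from its raw counters.

    Returns (requests per second,
             mean time per request in ms,
             mean time per request across all concurrent requests in ms).
    """
    requests_per_second = completed / elapsed_s
    per_request_ms = concurrency * elapsed_s / completed * 1000
    per_request_across_ms = elapsed_s / completed * 1000
    return requests_per_second, per_request_ms, per_request_across_ms


# Raw counters copied from the two runs in this report.
old_run = ab_summary(concurrency=10, elapsed_s=501.010, completed=1000)   # 0.28.1
new_run = ab_summary(concurrency=10, elapsed_s=1405.182, completed=582)   # 1.4 / 1.5

# Ratio of mean per-request latency between the two runs.
slowdown = new_run[1] / old_run[1]

print("0.28.1 : %.2f req/s, %.1f ms per request" % (old_run[0], old_run[1]))
print("1.4/1.5: %.2f req/s, %.1f ms per request" % (new_run[0], new_run[1]))
print("mean latency ratio: %.1fx" % slowdown)
```

On these numbers the mean per-request latency on the newer master comes out at a little under five times that of the 0.28.1 master.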