Re: resource fragmentation/mesos-master offers too low
Hey Vinod, Mesos: 0.23.0-1.0.ubuntu1404 Marathon: 0.10.1-1.0.416.ubuntu1404 Chronos: 2.4.0-0.1.20150828104228.ubuntu1404 Single master, 16 slaves, each with 6cpu 6mem, many (perhaps not all) had more resources than mentioned in the logs (mesos offers with 6m mem etc..) or what was required for the marathon tasks (.25cpu 128mem). # cat /etc/mesos-master/offer_timeout 30secs Marathon is running 200-400 tasks at any given time, chronos <20. There aren't too much interesting in the logs, prior to this many lines of Marathon app definition and then the 'Not all basic resources satisfied' spam starts, something like Sep 29 09:35:35 lab-mesos-master1 marathon[60609]: [INFO] [09/29/2015 09:35:35.938] [marathon-akka.actor.default-dispatcher-1396] [akka://marathon/user/MarathonScheduler/$a/DeploymentManager/6d87463d-38 d6-4f20-86ba-9caefce12ad7/$a] Successfully started 0 instances of Sep 29 09:35:36 lab-mesos-master1 mesos-master[60442]: I0929 09:35:36.444304 60462 master.cpp:4290] Sending 1 offers to framework 20150928-140031-2581335306-5050-60442- (marathon) at scheduler-3cced 735-b1d8-4739-adae-b479a0411593@127.0.1.1:50185 Sep 29 09:35:36 lab-mesos-master1 mesos-master[60442]: I0929 09:35:36.445269 60462 master.cpp:4290] Sending 3 offers to framework 20150928-104536-2581335306-5050-25731- (chronos-2.4.0) at scheduler- 16cd75ef-d383-4c4c-8239-89a0c1fac536@127.0.1.1:57205 Sep 29 09:35:36 lab-mesos-master1 chronos[60791]: [2015-09-29 09:35:36,447] INFO Received resource offers (org.apache.mesos.chronos.scheduler.mesos.MesosJobFramework:82) Sep 29 09:35:36 lab-mesos-master1 chronos[60791]: [2015-09-29 09:35:36,447] INFO No tasks scheduled or next task has been disabled. Sep 29 09:35:36 lab-mesos-master1 chronos[60791]: (org.apache.mesos.chronos.scheduler.mesos.MesosJobFramework:131) Sep 29 09:35:36 lab-mesos-master1 chronos[60791]: [2015-09-29 09:35:36,447] INFO Declining unused offers. (org.apache.mesos.chronos.scheduler.mesos.MesosJobFramework:89) Sep 29 09:35:36 lab-mesos-master1 chronos[60791]: [2015-09-29 09:35:36,448] INFO Declined unused offers with filter refuseSeconds=5.0 (use --decline_offer_duration to reconfigure) (org.apache.mesos.chro nos.scheduler.mesos.MesosJobFramework:97) Sep 29 09:35:36 lab-mesos-master1 mesos-master[60442]: I0929 09:35:36.450083 60465 master.cpp:2884] Processing DECLINE call for offers: [ 20150928-140031-2581335306-5050-60442-O436519 ] for framework 20 150928-104536-2581335306-5050-25731- (chronos-2.4.0) at scheduler-16cd75ef-d383-4c4c-8239-89a0c1fac536@127.0.1.1:57205 Sep 29 09:35:36 lab-mesos-master1 mesos-master[60442]: I0929 09:35:36.451763 60465 master.cpp:2884] Processing DECLINE call for offers: [ 20150928-140031-2581335306-5050-60442-O436520 ] for framework 20 150928-104536-2581335306-5050-25731- (chronos-2.4.0) at scheduler-16cd75ef-d383-4c4c-8239-89a0c1fac536@127.0.1.1:57205 Sep 29 09:35:36 lab-mesos-master1 mesos-master[60442]: I0929 09:35:36.453340 60465 master.cpp:2884] Processing DECLINE call for offers: [ 20150928-140031-2581335306-5050-60442-O436521 ] for framework 20 150928-104536-2581335306-5050-25731- (chronos-2.4.0) at scheduler-16cd75ef-d383-4c4c-8239-89a0c1fac536@127.0.1.1:57205 Sep 29 09:35:36 lab-mesos-master1 mesos-master[60442]: I0929 09:35:36.457660 60467 hierarchical.hpp:761] Recovered cpus(*):0.05; mem(*):12; disk(*):41211; ports(*):[31000-31005, 31007-31014, 31016-31063 , 31065-31088, 31090-31146, 31148-31176, 31178-31193, 31195-31226, 31228-31239, 31241-31321, 31323-31367, 31369-31524, 31526-31544, 31546-31562, 31564-31564, 31566-31598, 31600-31667, 31669-31709, 31711 -31711, 31713-31732, 31734-31750, 31752-31850, 31852-31864, 31866-31910, 31912-31992, 31994-32000] (total: cpus(*):6; mem(*):2922; disk(*):41211; ports(*):[31000-32000], allocated: cpus(*):5.95; mem(*): 2910; ports(*):[31006-31006, 31015-31015, 31064-31064, 31089-31089, 31147-31147, 31177-31177, 31194-31194, 31227-31227, 31240-31240, 31322-31322, 31368-31368, 31525-31525, 31545-31545, 31563-31563, 3156 5-31565, 31599-31599, 31668-31668, 31710-31710, 31712-31712, 31733-31733, 31751-31751, 31851-31851, 31865-31865, 31911-31911, 31993-31993]) on slave 20150827-122131-2581335306-5050-34525-S274 from frame work 20150928-104536-2581335306-5050-25731- Sep 29 09:35:36 lab-mesos-master1 mesos-master[60442]: I0929 09:35:36.459028 60467 hierarchical.hpp:761] Recovered cpus(*):1.45; mem(*):70; disk(*):41211; ports(*):[31000-31236, 31238-31242, 31244-31256 , 31258-31262, 31264-31267, 31269-31307, 31309-31313, 31315-31450, 31452-31486, 31488-31618, 31620-31622, 31624-31634, 31636-31653, 31655-31751, 31753-31826, 31828-31837, 31839-31896, 31898-31962, 31964 -31967, 31969-31983, 31985-32000] (total: cpus(*):6; mem(*):2922; disk(*):41211; ports(*):[31000-32000], allocated: cpus(*):4.55; mem(*):2852; ports(*):[31237-31237, 31243-31243, 31257-31257, 31263-3126 3, 31268-31268, 31308-31308, 31314-31314, 31451-31451, 31487-31487, 31619-31619, 31623-31623, 31635-31635, 316
Re: resource fragmentation/mesos-master offers too low
Can you provide more details about your setup and master logs when this started to happen? Typically, when a framework declines an offer it sets a timeout filter which gives mesos a chance to aggregate free resources on a slave into one big offer. On Tue, Sep 29, 2015 at 6:31 AM, Eren Güven wrote: > Hello, > > I had an incident where mesos kept offering marathon insufficient > resources which resulted in lots of forever WAITING tasks in marathon. > Below are some example logs I saw. It kept repeating the same insufficient > offers even though the cluster had a lot of free resources like ~80cpus and > 45G mem. A master restart solved the issue but I was wondering what can > result in this situation or if anyone had any pointers on where to dig > further. > > Thanks! > > cpu SATISFIED (0.25 <= 1.3514), disk SATISFIED (0.0 <= 0.0), > mem NOT SATISFIED (128.0 > 34.0) > cpu SATISFIED (0.25 <= 1.3514), disk SATISFIED (0.0 <= 0.0), > mem NOT SATISFIED (160.0 > 34.0) > cpu SATISFIED (0.25 <= 1.3514), disk SATISFIED (0.0 <= 0.0), > mem NOT SATISFIED (128.0 > 34.0) > cpu SATISFIED (0.25 <= 1.3514), disk SATISFIED (0.0 <= 0.0), > mem NOT SATISFIED (128.0 > 34.0) > cpu SATISFIED (0.25 <= 1.3514), disk SATISFIED (0.0 <= 0.0), > mem NOT SATISFIED (128.0 > 34.0) > cpu SATISFIED (0.25 <= 1.3514), disk SATISFIED (0.0 <= 0.0), > mem NOT SATISFIED (128.0 > 34.0) > cpu SATISFIED (0.25 <= 1.3514), disk SATISFIED (0.0 <= 0.0), > mem NOT SATISFIED (128.0 > 34.0) > cpu SATISFIED (0.25 <= 1.3514), disk SATISFIED (0.0 <= 0.0), > mem NOT SATISFIED (128.0 > 34.0) > cpu SATISFIED (0.25 <= 1.3514), disk SATISFIED (0.0 <= 0.0), > mem NOT SATISFIED (128.0 > 34.0) > cpu SATISFIED (0.25 <= 1.3514), disk SATISFIED (0.0 <= 0.0), > mem NOT SATISFIED (128.0 > 34.0) > cpu SATISFIED (0.25 <= 1.1504), disk SATISFIED (0.0 <= 0.0), > mem NOT SATISFIED (128.0 > 62.0) > cpu SATISFIED (0.25 <= 1.1504), disk SATISFIED (0.0 <= 0.0), > mem NOT SATISFIED (160.0 > 62.0) > cpu SATISFIED (0.25 <= 1.1504), disk SATISFIED (0.0 <= 0.0), > mem NOT SATISFIED (128.0 > 62.0) > cpu SATISFIED (0.25 <= 1.1504), disk SATISFIED (0.0 <= 0.0), > mem NOT SATISFIED (128.0 > 62.0) > cpu SATISFIED (0.25 <= 1.1504), disk SATISFIED (0.0 <= 0.0), > mem NOT SATISFIED (128.0 > 62.0) > cpu SATISFIED (0.25 <= 1.1504), disk SATISFIED (0.0 <= 0.0), > mem NOT SATISFIED (128.0 > 62.0) > cpu SATISFIED (0.25 <= 1.1504), disk SATISFIED (0.0 <= 0.0), > mem NOT SATISFIED (128.0 > 62.0) > cpu SATISFIED (0.25 <= 1.1504), disk SATISFIED (0.0 <= 0.0), > mem NOT SATISFIED (128.0 > 62.0) > cpu SATISFIED (0.25 <= 1.1504), disk SATISFIED (0.0 <= 0.0), > mem NOT SATISFIED (128.0 > 62.0) > cpu SATISFIED (0.25 <= 1.1504), disk SATISFIED (0.0 <= 0.0), > mem NOT SATISFIED (128.0 > 62.0) > cpu SATISFIED (0.25 <= 0.7998), disk SATISFIED (0.0 <= 0.0), > mem NOT SATISFIED (128.0 > 6.0) > cpu SATISFIED (0.25 <= 0.7998), disk SATISFIED (0.0 <= 0.0), > mem NOT SATISFIED (160.0 > 6.0) > cpu SATISFIED (0.25 <= 0.7998), disk SATISFIED (0.0 <= 0.0), > mem NOT SATISFIED (128.0 > 6.0) > cpu SATISFIED (0.25 <= 0.7998), disk SATISFIED (0.0 <= 0.0), > mem NOT SATISFIED (128.0 > 6.0) > cpu SATISFIED (0.25 <= 0.7998), disk SATISFIED (0.0 <= 0.0), > mem NOT SATISFIED (128.0 > 6.0) > cpu SATISFIED (0.25 <= 0.7998), disk SATISFIED (0.0 <= 0.0), > mem NOT SATISFIED (128.0 > 6.0) > cpu SATISFIED (0.25 <= 0.7998), disk SATISFIED (0.0 <= 0.0), > mem NOT SATISFIED (128.0 > 6.0) > cpu SATISFIED (0.25 <= 0.7998), disk SATISFIED (0.0 <= 0.0), > mem NOT SATISFIED (128.0 > 6.0) > cpu SATISFIED (0.25 <= 0.7998), disk SATISFIED (0.0 <= 0.0), > mem NOT SATISFIED (128.0 > 6.0) > cpu SATISFIED (0.25 <= 0.7998), disk SATISFIED (0.0 <= 0.0), > mem NOT SATISFIED (128.0 > 6.0) > cpu SATISFIED (0.25 <= 0.6504), disk SATISFIED (0.0 <= 0.0), > mem NOT SATISFIED (128.0 > 6.0) > cpu SATISFIED (0.25 <= 0.6504), disk SATISFIED (0.0 <= 0.0), > mem NOT SATISFIED (160.0 > 6.0) > cpu SATISFIED (0.25 <= 0.6504), disk SATISFIED (0.0 <= 0.0), > mem NOT SATISFIED (128.0 > 6.0) > cpu SATISFIED (0.25 <= 0.6504), disk SATISFIED (0.0 <= 0.0), > mem NOT SATISFIED (128.0 > 6.0) > cpu SATISFIED (0.25 <= 0.6504), disk SATISFIED (0.0 <= 0.0), > mem NOT SATISFIED (128.0 > 6.0) > cpu SATISFIED (0.25 <= 0.6504), disk SATISFIED (0.0 <= 0.0), > mem NOT SATISFIED (128.0 > 6.0) > cpu SATISFIED (0.25 <= 0.6504), disk SATISFIED (0.0 <= 0.0), > mem NOT SATISFIED (128.0 > 6.0) > cpu SATISFIED (0.25 <= 0.6504), disk SATISFIED (0.0 <= 0.0), > mem NOT SATISFIED (128.0 > 6.0) > cpu SATISFIED (0.25 <= 0.6504), disk SATIS
resource fragmentation/mesos-master offers too low
Hello, I had an incident where mesos kept offering marathon insufficient resources which resulted in lots of forever WAITING tasks in marathon. Below are some example logs I saw. It kept repeating the same insufficient offers even though the cluster had a lot of free resources like ~80cpus and 45G mem. A master restart solved the issue but I was wondering what can result in this situation or if anyone had any pointers on where to dig further. Thanks! cpu SATISFIED (0.25 <= 1.3514), disk SATISFIED (0.0 <= 0.0), mem NOT SATISFIED (128.0 > 34.0) cpu SATISFIED (0.25 <= 1.3514), disk SATISFIED (0.0 <= 0.0), mem NOT SATISFIED (160.0 > 34.0) cpu SATISFIED (0.25 <= 1.3514), disk SATISFIED (0.0 <= 0.0), mem NOT SATISFIED (128.0 > 34.0) cpu SATISFIED (0.25 <= 1.3514), disk SATISFIED (0.0 <= 0.0), mem NOT SATISFIED (128.0 > 34.0) cpu SATISFIED (0.25 <= 1.3514), disk SATISFIED (0.0 <= 0.0), mem NOT SATISFIED (128.0 > 34.0) cpu SATISFIED (0.25 <= 1.3514), disk SATISFIED (0.0 <= 0.0), mem NOT SATISFIED (128.0 > 34.0) cpu SATISFIED (0.25 <= 1.3514), disk SATISFIED (0.0 <= 0.0), mem NOT SATISFIED (128.0 > 34.0) cpu SATISFIED (0.25 <= 1.3514), disk SATISFIED (0.0 <= 0.0), mem NOT SATISFIED (128.0 > 34.0) cpu SATISFIED (0.25 <= 1.3514), disk SATISFIED (0.0 <= 0.0), mem NOT SATISFIED (128.0 > 34.0) cpu SATISFIED (0.25 <= 1.3514), disk SATISFIED (0.0 <= 0.0), mem NOT SATISFIED (128.0 > 34.0) cpu SATISFIED (0.25 <= 1.1504), disk SATISFIED (0.0 <= 0.0), mem NOT SATISFIED (128.0 > 62.0) cpu SATISFIED (0.25 <= 1.1504), disk SATISFIED (0.0 <= 0.0), mem NOT SATISFIED (160.0 > 62.0) cpu SATISFIED (0.25 <= 1.1504), disk SATISFIED (0.0 <= 0.0), mem NOT SATISFIED (128.0 > 62.0) cpu SATISFIED (0.25 <= 1.1504), disk SATISFIED (0.0 <= 0.0), mem NOT SATISFIED (128.0 > 62.0) cpu SATISFIED (0.25 <= 1.1504), disk SATISFIED (0.0 <= 0.0), mem NOT SATISFIED (128.0 > 62.0) cpu SATISFIED (0.25 <= 1.1504), disk SATISFIED (0.0 <= 0.0), mem NOT SATISFIED (128.0 > 62.0) cpu SATISFIED (0.25 <= 1.1504), disk SATISFIED (0.0 <= 0.0), mem NOT SATISFIED (128.0 > 62.0) cpu SATISFIED (0.25 <= 1.1504), disk SATISFIED (0.0 <= 0.0), mem NOT SATISFIED (128.0 > 62.0) cpu SATISFIED (0.25 <= 1.1504), disk SATISFIED (0.0 <= 0.0), mem NOT SATISFIED (128.0 > 62.0) cpu SATISFIED (0.25 <= 1.1504), disk SATISFIED (0.0 <= 0.0), mem NOT SATISFIED (128.0 > 62.0) cpu SATISFIED (0.25 <= 0.7998), disk SATISFIED (0.0 <= 0.0), mem NOT SATISFIED (128.0 > 6.0) cpu SATISFIED (0.25 <= 0.7998), disk SATISFIED (0.0 <= 0.0), mem NOT SATISFIED (160.0 > 6.0) cpu SATISFIED (0.25 <= 0.7998), disk SATISFIED (0.0 <= 0.0), mem NOT SATISFIED (128.0 > 6.0) cpu SATISFIED (0.25 <= 0.7998), disk SATISFIED (0.0 <= 0.0), mem NOT SATISFIED (128.0 > 6.0) cpu SATISFIED (0.25 <= 0.7998), disk SATISFIED (0.0 <= 0.0), mem NOT SATISFIED (128.0 > 6.0) cpu SATISFIED (0.25 <= 0.7998), disk SATISFIED (0.0 <= 0.0), mem NOT SATISFIED (128.0 > 6.0) cpu SATISFIED (0.25 <= 0.7998), disk SATISFIED (0.0 <= 0.0), mem NOT SATISFIED (128.0 > 6.0) cpu SATISFIED (0.25 <= 0.7998), disk SATISFIED (0.0 <= 0.0), mem NOT SATISFIED (128.0 > 6.0) cpu SATISFIED (0.25 <= 0.7998), disk SATISFIED (0.0 <= 0.0), mem NOT SATISFIED (128.0 > 6.0) cpu SATISFIED (0.25 <= 0.7998), disk SATISFIED (0.0 <= 0.0), mem NOT SATISFIED (128.0 > 6.0) cpu SATISFIED (0.25 <= 0.6504), disk SATISFIED (0.0 <= 0.0), mem NOT SATISFIED (128.0 > 6.0) cpu SATISFIED (0.25 <= 0.6504), disk SATISFIED (0.0 <= 0.0), mem NOT SATISFIED (160.0 > 6.0) cpu SATISFIED (0.25 <= 0.6504), disk SATISFIED (0.0 <= 0.0), mem NOT SATISFIED (128.0 > 6.0) cpu SATISFIED (0.25 <= 0.6504), disk SATISFIED (0.0 <= 0.0), mem NOT SATISFIED (128.0 > 6.0) cpu SATISFIED (0.25 <= 0.6504), disk SATISFIED (0.0 <= 0.0), mem NOT SATISFIED (128.0 > 6.0) cpu SATISFIED (0.25 <= 0.6504), disk SATISFIED (0.0 <= 0.0), mem NOT SATISFIED (128.0 > 6.0) cpu SATISFIED (0.25 <= 0.6504), disk SATISFIED (0.0 <= 0.0), mem NOT SATISFIED (128.0 > 6.0) cpu SATISFIED (0.25 <= 0.6504), disk SATISFIED (0.0 <= 0.0), mem NOT SATISFIED (128.0 > 6.0) cpu SATISFIED (0.25 <= 0.6504), disk SATISFIED (0.0 <= 0.0), mem NOT SATISFIED (128.0 > 6.0) cpu SATISFIED (0.25 <= 0.6504), disk SATISFIED (0.0 <= 0.0), mem NOT SATISFIED (128.0 > 6.0) cpu SATISFIED (0.25 <= 1.25), disk SATISFIED (0.0 <= 0.0), mem NOT SATISFIED (128.0 > 42.0) cpu SATISFIED (0.25 <= 1.25), disk SATISFIED (0.0 <= 0.0), mem NOT SATISFIED (160.0 > 42.0) cpu SATISFIED (0.25 <= 1.25), disk SATISFIED (0.0 <= 0.0), mem NOT SATISFIED (128.0 > 42.0) cpu SATISFIED (0.25 <= 1.25), disk SATISFIED