I was running a load test on a Mesos cluster and observed that when Mesos is
running lots of frameworks, offer starvation occurs for certain frameworks,
i.e. only a subset of the frameworks registered with Mesos gets offers. Let me
describe the scenario below:

First phase:
At the beginning, there's only one framework registered with Mesos, which is
Marathon. The load generator uses Marathon's API to launch, let's say, 50
Jenkins masters with the mesos-plugin installed. Once all 50 masters are
launched, the Mesos cluster has 51 frameworks registered in total, because
each master's mesos-plugin registers itself with the Mesos master as a
framework.

Second phase:
Now the load generator triggers a couple of build jobs on each Jenkins
master. Each framework's scheduler will now have, let's say, 2 items in its
build queue. Once a framework gets a resource offer from the Mesos master, its
scheduler can run the build tasks if the offer matches the resource
constraints specified by the mesos-plugin.

What I observed was that at the start of the second phase, some frameworks
(Jenkins masters) got offers and had their tasks scheduled to run, but the
rest of the frameworks didn't get any resource offers from the Mesos master,
and the build jobs queued on them were starved. Tailing the Jenkins logs on
these masters never showed 'Received offers'. The Mesos master logs also show
that Mesos was sending offers to only a handful of frameworks. The excerpt
below covers about 15 seconds, but I saw the same behavior at other times.
Only five frameworks (IDs ending in 0364, 0371, 0377, 0380 and 0396) appear,
24 offers each, in the same order every time; I have added a line break after
each round:

I0310 17:56:44.703126  1156 master.cpp:2250] Sending 24 offers to framework 
201403032301-1255541002-5050-1126-0364
I0310 17:56:45.722951  1156 master.cpp:2250] Sending 24 offers to framework 
201403032301-1255541002-5050-1126-0371
I0310 17:56:46.744184  1159 master.cpp:2250] Sending 24 offers to framework 
201403032301-1255541002-5050-1126-0377
I0310 17:56:47.768546  1158 master.cpp:2250] Sending 24 offers to framework 
201403032301-1255541002-5050-1126-0380
I0310 17:56:48.794517  1156 master.cpp:2250] Sending 24 offers to framework 
201403032301-1255541002-5050-1126-0396

I0310 17:56:49.813484  1157 master.cpp:2250] Sending 24 offers to framework 
201403032301-1255541002-5050-1126-0364
I0310 17:56:50.833155  1159 master.cpp:2250] Sending 24 offers to framework 
201403032301-1255541002-5050-1126-0371
I0310 17:56:51.859712  1158 master.cpp:2250] Sending 24 offers to framework 
201403032301-1255541002-5050-1126-0377
I0310 17:56:52.879678  1153 master.cpp:2250] Sending 24 offers to framework 
201403032301-1255541002-5050-1126-0380
I0310 17:56:53.904261  1156 master.cpp:2250] Sending 24 offers to framework 
201403032301-1255541002-5050-1126-0396

I0310 17:56:54.929472  1155 master.cpp:2250] Sending 24 offers to framework 
201403032301-1255541002-5050-1126-0364
I0310 17:56:55.947387  1153 master.cpp:2250] Sending 24 offers to framework 
201403032301-1255541002-5050-1126-0371
I0310 17:56:56.975060  1157 master.cpp:2250] Sending 24 offers to framework 
201403032301-1255541002-5050-1126-0377
I0310 17:56:57.996995  1159 master.cpp:2250] Sending 24 offers to framework 
201403032301-1255541002-5050-1126-0380
I0310 17:56:59.022555  1156 master.cpp:2250] Sending 24 offers to framework 
201403032301-1255541002-5050-1126-0396
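To check how many distinct frameworks receive offers over a longer window
than this excerpt, the master log can be tallied per framework ID and the
result compared with the 51 registered frameworks. The script below is a
minimal sketch, assuming the master log lines have the "Sending N offers to
framework <ID>" format shown above; the file name count_offers.py is
hypothetical.

```python
import re
import sys
from collections import Counter

# Assumes the master log format shown above:
#   "... master.cpp:2250] Sending 24 offers to framework <ID>"
# \s+ also spans a line break, so lines wrapped as in this email still match.
OFFER_RE = re.compile(r"Sending (\d+) offers to framework\s+(\S+)")


def offers_per_framework(log_text):
    """Count offer messages and total offers sent to each framework ID."""
    messages = Counter()
    offers = Counter()
    for count, framework_id in OFFER_RE.findall(log_text):
        messages[framework_id] += 1
        offers[framework_id] += int(count)
    return messages, offers


def main(paths):
    # Concatenate every log file given on the command line.
    text = ""
    for path in paths:
        with open(path) as f:
            text += f.read()
    messages, offers = offers_per_framework(text)
    for framework_id, n in messages.most_common():
        print(f"{framework_id}: {n} messages, {offers[framework_id]} offers")
    print(f"{len(messages)} distinct frameworks received offers")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Usage would be something like `python count_offers.py <master log file>`;
if the last line reports far fewer than 51 frameworks over several minutes,
that would confirm the starvation seen in the excerpt.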

A couple of questions:
1. Does running many frameworks (say, more than 10) have an impact on the
resource allocation strategy?
2. If a registered framework keeps declining Mesos offers for a while, does
Mesos take that into account when sending offers?

Links:
1. https://github.com/mesosphere/marathon
2. https://github.com/jenkinsci/mesos-plugin

-- 
Mohit
