Re: Mesos 0.27.0 release update
All tests passed on Ubuntu 14.04 with kernel 3.13 and Docker Engine 1.9.1.

On Sat, Jan 23, 2016 at 3:11 AM, Timothy Chen wrote:
> Hi all,
>
> (Kapil, MPark and I) We still have 3 blocker issues outstanding
> at this moment:
>
> MESOS-4449: SegFault on agent during executor startup (shepherd: Joris)
> MESOS-4441: Do not allocate non-revocable resources beyond quota
> guarantee. (shepherd: Joris)
> MESOS-4410: Introduce protobuf for quota set request. (shepherd: Joris)
>
> The remaining major tickets are ContainerLogger related and should be
> committed today according to Ben.
>
> We've started to test the latest master and will be looking at the test
> failures to see what needs to be addressed.
>
> I encourage everyone to test the latest master on your platform if
> possible to catch issues early, and once the blocker issues are
> resolved we'll send an RC to test and vote.
>
> Thanks,
>
> Tim
Re: question about DRF and task allocation to slaves
> My question is - if the slaves to execute the tasks are decided by the
> scheduler, how does the DRF get used by the master in balancing the slaves?
> Does it not get used for choosing the right slaves for the tasks? Relevant
> pointers to the code are also welcome.

I think the DRF allocator (hierarchical.cpp) is mainly used to balance the fair share of roles/frameworks rather than slaves. It does not get used for choosing the right slaves for tasks (it has no notion of tasks); that is the responsibility of the framework scheduler. Instead, the allocator is responsible for sending slaves' resources to the framework scheduler as resource offers, and that process can also be influenced by the framework scheduler, e.g., by setting a filter to refuse a specific slave for a specific duration:
https://github.com/apache/mesos/blob/0.26.0/src/master/allocator/mesos/hierarchical.cpp#L761:L806
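For intuition, here is a minimal sketch (plain Python, not Mesos code) of the fairness decision the DRF allocator makes: compute each framework's dominant share and offer resources to the framework with the lowest one. The cluster sizes and helper names are illustrative; in Mesos this logic lives in hierarchical.cpp and the DRFSorter (sorter.cpp).

```python
# Minimal DRF sketch: pick the framework with the lowest dominant share.
# Illustrative only -- not the actual Mesos allocator API.

CLUSTER = {"cpus": 9.0, "mem": 18.0}  # total cluster capacity

def dominant_share(allocated, total=CLUSTER):
    """A framework's dominant share is the max, over resource kinds,
    of its allocated fraction of the cluster total."""
    return max(allocated[r] / total[r] for r in total)

def next_framework(allocations):
    """Offer resources next to the framework with the lowest dominant share."""
    return min(allocations, key=lambda f: dominant_share(allocations[f]))

# Example in the spirit of the DRF paper: framework A is memory-heavy,
# framework B is CPU-heavy.
allocations = {
    "A": {"cpus": 1.0, "mem": 4.0},  # dominant share: mem, 4/18
    "B": {"cpus": 3.0, "mem": 1.0},  # dominant share: cpus, 3/9
}
print(next_framework(allocations))  # -> "A" (4/18 < 3/9)
```

Note that nothing here mentions tasks or slaves: the allocator decides *who* gets offered resources next, not *where* tasks run.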
question about DRF and task allocation to slaves
Hi Devs,

I have started looking at the Mesos core code and came upon the following observations. I have seen a few dots but haven't been able to connect them; please let me know if you can.

As we know, the DRF (Dominant Resource Fairness) algorithm is used in Mesos to allocate the tasks to the slaves so that the dominant share is equalized (or as equal as possible). An example can be seen in the paper (https://www.cs.berkeley.edu/~alig/papers/drf.pdf). Thus, it looks like the Mesos master has the control to allocate the tasks to the slaves (by internally maintaining the status of all tasks for the various frameworks and slaves and their offers).

On the other hand, there are schedulers (I am using Netflix Fenzo) whose job is to figure out the slaves most suitable for the tasks, and the API exposed through the Mesos Java API used by Fenzo is:

MesosSchedulerDriver.launchTasks(Collection offerIds, Collection taskIds);

The code I have looked through in Mesos (master, drf/sorter etc. - master.cpp, hierarchical.cpp) shows me that on various events (methods in master.cpp) like addFramework(), activateFramework(), addSlave() etc., the DRF method calculateShare() gets called in sorter.cpp. But when the tasks are allocated and the launchTasks() method is called on the master (passing the map of tasks assigned to slaves), the slaves which have been calculated by the scheduler and are present in the map of tasks and offers get used directly.

My question is - if the slaves to execute the tasks are decided by the scheduler, how does the DRF get used by the master in balancing the slaves? Does it not get used for choosing the right slaves for the tasks? Relevant pointers to the code are also welcome.

Thanks,
Ashish
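To make the two-level split concrete, here is a hedged sketch (plain Python, not Mesos or Fenzo code) of the framework side of the flow the question describes: the master sends resource offers, the framework scheduler (Fenzo's job) maps tasks onto the offered slaves, and only then is the launchTasks() call made against those offers. The `place` helper and first-fit policy are purely illustrative.

```python
# Illustrative two-level scheduling flow (not Mesos/Fenzo code): the master's
# allocator chose WHICH framework gets these offers; the framework scheduler
# chooses WHERE each task runs among the offered slaves.

offers = [  # offers the master sent to one framework, one per slave
    {"offer_id": "O1", "slave": "S1", "cpus": 2.0, "mem": 4.0},
    {"offer_id": "O2", "slave": "S2", "cpus": 1.0, "mem": 8.0},
]

tasks = [
    {"task_id": "T1", "cpus": 1.0, "mem": 6.0},
    {"task_id": "T2", "cpus": 2.0, "mem": 2.0},
]

def place(tasks, offers):
    """Framework-side placement (the scheduler's responsibility):
    first-fit each task onto an offer with enough remaining room."""
    remaining = [dict(o) for o in offers]
    plan = {}  # offer_id -> list of task_ids accepted against that offer
    for t in tasks:
        for o in remaining:
            if o["cpus"] >= t["cpus"] and o["mem"] >= t["mem"]:
                o["cpus"] -= t["cpus"]
                o["mem"] -= t["mem"]
                plan.setdefault(o["offer_id"], []).append(t["task_id"])
                break
    return plan

# The framework then calls the equivalent of
# driver.launchTasks(offer_ids, task_infos); the master simply forwards the
# tasks to the slaves named in the accepted offers -- DRF played no part here.
print(place(tasks, offers))
```

This is why DRF never "chooses slaves for tasks": by the time launchTasks() reaches the master, the slave for each task is already fixed by the offers the framework accepted.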
Re: [Discussion] MESOS-4442: `allocated` may have more resources than `total` in allocator
@team, any comments?

Da (Klaus), Ma (马达) | PMP® | Advisory Software Engineer
Platform OpenSource Technology, STG, IBM GCG
+86-10-8245 4084 | klaus1982...@gmail.com | http://k82.me

On Thu, Jan 21, 2016 at 9:31 PM, Klaus Ma wrote:
> Yes, *total*: cpus(*):2 vs. *allocated*: cpus(*):2;cpus(*){REV}:2
>
> On Thu, Jan 21, 2016 at 5:43 PM, Qian Zhang wrote:
>> In the log you posted, it seems total cpus is also 2 rather than 1, but it
>> seems there are 4 allocated cpus (2 non-revocable and 2 revocable)?
>>
>> I0121 17:08:09.303431 4284416 hierarchical.cpp:528] Slave
>> f2d8b550-ed52-44a4-a35a-1fff81d41391-S0 (9.181.90.153) updated with
>> oversubscribed resources (total: cpus(*):2; mem(*):1024; disk(*):1024;
>> ports(*):[31000-32000], allocated: cpus(*):2; mem(*):1024; disk(*):1024;
>> ports(*):[31000-32000]; *cpus(*){REV}:2*)
>>
>> Thanks,
>> Qian Zhang
>>
>> On Thu, Jan 21, 2016 at 5:25 PM, Klaus Ma wrote:
>>> Hi team,
>>>
>>> When I double-checked the feature interaction between Optimistic Offer
>>> Phase 1 & Oversubscription, I found that `allocated` may have more
>>> resources than `total` in the allocator when Oversubscription is enabled.
>>> I'd like to get your input on whether this is intended behaviour, although
>>> the impact is low: 1) the allocator will not offer this delta of
>>> resources, and 2) the QoS Controller will correct it later by killing the
>>> executor. Personally, I'd like to keep this assumption in the allocator:
>>> slave.total always contains slave.allocated.
>>>
>>> Here are the steps:
>>>
>>> T1: in the cluster, cpus=2: one is revocable and the other is non-revocable
>>> T2: framework1 gets an offer for cpus=2 and launches a task, but the
>>> estimator reports empty resources before the executor is launched
>>> T3: slave.total is updated to cpus=1 in
>>> HierarchicalAllocatorProcess::updateSlave
>>> T4: in allocate(), slave.total (cpus=1) < slave.allocated (cpus=2)
>>>
>>> Here's the log I got:
>>>
>>> I0121 17:08:09.303431 4284416 hierarchical.cpp:528] Slave
>>> f2d8b550-ed52-44a4-a35a-1fff81d41391-S0 (9.181.90.153) updated with
>>> oversubscribed resources (total: cpus(*):2; mem(*):1024; disk(*):1024;
>>> ports(*):[31000-32000], allocated: cpus(*):2; mem(*):1024; disk(*):1024;
>>> ports(*):[31000-32000]; *cpus(*){REV}:2*)
>>>
>>> Please refer to MESOS-4442 for more details.
>>>
>>> Da (Klaus), Ma (马达) | PMP® | Advisory Software Engineer
>>> Platform OpenSource Technology, STG, IBM GCG
>>> +86-10-8245 4084 | klaus1982...@gmail.com | http://k82.me
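The T1-T4 sequence can be reproduced with a tiny sketch (plain Python, scalar cpus only; the `Slave` class and `contains` helper are illustrative names, not the allocator's actual API) showing exactly how the proposed invariant "slave.total always contains slave.allocated" gets violated:

```python
# Minimal reproduction of the MESOS-4442 arithmetic, cpus only.
# Illustrative names -- not the actual hierarchical allocator API.

class Slave:
    def __init__(self, total, allocated):
        self.total = total          # e.g. {"cpus": 2.0}: 1 revocable + 1 non-revocable
        self.allocated = allocated

    def contains(self):
        """The invariant Klaus proposes: total must cover allocated."""
        return all(self.total.get(r, 0.0) >= v for r, v in self.allocated.items())

# T1/T2: the cluster has cpus=2 and the framework was offered (and was
# allocated) both of them when it launched its task.
slave = Slave(total={"cpus": 2.0}, allocated={"cpus": 2.0})
assert slave.contains()

# T3: the resource estimator reports empty revocable resources before the
# executor launches, so updateSlave() shrinks total to cpus=1 ...
slave.total = {"cpus": 1.0}

# T4: ... and now allocated (cpus=2) exceeds total (cpus=1), breaking
# the invariant until the QoS Controller corrects it.
assert not slave.contains()
print("invariant violated at T4, as described in MESOS-4442")
```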
Mesos 0.27.0 release update
Hi all,

(Kapil, MPark and I) We still have 3 blocker issues outstanding at this moment:

MESOS-4449: SegFault on agent during executor startup (shepherd: Joris)
MESOS-4441: Do not allocate non-revocable resources beyond quota guarantee. (shepherd: Joris)
MESOS-4410: Introduce protobuf for quota set request. (shepherd: Joris)

The remaining major tickets are ContainerLogger related and should be committed today according to Ben.

We've started to test the latest master and will be looking at the test failures to see what needs to be addressed.

I encourage everyone to test the latest master on your platform if possible to catch issues early, and once the blocker issues are resolved we'll send an RC to test and vote.

Thanks,

Tim
Core affinity in Mesos
Hi everyone,

We have been talking about core affinity in Mesos for a while, and Ian D. has recently been giving this topic thought in his 'exclusive resources' proposal [1]. Without it, latency-critical workloads are at risk unless we fall back on overly conservative placements. We are interested in the topic through our work on oversubscription in Serenity [2], since the whole point of oversubscription is to be able to colocate latency-critical and best-effort batch jobs.

We had an informal meeting yesterday, going over the proposal and trying to get some cadence behind the capability. It is a tricky but exciting topic:

- How do we avoid making task launch even more complex? How do we express the topology and acquire parts of it? Do we use hints on the affinity properties instead?
- How do we mix pinned tasks with normal 'floating' tasks?
- How do we convey information about task sensitivity to the resource estimator?

Note: the list above is not meant for inline discussion or answers; let's collect feedback on the proposals themselves.

Here are our proposed next steps:

- We are going to use the 'Isolation Working Group' as an umbrella for this. I will fill in details and members.
- We will schedule an online meeting for Wednesday 9 AM PST next week to discuss next steps. I will share a hangout link when we get closer.
- The plan is to get to designs (maybe more than one) we agree on, then scope out and distribute the work that needs to be done.

Whoever is interested, join us. The use cases for this work are critical. Maybe we can even work on some representative workloads we can verify our proposal against.

Cheers,
Niklas

PS: For comments on the proposal itself, please refer to Ian's thread on the dev list [3].

[1] https://issues.apache.org/jira/browse/MESOS-4138
[2] https://github.com/mesosphere/serenity
[3] https://www.mail-archive.com/dev%40mesos.apache.org/msg33892.html
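On the "pinned vs. floating" question, one simple mental model is an agent-side pool that hands out exclusive cores to latency-critical tasks while floating tasks keep sharing the remainder. The sketch below (plain Python; `CpuPool`, `pin`, and `unpin` are hypothetical names, not part of Ian's proposal or any Mesos API) is only meant to illustrate that bookkeeping:

```python
# Hypothetical sketch of tracking pinned vs. floating CPUs on one agent.
# Not from the MESOS-4138 proposal -- purely an illustration of the idea.

class CpuPool:
    def __init__(self, cpus):
        self.free = set(cpus)   # cores still shared by floating tasks
        self.pinned = {}        # task_id -> set of exclusively held cores

    def pin(self, task_id, n):
        """Exclusively reserve n cores for a latency-critical task.
        Floating tasks keep sharing whatever remains in self.free."""
        if len(self.free) < n:
            return None  # not enough cores left to grant exclusivity
        cores = {self.free.pop() for _ in range(n)}
        self.pinned[task_id] = cores
        return cores

    def unpin(self, task_id):
        """Return a finished task's cores to the floating pool."""
        self.free |= self.pinned.pop(task_id)

pool = CpuPool(range(4))
granted = pool.pin("latency-critical-task", 2)
# A Linux isolator could then apply `granted` via cpuset cgroups or
# sched_setaffinity(); floating tasks stay confined to pool.free.
print(sorted(granted), sorted(pool.free))
```

Even this toy version surfaces the open questions from the list above: who decides `n`, and what happens to floating tasks when `free` shrinks.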