On similar lines, interference-aware scheduling could be one of the desired capabilities of a resource manager like Mesos. This ties into the fact that data centers and nodes are not really homogeneous. Typically, it is assumed that all placement choices are equally good. In practice, however, different types of machines are mixed within the same cluster, and co-located tasks compete for resources, which leads to negative interference.
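A minimal sketch of what an interference-aware placement choice could look like, as opposed to treating all placements as equally good. Everything here is hypothetical and for illustration only: the `Node` record, the `pick_node` helper, and the use of summed cache-miss rates as an interference proxy are assumptions, not part of Mesos or of any proposal in this thread.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical view of one machine in the cluster."""
    name: str
    cpus_free: float
    # Assumed to be measured periodically for the tasks already running here.
    cache_miss_rates: List[float] = field(default_factory=list)


def interference_score(node: Node) -> float:
    """Crude interference proxy: total cache-miss pressure already on the node."""
    return sum(node.cache_miss_rates)


def pick_node(nodes: List[Node], cpus_needed: float) -> Optional[Node]:
    """Among nodes with enough free CPU, pick the one with the least interference."""
    candidates = [n for n in nodes if n.cpus_free >= cpus_needed]
    if not candidates:
        return None
    return min(candidates, key=interference_score)


nodes = [
    Node("a", cpus_free=4.0, cache_miss_rates=[0.3, 0.2]),
    Node("b", cpus_free=2.0, cache_miss_rates=[0.1]),
    Node("c", cpus_free=0.5),
]
chosen = pick_node(nodes, cpus_needed=1.0)
```

A capacity-only scheduler would consider nodes "a" and "b" interchangeable for this task; the sketch prefers the one whose co-located tasks show less measured cache pressure.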
To solve the interference-aware scheduling problem, one might have to periodically monitor the performance of running tasks and use the collected information to make better future scheduling decisions. Having explicit information about the environment helps make optimal choices for co-scheduling and workload partitioning, and may yield superior performance on many common workloads. The detailed resource utilization and performance profiles collected from running tasks could include measurements such as CPU and memory usage, cache misses, and so on.

My question is whether such an interference-aware scheduling capability would fit into the same category, or whether it should be something separate altogether.

Thanks.

Regards,
Deepak Vij
(Huawei Software Lab., Santa Clara)

-----Original Message-----
From: Kevin Klues [mailto:[email protected]]
Sent: Friday, January 29, 2016 11:28 AM
To: [email protected]
Subject: Re: Core affinity in Mesos

I agree. "Isolation" on its own is too broad a term. However, since we are talking mostly about reducing interference, which typically implies performance isolation, my vote for the group name is the "Performance Isolation Working Group".

On Fri, Jan 29, 2016 at 11:22 AM, Benjamin Mahler <[email protected]> wrote:
> Since "Isolation" applies broadly outside of the context of addressing
> latency sensitive workloads (e.g. user/pid/network namespacing,
> resource limitations (e.g. cpu quota, memory limits, gpu device visibility) it
> would be great to choose a more specific name. Some suggestions:
> interference, performance-related isolation, colocation, latency
> sensitivity.
>
> Thoughts?
>
> Looking forward to seeing the discussions here!
>
> Ben
>
> On Friday, January 22, 2016, Nielsen, Niklas <[email protected]>
> wrote:
>
>> Hi everyone,
>>
>> We have been talking about core affinity in Mesos for a while, and Ian D.
>> has recently been giving this topic thought in his ‘exclusive resources’
>> proposal [1].
>> Trying to avoid too conservative placements, latency critical workloads
>> are at risk without it.
>> We are interested in the topic through our work on oversubscription in
>> Serenity [2], as oversubscription was exactly to be able to colocate
>> latency critical and best-effort batch jobs.
>> We had an informal meeting yesterday, going over the proposal and trying
>> to get some cadence behind the capability.
>>
>> It is a tricky but exciting topic:
>> - How do we avoid making task launch even more complex? How do we express
>> the topology and acquire parts of it. Do we use hints on the affinity
>> properties instead?
>> - How do we mix pinned with normal ‘floating’ tasks.
>> - How do we convey information to the resource estimator about the task
>> sensitivity.
>>
>> Note, above list not meant for inlined discussion or answers. Let’s
>> collect feedback on the proposals themselves.
>>
>> Here are our proposed next steps:
>> - We are going to use the ‘Isolation Working Group’ as an umbrella for
>> this. I will fill in details and members.
>> - We will schedule an online meeting within the Wednesday 9AM PST next
>> week discussing next steps. I will share a hangout link when we get closer.
>> - Plan being, getting to designs (maybe more than one) we agree on and
>> then scope out and distribute the work needed to be done.
>>
>> Who ever is interested, join us. The use cases for this work are critical.
>> Maybe we can even work on some representative workloads we can verify our
>> proposal against.
>>
>> Cheers,
>> Niklas
>>
>> PS For comments on the proposal itself, please refer to Ian’s thread for
>> the dev list [3].
>>
>> [1] https://issues.apache.org/jira/browse/MESOS-4138
>> [2] https://github.com/mesosphere/serenity
>> [3] https://www.mail-archive.com/dev%40mesos.apache.org/msg33892.html
>>

--
~Kevin
