Ben, I agree that isolation encompasses more than performance isolation. Rather than multiplying overly granular working groups, though, I thought we could start the work under the 'isolation' working group, which was inactive but already had an entry in the document. I have no strong preference and can rename it to 'performance isolation'.

Deepak, we are very interested in that area as well: placement biases based on interference/sensitivity profiles, balancing power and load, etc. I hope we can get to a nicely decoupled way of doing this, so that those details (analysis, objectives, etc.) don't leak into the allocator.

Cheers,
Niklas

On Fri, Jan 29, 2016 at 8:48 PM, Deepak Vij (A) <[email protected]> wrote:

> Along similar lines, interference-aware scheduling could be one of the
> desired capabilities from a Resource Manager like Mesos. This essentially
> is tied into the fact that all data centers/nodes are not really
> homogeneous. Typically, it is assumed that all placement choices are
> equally good. However, different types of machines are mixed within the
> same cluster, and co-located tasks compete for resources, which leads to
> negative interference.
>
> In order to solve the interference-aware scheduling problem, one might
> have to periodically monitor running tasks' performance and use the
> information collected to make better future scheduling decisions. Having
> explicit information about the environment helps make optimal choices for
> co-scheduling and workload partitioning, and may yield superior performance
> on many common workloads. Detailed resource utilization and performance
> profiles collected from running tasks could include things such as CPU
> and memory usage, cache misses, etc.
>
> My question is whether such an interference-aware scheduling capability
> would fit into a similar category or should be something separate
> altogether. Thanks.
>
> Regards,
> Deepak Vij
> (Huawei Software Lab., Santa Clara)
>
> -----Original Message-----
> From: Kevin Klues [mailto:[email protected]]
> Sent: Friday, January 29, 2016 11:28 AM
> To: [email protected]
> Subject: Re: Core affinity in Mesos
>
> I agree. "Isolation" on its own is too broad a term.
> However, since we are talking mostly about reducing interference, which
> typically implies performance isolation, my vote for the group name is the
> "Performance Isolation Working Group".
>
> On Fri, Jan 29, 2016 at 11:22 AM, Benjamin Mahler <[email protected]>
> wrote:
> > Since "Isolation" applies broadly outside of the context of addressing
> > latency sensitive workloads (e.g. user/pid/network namespacing,
> > resource limitations (e.g. cpu quota, memory limits, gpu device
> > visibility)), it would be great to choose a more specific name. Some
> > suggestions: interference, performance-related isolation, colocation,
> > latency sensitivity.
> >
> > Thoughts?
> >
> > Looking forward to seeing the discussions here!
> >
> > Ben
> >
> > On Friday, January 22, 2016, Nielsen, Niklas <[email protected]>
> > wrote:
> >
> >> Hi everyone,
> >>
> >> We have been talking about core affinity in Mesos for a while, and Ian D.
> >> has recently been giving this topic thought in his ‘exclusive resources’
> >> proposal [1].
> >> Trying to avoid too conservative placements, latency critical workloads
> >> are at risk without it.
> >> We are interested in the topic through our work on oversubscription in
> >> Serenity [2], as oversubscription was exactly to be able to colocate
> >> latency critical and best-effort batch jobs.
> >> We had an informal meeting yesterday, going over the proposal and trying
> >> to get some cadence behind the capability.
> >>
> >> It is a tricky but exciting topic:
> >> - How do we avoid making task launch even more complex? How do we express
> >> the topology and acquire parts of it? Do we use hints on the affinity
> >> properties instead?
> >> - How do we mix pinned with normal ‘floating’ tasks?
> >> - How do we convey information to the resource estimator about the task
> >> sensitivity?
> >>
> >> Note: the above list is not meant for inline discussion or answers. Let’s
> >> collect feedback on the proposals themselves.
> >>
> >> Here are our proposed next steps:
> >> - We are going to use the ‘Isolation Working Group’ as an umbrella for
> >> this. I will fill in details and members.
> >> - We will schedule an online meeting for Wednesday at 9AM PST next
> >> week to discuss next steps. I will share a hangout link when we get closer.
> >> - The plan is to get to designs (maybe more than one) that we agree on,
> >> and then scope out and distribute the work that needs to be done.
> >>
> >> Whoever is interested, join us. The use cases for this work are critical.
> >> Maybe we can even work on some representative workloads we can verify our
> >> proposal against.
> >>
> >> Cheers,
> >> Niklas
> >>
> >> PS For comments on the proposal itself, please refer to Ian’s thread on
> >> the dev list [3].
> >>
> >> [1] https://issues.apache.org/jira/browse/MESOS-4138
> >> [2] https://github.com/mesosphere/serenity
> >> [3] https://www.mail-archive.com/dev%40mesos.apache.org/msg33892.html
> >>
> >
>
> --
> ~Kevin
>

--
Niklas
