Perhaps rather than just bolting on "cgroup support", we could instead open a dialogue about having Mesos support be a core feature of Storm.
The current integration is a bit unwieldy & hackish at the moment, arising from the conflicting natures of Mesos and Storm w.r.t. scheduling of resources. That is, Storm assumes you have existing "slots" for running workers on, whereas Mesos is more dynamic, requiring frameworks that run on top of it to tell Mesos just how many resources (CPUs, memory, etc.) are needed by the framework's tasks.

One example of an issue with Storm-on-Mesos: the Storm logviewer is completely busted when you are using Mesos. I filed a ticket with a description of the issue and proposed modifications to allow it to function:

   - https://issues.apache.org/jira/browse/STORM-1342

Furthermore, there are fundamental behaviors in Storm that don't mesh well with Mesos:

   - the interfaces of INimbus (allSlotsAvailableForScheduling(), assignSlots(), getForcedScheduler(), etc.) make it difficult to create an ideal Mesos integration framework, since they don't allow the Mesos integration code to *really* know what's going on from the Nimbus's perspective. e.g.,
      - knowing which topologies & how many workers need to be scheduled at any given moment.
      - since the integration code cannot know what is actually needed to be run when it receives offers from Mesos, it just hoards those offers, leading to resource starvation in the Mesos cluster.
   - the "fallback" behavior of allowing the topology to settle for having fewer worker processes than requested should be disable-able. For carefully tuned topologies it is quite bad to run on fewer than the expected number of worker processes.
      - also, this behavior endangers the idea of having the Mesos integration code *only* hoard Mesos offers after a successful round-trip through the allSlotsAvailableForScheduling() polling calls (i.e., only hoard when we know there are pending topologies).
It's dangerous because while we wait for another call to allSlotsAvailableForScheduling(), the Nimbus may have decided that it's okie dokie to use fewer than the requested number of worker processes.

I'm sure there are other issues that I can conjure up, but those are the major ones that came to mind instantly. I'm happy to explain more about this, since I realize the above bulleted info may lack context.

I wish I knew something about how Twitter's new Heron project addresses the concerns above since it comes with Mesos support out-of-the-box, but it's unclear at this point what they're doing until they open source it.

Thanks!

- Erik

On Wed, Jan 13, 2016 at 6:27 PM, 刘键(Basti Liu) <[email protected]> wrote:

> Hi Bobby & Jerry,
>
> Yes, JStorm implements generic cgroup support. But just only cpu control
> is enable when starting worker.
>
> Regards
> Basti
>
> -----Original Message-----
> From: Bobby Evans [mailto:[email protected]]
> Sent: Wednesday, January 13, 2016 11:14 PM
> To: [email protected]
> Subject: Re: JStorm CGroup
>
> Jerry,
> I think most of the code you are going to want to look at is here
> https://github.com/apache/storm/blob/jstorm-import/jstorm-core/src/main/java/com/alibaba/jstorm/daemon/supervisor/CgroupManager.java
> The back end for most of it seems to come from
>
> https://github.com/apache/storm/tree/jstorm-import/jstorm-core/src/main/java/com/alibaba/jstorm/container
>
> Which looks like it implements a somewhat generic cgroup support.
> - Bobby
>
> On Wednesday, January 13, 2016 1:34 AM, 刘键(Basti Liu) <
> [email protected]> wrote:
>
> Hi Jerry,
>
> Currently, JStorm supports to control the upper limit of cpu time for a
> worker by cpu.cfs_period_us & cpu.cfs_quota_us in cgroup.
> e.g. cpu.cfs_period_us= 100000, cpu.cfs_quota_us=3*100000. Cgroup will
> limit the corresponding process to occupy at most 300% cpu (3 cores).
>
> Regards
> Basti
>
> -----Original Message-----
> From: Jerry Peng [mailto:[email protected]]
> Sent: Wednesday, January 13, 2016 1:57 PM
> To: [email protected]
> Subject: JStorm CGroup
>
> Hello everyone,
>
> This question is directed more towards the people that worked on JStorm.
> If I recall correctly JStorm offers some sort of resource isolation through
> CGroups. What kind of support does JStorm offer for resource isolation?
> Can someone elaborate on this feature in JStorm.
>
> Best,
>
> Jerry
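
P.S. To make the "only hoard when we know there are pending topologies" idea a bit more concrete, here is a rough sketch in Python. It is not how the current Storm-on-Mesos code works and it uses neither the Mesos nor the Storm API; Offer, OfferPool, pending_workers, set_pending_workers() and on_offer() are all hypothetical names invented for illustration. The assumption is that the integration code could somehow learn how many workers Nimbus still needs to place, which is exactly the information INimbus does not expose today.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Offer:
    """Simplified stand-in for a Mesos resource offer (hypothetical type)."""
    offer_id: str
    cpus: float
    mem_mb: int


@dataclass
class OfferPool:
    """Holds offers only while there is known pending demand for workers."""
    pending_workers: int = 0  # hypothetical: demand as reported by Nimbus
    held: dict = field(default_factory=dict)

    def set_pending_workers(self, count: int) -> list:
        """Record current demand; return held offers to decline if demand is now zero."""
        self.pending_workers = max(0, count)
        if self.pending_workers == 0:
            released = list(self.held.values())
            self.held.clear()
            return released
        return []

    def on_offer(self, offer: Offer) -> bool:
        """Return True if the offer is held, False if it should be declined right away."""
        if self.pending_workers == 0:
            return False
        self.held[offer.offer_id] = offer
        return True
```

With something like this, offers arriving while nothing is pending would be declined immediately instead of hoarded, and anything still held would be handed back to Mesos as soon as demand drops to zero. It does nothing about the "fallback" problem above: Nimbus could still settle for fewer workers between polls.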
