Hello Everyone,

Currently at Yahoo, we want to add cgroup support to the Resource Aware Scheduler we built. The CGroup code that is part of JStorm looks good, and perhaps we can modify it slightly so that the Resource Aware Scheduler can interact with it. What I would like to do is modify the existing JStorm CGroup code so that it can start JVM workers that are limited to the amount of resources the Resource Aware Scheduler has allocated for each worker, and then move that code to Storm. I would like to have a discussion (especially with people who worked on JStorm) about how we can integrate support for the Resource Aware Scheduler into the existing CGroups code. Also, I know the folks at Alibaba are working on converting supervisor.clj to Java. That code is tied to launching workers and in the future would include CGroups. What is the status of that?

Best,
Boyang Jerry Peng
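As a rough illustration of what Jerry is proposing, here is a minimal Java sketch that applies a Resource Aware Scheduler allocation to one worker through cgroup v1 control files. The class and method names are hypothetical. This is not JStorm's CgroupManager or any existing Storm API. The per-controller directory layout is an assumption, and CPU is assumed to follow the RAS convention in which 100 means one full core.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical helper (not JStorm's CgroupManager, not a Storm API) that applies a
 * Resource Aware Scheduler allocation to one worker via cgroup v1 control files.
 */
final class WorkerCgroupLimiter {
    // CFS enforcement period: 100 ms, the value used in Basti's example.
    static final long CFS_PERIOD_US = 100_000L;

    // Assumes the RAS convention that cpuPercent == 100.0 means one full core.
    static long cfsQuotaUs(double cpuPercent) {
        return Math.round(cpuPercent / 100.0 * CFS_PERIOD_US);
    }

    static long memoryLimitBytes(long memoryMb) {
        return memoryMb * 1024L * 1024L;
    }

    /**
     * cpuRoot and memoryRoot are the per-controller parent cgroups, e.g.
     * /sys/fs/cgroup/cpu/storm and /sys/fs/cgroup/memory/storm (assumed layout).
     * Limits are written before the worker's PID is moved into each cgroup.
     */
    static void apply(Path cpuRoot, Path memoryRoot, String workerId,
                      double cpuPercent, long memoryMb, long pid) throws IOException {
        Path cpuDir = Files.createDirectories(cpuRoot.resolve(workerId));
        write(cpuDir.resolve("cpu.cfs_period_us"), CFS_PERIOD_US);
        write(cpuDir.resolve("cpu.cfs_quota_us"), cfsQuotaUs(cpuPercent));
        // cgroup.procs moves every thread of the process; "tasks" would move only one thread.
        write(cpuDir.resolve("cgroup.procs"), pid);

        Path memDir = Files.createDirectories(memoryRoot.resolve(workerId));
        write(memDir.resolve("memory.limit_in_bytes"), memoryLimitBytes(memoryMb));
        write(memDir.resolve("cgroup.procs"), pid);
    }

    // Writes a single decimal value, the format cgroup control files expect.
    private static void write(Path file, long value) throws IOException {
        Files.write(file, Long.toString(value).getBytes(StandardCharsets.US_ASCII));
    }
}
```

Moving the PID in after launch leaves a brief window in which the worker runs unconstrained. Starting the worker process inside the cgroup avoids that window, at the cost of a more involved launch path.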
On Thursday, January 14, 2016 9:25 AM, Bobby Evans <ev...@yahoo-inc.com.INVALID> wrote:

I would love to see true support for Mesos, YARN, OpenStack, etc. added, but I also see standalone mode offering a lot more flexibility than a two-level scheduler can currently offer, especially in the area of scheduling. It is on my roadmap to look into after the JStorm migration (just started), Resource Aware Scheduling (almost done; needs testing and better isolation), and automatic elasticity around topology-specified SLAs (working with a few researchers on some prototypes in this area).

To support running on other cluster technologies properly, we need to provide pluggability in a few different places. First, we need a way for a scheduler/cluster to request topology-specific dedicated resources. We also need a way for nimbus to provision, manage, monitor, and ideally resize (for elasticity) those resources. With security and resource aware scheduling, these external requests need to be made on a per-topology basis, not bolted on as they are now. The schedulers would also need updating so that they can use these new APIs to request external resources. That could happen when a topology explicitly asks to run on a given external resource. Optionally, it could also happen when dedicated resources are no longer available and the topology has specified the configurations/credentials that allow it to run on those external resources.

That handles scheduling, but Storm offers some additional features that other systems don't yet offer, and many never will. For example, the Storm blob store API is similar to the distributed cache in YARN, but Storm can do in-place replacement without relaunching. We also favor fast fail. I don't think all of these types of clusters will, or should, offer the process monitoring and re-spawning that fast fail needs.
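To make the per-topology request idea concrete, the following Java sketch shows one possible shape for such a plug-in point. Nimbus asks a provider for dedicated resources for one topology, then monitors, resizes, and releases them. Every name here (ResourceRequest, ExternalResourceProvider, InMemoryProvider) is hypothetical and not an existing Storm interface. The in-memory provider exists only so the sketch can run.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical per-topology resource request; not an existing Storm type.
final class ResourceRequest {
    final String topologyId;
    final int workers;
    final double cpuPercentPerWorker; // assumption: 100 = one core, as in RAS
    final long memoryMbPerWorker;

    ResourceRequest(String topologyId, int workers, double cpuPercentPerWorker, long memoryMbPerWorker) {
        this.topologyId = topologyId;
        this.workers = workers;
        this.cpuPercentPerWorker = cpuPercentPerWorker;
        this.memoryMbPerWorker = memoryMbPerWorker;
    }
}

// Hypothetical plug-in point through which nimbus could ask an external cluster
// (Mesos, YARN, OpenStack, ...) for dedicated resources for a single topology.
interface ExternalResourceProvider {
    String provision(ResourceRequest request);      // returns a lease id
    boolean isHealthy(String leaseId);               // monitoring hook
    void resize(String leaseId, int newWorkerCount); // elasticity hook
    void release(String leaseId);
}

// Toy provider that only records leases in memory, to exercise the interface.
final class InMemoryProvider implements ExternalResourceProvider {
    private final Map<String, ResourceRequest> leases = new HashMap<>();

    public String provision(ResourceRequest request) {
        // Lease ids carry the topology id, so every lease belongs to exactly one topology.
        String id = request.topologyId + "-" + UUID.randomUUID();
        leases.put(id, request);
        return id;
    }

    public boolean isHealthy(String leaseId) {
        return leases.containsKey(leaseId);
    }

    public void resize(String leaseId, int newWorkerCount) {
        ResourceRequest old = leases.get(leaseId);
        if (old == null) {
            throw new IllegalArgumentException("unknown lease " + leaseId);
        }
        leases.put(leaseId, new ResourceRequest(old.topologyId, newWorkerCount,
                old.cpuPercentPerWorker, old.memoryMbPerWorker));
    }

    public void release(String leaseId) {
        leases.remove(leaseId);
    }

    int workersFor(String leaseId) {
        return leases.get(leaseId).workers;
    }
}
```

Tying each lease to a single topology keeps requests per topology rather than cluster-wide, which is the property the message says security and resource aware scheduling require.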
As such, we would need some sort of supervisor that also runs under YARN/Mesos, etc. to provide this extra functionality. I have not fully thought through what it would need from a pluggability standpoint to make that work. There is also the logviewer, which does more than just logs. We would need some pluggable way to point people to where their logs/artifacts are, and to monitor the resource usage of the logs (perhaps that part should move to the supervisor).

All of that seems like a lot more work than providing a pluggable interface in the supervisor that would allow it to provision, manage, monitor, and again possibly resize, local workers. In fact, I see a lot of potential overlap between that interface and the pluggability the supervisor would need for running on Mesos, YARN, etc.

- Bobby

On Thursday, January 14, 2016 12:39 AM, Erik Weathers <eweath...@groupon.com.INVALID> wrote:

Perhaps rather than just bolting on "cgroup support", we could instead open a dialogue about making Mesos support a core feature of Storm. The current integration is a bit unwieldy & hackish, arising from the conflicting natures of Mesos and Storm w.r.t. scheduling of resources. i.e., Storm assumes you have existing "slots" for running workers on, whereas Mesos is more dynamic. Mesos requires the frameworks that run on top of it to tell it how many resources (CPUs, memory, etc.) the framework's tasks need.

One example of an issue with Storm-on-Mesos: the Storm logviewer is completely busted when you are using Mesos. I filed a ticket describing the issue and proposing modifications to allow it to function:
- https://issues.apache.org/jira/browse/STORM-1342

Furthermore, there are fundamental behaviors in Storm that don't mesh well with Mesos:
- the interfaces of INimbus (allSlotsAvailableForScheduling(), assignSlots(), getForcedScheduler(), etc.)
  make it difficult to create an ideal Mesos integration framework, since they don't allow the Mesos integration code to *really* know what's going on from Nimbus's perspective, e.g.:
  - knowing which topologies & how many workers need to be scheduled at any given moment.
  - since the integration code cannot know what actually needs to run when it receives offers from Mesos, it just hoards those offers, leading to resource starvation in the Mesos cluster.
- the "fallback" behavior of allowing a topology to settle for fewer worker processes than requested should be disable-able. For carefully tuned topologies, it is quite bad to run on fewer than the expected number of worker processes.
  - also, this behavior endangers the idea of having the Mesos integration code *only* hoard Mesos offers after a successful round-trip through the allSlotsAvailableForScheduling() polling calls (i.e., only hoard when we know there are pending topologies). It's dangerous because while we wait for another call to allSlotsAvailableForScheduling(), Nimbus may have decided that it's okie dokie to use fewer than the requested number of worker processes.

I'm sure there are other issues I could conjure up, but those are the major ones that came to mind instantly. I'm happy to explain more, since I realize the bulleted info above may lack context. I wish I knew how Twitter's new Heron project addresses these concerns, since it comes with Mesos support out of the box. It's unclear what they're doing until they open source it, though.

Thanks!
- Erik

On Wed, Jan 13, 2016 at 6:27 PM, 刘键(Basti Liu) <basti...@alibaba-inc.com> wrote:

> Hi Bobby & Jerry,
>
> Yes, JStorm implements generic cgroup support. However, only CPU control
> is enabled when starting a worker.
>
> Regards
> Basti
>
> -----Original Message-----
> From: Bobby Evans [mailto:ev...@yahoo-inc.com.INVALID]
> Sent: Wednesday, January 13, 2016 11:14 PM
> To: dev@storm.apache.org
> Subject: Re: JStorm CGroup
>
> Jerry,
> I think most of the code you are going to want to look at is here:
> https://github.com/apache/storm/blob/jstorm-import/jstorm-core/src/main/java/com/alibaba/jstorm/daemon/supervisor/CgroupManager.java
> The back end for most of it seems to come from
>
> https://github.com/apache/storm/tree/jstorm-import/jstorm-core/src/main/java/com/alibaba/jstorm/container
>
> which looks like it implements somewhat generic cgroup support.
> - Bobby
>
> On Wednesday, January 13, 2016 1:34 AM, 刘键(Basti Liu) <basti...@alibaba-inc.com> wrote:
>
> Hi Jerry,
>
> Currently, JStorm supports limiting the upper bound of CPU time for a
> worker through cpu.cfs_period_us & cpu.cfs_quota_us in cgroup.
> e.g. with cpu.cfs_period_us=100000 and cpu.cfs_quota_us=3*100000, cgroup will
> limit the corresponding process to at most 300% CPU (3 cores).
>
> Regards
> Basti
>
> -----Original Message-----
> From: Jerry Peng [mailto:jerry.boyang.p...@gmail.com]
> Sent: Wednesday, January 13, 2016 1:57 PM
> To: dev@storm.apache.org
> Subject: JStorm CGroup
>
> Hello everyone,
>
> This question is directed more towards the people who worked on JStorm.
> If I recall correctly, JStorm offers some sort of resource isolation through
> CGroups. What kind of support does JStorm offer for resource isolation?
> Can someone elaborate on this feature in JStorm?
>
> Best,
>
> Jerry
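Basti's example is plain CFS bandwidth arithmetic. In every period of cpu.cfs_period_us microseconds, the group may consume at most cpu.cfs_quota_us microseconds of CPU time, summed across all cores. The ceiling in cores is therefore quota / period. Below is a minimal Java sketch of that conversion. The class and method names are illustrative only and are not taken from JStorm.

```java
/** Illustrative helper (hypothetical names) for CFS bandwidth-control arithmetic. */
final class CfsQuota {
    // quota = cores * period; e.g. 3 cores with a 100000 us period -> 300000 us.
    static long quotaFor(double cores, long periodUs) {
        if (cores <= 0 || periodUs <= 0) {
            throw new IllegalArgumentException("cores and periodUs must be positive");
        }
        return Math.round(cores * periodUs);
    }

    // Inverse: the CPU ceiling, in cores, implied by a quota/period pair.
    static double coresFor(long quotaUs, long periodUs) {
        return (double) quotaUs / periodUs;
    }
}
```

Fractional limits work the same way: quotaFor(0.5, 100000) gives 50000, i.e. half a core. Writing -1 to cpu.cfs_quota_us removes the limit entirely.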