Thanks for bringing this project out into the open. It looks like a significant amount of effort, and is well worth a hard look. Moreover, it is good to see more projects using or extending Whirr, whether from below or above (heh, or maybe both in this case!).
Something like this sounds like it can add persistence and recovery to provisioning workflows. That's via Activiti, right? I imagine how provisionr does resiliency (as well as HA/clustering of provisioning tasks) would make an exciting slideshare read. Do keep us up-to-date.

WRT Whirr: Whirr can currently be run embedded as a library, so a direct dependency on provisionr isn't going to work if we wish to keep this mode. That said, it could be an interesting experiment to look at another, more Mesos-like, "Whirr as a service": provisionr as a TLP or a "service" subproject within Whirr. Interesting discussion regardless.

Congrats and thanks for opening up this project!

-A

jclouds-related footnotes for the curious:

Not that this is the forum for it, but you might recall that jclouds 1.6 alpha is blocking on throttling/request efficiency/resilience improvements. By your description, it seems you've been attacking this at the same time, among your other features. Progress has been slow on jclouds for a number of reasons, particularly the search for a way to do this without creating a strict service dependency, more spaghetti, or a bunch of lib deps. The RAX next-gen throttle thing denial-of-serviced many of us and cost a lot of development time, effectively bumping the priority. Steve helped raise an abstraction of the http throttle error, which is now in the 1.5.x codeline. The next step is to employ it with a system that shares quality information, potentially out to an external service like Hystrix if users choose. RAX raised the throttle globally, thanks to collaboration with jclouds, but jclouds isn't dropping the ball: it is progressing this and will release a library-only solution as part of 1.6.

WRT resumable workflow, I've looked at several systems that claim the ability to perform lightweight workflow or FSM. There's a ton of tech available for use, but very few have an embedded mode that doesn't do something like start ZooKeeper, or are free of the ESB or BPM ambitions that tend to bloat the dependency tree. FWIW, my personal opinion is that pipeline has the cleanest syntax, though it suffers from having no OSS version as yet. Two weeks ago, I started a conversation with Google folks about this, but no news. Regardless, I'll keep folks posted, as jclouds has for a long time aimed to have a resumable workflow aptitude without compromising light deps.

http://code.google.com/p/appengine-pipeline/

On Fri, Dec 14, 2012 at 7:34 AM, Andrei Savu <[email protected]> wrote:
> Hi guys,
>
> It is no secret that at Axemblr we are using Apache Whirr for
> provisioning and initial basic cluster configuration for Hadoop. As soon as
> the machines are running we configure Hadoop by leveraging APIs from
> existing tools like Cloudera Manager or Ambari.
>
> All the orchestration needed to make this happen is not trivial if you want
> the final system to be predictable, robust, restartable and easy to inspect
> while running.
>
> A few months ago we realised that we needed to re-work the machine
> provisioning layer from Whirr and build a system that has the following
> features:
>
> * should be able to provision 10s or 100s of virtual machines by doing a
> good job at handling API throttling and by using batch operations as much
> as possible
>
> * all the internal workflows should be persistent and as granular as
> possible and each step should be idempotent
>
> * it should be possible to restart the application server while starting
> virtual machines with no impact
>
> * it should have a modular architecture and provide enough flexibility to
> be able to work with a large number of public and private clouds just by
> replacing modules
>
> * it should hide all this complexity behind a simple REST API and a simple
> interactive shell
>
> * it should be able to automatically build gold base images and use them to
> spawn large clusters
>
> We've spent some time looking for existing products that do all this and in
> the end we've decided that it's better to start from scratch and build this
> system as a new project based on Activiti, Apache Karaf, jclouds and native
> SDKs.
>
> The source code is now publicly available at:
>
> https://github.com/axemblr/axemblr-provisionr
>
> I would really like to know what you think about the work we've done so
> far. The project will improve a lot over the next couple of weeks / months
> so I encourage you to stay tuned.
>
> We want to bring this project to the Apache Foundation later on. I will
> give a talk in February at ApacheCon NA on this.
>
> Cheers,
>
> -- Andrei Savu / axemblr.com
>
