Hi, I am currently working on some projects using Mesos at Atos Toulouse, where we run it on top of a classical IaaS.
After playing with Mesos and looking at some of the code, it appears to me that there is no elasticity mechanism in place. I opened a Jira issue some months ago, which contains most of the content of this email: https://issues.apache.org/jira/browse/MESOS-2453

Here is what I have in mind (slides at the link below for the detailed and visual version ☺):

- Add the possibility for a framework to signal that it has some work pending (with or without further semantics about which resources it wants?).
- Modify the Mesos algorithm to call a pluggable driver when no resources are available and at least one framework has work to do. In this case the driver should scale up the Mesos cluster by launching VMs. How many VMs and of which size is a little tricky to decide without adding semantics to the framework signal.
- Add a flag somewhere to mark a slave as "volatile", so that we can prefer static resources and shut down volatile slaves after they have been left unused for some time.

https://docs.google.com/presentation/d/1eNQSvDQ64gPNbmf0YVPq9tIWLMCbAHExos5WXrm0uqI/edit?usp=sharing

Does it look doable to you? What do you think of the principle?

Do you think we can add some semantics to the "I have work to do" framework signal without breaking the two-level scheduling principle? I don't think it violates it, since both mechanisms (signaling a need and actually taking a resource from an offer) are fully independent in my proposal, but I feel a little out of my league to be sure.

This proposal currently doesn't specifically address bin packing; however, with the aforementioned modifications in place it should be easy to add, since we know which resources are volatile.

I have seen some other work (by Netflix, for example) that addresses this problem, but it always seems to be at the framework level and not inside the core Mesos architecture. Is there a reason for that other than lack of time for specification/contribution?
http://fr.slideshare.net/spodila/aws-reinvent-2014-talk-scheduling-using-apache-mesos-in-the-cloud

Regards,
Mathieu Velten
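
PS: To make the three points of the proposal a bit more concrete, here is a minimal Python sketch of how they could fit together. It is only an illustration under stated assumptions, not Mesos code: `Slave`, `ScalingDriver`, `RecordingDriver`, `offer_order` and `allocation_round` are all hypothetical names, the "pending work" signal is reduced to a plain list of framework ids, and the driver always launches one fixed-size VM because the signal carries no resource semantics.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class Slave:
    slave_id: str
    free_cpus: float
    volatile: bool = False   # True for VMs launched on demand (hypothetical flag)
    busy: bool = False       # True while at least one task runs on the slave
    idle_since: float = 0.0  # time at which the slave last became unused


class ScalingDriver(Protocol):
    """Hypothetical pluggable driver: one implementation per IaaS."""

    def scale_up(self, pending_frameworks: List[str]) -> List[Slave]: ...

    def scale_down(self, slave: Slave) -> None: ...


class RecordingDriver:
    """Toy driver that 'launches' one fixed-size volatile slave per call."""

    def __init__(self) -> None:
        self.launched: List[str] = []
        self.stopped: List[str] = []

    def scale_up(self, pending_frameworks: List[str]) -> List[Slave]:
        slave = Slave(f"vm-{len(self.launched)}", free_cpus=4.0, volatile=True)
        self.launched.append(slave.slave_id)
        return [slave]

    def scale_down(self, slave: Slave) -> None:
        self.stopped.append(slave.slave_id)


def offer_order(slaves: List[Slave]) -> List[Slave]:
    """Prefer static resources: static slaves are offered before volatile ones."""
    return sorted(slaves, key=lambda s: s.volatile)


def allocation_round(slaves: List[Slave], pending: List[str],
                     driver: ScalingDriver, now: float,
                     idle_timeout: float) -> List[Slave]:
    """One round: scale up if work is pending and nothing is free,
    then shut down volatile slaves left unused for too long."""
    slaves = list(slaves)
    has_free = any(s.free_cpus > 0 for s in slaves)
    if pending and not has_free:
        for new_slave in driver.scale_up(pending):
            new_slave.idle_since = now
            slaves.append(new_slave)

    kept = []
    for s in slaves:
        expired = s.volatile and not s.busy and now - s.idle_since >= idle_timeout
        if expired:
            driver.scale_down(s)
        else:
            kept.append(s)
    return kept
```

In this sketch the framework signal and the offer mechanism stay independent: `pending` only triggers the driver, while `offer_order` decides which slaves are offered first. A real driver would also have to decide how many VMs to launch and of which size, which is exactly the open question above.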