On Sep 12, 2013, at 10:30 AM, Michael Basnight wrote:

> On Sep 12, 2013, at 2:39 AM, Thierry Carrez wrote:
>
>> Sergey Lukjanov wrote:
>>
>>> [...]
>>> As you can see, resource provisioning is just one of the features, and the
>>> implementation details are not critical to the overall architecture. It
>>> performs only the first step of the cluster setup. We considered Heat
>>> for a while, but ended up with direct API calls in favor of speed and
>>> simplicity. Going forward, Heat integration will be done by implementing
>>> the extension mechanism [3] and [4] as part of the Icehouse release.
>>>
>>> The next part, Hadoop cluster configuration, is already extensible, and we
>>> have several plugins - Vanilla and Hortonworks Data Platform, with a
>>> Cloudera plugin started too. This allows unifying management of different
>>> Hadoop distributions under a single control plane. The plugins are
>>> responsible for correct Hadoop ecosystem configuration on already
>>> provisioned resources and use different Hadoop management tools like
>>> Ambari to set up and configure all cluster services, so there are no
>>> actual provisioning configs on the Savanna side in this case. Savanna and
>>> its plugins encapsulate the knowledge of Hadoop internals and the default
>>> configuration for Hadoop services.
>>
>> My main gripe with Savanna is that it combines (in its upcoming release)
>> what sound to me like two very different services: a Hadoop cluster
>> provisioning service (like what Trove does for databases) and a
>> MapReduce+ data API service (like what Marconi does for queues).
>>
>> Making them part of the same project (rather than two separate projects,
>> potentially sharing the same program) makes discussions about shifting
>> some of its clustering ability to another library/project more complex
>> than they should be (see below).
>>
>> Could you explain the benefit of having them within the same service,
>> rather than two services with one consuming the other?
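As an aside, the plugin mechanism Sergey describes above (Vanilla, HDP, and Cloudera under a single control plane, with the plugin owning all distro-specific knowledge) could be sketched roughly as below. This is an illustrative sketch only; the class, method, and registry names are hypothetical and are not Savanna's actual plugin interface:

```python
from abc import ABC, abstractmethod


class ProvisioningPlugin(ABC):
    """One plugin per Hadoop distribution. The plugin encapsulates how to
    configure services on already provisioned instances; the controller
    above it never sees distro internals."""

    @abstractmethod
    def configure_cluster(self, cluster):
        """Push distro-specific Hadoop configs to the cluster's instances."""

    @abstractmethod
    def start_cluster(self, cluster):
        """Start HDFS/MapReduce services (e.g. via Ambari for HDP)."""


class VanillaPlugin(ProvisioningPlugin):
    def configure_cluster(self, cluster):
        return "configured %s with vanilla Hadoop" % cluster

    def start_cluster(self, cluster):
        return "started %s" % cluster


# The control plane only knows this registry, not what each plugin does.
PLUGINS = {"vanilla": VanillaPlugin}


def get_plugin(name):
    return PLUGINS[name]()
```

Adding support for another distribution is then just another entry in the registry, which is what keeps the single control plane from accumulating per-distro conditionals.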
> And for the record, I don't think that Trove is the perfect fit for it
> today. We are still working on a clustering API. But when we create it, I
> would love the Savanna team's input, so we can try to make a pluggable API
> that's usable for people who want MySQL or Cassandra or even Hadoop. I'm
> less a fan of a clustering library, because in the end, we will both have
> API calls like POST /clusters, GET /clusters, and there will be API
> duplication between the projects.
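For what it's worth, the overlap Michael points at is easy to see if you sketch what a service-agnostic POST /clusters body might look like. The field names below are purely hypothetical, taken from neither the Trove nor the Savanna API: everything generic (name, sizing) sits at the top level, and everything engine-specific is pushed into an opaque blob that a plugin interprets:

```python
import json


def make_cluster_request(name, service_type, flavor, node_count,
                         service_config=None):
    """Build a hypothetical, service-agnostic POST /clusters payload.

    Generic clustering concerns (name, flavor, node count) are shared
    fields; distro/engine-specific settings (Hadoop configs, MySQL
    replication options) go into service_config for the plugin to read.
    """
    return json.dumps({
        "cluster": {
            "name": name,
            "service_type": service_type,  # e.g. "hadoop", "mysql", "cassandra"
            "flavor": flavor,
            "node_count": node_count,
            "service_config": service_config or {},
        }
    })


body = make_cluster_request("demo", "hadoop", "m1.large", 4)
```

If both projects converged on something shaped like this, the duplication would be limited to the shared envelope rather than two diverging cluster resources.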
+1. I am looking at the new cluster provisioning API in Trove [1] and the one
in Savanna [2], and they look quite different right now. Some collaboration is
definitely needed on the API spec itself, not just the backend.

[1] https://wiki.openstack.org/wiki/Trove-Replication-And-Clustering-API#POST_.2Fclusters
[2] https://savanna.readthedocs.org/en/latest/userdoc/rest_api_v1.0.html#start-cluster

>
>>
>>> The next topic is "Cluster API".
>>>
>>> The concern that was raised is how to extract general clustering
>>> functionality into a common library. The cluster provisioning and
>>> management topic is currently relevant for a number of projects within the
>>> OpenStack ecosystem: Savanna, Trove, TripleO, Heat, and TaskFlow.
>>>
>>> Still, each of the projects has its own understanding of what cluster
>>> provisioning is. The idea of extracting common functionality sounds
>>> reasonable, but the details still need to be worked out.
>>>
>>> I'll try to highlight the Savanna team's current perspective on this
>>> question. The notion of "cluster management" in my view has several levels:
>>> 1. Resource provisioning and configuration (instances, networks,
>>> storage). Heat is the main tool here, possibly with additional support
>>> from underlying services. For example, the instance grouping API
>>> extension [5] in Nova would be very useful.
>>> 2. Distributed communication/task execution. There is a project in the
>>> OpenStack ecosystem whose mission is to provide a framework for
>>> distributed task execution - TaskFlow [6]. It was started quite recently.
>>> In Savanna we are really looking forward to using more and more of its
>>> functionality in the I and J cycles as TaskFlow itself matures.
>>> 3. Higher-level clustering - management of the actual services running on
>>> top of the infrastructure. For example, configuring HDFS data nodes in
>>> Savanna, or setting up a MySQL cluster with Percona or Galera in Trove.
>>> These operations are typically very specific to the project domain. In
>>> Savanna specifically, we make heavy use of our knowledge of Hadoop
>>> internals to deploy and configure it properly.
>>>
>>> The overall conclusion seems to be that it makes sense to enhance Heat's
>>> capabilities and invest in TaskFlow development, leaving domain-specific
>>> operations to the individual projects.
>>
>> The thing we'd need to clarify (and the incubation period would be used
>> to achieve that) is how to reuse as much as possible between the various
>> cluster provisioning projects (Trove, the cluster side of Savanna, and
>> possibly future projects). The solution could be to create a library used
>> by Trove and Savanna, to extend Heat, to make Trove the clustering thing
>> beyond just databases...
>>
>> One way of making sure smart and non-partisan decisions are taken in
>> that area would be to make Trove and Savanna part of the same program,
>> or make the clustering part of Savanna part of the same program as
>> Trove, while the data API part of Savanna could live separately (hence
>> my question about two different projects vs. one project above).
>
> Trove is not, nor will it be, a data API. I'd like to keep Savanna in its
> own program, but I could easily see them as being a big data / data
> processing program, while Trove is a cluster provisioning / scaling /
> administration / "keep it online" program.
>
>>
>>> I would also like to emphasize that Hadoop cluster management is already
>>> implemented in Savanna, including scaling support.
>>>
>>> With all this, I do believe Savanna fills an important gap in OpenStack
>>> by providing data processing capabilities in a cloud environment in
>>> general, with integration with the Hadoop ecosystem as the first
>>> particular step.
>>
>> For incubation we bless the goal of the project and the promise that it
>> will integrate well with the other existing projects.
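Sergey's three levels map naturally onto a layered pipeline, which may make the proposed division of labor clearer. The sketch below is plain Python: the first function stands in for Heat (level 1), the second for a TaskFlow-style task runner (level 2), and the third for a domain plugin (level 3). None of these functions correspond to real Heat or TaskFlow APIs; they only illustrate where the boundaries would fall:

```python
# Level 1: resource provisioning -- in reality a Heat stack create
# that brings up instances, networks, and storage.
def provision_resources(node_count):
    return ["instance-%d" % i for i in range(node_count)]


# Level 2: distributed task execution -- in reality a TaskFlow flow
# run across the cluster; here just a sequential stand-in.
def run_tasks(instances, tasks):
    return [task(node) for task in tasks for node in instances]


# Level 3: domain-specific configuration -- only this layer knows
# how to turn a bare instance into, say, an HDFS datanode.
def configure_datanode(node):
    return "%s: hdfs datanode configured" % node


def build_cluster(node_count):
    instances = provision_resources(node_count)
    return run_tasks(instances, [configure_datanode])


results = build_cluster(2)
```

The point of the layering is that levels 1 and 2 are candidates for shared infrastructure (Heat, TaskFlow), while level 3 stays with each project.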
>> A perfectly-working project can stay in incubation until it achieves
>> proper integration and avoids duplication of functionality with other
>> integrated projects. A perfectly-working project can also happily live
>> outside of the OpenStack integrated release if it prefers a more
>> standalone approach.
>
> A good example: our instance provisioning was also implemented in Trove,
> but the goal is to use Heat. So the TC asked us to use Heat for instance
> provisioning, and we outlined a set of goals to achieve before we went to
> Integrated status.
>
>> I think there is value in having Savanna in incubation so that we can
>> explore those avenues of collaboration between projects. It may take
>> more than one cycle of incubation to get it right (in fact, I would not
>> be surprised at all if it took us more than one cycle to properly
>> separate the roles between Trove / TaskFlow / Heat / clusterlib). During
>> this exploration, Savanna devs may also decide that integration is very
>> costly and that their immediate time is better spent adding key
>> features, and drop from the incubation track. But in all cases,
>> incubation sounds like the right first step to get everyone around the
>> same table.
>>
>> --
>> Thierry Carrez (ttx)
>>
>> _______________________________________________
>> OpenStack-dev mailing list
>> OpenStack-dev@lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev