On Sep 12, 2013, at 10:30 AM, Michael Basnight wrote:

> On Sep 12, 2013, at 2:39 AM, Thierry Carrez wrote:
> 
>> Sergey Lukjanov wrote:
>> 
>>> [...]
>>> As you can see, resource provisioning is just one of the features, and the 
>>> implementation details are not critical for the overall architecture. It 
>>> performs only the first step of the cluster setup. We’ve been considering 
>>> Heat for a while, but ended up using direct API calls in favor of speed and 
>>> simplicity. Going forward, Heat integration will be done by implementing 
>>> the extension mechanism [3][4] as part of the Icehouse release.
>>> 
>>> The next part, Hadoop cluster configuration, is already extensible, and we 
>>> have several plugins - Vanilla, Hortonworks Data Platform, and a Cloudera 
>>> plugin has been started too. This allows unifying management of different 
>>> Hadoop distributions under a single control plane. The plugins are 
>>> responsible for correct Hadoop ecosystem configuration on already 
>>> provisioned resources, and they use different Hadoop management tools like 
>>> Ambari to set up and configure all cluster services, so there are no 
>>> actual provisioning configs on the Savanna side in this case. Savanna and 
>>> its plugins encapsulate the knowledge of Hadoop internals and the default 
>>> configuration for Hadoop services.
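
As an aside, a minimal sketch of what such a per-distribution plugin
interface could look like - the class and method names below are
hypothetical illustrations, not Savanna's actual plugin SPI:

```python
# Hypothetical sketch: one plugin per Hadoop distribution, each driving its
# own management tool (e.g. Ambari for HDP) behind a common interface.
from abc import ABC, abstractmethod


class ClusterPlugin(ABC):
    """Common control-plane interface; all names here are illustrative."""

    @abstractmethod
    def configure_cluster(self, cluster_spec):
        """Push distribution-specific Hadoop configs to provisioned nodes."""

    @abstractmethod
    def start_cluster(self, cluster_spec):
        """Bring up the HDFS/MapReduce services on the configured nodes."""


class VanillaPlugin(ClusterPlugin):
    def configure_cluster(self, cluster_spec):
        # A real plugin would write Hadoop configs; here we just echo the plan.
        return {"plugin": "vanilla", "configured": sorted(cluster_spec["nodes"])}

    def start_cluster(self, cluster_spec):
        return "started"


# "hdp" and "cdh" plugins would register here under the same interface.
PLUGINS = {"vanilla": VanillaPlugin}

plugin = PLUGINS["vanilla"]()
result = plugin.configure_cluster({"nodes": ["nn-1", "dn-1", "dn-2"]})
```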
>> 
>> My main gripe with Savanna is that it combines (in its upcoming release)
>> what sound to me like two very different services: a Hadoop cluster
>> provisioning service (like what Trove does for databases) and a
>> MapReduce+ data API service (like what Marconi does for queues).
>> 
>> Making it part of the same project (rather than two separate projects,
>> potentially sharing the same program) makes discussions about shifting
>> some of its clustering ability to another library/project more complex
>> than they should be (see below).
>> 
>> Could you explain the benefit of having them within the same service,
>> rather than two services with one consuming the other?
> 
> And for the record, I don't think that Trove is the perfect fit for it today. 
> We are still working on a clustering API. But when we create it, I would love 
> the Savanna team's input, so we can try to make a pluggable API that's usable 
> for people who want MySQL or Cassandra or even Hadoop. I'm less a fan of a 
> clustering library, because in the end, we will both have API calls like POST 
> /clusters, GET /clusters, and there will be API duplication between the 
> projects.
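
To make that duplication concern concrete, here is a hypothetical sketch of
what the two route tables would look like if each project shipped its own
clustering API; the handler names are made up for illustration:

```python
# If Trove and Savanna each expose their own clustering API, the route
# tables come out nearly identical. Handler names here are hypothetical.
TROVE_CLUSTER_ROUTES = {
    ("POST", "/clusters"): "create_cluster",
    ("GET", "/clusters"): "list_clusters",
    ("GET", "/clusters/{id}"): "show_cluster",
}
SAVANNA_CLUSTER_ROUTES = {
    ("POST", "/clusters"): "create_cluster",
    ("GET", "/clusters"): "list_clusters",
    ("GET", "/clusters/{id}"): "show_cluster",
}

# Every route one project exposes, the other would duplicate.
duplicated = sorted(set(TROVE_CLUSTER_ROUTES) & set(SAVANNA_CLUSTER_ROUTES))
```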


+1. I am looking at the new cluster provisioning API in Trove [1] and the one 
in Savanna [2], and they look quite different right now. Some collaboration is 
clearly needed, even on the API spec, not just the backend.

[1] 
https://wiki.openstack.org/wiki/Trove-Replication-And-Clustering-API#POST_.2Fclusters
[2] 
https://savanna.readthedocs.org/en/latest/userdoc/rest_api_v1.0.html#start-cluster
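
Roughly, the divergence looks like this. The payloads below are hypothetical
paraphrases, not the authoritative shapes - see [1] and [2] for those - but
they show how differently the two APIs model a cluster:

```python
# Hypothetical cluster-creation payloads, loosely modeled on the linked
# specs. Field names are approximate; consult [1] and [2] for the real ones.
trove_create = {
    "cluster": {
        "name": "products",
        "datastore": {"type": "mysql", "version": "5.5"},
        "instances": [
            {"flavorRef": "7", "volume": {"size": 2}},
            {"flavorRef": "7", "volume": {"size": 2}},
        ],
    }
}
savanna_create = {
    "name": "doc-cluster",
    "plugin_name": "vanilla",
    "hadoop_version": "1.1.2",
    "node_groups": [
        {"name": "master", "node_processes": ["namenode"], "count": 1},
        {"name": "worker", "node_processes": ["datanode"], "count": 2},
    ],
}

# Even the top-level envelopes differ: Trove nests everything under
# "cluster", Savanna keys the cluster shape on plugin-specific node groups.
shared_top_level = set(trove_create) & set(savanna_create)
```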


> 
>> 
>>> The next topic is “Cluster API”.
>>> 
>>> The concern that was raised is how to extract general clustering 
>>> functionality into a common library. The cluster provisioning and 
>>> management topic is currently relevant to a number of projects within the 
>>> OpenStack ecosystem: Savanna, Trove, TripleO, Heat, and TaskFlow.
>>> 
>>> Still, each of the projects has its own understanding of what cluster 
>>> provisioning is. The idea of extracting common functionality sounds 
>>> reasonable, but the details still need to be worked out.
>>> 
>>> I’ll try to highlight the Savanna team’s current perspective on this 
>>> question. The notion of “cluster management”, in my perspective, has 
>>> several levels:
>>> 1. Resource provisioning and configuration (instances, networks, 
>>> storage). Heat is the main tool here, with possible additional support 
>>> from underlying services. For example, the instance grouping API 
>>> extension [5] in Nova would be very useful.
>>> 2. Distributed communication/task execution. There is a project in the 
>>> OpenStack ecosystem with the mission to provide a framework for 
>>> distributed task execution - TaskFlow [6]. It was started quite recently. 
>>> In Savanna we are really looking forward to using more and more of its 
>>> functionality in the I and J cycles as TaskFlow itself gets more mature.
>>> 3. Higher-level clustering - management of the actual services working on 
>>> top of the infrastructure: for example, configuring HDFS data nodes in 
>>> Savanna, or setting up a MySQL cluster with Percona or Galera in Trove. 
>>> These operations are typically very specific to the project domain. As 
>>> for Savanna specifically, we rely heavily on knowledge of Hadoop 
>>> internals to deploy and configure it properly.
>>> 
>>> The overall conclusion seems to be that it makes sense to enhance Heat’s 
>>> capabilities and invest in TaskFlow development, leaving domain-specific 
>>> operations to the individual projects.
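
The three levels can be sketched in a few lines of plain Python - this is
not the real TaskFlow API, just an illustration of where each level's
responsibility would sit:

```python
# Hypothetical layering sketch (not the real TaskFlow or Heat APIs):
# level 2 is a generic sequential task runner of the kind TaskFlow
# generalizes; levels 1 and 3 are tasks plugged into it.

def provision_resources(ctx):
    # Level 1: resource provisioning; a real system would delegate to Heat.
    ctx["instances"] = ["node-%d" % i for i in range(ctx["count"])]

def configure_hdfs(ctx):
    # Level 3: domain-specific knowledge (here, which nodes become HDFS
    # data nodes) that stays inside the individual project.
    ctx["datanodes"] = ctx["instances"][1:]

def run_flow(tasks, ctx):
    # Level 2: generic distributed task execution, reduced to a loop here.
    for task in tasks:
        task(ctx)
    return ctx

ctx = run_flow([provision_resources, configure_hdfs], {"count": 3})
```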
>> 
>> The thing we'd need to clarify (and the incubation period would be used
>> to achieve that) is how to reuse as much as possible between the various
>> cluster provisioning projects (Trove, the cluster side of Savanna, and
>> possibly future projects). Solutions could be to create a library used by
>> Trove and Savanna, to extend Heat, or to make Trove the clustering thing
>> beyond just databases...
>> 
>> One way of making sure smart and non-partisan decisions are taken in
>> that area would be to make Trove and Savanna part of the same program,
>> or make the clustering part of Savanna part of the same program as
>> Trove, while the data API part of Savanna could live separately (hence
>> my question about two different projects vs. one project above).
> 
> Trove is not, nor will it be, a data API. I'd like to keep Savanna in its 
> own program, but I could easily see it being a big data / data processing 
> program, while Trove is a cluster provisioning / scaling / administration / 
> "keep it online" program.
> 
>> 
>>> I would also like to emphasize that Hadoop cluster management is already 
>>> implemented in Savanna, including scaling support.
>>> 
>>> With all this, I do believe Savanna fills an important gap in OpenStack 
>>> by providing Data Processing capabilities in a cloud environment in 
>>> general, with integration with the Hadoop ecosystem as the first 
>>> particular step.
>> 
>> For incubation we bless the goal of the project and the promise that it
>> will integrate well with the other existing projects. A
>> perfectly-working project can stay in incubation until it achieves
>> proper integration and avoids duplication of functionality with other
>> integrated projects. A perfectly-working project can also happily live
>> outside of OpenStack integrated release if it prefers a more standalone
>> approach.
> 
> A good example. Our instance provisioning was also implemented in Trove, but 
> the goal is to use Heat. So the TC asked us to use Heat for instance 
> provisioning, and we outlined a set of goals to achieve before we went to 
> Integrated status.
> 
>> I think there is value in having Savanna in incubation so that we can
>> explore those avenues of collaboration between projects. It may take
>> more than one cycle of incubation to get it right (in fact, I would not
>> be surprised at all if it took us more than one cycle to properly
>> separate the roles between Trove / TaskFlow / Heat / clusterlib). During
>> this exploration, Savanna devs may also decide that integration is very
>> costly and that their immediate time is better spent adding key
>> features, and drop from the incubation track. But in all cases,
>> incubation sounds like the right first step to get everyone around the
>> same table.
>> 
>> -- 
>> Thierry Carrez (ttx)
>> 
>> _______________________________________________
>> OpenStack-dev mailing list
>> OpenStack-dev@lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
