On 1/6/16 8:20 PM, Amrith Kumar wrote:
Kevin Fox writes:

as far as that plugin ever should go. If you need scale up/down, etc, then
you're starting to reimplement large swaths of Sahara, and like the Cinder
plugin for Nova, there could be a plugin that works identically to the
standalone one that converts the same API over to a Sahara-compatible one. You
then farm the work over to Sahara.
I believe that this is not the case. The entire framework for integration with 
Cinder, Nova etc., already exists in Trove.

Recall that Trove already deals with about a dozen databases, several of which
have support for clusters.

The code to add HBase support to Trove doesn't have to implement all of this
framework that already exists.

All that is being implemented is (literally) a Trove 'plugin' for HBase and a
mechanism to build an HBase guest image.

-amrith

Right, I think that's the concern. A plugin for integration with a standalone/pseudo-distributed HBase deployment has arguably a reasonable scale to be managed by a Trove guestagent. That agent would also fire up the client RPC services necessary for an end user to interact with HBase remotely. But even the HBase project views standalone mode as a devel/test capability only. The fully distributed model gets orders of magnitude more complex. Is the agent plugin just wiring into an existing multi-node HBase deployment somewhere? Is it spawning/growing/shrinking HDFS endpoints itself?

The "we already have cluster support in Trove" argument doesn't really track in a production Hadoop space, IMHO. That's why Sahara was developed.

My $0.02,
\Pete


-----Original Message-----
From: Fox, Kevin M [mailto:[email protected]]
Sent: Wednesday, January 06, 2016 7:32 PM
To: OpenStack Development Mailing List (not for usage questions)
<[email protected]>
Subject: Re: [openstack-dev] [trove] Adding support for HBase in Trove

Just my 2 cents... I think you can do both. The great thing about Trove is that
it's providing an abstract API so users just deal with provisioning DBs,
scaling DBs, etc.

Having a simple plugin that doesn't depend on all of Sahara, for the case
where a user only wants a single-node HBase, does make sense. It's much easier
for an Op to support that case if that's all their users ever want. But that's
probably as far as that plugin ever should go. If you need scale up/down, etc.,
then you're starting to reimplement large swaths of Sahara, and like the Cinder
plugin for Nova, there could be a plugin that works identically to the
standalone one that converts the same API over to a Sahara-compatible one. You
then farm the work over to Sahara.

Then, it's up to the ops to choose features and the overhead of supporting
Sahara, or not, and you don't have to support implementing a whole cluster
management system for Trove that already exists.

Thanks,
Kevin
________________________________________
From: Amrith Kumar [[email protected]]
Sent: Wednesday, January 06, 2016 3:15 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: [openstack-dev] [trove] Adding support for HBase in Trove

TL;DR Should Trove treat HBase as a special database because one use case is
as part of a large multi-node Hadoop cluster, and therefore either not
support it at all, or necessarily use Sahara to provision and manage a cluster?
There are pros and cons, and it is argued that the cons outweigh the pros.
A blueprint/specification and an implementation of basic Trove support
for HBase, independent of Sahara, have been submitted for review. See [3], [4]
and [5]. The benefits include the ability to provide the commonly used (in
development) standalone mode of operation, and the elimination of the
dependency on an additional OpenStack project, thereby simplifying deployment.
Comments and feedback are welcome on the implementation, as well as the
specification and the approach.

The long version follows below.

The OpenStack Trove mission is to provide scalable and reliable Cloud
Database as a Service provisioning functionality for both relational and non-
relational database engines, and to continue to improve its fully-featured
and extensible open source framework [1].

An important aspect of the Trove value proposition is that it provides a
common control plane, a common API, and a common set of abstractions that
are used to manage a number of different relational and non-relational
database technologies. The common API contains primitives to create
database instances and clusters of a number of databases including MySQL
(MariaDB, Percona too), PostgreSQL, MongoDB, Cassandra, CouchDB,
Couchbase, IBM DB2, Vertica, and Redis.

Cluster support is also available for a number of databases including
MongoDB, Percona XtraDB cluster and Vertica, with more to come
imminently.

In effect, Trove is a framework for provisioning and managing the lifecycle of
a number of different database technologies; it provides only the control
plane. Users can provision instances and clusters, resize them, take
backups and create new instances and clusters from previous backups, and
establish and manage complex topologies including replication and
clustering.

Trove does not interfere with the data plane; applications interact directly
with the database using the native APIs for each database technology.

Users of OpenStack look to Trove to provide a consistent set of interfaces for
managing their database resources in a variety of use-cases ranging from
small-scale prototyping, development, testing, and all the way through
production. Apache HBase is an open-source, distributed, versioned, non-
relational database [2] and users of HBase face many of the challenges that
Trove addresses for other databases. Therefore adding support for HBase in
Trove seems not only reasonable, but also consistent with the goal of the
(Trove) project.

A spec proposing the addition of HBase support for Trove was submitted [3]
and a first phase of code implementing this HBase support has also been
submitted for review [4], [5]. The process that has been followed is
consistent with other Trove datastores; add basic support and then
progressively augment it in subsequent releases. The code submitted allows
you to provision an HBase instance (which will launch on a Nova instance),
build an HBase guest image using the elements provided, resize the storage
and the instance, take a "backup" of the instance and store that backup on
Swift, and at a later time you can launch a new instance from that "backup".

One can operate HBase with or without HDFS; in fact HBase documents the
standalone mode of operation [6] where HBase is completely operational on
a single node and data is stored on the local file system. This standalone
mode provides a very useful construct for development and testing, and at a
later stage an application can be seamlessly migrated to work with an HBase
installation of some other "run mode" like "Fully Distributed".

Code submitted in [4] and [5], as described in [3], implements support for two
modes of operation, namely "Standalone" and "Pseudo-Distributed". At a
later stage, support will be added for "Fully Distributed" consistent with the
way in which clustering support was delivered for other datastores like
MySQL and MongoDB.

Some have opined that Trove should not directly get into the business of
orchestrating Hadoop Clusters or anything to do with HBase, arguing that this
is something that Sahara already does, and should remain the sole domain of
Sahara.

Since HBase is perfectly operable without HDFS, I believe it is
inappropriate to tightly couple HBase with Sahara, whose primary motivation
is to provision 'data-intensive application clusters' [7]. Furthermore, as we
have found with other datastores, it is my belief that having a common
implementation model across multiple deployment topologies is a benefit for
Trove. Other considerations, such as similarity to other databases supported
by Trove, motivated the approach described in the specification. An architecture
where Trove can function entirely independent of Sahara is also a benefit for
end users, and a model where Trove has dependencies only on other core
OpenStack services considerably simplifies the deployment.

Comments and feedback are welcome on the code, as well as the
specification and the approach.

References:

[1] https://wiki.openstack.org/wiki/Trove#Mission_Statement
[2] https://hbase.apache.org/
[3] https://review.openstack.org/#/c/256079
[4] https://review.openstack.org/#/c/262048/
[5] https://review.openstack.org/#/c/262815/
[6] http://hbase.apache.org/0.94/book/standalone_dist.html
[7] https://wiki.openstack.org/wiki/Sahara

Thanks,

-amrith

--
Amrith Kumar, CTO                   | [email protected]
Tesora, Inc                         | @amrithkumar
125 CambridgePark Drive, Suite 400  | http://www.tesora.com
Cambridge, MA. 02140                |







__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-[email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
