As Kevin suggests, I'm adding [sahara] to the subject line. Others in sahara who now see this thread, apologies for sending you a delayed invitation to the party. There's still lots of food and beer so come on in!
-amrith

> -----Original Message-----
> From: Fox, Kevin M [mailto:kevin....@pnnl.gov]
> Sent: Thursday, January 07, 2016 7:32 PM
> To: OpenStack Development Mailing List (not for usage questions)
> <openstack-dev@lists.openstack.org>
> Subject: Re: [openstack-dev] [trove] Adding support for HBase in Trove
>
> While I applaud raising the issue on the mailing list to get more folks
> to weigh in, I think part of the problem may be the lack of a [sahara]
> tag on the subject. The thread is still tagged to be a Trove centric
> conversation. All respondents please consider adding [sahara] to the
> subject.
>
> Thanks,
> Kevin
> ________________________________________
> From: Amrith Kumar [amr...@tesora.com]
> Sent: Thursday, January 07, 2016 1:59 PM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [trove] Adding support for HBase in Trove
>
> > -----Original Message-----
> > From: michael mccune [mailto:m...@redhat.com]
> > Sent: Thursday, January 07, 2016 3:12 PM
> > To: openstack-dev@lists.openstack.org
> > Subject: Re: [openstack-dev] [trove] Adding support for HBase in Trove
> >
> > On 01/07/2016 11:59 AM, Amrith Kumar wrote:
> > > From the things that you and Pete (Peter MacKinnon) are saying, I
> > > don't understand why there is an objection to accepting the
> > > currently proposed implementation which is clearly for single node
> > > deployments? Both Standalone and Pseudo-Distributed are by
> > > definition, explicitly, necessarily, absolutely, positively,
> > > definitely single node. I can't be more explicit about that. That's
> > > all that is being proposed at this time. See more comments below.
> >
> > i didn't think i explicitly objected to the spec, if it seems that way
> > then i apologize. after reading the spec and the comments, it seemed
> > that there was some question about engagement with the sahara team.
> > i wanted to help bring some light to the issues surrounding deploying
> > hbase and thought it would be good to participate in the discussion.
>
> You are correct Michael. There was a suggestion that we should engage
> with the Sahara team (in the Trove team meeting yesterday) and that is
> what prompted this email thread. So I appreciate your participation as
> one who is a member of the Sahara team.
>
> >
> > > Further, the current proposal also chooses an implementation
> > > strategy that makes it much easier to handle fully-distributed in a
> > > different way in the future. Consider this, Trove could equally well
> > > have dealt with HBase using a single datastore for all operating
> > > modes. In the current implementation, one would create a HBase
> > > standalone instance using a command that included:
> > >
> > > --datastore hbase-standalone
> > >
> > > And a pseudo-distributed instance by including
> > >
> > > --datastore hbase-pseudo-distributed.
> > >
> >
> > and this delineation sounds reasonable to me
> >
> > > Trove could equally well function by having a single datastore
> > > (hbase) but this would make hbase-fully-distributed harder to do in
> > > a different way in the future. I consciously eschewed that path, for
> > > this very specific reason; it would limit choice in the future.
> >
> > agreed
> >
> > > Now, the implementation behind hbase-fully-distributed could be a
> > > custom Trove guest agent that could (if we decided to go that route)
> > > interact with Sahara. However, an alternative implementation of
> > > hbase-fully-distributed could orchestrate everything natively in
> > > Trove. There is much flexibility in the current proposal, and I
> > > submit to you that this is being lost in your reading of the
> > > specification and the current implementation as proposed.
> >
> > i don't think your characterization of my reading comprehension is fair.
> > as i stated earlier, i wanted to participate in the discussion
> > surrounding deploying a technology that sahara currently deploys.
> > fwiw, i agree with what you are saying here, but i also think it is
> > axiomatic, the trove team can choose whichever path it would like for
> > implementation.
> >
> > >> i think this sounds reasonable, as long as we are limiting it to
> > >> standalone mode. if the deployments start to take on a larger scope
> > >> i agree it would be useful to leverage sahara for provisioning and
> > >> scaling.
> > >
> > > Why only standalone? The current proposal explicitly covers only
> > > standalone and pseudo-distributed which are both valid strictly (add
> > > other adjectives here to taste) single node topologies and the
> > > currently submitted specification specifically carves out
> > > fully-distributed operation as requiring further thought and
> > > contemplation.
> >
> > i think starting with standalone mode (and not pseudo-distributed) is
> > a more conservative approach to this. my reason for suggesting
> > limiting this to standalone is that even in pseudo-distributed mode
> > the need for managing hdfs and zookeeper are present, i wanted to
> > highlight some of the overlap and the issues that will start to creep
> > in surrounding this deployment.
> >
>
> The current code (submitted for review) provides both standalone and
> pseudo-distributed support. You will observe that the standalone and
> pseudo-distributed implementations do install zookeeper. As you are no
> doubt aware, one of the recommended ways to force the HBase Master
> server to always bind to a well-known port in favor of the ephemeral
> ports is to stipulate hbase.cluster.distributed is True (see
> https://review.openstack.org/#/c/262048/5/scripts/files/elements/ubuntu-hbase-standalone/install.d/20-install-hbase
> line 121). So, as it turns out, the code to deploy hdfs and zookeeper is
> already part of the proposed implementation.
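
For readers following along, here is a minimal sketch of how a setting such as hbase.cluster.distributed is expressed in the Hadoop-style property XML that HBase reads from hbase-site.xml. This is not the code under review: the function names are hypothetical, and the port value is only an illustrative assumption.

```python
import xml.etree.ElementTree as ET


def render_site_xml(properties):
    """Render a dict of settings as Hadoop-style <configuration> XML.

    Hypothetical helper for illustration; not taken from the Trove review.
    """
    root = ET.Element("configuration")
    for name, value in sorted(properties.items()):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


def parse_site_xml(xml_text):
    """Read <property> entries back into a dict (inverse of render_site_xml)."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value")
            for p in root.iter("property")}


# Assumed example values: distributed mode switched on so that the master
# can use a fixed port. The port number here is illustrative only.
settings = {
    "hbase.cluster.distributed": "true",
    "hbase.master.port": "16000",
}
xml_text = render_site_xml(settings)
print(xml_text)
```

Round-tripping through parse_site_xml(xml_text) gives back the same dict, which is roughly the job a configuration codec for this file format would need to do.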
> >
> > >> as the hbase installation grows beyond the standalone mode there
> > >> will necessarily need to be hdfs and zookeeper support to allow for
> > >> a proper production deployment. this also brings up questions of
> > >> allowing the end-users to supply configurations for the hdfs and
> > >> zookeeper processes, not to mention enabling support for high
> > >> availability hdfs.
> > >
> > > These are things that Trove already addresses, albeit in a different
> > > way than Sahara. Users can, as it turns out, specify configuration
> > > groups which can then be used to launch new instances, and can also
> > > be associated with groups of instances.
> >
> > i am merely identifying issues that trove will need to reproduce, i'm
> > not deeply familiar with the configuration options that trove exposes
> > but i am guessing that it is currently not generating the
> > configurations specific to hdfs and zookeeper.
> >
>
> It is equally important, I think, to realize that Trove doesn't have to
> produce a whole lot of new code to handle this as it already has a
> robust framework that handles a number of databases. Therefore, with a
> relatively small code footprint a prototype that will allow much more
> flexible configuration support has been prototyped (that has not been
> sent up for review yet). The majority of that code is a codec for XML,
> the rest of it is almost completely handled by the framework with the
> exception of a file specifying the configuration options that are to be
> supported.
>
> Therefore, I'd like to reiterate that Trove, by its very design was
> intended to support a number of databases and therefore already has much
> of the framework in place to add support for a new database. Therefore
> there isn't a lot of new code that must be 'reproduced' to add this
> support.
>
> > >> i can envision a scenario where trove could use sahara to provision
> > >> and manage the clusters for hbase/hdfs/zk.
> > >> this does pose some questions as we'd have to determine how the
> > >> trove guest agent would be installed on the nodes, if there will
> > >> need to be custom configurations used by trove, and if sahara will
> > >> need to provide a plugin for bare (meaning no data processing
> > >> framework) hbase/hdfs/zk clusters. but, i think these could be
> > >> solved by either using custom images or a plugin in sahara that
> > >> would install the necessary agents/configurations.
> > >
> > > Let us not underestimate the effort for an end user to now deploy
> > > one more project. To a user already using Trove for a myriad of
> > > databases, requiring Sahara for supporting HBase Standalone sounds
> > > (to put it bluntly) a burden. Requiring it for Fully-Distributed
> > > mode may have some development benefits but it remains to be seen
> > > whether those benefits are really worth the contortions that Trove
> > > would have to go through. And in the Trove architecture, there is
> > > flexibility as described above to have multiple possible
> > > implementations for fully-distributed, one that would interface with
> > > Sahara and another that didn't have to.
> >
> > i agree about the installation issues when we are talking about
> > standalone versus distributed. as for the contortions that trove may
> > have to go through to integrate with sahara, i think it would be worth
> > it, but i'm probably biased here ;)
> >
> > > Let's be clear that for a person who wants a fully configurable
> > > Hadoop based deployment with more control, Sahara may be the best
> > > option. And to one who wants even more control, maybe doing it
> > > themselves with Nova and customer Glance Images is the way to go.
> > > Similarly, a Database-as-a-Service comes with the understood
> > > boundaries imposed by the "as-a-Service" deployment.
> > > Not all configuration options may be tweakable with a DBaaS, that's
> > > well known and understood, not just in Trove but also, for example,
> > > in Amazon RDS, RedShift or any of the other database-as-a-service
> > > implementations. The same would be true in fully-distributed as
> > > well, in the proposal that is currently under review. I submit to
> > > you that this nuance is being lost in your reading.
> >
> > i'd like to think that for someone who wants a fully configurable
> > hadoop base deployment, sahara is the best option =)
> >
> > i think we generally agree here about the deployment of "-aaS"
> > services in openstack, and again i disagree with your characterization
> > of my reading comprehension...
> >
> > >> of course, this does add a layer of complexity as operators who
> > >> wish this type of deployment will need to have both trove and
> > >> sahara, but imo this would be easier than replicating the work that
> > >> sahara has done with these technologies.
> > >
> > > I think this is where our opinions differ, as the 'replication'
> > > isn't all that much given the fact that Trove already provides
> > > capabilities to cluster databases. But, with that said, nothing in
> > > the current specification locks us into a specific deployment
> > > strategy in the future, nor does it preclude multiple
> > > implementations of fully-distributed, one which could leverage
> > > Sahara and one which didn't.
> >
> > respectfully, i think there is more effort involved with the
> > management of the pseudo-distributed mode than standalone, and that is
> > more where my comments are oriented towards. mind you, provisioning
> > might be a simple matter for trove as it stands now, but i think the
> > potential for issues could get deeper with pseudo-distributed.
>
> Here, again, I want to point out that the issues will definitely be more
> with pseudo-distributed than with standalone.
> But, Trove is already a multi-database framework and therefore adding
> support for one more database doesn't require a whole new
> implementation.
>
> >
> > i'm glad that you are open to the idea of implementations that may
> > involve other projects (namely sahara) in the future. as i said in the
> > beginning, given the comments about sahara in the spec and the review
> > i wanted to make sure we got a few more eyes on this to bring our
> > experience to the table.
>
> Absolutely, that's the intent of the ML conversation.
>
> >
> > regards,
> > mike

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev