Yes, both HBase and Phoenix are packaged by Bigtop (I did the latter work), but this is largely orthogonal to the question of whether HBase should ship 'convenience binaries' including a SQL shell, except where I hear "let's not do that and point people to alternatives like Bigtop". Am I stating this position correctly? If not, please pardon. If so, thanks for the feedback.
I don't think we want a 'contrib'. We had one of those a long time ago and got rid of it. Note that I'm not suggesting we bring Phoenix *into* HBase. This also goes to Jon's comment about circular dependencies. There are no dependencies at all introduced here. This would be post-build packaging of a convenience binary only.
I was hoping to sidestep semver and multi-branching multi-versioning concerns by making this about convenience binaries only. By definition it's something done for the convenience of users on a best-effort basis. It doesn't have to cover all of our build combinatorics. Otherwise neither we nor the Phoenix project are going to be in a position to always have version X to cover release Y of the other. We'd throw our hands up and leave the actual integration of Phoenix with HBase to the users themselves, or to Bigtop, or to commercial vendors. I don't think we need to hamstring ourselves like that, but if the community thinks the 'convenience binary' distinction doesn't matter, or doesn't matter *enough*, then ok, no need to discuss this further. Is that the case?
> What if other projects were considered for this special treatment?
> Projects like cask and tephra have a large overlap of hbase community
> members as well. Would we have to have criteria to determine how/when to
> include those project as well?
Do we need criteria in place to cover every related contingency whenever something is proposed? Has there been a problem getting consensus as needed on an ad hoc basis up to now? For all the good that surely will come of it, I think we have also opened the door to a legislative itch with semver. I fear we will risk a constitutional crisis whenever Hadoop upgrades a dependency, but there *I* go conflating things. (smile) You can strike that as not related to the matter at hand.
On Tue, Mar 17, 2015 at 7:26 PM, Sean Busbey <bus...@cloudera.com> wrote:
> I like the idea of having an out of the box solution for using phoenix on top of hbase, but I worry about the conflict when folks want to upgrade one or the other. Our instructions for replacing Hadoop jars will get substantially more complicated if they have to include phoenix and its dependencies.
>
> Two possible compromise positions:
>
> * Apache Bigtop - it already integrates the rest of the stack. My apologies if it's in there (or proposed and rejected), phone access limits my ability to check.
>
> * Sub-Project - either hbase or phoenix could start a contrib repo that did this out of the box combined distro. It could also try to help other on-ramping problems, like setting up a cluster without having to manage your own deployment of HDFS / ZK.
>
> As the subproject matures we'd have a lower risk way of assessing how coupled hbase and phoenix releases are and what kind of deployment efficiencies we get.
>
> --
> Sean
> On Mar 17, 2015 7:47 PM, "Jonathan Hsieh" <j...@cloudera.com> wrote:
>
> > I like Nick's approach of including a hbase (and its deps) inside of phoenix releases or having the dockerfile with the components "installed". This coupling seems more easy to manage since phoenix already has two branches for 0.94 and 0.98 support -- each could include its own hbase and choose to upgrade point versions or minor versions without introducing confusion. That approach is a clean way to deal with semvar breaking dependencies in the other hadoop/hbase deps discussion (vs the hadoop1-hadoop2 compat stuff we had before).
> >
> > Only having phoenix binaries in the 0.98 branch may cause confusion. It would be a special case and break the new features in trunk convention and if extended could potentially block releases of newer versions.
> >
> > If we kept the policy intact and Include phoenix in trunk/master (an notion that should rightfully be avoided), we would cause problems if phoenix breaking API changes were introduced. It brings in other awkward questions such as how often would we pull in the latest phoenix? are we willing to tolerate a broken master build (we sort of do already admittedly but that is not ideal) ? would phoenix be able block a core hbase release?
> >
> > Are there examples of this kind of "reverse" inclusion in other projects? One that seems analogous is curator to zookeeper -- and curator is a separate project from zookeeper.
> >
> > What if other projects were considered for this special treatment? Projects like cask and tephra have a large overlap of hbase community members as well. Would we have to have criteria to determine how/when to include those project as well?
> >
> > Keeping the already large hbase project's scope and code base focused and independent of new circular dependencies seems prudent.
> >
> > Jon.
> >
> > On Tue, Mar 17, 2015 at 12:54 PM, Nick Dimiduk <ndimi...@gmail.com> wrote:
> >
> > > I've been thinking of something along these lines as well. Rather an either official Apache project, I was thinking it could be something as simple as a github managed dockerfile that stands up a HBase + Phoenix singlenode deal, see if momentum builds.
> > >
> > > Another idea is Phoenix could include HBase in its binary release, the same way HBase includes Hadoop. That way there's an "out of the box" distribution for Phoenix. That would be a discussion for the Phoenix dev list.
> > >
> > > -n
> > >
> > > On Tuesday, March 17, 2015, Andrew Purtell <apurt...@apache.org> wrote:
> > >
> > > > Consider if the HBase project starts releasing new "convenience binaries", in addition to the existing ones, in which we bundle a recent/vetted/stable version of Phoenix, with the site file changes for loading their coprocessors already patched in (to hbase-default.xml) For now this would be done for 0.98 only, since that's the only release line supported by an actively developed Phoenix version. We could also do this for 0.94 releases with Phoenix 3 if the 0.94 RM wants, but I doubt there would be any demand for this, Phoenix 3 is inactive because that community has all moved to 4, I'd imagine that carries over here.
> > > >
> > > > Advantages:
> > > >
> > > > - HBase would ship with a SQL access option. There's the Phoenix JDBC driver of course, and we'd also bundle the psql and sqlline exec wrappers from the Phoenix binary distribution. We'd have both the jruby shell and a SQL shell, this is a powerful combination.
> > > >
> > > > - HBase ships with a library that assists users in making efficient queries if their data is typed, but this doesn't include the server side optimizations that the Phoenix coprocessors provide, and in that case no hand rolling is necessary.
> > > >
> > > > - HBase would ship with secondary indexes. These would not cover all possible use cases and requirements, let's stipulate that now and hope this doesn't kick off another circular discussion on that front. Unquestionably this is a compelling Phoenix feature so some use cases obviously can benefit, and if users find the combined distribution useful enough we don't have to discuss secondary indexes in HBase core again.
> > > >
> > > > - We will have done the necessary integration work for the combined result to be easy to use. Apache software cat herders will appreciate this.
> > > >
> > > > - It's totally optional, simply ignore the new binary packages if you don't care. This is not a Grand Unification proposal.
> > > >
> > > > Concerns:
> > > >
> > > > - More work for the RM. Unquestionably.
> > > >
> > > > - Concerns about the quality of the combined convenience artifact: Is there an implied warranty? Could we disclaim? Should we disclaim? If not, how does HBase do QA on this. Related to the above concern about RM bandwidth. Maybe Phoenix could help.
> > > >
> > > > - Increased coupling between the projects. Frankly, I think this already there, we just don't see it until we trip over issues that could have been avoided with more communication between projects. Pushing on Phoenix for bits for a monthly HBase release cadence will surface issues faster and improve communication between the projects. This benefits Phoenix with more QA bandwidth. This benefits HBase because we see Phoenix bringing in a significant number of users.
> > > >
> > > > - We may want to revisit again normalizing type support in HBase's client library and Phoenix, eventually.
> > > >
> > > > I could add more items to the advantage or concern lists but mainly want to float the idea for feedback at this time.
> > > >
> > > > Thoughts?
> > > >
> > > > --
> > > > Best regards,
> > > >
> > > > - Andy
> > > >
> > > > Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
> >
> > --
> > // Jonathan Hsieh (shay)
> > // HBase Tech Lead, Software Engineer, Cloudera
> > // j...@cloudera.com // @jmhsieh

--
Best regards,
- Andy
Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)