I'm +1 to split the hbase code out from core into crunch-hbase. Any objections?
On Mon, Aug 6, 2012 at 10:58 AM, Matthias Friedrich <[email protected]> wrote:

> Hi Josh,
>
> hadoop-1.0.3 + hbase-0.94.0 + crunch didn't work for me. It requires
> avro-1.5.3 and doesn't compile with avro-1.7.0; but I think the real
> problem is their use of MethodUtils from commons-lang-2.5 which isn't
> in Hadoop's commons-lang-2.4.
>
> Of course, we can use hbase-0.90.5, downgrade crunch to avro-1.3.3 and
> thrift-0.2.0, pray that jersey is irrelevant, tie all other HBase
> dependencies to the versions Hadoop uses and hope that it works. It
> may work, at the price of forcing some old versions on our users. But
> actually, if it works or not isn't the main point, let's have a look
> at the user perspective.
>
> When you use a thirdparty framework like Hadoop, your application
> inherits the framework's classpath (*). This means, any other dependency
> your application has (including transitive dependencies) has to be
> compatible with the framework's dependencies. The more complex your
> application is, the more this hurts you. You can't update your
> dependencies because the framework locks you in. Porting existing,
> complex applications to the framework is nearly impossible.
>
> I've seen this many times, that's why I evaluate my dependencies
> carefully. Crunch itself is pretty minimal when it comes to
> its direct dependencies (we could be even more minimal with little
> effort). With HBase, however, things look a lot more difficult
> and that's going to scare users away.
>
> I think if we have the chance to make HBase support an optional
> feature, much like MapReduce support is optional in Avro, then we
> should take it.
>
> Users are very thankful when you leave them a choice. I'm a user, I
> know. I've evaluated dozens of libraries and frameworks and dismissed
> quite a few because of dependency conflicts. If you're organized well
> enough to have an evaluation checklist, then this will be on it. I'd
> like to use Crunch in production one day without bending the rules, so
> let's lower the barrier to adoption.
>
> Regards,
> Matthias, stepping off the soap box
>
> (*) Yes, I know about classloader isolation in Java EE and
> HADOOP_USER_CLASSPATH_FIRST.
>
> On Sunday, 2012-08-05, Josh Wills wrote:
> > Hey Matthias,
> >
> > I'm not quite willing to give up on hbase just yet-- how does 1.0.3
> > +Crunch look against hbase 0.94? Is the primary issue the Avro 1.7.0
> > conflicts?
> >
> > J
> >
> > On Sun, Aug 5, 2012 at 2:10 AM, Matthias Friedrich <[email protected]> wrote:
> > > Hi,
> > >
> > > I spent most of Saturday resolving dependency conflicts for CRUNCH-16.
> > > Since nobody's going to read a long mail, here are the cliff notes:
> > >
> > > hadoop-core-1.0.3, hbase-0.90.5, and avro-1.7.0 are incompatible and
> > > I found no safe solution to fix it. Moving HBase support to a separate
> > > Maven module may be the best solution because it reduces risk for
> > > users who don't need HBase.
> > >
> > >
> > > The longer version:
> > >
> > > The POM of hadoop-core-1.0.3 is in a sorry state. It doesn't list all
> > > libraries that are on the runtime classpath, and of these, some are
> > > wrong. For example, integration tests using LocalJobRunner don't work
> > > unless you add more dependencies yourself (ie. commons-io). Also, roughly
> > > a dozen of hbase-0.90.5's 40 dependencies are in conflict with
> > > hadoop-core-1.0.3. This means we have to add quite a few "provided"
> > > dependencies with the correct versions ourselves, but these aren't
> > > propagated to our users so they have to do the same or risk conflicts
> > > at runtime.
> > >
> > > I resolved the conflicts to a point where our integration tests work
> > > which is unfortunately no guarantee that things will work for our users.
> > > Using the dependencies of hadoop-core-1.0.3 + Crunch's, the source
> > > distribution of hbase-0.90.5 doesn't even compile. At an interface
> > > level, it is incompatible with protobuf-java-2.4.1 (easy enough to fix)
> > > and avro-1.7.0 (not so easy to fix). Changing only those dependencies
> > > that are interface compatible (about a dozen) unsurprisingly leads to
> > > HBase test case failures. This may not affect HBase clients, but you
> > > never know. There is no hbase-client library so you always get
> > > everything unless you know HBase well enough to get your exclusions
> > > right.
> > >
> > >
> > > So, where do we go from here? I can get a patch ready that paints
> > > over some of these problems and makes sure that the dependencies we
> > > use in our test cases are the same as during runtime. But I really
> > > need careful review for this.
> > >
> > > To be honest, this situation leaves me a bit uneasy. Maybe the best
> > > long term solution would be to move HBase support to a separate Maven
> > > module that depends on crunch core and not force it on everyone. This
> > > will reduce risk greatly for those who don't need HBase. I think it's
> > > definitely worth giving it a shot.
> > >
> > > What do you think, guys?
> > >
> > > Regards,
> > > Matthias
>

--
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
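
For anyone wanting to check the commons-lang point from the thread on their own cluster: the quoted mail says the application inherits the framework's classpath, and that MethodUtils is missing from Hadoop's commons-lang-2.4. A rough way to see which jar actually wins at runtime is to ask the JVM where a class was loaded from. The sketch below is only an illustration, not anything from Crunch or HBase. The class name ClasspathProbe and its helper methods are made up for this example, and the package org.apache.commons.lang.reflect for MethodUtils is an assumption based on the commons-lang 2.x layout.

```java
import java.net.URL;
import java.security.CodeSource;

// Hypothetical diagnostic helper (not part of Crunch, HBase, or Hadoop).
final class ClasspathProbe {

    // Returns the jar or directory a class was loaded from.
    // Classes from the JVM's bootstrap loader have no code source.
    static String originOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        URL location = (source == null) ? null : source.getLocation();
        return (location == null) ? "bootstrap/unknown" : location.toString();
    }

    // Looks a class up by name without initializing it and reports
    // either where it came from or that it is missing.
    static String describe(String className) {
        try {
            Class<?> type = Class.forName(
                    className, false, ClasspathProbe.class.getClassLoader());
            return className + " loaded from " + originOf(type);
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " is not on the classpath";
        }
    }

    public static void main(String[] args) {
        // Assumed fully qualified name of the class mentioned in the thread.
        System.out.println(
                describe("org.apache.commons.lang.reflect.MethodUtils"));
    }
}
```

Running this inside a task or with the same classpath as the job shows whether the class is visible at all and, if so, which jar supplied it. That is usually enough to tell whether the framework's copy or the application's copy of a library is the one in effect.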
