I'm +1 on splitting the HBase code out of core into crunch-hbase. Any
objections?

On Mon, Aug 6, 2012 at 10:58 AM, Matthias Friedrich <[email protected]> wrote:

> Hi Josh,
>
> hadoop-1.0.3 + hbase-0.94.0 + crunch didn't work for me. It requires
> avro-1.5.3 and doesn't compile with avro-1.7.0; but I think the real
> problem is their use of MethodUtils from commons-lang-2.5 which isn't
> in Hadoop's commons-lang-2.4.
>
> Of course, we can use hbase-0.90.5, downgrade crunch to avro-1.3.3 and
> thrift-0.2.0, pray that jersey is irrelevant, tie all other HBase
> dependencies to the versions Hadoop uses and hope that it works. It
> may work, at the price of forcing some old versions on our users. But
> actually, whether it works isn't the main point; let's have a look
> at the user perspective.
>
> When you use a third-party framework like Hadoop, your application
> inherits the framework's classpath (*). This means any other dependency
> your application has (including transitive dependencies) has to be
> compatible with the framework's dependencies. The more complex your
> application is, the more this hurts you. You can't update your
> dependencies because the framework locks you in. Porting existing,
> complex applications to the framework is nearly impossible.
>
> I've seen this many times; that's why I evaluate my dependencies
> carefully. Crunch itself is pretty minimal when it comes to
> its direct dependencies (we could be even more minimal with little
> effort). With HBase, however, things look a lot more difficult
> and that's going to scare users away.
>
> I think if we have the chance to make HBase support an optional
> feature, much like MapReduce support is optional in Avro, then we
> should take it.
>
> Users are very thankful when you leave them a choice. I'm a user, I
> know. I've evaluated dozens of libraries and frameworks and dismissed
> quite a few because of dependency conflicts. If you're organized well
> enough to have an evaluation checklist, then this will be on it. I'd
> like to use Crunch in production one day without bending the rules, so
> let's lower the barrier to adoption.
>
> Regards,
>   Matthias, stepping off the soap box
>
> (*) Yes, I know about classloader isolation in Java EE and
> HADOOP_USER_CLASSPATH_FIRST.
>
> On Sunday, 2012-08-05, Josh Wills wrote:
> > Hey Matthias,
> >
> > I'm not quite willing to give up on hbase just yet-- how does Hadoop 1.0.3
> > + Crunch look against hbase 0.94? Is the primary issue the Avro 1.7.0
> > conflicts?
> >
> > J
> >
> > On Sun, Aug 5, 2012 at 2:10 AM, Matthias Friedrich <[email protected]> wrote:
> > > Hi,
> > >
> > > I spent most of Saturday resolving dependency conflicts for CRUNCH-16.
> > > Since nobody's going to read a long mail, here are the cliff notes:
> > >
> > > hadoop-core-1.0.3, hbase-0.90.5, and avro-1.7.0 are incompatible and
> > > I found no safe solution to fix it. Moving HBase support to a separate
> > > Maven module may be the best solution because it reduces risk for
> > > users who don't need HBase.
> > >
> > >
> > > The longer version:
> > >
> > > The POM of hadoop-core-1.0.3 is in a sorry state. It doesn't list all
> > > libraries that are on the runtime classpath, and of these, some are
> > > wrong. For example, integration tests using LocalJobRunner don't work
> > > unless you add more dependencies yourself (e.g. commons-io). Also, roughly
> > > a dozen of hbase-0.90.5's 40 dependencies are in conflict with
> > > hadoop-core-1.0.3. This means we have to add quite a few "provided"
> > > dependencies with the correct versions ourselves, but these aren't
> > > propagated to our users so they have to do the same or risk conflicts
> > > at runtime.
> > >
> > > I resolved the conflicts to a point where our integration tests work
> > > which is unfortunately no guarantee that things will work for our users.
> > > Using the dependencies of hadoop-core-1.0.3 + Crunch's, the source
> > > distribution of hbase-0.90.5 doesn't even compile. At an interface
> > > level, it is incompatible with protobuf-java-2.4.1 (easy enough to fix)
> > > and avro-1.7.0 (not so easy to fix). Changing only those dependencies
> > > that are interface compatible (about a dozen) unsurprisingly leads to
> > > HBase test case failures. This may not affect HBase clients, but you
> > > never know. There is no hbase-client library so you always get
> > > everything unless you know HBase well enough to get your exclusions
> > > right.
> > >
> > >
> > > So, where do we go from here? I can get a patch ready that paints
> > > over some of these problems and makes sure that the dependencies we
> > > use in our test cases are the same as during runtime. But I really
> > > need careful review for this.
> > >
> > > To be honest, this situation leaves me a bit uneasy. Maybe the best
> > > long term solution would be to move HBase support to a separate Maven
> > > module that depends on crunch core and not force it on everyone. This
> > > will reduce risk greatly for those who don't need HBase. I think it's
> > > definitely worth giving it a shot.
> > >
> > > What do you think, guys?
> > >
> > > Regards,
> > >   Matthias
>



-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
