Hi Josh,

hadoop-1.0.3 + hbase-0.94.0 + crunch didn't work for me. hbase-0.94.0
requires avro-1.5.3 and doesn't compile with avro-1.7.0, but I think the
real problem is its use of MethodUtils from commons-lang-2.5, which isn't
in Hadoop's commons-lang-2.4.

Of course, we can use hbase-0.90.5, downgrade crunch to avro-1.3.3 and
thrift-0.2.0, pray that jersey is irrelevant, tie all other HBase
dependencies to the versions Hadoop uses and hope that it works. It
may work, at the price of forcing some old versions on our users. But
whether it works or not isn't the main point. Let's look at this from
the user's perspective.

When you use a third-party framework like Hadoop, your application
inherits the framework's classpath (*). This means that any other
dependency your application has (including transitive dependencies) has
to be compatible with the framework's dependencies. The more complex your
application is, the more this hurts you. You can't update your
dependencies because the framework locks you in. Porting existing,
complex applications to the framework is nearly impossible.
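
To see which jar actually wins on an inherited classpath, a small probe
like the following can help. It is only a sketch: ClasspathProbe and
locationOf are made-up names for illustration, and the only assumption
about commons-lang is that MethodUtils lives in the
org.apache.commons.lang.reflect package.

```java
import java.security.CodeSource;

public class ClasspathProbe {

    /** Returns where a loaded class came from, e.g. the URL of its jar. */
    static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            // JDK core classes usually have no code source.
            return "(bootstrap or unknown)";
        }
        return source.getLocation().toString();
    }

    /** Looks a class up by name and reports its origin, if it can be loaded. */
    static String locationOf(String className) {
        try {
            return locationOf(Class.forName(className));
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Each argument is a fully qualified class name to look up.
        for (String name : args) {
            System.out.println(name + " -> " + locationOf(name));
        }
    }
}
```

Run with the same classpath as the job, for example
"java ClasspathProbe org.apache.commons.lang.reflect.MethodUtils",
it shows whether that class is visible at all and which jar supplied it.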

I've seen this many times, which is why I evaluate my dependencies
carefully. Crunch itself is pretty minimal when it comes to
its direct dependencies (we could be even more minimal with little
effort). With HBase, however, things look a lot more difficult
and that's going to scare users away.

I think if we have the chance to make HBase support an optional
feature, much like MapReduce support is optional in Avro, then we
should take it.

Users are very thankful when you leave them a choice. I'm a user, I
know. I've evaluated dozens of libraries and frameworks and dismissed
quite a few because of dependency conflicts. If you're organized well
enough to have an evaluation checklist, then this will be on it. I'd
like to use Crunch in production one day without bending the rules, so
let's lower the barrier to adoption.

Regards,
  Matthias, stepping off the soap box

(*) Yes, I know about classloader isolation in Java EE and 
HADOOP_USER_CLASSPATH_FIRST.

On Sunday, 2012-08-05, Josh Wills wrote:
> Hey Matthias,
> 
> I'm not quite willing to give up on hbase just yet-- how does 1.0.3
> +Crunch look against hbase 0.94? Is the primary issue the Avro 1.7.0
> conflicts?
> 
> J
> 
> On Sun, Aug 5, 2012 at 2:10 AM, Matthias Friedrich <[email protected]> wrote:
> > Hi,
> >
> > I spent most of Saturday resolving dependency conflicts for CRUNCH-16.
> > Since nobody's going to read a long mail, here are the cliff notes:
> >
> > hadoop-core-1.0.3, hbase-0.90.5, and avro-1.7.0 are incompatible and
> > I found no safe solution to fix it. Moving HBase support to a separate
> > Maven module may be the best solution because it reduces risk for
> > users who don't need HBase.
> >
> >
> > The longer version:
> >
> > The POM of hadoop-core-1.0.3 is in a sorry state. It doesn't list all
> > libraries that are on the runtime classpath, and some of those it does
> > list are wrong. For example, integration tests using LocalJobRunner don't
> > work unless you add more dependencies yourself (i.e. commons-io). Also, roughly
> > a dozen of hbase-0.90.5's 40 dependencies are in conflict with
> > hadoop-core-1.0.3. This means we have to add quite a few "provided"
> > dependencies with the correct versions ourselves, but these aren't
> > propagated to our users so they have to do the same or risk conflicts
> > at runtime.
> >
> > I resolved the conflicts to a point where our integration tests work,
> > which is unfortunately no guarantee that things will work for our users.
> > Using the dependencies of hadoop-core-1.0.3 + Crunch's, the source
> > distribution of hbase-0.90.5 doesn't even compile. At an interface
> > level, it is incompatible with protobuf-java-2.4.1 (easy enough to fix)
> > and avro-1.7.0 (not so easy to fix). Changing only those dependencies
> > that are interface compatible (about a dozen) unsurprisingly leads to
> > HBase test case failures. This may not affect HBase clients, but you
> > never know. There is no hbase-client library so you always get
> > everything unless you know HBase well enough to get your exclusions
> > right.
> >
> >
> > So, where do we go from here? I can get a patch ready that paints
> > over some of these problems and makes sure that the dependencies we
> > use in our test cases are the same as during runtime. But I really
> > need careful review for this.
> >
> > To be honest, this situation leaves me a bit uneasy. Maybe the best
> > long term solution would be to move HBase support to a separate Maven
> > module that depends on crunch core and not force it on everyone. This
> > will reduce risk greatly for those who don't need HBase. I think it's
> > definitely worth giving it a shot.
> >
> > What do you think, guys?
> >
> > Regards,
> >   Matthias
