Re: Hadoop 2 compatibility issues

Sean Busbey Tue, 14 May 2013 13:53:36 -0700

If a user is referencing any of the Hadoop classes, aren't they supposed to
add a dependency on the appropriate Hadoop artifact anyways?
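
For illustration, such an explicit declaration in a user's POM might look roughly like the sketch below. The hadoop-client artifact and the version numbers are assumptions chosen for the example, not something settled in this thread; a Hadoop2 user would substitute the matching Hadoop2 version.

```xml
<!-- Sketch of a user's pom.xml fragment; artifact choice and versions are
     illustrative assumptions -->
<dependencies>
  <dependency>
    <groupId>org.apache.accumulo</groupId>
    <artifactId>accumulo-core</artifactId>
    <version>1.5.0</version>
  </dependency>
  <!-- Declared explicitly by the user, since dependencies defined inside
       profiles are not resolved transitively -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>1.0.4</version>
  </dependency>
</dependencies>
```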


FWIW, option 4 is what Avro does. Their discussion:

https://issues.apache.org/jira/browse/AVRO-1170




On Tue, May 14, 2013 at 4:40 PM, Christopher <ctubb...@apache.org> wrote:

> So, I've run into a problem with ACCUMULO-1402 that requires a larger
> discussion about how Accumulo 1.5.0 should support Hadoop2.
>
> The problem is basically that profiles should not contain
> dependencies, because profiles don't get activated transitively. A
> slide deck by the Maven developers points this out as a bad practice...
> yet it's a practice we rely on for our current implementation of
> Hadoop2 support
> (http://www.slideshare.net/aheritier/geneva-jug-30th-march-2010-maven
> slide 80).
>
> What this means is that even if we go through the work of publishing
> binary artifacts compiled against Hadoop2, neither our Hadoop1
> binaries nor our Hadoop2 binaries will be able to transitively resolve
> any dependencies defined in profiles. This has significant
> implications for user code that depends on Accumulo Maven artifacts.
> Every user will essentially have to explicitly add Hadoop dependencies
> for every Accumulo artifact that depends on Hadoop, whether directly
> or transitively (they'll have to peek into the profiles in our POMs
> and copy/paste the profile into their project). This becomes more
> complicated when we consider how
> users will try to use things like Instamo.
>
> There are workarounds, but none of them are really pleasant.
>
> 1. The best way to support both major Hadoop APIs is to have separate
> modules with separate dependencies directly in the POM. This is a fair
> amount of work, and in my opinion, would be too disruptive for 1.5.0.
> This solution also gets us separate binaries for separate supported
> versions, which is useful.
>
> 2. A second option, and the preferred one I think for 1.5.0, is to put
> a Hadoop2 patch in the branch's contrib directory
> (branches/1.5/contrib) that patches the POM files to support building
> against Hadoop2. (Acknowledgement to Keith for suggesting this
> solution.)
>
> 3. A third option is to fork Accumulo, and maintain two separate
> builds (a more traditional technique). This adds a merging nightmare for
> features/patches, but gets around some reflection hacks that we may
> have been motivated to do in the past. I'm not a fan of this option,
> particularly because I don't want to replicate the fork nightmare that
> has been the history of early Hadoop itself.
>
> 4. The last option is to do nothing and to continue to build with the
> separate profiles as we are, and make users discover and specify
> transitive dependencies entirely on their own. I think this is the
> worst option, as it essentially amounts to "ignore the problem".
>
> At the very least, it does not seem reasonable to complete
> ACCUMULO-1402 for 1.5.0, given the complexity of this issue.
>
> Thoughts? Discussion? Vote on option?
>
> --
> Christopher L Tubbs II
> http://gravatar.com/ctubbsii
>



-- 
Sean Busbey
Solutions Architect
Cloudera, Inc.
Phone: MAN-VS-BEARD
