On Tue, May 14, 2013 at 5:51 PM, Benson Margulies <bimargul...@gmail.com> wrote:
> I am a maven developer, and I'm offering this advice based on my
> understanding of the reason why that generic advice is offered.
>
> If you have different profiles that _build different results_ but all
> deliver the same GAV, you have chaos.

What GAV are we currently producing for hadoop 1 and hadoop 2?

> If you have different profiles that test against different versions of
> dependencies, but all deliver the same byte code at the end of the
> day, you don't have chaos.
>
>
>
> On Tue, May 14, 2013 at 5:48 PM, Christopher <ctubb...@apache.org> wrote:
> > I think it's interesting that Option 4 seems to be most preferred...
> > because it's the *only* option that is explicitly advised against by
> > the Maven developers (from the information I've read). I can see its
> > appeal, but I really don't think that we should introduce an explicit
> > problem for users (that applies to users using even the Hadoop version
> > we directly build against... not just those using Hadoop 2... I don't
> > know if that point was clear), to only partially support a version of
> > Hadoop that is still alpha and has never had a stable release.
> >
> > BTW, Option 4 was how I would have achieved a solution for
> > ACCUMULO-1402, but I am reluctant to apply that patch, with this issue
> > outstanding, as it may exacerbate the problem.
> >
> > Another implication of Option 4 (the current "solution") is for
> > 1.6.0, with the planned accumulo-maven-plugin... because it means that
> > the accumulo-maven-plugin will need to be configured like this:
> > <plugin>
> >   <groupId>org.apache.accumulo</groupId>
> >   <artifactId>accumulo-maven-plugin</artifactId>
> >   <dependencies>
> >     ... all the required hadoop 1 dependencies to make the plugin work,
> >     even though this version only works against hadoop 1 anyway...
> >   </dependencies>
> >   ...
> > </plugin>
> >
> > --
> > Christopher L Tubbs II
> > http://gravatar.com/ctubbsii
> >
> >
> > On Tue, May 14, 2013 at 5:42 PM, Christopher <ctubb...@apache.org> wrote:
> >> I think Option 2 is the best solution for "waiting until we have the
> >> time to solve the problem correctly", as it ensures that transitive
> >> dependencies work for the stable version of Hadoop, and using Hadoop2
> >> is a very simple documentation issue for how to apply the patch and
> >> rebuild. Option 4 doesn't wait... it explicitly introduces a problem
> >> for users.
> >>
> >> Option 1 is how I'm tentatively thinking about fixing it properly in 1.6.0.
> >>
> >>
> >> --
> >> Christopher L Tubbs II
> >> http://gravatar.com/ctubbsii
> >>
> >>
> >> On Tue, May 14, 2013 at 4:56 PM, John Vines <vi...@apache.org> wrote:
> >>> I'm an advocate of option 4. You say that it's ignoring the problem,
> >>> whereas I think it's waiting until we have the time to solve the problem
> >>> correctly. Your reasoning for this is standardizing on Maven
> >>> conventions, but the other options, while more 'correct' from a Maven
> >>> standpoint, are a larger headache for our user base and ourselves. In
> >>> either case, we're going to be breaking some sort of convention, and
> >>> while it's not good, we should be doing the one that's less bad for US.
> >>> The important thing here, now, is that the poms work, and we should go
> >>> with the method that leaves the work minimal for our end users to
> >>> utilize them.
> >>>
> >>> I do agree that 1. is the correct option in the long run. More
> >>> specifically, I think it boils down to having a single module
> >>> compatibility layer, which is how hbase deals with this issue. But like
> >>> you said, we don't have the time to engineer a proper solution. So let
> >>> sleeping dogs lie and we can revamp the whole system for 1.5.1 or 1.6.0
> >>> when we have the cycles to do it right.
> >>>
> >>>
> >>> On Tue, May 14, 2013 at 4:40 PM, Christopher <ctubb...@apache.org> wrote:
> >>>
> >>>> So, I've run into a problem with ACCUMULO-1402 that requires a larger
> >>>> discussion about how Accumulo 1.5.0 should support Hadoop2.
> >>>>
> >>>> The problem is basically that profiles should not contain
> >>>> dependencies, because profiles don't get activated transitively. A
> >>>> slide deck by the Maven developers points this out as a bad practice...
> >>>> yet it's a practice we rely on for our current implementation of
> >>>> Hadoop2 support
> >>>> (http://www.slideshare.net/aheritier/geneva-jug-30th-march-2010-maven
> >>>> slide 80).
> >>>>
> >>>> What this means is that even if we go through the work of publishing
> >>>> binary artifacts compiled against Hadoop2, neither our Hadoop1
> >>>> binaries nor our Hadoop2 binaries will be able to transitively resolve
> >>>> any dependencies defined in profiles. This has significant
> >>>> implications for user code that depends on Accumulo Maven artifacts.
> >>>> Every user will essentially have to explicitly add Hadoop dependencies
> >>>> for every Accumulo artifact that has dependencies on Hadoop, either
> >>>> because we directly or transitively depend on Hadoop (they'll have to
> >>>> peek into the profiles in our POMs and copy/paste the profile into
> >>>> their project). This becomes more complicated when we consider how
> >>>> users will try to use things like Instamo.
> >>>>
> >>>> There are workarounds, but none of them are really pleasant.
> >>>>
> >>>> 1. The best way to support both major Hadoop APIs is to have separate
> >>>> modules with separate dependencies directly in the POM. This is a fair
> >>>> amount of work, and in my opinion, would be too disruptive for 1.5.0.
> >>>> This solution also gets us separate binaries for separate supported
> >>>> versions, which is useful.
> >>>>
> >>>> 2. A second option, and the preferred one I think for 1.5.0, is to put
> >>>> a Hadoop2 patch in the branch's contrib directory
> >>>> (branches/1.5/contrib) that patches the POM files to support building
> >>>> against Hadoop2. (Acknowledgement to Keith for suggesting this
> >>>> solution.)
> >>>>
> >>>> 3. A third option is to fork Accumulo and maintain two separate
> >>>> builds (a more traditional technique). This adds a merging nightmare
> >>>> for features/patches, but gets around some reflection hacks that we may
> >>>> have been motivated to do in the past. I'm not a fan of this option,
> >>>> particularly because I don't want to replicate the fork nightmare that
> >>>> has been the history of early Hadoop itself.
> >>>>
> >>>> 4. The last option is to do nothing, continue to build with the
> >>>> separate profiles as we are, and make users discover and specify
> >>>> transitive dependencies entirely on their own. I think this is the
> >>>> worst option, as it essentially amounts to "ignore the problem".
> >>>>
> >>>> At the very least, it does not seem reasonable to complete
> >>>> ACCUMULO-1402 for 1.5.0, given the complexity of this issue.
> >>>>
> >>>> Thoughts? Discussion? Vote on option?
> >>>>
> >>>> --
> >>>> Christopher L Tubbs II
> >>>> http://gravatar.com/ctubbsii
> >>>>
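[Editor's illustration] For readers following the thread, the profile-based setup under discussion looks roughly like the sketch below. This is a minimal, hypothetical fragment, not copied from Accumulo's actual POM; the profile ids and version numbers are illustrative only (`hadoop-core` was Hadoop 1's main artifact, `hadoop-client` Hadoop 2's):

```xml
<!-- Sketch of the anti-pattern described above: dependencies declared
     inside build profiles. Profile ids and versions are illustrative,
     not taken from Accumulo's real POM. -->
<profiles>
  <profile>
    <id>hadoop-1</id>
    <activation>
      <activeByDefault>true</activeByDefault>
    </activation>
    <dependencies>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-core</artifactId>
        <version>1.0.4</version>
      </dependency>
    </dependencies>
  </profile>
  <profile>
    <id>hadoop-2</id>
    <dependencies>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.0.4-alpha</version>
      </dependency>
    </dependencies>
  </profile>
</profiles>
```

Profiles are evaluated only in the build that produces the artifact; the published POM's effective dependency list does not carry profile-scoped dependencies to consumers. A downstream project depending on the published jar therefore sees neither `<dependency>` block and must copy the relevant entries into its own POM by hand, which is the transitive-resolution problem Christopher describes.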