Re: Hadoop 2 compatibility issues

John Vines Tue, 14 May 2013 16:46:49 -0700

We've written the code such that it works in either, and then we have
profiles which set the hadoop.version for convenience. The profiles also
alternate between using hadoop-client and hadoop-core, but as I mentioned
above, that is unnecessary.
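
For illustration only, here is a minimal sketch of that arrangement. It is not
copied from the Accumulo POM: the profile id and both version numbers are
assumptions. The Hadoop dependency is declared once, outside any profile, and
the profile only overrides hadoop.version.

```xml
<!-- Sketch only: the profile id and version numbers are assumed, not taken from Accumulo. -->
<properties>
  <!-- Default build targets Hadoop 1 -->
  <hadoop.version>1.0.4</hadoop.version>
</properties>

<dependencies>
  <!-- Declared outside any profile, so the dependency itself never changes -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>${hadoop.version}</version>
    <scope>provided</scope>
  </dependency>
</dependencies>

<profiles>
  <!-- The profile is only a convenience for switching the version property -->
  <profile>
    <id>hadoop-2.0</id>
    <properties>
      <hadoop.version>2.0.4-alpha</hadoop.version>
    </properties>
  </profile>
</profiles>
```

With a layout like this, "mvn package -Phadoop-2.0" and
"mvn package -Dhadoop.version=2.0.4-alpha" would select the same Hadoop version.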


Sent from my phone, please pardon the typos and brevity.
On May 14, 2013 7:42 PM, "Benson Margulies" <bimargul...@gmail.com> wrote:

> On Tue, May 14, 2013 at 7:36 PM, Christopher <ctubb...@apache.org> wrote:
> > Benson-
> >
> > They produce different byte-code. That's why we're even considering
> > this. ACCUMULO-1402 is the ticket under which our intent is to add
> > classifiers, so that they can be distinguished.
>
> whoops, missed that.
>
> Then how do people succeed in just fixing up their dependencies and using
> it?
>
> In any case, speaking as a Maven-maven, classifiers are absolutely,
> positively, a cure worse than the disease. If you want the details
> just ask.
>
> >
> > All-
> >
> > To Keith's point, I think perhaps all this concern is a non-issue...
> > because as Keith points out, the dependencies in question are marked
> > as "provided", and dependency resolution doesn't occur for provided
> > dependencies anyway... so even if we leave off the profiles, we're in
> > the same boat. Maybe not the boat we should be in... but certainly not
> > a sinking one as I had first imagined. It's as afloat as it was
> > before, when they were not in a profile, but still marked as
> > "provided".
> >
> > --
> > Christopher L Tubbs II
> > http://gravatar.com/ctubbsii
> >
> >
> > On Tue, May 14, 2013 at 7:09 PM, Benson Margulies <bimargul...@gmail.com> wrote:
> >> It just doesn't make very much sense to me to have two different GAVs
> >> for the very same .class files, just to get different dependencies in
> >> the poms. However, if someone really wanted that, I'd look to make
> >> some scripting that created this downstream from the main build.
> >>
> >>
> >> On Tue, May 14, 2013 at 6:16 PM, John Vines <vi...@apache.org> wrote:
> >>> They're the same currently. I was requesting separate gavs for hadoop 2.
> >>> It's been on the mailing list and jira.
> >>>
> >>> Sent from my phone, please pardon the typos and brevity.
> >>> On May 14, 2013 6:14 PM, "Keith Turner" <ke...@deenlo.com> wrote:
> >>>
> >>>> On Tue, May 14, 2013 at 5:51 PM, Benson Margulies <bimargul...@gmail.com> wrote:
> >>>>
> >>>> > I am a Maven developer, and I'm offering this advice based on my
> >>>> > understanding of the reason why that generic advice is offered.
> >>>> >
> >>>> > If you have different profiles that _build different results_ but all
> >>>> > deliver the same GAV, you have chaos.
> >>>> >
> >>>>
> >>>> What GAV are we currently producing for hadoop 1 and hadoop 2?
> >>>>
> >>>>
> >>>> >
> >>>> > If you have different profiles that test against different versions
> >>>> > of dependencies, but all deliver the same byte code at the end of
> >>>> > the day, you don't have chaos.
> >>>> >
> >>>> >
> >>>> >
> >>>> > On Tue, May 14, 2013 at 5:48 PM, Christopher <ctubb...@apache.org> wrote:
> >>>> > > I think it's interesting that Option 4 seems to be most preferred...
> >>>> > > because it's the *only* option that is explicitly advised against by
> >>>> > > the Maven developers (from the information I've read). I can see its
> >>>> > > appeal, but I really don't think that we should introduce an explicit
> >>>> > > problem for users (that applies to users using even the Hadoop version
> >>>> > > we directly build against... not just those using Hadoop 2... I don't
> >>>> > > know if that point was clear), to only partially support a version of
> >>>> > > Hadoop that is still alpha and has never had a stable release.
> >>>> > >
> >>>> > > BTW, Option 4 was how I had achieved a solution for
> >>>> > > ACCUMULO-1402, but I am reluctant to apply that patch with this issue
> >>>> > > outstanding, as it may exacerbate the problem.
> >>>> > >
> >>>> > > Another implication for Option 4 (the current "solution") is for
> >>>> > > 1.6.0, with the planned accumulo-maven-plugin... because it means that
> >>>> > > the accumulo-maven-plugin will need to be configured like this:
> >>>> > > <plugin>
> >>>> > >   <groupId>org.apache.accumulo</groupId>
> >>>> > >   <artifactId>accumulo-maven-plugin</artifactId>
> >>>> > >   <dependencies>
> >>>> > >    ... all the required hadoop 1 dependencies to make the plugin work,
> >>>> > > even though this version only works against hadoop 1 anyway...
> >>>> > >   </dependencies>
> >>>> > >   ...
> >>>> > > </plugin>
> >>>> > >
> >>>> > > --
> >>>> > > Christopher L Tubbs II
> >>>> > > http://gravatar.com/ctubbsii
> >>>> > >
> >>>> > >
> >>>> > > On Tue, May 14, 2013 at 5:42 PM, Christopher <ctubb...@apache.org> wrote:
> >>>> > >> I think Option 2 is the best solution for "waiting until we have the
> >>>> > >> time to solve the problem correctly", as it ensures that transitive
> >>>> > >> dependencies work for the stable version of Hadoop, and using Hadoop2
> >>>> > >> is a very simple documentation issue for how to apply the patch and
> >>>> > >> rebuild. Option 4 doesn't wait... it explicitly introduces a problem
> >>>> > >> for users.
> >>>> > >>
> >>>> > >> Option 1 is how I'm tentatively thinking about fixing it properly in
> >>>> > >> 1.6.0.
> >>>> > >>
> >>>> > >>
> >>>> > >> --
> >>>> > >> Christopher L Tubbs II
> >>>> > >> http://gravatar.com/ctubbsii
> >>>> > >>
> >>>> > >>
> >>>> > >> On Tue, May 14, 2013 at 4:56 PM, John Vines <vi...@apache.org> wrote:
> >>>> > >>> I'm an advocate of option 4. You say that it's ignoring the problem,
> >>>> > >>> whereas I think it's waiting until we have the time to solve the
> >>>> > >>> problem correctly. Your reasoning for this is standardizing on Maven
> >>>> > >>> conventions, but the other options, while more 'correct' from a Maven
> >>>> > >>> standpoint, are a larger headache for our user base and ourselves. In
> >>>> > >>> either case, we're going to be breaking some sort of convention, and
> >>>> > >>> while it's not good, we should be doing the one that's less bad for US.
> >>>> > >>> The important thing here, now, is that the poms work and we should go
> >>>> > >>> with the method that leaves the work minimal for our end users to
> >>>> > >>> utilize them.
> >>>> > >>>
> >>>> > >>> I do agree that 1. is the correct option in the long run. More
> >>>> > >>> specifically, I think it boils down to having a single module
> >>>> > >>> compatibility layer, which is how hbase deals with this issue. But like
> >>>> > >>> you said, we don't have the time to engineer a proper solution. So let
> >>>> > >>> sleeping dogs lie and we can revamp the whole system for 1.5.1 or 1.6.0
> >>>> > >>> when we have the cycles to do it right.
> >>>> > >>>
> >>>> > >>>
> >>>> > >>> On Tue, May 14, 2013 at 4:40 PM, Christopher <ctubb...@apache.org> wrote:
> >>>> > >>>
> >>>> > >>>> So, I've run into a problem with ACCUMULO-1402 that requires a larger
> >>>> > >>>> discussion about how Accumulo 1.5.0 should support Hadoop2.
> >>>> > >>>>
> >>>> > >>>> The problem is basically that profiles should not contain
> >>>> > >>>> dependencies, because profiles don't get activated transitively. A
> >>>> > >>>> slide deck by the Maven developers points this out as a bad
> >>>> > >>>> practice... yet it's a practice we rely on for our current
> >>>> > >>>> implementation of Hadoop2 support
> >>>> > >>>> (http://www.slideshare.net/aheritier/geneva-jug-30th-march-2010-maven
> >>>> > >>>> slide 80).
> >>>> > >>>>
> >>>> > >>>> What this means is that even if we go through the work of publishing
> >>>> > >>>> binary artifacts compiled against Hadoop2, neither our Hadoop1
> >>>> > >>>> binaries nor our Hadoop2 binaries will be able to transitively resolve
> >>>> > >>>> any dependencies defined in profiles. This has significant
> >>>> > >>>> implications for user code that depends on Accumulo Maven artifacts.
> >>>> > >>>> Every user will essentially have to explicitly add Hadoop dependencies
> >>>> > >>>> for every Accumulo artifact that has dependencies on Hadoop, whether
> >>>> > >>>> we depend on Hadoop directly or transitively (they'll have to
> >>>> > >>>> peek into the profiles in our POMs and copy/paste the profile into
> >>>> > >>>> their project). This becomes more complicated when we consider how
> >>>> > >>>> users will try to use things like Instamo.
> >>>> > >>>>
> >>>> > >>>> There are workarounds, but none of them are really pleasant.
> >>>> > >>>>
> >>>> > >>>> 1. The best way to support both major Hadoop APIs is to have separate
> >>>> > >>>> modules with separate dependencies directly in the POM. This is a fair
> >>>> > >>>> amount of work, and in my opinion, would be too disruptive for 1.5.0.
> >>>> > >>>> This solution also gets us separate binaries for separate supported
> >>>> > >>>> versions, which is useful.
> >>>> > >>>>
> >>>> > >>>> 2. A second option, and the preferred one I think for 1.5.0, is to put
> >>>> > >>>> a Hadoop2 patch in the branch's contrib directory
> >>>> > >>>> (branches/1.5/contrib) that patches the POM files to support building
> >>>> > >>>> against Hadoop2. (Acknowledgement to Keith for suggesting this
> >>>> > >>>> solution.)
> >>>> > >>>>
> >>>> > >>>> 3. A third option is to fork Accumulo, and maintain two separate
> >>>> > >>>> builds (a more traditional technique). This adds a merging nightmare
> >>>> > >>>> for features/patches, but gets around some reflection hacks that we may
> >>>> > >>>> have been motivated to do in the past. I'm not a fan of this option,
> >>>> > >>>> particularly because I don't want to replicate the fork nightmare that
> >>>> > >>>> has been the history of early Hadoop itself.
> >>>> > >>>>
> >>>> > >>>> 4. The last option is to do nothing and to continue to build with the
> >>>> > >>>> separate profiles as we are, and make users discover and specify
> >>>> > >>>> transitive dependencies entirely on their own. I think this is the
> >>>> > >>>> worst option, as it essentially amounts to "ignore the problem".
> >>>> > >>>>
> >>>> > >>>> At the very least, it does not seem reasonable to complete
> >>>> > >>>> ACCUMULO-1402 for 1.5.0, given the complexity of this issue.
> >>>> > >>>>
> >>>> > >>>> Thoughts? Discussion? Vote on option?
> >>>> > >>>>
> >>>> > >>>> --
> >>>> > >>>> Christopher L Tubbs II
> >>>> > >>>> http://gravatar.com/ctubbsii
> >>>> > >>>>
> >>>> >
> >>>>
>
