Re: [DISCUSS] Dependency compatibility

Nick Dimiduk Wed, 11 Mar 2015 15:26:53 -0700

>
> > Furthermore, "hadoop jar" is how you're supposed to launch YARN apps. If
> > we say that doing things via the hbase command is acceptable, we're
> > opening ourselves up to an expansion of what the hbase command has to do.
> > (i.e. perhaps it should detect if the passed class is a YARN driver and
> > then use the hadoop jar command? Or should it always pass through to the
> > hadoop jar command?)
> >
> Traditionally, and in our documentation, HBase-owned MR classes (CopyTable,
> Import, etc.) are run with the hbase script, not the hadoop script. It is
> still a regression in that sense. Yes, there is a workaround, but why
> bother when we can fix this easily?



Can we side-step some of the issues here by fixing the hbase script for
launching jobs?

On Wed, Mar 11, 2015 at 3:19 PM, Enis Söztutar <[email protected]> wrote:

> On Wed, Mar 11, 2015 at 3:07 PM, Sean Busbey <[email protected]> wrote:
>
> > On Wed, Mar 11, 2015 at 4:49 PM, Enis Söztutar <[email protected]>
> > wrote:
> >
> > > >
> > > > It's worth noting that if users follow our ref guide (which says to
> > > > use "hadoop jar"), then jobs don't fail. It's only when they attempt
> > > > to launch jobs using "hbase com.example.MyDriver" that things fail.
> > > >
> > > > Additionally, if we stick to telling users that only the "hadoop jar"
> > > > version is supported, we can rely on the application classpath
> > > > support built into Hadoop 2.6+ so that jobs built on us get our
> > > > dependency versions rather than Hadoop's, even as Hadoop's change.
> > > >
> > >
> > > We have learned that users do not read or follow documentation. And it
> > > is a regression if launching a job using the hbase command does not
> > > work.
> > >
> > >
> > >
> > They do when things break. ;) An additional troubleshooting section that
> > shows the error and says "remember to use hadoop jar" would help catch
> > people who search for that error.
> >
> > Furthermore, "hadoop jar" is how you're supposed to launch YARN apps. If
> > we say that doing things via the hbase command is acceptable, we're
> > opening ourselves up to an expansion of what the hbase command has to do.
> > (i.e. perhaps it should detect if the passed class is a YARN driver and
> > then use the hadoop jar command? Or should it always pass through to the
> > hadoop jar command?)
> >
>
> Traditionally, and in our documentation, HBase-owned MR classes (CopyTable,
> Import, etc.) are run with the hbase script, not the hadoop script. It is
> still a regression in that sense. Yes, there is a workaround, but why
> bother when we can fix this easily?
>
>
> >
> >
> >
> > > >
> > > >
> > > >
> > > > > So, my proposal is:
> > > > >  - Commit HBASE-13149 to master and 1.1
> > > > >  - Either change the dependency compat story for minor versions to
> > > > >    false, or add a footnote saying that there may be exceptions for
> > > > >    the reasons listed above.
> > > > >
> > > >
> > > >
> > > > If we decide we need to do the Jackson version bump, what about the
> > > > possibility of moving the code in branch-1 to version 2.0.0 (and
> > > > making master 3.0.0)? We could start the release process once the
> > > > changes Andrew needs for Phoenix are in place and get it out the door.
> > > >
> > >
> > > I don't think this requires a major version bump. As I mentioned in the
> > > other thread, HBase is not upgraded very frequently in production.
> > > Again, we do not want to inconvenience users even further.
> > >
> > >
> > >
> > How would this inconvenience users further? Barring the change in version
> > numbers, it's the same upgrade they would be doing to move to what we're
> > currently calling HBase 1.1. Since version numbers under semver signal
> > what we understand about our changeset, it's just us acknowledging that
> > we broke some kind of compatibility. A release note that calls out the
> > Jackson dependency as the cause of that compatibility breakage makes the
> > evaluation easy.
> >
>
> The problem boils down to a "major versions are cheap" kind of argument,
> which has been discussed in the Hadoop context. I do not buy it, because a
> major version upgrade implies (though it does not have to) a big change. I
> don't see why we would ever want to bump our major version when the library
> in question only bumped its minor version. Jackson could have gone with 2.0
> for the changes between 1.8 and 1.9. Why would we want to promise more than
> our dependencies promise? It is not realistic.
>
>
>
> >
> > In the current state of the code, we'd just need to make some
> > documentation changes, and then the same upgrade paths as for 1.1 should
> > work just fine. Provided we don't take too long getting the release out,
> > I'd expect many users to just upgrade from 0.98 to (the proposed) 2.0.0.
> >
> > (I mentioned the changes Andrew needs only because it's my understanding
> > that those are the driving factor on branch-1 getting to release, not
> > because I expect them to be breaking.)
> >
> >
> > > >
> > > > It would do a nice job of desensitizing us to major version
> > > > increments, and we'd be able to document it as a very safe major
> > > > version upgrade since the only breakage is that dependency. We could
> > > > then limit the HBase 1.y line to just 1.0.z and add a FAQ item if
> > > > enough folks ask about why the sudden increment.
> > > >
> > >
> > > Doing a major version just to update one dependency version is too
> > > much, I think.
> > >
> > >
> > But that's the point of following semver and defining a compatibility
> > document. The sufficient criteria for a major version bump expressly
> > cover updating a single dependency in a non-breaking way.
> >
> > There will be plenty of major version numbers to go through. The thing
> > that trips projects up is feeling like major version releases need to be
> > special. If we want to do that, then we shouldn't use semver. We should
> > define our own versioning standard and make it "Marketing, Major, Minor"
> > instead of "Major, Minor, Patch." (I would prefer we not do this.)
> >
> >
> >
> > > >
> > > > I'm -1 on the idea of exceptions for our compatibility story. We
> > > > already note that just because we can break something doesn't mean
> > > > we will. That does a good job of pointing out that we recognize
> > > > there's a cost.
> > > >
> > >
> > > We do not have to corner ourselves with the rules we have set. I can
> > > see how requiring JDK-8 or Hadoop-3, etc. would justify major versions,
> > > but not a dependency library that users might be transitively depending
> > > on. If that is the case, the user is expected to deal with it.
> > >
> > >
> > If we want to treat those differently, then we need to update our
> > compatibility document to call out JVM and Hadoop support as a different
> > thing than the rest of our dependency promises. But we should not do
> > this. So long as we are forcing applications that integrate with us to
> > use particular versions of third-party libraries, we make it much harder
> > to upgrade when we don't provide stability.
> >
> > --
> > Sean
> >
>
