Re: [DISCUSSION] Merge Backup / Restore - Branch HBASE-7912

Sean Busbey Fri, 09 Sep 2016 12:57:33 -0700

Failing in a consistent way, with docs that explain the various
expected failures, would be sufficient.


On Fri, Sep 9, 2016 at 12:16 PM, Vladimir Rodionov
<vladrodio...@gmail.com> wrote:
> Do not worry, Sean: the doc is coming today as a preview, and our writer Frank
> will be working on putting it into the Apache repo. The timeline depends on
> Frank's schedule, but I hope we will get it sooner rather than later.
>
> As for failure testing, we are focusing only on a consistent state of
> backup system data in the presence of any type of failure. We are not going
> to implement anything more "fancy" than that. We allow both backup and
> restore to fail. What we do not allow is for system data to be corrupted.
> Will that suffice for you? Do you have any other concerns you want us to
> address?
>
> -Vlad
>
>
> On Fri, Sep 9, 2016 at 10:56 AM, Sean Busbey <bus...@apache.org> wrote:
>
>> "docs will come to Apache soon" does not address my concern around docs at
>> all, unless said docs have already made it into the project repo. I don't
>> want third party resources for using a major and important feature of the
>> project, I want us to provide end users with what they need to get the job
>> done.
>>
>> I see some calls for patience on the failure testing, but the appeal to us
>> having done a bad job of requiring proper tests of previous features just
>> makes me more concerned about not getting them here. I don't want to set
>> yet another bad example that will then be pointed to in the future.
>>
>> On Sep 8, 2016 10:50, "Ted Yu" <yuzhih...@gmail.com> wrote:
>>
>> > Is there any concern which is not addressed?
>> >
>> > Do we need another Vote thread?
>> >
>> > Thanks
>> >
>> > On Thu, Sep 8, 2016 at 9:21 AM, Andrew Purtell <apurt...@apache.org>
>> > wrote:
>> >
>> > > Vlad,
>> > >
>> > > I apologize for using the term 'half-baked' in a way that could seem a
>> > > description of HBASE-7912. I meant that as a general hypothetical.
>> > >
>> > > On Wed, Sep 7, 2016 at 9:36 AM, Vladimir Rodionov <vladrodio...@gmail.com>
>> > > wrote:
>> > >
>> > > > >> I'm not sure that "There is already lots of half-baked code in the
>> > > > >> branch, so what's the harm in adding more?"
>> > > >
>> > > > I meant not production-ready yet. This is the 2.0 development branch and,
>> > > > hence, many features are in the works, not well tested, etc. I do not
>> > > > consider backup a half-baked feature: it has passed our internal QA and
>> > > > has very good docs, which we will provide to Apache shortly.
>> > > >
>> > > > -Vlad
>> > > >
>> > > > On Wed, Sep 7, 2016 at 9:13 AM, Andrew Purtell <apurt...@apache.org>
>> > > > wrote:
>> > > >
>> > > > > We shouldn't admit half baked changes that won't be finished. However in
>> > > > > this case the crew working on this feature are long timers and less likely
>> > > > > than just about anyone to leave something in a half baked state. Of course
>> > > > > there is no guarantee how anything will turn out, but I am willing to take
>> > > > > a little on faith if they feel their best path forward now is to merge to
>> > > > > trunk. I only wish I had bandwidth to have done some real kicking of the
>> > > > > tires by now. Maybe this week.
>> > > > >
>> > > > > (Yes, I'm using some of that time for this email :-) but I type fast.)
>> > > > >
>> > > > > That said, I would like to agitate for making 2.0 more real and spend some
>> > > > > time on it now that I'm winding down with 0.98. I think that means
>> > > > > branching for 2.0 real soon now and even evicting things from 2.0 branch
>> > > > > that aren't finished or stable, leaving them only once again in the master
>> > > > > branch. Or, maybe just evicting them. Let's take it case by case.
>> > > > >
>> > > > > I think this feature can come in relatively safely. As added insurance,
>> > > > > let's admit the possibility it could be reverted on the 2.0 branch if folks
>> > > > > working on stabilizing 2.0 decide to evict it because it is unfinished or
>> > > > > unstable, because that certainly can happen. I would expect if talk like
>> > > > > that starts, we'd get help finishing or stabilizing what's under discussion
>> > > > > for revert. Or, we'd have a revert. Either way the outcome is acceptable.
>> > > > >
>> > > > >
>> > > > > On Wed, Sep 7, 2016 at 8:56 AM, Dima Spivak <dimaspi...@apache.org>
>> > > > > wrote:
>> > > > >
>> > > > > > I'm not sure that "There is already lots of half-baked code in the branch,
>> > > > > > so what's the harm in adding more?" is a good code commit philosophy for a
>> > > > > > fault-tolerant distributed data store. ;)
>> > > > > >
>> > > > > > More seriously, a lack of test coverage for existing features shouldn't be
>> > > > > > used as justification for introducing new features with the same
>> > > > > > shortcomings. Ultimately, it's the end user who will feel the pain, so
>> > > > > > shouldn't we do everything we can to mitigate that?
>> > > > > >
>> > > > > > -Dima
>> > > > > >
>> > > > > > On Wed, Sep 7, 2016 at 8:46 AM, Vladimir Rodionov <vladrodio...@gmail.com>
>> > > > > > wrote:
>> > > > > >
>> > > > > > > Sean,
>> > > > > > >
>> > > > > > > * have docs
>> > > > > > >
>> > > > > > > Agreed. We have a doc, and backup is the most documented feature :).
>> > > > > > > We will release it to Apache shortly.
>> > > > > > >
>> > > > > > > * have sunny-day correctness tests
>> > > > > > >
>> > > > > > > The feature has close to 60 test cases, which run for approximately
>> > > > > > > 30 minutes. We can add more, if the community does not mind :)
>> > > > > > >
>> > > > > > > * have correctness-in-face-of-failure tests
>> > > > > > >
>> > > > > > > Are there any examples of these tests in existing features? This is in
>> > > > > > > the works; we have a clear understanding of what should be done by the
>> > > > > > > time of the 2.0 release. That is a very near-term goal for us: to verify
>> > > > > > > IT monkey for the existing code.
>> > > > > > >
>> > > > > > > * don't rely on things outside of HBase for normal operation (okay for
>> > > > > > > advanced operation)
>> > > > > > >
>> > > > > > > We do not.
>> > > > > > >
>> > > > > > > An enormous amount of time has already been spent on developing and
>> > > > > > > testing the feature; it has passed our internal tests and many rounds of
>> > > > > > > code review by HBase committers. We do not mind if someone from the HBase
>> > > > > > > community (outside of HW) reviews the code, but waiting for a volunteer
>> > > > > > > will probably take forever; the feature is quite large (1MB+ cumulative
>> > > > > > > patch).
>> > > > > > > The 2.0 branch is full of half-baked features, most of them in active
>> > > > > > > development, so I am not following you here, Sean. Why is HBASE-7912
>> > > > > > > not yet good enough to be integrated into the 2.0 branch?
>> > > > > > >
>> > > > > > > -Vlad
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > On Wed, Sep 7, 2016 at 8:23 AM, Sean Busbey <bus...@apache.org>
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > On Tue, Sep 6, 2016 at 10:36 PM, Josh Elser <josh.el...@gmail.com>
>> > > > > > > > wrote:
>> > > > > > > > > So, the answer to Sean's original question is "as robust as snapshots
>> > > > > > > > > presently are"? (independence of backup/restore failure tolerance from
>> > > > > > > > > snapshot failure tolerance)
>> > > > > > > > >
>> > > > > > > > > Is this just a question WRT context of the change, or is it grounds for
>> > > > > > > > > a veto from you, Sean? Just trying to make sure I'm following along
>> > > > > > > > > adequately.
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > > > I'd say ATM I'm -0, bordering on -1 but not for reasons I can articulate
>> > > > > > > > well.
>> > > > > > > >
>> > > > > > > > Here's an attempt.
>> > > > > > > >
>> > > > > > > > We've been trying to move, as a community, towards minimizing risk to
>> > > > > > > > downstream folks by getting "complete enough for use" gates in place
>> > > > > > > > before we introduce new features. This was spurred by some features
>> > > > > > > > getting in half-baked and never making it to "can really use" status
>> > > > > > > > (I'm thinking of distributed log replay and the zk-less assignment
>> > > > > > > > stuff, I don't recall if there was more).
>> > > > > > > >
>> > > > > > > > The gates, generally, included things like:
>> > > > > > > >
>> > > > > > > > * have docs
>> > > > > > > > * have sunny-day correctness tests
>> > > > > > > > * have correctness-in-face-of-failure tests
>> > > > > > > > * don't rely on things outside of HBase for normal operation (okay for
>> > > > > > > > advanced operation)
>> > > > > > > >
>> > > > > > > > As an example, we kept the MOB work off in a branch and out of master
>> > > > > > > > until it could pass these criteria. The big exemption we've had to
>> > > > > > > > this was the hbase-spark integration, where we all agreed it could
>> > > > > > > > land in master because it was very well isolated (the slide away from
>> > > > > > > > including docs as a first-class part of building up that integration
>> > > > > > > > has led me to doubt the wisdom of this decision).
>> > > > > > > >
>> > > > > > > > We've also been treating inclusion in "probably will be released to
>> > > > > > > > downstream" branches as a higher bar, requiring
>> > > > > > > >
>> > > > > > > > * don't moderately impact performance when the feature isn't in use
>> > > > > > > > * don't severely impact performance when the feature is in use
>> > > > > > > > * either default-to-on or show enough demand to believe a non-trivial
>> > > > > > > > number of folks will turn the feature on
>> > > > > > > >
>> > > > > > > > The above has kept MOB and hbase-spark integration out of branch-1,
>> > > > > > > > presumably while they've "gotten more stable" in master from the odd
>> > > > > > > > vendor inclusion.
>> > > > > > > >
>> > > > > > > > Are we going to have a 2.0 release before the end of the year? We're
>> > > > > > > > coming up on 1.5 years since the release of version 1.0; seems like
>> > > > > > > > it's about time, though I haven't seen any concrete plans this year.
>> > > > > > > > Presuming we are going to have one by the end of the year, it seems a
>> > > > > > > > bit close to still be adding in "features that need maturing" on the
>> > > > > > > > branch.
>> > > > > > > >
>> > > > > > > > The lack of a concrete plan for 2.0 keeps me from considering these
>> > > > > > > > things blockers at the moment. But I know first hand how much trouble
>> > > > > > > > folks have had with other features that have gone into downstream
>> > > > > > > > facing releases without robustness checks (i.e. replication), and I'm
>> > > > > > > > concerned about what we're setting up if 2.0 goes out with this
>> > > > > > > > feature in its current state.
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > > Best regards,
>> > > > >
>> > > > >    - Andy
>> > > > >
>> > > > > Problems worthy of attack prove their worth by hitting back. - Piet Hein
>> > > > > (via Tom White)
>> > > > >
>> > > >
>> > >
>> > >
>> > >
>> > > --
>> > > Best regards,
>> > >
>> > >    - Andy
>> > >
>> > > Problems worthy of attack prove their worth by hitting back. - Piet Hein
>> > > (via Tom White)
>> > >
>> >
>>
