Spent the day playing around with this on 6 nodes.

I've found some rough edges: some known (the docs blocker Vlad pointed out) and some that I think are unknown (tooling is rough -- partially from wrong help messages and, I think, changes in design like Master-submitted MR jobs).

But, Stack's assessment (and Andrew's reminder) that further tweaking would just throw us back into another review cycle is a real concern. It's unfortunate that this feature has lingered so long aside of master to get to this point, but I don't see any realistic resolution to this problem than a merge. In the future, this is something we'll have to try harder to avoid letting happen (looking in the mirror with quota work...)

Thankfully, I can help out with development/review on the outstanding blockers (notably, the two I pruned from Vlad's original of five -- the other three still seem to be improvements). In addition to these blockers, I believe the documentation *must* be updated before a release to note that this feature is still growing -- it does not feel like a quality feature that I've come to expect from this community. This isn't a knock on Vlad and company; this is a hard problem and one that I could not have done better given time constraints, but it is also one that users will demand simplicity and the utmost correctness around. To this end, I will also try to help out to smooth out these issues in the following 2.x release.

So, this leaves me to say: +1 to merge with the caveat that the docs are updated to make sure that any known, user-pitfalls are clearly documented. This vote also comes with as real of a promise that I can make to help avoid any issues with this feature that would prevent a 2.0 release.

Thanks to all the giants whose shoulders I'm standing on to be able to make this vote.

Andrew Purtell wrote:
Great, and I changed my vote to -0 because Stack made a good argument that
making more changes would invalidate review up to this point, and I trust
this will be resolved before release.

On Tue, Mar 14, 2017 at 4:29 PM, Josh Elser<[email protected]>  wrote:

Sorry Andrew, let me clarify as that didn't come out right.

I didn't mean that isn't a conversation worth having _now_, just that I
was intentionally avoiding it in my previous email because I didn't
understand the scope of those issues that Vlad had identified. I wanted to
better understand what they were really meant for a user before coming to
my own decision about whether or not I think they are blockers.

That aside, I would agree with you that HBASE-15227 sounds like a real
blocker. Forcing a new full backup for every table sounds really bad --
that's not the kind of experience we'd want a user to have.

Andrew Purtell wrote:

I'm going to intentional avoid addressing the discussion of shipping
partial features (related, but not relevant at the moment).

Then we are not having the same conversation, because it is precisely
because this is a vote for this feature to go into 2.0, which is already
overdue, so should be released yesterday, that I took mention of "blocker"
at face value. At least one of them seems to certainly approach this
definition. It will be not user friendly, to say the least, to use this in
a large scale deployment with HBASE-15227 unfinished. HBASE-15227
currently
has a severity of BLOCKER. Despite what is going on in our politics, words
matter and we do not get to redefine them for convenience.

Once this work is merged in, how is HBASE-15227 not a blocker for 2.0?
Because Vlad offered to reduce its severity to make me feel better?
Currently the description on the issue is "System must be tolerant to
faults. Backup operations MUST be atomic (no partial completion state in
system table)" Sounds like a blocker to me, indeed. It is an honest
assessment and I don't think anyone is doing the community a favor by
trying to walk that back.


On Tue, Mar 14, 2017 at 1:57 PM, Josh Elser<[email protected]>   wrote:

I took a moment to read through the "blockers" as originally identified by
Vlad, and (to echo Enis' take) I read the majority of them as being
blockers not for the next release, but for a "full-fledged feature". I'm
going to intentional avoid addressing the discussion of shipping partial
features (related, but not relevant at the moment).

HBASE-15227 is actually the one that bothers me the most, with
HBASE-17133
coming in close behind. Vlad, is there any documentation you can point me
to about what the current issues are with the current implementation? For
example, what happens now if the system has some kind of "partial
completion state"?

Documentation is one of those that is really hard to judge. We want to
get
this code out for people to use (and to free up our strained dev
resources), but what good is some feature if the docs are
missing/incomplete?

I think I could stomach the docs being inaccurate (with a clear
disclaimer
that the chapter is incomplete -- that's a 5min task). But, I think I
need
an answer about how the feature handles our common dist-sys category of
problems before I can consider whether I'm ok with the feature hitting
2.0...

I'll also try to throw up a few nodes and play with it to address the
problem as an (ignorant) user ;)


Andrew Purtell wrote:

I don't like that issues were identified as "blockers" but now there is
an
attempt to walk that back.

I don't like that development of this feature has lingered for a long
time
in this unfinished state when this work could have been done by now, now
that we are trying to get a 2.0 out the door. Because this is a
volunteer
project I cannot make any demand that it should be done, but I can
certainly look at the current state and be nonplussed. This will be yet
another half finished thing in 2.0 when this merge happens. Promises to
finish the unfinished work are nice but not currency. Commits are
currency.
I hope at least the fault tolerance changes can be completed and
committed
before we spin a 2.0 RC, and without causing a 2.0 release to slip
further.

Also, marking something experimental should be done on the merits of
that
evaluation, not simply to justify dropping unfinished work into a
release
branch.

I will change my vote to -0.


On Mon, Mar 13, 2017 at 4:05 PM, Enis Söztutar<[email protected]>
  wrote:

I think there is some misconception of using the term "blockers" for

referring to those jiras. My understanding is that those three jiras
are
blockers for the backup functionality to be more mature and more
usable.
But they are not release blockers. Let's say we merged the code in, and
for
some reason those did not get addressed in time. We can still do the
2.0
release without having to wait for the commits. We can instead mark the
"backup" feature as experimental with known issues and go on with the
release. In that sense they are not real release blockers.

We are proposing the merge at this time because of the above that
maintaining this in a branch is becoming extremely costly and not
productive for the HBase community. Realistically, we cannot have the
luxury of having to wait another couple of months and doing yet another
giant round of reviews because the code base is a moving target.

Enis

On Mon, Mar 13, 2017 at 3:46 PM, Devaraj Das<[email protected]>
wrote:

Vlad, on the first point, I think what Stack is saying is that creating

the new branch (as Ted did) ignores the feedback incorporated thus far
in
the iterations of the mega-patch. That's a wrong way to go.
On the separation into a backup module, again, that was reverted to
ease
reviews of the mega-patch, and was noted as work to be done later. I

think
Stack just wants to make the list of remaining work more complete by
citing
that as pending work.
________________________________________
From: Vladimir Rodionov<[email protected]>
Sent: Monday, March 13, 2017 3:09 PM
To: [email protected]
Subject: Re: [VOTE] Backup/Restore feature for HBase 2.0, vote closing
3/11/2017

    It ignores the feedback

If I "ignore" feedback, I put my comment - why? I am always  open for

further discussions. If reviewer does not insist on a particular
request

-
it will be dropped. I think it is fair.
he list is incomplete because a bunch of

follow-ons came of the review cycle (including moving backup/restore
out
of

core to live in its own module).
For those who were not following our lengthy conversation on a review

board, separation of a backup code into a separate module has been
done last year, but has been reverted back by request of a reviewer.


-Vladimir

On Mon, Mar 13, 2017 at 2:23 PM, Stack<[email protected]>    wrote:

On Fri, Mar 10, 2017 at 9:09 PM, Stack<[email protected]>    wrote:

On Fri, Mar 10, 2017 at 6:01 PM, Ted Yu<[email protected]>
  wrote:

HBASE-14123 branch has been created, with Vlad's mega patch v61.

The patch put up for VOTE here was done on a branch. The call to

merge
seems to have been premature given the many cycles of review and test

that

happened subsequent (The cycles burned a bunch of dev resource).
The patch as is is now in a state where it is too big for our infra;

rb
and JIRA are creaking under the size and # of iterations.

Adding finish of new JIRAs to this merge implies a new round of
review
and

test of an already massive patch. Who is going to do this work?
Going back to a new branch seems wrong route to take.

St.Ack



To be more explicit, this patch was developed on a branch and then a

bunch
of dev resources were burned getting it into a state where it could be
merged to master. Going back to a branch to bulk up the merge so it
includes more JIRAs than the many it already incorporates is the
wrong
direction for us to be headed in. It ignores the feedback given and
the
work done by Vladimir slimming down an already over-broad scope. It
is

also
predicated on abundant review and testing resource being on tap to
cycle
on

a feature that is useful, but non-core.
The patch is ready for merge IMO. Geoffrey makes a nice list of what
is
still to do though IIRC, the list is incomplete because a bunch of
follow-ons came of the review cycle (including moving backup/restore

out
of

core to live in its own module).
The patch needs three votes to merge. I am not against merge but I am

not
voting for the patch because I do have any more time to spend on this

non-core feature and feel that a vote will have me assume a

responsibility
I will not shirk.
S




FYI
On Fri, Mar 10, 2017 at 3:30 PM, Ted Yu<[email protected]>

wrote:
Thanks for the feedback, Andrew.
How about the following plan:
create branch HBASE-14123 off of master with mega patch v61 as the

first
commit (reviewed by Stack and Enis)

Vlad and I continue development (the 3 blockers) based on
HBASE-14123
    branch
when all of the blockers get +1 and merged into HBASE-14123
branch,
we
propose to community for merging into branch-2 (master branch, if
branch-2
doesn't materialize for whatever reason) again
Cheers


On Fri, Mar 10, 2017 at 3:01 PM, Andrew Purtell<

[email protected]>
wrote:
Thanks for the offer but I like that you were honest about
compiling

a
list
of issues that you thought were blockers for release. Since this
proposal
is a merge into 2.0, and we are trying to release 2.0, I am -1 on
this

merge until those blockers are addressed.
I had a look at the list.
I think the documentation issue is important but not actually a

blocker.
That may be a controversial opinion, but documentation can be
back-filled
worst case. So take HBASE-17133 off the list.

Remaining are effectively HBASE-14417, HBASE-14141, and

HBASE-15227.
They
all have patches attached to the respective JIRAs so completing
this

work
won't be onerous. Get these committed and I will lift my -1. The
others

who
voted +1 on this thread surely can help with that.
Thanks.


On Fri, Mar 10, 2017 at 2:32 PM, Vladimir Rodionov<
[email protected]>
wrote:

No problem I will downgrade Blockers to Majors if it scares
you,

Andrew
[image: [image: 🙂]]
Sent from my iPhone
On Mar 10, 2017, at 1:52 PM, Andrew Purtell<
[email protected]

wrote:
​I know the merge of this feature has lagged substantially. I
think

that
is
regrettable but on another thread we are lamenting that 2.0
is
already
late. Unless I misunderstand, this is a proposal to merge
something
with
known blockers into trunk before we branch it for 2.0 which
will

effectively prevent that release because these blockers will
be
there. I
am
inclined to veto. Probably we should not propose branch
merges
into
code
we
are trying to get out the door with known blockers. Why not
do
that
work
first? It seems an obvious question. Perhaps I am missing
something.

If we can branch for 2.0 now and then merge this, and not
into
the
2.0
branch, I would vote +1 for branch merge even with known
blockers
pending.
On Fri, Mar 10, 2017 at 1:42 PM, Vladimir Rodionov<

[email protected]>
wrote:
They are not blockers for merge - only for 2.0. GA

As I said already the feature is usable right now
We would like to continue working on master and we would

like
to
see
a
commitment from community
Sent from my iPhone
On Mar 10, 2017, at 11:16 AM, Andrew Purtell<

[email protected]
wrote:
Only BLOCKERs and CRITICALs are guaranteed for HBase 2.0
release.
If we have identified blockers, why merge this before they
are
in?
Otherwise we can't release 2.0, and it is overdue.
On Wed, Mar 8, 2017 at 1:32 PM, Vladimir Rodionov<

[email protected]>
wrote:
Hello, HBase folks

For your consideration today is Backup/Restore feature for

Apache
HBAse
2.0.
Backup code is available as a mega patch in HBASE-14123
(v61),
applies
cleanly to the current master, all test PASS, patch has no
other
issues.
The patch has gone through numerous rounds of code reviews
and
has
probably
the most lengthy discussion thread on Apache JIRA
(HBASE-14123)

:)
The work has been split into 3 phases (HBASE-14030, 14123,
14414)
Two
first
are complete, third one is still in progress.
*** Summary of work HBASE-14123

The new feature introduces new command-line extensions to

the
hbase
command
and, from the client side, is accessible through
command-line

only
Operations:
* Create full backup on a list of tables or backup set
* Create incremental backup image for table list or backup

set
* Restore list of tables from a given backup image
* Show current backup progress
* Delete backup image and all related images
* Show history of backups
* Backup set operations: create backup set, add/remove

table
to/from
backup
set, etc
In the current implementation, the feature is already

usable,
meaning
that
users can backup tables and restore them using provided
command-line

tools.
Both: full and incremental backups are supported.
This work is based on original work of IBM team

(HBASE-7912).
The
full
list
of JIRAs included in this mega patch can be found in three
umbrella

JIRAs:
HBASE-14030 (Phase 1), HBASE-14123 (Phase 2) and
HBASE-14414

(Phase 3
-
all
resolved ones made it into the patch)
*** What are the remaining work items

All remaining items can be found in Phase 3 umbrella JIRA:

HBASE-14414.
They are split into 3 groups: BLOCKER, CRITICAL, MAJOR
Only BLOCKERs and CRITICALs are guaranteed for HBase 2.0
release.
***** BLOCKER
* HBASE-14417 Incremental backup and bulk loading ( Patch
available)
* HBASE-14135 HBase Backup/Restore Phase 3: Merge backup
images
* HBASE-14141 HBase Backup/Restore Phase 3: Filter WALs on
backup
to
include only edits from backup tables (Patch available)
* HBASE-17133 Backup documentation
* HBASE-15227 Fault tolerance support

***** CRITICAL

* HBASE-16465 Disable split/merges during backup

We have umbrella JIRA (HBASE-14414) to track all the

remaining
work
All the BLOCKER and CRITICAL JIRAs currently in open state
will
be
implemented by 2.0 release time. Some MAJOR too, but it
depends
on
resource
availability
The former development branch (HBASE-7912) is obsolete and

will
be
closed/deleted after the merge.
We want backup to be a GA feature in 2.0
We are going to support full backward compatibility for

backup
tool in
2.0
and onwards.
**** Configuration

Backup is disabled, by default. To enable it, the

following
configuration
properties must be added to hbase-site.xml:
hbase.backup.enable=true
hbase.master.logcleaner.plugins=YOUR_PLUGINS,org.
apache.hadoop.hbase.backup.master.BackupLogCleaner
hbase.procedure.master.classes=YOUR_CLASSES,org.
apache.hadoop.hbase.backup.master.

LogRollMasterProcedureManager
hbase.procedure.regionserver.classes=YOUR_CLASSES,org.
apache.hadoop.hbase.backup.regionserver.
LogRollRegionServerProcedureMa
nager
I would like to thank IBM team and Jerry He for original

work,
Enis, Ted, Stack, Matteo, Jerry for time spent on code
reviews
Special thanks to Ted Yu for his co-development work.
References:
https://issues.apache.org/jira/browse/HBASE-7912

(original
IBM,
contains
design doc)
https://issues.apache.org/jira/browse/HBASE-14030 (Phase

1)
https://issues.apache.org/jira/browse/HBASE-14123 (Phase
2)
https://issues.apache.org/jira/browse/HBASE-14414 (Phase
3)
Please  vote +1/-1 by midnight Pacific Time (00:00
-0800 GMT) on March 11th  ​on whether or not we should
merge
this
into
the
current master.
-Vladimir Rodionov


--
Best regards,

    - Andy

If you are given a choice, you believe you have acted

freely. -
Raymond
Teller (via Peter Watts)
--
Best regards,

     - Andy

If you are given a choice, you believe you have acted

freely. -
Raymond
Teller (via Peter Watts)
--
Best regards,

      - Andy

If you are given a choice, you believe you have acted freely. -

Raymond
Teller (via Peter Watts)




Reply via email to