+1

On Fri, Mar 9, 2018 at 10:40 AM, Stack <st...@duboce.net> wrote:

> The intent is that this would come out in 2.1.0? This would be soon after a
> 2.0.0?
>
> S
>
> On Thu, Mar 8, 2018 at 11:41 PM, 张铎(Duo Zhang) <palomino...@gmail.com>
> wrote:
>
> > Since branch-2.0 has been cut and branch-2 is now 2.1.0-SNAPSHOT, I will
> > merge branch HBASE-19397-branch-2 back to branch-2.
> >
> > 2018-01-10 9:20 GMT+08:00 张铎(Duo Zhang) <palomino...@gmail.com>:
> >
> > > If branch-2.0 will be out soon then let's target this to 2.1. No
> > > problem.
> > >
> > > Thanks.
> > >
> > > 2018-01-10 1:28 GMT+08:00 Stack <st...@duboce.net>:
> > >
> > >> On Mon, Jan 8, 2018 at 8:19 PM, 张铎(Duo Zhang) <palomino...@gmail.com>
> > >> wrote:
> > >>
> > >> > OK, let me merge it to master first. And then create a
> > >> > HBASE-19397-branch-2 which will keep rebasing onto the newest
> > >> > branch-2 to see if it is stable enough. Since we can define this as
> > >> > a bug fix/refactoring rather than a big new feature, it is OK to
> > >> > integrate it at any time. If we think it is stable enough before
> > >> > cutting branch-2.0 then we can include it in the 2.0.0 release, else
> > >> > let's include it in 2.1 (maybe we can backport it to 2.0 later?).
> > >> >
> > >> >
> > >>
> > >> I need to cut the Appy-suggested branch-2.0. Shout if
> > >> HBASE-19397-branch-2 gets to be too much work and I'll do it sooner
> > >> rather than later. Or, if easier on you, just say so and I'll make
> > >> branch-2.0 now so you can just commit to branch-2 (branch-2.0 will
> > >> become hbase2.0, branch-2 will become hbase2.1...).
> > >>
> > >> St.Ack
> > >>
> > >>
> > >>
> > >>
> > >> > Thanks all here.
> > >> >
> > >> > 2018-01-09 12:06 GMT+08:00 ashish singhi <ashish.sin...@huawei.com
> >:
> > >> >
> > >> > > +1 to merge on master and 2.1.
> > >> > > Great work.
> > >> > >
> > >> > > Thanks,
> > >> > > Ashish
> > >> > >
> > >> > > -----Original Message-----
> > >> > > From: 张铎(Duo Zhang) [mailto:palomino...@gmail.com]
> > >> > > Sent: Tuesday, January 09, 2018 6:53 AM
> > >> > > To: dev@hbase.apache.org
> > >> > > Subject: Re: [VOTE] Merge branch HBASE-19397 back to master and
> > >> > > branch-2.
> > >> > >
> > >> > > Anyway, if there are no objections to merging this into master,
> > >> > > let's do it? So that we can start working on the follow-on
> > >> > > features, such as table based replication storage, synchronous
> > >> > > replication, etc.
> > >> > >
> > >> > > Thanks.
> > >> > >
> > >> > > 2018-01-09 7:19 GMT+08:00 Stack <st...@duboce.net>:
> > >> > >
> > >> > > > On Mon, Jan 8, 2018 at 2:50 PM, 张铎(Duo Zhang)
> > >> > > > <palomino...@gmail.com> wrote:
> > >> > > >
> > >> > > > > This 'new' feature only changes the DDL part, not the core
> > >> > > > > part of replication, i.e., how to read wal entries and how to
> > >> > > > > replicate them to the remote cluster, etc. And also there is
> > >> > > > > no pb message/storage layout change, so you can think of this
> > >> > > > > as a big refactoring. Theoretically we do not even need to add
> > >> > > > > new UTs for this feature, i.e., no extra stability work. The
> > >> > > > > only visible change to users is that it may require more time
> > >> > > > > to modify peers in shell. So in general I think it does less
> > >> > > > > harm to include it in the coming release?
> > >> > > > >
> > >> > > > > And the reason I think it SHOULD be included in our 2.0
> > >> > > > > release is that the synchronous guarantee is really a good
> > >> > > > > thing for our replication related UTs. The correctness of the
> > >> > > > > current Test***Replication usually depends on a flaky
> > >> > > > > condition - that we will not do a log rolling between the
> > >> > > > > modification on zk and the actual loading of the modification
> > >> > > > > on the RS. And we have also done hard work to clean up the
> > >> > > > > locking in ReplicationSourceManager and add a fat comment to
> > >> > > > > say why it should be synchronized in this way. In general, the
> > >> > > > > new code is much easier to read, test and debug, and it also
> > >> > > > > reduces the possibility of flakiness, which could help us a
> > >> > > > > lot when we start to stabilize our build.
> > >> > > > >
> > >> > > > > Thanks.
> > >> > > > >
> > >> > > > >
> > >> > > > You see it as a big bug fix, Duo?
> > >> > > >
> > >> > > Kind of. Just like AM v1, we can do lots of fixes to make it more
> > >> > > stable, but we know that we cannot fix all the problems since we
> > >> > > store state in several places and it is a 'mission impossible' to
> > >> > > make all the states stay in sync under every situation... So we
> > >> > > introduced AM v2.
> > >> > > For the replication peer tracking, it is the same problem. It is
> > >> > > hard to do fencing with zk watchers since they are asynchronous,
> > >> > > so the UTs are always kind of flaky in theory. And we depend on
> > >> > > replication heavily at Xiaomi, so it is always a pain for us.
> > >> > >
> > >> > > >
> > >> > > > I'm late to review. Will have a look after beta-1 goes out. This
> > >> > > > stuff looks great from the outside, especially the distributed
> > >> > > > procedure framework, which we need all over the place.
> > >> > > >
> > >> > > > In general I have no problem w/ this in master and an hbase 2.1
> > >> > > > (2.1 could be soon after 2.0). It's late for big stuff in
> > >> > > > 2.0.... but let me take a looksee sir.
> > >> > > >
> > >> > > Thanks sir. All the concerns here about whether we should merge
> > >> > > this into 2.0 are reasonable, I know. Although I really want this
> > >> > > in 2.0 because I believe it will help a lot with stabilizing, I'm
> > >> > > OK with merging it to 2.1 only, if you guys all think so.
> > >> > >
> > >> > > >
> > >> > > > St.Ack
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > > 2018-01-09 4:53 GMT+08:00 Apekshit Sharma <a...@cloudera.com>:
> > >> > > > >
> > >> > > > > > Same questions as Josh's.
> > >> > > > > > 1) We have RCs for beta1 now, which means the only commits
> > >> > > > > > that can go in are bug fixes. This change - although
> > >> > > > > > important, needed for a long time, and well done (testing,
> > >> > > > > > summary, etc.) - seems very large to get into 2.0 now. Needs
> > >> > > > > > good justification why it has to be 2.0 instead of 2.1.
> > >> > > > > >
> > >> > > > > > -- Appy
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > On Mon, Jan 8, 2018 at 8:34 AM, Josh Elser
> > >> > > > > > <els...@apache.org> wrote:
> > >> > > > > >
> > >> > > > > > > -0. From a general project planning point-of-view (not
> > >> > > > > > > based on the technical merit of the code) I am
> > >> > > > > > > uncomfortable about pulling in a brand new feature after
> > >> > > > > > > we've already made one beta RC.
> > >> > > > > > >
> > >> > > > > > > Duo -- can you expand on why this feature is so important
> > >> > > > > > > that we should break our release plan? Are there problems
> > >> > > > > > > that would make including this in a 2.1/3.0 unnecessarily
> > >> > > > > > > difficult? Any kind of color you can provide on "why does
> > >> > > > > > > this need to go into 2.0?" would be helpful.
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > On 1/6/18 1:54 AM, Duo Zhang wrote:
> > >> > > > > > >
> > >> > > > > > >> https://issues.apache.org/jira/browse/HBASE-19397
> > >> > > > > > >>
> > >> > > > > > >> We aim to move the peer modification framework from zk
> > >> > > > > > >> watcher to procedure v2 in this issue, and the work is
> > >> > > > > > >> done now.
> > >> > > > > > >>
> > >> > > > > > >> Copy the release note here:
> > >> > > > > > >>
> > >> > > > > > >> Introduce 5 procedures to do peer modifications:
> > >> > > > > > >>
> > >> > > > > > >>> AddPeerProcedure
> > >> > > > > > >>> RemovePeerProcedure
> > >> > > > > > >>> UpdatePeerConfigProcedure
> > >> > > > > > >>> EnablePeerProcedure
> > >> > > > > > >>> DisablePeerProcedure
> > >> > > > > > >>>
> > >> > > > > > >>> The procedures are all executed in the following stages:
> > >> > > > > > >>> 1. Call the pre CP hook; if an exception is thrown then
> > >> > > > > > >>> give up.
> > >> > > > > > >>> 2. Check whether the operation is valid; if not then
> > >> > > > > > >>> give up.
> > >> > > > > > >>> 3. Update peer storage. Notice that if we have entered
> > >> > > > > > >>> this stage, then we cannot roll back any more.
> > >> > > > > > >>> 4. Schedule sub procedures to refresh the peer config on
> > >> > > > > > >>> every RS.
> > >> > > > > > >>> 5. Do post cleanup if any.
> > >> > > > > > >>> 6. Call the post CP hook. Any exception thrown will be
> > >> > > > > > >>> ignored since we have already done the work.
> > >> > > > > > >>>
> > >> > > > > > >>> The procedure will hold an exclusive lock on the peer
> > >> > > > > > >>> id, so now there are no concurrent modifications on a
> > >> > > > > > >>> single peer.
> > >> > > > > > >>>
> > >> > > > > > >>> And now it is guaranteed that once the procedure is
> > >> > > > > > >>> done, the peer modification has already taken effect on
> > >> > > > > > >>> all RSes.
> > >> > > > > > >>>
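To make the staged execution concrete, here is a minimal Java sketch of the six stages and the exclusive-lock guarantee above. Every class, interface, and method name below is invented for illustration; these are not the actual HBASE-19397 classes.

    import java.io.IOException;
    import java.util.List;

    // Hypothetical sketch only: all names are invented for illustration and
    // are not the real HBASE-19397 classes.
    public class PeerModificationSketch {

      interface MasterEnv {
        void prePeerModification(String peerId) throws IOException; // pre CP hook
        void postPeerModification(String peerId);                   // post CP hook
        boolean peerExists(String peerId);
        void updatePeerStorage(String peerId, String conf) throws IOException;
        List<String> onlineRegionServers();
        void scheduleRefreshPeer(String regionServer, String peerId);
        void lockPeer(String peerId);   // exclusive lock on the peer id
        void unlockPeer(String peerId);
      }

      public void execute(MasterEnv env, String peerId, String conf)
          throws IOException {
        env.lockPeer(peerId); // no concurrent modifications on a single peer
        try {
          // Stage 1: pre CP hook; an exception here gives up the operation.
          env.prePeerModification(peerId);
          // Stage 2: validity check; still safe to give up here.
          if (!env.peerExists(peerId)) {
            throw new IOException("Unknown peer " + peerId);
          }
          // Stage 3: update peer storage; no rollback past this point.
          env.updatePeerStorage(peerId, conf);
          // Stage 4: one sub procedure per RS to refresh the peer config, so
          // the change has taken effect everywhere once the procedure is done.
          for (String rs : env.onlineRegionServers()) {
            env.scheduleRefreshPeer(rs, peerId);
          }
          // Stage 5: post cleanup, if any (nothing to do in this sketch).
          // Stage 6: post CP hook; exceptions are ignored, the work is done.
          try {
            env.postPeerModification(peerId);
          } catch (RuntimeException ignored) {
            // already committed; a failing post hook does not undo the change
          }
        } finally {
          env.unlockPeer(peerId);
        }
      }
    }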
> > >> > > > > > >>> Abstracted a storage layer for replication peer/queue
> > >> > > > > > >>> management, and refactored the upper layer to remove zk
> > >> > > > > > >>> related naming/code/comments.
> > >> > > > > > >>>
> > >> > > > > > >>> Add pre/postExecuteProcedures CP hooks to
> > >> > > > > > >>> RegionServerObserver, and add a permission check for the
> > >> > > > > > >>> executeProcedures method which requires the caller to be
> > >> > > > > > >>> the system user or a superuser.
> > >> > > > > > >>>
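As a rough illustration of that permission check, here is a minimal sketch under invented names (the real check lives in HBase's access control code):

    import java.io.IOException;
    import java.util.Set;

    // Minimal sketch of the executeProcedures permission check described
    // above. Names are illustrative, not the actual HBase implementation.
    final class ExecuteProceduresAuthSketch {
      private final String systemUser;        // e.g. the user the RS runs as
      private final Set<String> superUsers;   // e.g. from hbase.superuser

      ExecuteProceduresAuthSketch(String systemUser, Set<String> superUsers) {
        this.systemUser = systemUser;
        this.superUsers = superUsers;
      }

      // Called from the pre hook before executeProcedures runs.
      void checkCaller(String caller) throws IOException {
        if (!caller.equals(systemUser) && !superUsers.contains(caller)) {
          throw new IOException("Access denied: " + caller
              + " must be the system user or a superuser to execute procedures");
        }
      }
    }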
> > >> > > > > > >>> On rolling upgrade: just do not do any replication peer
> > >> > > > > > >>> modifications during the rolling upgrade. There are no
> > >> > > > > > >>> pb/layout changes on the peer/queue storage on zk.
> > >> > > > > > >>>
> > >> > > > > > >>>
> > >> > > > > > >> And there are other benefits.
> > >> > > > > > >> First, we have introduced a general procedure framework
> > >> > > > > > >> to send tasks to RSes and report the results back to the
> > >> > > > > > >> Master. It can be used to implement other operations such
> > >> > > > > > >> as ACL change.
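As a hedged sketch of reusing such a fan-out framework for another operation, with all names invented for illustration:

    import java.io.IOException;

    // Invented illustration only: a master-side fan-out where each RS
    // executes a task and reports the result back, as the general framework
    // above describes.
    interface RemoteProcedureDispatcher {
      // Blocks until every region server has reported success or failure.
      void dispatchToAllRegionServers(String operation, byte[] payload)
          throws IOException;
    }

    class AclRefreshExample {
      void refreshAcls(RemoteProcedureDispatcher dispatcher, byte[] aclUpdate)
          throws IOException {
        dispatcher.dispatchToAllRegionServers("refresh-acl", aclUpdate);
      }
    }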
> > >> > > > > > >> Second, zk is used as external storage now. Since we do
> > >> > > > > > >> not depend on zk watchers any more, it will be much
> > >> > > > > > >> easier to implement a 'table based' replication
> > >> > > > > > >> peer/queue storage.
> > >> > > > > > >>
> > >> > > > > > >> Please vote:
> > >> > > > > > >> [+1] Agree
> > >> > > > > > >> [-1] Disagree
> > >> > > > > > >> [0] Neutral
> > >> > > > > > >>
> > >> > > > > > >> Thanks.
> > >> > > > > > >>
> > >> > > > > > >>
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > --
> > >> > > > > >
> > >> > > > > > -- Appy
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>
