Since branch-2.0 has been cut and branch-2 is now 2.1.0-SNAPSHOT, I will merge branch HBASE-19397-branch-2 back to branch-2.
2018-01-10 9:20 GMT+08:00 张铎(Duo Zhang) <[email protected]>:

> If branch-2.0 will be out soon then let's target this to 2.1. No
> problem.
>
> Thanks.
>
> 2018-01-10 1:28 GMT+08:00 Stack <[email protected]>:
>
>> On Mon, Jan 8, 2018 at 8:19 PM, 张铎(Duo Zhang) <[email protected]>
>> wrote:
>>
>> > OK, let me merge it to master first, and then create an
>> > HBASE-19397-branch-2 which will keep rebasing on the newest
>> > branch-2 to see if it is stable enough. Since we can define this as
>> > a bug fix/refactoring rather than a big new feature, it is OK to
>> > integrate it at any time. If we think it is stable enough before
>> > cutting branch-2.0 then we can include it in the 2.0.0 release,
>> > else let's include it in 2.1 (maybe we can backport it to 2.0
>> > later?).
>>
>> I need to cut the Appy-suggested branch-2.0. Shout if
>> HBASE-19397-branch-2 gets to be too much work and I'll do it sooner
>> rather than later. Or, if easier on you, just say and I'll make the
>> branch-2.0 now so you can just commit to branch-2 (branch-2.0 will
>> become hbase2.0, branch-2 will become hbase2.1...).
>>
>> St.Ack
>>
>> > Thanks all here.
>> >
>> > 2018-01-09 12:06 GMT+08:00 ashish singhi <[email protected]>:
>> >
>> > > +1 to merge on master and 2.1.
>> > > Great work.
>> > >
>> > > Thanks,
>> > > Ashish
>> > >
>> > > -----Original Message-----
>> > > From: 张铎(Duo Zhang) [mailto:[email protected]]
>> > > Sent: Tuesday, January 09, 2018 6:53 AM
>> > > To: [email protected]
>> > > Subject: Re: [VOTE] Merge branch HBASE-19397 back to master and
>> > > branch-2.
>> > >
>> > > Anyway, if there are no objections to merging this into master,
>> > > let's do it, so that we can start working on the follow-on
>> > > features, such as table based replication storage, synchronous
>> > > replication, etc.
>> > >
>> > > Thanks.
>> > >
>> > > 2018-01-09 7:19 GMT+08:00 Stack <[email protected]>:
>> > >
>> > > > On Mon, Jan 8, 2018 at 2:50 PM, 张铎(Duo Zhang)
>> > > > <[email protected]> wrote:
>> > > >
>> > > > > This 'new' feature only changes the DDL part, not the core
>> > > > > part of replication, i.e. how to read wal entries, how to
>> > > > > replicate them to the remote cluster, etc. There is also no
>> > > > > pb message/storage layout change, so you can think of this as
>> > > > > a big refactoring. Theoretically we do not even need to add
>> > > > > new UTs for this feature, i.e. no extra stability work. The
>> > > > > only visible change to users is that modifying peers in the
>> > > > > shell may take more time. So in general I think it does
>> > > > > little harm to include it in the coming release?
>> > > > >
>> > > > > And the reason I think it SHOULD be included in our 2.0
>> > > > > release is that the synchronous guarantee is really a good
>> > > > > thing for our replication related UTs. The correctness of the
>> > > > > current Test***Replication tests usually depends on a flaky
>> > > > > condition - that no log rolling happens between the
>> > > > > modification on zk and the actual loading of the modification
>> > > > > on the RS. We have also done hard work to clean up the
>> > > > > locking in ReplicationSourceManager and added a fat comment
>> > > > > saying why it should be synchronized in this way. In general,
>> > > > > the new code is much easier to read, test and debug, and also
>> > > > > reduces the possibility of flakiness, which could help us a
>> > > > > lot when we start to stabilize our build.
>> > > > >
>> > > > > Thanks.
>> > > > >
>> > > >
>> > > > You see it as a big bug fix Duo?
>> > > >
>> > > Kind of. Just like with AM v1, we can do lots of fixes to make it
>> > > more stable, but we know that we cannot fix all the problems,
>> > > since we store state in several places and it is 'mission
>> > > impossible' to make all the states stay in sync under every
>> > > situation... So we introduced AM v2.
>> > > For replication peer tracking, it is the same problem. It is hard
>> > > to do fencing with a zk watcher since it is asynchronous, so the
>> > > UTs are always, in theory, kind of flaky. And we depend on
>> > > replication heavily at Xiaomi; it has always been a pain for us.
>> > >
>> > > >
>> > > > I'm late to review. Will have a look after beta-1 goes out.
>> > > > This stuff looks great from the outside, especially the
>> > > > distributed procedure framework, which we need all over the
>> > > > place.
>> > > >
>> > > > In general I have no problem w/ this in master and an hbase 2.1
>> > > > (2.1 could be soon after 2.0). It's late for big stuff in
>> > > > 2.0.... but let me take a looksee sir.
>> > > >
>> > > Thanks sir. All the concerns here about whether we should merge
>> > > this into 2.0 are reasonable, I know. Although I really want this
>> > > in 2.0, because I believe it will help a lot with stabilizing,
>> > > I'm OK with merging it to 2.1 only if you guys all think so.
>> > >
>> > > >
>> > > > St.Ack
>> > > >
>> > > > > 2018-01-09 4:53 GMT+08:00 Apekshit Sharma <[email protected]>:
>> > > > >
>> > > > > > Same questions as Josh's.
>> > > > > > 1) We have RCs for beta1 now, which means the only commits
>> > > > > > that can go in are bug fixes. This change - although
>> > > > > > important, needed for a long time and well done (testing,
>> > > > > > summary, etc) - seems rather too large to get into 2.0 now.
>> > > > > > It needs good justification for why it has to be 2.0
>> > > > > > instead of 2.1.
>> > > > > >
>> > > > > > -- Appy
>> > > > > >
>> > > > > > On Mon, Jan 8, 2018 at 8:34 AM, Josh Elser
>> > > > > > <[email protected]> wrote:
>> > > > > >
>> > > > > > > -0. From a general project planning point-of-view (not
>> > > > > > > based on the technical merit of the code) I am
>> > > > > > > uncomfortable about pulling in a brand new feature after
>> > > > > > > we've already made one beta RC.
>> > > > > > >
>> > > > > > > Duo -- can you expand on why this feature is so important
>> > > > > > > that we should break our release plan? Are there problems
>> > > > > > > that would make including this in a 2.1/3.0 unnecessarily
>> > > > > > > difficult? Any kind of color you can provide on "why does
>> > > > > > > this need to go into 2.0?" would be helpful.
>> > > > > > >
>> > > > > > > On 1/6/18 1:54 AM, Duo Zhang wrote:
>> > > > > > >
>> > > > > > >> https://issues.apache.org/jira/browse/HBASE-19397
>> > > > > > >>
>> > > > > > >> We aim to move the peer modification framework from zk
>> > > > > > >> watcher to procedure v2 in this issue, and the work is
>> > > > > > >> done now.
>> > > > > > >>
>> > > > > > >> Copy the release note here:
>> > > > > > >>
>> > > > > > >> Introduce 5 procedures to do peer modifications:
>> > > > > > >>
>> > > > > > >>> AddPeerProcedure
>> > > > > > >>> RemovePeerProcedure
>> > > > > > >>> UpdatePeerConfigProcedure
>> > > > > > >>> EnablePeerProcedure
>> > > > > > >>> DisablePeerProcedure
>> > > > > > >>>
>> > > > > > >>> The procedures are all executed in the following stages:
>> > > > > > >>> 1. Call the pre CP hook; if an exception is thrown then
>> > > > > > >>> give up.
>> > > > > > >>> 2. Check whether the operation is valid; if not then
>> > > > > > >>> give up.
>> > > > > > >>> 3. Update peer storage. Notice that once we have entered
>> > > > > > >>> this stage we cannot roll back any more.
>> > > > > > >>> 4. Schedule sub procedures to refresh the peer config on
>> > > > > > >>> every RS.
>> > > > > > >>> 5. Do post cleanup, if any.
>> > > > > > >>> 6. Call the post CP hook. Any exception thrown will be
>> > > > > > >>> ignored since we have already done the work.
>> > > > > > >>>
>> > > > > > >>> The procedure will hold an exclusive lock on the peer
>> > > > > > >>> id, so now there are no concurrent modifications on a
>> > > > > > >>> single peer.
>> > > > > > >>>
>> > > > > > >>> And now it is guaranteed that once the procedure is
>> > > > > > >>> done, the peer modification has already taken effect on
>> > > > > > >>> all RSes.
>> > > > > > >>>
>> > > > > > >>> Abstracted a storage layer for replication peer/queue
>> > > > > > >>> management, and refactored the upper layer to remove
>> > > > > > >>> zk-related naming/code/comments.
>> > > > > > >>>
>> > > > > > >>> Added pre/postExecuteProcedures CP hooks to
>> > > > > > >>> RegionServerObserver, and added a permission check for
>> > > > > > >>> the executeProcedures method which requires the caller
>> > > > > > >>> to be the system user or a superuser.
>> > > > > > >>>
>> > > > > > >>> On rolling upgrade: just do not do any replication peer
>> > > > > > >>> modifications during the rolling upgrade. There are no
>> > > > > > >>> pb/layout changes to the peer/queue storage on zk.
>> > > > > > >>
>> > > > > > >> And there are other benefits.
>> > > > > > >> First, we have introduced a general procedure framework
>> > > > > > >> to send tasks to RSes and report the results back to the
>> > > > > > >> Master. It can be used to implement other operations,
>> > > > > > >> such as ACL changes.
>> > > > > > >> Second, zk is only used as external storage now, since we
>> > > > > > >> do not depend on zk watchers any more, so it will be much
>> > > > > > >> easier to implement a 'table based' replication
>> > > > > > >> peer/queue storage.
>> > > > > > >>
>> > > > > > >> Please vote:
>> > > > > > >> [+1] Agree
>> > > > > > >> [-1] Disagree
>> > > > > > >> [0] Neutral
>> > > > > > >>
>> > > > > > >> Thanks.
>> > > > > >
>> > > > > > --
>> > > > > > -- Appy
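To make the six-stage execution from the release note concrete, here is a minimal Java sketch of how such a staged peer-modification procedure could be structured. All class and method names are illustrative assumptions, not the actual HBASE-19397 code; in particular the real procedures run inside the procedure-v2 framework rather than as a plain method call.

```java
// Illustrative sketch of the staged execution described in the release
// note. Names are made up for illustration only.
public abstract class ModifyPeerProcedure {

  protected final String peerId;

  protected ModifyPeerProcedure(String peerId) {
    // The framework holds an exclusive lock on the peer id for the
    // lifetime of the procedure, so at most one modification runs
    // against a given peer at any time.
    this.peerId = peerId;
  }

  // Stage hooks supplied by the concrete procedures
  // (add/remove/enable/disable/update-config).
  protected abstract void preCPHook() throws Exception;            // stage 1
  protected abstract void checkOperationValid() throws Exception;  // stage 2
  protected abstract void updatePeerStorage() throws Exception;    // stage 3
  protected abstract void refreshPeerOnEveryRS() throws Exception; // stage 4
  protected abstract void postCleanup() throws Exception;          // stage 5
  protected abstract void postCPHook() throws Exception;           // stage 6

  public void execute() throws Exception {
    preCPHook();            // 1. an exception here aborts the procedure
    checkOperationValid();  // 2. an invalid request also aborts it

    // 3. Once peer storage has been updated there is no rollback any more.
    updatePeerStorage();

    // 4. Sub procedures fan out to every region server; the parent waits
    // for all of them before moving on.
    refreshPeerOnEveryRS();

    postCleanup();          // 5. post cleanup, if any
    try {
      postCPHook();         // 6. failures here are ignored: work is done
    } catch (Exception e) {
      // The modification has already taken effect everywhere, so a
      // post-hook failure would only be logged.
    }
  }
}
```

The synchronous guarantee the thread keeps coming back to falls out of stage 4: because the parent procedure waits on the per-RS sub procedures, the peer modification has taken effect on every RS by the time the operation returns, instead of relying on each RS eventually firing a zk watcher.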

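The "Abstracted a storage layer" item is what enables the "table based" storage follow-on. Below is a rough sketch of the shape such an abstraction could take; the interface and method names are made up for illustration and are not the real HBase API, and the in-memory implementation is only a stand-in for a zk- or table-backed one.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical storage abstraction: the upper layer only sees this
// interface, so the backing store can be swapped without touching callers.
interface PeerStorage {
  void addPeer(String peerId, String clusterKey, boolean enabled) throws Exception;
  void removePeer(String peerId) throws Exception;
  void setPeerState(String peerId, boolean enabled) throws Exception;
  List<String> listPeerIds() throws Exception;
}

// Stand-in implementation showing the shape of the contract; real
// implementations would persist to zk or to an hbase table.
class InMemoryPeerStorage implements PeerStorage {

  private static final class Peer {
    final String clusterKey;
    boolean enabled;

    Peer(String clusterKey, boolean enabled) {
      this.clusterKey = clusterKey;
      this.enabled = enabled;
    }
  }

  private final Map<String, Peer> peers = new LinkedHashMap<>();

  @Override
  public synchronized void addPeer(String peerId, String clusterKey, boolean enabled) {
    peers.put(peerId, new Peer(clusterKey, enabled));
  }

  @Override
  public synchronized void removePeer(String peerId) {
    peers.remove(peerId);
  }

  @Override
  public synchronized void setPeerState(String peerId, boolean enabled) {
    Peer p = peers.get(peerId);
    if (p == null) {
      throw new IllegalArgumentException("unknown peer " + peerId);
    }
    p.enabled = enabled;
  }

  @Override
  public synchronized List<String> listPeerIds() {
    return new ArrayList<>(peers.keySet());
  }
}
```

Since nothing above this layer registers zk watchers any more, a table-based implementation only has to satisfy the same synchronous CRUD contract.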