+1 (non-binding) for a merge to master first. If we merge into the master branch as early as possible, we can start developing the follow-on features (table based features & sync-replication) that much sooner.
On Tue, Jan 9, 2018 at 9:28 AM, Andrew Purtell <apurt...@apache.org> wrote:
> FWIW, you have my +1 for a merge to master.
>
> On Mon, Jan 8, 2018 at 5:22 PM, 张铎(Duo Zhang) <palomino...@gmail.com> wrote:
> > Anyway, if there are no objections to merging this into master, let's do it? So that we can start working on the follow-on features, such as table based replication storage, synchronous replication, etc.
> >
> > Thanks.
> >
> > 2018-01-09 7:19 GMT+08:00 Stack <st...@duboce.net>:
> > > On Mon, Jan 8, 2018 at 2:50 PM, 张铎(Duo Zhang) <palomino...@gmail.com> wrote:
> > > > This 'new' feature only changes the DDL part, not the core part of replication, i.e. how to read wal entries and how to replicate them to the remote cluster, etc. And there is no pb message/storage layout change either, so you can think of this as a big refactoring. Theoretically we do not even need to add new UTs for this feature, i.e. no extra stability work. The only change visible to users is that modifying peers in the shell may take more time. So in general I think it does little harm to include it in the coming release?
> > > >
> > > > And why I think it SHOULD be included in our 2.0 release is that the synchronous guarantee is really a good thing for our replication related UTs. The correctness of the current Test***Replication tests usually depends on a flakey condition - that we will not do a log rolling between the modification on zk and the actual loading of the modification on the RS. We have also done hard work to clean up the locking in ReplicationSourceManager and add a fat comment to explain why it has to be synchronized in this way. In general the new code is much easier to read, test and debug, and it also reduces the possibility of flakiness, which could help us a lot when we start to stabilize our build.
> > > >
> > > > Thanks.
> > >
> > > You see it as a big bug fix Duo?
> >
> > Kind of. Just like AM v1, we can do lots of fixes to make it more stable, but we know that we can not fix all the problems, since we store state in several places and it is 'mission impossible' to keep all the states in sync under every situation... So we introduced AM v2.
> > For the replication peer tracking it is the same problem. It is hard to do fencing with a zk watcher since it is asynchronous, so the UTs are always somewhat flakey in theory. And we depend on replication heavily at Xiaomi, so it has always been a pain for us.
> >
> > > I'm late to review. Will have a look after beta-1 goes out. This stuff looks great from the outside, especially the distributed procedure framework which we need all over the place.
> > >
> > > In general I have no problem w/ this in master and an hbase 2.1 (2.1 could be soon after 2.0). It's late for big stuff in 2.0.... but let me take a looksee sir.
> >
> > Thanks sir. All the concerns here about whether we should merge this into 2.0 are reasonable, I know. Although I really want this in 2.0, because I believe it will help a lot with stabilizing, I'm OK with merging it only into 2.1 if you guys all think so.
> >
> > > St.Ack
> > >
> > > > 2018-01-09 4:53 GMT+08:00 Apekshit Sharma <a...@cloudera.com>:
> > > > > Same questions as Josh's.
> > > > > 1) We have RCs for beta1 now, which means the only commits that can go in are bug fixes. This change - although important, needed for a long time and well done (testing, summary, etc.) - seems rather too large to get into 2.0 now. It needs a good justification for why it has to be in 2.0 instead of 2.1.
> > > > >
> > > > > -- Appy
> > > > >
> > > > > On Mon, Jan 8, 2018 at 8:34 AM, Josh Elser <els...@apache.org> wrote:
> > > > > > -0 From a general project planning point-of-view (not based on the technical merit of the code), I am uncomfortable about pulling in a brand new feature after we've already made one beta RC.
> > > > > >
> > > > > > Duo -- can you expand on why this feature is so important that we should break our release plan? Are there problems that would make including this in a 2.1/3.0 unnecessarily difficult? Any kind of color you can provide on "why does this need to go into 2.0?" would be helpful.
> > > > > >
> > > > > > On 1/6/18 1:54 AM, Duo Zhang wrote:
> > > > > >> https://issues.apache.org/jira/browse/HBASE-19397
> > > > > >>
> > > > > >> We aim to move the peer modification framework from zk watcher to procedure v2 in this issue, and the work is done now.
> > > > > >>
> > > > > >> Copying the release note here:
> > > > > >>
> > > > > >> Introduce 5 procedures to do peer modifications:
> > > > > >>
> > > > > >>> AddPeerProcedure
> > > > > >>> RemovePeerProcedure
> > > > > >>> UpdatePeerConfigProcedure
> > > > > >>> EnablePeerProcedure
> > > > > >>> DisablePeerProcedure
> > > > > >>>
> > > > > >>> The procedures are all executed in the following stages:
> > > > > >>> 1. Call the pre CP hook; if an exception is thrown then give up.
> > > > > >>> 2. Check whether the operation is valid; if not then give up.
> > > > > >>> 3. Update the peer storage. Note that once we have entered this stage we can not roll back any more.
> > > > > >>> 4. Schedule sub procedures to refresh the peer config on every RS.
> > > > > >>> 5. Do post cleanup if any.
> > > > > >>> 6. Call the post CP hook. Any exception thrown here will be ignored since we have already done the work.
> > > > > >>>
> > > > > >>> The procedure will hold an exclusive lock on the peer id, so there are no concurrent modifications on a single peer.
> > > > > >>>
> > > > > >>> And it is now guaranteed that once the procedure is done, the peer modification has already taken effect on all RSes.
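(To make the staged flow and the locking described above concrete, here is a minimal Java sketch of how one of these peer procedures could run its stages. All of the names - AddPeerProcedureSketch, PeerStorage, refreshPeerConfig and so on - are made up for illustration only; they are not the actual HBASE-19397 classes.)

    // Hypothetical illustration only - not the actual HBASE-19397 classes.
    import java.util.List;

    public class AddPeerProcedureSketch {

      interface Coprocessor {
        void preAddPeer(String peerId) throws Exception;
        void postAddPeer(String peerId) throws Exception;
      }
      interface PeerStorage {
        void addPeer(String peerId, String clusterKey) throws Exception;
      }
      interface RegionServer {
        void refreshPeerConfig(String peerId) throws Exception;
      }

      private final Coprocessor cp;
      private final PeerStorage storage;
      private final List<RegionServer> regionServers;

      AddPeerProcedureSketch(Coprocessor cp, PeerStorage storage, List<RegionServer> regionServers) {
        this.cp = cp;
        this.storage = storage;
        this.regionServers = regionServers;
      }

      /** Runs the stages in order; the framework holds an exclusive lock on peerId the whole time. */
      void execute(String peerId, String clusterKey) throws Exception {
        cp.preAddPeer(peerId);                      // 1. pre CP hook, an exception here aborts the procedure
        if (peerId == null || peerId.isEmpty()) {   // 2. validity check, give up if the request is bad
          throw new IllegalArgumentException("invalid peer id");
        }
        storage.addPeer(peerId, clusterKey);        // 3. update the peer storage, no rollback after this point
        for (RegionServer rs : regionServers) {     // 4. sub procedures refresh the config on every RS
          rs.refreshPeerConfig(peerId);             //    (the real framework schedules these and waits for them)
        }
        // 5. post cleanup would go here if the operation needed any
        try {
          cp.postAddPeer(peerId);                   // 6. post CP hook, exceptions are ignored at this point
        } catch (Exception ignored) {
        }
      }
    }

Once execute() returns, every RS has already refreshed the peer, which is exactly the synchronous guarantee the replication UTs can rely on.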
> > > > > >>> Abstract a storage layer for replication peer/queue management, and refactor the upper layer to remove zk related naming/code/comments.
> > > > > >>>
> > > > > >>> Add pre/postExecuteProcedures CP hooks to RegionServerObserver, and add a permission check for the executeProcedures method which requires the caller to be the system user or a super user.
> > > > > >>>
> > > > > >>> On rolling upgrade: just do not do any replication peer modifications during the rolling upgrade. There are no pb/layout changes to the peer/queue storage on zk.
> > > > > >>
> > > > > >> And there are other benefits.
> > > > > >> First, we have introduced a general procedure framework to send tasks to the RSes and report the results back to the Master. It can be used to implement other operations such as ACL changes.
> > > > > >> Second, since we do not depend on the zk watcher any more, zk is now used purely as external storage, so it will be much easier to implement a 'table based' replication peer/queue storage.
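(The storage abstraction from the release note, together with the second point above, roughly means the upper layer only talks to an interface like the sketch below, so the current zk-backed implementation and a future 'table based' one can be swapped without touching the callers. Again the names are hypothetical, not the real HBASE-19397 interface.)

    // Hypothetical illustration only - the real HBASE-19397 interface may differ.
    import java.io.IOException;
    import java.util.List;

    public interface ReplicationPeerStorageSketch {

      /** Persist a new peer; this is what stage 3 of an AddPeerProcedure would call. */
      void addPeer(String peerId, String clusterKey, boolean enabled) throws IOException;

      /** Remove the peer and everything stored for it. */
      void removePeer(String peerId) throws IOException;

      /** Flip the enabled/disabled flag for the peer. */
      void setPeerState(String peerId, boolean enabled) throws IOException;

      /** Replace the stored config for the peer. */
      void updatePeerConfig(String peerId, String clusterKey) throws IOException;

      /** List the ids of all known peers. */
      List<String> listPeerIds() throws IOException;

      // A zk-backed implementation covers today's behaviour; a table-backed one
      // becomes possible because nothing above this interface watches zk any more.
    }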
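(And a rough sketch of the 'send a task to every RS and wait for the reports' pattern that the general procedure framework mentioned as the first benefit provides. The names are made up just to show the shape of it; in the real code the RS-side entry point is the executeProcedures method that, per the release note, is restricted to the system user or a super user.)

    // Hypothetical illustration only - not the real procedure framework API.
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.CompletableFuture;

    public class RemoteProcedureDispatchSketch {

      interface RegionServerEndpoint {
        // In the real system this call is only allowed for the system user or a super user.
        CompletableFuture<Void> executeProcedure(byte[] task);
      }

      /** Master side: fan the task out to every RS and only return once every RS has reported back. */
      static void dispatchAndWait(List<RegionServerEndpoint> servers, byte[] task) {
        List<CompletableFuture<Void>> acks = new ArrayList<>();
        for (RegionServerEndpoint rs : servers) {
          acks.add(rs.executeProcedure(task));
        }
        for (CompletableFuture<Void> ack : acks) {
          ack.join();  // the parent procedure finishes only after every RS has acknowledged
        }
      }
    }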
> > > > > >> Please vote:
> > > > > >> [+1] Agree
> > > > > >> [-1] Disagree
> > > > > >> [0] Neutral
> > > > > >>
> > > > > >> Thanks.
> > > > >
> > > > > --
> > > > > -- Appy
>
> --
> Best regards,
> Andrew
>
> Words like orphans lost among the crosstalk, meaning torn from truth's decrepit hands
>    - A23, Crosstalk

--
==============================
Openinx blog : http://openinx.github.io

TO BE A GREAT HACKER !
==============================