Here is my +1. I ran the test suites several times; with -Dsurefire.secondPartForkCount=1 and the flaky tests (including TestFromClientSide) excluded, I could get a successful build.
Started two clusters with the code of the latest HBASE-19397 and tried adding a new peer; it worked fine. Loaded 1M rows with LTT at the source cluster and verified them at the dest cluster; the verification passed. During the loading, I disabled the peer for a while and then enabled it again. This did not affect correctness. After disabling the peer, the qps on the dest cluster dropped to 0 immediately, which is the expected behavior (compared to the old, asynchronous zk watcher approach). Both clusters have 5 nodes, and the add/remove/enable/disable peer commands in the shell always returned within 2 seconds, which I think is acceptable.

2018-01-06 14:54 GMT+08:00 Duo Zhang <zhang...@apache.org>:

> https://issues.apache.org/jira/browse/HBASE-19397
>
> We aim to move the peer modification framework from zk watcher to
> procedure v2 in this issue and the work is done now.
>
> Copy the release note here:
>
> Introduce 5 procedures to do peer modifications:
>> AddPeerProcedure
>> RemovePeerProcedure
>> UpdatePeerConfigProcedure
>> EnablePeerProcedure
>> DisablePeerProcedure
>>
>> The procedures are all executed with the following stages:
>> 1. Call pre CP hook, if an exception is thrown then give up
>> 2. Check whether the operation is valid, if not then give up
>> 3. Update peer storage. Notice that if we have entered this stage, then
>> we can not rollback any more.
>> 4. Schedule sub procedures to refresh the peer config on every RS.
>> 5. Do post cleanup if any.
>> 6. Call post CP hook. The exception thrown will be ignored since we have
>> already done the work.
>>
>> The procedure will hold an exclusive lock on the peer id, so now there are
>> no concurrent modifications on a single peer.
>>
>> And now it is guaranteed that once the procedure is done, the peer
>> modification has already taken effect on all RSes.
>>
>> Abstract a storage layer for replication peer/queue management, and
>> refactor the upper layer to remove zk related naming/code/comment.
>>
>> Add pre/postExecuteProcedures CP hooks to RegionServerObserver, and add
>> permission check for executeProcedures method which requires the caller to
>> be system user or super user.
>>
>> On rolling upgrade: just do not do any replication peer modifications
>> during the rolling upgrade. There are no pb/layout changes on the
>> peer/queue storage on zk.
>>
>
> And there are other benefits.
> First, we have introduced a general procedure framework to send tasks to
> RS and report the result back to Master. It can be used to implement other
> operations such as ACL change.
> Second, zk is used as an external storage now since we do not depend on zk
> watcher any more, so it will be much easier to implement a 'table based'
> replication peer/queue storage.
>
> Please vote:
> [+1] Agree
> [-1] Disagree
> [0] Neutral
>
> Thanks.
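
For anyone skimming the release note, the six stages and the per-peer exclusive lock can be sketched roughly as below. This is only an illustration under my own assumptions: the class, interface, and method names are hypothetical and are not the HBase procedure v2 API, failures are modeled as unchecked exceptions for brevity, and the lock is a plain in-memory set rather than anything the Master really uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the stage ordering; NOT the real HBase classes.
public class PeerModificationSketch {
    // Peer ids currently being modified; stands in for the exclusive lock on the peer id.
    private static final Set<String> LOCKED_PEERS = ConcurrentHashMap.newKeySet();

    /** Callbacks a concrete peer modification would supply (all names are made up). */
    public interface Steps {
        default void preHook() {}              // stage 1: pre CP hook
        default void checkValid() {}           // stage 2: validity check
        default void updateStorage() {}        // stage 3: update peer storage
        default void refreshRegionServers() {} // stage 4: refresh peer config on every RS
        default void postCleanup() {}          // stage 5: post cleanup, if any
        default void postHook() {}             // stage 6: post CP hook
    }

    /** Runs the stages in order and returns the names of the stages that were reached. */
    public static List<String> run(String peerId, Steps steps) {
        if (!LOCKED_PEERS.add(peerId)) {
            throw new IllegalStateException("peer " + peerId + " is already being modified");
        }
        List<String> trace = new ArrayList<>();
        try {
            // Stages 1 and 2: an exception here propagates, i.e. we give up
            // before anything has been changed.
            steps.preHook();
            trace.add("preHook");
            steps.checkValid();
            trace.add("checkValid");

            // Stage 3 onwards: no rollback once the storage has been updated.
            steps.updateStorage();
            trace.add("updateStorage");
            steps.refreshRegionServers();
            trace.add("refreshRegionServers");
            steps.postCleanup();
            trace.add("postCleanup");

            // Stage 6: the work is already done, so a failing post hook is ignored.
            try {
                steps.postHook();
                trace.add("postHook");
            } catch (RuntimeException ignored) {
                trace.add("postHook failed (ignored)");
            }
            return trace;
        } finally {
            // Release the per-peer lock whether we finished or gave up.
            LOCKED_PEERS.remove(peerId);
        }
    }
}
```

Calling `PeerModificationSketch.run("peer1", new PeerModificationSketch.Steps() {})` walks all six stages; overriding `preHook` or `checkValid` to throw makes the call give up before the storage update, while a throwing `postHook` is swallowed.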