> For the replication peer tracking, it is the same problem. It is hard to
> do fencing with a zk watcher since it is asynchronous, so the UTs are
> always kind of flaky, at least in theory.

> The synchronous guarantee is really a good thing for our replication
> related UTs.

> And we have also done hard work to clean up the locking in
> ReplicationSourceManager and added a fat comment to explain why it should
> be synchronized in this way.

That stuff just makes me love it.

Although it improves stability (some concrete data on this would go a long
way), it's a big refactor (a 750 KB patch), it changes Admin, and it will
take a couple of days to get reviewed.
As much as I would have loved to see it go into 2.0 (and would have
prioritized its review had it come 10 days earlier), at this point it just
seems too late.
Let's do it in 2.1, please.



On Mon, Jan 8, 2018 at 7:34 PM, Stack <st...@duboce.net> wrote:

> On Mon, Jan 8, 2018 at 5:22 PM, 张铎(Duo Zhang) <palomino...@gmail.com>
> wrote:
>
> > Anyway, if there are no objections to merging this into master, let's
> > do it, so that we can start working on the follow-on features, such as
> > table based replication storage, synchronous replication, etc.
> >
> >
>
> +1 for Master.
>
>
>
> > > You see it as a big bug fix, Duo?
> > >
> >
>
> ...
>
> > For the replication peer tracking, it is the same problem. It is hard
> > to do fencing with a zk watcher since it is asynchronous, so the UTs are
> > always kind of flaky, at least in theory. And we depend on replication
> > heavily at Xiaomi; it is always a pain for us.
> >
> >
> I have sympathy for this argument (especially if you call it a bug fix or
> can argue that it fixes a bunch of flakies).
>
> 2.0.0 is so late though. We can't do anything that might shove out the date
> even further. No reason 2.1 couldn't follow just after.
>
> Have you tried this addition at scale? Can you point to a marked
> improvement in accounting?
>
> What if you set up a branch-2-HBASE-19397 on jenkins; does it pass as
> many tests or more?
>
> St.Ack
>
>
> >
> > > I'm late to review. Will have a look after beta-1 goes out. This stuff
> > > looks great from the outside, especially the distributed procedure
> > > framework, which we need all over the place.
> > >
> > > In general I have no problem w/ this in master and an hbase 2.1 (2.1
> > > could be soon after 2.0). It's late for big stuff in 2.0... but let me
> > > take a looksee, sir.
> > >
> > Thanks, sir. All the concerns here about whether we should merge this
> > into 2.0 are reasonable, I know. Although I really want this in 2.0,
> > because I believe it will help a lot with stabilizing, I'm OK with
> > merging it into 2.1 only if you guys all think so.
> >
> > >
> > > St.Ack
> > >
> > >
> > >
> > >
> > >
> > > > 2018-01-09 4:53 GMT+08:00 Apekshit Sharma <a...@cloudera.com>:
> > > >
> > > > > Same questions as Josh's.
> > > > > 1) We have RCs for beta1 now, which means the only commits that
> > > > > can go in are bug fixes. This change - although important, needed
> > > > > for a long time, and well done (testing, summary, etc.) - seems
> > > > > very large to get into 2.0 now. It needs a good justification for
> > > > > why it has to be 2.0 instead of 2.1.
> > > > >
> > > > > -- Appy
> > > > >
> > > > >
> > > > > On Mon, Jan 8, 2018 at 8:34 AM, Josh Elser <els...@apache.org>
> > > > > wrote:
> > > > >
> > > > > > -0. From a general project planning point of view (not based on
> > > > > > the technical merit of the code), I am uncomfortable about
> > > > > > pulling in a brand new feature after we've already made one beta
> > > > > > RC.
> > > > > >
> > > > > > Duo -- can you expand on why this feature is so important that
> > > > > > we should break our release plan? Are there problems that would
> > > > > > make including this in a 2.1/3.0 unnecessarily difficult? Any
> > > > > > kind of color you can provide on "why does this need to go into
> > > > > > 2.0?" would be helpful.
> > > > > >
> > > > > >
> > > > > > On 1/6/18 1:54 AM, Duo Zhang wrote:
> > > > > >
> > > > > >> https://issues.apache.org/jira/browse/HBASE-19397
> > > > > >>
> > > > > >> In this issue we aim to move the peer modification framework
> > > > > >> from zk watcher to procedure v2, and the work is now done.
> > > > > >>
> > > > > >> Copying the release note here:
> > > > > >>
> > > > > >> Introduced 5 procedures to do peer modifications:
> > > > > >>
> > > > > >>> AddPeerProcedure
> > > > > >>> RemovePeerProcedure
> > > > > >>> UpdatePeerConfigProcedure
> > > > > >>> EnablePeerProcedure
> > > > > >>> DisablePeerProcedure
> > > > > >>>
> > > > > >>> The procedures are all executed in the following stages:
> > > > > >>> 1. Call the pre CP hook; if an exception is thrown, give up.
> > > > > >>> 2. Check whether the operation is valid; if not, give up.
> > > > > >>> 3. Update the peer storage. Notice that once we have entered
> > > > > >>> this stage, we cannot roll back any more.
> > > > > >>> 4. Schedule sub procedures to refresh the peer config on every
> > > > > >>> RS.
> > > > > >>> 5. Do post cleanup, if any.
> > > > > >>> 6. Call the post CP hook. Any exception thrown will be ignored
> > > > > >>> since we have already done the work.
> > > > > >>>
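> > > > > >>> For illustration only, here is a minimal, self-contained Java
> > > > > >>> sketch of the staged flow above. This is not the actual
> > > > > >>> procedure code in the patch; the class and method names are
> > > > > >>> made up, and the real procedures are driven by the procedure
> > > > > >>> v2 framework (with persistence and retries) rather than a
> > > > > >>> plain loop:
> > > > > >>>
> > > > > >>>   import java.util.EnumMap;
> > > > > >>>   import java.util.Map;
> > > > > >>>
> > > > > >>>   // Illustrative model of the six stages; not HBase code.
> > > > > >>>   public class PeerModificationFlowSketch {
> > > > > >>>
> > > > > >>>     enum Stage {
> > > > > >>>       PRE_CP_HOOK, VALIDATE, UPDATE_PEER_STORAGE,
> > > > > >>>       REFRESH_ON_RS, POST_CLEANUP, POST_CP_HOOK
> > > > > >>>     }
> > > > > >>>
> > > > > >>>     interface Step { void run() throws Exception; }
> > > > > >>>
> > > > > >>>     private final Map<Stage, Step> steps =
> > > > > >>>         new EnumMap<>(Stage.class);
> > > > > >>>
> > > > > >>>     PeerModificationFlowSketch on(Stage stage, Step step) {
> > > > > >>>       steps.put(stage, step);
> > > > > >>>       return this;
> > > > > >>>     }
> > > > > >>>
> > > > > >>>     void execute() throws Exception {
> > > > > >>>       for (Stage stage : Stage.values()) {
> > > > > >>>         try {
> > > > > >>>           steps.getOrDefault(stage, () -> { }).run();
> > > > > >>>         } catch (Exception e) {
> > > > > >>>           switch (stage) {
> > > > > >>>             case PRE_CP_HOOK:
> > > > > >>>             case VALIDATE:
> > > > > >>>               // stages 1-2: nothing persisted yet, give up
> > > > > >>>               throw e;
> > > > > >>>             case POST_CP_HOOK:
> > > > > >>>               // stage 6: work already done, ignore
> > > > > >>>               return;
> > > > > >>>             default:
> > > > > >>>               // stages 3-5: no rollback, retry to completion
> > > > > >>>               throw e;
> > > > > >>>           }
> > > > > >>>         }
> > > > > >>>       }
> > > > > >>>     }
> > > > > >>>   }
> > > > > >>>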
> > > > > >>> The procedure will hold an exclusive lock on the peer id, so
> > > > > >>> now there are no concurrent modifications on a single peer.
> > > > > >>>
> > > > > >>> And it is now guaranteed that once the procedure is done, the
> > > > > >>> peer modification has already taken effect on all RSes.
> > > > > >>>
> > > > > >>> Abstracted a storage layer for replication peer/queue
> > > > > >>> management, and refactored the upper layer to remove zk-related
> > > > > >>> naming/code/comments.
> > > > > >>>
> > > > > >>> Added pre/postExecuteProcedures CP hooks to
> > > > > >>> RegionServerObserver, and added a permission check for the
> > > > > >>> executeProcedures method which requires the caller to be the
> > > > > >>> system user or a super user.
> > > > > >>>
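> > > > > >>> As a rough, hypothetical sketch of the kind of caller check
> > > > > >>> described above (the real check lives in the RS-side
> > > > > >>> executeProcedures implementation and uses HBase's own user
> > > > > >>> abstractions; the helper and names below are made up):
> > > > > >>>
> > > > > >>>   import java.util.Set;
> > > > > >>>
> > > > > >>>   public final class ExecuteProceduresAccessSketch {
> > > > > >>>
> > > > > >>>     // Illustrative only: reject callers that are neither the
> > > > > >>>     // system user nor listed as a super user.
> > > > > >>>     static void checkSystemOrSuperUser(
> > > > > >>>         String caller, String systemUser,
> > > > > >>>         Set<String> superUsers) {
> > > > > >>>       if (!caller.equals(systemUser)
> > > > > >>>           && !superUsers.contains(caller)) {
> > > > > >>>         throw new SecurityException(
> > > > > >>>             "executeProcedures is restricted to the system "
> > > > > >>>                 + "user or a super user; caller=" + caller);
> > > > > >>>       }
> > > > > >>>     }
> > > > > >>>
> > > > > >>>     public static void main(String[] args) {
> > > > > >>>       // passes:
> > > > > >>>       checkSystemOrSuperUser("hbase", "hbase", Set.of("admin"));
> > > > > >>>       // throws SecurityException:
> > > > > >>>       checkSystemOrSuperUser("bob", "hbase", Set.of("admin"));
> > > > > >>>     }
> > > > > >>>   }
> > > > > >>>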
> > > > > >>> On rolling upgrade: just do not make any replication peer
> > > > > >>> modifications during the rolling upgrade. There are no
> > > > > >>> pb/layout changes to the peer/queue storage on zk.
> > > > > >>>
> > > > > >>>
> > > > > >> And there are other benefits.
> > > > > >> First, we have introduced a general procedure framework to send
> > > > > >> tasks to the RSes and report the results back to the Master. It
> > > > > >> can be used to implement other operations, such as ACL changes.
> > > > > >> Second, since we do not depend on the zk watcher any more, zk is
> > > > > >> now just an external storage, so it will be much easier to
> > > > > >> implement a 'table based' replication peer/queue storage.
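> > > > > >> For illustration, a sketch of the shape such a storage
> > > > > >> abstraction can take (the actual interfaces and signatures in
> > > > > >> the patch may differ; this one is made up). Once callers go
> > > > > >> through an interface like this instead of watching zk nodes,
> > > > > >> the backing store can be zk today and an hbase table later
> > > > > >> without touching the callers:
> > > > > >>
> > > > > >>   import java.util.List;
> > > > > >>
> > > > > >>   // Illustrative only; not the real HBASE-19397 interfaces.
> > > > > >>   public interface ReplicationPeerStorageSketch {
> > > > > >>
> > > > > >>     void addPeer(String peerId, String clusterKey,
> > > > > >>         boolean enabled) throws Exception;
> > > > > >>
> > > > > >>     void removePeer(String peerId) throws Exception;
> > > > > >>
> > > > > >>     void setPeerState(String peerId, boolean enabled)
> > > > > >>         throws Exception;
> > > > > >>
> > > > > >>     List<String> listPeerIds() throws Exception;
> > > > > >>   }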
> > > > > >>
> > > > > >> Please vote:
> > > > > >> [+1] Agree
> > > > > >> [-1] Disagree
> > > > > >> [0] Neutral
> > > > > >>
> > > > > >> Thanks.
> > > > > >>
> > > > > >>
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > -- Appy
> > > > >
> > > >
> > >
> >
>



-- 

-- Appy
