Oh, this is my fault. I mean the old behavior IS to go to primary replica
first, which is what we want to change here.

And what I commented  on jira, is to say that we do not need to get a
performance improvement before merging, it is not the goal of this issue.
And I suggested that if we want to show our advantage, we need to get the
primary replica fucked up. I do not know why then the discussion went to
the HedgeRead and I could not poll it back. I do not think this should
block the merging but even though it was still very hard to communicate, so
I assumed this means we still have a big gap on what we want to solve here,
thus I voted a -1 here.

I think we need to go back to the beginning, to reach an agreement on the
goal here. Let’s review the design doc again to see if we missed something
which lead us to this situation.

And I need to say that, I do not want to block the issue to be merged. I
tried my best to speed up the process. I suggested to land the changes at
client side to master directly but was refused. I helped to add scan on
specific replica feature soon on branch-2 to let the port to branch-2 can
be landed cleanly.

On a mobile device so can not review the code or PR. Very busy these days.
And the health examination this morning told me that I had a high blood
pressure. Not a good birthday present. Will get back to the issue when
possible.

Thanks.

Stack <[email protected]>于2020年11月17日 周二06:34写道:

> On Sun, Nov 15, 2020 at 11:20 PM 张铎(Duo Zhang) <[email protected]>
> wrote:
>
> > So what is your purpose of distributing the request of region location
> > lookup? It is just because you want to 'distribute the request of region
> > location lookup'?
> >
> > Then I'm -1 on merging. We should reach an agreement on what we want to
> > solve before merging at least.
> >
> >
> HERE.1
>
>
> > I've helped this issue from the design doc step. For me, the purpose for
> > this issue is clear. We want to prevent the hotspot of meta, so the
> > solution is simple, enable meta replica, and then just modify the client
> to
> > not always go to primary replica first(this is the old behavior even with
> > meta replica feature on).
> > And this will introduce another problem that, there is no meta region
> > replication implementation for meta read replicas, which means the
> latency
> > will be large as we can only sync the data between replicas through
> region
> > flush, so we implement meta region replication.
> >
> > So I think it is very important to verify that we have truly distributed
> > the request of region location lookup, and also make sure that we could
> > support more requests of region location lookup. Otherwise this feature
> is
> > useless.
> >
> > And I agree with Andrew that, since the feature is default off on
> branch-2
> > and has no regression, it is OK to merge for now. Theoretically our
> > approach here should work, so even it does not work for now, I think we
> > could fix the problems to make it work.
> >
> >
> HERE.2
>
> I agree with all of the above between HERE.1 and HERE.2 (except the
> suggestion that the old behavior of read replicas is that they went to the
> replica first; they go to the primary first -- see [1], [2]).
>
> Lets work with any misalignment of understanding/communication offline and
> not in the way of merge.
>
> Thanks,
> S
>
> 1. http://hbase.apache.org/book.html#_timeline_consistency "In case a read
> is performed with Consistency.TIMELINE, then the read RPC will be sent to
> the primary region server first."
> 2.
>
> https://github.com/apache/hbase/blob/branch-2/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ScannerCallableWithReplicas.java#L195
>
>
>
> > But your reply above made me wonder whether we are talking about the same
> > thing. That's why I'm -1 here. I'm not going to force you to do the test
> > suggested by me, as I said it could be done after merging, just want to
> > reach an agreement on the goal of this feature.
> >
> > Thanks.
> >
> > Stack <[email protected]> 于2020年11月16日周一 下午12:35写道:
> >
> > > On Sun, Nov 15, 2020 at 9:16 AM Andrew Purtell <
> [email protected]
> > >
> > > wrote:
> > >
> > > > I agree with Duo’s comment that a performance gain is unlikely but
> > would
> > > > be orthogonal anyway;
> > >
> > >
> > > Perf observation is just an aside in the issue. Perf is orthogonal as
> you
> > > say above (as long as no regression).
> > >
> > >
> > >
> > > > it’s an availability gain that is the goal. We can assume it based on
> > > > theory of operation and unit test results but the gain should be
> tested
> > > and
> > > > measured on a cluster too.
> > > >
> > >
> > >
> > > The feature is about distributing load on hbase:meta to alleviate
> > > hotspotting; it makes read replicas more live so replicas are more
> likely
> > > to satisfy location lookups making read replicas more effective. That
> > read
> > > replicas improve HA is presumed -- it was the original justification
> for
> > > this years old commit -- but HA is not the focus of this addition;
> hence
> > no
> > > reports on effectiveness in this area.
> > >
> > > I have no problem working on such tests/reports but suggest that they
> are
> > > done post merge.
> > >
> > >
> > >
> > > > That said, the results of the testing thus far indicate no
> regression,
> > > > which gives me confidence to support a merge. Specifically, a merge
> to
> > > > “unblock” 2.4 (we aren’t really blocked, we are waiting), provided
> the
> > > > default there is the feature is configured off. But please indicate
> in
> > > > documentation and release notes that the feature is not widely tested
> > > yet -
> > > > as is customarily done for new functionality like this.
> > > >
> > > >
> > > No problem w/ flagging the feature as new.
> > >
> > > Thanks,
> > > S
> > >
> > >
> > >
> > > >
> > > > > On Nov 15, 2020, at 5:20 AM, 张铎 <[email protected]> wrote:
> > > > >
> > > > > Replied on jira, I think we missed an important scenario when
> > testing.
> > > > >
> > > > > Thanks.
> > > > >
> > > > > Stack <[email protected]> 于2020年11月15日周日 上午2:30写道:
> > > > >
> > > > >> HBASE-18070 makes it so hbase:meta read replicas can run closer to
> > the
> > > > >> primary, (< second lags rather than minutes). It adds Async WAL
> > > > >> Replication[1] on the hbase:meta table; i.e. edits are sprayed
> > across
> > > > >> replicas as they arrive at the primary's WAL. Before this work,
> > Async
> > > > WAL
> > > > >> Replication was only available on user-space tables and the only
> > > option
> > > > for
> > > > >> hbase:meta read replicas was reloading the primaries hfiles on a
> > > period
> > > > >> (minutes). HBASE-18070 also adds an optional client-side
> > 'LoadBalance'
> > > > >> policy that favors read replicas ahead of primary reads falling
> back
> > > to
> > > > the
> > > > >> primary on fault. Together, these additions allow distributing
> > > > hbase:meta
> > > > >> read load across primary and replicas alleviating 'hotspotting'.
> > > > >>
> > > > >> I would like to merge the feature to master branch Monday evening
> if
> > > > there
> > > > >> is no objection (Soon after I'll merge to branch-2 so this feature
> > can
> > > > >> hopefully be included in the upcoming 2.4.0RC).
> > > > >>
> > > > >> * For the design, see [2].
> > > > >> * For an amalgamated PR of the 5 or 6 reviewed PRs that comprise
> > this
> > > > >> feature, see [3].
> > > > >> * For a PE report that compared performance before and after, see
> > > > >> HBASE-25127 (no regression).
> > > > >> * A report on ITBLL runs is pending to be attached to HBASE-18070
> > but
> > > > runs
> > > > >> so far show no regression with the feature enabled (ITBLL runs
> were
> > > done
> > > > >> against a backport of this feature to branch-2 as the ITBLL state
> of
> > > > master
> > > > >> is currently an unknown).
> > > > >>
> > > > >> Testing continues mainly looking for further improvement and to
> > better
> > > > >> understand this feature in operation. Documentation is included
> but
> > in
> > > > need
> > > > >> of polish (working on it).
> > > > >>
> > > > >> Dump any questions here and I'll be happy to respond. If you need
> > more
> > > > time
> > > > >> to review, just shout.
> > > > >>
> > > > >> Thanks and thanks to all who contributed to this feature; the
> > > reviewers
> > > > and
> > > > >> the testers in particular.
> > > > >>
> > > > >> S
> > > > >>
> > > > >> 1. http://hbase.apache.org/book.html#_asnyc_wal_replication
> > > > >> 2.
> > > > >>
> > > > >>
> > > >
> > >
> >
> https://docs.google.com/document/d/1jJWVc-idHhhgL4KDRpjMsQJKCl_NRaCLGiH3Wqwd3O8/edit#
> > > > >> This patch is currently missing HBASE-25280, a bug found in
> testing.
> > > > >> 3. https://github.com/apache/hbase/pull/2643
> > > > >>
> > > >
> > >
> >
>

Reply via email to