On Sun, Nov 15, 2020 at 11:20 PM 张铎(Duo Zhang) <palomino...@gmail.com> wrote:
> So what is your purpose of distributing the request of region location > lookup? It is just because you want to 'distribute the request of region > location lookup'? > > Then I'm -1 on merging. We should reach an agreement on what we want to > solve before merging at least. > > HERE.1 > I've helped this issue from the design doc step. For me, the purpose for > this issue is clear. We want to prevent the hotspot of meta, so the > solution is simple, enable meta replica, and then just modify the client to > not always go to primary replica first(this is the old behavior even with > meta replica feature on). > And this will introduce another problem that, there is no meta region > replication implementation for meta read replicas, which means the latency > will be large as we can only sync the data between replicas through region > flush, so we implement meta region replication. > > So I think it is very important to verify that we have truly distributed > the request of region location lookup, and also make sure that we could > support more requests of region location lookup. Otherwise this feature is > useless. > > And I agree with Andrew that, since the feature is default off on branch-2 > and has no regression, it is OK to merge for now. Theoretically our > approach here should work, so even it does not work for now, I think we > could fix the problems to make it work. > > HERE.2 I agree with all of the above between HERE.1 and HERE.2 (except the suggestion that the old behavior of read replicas is that they went to the replica first; they go to the primary first -- see [1], [2]). Lets work with any misalignment of understanding/communication offline and not in the way of merge. Thanks, S 1. http://hbase.apache.org/book.html#_timeline_consistency "In case a read is performed with Consistency.TIMELINE, then the read RPC will be sent to the primary region server first." 2. https://github.com/apache/hbase/blob/branch-2/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ScannerCallableWithReplicas.java#L195 > But your reply above made me wonder whether we are talking about the same > thing. That's why I'm -1 here. I'm not going to force you to do the test > suggested by me, as I said it could be done after merging, just want to > reach an agreement on the goal of this feature. > > Thanks. > > Stack <st...@duboce.net> 于2020年11月16日周一 下午12:35写道: > > > On Sun, Nov 15, 2020 at 9:16 AM Andrew Purtell <andrew.purt...@gmail.com > > > > wrote: > > > > > I agree with Duo’s comment that a performance gain is unlikely but > would > > > be orthogonal anyway; > > > > > > Perf observation is just an aside in the issue. Perf is orthogonal as you > > say above (as long as no regression). > > > > > > > > > it’s an availability gain that is the goal. We can assume it based on > > > theory of operation and unit test results but the gain should be tested > > and > > > measured on a cluster too. > > > > > > > > > The feature is about distributing load on hbase:meta to alleviate > > hotspotting; it makes read replicas more live so replicas are more likely > > to satisfy location lookups making read replicas more effective. That > read > > replicas improve HA is presumed -- it was the original justification for > > this years old commit -- but HA is not the focus of this addition; hence > no > > reports on effectiveness in this area. > > > > I have no problem working on such tests/reports but suggest that they are > > done post merge. > > > > > > > > > That said, the results of the testing thus far indicate no regression, > > > which gives me confidence to support a merge. Specifically, a merge to > > > “unblock” 2.4 (we aren’t really blocked, we are waiting), provided the > > > default there is the feature is configured off. But please indicate in > > > documentation and release notes that the feature is not widely tested > > yet - > > > as is customarily done for new functionality like this. > > > > > > > > No problem w/ flagging the feature as new. > > > > Thanks, > > S > > > > > > > > > > > > > On Nov 15, 2020, at 5:20 AM, 张铎 <palomino...@gmail.com> wrote: > > > > > > > > Replied on jira, I think we missed an important scenario when > testing. > > > > > > > > Thanks. > > > > > > > > Stack <st...@duboce.net> 于2020年11月15日周日 上午2:30写道: > > > > > > > >> HBASE-18070 makes it so hbase:meta read replicas can run closer to > the > > > >> primary, (< second lags rather than minutes). It adds Async WAL > > > >> Replication[1] on the hbase:meta table; i.e. edits are sprayed > across > > > >> replicas as they arrive at the primary's WAL. Before this work, > Async > > > WAL > > > >> Replication was only available on user-space tables and the only > > option > > > for > > > >> hbase:meta read replicas was reloading the primaries hfiles on a > > period > > > >> (minutes). HBASE-18070 also adds an optional client-side > 'LoadBalance' > > > >> policy that favors read replicas ahead of primary reads falling back > > to > > > the > > > >> primary on fault. Together, these additions allow distributing > > > hbase:meta > > > >> read load across primary and replicas alleviating 'hotspotting'. > > > >> > > > >> I would like to merge the feature to master branch Monday evening if > > > there > > > >> is no objection (Soon after I'll merge to branch-2 so this feature > can > > > >> hopefully be included in the upcoming 2.4.0RC). > > > >> > > > >> * For the design, see [2]. > > > >> * For an amalgamated PR of the 5 or 6 reviewed PRs that comprise > this > > > >> feature, see [3]. > > > >> * For a PE report that compared performance before and after, see > > > >> HBASE-25127 (no regression). > > > >> * A report on ITBLL runs is pending to be attached to HBASE-18070 > but > > > runs > > > >> so far show no regression with the feature enabled (ITBLL runs were > > done > > > >> against a backport of this feature to branch-2 as the ITBLL state of > > > master > > > >> is currently an unknown). > > > >> > > > >> Testing continues mainly looking for further improvement and to > better > > > >> understand this feature in operation. Documentation is included but > in > > > need > > > >> of polish (working on it). > > > >> > > > >> Dump any questions here and I'll be happy to respond. If you need > more > > > time > > > >> to review, just shout. > > > >> > > > >> Thanks and thanks to all who contributed to this feature; the > > reviewers > > > and > > > >> the testers in particular. > > > >> > > > >> S > > > >> > > > >> 1. http://hbase.apache.org/book.html#_asnyc_wal_replication > > > >> 2. > > > >> > > > >> > > > > > > https://docs.google.com/document/d/1jJWVc-idHhhgL4KDRpjMsQJKCl_NRaCLGiH3Wqwd3O8/edit# > > > >> This patch is currently missing HBASE-25280, a bug found in testing. > > > >> 3. https://github.com/apache/hbase/pull/2643 > > > >> > > > > > >