[GitHub] [hbase] huaxiangsun commented on pull request #2570: HBASE-25126 Add load balance logic in hbase-client to distribute read…

2020-11-03 Thread GitBox


huaxiangsun commented on pull request #2570:
URL: https://github.com/apache/hbase/pull/2570#issuecomment-720797426


   This pull request has been replaced by a new one against the master branch; I will do a backport once the master patch is reviewed and merged.







[GitHub] [hbase] huaxiangsun commented on pull request #2570: HBASE-25126 Add load balance logic in hbase-client to distribute read…

2020-10-25 Thread GitBox


huaxiangsun commented on pull request #2570:
URL: https://github.com/apache/hbase/pull/2570#issuecomment-716266724


   I will post a new patch based on master. The major change in the master branch is that the meta replica count is now part of the meta table, so the tests and selector initialization need to be changed accordingly.







[GitHub] [hbase] huaxiangsun commented on pull request #2570: HBASE-25126 Add load balance logic in hbase-client to distribute read…

2020-10-22 Thread GitBox


huaxiangsun commented on pull request #2570:
URL: https://github.com/apache/hbase/pull/2570#issuecomment-714760677


   > > Do you think the "fallback to primary" logic needs to be passed down from the rpc retrying caller? Then it would need to be aware of this feature and maintain some state, which I was trying to avoid. Yeah, this part can be optimized later.
   > 
   > If you want to make a decision based on whether the previously returned location is stale, then you need to pass something down from the rpc retrying caller to the selector, as only in the rpc retrying caller can we get the exception we want: either by adding a parameter as I proposed above to not go to secondary replicas, or, as in your current solution, by adding a stale cache in the selector, which requires passing down the exception and the related location.
   > My concern here is that the current 'simple' algorithm may not work well for some cases, as I mentioned above. I suggest that we remove this part of the code from the patch here (just do random selection, without other considerations) to get this patch in quickly, and then open another issue to discuss a suitable algorithm, or maybe multiple algorithms.
   > 
   
   Thanks, Duo.
   
   I am still not 100% clear here, so let me explain my understanding of your proposal before moving forward.
   1. My current understanding of the rpc retrying caller and AsyncRegionLocator:
   
   If a location is stale, the rpc retrying caller detects it and passes the exception down to AsyncRegionLocator (AsyncNonMetaRegionLocator), which updates/cleans up the table cache accordingly.
   
   2. On the next retry, if there is no table cache entry for the key, it calls locateInMeta() to fetch the location from meta.
   
   In the above flow, the rpc retrying caller layer is the one detecting the stale location, and it is up to the AsyncRegionLocator layer to handle all the cleanup logic.
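
   To make the flow concrete, here is a minimal sketch using simplified, hypothetical types (not the real HBase classes):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model of the flow above: the retrying caller
// reports a stale location, the locator evicts it from the table cache, and
// the next lookup misses the cache and falls through to a meta lookup.
public class SimplifiedLocatorFlow {

  /** Stand-in for a cached region location (not the real HBase class). */
  record Location(String server) {}

  private final Map<String, Location> tableCache = new ConcurrentHashMap<>();

  /** Called by the rpc retrying caller when an RPC fails on a stale location. */
  void updateCachedLocationOnError(String rowKey, Throwable error) {
    // The real AsyncNonMetaRegionLocator inspects the exception type;
    // this sketch simply evicts the cached entry.
    tableCache.remove(rowKey);
  }

  /** Called on the next retry: a cache miss falls through to a meta lookup. */
  Location getRegionLocation(String rowKey) {
    return tableCache.computeIfAbsent(rowKey, this::locateInMeta);
  }

  private Location locateInMeta(String rowKey) {
    // Placeholder for the real meta table scan.
    return new Location("server-for-" + rowKey);
  }
}
```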
   
   Similarly, the meta selector interface follows the same pattern: it builds the state it needs to make decisions inside itself.
   
   As for a different algorithm, i.e. the case you described above, here is a concrete example: say there are 5 meta replicas; if the location fetched from meta replica 1 is stale, the client may try another meta replica. We can have a different implementation of the selector, AdvancedSelector, which implements such an algorithm based on the state maintained in the selector, i.e. the staleCache entry carries info to indicate fromMetaReplicaId.
   
   I thought the selector interface was designed for this purpose; maybe I am missing something here.
   
   In the initial implementation, I kept fromMetaReplica in the tableCache entry. When the patch went out for review, I dropped this change, as the current simple algorithm really does not need this info. The selector interface keeps fromMetaReplicaId for future enhancement, so that when it is needed, at least the interface does not have to change.
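
   To illustrate, a hedged sketch of the kind of selector interface I mean (the names are illustrative, not the actual patch):

```java
// Hypothetical selector abstraction: the selector owns whatever state it
// needs (e.g. a stale cache keyed by the replica that served a location), so
// the rpc retrying caller stays unaware of the feature.
interface CatalogReplicaSelector {

  /** Choose which meta replica the next location lookup should go to. */
  int selectReplica(String tableName, byte[] row);

  /**
   * Feedback from the locator: the location served by fromMetaReplicaId
   * turned out to be stale. An AdvancedSelector could avoid that replica for
   * a while; the current simple random selector can ignore this information.
   */
  void onStaleLocation(String tableName, byte[] row, int fromMetaReplicaId);
}
```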
   
   I can drop fromMetaReplicaId from the current simple implementation, as it is not needed there anyway.
   
   Maybe you were referring to adding a flag to the locateInMeta() method? If the flag says to go with the primary only, it will bypass all this logic and go only to the primary region, giving the upper layer more control.
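
   Something like this purely hypothetical signature:

```java
// Hypothetical flag on the locate call: when primaryOnly is true the locator
// bypasses replica selection and always reads the primary meta region.
interface MetaLocator {

  /** Stand-in for a region location. */
  record Location(String server) {}

  Location locateInMeta(String tableName, byte[] row, boolean primaryOnly);
}
```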
   
   Please share more thoughts, thanks.
   
   







[GitHub] [hbase] huaxiangsun commented on pull request #2570: HBASE-25126 Add load balance logic in hbase-client to distribute read…

2020-10-21 Thread GitBox


huaxiangsun commented on pull request #2570:
URL: https://github.com/apache/hbase/pull/2570#issuecomment-713833194


   Forgot to mention that for the patch against master, the only nit is the test cases: there is no meta replication support in master yet, so I will check the tests and see how many of them can be ported.







[GitHub] [hbase] huaxiangsun commented on pull request #2570: HBASE-25126 Add load balance logic in hbase-client to distribute read…

2020-10-21 Thread GitBox


huaxiangsun commented on pull request #2570:
URL: https://github.com/apache/hbase/pull/2570#issuecomment-713829326


   > On a high-level design note, we used to have a `hbase.meta.replicas.use` config to control whether to make use of meta replicas at the client side. Do we want to deprecate this config and let our new configs rule here? Just asking.
   > 
   
   For me, the original thought was that `hbase.meta.replicas.use` is an existing config, so it still needs to be honored. A new "mode" config is added to support LoadBalance (new) and HighlyAvailable (the existing mode). If we want to deprecate the old config, then that makes sense.
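
   To illustrate the idea (the mode key and its values below are placeholders from this discussion, not a settled API):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Illustrative client configuration: hbase.meta.replicas.use is the existing
// flag; the mode key below is a hypothetical name for the new config.
public class MetaReplicaModeExample {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Existing switch: let the client read from meta replicas.
    conf.setBoolean("hbase.meta.replicas.use", true);
    // Proposed mode switch: LoadBalance spreads meta reads across replicas;
    // HighlyAvailable keeps today's behavior.
    conf.set("hbase.client.meta.replica.mode", "LoadBalance");
  }
}
```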
   
   > For me, I think the scope of `hbase.meta.replicas.use` is too wide, as it will impact all the client operations on meta (not sure if we have hacked all the places, but at least it is designed to). After reviewing the PR, I think our approach here is only for the locator-related logic, right? This is also what I expect. In general, I think there are 3 ways we access the meta table at the client side:
   > 
   > 1. Locator-related logic. This is the most critical path at the client side and also puts the most pressure on the meta table.
   > 2. Admin-related logic. We have delegated most of the operations through master, but there are still some places where we access meta directly. Admin operations are expected to be low frequency, so I do not think we need to deal with them here.
   > 3. Users accessing the meta table directly on their own. This is controlled by user-written code, so I do not think we need to deal with it either; users should take care of it on their own.
   
   Agreed.
   
   > 
   > I think only 1 is what we really care about here, so I suggest that we narrow the scope of the newly introduced configs to be locator-only (maybe by adding a 'locator' keyword to the config name), and consider it before `hbase.meta.replicas.use` in the locator-related logic. Then users do not need to set `hbase.meta.replicas.use` to true to enable this feature; they just select the LB mode. If the new config is disabled (set to None, maybe), then we honor the old `hbase.meta.replicas.use` config.
   
   Ok.
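
   As a minimal sketch of that precedence rule (the config key and enum names are hypothetical):

```java
import org.apache.hadoop.conf.Configuration;

// The locator-scoped mode is consulted first; only when it is None does the
// legacy hbase.meta.replicas.use flag take effect.
public class LocatorModeResolver {
  enum Mode { None, HighlyAvailable, LoadBalance }

  static Mode resolveMode(Configuration conf) {
    Mode mode = Mode.valueOf(conf.get("hbase.locator.meta.replicas.mode", "None"));
    if (mode == Mode.None && conf.getBoolean("hbase.meta.replicas.use", false)) {
      // The legacy flag maps to today's highly-available behavior.
      mode = Mode.HighlyAvailable;
    }
    return mode;
  }
}
```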
   
   > 
   > In general, I think the abstraction and trick here are both good. Setting the replica id directly on Query is a straightforward way to achieve our goal here, and the chooser or selector is also a good abstraction. The only concern is how to implement the 'fallback to primary' logic, as we need to pass the actual exception type down from the rpc retrying caller; anyway, this can be done later.
   
   Do you think the "fallback to primary" logic needs to be passed down from the rpc retrying caller? Then it would need to be aware of this feature and maintain some state, which I was trying to avoid. Yeah, this part can be optimized later.
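
   For reference, the replica-id trick quoted above looks roughly like this; the helper and row key are illustrative, while setReplicaId() is the existing method on Query:

```java
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.util.Bytes;

// Pin a read to a specific replica by setting the replica id directly on the
// Query (Get extends Query), instead of threading the choice through callers.
public class ReplicaPinnedRead {
  static Get metaGetFromReplica(byte[] metaRowKey, int replicaId) {
    Get get = new Get(metaRowKey);
    get.setReplicaId(replicaId); // e.g. 1 targets the first secondary replica
    return get;
  }

  public static void main(String[] args) {
    Get get = metaGetFromReplica(Bytes.toBytes("test,,1"), 1);
    System.out.println("replica id = " + get.getReplicaId());
  }
}
```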
   
   > 
   > And I suggest we just make this PR against the master branch; it is only a client-side change, and whether we have implemented meta region replication should not impact the code. The reason I suggest this is that the code for master and branch-2 will be different: on branch-2, the sync client has its own logic to do region locating, which is not well constructed, I believe (we expose a bunch of locating methods in the ClusterConnection interface and use them everywhere). So if we want to include this feature in 2.4.0, we'd better make this PR against master, and also backport it to branch-2 ASAP.
   
   Yeah, the client-side change is quite independent. It will work with today's meta replicas; the only nit is that there will be more stale data.
   I will put up a new patch against master. The implementation will be renamed to Catalog instead of meta, as it can be applied to similar location lookup services. The config will remain the same, as here it applies to meta only.
   
   Thanks for the feedback, Duo.
   
   


