If there are no big concerns here, I will go with #1 first, as it
solves the problem at least for new versions of HBase, and include
the fix in 2.6.0.

We can implement other approaches later as improvements.

Thanks.

张铎(Duo Zhang) <palomino...@gmail.com> wrote on Thu, Feb 8, 2024 at 23:05:
>
> Some background first.
> HBASE-28321 is for solving the problem where master and region server
> both implement ClientMetaService but may use different server
> principals. In our current client implementation, we can only
> configure one principal pattern, which means the client can connect
> to either the master or the region server, but not to both.
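>
> For illustration, a minimal sketch of the limitation; the config key
> and the resolver class here are assumptions for the example, not the
> actual HBase client code (SecurityUtil.getServerPrincipal is the
> standard Hadoop helper for expanding a principal pattern):
>
>   import java.io.IOException;
>   import java.net.InetAddress;
>   import org.apache.hadoop.conf.Configuration;
>   import org.apache.hadoop.security.SecurityUtil;
>
>   public final class PrincipalResolver {
>     // Hypothetical config key: only ONE pattern can be configured for
>     // ClientMetaService, even though two different principals may be
>     // needed (one for master, one for region server).
>     private static final String KEY = "hbase.client.meta.kerberos.principal";
>
>     public static String resolve(Configuration conf, InetAddress addr)
>         throws IOException {
>       // Whichever of master/region server does not match this single
>       // pattern will fail SASL negotiation.
>       return SecurityUtil.getServerPrincipal(conf.get(KEY), addr);
>     }
>   }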
>
> In the design doc[1], we described a way to deal with the problem:
> send a special preamble header to the rpc server so the server can
> tell us the correct server principal. We also described a fallback
> logic: if we receive a FatalConnectionException complaining about an
> unexpected header, we know the remote side is an old server and can
> then randomly choose a server principal to connect with. A minimal
> sketch of this negotiation follows.
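>
> All names in this sketch (PreambleChannel, sendSecurityPreamble,
> readServerPrincipal) are hypothetical, and the exception class is a
> stand-in for the real HBase one; this is just the shape of the logic:
>
>   import java.io.IOException;
>   import java.util.List;
>   import java.util.concurrent.ThreadLocalRandom;
>
>   // Hypothetical abstraction over the rpc connection.
>   interface PreambleChannel {
>     void sendSecurityPreamble() throws IOException;   // special header
>     String readServerPrincipal() throws IOException;  // new server's reply
>   }
>
>   // Stand-in for org.apache.hadoop.hbase.exceptions.FatalConnectionException.
>   class FatalConnectionException extends IOException {}
>
>   final class PrincipalNegotiation {
>     static String negotiate(PreambleChannel ch, List<String> candidates)
>         throws IOException {
>       try {
>         ch.sendSecurityPreamble();
>         // New server: it understands the preamble and replies with
>         // the principal we should use.
>         return ch.readServerPrincipal();
>       } catch (FatalConnectionException e) {
>         // Old server: it rejects the unexpected header, so fall back
>         // to a randomly chosen candidate principal.
>         return candidates.get(
>           ThreadLocalRandom.current().nextInt(candidates.size()));
>       }
>     }
>   }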
>
> But while implementing this, I found that the fallback logic is not
> easy. In our current implementation, when the server sends a
> FatalConnectionException back, we use this exception to fail all the
> pending rpc calls. And even if we remove this logic, the server will
> still close the connection, which also causes all the pending rpc
> calls to fail. The sketch below shows the conflict.
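>
> To make the conflict concrete (all names here are hypothetical, not
> the real connection code):
>
>   import java.io.IOException;
>   import java.util.ArrayList;
>   import java.util.List;
>
>   final class ConnectionFailurePath {
>     static final class Call {
>       IOException error;
>       void setException(IOException e) { error = e; }
>     }
>
>     private final List<Call> pendingCalls = new ArrayList<>();
>
>     // The exception that should only mean "old server, fall back to
>     // a random principal" currently fails every outstanding rpc on
>     // the connection...
>     void onFatalConnectionException(IOException e) {
>       for (Call call : pendingCalls) {
>         call.setException(e);
>       }
>       pendingCalls.clear();
>       // ...and even without this loop, the server closes the socket,
>       // which fails the pending calls anyway.
>     }
>   }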
>
> In general, I think there are 4 ways to deal with this problem (a
> rough sketch of #2 and #3 follows the list).
>
> 1. Let it go. Even with the fallback logic, the connection could
> still fail if we chose the wrong server principal on the client
> side, and the feature is completely broken between an old client and
> an old server in this scenario anyway; at least we have fixed it for
> a new client and a new server. And our compatibility guide does not
> guarantee compatibility between a new client and an old server.
> 2. Set a flag in the RpcConnection instance; when the upper layer
> issues a retry, we skip the security preamble call and just randomly
> select a server principal to use.
> 3. Building on #2, issue a special exception for this failure, and
> in AbstractRpcClient, do not finish the stub call with this
> exception; instead, issue a new call so the retry logic is hidden
> from the upper layer.
> 4. Retry at the rpc connection level.
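>
> A rough sketch of how #2 and #3 might fit together, reusing the
> hypothetical PreambleChannel and FatalConnectionException from the
> sketch above; none of these names are the real RpcConnection or
> AbstractRpcClient code:
>
>   import java.io.IOException;
>   import java.util.List;
>   import java.util.concurrent.ThreadLocalRandom;
>
>   // #3: a dedicated exception type, so the rpc client can recognize
>   // this particular failure and re-issue the call itself instead of
>   // completing the stub call exceptionally.
>   class SecurityPreambleFailedException extends IOException {}
>
>   final class RetryingConnection {
>     private final PreambleChannel channel;
>     // #2: once the preamble probe has failed, remember it, so the
>     // next attempt skips the probe and picks a principal at random.
>     private volatile boolean skipSecurityPreamble = false;
>
>     RetryingConnection(PreambleChannel channel) { this.channel = channel; }
>
>     String selectPrincipal(List<String> candidates) throws IOException {
>       if (skipSecurityPreamble) {
>         return candidates.get(
>           ThreadLocalRandom.current().nextInt(candidates.size()));
>       }
>       try {
>         channel.sendSecurityPreamble();
>         return channel.readServerPrincipal();
>       } catch (FatalConnectionException e) {
>         skipSecurityPreamble = true;
>         throw new SecurityPreambleFailedException();
>       }
>     }
>   }
>
> For #3, AbstractRpcClient would then catch
> SecurityPreambleFailedException and issue a fresh call on the same
> connection instead of failing the stub call, so the upper layer
> never sees the retry.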
>
> For #1, we do not need to do anything special.
> For #2 and #3, we need to do more hacky work, but I can still
> imagine how to achieve this in our code base.
> For #4, I do not have ideas on how to achieve this yet...
>
> Thoughts? Thanks.
>
>
> 1. 
> https://docs.google.com/document/d/1Cu-qzAdBGyBKM07aQP06RM0oeFSLPGtQFWuV_TDyBNg/edit?usp=sharing
