xkrogen edited a comment on pull request #3976:
URL: https://github.com/apache/hadoop/pull/3976#issuecomment-1044778317


   Quick disclaimer, it has been a while since I have looked at any 
proxy-provider code.
   
   This PR looks like the wrong direction to me. As you mentioned, 
`failoverProxy` is used to service _write requests_, which must be serviced by 
the active NameNode (as opposed to _read requests_, which can be serviced by 
Observer NNs by looking inside `nameNodeProxies`). So, if you want to contact 
the active NN, the right way to do so is to use `failoverProxy`. Note that 
`msync()` is not a special case here -- other write RPCs also require the 
active NN.
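
   To make the split concrete, here is a minimal sketch of the routing I have in 
mind. It is not the actual `ObserverReadProxyProvider` code: `NameNodeStub`, 
`ObserverRoutingSketch`, and the `isRead` flag are hypothetical names for 
illustration, and the real provider also has to handle observer state and 
fallback, which this ignores.

```java
import java.util.List;

/** Hypothetical stand-in for a NameNode RPC endpoint; not a Hadoop type. */
interface NameNodeStub {
    String handle(String call);
}

/** Toy router: reads may go to an observer, everything else goes to the active NN. */
final class ObserverRoutingSketch {
    private final NameNodeStub failoverProxy;          // assumed to always resolve to the active NN
    private final List<NameNodeStub> nameNodeProxies;  // candidate observers for read requests

    ObserverRoutingSketch(NameNodeStub failoverProxy, List<NameNodeStub> nameNodeProxies) {
        this.failoverProxy = failoverProxy;
        this.nameNodeProxies = nameNodeProxies;
    }

    String invoke(String call, boolean isRead) {
        if (isRead && !nameNodeProxies.isEmpty()) {
            // Read path: pick an observer (the first one, for simplicity).
            return nameNodeProxies.get(0).handle(call);
        }
        // Write path (and anything that needs the active): never scan here,
        // just delegate to the failover proxy.
        return failoverProxy.handle(call);
    }
}
```

   The point is that the write path never inspects NameNode state itself; a call 
such as `invoke("msync", false)` simply goes wherever `failoverProxy` sends it.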
   
   You shared logs showing that some standby NNs are contacted before the active is 
found. I guess you are using `ConfiguredFailoverProxyProvider`? In this 
implementation, you list multiple NN addresses, and upon startup, the client 
has no idea which one is active. It has to go through and contact each one 
until it finds one which is active. So it is expected that under normal 
operation you will see logs like the ones you shared, where it contacts standby 
NNs while searching for the active. After it finds the active, then it should 
remain sticky, and so (assuming there are no changes in active NN), you should 
only see those logs when the client first submits an RPC.
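
   Roughly, the behavior I am describing looks like the sketch below. This is a 
simplified model, not the real `ConfiguredFailoverProxyProvider`: `HaNode`, 
`FixedNode`, and `StickyFailoverSketch` are hypothetical, and the standby 
rejection is modeled as a plain `isActive()` check rather than a failed RPC 
followed by a failover.

```java
import java.util.List;

/** Hypothetical NameNode handle; not a Hadoop type. */
interface HaNode {
    String name();
    boolean isActive();
}

/** Node whose HA state never changes, enough for the illustration. */
final class FixedNode implements HaNode {
    private final String name;
    private final boolean active;

    FixedNode(String name, boolean active) {
        this.name = name;
        this.active = active;
    }

    public String name() { return name; }
    public boolean isActive() { return active; }
}

/** Tries the configured nodes in order and remembers the one that worked. */
final class StickyFailoverSketch {
    private final List<HaNode> nodes;
    private int currentIndex = 0;     // sticky: survives across calls
    private int standbyContacts = 0;  // how many times a standby was tried

    StickyFailoverSketch(List<HaNode> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one NameNode is required");
        }
        this.nodes = nodes;
    }

    String invoke(String call) {
        for (int attempts = 0; attempts < nodes.size(); attempts++) {
            HaNode candidate = nodes.get(currentIndex);
            if (candidate.isActive()) {
                return candidate.name() + ":" + call;  // index is left pointing at the active
            }
            standbyContacts++;                                  // this is where a "standby" log line would appear
            currentIndex = (currentIndex + 1) % nodes.size();   // fail over to the next configured address
        }
        throw new IllegalStateException("no active NameNode found");
    }

    int standbyContacts() { return standbyContacts; }
}
```

   With three nodes where only the last is active, the first call contacts two 
standbys and later calls contact none, which matches the log pattern you shared.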
   
   Your new implementation scans through the NameNodes and checks their status 
to find the active, but this seems to break the contract with `failoverProxy`, 
which is what active/standby determination is expected to be delegated to.
   
   If you want to change the active/standby determination, you should change 
the behavior in your `AbstractNNFailoverProxyProvider` (e.g. 
`ConfiguredFailoverProxyProvider`), not the behavior of 
`ObserverReadProxyProvider`, which should only layer _on top of_ the 
`AbstractNNFailoverProxyProvider` to provide the additional Observer NN 
functionality.
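
   As a rough illustration of that layering (hypothetical names throughout; 
`ActiveResolver` stands in for whatever `AbstractNNFailoverProxyProvider` you 
configure, and `LayeredObserverSketch` for the observer layer):

```java
/** Hypothetical base contract: the only place that decides which NN is active. */
interface ActiveResolver {
    String resolveActive();
}

/** Observer layer: adds read routing, delegates every active lookup to the base. */
final class LayeredObserverSketch {
    private final ActiveResolver delegate;  // swap this to change active/standby determination
    private final String observer;          // single observer address, for simplicity

    LayeredObserverSketch(ActiveResolver delegate, String observer) {
        this.delegate = delegate;
        this.observer = observer;
    }

    String targetFor(boolean isRead) {
        // The layer adds only the read path; it has no opinion about which NN is active.
        return isRead ? observer : delegate.resolveActive();
    }
}
```

   Changing how the active is found then means passing a different 
`ActiveResolver`; the observer layer itself stays untouched.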
   
   cc @sunchao @shvachko in case you have any thoughts.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


