[ https://issues.apache.org/jira/browse/HBASE-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13576641#comment-13576641 ]
nkeywal commented on HBASE-7590:
--------------------------------

The current patch shows the work in progress. All tests pass, with or without multicast activated. It also works on a real cluster.

I still have some work to do:
- I've hijacked the current ClusterStatus protobuf; I'm going to create a specific one.
- I need to do some cleanup around ServerName & ServerCallable.
- Plus various other things.

> Add a costless notifications mechanism from master to regionservers & clients
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-7590
>                 URL: https://issues.apache.org/jira/browse/HBASE-7590
>             Project: HBase
>          Issue Type: Bug
>          Components: Client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>         Attachments: 7590.inprogress.patch
>
>
> It would be very useful to add a mechanism to distribute some information to
> the clients and regionservers. In particular, it would be useful to know
> globally (regionservers + client apps) that some regionservers are dead. This
> would allow:
> - to lower the load on the system, without clients using stale information
> and going to dead machines
> - to make the recovery faster from a client point of view. It's common to use
> large timeouts on the client side, so the client may need a lot of time
> before declaring a region server dead and trying another one. If the client
> receives the information about a region server's state separately, it can
> make the right decision, and continue or stop waiting accordingly.
> We can also send more information, for example instructions like 'slow down'
> to instruct the client to increase the retry delay and so on.
> Technically, the master could send this information. To lower the load on
> the system, we should:
> - have a multicast communication (i.e. the master does not have to connect to
> all servers by tcp), with one packet every 10 seconds or so.
> - receivers should not depend on this: if the information is available,
> great. If not, it should not break anything.
> - it should be optional.
> So in the end we would have a thread in the master sending a protobuf message
> about the dead servers on a multicast socket. If the socket is not
> configured, it does not do anything. On the client side, when we receive a
> notification that a node is dead, we refresh the cache about it.
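
For illustration only, the mechanism described in the issue (a best-effort sender on the master that does nothing when no multicast address is configured, and a client that refreshes its cache when told a server is dead) could be sketched roughly as below. This is not the code from 7590.inprogress.patch: the class and method names are hypothetical, and a newline-separated UTF-8 payload stands in for the protobuf message.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; none of these names come from the HBase patch.
class DeadServerNotificationSketch {

    // Stand-in for the protobuf message (assumption): one server name per line.
    static byte[] encode(List<String> deadServers) {
        return String.join("\n", deadServers).getBytes(StandardCharsets.UTF_8);
    }

    static List<String> decode(byte[] payload, int length) {
        String text = new String(payload, 0, length, StandardCharsets.UTF_8);
        List<String> servers = new ArrayList<>();
        for (String name : text.split("\n")) {
            if (!name.isEmpty()) {
                servers.add(name);
            }
        }
        return servers;
    }

    // Master side: send one datagram to the multicast group.
    // Returns false without doing anything when no address is configured,
    // and also when the send fails, since receivers must not depend on it.
    static boolean publish(String multicastAddress, int port, List<String> deadServers) {
        if (multicastAddress == null || multicastAddress.isEmpty()) {
            return false; // feature is optional: not configured, so no-op
        }
        byte[] payload = encode(deadServers);
        try (DatagramSocket socket = new DatagramSocket()) {
            InetAddress group = InetAddress.getByName(multicastAddress);
            socket.send(new DatagramPacket(payload, payload.length, group, port));
            return true;
        } catch (IOException e) {
            return false; // best effort only
        }
    }

    // Client side: a toy region-to-server location cache.
    static final class LocationCache {
        private final Map<String, String> regionToServer = new ConcurrentHashMap<>();

        void put(String region, String server) {
            regionToServer.put(region, server);
        }

        String get(String region) {
            return regionToServer.get(region);
        }

        // Drop every cached location that points at a dead server, so the
        // next lookup has to be refreshed. Returns the number of entries dropped.
        int onDeadServers(List<String> deadServers) {
            int before = regionToServer.size();
            regionToServer.values().removeIf(deadServers::contains);
            return before - regionToServer.size();
        }
    }
}
```

In a fuller version, the master would presumably call `publish` from a scheduled thread every 10 seconds or so, and clients would feed each packet they receive through `decode` and `onDeadServers`. A client that never receives a packet simply keeps relying on its normal timeouts.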