Checked in a small test app that allows this stuff to be taken for a
spin.
http://svn.apache.org/repos/asf/geronimo/sandbox/failover/
Hopefully we can use it as a tool to start getting some feedback.
Some of the parts of it are getting reworked, but should be runnable
soon.
-David
On Sep 12, 2008, at 5:43 PM, David Blevins wrote:
I've added some functionality to OpenEJB trunk which has been
enabled in Geronimo trunk. Here's an overview of how it works:
DISCOVERY
What we have going on from a tech perspective is each server sends
and receives a multicast heartbeat. Each multicast packet contains
a single URI that advertises a service, its group, and its
location. Say for example "cluster1:ejb:ejbd://thehost:4201". We
can definitely explore the SLP format as Alan suggests.
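To make the heartbeat format concrete, here is a minimal sketch of pulling the three parts (group, service type, location) out of such a URI. The class and method names are illustrative stand-ins, not OpenEJB's actual API:

```java
// Parses a discovery heartbeat URI of the form "group:service:location",
// e.g. "cluster1:ejb:ejbd://thehost:4201". Illustrative only.
public class HeartbeatUri {
    public final String group;    // cluster group, e.g. "cluster1"
    public final String service;  // service type, e.g. "ejb"
    public final String location; // connectable endpoint, e.g. "ejbd://thehost:4201"

    public HeartbeatUri(String group, String service, String location) {
        this.group = group;
        this.service = service;
        this.location = location;
    }

    public static HeartbeatUri parse(String uri) {
        // Split on the first two colons only; the location itself contains colons.
        int first = uri.indexOf(':');
        int second = uri.indexOf(':', first + 1);
        if (first < 0 || second < 0) {
            throw new IllegalArgumentException("Expected group:service:location, got: " + uri);
        }
        return new HeartbeatUri(
                uri.substring(0, first),
                uri.substring(first + 1, second),
                uri.substring(second + 1));
    }
}
```

Because the whole advertisement fits in one small string, a single multicast packet carries everything a peer needs to know.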
There are other advantages of the simple, unchanging, URI style.
The URI is essentially stateless as there is no "i'm alive" URI or
an "i'm dead" URI, there is simply a URI for each service a server
offers and its presence on the network indicates its availability
and its absence indicates the service is no longer available. In
this way the issues with UDP being unordered and unreliable melt
away as state is no longer a concern and packet sizes are always
small. Complicated libraries that ride atop UDP and attempt to
offer reliability (retransmission) and ordering on UDP can be
avoided. UDP/Multicast is only used for discovery and from there on
out critical information is transmitted over TCP/IP which is
obviously going to do a better job at ensuring reliability and
ordering.
On the client side of things, a special "multicast://" URL can be
used in the InitialContext properties to signify that multicast
should be used to seed the connection process. Such as:
  Properties properties = new Properties();
  properties.setProperty(Context.INITIAL_CONTEXT_FACTORY,
      "org.apache.openejb.client.RemoteInitialContextFactory");
  properties.setProperty(Context.PROVIDER_URL,
      "multicast://239.255.2.3:6142");
  InitialContext remoteContext = new InitialContext(properties);
The URL has optional query parameters such as "schemes" and "group"
and "timeout" which allow you to zero in on a particular type of
service of a particular cluster group as well as set how long you
are willing to wait in the discovery process till finally giving
up. The first matching service that it sees "flowing" around on the
UDP stream is the one it picks and sticks to for that and subsequent
requests, ensuring UDP is only used when there are no other servers
to talk to.
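As a sketch of how those query options might be picked apart on the client side, the standard java.net.URI class already handles the "multicast://" scheme's host, port, and query string. The option names (schemes, group, timeout) come from the description above; the parser itself is illustrative, not the actual client code:

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: split a multicast PROVIDER_URL's query string
// into its options, e.g. schemes, group, timeout.
public class MulticastUrl {
    public static Map<String, String> options(String url) {
        URI uri = URI.create(url);
        Map<String, String> opts = new HashMap<>();
        if (uri.getQuery() != null) {
            for (String pair : uri.getQuery().split("&")) {
                int eq = pair.indexOf('=');
                opts.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return opts;
    }
}
```

So a PROVIDER_URL like "multicast://239.255.2.3:6142?schemes=ejbd&group=cluster1&timeout=5000" would narrow discovery to ejbd services in cluster1, giving up after five seconds.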
FAILOVER
On each request to the server, the client sends the version number
associated with the list of servers in the cluster it is aware of.
Initially this version will be zero and the list will be empty.
Only when the server sees the client has an old list will the server
send the updated list. This is an important distinction as the list
(ClusterMetaData) is not transmitted back and forth on every
request, only on change. If the membership of the cluster is stable
there is essentially no clustering overhead to the protocol -- 8
byte overhead to each request and 1 byte on each response -- so you
will *not* see an exponential slowdown in response times the more
members are added to the cluster. This new list takes effect for
all proxies that share the same ServerMetaData data. Internally we
key the ClusterMetaData by ServerMetaData. I originally had the
version be a simple "increment by one" strategy, but eventually went
with the value of System.currentTimeMillis(), for two reasons.
First, it's possible more than one server is reachable via the
ServerMetaData (i.e. multicast://) and each server has its own list
and version number. Secondly, if a server were restarted, an
incrementing version number would go back to zero and the client
could be stuck thinking it has a more current list than the server.
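The version-gated exchange can be sketched as follows. ClusterDirectory and its method names are hypothetical stand-ins for the real ClusterMetaData handling; the point is that nothing rides on the wire when the client is already current:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the version-gated list exchange: the server only ships the
// cluster list when the client's version is stale. Names are hypothetical.
public class ClusterDirectory {
    private List<String> servers = new ArrayList<>();
    private long version = 0; // 0 = empty initial list, matching a new client

    public synchronized void update(List<String> newServers) {
        servers = new ArrayList<>(newServers);
        // currentTimeMillis survives restarts better than increment-by-one:
        // a restarted server won't hand out a version lower than before.
        version = System.currentTimeMillis();
    }

    // Returns the list only if the client's copy is out of date; null means
    // "you're current", so a stable cluster adds almost nothing per request.
    public synchronized List<String> updateIfStale(long clientVersion) {
        return clientVersion == version ? null : new ArrayList<>(servers);
    }

    public synchronized long version() {
        return version;
    }
}
```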
When a server shuts down, further connections are refused, existing
connections not in mid-request are closed, and any remaining
connections are closed immediately after completion of the request in
progress, so clients can fail over gracefully to the next server in
the list. If a server crashes, requests are retried on the next
server in the list. This failover pattern is followed until there are no more
servers in the list at which point the client attempts a final
multicast search (if it was created with a multicast PROVIDER_URL)
before abandoning the request and throwing an exception to the
caller. Currently, the failover is ordered but could very easily be
made random. The multicast discovery aspect of the client adds a
nice randomness to the selection of the first server that is perhaps
somewhat "just". Theoretically, servers that are under more load
will send out fewer heartbeats than servers with no load. This may
not happen as theory dictates, but certainly as we get more ejb
statistic data wired into the server functionality we can pursue
deliberate heartbeat throttling techniques that might make that
theory really sing in practice.
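The retry pattern described above can be sketched as a simple loop. The Invoker interface is an illustrative stand-in for the client's request machinery, not the OpenEJB client API:

```java
import java.util.List;

// Sketch of the ordered failover loop: try each known server in turn,
// then give up. Invoker is a hypothetical stand-in for the real client.
public class Failover {
    interface Invoker {
        String invoke(String server) throws Exception;
    }

    public static String request(List<String> servers, Invoker invoker) throws Exception {
        Exception last = null;
        for (String server : servers) {
            try {
                return invoker.invoke(server); // success: stick with this server
            } catch (Exception e) {
                last = e; // refused or crashed: retry on the next in the list
            }
        }
        // A client created with a multicast PROVIDER_URL would attempt one
        // final multicast search here before abandoning the request.
        throw new Exception("no servers available", last);
    }
}
```

Making the failover random instead of ordered would amount to shuffling the list before the loop.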
GERONIMO
On the G side of things, the multicast functionality has been copied
into Geronimo. Still need to get it updated to the latest changes.
We'll eventually want OpenEJB getting notifications from the
Geronimo version instead of using its own. Once that is done we
can remove the dep on the openejb-multicast jar. For the moment I
just tucked the multicast server implementation into the
EjbDaemonGBean as a temporary solution. A tricky thing is that when
we get that set up as its own server component, we won't want the
port offset and the hostname to affect the multicast host and port.
The combination of the multicast host and port essentially creates a
"topic" that all members of the network can listen to and write
messages to. So any servers that are in the same cluster will need
to listen on the same host/port.
We could really use a GUI for this stuff too. Is there anyone out
there with a few spare cycles who wants to write up a trivial little
"show me the servers on the cluster" kind of portlet for the console?
-David