Failover
Page edited by David Blevins
Failover

On each request to the server, the client sends the version number of the list of cluster servers it is aware of. Initially this version is zero and the list is empty. The server sends the updated list only when it sees that the client has an old list. This is an important distinction: the list is not transmitted back and forth on every request, only on change. If the membership of the cluster is stable there is essentially no clustering overhead in the protocol (8 bytes on each request and 1 byte on each response), so you will not see an exponential slowdown in response times as more members are added to the cluster. The new list takes effect for all proxies that share the same connection.

When a server shuts down, new connections are refused, existing connections not in mid-request are closed, any remaining connections are closed immediately after the request in progress completes, and clients can fail over gracefully to the next server in the list. If a server crashes, requests are retried on the next server in the list (or as determined by the ConnectionStrategy). This failover pattern is followed until there are no more servers in the list, at which point the client attempts a final multicast search (if it was created with a multicast PROVIDER_URL) before abandoning the request and throwing an exception to the caller.

By default, failover is ordered, but random selection is supported. The multicast discovery aspect of the client adds a nice randomness to the selection of the first server.

Discovery

Each discoverable service has a URI which is broadcast as a heartbeat to other servers in the cluster. This URI advertises the service's type, its cluster group, and its location in the format 'group:type:location', for example "cluster1:ejb:ejbd://thehost:4201".
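As a rough illustration of the 'group:type:location' format, the following sketch splits an advertised URI into its three parts. The class ServiceUri and its methods are hypothetical names chosen for this example and are not part of OpenEJB; the only detail taken from the text is the format itself and the sample URI.

```java
import java.net.URI;

// Hypothetical helper for illustration only; not an OpenEJB class.
final class ServiceUri {
    final String group;
    final String type;
    final URI location;

    private ServiceUri(String group, String type, URI location) {
        this.group = group;
        this.type = type;
        this.location = location;
    }

    // Parses a string of the form 'group:type:location'.
    static ServiceUri parse(String advertised) {
        // Split on the first two colons only, because the location
        // (for example ejbd://thehost:4201) contains colons of its own.
        String[] parts = advertised.split(":", 3);
        if (parts.length != 3 || parts[0].isEmpty() || parts[1].isEmpty() || parts[2].isEmpty()) {
            throw new IllegalArgumentException("Expected group:type:location but got: " + advertised);
        }
        return new ServiceUri(parts[0], parts[1], URI.create(parts[2]));
    }

    // True when the advertised service belongs to the given group.
    boolean inGroup(String otherGroup) {
        return group.equals(otherGroup);
    }
}
```

With the sample URI from the text, ServiceUri.parse("cluster1:ejb:ejbd://thehost:4201") yields the group "cluster1", the type "ejb", and a location whose host is "thehost" and whose port is 4201.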
The URI is sent out repeatedly in a pulse; its presence on the network indicates that the service is available and its absence indicates that the service is no longer available. The sending of this pulse (the heartbeat) can be done via UDP or TCP: multicast and "multipoint" respectively. More on that in the following section. The rate at which the heartbeat is pulsed to the network can be specified via the 'heart_rate' property. The default is 500 milliseconds. This rate is also used when listening for services on the network. If a service goes missing for the duration of 'heart_rate' multiplied by 'max_missed_heartbeats', then the service is considered dead.

The 'group' property, cluster1 in the example, is used to partition the servers on the network into smaller logical clusters. A given server will broadcast all its services with the group prefixed in the URI, and it will ignore any services it sees broadcast that do not share the same group name.

Multicast (UDP)

Multicast is the preferred way to broadcast the heartbeat on the network. The simple technique of broadcasting a non-changing service URI on the network is particularly well suited to multicast. The URI itself is essentially stateless and there is no "I'm alive" URI or "I'm dead" URI.

In this way the issues with UDP being unordered and unreliable melt away, as state is no longer a concern and packet sizes are always small. Complicated libraries that ride atop UDP and attempt to offer reliability (retransmission) and ordering on UDP can be avoided. The advantages UDP has over TCP are also retained, as there are no Java layers attempting to force UDP communication to be more TCP-like. The simple design means UDP/Multicast is only used for discovery, and from there on critical information is transmitted over TCP/IP, which does a better job of ensuring reliability and ordering.
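The timeout rule from the Discovery section ('heart_rate' multiplied by 'max_missed_heartbeats') can be sketched as a small tracker that records when each service URI was last heard. This is a minimal illustration, not the OpenEJB implementation: the class name HeartbeatTracker and its methods are hypothetical, and the value 10 used for 'max_missed_heartbeats' in the usage note is an assumption, since the text does not state a default.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch for illustration only; not an OpenEJB class.
final class HeartbeatTracker {
    private final long timeoutMillis;
    private final Map<String, Long> lastSeen = new HashMap<>();

    HeartbeatTracker(long heartRateMillis, int maxMissedHeartbeats) {
        // A service is considered dead once it has been silent this long.
        this.timeoutMillis = heartRateMillis * maxMissedHeartbeats;
    }

    // Record that the given service URI was just seen on the network.
    void heard(String serviceUri, long nowMillis) {
        lastSeen.put(serviceUri, nowMillis);
    }

    // A service is alive if it was heard less than timeoutMillis ago.
    boolean isAlive(String serviceUri, long nowMillis) {
        Long seen = lastSeen.get(serviceUri);
        return seen != null && nowMillis - seen < timeoutMillis;
    }
}
```

For instance, with the default 'heart_rate' of 500 milliseconds and an assumed 'max_missed_heartbeats' of 10, a service last heard at time 0 is still treated as alive at 4999 milliseconds and as dead from 5000 milliseconds onward. Because the same unchanging URI is simply repeated, a lost or reordered packet only delays the next call to heard() and needs no retransmission logic.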
Multicast Client

The multicast functionality is not just for servers to find each other in a cluster; it can also be used by EJB clients to discover a server. A special "multicast://" URL can be used in the InitialContext properties to signify that multicast should be used to seed the connection process, such as:

import java.util.Properties;
import javax.naming.Context;
import javax.naming.InitialContext;

Properties properties = new Properties();
properties.setProperty(Context.INITIAL_CONTEXT_FACTORY, "org.apache.openejb.client.RemoteInitialContextFactory");
properties.setProperty(Context.PROVIDER_URL, "multicast://239.255.2.3:6142");
InitialContext remoteContext = new InitialContext(properties);
Server Configuration

In the server, the list of initial servers can be specified via the conf/multipoint.properties file like so:
server = org.apache.openejb.server.discovery.MultipointDiscoveryAgent
bind = 127.0.0.1
port = 4212
disabled = false
initialServers = 192.168.1.20:4212, 192.168.1.30:4212, 192.168.1.40:4212
The above configuration shows that the server has port 4212 open for connections from other servers for multipoint communication. The initialServers list should be a comma-separated list of other similar servers on the network. Only one of the servers listed needs to be running when this server starts up; it is not necessary to list every server in the network.

Client Configuration

Configuration in the client is similar, but note that EJB clients do not participate directly in multipoint communication and do not connect to the multipoint port. The server list is simply a list of the regular "ejbd://" URLs that a client normally uses to connect to a server.

import java.util.Properties;
import javax.naming.Context;
import javax.naming.InitialContext;

Properties properties = new Properties();
properties.setProperty(Context.INITIAL_CONTEXT_FACTORY, "org.apache.openejb.client.RemoteInitialContextFactory");
properties.setProperty(Context.PROVIDER_URL, "failover:ejbd://192.168.1.20:4201,ejbd://192.168.1.30:4201,ejbd://192.168.1.40:4201");
InitialContext remoteContext = new InitialContext(properties);

Considerations

Network size

The general disadvantage of this topology is the number of connections required. The number of connections for the network of servers is equal to "(n * n - n) / 2", where n is the number of servers. For example, with 5 servers you need 10 connections, with 10 servers you need 45 connections, and with 50 servers you need 1225 connections. This is of course the number of connections across the entire network; each individual server only needs "n - 1" connections.

The handling of these sockets is all asynchronous Java NIO code, which allows the server to handle many connections (all of them) with one thread. From a pure threading perspective, this option is extremely efficient, with just one thread to listen and broadcast to many peers.
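The arithmetic in the Network size section can be checked with a few lines of Java. The class and method names below are made up for this example; only the formulas "(n * n - n) / 2" and "n - 1" come from the text.

```java
// Hypothetical helper for illustration only; not an OpenEJB class.
final class MeshConnections {

    // Total connections across a fully connected network of n servers.
    static long total(long servers) {
        requireAtLeastOne(servers);
        return (servers * servers - servers) / 2;
    }

    // Connections each individual server has to maintain.
    static long perServer(long servers) {
        requireAtLeastOne(servers);
        return servers - 1;
    }

    private static void requireAtLeastOne(long servers) {
        if (servers < 1) {
            throw new IllegalArgumentException("Need at least one server, got " + servers);
        }
    }
}
```

MeshConnections.total(5), total(10), and total(50) give 10, 45, and 1225, matching the figures above, while perServer(50) is 49. The total grows quadratically, but the load on any single server grows only linearly.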
Double connect

It is possible in this process that two servers learn of each other at the same time and each attempts to connect to the other simultaneously, resulting in two connections between the same two servers. When this happens both servers will detect the extra connection; one of the connections will be dropped and one will be kept. In practice this race condition rarely happens and can be avoided almost entirely by staggering server startup by as little as 100 milliseconds.
- [CONF] OpenEJB 3.0.x documentation > Failover confluence
