David Blevins wrote:
On May 3, 2006, at 8:51 AM, Jules Gosnell wrote:
I'd like to kick off a thread about the monitoring of clustered
deployments...
There is a section in the 1,000ft Clustering Overview
(http://opensource.atlassian.com/confluence/oss/display/GERONIMO/Clustering),
but little content, due to the fact that there has been
little discussion about this subject on the list...
Obviously we can use standard tools etc to monitor individual nodes
in a cluster - but, if you want to aggregate all the stats together,
for a clusterwide view of what is going on in your deployment, life
becomes a little more complicated....
I have a few ideas, but I haven't done much reading around this
area, so it may be that there are already specs and standard ways of
achieving all of this. If there are and you know of them, please
shout. If not, let's throw it open to discussion....
thanks for your time,
Nice looking doc, Jules!
I asked this question last time and you said you wanted to compile
the info into the 1000ft document, so I'm guessing this is my cue :)
On Mar 11, 2006, at 12:23 PM, David Blevins wrote:
I like the concept that clients can be made smarter or store
information that will make the cluster that much more efficient.
But I'm not sure what you'd need me to do for clients that are out
of our control and potentially written in an entirely different
language, i.e. CORBA and Web Services.
Can you describe what considerations I'd have to add on my side of
the fence to make that work?
The considerations are:
(a) storing and accepting updates to a record of cluster membership and
endpoints
(b) deciding by pluggable strategy, to which of these nodes to submit
each request
(c) transparently failing over to the next node, if the node selected
for a request is down
(a) The record would be initially populated either from a hardcoded
list initialised from the jndi.properties file, or by autodiscovery of
the cluster directly. It would be kept up to date by piggy-backing deltas
on the return leg of invocations.
(b) this strategy would be responsible for implementing load-balancing
policies - i.e. round-robin or random for SLSBs and sticky for SFSBs and
probably Entities....
(c) this allows a client to continue its conversation through the stub,
without needing to know that the server that it was talking to has died
and been replaced by another...
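
To make (a)-(c) a little more concrete, here is a rough sketch of what
such a stub might look like. It is only an illustration under my own
assumptions: the names ClusterStub, SelectionStrategy, RoundRobin and
applyDelta are hypothetical and are not taken from OpenEJB, WADI or
Trifork code, nodes are represented as plain strings, and the actual
transport call is passed in as a function.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Function;

/** Hypothetical pluggable strategy for (b): picks the index of the node to try first. */
interface SelectionStrategy {
    int select(List<String> nodes);
}

/** Round-robin selection, e.g. for stateless session beans. */
class RoundRobin implements SelectionStrategy {
    private int next = 0;

    public synchronized int select(List<String> nodes) {
        int i = next % nodes.size();
        next = (i + 1) % nodes.size();
        return i;
    }
}

/** Hypothetical client-side stub; not an existing OpenEJB class. */
class ClusterStub {
    // (a) the client's record of cluster membership and endpoints
    private final CopyOnWriteArrayList<String> nodes = new CopyOnWriteArrayList<>();
    private final SelectionStrategy strategy;

    ClusterStub(List<String> seedNodes, SelectionStrategy strategy) {
        // seedNodes stands in for a hardcoded list or an autodiscovery result
        this.nodes.addAll(seedNodes);
        this.strategy = strategy;
    }

    /** (a) apply a membership delta piggy-backed on an invocation's return leg. */
    void applyDelta(List<String> joined, List<String> left) {
        nodes.removeAll(left);
        for (String node : joined) {
            nodes.addIfAbsent(node);
        }
    }

    /** (b) + (c): choose a node via the strategy, then fail over to the next ones. */
    <T> T invoke(Function<String, T> call) {
        List<String> snapshot = new ArrayList<>(nodes);
        if (snapshot.isEmpty()) {
            throw new IllegalStateException("no known cluster nodes");
        }
        int start = strategy.select(snapshot);
        RuntimeException last = null;
        for (int k = 0; k < snapshot.size(); k++) {
            String node = snapshot.get((start + k) % snapshot.size());
            try {
                return call.apply(node);   // the caller never sees which node answered
            } catch (RuntimeException e) {
                last = e;                  // treat as "node down", try the next one
            }
        }
        throw new IllegalStateException("all known nodes failed", last);
    }
}
```

A sticky strategy for SFSBs would implement the same interface but keep
returning the node it last used. Note that the sketch retries on any
RuntimeException; a real stub would have to distinguish a dead node from
an application error, and decide whether the request is safe to retry.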
Re: RMI/IIOP - These considerations are dealt with in an intelligent
client-side Java stub. I talked to Kresten of Trifork some time ago
about this - it seems that they have a CORBA client-side impl for
clustering that maps quite closely to the sort of thing that I envisage
OpenEJB using for its own protocol... Ideally we would want to share the
same code between both transport impls - I don't know enough about how
they are plugged in to the stub to decide whether this is practical... -
we really need an OpenEJB client-side clustering volunteer here. Gianny
might be interested, or someone else ?
Re: WS - I am working on an Axis2/WADI integration with Rajith at the
moment. I believe that WS invocations on OpenEJBs come via Axis2? - if
we are talking SOAP/HTTP then, I guess responsibility for the
intelligence that would be bundled into the Java client stub in the
other two cases would fall upon the HTTP load-balancer... - i.e. this
would be responsible for maintaining session affinity for stateful
requests (there are apparently a couple of competing session marking
specs for WS) and managing fail-over between nodes. Other WS transports
would require other solutions - have you any in particular in mind ?
I may have picked up the wrong end of the stick here, or omitted stuff
that you were angling for - if so, push straight back at me :-)
Jules
Thoughts?
-David
--
"Open Source is a self-assembling organism. You dangle a piece of
string into a super-saturated solution and a whole operating-system
crystallises out around it."
/**********************************
* Jules Gosnell
* Partner
* Core Developers Network (Europe)
*
* www.coredevelopers.net
*
* Open Source Training & Support.
**********************************/