> For me, this results in the practical difference that the pub-sub model
> means that the agent has the ability to subscribe to the messages and is
> therefore alive - and that therefore the list of live hosts is always
> current.
I don't understand why that would be the case. any pub-sub model must internally keep some sort of membership list which it believes to be current/live, but it cannot actually know a host is alive until it receives some sort of response from that host. even then, it's unknown whether that response really implies enough liveness to execute the devops command you're pushing.

in other words, to probe whether a node can execute an op, you really have to try to execute the op. which means you WILL STILL have to deal with semi-byzantine failures through timeouts, etc. which is why I use ssh to mass-admin. let's be honest, handling timeouts is not magic. ssh is also nicely decoupled, and has excellent ways to robustly express asymmetric trust, etc.

also, to me, integration is the devil's playground. it's easy to pitch that integration will make life easier, but except in fairly specific conditions, it also leads to tighter coupling, fragility and inflexibility.

in a sense, the issue here is a failure to tool-build. for instance, if it were really a big deal, we could have a standard infrastructure for collecting node status information. "standard" in the sense of IETF RFC. all sources of node info on your system could feed into it, and you might choose, eg, a Bayesian mechanism to make predictions about whether a particular node will successfully perform a particular op. (for instance, interconnect fabrics often have a realtime measure of whether a node is up, for their definition of up. similarly, service nodes (say, NTP) can often provide a last-seen timestamp. nodes might also run endogenous beacons (say, ganglia, etc).) it's a bit curious that this hasn't (AFAIK) been done before in much generality. anyone?

regards, mark hahn.
_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
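
To make the Bayesian idea above a bit more concrete, here is a minimal sketch, not an existing tool or standard. Each source of node info is treated as an independent yes/no signal, and the signals are combined naive-Bayes style into an estimate of whether a node will perform an op. The names `Signal` and `predict_success`, the independence assumption, and every probability in the example are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from math import exp, log
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Signal:
    """One source of node info (hypothetical model, not a real API).

    Both probabilities must lie strictly between 0 and 1.
    """
    name: str
    p_if_ok: float   # P(signal is positive | node would perform the op)
    p_if_bad: float  # P(signal is positive | node would fail the op)


def predict_success(prior: float,
                    observations: Iterable[Tuple[Signal, bool]]) -> float:
    """Naive-Bayes estimate of P(node performs the op | observations).

    Works in log-odds: start from the prior and add one log likelihood
    ratio per observed signal, assuming the signals are independent.
    """
    log_odds = log(prior / (1.0 - prior))
    for signal, positive in observations:
        if positive:
            log_odds += log(signal.p_if_ok / signal.p_if_bad)
        else:
            log_odds += log((1.0 - signal.p_if_ok) / (1.0 - signal.p_if_bad))
    return 1.0 / (1.0 + exp(-log_odds))


# Made-up likelihoods for the three kinds of source mentioned above.
FABRIC_UP = Signal("fabric_up", p_if_ok=0.99, p_if_bad=0.30)
NTP_RECENT = Signal("ntp_recent", p_if_ok=0.95, p_if_bad=0.20)
BEACON_SEEN = Signal("beacon_seen", p_if_ok=0.90, p_if_bad=0.10)

if __name__ == "__main__":
    # Fabric says the node is up, but NTP and the beacon have gone quiet.
    estimate = predict_success(
        prior=0.9,
        observations=[(FABRIC_UP, True), (NTP_RECENT, False), (BEACON_SEEN, False)],
    )
    print(f"estimated P(op succeeds) = {estimate:.3f}")
```

In practice the likelihoods would have to be fitted from logged op outcomes, and a node with a high estimate can still fail, so the timeout handling around the actual op stays.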
