i haven't had to actually do this yet, but if i understand the systemd socket-activation concept correctly, that may be a useful building block for putting something like this together (service gets started when another service tries to access it over the network).
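
For what it's worth, here is a rough, untested-in-production sketch of the service side of socket activation, going by the documented convention that systemd sets LISTEN_PID and LISTEN_FDS and passes the listening sockets starting at fd 3. The function names and the fallback port 8000 are made up for illustration, and it assumes a matching .socket unit (with a ListenStream= line) exists for the service.

```python
import os
import socket

SD_LISTEN_FDS_START = 3  # first fd systemd passes to an activated service


def listen_fds(environ=None, pid=None):
    """Return the fd numbers systemd handed to this process, or []."""
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    try:
        if int(environ.get("LISTEN_PID", "")) != pid:
            return []  # the fds were meant for some other process
        count = int(environ.get("LISTEN_FDS", ""))
    except ValueError:
        return []  # variables missing or malformed: not socket-activated
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))


def get_listener(fallback_port=8000):
    """Use the inherited socket if there is one, else bind our own.

    fallback_port is a hypothetical default for running outside systemd.
    """
    fds = listen_fds()
    if fds:
        # Wrap the already-bound, already-listening socket from systemd.
        return socket.socket(fileno=fds[0])
    return socket.create_server(("0.0.0.0", fallback_port))
```

The service would call get_listener() once at startup and accept() on the result. Since systemd holds the port open from boot, a client on another machine can connect before the service is running, and the connection simply waits until the service has been started.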
thankfully we don't have any super order-dependent services like that (they just enter a retry loop until their dependencies are available), but i'm tempted to try it out now :)

On Thu, Jul 9, 2015 at 11:57 PM Martin A. Brown <mar...@linux-ip.net> wrote:

>
> Hello there,
>
> > I have an app that is distributed across a dozen servers.
> >
> > There are several processes involved, some with dependencies on
> > processes running on other servers.
>
> > What app would you recommend for starting the whole thing up in an
> > orderly manner?
>
> Is it possible to adjust the pieces of software so that there is no
> required 'orderly' startup?
>
> I ask because--if the application requires synchronized startup of
> services across multiple machines, then what happens when one of the
> services (or nodes) early in that dependency chain fails during
> operation?
>
> For example, let's imagine services A through I, each of which must
> be launched before the subsequent can launch:
>
>   A -> B -> C -> D -> E -> F -> G -> H -> I
>
> Assuming normal, orderly, coordinated startup, great.  Now,
> everything is running.
>
> Suppose that service C fails.
>   What happens?
>   Will the application still run?
>   Do D through I need to be restarted (or just D)?
>
> If it is possible to adjust the individual services so that each of
> them can run and retry, fail gracefully, or even fail hard (as fast
> as possible, please) to contend with dependency issues, I would
> recommend that.
>
> Perhaps you have already addressed that question or are in the
> (unenviable) position of contending with feature-complete software
> that is ready for deployment.
>
> Since you are in the 10+ node realm, I think I'd also agree with
> using some sort of configuration management (somebody suggested
> Ansible).  With this many nodes, it's an operational truism that one
> of them will kick the bucket during your dog's midnight birthday
> party [0] and you'll want to be able to move the service quickly to
> another node.
>
> Hurrah for the well-worn configuration management tools.
>
> This is the modern take on startup script dependencies, just now
> with more network in-between!  Everybody needs more network
> in-between!  Not an easy problem.
>
> Anyway, good luck with this conundrum!
>
> -Martin
>
>  [0] Silicon devices sense these moments and cherish destroying our
>      equanimity.
>
> --
> Martin A. Brown
> http://linux-ip.net/
> _______________________________________________
> PLUG mailing list
> PLUG@lists.pdxlinux.org
> http://lists.pdxlinux.org/mailman/listinfo/plug
>
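
P.S. The "retry until the dependency is available" approach that came up in this thread could look roughly like the sketch below. It is an illustration, not the code any of our services actually run: the function name, the attempt count, and the doubling backoff are assumptions, and the connect/sleep parameters exist only so the loop can be exercised without a real network.

```python
import socket
import time


def wait_for_dependency(host, port, attempts=10, delay=1.0, max_delay=30.0,
                        connect=socket.create_connection, sleep=time.sleep):
    """Try a TCP connect to host:port until it succeeds.

    Returns True as soon as a connection is accepted, or False once
    `attempts` tries have all failed. The wait between tries doubles
    each time, capped at max_delay seconds.
    """
    for attempt in range(attempts):
        try:
            conn = connect((host, port), timeout=2.0)
            conn.close()  # we only needed to know the port answers
            return True
        except OSError:
            if attempt < attempts - 1:
                sleep(delay)
                delay = min(delay * 2, max_delay)
    return False
```

A service would call something like wait_for_dependency("db.internal", 5432) (hypothetical host and port) before doing real work, and exit non-zero if it returns False so that its supervisor can restart it. The same loop covers the "service C fails while everything is running" case if the service re-enters it whenever a connection drops.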