All - we have a few clusters in our environment and for some of them there is a need to manage cron jobs running on the active node of the cluster.
Are there any standard methods for managing these? The clusters I'm looking at are Oracle/Sun Cluster based, but VCS-based ones would presumably have similar problems, I'd imagine. In the case of some of our systems, the crontabs themselves are huge and contain many entries. They are copied from the shared application filesystem directly into /var/spool/cron/crontabs/appuser on the startup of a dummy cron resource and removed when it is shut down. For others we have fewer crontab entries and I was thinking of having some simple shell script shim to execute script entries from the application user's system crontab on all cluster nodes iff they are the application filesystem is mounted, but otherwise leave them alone[1]. Ideally I'm thinking it would be far better to run an independent instance of the cron process as the application user solely to allow it and all process it spawns to be managed as a fully-fledged cluster resource - if the cluster needed to evacuate in a hurry the cluster management could kill the entire process group by nuking the user-mode cron process ... I've never heard of cron being used in this way though - does anything like this exist? I'm thinking here of situations where the system cron has started processes on behalf of a cluster-managed app. When the cluster decides it needs to fail a node and evacuate to the other one, what happens with in-flight cron-initiated processes? What if something that cron spawned holds open the app filesystem and prevents it from being unmounted and exported cleanly? (for example, I'm sure there are other horrible ones) Granted, most cron tasks are short-lived, however it is a common usage for cron to be used to manage long-running services[2] - cron will start a process whose first task is to determine whether there is another copy already running. If there is, it will exit immediately, otherwise it will fork and remain running, but even short-lived tasks might throw a wrench into the works if they happen to run at the wrong time. If this task were not managed by the cluster resource scripts or a child of something that was, how would the cluster know which process to kill beyond nuking all processes owned by the user and hoping for the best? ... what would Nathan do? Regards, Malcolm [1] for example, in $HOME/bin/shim.sh: |#!/bin/sh |if [ ! -x $1 ] ; then exit 0; fi |exec "$@" then the crontab entries would be modified as follows: |# do not modify - the original is in /apps/cluster/etc/crontab |15 3 * * * $HOME/bin/shim.sh /apps/cluster/bin/foo some thing In this case, the crontabs would be kept in sync by having the resource management scripts copy the current crontab from the shared fileystem into /var/spool/cron/crontabs/appuser each time the resource is started[3] [2] yes, this would be a mad way to do process management in a clustered environment, don't do that, make it a proper cluster resource or smf-managed service already. I know. [3] Some of the reading I've done indicates there is something that already exists for Oracle/Sun Cluster to keep arbitrary files in sync - clfilesync. I haven't found it on my systems yet though, but will look into that further, although that doesn't address the process group or in-flight issues as above. -- Malcolm Herbert [email protected]
--
Malcolm Herbert
[email protected]
