On Monday, 20 January 2014, 10:54:18, Stefan Bauer wrote:
> Hi Folks,
>
> we're running a pacemaker/openais cluster for a webserver with databases in
> an active/passive setup with 2 nodes on debian 6.0
>
> Unfortunately this is a "closed environment" so we are quite limited in
> regard to version updates.
>
> Close to midnight, a backup-script is running - this seems to be the reason
> for the timeouts due to high load.
>
> Even though the load is relaxed a few minutes later, some of the resources
> become failed and hence are not running:
>
> postgresql#011(lsb:postgresql):#011Started host41.my.network FAILED
>
>
> Here is the log-file:
>
> http://cubewerk.de/syslog.1
>
> Maybe somebody sees the obvious and can shed some light on this.
>
> thank you
>
> stefan

Hi,

I doubt that the backup process disturbs your setup. At least not at first glance:

1)
Jan 19 23:23:07 host41 logger: backup.me start by host41.my.network
Jan 19 23:24:02 host41 logger: backup.me is running (...) - job ended

No errors in between. Everything seems to be fine. But I don't know about your backup process. Perhaps the script just triggers the start and the load goes on afterwards. Check it, please.

2)
Jan 19 23:32:44 host41 crmd: [2108]: info: crm_timer_popped:
Jan 19 23:32:44 host41 crmd: [2108]: info: notify_crmd: Transition 381 status: done - <null>

Ages (more than 8 minutes) after the backup, the pengine's timer pops, checks the cluster status and does not find any problems.

3) Your problems start here:
Jan 19 23:40:24 host41 lrmd: [2105]: WARN: cluster_ip:monitor process (PID 21345) timed out
Jan 19 23:40:24 host41 lrmd: [2105]: WARN: mysql:monitor process (PID 21347) timed out
(...)
Jan 19 23:40:30 host41 crmd: [2108]: ERROR: process_lrm_event: LRM operation mysql_monitor_20000 (122) Timed Out
(...)

Your problems start 16 minutes (!) after the backup. The logs do not show why. First the monitoring of the IP address fails, then the monitoring of the MySQL DB.

Perhaps your backup script is still running or you have some other problem. Please check your script: How long does it run? What load does it cause? Does it block something, so that the monitoring fails?

Greetings,

Michael Schwartzkopff

--
[*] sys4 AG

http://sys4.de, +49 (89) 30 90 46 64, +49 (162) 165 0044
Franziskanerstraße 15, 81669 München

Registered office: München, District Court München: HRB 199263
Executive board: Patrick Ben Koetter, Marc Schiffbauer
Chairman of the supervisory board: Florian Kirstein
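
To put numbers on the gaps in the timeline above, here is a minimal Python sketch (not part of the original thread) that parses the syslog timestamps quoted in the reply and prints how long after the "job ended" message each later event occurred. The helper names `parse_syslog_time` and `gaps_since_first` are made up for illustration, the year 2014 is assumed because syslog timestamps omit it, and `%b` assumes an English locale.

```python
from datetime import datetime

# Log lines copied from the excerpt quoted in the reply.
LOG_LINES = [
    "Jan 19 23:24:02 host41 logger: backup.me is running (...) - job ended",
    "Jan 19 23:32:44 host41 crmd: [2108]: info: crm_timer_popped:",
    "Jan 19 23:40:24 host41 lrmd: [2105]: WARN: cluster_ip:monitor process (PID 21345) timed out",
]


def parse_syslog_time(line, year=2014):
    """Parse the leading 'Mon DD HH:MM:SS' stamp; the year is an assumption."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def gaps_since_first(lines):
    """Return the time elapsed between the first line and each later line."""
    times = [parse_syslog_time(line) for line in lines]
    return [t - times[0] for t in times[1:]]


for gap in gaps_since_first(LOG_LINES):
    print(gap)
```

For the three lines above this prints 0:08:42 and 0:16:22, i.e. the recheck timer fires well over eight minutes after the backup job reports its end, and the first monitor timeout follows about sixteen minutes after it. Running the same calculation over the full syslog, together with load figures collected during the backup window, would help answer how long the script really keeps the machine busy.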
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org