Hello List, Any suggestions to solve the following would be most appreciated.

Setup:

  Active/Passive two-node cluster.
  Two UPSes (APC Smart-UPS 1500 C) with USB communication cables
  cross-connected (i.e. UPS-webserver1 monitored by webserver2, and
  vice versa) to allow for stonith/fencing.
  OS: openSUSE Leap 42.2
  NUT version: 2.7.1-2.41-x86_64
  Fencing agent: external/nut

Problem:

When power fails to a single UPS, both nodes are shut down. The node with the still-powered UPS comes back up, but requires manual intervention to keep it providing services. I would like only the node with the "On Battery" UPS to shut down.

The problem with restoring services seems to be that NUT on the node that comes back up will not restart until the other node restarts.

Stonith and my upssched-cmd script both use

  upscmd -u ups-webserver2-master -p mypassword ups-webserver2@webserver1 shutdown.reboot

or

  upscmd -u ups-webserver1-master -p mypassword ups-webserver1@webserver2 shutdown.reboot

as appropriate.

When the cluster software (Pacemaker/Corosync) uses one of the above commands as part of a fencing operation, only the target node is shut down, and its UPS's outlets are power-cycled.

When NUT, via my upssched-cmd script, issues one of the above commands, both nodes shut down and both of their UPSes' outlets power-cycle.

This problem should be very rare, but it would be better to cover it than not.

What works:

Power failure and resupply to both UPSes (the most common problem for me) works well. I use upssched to set the same timers after power failure on each system. They receive simultaneous shutdown commands, which they obey. When power returns, they both come back up.

Stonith/fencing via the external/nut stonith resource agent works.

Thanks,
Tim.
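
To make the "as appropriate" above concrete, here is a minimal sketch of how the two upscmd lines relate to the node being power-cycled. The helper name build_reboot_cmd is hypothetical (it is not part of NUT or of my scripts), and it only prints the command rather than running it; the user names, password placeholder and UPS names are the ones from the configs in this post.

```shell
#!/bin/sh
# Sketch only: print (do not execute) the upscmd line for a given target node.
# build_reboot_cmd is a hypothetical helper, not part of NUT.
build_reboot_cmd() {
    target="$1"    # node whose UPS outlets should be power-cycled
    case "$target" in
        webserver1) host="webserver2" ;;  # ups-webserver1 is monitored by webserver2
        webserver2) host="webserver1" ;;  # ups-webserver2 is monitored by webserver1
        *) echo "unknown node: $target" >&2; return 1 ;;
    esac
    echo "upscmd -u ups-${target}-master -p mypassword ups-${target}@${host} shutdown.reboot"
}

# Example: show the command that targets webserver2
build_reboot_cmd webserver2
```

In both cases the UPS named after the target node is addressed through the upsd on the other node, because the USB cables are cross-connected.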

My config files

ups.conf

  On webserver1

    [ups-webserver2]
        driver = usbhid-ups
        port = auto
        desc = "APC Smart-UPS C 1000/1500va"
        vendorid = 051d

  On webserver2

    [ups-webserver1]
        driver = usbhid-ups
        port = auto
        desc = "APC Smart-UPS C 1000/1500va"
        vendorid = 051d

nut.conf

    MODE=netserver

upsd.conf

  Webserver1

    LISTEN 127.0.0.1 3493
    LISTEN ::1 3493
    LISTEN 192.168.1.21 3493

  Webserver2

    LISTEN 127.0.0.1 3493
    LISTEN ::1 3493
    LISTEN 192.168.1.22 3493

upsd.users

  Defines users (special settings required for stonith to work).

  On webserver1

    [ups-webserver2-slave]
        password = mypassword
        actions = SET
        instcmds = ALL
        upsmon slave

    [ups-webserver2-master]
        password = mypassword
        actions = SET
        actions = FSD
        instcmds = ALL
        upsmon master

  On webserver2

    [ups-webserver1-slave]
        password = mypassword
        actions = SET
        instcmds = ALL
        upsmon slave

    [ups-webserver1-master]
        password = mypassword
        actions = SET
        actions = FSD
        instcmds = ALL
        upsmon master

upsmon.conf

  Webserver1

    MONITOR ups-webserver1@webserver2 1 ups-webserver1-master mypassword master
    MONITOR ups-webserver2@localhost 0 ups-webserver2-slave mypassword slave

  Webserver2

    MONITOR ups-webserver2@webserver1 1 ups-webserver2-master mypassword master
    MONITOR ups-webserver1@localhost 0 ups-webserver1-slave mypassword slave

  It needs the following in upsmon.conf:

    NOTIFYCMD /usr/sbin/upssched
    NOTIFYFLAG ONLINE SYSLOG+WALL+
    NOTIFYFLAG ONBATT SYSLOG+WALL+EXEC

Configure 'upssched' by editing upssched.conf

upssched.conf

  webserver1

    CMDSCRIPT /bin/upssched-cmd
    PIPEFN /var/lib/ups/upssched/upssched.pipe
    LOCKFN /var/lib/ups/upssched/upssched.lock
    AT ONBATT ups-webserver2@localhost START-TIMER onbatt-ups-webserver2 600
    AT ONLINE ups-webserver2@localhost CANCEL-TIMER onbatt-ups-webserver2

  webserver2

    CMDSCRIPT /bin/upssched-cmd
    PIPEFN /var/lib/ups/upssched/upssched.pipe
    LOCKFN /var/lib/ups/upssched/upssched.lock
    AT ONBATT ups-webserver1@localhost START-TIMER onbatt-ups-webserver1 600
    AT ONLINE ups-webserver1@localhost CANCEL-TIMER onbatt-ups-webserver1

Edit /bin/upssched-cmd

/bin/upssched-cmd

  webserver1

    case $1 in
        onbatt-ups-webserver1)
            logger -t upssched-cmd "UPS-Webserver1 has gone on battery."
            ;;
        onbatt-ups-webserver2)
            logger -t upssched-cmd "UPS-Webserver2 has gone on battery."
            /usr/bin/upscmd -u ups-webserver2-master -p mypassword ups-webserver2@webserver1 shutdown.reboot
            ;;
        *)
            logger -t upssched-cmd "Unrecognized command: $1"
            ;;
    esac

  Webserver2

    case $1 in
        onbatt-ups-webserver1)
            logger -t upssched-cmd "UPS-Webserver1 has been gone on battery."
            /usr/bin/upscmd -u ups-webserver1-master -p mypassword ups-webserver1@webserver2 shutdown.reboot
            ;;
        onbatt-ups-webserver2)
            logger -t upssched-cmd "UPS-Webserver2 has gone on battery."
            ;;
        *)
            logger -t upssched-cmd "Unrecognized command: $1"
            ;;
    esac
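
One diagnostic that might help narrow this down is to log what each node believes the UPS status is at the moment the timer fires, before upscmd is issued. The following is only a sketch and has not been run on this cluster. The helper is_on_battery is hypothetical (not part of NUT); it assumes the usual space-separated ups.status flags such as "OL" or "OB DISCHRG", and the commented usage assumes upsc is on the PATH.

```shell
#!/bin/sh
# Sketch only: decide from a ups.status string whether the UPS is on battery.
# is_on_battery is a hypothetical helper, not part of NUT.
is_on_battery() {
    # ups.status is a space-separated list of flags, e.g. "OL" or "OB DISCHRG".
    # $1 is intentionally unquoted so the shell splits it into flags.
    for flag in $1; do
        [ "$flag" = "OB" ] && return 0
    done
    return 1
}

# Possible use inside an onbatt-* branch of upssched-cmd (assumption: upsc is
# installed and the UPS name matches the local ups.conf section):
#   status=$(upsc ups-webserver2@localhost ups.status 2>/dev/null)
#   logger -t upssched-cmd "ups-webserver2 status when timer fired: $status"
#   is_on_battery "$status" || logger -t upssched-cmd "timer fired but UPS is not on battery"
```

Comparing these log lines on both nodes after a single-UPS power failure should show whether both timers really fire, or whether one shutdown.reboot ends up affecting both UPSes.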