On 2010-02-24T20:53:40, Sander van Vugt <m...@sandervanvugt.nl> wrote:
> Hi,
>
> STONITH seems to be driving me to feel like SMITH lately, so after
> unsuccessful attempts to get drac5 and rackpdu to do their work, I'm now
> focusing on the external/sbd plugin. It doesn't work too well though, so
> if anyone can give me a hint, I would appreciate.

You are missing the configuration in /etc/sysconfig/sbd, it seems. Here are some notes that will be merged into the SP1 manual - they are my rough notes, not yet refined by the documentation group:

> Which looks like they are trying to do a STONITH shootout? So I cleared
> that information, using "sbd -d /dev/dm-0 <nodenames> clear", which looks
> useful, but didn't fix the issue. (Neither did a bold "sbd -d /dev/dm-0
> create".)

Don't use dm-0 links; they are not stable.

= Storage protection =

The SLE HA cluster stack's highest priority is protecting the integrity of data. This is achieved by preventing uncoordinated concurrent access to data storage - such as mounting an ext3 file system more than once in the cluster, but also preventing OCFS2 from being mounted if coordination with other cluster nodes is not available.

In a well-functioning cluster, Pacemaker will detect if resources are active beyond their concurrency limits and initiate recovery; further, its policy engine will never exceed these limitations.

However, network partitioning or software malfunction could potentially cause scenarios where several coordinators are elected. If this so-called split-brain scenario were allowed to unfold, data corruption might occur.

Hence, several layers of protection have been added to the cluster stack to mitigate this. IO fencing/STONITH is the primary component contributing to this goal, since it ensures that, prior to storage activation, all other access is terminated; cLVM2 exclusive activation and OCFS2 file locking support are other mechanisms, protecting against administrative or application faults.
Combined appropriately for your setup, these can reliably prevent split-brain scenarios from causing harm.

This chapter describes an IO fencing mechanism that leverages the storage itself, followed by a description of an additional layer of protection to ensure exclusive storage access. These two mechanisms can even be combined for higher levels of protection.

== Storage-based fencing ==

This section describes how scenarios where shared storage is used can leverage said shared storage for very reliable I/O fencing and avoidance of split-brain scenarios. This mechanism has been used successfully with the Novell Cluster Suite and is also available in a similar fashion for the SLE HA 11 product using the "external/sbd" STONITH agent.

=== Description ===

In an environment where all nodes have access to shared storage, a small (1MB) partition is formatted for use with sbd. The daemon, once configured, is brought online on each node before the rest of the cluster stack is started, and terminated only after all other cluster components have been shut down - ensuring that cluster resources are never activated without sbd supervision.

The daemon automatically allocates one of the message slots on the partition to itself, and constantly monitors it for messages addressed to itself. Upon receipt of a message, the daemon immediately complies with the request, such as initiating a power-off or reboot cycle for fencing.

The daemon also constantly monitors connectivity to the storage device, and commits suicide in case the partition becomes unreachable, guaranteeing that it is never disconnected from fencing messages. (If the cluster data resides on the same logical unit in a different partition, this is not an additional point of failure; the workload would terminate anyway if storage connectivity was lost.)

Increased protection is offered through "watchdog" support.
Modern systems support a "hardware watchdog" that has to be updated by the software client, or else the hardware will enforce a system restart. This protects against failures of the sbd process itself, such as dying, or becoming stuck on an IO error.

=== Setup guide ===

==== Requirements ====

The environment must have shared storage reachable by all nodes. It is recommended to create a 1MB partition at the start of the device; in the rest of this text, this is referred to as "/dev/SBD" - please substitute your actual pathname (e.g., "/dev/sdc1") below.

This shared storage segment must not make use of host-based RAID, cLVM2, or DRBD. However, using storage-based RAID and multipathing is recommended for increased reliability.

==== SBD partition ====

All these steps must be performed as root.

After having made very sure that this is indeed the device you want to use, and does not hold any data you need - as the sbd command will overwrite it without further requests for confirmation - initialize the sbd device:

# sbd -d /dev/SBD create

This will write a header to the device, and create slots for up to 255 nodes sharing this device, with default timings.

If your sbd device resides on a multipath group, you may need to adjust the timeouts sbd uses, as MPIO's path-down detection can cause some latency: after the msgwait timeout, the message is assumed to have been delivered to the node. For multipath, this should be the time required for MPIO to detect a path failure and switch to the next path. You may have to test this in your environment.

The node will perform suicide if it has not updated the watchdog timer fast enough; the watchdog timeout must be shorter than the msgwait timeout - half the value is a good estimate. The timeouts can be specified when the SBD device is initialized:

# /usr/sbin/sbd -d /dev/SBD -4 $msgwait -1 $watchdogtimeout create

(All timeouts are in seconds.)
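To make the relationship between the two timeouts concrete, here is a small sketch. The MPIO failover time of 60 seconds is a made-up example value; measure the actual path-failover time in your own environment:

```shell
# Hypothetical MPIO path-failover detection time, in seconds; substitute
# the value you measured in your environment.
MPIO_FAILOVER=60

# msgwait must cover a full MPIO path failover:
msgwait=$MPIO_FAILOVER

# The watchdog timeout must be shorter than msgwait; half is a good estimate:
watchdogtimeout=$((msgwait / 2))

echo "msgwait=$msgwait watchdogtimeout=$watchdogtimeout"
# prints: msgwait=60 watchdogtimeout=30

# The device would then be initialized with:
#   /usr/sbin/sbd -d /dev/SBD -4 $msgwait -1 $watchdogtimeout create
```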
You can look at what was written to the device using:

# sbd -d /dev/SBD dump
Header version     : 2
Number of slots    : 255
Sector size        : 512
Timeout (watchdog) : 5
Timeout (allocate) : 2
Timeout (loop)     : 1
Timeout (msgwait)  : 10

As you can see, the timeouts are also stored in the header, to ensure that all participating nodes agree on them.

==== Setting up the software watchdog ====

Additionally, it is highly recommended that you set up your Linux system to use a watchdog; please refer to the SLES manual for this step. This involves loading the proper watchdog driver on system boot. On HP hardware, this is the "hpwdt" module. For systems with an Intel TCO, "iTCO_wdt" can be used. "softdog" is the most generic driver, but it is recommended that you use one with actual hardware integration. (See drivers/watchdog in the kernel package for a list of choices.)

==== Starting the sbd daemon ====

The sbd daemon is a critical piece of the cluster stack. It must always be running when the cluster stack is up - even if the rest of the stack has crashed - so that the node can still be fenced.

The openais init script starts and stops sbd if configured; add the following to /etc/sysconfig/sbd:

  SBD_DEVICE="/dev/SBD"
  # The next line enables the watchdog support:
  SBD_OPTS="-W"

If the SBD device is not accessible, the daemon will fail to start and inhibit openais startup.

Note: If the SBD device becomes inaccessible from a node, this could cause the node to enter an infinite reboot cycle. That is technically correct behavior, but depending on your administrative policies, it might be considered a nuisance. In such cases, you may wish not to start openais automatically on boot.

Before proceeding, ensure that sbd has indeed started on all nodes, via "rcopenais restart".

==== Testing SBD ====

The command

# sbd -d /dev/SBD list

will dump the node slots, and their current messages, from the sbd device.
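For illustration, on a hypothetical two-node cluster with nodes "nodea" and "nodeb" (names made up for this example), the listing might look like this - slot number, node name, and current message:

  0  nodea  clear
  1  nodeb  clear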
You should see all cluster nodes that have ever been started with sbd listed there, most likely with the message slot showing "clear".

You can now try sending a test message to one of the nodes:

# sbd -d /dev/SBD message nodea test

The node will acknowledge the receipt of the message in the system logs:

Aug 29 14:10:00 nodea sbd: [13412]: info: Received command test from nodeb

This confirms that sbd is indeed up and running on the node, and that it is ready to receive messages.

==== Configuring the fencing resource ====

To complete the sbd setup, it is necessary to activate sbd as a STONITH/fencing mechanism in the CIB, as follows (note that the primitive needs a resource id; "stonith_sbd" is used here as an example):

# crm
configure
property stonith-enabled="true"
property stonith-timeout="30s"
primitive stonith_sbd stonith:external/sbd params sbd_device="/dev/SBD"
commit
quit

Note that since node slots are allocated automatically, no manual hostlist needs to be defined.

The SBD mechanism is used instead of other fencing/stonith mechanisms; please disable any others you might have configured before.

Once the resource has started, your cluster is successfully configured for shared-storage fencing, and will utilize this method in case a node needs to be fenced.

[snip]

Regards,
    Lars

--
Architect Storage/HA, OPS Engineering, Novell, Inc.
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

_______________________________________________
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker