On 04/20/2017 01:43 AM, Ulrich Windl wrote:
> Should have gone to the list...
>
>>>>> Digimer <li...@alteeve.ca> wrote on 19.04.2017 at 17:20 in message
>> <600637f1-fef8-0a3d-821c-7aecfa398...@alteeve.ca>:
>>> On 19/04/17 02:38 AM, Ulrich Windl wrote:
>>>>>>> Digimer <li...@alteeve.ca> wrote on 18.04.2017 at 19:08 in message
>>>> <26e49390-b384-b46e-4965-eba5bfe59...@alteeve.ca>:
>>>>> On 18/04/17 11:07 AM, Lentes, Bernd wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I'm currently establishing a two-node cluster. Each node is an HP
>>>>>> server with an iLO card.
>>>>>> I can fence both of them, it's working fine.
>>>>>> But what if the iLO does not work correctly? Then fencing is not
>>>>>> possible.
>>>>>
>>>>> Correct. If you only have iLO fencing, then the cluster would hang
>>>>> (failed fencing is *not* an indication of node death).
>>>>>
>>>>>> I also have a switched PDU from APC. Each server has two power
>>>>>> supplies. Currently one is connected to the normal power equipment,
>>>>>> the other to the UPS.
>>>>>> As a sort of redundancy, if the UPS does not work properly.
>>>>>
>>>>> That's a fine setup.
>>>>>
>>>>>> If I use the switched PDU as a fencing device, I will lose the
>>>>>> redundancy of two independent power sources, because then I have to
>>>>>> connect both power supplies together to the UPS.
>>>>>> I wouldn't like to do that.
>>>>>
>>>>> Not if you have two switched PDUs. This is what we do in our Anvil!
>>>>> systems... One PDU feeds the first PSU in each node and the second PDU
>>>>> feeds the second PSUs. Ideally both PDUs are fed by UPSes, but that's
>>>>> not as important. One PDU on a UPS and one PDU directly from mains will
>>>>> work.
>>>>>
>>>>>> How important would you consider having two independent fencing
>>>>>> devices for each node? I can't buy another PDU, currently we are
>>>>>> very poor.
>>>>>
>>>>> Depends entirely on your tolerance for interruption. *I* answer that
>>>>> with "extremely important". However, most clusters out there have only
>>>>> IPMI-based fencing, so they would obviously say "not so important".
>>>>>
>>>>>> Is there another way to create a second fencing device, independent
>>>>>> from the iLO card?
>>>>>>
>>>>>> Thanks.
>>>>>
>>>>> Sure, SBD would work. I've never seen IPMI not have a watchdog timer
>>>>> (and iLO is IPMI++), as one example. It's slow, and needs shared
>>>>> storage, but a small box somewhere running a small tgtd or iscsid
>>>>> should do the trick (note that I have never used SBD myself...).
>>>>
>>>> Slow is relative: If it takes 3 seconds from issuing the reset command
>>>> until the node is dead, it's fast enough for most cases. Even a switched
>>>> PDU has some delays: The command has to be processed, the relay may
>>>> "stick" a short moment, the power supply's capacitors have to discharge
>>>> (if you have two power supplies, both need to)... And iLOs don't really
>>>> like to be powered off.
>>>>
>>>> Ulrich
>>>
>>> The way I understand SBD, and correct me if I am wrong, recovery won't
>>> begin until sometime after the watchdog timer kicks. If the watchdog
>>> timer is 60 seconds, then your cluster will hang for >60 seconds (plus
>>> fence delays, etc).
>>
>> I think it works differently: One task periodically reads its mailbox slot
>> for commands, and once a command was read, it's executed immediately. Only
>> if the read task does hang for a long time, the watchdog itself triggers a
>> reset (as SBD seems dead). So the delay is actually made from the sum of
>> "write delay", "read delay", "command execution".
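
To make the two readings above concrete, here is a rough sketch of how each
one would bound the time until a fence can be treated as complete. It is my
own illustration, not taken from sbd: the function names and the example
numbers are made up, and it models nothing beyond the arithmetic described
in the quoted text.

```python
# Illustrative only: these helpers are hypothetical and do not mirror
# sbd's internals. They just encode the two delay models from the thread.


def mailbox_fence_delay(write_delay, read_delay, exec_delay):
    """Ulrich's reading: the fence command is written to the target's
    mailbox slot, picked up on the target's next read, then executed.
    The total delay is the sum of those three steps (in seconds)."""
    return write_delay + read_delay + exec_delay


def watchdog_fence_delay(watchdog_timeout, other_delays=0.0):
    """Digimer's reading: recovery cannot begin until the watchdog
    timer has had time to fire, plus any additional fence delays."""
    return watchdog_timeout + other_delays


if __name__ == "__main__":
    # Assumed example values, in seconds.
    print("mailbox model: ", mailbox_fence_delay(0.5, 1.0, 1.5))
    print("watchdog model:", watchdog_fence_delay(60, other_delays=5))
```

With the assumed numbers, the mailbox model lands in the range of a few
seconds, while the watchdog model can never be faster than the timeout
itself.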

I think you're right when sbd uses shared storage, but there is a
watchdog-only configuration that I believe Digimer was referring to.

With watchdog-only, the cluster will wait for the value of the
stonith-watchdog-timeout property before considering the fencing successful.

>> The manual page (SLES 11 SP4) states: "Set watchdog timeout to N seconds.
>> This depends mostly on your storage latency; the majority of devices must
>> be successfully read within this time, or else the node will self-fence."
>> and "If a watchdog is used together with the "sbd" as is strongly
>> recommended, the watchdog is activated at initial start of the sbd daemon.
>> The watchdog is refreshed every time the majority of SBD devices has been
>> successfully read. Using a watchdog provides additional protection against
>> "sbd" crashing."
>>
>> Final remark: I think the developers of sbd were on drugs (or never saw a
>> UNIX program before) when designing the options. For example: "-W Enable or
>> disable use of the system watchdog to protect against the sbd processes
>> failing and the node being left in an undefined state. Specify this once to
>> enable, twice to disable." (MHO)
>>
>> Regards,
>> Ulrich
>>
>>>
>>> IPMI and PDUs can confirm fencing of the peer in ~5 seconds (plus fence
>>> delays).
>>>
>>> --
>>> Digimer
>>> Papers and Projects: https://alteeve.com/w/
>>> "I am, somehow, less interested in the weight and convolutions of
>>> Einstein’s brain than in the near certainty that people of equal talent
>>> have lived and died in cotton fields and sweatshops." - Stephen Jay Gould

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org