On 1-2-2012 16:56, Martin Gerhard Loschwitz wrote:
>> Martin,
>>
>> I'm actually running a dual-active Samba server with a shared GFS2
>> file-system (block-replicated by DRBD). I'm also (ab)using it for an
>> Apache/Tomcat installation with session replication in Tomcat through a
>> shared file-system.
>>
>> Of course, all this is balanced using RR-DNS, and when one node fails,
>> the cluster resource (IP address) is taken over by the surviving node to
>> re-establish service (at somewhat lower performance).
>>
>> Did a kitten just die?
>>
>> Robert Campbell
>
> Robert,
>
> what sort of STONITH do you use for this setup? How did the system react
> the last time the interconnect between the two nodes was broken while
> both were still up and running? And when did you last test the fail-over
> capabilities of your cluster?
>
> What happens between the moment one of your nodes fails and GFS decides
> it has to be fenced, and the moment that fencing is actually done?
>
> I'll get a shovel while waiting for the answer. ;-)
>
> Best regards
> Martin
>
>    
Martin,

STONITH is power-off through IPMI on HP iLO, so that should be sorted.
The system is still in the testing phase, but I'll test the loss of an
iLO connection followed by a failure of a node (will IPMI fencing still
report a successful fence?).

The last time (still during the initial build-up test) the surviving node
STONITHed the node that lost its network connection. It's difficult to
test a failure scenario other than a network failure: how do you make a
kernel panic on purpose? RHEL clustering announced a successful fence
and everything continued working happily on the surviving node. The
connection used for the STONITH is redundantly made to a pair of
switches. Unfortunately the connection to the iLO is not redundant, but
this would be a failure on top of a failure (so not really a concern for
us).
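
On forcing a kernel panic for such a test: a minimal sketch, assuming a
Linux kernel built with magic SysRq support (common on most
distributions, but worth verifying on your build). The helper name
crash_node and its dry_run flag are made up for this illustration; only
the /proc/sysrq-trigger path and the 'c' command come from the kernel
itself. It must run as root, and only on a node you can afford to lose.

```python
SYSRQ_TRIGGER = "/proc/sysrq-trigger"  # standard Linux procfs path


def crash_node(dry_run=True):
    """Crash the local kernel via magic SysRq 'c' (needs root).

    With dry_run=True (the default) nothing is written; the function
    only returns the equivalent shell command.
    """
    command = "echo c > " + SYSRQ_TRIGGER
    if not dry_run:
        # Writing 'c' makes the kernel crash on the spot. No sync, no
        # clean unmount: only do this on a test node.
        with open(SYSRQ_TRIGGER, "w") as trigger:
            trigger.write("c")
    return command


if __name__ == "__main__":
    # Dry run: show what would be done without touching the kernel.
    print(crash_node(dry_run=True))
```

Whether the crashed node then reboots or simply hangs depends on the
kernel.panic sysctl; either way the peer should see it vanish without a
clean shutdown, which is the case the fencing has to handle.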

During the build-up we also noticed that if there is no successful fence
(fencing was not configured at that point), the surviving node gets a
file-system time-out. Rather unfortunate. We're thinking of implementing
a manual fence mechanism in addition to the automatic one, so that I/O
can continue as soon as an administrator has dialed in to the cluster
after receiving the e-mail about the failure. But again, this would
require a failure on top of a failure.

I feel confident the shovel can be put back into the cupboard.

Robert




_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user
