Hi folks,

RHEL 5.2
cman-2.0.84-2.el5
gfs-utils-0.1.17-1.el5
rgmanager-2.0.38-2.el5
openais-0.80.3-15.el5
kmod-gfs-PAE-0.1.23-5.el5
kmod-gfs2-PAE-1.92-1.1.el5
gfs2-utils-0.1.44-1.el5_2.1

I came into work this morning and our 4-node cluster was down because all nodes
had lost access to the GFS filesystem due to an iSCSI error.
Even though the iSCSI error corrected itself in the middle of the night, the 
cluster did not regain quorum.

It took me 2 hours to fix the problem. Any node I rebooted failed to start
fencing during boot.

I eventually got it working by powering off all nodes and booting them one at
a time. Fencing did not start working until the fourth node was booted, and
even then the GFS filesystem was not mounted.

Here's what I did.
Power off node 4.
Power off node 3.
Power off node 2.
Reboot node 1.

Node 1 can join the fence domain.
Power on node 2. Node 2 can't join the fence domain.
Power on node 3. Node 3 can't join the fence domain.
Power on node 4. Node 4 joins the fence domain.

I then had to run 'service gfs start' on nodes 1, 2 and 3, and the cluster was
back up and running.

What is the correct way to get GFS filesystems running again after access to
the GFS device has been temporarily lost and the cluster is blocking all
activity?

Thanks,
Nick
--
Linux-cluster mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/linux-cluster