Having now seen what fencing means in action (the default being emergency_restart()), I'm researching how to determine what causes a fencing event to occur.
This is SLES 10.2 on the 2.6.16-42.5 kernel, which means the ocfs2 version is 1.4.1-sles. I know the default reply from Sunil will be to ask Novell... :-) But our support partnership is actually with HP, and since they're not Novell, we have to wait for their backline contacts to make the connection. That's why I'm asking the users community at the same time as the support call. The call has been open for 8 hours now with no callback yet.

We have a set of 6 servers in a cluster, and they're only in a cluster for the sake of ocfs2 for a shared volume. Today, within a one-minute time span, node 1 reported that it lost connectivity to nodes 2 and 3, followed about a minute later by a report that it lost connectivity to nodes 0 and 5. Nodes 1 and 4 stayed up, but nodes 2, 3, 0, and 5 were all evicted and rebooted.

This happened on the prod cluster and simultaneously on our nonprod cluster. The only difference between nonprod and prod is that nonprod has 7 nodes rather than 6. On the nonprod cluster, 4 out of 7 servers rebooted due to node eviction.

These servers are set up across two blade chassis, and the NIC config is a private, non-routed VLAN. It's eth1 using a 192.168.x.y scheme. The blade servers were running a load average of about 1.1, but they are 8-way (dual quad-core) machines, so that isn't exactly taxing the boxes. The LAN environment is 10 Gbit fiber from the connect modules on the chassis to the switches, with gigabit links on the blades themselves. ifconfig shows no evidence of packet loss.

Questions:

1. Can we set up redundant heartbeat IP connections?
2. Can we also add a disk heartbeat?
3. If it truly is network connectivity, can we set the timeout to be more lenient?
4. Can we change the fencing to something other than a machine reset? E.g. unmount the volume, change it to read-only, etc.?

Thanks...

Angelo