An (unverified) issue occurred for us yesterday when moving a service from a 
RHEL 4.7 cluster into a new, but already existing, RHEL 5.5 cluster:

There was an oversight regarding one of its GFS shared resources: its 
superblock was not rewritten with the new cluster name, so it could not be 
mounted on the new cluster.  Starting the cluster service would therefore fail; 
however, the service's policy is to remain "disabled" and not fail over to 
another domain member.  Nevertheless, it attempted to start up anyway.

That's where we ran into some unexpected trouble.

We attempted to enable the service again, and it went along happily mounting 
its private resources (ext3 filesystems), which were now being partially 
mounted on another member of the fail-over domain.  Fortunately for us, the 
Linux kernel detected that some underlying blocks were being modified and 
immediately switched those mounts to read-only.  We discovered the issue, 
unmounted all the filesystems and ran e2fsck, which repaired a few mishaps.

Fixing the GFS superblock solved our problem, but we are curious whether this 
is a "feature" or a "bug" of the fail-over attempt.  When private resources 
fail, we don't get this recovery effort on another server; the service simply 
fails, which is the behavior we want.

This is still only our observation, and at some point we will stage this 
scenario at our D.R. site for testing, but I thought I'd bounce it off this list.

Pertinent log information follows:

Jul  7 18:59:16 columbia clurgmgrd[20213]: <notice> Starting disabled service service:TOBY
Jul  7 19:00:04 columbia clurgmgrd: [20213]: <err> 'mount -t gfs -o noatime /dev/mapper/VGCCC-lvoltobywav /toby/wav' failed, error=1
Jul  7 19:00:13 columbia clurgmgrd[20213]: <notice> start on clusterfs "CCC-lvoltobywav" returned 2 (invalid argument(s))
Jul  7 19:00:13 columbia clurgmgrd[20213]: <warning> #68: Failed to start service:TOBY; return value: 1
Jul  7 19:00:13 columbia clurgmgrd[20213]: <notice> Stopping service service:TOBY

Jul  7 19:00:46 columbia clurgmgrd[20213]: <notice> Service service:TOBY is recovering
Jul  7 19:00:46 columbia clurgmgrd[20213]: <warning> #71: Relocating failed service service:TOBY
Jul  7 19:01:44 columbia clurgmgrd[20213]: <alert> #2: Service service:TOBY returned failure code.  Last Owner: zodiac
Jul  7 19:01:44 columbia clurgmgrd[20213]: <alert> #4: Administrator intervention required.
Jul  7 19:01:45 columbia clurgmgrd[20213]: <alert> #2: Service service:TOBY returned failure code.  Last Owner: zodiac
Jul  7 19:01:45 columbia clurgmgrd[20213]: <alert> #4: Administrator intervention required.

Jul  7 19:04:05 columbia clurgmgrd[20213]: <notice> Stopping service service:TOBY
Jul  7 19:04:05 columbia clurgmgrd[20213]: <notice> Service service:TOBY is disabled

--
Linux-cluster mailing list
https://www.redhat.com/mailman/listinfo/linux-cluster