It was good to read how you arrived at Xen's tap:sync "solution", Jeff. Thanks
for sharing!


-----Original Message-----
From: linux-cluster-boun...@redhat.com 
[mailto:linux-cluster-boun...@redhat.com] On Behalf Of Jeff Sturm
Sent: Monday, November 22, 2010 1:03 PM
To: linux clustering
Subject: Re: [Linux-cluster] gfs2_jadd borked my cluster?

> -----Original Message-----
> From: linux-cluster-boun...@redhat.com
> [mailto:linux-cluster-boun...@redhat.com]
> On Behalf Of rhu...@bidmc.harvard.edu
> Sent: Monday, November 22, 2010 10:34 AM
> To: linux-cluster@redhat.com
> Subject: Re: [Linux-cluster] gfs2_jadd borked my cluster?
> 
> I suspect a virtio_blk caching issue is causing the problems with GFS2 on
> KVM guests. I read in the RHEL 5.6 (beta) release notes that "a caching
> issue" (described only in those generic terms) was corrected in the
> virtio_blk module. And RHEL 6 declares that GFS2 is a supported filesystem
> on KVM guests; there is no such written statement anywhere in the RHEL 5
> documentation.

Although we don't use KVM or GFS2, I've seen a similar issue.  We had GFS 
filesystems periodically withdraw from the cluster, often requiring a node 
restart or fsck to fix.

We changed our Xen block devices to use the tap:sync: backend driver and
haven't seen the problem since.  I have no conclusive proof that this fixed
the problem, but the circumstantial evidence is there.  Having no familiarity
with KVM, I can't tell you what the equivalent of tap:sync: is, or if one
even exists.

We did not stumble across this setting by accident.  While brainstorming and
asking ourselves "what's different about our virtual clusters and physical
clusters?", we guessed that block caching could be responsible.  If a virtual
host completes some I/O and tells the cluster it is done, it seems intuitive
that the I/O must really be complete to guarantee filesystem consistency.
From the virtual host's perspective the I/O may be done, but the physical host
is responsible for flushing blocks to the actual SAN, and it may delay the
operation or write blocks in a different order than originally intended.
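
As a rough analogy at the application level (not Xen's implementation), the
gap between "the write call returned" and "the data was flushed" can be
sketched in Python.  The helper name write_block_sync is hypothetical; it
simply refuses to report success until os.fsync() has returned.

```python
import os
import tempfile


def write_block_sync(path, offset, data):
    """Write data at offset and return only after the OS reports it flushed.

    Hypothetical helper for illustration. Without the os.fsync() call the
    write could sit in a cache after os.write() has already returned.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        total = 0
        # os.write() may write fewer bytes than requested, so loop.
        while total < len(data):
            total += os.write(fd, data[total:])
        # Ask the OS to push the data to the underlying device before we
        # tell anyone the write is complete.
        os.fsync(fd)
        return total
    finally:
        os.close(fd)


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "block.img")
    print(write_block_sync(demo_path, 0, b"\0" * 512))
```

A cluster node that announces completion only after a call like this returns
is in the position we wanted our virtual hosts to be in.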

Xen's tap:sync: driver ensures that each block written by the virtual host is 
written immediately to the physical device.  
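
For anyone wanting to try the same thing, a guest config fragment selecting
that backend might look roughly like the sketch below.  Xen guest config files
use Python syntax; the device path and the guest device name here are
made-up placeholders, not our actual settings, and the access mode you need
may differ for shared storage.

```python
# Hypothetical excerpt from a Xen guest config file.
# Format of each entry: "<backend>:<path>,<guest device>,<mode>"
disk = [
    # tap:sync: selects the synchronous blktap backend.
    # /dev/mapper/example-lun and xvdb are placeholders for illustration.
    "tap:sync:/dev/mapper/example-lun,xvdb,w",
]
```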

-Jeff



--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
