A couple of new data points:

- Mounting with barrier=0 seems to restore the write performance we had lost.
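
For example (device and mount point below are placeholders; the same option
applies wherever we mount the ext3/ldiskfs backend):

    # mount with write barriers disabled
    mount -t ext3 -o barrier=0 /dev/sdb1 /mnt/ost0

    # or persistently via /etc/fstab:
    # /dev/sdb1  /mnt/ost0  ext3  defaults,barrier=0  0 0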

- Given that under SLES9 we see the message "disabling barrier-based syncs"
quite soon after either a Lustre or a regular ext3 mount, running under SLES10
with barrier=0 or the boot parameter barrier=off should not expose us to any
extra data loss on hardware failure.
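
For what it's worth, the barrier state is easy to check after a mount -- the
jbd/ext3 messages show up in the kernel log (and with barrier=0 there should
be nothing left to disable):

    # look for "disabling barrier-based syncs" or similar after mounting
    dmesg | grep -i barrier

If we go the boot-parameter route instead, barrier=off just gets appended to
the kernel line in /boot/grub/menu.lst.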

- Barriers are off by default in the vanilla Linus kernel; in the SLES10
kernel they are turned on by a patch from SUSE (more details available).

- We do know that running with barriers off makes it even more critical to run
e2fsck after a storage hardware failure -- anything that generates SCSI
errors on Linux or causes the write cache to be lost.
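
Concretely, that means a full forced pass rather than trusting the journal
replay, something like (device is a placeholder):

    # force a full check even if the superblock says the filesystem is clean
    e2fsck -f /dev/sdb1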

- From Documentation/block/barriers.txt, we need to find out exactly which
barrier behavior we are seeing before we can start investigating why these are
so slow for us. At first glance it seems the SCSI midlayer doesn't support
command tagging, so it is draining the whole request queue for these writes.
More investigation is needed.
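
One thing we can check from userspace, assuming this SLES10 kernel exposes the
usual SCSI sysfs attributes (names as in mainline 2.6.16; sdb is a placeholder):

    # what kind of command tagging the device is using, and at what depth;
    # anything other than "ordered" here would suggest the ordered-tag path
    # described in barriers.txt is unavailable and the queue gets drained instead
    cat /sys/block/sdb/device/queue_type
    cat /sys/block/sdb/device/queue_depth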

- Because we forgot to put the bug 11230 changes in place for the last round of
obdfilter testing, it seems that an lconf change to detect this kernel and apply
those changes automatically is needed.
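
A rough sketch of the check lconf could make, written as shell here only to
illustrate the idea (the version pattern is a guess and the actual bug 11230
changes are left as a placeholder):

    # hypothetical: detect the SLES10-style kernel and flag that the
    # bug 11230 obdfilter changes still need to be applied
    case "$(uname -r)" in
      2.6.16*) echo "SLES10 kernel detected: apply the bug 11230 changes" ;;
      *)       ;;   # nothing to do on other kernels
    esac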

Finally -- the big question:

- Is CFS comfortable with us running Lustre with barrier=0?
