Re: UFS2 fsck Question (semantics of -p)

2006-09-04 Thread Can Sar
We ran our experiment on top of a very simple RAM disk with no caches  
of any kind, so drive write caching is not a factor here.


The dmesg log is at http://keeda.stanford.edu/dmesg

The resultant images are at:
http://keeda.stanford.edu/ufs-umount-image
http://keeda.stanford.edu/ufs-mount-sync-image


If you run fsck -p on them, fsck is unable to repair the file system,  
while fsck without the -p option is able to.
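
For anyone who wants to repeat the comparison, here is a minimal sketch of
the command sequence, not the exact procedure we used. It assumes a FreeBSD
host, a local copy of one of the images above, and a free md(4) unit; the
helper name fsck_commands and the choice of -y for the non-preen run are
illustrative assumptions. The function only builds the commands so that they
can be reviewed before being run as root.

```python
def fsck_commands(image_path, unit=0):
    """Build (but do not run) commands comparing preen and full fsck.

    Assumptions: FreeBSD with mdconfig(8) and fsck_ffs(8) available, and
    image_path pointing at a *copy* of the image, since fsck may modify it.
    """
    dev = f"/dev/md{unit}"
    return [
        # Attach the image file as a vnode-backed memory disk.
        ["mdconfig", "-a", "-t", "vnode", "-f", image_path, "-u", str(unit)],
        # Preen mode: corrects only the restricted class of inconsistencies.
        ["fsck_ffs", "-p", dev],
        # Full check; -y answers yes to every prompt (an assumption here,
        # the original test simply ran fsck without -p).
        ["fsck_ffs", "-y", dev],
        # Detach the memory disk again.
        ["mdconfig", "-d", "-u", str(unit)],
    ]


# Print the sequence for review instead of executing it.
for cmd in fsck_commands("ufs-umount-image", unit=0):
    print(" ".join(cmd))
```

Comparing the exit status of the two fsck_ffs runs is what shows the
difference described above.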


Can


On Aug 29, 2006, at 9:55 AM, Chuck Swiger wrote:


Can Sar wrote:
[ ... ]
Would you consider it an error if the -p option does not fix  
inconsistencies caused by a simple power failure, without any  
hardware or software corruption?


You're asking an interesting question, but the issue of data  
integrity depends not only on the software which comprises the OS,  
but also on the hardware being used.


In particular, the system depends upon the hard drives to reliably  
report when data being written actually has been; SCSI drives,  
using tagged command queuing, especially in conjunction with a  
battery-backup which ensures the drive stays up long enough to  
flush its write cache even if system power is removed, will tend  
to fare pretty well.


IDE drives, by contrast, have a bad habit of lying about whether  
data has actually been written to the disk itself rather than  
simply making it to the write cache on the drive.  (Such drives  
ignore the ATA "FLUSH CACHE" command, specifically.)


In other words, showing that a filesystem can become inconsistent  
in a fashion that "fsck -p" cannot correct is interesting and a  
concern regardless of the circumstances, but showing it in cases  
where you are using battery-backed drives and/or SCSI rather than  
IDE is a lot more meaningful.  If you are using IDE devices, your  
testing will be more meaningful if you disable the IDE write-cache  
entirely.  Also, you should put your results somewhere, perhaps on  
a webpage with links to the filesystem images and a complete dmesg  
so that the OS version and hardware being used are well documented.


--
-Chuck



___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: UFS2 fsck Question (semantics of -p)

2006-08-29 Thread Chuck Swiger

Can Sar wrote:
[ ... ]
Would you consider it an error if the -p option does not fix 
inconsistencies caused by a simple power failure, without any hardware 
or software corruption?


You're asking an interesting question, but the issue of data integrity depends 
not only on the software which comprises the OS, but also on the hardware 
being used.


In particular, the system depends upon the hard drives to reliably report when 
data being written actually has been; SCSI drives, using tagged command 
queuing, especially in conjunction with a battery-backup which ensures the 
drive stays up long enough to flush its write cache even if system power is 
removed, will tend to fare pretty well.


IDE drives, by contrast, have a bad habit of lying about whether data has 
actually been written to the disk itself rather than simply making it to the 
write cache on the drive.  (Such drives ignore the ATA "FLUSH CACHE" command, 
specifically.)


In other words, showing that a filesystem can become inconsistent in a fashion 
that "fsck -p" cannot correct is interesting and a concern regardless of the 
circumstances, but showing it in cases where you are using battery-backed 
drives and/or SCSI rather than IDE is a lot more meaningful.  If you are using 
IDE devices, your testing will be more meaningful if you disable the IDE 
write-cache entirely.  Also, you should put your results somewhere, perhaps on 
a webpage with links to the filesystem images and a complete dmesg so that the 
OS version and hardware being used are well documented.


--
-Chuck



UFS2 fsck Question (semantics of -p)

2006-08-28 Thread Can Sar

Hi,

I work on a project that automatically finds crash-recovery errors in  
storage systems by running the system dynamically, and we are  
beginning some preliminary checking of FreeBSD. We found an "error"  
where, on power failure, the disk can become corrupted even after an  
operation has returned successfully on a synchronous mount. fsck run  
with the -p option cannot fix this error, while fsck run without it  
does happen to fix it (in this particular test case).  
However, the fsck man page says the following:


"The kernel takes care that only a restricted class of innocuous file
 system inconsistencies can happen unless hardware or software failures
 intervene.  These are limited to the following:

   Unreferenced inodes
   Link counts in inodes too large
   Missing blocks in the free map
   Blocks in the free map also in files
   Counts in the super-block wrong

 These are the only inconsistencies that fsck_ffs with the -p option will
 correct; if it encounters other inconsistencies, it exits with an
 abnormal return status and an automatic reboot will then fail.  For each
 corrected inconsistency one or more lines will be printed identifying the
 file system on which the correction will take place, and the nature of
 the correction.  After successfully correcting a file system, fsck_ffs
 will print the number of files on that file system, the number of used
 and free blocks, and the percentage of fragmentation."
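
The rule in that passage can be modelled as a small sketch. This is only an
illustration of the documented behaviour, not how fsck_ffs is implemented:
the names PREEN_FIXABLE and preen_outcome are hypothetical, and the
"bad directory entry" label in the usage lines is a made-up example of an
inconsistency outside the listed class.

```python
# The five classes of inconsistency that the man page says -p will correct.
PREEN_FIXABLE = frozenset({
    "unreferenced inodes",
    "link counts in inodes too large",
    "missing blocks in the free map",
    "blocks in the free map also in files",
    "counts in the super-block wrong",
})


def preen_outcome(found):
    """Model the documented -p behaviour for a list of inconsistencies.

    Returns ("corrected", []) when everything found is in the restricted
    class, otherwise ("abnormal exit", unexpected) listing what was not.
    """
    unexpected = [f for f in found if f.lower() not in PREEN_FIXABLE]
    if unexpected:
        return ("abnormal exit", unexpected)
    return ("corrected", [])


# Only expected inconsistencies: preen mode would correct them.
print(preen_outcome(["Unreferenced inodes", "Counts in the super-block wrong"]))
# One inconsistency outside the class: preen mode would give up.
print(preen_outcome(["Unreferenced inodes", "bad directory entry"]))
```

In these terms, the question below is whether a plain power failure should
ever be able to produce something outside the restricted class.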

Would you consider it an error if the -p option does not fix  
inconsistencies caused by a simple power failure, without any  
hardware or software corruption?


I have two example ufs2 images of such errors that you can download.

http://keeda.stanford.edu/ufs-umount-image
http://keeda.stanford.edu/ufs-mount-sync-image

Thank you very much for your help,
Can Sar