On Sat, 24 Apr 1999, Francisco Jose Montilla wrote:

>       - Doesn't SCSI controllers use parity? (Although you have to
> enable it, of course)

Yes, every modern SCSI controller I've seen uses parity.

More importantly, I think it's unlikely that something like a SCSI
controller is going to fail in such a way that it's still functional
enough to corrupt data.

It's far more likely that it will just completely die and either not be
detected by the driver or not be able to detect any devices on the SCSI
bus. At least, every single failed SCSI card I've ever seen failed in one
of these two ways.

As far as the controller going bad weeks before, err, I'm familiar with
BSDI and I can tell you it will print errors to the screen (just like
Linux) if it finds errors with the filesystem or unreadable blocks. If he
was able to have a system up for weeks and not notice these errors in the
logs or on the screen, well...

Another thing to keep in mind is that SCSI is a high-level protocol. I can
easily see how something like a failing floppy controller could cause
damage to data, but a failing SCSI controller? Assuming it corrupted the
data sent to the drive, it would most likely corrupt the SCSI commands as
well, so a write command may well end up being something completely
undefined that the SCSI hard drive would ignore. It would also be unable
to read anything from the drive, since reading requires sending commands
to the drive, and those would get corrupted too.
  
> nonetheless, although we use only one, shouldn't data corruption be
> detected by the controller parity? One step further, how will the soft

That's my understanding, though parity only really protects you against
problems on the cable. If the data and commands are being corrupted
upstream of the parity logic (say, by a faulty driver), then that logic
will generate valid parity for the corrupted data and the drive will
accept it.
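
To make the parity point concrete, here is a rough sketch in Python. It
assumes one odd-parity bit per byte, which is how the parallel SCSI bus
does it; the function names and the 0x5A test byte are made up for
illustration and are not taken from any real driver.

```python
def odd_parity_bit(byte: int) -> int:
    """Return the bit that makes the total count of 1 bits odd."""
    ones = bin(byte & 0xFF).count("1")
    return 0 if ones % 2 == 1 else 1


def parity_ok(byte: int, parity: int) -> bool:
    """Receiver-side check: data bits plus parity bit must have odd weight."""
    return (bin(byte & 0xFF).count("1") + parity) % 2 == 1


original = 0x5A  # arbitrary example byte

# Case 1: a bit flips on the cable, AFTER parity was generated.
parity = odd_parity_bit(original)
cable_ok = parity_ok(original ^ 0x01, parity)  # receiver notices the flip

# Case 2: the byte is corrupted upstream, BEFORE parity is generated.
corrupted = original ^ 0x01
upstream_ok = parity_ok(corrupted, odd_parity_bit(corrupted))  # looks fine

print("cable flip passes check:    ", cable_ok)
print("upstream flip passes check: ", upstream_ok)
```

The first case is caught and the second sails through, which is all
parity can be expected to do.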

Given what I said above, it should be quite obvious that if a controller
were on the verge of failing, it would most likely start returning read
errors well before any garbage commands capable of doing damage to the
data on the drive were sent.

This being the case, one thing you can do is use tune2fs (the -e option)
to change the errors behavior to "remount-ro", which will cause the kernel
to remount the partition read-only if any filesystem errors are detected.
The other option is "panic", which has the kernel panic when filesystem
errors are detected.
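
For what it's worth, a minimal sketch of that tune2fs call, wrapped in
Python so the accepted values are spelled out. The -e option and its three
values (continue, remount-ro, panic) come from the tune2fs man page; the
helper names and the /dev/sdXN device are placeholders of mine, so
substitute your own partition.

```python
import subprocess

# Error behaviors accepted by "tune2fs -e", per its man page.
VALID_BEHAVIORS = ("continue", "remount-ro", "panic")


def tune2fs_errors_command(device: str, behavior: str = "remount-ro") -> list:
    """Build the argv that sets the on-error behavior of an ext2 filesystem."""
    if behavior not in VALID_BEHAVIORS:
        raise ValueError("unknown error behavior: %r" % behavior)
    return ["tune2fs", "-e", behavior, device]


def set_error_behavior(device: str, behavior: str = "remount-ro") -> None:
    """Actually run tune2fs (needs root). Not called in this sketch."""
    subprocess.run(tune2fs_errors_command(device, behavior), check=True)


# /dev/sdXN is a placeholder, not a real device name.
cmd = tune2fs_errors_command("/dev/sdXN")
print(" ".join(cmd))
```

Nothing here touches a disk unless you call set_error_behavior() yourself.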

And finally, keep an eye on your logs and the output of dmesg. Any
SCSI/filesystem errors will be reported there. I don't buy the "silent
corruption" theory except on OSes that are too primitive to log
filesystem/SCSI errors, like DOS.
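
If you'd rather not eyeball the logs, something along these lines could do
a first pass. It is only a sketch: the keyword pattern is my guess at what
is worth flagging, and the sample lines are invented for the example, not
real kernel output.

```python
import re

# Assumed keywords; adjust to whatever your kernel and drivers really print.
PATTERN = re.compile(r"scsi|i/o error|ext2-fs error", re.IGNORECASE)


def suspicious_lines(lines):
    """Return the lines that mention SCSI or filesystem trouble."""
    return [line for line in lines if PATTERN.search(line)]


# Invented sample input, for illustration only.
sample = [
    "made-up line: eth0 link up",
    "made-up line: scsi0 reported a timeout",
    "made-up line: EXT2-fs error on some device",
]

hits = suspicious_lines(sample)
for line in hits:
    print(line)
```

To run it against the real thing, feed it the lines of dmesg output, e.g.
subprocess.run(["dmesg"], capture_output=True, text=True).stdout.splitlines().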


>       I agree completely with the first statement. But the second sounds
> somewhat odd to me. I can hotadd or hotremove a disk on linux with sw RAID
> and a non-hot swappable capable controller, maybe this is another feature
> of sw RAID over hw RAID? 

About the only issue I can think of is that of electrical problems when
hot adding/removing a SCSI hard drive from the bus. There are companies
who make removable drive brackets that add the circuitry required to make
this safe.

Brian
