On Sun, Jan 11, 2004 at 06:51:06AM -0800, Federico Sacerdoti wrote:
> Brooks,
> Sorry for the late reply. I'm glad to get some BSD feedback on my design. I
> am surprised that the OS lies about writing so much, but I guess the errors
> from flushing the dirty pages have no way to propagate to the writer process
> since the two operations happen on different time scales.

This actually affects all POSIX OSes; I'm just most familiar with the
FreeBSD case (and I'd been talking about this issue with someone a
couple of days before your post).  After all, fsync exists to allow
you to say "I really, really meant all the previous writes I did".
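
To make the fsync point concrete, here is a minimal sketch in C.  The
helper name write_durably is made up for this example; the idea is only
that a successful write() usually means the data reached the buffer
cache, and fsync() is where a flush error can get back to the writer.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes from buf to path and report
 * success only after fsync() has returned without error.
 * Returns 0 on success, -1 on failure with errno set.
 */
int
write_durably(const char *path, const void *buf, size_t len)
{
	const char *p = buf;
	int fd, saved;

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd == -1)
		return (-1);

	/* write() may be short or interrupted, so loop until done. */
	while (len > 0) {
		ssize_t n = write(fd, p, len);
		if (n == -1) {
			if (errno == EINTR)
				continue;
			goto fail;
		}
		p += n;
		len -= (size_t)n;
	}

	/* Ask the kernel to flush the dirty pages and report any error. */
	if (fsync(fd) == -1)
		goto fail;
	return (close(fd));

fail:
	saved = errno;		/* close() may clobber errno */
	close(fd);
	errno = saved;
	return (-1);
}
```

A caller that skips the fsync() step can see every write() succeed and
still lose the data later, which is the behaviour under discussion.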

> I like your random block reading idea, although it will pollute the disk
> cache a bit. But I don't think the pollution will be significant enough to
> matter. The privileged point is valid, although I agree that it is not a
> huge concern.

There's no way the pollution is going to matter.  If you read a 4K block
every minute, you're reading less than 6MB/day (4096 bytes * 1440 minutes
is about 5.9MB).  If you notice this, you've got one seriously broken disk
subsystem (maybe accessing network block devices over a modem :-).

> For "reading random bits" we need to
> 1. Identify the disk to check (perhaps check all of them, except foreign
> net-attached ones)
> 2. Find the disk size.

This part is going to have to be machine dependent and probably pretty
hairy on some platforms.  We'll want to do the opens in the MD init code
and cache them so we can drop privilege before we start doing things.
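
Roughly what I have in mind for the open-then-drop sequence, as a
sketch only.  The names open_disks, drop_privilege, disk_fds and
MAX_DISKS are invented here, and the list of device paths is assumed to
come from the machine-dependent code, since finding it is the hairy part.

```c
#include <sys/types.h>
#include <fcntl.h>
#include <grp.h>
#include <unistd.h>

#define MAX_DISKS 16		/* arbitrary limit for this sketch */

static int disk_fds[MAX_DISKS];	/* cached descriptors, opened once */
static int ndisks;

/*
 * Open each device read-only while we still have privilege.
 * Devices that fail to open are skipped.  Returns the number cached.
 */
int
open_disks(const char *const paths[], int npaths)
{
	ndisks = 0;
	for (int i = 0; i < npaths && ndisks < MAX_DISKS; i++) {
		int fd = open(paths[i], O_RDONLY);
		if (fd != -1)
			disk_fds[ndisks++] = fd;
	}
	return (ndisks);
}

/*
 * Permanently give up root in favour of an unprivileged uid/gid.
 * Groups are dropped before the uid, because once the uid is gone
 * we no longer have the right to change groups.
 * Returns 0 on success (or if we were not root), -1 on failure.
 */
int
drop_privilege(uid_t uid, gid_t gid)
{
	if (geteuid() != 0)
		return (0);	/* nothing to drop */
	if (uid == 0)
		return (-1);	/* caller must pass a non-root uid */
	if (setgroups(1, &gid) == -1)
		return (-1);
	if (setgid(gid) == -1)
		return (-1);
	if (setuid(uid) == -1)
		return (-1);
	/* Paranoia: getting root back must now fail. */
	if (setuid(0) != -1)
		return (-1);
	return (0);
}
```

The descriptors stay usable after the drop, so the periodic reads can
run unprivileged.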

> 3. Seek to some random location R, 0<R<disksize - L
> 4. Read L bytes (some small number).

I hope we can make this part machine independent rather than
reimplementing it on each platform since it should be simple and clean.
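
Something like the following might do for the machine-independent part.
It is a sketch under assumptions: probe_disk and PROBE_LEN are names I
made up, the descriptor is assumed to be already open, and the disk size
is assumed to have been found by the machine-dependent code.  I've
aligned R to a multiple of L since raw devices often insist on
sector-aligned reads; note that random() only covers about 2^31 blocks.

```c
#include <sys/types.h>
#include <stdlib.h>
#include <unistd.h>

#define PROBE_LEN 4096		/* L: bytes read per probe */

/*
 * Read PROBE_LEN bytes from a random, PROBE_LEN-aligned offset R
 * with 0 <= R <= disksize - PROBE_LEN.
 * Returns 0 if the full read succeeded, -1 otherwise.
 */
int
probe_disk(int fd, off_t disksize)
{
	char buf[PROBE_LEN];
	off_t nblocks, r;
	ssize_t n;

	if (disksize < PROBE_LEN)
		return (-1);	/* too small to probe */

	/* Pick one of the whole PROBE_LEN-sized blocks at random. */
	nblocks = disksize / PROBE_LEN;
	r = (off_t)(random() % nblocks) * PROBE_LEN;

	/* pread() leaves the file offset alone, so no lseek() needed. */
	n = pread(fd, buf, PROBE_LEN, r);
	return (n == PROBE_LEN ? 0 : -1);
}
```

A short read or an error both count as a failed probe here; whether a
single failure should mark the disk as lost is a separate policy
question.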

> I think this would be easy to test with hot-swappable drives. I may have
> some time to implement this in the spring.

It should work pretty well in that mode.  I suspect testing is likely to
expose all sorts of interesting edge cases in the kernel, but we should
be able to detect lost disks.  We'll probably want to have some sort of
throttling code where it stops trying to test the disk after it's failed
too many times, to avoid filling the buffer cache to the point that we
can't do anything.
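
The throttling could be as simple as a per-disk failure counter.  A
sketch, with an invented struct disk_state and an arbitrary limit of 5
consecutive failures:

```c
#define MAX_FAILURES 5		/* arbitrary cutoff for this sketch */

struct disk_state {
	int failures;		/* consecutive failed probes */
};

/* Nonzero if the disk has not yet failed too many times in a row. */
int
should_probe(const struct disk_state *ds)
{
	return (ds->failures < MAX_FAILURES);
}

/*
 * Record the outcome of one probe.  A success resets the counter;
 * a failure bumps it, saturating at MAX_FAILURES.
 */
void
record_result(struct disk_state *ds, int ok)
{
	if (ok)
		ds->failures = 0;
	else if (ds->failures < MAX_FAILURES)
		ds->failures++;
}
```

Once should_probe() returns 0 the disk is never read again, so nothing
in this sketch brings it back.  Something outside it (an operator, or a
hot-swap notification where the platform has one) would have to reset
the counter.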

-- Brooks

-- 
Any statement of the form "X is the one, true Y" is FALSE.
PGP fingerprint 655D 519C 26A7 82E7 2529  9BF0 5D8E 8BE9 F238 1AD4
