On Mon, Dec 22, 2003 at 08:26:10PM -0800, Federico Sacerdoti wrote:
> 
> This method takes advantage of the fact that most UNIX filesystems 
> flush dirty pages to disk at least once a minute. The idea is that 
> eventually we will detect that the disk cannot record new information 
> and tell the cluster about it.

I'm pretty sure this doesn't provide the semantics you want.  At least
in FreeBSD, if you can't write to the disk because it goes away,
you typically are in a state where the OS has already lied to the
application about writing the data so there is no reasonable way to tell
it that the write failed.  Since you'll be repeatedly writing small
amounts of data to the same place, the buffer cache won't run out of
space so you won't even get that notification.  You might get some
mileage from unlinking the file and re-creating it, at least in most
OSes.  Even that won't necessarily work in FreeBSD due to softupdates.
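A minimal sketch of the unlink-and-recreate probe described above, in Python for portability (the probe path and function name are made up for illustration). As noted, even an fsync()'d write may not surface the error on every OS, so treat this as best-effort:

```python
import os

def probe_disk(path="/var/tmp/disk-probe"):
    """Write-probe a filesystem by creating a fresh file each time.

    Unlinking and re-creating the file forces new metadata (and often
    new data blocks) to be allocated, which is more likely to surface
    an I/O error than rewriting the same cached page over and over.
    Returns True if the probe appears to succeed, False on any OSError.
    """
    try:
        if os.path.exists(path):
            os.unlink(path)
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        try:
            os.write(fd, b"disk probe\n")
            os.fsync(fd)  # ask the OS to push the data to the device now
        finally:
            os.close(fd)
        return True
    except OSError:
        return False
```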

I wonder if opening the entire disk or one large partition on it and
reading random blocks every polling interval wouldn't be a better
choice.  Assuming the disk is much larger than memory, you'll hit an
uncached block pretty fast if the disk goes away.  The downside of this
is that you have to have the disk open for read which makes the program
somewhat more privileged, but that's probably not a huge concern in most
environments.  In any case, you can still drop privileges so that raw
read access is all you would get, rendering an exploit that took
advantage of this descriptor very difficult.  This method won't protect
you from disks that go insane and still pretend to read and write data
without actually doing so correctly, but I don't think that's a very
common failure mode.
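The random-block read probe might look like the following sketch (again Python for illustration; the device path, size argument, and function name are assumptions, and in practice you'd open the device once at startup before dropping privileges):

```python
import os
import random

BLOCK_SIZE = 512  # typical sector size; adjust for the actual device

def probe_raw_disk(device, disk_size, block_size=BLOCK_SIZE):
    """Read one randomly chosen block from a raw disk device.

    If the disk is much larger than RAM, a random block is unlikely to
    be in the buffer cache, so a dead disk shows up as a read error
    within a few polling intervals rather than as a cache hit.
    Returns True if a full block was read, False otherwise.
    """
    try:
        fd = os.open(device, os.O_RDONLY)
        try:
            nblocks = disk_size // block_size
            offset = random.randrange(nblocks) * block_size
            os.lseek(fd, offset, os.SEEK_SET)
            data = os.read(fd, block_size)
            return len(data) == block_size
        finally:
            os.close(fd)
    except OSError:
        return False
```

Calling this once per polling interval against, say, an entire disk or one large partition gives the detection behavior described above, at the cost of one uncached read per interval.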

-- Brooks

-- 
Any statement of the form "X is the one, true Y" is FALSE.
PGP fingerprint 655D 519C 26A7 82E7 2529  9BF0 5D8E 8BE9 F238 1AD4
