Sometimes a large system, despite precautions (or in the absence of them),
runs out of resources (mainly virtual memory) to the degree that no useful
progress is being made: that is, one can't even log in and kill the
hogging processes.

(At least on SPARC) the usual workaround would be to break to the boot PROM
and sync; however, this invariably causes a crash dump to be taken.  In the
common case where the dump device is also the swap device, the dump might
well not be complete anyway, since swap was already full.  And one may
well know from experience what the likely culprits are.

And on a large system, the crash dump can take longer to complete than
the subsequent reboot!

So it would be really nice to have an option to force a sync and reboot
_without_ taking a crash dump.  Passing the option would require
boot PROM support (or else some obscure use of an existing command to
set something the kernel could readily detect); honoring it, that is,
_not_ taking a crash dump that would otherwise be taken, would
require kernel support.

Granted, the situation shouldn't come up all that often.  But it has been
known to do so (and one of the culprits tends to be a 3rd-party system
monitoring application that I won't name; ironic, IMO, that something
intended to warn of problems instead causes them); and when it does,
reducing the downtime by the 5-10 minutes that a really long crash
dump can take would IMO be quite helpful.

Am I nuts, or is that a generic enough issue that someone else might
agree that it's interesting?
 
 
This message posted from opensolaris.org
_______________________________________________
opensolaris-discuss mailing list
opensolaris-discuss@opensolaris.org
