[...]
> Well, minimize. Thing is, without a recovery/rollback/checkpointing
> mechanism, you can't really know whether you've lost something, and/or
> if you've lost something critical. It's like returning from holiday and
> finding your front door broken. You look inside and nothing _seems_
> amiss. But then, do you remember where Granny left her money jar?
> 
> I'd think "rely on sync" is the wrong way to put it. It's more like
> uttering a prayer - calms the soul, and won't do no harm, and there are
> believers who will strongly claim it did them good. You don't really
> _know_, though.
> 
> But there's nothing wrong with a good belief, mind you :)

I'd say that if it flushes some buffers that wouldn't otherwise have been
flushed, then one is indeed losing less than one otherwise would. That's
not to say (unless UFS logging is enabled, or the ZFS intent log hasn't
been disabled) that there's any confidence that UFS will even be
consistent, let alone that any particular state of transactions will be
consistent; and forget about any sort of logical consistency at all from
the application's point of view. I've got all that. Still, a best effort
is no worse than simply abandoning whatever data may be in unflushed
buffers. It may or may not do a darn bit of good, but in the case in
question there was a good chance that it would (and indeed it did: it
looked like some buffers did get flushed, and everything appeared to be
OK afterward).

> My experience there is rather that if the 'syncing filesystems ...'
> part works, then the dump will not hang either. They tend to go through
> the same I/O drivers/devices. In fact, the 'syncing ...' part accesses
> more I/O devs than the 'dumping ...' part does (the former goes for
> everything unflushed, while the latter only attempts to get at the dump
> device). We do have some service documents explaining how to get a dump
> if the box hangs during 'syncing filesystems ...' - but none to my
> knowledge that do the opposite.
> The time the dump takes, though, is known to be "high".

Right, it wasn't a question of sync or dump hanging; they both ran, except
that (a) the dump took a good 10 minutes and, under the circumstances,
wasn't needed anyway, since the problem was more or less understood; and
(b) since VM was exhausted (and RAM was probably larger than swap), the
dump was incomplete in any case.

The problem is that one can't know in advance whether, if a situation
arises where one can't log in even on the console, a dump would be worth
taking or not. Therefore, the system is probably set up to default to
dumping to a primary swap partition, since the dump might be useful. Only
at the time one has to force a "panic 0" does one know whether or not the
dump would actually be worth the time it takes.

Another alternative would be if dumps were always interruptible with
another L1-A (or break). That might be simpler than a sync option plus
kernel support for it, and would probably help in that situation on x86
as well. But if anything, it's been a while since I've run into a case
where they _were_ interruptible (years, or perhaps it was just a box with
a Sun non-USB keyboard; I don't know whether it's the OS version or the
hardware configuration that distinguishes systems where dumps can be
interrupted from those where they can't).
 
 
This message posted from opensolaris.org
_______________________________________________
opensolaris-discuss mailing list
opensolaris-discuss@opensolaris.org
