On Fri, 17 Aug 2012 14:19:07 -0600, Gary Aitken wrote:
> 1. It appears to me that the file system (ufs) is not writing
>    stuff out when things are idle. If I do a sync manually and
>    leave the machine idle and it crashes later, it comes up clean.
>    If I don't do a sync manually and it crashes later, it often
>    comes up needing fsck. Is there a way to configure the filesystem
>    to cache but still write cached stuff at low priority?

Note that even if the OS orders a data write, it's up to the disk
driver to actually tell the disk to do it, and the disk then still
_has_ to carry it out. These components of the "task line" are not
coupled in time, even though one would assume that everything
happens immediately.

On a somewhat idle system, you could keep a process (e. g. top -S)
running to check for system processes that could be responsible for
writes (or missing writes).

> 2. When my machine hung (could not rlogin or ping), I powered
>    off and rebooted.

Does the machine have a "soft power button", and is it configured
to issue a "shutdown -p now" (which is quite common)? When you have
access to the machine, try that. Even if the machine does not accept
network logins, this mechanism might still work.

> Reboot did a deferred fsck.

Is this intended? Personally, I'd rather wait some time to boot into
a fully checked file system environment than deal with the uncertain
situation of snapshots and background FS check activity. In the worst
case, I want to be prompted by fsck if a major defect has been found
that requires administrator attention.

Put

	background_fsck="NO"

into /etc/rc.conf to get this behaviour. Note that as long as fsck
is running, you can't enter any interactive commands, and it will
happen _prior_ to allowing any network connections. Also note that
this is in single user mode, so you can't switch VTs.

> After it booted I logged in, and also logged in on another system.
> On the remote system I could do a ping but rlogin returned
> "connection reset by peer", even though I could log in locally.

Does rlogin work when you "give the system some time to recover"?

> I presume that is because the background fscks were not complete?

Possible. Background fsck is uncertain per se, so for diagnostics
it's better to leave it aside and use the maybe "less comfortable"
foreground method. This is easy when you have local access to the
machine in question.

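For reference, the rc.conf change mentioned above might look like the
fragment below. It is only a sketch: background_fsck is the setting
named above, while fsck_y_enable and background_fsck_delay are further
rc.conf(5) variables added here as an assumption and shown with what
should be their default values. Please check rc.conf(5) on your release
before relying on them.

```sh
# /etc/rc.conf (fragment)

# Check file systems in the foreground during boot instead of
# deferring the check to a background fsck after startup.
background_fsck="NO"

# Assumed default: do not answer "yes" automatically when the
# initial preen fails, so the administrator gets to decide.
fsck_y_enable="NO"

# Assumed default: seconds to wait before a background fsck starts.
# Only relevant when background_fsck is set to "YES".
background_fsck_delay="60"
```

With background_fsck="NO" the boot takes longer after a crash, but the
file systems have been checked before any logins are possible.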
> I then did a
>   ps ax | grep fsck
> and saw only the "logger" process for the deferred fsck's.
> I did a
>   man logger
> which appeared to hang -- no output. I'm guessing because it needed
> the filesystems which hadn't yet fsck'd.

Just a guess: Maybe you're experiencing a file system defect, and
fsck, even though running in the background, needs some input? I'm
not really sure about this, because I'm _intentionally_ not using
fsck that way.

> I then attempted to switch consoles using
>   <alt>fn
> but could not.

That would imply you're still stuck in single user mode (SUM). A
strange situation, given that it appears that you have fsck running
in the background.

> I then attempted to kill the man logger process using ^C with no success.

A waiting or hanging process? A process that is blocked on disk I/O
usually won't react to ^C until that I/O completes.

> Can someone shed light on the above sequence of events? It's highly
> likely some of them occurred before the 60 second delay for fsck
> timed out, but I'd like to understand what the heck is going on.

Try to construct a more _defined_ situation for further diagnostics.
You could also boot the system into SUM (use "boot -s") and then
run fsck manually, just to make sure your disks are fine.

-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...
_______________________________________________
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "freebsd-questions-unsubscr...@freebsd.org"