On Thu, Nov 13, 2008 at 02:45:14AM -0800, Jeremy Chadwick wrote:
> On Thu, Nov 13, 2008 at 12:26:42PM +0200, Kostik Belousov wrote:
> > On Wed, Nov 12, 2008 at 08:42:00PM -0800, Jeremy Chadwick wrote:
> > > On Thu, Nov 13, 2008 at 12:41:02AM +0000, Tim Bishop wrote:
> > > > On Wed, Nov 12, 2008 at 09:47:35PM +0200, Kostik Belousov wrote:
> > > > > On Wed, Nov 12, 2008 at 05:58:26PM +0000, Tim Bishop wrote:
> > > > > > I've been playing around with snapshots lately but I've got a
> > > > > > problem on one of my servers running 7-STABLE amd64:
> > > > > >
> > > > > > FreeBSD paladin 7.1-PRERELEASE FreeBSD 7.1-PRERELEASE #8: Mon Nov
> > > > > > 10 20:49:51 GMT 2008 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/PALADIN
> > > > > > amd64
> > > > > >
> > > > > > I run the mksnap_ffs command to take the snapshot and some time
> > > > > > later the system completely freezes up:
> > > > > >
> > > > > > paladin# cd /u2/.snap/
> > > > > > paladin# mksnap_ffs /u2 test.1
> > > > > >
> > > > > > It only happens on this one filesystem, though, which might be to do
> > > > > > with its size. It's not over the 2TB marker, but it's pretty close.
> > > > > > It's also backed by a hardware RAID system, although a smaller
> > > > > > filesystem on the same RAID has no issues.
> > > > > >
> > > > > > Filesystem   1K-blocks      Used     Avail Capacity  Mounted on
> > > > > > /dev/da0s1a 2078881084 921821396 990749202      48%  /u2
> > > > > >
> > > > > > To clarify "completely freezes up": unresponsive to all services
> > > > > > over the network, except ping. On the console I can switch between
> > > > > > the ttys, but none of them respond. The only way out is to hit the
> > > > > > reset button.
> > > > > >
> > > > > You need to provide information described in the
> > > > > http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html
> > > > > and especially
> > > > > http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
> > > >
> > > > Ok, I've done that, and removed the patch that seemed to fix things.
> > > >
> > > > The first thing I notice after doing this on the console is that I can
> > > > still ctrl+t the process:
> > > >
> > > > load: 0.14 cmd: mksnap_ffs 2603 [newbuf] 0.00u 10.75s 0% 1160k
> > > >
> > > > But the top and ps I left running on other ttys have all stopped
> > > > responding.
> > >
> > > Then in my book, the patch didn't fix anything. :-) The system is
> > > still "deadlocking"; snapshot generation **should not** wedge the system
> > > hard like this.
> > You systematically mix two completely different issues:
> > - first one is the _deadlock_ experienced by Tim;
>
> Re-read what he wrote. Quote:
>
> "Ok, I've done that, and removed the patch that seemed to fix things.
>
> The first thing I notice after doing this on the console is that I can
> still ctrl+t the process:
>
> load: 0.14 cmd: mksnap_ffs 2603 [newbuf] 0.00u 10.75s 0% 1160k
>
> But the top and ps I left running on other ttys have all stopped
> responding."
>
> If he can press Control-T, it means SIGINFO can be sent to the
> mksnap_ffs process, and the process responds with that information. So,
> the system is not deadlocked -- meaning, I believe what he experiences
> is what others experience (the system becomes completely unusable during
> mksnap_ffs running, but DOES NOT hang or lock up, it just becomes so
> god-awful slow that processes on the machine literally sit and spin for
> minutes at a time).
Unless NOKERNINFO is set in the local flags (c_lflag) of the controlling terminal's termios, the kernel prints a one-line summary like the one shown above. This is done from the tty line discipline's input handler (or whatever it is in the new tty code). No cooperation from the process is required, so that line does not show that mksnap_ffs itself responded. On the other hand, actually delivering SIGINFO and getting output from a process-installed handler does require the process to be either executing in usermode or sleeping interruptibly.