Re: svn commit: r222537 - in head/sys: kern sys
On Thu, 2 Jun 2011, Kenneth D. Merry wrote: On Wed, Jun 01, 2011 at 20:07:31 +1000, Bruce Evans wrote: On Wed, 1 Jun 2011, Andriy Gapon wrote: on 31/05/2011 20:29 Kenneth D. Merry said the following: + mtx_init(mbp-msg_lock, msgbuf, NULL, MTX_SPIN); Sorry that I didn't gather myself together for a review before this change got actually committed. Do you see any reason not to make this spinlock recursive? It does need to be recursive to avoid deadlock (as I described), but this prevents it actually working as a lock. I think the new code needs it to work as a lock. I see reasons why it must not exist. The message buffer code cannot use any normal locking, and was carefully written to avoid doing so. I am a little bit worried about exotic situations like receiving an NMI in the middle of printing and wanting to print in the NMI context, or similar things that penetrate contexts with disabled interrupts - e.g. Machine Check Exception. Also it's not clear to me if there won't any bigger damage in the situations like those described above. P.S. I have been thinking about fixing the problem in a different fashion, via reserving portions of dmesg buffer for a whole message using CAS: http://lists.freebsd.org/pipermail/freebsd-hackers/2010-April/031535.html Something like this might work. PRINTF_BUFR size should not exist either, especially now that it is much more complicated and broken. Here is some of my old mail about adding (necessarily not normal) locking to to printf. No one replied, so I never finished this :-(. That particular solution doesn't lock the normal kernel printf path, and so won't fix what we're trying to fix. (We've got lots of disks in each system, and when there are various SES events and disks arriving and departing, there are multiple kernel contexts doing printfs simultaneously, and that results in scrambled dmesg output.) But it does. PRINTF_BUFR_SIZE (at least used to) only be applied to vprintf(), but my printf_lock() is applied to all calls to kvprintf() that set TOLOG. (If I missed some, this takes 2 lines each to fix.) The calls in tprintf() and log() have comments saying that this is intentional to serialize msgbuf, though I fear this might not be safe for at least tprintf(), since it may block unboundedly. But even when it blocks unboundedly, nothing really bad happens: the output just becomes very slow due to everything waiting for the timeout until the blockage clears, and may be interleaved. Recursion gives similar behaviour, except it takes unrecursing to clear the blockage. I consider the very slow output to be a feature. Unbounded blockages and recursion are bugs and the slow output makes them more obvious. I think the goals should be: - console output, syslog output, and dmesg output are not scrambled. (i.e. individual kernel printfs make it out atomically, or at least with a certain granularity, like PRINTF_BUFR_SIZE.) Console output includes debugger output. This must be able to break into any otherwise-atomic kernel printf. Possibly similarly for NMIs (if they print. They really shouldn't print, but they at least used to get this horribly wrong from the beginning on i386). Similarly for certain (non maskable) traps. These really shouldn't happen while we're printing, but if they do then they may need to panic and then print for things like machine check exceptions. Debugger output is the easiest way of showing why printf cannot use any normal locking. There is an option (that I don't like or use) which supports debugger output going through normal printf. I think the message buffer is used for this. So message buffer accesses cannout use normal locking either. Apart from this, I don't know any reason why a printf or even a msgbuf write can't wait for a while (a very long while if the output is to a slow device -- don't forget to test with a 50 bps serial console) until any previous printfs complete. My simple serialization gives this if the other printfs are on other CPUs, and possibly if they are on other ithreads (the simple serialization doesn't give up control for simplicity, since it is hard to tell when it is safe to give up control, but preemption while we are spinning might get back to the other printfs). And when the printf is recursive, there is no way back to the other printfs, so we must proceed. We could proceed immediately, but my simple serialization waits in this case too for simplicity. This case should be even rarer than waiting too long for another CPU. - Can be called by any kernel routine (i.e. doesn't sleep) Some complications for tprintf and uprintf (and maybe ordinary printf if it is connected to a pty). I think we already have the necessary complications. Something involving printing only to the message buffer in the non-sleeping part. - Offers a string at a time interface. I don't provide this. So printf's a character or word or 2 at a time may be interleaved. It is
Re: svn commit: r222537 - in head/sys: kern sys
On Fri, Jun 03, 2011 at 20:47:00 +1000, Bruce Evans wrote: On Thu, 2 Jun 2011, Kenneth D. Merry wrote: On Wed, Jun 01, 2011 at 20:07:31 +1000, Bruce Evans wrote: On Wed, 1 Jun 2011, Andriy Gapon wrote: on 31/05/2011 20:29 Kenneth D. Merry said the following: + mtx_init(mbp-msg_lock, msgbuf, NULL, MTX_SPIN); Sorry that I didn't gather myself together for a review before this change got actually committed. Do you see any reason not to make this spinlock recursive? It does need to be recursive to avoid deadlock (as I described), but this prevents it actually working as a lock. I think the new code needs it to work as a lock. Yes, it does. I see reasons why it must not exist. The message buffer code cannot use any normal locking, and was carefully written to avoid doing so. I am a little bit worried about exotic situations like receiving an NMI in the middle of printing and wanting to print in the NMI context, or similar things that penetrate contexts with disabled interrupts - e.g. Machine Check Exception. Also it's not clear to me if there won't any bigger damage in the situations like those described above. P.S. I have been thinking about fixing the problem in a different fashion, via reserving portions of dmesg buffer for a whole message using CAS: http://lists.freebsd.org/pipermail/freebsd-hackers/2010-April/031535.html Something like this might work. PRINTF_BUFR size should not exist either, especially now that it is much more complicated and broken. Here is some of my old mail about adding (necessarily not normal) locking to to printf. No one replied, so I never finished this :-(. That particular solution doesn't lock the normal kernel printf path, and so won't fix what we're trying to fix. (We've got lots of disks in each system, and when there are various SES events and disks arriving and departing, there are multiple kernel contexts doing printfs simultaneously, and that results in scrambled dmesg output.) But it does. PRINTF_BUFR_SIZE (at least used to) only be applied to vprintf(), but my printf_lock() is applied to all calls to kvprintf() that set TOLOG. (If I missed some, this takes 2 lines each to fix.) The calls in tprintf() and log() have comments saying that this is intentional to serialize msgbuf, though I fear this might not be safe for at least tprintf(), since it may block unboundedly. But even when it blocks unboundedly, nothing really bad happens: the output just becomes very slow due to everything waiting for the timeout until the blockage clears, and may be interleaved. Recursion gives similar behaviour, except it takes unrecursing to clear the blockage. I consider the very slow output to be a feature. Unbounded blockages and recursion are bugs and the slow output makes them more obvious. Ahh. It wasn't 100% clear from looking at the patch which calls had the lock applied. In this case at least, it's skipping the lock when the priority is -1. Is this tprintf()? % /* % * Warn that a system table is full. % @@ -198,5 +221,9 @@ % pca.flags = flags; % va_start(ap, fmt); % + if (pri != -1) % + printf_lock(); /* To serialize msgbuf. XXX: long-held lock? */ % kvprintf(fmt, putchar, pca, 10, ap); % + if (pri != -1) % + printf_unlock(); % va_end(ap); % if (sess != NULL) % I think the goals should be: - console output, syslog output, and dmesg output are not scrambled. (i.e. individual kernel printfs make it out atomically, or at least with a certain granularity, like PRINTF_BUFR_SIZE.) Console output includes debugger output. This must be able to break into any otherwise-atomic kernel printf. Possibly similarly for NMIs (if they print. They really shouldn't print, but they at least used to get this horribly wrong from the beginning on i386). Similarly for certain (non maskable) traps. These really shouldn't happen while we're printing, but if they do then they may need to panic and then print for things like machine check exceptions. Debugger output is the easiest way of showing why printf cannot use any normal locking. There is an option (that I don't like or use) which supports debugger output going through normal printf. I think the message buffer is used for this. So message buffer accesses cannout use normal locking either. Apart from this, I don't know any reason why a printf or even a msgbuf write can't wait for a while (a very long while if the output is to a slow device -- don't forget to test with a 50 bps serial console) until any previous printfs complete. My simple serialization gives this if the other printfs are on other CPUs, and possibly if they are on other ithreads (the simple serialization doesn't give up control for simplicity, since it is hard to tell when it is safe to give up control, but preemption while we are spinning might get back to the other printfs). And when
Re: svn commit: r222537 - in head/sys: kern sys
On Fri, 3 Jun 2011, Kenneth D. Merry wrote: On Fri, Jun 03, 2011 at 20:47:00 +1000, Bruce Evans wrote: On Thu, 2 Jun 2011, Kenneth D. Merry wrote: That particular solution doesn't lock the normal kernel printf path, and so won't fix what we're trying to fix. (We've got lots of disks in each system, and when there are various SES events and disks arriving and departing, there are multiple kernel contexts doing printfs simultaneously, and that results in scrambled dmesg output.) But it does. PRINTF_BUFR_SIZE (at least used to) only be applied to vprintf(), but my printf_lock() is applied to all calls to kvprintf() that set TOLOG. (If I missed some, this takes 2 lines each to fix.) ... Ahh. It wasn't 100% clear from looking at the patch which calls had the lock applied. In this case at least, it's skipping the lock when the priority is -1. Is this tprintf()? % /* % * Warn that a system table is full. % @@ -198,5 +221,9 @@ % pca.flags = flags; % va_start(ap, fmt); % + if (pri != -1) % + printf_lock(); /* To serialize msgbuf. XXX: long-held lock? */ % kvprintf(fmt, putchar, pca, 10, ap); % + if (pri != -1) % + printf_unlock(); % va_end(ap); % if (sess != NULL) % Hmm. -1 means `not TOLOG' (see earlier in tprintf()), and I only apply printf_lock() if the output is kernel-only _or_ uses TOLOG (cases that go TOTTY only might need some serialization, but they can probably block and I don't want to apply the low-level printf_lock() to them). Only tprintf() has this magic -1 priority, and this magic never used, at least in the FreeBSD /sys tree and matching '[^a-z]tprintf('. There are just 4 calls to tprintf(), and they are all related to nfs and all use pri = LOG_INFO. So tprintf() always goes TOLOG in practice. uprintf() is relatively popular, and never goes TOLOG. The only obvious difference between uprintf() and tprintf() with pri -1 is that the former users curthread-td_proc while the latter takes a proc pointer. The proc pointer is constructed from a thread pointer passed around by nfs so I can't see what it is, but I bet it is always curthread-td_proc or perhaps NULL. If it is NULL, then tprintf() doesn't print to a tty and reduces to a buggy version of log() if pri != -1 (log() goes TOCONS if !log_open, while the reduced tprintf() always goes TOLOG); if pri == -1, then tprintf() prints nowhere (and hopefully doesn't get confused doing this). ... Apart from this, I don't know any reason why a printf or even a msgbuf write can't wait for a while (a very long while if the output is to a slow device -- don't forget to test with a 50 bps serial console) until any previous printfs complete. My simple serialization gives this if the other printfs are on other CPUs, and possibly if they are on other ithreads (the simple serialization doesn't give up control for simplicity, since it is hard to tell when it is safe to give up control, but preemption while we are spinning might get back to the other printfs). And when the printf is recursive, there is no way back to the other printfs, so we must proceed. We could proceed immediately, but my simple serialization waits in this case too for simplicity. This case should be even rarer than waiting too long for another CPU. Hopefully the delay won't be too bad for most consumers. It'll take some testing to see what happens. Certainly the cases of TOLOG only, and TOCONS to a synchronous console, will be very fast. I'll try to get some time to run some tests with your locking code and see how it works. It will be good to test on a bigger system that actually needs lots of kernel printfs. Bruce ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org
Re: svn commit: r222537 - in head/sys: kern sys
On Wed, Jun 01, 2011 at 20:07:31 +1000, Bruce Evans wrote: On Wed, 1 Jun 2011, Andriy Gapon wrote: on 31/05/2011 20:29 Kenneth D. Merry said the following: + mtx_init(mbp-msg_lock, msgbuf, NULL, MTX_SPIN); Sorry that I didn't gather myself together for a review before this change got actually committed. Do you see any reason not to make this spinlock recursive? I see reasons why it must not exist. The message buffer code cannot use any normal locking, and was carefully written to avoid doing so. I am a little bit worried about exotic situations like receiving an NMI in the middle of printing and wanting to print in the NMI context, or similar things that penetrate contexts with disabled interrupts - e.g. Machine Check Exception. Also it's not clear to me if there won't any bigger damage in the situations like those described above. P.S. I have been thinking about fixing the problem in a different fashion, via reserving portions of dmesg buffer for a whole message using CAS: http://lists.freebsd.org/pipermail/freebsd-hackers/2010-April/031535.html Something like this might work. PRINTF_BUFR size should not exist either, especially now that it is much more complicated and broken. Here is some of my old mail about adding (necessarily not normal) locking to to printf. No one replied, so I never finished this :-(. That particular solution doesn't lock the normal kernel printf path, and so won't fix what we're trying to fix. (We've got lots of disks in each system, and when there are various SES events and disks arriving and departing, there are multiple kernel contexts doing printfs simultaneously, and that results in scrambled dmesg output.) I think the goals should be: - console output, syslog output, and dmesg output are not scrambled. (i.e. individual kernel printfs make it out atomically, or at least with a certain granularity, like PRINTF_BUFR_SIZE.) - Can be called by any kernel routine (i.e. doesn't sleep) - Offers a string at a time interface. - Does the right thing for log messages (priority, etc.) It looks like we should add does not use normal locking to the list as well. As Justin mentioned, we started off down the path of using atomic operations, and then figured out that wouldn't fully work. Then we discussed doing a per-CPU message buffer, with each message tagged with a sequence number, and having the reader re-serialize the messages in the right order. The complexity started to get large enough that we decided that using a spin lock would be a much easier approach. cnputs() already uses a spinlock, so we're no worse off than before from a locking standpoint. We could perhaps make the message buffer mutex recursive and set the MTX_NOWITNESS flag as well, that might make things a little better from a side effect standpoint. If we can come up with a solution that meets the above goals, Justin and I would be happy to help implement it. Ken -- Kenneth Merry k...@freebsd.org ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org
Re: svn commit: r222537 - in head/sys: kern sys
on 31/05/2011 20:29 Kenneth D. Merry said the following: + mtx_init(mbp-msg_lock, msgbuf, NULL, MTX_SPIN); Sorry that I didn't gather myself together for a review before this change got actually committed. Do you see any reason not to make this spinlock recursive? I am a little bit worried about exotic situations like receiving an NMI in the middle of printing and wanting to print in the NMI context, or similar things that penetrate contexts with disabled interrupts - e.g. Machine Check Exception. Also it's not clear to me if there won't any bigger damage in the situations like those described above. P.S. I have been thinking about fixing the problem in a different fashion, via reserving portions of dmesg buffer for a whole message using CAS: http://lists.freebsd.org/pipermail/freebsd-hackers/2010-April/031535.html -- Andriy Gapon ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org
Re: svn commit: r222537 - in head/sys: kern sys
On Wed, 1 Jun 2011, Andriy Gapon wrote: on 31/05/2011 20:29 Kenneth D. Merry said the following: + mtx_init(mbp-msg_lock, msgbuf, NULL, MTX_SPIN); Sorry that I didn't gather myself together for a review before this change got actually committed. Do you see any reason not to make this spinlock recursive? I see reasons why it must not exist. The message buffer code cannot use any normal locking, and was carefully written to avoid doing so. I am a little bit worried about exotic situations like receiving an NMI in the middle of printing and wanting to print in the NMI context, or similar things that penetrate contexts with disabled interrupts - e.g. Machine Check Exception. Also it's not clear to me if there won't any bigger damage in the situations like those described above. P.S. I have been thinking about fixing the problem in a different fashion, via reserving portions of dmesg buffer for a whole message using CAS: http://lists.freebsd.org/pipermail/freebsd-hackers/2010-April/031535.html Something like this might work. PRINTF_BUFR size should not exist either, especially now that it is much more complicated and broken. Here is some of my old mail about adding (necessarily not normal) locking to to printf. No one replied, so I never finished this :-(. %%% From b...@optusnet.com.au Mon Jun 15 16:18:03 2009 +1000 Date: Mon, 15 Jun 2009 16:17:59 +1000 (EST) From: Bruce Evans b...@optusnet.com.au X-X-Sender: b...@besplex.bde.org To: Bruce Evans b...@optusnet.com.au cc: Sam Leffler s...@freebsd.org, Ed Schouten e...@freebsd.org, src-committ...@freebsd.org, svn-src-...@freebsd.org, svn-src-head@FreeBSD.org Subject: Re: svn commit: r194204 - in head/sys: amd64/conf i386/conf In-Reply-To: 20090615114142.b...@besplex.bde.org Message-ID: 20090615153040.r1...@besplex.bde.org References: 200906141801.n5ei1zti056...@svn.freebsd.org 4a356a0f.3050...@freebsd.org 20090615075134.k24...@delplex.bde.org 4a359aa6.7010...@freebsd.org 20090615114142.b...@besplex.bde.org MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Status: RO X-Status: A X-Keywords: X-UID: 913 On Mon, 15 Jun 2009, Bruce Evans wrote: On Sun, 14 Jun 2009, Sam Leffler wrote: I'd love to see someone properly lock printf/console output instead of kludges like PRINTF_BUFR_SIZE. Almost done (for FreeBSD-~5.2): non-production version with lots of debugging cruft: This version for RELENG_7 seems to work as intended. jhb should look at its atomic ops. I only have 1 normal source of interleaved messages -- the messages about waiting for system processes on shutdown on SMP systems -- and this seems to be fixed. The patch applies even without removing PRINTF_BUFR_SIZE or cnputs(), and works, at least if PRINTF_BUFR_SIZE and the per-cpu buffer are not enabled. Note that it serializes accesses to the message buffer too. The locks automatically cover the message buffer if they are applied, and are applied to all cases that set TOLOG, though this might be too dangerous for tprintf() since printing to a user's console shouldn't block other parts of the kernel. printf() to a redirected console is more careful about this -- it extracts messages from the message buffer and can handle long delays provided the message buffer doesn't wrap around. This might be the best method for all cases. Accesses to the message buffer need to be serialized for this to work properly, and serializing ony accesses to the message buffer is easier since there are no hardware delays to worry about -- strict locking with no timeouts will work provided the halting problem is solvable for kvprintf() to the message buffer alone. No attempt is made to buffer messages across printf()s. Neither does PRINTF_BUFR_SIZE. PRINTF_BUFR_SIZE uses an auto buffer so it obviously has to flush at the end of each printf. Always extracting from the message buffer might be the best way to handle this too. Normally you would extract complete lines, but there must be a timeout to handle partial lines. The 1 second timeout here isn't best if the timeout is needed for avoiding deadlock, as may happen when a printf() is interrupted by panic(). panic could reasonably() blow open all locks including printf_lock(), but this is not easy to do without letting other CPUs proceed and/or complicating the usual case. Very slow terminals, say 50 bps, need a timeout of say 16 seconds so that the can print an 80-character line without being interrupted. % Index: subr_prf.c % === % RCS file: /home/ncvs/src/sys/kern/subr_prf.c,v % retrieving revision 1.130.2.1 % diff -u -2 -r1.130.2.1 subr_prf.c % --- subr_prf.c21 Jan 2009 00:26:45 - 1.130.2.1 % +++ subr_prf.c15 Jun 2009 05:32:03 - % @@ -112,4 +112,27 @@ % always_console_output, 0, Always output to console despite TIOCCONS.); % % +static int printf_lockcount; % + % +static void % +printf_lock(void) %
Re: svn commit: r222537 - in head/sys: kern sys
On 6/1/11 12:07 AM, Andriy Gapon wrote: on 31/05/2011 20:29 Kenneth D. Merry said the following: + mtx_init(mbp-msg_lock, msgbuf, NULL, MTX_SPIN); Sorry that I didn't gather myself together for a review before this change got actually committed. Do you see any reason not to make this spinlock recursive? I am a little bit worried about exotic situations like receiving an NMI in the middle of printing and wanting to print in the NMI context, or similar things that penetrate contexts with disabled interrupts - e.g. Machine Check Exception. Also it's not clear to me if there won't any bigger damage in the situations like those described above. P.S. I have been thinking about fixing the problem in a different fashion, via reserving portions of dmesg buffer for a whole message using CAS: http://lists.freebsd.org/pipermail/freebsd-hackers/2010-April/031535.html This is actually where we started (although unfortunately independently since neither Ken nor I saw your previous email). As you say in your message form last April, integrating this type of strategy into the message buffer code is problematic. You need to track the in-progress writes (i.e. holes in the message buffer) so that only fully committed data is printed. I'm sure this can be done with per-cpu data structures and a heavy dose of atomic ops, but after an hour of discussion, we wound up with a ton of complexity and little confidence our design was sound. Even using a spinlock is a lot more complex than it would at first appear. Ken spent almost 2 weeks completing the work and ferreting out all of the corner cases. This bug has been in FreeBSD since SMP was added. While our solution may not be ideal, it is better than what was there before. Committing it seems to have also reignited the discussion on how to implement an even better solution. If there are concrete ideas on how to improve this code further, Ken and I will participate in the work to get them into -current. -- Justin ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org
Re: svn commit: r222537 - in head/sys: kern sys
On Tue, May 31, 2011 at 10:29 AM, Kenneth D. Merry k...@freebsd.org wrote: Author: ken Date: Tue May 31 17:29:58 2011 New Revision: 222537 URL: http://svn.freebsd.org/changeset/base/222537 Log: Fix apparent garbage in the message buffer. While we have had a fix in place (options PRINTF_BUFR_SIZE=128) to fix scrambled console output, the message buffer and syslog were still getting log messages one character at a time. While all of the characters still made it into the log (courtesy of atomic operations), they were often interleaved when there were multiple threads writing to the buffer at the same time. This seems to panic my box with lock msgbuf 0xfe0127e0 already initialized. Unfortunately, though I booted with a fresh CURRENT this morning successfully, both /boot/kernel and /boot/kernel.old give this panic. To add insult to injury, when the kernel drops into the debugger, my keyboard input no longer works so I can't get a stack, etc. So: 1) Is there anything else I can do to help debug this? 2) how can I resurrect this box without a reinstall? I will try to repro on a virtual machine so I have a snapshot to come back to. Thanks, matthew This fixes message buffer accesses to use buffering logic as well, so that strings that are less than PRINTF_BUFR_SIZE will be put into the message buffer atomically. So now dmesg output should look the same as console output. subr_msgbuf.c: Convert most message buffer calls to use a new spin lock instead of atomic variables in some places. Add a new routine, msgbuf_addstr(), that adds a NUL-terminated string to a message buffer. This takes a priority argument, which allows us to eliminate some races (at least in the the string at a time case) that are present in the implementation of msglogchar(). (dangling and lastpri are static variables, and are subject to races when multiple callers are present.) msgbuf_addstr() also allows the caller to request that carriage returns be stripped out of the string. This matches the behavior of msglogchar(), but in testing so far it doesn't appear that any newlines are being stripped out. So the carriage return removal functionality may be a candidate for removal later on if further analysis shows that it isn't necessary. subr_prf.c: Add a new msglogstr() routine that calls msgbuf_logstr(). Rename putcons() to putbuf(). This now handles buffered output to the message log as well as the console. Also, remove the logic in putcons() (now putbuf()) that added a carriage return before a newline. The console path was the only path that needed it, and cnputc() (called by cnputs()) already adds a carriage return. So this duplication resulted in kernel-generated console output lines ending in '\r''\r''\n'. Refactor putchar() to handle the new buffering scheme. Add buffering to log(). Change log_console() to use msglogstr() instead of msglogchar(). Don't add extra newlines by default in log_console(). Hide that behavior behind a tunable/sysctl (kern.log_console_add_linefeed) for those who would like the old behavior. The old behavior led to the insertion of extra newlines for log output for programs that print out a string, and then a trailing newline on a separate write. (This is visible with dmesg -a.) msgbuf.h: Add a prototype for msgbuf_addstr(). Add three new fields to struct msgbuf, msg_needsnl, msg_lastpri and msg_lock. The first two are needed for log message functionality previously handled by msglogchar(). (Which is still active if buffering isn't enabled.) Include sys/lock.h and sys/mutex.h for the new mutex. Reviewed by: gibbs Modified: head/sys/kern/subr_msgbuf.c head/sys/kern/subr_prf.c head/sys/sys/msgbuf.h Modified: head/sys/kern/subr_msgbuf.c
Re: svn commit: r222537 - in head/sys: kern sys
On Tue, May 31, 2011 at 2:00 PM, m...@freebsd.org wrote: On Tue, May 31, 2011 at 10:29 AM, Kenneth D. Merry k...@freebsd.org wrote: Author: ken Date: Tue May 31 17:29:58 2011 New Revision: 222537 URL: http://svn.freebsd.org/changeset/base/222537 Log: Fix apparent garbage in the message buffer. While we have had a fix in place (options PRINTF_BUFR_SIZE=128) to fix scrambled console output, the message buffer and syslog were still getting log messages one character at a time. While all of the characters still made it into the log (courtesy of atomic operations), they were often interleaved when there were multiple threads writing to the buffer at the same time. This seems to panic my box with lock msgbuf 0xfe0127e0 already initialized. Unfortunately, though I booted with a fresh CURRENT this morning successfully, both /boot/kernel and /boot/kernel.old give this panic. To add insult to injury, when the kernel drops into the debugger, my keyboard input no longer works so I can't get a stack, etc. So: 1) Is there anything else I can do to help debug this? 1. sysctl debug.debugger_on_panic=1 ? 2) how can I resurrect this box without a reinstall? 2. Best way is to probably to use the bsdinstall CD, use the LiveCD mode, setup the system as usual (mount /, mount devfs, chroot, mount -a), rewind to an earlier version of svn (shouldn't be too hard if you run /etc/rc.d/network restart from inside the chroot), rebuild the kernel (and potentially world), and install the kernel to the chroot, then exit and reboot (this is a method I picked up from installing Gentoo Linux multiple times, but it should work for FreeBSD as well). This is part of the reason why I'm an avid using of make installkernel INSTKERNNAME=$KERNCONF.$SVN_REVISION , symlink /boot/kernel to the latest one I want to boot, and I only go through every once in a blue moon to reap the kernels I don't need anymore -- I don't know until after a few weeks soak on my workstation whether or not a regression is present in the kernel. I will try to repro on a virtual machine so I have a snapshot to come back to. HTH! -Garrett ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org
Re: svn commit: r222537 - in head/sys: kern sys
On Tue, May 31, 2011 at 14:00:18 -0700, m...@freebsd.org wrote: On Tue, May 31, 2011 at 10:29 AM, Kenneth D. Merry k...@freebsd.org wrote: Author: ken Date: Tue May 31 17:29:58 2011 New Revision: 222537 URL: http://svn.freebsd.org/changeset/base/222537 Log: ?Fix apparent garbage in the message buffer. ?While we have had a fix in place (options PRINTF_BUFR_SIZE=128) to fix ?scrambled console output, the message buffer and syslog were still getting ?log messages one character at a time. ?While all of the characters still ?made it into the log (courtesy of atomic operations), they were often ?interleaved when there were multiple threads writing to the buffer at the ?same time. This seems to panic my box with lock msgbuf 0xfe0127e0 already initialized. Unfortunately, though I booted with a fresh CURRENT this morning successfully, both /boot/kernel and /boot/kernel.old give this panic. To add insult to injury, when the kernel drops into the debugger, my keyboard input no longer works so I can't get a stack, etc. Uh-oh! So: 1) Is there anything else I can do to help debug this? 2) how can I resurrect this box without a reinstall? I will try to repro on a virtual machine so I have a snapshot to come back to. My guess is that this is an issue with the message buffer reinitialization path. lock_init() (called by mtx_init()) has an assert to make sure that the lock is initialized, and that is just a flag check. Since the spin lock is part of the message buffer structure, if it is held over from a previous boot, the LO_INITIALIZED flag may still be set. Try power cycling the machine. If it is an issue with re-initialization, that should clear the memory and allow you to boot. My testing has been with VMs (under Xen), so the reinit path has probably not been tested as fully as it should have been. Sorry about that! As for the debugger, that's another issue altogether. It does work for me, but then again if the spin lock initialization is broken for the message buffer that may affect things. Try a cold boot and see if that helps. If so, I think we can probably just bzero the mutex in msgbuf_reinit() and that will fix things. Ken -- Kenneth Merry k...@freebsd.org ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org
Re: svn commit: r222537 - in head/sys: kern sys
On Tue, May 31, 2011 at 2:46 PM, Kenneth D. Merry k...@freebsd.org wrote: On Tue, May 31, 2011 at 14:00:18 -0700, m...@freebsd.org wrote: On Tue, May 31, 2011 at 10:29 AM, Kenneth D. Merry k...@freebsd.org wrote: Author: ken Date: Tue May 31 17:29:58 2011 New Revision: 222537 URL: http://svn.freebsd.org/changeset/base/222537 Log: ?Fix apparent garbage in the message buffer. ?While we have had a fix in place (options PRINTF_BUFR_SIZE=128) to fix ?scrambled console output, the message buffer and syslog were still getting ?log messages one character at a time. ?While all of the characters still ?made it into the log (courtesy of atomic operations), they were often ?interleaved when there were multiple threads writing to the buffer at the ?same time. This seems to panic my box with lock msgbuf 0xfe0127e0 already initialized. Unfortunately, though I booted with a fresh CURRENT this morning successfully, both /boot/kernel and /boot/kernel.old give this panic. To add insult to injury, when the kernel drops into the debugger, my keyboard input no longer works so I can't get a stack, etc. Uh-oh! So: 1) Is there anything else I can do to help debug this? 2) how can I resurrect this box without a reinstall? I will try to repro on a virtual machine so I have a snapshot to come back to. My guess is that this is an issue with the message buffer reinitialization path. lock_init() (called by mtx_init()) has an assert to make sure that the lock is initialized, and that is just a flag check. Since the spin lock is part of the message buffer structure, if it is held over from a previous boot, the LO_INITIALIZED flag may still be set. Try power cycling the machine. If it is an issue with re-initialization, that should clear the memory and allow you to boot. Hmm, apparently my previous presses of the power button weren't long enough. I let it sit off for 20 seconds and it boots okay now. Thanks, matthew My testing has been with VMs (under Xen), so the reinit path has probably not been tested as fully as it should have been. Sorry about that! As for the debugger, that's another issue altogether. It does work for me, but then again if the spin lock initialization is broken for the message buffer that may affect things. Try a cold boot and see if that helps. If so, I think we can probably just bzero the mutex in msgbuf_reinit() and that will fix things. Ken -- Kenneth Merry k...@freebsd.org ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org
Re: svn commit: r222537 - in head/sys: kern sys
On Tue, May 31, 2011 at 15:02:37 -0700, m...@freebsd.org wrote: On Tue, May 31, 2011 at 2:46 PM, Kenneth D. Merry k...@freebsd.org wrote: On Tue, May 31, 2011 at 14:00:18 -0700, m...@freebsd.org wrote: On Tue, May 31, 2011 at 10:29 AM, Kenneth D. Merry k...@freebsd.org wrote: Author: ken Date: Tue May 31 17:29:58 2011 New Revision: 222537 URL: http://svn.freebsd.org/changeset/base/222537 Log: ?Fix apparent garbage in the message buffer. ?While we have had a fix in place (options PRINTF_BUFR_SIZE=128) to fix ?scrambled console output, the message buffer and syslog were still getting ?log messages one character at a time. ?While all of the characters still ?made it into the log (courtesy of atomic operations), they were often ?interleaved when there were multiple threads writing to the buffer at the ?same time. This seems to panic my box with lock msgbuf 0xfe0127e0 already initialized. Unfortunately, though I booted with a fresh CURRENT this morning successfully, both /boot/kernel and /boot/kernel.old give this panic. To add insult to injury, when the kernel drops into the debugger, my keyboard input no longer works so I can't get a stack, etc. Uh-oh! So: 1) Is there anything else I can do to help debug this? 2) how can I resurrect this box without a reinstall? I will try to repro on a virtual machine so I have a snapshot to come back to. My guess is that this is an issue with the message buffer reinitialization path. ?lock_init() (called by mtx_init()) has an assert to make sure that the lock is initialized, and that is just a flag check. Since the spin lock is part of the message buffer structure, if it is held over from a previous boot, the LO_INITIALIZED flag may still be set. Try power cycling the machine. ?If it is an issue with re-initialization, that should clear the memory and allow you to boot. Hmm, apparently my previous presses of the power button weren't long enough. I let it sit off for 20 seconds and it boots okay now. Okay, so it probably is the re-initialization code. Can you try this patch and see if it survives a warm boot? I also changed the initialization path, so we don't get tripped up by garbage left in memory. Also, does the debugger work now that it has booted successfully? Thanks, Ken -- Kenneth Merry k...@freebsd.org Index: subr_msgbuf.c === --- subr_msgbuf.c (revision 222537) +++ subr_msgbuf.c (working copy) @@ -61,6 +61,7 @@ mbp-msg_magic = MSG_MAGIC; mbp-msg_lastpri = -1; mbp-msg_needsnl = 0; + bzero(mbp-msg_lock, sizeof(mbp-msg_lock)); mtx_init(mbp-msg_lock, msgbuf, NULL, MTX_SPIN); } @@ -95,6 +96,7 @@ mbp-msg_lastpri = -1; /* Assume that the old message buffer didn't end in a newline. */ mbp-msg_needsnl = 1; + bzero(mbp-msg_lock, sizeof(mbp-msg_lock)); mtx_init(mbp-msg_lock, msgbuf, NULL, MTX_SPIN); } ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org
Re: svn commit: r222537 - in head/sys: kern sys
On Tue, May 31, 2011 at 3:11 PM, Kenneth D. Merry k...@freebsd.org wrote: On Tue, May 31, 2011 at 15:02:37 -0700, m...@freebsd.org wrote: On Tue, May 31, 2011 at 2:46 PM, Kenneth D. Merry k...@freebsd.org wrote: On Tue, May 31, 2011 at 14:00:18 -0700, m...@freebsd.org wrote: On Tue, May 31, 2011 at 10:29 AM, Kenneth D. Merry k...@freebsd.org wrote: Author: ken Date: Tue May 31 17:29:58 2011 New Revision: 222537 URL: http://svn.freebsd.org/changeset/base/222537 Log: ?Fix apparent garbage in the message buffer. ?While we have had a fix in place (options PRINTF_BUFR_SIZE=128) to fix ?scrambled console output, the message buffer and syslog were still getting ?log messages one character at a time. ?While all of the characters still ?made it into the log (courtesy of atomic operations), they were often ?interleaved when there were multiple threads writing to the buffer at the ?same time. This seems to panic my box with lock msgbuf 0xfe0127e0 already initialized. Unfortunately, though I booted with a fresh CURRENT this morning successfully, both /boot/kernel and /boot/kernel.old give this panic. To add insult to injury, when the kernel drops into the debugger, my keyboard input no longer works so I can't get a stack, etc. Uh-oh! So: 1) Is there anything else I can do to help debug this? 2) how can I resurrect this box without a reinstall? I will try to repro on a virtual machine so I have a snapshot to come back to. My guess is that this is an issue with the message buffer reinitialization path. ?lock_init() (called by mtx_init()) has an assert to make sure that the lock is initialized, and that is just a flag check. Since the spin lock is part of the message buffer structure, if it is held over from a previous boot, the LO_INITIALIZED flag may still be set. Try power cycling the machine. ?If it is an issue with re-initialization, that should clear the memory and allow you to boot. Hmm, apparently my previous presses of the power button weren't long enough. I let it sit off for 20 seconds and it boots okay now. Okay, so it probably is the re-initialization code. Can you try this patch and see if it survives a warm boot? I also changed the initialization path, so we don't get tripped up by garbage left in memory. This patch survived a warm reboot (shutdown -r now). Also, does the debugger work now that it has booted successfully? The debugger (or rather, my keyboard in the debugger) works on a successful boot. I used sysctl debug.kdb.enter=1 to test it. Thanks, matthew ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org