Re: svn commit: r222537 - in head/sys: kern sys

2011-06-03 Thread Bruce Evans

On Thu, 2 Jun 2011, Kenneth D. Merry wrote:


On Wed, Jun 01, 2011 at 20:07:31 +1000, Bruce Evans wrote:

On Wed, 1 Jun 2011, Andriy Gapon wrote:


on 31/05/2011 20:29 Kenneth D. Merry said the following:

+   mtx_init(mbp-msg_lock, msgbuf, NULL, MTX_SPIN);


Sorry that I didn't gather myself together for a review before this change
got
actually committed.
Do you see any reason not to make this spinlock recursive?


It does need to be recursive to avoid deadlock (as I described), but this
prevents it actually working as a lock.  I think the new code needs it to
work as a lock.


I see reasons why it must not exist.  The message buffer code cannot use
any normal locking, and was carefully written to avoid doing so.


I am a little bit worried about exotic situations like receiving an NMI
in the
middle of printing and wanting to print in the NMI context, or similar
things
that penetrate contexts with disabled interrupts - e.g. Machine Check
Exception.
Also it's not clear to me if there won't any bigger damage in the
situations
like those described above.

P.S. I have been thinking about fixing the problem in a different fashion,
via
reserving portions of dmesg buffer for a whole message using CAS:
http://lists.freebsd.org/pipermail/freebsd-hackers/2010-April/031535.html


Something like this might work.

PRINTF_BUFR size should not exist either, especially now that it is
much more complicated and broken.  Here is some of my old mail about
adding (necessarily not normal) locking to to printf.  No one replied,
so I never finished this :-(.


That particular solution doesn't lock the normal kernel printf path, and so
won't fix what we're trying to fix.  (We've got lots of disks in each
system, and when there are various SES events and disks arriving and
departing, there are multiple kernel contexts doing printfs simultaneously,
and that results in scrambled dmesg output.)


But it does.  PRINTF_BUFR_SIZE (at least used to) only be applied to
vprintf(), but my printf_lock() is applied to all calls to kvprintf()
that set TOLOG.  (If I missed some, this takes 2 lines each to fix.)
The calls in tprintf() and log() have comments saying that this is
intentional to serialize msgbuf, though I fear this might not be safe
for at least tprintf(), since it may block unboundedly.  But even when
it blocks unboundedly, nothing really bad happens: the output just
becomes very slow due to everything waiting for the timeout until the
blockage clears, and may be interleaved.  Recursion gives similar
behaviour, except it takes unrecursing to clear the blockage.  I
consider the very slow output to be a feature.  Unbounded blockages
and recursion are bugs and the slow output makes them more obvious.


I think the goals should be:

- console output, syslog output, and dmesg output are not scrambled.
  (i.e. individual kernel printfs make it out atomically, or at least
  with a certain granularity, like PRINTF_BUFR_SIZE.)


Console output includes debugger output.  This must be able to break into
any otherwise-atomic kernel printf.  Possibly similarly for NMIs (if they
print.  They really shouldn't print, but they at least used to get this
horribly wrong from the beginning on i386).  Similarly for certain (non
maskable) traps.  These really shouldn't happen while we're printing, but
if they do then they may need to panic and then print for things like
machine check exceptions.

Debugger output is the easiest way of showing why printf cannot use
any normal locking.  There is an option (that I don't like or use)
which supports debugger output going through normal printf.  I think
the message buffer is used for this.  So message buffer accesses cannout
use normal locking either.

Apart from this, I don't know any reason why a printf or even a msgbuf
write can't wait for a while (a very long while if the output is to a
slow device -- don't forget to test with a 50 bps serial console) until
any previous printfs complete.  My simple serialization gives this if
the other printfs are on other CPUs, and possibly if they are on other
ithreads (the simple serialization doesn't give up control for simplicity,
since it is hard to tell when it is safe to give up control, but
preemption while we are spinning might get back to the other printfs).
And when the printf is recursive, there is no way back to the other
printfs, so we must proceed.  We could proceed immediately, but my
simple serialization waits in this case too for simplicity.  This case
should be even rarer than waiting too long for another CPU.


- Can be called by any kernel routine (i.e. doesn't sleep)


Some complications for tprintf and uprintf (and maybe ordinary printf if
it is connected to a pty).  I think we already have the necessary
complications.  Something involving printing only to the message buffer
in the non-sleeping part.


- Offers a string at a time interface.


I don't provide this.  So printf's a character or word or 2 at a time
may be interleaved.  It is 

Re: svn commit: r222537 - in head/sys: kern sys

2011-06-03 Thread Kenneth D. Merry
On Fri, Jun 03, 2011 at 20:47:00 +1000, Bruce Evans wrote:
 On Thu, 2 Jun 2011, Kenneth D. Merry wrote:
 
 On Wed, Jun 01, 2011 at 20:07:31 +1000, Bruce Evans wrote:
 On Wed, 1 Jun 2011, Andriy Gapon wrote:
 
 on 31/05/2011 20:29 Kenneth D. Merry said the following:
 + mtx_init(mbp-msg_lock, msgbuf, NULL, MTX_SPIN);
 
 Sorry that I didn't gather myself together for a review before this 
 change
 got
 actually committed.
 Do you see any reason not to make this spinlock recursive?
 
 It does need to be recursive to avoid deadlock (as I described), but this
 prevents it actually working as a lock.  I think the new code needs it to
 work as a lock.

Yes, it does.

 I see reasons why it must not exist.  The message buffer code cannot use
 any normal locking, and was carefully written to avoid doing so.
 
 I am a little bit worried about exotic situations like receiving an NMI
 in the
 middle of printing and wanting to print in the NMI context, or similar
 things
 that penetrate contexts with disabled interrupts - e.g. Machine Check
 Exception.
 Also it's not clear to me if there won't any bigger damage in the
 situations
 like those described above.
 
 P.S. I have been thinking about fixing the problem in a different 
 fashion,
 via
 reserving portions of dmesg buffer for a whole message using CAS:
 http://lists.freebsd.org/pipermail/freebsd-hackers/2010-April/031535.html
 
 Something like this might work.
 
 PRINTF_BUFR size should not exist either, especially now that it is
 much more complicated and broken.  Here is some of my old mail about
 adding (necessarily not normal) locking to to printf.  No one replied,
 so I never finished this :-(.
 
 That particular solution doesn't lock the normal kernel printf path, and so
 won't fix what we're trying to fix.  (We've got lots of disks in each
 system, and when there are various SES events and disks arriving and
 departing, there are multiple kernel contexts doing printfs simultaneously,
 and that results in scrambled dmesg output.)
 
 But it does.  PRINTF_BUFR_SIZE (at least used to) only be applied to
 vprintf(), but my printf_lock() is applied to all calls to kvprintf()
 that set TOLOG.  (If I missed some, this takes 2 lines each to fix.)
 The calls in tprintf() and log() have comments saying that this is
 intentional to serialize msgbuf, though I fear this might not be safe
 for at least tprintf(), since it may block unboundedly.  But even when
 it blocks unboundedly, nothing really bad happens: the output just
 becomes very slow due to everything waiting for the timeout until the
 blockage clears, and may be interleaved.  Recursion gives similar
 behaviour, except it takes unrecursing to clear the blockage.  I
 consider the very slow output to be a feature.  Unbounded blockages
 and recursion are bugs and the slow output makes them more obvious.

Ahh.  It wasn't 100% clear from looking at the patch which calls had the
lock applied.  In this case at least, it's skipping the lock when the
priority is -1.  Is this tprintf()?

%  /*
%   * Warn that a system table is full.
% @@ -198,5 +221,9 @@
%   pca.flags = flags;
%   va_start(ap, fmt);
% + if (pri != -1)
% + printf_lock();  /* To serialize msgbuf.  XXX: long-held
lock? */
%   kvprintf(fmt, putchar, pca, 10, ap);
% + if (pri != -1)
% + printf_unlock();
%   va_end(ap);
%   if (sess != NULL)
%

 I think the goals should be:
 
 - console output, syslog output, and dmesg output are not scrambled.
   (i.e. individual kernel printfs make it out atomically, or at least
   with a certain granularity, like PRINTF_BUFR_SIZE.)
 
 Console output includes debugger output.  This must be able to break into
 any otherwise-atomic kernel printf.  Possibly similarly for NMIs (if they
 print.  They really shouldn't print, but they at least used to get this
 horribly wrong from the beginning on i386).  Similarly for certain (non
 maskable) traps.  These really shouldn't happen while we're printing, but
 if they do then they may need to panic and then print for things like
 machine check exceptions.
 
 Debugger output is the easiest way of showing why printf cannot use
 any normal locking.  There is an option (that I don't like or use)
 which supports debugger output going through normal printf.  I think
 the message buffer is used for this.  So message buffer accesses cannout
 use normal locking either.
 
 Apart from this, I don't know any reason why a printf or even a msgbuf
 write can't wait for a while (a very long while if the output is to a
 slow device -- don't forget to test with a 50 bps serial console) until
 any previous printfs complete.  My simple serialization gives this if
 the other printfs are on other CPUs, and possibly if they are on other
 ithreads (the simple serialization doesn't give up control for simplicity,
 since it is hard to tell when it is safe to give up control, but
 preemption while we are spinning might get back to the other printfs).
 And when 

Re: svn commit: r222537 - in head/sys: kern sys

2011-06-03 Thread Bruce Evans

On Fri, 3 Jun 2011, Kenneth D. Merry wrote:


On Fri, Jun 03, 2011 at 20:47:00 +1000, Bruce Evans wrote:

On Thu, 2 Jun 2011, Kenneth D. Merry wrote:



That particular solution doesn't lock the normal kernel printf path, and so
won't fix what we're trying to fix.  (We've got lots of disks in each
system, and when there are various SES events and disks arriving and
departing, there are multiple kernel contexts doing printfs simultaneously,
and that results in scrambled dmesg output.)


But it does.  PRINTF_BUFR_SIZE (at least used to) only be applied to
vprintf(), but my printf_lock() is applied to all calls to kvprintf()
that set TOLOG.  (If I missed some, this takes 2 lines each to fix.)
...


Ahh.  It wasn't 100% clear from looking at the patch which calls had the
lock applied.  In this case at least, it's skipping the lock when the
priority is -1.  Is this tprintf()?

%  /*
%   * Warn that a system table is full.
% @@ -198,5 +221,9 @@
%   pca.flags = flags;
%   va_start(ap, fmt);
% + if (pri != -1)
% + printf_lock();  /* To serialize msgbuf.  XXX: long-held
lock? */
%   kvprintf(fmt, putchar, pca, 10, ap);
% + if (pri != -1)
% + printf_unlock();
%   va_end(ap);
%   if (sess != NULL)
%


Hmm.  -1 means `not TOLOG' (see earlier in tprintf()), and I only apply
printf_lock() if the output is kernel-only _or_ uses TOLOG (cases that
go TOTTY only might need some serialization, but they can probably
block and I don't want to apply the low-level printf_lock() to them).

Only tprintf() has this magic -1 priority, and this magic never used,
at least in the FreeBSD /sys tree and matching '[^a-z]tprintf('.  There
are just 4 calls to tprintf(), and they are all related to nfs and all
use pri = LOG_INFO.  So tprintf() always goes TOLOG in practice.

uprintf() is relatively popular, and never goes TOLOG.  The only obvious
difference between uprintf() and tprintf() with pri -1 is that the
former users curthread-td_proc while the latter takes a proc pointer.
The proc pointer is constructed from a thread pointer passed around
by nfs so I can't see what it is, but I bet it is always curthread-td_proc
or perhaps NULL.  If it is NULL, then tprintf() doesn't print to a tty
and reduces to a buggy version of log() if pri != -1 (log() goes TOCONS
if !log_open, while the reduced tprintf() always goes TOLOG); if pri
== -1, then tprintf() prints nowhere (and hopefully doesn't get confused
doing this).


...
Apart from this, I don't know any reason why a printf or even a msgbuf
write can't wait for a while (a very long while if the output is to a
slow device -- don't forget to test with a 50 bps serial console) until
any previous printfs complete.  My simple serialization gives this if
the other printfs are on other CPUs, and possibly if they are on other
ithreads (the simple serialization doesn't give up control for simplicity,
since it is hard to tell when it is safe to give up control, but
preemption while we are spinning might get back to the other printfs).
And when the printf is recursive, there is no way back to the other
printfs, so we must proceed.  We could proceed immediately, but my
simple serialization waits in this case too for simplicity.  This case
should be even rarer than waiting too long for another CPU.


Hopefully the delay won't be too bad for most consumers.  It'll take some
testing to see what happens.


Certainly the cases of TOLOG only, and TOCONS to a synchronous console,
will be very fast.


I'll try to get some time to run some tests with your locking code and see
how it works.


It will be good to test on a bigger system that actually needs lots of
kernel printfs.

Bruce
___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


Re: svn commit: r222537 - in head/sys: kern sys

2011-06-02 Thread Kenneth D. Merry
On Wed, Jun 01, 2011 at 20:07:31 +1000, Bruce Evans wrote:
 On Wed, 1 Jun 2011, Andriy Gapon wrote:
 
 on 31/05/2011 20:29 Kenneth D. Merry said the following:
 +   mtx_init(mbp-msg_lock, msgbuf, NULL, MTX_SPIN);
 
 Sorry that I didn't gather myself together for a review before this change 
 got
 actually committed.
 Do you see any reason not to make this spinlock recursive?
 
 I see reasons why it must not exist.  The message buffer code cannot use
 any normal locking, and was carefully written to avoid doing so.
 
 I am a little bit worried about exotic situations like receiving an NMI 
 in the
 middle of printing and wanting to print in the NMI context, or similar 
 things
 that penetrate contexts with disabled interrupts - e.g. Machine Check 
 Exception.
 Also it's not clear to me if there won't any bigger damage in the 
 situations
 like those described above.
 
 P.S. I have been thinking about fixing the problem in a different fashion, 
 via
 reserving portions of dmesg buffer for a whole message using CAS:
 http://lists.freebsd.org/pipermail/freebsd-hackers/2010-April/031535.html
 
 Something like this might work.
 
 PRINTF_BUFR size should not exist either, especially now that it is
 much more complicated and broken.  Here is some of my old mail about
 adding (necessarily not normal) locking to to printf.  No one replied,
 so I never finished this :-(.

That particular solution doesn't lock the normal kernel printf path, and so
won't fix what we're trying to fix.  (We've got lots of disks in each
system, and when there are various SES events and disks arriving and
departing, there are multiple kernel contexts doing printfs simultaneously,
and that results in scrambled dmesg output.)

I think the goals should be:

 - console output, syslog output, and dmesg output are not scrambled.
   (i.e. individual kernel printfs make it out atomically, or at least
   with a certain granularity, like PRINTF_BUFR_SIZE.)

 - Can be called by any kernel routine (i.e. doesn't sleep)

 - Offers a string at a time interface.

 - Does the right thing for log messages (priority, etc.)

It looks like we should add does not use normal locking to the list as
well.

As Justin mentioned, we started off down the path of using atomic
operations, and then figured out that wouldn't fully work.  Then we
discussed doing a per-CPU message buffer, with each message tagged with a
sequence number, and having the reader re-serialize the messages in the
right order.  The complexity started to get large enough that we decided
that using a spin lock would be a much easier approach.

cnputs() already uses a spinlock, so we're no worse off than before from a
locking standpoint.

We could perhaps make the message buffer mutex recursive and set the
MTX_NOWITNESS flag as well, that might make things a little better from a
side effect standpoint.

If we can come up with a solution that meets the above goals, Justin and I
would be happy to help implement it.

Ken
-- 
Kenneth Merry
k...@freebsd.org
___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


Re: svn commit: r222537 - in head/sys: kern sys

2011-06-01 Thread Andriy Gapon
on 31/05/2011 20:29 Kenneth D. Merry said the following:
 + mtx_init(mbp-msg_lock, msgbuf, NULL, MTX_SPIN);


Sorry that I didn't gather myself together for a review before this change got
actually committed.
Do you see any reason not to make this spinlock recursive?

I am a little bit worried about exotic situations like receiving an NMI in the
middle of printing and wanting to print in the NMI context, or similar things
that penetrate contexts with disabled interrupts - e.g. Machine Check Exception.
Also it's not clear to me if there won't any bigger damage in the situations
like those described above.

P.S. I have been thinking about fixing the problem in a different fashion, via
reserving portions of dmesg buffer for a whole message using CAS:
http://lists.freebsd.org/pipermail/freebsd-hackers/2010-April/031535.html

-- 
Andriy Gapon
___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


Re: svn commit: r222537 - in head/sys: kern sys

2011-06-01 Thread Bruce Evans

On Wed, 1 Jun 2011, Andriy Gapon wrote:


on 31/05/2011 20:29 Kenneth D. Merry said the following:

+   mtx_init(mbp-msg_lock, msgbuf, NULL, MTX_SPIN);


Sorry that I didn't gather myself together for a review before this change got
actually committed.
Do you see any reason not to make this spinlock recursive?


I see reasons why it must not exist.  The message buffer code cannot use
any normal locking, and was carefully written to avoid doing so.


I am a little bit worried about exotic situations like receiving an NMI in the
middle of printing and wanting to print in the NMI context, or similar things
that penetrate contexts with disabled interrupts - e.g. Machine Check Exception.
Also it's not clear to me if there won't any bigger damage in the situations
like those described above.

P.S. I have been thinking about fixing the problem in a different fashion, via
reserving portions of dmesg buffer for a whole message using CAS:
http://lists.freebsd.org/pipermail/freebsd-hackers/2010-April/031535.html


Something like this might work.

PRINTF_BUFR size should not exist either, especially now that it is
much more complicated and broken.  Here is some of my old mail about
adding (necessarily not normal) locking to to printf.  No one replied,
so I never finished this :-(.

%%%

From b...@optusnet.com.au Mon Jun 15 16:18:03 2009 +1000

Date: Mon, 15 Jun 2009 16:17:59 +1000 (EST)
From: Bruce Evans b...@optusnet.com.au
X-X-Sender: b...@besplex.bde.org
To: Bruce Evans b...@optusnet.com.au
cc: Sam Leffler s...@freebsd.org, Ed Schouten e...@freebsd.org,
src-committ...@freebsd.org, svn-src-...@freebsd.org,
svn-src-head@FreeBSD.org
Subject: Re: svn commit: r194204 - in head/sys: amd64/conf i386/conf
In-Reply-To: 20090615114142.b...@besplex.bde.org
Message-ID: 20090615153040.r1...@besplex.bde.org
References: 200906141801.n5ei1zti056...@svn.freebsd.org 
4a356a0f.3050...@freebsd.org
 20090615075134.k24...@delplex.bde.org 4a359aa6.7010...@freebsd.org
 20090615114142.b...@besplex.bde.org
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Status: RO
X-Status: A
X-Keywords: 
X-UID: 913


On Mon, 15 Jun 2009, Bruce Evans wrote:


On Sun, 14 Jun 2009, Sam Leffler wrote:


I'd love to see someone properly lock printf/console output instead of 
kludges like PRINTF_BUFR_SIZE.


Almost done (for FreeBSD-~5.2): non-production version with lots of debugging
cruft:


This version for RELENG_7 seems to work as intended.  jhb should look at
its atomic ops.

I only have 1 normal source of interleaved messages -- the messages about
waiting for system processes on shutdown on SMP systems -- and this seems
to be fixed.

The patch applies even without removing PRINTF_BUFR_SIZE or cnputs(), and
works, at least if PRINTF_BUFR_SIZE and the per-cpu buffer are not enabled.

Note that it serializes accesses to the message buffer too.  The locks
automatically cover the message buffer if they are applied, and are
applied to all cases that set TOLOG, though this might be too dangerous
for tprintf() since printing to a user's console shouldn't block other
parts of the kernel.  printf() to a redirected console is more careful
about this -- it extracts messages from the message buffer and can handle
long delays provided the message buffer doesn't wrap around.  This
might be the best method for all cases.  Accesses to the message buffer
need to be serialized for this to work properly, and serializing ony
accesses to the message buffer is easier since there are no hardware
delays to worry about -- strict locking with no timeouts will work
provided the halting problem is solvable for kvprintf() to the message
buffer alone.

No attempt is made to buffer messages across printf()s.  Neither does
PRINTF_BUFR_SIZE.  PRINTF_BUFR_SIZE uses an auto buffer so it obviously
has to flush at the end of each printf.  Always extracting from the
message buffer might be the best way to handle this too.  Normally you
would extract complete lines, but there must be a timeout to handle
partial lines.

The 1 second timeout here isn't best if the timeout is needed for
avoiding deadlock, as may happen when a printf() is interrupted by
panic().  panic could reasonably() blow open all locks including
printf_lock(), but this is not easy to do without letting other CPUs
proceed and/or complicating the usual case.

Very slow terminals, say 50 bps, need a timeout of say 16 seconds so
that the can print an 80-character line without being interrupted.

% Index: subr_prf.c
% ===
% RCS file: /home/ncvs/src/sys/kern/subr_prf.c,v
% retrieving revision 1.130.2.1
% diff -u -2 -r1.130.2.1 subr_prf.c
% --- subr_prf.c21 Jan 2009 00:26:45 -  1.130.2.1
% +++ subr_prf.c15 Jun 2009 05:32:03 -
% @@ -112,4 +112,27 @@
%  always_console_output, 0, Always output to console despite TIOCCONS.);
% 
% +static int	printf_lockcount;

% +
% +static void
% +printf_lock(void)
% 

Re: svn commit: r222537 - in head/sys: kern sys

2011-06-01 Thread Justin T. Gibbs

On 6/1/11 12:07 AM, Andriy Gapon wrote:

 on 31/05/2011 20:29 Kenneth D. Merry said the following:
 + mtx_init(mbp-msg_lock, msgbuf, NULL, MTX_SPIN);


 Sorry that I didn't gather myself together for a review before this 

change got

 actually committed.
 Do you see any reason not to make this spinlock recursive?

 I am a little bit worried about exotic situations like receiving an 

NMI in the
 middle of printing and wanting to print in the NMI context, or similar 

things
 that penetrate contexts with disabled interrupts - e.g. Machine Check 

Exception.
 Also it's not clear to me if there won't any bigger damage in the 

situations

 like those described above.

 P.S. I have been thinking about fixing the problem in a different 

fashion, via

 reserving portions of dmesg buffer for a whole message using CAS:
 http://lists.freebsd.org/pipermail/freebsd-hackers/2010-April/031535.html


This is actually where we started (although unfortunately independently
since neither Ken nor I saw your previous email).  As you say in your
message form last April, integrating this type of strategy into the
message buffer code is problematic.  You need to track the in-progress
writes (i.e. holes in the message buffer) so that only fully committed
data is printed.  I'm sure this can be done with per-cpu data structures
and a heavy dose of atomic ops, but after an hour of discussion, we wound
up with a ton of complexity and little confidence our design was sound.
Even using a spinlock is a lot more complex than it would at first appear.
Ken spent almost 2 weeks completing the work and ferreting out all of the
corner cases.

This bug has been in FreeBSD since SMP was added.  While our solution may
not be ideal, it is better than what was there before.  Committing it seems
to have also reignited the discussion on how to implement an even better
solution.  If there are concrete ideas on how to improve this code further,
Ken and I will participate in the work to get them into -current.

--
Justin

___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


Re: svn commit: r222537 - in head/sys: kern sys

2011-05-31 Thread mdf
On Tue, May 31, 2011 at 10:29 AM, Kenneth D. Merry k...@freebsd.org wrote:
 Author: ken
 Date: Tue May 31 17:29:58 2011
 New Revision: 222537
 URL: http://svn.freebsd.org/changeset/base/222537

 Log:
  Fix apparent garbage in the message buffer.

  While we have had a fix in place (options PRINTF_BUFR_SIZE=128) to fix
  scrambled console output, the message buffer and syslog were still getting
  log messages one character at a time.  While all of the characters still
  made it into the log (courtesy of atomic operations), they were often
  interleaved when there were multiple threads writing to the buffer at the
  same time.

This seems to panic my box with lock msgbuf 0xfe0127e0
already initialized.

Unfortunately, though I booted with a fresh CURRENT this morning
successfully, both /boot/kernel and /boot/kernel.old give this panic.
To add insult to injury, when the kernel drops into the debugger, my
keyboard input no longer works so I can't get a stack, etc.

So:

1) Is there anything else I can do to help debug this?
2) how can I resurrect this box without a reinstall?

I will try to repro on a virtual machine so I have a snapshot to come back to.

Thanks,
matthew

  This fixes message buffer accesses to use buffering logic as well, so that
  strings that are less than PRINTF_BUFR_SIZE will be put into the message
  buffer atomically.  So now dmesg output should look the same as console
  output.

  subr_msgbuf.c:                Convert most message buffer calls to use a new 
 spin
                        lock instead of atomic variables in some places.

                        Add a new routine, msgbuf_addstr(), that adds a
                        NUL-terminated string to a message buffer.  This
                        takes a priority argument, which allows us to
                        eliminate some races (at least in the the string
                        at a time case) that are present in the
                        implementation of msglogchar().  (dangling and
                        lastpri are static variables, and are subject to
                        races when multiple callers are present.)

                        msgbuf_addstr() also allows the caller to request
                        that carriage returns be stripped out of the
                        string.  This matches the behavior of msglogchar(),
                        but in testing so far it doesn't appear that any
                        newlines are being stripped out.  So the carriage
                        return removal functionality may be a candidate
                        for removal later on if further analysis shows
                        that it isn't necessary.

  subr_prf.c:           Add a new msglogstr() routine that calls
                        msgbuf_logstr().

                        Rename putcons() to putbuf().  This now handles
                        buffered output to the message log as well as
                        the console.  Also, remove the logic in putcons()
                        (now putbuf()) that added a carriage return before
                        a newline.  The console path was the only path that
                        needed it, and cnputc() (called by cnputs())
                        already adds a carriage return.  So this
                        duplication resulted in kernel-generated console
                        output lines ending in '\r''\r''\n'.

                        Refactor putchar() to handle the new buffering
                        scheme.

                        Add buffering to log().

                        Change log_console() to use msglogstr() instead of
                        msglogchar().  Don't add extra newlines by default
                        in log_console().  Hide that behavior behind a
                        tunable/sysctl (kern.log_console_add_linefeed) for
                        those who would like the old behavior.  The old
                        behavior led to the insertion of extra newlines
                        for log output for programs that print out a
                        string, and then a trailing newline on a separate
                        write.  (This is visible with dmesg -a.)

  msgbuf.h:             Add a prototype for msgbuf_addstr().

                        Add three new fields to struct msgbuf, msg_needsnl,
                        msg_lastpri and msg_lock.  The first two are needed
                        for log message functionality previously handled
                        by msglogchar().  (Which is still active if
                        buffering isn't enabled.)

                        Include sys/lock.h and sys/mutex.h for the new
                        mutex.

  Reviewed by:  gibbs

 Modified:
  head/sys/kern/subr_msgbuf.c
  head/sys/kern/subr_prf.c
  head/sys/sys/msgbuf.h

 Modified: head/sys/kern/subr_msgbuf.c
 

Re: svn commit: r222537 - in head/sys: kern sys

2011-05-31 Thread Garrett Cooper
On Tue, May 31, 2011 at 2:00 PM,  m...@freebsd.org wrote:
 On Tue, May 31, 2011 at 10:29 AM, Kenneth D. Merry k...@freebsd.org wrote:
 Author: ken
 Date: Tue May 31 17:29:58 2011
 New Revision: 222537
 URL: http://svn.freebsd.org/changeset/base/222537

 Log:
  Fix apparent garbage in the message buffer.

  While we have had a fix in place (options PRINTF_BUFR_SIZE=128) to fix
  scrambled console output, the message buffer and syslog were still getting
  log messages one character at a time.  While all of the characters still
  made it into the log (courtesy of atomic operations), they were often
  interleaved when there were multiple threads writing to the buffer at the
  same time.

 This seems to panic my box with lock msgbuf 0xfe0127e0
 already initialized.

 Unfortunately, though I booted with a fresh CURRENT this morning
 successfully, both /boot/kernel and /boot/kernel.old give this panic.
 To add insult to injury, when the kernel drops into the debugger, my
 keyboard input no longer works so I can't get a stack, etc.

 So:

 1) Is there anything else I can do to help debug this?

1. sysctl debug.debugger_on_panic=1 ?

 2) how can I resurrect this box without a reinstall?

2. Best way is to probably to use the bsdinstall CD, use the LiveCD
mode, setup the system as usual (mount /, mount devfs, chroot, mount
-a), rewind to an earlier version of svn (shouldn't be too hard if you
run /etc/rc.d/network restart from inside the chroot), rebuild the
kernel (and potentially world), and install the kernel to the chroot,
then exit and reboot (this is a method I picked up from installing
Gentoo Linux multiple times, but it should work for FreeBSD as well).

This is part of the reason why I'm an avid using of make installkernel
INSTKERNNAME=$KERNCONF.$SVN_REVISION , symlink /boot/kernel to the
latest one I want to boot, and I only go through every once in a blue
moon to reap the kernels I don't need anymore -- I don't know until
after a few weeks soak on my workstation whether or not a regression
is present in the kernel.

 I will try to repro on a virtual machine so I have a snapshot to come back to.

HTH!
-Garrett
___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


Re: svn commit: r222537 - in head/sys: kern sys

2011-05-31 Thread Kenneth D. Merry
On Tue, May 31, 2011 at 14:00:18 -0700, m...@freebsd.org wrote:
 On Tue, May 31, 2011 at 10:29 AM, Kenneth D. Merry k...@freebsd.org wrote:
  Author: ken
  Date: Tue May 31 17:29:58 2011
  New Revision: 222537
  URL: http://svn.freebsd.org/changeset/base/222537
 
  Log:
  ?Fix apparent garbage in the message buffer.
 
  ?While we have had a fix in place (options PRINTF_BUFR_SIZE=128) to fix
  ?scrambled console output, the message buffer and syslog were still getting
  ?log messages one character at a time. ?While all of the characters still
  ?made it into the log (courtesy of atomic operations), they were often
  ?interleaved when there were multiple threads writing to the buffer at the
  ?same time.
 
 This seems to panic my box with lock msgbuf 0xfe0127e0
 already initialized.
 
 Unfortunately, though I booted with a fresh CURRENT this morning
 successfully, both /boot/kernel and /boot/kernel.old give this panic.
 To add insult to injury, when the kernel drops into the debugger, my
 keyboard input no longer works so I can't get a stack, etc.

Uh-oh!

 So:
 
 1) Is there anything else I can do to help debug this?
 2) how can I resurrect this box without a reinstall?
 
 I will try to repro on a virtual machine so I have a snapshot to come back to.

My guess is that this is an issue with the message buffer reinitialization
path.  lock_init() (called by mtx_init()) has an assert to make sure that
the lock is initialized, and that is just a flag check.

Since the spin lock is part of the message buffer structure, if it is held
over from a previous boot, the LO_INITIALIZED flag may still be set.

Try power cycling the machine.  If it is an issue with re-initialization,
that should clear the memory and allow you to boot.

My testing has been with VMs (under Xen), so the reinit path has probably
not been tested as fully as it should have been.  Sorry about that!

As for the debugger, that's another issue altogether.  It does work for me,
but then again if the spin lock initialization is broken for the message
buffer that may affect things.

Try a cold boot and see if that helps.  If so, I think we can probably just
bzero the mutex in msgbuf_reinit() and that will fix things.

Ken
-- 
Kenneth Merry
k...@freebsd.org
___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


Re: svn commit: r222537 - in head/sys: kern sys

2011-05-31 Thread mdf
On Tue, May 31, 2011 at 2:46 PM, Kenneth D. Merry k...@freebsd.org wrote:
 On Tue, May 31, 2011 at 14:00:18 -0700, m...@freebsd.org wrote:
 On Tue, May 31, 2011 at 10:29 AM, Kenneth D. Merry k...@freebsd.org wrote:
  Author: ken
  Date: Tue May 31 17:29:58 2011
  New Revision: 222537
  URL: http://svn.freebsd.org/changeset/base/222537
 
  Log:
  ?Fix apparent garbage in the message buffer.
 
  ?While we have had a fix in place (options PRINTF_BUFR_SIZE=128) to fix
  ?scrambled console output, the message buffer and syslog were still getting
  ?log messages one character at a time. ?While all of the characters still
  ?made it into the log (courtesy of atomic operations), they were often
  ?interleaved when there were multiple threads writing to the buffer at the
  ?same time.

 This seems to panic my box with lock msgbuf 0xfe0127e0
 already initialized.

 Unfortunately, though I booted with a fresh CURRENT this morning
 successfully, both /boot/kernel and /boot/kernel.old give this panic.
 To add insult to injury, when the kernel drops into the debugger, my
 keyboard input no longer works so I can't get a stack, etc.

 Uh-oh!

 So:

 1) Is there anything else I can do to help debug this?
 2) how can I resurrect this box without a reinstall?

 I will try to repro on a virtual machine so I have a snapshot to come back 
 to.

 My guess is that this is an issue with the message buffer reinitialization
 path.  lock_init() (called by mtx_init()) has an assert to make sure that
 the lock is initialized, and that is just a flag check.

 Since the spin lock is part of the message buffer structure, if it is held
 over from a previous boot, the LO_INITIALIZED flag may still be set.

 Try power cycling the machine.  If it is an issue with re-initialization,
 that should clear the memory and allow you to boot.

Hmm, apparently my previous presses of the power button weren't long
enough.  I let it sit off for 20 seconds and it boots okay now.

Thanks,
matthew

 My testing has been with VMs (under Xen), so the reinit path has probably
 not been tested as fully as it should have been.  Sorry about that!

 As for the debugger, that's another issue altogether.  It does work for me,
 but then again if the spin lock initialization is broken for the message
 buffer that may affect things.

 Try a cold boot and see if that helps.  If so, I think we can probably just
 bzero the mutex in msgbuf_reinit() and that will fix things.

 Ken
 --
 Kenneth Merry
 k...@freebsd.org

___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


Re: svn commit: r222537 - in head/sys: kern sys

2011-05-31 Thread Kenneth D. Merry
On Tue, May 31, 2011 at 15:02:37 -0700, m...@freebsd.org wrote:
 On Tue, May 31, 2011 at 2:46 PM, Kenneth D. Merry k...@freebsd.org wrote:
  On Tue, May 31, 2011 at 14:00:18 -0700, m...@freebsd.org wrote:
  On Tue, May 31, 2011 at 10:29 AM, Kenneth D. Merry k...@freebsd.org 
  wrote:
   Author: ken
   Date: Tue May 31 17:29:58 2011
   New Revision: 222537
   URL: http://svn.freebsd.org/changeset/base/222537
  
   Log:
   ?Fix apparent garbage in the message buffer.
  
   ?While we have had a fix in place (options PRINTF_BUFR_SIZE=128) to fix
   ?scrambled console output, the message buffer and syslog were still 
   getting
   ?log messages one character at a time. ?While all of the characters still
   ?made it into the log (courtesy of atomic operations), they were often
   ?interleaved when there were multiple threads writing to the buffer at 
   the
   ?same time.
 
  This seems to panic my box with lock msgbuf 0xfe0127e0
  already initialized.
 
  Unfortunately, though I booted with a fresh CURRENT this morning
  successfully, both /boot/kernel and /boot/kernel.old give this panic.
  To add insult to injury, when the kernel drops into the debugger, my
  keyboard input no longer works so I can't get a stack, etc.
 
  Uh-oh!
 
  So:
 
  1) Is there anything else I can do to help debug this?
  2) how can I resurrect this box without a reinstall?
 
  I will try to repro on a virtual machine so I have a snapshot to come back 
  to.
 
  My guess is that this is an issue with the message buffer reinitialization
  path. ?lock_init() (called by mtx_init()) has an assert to make sure that
  the lock is initialized, and that is just a flag check.
 
  Since the spin lock is part of the message buffer structure, if it is held
  over from a previous boot, the LO_INITIALIZED flag may still be set.
 
  Try power cycling the machine. ?If it is an issue with re-initialization,
  that should clear the memory and allow you to boot.
 
 Hmm, apparently my previous presses of the power button weren't long
 enough.  I let it sit off for 20 seconds and it boots okay now.

Okay, so it probably is the re-initialization code.  Can you try this patch
and see if it survives a warm boot?  I also changed the initialization
path, so we don't get tripped up by garbage left in memory.

Also, does the debugger work now that it has booted successfully?

Thanks,

Ken
-- 
Kenneth Merry
k...@freebsd.org
Index: subr_msgbuf.c
===
--- subr_msgbuf.c   (revision 222537)
+++ subr_msgbuf.c   (working copy)
@@ -61,6 +61,7 @@
mbp-msg_magic = MSG_MAGIC;
mbp-msg_lastpri = -1;
mbp-msg_needsnl = 0;
+   bzero(mbp-msg_lock, sizeof(mbp-msg_lock));
mtx_init(mbp-msg_lock, msgbuf, NULL, MTX_SPIN);
 }
 
@@ -95,6 +96,7 @@
mbp-msg_lastpri = -1;
/* Assume that the old message buffer didn't end in a newline. */
mbp-msg_needsnl = 1;
+   bzero(mbp-msg_lock, sizeof(mbp-msg_lock));
mtx_init(mbp-msg_lock, msgbuf, NULL, MTX_SPIN);
 }
 
___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org

Re: svn commit: r222537 - in head/sys: kern sys

2011-05-31 Thread mdf
On Tue, May 31, 2011 at 3:11 PM, Kenneth D. Merry k...@freebsd.org wrote:
 On Tue, May 31, 2011 at 15:02:37 -0700, m...@freebsd.org wrote:
 On Tue, May 31, 2011 at 2:46 PM, Kenneth D. Merry k...@freebsd.org wrote:
  On Tue, May 31, 2011 at 14:00:18 -0700, m...@freebsd.org wrote:
  On Tue, May 31, 2011 at 10:29 AM, Kenneth D. Merry k...@freebsd.org 
  wrote:
   Author: ken
   Date: Tue May 31 17:29:58 2011
   New Revision: 222537
   URL: http://svn.freebsd.org/changeset/base/222537
  
   Log:
   ?Fix apparent garbage in the message buffer.
  
   ?While we have had a fix in place (options PRINTF_BUFR_SIZE=128) to fix
   ?scrambled console output, the message buffer and syslog were still 
   getting
   ?log messages one character at a time. ?While all of the characters 
   still
   ?made it into the log (courtesy of atomic operations), they were often
   ?interleaved when there were multiple threads writing to the buffer at 
   the
   ?same time.
 
  This seems to panic my box with lock msgbuf 0xfe0127e0
  already initialized.
 
  Unfortunately, though I booted with a fresh CURRENT this morning
  successfully, both /boot/kernel and /boot/kernel.old give this panic.
  To add insult to injury, when the kernel drops into the debugger, my
  keyboard input no longer works so I can't get a stack, etc.
 
  Uh-oh!
 
  So:
 
  1) Is there anything else I can do to help debug this?
  2) how can I resurrect this box without a reinstall?
 
  I will try to repro on a virtual machine so I have a snapshot to come 
  back to.
 
  My guess is that this is an issue with the message buffer reinitialization
  path. ?lock_init() (called by mtx_init()) has an assert to make sure that
  the lock is initialized, and that is just a flag check.
 
  Since the spin lock is part of the message buffer structure, if it is held
  over from a previous boot, the LO_INITIALIZED flag may still be set.
 
  Try power cycling the machine. ?If it is an issue with re-initialization,
  that should clear the memory and allow you to boot.

 Hmm, apparently my previous presses of the power button weren't long
 enough.  I let it sit off for 20 seconds and it boots okay now.

 Okay, so it probably is the re-initialization code.  Can you try this patch
 and see if it survives a warm boot?  I also changed the initialization
 path, so we don't get tripped up by garbage left in memory.

This patch survived a warm reboot (shutdown -r now).

 Also, does the debugger work now that it has booted successfully?

The debugger (or rather, my keyboard in the debugger) works on a
successful boot.  I used sysctl debug.kdb.enter=1 to test it.

Thanks,
matthew
___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org