Re: panic: witness_warn head/amd64 @r285741 on 1 of 2 machines

2015-07-21 Thread David Wolfskill
On Tue, Jul 21, 2015 at 03:21:16PM -0500, Eric van Gyzen wrote:
> ...
> >> So it looks like net swi, leaking some udp6 lock.
> > Curiouser and curiouser...  While I'm not taking any special pains to
> > avoid building IPv6, I'm not actively actually doing anything with it
> > (IPv6), either (for both the failing machine and my laptop).
> >
> > Once I'm back home, I should be able to poke around in ddb after
> > re-creating the panic, if that would be a useful thing for me to do (and
> > given some hints as to what to poke).
> >
> > Naturally, I'm also happy to change bits of sources, rebuild, and
> > smoke-test.
> >
> > A quick check from the SVN update output only shows r285710, r285711, and
> > r285740 in the range from (r285685,r285741] -- as the kernel running
> > r285685 had no known issues -- that touched sys/netinet6/*.
> 
> It's a multicast destination.  Maybe something is using mDNS?
> 
> Randall, does the test on line 406 of udp6_usrreq.c need to be inverted?
> 
> Eric
> 

  We have a winner!

FreeBSD freebeast.catwhisker.org 11.0-CURRENT FreeBSD 11.0-CURRENT #1789  
r285741M/285741:1100077: Tue Jul 21 14:50:59 PDT 2015 
r...@freebeast.catwhisker.org:/common/S3/obj/usr/src/sys/GENERIC  amd64

freebeast(11.0-C)[3] cd /usr/src
freebeast(11.0-C)[4] svn diff sys/netinet
netinet/  netinet6/ 
freebeast(11.0-C)[4] svn diff sys/netinet*
Index: sys/netinet6/udp6_usrreq.c
===================================================================
--- sys/netinet6/udp6_usrreq.c  (revision 285741)
+++ sys/netinet6/udp6_usrreq.c  (working copy)
@@ -403,7 +403,7 @@
 		INP_RLOCK(last);
 		INP_INFO_RUNLOCK(pcbinfo);
 		UDP_PROBE(receive, NULL, last, ip6, last, uh);
-		if (udp6_append(last, m, off, &fromsa))
+		if (! udp6_append(last, m, off, &fromsa))
 			INP_RUNLOCK(last);
 	inp_lost:
 		return (IPPROTO_DONE);
freebeast(11.0-C)[5] 
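
For anyone reading this later, here is a minimal userland sketch of why
the inverted test leaks the lock.  It assumes (this is inferred from the
patch, not stated anywhere in the thread) that udp6_append() returns
nonzero when it has already dropped the inpcb lock and zero when the
caller still owns it.  All names below (fake_inp, fake_append, and so on)
are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an inpcb; it only counts read-lock holders. */
struct fake_inp {
	int rlock_count;	/* read locks currently held */
	bool tunneled;		/* if true, the append path unlocks by itself */
};

static void
fake_rlock(struct fake_inp *inp)
{
	assert(inp != NULL);
	inp->rlock_count++;
}

static void
fake_runlock(struct fake_inp *inp)
{
	assert(inp != NULL);
	inp->rlock_count--;
}

/*
 * Assumed convention: nonzero means "the callee already dropped the
 * lock", zero means "the caller still holds it".
 */
static int
fake_append(struct fake_inp *inp)
{
	if (inp->tunneled) {
		fake_runlock(inp);
		return (1);
	}
	return (0);
}

/* Test as in revision 285741: unlock only on a nonzero return. */
static int
deliver_buggy(struct fake_inp *inp)
{
	fake_rlock(inp);
	if (fake_append(inp))
		fake_runlock(inp);
	return (inp->rlock_count);	/* locks still held on return */
}

/* Test inverted, as in the working-copy diff above. */
static int
deliver_fixed(struct fake_inp *inp)
{
	fake_rlock(inp);
	if (!fake_append(inp))
		fake_runlock(inp);
	return (inp->rlock_count);	/* locks still held on return */
}
```

Under that assumption, the ordinary (non-tunneled) path through
deliver_buggy() returns with the read lock still held, which matches the
"suspending ithread with the following locks held" report, while
deliver_fixed() returns with nothing held on either path.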

Thanks! :-)

Peace,
david
-- 
David H. Wolfskill  da...@catwhisker.org
Those who murder in the name of God or prophet are blasphemous cowards.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.




Re: panic: witness_warn head/amd64 @r285741 on 1 of 2 machines

2015-07-21 Thread Eric van Gyzen
On 07/21/2015 15:21, Eric van Gyzen wrote:
> On 07/21/2015 15:05, David Wolfskill wrote:
>> On Tue, Jul 21, 2015 at 10:28:32PM +0300, Konstantin Belousov wrote:
>>> ...
>>> Indeed, thank you.
>>> ithread_loop() at ithread_loop+0xa6/frame 0xfe083b9c0a70
>>> fork_exit() at fork_exit+0x84/frame 0xfe083b9c0ab0
>>> fork_trampoline() at fork_trampoline+0xe/frame 0xfe083b9c0ab0
>>> --- trap 0, rip = 0, rsp = 0xfe083b9c0b70, rbp = 0 ---
>>> suspending ithread with the following locks held:
>>> shared rw udpinp (udpinp) r = 3 (0xf80010c7d7b0) locked @ 
>>> /usr/src/sys/netinet6/in6_pcb.c:1174
>>> panic: witness_warn
>>> cpuid = 3
>>>
>>> So it looks like net swi, leaking some udp6 lock.
>> Curiouser and curiouser...  While I'm not taking any special pains to
>> avoid building IPv6, I'm not actively actually doing anything with it
>> (IPv6), either (for both the failing machine and my laptop).
>>
>> Once I'm back home, I should be able to poke around in ddb after
>> re-creating the panic, if that would be a useful thing for me to do (and
>> given some hints as to what to poke).
>>
>> Naturally, I'm also happy to change bits of sources, rebuild, and
>> smoke-test.
>>
>> A quick check from the SVN update output only shows r285710, r285711, and
>> r285740 in the range from (r285685,r285741] -- as the kernel running
>> r285685 had no known issues -- that touched sys/netinet6/*.
> It's a multicast destination.  Maybe something is using mDNS?

Blurf.  "I wonder if" it's a multicast destination.  (I need more chocolate.)

> Randall, does the test on line 406 of udp6_usrreq.c need to be inverted?
>
> Eric
>






Re: panic: witness_warn head/amd64 @r285741 on 1 of 2 machines

2015-07-21 Thread Eric van Gyzen
On 07/21/2015 15:05, David Wolfskill wrote:
> On Tue, Jul 21, 2015 at 10:28:32PM +0300, Konstantin Belousov wrote:
>> ...
>> Indeed, thank you.
>> ithread_loop() at ithread_loop+0xa6/frame 0xfe083b9c0a70
>> fork_exit() at fork_exit+0x84/frame 0xfe083b9c0ab0
>> fork_trampoline() at fork_trampoline+0xe/frame 0xfe083b9c0ab0
>> --- trap 0, rip = 0, rsp = 0xfe083b9c0b70, rbp = 0 ---
>> suspending ithread with the following locks held:
>> shared rw udpinp (udpinp) r = 3 (0xf80010c7d7b0) locked @ 
>> /usr/src/sys/netinet6/in6_pcb.c:1174
>> panic: witness_warn
>> cpuid = 3
>>
>> So it looks like net swi, leaking some udp6 lock.
> Curiouser and curiouser...  While I'm not taking any special pains to
> avoid building IPv6, I'm not actively actually doing anything with it
> (IPv6), either (for both the failing machine and my laptop).
>
> Once I'm back home, I should be able to poke around in ddb after
> re-creating the panic, if that would be a useful thing for me to do (and
> given some hints as to what to poke).
>
> Naturally, I'm also happy to change bits of sources, rebuild, and
> smoke-test.
>
> A quick check from the SVN update output only shows r285710, r285711, and
> r285740 in the range from (r285685,r285741] -- as the kernel running
> r285685 had no known issues -- that touched sys/netinet6/*.

It's a multicast destination.  Maybe something is using mDNS?

Randall, does the test on line 406 of udp6_usrreq.c need to be inverted?

Eric





Re: panic: witness_warn head/amd64 @r285741 on 1 of 2 machines

2015-07-21 Thread David Wolfskill
On Tue, Jul 21, 2015 at 10:28:32PM +0300, Konstantin Belousov wrote:
> ...
> Indeed, thank you.
> ithread_loop() at ithread_loop+0xa6/frame 0xfe083b9c0a70
> fork_exit() at fork_exit+0x84/frame 0xfe083b9c0ab0
> fork_trampoline() at fork_trampoline+0xe/frame 0xfe083b9c0ab0
> --- trap 0, rip = 0, rsp = 0xfe083b9c0b70, rbp = 0 ---
> suspending ithread with the following locks held:
> shared rw udpinp (udpinp) r = 3 (0xf80010c7d7b0) locked @ 
> /usr/src/sys/netinet6/in6_pcb.c:1174
> panic: witness_warn
> cpuid = 3
> 
> So it looks like net swi, leaking some udp6 lock.

Curiouser and curiouser...  While I'm not taking any special pains to
avoid building IPv6, I'm not actively actually doing anything with it
(IPv6), either (for both the failing machine and my laptop).

Once I'm back home, I should be able to poke around in ddb after
re-creating the panic, if that would be a useful thing for me to do (and
given some hints as to what to poke).

Naturally, I'm also happy to change bits of sources, rebuild, and
smoke-test.

A quick check from the SVN update output only shows r285710, r285711, and
r285740 in the range from (r285685,r285741] -- as the kernel running
r285685 had no known issues -- that touched sys/netinet6/*.

Peace,
david
-- 
David H. Wolfskill  da...@catwhisker.org
Those who murder in the name of God or prophet are blasphemous cowards.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.




Re: panic: witness_warn head/amd64 @r285741 on 1 of 2 machines

2015-07-21 Thread Konstantin Belousov
On Tue, Jul 21, 2015 at 07:17:43PM +, Mark Johnston wrote:
> On Tue, Jul 21, 2015 at 09:19:27AM -0700, David Wolfskill wrote:
> > On Tue, Jul 21, 2015 at 04:39:07PM +0300, Konstantin Belousov wrote:
> > > On Tue, Jul 21, 2015 at 05:57:34AM -0700, David Wolfskill wrote:
> > > > My laptop had no problems, but the build machine has a panic that
> > > > appears quite reproducible (4 "successes" out of 4 tries); here's a bit
> > > > from the core.txt file:
> > > 
> > > There must be kernel messages before the panic string.  They are crucial
> > > to understand what is going on.
> > > ...
> > 
> > Sorry I wasn't able to capture those before I needed to do Other Things.
> > 
> > The machine had a (PCI-attached) serial console that was working
> > for FreeBSD (thanks mostly to sbruno's help), but Something seems
> > to Have Happened, and that's not presently working (even in stable/10,
> > where I first got it working).
> > 
> > I will try to get it working again, but I doubt I will have time to
> > focus on that until about 9 hours from now.
> 
> It's possible to extract log messages leading up to the panic from the
> vmcore. From the kgdb prompt, running
> 
> (kgdb) printf "%s", msgbufp->msg_ptr
> 
> should bring them up.
> 
> And, I just noticed that you posted the core.txt, which contains this
> info near the end:
> http://www.catwhisker.org/~david/FreeBSD/head/core.txt.1

Indeed, thank you.
ithread_loop() at ithread_loop+0xa6/frame 0xfe083b9c0a70
fork_exit() at fork_exit+0x84/frame 0xfe083b9c0ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe083b9c0ab0
--- trap 0, rip = 0, rsp = 0xfe083b9c0b70, rbp = 0 ---
suspending ithread with the following locks held:
shared rw udpinp (udpinp) r = 3 (0xf80010c7d7b0) locked @ 
/usr/src/sys/netinet6/in6_pcb.c:1174
panic: witness_warn
cpuid = 3

So it looks like net swi, leaking some udp6 lock.
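
To illustrate the kind of check that produced the message above, here is
a small userland sketch.  It is only a toy model, assuming a per-thread
table of held locks that is inspected at a point where none should be
held; the real WITNESS code is far more involved, and every name here
(fake_thread, track_acquire, check_before_suspend, and so on) is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_HELD 8

/* Hypothetical record of one held lock: its name and where it was taken. */
struct held_lock {
	const char *name;
	const char *file;
	int line;
};

/* Hypothetical per-thread bookkeeping. */
struct fake_thread {
	struct held_lock held[MAX_HELD];
	size_t nheld;
};

/* Record an acquisition; returns 0 on success, -1 if the table is full. */
static int
track_acquire(struct fake_thread *td, const char *name, const char *file,
    int line)
{
	assert(td != NULL);
	if (td->nheld == MAX_HELD)
		return (-1);
	td->held[td->nheld++] = (struct held_lock){ name, file, line };
	return (0);
}

/* Forget the most recent acquisition of `name`; -1 if it was not held. */
static int
track_release(struct fake_thread *td, const char *name)
{
	assert(td != NULL);
	for (size_t i = td->nheld; i > 0; i--) {
		if (strcmp(td->held[i - 1].name, name) == 0) {
			/* Close the gap left by the removed entry. */
			for (size_t j = i; j < td->nheld; j++)
				td->held[j - 1] = td->held[j];
			td->nheld--;
			return (0);
		}
	}
	return (-1);
}

/* Report any locks still held; returns how many there are. */
static size_t
check_before_suspend(const struct fake_thread *td, const char *why)
{
	assert(td != NULL);
	if (td->nheld > 0)
		fprintf(stderr, "%s with the following locks held:\n", why);
	for (size_t i = 0; i < td->nheld; i++)
		fprintf(stderr, "%s locked @ %s:%d\n", td->held[i].name,
		    td->held[i].file, td->held[i].line);
	return (td->nheld);
}
```

A handler that takes "udpinp" and returns without releasing it would make
check_before_suspend() return 1 and name the file and line of the
acquisition, which is the same shape of information as in the log above.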
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: panic: witness_warn head/amd64 @r285741 on 1 of 2 machines

2015-07-21 Thread Mark Johnston
On Tue, Jul 21, 2015 at 09:19:27AM -0700, David Wolfskill wrote:
> On Tue, Jul 21, 2015 at 04:39:07PM +0300, Konstantin Belousov wrote:
> > On Tue, Jul 21, 2015 at 05:57:34AM -0700, David Wolfskill wrote:
> > > My laptop had no problems, but the build machine has a panic that
> > > appears quite reproducible (4 "successes" out of 4 tries); here's a bit
> > > from the core.txt file:
> > 
> > There must be kernel messages before the panic string.  They are crucial
> > to understand what is going on.
> > ...
> 
> Sorry I wasn't able to capture those before I needed to do Other Things.
> 
> The machine had a (PCI-attached) serial console that was working
> for FreeBSD (thanks mostly to sbruno's help), but Something seems
> to Have Happened, and that's not presently working (even in stable/10,
> where I first got it working).
> 
> I will try to get it working again, but I doubt I will have time to
> focus on that until about 9 hours from now.

It's possible to extract log messages leading up to the panic from the
vmcore. From the kgdb prompt, running

(kgdb) printf "%s", msgbufp->msg_ptr

should bring them up.
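
Some background on that one-liner: the kernel message buffer is a
circular buffer, so printing from the start of msg_ptr as a C string
shows the bytes in storage order, which is not necessarily oldest-first
once the buffer has wrapped.  The sketch below shows the general idea of
reading such a buffer in write order.  It is a simplified model with
hypothetical names (ring_log, ring_write, ring_read), not the layout of
the kernel's struct msgbuf.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RING_SIZE 16

/* Hypothetical ring log: fixed storage plus a running write sequence. */
struct ring_log {
	char buf[RING_SIZE];
	size_t wseq;		/* total bytes ever written */
};

/* Append a string, overwriting the oldest bytes once the ring is full. */
static void
ring_write(struct ring_log *r, const char *s)
{
	assert(r != NULL && s != NULL);
	for (; *s != '\0'; s++)
		r->buf[r->wseq++ % RING_SIZE] = *s;
}

/*
 * Copy the retained bytes, oldest first, into `out` and NUL-terminate.
 * `out` must have room for RING_SIZE + 1 bytes.  Returns the number of
 * bytes copied.
 */
static size_t
ring_read(const struct ring_log *r, char *out)
{
	size_t len = r->wseq < RING_SIZE ? r->wseq : RING_SIZE;
	size_t start = r->wseq - len;

	assert(r != NULL && out != NULL);
	for (size_t i = 0; i < len; i++)
		out[i] = r->buf[(start + i) % RING_SIZE];
	out[len] = '\0';
	return (len);
}
```

After writing more than RING_SIZE bytes, ring_read() returns only the
most recent RING_SIZE bytes, in the order they were written.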

And, I just noticed that you posted the core.txt, which contains this
info near the end:
http://www.catwhisker.org/~david/FreeBSD/head/core.txt.1


Re: panic: witness_warn head/amd64 @r285741 on 1 of 2 machines

2015-07-21 Thread David Wolfskill
On Tue, Jul 21, 2015 at 04:39:07PM +0300, Konstantin Belousov wrote:
> On Tue, Jul 21, 2015 at 05:57:34AM -0700, David Wolfskill wrote:
> > My laptop had no problems, but the build machine has a panic that
> > appears quite reproducible (4 "successes" out of 4 tries); here's a bit
> > from the core.txt file:
> 
> There must be kernel messages before the panic string.  They are crucial
> to understand what is going on.
> ...

Sorry I wasn't able to capture those before I needed to do Other Things.

The machine had a (PCI-attached) serial console that was working
for FreeBSD (thanks mostly to sbruno's help), but Something seems
to Have Happened, and that's not presently working (even in stable/10,
where I first got it working).

I will try to get it working again, but I doubt I will have time to
focus on that until about 9 hours from now.

Peace,
david
-- 
David H. Wolfskill  da...@catwhisker.org
Those who murder in the name of God or prophet are blasphemous cowards.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.




Re: panic: witness_warn head/amd64 @r285741 on 1 of 2 machines

2015-07-21 Thread Konstantin Belousov
On Tue, Jul 21, 2015 at 05:57:34AM -0700, David Wolfskill wrote:
> My laptop had no problems, but the build machine has a panic that
> appears quite reproducible (4 "successes" out of 4 tries); here's a bit
> from the core.txt file:

There must be kernel messages before the panic string.  They are crucial
to understand what is going on.

> 
> freebeast.catwhisker.org dumped core - see /var/crash/vmcore.1
> 
> Tue Jul 21 05:36:11 PDT 2015
> 
> FreeBSD freebeast.catwhisker.org 11.0-CURRENT FreeBSD 11.0-CURRENT #1787  
> r285741M/285741:1100077: Tue Jul 21 04:48:37 PDT 2015 
> r...@freebeast.catwhisker.org:/common/S4/obj/usr/src/sys/GENERIC  amd64
> 
> panic: witness_warn


panic: witness_warn head/amd64 @r285741 on 1 of 2 machines

2015-07-21 Thread David Wolfskill
My laptop had no problems, but the build machine has a panic that
appears quite reproducible (4 "successes" out of 4 tries); here's a bit
from the core.txt file:

freebeast.catwhisker.org dumped core - see /var/crash/vmcore.1

Tue Jul 21 05:36:11 PDT 2015

FreeBSD freebeast.catwhisker.org 11.0-CURRENT FreeBSD 11.0-CURRENT #1787  
r285741M/285741:1100077: Tue Jul 21 04:48:37 PDT 2015 
r...@freebeast.catwhisker.org:/common/S4/obj/usr/src/sys/GENERIC  amd64

panic: witness_warn

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
panic: witness_warn
cpuid = 3
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe083b9c0860
vpanic() at vpanic+0x189/frame 0xfe083b9c08e0
kassert_panic() at kassert_panic+0x132/frame 0xfe083b9c0950
witness_warn() at witness_warn+0x498/frame 0xfe083b9c0a20
ithread_loop() at ithread_loop+0x165/frame 0xfe083b9c0a70
fork_exit() at fork_exit+0x84/frame 0xfe083b9c0ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe083b9c0ab0
--- trap 0, rip = 0, rsp = 0xfe083b9c0b70, rbp = 0 ---
...
(kgdb) #0  doadump (textdump=0) at pcpu.h:221
#1  0x80377dfe in db_dump (dummy=, dummy2=false,
    dummy3=0, dummy4=0x0) at /usr/src/sys/ddb/db_command.c:533
#2  0x80377971 in db_command (cmd_table=0x0)
    at /usr/src/sys/ddb/db_command.c:440
#3  0x80377604 in db_command_loop ()
    at /usr/src/sys/ddb/db_command.c:493
#4  0x8037a19b in db_trap (type=, code=0)
    at /usr/src/sys/ddb/db_main.c:251
#5  0x80a56624 in kdb_trap (type=3, code=0, tf=)
    at /usr/src/sys/kern/subr_kdb.c:654
#6  0x80e61bd1 in trap (frame=0xfe083b9c0790)
    at /usr/src/sys/amd64/amd64/trap.c:540
#7  0x80e41e02 in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:235
#8  0x80a55cfe in kdb_enter (why=0x8136f098 "panic",
    msg=0x80a5bee0
    "UH\211AWAVATSH\203PI\211A\211H\213\004%\201H\211E\201<%x\201")
    at cpufunc.h:63
#9  0x80a19739 in vpanic (fmt=,
    ap=) at /usr/src/sys/kern/kern_shutdown.c:737
#10 0x80a19582 in kassert_panic (fmt=)
    at /usr/src/sys/kern/kern_shutdown.c:634
#11 0x80a74908 in witness_warn (flags=2, lock=,
    fmt=0x81367827 "suspending ithread")
    at /usr/src/sys/kern/subr_witness.c:1757
#12 0x809e2985 in ithread_loop (arg=0xf8000770c820)
    at /usr/src/sys/kern/kern_intr.c:1345
#13 0x809df874 in fork_exit (
    callout=0x809e2820 , arg=0xf8000770c820,
    frame=0xfe083b9c0ac0) at /usr/src/sys/kern/kern_fork.c:1006
#14 0x80e4233e in fork_trampoline ()
    at /usr/src/sys/amd64/amd64/exception.S:610
#15 0x in ?? ()
Current language:  auto; currently minimal
(kgdb) 


The machine dropped into the debugger on each boot; on the most recent
occurrence, I manually issued a "dump" command from the debugger, then
rebooted under the previous kernel:

FreeBSD freebeast.catwhisker.org 11.0-CURRENT FreeBSD 11.0-CURRENT #1786  
r285715M/285715:1100077: Mon Jul 20 04:22:26 PDT 2015 
r...@freebeast.catwhisker.org:/common/S3/obj/usr/src/sys/GENERIC  amd64

(And yes, it runs an unmodified GENERIC kernel.)

The machine has been deployed only for a couple of months or so,
but has been building stable/10 and head daily during that time.
Until a couple of weeks ago, it was doing this for both i386 and
amd64; since then, I dropped i386 from my home infrastructure, so
it's been only amd64.

In the stable/10 environment, it also makes use of a 3-spindle zraid for
running poudriere (to build the ports for my "production" machines), and
it's been doing that quite well, also.

The only other thing I can think of that's noteworthy is that its boot
drive is an SSD (where I have not yet enabled TRIM, as it's a Crucial
M500, and I need to be sure we don't try to use the queued TRIM commands
on it, as there are reports that queued TRIM commands on the M500 will
corrupt data).

OK; please see  for the
dump(-related) files.  (It's on a residential ADSL, so it's going to be
slow.  Sorry; I have a limited amount of bandwidth.)

Peace,
david
-- 
David H. Wolfskill  da...@catwhisker.org
Those who murder in the name of God or prophet are blasphemous cowards.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.

