Re: FreeBSD-9.1: machine reboots during snapshot creation, LORs found

2013-05-31 Thread Andre Albsmeier
On Fri, 31-May-2013 at 16:51:03 +0200, John Baldwin wrote:
> On Friday, May 31, 2013 8:26:11 am Andre Albsmeier wrote:
> > Each day at 5:15 we are generating snapshots on various machines.
> > This used to work perfectly under 7-STABLE for years but since
> > we started to use 9.1-STABLE the machine reboots in about 10%
> > of all cases.
> > 
> > After rebooting we find a new snapshot file which is a bit
> > smaller than the good ones and has different permissions.
> > It does not pass fsck. In this example it is the one
> > whose name begins with s3:
> > 
> > -r--r-   1 root  operator  snapshot 72802894528 29 May 05:15 s2-2013.05.28-03.15.04
> > -r   1 root  operator  snapshot 72802893824 29 May 05:15 s3-2013.05.29-03.15.03
> > -r--r-   1 root  operator  snapshot 72802894528 28 May 14:22 s4-2013.05.23-06.38.44
> > -r--r-   1 root  operator  snapshot 72802894528 28 May 14:22 s5-2013.05.24-03.15.03
> > -r--r-   1 root  operator  snapshot 72802894528 28 May 14:22 s6-2013.05.25-03.15.03
> > 
> > After enabling DIAGNOSTIC, WITNESS and INVARIANTS in the kernel
> > I see the following LORs (mksnap_ffs starts exactly at 5:15):
> > 
> > May 29 05:15:00  palveli kernel: lock order reversal:
> > May 29 05:15:00  palveli kernel: 1st 0xc2371da8 ufs (ufs) @ /src/src-9/sys/kern/vfs_mount.c:1240
> > May 29 05:15:00  palveli kernel: 2nd 0xc2371ec4 devfs (devfs) @ /src/src-9/sys/ufs/ffs/ffs_vfsops.c:1414
> > May 29 05:15:04  palveli kernel: lock order reversal:
> > May 29 05:15:04  palveli kernel: 1st 0xc228471c snaplk (snaplk) @ /src/src-9/sys/ufs/ufs/ufs_vnops.c:976
> > May 29 05:15:04  palveli kernel: 2nd 0xc22f25e4 ufs (ufs) @ /src/src-9/sys/ufs/ffs/ffs_snapshot.c:1626
> > 
> > Unfortunately no corefiles are being generated ;-(.
> > 
> > I have checked and even rebuilt the (UFS1) fs in question
> > from scratch. I have also seen this happen on a UFS2 on
> > another machine and on a third one when running "dump -L"
> > on a root fs.
> > 
> > Any hints on how to proceed?
> 
> Would it be possible to set up a logged serial console on this machine
> to see whether it is panicking but failing to write out a crashdump?

I'll try to arrange that. It'll take a bit since this
box is 200 km away... 

Maybe I'll find another one nearby to reproduce it...

-Andre

-- 
This email has been checked as virus-free.
It may still be full of nonsense however.
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: FreeBSD-9.1: machine reboots during snapshot creation, LORs found

2013-05-31 Thread John Baldwin
On Friday, May 31, 2013 8:26:11 am Andre Albsmeier wrote:
> Each day at 5:15 we are generating snapshots on various machines.
> This used to work perfectly under 7-STABLE for years but since
> we started to use 9.1-STABLE the machine reboots in about 10%
> of all cases.
> 
> After rebooting we find a new snapshot file which is a bit
> smaller than the good ones and has different permissions.
> It does not pass fsck. In this example it is the one
> whose name begins with s3:
> 
> -r--r-   1 root  operator  snapshot 72802894528 29 May 05:15 s2-2013.05.28-03.15.04
> -r   1 root  operator  snapshot 72802893824 29 May 05:15 s3-2013.05.29-03.15.03
> -r--r-   1 root  operator  snapshot 72802894528 28 May 14:22 s4-2013.05.23-06.38.44
> -r--r-   1 root  operator  snapshot 72802894528 28 May 14:22 s5-2013.05.24-03.15.03
> -r--r-   1 root  operator  snapshot 72802894528 28 May 14:22 s6-2013.05.25-03.15.03
> 
> After enabling DIAGNOSTIC, WITNESS and INVARIANTS in the kernel
> I see the following LORs (mksnap_ffs starts exactly at 5:15):
> 
> May 29 05:15:00  palveli kernel: lock order reversal:
> May 29 05:15:00  palveli kernel: 1st 0xc2371da8 ufs (ufs) @ /src/src-9/sys/kern/vfs_mount.c:1240
> May 29 05:15:00  palveli kernel: 2nd 0xc2371ec4 devfs (devfs) @ /src/src-9/sys/ufs/ffs/ffs_vfsops.c:1414
> May 29 05:15:04  palveli kernel: lock order reversal:
> May 29 05:15:04  palveli kernel: 1st 0xc228471c snaplk (snaplk) @ /src/src-9/sys/ufs/ufs/ufs_vnops.c:976
> May 29 05:15:04  palveli kernel: 2nd 0xc22f25e4 ufs (ufs) @ /src/src-9/sys/ufs/ffs/ffs_snapshot.c:1626
> 
> Unfortunately no corefiles are being generated ;-(.
> 
> I have checked and even rebuilt the (UFS1) fs in question
> from scratch. I have also seen this happen on a UFS2 on
> another machine and on a third one when running "dump -L"
> on a root fs.
> 
> Any hints on how to proceed?

Would it be possible to set up a logged serial console on this machine
to see whether it is panicking but failing to write out a crashdump?

-- 
John Baldwin


pf losing (v6) TCP states much too early, "no-route" not working with IPv6

2013-05-31 Thread Harald Schmalzbauer
Hello,

my default pf config blocks everything and allows specific connections.
One of them is "in from x to self port ssh" which expands to "port ssh
keep state flags S/SA" by default.

After ssh login, I see the corresponding entry in the states table:
all tcp 2001:db8:f0bb:1::1[22] <- 2001:db8:f0bb:1::3:1[42730]  ESTABLISHED:ESTABLISHED

pfctl -s info claims:
TIMEOUTS:
...
tcp.established   86400s
...

After a couple of hours of inactivity, the ssh session silently stalls.
Here's what I have in the log:
rule 3/0(match): block in on rl1: 2001:db8:f0bb:1::3:1.42730 >
2001:db8:f0bb:1::1.22: Flags [P.], ack 1444009640, win 65535, length 48

The rule evaluation by itself is correct: it's not a TCP SYN, so it gets
blocked. But this packet should not reach the ruleset at all, at least
not before the connection has been idle for 86400s. In my case, it was
after ~3 hours. The port numbers are exactly the same as in the state
table entry from some hours before, so the state table entry seems to
have been lost!
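
The comparison above can be reproduced mechanically. The following is only a
sketch: it assumes that "<-" in the state listing means "local endpoint <-
remote endpoint" for an inbound connection, that the wrapped state and log
lines have been joined into one line each, and that "~3 hours" is about
10800s. The helper names are made up for illustration.

```python
import re

TCP_ESTABLISHED_TIMEOUT_S = 86400  # tcp.established, as reported by "pfctl -s info"
OBSERVED_IDLE_S = 3 * 3600         # "~3 hours" of inactivity (approximate)

# Both lines are copied from this message, with the wrapped parts joined.
STATE = "all tcp 2001:db8:f0bb:1::1[22] <- 2001:db8:f0bb:1::3:1[42730]"
LOGGED = ("rule 3/0(match): block in on rl1: 2001:db8:f0bb:1::3:1.42730 > "
          "2001:db8:f0bb:1::1.22: Flags [P.], ack 1444009640, win 65535, length 48")


def state_endpoints(entry):
    """Return (src, sport, dst, dport) from a 'dst[port] <- src[port]' state line."""
    m = re.search(r"(\S+)\[(\d+)\] <- (\S+)\[(\d+)\]", entry)
    dst, dport, src, sport = m.groups()
    return (src, int(sport), dst, int(dport))


def log_endpoints(line):
    """Return (src, sport, dst, dport) from a 'src.port > dst.port:' log line."""
    m = re.search(r"(\S+)\.(\d+) > (\S+)\.(\d+):", line)
    src, sport, dst, dport = m.groups()
    return (src, int(sport), dst, int(dport))


# Same 4-tuple means the blocked packet belongs to the connection that
# previously had a state entry.
print("same connection:", state_endpoints(STATE) == log_endpoints(LOGGED))
# Idle time well below the timeout means the state should not have expired.
print("idle below timeout:", OBSERVED_IDLE_S < TCP_ESTABLISHED_TIMEOUT_S)
```

If both checks print True, the block cannot be explained by the
tcp.established timeout alone.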

My questions:

Is such a problem known?
Did I miss anything else?

System runs 8.1-STABLE/x86

Another issue was that "no-route" doesn't work for IPv6 connections. I
had to replace it with "any".

Thanks for any hints in advance,

-Harry

P.S.: It's an embedded box where upgrading is overdue, but not that easy...





FreeBSD-9.1: machine reboots during snapshot creation, LORs found

2013-05-31 Thread Andre Albsmeier
Each day at 5:15 we are generating snapshots on various machines.
This used to work perfectly under 7-STABLE for years but since
we started to use 9.1-STABLE the machine reboots in about 10%
of all cases.

After rebooting we find a new snapshot file which is a bit
smaller than the good ones and has different permissions.
It does not pass fsck. In this example it is the one
whose name begins with s3:

-r--r-   1 root  operator  snapshot 72802894528 29 May 05:15 s2-2013.05.28-03.15.04
-r   1 root  operator  snapshot 72802893824 29 May 05:15 s3-2013.05.29-03.15.03
-r--r-   1 root  operator  snapshot 72802894528 28 May 14:22 s4-2013.05.23-06.38.44
-r--r-   1 root  operator  snapshot 72802894528 28 May 14:22 s5-2013.05.24-03.15.03
-r--r-   1 root  operator  snapshot 72802894528 28 May 14:22 s6-2013.05.25-03.15.03

After enabling DIAGNOSTIC, WITNESS and INVARIANTS in the kernel
I see the following LORs (mksnap_ffs starts exactly at 5:15):

May 29 05:15:00  palveli kernel: lock order reversal:
May 29 05:15:00  palveli kernel: 1st 0xc2371da8 ufs (ufs) @ /src/src-9/sys/kern/vfs_mount.c:1240
May 29 05:15:00  palveli kernel: 2nd 0xc2371ec4 devfs (devfs) @ /src/src-9/sys/ufs/ffs/ffs_vfsops.c:1414
May 29 05:15:04  palveli kernel: lock order reversal:
May 29 05:15:04  palveli kernel: 1st 0xc228471c snaplk (snaplk) @ /src/src-9/sys/ufs/ufs/ufs_vnops.c:976
May 29 05:15:04  palveli kernel: 2nd 0xc22f25e4 ufs (ufs) @ /src/src-9/sys/ufs/ffs/ffs_snapshot.c:1626
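
When collecting such reports from several machines, it can help to reduce
each LOR to a "first lock -> second lock" pair. The following is a minimal
sketch, not an official tool: it assumes each "1st"/"2nd" message has been
joined with its wrapped source path into a single line, and the function
name parse_lors is made up for illustration.

```python
import re

# Matches one WITNESS lock line, for example:
#   "... kernel: 1st 0xc2371da8 ufs (ufs) @ /src/.../vfs_mount.c:1240"
LOCK_RE = re.compile(
    r"kernel: (1st|2nd) (0x[0-9a-f]+) (\S+) \((\S+)\) @ (\S+):(\d+)"
)


def parse_lors(lines):
    """Pair each '1st' lock line with the '2nd' lock line that follows it."""
    pairs = []
    first = None
    for line in lines:
        m = LOCK_RE.search(line)
        if not m:
            continue  # e.g. the "lock order reversal:" header line
        order, addr, name, _lock_type, path, lineno = m.groups()
        lock = {"addr": addr, "name": name, "file": path, "line": int(lineno)}
        if order == "1st":
            first = lock
        elif first is not None:
            pairs.append((first, lock))
            first = None
    return pairs


# The log lines from this message, with wrapped paths joined.
SAMPLE = [
    "May 29 05:15:00  palveli kernel: lock order reversal:",
    "May 29 05:15:00  palveli kernel: 1st 0xc2371da8 ufs (ufs) @ /src/src-9/sys/kern/vfs_mount.c:1240",
    "May 29 05:15:00  palveli kernel: 2nd 0xc2371ec4 devfs (devfs) @ /src/src-9/sys/ufs/ffs/ffs_vfsops.c:1414",
    "May 29 05:15:04  palveli kernel: lock order reversal:",
    "May 29 05:15:04  palveli kernel: 1st 0xc228471c snaplk (snaplk) @ /src/src-9/sys/ufs/ufs/ufs_vnops.c:976",
    "May 29 05:15:04  palveli kernel: 2nd 0xc22f25e4 ufs (ufs) @ /src/src-9/sys/ufs/ffs/ffs_snapshot.c:1626",
]

for first, second in parse_lors(SAMPLE):
    print(f"{first['name']} -> {second['name']} "
          f"({first['file']}:{first['line']}, then {second['file']}:{second['line']})")
```

For the two reports above this yields the pairs ufs -> devfs and
snaplk -> ufs, which makes it easy to compare them with reports from other
machines.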

Unfortunately no corefiles are being generated ;-(.

I have checked and even rebuilt the (UFS1) fs in question
from scratch. I have also seen this happen on a UFS2 on
another machine and on a third one when running "dump -L"
on a root fs.

Any hints on how to proceed?

-Andre


[releng_9 tinderbox] failure on i386/i386

2013-05-31 Thread FreeBSD Tinderbox
TB --- 2013-05-31 07:10:25 - tinderbox 2.10 running on freebsd-stable.sentex.ca
TB --- 2013-05-31 07:10:25 - FreeBSD freebsd-stable.sentex.ca 8.3-STABLE FreeBSD 8.3-STABLE #0: Tue Oct 16 17:37:58 UTC 2012 mdtan...@freebsd-stable.sentex.ca:/usr/obj/usr/src/sys/server  amd64
TB --- 2013-05-31 07:10:25 - starting RELENG_9 tinderbox run for i386/i386
TB --- 2013-05-31 07:10:25 - cleaning the object tree
TB --- 2013-05-31 07:10:45 - /usr/local/bin/svn stat /src
TB --- 2013-05-31 07:10:50 - At svn revision 251176
TB --- 2013-05-31 07:10:51 - building world
TB --- 2013-05-31 07:10:51 - CROSS_BUILD_TESTING=YES
TB --- 2013-05-31 07:10:51 - MAKEOBJDIRPREFIX=/obj
TB --- 2013-05-31 07:10:51 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
TB --- 2013-05-31 07:10:51 - SRCCONF=/dev/null
TB --- 2013-05-31 07:10:51 - TARGET=i386
TB --- 2013-05-31 07:10:51 - TARGET_ARCH=i386
TB --- 2013-05-31 07:10:51 - TZ=UTC
TB --- 2013-05-31 07:10:51 - __MAKE_CONF=/dev/null
TB --- 2013-05-31 07:10:51 - cd /src
TB --- 2013-05-31 07:10:51 - /usr/bin/make -B buildworld
>>> World build started on Fri May 31 07:10:51 UTC 2013
>>> Rebuilding the temporary build tree
>>> stage 1.1: legacy release compatibility shims
>>> stage 1.2: bootstrap tools
>>> stage 2.1: cleaning up the object tree
>>> stage 2.2: rebuilding the object tree
>>> stage 2.3: build tools
>>> stage 3: cross tools
>>> stage 4.1: building includes
>>> stage 4.2: building libraries
>>> stage 4.3: make dependencies
>>> stage 4.4: building everything
>>> World build completed on Fri May 31 10:05:30 UTC 2013
TB --- 2013-05-31 10:05:30 - generating LINT kernel config
TB --- 2013-05-31 10:05:30 - cd /src/sys/i386/conf
TB --- 2013-05-31 10:05:30 - /usr/bin/make -B LINT
TB --- 2013-05-31 10:05:30 - cd /src/sys/i386/conf
TB --- 2013-05-31 10:05:30 - /usr/sbin/config -m LINT
TB --- 2013-05-31 10:05:30 - building LINT kernel
TB --- 2013-05-31 10:05:30 - CROSS_BUILD_TESTING=YES
TB --- 2013-05-31 10:05:30 - MAKEOBJDIRPREFIX=/obj
TB --- 2013-05-31 10:05:30 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
TB --- 2013-05-31 10:05:30 - SRCCONF=/dev/null
TB --- 2013-05-31 10:05:30 - TARGET=i386
TB --- 2013-05-31 10:05:30 - TARGET_ARCH=i386
TB --- 2013-05-31 10:05:30 - TZ=UTC
TB --- 2013-05-31 10:05:30 - __MAKE_CONF=/dev/null
TB --- 2013-05-31 10:05:30 - cd /src
TB --- 2013-05-31 10:05:30 - /usr/bin/make -B buildkernel KERNCONF=LINT
>>> Kernel build for LINT started on Fri May 31 10:05:30 UTC 2013
>>> stage 1: configuring the kernel
>>> stage 2.1: cleaning up the object tree
>>> stage 2.2: rebuilding the object tree
>>> stage 2.3: build tools
>>> stage 3.1: making dependencies
>>> stage 3.2: building everything
[...]
cc -c -O2 -pipe -fno-strict-aliasing  -std=c99  -Wall -Wredundant-decls 
-Wnested-externs -Wstrict-prototypes  -Wmissing-prototypes -Wpointer-arith 
-Winline -Wcast-qual  -Wundef -Wno-pointer-sign -fformat-extensions  
-Wmissing-include-dirs -fdiagnostics-show-option   -nostdinc  -I. -I/src/sys 
-I/src/sys/contrib/altq -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include 
opt_global.h -fno-common -finline-limit=8000 --param inline-unit-growth=100 
--param large-function-growth=1000 -DGPROF -falign-functions=16 -DGPROF4 
-DGUPROF -fno-builtin -mno-align-long-strings -mpreferred-stack-boundary=2 
-mno-mmx -mno-sse -msoft-float -ffreestanding -fstack-protector -Werror -pg 
-mprofiler-epilogue /src/sys/dev/aha/aha_isa.c
cc -c -O2 -pipe -fno-strict-aliasing  -std=c99  -Wall -Wredundant-decls 
-Wnested-externs -Wstrict-prototypes  -Wmissing-prototypes -Wpointer-arith 
-Winline -Wcast-qual  -Wundef -Wno-pointer-sign -fformat-extensions  
-Wmissing-include-dirs -fdiagnostics-show-option   -nostdinc  -I. -I/src/sys 
-I/src/sys/contrib/altq -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include 
opt_global.h -fno-common -finline-limit=8000 --param inline-unit-growth=100 
--param large-function-growth=1000 -DGPROF -falign-functions=16 -DGPROF4 
-DGUPROF -fno-builtin -mno-align-long-strings -mpreferred-stack-boundary=2 
-mno-mmx -mno-sse -msoft-float -ffreestanding -fstack-protector -Werror -pg 
-mprofiler-epilogue /src/sys/dev/aha/aha_mca.c
In file included from /src/sys/dev/aha/aha_mca.c:49:
/src/sys/dev/aha/ahareg.h:300: error: field 'timer' has incomplete type
/src/sys/dev/aha/aha_mca.c: In function 'aha_mca_attach':
/src/sys/dev/aha/aha_mca.c:194: error: 'aha' undeclared (first use in this function)
/src/sys/dev/aha/aha_mca.c:194: error: (Each undeclared identifier is reported only once
/src/sys/dev/aha/aha_mca.c:194: error: for each function it appears in.)
*** Error code 1

Stop in /obj/i386.i386/src/sys/LINT.
*** Error code 1

Stop in /src.
*** Error code 1

Stop in /src.
TB --- 2013-05-31 10:12:04 - WARNING: /usr/bin/make returned exit code  1 
TB --- 2013-05-31 10:12:04 - ERROR: failed to build LINT kernel
TB --- 2013-05-31 10:12:04 - 8357.76 user 914.91 system 10899.77 real


http://tinderbox.freebsd.org/tinderbox-freebsd9-build-RELENG_9-i386-i386.full