Re: Frequent VFS crashes with RELENG_6

2006-11-14 Thread Kris Kennaway
On Tue, Nov 14, 2006 at 05:10:23PM +0200, Vlad Galu wrote:
> On 11/1/06, Vlad Galu <[EMAIL PROTECTED]> wrote:
> >On 10/31/06, Kris Kennaway <[EMAIL PROTECTED]> wrote:
> >> On Tue, Oct 31, 2006 at 04:34:59PM +0200, Vlad Galu wrote:
> >>
> > >> > Yes, but for objective reasons I can't publish it :( The only
> > >> > debugging option that I didn't use was INVARIANTS.
> >>
> >> Which is coincidentally the most useful one ;-)
> >>
> > >> Also turn on DEBUG_LOCKS and DEBUG_VFS_LOCKS, then report the output of
> > >> 'show lockedvnods' at the time of the crash as well.
> 
>Upon Tor Egge's suggestion, I removed ZERO_COPY_SOCKETS from my
> kernel and the machine has been running nicely ever since.

Glad to hear it. Depending on what Tor had to say, you might want to
file a PR about that.

Kris




Re: Frequent VFS crashes with RELENG_6

2006-11-14 Thread Vlad Galu

On 11/1/06, Vlad Galu <[EMAIL PROTECTED]> wrote:

On 10/31/06, Kris Kennaway <[EMAIL PROTECTED]> wrote:
> On Tue, Oct 31, 2006 at 04:34:59PM +0200, Vlad Galu wrote:
>
> > Yes, but for objective reasons I can't publish it :( The only
> > debugging option that I didn't use was INVARIANTS.
>
> Which is coincidentally the most useful one ;-)
>
> Also turn on DEBUG_LOCKS and DEBUG_VFS_LOCKS, then report the output of
> 'show lockedvnods' at the time of the crash as well.


   Upon Tor Egge's suggestion, I removed ZERO_COPY_SOCKETS from my
kernel and the machine has been running nicely ever since.
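Removing the option is a one-line change in the kernel configuration file, followed by the usual kernel rebuild and install. A minimal sketch, assuming ZERO_COPY_SOCKETS had been enabled with the standard `options` line (the actual config file is not shown in the thread):

```
# Disabled on Tor Egge's suggestion; the machine has been stable since.
#options 	ZERO_COPY_SOCKETS
```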

--
If it's there, and you can see it, it's real.
If it's not there, and you can see it, it's virtual.
If it's there, and you can't see it, it's transparent.
If it's not there, and you can't see it, you erased it.
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: Frequent VFS crashes with RELENG_6

2006-11-03 Thread Vlad Galu

On 10/31/06, Kris Kennaway <[EMAIL PROTECTED]> wrote:

  It now crashes in a different place. Unfortunately I don't have
physical access to the machine. A "bt full" backtrace is available at
http://night.rdslink.ro/dudu/freebsd/03_11_2006.txt. The stack was
corrupted, though :(



Re: Frequent VFS crashes with RELENG_6

2006-10-31 Thread Vlad Galu

On 10/31/06, Kris Kennaway <[EMAIL PROTECTED]> wrote:

On Tue, Oct 31, 2006 at 04:34:59PM +0200, Vlad Galu wrote:

> Yes, but for objective reasons I can't publish it :( The only
> debugging option that I didn't use was INVARIANTS.

Which is coincidentally the most useful one ;-)

Also turn on DEBUG_LOCKS and DEBUG_VFS_LOCKS, then report the output of
'show lockedvnods' at the time of the crash as well.


  I've applied a patch suggested by Eric and will see how it goes. If
it crashes again, I'll add the options you mentioned to my kernel
configuration and get back to the list with further details.




Re: Frequent VFS crashes with RELENG_6

2006-10-31 Thread Kris Kennaway
On Tue, Oct 31, 2006 at 04:34:59PM +0200, Vlad Galu wrote:

> Yes, but for objective reasons I can't publish it :( The only
> debugging option that I didn't use was INVARIANTS.

Which is coincidentally the most useful one ;-)

Also turn on DEBUG_LOCKS and DEBUG_VFS_LOCKS, then report the output of
'show lockedvnods' at the time of the crash as well.

Kris
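The options Kris mentions are enabled in the kernel configuration file. A minimal sketch of the relevant lines for a RELENG_6 kernel, assuming a custom config copied from GENERIC; listing KDB and DDB alongside them is an assumption (they provide the `db>` prompt where `show lockedvnods` is typed after a panic), not something stated in the thread:

```
# Debugging options discussed in this thread (RELENG_6).
options 	KDB			# kernel debugger framework (assumed needed)
options 	DDB			# interactive debugger; 'show lockedvnods' runs here
options 	INVARIANTS		# extra run-time consistency checks
options 	INVARIANT_SUPPORT	# required by INVARIANTS
options 	WITNESS			# lock order checking (already tried in this thread)
options 	DEBUG_LOCKS		# extra lock debugging information
options 	DEBUG_VFS_LOCKS		# vnode locking assertions in VFS
```

After editing, the kernel is rebuilt and installed with the usual `make buildkernel` and `make installkernel` steps, passing the matching `KERNCONF`.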




Re: Frequent VFS crashes with RELENG_6

2006-10-31 Thread Vlad Galu

On 10/31/06, Eric Anderson <[EMAIL PROTECTED]> wrote:

Did you get a crash dump?  If not, you might want to start by adding
all the debugging options to the kernel.


   Yes, but for objective reasons I can't publish it :( The only
debugging option that I didn't use was INVARIANTS. However, I posted
the output of "bt full" at the beginning of this thread. See
http://lists.freebsd.org/pipermail/freebsd-stable/2006-September/028985.html.





Re: Frequent VFS crashes with RELENG_6

2006-10-31 Thread Eric Anderson

On 10/31/06 08:03, Vlad Galu wrote:



During the last 2 weeks I ran the same system with WITNESS turned
on. Since this machine's workload is not I/O-dependent, I was able to
run bonnie++ and iozone around the clock every other day, while also
running several instances of rtorrent. This morning I rebooted into a
non-WITNESS kernel (built from the same sources as 2 weeks ago), and
the exact same crash occurred within a few hours of bootup. In all
this time, smartd didn't report anything suspicious, and WITNESS only
reported an already-known LOR (lock order reversal) related to kqueue.
Any ideas for further stress testing would be welcome. I am
familiar with a few parts of the kernel, but VFS is a total stranger
to me.





Did you get a crash dump?  If not, you might want to start by adding
all the debugging options to the kernel.



Eric



--

Eric Anderson    Sr. Systems Administrator    Centaur Technology
Anything that works is better than anything that doesn't.



Re: Frequent VFS crashes with RELENG_6

2006-10-31 Thread Vlad Galu

On 10/1/06, Cy Schubert <[EMAIL PROTECTED]> wrote:

In message <[EMAIL PROTECTED]>, "Vlad GALU" writes:
> On 9/30/06, Martin Blapp <[EMAIL PROTECTED]> wrote:
> >
> > Hi,
> >
> > 1.) Bad RAM? Have you run a memory tester?
>
>Yes, memtest86 didn't show anything weird.
>
> > 2.) Is background fsck running on this disk? If
> > so, try booting into single-user mode and running a full fsck on this
> > disk.
> >
>
>I have background_fsck="NO" in rc.conf and I checked the whole disk
> several times.
>Something I forgot to mention earlier: the crash is easier to
> reproduce when running rtorrent. The machine did crash without running
> it as well, but far less often.

I've been experiencing the same problem as well. I discovered that the disk
holding the filesystem had some bad sectors, which caused dump -0Lauf to
fail while taking a snapshot and the system to panic. Running smartctl on
the device showed bad sectors 40% of the way into the SMART surface scan.
The drive, an 80 GB Maxtor, was replaced with a 250 GB Western Digital (at
a very good price, so good that I purchased two of them). The old drive
was 906 days old and had been powered off only about a dozen times over
the last three years.


During the last 2 weeks I ran the same system with WITNESS turned
on. Since this machine's workload is not I/O-dependent, I was able to
run bonnie++ and iozone around the clock every other day, while also
running several instances of rtorrent. This morning I rebooted into a
non-WITNESS kernel (built from the same sources as 2 weeks ago), and
the exact same crash occurred within a few hours of bootup. In all
this time, smartd didn't report anything suspicious, and WITNESS only
reported an already-known LOR (lock order reversal) related to kqueue.
Any ideas for further stress testing would be welcome. I am
familiar with a few parts of the kernel, but VFS is a total stranger
to me.




Re: Frequent VFS crashes with RELENG_6

2006-09-30 Thread Cy Schubert
In message <[EMAIL PROTECTED]>, "Vlad GALU" writes:
> On 9/30/06, Martin Blapp <[EMAIL PROTECTED]> wrote:
> >
> > Hi,
> >
> > 1.) Bad RAM? Have you run a memory tester?
> 
>Yes, memtest86 didn't show anything weird.
> 
> > 2.) Is background fsck running on this disk? If
> > so, try booting into single-user mode and running a full fsck on this
> > disk.
> >
> 
>I have background_fsck="NO" in rc.conf and I checked the whole disk
> several times.
>Something I forgot to mention earlier: the crash is easier to
> reproduce when running rtorrent. The machine did crash without running
> it as well, but far less often.

I've been experiencing the same problem as well. I discovered that the disk
holding the filesystem had some bad sectors, which caused dump -0Lauf to
fail while taking a snapshot and the system to panic. Running smartctl on
the device showed bad sectors 40% of the way into the SMART surface scan.
The drive, an 80 GB Maxtor, was replaced with a 250 GB Western Digital (at
a very good price, so good that I purchased two of them). The old drive
was 906 days old and had been powered off only about a dozen times over
the last three years.


-- 
Cheers,
Cy Schubert <[EMAIL PROTECTED]>
FreeBSD UNIX:  <[EMAIL PROTECTED]>   Web:  http://www.FreeBSD.org

e**(i*pi)+1=0




Re: Frequent VFS crashes with RELENG_6

2006-09-30 Thread Vlad GALU

On 9/30/06, Martin Blapp <[EMAIL PROTECTED]> wrote:


Hi,

1.) Bad RAM? Have you run a memory tester?


  Yes, memtest86 didn't show anything weird.


2.) Is background fsck running on this disk? If
so, try booting into single-user mode and running a full fsck on this
disk.



  I have background_fsck="NO" in rc.conf and I checked the whole disk
several times.
  Something I forgot to mention earlier: the crash is easier to
reproduce when running rtorrent. The machine did crash without running
it as well, but far less often.





Re: Frequent VFS crashes with RELENG_6

2006-09-30 Thread Martin Blapp


Hi,

1.) Bad RAM? Have you run a memory tester?

2.) Is background fsck running on this disk? If
so, try booting into single-user mode and running a full fsck on this
disk.

Martin

Martin Blapp, <[EMAIL PROTECTED]> <[EMAIL PROTECTED]>
--
ImproWare AG, UNIXSP & ISP, Zurlindenstrasse 29, 4133 Pratteln, CH
Phone: +41 61 826 93 00 Fax: +41 61 826 93 01
PGP: 
PGP Fingerprint: B434 53FC C87C FE7B 0A18 B84C 8686 EF22 D300 551E
--

On Sat, 30 Sep 2006, Vlad GALU wrote:


I've been getting random crashes like the one below, once or twice a
week, always in the same code path. The system is a RELENG_6 as of Wed
Sep 27 11:42:57 EEST 2006, running on amd64.

-- cut here --
#0  doadump () at pcpu.h:172
No locals.
#1  0x8022d033 in boot (howto=260) at 
../../../kern/kern_shutdown.c:409

  first_buf_printf = 1
#2  0x8022d687 in panic (fmt=0xff002bb6e260 "°ö¾\"") at
../../../kern/kern_shutdown.c:565
  bootopt = 260
  newpanic = 0
  ap = {{gp_offset = 16, fp_offset = 48, overflow_arg_area =
0xa7995790, reg_save_area = 0xa79956b0}}
  buf = "vm_page_unwire: invalid wire count: 0", '\0' times>

Frequent VFS crashes with RELENG_6

2006-09-30 Thread Vlad GALU

I've been getting random crashes like the one below, once or twice a
week, always in the same code path. The system is a RELENG_6 as of Wed
Sep 27 11:42:57 EEST 2006, running on amd64.

-- cut here --
#0  doadump () at pcpu.h:172
No locals.
#1  0x8022d033 in boot (howto=260) at ../../../kern/kern_shutdown.c:409
   first_buf_printf = 1
#2  0x8022d687 in panic (fmt=0xff002bb6e260 "°ö¾\"") at
../../../kern/kern_shutdown.c:565
   bootopt = 260
   newpanic = 0
   ap = {{gp_offset = 16, fp_offset = 48, overflow_arg_area =
0xa7995790, reg_save_area = 0xa79956b0}}
   buf = "vm_page_unwire: invalid wire count: 0", '\0' 
#3  0x8036980b in vm_page_unwire (m=0xff003e5c79e8,
activate=0) at ../../../vm/vm_page.c:1265
No locals.
#4  0x80282c15 in vfs_vmio_release (bp=0x9a6c2430) at
../../../kern/vfs_bio.c:1470
   i = 1
   m = 0xff003e5c79e8
#5  0x80285f78 in getnewbuf (slpflag=0, slptimeo=0, size=0,
maxsize=16384) at ../../../kern/vfs_bio.c:1779
   addr = 18446744072226429136
   bp = (struct buf *) 0x9a6c2430
   nbp = (struct buf *) 0x9a69ac48
   defrag = 0
   nqindex = 1
   flushingbufs = 0
#6  0x802863c0 in getblk (vp=0xff001015c5d0, blkno=0,
size=2048, slpflag=0, slptimeo=0, flags=0) at
../../../kern/vfs_bio.c:2486
   bsize = 0
   maxsize = 0
   vmio = 1
   offset = 0
   bp = (struct buf *) 0x0
   bo = (struct bufobj *) 0xff001015c720
#7  0x802880ec in breadn (vp=0xff001015c5d0, blkno=0,
size=0, rablkno=0x0, rabsize=0x0, cnt=0, cred=0x0, bpp=0x0) at
../../../kern/vfs_bio.c:738
   bp = (struct buf *) 0xa79958f0
   rabp = (struct buf *) 0x344
   i = -1
   rv = 0
   readwait = 0
#8  0x8028850e in bread (vp=0x0, blkno=0, size=0, cred=0x0,
bpp=0x0) at ../../../kern/vfs_bio.c:719
No locals.
#9  0x803427a5 in ffs_read (ap=0x0) at ../../../ufs/ffs/ffs_vnops.c:523
   vp = (struct vnode *) 0xff001015c5d0
   ip = (struct inode *) 0xff0017978780
   uio = (struct uio *) 0xa7995b50
   fs = (struct fs *) 0xff0012347000
   bp = (struct buf *) 0x0
   lbn = 0
   nextlbn = 1
   bytesinfile = 0
   size = 2048
   xfersize = 836
   blkoffset = 0
   error = 0
   orig_resid = 4096
   seqcount = 2
   ioflag = 131072
#10 0x803b374a in VOP_READ_APV (vop=0x0, a=0x0) at vnode_if.c:643
   rc = 0
#11 0x802a74e0 in vn_read (fp=0xff001e5f8078,
uio=0xa7995b50, active_cred=0x0, flags=0,
td=0xff002bb6e260) at vnode_if.h:343
   vp = (struct vnode *) 0xff001015c5d0
   error = 0
   ioflag = 131072
#12 0x80257b64 in dofileread (td=0xff002bb6e260, fd=5,
fp=0xff001e5f8078, auio=0xa7995b50, offset=0, flags=0) at
file.h:240
   cnt = 4096
   error = 509575288
   ktruio = (struct uio *) 0x0
#13 0x80257de0 in kern_readv (td=0xff002bb6e260, fd=5,
auio=0xa7995b50) at ../../../kern/sys_generic.c:192
   fp = (struct file *) 0xff001e5f8078
   error = 0
#14 0x80257eda in read (td=0x0, uap=0x0) at
../../../kern/sys_generic.c:116
   auio = {uio_iov = 0xa7995b40, uio_iovcnt = 1,
uio_offset = 0, uio_resid = 4096, uio_segflg = UIO_USERSPACE, uio_rw =
UIO_READ, uio_td = 0xff002bb6e260}
   aiov = {iov_base = 0x666000, iov_len = 4096}
#15 0x8038b2d8 in syscall (frame=
 {tf_rdi = 5, tf_rsi = 6709248, tf_rdx = 4096, tf_rcx =
542953472, tf_r8 = 1, tf_r9 = 0, tf_rax = 3, tf_rbx = 6151168, tf_rbp
= 4294967295, tf_r10 = 3260, tf_r11 = 518, tf_r12 = 0, tf_r13 =
140737488327200, tf_r14 = 140737488327328, tf_r15 = 5, tf_trapno = 12,
tf_addr = 9093168, tf_flags = 0, tf_err = 2, tf_rip = 550694412, tf_cs
= 43, tf_rflags = 518, tf_rsp = 140737488327160, tf_ss = 35}) at
../../../amd64/amd64/trap.c:792
   params = 0x7fff9200 
   callp = (struct sysent *) 0x80502ae8
   p = (struct proc *) 0xff0022bef6b0
   orig_tf_rflags = 518
   sticks = 116
   error = 0
   narg = 3
   args = {5, 6709248, 4096, 542953472, 1, 0, 140737488327328, 5}
   argp = (register_t *) 0x0
   code = 3
   reg = 48
   regcnt = 6
#16 0x80377bc8 in Xfast_syscall () at
../../../amd64/amd64/exception.S:270
-- and here --

