Re: panic from _mutex_assert in kern_lock.c

2002-10-08 Thread Steven G. Kargl

Jeff Roberson said:
> 
> On Sat, 5 Oct 2002, Brian F. Feldman wrote:
> 
>> "Steven G. Kargl" <[EMAIL PROTECTED]> wrote:
>>> The source tree was retrieved by cvsup
>>> at 21:47 (PST) on Oct 4.
>>>
>>> This is a non-GEOM and non-acpi kernel.
>>>
>>> I have the core and kernel.debug, so any
>>> further postmortem is possible.
>>
>> I think the problem is that in src/sys/ufs/ffs/ffs_snapshot.c:ffs_snapshot(),
>> as the mnt vnode list is traversed none of the vnodes ("xvp") would
>> actually GET VI_LOCK()ed in the first place, and so the LK_INTERLOCK
>> is bogus in the vn_lock() call.  Kirk would know for sure what to do
>> about this...
>>
> 
> Yeah, I broke this.  I didn't see the LK_INTERLOCK nearby when I removed
> the interlocking around usecount.  I will fix this.
> 

I sent Kirk a private email, but I haven't heard back from him.
Hopefully, he is watching the freebsd-current mailing list.

I'm actually surprised that more people haven't reported this problem.

-- 
Steve
http://troutmask.apl.washington.edu/~kargl/

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: panic from _mutex_assert in kern_lock.c

2002-10-08 Thread Jeff Roberson


On Sat, 5 Oct 2002, Brian F. Feldman wrote:

> "Steven G. Kargl" <[EMAIL PROTECTED]> wrote:
> > The source tree was retrieved by cvsup
> > at 21:47 (PST) on Oct 4.
> >
> > This is a non-GEOM and non-acpi kernel.
> >
> > I have the core and kernel.debug, so any
> > further postmortem is possible.
>
> I think the problem is that in src/sys/ufs/ffs/ffs_snapshot.c:ffs_snapshot(),
> as the mnt vnode list is traversed none of the vnodes ("xvp") would actually GET
> VI_LOCK()ed in the first place, and so the LK_INTERLOCK is bogus in the
> vn_lock() call.  Kirk would know for sure what to do about this...
>

Yeah, I broke this.  I didn't see the LK_INTERLOCK nearby when I removed
the interlocking around usecount.  I will fix this.

Thanks!
Jeff





Re: panic from _mutex_assert in kern_lock.c

2002-10-05 Thread Steven G. Kargl

Robert Watson said:
> 
> On Sat, 5 Oct 2002, Steven G. Kargl wrote:
> 
> > One other point, the machine was doing a background fsck on /var.  Does
> > a background fsck go through ffs_snapshot()? 
> 
> Yes -- the background file system checker creates a snapshot of the file
> system in the un-checked state, then performs the check against the
> snapshot.  It trickles the changes generated against the snapshot into the
> live file system.  Because of the conservative nature of failures with
> soft updates, the only theoretical inconsistencies are resources marked
> as non-free yet unreferenced, and reference counts that are too
> high.  The snapshot allows fsck a consistent view of the file system "as
> it was" so that it doesn't get confused by the live file system. 
> 

Thanks, Brian and Robert.  Of course, the above makes sense
when someone explains it to you.

-- 
Steve
http://troutmask.apl.washington.edu/~kargl/




Re: panic from _mutex_assert in kern_lock.c

2002-10-05 Thread Robert Watson


On Sat, 5 Oct 2002, Steven G. Kargl wrote:

> I came to the same conclusion after I sent the original email. 
> 
> What I don't understand is how I ended up in ffs_snapshot(), because I
> don't have a snapshot of /var.  I tried snapshots when Kirk first
> introduced the feature, but I removed all of the snapshots a long time
> ago.  Is there a flag in the superblock that I need to clear? 
> 
> One other point, the machine was doing a background fsck on /var.  Does
> a background fsck go through ffs_snapshot()? 

Yes -- the background file system checker creates a snapshot of the file
system in the un-checked state, then performs the check against the
snapshot.  It trickles the changes generated against the snapshot into the
live file system.  Because of the conservative nature of failures with
soft updates, the only theoretical inconsistencies are resources marked
as non-free yet unreferenced, and reference counts that are too
high.  The snapshot allows fsck a consistent view of the file system "as
it was" so that it doesn't get confused by the live file system. 

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories






Re: panic from _mutex_assert in kern_lock.c

2002-10-05 Thread Brian F. Feldman

"Steven G. Kargl" <[EMAIL PROTECTED]> wrote:
> Brian F. Feldman said:
> > "Steven G. Kargl" <[EMAIL PROTECTED]> wrote:
> > > The source tree was retrieved by cvsup
> > > at 21:47 (PST) on Oct 4.
> > > 
> > > This is a non-GEOM and non-acpi kernel.
> > > 
> > > I have the core and kernel.debug, so any
> > > further postmortem is possible.
> > 
> > I think the problem is that in src/sys/ufs/ffs/ffs_snapshot.c:ffs_snapshot(),
> > as the mnt vnode list is traversed none of the vnodes ("xvp") would actually GET
> > VI_LOCK()ed in the first place, and so the LK_INTERLOCK is bogus in the
> > vn_lock() call.  Kirk would know for sure what to do about this...
> > 
> 
> I came to the same conclusion after I sent the original email.
> 
> What I don't understand is how I ended up in ffs_snapshot(),
> because I don't have a snapshot of /var.  I tried snapshots
> when Kirk first introduced the feature, but I removed all
> of the snapshots a long time ago.  Is there a flag in the
> superblock that I need to clear?
> 
> One other point, the machine was doing a background fsck
> on /var.  Does a background fsck go through ffs_snapshot()?

Exactly: background fsck takes a snapshot to work on.  I think 
background_fsck="NO" is a good workaround at the moment for this.

-- 
Brian Fundakowski Feldman   \'[ FreeBSD ]''\
  <> [EMAIL PROTECTED]  <> [EMAIL PROTECTED]  \  The Power to Serve! \
 Opinions expressed are my own.   \,,\






Re: panic from _mutex_assert in kern_lock.c

2002-10-05 Thread Steven G. Kargl

Brian F. Feldman said:
> "Steven G. Kargl" <[EMAIL PROTECTED]> wrote:
> > The source tree was retrieved by cvsup
> > at 21:47 (PST) on Oct 4.
> > 
> > This is a non-GEOM and non-acpi kernel.
> > 
> > I have the core and kernel.debug, so any
> > further postmortem is possible.
> 
> I think the problem is that in src/sys/ufs/ffs/ffs_snapshot.c:ffs_snapshot(),
> as the mnt vnode list is traversed none of the vnodes ("xvp") would actually GET
> VI_LOCK()ed in the first place, and so the LK_INTERLOCK is bogus in the
> vn_lock() call.  Kirk would know for sure what to do about this...
> 

I came to the same conclusion after I sent the original email.

What I don't understand is how I ended up in ffs_snapshot(),
because I don't have a snapshot of /var.  I tried snapshots
when Kirk first introduced the feature, but I removed all
of the snapshots a long time ago.  Is there a flag in the
superblock that I need to clear?

One other point, the machine was doing a background fsck
on /var.  Does a background fsck go through ffs_snapshot()?

-- 
Steve
http://troutmask.apl.washington.edu/~kargl/




Re: panic from _mutex_assert in kern_lock.c

2002-10-05 Thread Brian F. Feldman

"Steven G. Kargl" <[EMAIL PROTECTED]> wrote:
> The source tree was retrieved by cvsup
> at 21:47 (PST) on Oct 4.
> 
> This is a non-GEOM and non-acpi kernel.
> 
> I have the core and kernel.debug, so any
> further postmortem is possible.

I think the problem is that in src/sys/ufs/ffs/ffs_snapshot.c:ffs_snapshot(),
as the mnt vnode list is traversed none of the vnodes ("xvp") would actually GET
VI_LOCK()ed in the first place, and so the LK_INTERLOCK is bogus in the
vn_lock() call.  Kirk would know for sure what to do about this...

-- 
Brian Fundakowski Feldman   \'[ FreeBSD ]''\
  <> [EMAIL PROTECTED]  <> [EMAIL PROTECTED]  \  The Power to Serve! \
 Opinions expressed are my own.   \,,\






panic from _mutex_assert in kern_lock.c

2002-10-05 Thread Steven G. Kargl

The source tree was retrieved by cvsup
at 21:47 (PST) on Oct 4.

This is a non-GEOM and non-acpi kernel.

I have the core and kernel.debug, so any
further postmortem is possible.

-- 
Steve
http://troutmask.apl.washington.edu/~kargl/


panic: from debugger
panic messages:
---
panic: mutex vnode interlock not owned at /usr/src/sys/kern/kern_lock.c:229
panic: from debugger
Uptime: 1m57s
pfs_vncache_unload(): 2 entries remaining
Dumping 128 MB
 16 32 48 64 80 96 112
---
#0  doadump () at /usr/src/sys/kern/kern_shutdown.c:223
223 dumping++;
(kgdb) bt
#0  doadump () at /usr/src/sys/kern/kern_shutdown.c:223
#1  0xc01ab96a in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:355
#2  0xc01abbb3 in panic () at /usr/src/sys/kern/kern_shutdown.c:508
#3  0xc013c3c2 in db_panic () at /usr/src/sys/ddb/db_command.c:450
#4  0xc013c342 in db_command (last_cmdp=0xc02e7d80, cmd_table=0xc02e7ba0, 
aux_cmd_tablep=0xc02e04d0, aux_cmd_tablep_end=0xc02e04d4)
at /usr/src/sys/ddb/db_command.c:346
#5  0xc013c456 in db_command_loop () at /usr/src/sys/ddb/db_command.c:472
#6  0xc013f0ba in db_trap (type=3, code=0) at /usr/src/sys/ddb/db_trap.c:72
#7  0xc028c1a2 in kdb_trap (type=3, code=0, regs=0xc8e6d7b4)
at /usr/src/sys/i386/i386/db_interface.c:166
#8  0xc029c8f7 in trap (frame=
  {tf_fs = 24, tf_es = 16, tf_ds = 16, tf_edi = -1048044496, tf_esi = 256, tf_ebp 
= -924395520, tf_isp = -924395552, tf_ebx = 0, tf_edx = 0, tf_ecx = 126, tf_eax = 18, 
tf_trapno = 3, tf_err = 0, tf_eip = -1071070140, tf_cs = 8, tf_eflags = 658, tf_esp = 
-1070749825, tf_ss = -1070840167}) at /usr/src/sys/i386/i386/trap.c:605
#9  0xc028d958 in calltrap () at {standard input}:98
#10 0xc01abb9b in panic (fmt=0xc02c39f4 "mutex %s not owned at %s:%d")
at /usr/src/sys/kern/kern_shutdown.c:494
#11 0xc01a226c in _mtx_assert (m=0xc18c0de0, what=9, 
file=0xc02c28a0 "/usr/src/sys/kern/kern_lock.c", line=229)
at /usr/src/sys/kern/kern_mutex.c:835
#12 0xc019e88b in lockmgr (lkp=0xc18c0ea4, flags=16842754, interlkp=0xc18c0de0, 
td=0xc1881c30) at /usr/src/sys/kern/kern_lock.c:229
#13 0xc01f53cc in vop_stdlock (ap=0xc8e6d8c0)
at /usr/src/sys/kern/vfs_default.c:279
#14 0xc0257118 in ufs_vnoperate (ap=0xc8e6d8c0)
at /usr/src/sys/ufs/ufs/ufs_vnops.c:2715
#15 0xc020965b in vn_lock (vp=0xc18c0de0, flags=65538, td=0xc1881c30)
at vnode_if.h:990
#16 0xc023a555 in ffs_snapshot (mp=0xc1921600, snapfile=---Can't read userspace from 
dump, or kernel process---

)
at /usr/src/sys/ufs/ffs/ffs_snapshot.c:409
#17 0xc0247cf8 in ffs_mount (mp=0xc1921600, path=0xc1b31000 "/var", data=---Can't read 
userspace from dump, or kernel process---

)
at /usr/src/sys/ufs/ffs/ffs_vfsops.c:291
#18 0xc01f97d4 in vfs_mount (td=0xc1881c30, fstype=0xc1929c20 "ffs", 
fspath=0xc1b31000 "/var", fsflags=18944000, fsdata=0xbfbffcc0)
at /usr/src/sys/kern/vfs_mount.c:1062
#19 0xc01f8f98 in mount (td=0xc1881c30, uap=0xc8e6dd10)
at /usr/src/sys/kern/vfs_mount.c:818
#20 0xc029d20e in syscall (frame=
  {tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = 0, tf_esi = -1077936672, tf_ebp = 
-1077936824, tf_isp = -924394124, tf_ebx = 135000998, tf_edx = 19, tf_ecx = 135000832, 
tf_eax = 21, tf_trapno = 12, tf_err = 2, tf_eip = 134568967, tf_cs = 31, tf_eflags = 518, 
tf_esp = -1077937140, tf_ss = 47})
at /usr/src/sys/i386/i386/trap.c:1050
#21 0xc028d9ad in Xint0x80_syscall () at {standard input}:140
---Can't read userspace from dump, or kernel process---

(kgdb) quit

Script done on Sat Oct  5 08:28:03 2002
