Re: anyone seen these outside of alpha? or on non-SMP?
> Why can't a filesystem hacker back it out until his return? Things are
> not getting better and this is tripping up more and more people.

The enclosed patch might help somewhat against the "active pagedep" panics introduced in revision 1.98 of ffs_softdep.c. Instead of a panic, a message is printed and the pagedep structure isn't freed (it will be freed later by free_newdirblk()).

- Tor Egge

Index: sys/ufs/ffs/ffs_softdep.c
===
RCS file: /home/ncvs/src/sys/ufs/ffs/ffs_softdep.c,v
retrieving revision 1.98
diff -u -r1.98 ffs_softdep.c
--- sys/ufs/ffs/ffs_softdep.c	2001/06/05 01:49:37	1.98
+++ sys/ufs/ffs/ffs_softdep.c	2001/06/07 18:30:16
@@ -1932,14 +1932,16 @@
 				WORKLIST_INSERT(&inodedep->id_bufwait,
 				    &dirrem->dm_list);
 			}
+
+			WORKLIST_REMOVE(&pagedep->pd_list);
 			if ((pagedep->pd_state & NEWBLOCK) != 0) {
-				FREE_LOCK(&lk);
-				panic("deallocate_dependencies: "
-				    "active pagedep");
+				/* XXX: Wait for newdirblk to be freed */
+				printf("deallocate_dependencies: "
+				    "active pagedep\n");
+			} else {
+				LIST_REMOVE(pagedep, pd_hash);
+				WORKITEM_FREE(pagedep, D_PAGEDEP);
 			}
-			WORKLIST_REMOVE(&pagedep->pd_list);
-			LIST_REMOVE(pagedep, pd_hash);
-			WORKITEM_FREE(pagedep, D_PAGEDEP);
 			continue;
 
 		case D_ALLOCINDIR:
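The idea in Tor's message (take the pagedep off the worklist, but leave the actual free to free_newdirblk() while NEWBLOCK is still set) can be modeled in user space. What follows is only a rough sketch under stated assumptions, not the kernel code: every toy_* and TOY_* name is hypothetical, the live-object counter exists purely so the behavior can be observed, and the rule "free in toy_free_newdirblk() only if already off the worklist" is an assumption about how the later free would be arranged.

```c
/* Toy user-space model of the deferred free; NOT the real ffs_softdep.c.
 * Every toy_* / TOY_* name is hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define TOY_NEWBLOCK 0x1	/* stands in for the NEWBLOCK state bit */

struct toy_pagedep {
	int	pd_state;	/* TOY_NEWBLOCK while a newdirblk points here */
	bool	on_worklist;	/* models membership on the buffer's worklist */
};

struct toy_newdirblk {
	struct toy_pagedep *db_pagedep;	/* back pointer; must never dangle */
};

int toy_live_pagedeps;	/* pagedeps allocated and not yet freed */

struct toy_pagedep *
toy_pagedep_alloc(void)
{
	struct toy_pagedep *pd = malloc(sizeof(*pd));

	if (pd == NULL)
		abort();
	pd->pd_state = 0;
	pd->on_worklist = true;
	toy_live_pagedeps++;
	return (pd);
}

/* Attach a newdirblk to a pagedep and mark the pagedep as still needed. */
struct toy_newdirblk *
toy_newdirblk_alloc(struct toy_pagedep *pd)
{
	struct toy_newdirblk *db = malloc(sizeof(*db));

	if (db == NULL)
		abort();
	db->db_pagedep = pd;
	pd->pd_state |= TOY_NEWBLOCK;
	return (db);
}

/* Models the patched branch: always leave the worklist, but free the
 * pagedep only when no newdirblk refers to it. Returns true if freed. */
bool
toy_deallocate(struct toy_pagedep *pd)
{
	pd->on_worklist = false;
	if ((pd->pd_state & TOY_NEWBLOCK) != 0) {
		printf("toy_deallocate: active pagedep\n");
		return (false);
	}
	free(pd);
	toy_live_pagedeps--;
	return (true);
}

/* Models the later release: drop the newdirblk first, then free the
 * pagedep if it has already been taken off the worklist (assumption). */
void
toy_free_newdirblk(struct toy_newdirblk *db)
{
	struct toy_pagedep *pd = db->db_pagedep;

	assert((pd->pd_state & TOY_NEWBLOCK) != 0);
	pd->pd_state &= ~TOY_NEWBLOCK;
	free(db);
	if (!pd->on_worklist) {
		free(pd);
		toy_live_pagedeps--;
	}
}
```

Calling toy_deallocate() on a pagedep that still has a newdirblk attached only prints the message and returns false; the memory goes away when toy_free_newdirblk() runs, so the back pointer is never left dangling in this model.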
Re: anyone seen these outside of alpha? or on non-SMP?
> My guess would be that the inode in question is a directory inode,
> and that there are temp files there, or a lot of open files, but
> that is just a ballpark guess.

Correct. A sample program to reproduce this problem is enclosed.

When a diradd dependency that causes a newdirblk dependency to be allocated is made obsolete in newdirrem(), the pagedep structure is likely to be freed without first removing the newdirblk dependency that still points to the pagedep structure.

- Tor Egge

#!/bin/sh

dovmstat() {
	vmstat -m | awk '/^ *(mkdir|newdirblk|dirrem|diradd|pagedep)/ { print }'
}

dovmstat
rm -rf a
dirrems=`vmstat -m | awk '/^ *dirrem/ { print $2 }'`
while test $dirrems -gt 0
do
	sync
	sleep 1
	dirrems=`vmstat -m | awk '/^ *dirrem/ { print $2 }'`
done
mkdir a
mkdirs=`vmstat -m | awk '/^ *mkdir/ { print $2 }'`
while test $mkdirs -gt 0
do
	sync
	sleep 1
	mkdirs=`vmstat -m | awk '/^ *mkdir/ { print $2 }'`
done
dovmstat
touch a/000
dovmstat
touch a/001
dovmstat
touch a/002
dovmstat
touch a/003
dovmstat
touch a/004
dovmstat
touch a/005
dovmstat
touch a/006
dovmstat
touch a/007
dovmstat
touch a/007
dovmstat
touch a/008
dovmstat
touch a/009
dovmstat
touch a/00a
dovmstat
touch a/00b
dovmstat
touch a/00c
dovmstat
touch a/00d
dovmstat
touch a/00e
dovmstat
touch a/00f
dovmstat
rm a/00f
dovmstat
ls -ld a
dovmstat
rm -rf a
dovmstat
echo FINISHED
Re: anyone seen these outside of alpha? or on non-SMP?
] Data modified on freelist: word 2 of object 0xfe000190b780 size 72
] previous type inodedep (0xd6adc0de != 0xdeadc0de)
] ...
] Data modified on freelist: word 2 of object 0xfe0001806700 size 72
] previous type pagedep (0xd6adc0de != 0xdeadc0de)
]
] Anyone seen these on non-SMP? On i386?

Yes. I have seen this on 4.3, after opening more than 32,767 network connections, only in my case the problem occurred in the close, after the credential structure reference count overflowed.

There will probably be significantly more of these problems in -current, since much of the recent locking work has been a bit less than comprehensive, so there are probably free races in a lot of places that used to be implicitly protected via past serialization through the BGL.

There are exactly 12 structures 72 bytes long in the FreeBSD kernel:

	struct rusage			= 72
	struct nameidata		= 72
	struct ifpppstatsreq		= 72
	struct ifpppcstatsreq		= 72
	struct sadb_comb		= 72
	struct ddpcb			= 72
	struct atmsetreq		= 72
	struct linkinfo			= 72
	struct ng_one2many_config	= 72
	struct ng_ppp_mp_state		= 72
	struct ipfw_dyn_rule		= 72
	struct secasvar			= 72

Despite the obvious involvement of the soft updates code (there is a reference counted object reference underflow, which resulted in the data being in use after nominally being freed), my money is on the "struct rusage" on a process exit. This implies a race condition in the sync'ing of data for files being resource-tracking closed as a result of the process exit triggering a dependency failure.

My guess would be that the inode in question is a directory inode, and that there are temp files there, or a lot of open files, but that is just a ballpark guess.

--

My first suggestion would be to turn the printf() as a result of INVARIANTS (which is where the message is coming from, in the kern_malloc.c code) into a true panic, since what you are seeing is undoubtedly a cascade failure.

This will (if you have the debugger enabled) let you examine the object that is being spammed before it gets stepped on into illegibility. Knowing the size will let you catch the allocation.

--

Note that in the message referenced by David, the errors were on different objects; I'll guess at decoding them, as well:

May 27 18:52:06 xor /boot/kernel/kernel: Data modified on freelist: word 2 of object 0xc1a60100 size 64 previous type pagedep (0xd6adc0de != 0xdeadc0de)
May 27 18:52:06 xor /boot/kernel/kernel: Data modified on freelist: word 2 of object 0xc16f02c0 size 64 previous type pagedep (0xd6adc0de != 0xdeadc0de)
May 27 18:52:06 xor /boot/kernel/kernel: Data modified on freelist: word 2 of object 0xc1a60480 size 52 previous type pagedep (0xd6adc0de != 0xdeadc0de)

...these are 64 and 52 bytes each -- different structures. Here are the probables:

	struct iodone_chain		= 52
	struct lockf			= 52
	struct protosw			= 52
	struct attr_calling		= 52
	struct ng_bpf_hookprog		= 52
	struct mrtstat			= 52
	struct ipprotosw		= 52
	struct udpstat			= 52
	struct ip6protosw		= 52

	struct ostat			= 64
	struct ifaliasreq		= 64
	struct at_aliasreq		= 64
	struct attr_traffic		= 64
	struct atm_sock_stat		= 64
	struct ng_type			= 64
	struct ng_pptpgre_stats		= 64
	struct in_aliasreq		= 64
	struct in6_prefixreq		= 64
	struct ip6_pktopts		= 64

...my ballpark bets, again, would be:

	struct lockf			= 52
	struct ostat			= 64

Terry Lambert
[EMAIL PROTECTED]
---
Any opinions in this posting are my own and not those of my present or previous employers.

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message
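For readers decoding the "Data modified on freelist" lines: the messages compare a word of a freed object against the fill pattern 0xdeadc0de. The sketch below is a user-space illustration of that kind of check, not the kern_malloc.c code; the toy_* names are hypothetical, and treating the object as an array of 32-bit words is an assumption. One thing the arithmetic does show for certain is that 0xd6adc0de and 0xdeadc0de differ in exactly one bit (0x08000000), which would be consistent with something clearing a single flag bit in word 2 after the object was freed.

```c
/* Toy freelist-poison check; NOT the real kern_malloc.c code.
 * Every toy_* / TOY_* name is hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_POISON 0xdeadc0deU	/* fill pattern quoted in the log messages */

/* Fill a freed object with the poison pattern, one 32-bit word at a time. */
void
toy_poison(uint32_t *obj, size_t nwords)
{
	size_t i;

	assert(obj != NULL);
	for (i = 0; i < nwords; i++)
		obj[i] = TOY_POISON;
}

/* Return the index of the first word that no longer holds the pattern,
 * or -1 if the object is untouched since it was poisoned. */
long
toy_check(const uint32_t *obj, size_t nwords)
{
	size_t i;

	assert(obj != NULL);
	for (i = 0; i < nwords; i++)
		if (obj[i] != TOY_POISON)
			return ((long)i);
	return (-1);
}

/* Count how many bits differ between the value found and the pattern. */
int
toy_bits_changed(uint32_t seen, uint32_t expected)
{
	uint32_t diff = seen ^ expected;
	int n = 0;

	while (diff != 0) {
		diff &= diff - 1;	/* clear the lowest set bit */
		n++;
	}
	return (n);
}
```

With a 72-byte object modeled as 18 words, poisoning it, clearing bit 0x08000000 in word 2, and then calling toy_check() reports index 2, and toy_bits_changed(0xd6adc0de, 0xdeadc0de) reports a single changed bit.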
Re: anyone seen these outside of alpha? or on non-SMP?
> On Mon, Jun 04, 2001 at 09:37:36PM -0700, Matthew Jacob wrote:
> >
> > It's an easy fix except if it's your root fs- turn off softupdates.
>
> Yeah that's the solution -- just keep disabling features. How far do we

Oh, c'mon Dave, take a pill... It's only 'til Kirk gets back from rafting...

> go? Disable FFS, disable the VM system. Well, I might be left with
> enough to get a printf() to display something.

Nope- no console device... What we'll be able to do is to blink the keyboard LEDs in morse code though

F...Y ... D O
Re: anyone seen these outside of alpha? or on non-SMP?
On Mon, Jun 04, 2001 at 09:37:36PM -0700, Matthew Jacob wrote:
>
> It's an easy fix except if it's your root fs- turn off softupdates.

Yeah that's the solution -- just keep disabling features. How far do we go? Disable FFS, disable the VM system. Well, I might be left with enough to get a printf() to display something.
Re: anyone seen these outside of alpha? or on non-SMP?
It's an easy fix except if it's your root fs- turn off softupdates.

On Mon, 4 Jun 2001, David O'Brien wrote:

> On Mon, Jun 04, 2001 at 02:25:41PM -0700, John Baldwin wrote:
> > Yes. Many, many, many, many times. Softupdates is broken in -current
> > right now and has been since Kirk's last commit. :-P
>
> Why can't a filesystem hacker back it out until his return? Things are
> not getting better and this is tripping up more and more people.
Re: anyone seen these outside of alpha? or on non-SMP?
On Mon, Jun 04, 2001 at 02:25:41PM -0700, John Baldwin wrote:
> Yes. Many, many, many, many times. Softupdates is broken in -current
> right now and has been since Kirk's last commit. :-P

Why can't a filesystem hacker back it out until his return? Things are not getting better and this is tripping up more and more people.
RE: anyone seen these outside of alpha? or on non-SMP?
I applied Tor's patch and while it helped, I still got a panic ("deallocate_dependencies: active_pagedep", which, btw, Tor's patch added) partway into a buildworld. I've resigned myself to disabling softupdates on my machine for now =-(

> -Original Message-
> From: Bruce A. Mah [mailto:[EMAIL PROTECTED]]
> Sent: Monday, June 04, 2001 4:56 PM
> To: David Wolfskill
> Cc: [EMAIL PROTECTED]
> Subject: Re: anyone seen these outside of alpha? or on non-SMP?
>
> If memory serves me right, David Wolfskill wrote:
>
> > >Someone should test and commit Tor's patch. I didn't have time to
> > >check whether it fixed the problems before I left (and I'm sure as
> > >hell not going to update back to -current remotely to check myself :-)
> >
> > FWIW, I applied that patch to the -CURRENT side of my laptop a couple
> > of days ago. Since then, I've been able to do my daily -CURRENT builds
> > in multi-user mode, within an X environment, using -j4 on the "make
> > buildworld" step.
>
> I did the patch on one of my scratch boxes, and it's allowed me to do
> "make release" without the machine dying mid-way through. (i386, UP,
> GENERIC kernel, softupdates enabled on all filesystems except /,
> multi-user, no X).
>
> There was a bit of discussion when I reported this apparent progress to
> -current last week (look for a thread entitled "freelist corruption:
> more info").
>
> Bruce.
Re: anyone seen these outside of alpha? or on non-SMP?
If memory serves me right, David Wolfskill wrote:

> >Someone should test and commit Tor's patch. I didn't have time to
> >check whether it fixed the problems before I left (and I'm sure as
> >hell not going to update back to -current remotely to check myself :-)
>
> FWIW, I applied that patch to the -CURRENT side of my laptop a couple
> of days ago. Since then, I've been able to do my daily -CURRENT builds
> in multi-user mode, within an X environment, using -j4 on the "make
> buildworld" step.

I did the patch on one of my scratch boxes, and it's allowed me to do "make release" without the machine dying mid-way through. (i386, UP, GENERIC kernel, softupdates enabled on all filesystems except /, multi-user, no X).

There was a bit of discussion when I reported this apparent progress to -current last week (look for a thread entitled "freelist corruption: more info").

Bruce.
Re: anyone seen these outside of alpha? or on non-SMP?
>Date: Mon, 4 Jun 2001 15:02:00 -0700
>From: Kris Kennaway <[EMAIL PROTECTED]>

>Someone should test and commit Tor's patch. I didn't have time to
>check whether it fixed the problems before I left (and I'm sure as
>hell not going to update back to -current remotely to check myself :-)

FWIW, I applied that patch to the -CURRENT side of my laptop a couple of days ago. Since then, I've been able to do my daily -CURRENT builds in multi-user mode, within an X environment, using -j4 on the "make buildworld" step. The previous several days, I often needed to do everything in single-user mode.

Granted, the "make buildworld" is generally the most strenuous thing I do in -CURRENT (I normally do my "real work" in -STABLE), but the patch certainly makes things better for me.

Cheers,
david
--
David H. Wolfskill [EMAIL PROTECTED]
As a computing professional, I believe it would be unethical for me to advise, recommend, or support the use (save possibly for personal amusement) of any product that is or depends on any Microsoft product.
Re: anyone seen these outside of alpha? or on non-SMP?
On Mon, Jun 04, 2001 at 02:25:41PM -0700, John Baldwin wrote:
>
> On 04-Jun-01 Matthew Jacob wrote:
> >
> > Data modified on freelist: word 2 of object 0xfe000190b780 size 72
> > previous type inodedep (0xd6adc0de != 0xdeadc0de)
> > ...
> > Data modified on freelist: word 2 of object 0xfe0001806700 size 72
> > previous type pagedep (0xd6adc0de != 0xdeadc0de)
> >
> > Anyone seen these on non-SMP? On i386?
>
> Yes. Many, many, many, many times. Softupdates is broken in -current right
> now and has been since Kirk's last commit. :-P

Someone should test and commit Tor's patch. I didn't have time to check whether it fixed the problems before I left (and I'm sure as hell not going to update back to -current remotely to check myself :-)

Kris
RE: anyone seen these outside of alpha? or on non-SMP?
On 04-Jun-01 Matthew Jacob wrote:
>
> Data modified on freelist: word 2 of object 0xfe000190b780 size 72
> previous type inodedep (0xd6adc0de != 0xdeadc0de)
> ...
> Data modified on freelist: word 2 of object 0xfe0001806700 size 72
> previous type pagedep (0xd6adc0de != 0xdeadc0de)
>
> Anyone seen these on non-SMP? On i386?

Yes. Many, many, many, many times. Softupdates is broken in -current right now and has been since Kirk's last commit. :-P

--
John Baldwin <[EMAIL PROTECTED]> -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!" - http://www.FreeBSD.org/
Re: anyone seen these outside of alpha? or on non-SMP?
Of course, Kris' message doesn't say "non-SMP" or "non-Alpha". I think I can assume, though, that it was non-Alpha :-).

> Whoops- I *did* look, but didn't see that one... sorry
>
> > I believe so; see -current archives, such as
> >
> > http://docs.freebsd.org/cgi/getmsg.cgi?fetch=73390+0+archive/2001/freebsd-current/20010603.freebsd-current
> >
> > Cheers,
> > david
Re: anyone seen these outside of alpha? or on non-SMP?
Whoops- I *did* look, but didn't see that one... sorry

> I believe so; see -current archives, such as
>
> http://docs.freebsd.org/cgi/getmsg.cgi?fetch=73390+0+archive/2001/freebsd-current/20010603.freebsd-current
>
> Cheers,
> david
anyone seen these outside of alpha? or on non-SMP?
Data modified on freelist: word 2 of object 0xfe000190b780 size 72
previous type inodedep (0xd6adc0de != 0xdeadc0de)
...
Data modified on freelist: word 2 of object 0xfe0001806700 size 72
previous type pagedep (0xd6adc0de != 0xdeadc0de)

Anyone seen these on non-SMP? On i386?

-matt