Re: anyone seen these outside of alpha? or on non-SMP?

2001-06-08 Thread Tor Egge

> Why can't a filesystem hacker back it out until his return?  Things are
> not getting better and this is tripping up more and more people.

The enclosed patch might help somewhat against the "active pagedep"
panics introduced in revision 1.98 of ffs_softdep.c.  Instead of a
panic, a message is printed and the pagedep structure isn't freed (it
will be freed later by free_newdirblk()).

- Tor Egge



Index: sys/ufs/ffs/ffs_softdep.c
===================================================================
RCS file: /home/ncvs/src/sys/ufs/ffs/ffs_softdep.c,v
retrieving revision 1.98
diff -u -r1.98 ffs_softdep.c
--- sys/ufs/ffs/ffs_softdep.c   2001/06/05 01:49:37 1.98
+++ sys/ufs/ffs/ffs_softdep.c   2001/06/07 18:30:16
@@ -1932,14 +1932,16 @@
 					WORKLIST_INSERT(&inodedep->id_bufwait,
 					    &dirrem->dm_list);
 			}
+
+			WORKLIST_REMOVE(&pagedep->pd_list);
 			if ((pagedep->pd_state & NEWBLOCK) != 0) {
-				FREE_LOCK(&lk);
-				panic("deallocate_dependencies: "
-				      "active pagedep");
+				/* XXX: Wait for newdirblk to be freed */
+				printf("deallocate_dependencies: "
+				       "active pagedep\n");
+			} else {
+				LIST_REMOVE(pagedep, pd_hash);
+				WORKITEM_FREE(pagedep, D_PAGEDEP);
 			}
-			WORKLIST_REMOVE(&pagedep->pd_list);
-			LIST_REMOVE(pagedep, pd_hash);
-			WORKITEM_FREE(pagedep, D_PAGEDEP);
 			continue;
 
 		case D_ALLOCINDIR:
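
As a rough userland illustration of what the patch above changes (this is not kernel code; the struct layouts, the NEWBLOCK value, the pd_detached field and the function names are all assumptions made for this sketch), the pagedep is freed immediately only when no newdirblk still refers to it. Otherwise the free is left to the path that releases the newdirblk:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define NEWBLOCK 0x0800u    /* illustrative flag value, assumed for this sketch */

struct pagedep {
    unsigned pd_state;      /* NEWBLOCK is set while a newdirblk points here */
    bool pd_detached;       /* hypothetical: owner has already let go */
};

struct newdirblk {
    struct pagedep *db_pagedep;     /* back pointer that must not dangle */
};

/*
 * Model of the patched branch: returns true if the pagedep was freed now,
 * false if the free was deferred because a newdirblk still references it.
 */
bool
release_pagedep(struct pagedep *pd)
{
    if ((pd->pd_state & NEWBLOCK) != 0) {
        printf("deallocate_dependencies: active pagedep\n");
        pd->pd_detached = true;     /* remember that nobody else owns it */
        return false;
    }
    free(pd);
    return true;
}

/*
 * Model of the later path: drop the newdirblk and, if the pagedep was
 * already abandoned by its owner, free it as well. Returns true if the
 * pagedep was freed here.
 */
bool
release_newdirblk(struct newdirblk *db)
{
    struct pagedep *pd = db->db_pagedep;
    bool freed = false;

    assert(pd != NULL);
    pd->pd_state &= ~NEWBLOCK;      /* safe: pd is still allocated */
    free(db);
    if (pd->pd_detached) {
        free(pd);
        freed = true;
    }
    return freed;
}
```

With the pre-patch ordering, the equivalent of release_pagedep() would have freed the structure unconditionally, and the later flag update in release_newdirblk() would have written into freed memory.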



Re: anyone seen these outside of alpha? or on non-SMP?

2001-06-06 Thread Tor Egge


> My guess would be that the inode in question is a directory inode,
> and that there are temp files there, or a lot of open files, but
> that is just a ballpark guess.

Correct.  A sample program to reproduce this problem is enclosed.
When a diradd dependency that causes a newdirblk dependency to be
allocated is made obsolete in newdirrem(), the pagedep structure is
likely to be freed without first removing the newdirblk dependency
that still points to the pagedep structure.

- Tor Egge




#!/bin/sh

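# Print the vmstat -m lines for the soft updates dependency types of interest.
dovmstat() {
  vmstat -m | awk '/^ *(mkdir|newdirblk|dirrem|diradd|pagedep)/ { print }'
}

dovmstat
rm -rf a
# Wait until the outstanding dirrem dependencies have drained.
dirrems=`vmstat -m | awk '/^ *dirrem/ { print $2 }'`
while test "${dirrems:-0}" -gt 0
do
  sync
  sleep 1
  dirrems=`vmstat -m | awk '/^ *dirrem/ { print $2 }'`
done
mkdir a
# Likewise, wait until the mkdir dependencies have drained.
mkdirs=`vmstat -m | awk '/^ *mkdir/ { print $2 }'`
while test "${mkdirs:-0}" -gt 0
do
  sync
  sleep 1
  mkdirs=`vmstat -m | awk '/^ *mkdir/ { print $2 }'`
done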
dovmstat
touch a/000
dovmstat
touch a/001
dovmstat
touch a/002
dovmstat
touch a/003
dovmstat
touch a/004
dovmstat
touch a/005
dovmstat
touch a/006
dovmstat
touch a/007
dovmstat
touch a/007
dovmstat
touch a/008
dovmstat
touch a/009
dovmstat
touch a/00a
dovmstat
touch a/00b
dovmstat
touch a/00c
dovmstat
touch a/00d
dovmstat
touch a/00e
dovmstat
touch a/00f
dovmstat
rm a/00f
dovmstat
ls -ld a
dovmstat
rm -rf a
dovmstat
echo FINISHED



Re: anyone seen these outside of alpha? or on non-SMP?

2001-06-05 Thread Terry Lambert

] Data modified on freelist: word 2 of object 0xfe000190b780 size 72
] previous type inodedep (0xd6adc0de != 0xdeadc0de)
] ...
] Data modified on freelist: word 2 of object 0xfe0001806700 size 72
] previous type pagedep (0xd6adc0de != 0xdeadc0de)
] 
] 
] Anyone seen these on non-SMP? On i386?

Yes.

I have seen this on 4.3, after opening more than 32,767 network
connections, only in my case the problem occurred in the close,
after the credential structure reference count overflowed.
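
To make that failure mode concrete: a reference count kept in a 16-bit signed field tops out at 32,767, and an unchecked increment past that point wraps, so a later release can reach zero while references are still outstanding. The sketch below is a model only. The struct, the field width (chosen to match the 32,767 figure) and the function names are assumptions, not the 4.3 credential code. It shows a guarded increment that refuses to wrap:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct cred_model {
    int16_t refs;       /* assumed width; matches the 32,767 figure */
};

/* Take a reference; refuse instead of wrapping when the counter is full. */
bool
cred_hold(struct cred_model *c)
{
    if (c->refs == INT16_MAX)
        return false;
    c->refs++;
    return true;
}

/* Drop a reference; returns true when the last one is gone. */
bool
cred_release(struct cred_model *c)
{
    assert(c->refs > 0);
    c->refs--;
    return c->refs == 0;
}
```

A caller that gets false from cred_hold() would have to fail the operation or allocate a fresh structure rather than share the saturated one.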

There will probably be significantly more of these problems in
-current, since much of the recent locking work has been a bit
less than comprehensive, so there are probably free races in a
lot of places that used to be implicitly protected via past
serialization through the BGL.

There are exactly 12 structures 72 bytes long in the FreeBSD
kernel:

struct rusage             = 72
struct nameidata          = 72
struct ifpppstatsreq      = 72
struct ifpppcstatsreq     = 72
struct sadb_comb          = 72
struct ddpcb              = 72
struct atmsetreq          = 72
struct linkinfo           = 72
struct ng_one2many_config = 72
struct ng_ppp_mp_state    = 72
struct ipfw_dyn_rule      = 72
struct secasvar           = 72


Despite the obvious involvement of the soft updates code (there
is a reference counted object reference underflow, which resulted
in the data being in use after nominally being freed), my money
is on the "struct rusage" on a process exit.

This implies a race condition in the sync'ing of data for files
being resource-tracking closed as a result of the process exit
triggering a dependency failure.

My guess would be that the inode in question is a directory inode,
and that there are temp files there, or a lot of open files, but
that is just a ballpark guess.

--
My first suggestion would be to turn the printf() as a result of
INVARIANTS (which is where the message is coming from, in the
kern_malloc.c code) into a true panic, since what you are seeing
is undoubtedly a cascade failure.  This will (if you have the
debugger enabled) let you examine the object that is being spam'med
before it gets stepped on into illegibility.  Knowing the size will
let you catch the allocation.
--
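
The "Data modified on freelist" message comes from a check that fills freed memory with a known pattern and verifies that pattern before the memory is handed out again. A minimal userland sketch of the idea follows. It is not the kern_malloc.c code, and the function names are assumptions made for illustration. The pattern and the observed value are the ones from the messages quoted above:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POISON 0xdeadc0deu      /* pattern seen in the quoted messages */

/* Fill a freed object with the poison pattern, one 32-bit word at a time. */
void
poison_fill(uint32_t *obj, size_t nwords)
{
    assert(obj != NULL);
    for (size_t i = 0; i < nwords; i++)
        obj[i] = POISON;
}

/*
 * Return the index of the first word that no longer holds the pattern,
 * or -1 if the object is untouched.
 */
long
poison_check(const uint32_t *obj, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++)
        if (obj[i] != POISON)
            return (long)i;
    return -1;
}

/* Bits that differ between the value found and the expected pattern. */
uint32_t
changed_bits(uint32_t found)
{
    return found ^ POISON;
}
```

For the value in the reports, changed_bits(0xd6adc0de) is 0x08000000: exactly one bit was cleared after the free. That is consistent with a flag being updated through a stale pointer, rather than with the object being overwritten wholesale.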



Note that in the message referenced by David, the errors were on
different objects; I'll guess at decoding them, as well:

May 27 18:52:06 xor /boot/kernel/kernel: Data modified on freelist: word 2 of object
    0xc1a60100 size 64 previous type pagedep (0xd6adc0de != 0xdeadc0de)
May 27 18:52:06 xor /boot/kernel/kernel: Data modified on freelist: word 2 of object
    0xc16f02c0 size 64 previous type pagedep (0xd6adc0de != 0xdeadc0de)
May 27 18:52:06 xor /boot/kernel/kernel: Data modified on freelist: word 2 of object
    0xc1a60480 size 52 previous type pagedep (0xd6adc0de != 0xdeadc0de)


...these are 64 and 52 bytes each -- different structures.  Here
are the probables:

struct iodone_chain     = 52
struct lockf            = 52
struct protosw          = 52
struct attr_calling     = 52
struct ng_bpf_hookprog  = 52
struct mrtstat          = 52
struct ipprotosw        = 52
struct udpstat          = 52
struct ip6protosw       = 52

struct ostat            = 64
struct ifaliasreq       = 64
struct at_aliasreq      = 64
struct attr_traffic     = 64
struct atm_sock_stat    = 64
struct ng_type          = 64
struct ng_pptpgre_stats = 64
struct in_aliasreq      = 64
struct in6_prefixreq    = 64
struct ip6_pktopts      = 64

...my ballpark bets, again, would be:

struct lockf            = 52
struct ostat            = 64


Terry Lambert
[EMAIL PROTECTED]
---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: anyone seen these outside of alpha? or on non-SMP?

2001-06-05 Thread Matthew Jacob


> On Mon, Jun 04, 2001 at 09:37:36PM -0700, Matthew Jacob wrote:
> > 
> > It's an easy fix except if it's your root fs- turn off softupdates.
> 
> Yeah that's the solution -- just keep disabling features.  How far do we

Oh, c'mon Dave, take a pill... It's only 'til Kirk gets back from rafting...

> go?  Disable FFS, disable the VM system.  Well, I might be left with
> enough to get a printf() to display something.

Nope- no console device... What we'll be able to do is to blink the
keyboard LEDs in morse code though F...Y ... D  O  







Re: anyone seen these outside of alpha? or on non-SMP?

2001-06-04 Thread David O'Brien

On Mon, Jun 04, 2001 at 09:37:36PM -0700, Matthew Jacob wrote:
> 
> It's an easy fix except if it's your root fs- turn off softupdates.

Yeah that's the solution -- just keep disabling features.  How far do we
go?  Disable FFS, disable the VM system.  Well, I might be left with
enough to get a printf() to display something.




Re: anyone seen these outside of alpha? or on non-SMP?

2001-06-04 Thread Matthew Jacob


It's an easy fix except if it's your root fs- turn off softupdates.


On Mon, 4 Jun 2001, David O'Brien wrote:

> On Mon, Jun 04, 2001 at 02:25:41PM -0700, John Baldwin wrote:
> > Yes.  Many, many, many, many times.  Softupdates is broken in -current
> > right now and has been since Kirk's last commit.  :-P
> 
> Why can't a filesystem hacker back it out until his return?  Things are
> not getting better and this is tripping up more and more people.





Re: anyone seen these outside of alpha? or on non-SMP?

2001-06-04 Thread David O'Brien

On Mon, Jun 04, 2001 at 02:25:41PM -0700, John Baldwin wrote:
> Yes.  Many, many, many, many times.  Softupdates is broken in -current
> right now and has been since Kirk's last commit.  :-P

Why can't a filesystem hacker back it out until his return?  Things are
not getting better and this is tripping up more and more people.




RE: anyone seen these outside of alpha? or on non-SMP?

2001-06-04 Thread Long, Scott

I applied Tor's patch and while it helped, I still got a panic
("deallocate_dependencies: active pagedep", which, btw, Tor's patch added)
partway into a buildworld.  I've resigned myself to disabling softupdates
on my machine for now =-(

> -----Original Message-----
> From: Bruce A. Mah [mailto:[EMAIL PROTECTED]]
> Sent: Monday, June 04, 2001 4:56 PM
> To: David Wolfskill
> Cc: [EMAIL PROTECTED]
> Subject: Re: anyone seen these outside of alpha? or on non-SMP?
>
> If memory serves me right, David Wolfskill wrote:
>
> > >Someone should test and commit Tor's patch.  I didn't have time to
> > >check whether it fixed the problems before I left (and I'm sure as
> > >hell not going to update back to -current remotely to check myself :-)
> >
> > FWIW, I applied that patch to the -CURRENT side of my laptop a couple
> > of days ago.  Since then, I've been able to do my daily -CURRENT builds
> > in multi-user mode, within an X environment, using -j4 on the "make
> > buildworld" step.
>
> I did the patch on one of my scratch boxes, and it's allowed me to do
> "make release" without the machine dying mid-way through.  (i386, UP,
> GENERIC kernel, softupdates enabled on all filesystems except /,
> multi-user, no X).
>
> There was a bit of discussion when I reported this apparent progress to
> -current last week (look for a thread entitled "freelist corruption:
> more info").
>
> Bruce.




Re: anyone seen these outside of alpha? or on non-SMP?

2001-06-04 Thread Bruce A. Mah

If memory serves me right, David Wolfskill wrote:

> >Someone should test and commit Tor's patch.  I didn't have time to
> >check whether it fixed the problems before I left (and I'm sure as
> >hell not going to update back to -current remotely to check myself :-)
> 
> FWIW, I applied that patch to the -CURRENT side of my laptop a couple
> of days ago.  Since then, I've been able to do my daily -CURRENT builds
> in multi-user mode, within an X environment, using -j4 on the "make
> buildworld" step.

I did the patch on one of my scratch boxes, and it's allowed me to do 
"make release" without the machine dying mid-way through.  (i386, UP, 
GENERIC kernel, softupdates enabled on all filesystems except /, 
multi-user, no X).

There was a bit of discussion when I reported this apparent progress to
-current last week (look for a thread entitled "freelist corruption:
more info").

Bruce.





Re: anyone seen these outside of alpha? or on non-SMP?

2001-06-04 Thread David Wolfskill

>Date: Mon, 4 Jun 2001 15:02:00 -0700
>From: Kris Kennaway <[EMAIL PROTECTED]>

>Someone should test and commit Tor's patch.  I didn't have time to
>check whether it fixed the problems before I left (and I'm sure as
>hell not going to update back to -current remotely to check myself :-)

FWIW, I applied that patch to the -CURRENT side of my laptop a couple
of days ago.  Since then, I've been able to do my daily -CURRENT builds
in multi-user mode, within an X environment, using -j4 on the "make
buildworld" step.

For the previous several days, I often needed to do everything in
single-user mode.

Granted, the "make buildworld" is generally the most strenuous thing I
do in -CURRENT (I normally do my "real work" in -STABLE), but the patch
certainly makes things better for me.

Cheers,
david
-- 
David H. Wolfskill  [EMAIL PROTECTED]
As a computing professional, I believe it would be unethical for me to
advise, recommend, or support the use (save possibly for personal
amusement) of any product that is or depends on any Microsoft product.




Re: anyone seen these outside of alpha? or on non-SMP?

2001-06-04 Thread Kris Kennaway

On Mon, Jun 04, 2001 at 02:25:41PM -0700, John Baldwin wrote:
> 
> On 04-Jun-01 Matthew Jacob wrote:
> > 
> > 
> > Data modified on freelist: word 2 of object 0xfe000190b780 size 72
> > previous type inodedep (0xd6adc0de != 0xdeadc0de)
> > ...
> > Data modified on freelist: word 2 of object 0xfe0001806700 size 72
> > previous type pagedep (0xd6adc0de != 0xdeadc0de)
> > 
> > 
> > Anyone seen these on non-SMP? On i386?
> 
> Yes.  Many, many, many, many times.  Softupdates is broken in -current right
> now and has been since Kirk's last commit.  :-P

Someone should test and commit Tor's patch.  I didn't have time to
check whether it fixed the problems before I left (and I'm sure as
hell not going to update back to -current remotely to check myself :-)

Kris



RE: anyone seen these outside of alpha? or on non-SMP?

2001-06-04 Thread John Baldwin


On 04-Jun-01 Matthew Jacob wrote:
> 
> 
> Data modified on freelist: word 2 of object 0xfe000190b780 size 72
> previous type inodedep (0xd6adc0de != 0xdeadc0de)
> ...
> Data modified on freelist: word 2 of object 0xfe0001806700 size 72
> previous type pagedep (0xd6adc0de != 0xdeadc0de)
> 
> 
> Anyone seen these on non-SMP? On i386?

Yes.  Many, many, many, many times.  Softupdates is broken in -current right
now and has been since Kirk's last commit.  :-P

-- 

John Baldwin <[EMAIL PROTECTED]> -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/




Re: anyone seen these outside of alpha? or on non-SMP?

2001-06-04 Thread Matthew Jacob


Of course, Kris' message doesn't say "non-SMP" or "non-Alpha". I think I can
assume, though, that it was non-Alpha :-).


> 
> Whoops- I *did* look, but didn't see that one... sorry
> 
> 
> > I believe so; see -current archives, such as
> > 
> > 
>http://docs.freebsd.org/cgi/getmsg.cgi?fetch=73390+0+archive/2001/freebsd-current/20010603.freebsd-current
> > 
> > Cheers,
> > david
> > 
> 
> 





Re: anyone seen these outside of alpha? or on non-SMP?

2001-06-04 Thread Matthew Jacob


Whoops- I *did* look, but didn't see that one... sorry


> I believe so; see -current archives, such as
> 
> 
>http://docs.freebsd.org/cgi/getmsg.cgi?fetch=73390+0+archive/2001/freebsd-current/20010603.freebsd-current
> 
> Cheers,
> david
> 





anyone seen these outside of alpha? or on non-SMP?

2001-06-04 Thread Matthew Jacob



Data modified on freelist: word 2 of object 0xfe000190b780 size 72
previous type inodedep (0xd6adc0de != 0xdeadc0de)
...
Data modified on freelist: word 2 of object 0xfe0001806700 size 72
previous type pagedep (0xd6adc0de != 0xdeadc0de)


Anyone seen these on non-SMP? On i386?

-matt


