The only thing I would add is that by AMD I didn't mean
Advanced Micro Devices. I meant /usr/sbin/amd. In my case
this behavior has been observed on a Pentium III and on a
K7, so it's CPU independent.

David Gilbert wrote:
> 
> I had reported this earlier, but the similarities are striking:
> 
> I too have seen strange AMD panics where stack variables inexplicably
> go to zero.  My systems are K6/2-400's, and I have often witnessed the
> following fault (only happens on a *really* busy web server)

The common denominator seems to be that the machine has to be very
active. VMware stresses the vm system quite a bit (64M of shared
memory with multiple processes digging around, etc). A very busy
web server is going to do a lot of context switching (I think?).
In that situation, it appears that the stack is being smashed.

I tried wrapping the code where my machines go nuts in
splhigh() / splx(), but it didn't help.

Is your machine running the automounter?

> 
> #0  boot (howto=256) at ../../kern/kern_shutdown.c:285
> #1  0xc014aad1 in panic (fmt=0xc023878a "page fault")
>     at ../../kern/kern_shutdown.c:446
> #2  0xc02098ce in trap_fatal (frame=0xcc74eecc, eva=134812896)
>     at ../../i386/i386/trap.c:942
> #3  0xc0209587 in trap_pfault (frame=0xcc74eecc, usermode=0, eva=134812896)
>     at ../../i386/i386/trap.c:835
> #4  0xc02091ba in trap (frame={tf_es = -887750640, tf_ds = -1036058608,
>       tf_edi = -1050208512, tf_esi = -1043943040, tf_ebp = -864751828,
>       tf_isp = -864751884, tf_ebx = 2287, tf_edx = -1036043576, tf_ecx = 0,
>       tf_eax = 134812884, tf_trapno = 12, tf_err = 2, tf_eip = -1072417321,
>       tf_cs = 8, tf_eflags = 66054, tf_esp = -1041509376, tf_ss = -1036024832})
>     at ../../i386/i386/trap.c:437
> #5  0xc01435d7 in fdcopy (p=0xcc5796e0) at ../../kern/kern_descrip.c:954
> #6  0xc014587b in fork1 (p1=0xcc5796e0, flags=-2147483596)
>     at ../../kern/kern_fork.c:379
> #7  0xc014533b in vfork (p=0xcc5796e0, uap=0xcc74ef94)
>     at ../../kern/kern_fork.c:109
> #8  0xc0209b17 in syscall (frame={tf_es = 39, tf_ds = 39, tf_edi = 236237520,
>       tf_esi = 236231856, tf_ebp = -1077952324, tf_isp = -864751644,
>       tf_ebx = 673171048, tf_edx = 163766316, tf_ecx = 672877149, tf_eax = 66,
>       tf_trapno = 7, tf_err = 2, tf_eip = 672936705, tf_cs = 31,
>       tf_eflags = 514, tf_esp = -1077952368, tf_ss = 39})
>     at ../../i386/i386/trap.c:1100
> #9  0xc01feedc in Xint0x80_syscall ()
> 
> Now the interesting code here is at stack frame #5:
> 
> (kgdb) list
> 948             fpp = newfdp->fd_ofiles;
> 949             for (i = newfdp->fd_lastfile; i-- >= 0; fpp++)
> 950                     if (*fpp != NULL)
> 951                             (*fpp)->f_count++;
> 
> (kgdb) p newfdp->fd_ofiles
> $1 = (struct file **) 0xc23f2000
> (kgdb) p fpp
> $2 = (struct file **) 0x0
> 
> Now... the only operation on fpp is fpp++.  It should take a _long_
> time for fpp to get around to 0 and you'd think that *fpp would be
> zero long before that (or cause a page fault at some other
> nonexistent location).
> 
> So... the similarity here is that deep in the kernel, we have an
> automatic (possibly register) local variable that's getting zeroed.
> 
> I have half-a-dozen crash dumps of this nature.  For me, it always
> happens in fdcopy().  This may be due to the fact that the machine is
> running a large apache config --- so fork() is something it's doing
> often.
> 
> To Unsubscribe: send mail to [EMAIL PROTECTED]
> with "unsubscribe freebsd-current" in the body of the message

