Re: panic: APIC: Previous IPI is stuck

2004-11-23 Thread Stephan Uphoff
On Tue, 2004-11-23 at 22:19, Doug White wrote:
> On Mon, 15 Nov 2004, Adrian Wontroba wrote:
> 
> > At work, I've just taken an old cast-off NT server and used it as
> > a replacement for an equally elderly low end PC which performs an
> > important monitoring task.
> >
> > I took the opportunity to upgrade to 5.3 (5.3-RC2 now, yesterday's
> > 5.3-STABLE when I get to work again) rather than stay on 4.10-RELEASE.
> >
> > The rationale was this would be a nice resilient machine, demonstrating
> > how FreeBSD can extend the useful working life of aging hardware.
> >
> > The practice is that it has now crashed three times in a couple of
> > days with "panic: APIC: Previous IPI is stuck", the most recent one
> > dragging me out from home early in a Monday morning.
> 
> Welcome to the club. This is a known problem which affects older, true
> 4-processor machines.  Stephan Uphoff ([EMAIL PROTECTED]) has posted a patch to
> -current that seems to help. I have a Dell PE6500 (4x500MHz) I'm trying to
> get to reproduce the problem (and compile world without resetting) before
> I try the patch.  (Replacing a CPU has made it happy again, thankfully.)
> 
> Dual proc hyperthreaded machines don't seem to be affected, or at least
> not as frequently.
> 
> I'd suggest trying the patch to see if it helps for you. It doesn't
> seem to be making things worse for people :)

The patch has a few testers and no "APIC: Previous IPI is stuck" panics
have been reported.

Hopefully I will be able to get a new, optimized patch out in the next
few days.

Once the new patch has had some testing it will go into -current
(and hopefully I can MFC it later).

> > Over in current there are a couple of threads starting in late September
> > where a few people are suffering this problem.  Like them, I'm using an
> > old (1997) Pentium Pro multiprocessor, in my case a 4 way Fujitsu M700.
> >
> > The machine is running with the SMP kernel (ie GENERIC + SMP), 4BSD
> > scheduler, without preemption.
> >
> > I've set kern.sched.ipiwakeup.enabled=0 and crossed my fingers.
> >
> > I'm an SMP novice.  Would the machine become stable if I switched to a
> > non-SMP kernel?  Reliability is more important than speed in this case,
> > and the opportunity for experimentation is close to zero.  Credibility
> > has already been damaged by the gvinum RAID5 experience (8-(
> >
> > I'm not knocking 5.3 - in all other respects it seems wonderful.
> >
> > "me too" diagnostics:
> >
> > kern.sched.name: 4BSD
> > kern.sched.quantum: 10
> > kern.sched.ipiwakeup.enabled: 1
> > kern.sched.ipiwakeup.requested: 858129
> > kern.sched.ipiwakeup.delivered: 858129
> > kern.sched.ipiwakeup.usemask: 1
> > kern.sched.ipiwakeup.useloop: 0
> > kern.sched.ipiwakeup.onecpu: 0
> > kern.sched.ipiwakeup.htt2: 0
> > kern.sched.followon: 0
> > kern.sched.pfollowons: 0
> > kern.sched.kgfollowons: 0
> > kern.sched.runq_fuzz: 1
> >
> > 
> >
> > MPTable, version 2.0.15
> >
> >  looking for EBDA pointer @ 0x040e, found, searching EBDA @ 0x0008f000
> >  searching CMOS 'top of mem' @ 0x0008ec00 (571K)
> >  searching default 'top of mem' @ 0x0009fc00 (639K)
> >  searching BIOS @ 0x000f
> >
> >  MP FPS found in BIOS @ physical addr: 0x000fdc30
> >
> > 
> >
> > MP Floating Pointer Structure:
> >
> >   location:          BIOS
> >   physical address:  0x000fdc30
> >   signature:         '_MP_'
> >   length:            16 bytes
> >   version:           1.4
> >   checksum:          0x56
> >   mode:              Virtual Wire
> >
> > 
> >
> > MP Config Table Header:
> >
> >   physical address:         0x0008f151
> >   signature:                'PCMP'
> >   base table length:        332
> >   version:                  1.4
> >   checksum:                 0x05
> >   OEM ID:                   'Fujitsu '
> >   Product ID:               'Pro Server  '
> >   OEM table pointer:        0x
> >   OEM table size:           0
> >   entry count:              30
> >   local APIC address:       0xfee0
> >   extended table length:    0
> >   extended table checksum:  0
> >
> > 
> >
> > MP Config Base Table Entries:
> >
> > --
> > Processors:  APIC ID  Version  State        Family  Model  Step  Flags
> >               3       0x11     BSP, usable  6       1      9     0xfbff
> >               0       0x11     AP, usable   6       1      9     0xfbff
> >               1       0x11     AP, usable   6       1      9     0xfbff
> >               2       0x11     AP, usable   6       1      9     0xfbff
> > --
> > Bus:         Bus ID  Type
> >               0      PCI
> > 

Re: panic: APIC: Previous IPI is stuck

2004-11-23 Thread Doug White
On Mon, 15 Nov 2004, Adrian Wontroba wrote:

> At work, I've just taken an old cast-off NT server and used it as
> a replacement for an equally elderly low end PC which performs an
> important monitoring task.
>
> I took the opportunity to upgrade to 5.3 (5.3-RC2 now, yesterday's
> 5.3-STABLE when I get to work again) rather than stay on 4.10-RELEASE.
>
> The rationale was this would be a nice resilient machine, demonstrating
> how FreeBSD can extend the useful working life of aging hardware.
>
> The practice is that it has now crashed three times in a couple of
> days with "panic: APIC: Previous IPI is stuck", the most recent one
> dragging me out from home early in a Monday morning.

Welcome to the club. This is a known problem which affects older, true
4-processor machines.  Stephan Uphoff ([EMAIL PROTECTED]) has posted a patch to
-current that seems to help. I have a Dell PE6500 (4x500MHz) I'm trying to
get to reproduce the problem (and compile world without resetting) before
I try the patch.  (Replacing a CPU has made it happy again, thankfully.)

Dual proc hyperthreaded machines don't seem to be affected, or at least
not as frequently.

I'd suggest trying the patch to see if it helps for you. It doesn't
seem to be making things worse for people :)

> Over in current there are a couple of threads starting in late September
> where a few people are suffering this problem.  Like them, I'm using an
> old (1997) Pentium Pro multiprocessor, in my case a 4 way Fujitsu M700.
>
> The machine is running with the SMP kernel (ie GENERIC + SMP), 4BSD
> scheduler, without preemption.
>
> I've set kern.sched.ipiwakeup.enabled=0 and crossed my fingers.
>
> I'm an SMP novice.  Would the machine become stable if I switched to a
> non-SMP kernel?  Reliability is more important than speed in this case,
> and the opportunity for experimentation is close to zero.  Credibility
> has already been damaged by the gvinum RAID5 experience (8-(
>
> I'm not knocking 5.3 - in all other respects it seems wonderful.
>
> "me too" diagnostics:
>
> kern.sched.name: 4BSD
> kern.sched.quantum: 10
> kern.sched.ipiwakeup.enabled: 1
> kern.sched.ipiwakeup.requested: 858129
> kern.sched.ipiwakeup.delivered: 858129
> kern.sched.ipiwakeup.usemask: 1
> kern.sched.ipiwakeup.useloop: 0
> kern.sched.ipiwakeup.onecpu: 0
> kern.sched.ipiwakeup.htt2: 0
> kern.sched.followon: 0
> kern.sched.pfollowons: 0
> kern.sched.kgfollowons: 0
> kern.sched.runq_fuzz: 1
>
> 
>
> MPTable, version 2.0.15
>
>  looking for EBDA pointer @ 0x040e, found, searching EBDA @ 0x0008f000
>  searching CMOS 'top of mem' @ 0x0008ec00 (571K)
>  searching default 'top of mem' @ 0x0009fc00 (639K)
>  searching BIOS @ 0x000f
>
>  MP FPS found in BIOS @ physical addr: 0x000fdc30
>
> 
>
> MP Floating Pointer Structure:
>
>   location:          BIOS
>   physical address:  0x000fdc30
>   signature:         '_MP_'
>   length:            16 bytes
>   version:           1.4
>   checksum:          0x56
>   mode:              Virtual Wire
>
> 
>
> MP Config Table Header:
>
>   physical address:         0x0008f151
>   signature:                'PCMP'
>   base table length:        332
>   version:                  1.4
>   checksum:                 0x05
>   OEM ID:                   'Fujitsu '
>   Product ID:               'Pro Server  '
>   OEM table pointer:        0x
>   OEM table size:           0
>   entry count:              30
>   local APIC address:       0xfee0
>   extended table length:    0
>   extended table checksum:  0
>
> 
>
> MP Config Base Table Entries:
>
> --
> Processors:  APIC ID  Version  State        Family  Model  Step  Flags
>               3       0x11     BSP, usable  6       1      9     0xfbff
>               0       0x11     AP, usable   6       1      9     0xfbff
>               1       0x11     AP, usable   6       1      9     0xfbff
>               2       0x11     AP, usable   6       1      9     0xfbff
> --
> Bus:         Bus ID  Type
>               0      PCI
>               1      PCI
>               2      EISA
> --
> I/O APICs:   APIC ID  Version  State   Address
>               8       0x11     usable  0xfec0
>               9       0x11     usable  0xfec0c000
> --
> I/O Ints:    Type    Polarity   Trigger   Bus ID  IRQ  APIC ID  PIN#
>              ExtINT  active-hi  edge      2       0    8        0
>              INT     conforms   conforms  2       1    8        1
>              INT     conforms   conforms  2       2    8        2
>              INT

Re: panic: APIC: Previous IPI is stuck

2004-11-15 Thread Adrian Wontroba
On Mon, Nov 15, 2004 at 04:49:56PM +1100, Andy Farkas wrote:
> [freebsd.org is rejecting my email (can't find hostname)
> so please feel free to copy this to the list]

So quoted in full.

> On Mon, 15 Nov 2004, Adrian Wontroba wrote:
> ...
> > The practice is that it has now crashed three times in a couple of
> > days with "panic: APIC: Previous IPI is stuck", the most recent one
> > dragging me out from home early in a Monday morning.
> 
> /me raises hand
> 
> I still get panics too (5.3-STABLE cvsup'd last Thursday).
> At one stage I thought it was fixed, but I was wrong.
> My box does not reboot itself either.
> 
> > Over in current there are a couple of threads starting in late September
> > where a few people are suffering this problem.  Like them, I'm using an
> > old (1997) Pentium Pro multiprocessor, in my case a 4 way Fujitsu M700.
> >
> > The machine is running with the SMP kernel (ie GENERIC + SMP), 4BSD
> > scheduler, without preemption.
> 
> Robert Watson has said it happens on his 4-way Xeon box,
> so it's not the "old hardware" that's to blame. (My box is
> an old Dell quad-PPro too). Something changed in the code
> around the end of August this year.
> 
> > I've set kern.sched.ipiwakeup.enabled=0 and crossed my fingers.
> 
> Doesn't help. I already tried. Panic will still happen.

Ah.  Will it last the day, I wonder?

> > I'm an SMP novice.  Would the machine become stable if I switched to a
> > non-SMP kernel?  Reliability is more important than speed in this case,
> > and the opportunity for experimentation is close to zero.  Credibility
> > has already been damaged by the gvinum RAID5 experience (8-(
> 
> A UP kernel will probably run forever. The IPI panic can
> only happen on SMP kernels.

Thanks.  I'll switch back to GENERIC.

> > I'm not knocking 5.3 - in all other respects it seems wonderful.
> 
> I'm not knocking 5.3 either, but it seems to me it's not quite
> stable. It's more of a ".0" release, where things are still
> getting ironed out (like gvinum, which I also have problems
> with).

"RELENG_4: Time to die" - for all kinds of good reasons.  It was time
for 5-STABLE. The future release plan looks promising, but there
is still the age-old problem: how do you get more of the user
population to try out and find the problems in new versions before they
acquire -RELEASE status?

"Mea culpa" - I no longer have a "crash box".  Time to get my mail
off my own PPro (uniprocessor) box to free it up as such.  If I had
done this, I would have run into the vinum / gvinum issues in a less
embarrassing fashion.

> Stephan, you mentioned that the IPI code needs rewriting in order to
> fix this problem... how's it going?
>
> - andyf

-- 
Adrian Wontroba
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"