Re: 4-way SMP broken ?

1999-06-10 Thread Richard Cownie
On Thu, 10 Jun 1999, Luoqi Chen wrote:
> Could you narrow down the crash further inside mp_start()? I'd like to
> know whether the crash occurred inside start_all_aps(). One or two lines of
> debug messages would be really helpful, you don't have to write down the exact
> words. Do you have options DDB enabled in the kernel? It helps to stop
> the last few lines of console messages from scrolling off the screen.

Yes, I added more messages and it's inside start_all_aps() - it seems
to start AP #1 ok, then crashes while starting AP #2.  

> If possible, could you try a kernel built from sources with the
> POST_SMP_VMSHARE tag? I may have broken something during the commit.

Have to get out the door right now, will try this either tomorrow morning
or Monday.

Thanks
Richard Cownie


To Unsubscribe: send mail to majord...@freebsd.org
with "unsubscribe freebsd-current" in the body of the message



Re: 4-way SMP broken ?

1999-06-10 Thread Luoqi Chen
> I have added more debugging messages, and the crash appears to be inside
> mp_start().  I don't have a log because this is too early in the boot 
> to get the messages saved anywhere, and they go by too quickly to
> write it down.  The evidence that this is an SMP problem is simple -
> with 2 cpu's plugged in, it works fine;  with 3 or 4 cpu's plugged in,
> it crashes.
> 
Could you narrow down the crash further inside mp_start()? I'd like to
know whether the crash occurred inside start_all_aps(). One or two lines of
debug messages would be really helpful, you don't have to write down the exact
words. Do you have options DDB enabled in the kernel? It helps to stop
the last few lines of console messages from scrolling off the screen.

> I believe the hardware is fine because I was previously running 
> 19990421-CURRENT with all 4 cpu's without serious problems (it was
> a little unstable, but always booted ok).
> 
If possible, could you try a kernel built from sources with the
POST_SMP_VMSHARE tag? I may have broken something during the commit.

-lq





Re: 4-way SMP broken ?

1999-06-10 Thread Richard Cownie
On Wed, 09 Jun 1999, Luoqi Chen wrote:
> > I've been trying to install 19990604-CURRENT on a couple of SC450NX
> > boxes.  It works fine with 2 cpu's, but an SMP kernel with 4 cpu's
> > falls over very quickly (I think while it's setting up the APIC
> > stuff, or very shortly after - the messages about APIC bus ids appear
> > on the screen very briefly, then the machine reboots itself).
> > 
> Do you mean messages like these?
> FreeBSD/SMP: Multiprocessor motherboard
>  cpu0 (BSP): apic id:  0, version: 0x00040011, at 0xfec08000
>  cpu1 (AP):  apic id: 12, version: 0x00040011, at 0xfec08000
>  io0 (APIC): apic id: 13, version: 0x00170011, at 0xfec0
> By the time you see these messages, all cpus should have been booted up
> successfully; any crash that immediately follows is not likely to be SMP related.
> It's helpful to pinpoint the crash if you could include the last few lines
> from a verbose boot.

I have added more debugging messages, and the crash appears to be inside
mp_start().  I don't have a log because this is too early in the boot 
to get the messages saved anywhere, and they go by too quickly to
write it down.  The evidence that this is an SMP problem is simple -
with 2 cpu's plugged in, it works fine;  with 3 or 4 cpu's plugged in,
it crashes.

I believe the hardware is fine because I was previously running 
19990421-CURRENT with all 4 cpu's without serious problems (it was
a little unstable, but always booted ok).

> > Does anyone know a) when was the last time it worked on 4 cpu's
> > b) what's changed recently which might relate to this.

So if anyone has an answer to these questions I'd still be interested.

> > Also in trying to figure this out I looked at the DRAM probing
> > code in /usr/src/sys/i386/i386/machdep.c:getmemsize(), and it looks
> > as though it's not safe for >2GB (e.g. comparisons of byte addresses
> > against signed "int end").  It would also be good if this probing

I've tried various hacks to this code, but have not succeeded in making it
work for 4GB.  Changing "int end" to "vm_offset_t end" is not sufficient.
It has a tendency to say "Too many holes in address space" ...  Even 
defining MAXMEM does not solve the problem.

Richard Cownie (t...@ma.ikos.com)





Re: 4-way SMP broken ?

1999-06-09 Thread Jim Bryant
In reply:
> > interesting.  then why the delay in bringing up the AP?  Note in the
> > dmesg output below, that the AP only comes up during the SCSI delay.  I
> > have also added other comments to the following output.
> 
> The APs need the giant kernel lock when initializing the 
> local APIC and printing the "launched" message.
> 
> I added code for bringing up the APs earlier, but had to disable it,
> since it caused some machines to hang.  The APs were probably launched
> too early, causing the BSP to attempt to send IPIs before the local
> APIC was initialized.
> 
> A revised patch for bringing up the APs early is enclosed.
> 
> - Tor Egge
> 

> Index: mp_machdep.c
> ===
> RCS file: /home/ncvs/src/sys/i386/i386/mp_machdep.c,v
> retrieving revision 1.102
> diff -u -r1.102 mp_machdep.c
> --- mp_machdep.c  1999/06/01 18:19:42 1.102
> +++ mp_machdep.c  1999/06/08 00:27:19

the second hunk rejects.

i show this:

-rw-r--r--  1 root  wheel  62767 Jun  1 23:38 /usr/src/sys/i386/i386/mp_machdep.c

i'll apply this by hand a little later, i'll get back with you on
compatibility.  I am using a Tyan S1696DLUA "Thunder2" motherboard.

jim
-- 
All opinions expressed are mine, if you|  "I will not be pushed, stamped,
think otherwise, then go jump into turbid  |  briefed, debriefed, indexed, or
radioactive waters and yell WAHOO !!!  |  numbered!" - #1, "The Prisoner"
--
Inet: jbry...@tfs.netAX.25: kc5...@wv0t.#neks.ks.usa.noam grid: EM28pw
voice: KC5VDJ - 6 & 2 Meters AM/FM/SSB, 70cm FM.   http://www.tfs.net/~jbryant
--
HF/6M/2M: IC-706-MkII, 2M: HTX-212, 2M: HTX-202, 70cm: HTX-404, Packet: KPC-3+





Re: 4-way SMP broken ?

1999-06-09 Thread Jonathan Lemon
In article  
you write:
>Also in trying to figure this out I looked at the DRAM probing
>code in /usr/src/sys/i386/i386/machdep.c:getmemsize(), and it looks
>as though it's not safe for >2GB (e.g. comparisons of byte addresses
>against signed "int end").

I just made this into a vm_offset_t, so it should be good for up to 4GB.
--
Jonathan





Re: 4-way SMP broken ?

1999-06-09 Thread Tor Egge

> interesting.  then why the delay in bringing up the AP?  Note in the
> dmesg output below, that the AP only comes up during the SCSI delay.  I
> have also added other comments to the following output.

The APs need the giant kernel lock when initializing the 
local APIC and printing the "launched" message.

I added code for bringing up the APs earlier, but had to disable it,
since it caused some machines to hang.  The APs were probably launched
too early, causing the BSP to attempt to send IPIs before the local
APIC was initialized.

A revised patch for bringing up the APs early is enclosed.

- Tor Egge

Index: mp_machdep.c
===
RCS file: /home/ncvs/src/sys/i386/i386/mp_machdep.c,v
retrieving revision 1.102
diff -u -r1.102 mp_machdep.c
--- mp_machdep.c	1999/06/01 18:19:42	1.102
+++ mp_machdep.c	1999/06/08 00:27:19
@@ -494,6 +494,10 @@
 
 
 #if defined(APIC_IO)
+
+/* Wait for all APs to be fully initialized */
+extern int wait_ap(unsigned int);
+
 /*
  * Final configuration of the BSP's local APIC:
  *  - disable 'pic mode'.
@@ -526,6 +530,9 @@
 
 	if (bootverbose)
 		apic_dump("bsp_apic_configure()");
+	wait_ap(100);
+	if (smp_started == 0)
+		printf("WARNING: Failed to start all APs\n");
 }
 #endif  /* APIC_IO */
 
@@ -1743,9 +1750,6 @@
 #endif /* USE_CLOCKLOCK */
 }
 
-
-/* Wait for all APs to be fully initialized */
-extern int wait_ap(unsigned int);
 
 /*
  * start each AP in our list


Re: 4-way SMP broken ?

1999-06-09 Thread Luoqi Chen
> > Do you mean messages like these?
> > FreeBSD/SMP: Multiprocessor motherboard
> >  cpu0 (BSP): apic id:  0, version: 0x00040011, at 0xfec08000
> >  cpu1 (AP):  apic id: 12, version: 0x00040011, at 0xfec08000
> >  io0 (APIC): apic id: 13, version: 0x00170011, at 0xfec0
> > By the time you see these messages, all cpus should have been booted up
> > successfully; any crash that immediately follows is not likely to be SMP related.
> > It's helpful to pinpoint the crash if you could include the last few lines
> > from a verbose boot.
> 
> interesting.  then why the delay in bringing up the AP?  Note in the
> dmesg output below, that the AP only comes up during the SCSI delay.  I
> have also added other comments to the following output.
> 
The APs are up, but not fully initialized. Initializations that require
holding of the giant lock are done near the end of the booting process,
until then the APs are just spinning around the lock. Tor Egge tried once
to move to an earlier time, but it didn't work well on some motherboards.

-lq





Re: 4-way SMP broken ?

1999-06-09 Thread Jim Bryant
In reply:
> > Hi,
> > 
> > I've been trying to install 19990604-CURRENT on a couple of SC450NX
> > boxes.  It works fine with 2 cpu's, but an SMP kernel with 4 cpu's
> > falls over very quickly (I think while it's setting up the APIC
> > stuff, or very shortly after - the messages about APIC bus ids appear
> > on the screen very briefly, then the machine reboots itself).
> > 
> Do you mean messages like these?
> FreeBSD/SMP: Multiprocessor motherboard
>  cpu0 (BSP): apic id:  0, version: 0x00040011, at 0xfec08000
>  cpu1 (AP):  apic id: 12, version: 0x00040011, at 0xfec08000
>  io0 (APIC): apic id: 13, version: 0x00170011, at 0xfec0
> By the time you see these messages, all cpus should have been booted up
> successfully; any crash that immediately follows is not likely to be SMP related.
> It's helpful to pinpoint the crash if you could include the last few lines
> from a verbose boot.

interesting.  then why the delay in bringing up the AP?  Note in the
dmesg output below, that the AP only comes up during the SCSI delay.  I
have also added other comments to the following output.

---
[Last night's kernel]

Copyright (c) 1992-1999 The FreeBSD Project.
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California. All rights reserved.
FreeBSD 4.0-CURRENT #7: Wed Jun  9 16:10:23 CDT 1999
jbry...@wahoo:/usr/src/sys/compile/WAHOO
Timecounter "i8254"  frequency 1192990 Hz
CPU: Pentium II/Xeon/Celeron (686-class CPU)
  Origin = "GenuineIntel"  Id = 0x650  Stepping=0
  Features=0x183fbff
real memory  = 134217728 (131072K bytes)
avail memory = 126902272 (123928K bytes)
Programming 24 pins in IOAPIC #0
FreeBSD/SMP: Multiprocessor motherboard
 cpu0 (BSP): apic id:  0, version: 0x00040011, at 0xfee0
 cpu1 (AP):  apic id:  1, version: 0x00040011, at 0xfee0
 io0 (APIC): apic id:  2, version: 0x00170011, at 0xfec0
Preloaded elf kernel "kernel" at 0xc0393000.
Preloaded userconfig_script "/boot/kernel.conf" at 0xc039309c.
DEVFS: ready for devices
Pentium Pro MTRR support enabled, default memory type is uncacheable
ipl: ERROR: driver has bogus cdevsw->d_maj = -1
 ^^ ??
ccd0-3: Concatenated disk drivers
Probing for PnP devices:
CSN 1 Vendor ID: YMH0802 [0x0208a865] Serial 0x Comp ID: PNPb02f [0x2fb0d041]
npx0:  on motherboard
npx0: INT 16 interface
pcib0:  on motherboard
pci0:  on pcib0
chip0:  at device 0.0 on pci0
pcib1:  at device 1.0 on pci0
pci1:  on pcib1
vga-pci0:  irq 2 at device 0.0 on pci1
isab0:  at device 7.0 on pci0
chip1:  at device 7.1 on pci0
uhci0:  irq 19 at device 7.2 on pci0
usb0:  on uhci0
uhub0 at usb0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
intpm0:  at device 7.3 on pci0
intpm0: I/O mapped fcb0
intpm0: intr IRQ 9 enabled revision 0
intsmb0: 
smbus0:  on intsmb0
intpm0: PM I/O mapped fc00 
ed0:  irq 17 at device 12.0 on pci0
ed0: address 00:00:e8:4e:0e:16, type NE2000 (16 bit) 
ahc0:  irq 19 at device 15.0 on pci0
ahc0: Using left over BIOS settings
ahc0: aic7895 Wide Channel A, SCSI Id=7, 255 SCBs
ahc1:  irq 16 at device 15.1 on pci0
ahc1: Using left over BIOS settings
ahc1: aic7895 Wide Channel B, SCSI Id=7, 255 SCBs
devclass_alloc_unit: ed0 already exists, using next available unit number
^ ??
isa0:  on motherboard
fdc0:  at port 0x3f0-0x3f7 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> at fdc0 drive 0
ppc0 at port 0x378-0x37f irq 7 on isa0
ppc0: Generic chipset (ECP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/8 bytes threshold
plip0:  on ppbus 0
lpt0:  on ppbus 0
lpt0: Interrupt-driven port
ppi0:  on ppbus 0
lppps0:  on ppbus 0
sio0 at port 0x3f8-0x3ff irq 4 on isa0
sio0: type 16550A
sio1: configured irq 3 not in bitmap of probed irqs 0
joy0 at port 0x201 on isa0
joy0: joystick
atkbdc0:  at port 0x60-0x6f on isa0
atkbd0:  irq 1 on atkbdc0
kbd0 at atkbd0
psm0:  irq 12 on atkbdc0
psm0: model Generic PS/2 mouse, device ID 0
vga0:  on isa0
sc0:  at flags 0x6 on isa0
pca0 at port 0x40 on isa0
pca0: PC speaker audio driver
DEVFS: ready to run
APIC_IO: Testing 8254 interrupt delivery
APIC_IO: Broken MP table detected: 8254 is not connected to IO APIC int pin 2
APIC_IO: routing 8254 via 8259 on pin 0
^ Tyan Thunder2 S1696DLUA Motherboard, Rogue?
IP packet filtering initialized, divert enabled, rule-based forwarding disabled, default to accept, unlimited logging
DUMMYNET initialized (990504)
ds0 XXX: driver didn't set ifq_maxlen
^ ???
BRIDGE 981214, have 12 interfaces
-- index 1 ed0 type 6 phy 0 addrl 6 addr 00.00.e8.4e.0e.16
IP Filter: initialized.  Default = pass all, Logging = enabled
Waiting 2 seconds for SCSI devices to settle
SM

Re: 4-way SMP broken ?

1999-06-09 Thread Luoqi Chen
> Hi,
> 
> I've been trying to install 19990604-CURRENT on a couple of SC450NX
> boxes.  It works fine with 2 cpu's, but an SMP kernel with 4 cpu's
> falls over very quickly (I think while it's setting up the APIC
> stuff, or very shortly after - the messages about APIC bus ids appear
> on the screen very briefly, then the machine reboots itself).
> 
Do you mean messages like these?
FreeBSD/SMP: Multiprocessor motherboard
 cpu0 (BSP): apic id:  0, version: 0x00040011, at 0xfec08000
 cpu1 (AP):  apic id: 12, version: 0x00040011, at 0xfec08000
 io0 (APIC): apic id: 13, version: 0x00170011, at 0xfec0
By the time you see these messages, all cpus should have been booted up
successfully; any crash that immediately follows is not likely to be SMP related.
It's helpful to pinpoint the crash if you could include the last few lines
from a verbose boot.

> Does anyone know a) when was the last time it worked on 4 cpu's
> b) what's changed recently which might relate to this.
> 
> Also in trying to figure this out I looked at the DRAM probing
> code in /usr/src/sys/i386/i386/machdep.c:getmemsize(), and it looks
> as though it's not safe for >2GB (e.g. comparisons of byte addresses
> against signed "int end").  It would also be good if this probing
> code was careful not to venture past 4GB-64MB (PCI device space) -
> then a generic kernel could work on a 4GB machine without any tweaking,
> which would simplify installation - I get nervous shuffling DIMMs
> in and out of the machine ...
> 
> Thanks
>Richard Cownie (t...@ma.ikos.com)
> 

-lq





4-way SMP broken ?

1999-06-09 Thread Richard Cownie
Hi,

I've been trying to install 19990604-CURRENT on a couple of SC450NX
boxes.  It works fine with 2 cpu's, but an SMP kernel with 4 cpu's
falls over very quickly (I think while it's setting up the APIC
stuff, or very shortly after - the messages about APIC bus ids appear
on the screen very briefly, then the machine reboots itself).

Does anyone know a) when was the last time it worked on 4 cpu's
b) what's changed recently which might relate to this.

Also in trying to figure this out I looked at the DRAM probing
code in /usr/src/sys/i386/i386/machdep.c:getmemsize(), and it looks
as though it's not safe for >2GB (e.g. comparisons of byte addresses
against signed "int end").  It would also be good if this probing
code was careful not to venture past 4GB-64MB (PCI device space) -
then a generic kernel could work on a 4GB machine without any tweaking,
which would simplify installation - I get nervous shuffling DIMMs
in and out of the machine ...

Thanks
   Richard Cownie (t...@ma.ikos.com)

