Re: 4-way SMP broken ?
On Thu, 10 Jun 1999, Luoqi Chen wrote:
> Could you narrow down the crash further inside mp_start()? I'd like to
> know whether the crash occurred inside start_all_aps(). One or two lines of
> debug messages would be really helpful, you don't have to write down the exact
> words. Do you have options DDB enabled in the kernel? It helps to stop
> the last few lines of console messages from scrolling off the screen.

Yes, I added more messages and it's inside start_all_aps() - it seems
to start AP #1 ok, then crashes while starting AP #2.

> If possible, could you try a kernel built from sources with the
> POST_SMP_VMSHARE tag? I may have broken something during the commit.

Have to get out the door right now, will try this either tomorrow
morning or Monday.

Thanks
Richard Cownie

To Unsubscribe: send mail to majord...@freebsd.org
with "unsubscribe freebsd-current" in the body of the message
Re: 4-way SMP broken ?
> I have added more debugging messages, and the crash appears to be inside
> mp_start(). I don't have a log because this is too early in the boot
> to get the messages saved anywhere, and they go by too quickly to
> write it down. The evidence that this is an SMP problem is simple -
> with 2 cpu's plugged in, it works fine; with 3 or 4 cpu's plugged in,
> it crashes.
>
Could you narrow down the crash further inside mp_start()? I'd like to
know whether the crash occurred inside start_all_aps(). One or two lines of
debug messages would be really helpful, you don't have to write down the exact
words. Do you have options DDB enabled in the kernel? It helps to stop
the last few lines of console messages from scrolling off the screen.

> I believe the hardware is fine because I was previously running
> 19990421-CURRENT with all 4 cpu's without serious problems (it was
> a little unstable, but always booted ok).
>
If possible, could you try a kernel built from sources with the
POST_SMP_VMSHARE tag? I may have broken something during the commit.

-lq
Re: 4-way SMP broken ?
On Wed, 09 Jun 1999, Luoqi Chen wrote:
> > I've been trying to install 19990604-CURRENT on a couple of SC450NX
> > boxes. It works fine with 2 cpu's, but an SMP kernel with 4 cpu's
> > falls over very quickly (I think while it's setting up the APIC
> > stuff, or very shortly after - the messages about APIC bus ids appear
> > on the screen very briefly, then the machine reboots itself).
> >
> Do you mean messages like these?
> FreeBSD/SMP: Multiprocessor motherboard
>  cpu0 (BSP): apic id: 0, version: 0x00040011, at 0xfec08000
>  cpu1 (AP): apic id: 12, version: 0x00040011, at 0xfec08000
>  io0 (APIC): apic id: 13, version: 0x00170011, at 0xfec0
> By the time you see these messages, all cpus should have been booted up
> successfully, so any crash that immediately follows is not likely to be
> SMP related. It would help to pinpoint the crash if you could include the
> last few lines from a verbose boot.

I have added more debugging messages, and the crash appears to be inside
mp_start(). I don't have a log because this is too early in the boot
to get the messages saved anywhere, and they go by too quickly to
write it down. The evidence that this is an SMP problem is simple -
with 2 cpu's plugged in, it works fine; with 3 or 4 cpu's plugged in,
it crashes.

I believe the hardware is fine because I was previously running
19990421-CURRENT with all 4 cpu's without serious problems (it was
a little unstable, but always booted ok).

> > Does anyone know a) when was the last time it worked on 4 cpu's
> > b) what's changed recently which might relate to this.

So if anyone has an answer to these questions I'd still be interested.

> > Also in trying to figure this out I looked at the DRAM probing
> > code in /usr/src/sys/i386/i386/machdep.c:getmemsize(), and it looks
> > as though it's not safe for >2GB (e.g. comparisons of byte addresses
> > against signed "int end"). It would also be good if this probing

I've tried various hacks to this code, but have not succeeded in
making it work for 4GB. Changing "int end" to "vm_offset_t end" is
not sufficient. It has a tendency to say "Too many holes in
address space" ... Even defining MAXMEM does not solve the problem.

Richard Cownie (t...@ma.ikos.com)
Re: 4-way SMP broken ?
In reply:
> > interesting. then why the delay in bringing up the AP? Note in the
> > dmesg output below, that the AP only comes up during the SCSI delay. I
> > have also added other comments to the following output.
>
> The APs need the giant kernel lock when initializing the
> local APIC and printing the "launched" message.
>
> I added code for bringing up the APs earlier, but had to disable it,
> since it caused some machines to hang. (The APs were probably launched
> too early, causing the BSP to attempt to send IPIs before the local
> APIC was initialized).
>
> A revised patch for bringing up the APs early is enclosed.
>
> - Tor Egge
>
> Index: mp_machdep.c
> ===
> RCS file: /home/ncvs/src/sys/i386/i386/mp_machdep.c,v
> retrieving revision 1.102
> diff -u -r1.102 mp_machdep.c
> --- mp_machdep.c 1999/06/01 18:19:42 1.102
> +++ mp_machdep.c 1999/06/08 00:27:19

the second hunk rejects. i show this:

-rw-r--r-- 1 root wheel 62767 Jun 1 23:38 /usr/src/sys/i386/i386/mp_machdep.c

i'll apply this by hand a little later, i'll get back with you on
compatibility. I am using a Tyan S1696DLUA "Thunder2" motherboard.

jim
--
All opinions expressed are mine, if you    | "I will not be pushed, stamped,
think otherwise, then go jump into turbid  | briefed, debriefed, indexed, or
radioactive waters and yell WAHOO !!!      | numbered!" - #1, "The Prisoner"
--
Inet: jbry...@tfs.net  AX.25: kc5...@wv0t.#neks.ks.usa.noam  grid: EM28pw
voice: KC5VDJ - 6 & 2 Meters AM/FM/SSB, 70cm FM.  http://www.tfs.net/~jbryant
--
HF/6M/2M: IC-706-MkII, 2M: HTX-212, 2M: HTX-202, 70cm: HTX-404, Packet: KPC-3+
Re: 4-way SMP broken ?
In article you write:
>Also in trying to figure this out I looked at the DRAM probing
>code in /usr/src/sys/i386/i386/machdep.c:getmemsize(), and it looks
>as though it's not safe for >2GB (e.g. comparisons of byte addresses
>against signed "int end").

I just made this into a vm_offset_t, so it should be good for up to 4GB.

--
Jonathan
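As a rough illustration of why the signed "int end" matters (this is not the getmemsize() code itself; every name below is made up for the example): once a byte address reaches 2GB it no longer fits in a signed 32-bit int, so with the usual two's-complement conversion it turns negative and an "addr < end" test goes the wrong way. An unsigned 32-bit type, which is what vm_offset_t is assumed to be on i386 here, covers the range up to 4GB.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for vm_offset_t, assumed to be 32-bit unsigned on i386. */
typedef uint32_t demo_vm_offset_t;

/*
 * Compare two byte addresses after squeezing them into signed 32-bit ints,
 * the way a signed "int end" would hold them. Converting a value above
 * INT32_MAX is implementation-defined before C23; common compilers wrap
 * it to a negative number, which is the failure being illustrated.
 */
static bool demo_below_end_signed(uint32_t addr_bytes, uint32_t end_bytes)
{
    int32_t addr = (int32_t)addr_bytes;
    int32_t end = (int32_t)end_bytes;

    return addr < end;
}

/* Same comparison with both operands kept unsigned. */
static bool demo_below_end_unsigned(demo_vm_offset_t addr, demo_vm_offset_t end)
{
    return addr < end;
}
```

With an end of 3GB (0xC0000000) and an address of 1GB (0x40000000), the signed version reports that the address is not below the end, while the unsigned version gives the expected answer. Below 2GB the two agree.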
Re: 4-way SMP broken ?
> interesting. then why the delay in bringing up the AP? Note in the
> dmesg output below, that the AP only comes up during the SCSI delay. I
> have also added other comments to the following output.

The APs need the giant kernel lock when initializing the
local APIC and printing the "launched" message.

I added code for bringing up the APs earlier, but had to disable it,
since it caused some machines to hang. (The APs were probably launched
too early, causing the BSP to attempt to send IPIs before the local
APIC was initialized).

A revised patch for bringing up the APs early is enclosed.

- Tor Egge

Index: mp_machdep.c
===
RCS file: /home/ncvs/src/sys/i386/i386/mp_machdep.c,v
retrieving revision 1.102
diff -u -r1.102 mp_machdep.c
--- mp_machdep.c	1999/06/01 18:19:42	1.102
+++ mp_machdep.c	1999/06/08 00:27:19
@@ -494,6 +494,10 @@
 
 #if defined(APIC_IO)
 
+
+/* Wait for all APs to be fully initialized */
+extern int wait_ap(unsigned int);
+
 /*
  * Final configuration of the BSP's local APIC:
  *  - disable 'pic mode'.
@@ -526,6 +530,9 @@
 
 	if (bootverbose)
 		apic_dump("bsp_apic_configure()");
+	wait_ap(100);
+	if (smp_started == 0)
+		printf("WARNING: Failed to start all APs\n");
 }
 #endif  /* APIC_IO */
 
@@ -1743,9 +1750,6 @@
 #endif /* USE_CLOCKLOCK */
 }
 
-
-/* Wait for all APs to be fully initialized */
-extern int wait_ap(unsigned int);
 
 /*
  * start each AP in our list
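For readers who have not seen this pattern, the idea of "wait a bounded amount for the APs, then warn" can be sketched as below. This is only a guess at the general shape, not the actual wait_ap() from mp_machdep.c: demo_wait_ap(), demo_smp_started and the poll callback are all hypothetical names, and the callback stands in for whatever delay the kernel would use between checks.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's smp_started flag. */
static volatile int demo_smp_started = 0;

/* Hypothetical hook run between checks; a kernel would delay here instead. */
typedef void (*demo_poll_fn)(unsigned int attempt);

/*
 * Check the flag up to max_polls times, running the poll hook between
 * checks. Returns 0 if the flag was seen set, -1 if the budget ran out.
 */
static int demo_wait_ap(unsigned int max_polls, demo_poll_fn poll)
{
    for (unsigned int i = 0; i < max_polls; i++) {
        if (demo_smp_started)
            return 0;
        if (poll != NULL)
            poll(i);
    }
    return demo_smp_started ? 0 : -1;
}

/* Example hook: pretend the APs finish during the third poll (attempt 2). */
static void demo_finish_on_third_poll(unsigned int attempt)
{
    if (attempt == 2)
        demo_smp_started = 1;
}
```

A caller that gets -1 back would print a warning and carry on, much as the patch does after checking smp_started. The point of the bound is that a stuck AP delays the boot by a fixed amount instead of hanging it.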
Re: 4-way SMP broken ?
> > Do you mean messages like these?
> > FreeBSD/SMP: Multiprocessor motherboard
> >  cpu0 (BSP): apic id: 0, version: 0x00040011, at 0xfec08000
> >  cpu1 (AP): apic id: 12, version: 0x00040011, at 0xfec08000
> >  io0 (APIC): apic id: 13, version: 0x00170011, at 0xfec0
> > By the time you see these messages, all cpus should have been booted up
> > successfully, so any crash that immediately follows is not likely to be
> > SMP related. It would help to pinpoint the crash if you could include the
> > last few lines from a verbose boot.
>
> interesting. then why the delay in bringing up the AP? Note in the
> dmesg output below, that the AP only comes up during the SCSI delay. I
> have also added other comments to the following output.
>
The APs are up, but not fully initialized. Initialization that requires
holding the giant lock is done near the end of the boot process; until
then the APs just spin on the lock. Tor Egge once tried to move it to an
earlier point, but that didn't work well on some motherboards.

-lq
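The "up, but spinning on the lock" state can be pictured with a small model. This is a sketch under loud assumptions, not kernel code: demo_giant_lock, struct demo_ap and demo_ap_poll() are invented for the example, the lock is a plain C11 atomic_flag, and one call to demo_ap_poll() stands for one pass of an AP's spin loop.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the giant kernel lock. */
static atomic_flag demo_giant_lock = ATOMIC_FLAG_INIT;

/* Hypothetical per-AP state for the model. */
struct demo_ap {
    int id;
    bool initialized;     /* set once the late initialization has run */
    unsigned long spins;  /* failed attempts to take the lock */
};

/* Try to take the lock once; true on success. */
static bool demo_giant_trylock(void)
{
    return !atomic_flag_test_and_set_explicit(&demo_giant_lock,
                                              memory_order_acquire);
}

static void demo_giant_unlock(void)
{
    atomic_flag_clear_explicit(&demo_giant_lock, memory_order_release);
}

/*
 * One pass of an AP's spin loop: if the lock is busy, count a spin and
 * give up for now; otherwise do the lock-protected initialization.
 */
static bool demo_ap_poll(struct demo_ap *ap)
{
    if (ap->initialized)
        return true;
    if (!demo_giant_trylock()) {
        ap->spins++;
        return false;
    }
    ap->initialized = true;  /* placeholder for local APIC setup etc. */
    demo_giant_unlock();
    return true;
}
```

While the boot processor holds the lock, every poll fails and only the spin count grows; the first poll after the lock is released completes the AP's initialization. That matches the observation that the "launched" message only shows up late in the boot.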
Re: 4-way SMP broken ?
In reply:
> > Hi,
> >
> > I've been trying to install 19990604-CURRENT on a couple of SC450NX
> > boxes. It works fine with 2 cpu's, but an SMP kernel with 4 cpu's
> > falls over very quickly (I think while it's setting up the APIC
> > stuff, or very shortly after - the messages about APIC bus ids appear
> > on the screen very briefly, then the machine reboots itself).
> >
> Do you mean messages like these?
> FreeBSD/SMP: Multiprocessor motherboard
>  cpu0 (BSP): apic id: 0, version: 0x00040011, at 0xfec08000
>  cpu1 (AP): apic id: 12, version: 0x00040011, at 0xfec08000
>  io0 (APIC): apic id: 13, version: 0x00170011, at 0xfec0
> By the time you see these messages, all cpus should have been booted up
> successfully, so any crash that immediately follows is not likely to be
> SMP related. It would help to pinpoint the crash if you could include the
> last few lines from a verbose boot.

interesting. then why the delay in bringing up the AP? Note in the
dmesg output below, that the AP only comes up during the SCSI delay. I
have also added other comments to the following output.

---
[Last night's kernel]

Copyright (c) 1992-1999 The FreeBSD Project.
Copyright (c) 1982, 1986, 1989, 1991, 1993
	The Regents of the University of California. All rights reserved.
FreeBSD 4.0-CURRENT #7: Wed Jun 9 16:10:23 CDT 1999
    jbry...@wahoo:/usr/src/sys/compile/WAHOO
Timecounter "i8254" frequency 1192990 Hz
CPU: Pentium II/Xeon/Celeron (686-class CPU)
  Origin = "GenuineIntel" Id = 0x650 Stepping=0
  Features=0x183fbff
real memory = 134217728 (131072K bytes)
avail memory = 126902272 (123928K bytes)
Programming 24 pins in IOAPIC #0
FreeBSD/SMP: Multiprocessor motherboard
 cpu0 (BSP): apic id: 0, version: 0x00040011, at 0xfee0
 cpu1 (AP): apic id: 1, version: 0x00040011, at 0xfee0
 io0 (APIC): apic id: 2, version: 0x00170011, at 0xfec0
Preloaded elf kernel "kernel" at 0xc0393000.
Preloaded userconfig_script "/boot/kernel.conf" at 0xc039309c.
DEVFS: ready for devices
Pentium Pro MTRR support enabled, default memory type is uncacheable
ipl: ERROR: driver has bogus cdevsw->d_maj = -1
^^ ??
ccd0-3: Concatenated disk drivers
Probing for PnP devices:
CSN 1 Vendor ID: YMH0802 [0x0208a865] Serial 0x Comp ID: PNPb02f [0x2fb0d041]
npx0: on motherboard
npx0: INT 16 interface
pcib0: on motherboard
pci0: on pcib0
chip0: at device 0.0 on pci0
pcib1: at device 1.0 on pci0
pci1: on pcib1
vga-pci0: irq 2 at device 0.0 on pci1
isab0: at device 7.0 on pci0
chip1: at device 7.1 on pci0
uhci0: irq 19 at device 7.2 on pci0
usb0: on uhci0
uhub0 at usb0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
intpm0: at device 7.3 on pci0
intpm0: I/O mapped fcb0
intpm0: intr IRQ 9 enabled revision 0
intsmb0:
smbus0: on intsmb0
intpm0: PM I/O mapped fc00
ed0: irq 17 at device 12.0 on pci0
ed0: address 00:00:e8:4e:0e:16, type NE2000 (16 bit)
ahc0: irq 19 at device 15.0 on pci0
ahc0: Using left over BIOS settings
ahc0: aic7895 Wide Channel A, SCSI Id=7, 255 SCBs
ahc1: irq 16 at device 15.1 on pci0
ahc1: Using left over BIOS settings
ahc1: aic7895 Wide Channel B, SCSI Id=7, 255 SCBs
devclass_alloc_unit: ed0 already exists, using next available unit number
^ ??
isa0: on motherboard
fdc0: at port 0x3f0-0x3f7 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> at fdc0 drive 0
ppc0 at port 0x378-0x37f irq 7 on isa0
ppc0: Generic chipset (ECP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/8 bytes threshold
plip0: on ppbus 0
lpt0: on ppbus 0
lpt0: Interrupt-driven port
ppi0: on ppbus 0
lppps0: on ppbus 0
sio0 at port 0x3f8-0x3ff irq 4 on isa0
sio0: type 16550A
sio1: configured irq 3 not in bitmap of probed irqs 0
joy0 at port 0x201 on isa0
joy0: joystick
atkbdc0: at port 0x60-0x6f on isa0
atkbd0: irq 1 on atkbdc0
kbd0 at atkbd0
psm0: irq 12 on atkbdc0
psm0: model Generic PS/2 mouse, device ID 0
vga0: on isa0
sc0: at flags 0x6 on isa0
pca0 at port 0x40 on isa0
pca0: PC speaker audio driver
DEVFS: ready to run
APIC_IO: Testing 8254 interrupt delivery
APIC_IO: Broken MP table detected: 8254 is not connected to IO APIC int pin 2
APIC_IO: routing 8254 via 8259 on pin 0
^ Tyan Thunder2 S1696DLUA Motherboard, Rogue?
IP packet filtering initialized, divert enabled, rule-based forwarding disabled, default to accept, unlimited logging
DUMMYNET initialized (990504)
ds0 XXX: driver didn't set ifq_maxlen
^ ???
BRIDGE 981214, have 12 interfaces
-- index 1 ed0 type 6 phy 0 addrl 6 addr 00.00.e8.4e.0e.16
IP Filter: initialized. Default = pass all, Logging = enabled
Waiting 2 seconds for SCSI devices to settle
SM
Re: 4-way SMP broken ?
> Hi,
>
> I've been trying to install 19990604-CURRENT on a couple of SC450NX
> boxes. It works fine with 2 cpu's, but an SMP kernel with 4 cpu's
> falls over very quickly (I think while it's setting up the APIC
> stuff, or very shortly after - the messages about APIC bus ids appear
> on the screen very briefly, then the machine reboots itself).
>
Do you mean messages like these?
FreeBSD/SMP: Multiprocessor motherboard
 cpu0 (BSP): apic id: 0, version: 0x00040011, at 0xfec08000
 cpu1 (AP): apic id: 12, version: 0x00040011, at 0xfec08000
 io0 (APIC): apic id: 13, version: 0x00170011, at 0xfec0
By the time you see these messages, all cpus should have been booted up
successfully, so any crash that immediately follows is not likely to be
SMP related. It would help to pinpoint the crash if you could include the
last few lines from a verbose boot.

> Does anyone know a) when was the last time it worked on 4 cpu's
> b) what's changed recently which might relate to this.
>
> Also in trying to figure this out I looked at the DRAM probing
> code in /usr/src/sys/i386/i386/machdep.c:getmemsize(), and it looks
> as though it's not safe for >2GB (e.g. comparisons of byte addresses
> against signed "int end"). It would also be good if this probing
> code was careful not to venture past 4GB-64MB (PCI device space) -
> then a generic kernel could work on a 4GB machine without any tweaking,
> which would simplify installation - I get nervous shuffling DIMMs
> in and out of the machine ...
>
> Thanks
> Richard Cownie (t...@ma.ikos.com)
>

-lq
4-way SMP broken ?
Hi,

I've been trying to install 19990604-CURRENT on a couple of SC450NX
boxes. It works fine with 2 cpu's, but an SMP kernel with 4 cpu's
falls over very quickly (I think while it's setting up the APIC
stuff, or very shortly after - the messages about APIC bus ids appear
on the screen very briefly, then the machine reboots itself).

Does anyone know a) when was the last time it worked on 4 cpu's
b) what's changed recently which might relate to this.

Also in trying to figure this out I looked at the DRAM probing
code in /usr/src/sys/i386/i386/machdep.c:getmemsize(), and it looks
as though it's not safe for >2GB (e.g. comparisons of byte addresses
against signed "int end"). It would also be good if this probing
code was careful not to venture past 4GB-64MB (PCI device space) -
then a generic kernel could work on a 4GB machine without any tweaking,
which would simplify installation - I get nervous shuffling DIMMs
in and out of the machine ...

Thanks
Richard Cownie (t...@ma.ikos.com)
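The 4GB-64MB limit suggested above works out to 0xFC000000. A minimal sketch of such a clamp follows, assuming the probe's upper bound is available as a byte count; demo_clamp_probe_end() and the two constants are invented for the illustration and do not come from machdep.c.

```c
#include <stdint.h>

/* Hypothetical constants: total 32-bit address space and the PCI window. */
#define DEMO_4GB            (4ULL * 1024 * 1024 * 1024)
#define DEMO_PCI_HOLE_BYTES (64ULL * 1024 * 1024)

/*
 * Limit the end of a memory probe so that it never runs into the top
 * 64MB below 4GB. A 64-bit type is used so that "exactly 4GB" can be
 * represented without wrapping to zero.
 */
static uint64_t demo_clamp_probe_end(uint64_t end_bytes)
{
    const uint64_t limit = DEMO_4GB - DEMO_PCI_HOLE_BYTES;

    return end_bytes > limit ? limit : end_bytes;
}
```

A machine reporting a full 4GB would be probed only up to 0xFC000000, while anything at or below that limit passes through unchanged. Whether 64MB is the right size for a given board is a separate question this sketch does not answer.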