> > - boot with kmdb, use F1-A to break into kmdb while the startup attempt for the
> >   second core is hanging, and simply ":c" continue:
> >   system startup continues and both cpu cores are online
> >
> >   (why does dropping into kmdb fix this problem?)
> 
> What does ::cpuinfo -v show when you drop in kmdb in
> this case?


It boots like this (btw, an snv_40 system, bfu'ed to opensolaris-20060626):

SunOS Release 5.11 Version wos_b44 32-bit
Copyright 1983-2006 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
features: 1167fdf<cpuid,cmp,sse3,nx,sse2,sse,sep,pat,cx8,pae,mca,mmx,cmov,pge,mtrr,msr,tsc,lgpg>
cpuid 0: initialized cpumod: cpu.generic
mem = 2088252K (0x7f74f000)
root nexus = i86pc
...
8042 device:  [EMAIL PROTECTED], kb8042 # 0
kb80420 is /isa/[EMAIL PROTECTED],60/[EMAIL PROTECTED]
8042 device:  [EMAIL PROTECTED], mouse8042 # 0
mouse80420 is /isa/[EMAIL PROTECTED],60/[EMAIL PROTECTED]
NOTICE: Kernel debugger present: disabling console power management.
pcplusmp: pciclass,0c0320 (ehci) instance 0 vector 0x17 ioapic 0x2 intin 0x17 is bound to cpu 1
PCI Express-device: pci8086,[EMAIL PROTECTED],7, ehci0
ehci0 is /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED],7
PCI Express-device: pci8086,[EMAIL PROTECTED], uhci0
uhci0 is /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]
pcplusmp: pciclass,0c0300 (uhci) instance 1 vector 0x13 ioapic 0x2 intin 0x13 is bound to cpu 0
PCI Express-device: pci8086,[EMAIL PROTECTED],1, uhci1
uhci1 is /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED],1
pcplusmp: pciclass,0c0300 (uhci) instance #2 vector 0x12 ioapic 0x2 intin 0x12 is bound to cpu 0
PCI Express-device: pci8086,[EMAIL PROTECTED],2, uhci2
uhci2 is /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED],2
pcplusmp: pciclass,0c0300 (uhci) instance #3 vector 0x10 ioapic 0x2 intin 0x10 is bound to cpu 0
PCI Express-device: pci8086,[EMAIL PROTECTED],3, uhci3
uhci3 is /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED],3
cpu0: x86 (chipid 0x0 GenuineIntel family 6 model 14 step 8 clock 2000 MHz)
cpu0: Intel(r) CPU           T2500  @ 2.00GHz



That's the last message on the screen.
I think cpu0 is now looping here, waiting up to 20 seconds for cpu1 to be added
to the "procset" variable:
http://cvs.opensolaris.org/source/xref/on/usr/src/uts/i86pc/os/mp_startup.c#982
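To make the bounded wait described above concrete, here is a toy, single-threaded model of it, assuming the "up to 20 seconds" reading of the linked code. The names (toy_cpuset_t, toy_cpu, toy_wait_for_cpu) are hypothetical; this is not the mp_startup.c implementation, which uses real delays and a second CPU running concurrently. It only shows the logic: poll the set, and if the starting CPU never adds itself, the wait runs out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for cpuset_t: one bit per CPU id (< 64 CPUs). */
typedef uint64_t toy_cpuset_t;

/* Hypothetical model of a CPU that is being started. */
struct toy_cpu {
    unsigned id;
    int checkin_after_ms;   /* ms until it adds itself; -1 = never (hang) */
};

/* Models the master CPU: check the set once per simulated millisecond for
 * up to timeout_ms, letting the starting CPU make progress between checks.
 * Returns true if the CPU showed up in the set before the deadline. */
static bool toy_wait_for_cpu(toy_cpuset_t *set, const struct toy_cpu *cpu,
                             unsigned timeout_ms)
{
    assert(cpu->id < 64);
    const toy_cpuset_t bit = (toy_cpuset_t)1 << cpu->id;

    for (unsigned now = 0; now < timeout_ms; now++) {
        /* The starting CPU "adds itself" once its startup work is done. */
        if (cpu->checkin_after_ms >= 0 &&
            now >= (unsigned)cpu->checkin_after_ms)
            *set |= bit;
        if (*set & bit)
            return true;
    }
    return false;   /* timed out: the CPU never appeared in the set */
}
```

With a starting set of 1 (cpu0 only), a toy_cpu of { 1, 5 } checks in and leaves the set at 3, while { 1, -1 } models the hang: the call returns false after the full 20000 simulated milliseconds and the set stays at 1.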


When I type "F1-A" on the PS/2 keyboard before the 20-second wait has
completed, it drops into kmdb and the output on the screen looks something
like this:

cpu0: x86 (chipid 0x0 GenuineIntel family 6 model 14 step 8 clock 2000 MHz)
cpu0: Intel(r) CPU           T2500  @ 2.00GHz
{ I type F1-A here }
{ .... some kmdb stuff about new kernel modules .... }
[0]>  cpu1: x86 (chipid 0x0 GenuineIntel family 6 model 14 step 8 clock 2000 MHz)
cpu1: Intel(r) CPU           T2500  @ 2.00GHz


Note the cpu1 messages, *after* the [0]> kmdb prompt.


That is, as soon as I drop into kmdb, the two log messages printed by
the cmn_err(CE_CONT, ...) call in init_cpu_info()...

http://cvs.opensolaris.org/source/xref/on/usr/src/uts/i86pc/os/mp_startup.c#init_cpu_info

... appear on the console screen, after the kmdb prompt was printed!


::cpuinfo -v output, manually copied from the console screen:

> ::cpuinfo -v
 ID ADDR     FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD   PROC
  0 fec20b0c  1b    0    0 104   no    no t-0    d0594de0
               |         |
    RUNNING <--+         +-- .....
      READY
     EXISTS
     ENABLE

 ID ADDR     FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD   PROC
  1 d0c86800   0    0   10  99   no    no -      d0e25de0



When I print the contents of the variable "procset", I see the value "3"
(bits 0 and 1 set, i.e. cpu1 has been added by then).

It seems as if the initialization code for cpu1 is waiting/hanging in the
"init_cpu_info()" call,
http://cvs.opensolaris.org/source/xref/on/usr/src/uts/i86pc/os/mp_startup.c#1133
before cpu1 is added to procset at line 1136:

   1133         init_cpu_info(cp);                  <<<<< hangs inside this function
   1134 
   1135         mutex_enter(&cpu_lock);
   1136         CPUSET_ADD(procset, cp->cpu_id);    <<<<< this is what start_other_cpus() is waiting for
   1137         mutex_exit(&cpu_lock);


Somehow, dropping into kmdb seems to "unblock" the hanging cmn_err call,
and initialization for cpu core 1 completes.  Hmm, I guess that since we're
just trying to start all the cpus in the system, kmdb does not yet know about
the new cpu #1 and doesn't stop cpu 1 when I drop into kmdb.  While I'm at the
kmdb prompt, cpu 1 initialization completes in the background.
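The hang-then-unblock sequence described above can be reproduced as a toy pthreads model. Everything in it is hypothetical: struct toy_boot, the "console gate" and toy_release_console() stand in for whatever actually blocks the cmn_err() output inside init_cpu_info() and whatever entering kmdb does to release it. The real cause is not known from this trace. The model only reproduces the ordering of lines 1133-1137: while the "print" step blocks, the procset bit is never set and the master's wait fails; once the gate opens, the bit appears.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical toy state; none of these names exist in the kernel. */
struct toy_boot {
    pthread_mutex_t cpu_lock;      /* stands in for cpu_lock */
    pthread_mutex_t console_lock;  /* hypothetical console gate */
    pthread_cond_t console_cv;
    bool console_ready;            /* false models the blocked message */
    uint64_t procset;              /* bit N set <=> CPU N has checked in */
};

static void toy_boot_init(struct toy_boot *b)
{
    assert(b != NULL);
    pthread_mutex_init(&b->cpu_lock, NULL);
    pthread_mutex_init(&b->console_lock, NULL);
    pthread_cond_init(&b->console_cv, NULL);
    b->console_ready = false;
    b->procset = 1;                /* cpu0, the boot CPU, is already in */
}

/* Models cpu1's startup path: print (may block), then join procset. */
static void *toy_cpu1_startup(void *arg)
{
    struct toy_boot *b = arg;

    /* Stand-in for the cmn_err() in init_cpu_info(): wait for the gate. */
    pthread_mutex_lock(&b->console_lock);
    while (!b->console_ready)
        pthread_cond_wait(&b->console_cv, &b->console_lock);
    pthread_mutex_unlock(&b->console_lock);

    /* Stand-in for mutex_enter / CPUSET_ADD / mutex_exit (1135-1137). */
    pthread_mutex_lock(&b->cpu_lock);
    b->procset |= (uint64_t)1 << 1;
    pthread_mutex_unlock(&b->cpu_lock);
    return NULL;
}

/* Hypothetical "unblock" event, playing the role of pressing F1-A. */
static void toy_release_console(struct toy_boot *b)
{
    pthread_mutex_lock(&b->console_lock);
    b->console_ready = true;
    pthread_cond_broadcast(&b->console_cv);
    pthread_mutex_unlock(&b->console_lock);
}

/* Master side: check procset for cpu1 about once per ms, up to timeout_ms. */
static bool toy_wait_for_cpu1(struct toy_boot *b, unsigned timeout_ms)
{
    const struct timespec one_ms = { .tv_sec = 0, .tv_nsec = 1000000L };

    for (unsigned ms = 0; ms <= timeout_ms; ms++) {
        pthread_mutex_lock(&b->cpu_lock);
        bool in = ((b->procset >> 1) & 1) != 0;
        pthread_mutex_unlock(&b->cpu_lock);
        if (in)
            return true;
        nanosleep(&one_ms, NULL);
    }
    return false;
}
```

Started with toy_boot_init() and a thread running toy_cpu1_startup(), a short toy_wait_for_cpu1() call fails because the thread is parked at the gate. After toy_release_console(), the wait succeeds and procset ends up as 3, matching the value printed from kmdb.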
 
 
This message posted from opensolaris.org
_______________________________________________
opensolaris-code mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/opensolaris-code
