Thomas,

We are seeing boot failures on medium-sized machines, which I think stem from a change in the expectations that dyntick places on x86's start_secondary.

During boot of cpus, we see an occasional panic in tick_do_broadcast at

195          if (!cpumask_empty(mask)) {
196                  /*
197                   * It might be necessary to actually check whether the devices
198                   * have different broadcast functions. For now, just use the
199                   * one of the first device. This works as long as we have this
200                   * misfeature only on x86 (lapic)
201                   */
202                  td = &per_cpu(tick_cpu_device, cpumask_first(mask));
203                  td->evtdev->broadcast(mask);
                         ^^^^^^ NULL

This is called from:

211  static void tick_do_periodic_broadcast(void)
212  {
213          raw_spin_lock(&tick_broadcast_lock);
214
215          cpumask_and(tmpmask, cpu_online_mask, tick_broadcast_mask);
216          tick_do_broadcast(tmpmask);

Now the problem. In start_secondary, we have:

272          lock_vector_lock();
273          set_cpu_online(smp_processor_id(), true);
274          unlock_vector_lock();
275          per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE;
276          x86_platform.nmi_init();
277
278          /* enable local interrupts */
279          local_irq_enable();
280
281          /* to prevent fake stack check failure in clock setup */
282          boot_init_stack_canary();
283
284          x86_cpuinit.setup_percpu_clockev();

So the cpu is marked online on line 273, but evtdev is not set until
line 284. In between, interrupts are enabled on line 279, so a periodic
broadcast can see an online cpu whose evtdev is still NULL.

This code has been in start_secondary for a considerable period of time.
I think it is just being revealed now. It does not show up with a normal
config, but taking a 'make x86_64_defconfig' kernel and changing
CONFIG_MAXSMP seems to change boot timing enough to make it reproducible
on 4 socket and above machines.

The following makes it boot, but I am not sure if this is the right thing
to do.

$ git diff
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 9c73b51..8456432 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -264,6 +264,8 @@ notrace static void __cpuinit start_secondary(void *unused)
 	 */
 	check_tsc_sync_target();
 
+	x86_cpuinit.setup_percpu_clockev();
+
 	/*
 	 * We need to hold vector_lock so there the set of online cpus
 	 * does not change while we are assigning vectors to cpus.  Holding
@@ -281,8 +283,6 @@ notrace static void __cpuinit start_secondary(void *unused)
 	/* to prevent fake stack check failure in clock setup */
 	boot_init_stack_canary();
 
-	x86_cpuinit.setup_percpu_clockev();
-
 	wmb();
 	cpu_startup_entry(CPUHP_ONLINE);
 }

Thanks,
Robin Holt
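
As an illustration of the ordering problem described above, here is a minimal user-space sketch in C. It is not kernel code: the names evt_device, reset_model, model_broadcast and bring_up are hypothetical stand-ins, and it assumes the new cpu's bit is already set in the broadcast mask when the tick fires, which the report does not confirm. It only shows that a broadcast arriving between "mark online" and "register the clock event device" would hit a NULL device, and that swapping the two steps avoids it.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4

/* Hypothetical stand-in for struct clock_event_device. */
struct evt_device {
    int broadcasts;              /* times the modelled broadcast() ran */
};

/* Stand-ins for cpu_online_mask, tick_broadcast_mask and td->evtdev. */
static bool cpu_online[NR_CPUS];
static bool cpu_in_broadcast[NR_CPUS];
static struct evt_device *cpu_evtdev[NR_CPUS];
static struct evt_device devices[NR_CPUS];

/* Return the model to "no cpu is up yet". */
void reset_model(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        cpu_online[cpu] = false;
        cpu_in_broadcast[cpu] = false;
        cpu_evtdev[cpu] = NULL;
        devices[cpu].broadcasts = 0;
    }
}

/*
 * Models tick_do_broadcast(): use the device of the first cpu in
 * (online & broadcast). Returns 0 on success or if the mask is empty,
 * and -1 where the kernel would dereference a NULL evtdev.
 */
int model_broadcast(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (!cpu_online[cpu] || !cpu_in_broadcast[cpu])
            continue;
        if (cpu_evtdev[cpu] == NULL)
            return -1;           /* the panic case */
        cpu_evtdev[cpu]->broadcasts++;
        return 0;
    }
    return 0;
}

/*
 * Models bringing up one cpu, with a broadcast tick arriving right
 * after the cpu is marked online. clockev_first selects the patched
 * order (device registered before the cpu goes online).
 */
int bring_up(int cpu, bool clockev_first)
{
    int rc;

    cpu_in_broadcast[cpu] = true;    /* assumption, see lead-in */
    if (clockev_first)
        cpu_evtdev[cpu] = &devices[cpu];
    cpu_online[cpu] = true;
    rc = model_broadcast();          /* tick lands in the window */
    if (!clockev_first)
        cpu_evtdev[cpu] = &devices[cpu];
    return rc;
}
```

Calling reset_model() and then bring_up(1, false) models the current order and returns -1; after another reset_model(), bring_up(1, true) models the patched order and returns 0.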