RE: [BUG] NMI watchdog lockups caused by mwait_idle
Darrick,

I tried 2.6.20-rc4 on a Dempsey system here in my lab and it worked
fine. No watchdog lockups.

Can you try the idle routine with hlt instead of mwait? There is no boot
option for this on x86_64, but you can change
arch/x86_64/kernel/process.c:select_idle_routine() so that it does not
enable mwait. With that change, the default kernel should use hlt-based
idle.

It would also be worth seeing what happens with the nmi_watchdog=0,
nmi_watchdog=1, and nmi_watchdog=2 boot options. That should tell us
whether nmi_watchdog is raising a false alarm or the CPUs are indeed
getting locked up here.

Thanks,
Venki

>-----Original Message-----
>From: Darrick J. Wong [mailto:[EMAIL PROTECTED]
>Sent: Friday, January 12, 2007 1:01 PM
>To: Pallipadi, Venkatesh
>Cc: Linux Kernel Mailing List
>Subject: [BUG] NMI watchdog lockups caused by mwait_idle
>
>Hi Venkatesh,
>
>I have an IBM IntelliStation Z30 with two Dempsey CPUs. When I try to
>boot 2.6.20-rc4 on it, the system prints messages about NMI watchdog
>lockups. git-bisect determined that the patch "[PATCH] x86-64: Fix
>interrupt race in idle callback (3rd try)" was the source of these
>problems, and I can work around the problem either by passing
>"idle=poll" to avoid mwait_idle or by reverting the patch.
>
>Other non-Dempsey Xeon machines with mwait support do not exhibit these
>symptoms. I will try to determine if this is a bug specific to Dempsey
>CPUs or this particular type of machine. I suspect the latter, but I
>don't know enough about monitor/mwait to pursue this much further.
>
>What else can I do to diagnose this?
>
>--D
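When comparing runs with different idle= and nmi_watchdog= settings, it helps to record which options the running kernel was actually booted with. The following is a minimal sketch, not part of the original thread: the helper name boot_options is hypothetical, and it assumes a Linux system that exposes the boot command line in /proc/cmdline.

```python
def boot_options(cmdline, keys=("idle", "nmi_watchdog")):
    """Return the key=value boot options of interest from a kernel command line.

    `keys` defaults to the two options discussed in this thread; the helper
    itself is illustrative and not part of any kernel tooling.
    """
    found = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        if sep and name in keys:
            found[name] = value
    return found


if __name__ == "__main__":
    try:
        # /proc/cmdline holds the command line the running kernel booted with.
        with open("/proc/cmdline") as f:
            print(boot_options(f.read()))
    except OSError:
        print("could not read /proc/cmdline")
```

Logging this next to each dmesg capture makes it easier to tell the nmi_watchdog=0, 1, and 2 runs apart afterwards.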
Re: [BUG] NMI watchdog lockups caused by mwait_idle
Pallipadi, Venkatesh wrote:
> Darrick,
>
> I tried 2.6.20-rc4 on a Dempsey system here in my lab and it worked
> fine. No watchdog lockups.
> Can you try idle routine with hlt instead of mwait. There is no boot
> option for this in x86_64, but you can change
> arch/x86_64/kernel/process.c:select_idle_routine() not to enable mwait.
> With that default kernel should use hlt based idle.
>
> Also, worth seeing will be, what happens when nmi_watchdog=0,
> nmi_watchdog=1, and nmi_watchdog=2 boot options. That should tell us
> whether nmi_watchdog is raising some false alarm or the CPUs are indeed
> getting locked up here..
>

Locks up with hlt-based idle too. :(

Here's what I get with nmi_watchdog=0:

[ 206.088703] BUG: soft lockup detected on CPU#0!
[ 206.093284]
[ 206.093286] Call Trace:
[ 206.097324] IRQ [801b1f89] softlockup_tick+0xd4/0xe9
[ 206.103618] [80173c55] do_flush_tlb_all+0x0/0x68
[ 206.109238] [8014d8f8] run_local_timers+0x13/0x15
[ 206.114949] [80192844] update_process_times+0x4c/0x78
[ 206.121008] [80174fcd] smp_local_timer_interrupt+0x34/0x51
[ 206.127498] [801756b1] smp_apic_timer_interrupt+0x49/0x60
[ 206.133901] [8015cd16] apic_timer_interrupt+0x66/0x70
[ 206.139956] EOI [80173baa] __smp_call_function+0x66/0x87
[ 206.146594] [80173ba6] __smp_call_function+0x62/0x87
[ 206.152564] [80173c55] do_flush_tlb_all+0x0/0x68
[ 206.158188] [80173c55] do_flush_tlb_all+0x0/0x68
[ 206.163813] [80173cef] smp_call_function+0x32/0x49
[ 206.169611] [80173c55] do_flush_tlb_all+0x0/0x68
[ 206.175236] [8018e117] on_each_cpu+0x30/0x67
[ 206.180514] [80173d46] flush_tlb_all+0x1c/0x1e
[ 206.185965] [80150f2a] unmap_vm_area+0x1c3/0x265
[ 206.191590] [80101c20] init_level4_pgt+0xc20/0x1000
[ 206.197474] [801bfc47] remove_vm_area+0x41/0x67
[ 206.203010] [8017c33c] iounmap+0x8e/0xc8
[ 206.207933] [80230032] acpi_os_unmap_memory+0x9/0xb
[ 206.213810] [8023aaff] acpi_ev_system_memory_region_setup+0x52/0x105
[ 206.221174] [80259465] acpi_ut_delete_internal_obj+0x2c4/0x3b2
[ 206.228012] [802596d3] acpi_ut_update_ref_count+0x180/0x1d2
[ 206.234587] [80259885] acpi_ut_update_object_reference+0x160/0x207
[ 206.241770] [802599e1] acpi_ut_remove_reference+0xb5/0xd5
[ 206.248173] [8024da8a] acpi_ns_detach_object+0xca/0xee
[ 206.254318] [8024b08a] acpi_ns_delete_namespace_by_owner+0xcf/0x154
[ 206.261597] [80234481] acpi_ds_terminate_control_method+0xb5/0x14f
[ 206.268779] [8024ef7c] acpi_ps_parse_aml+0x242/0x3a0
[ 206.274750] [80250a00] acpi_ps_execute_pass+0xd5/0x10b
[ 206.280895] [80250c3c] acpi_ps_execute_method+0x1bf/0x2cb
[ 206.287298] [8024b4da] acpi_ns_evaluate+0x1f8/0x315
[ 206.293180] [8024abf1] acpi_evaluate_object+0x1d9/0x2fa
[ 206.299411] [8010ab03] kmem_cache_alloc+0xce/0xda
[ 206.305125] [880146a9] :processor:acpi_processor_start+0x656/0x6fd
[ 206.312307] [801cc2a0] kmem_cache_zalloc+0xce/0xf4
[ 206.318103] [80261097] acpi_start_single_object+0x2a/0x54
[ 206.324509] [8026192d] acpi_bus_register_driver+0xcd/0x14c
[ 206.331001] [88022061] :processor:acpi_processor_init+0x61/0xb7
[ 206.337923] [801a4d6e] sys_init_module+0xac/0x16c
[ 206.343630] [8015c11e] system_call+0x7e/0x83

nmi_watchdog={1,2} produce the same errors.

--D
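A trace of this length is easier to compare across the nmi_watchdog=0, 1, and 2 runs once it is reduced to its symbol names. The following is a minimal sketch, not something posted in the thread: the names ENTRY and trace_symbols are hypothetical, and the pattern assumes entries of the form "[address] symbol+0xoff/0xsize", optionally prefixed with ":module:", as they appear in the log above.

```python
import re

# One trace entry: a bracketed hex address (possibly empty if the archive
# stripped it), an optional ":module:" prefix, then symbol+0xoffset/0xsize.
# Timestamps such as "[ 206.180514]" contain a space and a dot, so they do
# not match the address part and are skipped.
ENTRY = re.compile(
    r"\[([0-9a-f]*)\]\s+(?::(\w+):)?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def trace_symbols(log):
    """Return (module, symbol) pairs in the order they appear in the log.

    `module` is None for symbols that carry no ":module:" prefix.
    """
    return [(m.group(2), m.group(3)) for m in ENTRY.finditer(log)]


if __name__ == "__main__":
    sample = (
        "[ 206.180514] [80173d46] flush_tlb_all+0x1c/0x1e\n"
        "[ 206.331001] [88022061] :processor:acpi_processor_init+0x61/0xb7\n"
    )
    for module, symbol in trace_symbols(sample):
        print(module or "kernel", symbol)
```

Read from the bottom up, the symbols in the trace above suggest that CPU#0 was loading the processor module (sys_init_module, acpi_processor_init), evaluating an ACPI method, and was inside __smp_call_function on behalf of flush_tlb_all when the timer tick reported the soft lockup.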