RE: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI
Hi, > From: Michal Hocko [mailto:mho...@kernel.org] > On Fri 31-07-15 11:23:00, 河合英宏 / KAWAI,HIDEHIRO wrote: > > > From: Michal Hocko [mailto:mho...@kernel.org] > > > > There is a timeout of 1000ms in nmi_shootdown_cpus(), so I don't know > > > > why CPU 130 waits so long. I'll try to consider for a while. > > > > > > Yes, I do not understand the timing here either and the fact that the > > > log is a complete mess in the important parts doesn't help a wee bit. > > > > I'm interested in where "kernel panic -not syncing: " is. > > It may give us a clue. > > This one is lost in the mangled text: > [ 167.843771] U<0>[ 167.843771] hhuh. NMI received for unkn<0><0>[ > 167.843765] Uh[ 16NM843774I own rea reived for > unknow<0 r 16n 2d 765] Uhhuh. CPU recei11. <0known reason 7. on770] Ker<[ - > not rn NMI:nic - not contt sing > > <0 >[ : Not con.inu437azed and confused, b] Dtryingaed annue > > fu 167.8ut trying>[ to 7.<0377 167.843775] U<0>[ 167.843776] ]hhu.ived for > u3nknown rMason 3 re oived for [nk167.843781] Thanks for the information. I anticipated that some lock contention on issuing messages (e.g. locks used by network/netconsole driver) delayed the panic procedure, but it seems not to be related because the panic message finished to be issued early. If I come up with something, I will post a mail. I think there may be potential issues. > 1. > <. N0>[ 167.843781] Uh. NMI recen 3d on CPU 0.i< >[ nowon 3d on] Chhuh.MI > eceived[ or7.843nknoUhhuh.wn rMason e3d ceCPivUd 120. > <0nk>no 167.wn843ason 3na s p120. > o<0er savi d6 e843ab88] Do yeu have a > [ er saving mode e nabl1d?7<4][ 167 84hu94]MIuh. NceIived for > unknown reas vdfor 1no3was0>[ 2d 67.84380on CI > rUe 12e. > ive7d8u3800wn rveaseo f2d on CPo3.r< u>k[o 1 rea6s.o2d8 oo you hn aPve <0st>a > e power 1s7.843816] Do yoauv ng moade > enbslra?ng[ e 167.8438p41o]er shhuhavi.ngIroenived fbled?nknow > < reaso0> 2d on [PU1626.41]0> Uh67.h. NM387I] receihed for .nknown reason > 2Nn MC U ceived for . > [son 2d on CPU 6. > < 160>7.8467.84873] Uhhuh. 3MI received 908 o knstra > [ n167.843908] Do ygo pave westrangesa pvnv mode enableng mode ed? > n
Re: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI
On Fri 31-07-15 11:23:00, 河合英宏 / KAWAI,HIDEHIRO wrote: > > From: Michal Hocko [mailto:mho...@kernel.org] [...] > > I am saying that watchdog_overflow_callback might trigger on more CPUs > > and panic from NMI context as well. So this is not reduced to the NMI > > button sends NMI to more CPUs. > > I understand. So, I have to also modify watchdog_overflow_callback > to call nmi_panic(). yes. [...] > > > There is a timeout of 1000ms in nmi_shootdown_cpus(), so I don't know > > > why CPU 130 waits so long. I'll try to consider for a while. > > > > Yes, I do not understand the timing here either and the fact that the > > log is a complete mess in the important parts doesn't help a wee bit. > > I'm interested in where "kernel panic -not syncing: " is. > It may give us a clue. This one is lost in the mangled text: [ 167.843771] U<0>[ 167.843771] hhuh. NMI received for unkn<0><0>[ 167.843765] Uh[ 16NM843774I own rea reived for unknow<0 r 16n 2d 765] Uhhuh. CPU recei11. <0known reason 7. on770] Ker<[ - not rn NMI:nic - not contt sing <0 >[ : Not con.inu437azed and confused, b] Dtryingaed annue fu 167.8ut trying>[ to 7.<0377 167.843775] U<0>[ 167.843776] ]hhu.ived for u3nknown rMason 3 re oived for [nk167.843781] 1. <. N0>[ 167.843781] Uh. NMI recen 3d on CPU 0.i< >[ nowon 3d on] Chhuh.MI eceived[ or7.843nknoUhhuh.wn rMason e3d ceCPivUd 120. <0nk>no 167.wn843ason 3na s p120. o<0er savi d6 e843ab88] Do yeu have a [ er saving mode e nabl1d?7<4][ 167 84hu94]MIuh. NceIived for unknown reas vdfor 1no3was0>[ 2d 67.84380on CI rUe 12e. ive7d8u3800wn rveaseo f2d on CPo3.r< u>k[o 1 rea6s.o2d8 oo you hn aPve <0st>a e power 1s7.843816] Do yoauv ng moade enbslra?ng[ e 167.8438p41o]er shhuhavi.ngIroenived fbled?nknow < reaso0> 2d on [PU1626.41]0> Uh67.h. NM387I] receihed for .nknown reason 2Nn MC U ceived for . [son 2d on CPU 6. < 160>7.8467.84873] Uhhuh. 3MI received 908 o knstra [ n167.843908] Do ygo pave westrangesa pvnv mode enableng mode ed? nhttp://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI
> From: Michal Hocko [mailto:mho...@kernel.org] > > On Thu 30-07-15 11:55:52, 河合英宏 / KAWAI,HIDEHIRO wrote: > > > From: Michal Hocko [mailto:mho...@kernel.org] > [...] > > > Could you point me to the code which does that, please? Maybe we are > > > missing that in our 3.0 kernel. I was quite surprised to see this > > > behavior as well. > > > > Please see the snippet below. > > > > void setup_local_APIC(void) > > { > > ... > > /* > > * only the BP should see the LINT1 NMI signal, obviously. > > */ > > if (!cpu) > > value = APIC_DM_NMI; > > else > > value = APIC_DM_NMI | APIC_LVT_MASKED; > > if (!lapic_is_integrated()) /* 82489DX */ > > value |= APIC_LVT_LEVEL_TRIGGER; > > apic_write(APIC_LVT1, value); > > > > > > LINT1 pins of cpus other than CPU 0 are masked here. > > However, at least on some of Hitachi servers, NMI caused by NMI > > button doesn't seem to be delivered through LINT1. So, my `external NMI' > > word may not be correct. > > I am not familiar with details here but I can tell you that this > particular code snippet is the same in our 3.0 based kernel so it seems > that the HW is indeed doing something differently. Yes, and it turned out my PATCH 3/3 doesn't work at all on some hardware... > > > You might still get a panic on hardlockup which will happen on all CPUs > > > from the NMI context so we have to be able to handle panic in NMI on > > > many CPUs. > > > > Do you say about the case of a kerne panic while other cpus locks up > > in NMI context? In that case, there is no way to do things needed by > > kdump procedure including saving registeres... > > I am saying that watchdog_overflow_callback might trigger on more CPUs > and panic from NMI context as well. So this is not reduced to the NMI > button sends NMI to more CPUs. I understand. So, I have to also modify watchdog_overflow_callback to call nmi_panic(). > Why cannot the panic() context save all the registers if we are going to > loop in NMI context? This would be imho preferable to returning from > panic IMO. I'm not saying we cannot save registers and do some cleanups in NMI context. I fell that it would introduce unneeded complexity. Since watchdog_overflow_callback is defined as generic code, we need to implement the preparation for kdump for other architectures. I haven't checked which architectures support both nmi watchdog and kdump, though. Anyway, I came up with a simple solution for x86. Waiting for the timing of nmi_shootdown_cpus() in nmi_panic(), then invoke the callback registered by nmi_shootdown_cpus(). > > > I can provide the full log but it is quite mangled. I guess the > > > CPU130 was the only one allowed to proceed with the panic while others > > > returned from the unknown NMI handling. It took a lot of time until > > > CPU130 managed to boot the crash kernel with soft lockups and RCU stalls > > > reports. CPU0 is most probably locked up waiting for CPU130 to > > > acknowledge the IPI which will not happen apparently. > > > > There is a timeout of 1000ms in nmi_shootdown_cpus(), so I don't know > > why CPU 130 waits so long. I'll try to consider for a while. > > Yes, I do not understand the timing here either and the fact that the > log is a complete mess in the important parts doesn't help a wee bit. I'm interested in where "kernel panic -not syncing: " is. It may give us a clue. Regards, Kawai
Re: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI
On Thu 30-07-15 11:55:52, 河合英宏 / KAWAI,HIDEHIRO wrote: > > From: Michal Hocko [mailto:mho...@kernel.org] [...] > > Could you point me to the code which does that, please? Maybe we are > > missing that in our 3.0 kernel. I was quite surprised to see this > > behavior as well. > > Please see the snippet below. > > void setup_local_APIC(void) > { > ... > /* > * only the BP should see the LINT1 NMI signal, obviously. > */ > if (!cpu) > value = APIC_DM_NMI; > else > value = APIC_DM_NMI | APIC_LVT_MASKED; > if (!lapic_is_integrated()) /* 82489DX */ > value |= APIC_LVT_LEVEL_TRIGGER; > apic_write(APIC_LVT1, value); > > > LINT1 pins of cpus other than CPU 0 are masked here. > However, at least on some of Hitachi servers, NMI caused by NMI > button doesn't seem to be delivered through LINT1. So, my `external NMI' > word may not be correct. I am not familiar with details here but I can tell you that this particular code snippet is the same in our 3.0 based kernel so it seems that the HW is indeed doing something differently. > > You might still get a panic on hardlockup which will happen on all CPUs > > from the NMI context so we have to be able to handle panic in NMI on > > many CPUs. > > Do you say about the case of a kerne panic while other cpus locks up > in NMI context? In that case, there is no way to do things needed by > kdump procedure including saving registeres... I am saying that watchdog_overflow_callback might trigger on more CPUs and panic from NMI context as well. So this is not reduced to the NMI button sends NMI to more CPUs. Why cannot the panic() context save all the registers if we are going to loop in NMI context? This would be imho preferable to returning from panic IMO. [...] > > I can provide the full log but it is quite mangled. I guess the > > CPU130 was the only one allowed to proceed with the panic while others > > returned from the unknown NMI handling. It took a lot of time until > > CPU130 managed to boot the crash kernel with soft lockups and RCU stalls > > reports. CPU0 is most probably locked up waiting for CPU130 to > > acknowledge the IPI which will not happen apparently. > > There is a timeout of 1000ms in nmi_shootdown_cpus(), so I don't know > why CPU 130 waits so long. I'll try to consider for a while. Yes, I do not understand the timing here either and the fact that the log is a complete mess in the important parts doesn't help a wee bit. [...] -- Michal Hocko SUSE Labs -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI
> From: Michal Hocko [mailto:mho...@kernel.org] > > On Thu 30-07-15 01:45:35, 河合英宏 / KAWAI,HIDEHIRO wrote: > > Hi, > > > > > From: Michal Hocko [mailto:mho...@kernel.org] > > > > > > On Wed 29-07-15 09:09:18, 河合英宏 / KAWAI,HIDEHIRO wrote: > [...] > > > > #define nmi_panic(fmt, ...)\ > > > >do {\ > > > >if (atomic_cmpxchg(&panic_cpu, -1, > > > > raw_smp_processor_id()) \ > > > >== -1) \ > > > >panic(fmt, ##__VA_ARGS__); \ > > > >} while (0) > > > > > > This would allow to return from NMI too eagerly. > > > > Yes, but what's the problem? > > I believe that panic should be noreturn as much as possible and return > only when we do not have any other options. In my case, external NMI is delivered to only CPU 0. This means other CPUs continues to run until receiving NMI IPI. I think returning from NMI handler soon doesn't differ from this so much. > Moreover I would ask an > opposite question, what is the problem to loop in NMI on other CPUs than > the one which is performing crash_kexec? We will not save registers, so > what? Actually, we have to do at least save registers, clean up VMX/SVM states and announce that the cpu has stopped. Just returning from NMI handler is simpler. > > The root cause of your case hasn't been clarified yet. > > I can't fix for an unclear issue because I don't know what's the right > > solution. > > > > > When I was testing my > > > previous approach (on 3.0 based kernel) I had basically the same thing > > > (one NMI to process panic) and others to return. This led to a strange > > > behavior when the NMI button triggered NMI on all (hundreds) CPUs. > > > > It's strange. Usually, NMI caused by NMI button is routed to only CPU 0 > > as an external NMI. External NMI for CPUs other than CPU 0 are masked > > at boot time. Does it really happen? > > Could you point me to the code which does that, please? Maybe we are > missing that in our 3.0 kernel. I was quite surprised to see this > behavior as well. Please see the snippet below. void setup_local_APIC(void) { ... /* * only the BP should see the LINT1 NMI signal, obviously. */ if (!cpu) value = APIC_DM_NMI; else value = APIC_DM_NMI | APIC_LVT_MASKED; if (!lapic_is_integrated()) /* 82489DX */ value |= APIC_LVT_LEVEL_TRIGGER; apic_write(APIC_LVT1, value); LINT1 pins of cpus other than CPU 0 are masked here. However, at least on some of Hitachi servers, NMI caused by NMI button doesn't seem to be delivered through LINT1. So, my `external NMI' word may not be correct. > > Does the problem still happen on the latest kernel? > > I do not have machine accessible so I have to rely on the customer to > test and the current vanilla might be an issue. Sure. > > What kind of NMI is deliverd to each CPU? > > See the log below. > > > Traditionally, we should have assumed that NMI for crash dumping is > > delivered to only one cpu. Otherwise, we should often fail to take > > a proper crash dump. > > You might still get a panic on hardlockup which will happen on all CPUs > from the NMI context so we have to be able to handle panic in NMI on > many CPUs. Do you say about the case of a kerne panic while other cpus locks up in NMI context? In that case, there is no way to do things needed by kdump procedure including saving registeres... > > It seems that your case is another problem to be solved separately. > > I do not think so, quite contrary. If you want to solve the reentrancy > then other CPUs might be spinning in NMI if there is a guarantee that at > least one CPU can progress to finish crash_kexec(). > > > > The > > > crash kernel booted eventually but the log contained lockups when a > > > CPU waited for an IPI to the CPU which was handling the NMI panic. > > > > Could you explain more precisely? > > [ 167.843761] Uhhuh. NMI received for unknown reason 3d on CPU 130. > [ 167.843763] Do you have a strange power saving mode enabled? > [... Mangled output ] > [ 167.856415] Dazed and confused, but trying to continue > [ 167.856428] Dazed and confused, but trying to continue > [ 167.856442] Dazed and confused, but trying to continue > [...] > [ 193.108440] BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:0:4] > [...] > [ 193.108586] Call Trace: > [ 193.108595] [] smp_call_function_single+0x15b/0x170 > [ 193.108600] [] smp_call_function_any+0x4e/0x110 > [ 193.108607] [] get_cur_val+0xbc/0x130 [acpi_cpufreq] > [ 193.108630] [] get_cur_freq_on_cpu+0x77/0xf0 > [acpi_cpufreq] > [ 193.108638] [] cpufreq_update_policy+0x97/0x140 > [ 193.108646] [] acpi_processor_notify+0x4b/0x145 > [processor] > [ 193.108654] [] acpi_ev_notify_dispatch+0x61
RE: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI
Hi, > -Original Message- > From: Michal Hocko [mailto:mho...@kernel.org] > > On Thu 30-07-15 07:33:15, 河合英宏 / KAWAI,HIDEHIRO wrote: > [...] > > Are you using SGI UV? On that platform, NMIs may be delivered to > > all cpus because LVT1 of all cpus are not masked as follows: > > This is Compute Blade 520XB1 from Hitachi with 240 cpus. Thanks for the information! I asked my colleague in other department about NMI button behavior of our server just before receive your mail, and certainly what you said happens; all cpus say "Uhhuh. NMI received for unknown reason 3d on CPU N" when NMI button is pusshed. So, this was also our problem... N�r��yb�X��ǧv�^�){.n�+{zX����ܨ}���Ơz�&j:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a��� 0��h���i
Re: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI
On Thu 30-07-15 07:33:15, 河合英宏 / KAWAI,HIDEHIRO wrote: [...] > Are you using SGI UV? On that platform, NMIs may be delivered to > all cpus because LVT1 of all cpus are not masked as follows: This is Compute Blade 520XB1 from Hitachi with 240 cpus. -- Michal Hocko SUSE Labs -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI
On Thu 30-07-15 01:45:35, 河合英宏 / KAWAI,HIDEHIRO wrote: > Hi, > > > From: Michal Hocko [mailto:mho...@kernel.org] > > > > On Wed 29-07-15 09:09:18, 河合英宏 / KAWAI,HIDEHIRO wrote: [...] > > > #define nmi_panic(fmt, ...)\ > > >do {\ > > >if (atomic_cmpxchg(&panic_cpu, -1, raw_smp_processor_id()) > > > \ > > >== -1) \ > > >panic(fmt, ##__VA_ARGS__); \ > > >} while (0) > > > > This would allow to return from NMI too eagerly. > > Yes, but what's the problem? I believe that panic should be noreturn as much as possible and return only when we do not have any other options. Moreover I would ask an opposite question, what is the problem to loop in NMI on other CPUs than the one which is performing crash_kexec? We will not save registers, so what? > The root cause of your case hasn't been clarified yet. > I can't fix for an unclear issue because I don't know what's the right > solution. > > > When I was testing my > > previous approach (on 3.0 based kernel) I had basically the same thing > > (one NMI to process panic) and others to return. This led to a strange > > behavior when the NMI button triggered NMI on all (hundreds) CPUs. > > It's strange. Usually, NMI caused by NMI button is routed to only CPU 0 > as an external NMI. External NMI for CPUs other than CPU 0 are masked > at boot time. Does it really happen? Could you point me to the code which does that, please? Maybe we are missing that in our 3.0 kernel. I was quite surprised to see this behavior as well. > Does the problem still happen on the latest kernel? I do not have machine accessible so I have to rely on the customer to test and the current vanilla might be an issue. > What kind of NMI is deliverd to each CPU? See the log below. > Traditionally, we should have assumed that NMI for crash dumping is > delivered to only one cpu. Otherwise, we should often fail to take > a proper crash dump. You might still get a panic on hardlockup which will happen on all CPUs from the NMI context so we have to be able to handle panic in NMI on many CPUs. > It seems that your case is another problem to be solved separately. I do not think so, quite contrary. If you want to solve the reentrancy then other CPUs might be spinning in NMI if there is a guarantee that at least one CPU can progress to finish crash_kexec(). > > The > > crash kernel booted eventually but the log contained lockups when a > > CPU waited for an IPI to the CPU which was handling the NMI panic. > > Could you explain more precisely? [ 167.843761] Uhhuh. NMI received for unknown reason 3d on CPU 130. [ 167.843763] Do you have a strange power saving mode enabled? [... Mangled output ] [ 167.856415] Dazed and confused, but trying to continue [ 167.856428] Dazed and confused, but trying to continue [ 167.856442] Dazed and confused, but trying to continue [...] [ 193.108440] BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:0:4] [...] [ 193.108586] Call Trace: [ 193.108595] [] smp_call_function_single+0x15b/0x170 [ 193.108600] [] smp_call_function_any+0x4e/0x110 [ 193.108607] [] get_cur_val+0xbc/0x130 [acpi_cpufreq] [ 193.108630] [] get_cur_freq_on_cpu+0x77/0xf0 [acpi_cpufreq] [ 193.108638] [] cpufreq_update_policy+0x97/0x140 [ 193.108646] [] acpi_processor_notify+0x4b/0x145 [processor] [ 193.108654] [] acpi_ev_notify_dispatch+0x61/0x77 [ 193.108659] [] acpi_os_execute_deferred+0x21/0x2c [ 193.108667] [] process_one_work+0x16c/0x350 [ 193.108673] [] worker_thread+0x17a/0x410 [ 193.108679] [] kthread+0x96/0xa0 [ 193.108688] [] kernel_thread_helper+0x4/0x10 [...] [ 221.068390] BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:0:4] [...] [ 227.991235] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 130} (detected by 56, t=15002 jiffies) [ 227.991247] sending NMI to all CPUs: [ 227.991251] NMI backtrace for cpu 0 [ 229.074091] INFO: rcu_bh_state detected stalls on CPUs/tasks: { 130} (detected by 105, t=15013 jiffies) [0.00] Initializing cgroup subsys cpuset [0.00] Initializing cgroup subsys cpu [0.00] Linux version 3.0.101-0.47.55.9.8853.0.TEST-default (geeko@buildhost) (gcc version 4.3.4 [gcc-4_3-branch revision 152973] (SUSE Linux) ) #1 SMP Thu May 28 08:25:11 UTC 2015 (dc083ee) [0.00] Command line: root=/dev/system/lvroot resume=/dev/system/lvswap intel_idle.max_cstate=0 processor.max_cstate=0 elevator=deadline nmi_watchdog=1 console=tty0 console=ttyS1,115200 elevator=deadline sysrq=yes reset_devices irqpoll maxcpus=1 disable_cpu_apicid=0 noefi acpi_rsdp=0xba7a4014 crashkernel=1024M-:512M memmap=exactmap memmap=576K@64K memmap=523684K@393216K elfcorehdr=916900K memmap=32768K#3018748K memmap=3736K#3051516K memmap=262144K$3145728K I can provide the ful
RE: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI
Hi Michal, > From: 河合英宏 / KAWAI,HIDEHIRO [mailto:hidehiro.kawai...@hitachi.com] > > When I was testing my > > previous approach (on 3.0 based kernel) I had basically the same thing > > (one NMI to process panic) and others to return. This led to a strange > > behavior when the NMI button triggered NMI on all (hundreds) CPUs. > > It's strange. Usually, NMI caused by NMI button is routed to only CPU 0 > as an external NMI. External NMI for CPUs other than CPU 0 are masked > at boot time. Does it really happen? Does the problem still happen on > the latest kernel? What kind of NMI is deliverd to each CPU? Are you using SGI UV? On that platform, NMIs may be delivered to all cpus because LVT1 of all cpus are not masked as follows: void uv_nmi_init(void) { unsigned int value; /* * Unmask NMI on all cpus */ value = apic_read(APIC_LVT1) | APIC_DM_NMI; value &= ~APIC_LVT_MASKED; apic_write(APIC_LVT1, value); } > > Traditionally, we should have assumed that NMI for crash dumping is > delivered to only one cpu. Otherwise, we should often fail to take > a proper crash dump. It seems that your case is another problem to be > solved separately. > > > The > > crash kernel booted eventually but the log contained lockups when a > > CPU waited for an IPI to the CPU which was handling the NMI panic. > > Could you explain more precisely? > > > Anyway, I do not thing this is really necessary to solve the panic > > reentrancy issue. > > If the missing saved state is a real problem then it > > should be handled separately - maybe it can be achieved without an IPI > > and directly from the panic context if we are in NMI. > > What I would like to do via this patchse is to solve race issues > among NMI, panic() and crash_kexec(). So, I don't think we should fix > that separately, although I would need to reword some descriptions > and titles. > > Anyway, I'm going to sent out my revised version once in order to > tidy up. I also would like to hear kexec/kdump guys' opinions. > > Regards, > Kawai N�r��yb�X��ǧv�^�){.n�+{zX����ܨ}���Ơz�&j:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a��� 0��h���i
RE: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI
Hi, > From: Michal Hocko [mailto:mho...@kernel.org] > > On Wed 29-07-15 09:09:18, 河合英宏 / KAWAI,HIDEHIRO wrote: > > > From: Michal Hocko [mailto:mho...@kernel.org] > > > On Wed 29-07-15 05:48:47, 河合英宏 / KAWAI,HIDEHIRO wrote: > > > > Hi, > > > > > > > > > From: linux-kernel-ow...@vger.kernel.org > > > > > [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Hidehiro > > > > > Kawai > > > > > (2015/07/27 23:34), Michal Hocko wrote: > > > > > > On Mon 27-07-15 10:58:50, Hidehiro Kawai wrote: > > > > [...] > > > > > > The check could be also relaxed a bit and nmi_panic would > > > > > > return only if the ongoing panic is the current cpu when we really > > > > > > have > > > > > > to return and allow the preempted panic to finish. > > > > > > > > > > It's reasonable. I'll do that in the next version. > > > > > > > > I noticed atomic_read() is insufficient. Please consider the following > > > > scenario. > > > > > > > > CPU 1: call panic() in the normal context > > > > CPU 0: call nmi_panic(), check the value of panic_cpu, then call panic() > > > > CPU 1: set 1 to panic_cpu > > > > CPU 0: fail to set 0 to panic_cpu, then do an infinite loop > > > > CPU 1: call crash_kexec(), then call kdump_nmi_shootdown_cpus() > > > > > > > > At this point, since CPU 0 loops in NMI context, it never executes > > > > the NMI handler registered by kdump_nmi_shootdown_cpus(). This means > > > > that no register states are saved and no cleanups for VMX/SVM are > > > > performed. > > > > > > Yes this is true but it is no different from the current state, isn't > > > it? So if you want to handle that then it deserves a separate patch. > > > It is certainly not harmful wrt. panic behavior. > > > > > > > So, we should still use atomic_cmpxchg() in nmi_panic() to > > > > prevent other cpus from running panic routines. > > > > > > Not sure what you mean by that. > > > > I mean that we should use the same logic as my V2 patch like this: > > > > #define nmi_panic(fmt, ...)\ > >do {\ > >if (atomic_cmpxchg(&panic_cpu, -1, raw_smp_processor_id()) \ > >== -1) \ > >panic(fmt, ##__VA_ARGS__); \ > >} while (0) > > This would allow to return from NMI too eagerly. Yes, but what's the problem? The root cause of your case hasn't been clarified yet. I can't fix for an unclear issue because I don't know what's the right solution. > When I was testing my > previous approach (on 3.0 based kernel) I had basically the same thing > (one NMI to process panic) and others to return. This led to a strange > behavior when the NMI button triggered NMI on all (hundreds) CPUs. It's strange. Usually, NMI caused by NMI button is routed to only CPU 0 as an external NMI. External NMI for CPUs other than CPU 0 are masked at boot time. Does it really happen? Does the problem still happen on the latest kernel? What kind of NMI is deliverd to each CPU? Traditionally, we should have assumed that NMI for crash dumping is delivered to only one cpu. Otherwise, we should often fail to take a proper crash dump. It seems that your case is another problem to be solved separately. > The > crash kernel booted eventually but the log contained lockups when a > CPU waited for an IPI to the CPU which was handling the NMI panic. Could you explain more precisely? > Anyway, I do not thing this is really necessary to solve the panic > reentrancy issue. > If the missing saved state is a real problem then it > should be handled separately - maybe it can be achieved without an IPI > and directly from the panic context if we are in NMI. What I would like to do via this patchse is to solve race issues among NMI, panic() and crash_kexec(). So, I don't think we should fix that separately, although I would need to reword some descriptions and titles. Anyway, I'm going to sent out my revised version once in order to tidy up. I also would like to hear kexec/kdump guys' opinions. Regards, Kawai
Re: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI
On Wed 29-07-15 09:09:18, 河合英宏 / KAWAI,HIDEHIRO wrote: > > From: Michal Hocko [mailto:mho...@kernel.org] > > On Wed 29-07-15 05:48:47, 河合英宏 / KAWAI,HIDEHIRO wrote: > > > Hi, > > > > > > > From: linux-kernel-ow...@vger.kernel.org > > > > [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Hidehiro Kawai > > > > (2015/07/27 23:34), Michal Hocko wrote: > > > > > On Mon 27-07-15 10:58:50, Hidehiro Kawai wrote: > > > [...] > > > > > The check could be also relaxed a bit and nmi_panic would > > > > > return only if the ongoing panic is the current cpu when we really > > > > > have > > > > > to return and allow the preempted panic to finish. > > > > > > > > It's reasonable. I'll do that in the next version. > > > > > > I noticed atomic_read() is insufficient. Please consider the following > > > scenario. > > > > > > CPU 1: call panic() in the normal context > > > CPU 0: call nmi_panic(), check the value of panic_cpu, then call panic() > > > CPU 1: set 1 to panic_cpu > > > CPU 0: fail to set 0 to panic_cpu, then do an infinite loop > > > CPU 1: call crash_kexec(), then call kdump_nmi_shootdown_cpus() > > > > > > At this point, since CPU 0 loops in NMI context, it never executes > > > the NMI handler registered by kdump_nmi_shootdown_cpus(). This means > > > that no register states are saved and no cleanups for VMX/SVM are > > > performed. > > > > Yes this is true but it is no different from the current state, isn't > > it? So if you want to handle that then it deserves a separate patch. > > It is certainly not harmful wrt. panic behavior. > > > > > So, we should still use atomic_cmpxchg() in nmi_panic() to > > > prevent other cpus from running panic routines. > > > > Not sure what you mean by that. > > I mean that we should use the same logic as my V2 patch like this: > > #define nmi_panic(fmt, ...)\ >do {\ >if (atomic_cmpxchg(&panic_cpu, -1, raw_smp_processor_id()) \ >== -1) \ >panic(fmt, ##__VA_ARGS__); \ >} while (0) This would allow to return from NMI too eagerly. When I was testing my previous approach (on 3.0 based kernel) I had basically the same thing (one NMI to process panic) and others to return. This led to a strange behavior when the NMI button triggered NMI on all (hundreds) CPUs. The crash kernel booted eventually but the log contained lockups when a CPU waited for an IPI to the CPU which was handling the NMI panic. Anyway, I do not thing this is really necessary to solve the panic reentrancy issue. If the missing saved state is a real problem then it should be handled separately - maybe it can be achieved without an IPI and directly from the panic context if we are in NMI. -- Michal Hocko SUSE Labs -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI
> From: Michal Hocko [mailto:mho...@kernel.org] > On Wed 29-07-15 05:48:47, 河合英宏 / KAWAI,HIDEHIRO wrote: > > Hi, > > > > > From: linux-kernel-ow...@vger.kernel.org > > > [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Hidehiro Kawai > > > (2015/07/27 23:34), Michal Hocko wrote: > > > > On Mon 27-07-15 10:58:50, Hidehiro Kawai wrote: > > [...] > > > > The check could be also relaxed a bit and nmi_panic would > > > > return only if the ongoing panic is the current cpu when we really have > > > > to return and allow the preempted panic to finish. > > > > > > It's reasonable. I'll do that in the next version. > > > > I noticed atomic_read() is insufficient. Please consider the following > > scenario. > > > > CPU 1: call panic() in the normal context > > CPU 0: call nmi_panic(), check the value of panic_cpu, then call panic() > > CPU 1: set 1 to panic_cpu > > CPU 0: fail to set 0 to panic_cpu, then do an infinite loop > > CPU 1: call crash_kexec(), then call kdump_nmi_shootdown_cpus() > > > > At this point, since CPU 0 loops in NMI context, it never executes > > the NMI handler registered by kdump_nmi_shootdown_cpus(). This means > > that no register states are saved and no cleanups for VMX/SVM are > > performed. > > Yes this is true but it is no different from the current state, isn't > it? So if you want to handle that then it deserves a separate patch. > It is certainly not harmful wrt. panic behavior. > > > So, we should still use atomic_cmpxchg() in nmi_panic() to > > prevent other cpus from running panic routines. > > Not sure what you mean by that. I mean that we should use the same logic as my V2 patch like this: #define nmi_panic(fmt, ...)\ do {\ if (atomic_cmpxchg(&panic_cpu, -1, raw_smp_processor_id()) \ == -1) \ panic(fmt, ##__VA_ARGS__); \ } while (0) By using atomic_cmpxchg here, we can ensure that only this cpu runs panic routines. It is important to prevent a NMI-context cpu from calling panic_smp_self_stop(). void panic(const char *fmt, ...) { ... * `old_cpu == -1' means we are the first comer. * `old_cpu == this_cpu' means we came here due to panic on NMI. */ this_cpu = raw_smp_processor_id(); old_cpu = atomic_cmpxchg(&panic_cpu, -1, this_cpu); if (old_cpu != -1 && old_cpu != this_cpu) panic_smp_self_stop(); Please assume that CPU 0 calls nmi_panic() in NMI context and CPU 1 calls panic() in normal context at tha same time. If CPU 1 set panic_cpu before CPU 0 does, CPU 1 runs panic routines and CPU 0 return from the nmi handler. Eventually CPU 0 is stopped by nmi_shootdown_cpus(). If CPU 0 set panic_cpu before CPU 1 does, CPU 0 runs panic routines. CPU 1 calls panic_smp_self_stop(), and wait for NMI by nmi_shootdown_cpus(). Anyway, I tested my approach and it worked fine. Regards, Kawai
Re: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI
On Wed 29-07-15 05:48:47, 河合英宏 / KAWAI,HIDEHIRO wrote: > Hi, > > > From: linux-kernel-ow...@vger.kernel.org > > [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Hidehiro Kawai > > (2015/07/27 23:34), Michal Hocko wrote: > > > On Mon 27-07-15 10:58:50, Hidehiro Kawai wrote: > [...] > > > The check could be also relaxed a bit and nmi_panic would > > > return only if the ongoing panic is the current cpu when we really have > > > to return and allow the preempted panic to finish. > > > > It's reasonable. I'll do that in the next version. > > I noticed atomic_read() is insufficient. Please consider the following > scenario. > > CPU 1: call panic() in the normal context > CPU 0: call nmi_panic(), check the value of panic_cpu, then call panic() > CPU 1: set 1 to panic_cpu > CPU 0: fail to set 0 to panic_cpu, then do an infinite loop > CPU 1: call crash_kexec(), then call kdump_nmi_shootdown_cpus() > > At this point, since CPU 0 loops in NMI context, it never executes > the NMI handler registered by kdump_nmi_shootdown_cpus(). This means > that no register states are saved and no cleanups for VMX/SVM are > performed. Yes this is true but it is no different from the current state, isn't it? So if you want to handle that then it deserves a separate patch. It is certainly not harmful wrt. panic behavior. > So, we should still use atomic_cmpxchg() in nmi_panic() to > prevent other cpus from running panic routines. Not sure what you mean by that. -- Michal Hocko SUSE Labs -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI
Hi, > From: linux-kernel-ow...@vger.kernel.org > [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Hidehiro Kawai > (2015/07/27 23:34), Michal Hocko wrote: > > On Mon 27-07-15 10:58:50, Hidehiro Kawai wrote: [...] > > The check could be also relaxed a bit and nmi_panic would > > return only if the ongoing panic is the current cpu when we really have > > to return and allow the preempted panic to finish. > > It's reasonable. I'll do that in the next version. I noticed atomic_read() is insufficient. Please consider the following scenario. CPU 1: call panic() in the normal context CPU 0: call nmi_panic(), check the value of panic_cpu, then call panic() CPU 1: set 1 to panic_cpu CPU 0: fail to set 0 to panic_cpu, then do an infinite loop CPU 1: call crash_kexec(), then call kdump_nmi_shootdown_cpus() At this point, since CPU 0 loops in NMI context, it never executes the NMI handler registered by kdump_nmi_shootdown_cpus(). This means that no register states are saved and no cleanups for VMX/SVM are performed. So, we should still use atomic_cmpxchg() in nmi_panic() to prevent other cpus from running panic routines. > > +void nmi_panic(const char *fmt, ...) > > +{ > > + /* > > +* We have to back off if the NMI has preempted an ongoing panic and > > +* allow it to finish > > +*/ > > + if (atomic_read(&panic_cpu) == raw_smp_processor_id()) > > + return; > > + > > + panic(); > > +} > > +EXPORT_SYMBOL(nmi_panic); -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/