Re: [PATCH v3 3/3] powernv/kdump: Fix cases where the kdump kernel can get HMI's
On Fri, 15 Dec 2017 14:34:03 +1100 Balbir Singh wrote: > On Fri, Dec 15, 2017 at 2:10 PM, Nicholas Piggin wrote: > > On Fri, 15 Dec 2017 12:27:40 +1100 > > Balbir Singh wrote: > > But then you still need an explicit kdump check for non-sreset wakeups > > because the platform may not implement sreset, or it may fall back to > > normal IPI if the sreset scom fails. So you then still need your > > > > if (kdump) crash_ipi_callback > > Yep, its required for the the non NMI case. > > > How does this look -- based on your comment > > + } else if ((srr1 & wmask) == SRR1_WAKESRESET) { > + irq_set_pending_from_srr1(srr1); > + /* Does not return */ > } > + > smp_mb(); > > /* > * For kdump kernels, we process the ipi and jump to > -* handling the system reset exception. > +* crash_ipi_callback > */ > - if (kdump_in_progress()) > - irq_set_pending_from_srr1(srr1); > + if (kdump_in_progress()) { > + /* > +* If we got to this point, we've not used > +* NMI's, otherwise we would have gone > +* via the SRR1_WAKESRESET path. We are > +* using regular IPI's for waking up offline > +* threads. > +*/ > + struct pt_regs regs; > + > + ppc_save_regs(®s); > + crash_ipi_callback(regs); > + /* Does not return */ > + } That looks like it should do the trick. Thanks, Nick
Re: [PATCH v3 3/3] powernv/kdump: Fix cases where the kdump kernel can get HMI's
On Fri, Dec 15, 2017 at 2:10 PM, Nicholas Piggin wrote: > On Fri, 15 Dec 2017 12:27:40 +1100 > Balbir Singh wrote: > >> Certain HMI's such as malfunction error propagate through >> all threads/core on the system. If a thread was offline >> prior to us crashing the system and jumping to the kdump >> kernel, bad things happen when it wakes up due to an HMI >> in the kdump kernel. >> >> There are several possible ways to solve this problem >> >> 1. Put the offline cores in a state such that they are >> not woken up for machine check and HMI errors. This >> does not work, since we might need to wake up offline >> threads to handle TB errors >> 2. Ignore HMI errors, setup HMEER to mask HMI errors, >> but this still leads the window open for any MCEs >> and masking them for the duration of the dump might >> be a concern >> 3. Wake up offline CPUs, as in send them to >> crash_ipi_callback (not wake them up as in mark them >> online as seen by the hotplug). kexec does a >> wake_online_cpus() call, this patch does something >> similar, but instead sends an IPI and forces them to >> crash_ipi_callback() >> >> This patch takes approach #3. >> >> Care is taken to enable this only for powenv platforms >> via crash_wake_offline (a global value set at setup >> time). The crash code sends out IPI's to all CPU's >> which then move to crash_ipi_callback and kexec_smp_wait(). >> >> Signed-off-by: Balbir Singh >> --- >> >> Changelog v3 >> - Use SRR1's reason to wake up to drive replay_system_reset() >>as a means of getting to kdump() as opposed to calling >>crash_ipi_callback based on comments from Nick Piggin. >> >> arch/powerpc/include/asm/kexec.h | 2 ++ >> arch/powerpc/kernel/crash.c | 13 - >> arch/powerpc/kernel/smp.c| 18 ++ >> arch/powerpc/platforms/powernv/smp.c | 12 >> 4 files changed, 44 insertions(+), 1 deletion(-) >> >> diff --git a/arch/powerpc/include/asm/kexec.h >> b/arch/powerpc/include/asm/kexec.h >> index 4419d435639a..9dcbfa6bbb91 100644 >> --- a/arch/powerpc/include/asm/kexec.h >> +++ b/arch/powerpc/include/asm/kexec.h >> @@ -73,6 +73,8 @@ extern void kexec_smp_wait(void); /* get and clear naca >> physid, wait for >> master to copy new code to 0 */ >> extern int crashing_cpu; >> extern void crash_send_ipi(void (*crash_ipi_callback)(struct pt_regs *)); >> +extern void crash_ipi_callback(struct pt_regs *); >> +extern int crash_wake_offline; >> >> struct kimage; >> struct pt_regs; >> diff --git a/arch/powerpc/kernel/crash.c b/arch/powerpc/kernel/crash.c >> index 29c56ca2ddfd..00b215125d3e 100644 >> --- a/arch/powerpc/kernel/crash.c >> +++ b/arch/powerpc/kernel/crash.c >> @@ -44,6 +44,14 @@ >> #define REAL_MODE_TIMEOUT1 >> >> static int time_to_dump; >> +/* >> + * crash_wake_offline should be set to 1 by platforms that intend to wake >> + * up offline cpus prior to jumping to a kdump kernel. Currently powernv >> + * sets it to 1, since we want to avoid things from happening when an >> + * offline CPU wakes up due to something like an HMI (malfunction error), >> + * which propagates to all threads. >> + */ >> +int crash_wake_offline; >> >> #define CRASH_HANDLER_MAX 3 >> /* List of shutdown handles */ >> @@ -63,7 +71,7 @@ static int handle_fault(struct pt_regs *regs) >> #ifdef CONFIG_SMP >> >> static atomic_t cpus_in_crash; >> -static void crash_ipi_callback(struct pt_regs *regs) >> +void crash_ipi_callback(struct pt_regs *regs) >> { >> static cpumask_t cpus_state_saved = CPU_MASK_NONE; >> >> @@ -106,6 +114,9 @@ static void crash_kexec_prepare_cpus(int cpu) >> >> printk(KERN_EMERG "Sending IPI to other CPUs\n"); >> >> + if (crash_wake_offline) >> + ncpus = num_present_cpus() - 1; >> + >> crash_send_ipi(crash_ipi_callback); >> smp_wmb(); >> >> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c >> index e0a4c1f82e25..bbe7634b3a43 100644 >> --- a/arch/powerpc/kernel/smp.c >> +++ b/arch/powerpc/kernel/smp.c >> @@ -543,7 +543,25 @@ void smp_send_debugger_break(void) >> #ifdef CONFIG_KEXEC_CORE >> void crash_send_ipi(void (*crash_ipi_callback)(struct pt_regs *)) >> { >> + int cpu; >> + >> smp_send_nmi_ipi(NMI_IPI_ALL_OTHERS, crash_ipi_callback, 100); >> + if (kdump_in_progress() && crash_wake_offline) { >> + for_each_present_cpu(cpu) { >> + if (cpu_online(cpu)) >> + continue; >> + /* >> + * crash_ipi_callback will wait for >> + * all cpus, including offline CPUs. >> + * We don't care about nmi_ipi_function. >> + * Offline cpus will jump straight into >> + * crash_ipi_callback, we can skip the >> + * entire NMI dance and waiting for >> + * cpus to clear pending mask, etc. >> +
Re: [PATCH v3 3/3] powernv/kdump: Fix cases where the kdump kernel can get HMI's
On Fri, 15 Dec 2017 12:27:40 +1100 Balbir Singh wrote: > Certain HMI's such as malfunction error propagate through > all threads/core on the system. If a thread was offline > prior to us crashing the system and jumping to the kdump > kernel, bad things happen when it wakes up due to an HMI > in the kdump kernel. > > There are several possible ways to solve this problem > > 1. Put the offline cores in a state such that they are > not woken up for machine check and HMI errors. This > does not work, since we might need to wake up offline > threads to handle TB errors > 2. Ignore HMI errors, setup HMEER to mask HMI errors, > but this still leads the window open for any MCEs > and masking them for the duration of the dump might > be a concern > 3. Wake up offline CPUs, as in send them to > crash_ipi_callback (not wake them up as in mark them > online as seen by the hotplug). kexec does a > wake_online_cpus() call, this patch does something > similar, but instead sends an IPI and forces them to > crash_ipi_callback() > > This patch takes approach #3. > > Care is taken to enable this only for powenv platforms > via crash_wake_offline (a global value set at setup > time). The crash code sends out IPI's to all CPU's > which then move to crash_ipi_callback and kexec_smp_wait(). > > Signed-off-by: Balbir Singh > --- > > Changelog v3 > - Use SRR1's reason to wake up to drive replay_system_reset() >as a means of getting to kdump() as opposed to calling >crash_ipi_callback based on comments from Nick Piggin. > > arch/powerpc/include/asm/kexec.h | 2 ++ > arch/powerpc/kernel/crash.c | 13 - > arch/powerpc/kernel/smp.c| 18 ++ > arch/powerpc/platforms/powernv/smp.c | 12 > 4 files changed, 44 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/include/asm/kexec.h > b/arch/powerpc/include/asm/kexec.h > index 4419d435639a..9dcbfa6bbb91 100644 > --- a/arch/powerpc/include/asm/kexec.h > +++ b/arch/powerpc/include/asm/kexec.h > @@ -73,6 +73,8 @@ extern void kexec_smp_wait(void); /* get and clear naca > physid, wait for > master to copy new code to 0 */ > extern int crashing_cpu; > extern void crash_send_ipi(void (*crash_ipi_callback)(struct pt_regs *)); > +extern void crash_ipi_callback(struct pt_regs *); > +extern int crash_wake_offline; > > struct kimage; > struct pt_regs; > diff --git a/arch/powerpc/kernel/crash.c b/arch/powerpc/kernel/crash.c > index 29c56ca2ddfd..00b215125d3e 100644 > --- a/arch/powerpc/kernel/crash.c > +++ b/arch/powerpc/kernel/crash.c > @@ -44,6 +44,14 @@ > #define REAL_MODE_TIMEOUT1 > > static int time_to_dump; > +/* > + * crash_wake_offline should be set to 1 by platforms that intend to wake > + * up offline cpus prior to jumping to a kdump kernel. Currently powernv > + * sets it to 1, since we want to avoid things from happening when an > + * offline CPU wakes up due to something like an HMI (malfunction error), > + * which propagates to all threads. > + */ > +int crash_wake_offline; > > #define CRASH_HANDLER_MAX 3 > /* List of shutdown handles */ > @@ -63,7 +71,7 @@ static int handle_fault(struct pt_regs *regs) > #ifdef CONFIG_SMP > > static atomic_t cpus_in_crash; > -static void crash_ipi_callback(struct pt_regs *regs) > +void crash_ipi_callback(struct pt_regs *regs) > { > static cpumask_t cpus_state_saved = CPU_MASK_NONE; > > @@ -106,6 +114,9 @@ static void crash_kexec_prepare_cpus(int cpu) > > printk(KERN_EMERG "Sending IPI to other CPUs\n"); > > + if (crash_wake_offline) > + ncpus = num_present_cpus() - 1; > + > crash_send_ipi(crash_ipi_callback); > smp_wmb(); > > diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c > index e0a4c1f82e25..bbe7634b3a43 100644 > --- a/arch/powerpc/kernel/smp.c > +++ b/arch/powerpc/kernel/smp.c > @@ -543,7 +543,25 @@ void smp_send_debugger_break(void) > #ifdef CONFIG_KEXEC_CORE > void crash_send_ipi(void (*crash_ipi_callback)(struct pt_regs *)) > { > + int cpu; > + > smp_send_nmi_ipi(NMI_IPI_ALL_OTHERS, crash_ipi_callback, 100); > + if (kdump_in_progress() && crash_wake_offline) { > + for_each_present_cpu(cpu) { > + if (cpu_online(cpu)) > + continue; > + /* > + * crash_ipi_callback will wait for > + * all cpus, including offline CPUs. > + * We don't care about nmi_ipi_function. > + * Offline cpus will jump straight into > + * crash_ipi_callback, we can skip the > + * entire NMI dance and waiting for > + * cpus to clear pending mask, etc. > + */ > + do_smp_send_nmi_ipi(cpu); > + } > + } > } > #endif > > diff --git a/arch/powerpc/platforms/p
Re: [PATCH v3 2/3] powerpc/powernv: Identify scom driven system reset
On Fri, 15 Dec 2017 13:54:18 +1100 Balbir Singh wrote: > On Fri, Dec 15, 2017 at 1:44 PM, Nicholas Piggin wrote: > > On Fri, 15 Dec 2017 12:27:39 +1100 > > Balbir Singh wrote: > > > >> In irq_set_pending_from_srr1() we were missing 0x2 as system > >> reset identified from SRR1 caused by back to back system > >> resets or when interrupts are caused by SCOM when the thread > >> is not in power saving mode. > >> > >> This helps us get to NMI handling in both the case where NMI > >> is caused when in power-saving and not in power-saving mode. > >> The actual exploitation is expected when we are doing a kdump > >> and an offline CPU might not be in power-saving mode due to > >> an already spurious IPI or any other reason. > >> > >> Signed-off-by: Balbir Singh > > > > When not in power saving mode, we don't look at SRR1 at all, so > > we don't need this. You should never be getting it returned as the > > result of your idle instruction (except on DD1 which has a bug, > > but firmware doesn't implement the NMI IPI). > > > > I added this for the next patch. We call irq_set_pending_from_srr1 > while coming out of CPU idle, but if for any reason the IPI was spurious > and then we see an NMI during kdump, I did want to detect that > and callback into the NMI handler. You'll never see it. The SRR1 value you get is the idle wakeup code. So any scom-when-not-idle bit should never be set. Thanks, Nick
Re: [PATCH 2/2] powernv/kdump: Fix cases where the kdump kernel can get HMI's
On Fri, Dec 15, 2017 at 12:32 AM, Nicholas Piggin wrote: > On Thu, 14 Dec 2017 23:16:26 +1100 > Balbir Singh wrote: > >> On Thu, Dec 14, 2017 at 12:51 PM, Nicholas Piggin wrote: > > >> >> I can't call smp_send_nmi_ipi due to the nmi_ipi_busy_count and >> >> I'm worried about calling a stale nmi_ipi_function via the >> >> system_reset_exception path, if we are OK with it, I can revisit >> >> the code path >> > >> > You shouldn't get a stale one, that would also be a bug -- we >> > have to cope with NMIs coming in at any time that are triggered >> > externally (not by smp_send_nmi_ipi), so if you see any bugs >> > there those need to be fixed separately. >> > >> >> Yes, I think it's a bug, nothing clears nmi_ipi_function (from what >> I can see), so when the next NMI comes in and goes into >> pnv_system_reset_exception >> it'll execute the stale handler. > > The CPU won't be in the nmi_ipi_pending_mask though, so it shouldn't > get that far. You could add a bit of paranoia to clear the function > pointer I suppose, but AFAIKS it's not needed. > Yep your right, but these things are so subtle :) I will as paranoia cleanup nmi_ipi_function, but I'll add that as a TODO Balbir Singh.
Re: [PATCH v3 2/3] powerpc/powernv: Identify scom driven system reset
On Fri, Dec 15, 2017 at 1:44 PM, Nicholas Piggin wrote: > On Fri, 15 Dec 2017 12:27:39 +1100 > Balbir Singh wrote: > >> In irq_set_pending_from_srr1() we were missing 0x2 as system >> reset identified from SRR1 caused by back to back system >> resets or when interrupts are caused by SCOM when the thread >> is not in power saving mode. >> >> This helps us get to NMI handling in both the case where NMI >> is caused when in power-saving and not in power-saving mode. >> The actual exploitation is expected when we are doing a kdump >> and an offline CPU might not be in power-saving mode due to >> an already spurious IPI or any other reason. >> >> Signed-off-by: Balbir Singh > > When not in power saving mode, we don't look at SRR1 at all, so > we don't need this. You should never be getting it returned as the > result of your idle instruction (except on DD1 which has a bug, > but firmware doesn't implement the NMI IPI). > I added this for the next patch. We call irq_set_pending_from_srr1 while coming out of CPU idle, but if for any reason the IPI was spurious and then we see an NMI during kdump, I did want to detect that and callback into the NMI handler. > It's possible we could pay more attention to the reason, for > example in the powernv system reset handle we might only call > the NMI IPI handler if it was a scom sreset... but that would > be a regs->msr test. regs->msr is the same as SRR1 in my kdump case, but your right in general we want to look at regs->msr Balbir Singh
Re: [PATCH v3 2/3] powerpc/powernv: Identify scom driven system reset
On Fri, 15 Dec 2017 12:27:39 +1100 Balbir Singh wrote: > In irq_set_pending_from_srr1() we were missing 0x2 as system > reset identified from SRR1 caused by back to back system > resets or when interrupts are caused by SCOM when the thread > is not in power saving mode. > > This helps us get to NMI handling in both the case where NMI > is caused when in power-saving and not in power-saving mode. > The actual exploitation is expected when we are doing a kdump > and an offline CPU might not be in power-saving mode due to > an already spurious IPI or any other reason. > > Signed-off-by: Balbir Singh When not in power saving mode, we don't look at SRR1 at all, so we don't need this. You should never be getting it returned as the result of your idle instruction (except on DD1 which has a bug, but firmware doesn't implement the NMI IPI). It's possible we could pay more attention to the reason, for example in the powernv system reset handle we might only call the NMI IPI handler if it was a scom sreset... but that would be a regs->msr test. > --- > arch/powerpc/kernel/irq.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c > index b7a84522e652..ec89104e8ab9 100644 > --- a/arch/powerpc/kernel/irq.c > +++ b/arch/powerpc/kernel/irq.c > @@ -413,7 +413,7 @@ bool prep_irq_for_idle_irqsoff(void) > #define IRQ_SYSTEM_RESET 0xff > > static const u8 srr1_to_lazyirq[0x10] = { > - 0, 0, 0, > + 0, 0, IRQ_SYSTEM_RESET, > PACA_IRQ_DBELL, > IRQ_SYSTEM_RESET, > PACA_IRQ_DBELL,
[PATCH v3 3/3] powernv/kdump: Fix cases where the kdump kernel can get HMI's
Certain HMI's such as malfunction error propagate through all threads/core on the system. If a thread was offline prior to us crashing the system and jumping to the kdump kernel, bad things happen when it wakes up due to an HMI in the kdump kernel. There are several possible ways to solve this problem 1. Put the offline cores in a state such that they are not woken up for machine check and HMI errors. This does not work, since we might need to wake up offline threads to handle TB errors 2. Ignore HMI errors, setup HMEER to mask HMI errors, but this still leads the window open for any MCEs and masking them for the duration of the dump might be a concern 3. Wake up offline CPUs, as in send them to crash_ipi_callback (not wake them up as in mark them online as seen by the hotplug). kexec does a wake_online_cpus() call, this patch does something similar, but instead sends an IPI and forces them to crash_ipi_callback() This patch takes approach #3. Care is taken to enable this only for powenv platforms via crash_wake_offline (a global value set at setup time). The crash code sends out IPI's to all CPU's which then move to crash_ipi_callback and kexec_smp_wait(). Signed-off-by: Balbir Singh --- Changelog v3 - Use SRR1's reason to wake up to drive replay_system_reset() as a means of getting to kdump() as opposed to calling crash_ipi_callback based on comments from Nick Piggin. arch/powerpc/include/asm/kexec.h | 2 ++ arch/powerpc/kernel/crash.c | 13 - arch/powerpc/kernel/smp.c| 18 ++ arch/powerpc/platforms/powernv/smp.c | 12 4 files changed, 44 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h index 4419d435639a..9dcbfa6bbb91 100644 --- a/arch/powerpc/include/asm/kexec.h +++ b/arch/powerpc/include/asm/kexec.h @@ -73,6 +73,8 @@ extern void kexec_smp_wait(void); /* get and clear naca physid, wait for master to copy new code to 0 */ extern int crashing_cpu; extern void crash_send_ipi(void (*crash_ipi_callback)(struct pt_regs *)); +extern void crash_ipi_callback(struct pt_regs *); +extern int crash_wake_offline; struct kimage; struct pt_regs; diff --git a/arch/powerpc/kernel/crash.c b/arch/powerpc/kernel/crash.c index 29c56ca2ddfd..00b215125d3e 100644 --- a/arch/powerpc/kernel/crash.c +++ b/arch/powerpc/kernel/crash.c @@ -44,6 +44,14 @@ #define REAL_MODE_TIMEOUT 1 static int time_to_dump; +/* + * crash_wake_offline should be set to 1 by platforms that intend to wake + * up offline cpus prior to jumping to a kdump kernel. Currently powernv + * sets it to 1, since we want to avoid things from happening when an + * offline CPU wakes up due to something like an HMI (malfunction error), + * which propagates to all threads. + */ +int crash_wake_offline; #define CRASH_HANDLER_MAX 3 /* List of shutdown handles */ @@ -63,7 +71,7 @@ static int handle_fault(struct pt_regs *regs) #ifdef CONFIG_SMP static atomic_t cpus_in_crash; -static void crash_ipi_callback(struct pt_regs *regs) +void crash_ipi_callback(struct pt_regs *regs) { static cpumask_t cpus_state_saved = CPU_MASK_NONE; @@ -106,6 +114,9 @@ static void crash_kexec_prepare_cpus(int cpu) printk(KERN_EMERG "Sending IPI to other CPUs\n"); + if (crash_wake_offline) + ncpus = num_present_cpus() - 1; + crash_send_ipi(crash_ipi_callback); smp_wmb(); diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c index e0a4c1f82e25..bbe7634b3a43 100644 --- a/arch/powerpc/kernel/smp.c +++ b/arch/powerpc/kernel/smp.c @@ -543,7 +543,25 @@ void smp_send_debugger_break(void) #ifdef CONFIG_KEXEC_CORE void crash_send_ipi(void (*crash_ipi_callback)(struct pt_regs *)) { + int cpu; + smp_send_nmi_ipi(NMI_IPI_ALL_OTHERS, crash_ipi_callback, 100); + if (kdump_in_progress() && crash_wake_offline) { + for_each_present_cpu(cpu) { + if (cpu_online(cpu)) + continue; + /* +* crash_ipi_callback will wait for +* all cpus, including offline CPUs. +* We don't care about nmi_ipi_function. +* Offline cpus will jump straight into +* crash_ipi_callback, we can skip the +* entire NMI dance and waiting for +* cpus to clear pending mask, etc. +*/ + do_smp_send_nmi_ipi(cpu); + } + } } #endif diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c index ba030669eca1..1594bbff3dec 100644 --- a/arch/powerpc/platforms/powernv/smp.c +++ b/arch/powerpc/platforms/powernv/smp.c @@ -37,6 +37,8 @@ #include #include #include +#include +#include #incl
[PATCH v3 2/3] powerpc/powernv: Identify scom driven system reset
In irq_set_pending_from_srr1() we were missing 0x2 as system reset identified from SRR1 caused by back to back system resets or when interrupts are caused by SCOM when the thread is not in power saving mode. This helps us get to NMI handling in both the case where NMI is caused when in power-saving and not in power-saving mode. The actual exploitation is expected when we are doing a kdump and an offline CPU might not be in power-saving mode due to an already spurious IPI or any other reason. Signed-off-by: Balbir Singh --- arch/powerpc/kernel/irq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c index b7a84522e652..ec89104e8ab9 100644 --- a/arch/powerpc/kernel/irq.c +++ b/arch/powerpc/kernel/irq.c @@ -413,7 +413,7 @@ bool prep_irq_for_idle_irqsoff(void) #define IRQ_SYSTEM_RESET 0xff static const u8 srr1_to_lazyirq[0x10] = { - 0, 0, 0, + 0, 0, IRQ_SYSTEM_RESET, PACA_IRQ_DBELL, IRQ_SYSTEM_RESET, PACA_IRQ_DBELL, -- 2.13.6
[PATCH v3 1/3] powerpc/crash: Remove the test for cpu_online in the IPI callback
Our check was extra cautious, we've audited crash_send_ipi and it sends an IPI only to online CPU's. Removal of this check should have not functional impact on crash kdump. Signed-off-by: Balbir Singh --- arch/powerpc/kernel/crash.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/arch/powerpc/kernel/crash.c b/arch/powerpc/kernel/crash.c index cbabb5adccd9..29c56ca2ddfd 100644 --- a/arch/powerpc/kernel/crash.c +++ b/arch/powerpc/kernel/crash.c @@ -69,9 +69,6 @@ static void crash_ipi_callback(struct pt_regs *regs) int cpu = smp_processor_id(); - if (!cpu_online(cpu)) - return; - hard_irq_disable(); if (!cpumask_test_cpu(cpu, &cpus_state_saved)) { crash_save_cpu(regs, cpu); -- 2.13.6
[PATCH] powerpc: dts: Remove leading 0x and 0s from bindings notation
Improve the DTS files by removing all the leading "0x" and zeros to fix the following dtc warnings: Warning (unit_address_format): Node /XXX unit name should not have leading "0x" and Warning (unit_address_format): Node /XXX unit name should not have leading 0s Converted using the following command: find . -type f \( -iname *.dts -o -iname *.dtsi \) -exec sed -E -i -e "s/@0x([0-9a-fA-F\.]+)\s?\{/@\L\1 \{/g" -e "s/@0+([0-9a-fA-F\.]+)\s?\{/@\L\1 \{/g" {} + For simplicity, two sed expressions were used to solve each warnings separately. To make the regex expression more robust a few other issues were resolved, namely setting unit-address to lower case, and adding a whitespace before the the opening curly brace: https://elinux.org/Device_Tree_Linux#Linux_conventions This is a follow up to commit 4c9847b7375a ("dt-bindings: Remove leading 0x from bindings notation") Reported-by: David Daney Suggested-by: Rob Herring Signed-off-by: Mathieu Malaterre --- arch/powerpc/boot/dts/a3m071.dts | 10 +- arch/powerpc/boot/dts/akebono.dts | 4 ++-- arch/powerpc/boot/dts/c2k.dts | 6 +++--- arch/powerpc/boot/dts/currituck.dts| 2 +- arch/powerpc/boot/dts/fsl/mpc8568mds.dts | 12 +-- arch/powerpc/boot/dts/fsl/mpc8569mds.dts | 20 +-- arch/powerpc/boot/dts/fsl/p1021mds.dts | 6 +++--- arch/powerpc/boot/dts/fsl/p1025rdb.dtsi| 8 arch/powerpc/boot/dts/fsl/p1025rdb_32b.dts | 2 +- arch/powerpc/boot/dts/fsl/p1025twr.dtsi| 8 arch/powerpc/boot/dts/fsl/t1040rdb.dts | 2 +- arch/powerpc/boot/dts/fsl/t1042d4rdb.dts | 10 +- arch/powerpc/boot/dts/fsl/t1042rdb.dts | 2 +- arch/powerpc/boot/dts/fsl/t104xrdb.dtsi| 6 +++--- arch/powerpc/boot/dts/fsp2.dts | 6 +++--- arch/powerpc/boot/dts/gamecube.dts | 14 ++--- arch/powerpc/boot/dts/haleakala.dts| 2 +- arch/powerpc/boot/dts/kilauea.dts | 4 ++-- arch/powerpc/boot/dts/kmeter1.dts | 10 +- arch/powerpc/boot/dts/makalu.dts | 4 ++-- arch/powerpc/boot/dts/mpc832x_mds.dts | 10 +- arch/powerpc/boot/dts/mpc832x_rdb.dts | 8 arch/powerpc/boot/dts/mpc836x_mds.dts | 8 arch/powerpc/boot/dts/sbc8548-altflash.dts | 8 arch/powerpc/boot/dts/sbc8548.dts | 8 arch/powerpc/boot/dts/wii.dts | 32 +++--- 26 files changed, 106 insertions(+), 106 deletions(-) diff --git a/arch/powerpc/boot/dts/a3m071.dts b/arch/powerpc/boot/dts/a3m071.dts index bf81b8f9704c..187ce458d03a 100644 --- a/arch/powerpc/boot/dts/a3m071.dts +++ b/arch/powerpc/boot/dts/a3m071.dts @@ -105,24 +105,24 @@ reg = <0 0x0 0x0200>; compatible = "cfi-flash"; bank-width = <2>; - partition@0x0 { + partition@0 { label = "u-boot"; reg = <0x 0x0004>; read-only; }; - partition@0x0004 { + partition@4 { label = "env"; reg = <0x0004 0x0002>; }; - partition@0x0006 { + partition@6 { label = "dtb"; reg = <0x0006 0x0002>; }; - partition@0x0008 { + partition@8 { label = "kernel"; reg = <0x0008 0x0050>; }; - partition@0x0058 { + partition@58 { label = "root"; reg = <0x0058 0x00A8>; }; diff --git a/arch/powerpc/boot/dts/akebono.dts b/arch/powerpc/boot/dts/akebono.dts index e61d5dc598c1..746779202a12 100644 --- a/arch/powerpc/boot/dts/akebono.dts +++ b/arch/powerpc/boot/dts/akebono.dts @@ -216,7 +216,7 @@ interrupts = <39 2>; }; - IIC0: i2c@ { + IIC0: i2c@0 { compatible = "ibm,iic-476gtr", "ibm,iic"; reg = <0x0 0x0020>; interrupt-parent = <&MPIC>; @@ -229,7 +229,7 @@ }; }; - IIC1: i2c@0100 { + IIC1: i2c@100 { compatible = "ibm,iic-476gtr", "ibm,iic"; reg = <0x100 0x0020>; interrupt-par
[RFC PATCH for 4.16 12/21] powerpc: Wire up cpu_opv system call
Signed-off-by: Mathieu Desnoyers CC: Benjamin Herrenschmidt CC: Paul Mackerras CC: Michael Ellerman CC: Boqun Feng CC: Peter Zijlstra CC: "Paul E. McKenney" CC: linuxppc-dev@lists.ozlabs.org --- arch/powerpc/include/asm/systbl.h | 1 + arch/powerpc/include/asm/unistd.h | 2 +- arch/powerpc/include/uapi/asm/unistd.h | 1 + 3 files changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/systbl.h b/arch/powerpc/include/asm/systbl.h index 964321a5799c..f9cdb896fbaa 100644 --- a/arch/powerpc/include/asm/systbl.h +++ b/arch/powerpc/include/asm/systbl.h @@ -390,3 +390,4 @@ COMPAT_SYS_SPU(pwritev2) SYSCALL(kexec_file_load) SYSCALL(statx) SYSCALL(rseq) +SYSCALL(cpu_opv) diff --git a/arch/powerpc/include/asm/unistd.h b/arch/powerpc/include/asm/unistd.h index e76bd5601ea4..48f80f452e31 100644 --- a/arch/powerpc/include/asm/unistd.h +++ b/arch/powerpc/include/asm/unistd.h @@ -12,7 +12,7 @@ #include -#define NR_syscalls385 +#define NR_syscalls386 #define __NR__exit __NR_exit diff --git a/arch/powerpc/include/uapi/asm/unistd.h b/arch/powerpc/include/uapi/asm/unistd.h index b1980fcd56d5..972a7d68c143 100644 --- a/arch/powerpc/include/uapi/asm/unistd.h +++ b/arch/powerpc/include/uapi/asm/unistd.h @@ -396,5 +396,6 @@ #define __NR_kexec_file_load 382 #define __NR_statx 383 #define __NR_rseq 384 +#define __NR_cpu_opv 385 #endif /* _UAPI_ASM_POWERPC_UNISTD_H_ */ -- 2.11.0
[RFC PATCH for 4.16 07/21] powerpc: Add support for restartable sequences
From: Boqun Feng Call the rseq_handle_notify_resume() function on return to userspace if TIF_NOTIFY_RESUME thread flag is set. Increment the event counter and perform fixup on the pre-signal when a signal is delivered on top of a restartable sequence critical section. Signed-off-by: Boqun Feng Signed-off-by: Mathieu Desnoyers CC: Benjamin Herrenschmidt CC: Paul Mackerras CC: Michael Ellerman CC: Peter Zijlstra CC: "Paul E. McKenney" CC: linuxppc-dev@lists.ozlabs.org --- arch/powerpc/Kconfig | 1 + arch/powerpc/kernel/signal.c | 3 +++ 2 files changed, 4 insertions(+) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index c51e6ce42e7a..e9992f80819c 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -221,6 +221,7 @@ config PPC select HAVE_SYSCALL_TRACEPOINTS select HAVE_VIRT_CPU_ACCOUNTING select HAVE_IRQ_TIME_ACCOUNTING + select HAVE_RSEQ select IRQ_DOMAIN select IRQ_FORCED_THREADING select MODULES_USE_ELF_RELA diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c index 3d7539b90010..a7b95f7bcaf1 100644 --- a/arch/powerpc/kernel/signal.c +++ b/arch/powerpc/kernel/signal.c @@ -133,6 +133,8 @@ static void do_signal(struct task_struct *tsk) /* Re-enable the breakpoints for the signal stack */ thread_change_pc(tsk, tsk->thread.regs); + rseq_signal_deliver(tsk->thread.regs); + if (is32) { if (ksig.ka.sa.sa_flags & SA_SIGINFO) ret = handle_rt_signal32(&ksig, oldset, tsk); @@ -161,6 +163,7 @@ void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags) if (thread_info_flags & _TIF_NOTIFY_RESUME) { clear_thread_flag(TIF_NOTIFY_RESUME); tracehook_notify_resume(regs); + rseq_handle_notify_resume(regs); } if (thread_info_flags & _TIF_PATCH_PENDING) -- 2.11.0
[RFC PATCH for 4.16 08/21] powerpc: Wire up restartable sequences system call
From: Boqun Feng Wire up the rseq system call on powerpc. This provides an ABI improving the speed of a user-space getcpu operation on powerpc by skipping the getcpu system call on the fast path, as well as improving the speed of user-space operations on per-cpu data compared to using load-reservation/store-conditional atomics. Signed-off-by: Boqun Feng Signed-off-by: Mathieu Desnoyers CC: Benjamin Herrenschmidt CC: Paul Mackerras CC: Michael Ellerman CC: Peter Zijlstra CC: "Paul E. McKenney" CC: linuxppc-dev@lists.ozlabs.org --- arch/powerpc/include/asm/systbl.h | 1 + arch/powerpc/include/asm/unistd.h | 2 +- arch/powerpc/include/uapi/asm/unistd.h | 1 + 3 files changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/systbl.h b/arch/powerpc/include/asm/systbl.h index 449912f057f6..964321a5799c 100644 --- a/arch/powerpc/include/asm/systbl.h +++ b/arch/powerpc/include/asm/systbl.h @@ -389,3 +389,4 @@ COMPAT_SYS_SPU(preadv2) COMPAT_SYS_SPU(pwritev2) SYSCALL(kexec_file_load) SYSCALL(statx) +SYSCALL(rseq) diff --git a/arch/powerpc/include/asm/unistd.h b/arch/powerpc/include/asm/unistd.h index 9ba11dbcaca9..e76bd5601ea4 100644 --- a/arch/powerpc/include/asm/unistd.h +++ b/arch/powerpc/include/asm/unistd.h @@ -12,7 +12,7 @@ #include -#define NR_syscalls384 +#define NR_syscalls385 #define __NR__exit __NR_exit diff --git a/arch/powerpc/include/uapi/asm/unistd.h b/arch/powerpc/include/uapi/asm/unistd.h index df8684f31919..b1980fcd56d5 100644 --- a/arch/powerpc/include/uapi/asm/unistd.h +++ b/arch/powerpc/include/uapi/asm/unistd.h @@ -395,5 +395,6 @@ #define __NR_pwritev2 381 #define __NR_kexec_file_load 382 #define __NR_statx 383 +#define __NR_rseq 384 #endif /* _UAPI_ASM_POWERPC_UNISTD_H_ */ -- 2.11.0
Re: [PATCH v2] watchdog: core: make sure the watchdog worker always works
On 12/08/2017 02:18 AM, Christophe Leroy wrote: When running a command like 'chrt -f 50 dd if=/dev/zero of=/dev/null', the watchdog_worker fails to service the HW watchdog and the HW watchdog fires long before the watchdog soft timeout. At the moment, the watchdog_worker is invoked as a delayed work. Delayed works are handled by non realtime kernel threads. The WQ_HIGHPRI flag only increases the niceness of that threads. This patch replaces the delayed work logic by kthread delayed work, and sets the associated kernel task to SCHED_FIFO with the highest priority, in order to ensure that the watchdog worker will run as soon as possible. Signed-off-by: Christophe Leroy Reviewed-by: Guenter Roeck --- v2: Use kthread_delayed_work instead of hrtimer drivers/watchdog/watchdog_dev.c | 48 +++-- 1 file changed, 27 insertions(+), 21 deletions(-) diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c index 1e971a50d7fb..3dcb86f18055 100644 --- a/drivers/watchdog/watchdog_dev.c +++ b/drivers/watchdog/watchdog_dev.c @@ -39,6 +39,7 @@ #include /* For timeout functions */ #include/* For printk/panic/... */ #include /* For data references */ +#include/* For kthread_delayed_work */ #include/* For handling misc devices */ #include/* For module stuff/... */ #include /* For mutexes */ @@ -46,9 +47,10 @@ #include /* For memory functions */ #include /* For standard types (like size_t) */ #include /* For watchdog specific items */ -#include /* For workqueue */ #include /* For copy_to_user/put_user/... */ +#include /* For struct sched_param */ + #include "watchdog_core.h" #include "watchdog_pretimeout.h" @@ -67,7 +69,7 @@ struct watchdog_core_data { struct mutex lock; unsigned long last_keepalive; unsigned long last_hw_keepalive; - struct delayed_work work; + struct kthread_delayed_work work; unsigned long status; /* Internal status bits */ #define _WDOG_DEV_OPEN0 /* Opened ? */ #define _WDOG_ALLOW_RELEASE 1 /* Did we receive the magic char ? */ @@ -79,7 +81,7 @@ static dev_t watchdog_devt; /* Reference to watchdog device behind /dev/watchdog */ static struct watchdog_core_data *old_wd_data; -static struct workqueue_struct *watchdog_wq; +static struct kthread_worker *watchdog_kworker; static bool handle_boot_enabled = IS_ENABLED(CONFIG_WATCHDOG_HANDLE_BOOT_ENABLED); @@ -140,9 +142,10 @@ static inline void watchdog_update_worker(struct watchdog_device *wdd) long t = watchdog_next_keepalive(wdd); if (t > 0) - mod_delayed_work(watchdog_wq, &wd_data->work, t); + kthread_mod_delayed_work(watchdog_kworker, +&wd_data->work, t); } else { - cancel_delayed_work(&wd_data->work); + kthread_cancel_delayed_work_sync(&wd_data->work); } } @@ -154,8 +157,8 @@ static int __watchdog_ping(struct watchdog_device *wdd) int err; if (time_is_after_jiffies(earliest_keepalive)) { - mod_delayed_work(watchdog_wq, &wd_data->work, -earliest_keepalive - jiffies); + kthread_mod_delayed_work(watchdog_kworker, &wd_data->work, +earliest_keepalive - jiffies); return 0; } @@ -203,12 +206,13 @@ static bool watchdog_worker_should_ping(struct watchdog_core_data *wd_data) return wdd && (watchdog_active(wdd) || watchdog_hw_running(wdd)); } -static void watchdog_ping_work(struct work_struct *work) +static void watchdog_ping_work(struct kthread_work *work) { struct watchdog_core_data *wd_data; - wd_data = container_of(to_delayed_work(work), struct watchdog_core_data, - work); + wd_data = container_of(container_of(work, struct kthread_delayed_work, + work), + struct watchdog_core_data, work); mutex_lock(&wd_data->lock); if (watchdog_worker_should_ping(wd_data)) @@ -919,10 +923,10 @@ static int watchdog_cdev_register(struct watchdog_device *wdd, dev_t devno) wd_data->wdd = wdd; wdd->wd_data = wd_data; - if (!watchdog_wq) + if (IS_ERR_OR_NULL(watchdog_kworker)) return -ENODEV; - INIT_DELAYED_WORK(&wd_data->work, watchdog_ping_work); + kthread_init_delayed_work(&wd_data->work, watchdog_ping_work); if (wdd->id == 0) { old_wd_data = wd_data; @@ -968,7 +972,8 @@ static int watchdog_cdev_register(struct watchdog_device *wdd, dev_t devno) if (handle_boot_enabled) { __module_get(wdd->ops->owner);
Re: [PATCH 2/2] powernv/kdump: Fix cases where the kdump kernel can get HMI's
On Thu, 14 Dec 2017 23:16:26 +1100 Balbir Singh wrote: > On Thu, Dec 14, 2017 at 12:51 PM, Nicholas Piggin wrote: > >> I can't call smp_send_nmi_ipi due to the nmi_ipi_busy_count and > >> I'm worried about calling a stale nmi_ipi_function via the > >> system_reset_exception path, if we are OK with it, I can revisit > >> the code path > > > > You shouldn't get a stale one, that would also be a bug -- we > > have to cope with NMIs coming in at any time that are triggered > > externally (not by smp_send_nmi_ipi), so if you see any bugs > > there those need to be fixed separately. > > > > Yes, I think it's a bug, nothing clears nmi_ipi_function (from what > I can see), so when the next NMI comes in and goes into > pnv_system_reset_exception > it'll execute the stale handler. The CPU won't be in the nmi_ipi_pending_mask though, so it shouldn't get that far. You could add a bit of paranoia to clear the function pointer I suppose, but AFAIKS it's not needed. > I'll respin things based on the > suggestion above > and deal with any bugs as well. Thanks, Nick
Re: [PATCH] powerpc/perf: Dereference bhrb entries safely
On Tue, Dec 12, 2017 at 11:29 PM, Ravi Bangoria wrote: > It may very well happen that branch instructions recorded by > bhrb entries already get unmapped before they get processed by > the kernel. Hence, trying to dereference such memory location > will endup in a crash. Ex, > > Unable to handle kernel paging request for data at address > 0xc00819c41764 > Faulting instruction address: 0xc0084a14 > NIP [c0084a14] branch_target+0x4/0x70 > LR [c00eb828] record_and_restart+0x568/0x5c0 > Call Trace: > [c00eb3b4] record_and_restart+0xf4/0x5c0 (unreliable) > [c00ec378] perf_event_interrupt+0x298/0x460 > [c0027964] performance_monitor_exception+0x54/0x70 > [c0009ba4] performance_monitor_common+0x114/0x120 > > Fix this by deferefencing them safely. > > Suggested-by: Naveen N. Rao > Signed-off-by: Ravi Bangoria > --- > arch/powerpc/perf/core-book3s.c | 7 +-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c > index 9e3da168d54c..5a68d2effdf9 100644 > --- a/arch/powerpc/perf/core-book3s.c > +++ b/arch/powerpc/perf/core-book3s.c > @@ -410,8 +410,11 @@ static __u64 power_pmu_bhrb_to(u64 addr) > int ret; > __u64 target; > > - if (is_kernel_addr(addr)) > - return branch_target((unsigned int *)addr); > + if (is_kernel_addr(addr)) { I think __kernel_text_address() is more accurate right? In which case you need to check for is_kernel_addr(addr) and if its not kernel_text_address() then we have an interesting case of a branch from something not text. It would be nice to catch such cases. > + if (probe_kernel_address((void *)addr, instr)) > + return 0; > + return branch_target(&instr); > + } > > /* Userspace: need copy instruction here then translate it */ > pagefault_disable(); Otherwise, Reviewed-by: Balbir Singh
Re: [PATCH 2/2] powernv/kdump: Fix cases where the kdump kernel can get HMI's
On Thu, Dec 14, 2017 at 12:51 PM, Nicholas Piggin wrote: > On Thu, 14 Dec 2017 11:12:13 +1100 > Balbir Singh wrote: > >> On Wed, 13 Dec 2017 20:51:01 +1000 >> Nicholas Piggin wrote: >> >> > This is looking pretty nice now... >> > >> > On Wed, 13 Dec 2017 19:08:28 +1100 >> > Balbir Singh wrote: >> > >> > > @@ -543,7 +543,25 @@ void smp_send_debugger_break(void) >> > > #ifdef CONFIG_KEXEC_CORE >> > > void crash_send_ipi(void (*crash_ipi_callback)(struct pt_regs *)) >> > > { >> > > + int cpu; >> > > + >> > > smp_send_nmi_ipi(NMI_IPI_ALL_OTHERS, crash_ipi_callback, 100); >> > > + if (kdump_in_progress() && crash_wake_offline) { >> > > + for_each_present_cpu(cpu) { >> > > + if (cpu_online(cpu)) >> > > + continue; >> > > + /* >> > > + * crash_ipi_callback will wait for >> > > + * all cpus, including offline CPUs. >> > > + * We don't care about nmi_ipi_function. >> > > + * Offline cpus will jump straight into >> > > + * crash_ipi_callback, we can skip the >> > > + * entire NMI dance and waiting for >> > > + * cpus to clear pending mask, etc. >> > > + */ >> > > + do_smp_send_nmi_ipi(cpu); >> > >> > Still a little bit concerned about using NMI IPI for this. >> > >> >> OK -- for offline CPUs you mean? > > Yes. > >> > If you take an NMI IPI from stop, the idle code should do the >> > right thing and we would just return the system reset wakeup >> > reason in SRR1 here (which does not need to be cleared). >> > >> > If you take the system reset anywhere else in the loop, it's >> > going to go out via system_reset_exception. I guess that >> > would end up doing the right thing, it probably gets to >> > crash_ipi_callback from crash_kexec_secondary? >> >> You mean like if we are online at the time of NMI'ing? If so >> the original loop will NMI us back into crash_ipi_callback >> anyway. We don't expect this to occur for offline CPUs > > No, if the offline CPU is executing any instruction except for > stop when the crash occurs. > OK, yeah >> >> > >> > It's just going to be a very untested code path :( What we >> > gain I suppose is better ability to handle a CPU that's locked >> > up somewhere in the cpu offline path. Assuming the uncommon >> > case works... >> > >> > Actually, if you *always* go via the system reset exception >> > handler, then code paths will be shared. That might be the >> > way to go. So I would check for system reset wakeup SRR1 reason >> > and call replay_system_reset() for it. What do you think? >> > >> >> We could do that, but that would call pnv_system_reset_exception >> and try to call the NMI function, but we've not used that path >> to initiate the NMI, so it should call the stale nmi_ipi_function >> which is crash_ipi_callback and not go via the crash_kexec path. > > It shouldn't, if the CPU is not set in the NMI bitmask, I think > it should go back out and do the rest of the system_reset_exception > handler. > > Anyway we have to get this case right, because it can already hit > as I said if the offline CPU takes the NMI when it is not stopped. > This is why I want to try to use a unified code path. > OK >> I can't call smp_send_nmi_ipi due to the nmi_ipi_busy_count and >> I'm worried about calling a stale nmi_ipi_function via the >> system_reset_exception path, if we are OK with it, I can revisit >> the code path > > You shouldn't get a stale one, that would also be a bug -- we > have to cope with NMIs coming in at any time that are triggered > externally (not by smp_send_nmi_ipi), so if you see any bugs > there those need to be fixed separately. > Yes, I think it's a bug, nothing clears nmi_ipi_function (from what I can see), so when the next NMI comes in and goes into pnv_system_reset_exception it'll execute the stale handler. I'll respin things based on the suggestion above and deal with any bugs as well. Balbir Singh.