Re: [PATCH v3 3/4] dt-bindings: interrupt-controller: fsl, ls-extirq: convert to YAML

2022-04-27 Thread Krzysztof Kozlowski
On 27/04/2022 22:08, Leo Li wrote:
>> Convert the fsl,ls-extirq binding to the new YAML format.
>>
>> In contrast to the original binding documentation, there are three
>> compatibles which are used in their corresponding device trees which have a
>> specific compatible and the (already documented) fallback
>> compatible:
>>  - "fsl,ls1046a-extirq", "fsl,ls1043a-extirq"
>>  - "fsl,ls2080a-extirq", "fsl,ls1088a-extirq"
>>  - "fsl,lx2160a-extirq", "fsl,ls1088a-extirq"
>>
>> Depending on the number of the number of the external IRQs which is
>> usually 12 except for the LS1021A where there are only 6, the interrupt-map-
>> mask was reduced from 0x to 0xf and 0x7 respectively and the number
>> of interrupt-map entries have to match.
> 
> I assume this change won't prevent driver to be compatible with older device 
> trees using the 0x?  The original 0x should work for both 
> 6/12 interrupts or whatever reasonable number of interrupts that maybe used 
> in future SoCs.  So the purpose of this change is to make the binding more 
> specific to catch more errors in device tree?

Yes.

Best regards,
Krzysztof


Re: serial hang in qemu-system-ppc64 -M pseries

2022-04-27 Thread Rob Landley



On 4/28/22 00:41, Rob Landley wrote:
> On 4/27/22 10:27, Thomas Huth wrote:
>> On 26/04/2022 12.26, Rob Landley wrote:
>>> When I cut and paste 80-ish characters of text into the Linux serial 
>>> console, it
>>> reads 16 characters and stops. When I hit space, it reads another 16 
>>> characters,
>>> and if I keep at it will eventually catch up without losing data. If I type,
>>> every character shows up immediately.
>> 
>> That "16" certainly comes from VTERM_BUFSIZE in hw/char/spapr_vty.c in the 
>> QEMU sources, I think.
>> 
>>> (On other qemu targets and kernels I can cut and paste an entire uuencoded
>>> binary and it goes through just fine in one go, but this target hangs with 
>>> big
>>> pastes until I hit keys.)
>>> 
>>> Is this a qemu-side bug, or a kernel-side bug?
>>> 
>>> Kernel config attached (linux 5.18-rc3 or thereabouts), qemu invocation is:
>>> 
>>> qemu-system-ppc64 -M pseries -vga none -nographic -no-reboot -m 256 -kernel
>>> vmlinux -initrd powerpc64leroot.cpio.gz -append "panic=1 HOST=powerpc64le
>>> console=hvc0"
>> 
>> Which version of QEMU are you using?
> 
> $ qemu-system-ppc64 --version
> QEMU emulator version 6.2.92 (v6.2.0-rc2)
> Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers

Just confirmed it behaves the same with current git (commit cf6f26d6f9b2).

Rob


[PATCH] powerpc/perf: Add support for caps under sysfs in powerpc

2022-04-27 Thread Athira Rajeev
Add caps support under "/sys/bus/event_source/devices//"
for powerpc. This directory can be used to expose some of the
specific features that powerpc PMU supports to the user.
Example: pmu_name. The name of PMU registered will depend on
platform, say power9 or power10 or it could be Generic Compat
PMU.

Currently the only way to know which is the registered
PMU is from the dmesg logs. But clearing the dmesg will make it
difficult to know exact PMU backend used. And even extracting
from dmesg will be complicated, as we need  to parse the dmesg
logs and add filters for pmu name. Whereas by exposing it via
caps will make it easy as we just need to directly read it from
the sysfs.

Add a caps directory to /sys/bus/event_source/devices/cpu/
for power8, power9, power10 and generic compat PMU.

The information exposed currently:
 - pmu_name : Underlying PMU name from the driver

Example result with power9 pmu:

 # ls /sys/bus/event_source/devices/cpu/caps
pmu_name

 # cat /sys/bus/event_source/devices/cpu/caps/pmu_name
power9

Signed-off-by: Athira Rajeev 
Reviewed-by: Madhavan Srinivasan 
---
 arch/powerpc/perf/generic-compat-pmu.c | 20 
 arch/powerpc/perf/power10-pmu.c| 20 
 arch/powerpc/perf/power8-pmu.c | 20 
 arch/powerpc/perf/power9-pmu.c | 20 
 4 files changed, 80 insertions(+)

diff --git a/arch/powerpc/perf/generic-compat-pmu.c 
b/arch/powerpc/perf/generic-compat-pmu.c
index f3db88aee4dd..7b5fe2d89007 100644
--- a/arch/powerpc/perf/generic-compat-pmu.c
+++ b/arch/powerpc/perf/generic-compat-pmu.c
@@ -151,9 +151,29 @@ static const struct attribute_group 
generic_compat_pmu_format_group = {
.attrs = generic_compat_pmu_format_attr,
 };
 
+static ssize_t pmu_name_show(struct device *cdev,
+   struct device_attribute *attr,
+   char *buf)
+{
+   return snprintf(buf, PAGE_SIZE, "generic_compat_pmu");
+}
+
+static DEVICE_ATTR_RO(pmu_name);
+
+static struct attribute *generic_compat_pmu_caps_attrs[] = {
+   &dev_attr_pmu_name.attr,
+   NULL
+};
+
+static struct attribute_group generic_compat_pmu_caps_group = {
+   .name  = "caps",
+   .attrs = generic_compat_pmu_caps_attrs,
+};
+
 static const struct attribute_group *generic_compat_pmu_attr_groups[] = {
&generic_compat_pmu_format_group,
&generic_compat_pmu_events_group,
+   &generic_compat_pmu_caps_group,
NULL,
 };
 
diff --git a/arch/powerpc/perf/power10-pmu.c b/arch/powerpc/perf/power10-pmu.c
index d3398100a60f..a622ff783719 100644
--- a/arch/powerpc/perf/power10-pmu.c
+++ b/arch/powerpc/perf/power10-pmu.c
@@ -258,6 +258,25 @@ static const struct attribute_group 
power10_pmu_format_group = {
.attrs = power10_pmu_format_attr,
 };
 
+static ssize_t pmu_name_show(struct device *cdev,
+   struct device_attribute *attr,
+   char *buf)
+{
+   return snprintf(buf, PAGE_SIZE, "power10");
+}
+
+static DEVICE_ATTR_RO(pmu_name);
+
+static struct attribute *power10_pmu_caps_attrs[] = {
+   &dev_attr_pmu_name.attr,
+   NULL
+};
+
+static struct attribute_group power10_pmu_caps_group = {
+   .name  = "caps",
+   .attrs = power10_pmu_caps_attrs,
+};
+
 static const struct attribute_group *power10_pmu_attr_groups_dd1[] = {
&power10_pmu_format_group,
&power10_pmu_events_group_dd1,
@@ -267,6 +286,7 @@ static const struct attribute_group 
*power10_pmu_attr_groups_dd1[] = {
 static const struct attribute_group *power10_pmu_attr_groups[] = {
&power10_pmu_format_group,
&power10_pmu_events_group,
+   &power10_pmu_caps_group,
NULL,
 };
 
diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index e37b1e714d2b..fbcd2bfa05ba 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -187,9 +187,29 @@ static const struct attribute_group 
power8_pmu_events_group = {
.attrs = power8_events_attr,
 };
 
+static ssize_t pmu_name_show(struct device *cdev,
+   struct device_attribute *attr,
+   char *buf)
+{
+   return snprintf(buf, PAGE_SIZE, "power8");
+}
+
+static DEVICE_ATTR_RO(pmu_name);
+
+static struct attribute *power8_pmu_caps_attrs[] = {
+   &dev_attr_pmu_name.attr,
+   NULL
+};
+
+static struct attribute_group power8_pmu_caps_group = {
+   .name  = "caps",
+   .attrs = power8_pmu_caps_attrs,
+};
+
 static const struct attribute_group *power8_pmu_attr_groups[] = {
&isa207_pmu_format_group,
&power8_pmu_events_group,
+   &power8_pmu_caps_group,
NULL,
 };
 
diff --git a/arch/powerpc/perf/power9-pmu.c b/arch/powerpc/perf/power9-pmu.c
index c9eb5232e68b..11f290b6287f 100644
--- a/arch/powerpc/perf/power9-pmu.c
+++ b/arch/powerpc/perf/power9-pmu.c
@@ -258,9 +258,29 @@ static const struct attribute_group 
power9_pmu_format_group = {
.attrs = power9_pmu_format_attr,
 };
 
+static ssi

Re: serial hang in qemu-system-ppc64 -M pseries

2022-04-27 Thread Rob Landley
On 4/27/22 10:27, Thomas Huth wrote:
> On 26/04/2022 12.26, Rob Landley wrote:
>> When I cut and paste 80-ish characters of text into the Linux serial 
>> console, it
>> reads 16 characters and stops. When I hit space, it reads another 16 
>> characters,
>> and if I keep at it will eventually catch up without losing data. If I type,
>> every character shows up immediately.
> 
> That "16" certainly comes from VTERM_BUFSIZE in hw/char/spapr_vty.c in the 
> QEMU sources, I think.
> 
>> (On other qemu targets and kernels I can cut and paste an entire uuencoded
>> binary and it goes through just fine in one go, but this target hangs with 
>> big
>> pastes until I hit keys.)
>> 
>> Is this a qemu-side bug, or a kernel-side bug?
>> 
>> Kernel config attached (linux 5.18-rc3 or thereabouts), qemu invocation is:
>> 
>> qemu-system-ppc64 -M pseries -vga none -nographic -no-reboot -m 256 -kernel
>> vmlinux -initrd powerpc64leroot.cpio.gz -append "panic=1 HOST=powerpc64le
>> console=hvc0"
> 
> Which version of QEMU are you using?

$ qemu-system-ppc64 --version
QEMU emulator version 6.2.92 (v6.2.0-rc2)
Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers

>From november. I can pull and rebuild but it'll take a bit. (Hopefully
rebuilding would fix the need to echo -e '\e[?7h' afterwards to undo the "bash
command line history marches up the screen because qemu's x86 bios disabled line
wrap and then left it that way" issue...)

> Which frontend (GTK or terminal?) ... 

The above command line has -nographic, forcing terminal. Running ldd on the
binary doesn't pull up anything gtk. (It pulls up libncursesw though.)

If you want to reproduce my test locally:

wget https://landley.net/toybox/downloads/binaries/mkroot/0.8.5/powerpc64le.tgz
tar xvzf powerpc64le.tgz
cd powerpc64le
./qemu-*.sh

Then paste something longer than 16 characters at the eventual command prompt
once the kernel finishes booting.

If you want to reproduce it all from source:

git clone https://github.com/landley/toybox
cd toybox && mkdir ccc && cd ccc
wget
https://landley.net/toybox/downloads/binaries/toolchains/latest/powerpc64le-linux-musl-cross.tar.xz
-O - | tar xv
cd ..
CROSS=powerpc64le LINUX=~/linux scripts/mkroot.sh
cd root/powerpc64le
./qemu-*.sh

This assumes your linux kernel source directory is in ~/linux of course, and
that qemu-system-ppc64 is in the $PATH...

> this rings a distant bell, but I thought we had fixed these issues long ago 
> in the past... e.g.:
>
> https://yhbt.net/lore/all/1380113886-16845-10-git-send-email-mdr...@linux.vnet.ibm.com/
> 
> https://git.qemu.org/?p=qemu.git;a=commitdiff;h=8a273cbe53221d28

The qemu I'm running is newer than 2016. :)

Most targets are fine with this: I cut and paste entire uuencoded binaries into
the serial console as an easy way to insert a file into an initramfs. It can
usually take multiple megabytes without dropping a character, so you just
"uudecode" enter, and then paste.

Even my 32 bit powerpc target works fine with this. (Although -M g3beige is a
very different machine from -M pseries...)

Alas this target (and sh4 -m r2d) stop at 16 chars. (On sh4 the extra is
discarded, not delivered progressively as more interrupts are generated.)

> ... but maybe my memory also just fails and this has never been properly 
> fixed.
> 
>   Thomas

Rob


Re: [PATCH] powerpc/time: Always set decrementer in timer_interrupt()

2022-04-27 Thread Nicholas Piggin
Excerpts from Nicholas Piggin's message of April 21, 2022 12:07 pm:
> Excerpts from Michal Suchánek's message of April 21, 2022 12:28 am:
>> Hello,
>> 
>> On Thu, Apr 21, 2022 at 12:16:57AM +1000, Michael Ellerman wrote:
>>> This is a partial revert of commit 0faf20a1ad16 ("powerpc/64s/interrupt:
>>> Don't enable MSR[EE] in irq handlers unless perf is in use").
>>> 
>>> Prior to that commit, we always set the decrementer in
>>> timer_interrupt(), to clear the timer interrupt. Otherwise we could end
>>> up continuously taking timer interrupts.
>>> 
>>> When high res timers are enabled there is no problem seen with leaving
>>> the decrementer untouched in timer_interrupt(), because it will be
>>> programmed via hrtimer_interrupt() -> tick_program_event() ->
>>> clockevents_program_event() -> decrementer_set_next_event().
>>> 
>>> However with CONFIG_HIGH_RES_TIMERS=n or booting with highres=off, we
>> 
>> How difficult is it to detect this condition?
>> 
>> Maybe detecting this could be just added?
> 
> Possibly not too difficult but I'd like to see if we can get this to work
> in core timer code -
> 
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-April/242212.html
> 
> I'll resend as a patch and see what flamage I get...

tglx merged it into his tree, so we could try again after its
upstream.

https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=timers/core&id=62c1256d544747b38e77ca9b5bfe3a26f9592576

I'm kind of worried the patch will explode some strange clock event 
device in an obscure way so we may wait for a release or two first.

Thanks,
Nick


Re: [PATCH] KVM: PPC: Book3S HV: Initialize AMOR in nested entry

2022-04-27 Thread Nicholas Piggin
Excerpts from Fabiano Rosas's message of April 26, 2022 12:21 am:
> The hypervisor always sets AMOR to ~0, but let's ensure we're not
> passing stale values around.
> 

Reviewed-by: Nicholas Piggin 

Looks like our L0 doesn't do anything with hvregs.amor ?

Thanks,
Nick

> Signed-off-by: Fabiano Rosas 
> ---
>  arch/powerpc/kvm/book3s_hv.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 6fa518f6501d..b5f504576765 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -3967,6 +3967,7 @@ static int kvmhv_vcpu_entry_p9_nested(struct kvm_vcpu 
> *vcpu, u64 time_limit, uns
>  
>   kvmhv_save_hv_regs(vcpu, &hvregs);
>   hvregs.lpcr = lpcr;
> + hvregs.amor = ~0;
>   vcpu->arch.regs.msr = vcpu->arch.shregs.msr;
>   hvregs.version = HV_GUEST_STATE_VERSION;
>   if (vcpu->arch.nested) {
> -- 
> 2.35.1
> 
> 


Re: [PATCH 20/30] panic: Add the panic informational notifier list

2022-04-27 Thread Paul E. McKenney
On Wed, Apr 27, 2022 at 07:49:14PM -0300, Guilherme G. Piccoli wrote:
> The goal of this new panic notifier is to allow its users to
> register callbacks to run earlier in the panic path than they
> currently do. This aims at informational mechanisms, like dumping
> kernel offsets and showing device error data (in case it's simple
> registers reading, for example) as well as mechanisms to disable
> log flooding (like hung_task detector / RCU warnings) and the
> tracing dump_on_oops (when enabled).
> 
> Any (non-invasive) information that should be provided before
> kmsg_dump() as well as log flooding preventing code should fit
> here, as long it offers relatively low risk for kdump.
> 
> For now, the patch is almost a no-op, although it changes a bit
> the ordering in which some panic notifiers are executed - specially
> affected by this are the notifiers responsible for disabling the
> hung_task detector / RCU warnings, which now run first. In a
> subsequent patch, the panic path will be refactored, then the
> panic informational notifiers will effectively run earlier,
> before ksmg_dump() (and usually before kdump as well).
> 
> We also defer documenting it all properly in the subsequent
> refactor patch. Finally, while at it, we removed some useless
> header inclusions too.
> 
> Cc: Benjamin Herrenschmidt 
> Cc: Catalin Marinas 
> Cc: Florian Fainelli 
> Cc: Frederic Weisbecker 
> Cc: "H. Peter Anvin" 
> Cc: Hari Bathini 
> Cc: Joel Fernandes 
> Cc: Jonathan Hunter 
> Cc: Josh Triplett 
> Cc: Lai Jiangshan 
> Cc: Leo Yan 
> Cc: Mathieu Desnoyers 
> Cc: Mathieu Poirier 
> Cc: Michael Ellerman 
> Cc: Mike Leach 
> Cc: Mikko Perttunen 
> Cc: Neeraj Upadhyay 
> Cc: Nicholas Piggin 
> Cc: Paul Mackerras 
> Cc: Suzuki K Poulose 
> Cc: Thierry Reding 
> Cc: Thomas Bogendoerfer 
> Signed-off-by: Guilherme G. Piccoli 

>From an RCU perspective:

Acked-by: Paul E. McKenney 

> ---
>  arch/arm64/kernel/setup.c | 2 +-
>  arch/mips/kernel/relocate.c   | 2 +-
>  arch/powerpc/kernel/setup-common.c| 2 +-
>  arch/x86/kernel/setup.c   | 2 +-
>  drivers/bus/brcmstb_gisb.c| 2 +-
>  drivers/hwtracing/coresight/coresight-cpu-debug.c | 4 ++--
>  drivers/soc/tegra/ari-tegra186.c  | 3 ++-
>  include/linux/panic_notifier.h| 1 +
>  kernel/hung_task.c| 3 ++-
>  kernel/panic.c| 4 
>  kernel/rcu/tree.c | 1 -
>  kernel/rcu/tree_stall.h   | 3 ++-
>  kernel/trace/trace.c  | 2 +-
>  13 files changed, 19 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> index 3505789cf4bd..ac2c7e8c9c6a 100644
> --- a/arch/arm64/kernel/setup.c
> +++ b/arch/arm64/kernel/setup.c
> @@ -444,7 +444,7 @@ static struct notifier_block arm64_panic_block = {
>  
>  static int __init register_arm64_panic_block(void)
>  {
> - atomic_notifier_chain_register(&panic_notifier_list,
> + atomic_notifier_chain_register(&panic_info_list,
>  &arm64_panic_block);
>   return 0;
>  }
> diff --git a/arch/mips/kernel/relocate.c b/arch/mips/kernel/relocate.c
> index 56b51de2dc51..650811f2436a 100644
> --- a/arch/mips/kernel/relocate.c
> +++ b/arch/mips/kernel/relocate.c
> @@ -459,7 +459,7 @@ static struct notifier_block kernel_location_notifier = {
>  
>  static int __init register_kernel_offset_dumper(void)
>  {
> - atomic_notifier_chain_register(&panic_notifier_list,
> + atomic_notifier_chain_register(&panic_info_list,
>  &kernel_location_notifier);
>   return 0;
>  }
> diff --git a/arch/powerpc/kernel/setup-common.c 
> b/arch/powerpc/kernel/setup-common.c
> index 1468c3937bf4..d04b8bf8dbc7 100644
> --- a/arch/powerpc/kernel/setup-common.c
> +++ b/arch/powerpc/kernel/setup-common.c
> @@ -757,7 +757,7 @@ void __init setup_panic(void)
>  &ppc_fadump_block);
>  
>   if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && kaslr_offset() > 0)
> - atomic_notifier_chain_register(&panic_notifier_list,
> + atomic_notifier_chain_register(&panic_info_list,
>  &kernel_offset_notifier);
>  
>   /* Low-level platform-specific routines that should run on panic */
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index c95b9ac5a457..599b25346964 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -1266,7 +1266,7 @@ static struct notifier_block kernel_offset_notifier = {
>  
>  static int __init register_kernel_offset_dumper(void)
>  {
> - atomic_notifier_chain_register(&panic_notifier_list,
> + atomic_notifier_chain_register(&panic_info_list,
>   &kernel_offset_notifier);
> 

[PATCH 00/30] The panic notifiers refactor

2022-04-27 Thread Guilherme G. Piccoli
Hey folks, this is an attempt to improve/refactor the dated panic notifiers
infrastructure. This is strongly based in a suggestion made by Pter Mladek [0]
some time ago, and it's finally ready. Below I'll detail the patch ordering,
testing made, etc.
First, a bit about the reason behind this.

The panic notifiers list is an infrastructure that allows callbacks to execute
during panic time. Happens that anybody can add functions there, no ordering
is enforced (by default) and the decision to execute or not such notifiers
before kdump may lead to high risk of failure in crash scenarios - default is
not to execute any of them. There is a parameter acting as a switch for that.
But some architectures require some notifiers, so..it's messy.

The suggestion from Petr came after a patch submission to add a notifiers
filter, allowing the notifiers selection by function name, which was welcomed
by some people, but not by Petr, which claimed the code should indeed have a
refactor - and it made a lot of sense, his suggestion makes code more clear
and reliable.

So, this series might be split in 3 portions:

Part 1: the first 18 patches are mostly fixes (one or two might be considered
improvements), mostly replacing spinlocks/mutexes with safer alternatives for
atomic contexts, like spin_trylock, etc. We also focused on commenting
everything that is possible and clean-up code.

Part 2, the core: patches 19-25 are the main refactor, which splits the panic
notifiers list in three, introduce the concept of panic notifier level and
clean-up and highly comment the code, effectively leading to a more reliable
and clear, yet highly customizable panic path.

Part 3: The remaining 5 patches are fixes that _require the main refactor_
patches, they don't make sense without the core changes - but again, these are
small fixes and not part of the main goal of refactoring the panic code.

I've tried my best to make the patches the more "bisectable" as possible, so
they tend to be self-contained and easy to backport (specially patches from
part 1). Notice that the series is *based on 5.18-rc4* - usually a refactor
like this would be based on linux-next, but since we have many fixes in the
series, I kept it based on mainline tree. Of course I could change that in a
subsequent iteration, if desired.

Since this touches multiple architectures and drivers, it's very difficult to
test it really (by executing all touched code). So, my tests split in two
approaches: build tests and real tests, that involves panic triggering with
and without kdump, changing panic notifiers level, etc.

Build tests (using cross-compilers): alpha, arm, arm64, mips (sgi 22 and 32),
parisc, s390, sparc, um, x86_64 (couldn't get a functional xtensa cross
compiler).

Real/full tests: x86_64 (Hyper-V and QEMU guests) + PowerPC (pseries guest).

Here is the link with the .config files used: 
https://people.igalia.com/gpiccoli/panic_notifiers_configs/
(tried my best to build all the affected code).

Finally, a bit about my CCing strategy: I've included everybody present in the
original thread [0] plus some maintainers and other interested parties as CC
in the full series. But the patches have individual CC lists, for people that
are definitely related to them but might not care much for the whole series;
nevertheless, _everybody_ mentioned at least once in some patch is CCed in this
cover-letter. Hopefully I didn't forget to include anybody - all the mailing
lists were CCed in the whole series. Apologies in advance if (a) you received
emails you didn't want to or, (b) I forgot to include you but it was something
considered interesting by you.

Thanks in advance for reviews / comments / suggestions!
Cheers,

Guilherme

[0] https://lore.kernel.org/lkml/YfPxvzSzDLjO5ldp@alley/

Guilherme G. Piccoli (30):
  x86/crash,reboot: Avoid re-disabling VMX in all CPUs on crash/restart
  ARM: kexec: Disable IRQs/FIQs also on crash CPUs shutdown path
  notifier: Add panic notifiers info and purge trailing whitespaces
  firmware: google: Convert regular spinlock into trylock on panic path
  misc/pvpanic: Convert regular spinlock into trylock on panic path
  soc: bcm: brcmstb: Document panic notifier action and remove useless header
  mips: ip22: Reword PANICED to PANICKED and remove useless header
  powerpc/setup: Refactor/untangle panic notifiers
  coresight: cpu-debug: Replace mutex with mutex_trylock on panic notifier
  alpha: Clean-up the panic notifier code
  um: Improve panic notifiers consistency and ordering
  parisc: Replace regular spinlock with spin_trylock on panic path
  s390/consoles: Improve panic notifiers reliability
  panic: Properly identify the panic event to the notifiers' callbacks
  bus: brcmstb_gisb: Clean-up panic/die notifiers
  drivers/hv/vmbus, video/hyperv_fb: Untangle and refactor Hyper-V panic 
notifiers
  tracing: Improve panic/die notifiers
  notifier: Show function names on notifier routines if DEBUG_NOTIFIERS is set
  panic: Add the panic hypervisor

[PATCH 03/30] notifier: Add panic notifiers info and purge trailing whitespaces

2022-04-27 Thread Guilherme G. Piccoli
Although many notifiers are mentioned in the comments, the panic
notifiers infrastructure is not. Also, the file contains some
trailing whitespaces. This commit fix both issues.

Cc: Arjan van de Ven 
Cc: Cong Wang 
Cc: Sebastian Andrzej Siewior 
Cc: Valentin Schneider 
Cc: Xiaoming Ni 
Signed-off-by: Guilherme G. Piccoli 
---
 include/linux/notifier.h | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/linux/notifier.h b/include/linux/notifier.h
index 87069b8459af..0589896fc7bd 100644
--- a/include/linux/notifier.h
+++ b/include/linux/notifier.h
@@ -201,12 +201,12 @@ static inline int notifier_to_errno(int ret)
 
 /*
  * Declared notifiers so far. I can imagine quite a few more chains
- * over time (eg laptop power reset chains, reboot chain (to clean 
+ * over time (eg laptop power reset chains, reboot chain (to clean
  * device units up), device [un]mount chain, module load/unload chain,
- * low memory chain, screenblank chain (for plug in modular 
screenblankers) 
+ * low memory chain, screenblank chain (for plug in modular screenblankers)
  * VC switch chains (for loadable kernel svgalib VC switch helpers) etc...
  */
- 
+
 /* CPU notfiers are defined in include/linux/cpu.h. */
 
 /* netdevice notifiers are defined in include/linux/netdevice.h */
@@ -217,6 +217,8 @@ static inline int notifier_to_errno(int ret)
 
 /* Virtual Terminal events are defined in include/linux/vt.h. */
 
+/* Panic notifiers are defined in include/linux/panic_notifier.h. */
+
 #define NETLINK_URELEASE   0x0001  /* Unicast netlink socket released */
 
 /* Console keyboard events.
-- 
2.36.0



[PATCH 06/30] soc: bcm: brcmstb: Document panic notifier action and remove useless header

2022-04-27 Thread Guilherme G. Piccoli
The panic notifier of this driver is very simple code-wise, just a memory
write to a special position with some numeric code. But this is not clear
from the semantic point-of-view, and there is no public documentation
about that either.

After discussing this in the mailing-lists [0] and having Florian explained
it very well, this patch just document that in the code for the future
generations asking the same questions. Also, it removes a useless header.

[0] https://lore.kernel.org/lkml/781cafb0-8d06-8b56-907a-5175c2da1...@gmail.com

Fixes: 0b741b8234c8 ("soc: bcm: brcmstb: Add support for S2/S3/S5 suspend 
states (ARM)")
Cc: Brian Norris 
Cc: Doug Berger 
Cc: Florian Fainelli 
Cc: Justin Chen 
Cc: Lee Jones 
Cc: Markus Mayer 
Signed-off-by: Guilherme G. Piccoli 
---
 drivers/soc/bcm/brcmstb/pm/pm-arm.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/soc/bcm/brcmstb/pm/pm-arm.c 
b/drivers/soc/bcm/brcmstb/pm/pm-arm.c
index 3cbb165d6e30..870686ae042b 100644
--- a/drivers/soc/bcm/brcmstb/pm/pm-arm.c
+++ b/drivers/soc/bcm/brcmstb/pm/pm-arm.c
@@ -25,7 +25,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -664,7 +663,20 @@ static void __iomem *brcmstb_ioremap_match(const struct 
of_device_id *matches,
 
return of_io_request_and_map(dn, index, dn->full_name);
 }
-
+/*
+ * The AON is a small domain in the SoC that can retain its state across
+ * various system wide sleep states and specific reset conditions; the
+ * AON DATA RAM is a small RAM of a few words (< 1KB) which can store
+ * persistent information across such events.
+ *
+ * The purpose of the below panic notifier is to help with notifying
+ * the bootloader that a panic occurred and so that it should try its
+ * best to preserve the DRAM contents holding that buffer for recovery
+ * by the kernel as opposed to wiping out DRAM clean again.
+ *
+ * Reference: comment from Florian Fainelli, at
+ * https://lore.kernel.org/lkml/781cafb0-8d06-8b56-907a-5175c2da1...@gmail.com
+ */
 static int brcmstb_pm_panic_notify(struct notifier_block *nb,
unsigned long action, void *data)
 {
-- 
2.36.0



[PATCH 01/30] x86/crash, reboot: Avoid re-disabling VMX in all CPUs on crash/restart

2022-04-27 Thread Guilherme G. Piccoli
In the panic path we have a list of functions to be called, the panic
notifiers - such callbacks perform various actions in the machine's
last breath, and sometimes users want them to run before kdump. We
have the parameter "crash_kexec_post_notifiers" for that. When such
parameter is used, the function "crash_smp_send_stop()" is executed
to poweroff all secondary CPUs through the NMI-shootdown mechanism;
part of this process involves disabling virtualization features in
all CPUs (except the main one).

Now, in the emergency restart procedure we have also a way of
disabling VMX in all CPUs, using the same NMI-shootdown mechanism;
what happens though is that in case we already NMI-disabled all CPUs,
the emergency restart fails due to a second addition of the same items
in the NMI list, as per the following log output:

sysrq: Trigger a crash
Kernel panic - not syncing: sysrq triggered crash
[...]
Rebooting in 2 seconds..
list_add double add: new=, prev=, next=.
[ cut here ]
kernel BUG at lib/list_debug.c:29!
invalid opcode:  [#1] PREEMPT SMP PTI

In order to reproduce the problem, users just need to set the kernel
parameter "crash_kexec_post_notifiers" *without* kdump set in any
system with the VMX feature present.

Since there is no benefit in re-disabling VMX in all CPUs in case
it was already done, this patch prevents that by guarding the restart
routine against doubly issuing NMIs unnecessarily. Notice we still
need to disable VMX locally in the emergency restart.

Fixes: ed72736183c4 ("x86/reboot: Force all cpus to exit VMX root if VMX is 
supported)
Fixes: 0ee59413c967 ("x86/panic: replace smp_send_stop() with kdump friendly 
version in panic path")
Cc: David P. Reed 
Cc: Hidehiro Kawai 
Cc: Paolo Bonzini 
Cc: Sean Christopherson 
Signed-off-by: Guilherme G. Piccoli 
---
 arch/x86/include/asm/cpu.h |  1 +
 arch/x86/kernel/crash.c|  8 
 arch/x86/kernel/reboot.c   | 14 --
 3 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index 86e5e4e26fcb..b6a9062d387f 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -36,6 +36,7 @@ extern int _debug_hotplug_cpu(int cpu, int action);
 #endif
 #endif
 
+extern bool crash_cpus_stopped;
 int mwait_usable(const struct cpuinfo_x86 *);
 
 unsigned int x86_family(unsigned int sig);
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index e8326a8d1c5d..71dd1a990e8d 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -42,6 +42,8 @@
 #include 
 #include 
 
+bool crash_cpus_stopped;
+
 /* Used while preparing memory map entries for second kernel */
 struct crash_memmap_data {
struct boot_params *params;
@@ -108,9 +110,7 @@ void kdump_nmi_shootdown_cpus(void)
 /* Override the weak function in kernel/panic.c */
 void crash_smp_send_stop(void)
 {
-   static int cpus_stopped;
-
-   if (cpus_stopped)
+   if (crash_cpus_stopped)
return;
 
if (smp_ops.crash_stop_other_cpus)
@@ -118,7 +118,7 @@ void crash_smp_send_stop(void)
else
smp_send_stop();
 
-   cpus_stopped = 1;
+   crash_cpus_stopped = true;
 }
 
 #else
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index fa700b46588e..2fc42b8402ac 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -589,8 +589,18 @@ static void native_machine_emergency_restart(void)
int orig_reboot_type = reboot_type;
unsigned short mode;
 
-   if (reboot_emergency)
-   emergency_vmx_disable_all();
+   /*
+* We can reach this point in the end of panic path, having
+* NMI-disabled all secondary CPUs. This process involves
+* disabling the CPU virtualization technologies, so if that
+* is the case, we only miss disabling the local CPU VMX...
+*/
+   if (reboot_emergency) {
+   if (!crash_cpus_stopped)
+   emergency_vmx_disable_all();
+   else
+   cpu_emergency_vmxoff();
+   }
 
tboot_shutdown(TB_SHUTDOWN_REBOOT);
 
-- 
2.36.0



[PATCH 04/30] firmware: google: Convert regular spinlock into trylock on panic path

2022-04-27 Thread Guilherme G. Piccoli
Currently the gsmi driver registers a panic notifier as well as
reboot and die notifiers. The callbacks registered are called in
atomic and very limited context - for instance, panic disables
preemption, local IRQs and all other CPUs that aren't running the
current panic function.

With that said, taking a spinlock in this scenario is a
dangerous invitation for a deadlock scenario. So, we fix
that in this commit by changing the regular spinlock with
a trylock, which is a safer approach.

Fixes: 74c5b31c6618 ("driver: Google EFI SMI")
Cc: Ard Biesheuvel 
Cc: David Gow 
Cc: Evan Green 
Cc: Julius Werner 
Signed-off-by: Guilherme G. Piccoli 
---
 drivers/firmware/google/gsmi.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/firmware/google/gsmi.c b/drivers/firmware/google/gsmi.c
index adaa492c3d2d..b01ed02e4a87 100644
--- a/drivers/firmware/google/gsmi.c
+++ b/drivers/firmware/google/gsmi.c
@@ -629,7 +629,10 @@ static int gsmi_shutdown_reason(int reason)
if (saved_reason & (1 << reason))
return 0;
 
-   spin_lock_irqsave(&gsmi_dev.lock, flags);
+   if (!spin_trylock_irqsave(&gsmi_dev.lock, flags)) {
+   rc = -EBUSY;
+   goto out;
+   }
 
saved_reason |= (1 << reason);
 
@@ -646,6 +649,7 @@ static int gsmi_shutdown_reason(int reason)
 
spin_unlock_irqrestore(&gsmi_dev.lock, flags);
 
+out:
if (rc < 0)
printk(KERN_ERR "gsmi: Log Shutdown Reason failed\n");
else
-- 
2.36.0



[PATCH 02/30] ARM: kexec: Disable IRQs/FIQs also on crash CPUs shutdown path

2022-04-27 Thread Guilherme G. Piccoli
Currently the regular CPU shutdown path for ARM disables IRQs/FIQs
in the secondary CPUs - smp_send_stop() calls ipi_cpu_stop(), which
is responsible for that. This makes sense, since we're turning off
such CPUs, putting them in an endless busy-wait loop.

Problem is that there is an alternative path for disabling CPUs,
in the form of function crash_smp_send_stop(), used for kexec/panic
paths. This functions relies in a SMP call that also triggers a
busy-wait loop [at machine_crash_nonpanic_core()], but *without*
disabling interrupts. This might lead to odd scenarios, like early
interrupts in the boot of kexec'd kernel or even interrupts in
other CPUs while the main one still works in the panic path and
assumes all secondary CPUs are (really!) off.

This patch mimics the ipi_cpu_stop() interrupt disable mechanism
in the crash CPU shutdown path, hence disabling IRQs/FIQs in all
secondary CPUs in the kexec/panic path as well.

Cc: Marc Zyngier 
Cc: Russell King 
Signed-off-by: Guilherme G. Piccoli 
---
 arch/arm/kernel/machine_kexec.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
index f567032a09c0..ef788ee00519 100644
--- a/arch/arm/kernel/machine_kexec.c
+++ b/arch/arm/kernel/machine_kexec.c
@@ -86,6 +86,9 @@ void machine_crash_nonpanic_core(void *unused)
set_cpu_online(smp_processor_id(), false);
atomic_dec(&waiting_for_crash_ipi);
 
+   local_fiq_disable();
+   local_irq_disable();
+
while (1) {
cpu_relax();
wfe();
-- 
2.36.0



[PATCH 05/30] misc/pvpanic: Convert regular spinlock into trylock on panic path

2022-04-27 Thread Guilherme G. Piccoli
The pvpanic driver relies on panic notifiers to execute a callback
on panic event. Such function is executed in atomic context - the
panic function disables local IRQs, preemption and all other CPUs
that aren't running the panic code.

With that said, it's dangerous to use regular spinlocks in such path,
as introduced by commit b3c0f8774668 ("misc/pvpanic: probe multiple instances").
This patch fixes that by replacing regular spinlocks with the trylock
safer approach.

It also fixes an old comment (about a long gone framebuffer code) and
the notifier priority - we should execute hypervisor notifiers early,
deferring this way the panic action to the hypervisor, as expected by
the users that are setting up pvpanic.

Fixes: b3c0f8774668 ("misc/pvpanic: probe multiple instances")
Cc: Christophe JAILLET 
Cc: Mihai Carabas 
Cc: Shile Zhang 
Cc: Wang ShaoBo 
Cc: zhenwei pi 
Signed-off-by: Guilherme G. Piccoli 
---
 drivers/misc/pvpanic/pvpanic.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/misc/pvpanic/pvpanic.c b/drivers/misc/pvpanic/pvpanic.c
index 4b8f1c7d726d..049a12006348 100644
--- a/drivers/misc/pvpanic/pvpanic.c
+++ b/drivers/misc/pvpanic/pvpanic.c
@@ -34,7 +34,9 @@ pvpanic_send_event(unsigned int event)
 {
struct pvpanic_instance *pi_cur;
 
-   spin_lock(&pvpanic_lock);
+   if (!spin_trylock(&pvpanic_lock))
+   return;
+
list_for_each_entry(pi_cur, &pvpanic_list, list) {
if (event & pi_cur->capability & pi_cur->events)
iowrite8(event, pi_cur->base);
@@ -55,9 +57,13 @@ pvpanic_panic_notify(struct notifier_block *nb, unsigned 
long code, void *unused
return NOTIFY_DONE;
 }
 
+/*
+ * Call our notifier very early on panic, deferring the
+ * action taken to the hypervisor.
+ */
 static struct notifier_block pvpanic_panic_nb = {
.notifier_call = pvpanic_panic_notify,
-   .priority = 1, /* let this called before broken drm_fb_helper() */
+   .priority = INT_MAX,
 };
 
 static void pvpanic_remove(void *param)
-- 
2.36.0



[PATCH 30/30] um: Avoid duplicate call to kmsg_dump()

2022-04-27 Thread Guilherme G. Piccoli
Currently the panic notifier panic_exit() calls kmsg_dump() and
some console flushing routines - this makes sense since such
panic notifier exits UserMode Linux and never returns.

Happens that after a panic refactor, kmsg_dump() is now always
called *before* the pre_reboot list of panic notifiers, in which
panic_exit() belongs, leading to a double call situation.

This patch changes that by removing such call from the panic
notifier, but leaving the console flushing calls since the
pre_reboot list still runs before console flushing on panic().

Cc: Anton Ivanov 
Cc: Johannes Berg 
Cc: Richard Weinberger 
Signed-off-by: Guilherme G. Piccoli 
---
 arch/um/kernel/um_arch.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/um/kernel/um_arch.c b/arch/um/kernel/um_arch.c
index fc6e443299da..651310e3e86f 100644
--- a/arch/um/kernel/um_arch.c
+++ b/arch/um/kernel/um_arch.c
@@ -241,7 +241,6 @@ static void __init uml_postsetup(void)
 static int panic_exit(struct notifier_block *self, unsigned long unused1,
  void *unused2)
 {
-   kmsg_dump(KMSG_DUMP_PANIC);
bust_spinlocks(1);
bust_spinlocks(0);
uml_exitcode = 1;
-- 
2.36.0



[PATCH 29/30] powerpc: ps3, pseries: Avoid duplicate call to kmsg_dump() on panic

2022-04-27 Thread Guilherme G. Piccoli
Currently both pseries and ps3 are platforms that define special
panic notifiers that run as callbacks inside powerpc generic panic
notifier. In both cases kmsg_dump() is called, and the reason seems
to be that both of these callbacks aims to effectively stop the
machine, so nothing would execute after that - hence, both force
a series of console flushing related operations, after calling
the kmsg dumpers.

Happens that recently the panic path was refactored, and now
kmsg_dump() is *certainly* called before the pre_reboot panic
notifiers, category in which both pseries/ps3 callbacks belong.
In other words: kmsg_dump() will execute twice in both platforms,
on panic path.

This patch prevents that by disabling the kmsg_dump() for both
platform's notifiers. But worth to notice that PowerNV still
has a legit use for executing kmsg_dump() in its unrecoverable
error path, so we rely in parameter passing to differentiate
both cases. Also, since the pre_reboot notifiers still run
earlier than console flushing routines, we kept that for
both pseries and ps3 platforms, only skipping kmsg_dump().

Fixes: 35adacd6fc48 ("powerpc/pseries, ps3: panic flush kernel messages before 
halting system")
Cc: Benjamin Herrenschmidt 
Cc: Hari Bathini 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Paul Mackerras 
Signed-off-by: Guilherme G. Piccoli 
---

We'd like to thanks specially the MiniCloud infrastructure [0] maintainers,
that allow us to test PowerPC code in a very complete, functional and FREE
environment.

[0] https://openpower.ic.unicamp.br/minicloud

 arch/powerpc/include/asm/bug.h | 2 +-
 arch/powerpc/kernel/traps.c| 6 --
 arch/powerpc/platforms/powernv/opal.c  | 2 +-
 arch/powerpc/platforms/ps3/setup.c | 2 +-
 arch/powerpc/platforms/pseries/setup.c | 2 +-
 5 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
index ecbae1832de3..49e5f6f86869 100644
--- a/arch/powerpc/include/asm/bug.h
+++ b/arch/powerpc/include/asm/bug.h
@@ -166,7 +166,7 @@ extern void die(const char *, struct pt_regs *, long);
 void die_mce(const char *str, struct pt_regs *regs, long err);
 extern bool die_will_crash(void);
 extern void panic_flush_kmsg_start(void);
-extern void panic_flush_kmsg_end(void);
+extern void panic_flush_kmsg_end(bool dump);
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __KERNEL__ */
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index a08bb7cefdc5..837a5ed98d62 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -169,9 +169,11 @@ extern void panic_flush_kmsg_start(void)
bust_spinlocks(1);
 }
 
-extern void panic_flush_kmsg_end(void)
+extern void panic_flush_kmsg_end(bool dump)
 {
-   kmsg_dump(KMSG_DUMP_PANIC);
+   if (dump)
+   kmsg_dump(KMSG_DUMP_PANIC);
+
bust_spinlocks(0);
debug_locks_off();
console_flush_on_panic(CONSOLE_FLUSH_PENDING);
diff --git a/arch/powerpc/platforms/powernv/opal.c 
b/arch/powerpc/platforms/powernv/opal.c
index 55a8fbfdb5b2..d172ceedece2 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -641,7 +641,7 @@ void __noreturn pnv_platform_error_reboot(struct pt_regs 
*regs, const char *msg)
show_regs(regs);
smp_send_stop();
 
-   panic_flush_kmsg_end();
+   panic_flush_kmsg_end(true);
 
/*
 * Don't bother to shut things down because this will
diff --git a/arch/powerpc/platforms/ps3/setup.c 
b/arch/powerpc/platforms/ps3/setup.c
index 3de9145c20bc..7cb78e508fb3 100644
--- a/arch/powerpc/platforms/ps3/setup.c
+++ b/arch/powerpc/platforms/ps3/setup.c
@@ -102,7 +102,7 @@ static void ps3_panic(char *str)
printk("   System does not reboot automatically.\n");
printk("   Please press POWER button.\n");
printk("\n");
-   panic_flush_kmsg_end();
+   panic_flush_kmsg_end(false);
 
while(1)
lv1_pause(1);
diff --git a/arch/powerpc/platforms/pseries/setup.c 
b/arch/powerpc/platforms/pseries/setup.c
index 955ff8aa1644..d6eea473eafd 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -856,7 +856,7 @@ static void __init pSeries_setup_arch(void)
 
 static void pseries_panic(char *str)
 {
-   panic_flush_kmsg_end();
+   panic_flush_kmsg_end(false);
rtas_os_term(str);
 }
 
-- 
2.36.0



[PATCH 28/30] panic: Unexport crash_kexec_post_notifiers

2022-04-27 Thread Guilherme G. Piccoli
There is no users anymore of this variable that requires
it to be "exported" in the headers; also, it was deprecated
by the kernel parameter "panic_notifiers_level".

Signed-off-by: Guilherme G. Piccoli 
---
 include/linux/panic.h  | 2 --
 include/linux/panic_notifier.h | 1 -
 2 files changed, 3 deletions(-)

diff --git a/include/linux/panic.h b/include/linux/panic.h
index 34175d0188d0..d301db07a8af 100644
--- a/include/linux/panic.h
+++ b/include/linux/panic.h
@@ -34,8 +34,6 @@ extern int sysctl_panic_on_rcu_stall;
 extern int sysctl_max_rcu_stall_to_panic;
 extern int sysctl_panic_on_stackoverflow;
 
-extern bool crash_kexec_post_notifiers;
-
 /*
  * panic_cpu is used for synchronizing panic() and crash_kexec() execution. It
  * holds a CPU number which is executing panic() currently. A value of
diff --git a/include/linux/panic_notifier.h b/include/linux/panic_notifier.h
index b5041132321d..8fda7045e2f7 100644
--- a/include/linux/panic_notifier.h
+++ b/include/linux/panic_notifier.h
@@ -11,7 +11,6 @@ extern struct atomic_notifier_head panic_pre_reboot_list;
 extern struct atomic_notifier_head panic_post_reboot_list;
 
 bool panic_notifiers_before_kdump(void);
-extern bool crash_kexec_post_notifiers;
 
 enum panic_notifier_val {
PANIC_UNUSED,
-- 
2.36.0



[PATCH 27/30] powerpc: Do not force all panic notifiers to execute before kdump

2022-04-27 Thread Guilherme G. Piccoli
Commit 06e629c25daa ("powerpc/fadump: Fix inaccurate CPU state info in
vmcore generated with panic") introduced a hardcoded setting of kernel
parameter "crash_kexec_post_notifiers", effectively forcing all the
panic notifiers to execute earlier in the panic path, before kdump.

The reason for that was a fadump issue on collecting data accurately,
due to smp_send_stop() setting all CPUs offline, so the net effect
desired with this change was to avoid calling the regular CPU
shutdown function, and instead rely on crash_smp_send_stop(), which
copes fine with fadump. The collateral effect was to increase the
risk for kdump if fadump is not used, since it forces all panic
notifiers to execute early, before kdump.

Happens that, after a panic refactor, crash_smp_send_stop() is
now used by default in the panic path, so there is no reason to
mess with the notifiers ordering (which was also improved in the
refactor) from within arch code.

Fixes: 06e629c25daa ("powerpc/fadump: Fix inaccurate CPU state info in vmcore 
generated with panic")
Cc: Benjamin Herrenschmidt 
Cc: Hari Bathini 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Paul Mackerras 
Signed-off-by: Guilherme G. Piccoli 
---

We'd like to thanks specially the MiniCloud infrastructure [0] maintainers,
that allow us to test PowerPC code in a very complete, functional and FREE
environment.

[0] https://openpower.ic.unicamp.br/minicloud

 arch/powerpc/kernel/fadump.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 65562c4a0a69..35ae8c09af66 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -1649,14 +1649,6 @@ int __init setup_fadump(void)
register_fadump();
}
 
-   /*
-* In case of panic, fadump is triggered via ppc_panic_event()
-* panic notifier. Setting crash_kexec_post_notifiers to 'true'
-* lets panic() function take crash friendly path before panic
-* notifiers are invoked.
-*/
-   crash_kexec_post_notifiers = true;
-
return 1;
 }
 /*
-- 
2.36.0



[PATCH 26/30] Drivers: hv: Do not force all panic notifiers to execute before kdump

2022-04-27 Thread Guilherme G. Piccoli
Since commit a11589563e96 ("x86/Hyper-V: Report crash register
data or kmsg before running crash kernel") Hyper-V forcibly sets
the kernel parameter "crash_kexec_post_notifiers"; with that, it
did enforce the execution of *all* panic notifiers before kdump.
The main reason behind that is that Hyper-V has an hypervisor
notification mechanism that has the ability of warning the
hypervisor when the guest panics.

Happens that after the panic notifiers refactor, we now have 3 lists
and a level mechanism that defines the ordering of the notifiers
execution with regards to kdump. And for Hyper-V, the specific
notifier to inform the hypervisor about a panic lies in the first
list, which *by default* is set to execute before kdump. Hence,
this patch removes the hardcoded setting, effectively reverting
the aforementioned commit.

One of the problems with the forced approach was greatly exposed by
commit d57d6fe5bf34 ("drivers: hv: log when enabling 
crash_kexec_post_notifiers")
which ended-up confusing the user that didn't expect the notifiers
to execute before kdump, since it's a user setting and wasn't
enabled by such user. With the patch hereby proposed, that kind
of issue doesn't happen anymore, the panic notifiers level is
well-documented and users can expect a predictable behavior.

Fixes: a11589563e96 ("x86/Hyper-V: Report crash register data or kmsg before 
running crash kernel")
Fixes: d57d6fe5bf34 ("drivers: hv: log when enabling crash_kexec_post_notifiers"
Cc: Andrea Parri (Microsoft) 
Cc: Dexuan Cui 
Cc: Haiyang Zhang 
Cc: "K. Y. Srinivasan" 
Cc: Michael Kelley 
Cc: Stephen Brennan 
Cc: Stephen Hemminger 
Cc: Tianyu Lan 
Cc: Wei Liu 
Tested-by: Fabio A M Martins 
Signed-off-by: Guilherme G. Piccoli 
---

Special thanks to Michael Kelley for the good information about the Hyper-V
panic path in email threads some months ago, and to Fabio for the testing
performed.

 drivers/hv/hv_common.c | 12 
 1 file changed, 12 deletions(-)

diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index ae68298c0dca..af59793de523 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -73,18 +73,6 @@ int __init hv_common_init(void)
 {
int i;
 
-   /*
-* Hyper-V expects to get crash register data or kmsg when
-* crash enlightment is available and system crashes. Set
-* crash_kexec_post_notifiers to be true to make sure that
-* calling crash enlightment interface before running kdump
-* kernel.
-*/
-   if (ms_hyperv.misc_features & HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE) {
-   crash_kexec_post_notifiers = true;
-   pr_info("Hyper-V: enabling crash_kexec_post_notifiers\n");
-   }
-
/*
 * Allocate the per-CPU state for the hypercall input arg.
 * If this allocation fails, we will not be able to setup
-- 
2.36.0



[PATCH 25/30] panic, printk: Add console flush parameter and convert panic_print to a notifier

2022-04-27 Thread Guilherme G. Piccoli
Currently the parameter "panic_print" relies in a function called
directly on panic path; one of the flags the users can set for
panic_print triggers a console replay mechanism, to show the
entire kernel log buffer (from the beginning) in a panic event.

Two problems with that: the dual nature of the panic_print
isn't really appropriate, the function was originally meant
to allow users dumping system information on panic events,
and was "overridden" to also force a console flush of the full
kernel log buffer. It also turns the code a bit more complex
and duplicate than it needs to be.

This patch proposes 2 changes: first, we decouple panic_print
from the console flushing mechanism, in the form of a new kernel
core parameter (panic_console_replay); we kept the functionality
on panic_print to avoid userspace breakage, although we comment
in both code and documentation that this panic_print usage is
deprecated.

We converted panic_print function to a panic notifier too, adding
it on the panic informational notifier list, executed as the final
callback. This allows a more clear code and makes sense, as
panic_print_sys_info() is really a panic-time only function.
We also moved its code to kernel/printk.c, it seems to make more
sense given it's related to printing stuff.

Suggested-by: Petr Mladek 
Signed-off-by: Guilherme G. Piccoli 
---
 .../admin-guide/kernel-parameters.txt | 12 +++-
 Documentation/admin-guide/sysctl/kernel.rst   |  5 +-
 include/linux/console.h   |  2 +
 include/linux/panic.h |  1 -
 include/linux/printk.h|  1 +
 kernel/panic.c| 51 +++
 kernel/printk/printk.c| 62 +++
 7 files changed, 87 insertions(+), 47 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 8d3524060ce3..c99da8b2b216 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3791,6 +3791,14 @@
timeout < 0: reboot immediately
Format: 
 
+   panic_console_replay
+   [KNL] Force a kernel log replay in the console on
+   panic event. Notice that there is already a flush
+   mechanism for pending messages; this option is meant
+   for users that wish to replay the *full* buffer.
+   It deprecates the bit 5 setting on "panic_print",
+   both having the same functionality.
+
panic_notifiers_level=
[KNL] Set the panic notifiers execution order.
Format: 
@@ -3825,12 +3833,14 @@
bit 2: print timer info
bit 3: print locks info if CONFIG_LOCKDEP is on
bit 4: print ftrace buffer
-   bit 5: print all printk messages in buffer
+   bit 5: print all printk messages in buffer (DEPRECATED)
bit 6: print all CPUs backtrace (if available in the 
arch)
*Be aware* that this option may print a _lot_ of lines,
so there are risks of losing older messages in the log.
Use this option carefully, maybe worth to setup a
bigger log buffer with "log_buf_len" along with this.
+   Also, notice that bit 5 was deprecated in favor of the
+   parameter "panic_console_replay".
 
panic_on_taint= Bitmask for conditionally calling panic() in add_taint()
Format: [,nousertaint]
diff --git a/Documentation/admin-guide/sysctl/kernel.rst 
b/Documentation/admin-guide/sysctl/kernel.rst
index 1144ea3229a3..17b293a0e566 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -763,10 +763,13 @@ bit 1  print system memory info
 bit 2  print timer info
 bit 3  print locks info if ``CONFIG_LOCKDEP`` is on
 bit 4  print ftrace buffer
-bit 5  print all printk messages in buffer
+bit 5  print all printk messages in buffer (DEPRECATED)
 bit 6  print all CPUs backtrace (if available in the arch)
 =  
 
+Notice that bit 5 was deprecated in favor of kernel core parameter
+"panic_console_replay" (see kernel-parameters.txt documentation).
+
 So for example to print tasks and memory info on panic, user can::
 
   echo 3 > /proc/sys/kernel/panic_print
diff --git a/include/linux/console.h b/include/linux/console.h
index 7cd758a4f44e..351c14f623ad 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -169,6 +169,8 @@ enum con_flush_mode {
CONSOLE_REPLAY_ALL,
 };
 
+extern bool panic_console_replay;
+
 extern int add_preferred_console(char *name, int idx, 

[PATCH 24/30] panic: Refactor the panic path

2022-04-27 Thread Guilherme G. Piccoli
The panic() function is somewhat convoluted - a lot of changes were
made over the years, adding comments that might be misleading/outdated
now, it has a code structure that is a bit complex to follow, with
lots of conditionals, for example. The panic notifier list is something
else - a single list, with multiple callbacks of different purposes,
that run in a non-deterministic order and may affect hardly kdump
reliability - see the "crash_kexec_post_notifiers" workaround-ish flag.

This patch proposes a major refactor on the panic path based on Petr's
idea [0] - basically we split the notifiers list in three, having a set
of different call points in the panic path. Below a list of changes
proposed in this patch, culminating in the panic notifiers level
concept:

(a) First of all, we improved comments all over the function
and removed useless variables / includes. Also, as part of this
clean-up we concentrate the console flushing functions in a helper.

(b) As mentioned before, there is a split of the panic notifier list
in three, based on the purpose of the callback. The code contains
good documentation in form of comments, but a summary of the three
lists follows:

- the hypervisor list aims low-risk procedures to inform hypervisors
or firmware about the panic event, also includes LED-related functions;

- the informational list contains callbacks that provide more details,
like kernel offset or trace dump (if enabled) and also includes the
callbacks aimed at reducing log pollution or warns, like the RCU and
hung task disable callbacks;

- finally, the pre_reboot list is the old notifier list renamed,
containing the more risky callbacks that didn't fit the previous
lists. There is also a 4th list (the post_reboot one), but it's not
related with the original list - it contains late time architecture
callbacks aimed at stopping the machine, for example.

The 3 notifiers lists execute in different moments, hypervisor being
the first, followed by informational and finally the pre_reboot list.

(c) But then, there is the ordering problem of the notifiers against
the crash_kernel() call - kdump must be as reliable as possible.
For that, a simple binary "switch" as "crash_kexec_post_notifiers"
is not enough, hence we introduce here concept of panic notifier
levels: there are 5 levels, from 0 (no notifier executes before
kdump) until 4 (all notifiers run before kdump); the default level
is 2, in which the hypervisor and (iff we have any kmsg dumper)
the informational notifiers execute before kdump.

The detailed documentation of the levels is present in code comments
and in the kernel-parameters.txt file; as an analogy with the previous
panic() implementation, the level 0 is exactly the same as the old
behavior of notifiers, running all after kdump, and the level 4 is
the same as "crash_kexec_post_notifiers=Y" (we kept this parameter as
a deprecated one).

(d) Finally, an important change made here: we now use only the
function "crash_smp_send_stop()" to shut all the secondary CPUs
in the panic path. Before, there was a case of using the regular
"smp_send_stop()", but the better approach is to simplify the
code and try to use the function which was created exclusively
for the panic path. Experiments showed that it works fine, and
code was very simplified with that.

Functional change is expected from this refactor, since now we
call some notifiers by default before kdump, but the goal here
besides code clean-up is to have a better panic path, more
reliable and deterministic, but also very customizable.

[0] https://lore.kernel.org/lkml/YfPxvzSzDLjO5ldp@alley/

Suggested-by: Petr Mladek 
Signed-off-by: Guilherme G. Piccoli 
---

Special thanks to Petr and Baoquan for the suggestion and feedback in a previous
email thread. There's some important design decisions that worth mentioning and
discussing:

* The default panic notifiers level is 2, based on Petr Mladek's suggestion,
which makes a lot of sense. Of course, this is customizable through the
parameter, but would be something worthwhile to have a KConfig option to set
the default level? It would help distros that want the old behavior
(no notifiers before kdump) as default.

* The implementation choice was to _avoid_ intricate if conditionals in the
panic path, which would _definitely_ be present with the panic notifiers levels
idea; so, instead of lots of if conditionals, the set/clear bits approach with
functions called in 2 points (but executing only in one of them) is much easier
to follow an was used here; the ordering helper function and the comments also
help a lot to avoid confusion (hopefully).

* Choice was to *always* use crash_smp_send_stop() instead of sometimes making
use of the regular smp_send_stop(); for most architectures they are the same,
including Xen (on x86). For the ones that override it, all should work fine,
in the powerpc case it's even more correct (see the subsequent patch
"powerpc: Do not force all panic notifiers to execute before 

[PATCH 23/30] printk: kmsg_dump: Introduce helper to inform number of dumpers

2022-04-27 Thread Guilherme G. Piccoli
Currently we don't have a way to check if there are dumpers set,
except counting the list members maybe. This patch introduces a very
simple helper to provide this information, by just keeping track of
registered/unregistered kmsg dumpers. It's going to be used on the
panic path in the subsequent patch.

Notice that the spinlock guarding kmsg_dumpers list also guards
increment/decrement of the dumper's counter, but there's no need
for that when reading the counter in the panic path, since that is
an atomic path and there's no other (planned) user.

Signed-off-by: Guilherme G. Piccoli 
---
 include/linux/kmsg_dump.h |  7 +++
 kernel/printk/printk.c| 14 ++
 2 files changed, 21 insertions(+)

diff --git a/include/linux/kmsg_dump.h b/include/linux/kmsg_dump.h
index 906521c2329c..abea1974bff8 100644
--- a/include/linux/kmsg_dump.h
+++ b/include/linux/kmsg_dump.h
@@ -65,6 +65,8 @@ bool kmsg_dump_get_buffer(struct kmsg_dump_iter *iter, bool 
syslog,
 
 void kmsg_dump_rewind(struct kmsg_dump_iter *iter);
 
+bool kmsg_has_dumpers(void);
+
 int kmsg_dump_register(struct kmsg_dumper *dumper);
 
 int kmsg_dump_unregister(struct kmsg_dumper *dumper);
@@ -91,6 +93,11 @@ static inline void kmsg_dump_rewind(struct kmsg_dump_iter 
*iter)
 {
 }
 
+static inline bool kmsg_has_dumpers(void)
+{
+   return false;
+}
+
 static inline int kmsg_dump_register(struct kmsg_dumper *dumper)
 {
return -EINVAL;
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index da03c15ecc89..e3a1c429fbbc 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3399,6 +3399,18 @@ EXPORT_SYMBOL(printk_timed_ratelimit);
 
 static DEFINE_SPINLOCK(dump_list_lock);
 static LIST_HEAD(dump_list);
+static int num_dumpers;
+
+/**
+ * kmsg_has_dumpers - inform if there is any kmsg dumper registered.
+ *
+ * Returns true if there's at least one registered dumper, or false
+ * if otherwise.
+ */
+bool kmsg_has_dumpers(void)
+{
+   return num_dumpers ? true : false;
+}
 
 /**
  * kmsg_dump_register - register a kernel log dumper.
@@ -3423,6 +3435,7 @@ int kmsg_dump_register(struct kmsg_dumper *dumper)
dumper->registered = 1;
list_add_tail_rcu(&dumper->list, &dump_list);
err = 0;
+   num_dumpers++;
}
spin_unlock_irqrestore(&dump_list_lock, flags);
 
@@ -3447,6 +3460,7 @@ int kmsg_dump_unregister(struct kmsg_dumper *dumper)
dumper->registered = 0;
list_del_rcu(&dumper->list);
err = 0;
+   num_dumpers--;
}
spin_unlock_irqrestore(&dump_list_lock, flags);
synchronize_rcu();
-- 
2.36.0



[PATCH 22/30] panic: Introduce the panic post-reboot notifier list

2022-04-27 Thread Guilherme G. Piccoli
Currently we have 3 notifier lists in the panic path, which will
be wired in a way to allow the notifier callbacks to run in
different moments at panic time, in a subsequent patch.

But there is also an odd set of architecture calls hardcoded in
the end of panic path, after the restart machinery. They're
responsible for late time tunings / events, like enabling a stop
button (Sparc) or effectively stopping the machine (s390).

This patch introduces yet another notifier list to offer the
architectures a way to add callbacks in such late moment on
panic path without the need of ifdefs / hardcoded approaches.

Cc: Alexander Gordeev 
Cc: Christian Borntraeger 
Cc: "David S. Miller" 
Cc: Heiko Carstens 
Cc: Sven Schnelle 
Cc: Vasily Gorbik 
Signed-off-by: Guilherme G. Piccoli 
---
 arch/s390/kernel/setup.c   | 19 ++-
 arch/sparc/kernel/setup_32.c   | 27 +++
 arch/sparc/kernel/setup_64.c   | 29 -
 include/linux/panic_notifier.h |  1 +
 kernel/panic.c | 19 +++
 5 files changed, 73 insertions(+), 22 deletions(-)

diff --git a/arch/s390/kernel/setup.c b/arch/s390/kernel/setup.c
index d860ac300919..d816b2045f1e 100644
--- a/arch/s390/kernel/setup.c
+++ b/arch/s390/kernel/setup.c
@@ -39,7 +39,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -51,6 +50,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -943,6 +943,20 @@ static void __init log_component_list(void)
}
 }
 
+/*
+ * The following notifier executes as one of the latest things in the panic
+ * path, only if the restart routines weren't executed (or didn't succeed).
+ */
+static int panic_event(struct notifier_block *n, unsigned long ev, void 
*unused)
+{
+   disabled_wait();
+   return NOTIFY_DONE;
+}
+
+static struct notifier_block post_reboot_panic_block = {
+   .notifier_call = panic_event,
+};
+
 /*
  * Setup function called from init/main.c just after the banner
  * was printed.
@@ -1058,4 +1072,7 @@ void __init setup_arch(char **cmdline_p)
 
/* Add system specific data to the random pool */
setup_randomness();
+
+   atomic_notifier_chain_register(&panic_post_reboot_list,
+  &post_reboot_panic_block);
 }
diff --git a/arch/sparc/kernel/setup_32.c b/arch/sparc/kernel/setup_32.c
index c8e0dd99f370..4e2428972f76 100644
--- a/arch/sparc/kernel/setup_32.c
+++ b/arch/sparc/kernel/setup_32.c
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -51,6 +52,7 @@
 
 #include "kernel.h"
 
+int stop_a_enabled = 1;
 struct screen_info screen_info = {
0, 0,   /* orig-x, orig-y */
0,  /* unused */
@@ -293,6 +295,24 @@ void __init sparc32_start_kernel(struct linux_romvec *rp)
start_kernel();
 }
 
+/*
+ * The following notifier executes as one of the latest things in the panic
+ * path, only if the restart routines weren't executed (or didn't succeed).
+ */
+static int panic_event(struct notifier_block *n, unsigned long ev, void 
*unused)
+{
+   /* Make sure the user can actually press Stop-A (L1-A) */
+   stop_a_enabled = 1;
+   pr_emerg("Press Stop-A (L1-A) from sun keyboard or send break\n"
+   "twice on console to return to the boot prom\n");
+
+   return NOTIFY_DONE;
+}
+
+static struct notifier_block post_reboot_panic_block = {
+   .notifier_call = panic_event,
+};
+
 void __init setup_arch(char **cmdline_p)
 {
int i;
@@ -368,9 +388,10 @@ void __init setup_arch(char **cmdline_p)
paging_init();
 
smp_setup_cpu_possible_map();
-}
 
-extern int stop_a_enabled;
+   atomic_notifier_chain_register(&panic_post_reboot_list,
+  &post_reboot_panic_block);
+}
 
 void sun_do_break(void)
 {
@@ -384,8 +405,6 @@ void sun_do_break(void)
 }
 EXPORT_SYMBOL(sun_do_break);
 
-int stop_a_enabled = 1;
-
 static int __init topology_init(void)
 {
int i, ncpus, err;
diff --git a/arch/sparc/kernel/setup_64.c b/arch/sparc/kernel/setup_64.c
index 48abee4eee29..9066c25ecc07 100644
--- a/arch/sparc/kernel/setup_64.c
+++ b/arch/sparc/kernel/setup_64.c
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -62,6 +63,8 @@
 #include "entry.h"
 #include "kernel.h"
 
+int stop_a_enabled = 1;
+
 /* Used to synchronize accesses to NatSemi SUPER I/O chip configure
  * operations in asm/ns87303.h
  */
@@ -632,6 +635,24 @@ void __init alloc_irqstack_bootmem(void)
}
 }
 
+/*
+ * The following notifier executes as one of the latest things in the panic
+ * path, only if the restart routines weren't executed (or didn't succeed).
+ */
+static int panic_event(struct notifier_block *n, unsigned long ev, void 
*unused)
+{
+   /* Make sure the user can actually press Stop-A (L1-A) */
+   stop_a_enabled = 1;
+   pr_emerg("Press Stop

[PATCH 21/30] panic: Introduce the panic pre-reboot notifier list

2022-04-27 Thread Guilherme G. Piccoli
This patch renames the panic_notifier_list to panic_pre_reboot_list;
the idea is that a subsequent patch will refactor the panic path
in order to better split the notifiers, running some of them very
early, some of them not so early [but still before kmsg_dump()] and
finally, the rest should execute late, after kdump. The latter ones
are now in the panic pre-reboot list - the name comes from the idea
that these notifiers execute before panic() attempts rebooting the
machine (if that option is set).

We also took the opportunity to clean-up useless header inclusions,
improve some notifier block declarations (e.g. in ibmasm/heartbeat.c)
and more important, change some priorities - we hereby set 2 notifiers
to run late in the list [iss_panic_event() and the IPMI panic_event()]
due to the risks they offer (may not return, for example).
Proper documentation is going to be provided in a subsequent patch,
that effectively refactors the panic path.

Cc: Alex Elder 
Cc: Alexander Gordeev 
Cc: Anton Ivanov 
Cc: Benjamin Herrenschmidt 
Cc: Bjorn Andersson 
Cc: Boris Ostrovsky 
Cc: Chris Zankel 
Cc: Christian Borntraeger 
Cc: Corey Minyard 
Cc: Dexuan Cui 
Cc: "H. Peter Anvin" 
Cc: Haiyang Zhang 
Cc: Heiko Carstens 
Cc: Helge Deller 
Cc: Ivan Kokshaysky 
Cc: "James E.J. Bottomley" 
Cc: James Morse 
Cc: Johannes Berg 
Cc: Juergen Gross 
Cc: "K. Y. Srinivasan" 
Cc: Mathieu Poirier 
Cc: Matt Turner 
Cc: Mauro Carvalho Chehab 
Cc: Max Filippov 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: Pavel Machek 
Cc: Richard Henderson 
Cc: Richard Weinberger 
Cc: Robert Richter 
Cc: Stefano Stabellini 
Cc: Stephen Hemminger 
Cc: Sven Schnelle 
Cc: Tony Luck 
Cc: Vasily Gorbik 
Cc: Wei Liu 
Signed-off-by: Guilherme G. Piccoli 
---

Notice that, with this name change, out-of-tree code that relies in the global
exported "panic_notifier_list" will fail to build. We could easily keep the
retro-compatibility by making the old symbol to still exist and point to the
pre_reboot list (or even, keep the old naming).

But our design choice was to allow the breakage, making users rethink their
notifiers, adding them in the list that fits best. If that wasn't a good
decision, we're open to change it, of course.
Thanks in advance for the review!

 arch/alpha/kernel/setup.c |  4 ++--
 arch/parisc/kernel/pdc_chassis.c  |  3 +--
 arch/powerpc/kernel/setup-common.c|  2 +-
 arch/s390/kernel/ipl.c|  4 ++--
 arch/um/drivers/mconsole_kern.c   |  2 +-
 arch/um/kernel/um_arch.c  |  2 +-
 arch/x86/xen/enlighten.c  |  2 +-
 arch/xtensa/platforms/iss/setup.c |  4 ++--
 drivers/char/ipmi/ipmi_msghandler.c   | 12 +++-
 drivers/edac/altera_edac.c|  3 +--
 drivers/hv/vmbus_drv.c|  4 ++--
 drivers/leds/trigger/ledtrig-panic.c  |  3 +--
 drivers/misc/ibmasm/heartbeat.c   | 16 +---
 drivers/net/ipa/ipa_smp2p.c   |  5 ++---
 drivers/parisc/power.c|  4 ++--
 drivers/remoteproc/remoteproc_core.c  |  6 --
 drivers/s390/char/con3215.c   |  2 +-
 drivers/s390/char/con3270.c   |  2 +-
 drivers/s390/char/sclp_con.c  |  2 +-
 drivers/s390/char/sclp_vt220.c|  2 +-
 drivers/staging/olpc_dcon/olpc_dcon.c |  6 --
 drivers/video/fbdev/hyperv_fb.c   |  4 ++--
 include/linux/panic_notifier.h|  2 +-
 kernel/panic.c|  9 -
 24 files changed, 54 insertions(+), 51 deletions(-)

diff --git a/arch/alpha/kernel/setup.c b/arch/alpha/kernel/setup.c
index d88bdf852753..8ace0d7113b6 100644
--- a/arch/alpha/kernel/setup.c
+++ b/arch/alpha/kernel/setup.c
@@ -472,8 +472,8 @@ setup_arch(char **cmdline_p)
}
 
/* Register a call for panic conditions. */
-   atomic_notifier_chain_register(&panic_notifier_list,
-   &alpha_panic_block);
+   atomic_notifier_chain_register(&panic_pre_reboot_list,
+   &alpha_panic_block);
 
 #ifndef alpha_using_srm
/* Assume that we've booted from SRM if we haven't booted from MILO.
diff --git a/arch/parisc/kernel/pdc_chassis.c b/arch/parisc/kernel/pdc_chassis.c
index da154406d368..0fd8d87fb4f9 100644
--- a/arch/parisc/kernel/pdc_chassis.c
+++ b/arch/parisc/kernel/pdc_chassis.c
@@ -22,7 +22,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -135,7 +134,7 @@ void __init parisc_pdc_chassis_init(void)
PDC_CHASSIS_VER);
 
/* initialize panic notifier chain */
-   atomic_notifier_chain_register(&panic_notifier_list,
+   atomic_notifier_chain_register(&panic_pre_reboot_list,
&pdc_chassis_panic_block);
 
/* initialize reboot notifier chain */
diff --git a/arch/powerpc/kernel/setup-common.c 
b/arch/powerpc/kernel/setup-common.c
index d04b8bf8dbc7..3518bebc10ad 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/

[PATCH 20/30] panic: Add the panic informational notifier list

2022-04-27 Thread Guilherme G. Piccoli
The goal of this new panic notifier is to allow its users to
register callbacks to run earlier in the panic path than they
currently do. This aims at informational mechanisms, like dumping
kernel offsets and showing device error data (in case it's simple
registers reading, for example) as well as mechanisms to disable
log flooding (like hung_task detector / RCU warnings) and the
tracing dump_on_oops (when enabled).

Any (non-invasive) information that should be provided before
kmsg_dump() as well as log flooding preventing code should fit
here, as long it offers relatively low risk for kdump.

For now, the patch is almost a no-op, although it changes a bit
the ordering in which some panic notifiers are executed - specially
affected by this are the notifiers responsible for disabling the
hung_task detector / RCU warnings, which now run first. In a
subsequent patch, the panic path will be refactored, then the
panic informational notifiers will effectively run earlier,
before ksmg_dump() (and usually before kdump as well).

We also defer documenting it all properly in the subsequent
refactor patch. Finally, while at it, we removed some useless
header inclusions too.

Cc: Benjamin Herrenschmidt 
Cc: Catalin Marinas 
Cc: Florian Fainelli 
Cc: Frederic Weisbecker 
Cc: "H. Peter Anvin" 
Cc: Hari Bathini 
Cc: Joel Fernandes 
Cc: Jonathan Hunter 
Cc: Josh Triplett 
Cc: Lai Jiangshan 
Cc: Leo Yan 
Cc: Mathieu Desnoyers 
Cc: Mathieu Poirier 
Cc: Michael Ellerman 
Cc: Mike Leach 
Cc: Mikko Perttunen 
Cc: Neeraj Upadhyay 
Cc: Nicholas Piggin 
Cc: Paul Mackerras 
Cc: Suzuki K Poulose 
Cc: Thierry Reding 
Cc: Thomas Bogendoerfer 
Signed-off-by: Guilherme G. Piccoli 
---
 arch/arm64/kernel/setup.c | 2 +-
 arch/mips/kernel/relocate.c   | 2 +-
 arch/powerpc/kernel/setup-common.c| 2 +-
 arch/x86/kernel/setup.c   | 2 +-
 drivers/bus/brcmstb_gisb.c| 2 +-
 drivers/hwtracing/coresight/coresight-cpu-debug.c | 4 ++--
 drivers/soc/tegra/ari-tegra186.c  | 3 ++-
 include/linux/panic_notifier.h| 1 +
 kernel/hung_task.c| 3 ++-
 kernel/panic.c| 4 
 kernel/rcu/tree.c | 1 -
 kernel/rcu/tree_stall.h   | 3 ++-
 kernel/trace/trace.c  | 2 +-
 13 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 3505789cf4bd..ac2c7e8c9c6a 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -444,7 +444,7 @@ static struct notifier_block arm64_panic_block = {
 
 static int __init register_arm64_panic_block(void)
 {
-   atomic_notifier_chain_register(&panic_notifier_list,
+   atomic_notifier_chain_register(&panic_info_list,
   &arm64_panic_block);
return 0;
 }
diff --git a/arch/mips/kernel/relocate.c b/arch/mips/kernel/relocate.c
index 56b51de2dc51..650811f2436a 100644
--- a/arch/mips/kernel/relocate.c
+++ b/arch/mips/kernel/relocate.c
@@ -459,7 +459,7 @@ static struct notifier_block kernel_location_notifier = {
 
 static int __init register_kernel_offset_dumper(void)
 {
-   atomic_notifier_chain_register(&panic_notifier_list,
+   atomic_notifier_chain_register(&panic_info_list,
   &kernel_location_notifier);
return 0;
 }
diff --git a/arch/powerpc/kernel/setup-common.c 
b/arch/powerpc/kernel/setup-common.c
index 1468c3937bf4..d04b8bf8dbc7 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -757,7 +757,7 @@ void __init setup_panic(void)
   &ppc_fadump_block);
 
if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && kaslr_offset() > 0)
-   atomic_notifier_chain_register(&panic_notifier_list,
+   atomic_notifier_chain_register(&panic_info_list,
   &kernel_offset_notifier);
 
/* Low-level platform-specific routines that should run on panic */
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index c95b9ac5a457..599b25346964 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1266,7 +1266,7 @@ static struct notifier_block kernel_offset_notifier = {
 
 static int __init register_kernel_offset_dumper(void)
 {
-   atomic_notifier_chain_register(&panic_notifier_list,
+   atomic_notifier_chain_register(&panic_info_list,
&kernel_offset_notifier);
return 0;
 }
diff --git a/drivers/bus/brcmstb_gisb.c b/drivers/bus/brcmstb_gisb.c
index 1ea7b015e225..c64e087fba7a 100644
--- a/drivers/bus/brcmstb_gisb.c
+++ b/drivers/bus/brcmstb_gisb.c
@@ -486,7 +486,7 @@ static int __init brcmstb_gisb_arb_probe(struct 
platform_device *pdev)
 
if (list_is_singular(&brc

[PATCH 19/30] panic: Add the panic hypervisor notifier list

2022-04-27 Thread Guilherme G. Piccoli
The goal of this new panic notifier is to allow its users to register
callbacks to run very early in the panic path. This aims hypervisor/FW
notification mechanisms as well as simple LED functions, and any other
simple and safe mechanism that should run early in the panic path; more
dangerous callbacks should execute later.

For now, the patch is almost a no-op (although it changes a bit the
ordering in which some panic notifiers are executed). In a subsequent
patch, the panic path will be refactored, then the panic hypervisor
notifiers will effectively run very early in the panic path.

We also defer documenting it all properly in the subsequent refactor
patch. While at it, we removed some useless header inclusions and
fixed some notifiers return too (by using the standard NOTIFY_DONE).

Cc: Alexander Gordeev 
Cc: Andrea Parri (Microsoft) 
Cc: Ard Biesheuvel 
Cc: Benjamin Herrenschmidt 
Cc: Brian Norris 
Cc: Christian Borntraeger 
Cc: Christophe JAILLET 
Cc: David Gow 
Cc: "David S. Miller" 
Cc: Dexuan Cui 
Cc: Doug Berger 
Cc: Evan Green 
Cc: Florian Fainelli 
Cc: Haiyang Zhang 
Cc: Hari Bathini 
Cc: Heiko Carstens 
Cc: Julius Werner 
Cc: Justin Chen 
Cc: "K. Y. Srinivasan" 
Cc: Lee Jones 
Cc: Markus Mayer 
Cc: Michael Ellerman 
Cc: Michael Kelley 
Cc: Mihai Carabas 
Cc: Nicholas Piggin 
Cc: Paul Mackerras 
Cc: Pavel Machek 
Cc: Scott Branden 
Cc: Sebastian Reichel 
Cc: Shile Zhang 
Cc: Stephen Hemminger 
Cc: Sven Schnelle 
Cc: Thomas Bogendoerfer 
Cc: Tianyu Lan 
Cc: Vasily Gorbik 
Cc: Wang ShaoBo 
Cc: Wei Liu 
Cc: zhenwei pi 
Signed-off-by: Guilherme G. Piccoli 
---
 arch/mips/sgi-ip22/ip22-reset.c  | 2 +-
 arch/mips/sgi-ip32/ip32-reset.c  | 3 +--
 arch/powerpc/kernel/setup-common.c   | 2 +-
 arch/sparc/kernel/sstate.c   | 3 +--
 drivers/firmware/google/gsmi.c   | 4 ++--
 drivers/hv/vmbus_drv.c   | 4 ++--
 drivers/leds/trigger/ledtrig-activity.c  | 4 ++--
 drivers/leds/trigger/ledtrig-heartbeat.c | 4 ++--
 drivers/misc/bcm-vk/bcm_vk_dev.c | 6 +++---
 drivers/misc/pvpanic/pvpanic.c   | 4 ++--
 drivers/power/reset/ltc2952-poweroff.c   | 4 ++--
 drivers/s390/char/zcore.c| 5 +++--
 drivers/soc/bcm/brcmstb/pm/pm-arm.c  | 2 +-
 include/linux/panic_notifier.h   | 1 +
 kernel/panic.c   | 4 
 15 files changed, 28 insertions(+), 24 deletions(-)

diff --git a/arch/mips/sgi-ip22/ip22-reset.c b/arch/mips/sgi-ip22/ip22-reset.c
index 8f0861c58080..3023848acbf1 100644
--- a/arch/mips/sgi-ip22/ip22-reset.c
+++ b/arch/mips/sgi-ip22/ip22-reset.c
@@ -195,7 +195,7 @@ static int __init reboot_setup(void)
}
 
timer_setup(&blink_timer, blink_timeout, 0);
-   atomic_notifier_chain_register(&panic_notifier_list, &panic_block);
+   atomic_notifier_chain_register(&panic_hypervisor_list, &panic_block);
 
return 0;
 }
diff --git a/arch/mips/sgi-ip32/ip32-reset.c b/arch/mips/sgi-ip32/ip32-reset.c
index 18d1c115cd53..9ee1302c9d13 100644
--- a/arch/mips/sgi-ip32/ip32-reset.c
+++ b/arch/mips/sgi-ip32/ip32-reset.c
@@ -15,7 +15,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -145,7 +144,7 @@ static __init int ip32_reboot_setup(void)
pm_power_off = ip32_machine_halt;
 
timer_setup(&blink_timer, blink_timeout, 0);
-   atomic_notifier_chain_register(&panic_notifier_list, &panic_block);
+   atomic_notifier_chain_register(&panic_hypervisor_list, &panic_block);
 
return 0;
 }
diff --git a/arch/powerpc/kernel/setup-common.c 
b/arch/powerpc/kernel/setup-common.c
index 52f96b209a96..1468c3937bf4 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -753,7 +753,7 @@ static struct notifier_block ppc_panic_block = {
 void __init setup_panic(void)
 {
/* Hard-disables IRQs + deal with FW-assisted dump (fadump) */
-   atomic_notifier_chain_register(&panic_notifier_list,
+   atomic_notifier_chain_register(&panic_hypervisor_list,
   &ppc_fadump_block);
 
if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && kaslr_offset() > 0)
diff --git a/arch/sparc/kernel/sstate.c b/arch/sparc/kernel/sstate.c
index 3bcc4ddc6911..82b7b68e0bdc 100644
--- a/arch/sparc/kernel/sstate.c
+++ b/arch/sparc/kernel/sstate.c
@@ -5,7 +5,6 @@
  */
 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -106,7 +105,7 @@ static int __init sstate_init(void)
 
do_set_sstate(HV_SOFT_STATE_TRANSITION, booting_msg);
 
-   atomic_notifier_chain_register(&panic_notifier_list,
+   atomic_notifier_chain_register(&panic_hypervisor_list,
   &sstate_panic_block);
register_reboot_notifier(&sstate_reboot_notifier);
 
diff --git a/drivers/firmware/google/gsmi.c b/drivers/firmware/google/gsmi.c
index b01ed02e4a87..ff0bebe2f444 100644
--- a/drivers/firmware/google/gsmi.c
+++ b/drivers/firmware/google/gsmi.c
@@ -1034,7 +1034,7 @@ s

[PATCH 18/30] notifier: Show function names on notifier routines if DEBUG_NOTIFIERS is set

2022-04-27 Thread Guilherme G. Piccoli
Currently we have a debug infrastructure in the notifiers file, but
it's very simple/limited. This patch extends it by:

(a) Showing all registered/unregistered notifiers' callback names;

(b) Adding a dynamic debug tuning to allow showing called notifiers'
function names. Notice that this should be guarded as a tunable since
it can flood the kernel log buffer.

Cc: Arjan van de Ven 
Cc: Cong Wang 
Cc: Sebastian Andrzej Siewior 
Cc: Valentin Schneider 
Cc: Xiaoming Ni 
Signed-off-by: Guilherme G. Piccoli 
---

We have some design decisions that worth discussing here:

(a) First of call, using C99 helps a lot to write clear and concise code, but
due to commit 4d94f910e79a ("Kbuild: use -Wdeclaration-after-statement") we
have a warning if mixing variable declarations with code. For this patch though,
doing that makes the code way clear, so decision was to add the debug code
inside brackets whenever this warning pops up. We can change that, but that'll
cause more ifdefs in the same function.

(b) In the symbol lookup helper function, we modify the parameter passed but
even more, we return it as well! This is unusual and seems unnecessary, but was
the strategy taken to allow embedding such function in the pr_debug() call.

Not doing that would likely requiring 3 symbol_name variables to avoid
concurrency (registering notifier A while calling notifier B) - we rely in
local variables as a serialization mechanism.

We're open for suggestions in case this design is not appropriate;
thanks in advance!

 kernel/notifier.c | 48 +--
 1 file changed, 46 insertions(+), 2 deletions(-)

diff --git a/kernel/notifier.c b/kernel/notifier.c
index ba005ebf4730..21032ebcde57 100644
--- a/kernel/notifier.c
+++ b/kernel/notifier.c
@@ -7,6 +7,22 @@
 #include 
 #include 
 
+#ifdef CONFIG_DEBUG_NOTIFIERS
+#include 
+
+/*
+ * Helper to get symbol names in case DEBUG_NOTIFIERS is set.
+ * Return the modified parameter is a strategy used to achieve
+ * the pr_debug() functionality - with this, function is only
+ * executed if the dynamic debug tuning is effectively set.
+ */
+static inline char *notifier_name(struct notifier_block *nb, char *sym_name)
+{
+   lookup_symbol_name((unsigned long)(nb->notifier_call), sym_name);
+   return sym_name;
+}
+#endif
+
 /*
  * Notifier list for kernel code which wants to be called
  * at shutdown. This is used to stop any idling DMA operations
@@ -34,20 +50,41 @@ static int notifier_chain_register(struct notifier_block 
**nl,
}
n->next = *nl;
rcu_assign_pointer(*nl, n);
+
+#ifdef CONFIG_DEBUG_NOTIFIERS
+   {
+   char sym_name[KSYM_NAME_LEN];
+
+   pr_info("notifiers: registered %s()\n",
+   notifier_name(n, sym_name));
+   }
+#endif
return 0;
 }
 
 static int notifier_chain_unregister(struct notifier_block **nl,
struct notifier_block *n)
 {
+   int ret = -ENOENT;
+
while ((*nl) != NULL) {
if ((*nl) == n) {
rcu_assign_pointer(*nl, n->next);
-   return 0;
+   ret = 0;
+   break;
}
nl = &((*nl)->next);
}
-   return -ENOENT;
+
+#ifdef CONFIG_DEBUG_NOTIFIERS
+   if (!ret) {
+   char sym_name[KSYM_NAME_LEN];
+
+   pr_info("notifiers: unregistered %s()\n",
+   notifier_name(n, sym_name));
+   }
+#endif
+   return ret;
 }
 
 /**
@@ -80,6 +117,13 @@ static int notifier_call_chain(struct notifier_block **nl,
nb = next_nb;
continue;
}
+
+   {
+   char sym_name[KSYM_NAME_LEN];
+
+   pr_debug("notifiers: calling %s()\n",
+notifier_name(nb, sym_name));
+   }
 #endif
ret = nb->notifier_call(nb, val, v);
 
-- 
2.36.0



[PATCH 17/30] tracing: Improve panic/die notifiers

2022-04-27 Thread Guilherme G. Piccoli
Currently the tracing dump_on_oops feature is implemented
through separate notifiers, one for die/oops and the other
for panic. With the addition of panic notifier "id", this
patch makes use of such "id" to unify both functions.

It also comments the function and changes the priority of the
notifier blocks, in order they run early compared to other
notifiers, to prevent useless trace data (like the callback
names for the other notifiers). Finally, we also removed an
unnecessary header inclusion.

Signed-off-by: Guilherme G. Piccoli 
---
 kernel/trace/trace.c | 57 +---
 1 file changed, 32 insertions(+), 25 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index f4de111fa18f..c1d8a3622ccc 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -19,7 +19,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -9767,38 +9766,46 @@ static __init int tracer_init_tracefs(void)
 
 fs_initcall(tracer_init_tracefs);
 
-static int trace_panic_handler(struct notifier_block *this,
-  unsigned long event, void *unused)
+/*
+ * The idea is to execute the following die/panic callback early, in order
+ * to avoid showing irrelevant information in the trace (like other panic
+ * notifier functions); we are the 2nd to run, after hung_task/rcu_stall
+ * warnings get disabled (to prevent potential log flooding).
+ */
+static int trace_die_panic_handler(struct notifier_block *self,
+   unsigned long ev, void *unused)
 {
-   if (ftrace_dump_on_oops)
+   int do_dump;
+
+   if (!ftrace_dump_on_oops)
+   return NOTIFY_DONE;
+
+   switch (ev) {
+   case DIE_OOPS:
+   do_dump = 1;
+   break;
+   case PANIC_NOTIFIER:
+   do_dump = 1;
+   break;
+   default:
+   do_dump = 0;
+   break;
+   }
+
+   if (do_dump)
ftrace_dump(ftrace_dump_on_oops);
-   return NOTIFY_OK;
+
+   return NOTIFY_DONE;
 }
 
 static struct notifier_block trace_panic_notifier = {
-   .notifier_call  = trace_panic_handler,
-   .next   = NULL,
-   .priority   = 150   /* priority: INT_MAX >= x >= 0 */
+   .notifier_call = trace_die_panic_handler,
+   .priority = INT_MAX - 1,
 };
 
-static int trace_die_handler(struct notifier_block *self,
-unsigned long val,
-void *data)
-{
-   switch (val) {
-   case DIE_OOPS:
-   if (ftrace_dump_on_oops)
-   ftrace_dump(ftrace_dump_on_oops);
-   break;
-   default:
-   break;
-   }
-   return NOTIFY_OK;
-}
-
 static struct notifier_block trace_die_notifier = {
-   .notifier_call = trace_die_handler,
-   .priority = 200
+   .notifier_call = trace_die_panic_handler,
+   .priority = INT_MAX - 1,
 };
 
 /*
-- 
2.36.0



[PATCH 16/30] drivers/hv/vmbus, video/hyperv_fb: Untangle and refactor Hyper-V panic notifiers

2022-04-27 Thread Guilherme G. Piccoli
Currently Hyper-V guests are among the most relevant users of the panic
infrastructure, like panic notifiers, kmsg dumpers, etc. The reasons rely
both in cleaning-up procedures (closing a hypervisor <-> guest connection,
disabling a paravirtualized timer) as well as to data collection (sending
panic information to the hypervisor) and framebuffer management.

The thing is: some notifiers are related to others, ordering matters, some
functionalities are duplicated and there are lots of conditionals behind
sending panic information to the hypervisor. This patch, as part of an
effort to clean-up the panic notifiers mechanism and better document
things, address some of the issues/complexities of Hyper-V panic handling
through the following changes:

(a) We have die and panic notifiers on vmbus_drv.c and both have goals of
sending panic information to the hypervisor, though the panic notifier is
also responsible for a cleaning-up procedure.

This commit clears the code by splitting the panic notifier in two, one
for closing the vmbus connection whereas the other is only for sending
panic info to hypervisor. With that, it was possible to merge the die and
panic notifiers in a single/well-documented function, and clear some
conditional complexities on sending such information to the hypervisor.

(b) The new panic notifier created after (a) is only doing a single thing:
cleaning the vmbus connection. This procedure might cause a delay (due to
hypervisor I/O completion), so we postpone that to run late. But more
relevant: this *same* vmbus unloading happens in the crash_shutdown()
handler, so if kdump is set, we can safely skip this panic notifier and
defer such clean-up to the kexec crash handler.

(c) There is also a Hyper-V framebuffer panic notifier, which relies in
doing a vmbus operation that demands a valid connection. So, we must
order this notifier with the panic notifier from vmbus_drv.c, in order to
guarantee that the framebuffer code executes before the vmbus connection
is unloaded.

Also, this commit removes a useless header.

Although there is code rework and re-ordering, we expect that this change
has no functional regressions but instead optimize the path and increase
panic reliability on Hyper-V. This was tested on Hyper-V with success.

Fixes: 792f232d57ff ("Drivers: hv: vmbus: Fix potential crash on module unload")
Fixes: 74347a99e73a ("x86/Hyper-V: Unload vmbus channel in hv panic callback")
Cc: Andrea Parri (Microsoft) 
Cc: Dexuan Cui 
Cc: Haiyang Zhang 
Cc: "K. Y. Srinivasan" 
Cc: Michael Kelley 
Cc: Stephen Hemminger 
Cc: Tianyu Lan 
Cc: Wei Liu 
Tested-by: Fabio A M Martins 
Signed-off-by: Guilherme G. Piccoli 
---

Special thanks to Michael Kelley for the good information about the Hyper-V
panic path in email threads some months ago, and to Fabio for the testing
performed.

Michael and all Microsoft folks: a careful analysis to double-check our changes
and assumptions here is really appreciated, this code is complex and intricate,
it is possible some corner case might have been overlooked.

Thanks in advance!

 drivers/hv/vmbus_drv.c  | 109 
 drivers/video/fbdev/hyperv_fb.c |   8 +++
 2 files changed, 76 insertions(+), 41 deletions(-)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 14de17087864..f37f12d48001 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -24,11 +24,11 @@
 #include 
 
 #include 
-#include 
 #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -68,51 +68,75 @@ static int hyperv_report_reg(void)
return !sysctl_record_panic_msg || !hv_panic_page;
 }
 
-static int hyperv_panic_event(struct notifier_block *nb, unsigned long val,
+/*
+ * The panic notifier below is responsible solely for unloading the
+ * vmbus connection, which is necessary in a panic event. But notice
+ * that this same unloading procedure is executed in the Hyper-V
+ * crash_shutdown() handler [see hv_crash_handler()], which basically
+ * means that we can postpone its execution if we have kdump set,
+ * since it will run the crash_shutdown() handler anyway. Even more
+ * intrincated is the relation of this notifier with Hyper-V framebuffer
+ * panic notifier - we need vmbus connection alive there in order to
+ * succeed, so we need to order both with each other [for reference see
+ * hvfb_on_panic()] - this is done using notifiers' priorities.
+ */
+static int hv_panic_vmbus_unload(struct notifier_block *nb, unsigned long val,
  void *args)
+{
+   if (!kexec_crash_loaded())
+   vmbus_initiate_unload(true);
+
+   return NOTIFY_DONE;
+}
+static struct notifier_block hyperv_panic_vmbus_unload_block = {
+   .notifier_call  = hv_panic_vmbus_unload,
+   .priority   = INT_MIN + 1, /* almost the latest one to execute */
+};
+
+/*
+ * The following callback works both as die and panic notifier; its
+ * goal is to provide panic informatio

[PATCH 15/30] bus: brcmstb_gisb: Clean-up panic/die notifiers

2022-04-27 Thread Guilherme G. Piccoli
This patch improves the panic/die notifiers in this driver by
making use of a passed "id" instead of comparing pointer
address; also, it removes an useless prototype declaration
and unnecessary header inclusion.

This is part of a panic notifiers refactor - this notifier in
the future will be moved to a new list, that encompass the
information notifiers only.

Fixes: 9eb60880d9a9 ("bus: brcmstb_gisb: add notifier handling")
Cc: Brian Norris 
Cc: Florian Fainelli 
Signed-off-by: Guilherme G. Piccoli 
---
 drivers/bus/brcmstb_gisb.c | 26 +++---
 1 file changed, 11 insertions(+), 15 deletions(-)

diff --git a/drivers/bus/brcmstb_gisb.c b/drivers/bus/brcmstb_gisb.c
index 183d5cc37d42..1ea7b015e225 100644
--- a/drivers/bus/brcmstb_gisb.c
+++ b/drivers/bus/brcmstb_gisb.c
@@ -19,7 +19,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #ifdef CONFIG_MIPS
 #include 
@@ -347,25 +346,14 @@ static irqreturn_t brcmstb_gisb_bp_handler(int irq, void 
*dev_id)
 /*
  * Dump out gisb errors on die or panic.
  */
-static int dump_gisb_error(struct notifier_block *self, unsigned long v,
-  void *p);
-
-static struct notifier_block gisb_die_notifier = {
-   .notifier_call = dump_gisb_error,
-};
-
-static struct notifier_block gisb_panic_notifier = {
-   .notifier_call = dump_gisb_error,
-};
-
 static int dump_gisb_error(struct notifier_block *self, unsigned long v,
   void *p)
 {
struct brcmstb_gisb_arb_device *gdev;
-   const char *reason = "panic";
+   const char *reason = "die";
 
-   if (self == &gisb_die_notifier)
-   reason = "die";
+   if (v == PANIC_NOTIFIER)
+   reason = "panic";
 
/* iterate over each GISB arb registered handlers */
list_for_each_entry(gdev, &brcmstb_gisb_arb_device_list, next)
@@ -374,6 +362,14 @@ static int dump_gisb_error(struct notifier_block *self, 
unsigned long v,
return NOTIFY_DONE;
 }
 
+static struct notifier_block gisb_die_notifier = {
+   .notifier_call = dump_gisb_error,
+};
+
+static struct notifier_block gisb_panic_notifier = {
+   .notifier_call = dump_gisb_error,
+};
+
 static DEVICE_ATTR(gisb_arb_timeout, S_IWUSR | S_IRUGO,
gisb_arb_get_timeout, gisb_arb_set_timeout);
 
-- 
2.36.0



[PATCH 14/30] panic: Properly identify the panic event to the notifiers' callbacks

2022-04-27 Thread Guilherme G. Piccoli
The notifiers infrastructure provides a way to pass an "id" to the
callbacks to determine what kind of event happened, i.e., what is
the reason behind they getting called.

The panic notifier currently pass 0, but this is soon to be
used in a multi-targeted notifier, so let's pass a meaningful
"id" over there.

Signed-off-by: Guilherme G. Piccoli 
---
 include/linux/panic_notifier.h | 5 +
 kernel/panic.c | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/linux/panic_notifier.h b/include/linux/panic_notifier.h
index 41e32483d7a7..07dced83a783 100644
--- a/include/linux/panic_notifier.h
+++ b/include/linux/panic_notifier.h
@@ -9,4 +9,9 @@ extern struct atomic_notifier_head panic_notifier_list;
 
 extern bool crash_kexec_post_notifiers;
 
+enum panic_notifier_val {
+   PANIC_UNUSED,
+   PANIC_NOTIFIER = 0xDEAD,
+};
+
 #endif /* _LINUX_PANIC_NOTIFIERS_H */
diff --git a/kernel/panic.c b/kernel/panic.c
index eb4dfb932c85..523bc9ccd0e9 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -287,7 +287,7 @@ void panic(const char *fmt, ...)
 * Run any panic handlers, including those that might need to
 * add information to the kmsg dump output.
 */
-   atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
+   atomic_notifier_call_chain(&panic_notifier_list, PANIC_NOTIFIER, buf);
 
panic_print_sys_info(false);
 
-- 
2.36.0



[PATCH 13/30] s390/consoles: Improve panic notifiers reliability

2022-04-27 Thread Guilherme G. Piccoli
Currently many console drivers for s390 rely on panic/reboot notifiers
to invoke callbacks on these events. The panic() function disables local
IRQs, secondary CPUs and preemption, so callbacks invoked on panic are
effectively running in atomic context.

Happens that most of these console callbacks from s390 doesn't take the
proper care with regards to atomic context, like taking spinlocks that
might be taken in other function/CPU and hence will cause a lockup
situation.

The goal for this patch is to improve the notifiers reliability, acting
on 4 console drivers, as detailed below:

(1) con3215: changed a regular spinlock to the trylock alternative.

(2) con3270: also changed a regular spinlock to its trylock counterpart,
but here we also have another problem: raw3270_activate_view() takes a
different spinlock. So, we worked a helper to validate if this other lock
is safe to acquire, and if so, raw3270_activate_view() should be safe.

Notice though that there is a functional change here: it's now possible
to continue the notifier code [reaching con3270_wait_write() and
con3270_rebuild_update()] without executing raw3270_activate_view().

(3) sclp: a global lock is used heavily in the functions called from
the notifier, so we added a check here - if the lock is taken already,
we just bail-out, preventing the lockup.

(4) sclp_vt220: same as (3), a lock validation was added to prevent the
potential lockup problem.

Besides (1)-(4), we also removed useless void functions, adding the
code called from the notifier inside its own body, and changed the
priority of such notifiers to execute late, since they are "heavyweight"
for the panic environment, so we aim to reduce risks here.
Changed return values to NOTIFY_DONE as well, the standard one.

Cc: Alexander Gordeev 
Cc: Christian Borntraeger 
Cc: Heiko Carstens 
Cc: Sven Schnelle 
Cc: Vasily Gorbik 
Signed-off-by: Guilherme G. Piccoli 
---

As a design choice, the option used here to verify a given spinlock is taken
was the function "spin_is_locked()" - but we noticed that it is not often used.
An alternative would to take the lock with a spin_trylock() and if it succeeds,
just release the spinlock and continue the code. But that seemed weird...

Also, we'd like to ask a good validation of case (2) potential functionality
change from the s390 console experts - far from expert here, and in our naive
code observation, that seems fine, but that analysis might be missing some
corner case.

Thanks in advance!

 drivers/s390/char/con3215.c| 36 +++--
 drivers/s390/char/con3270.c| 34 +++
 drivers/s390/char/raw3270.c| 18 +++
 drivers/s390/char/raw3270.h|  1 +
 drivers/s390/char/sclp_con.c   | 28 +--
 drivers/s390/char/sclp_vt220.c | 42 +++---
 6 files changed, 96 insertions(+), 63 deletions(-)

diff --git a/drivers/s390/char/con3215.c b/drivers/s390/char/con3215.c
index f356607835d8..192198bd3dc4 100644
--- a/drivers/s390/char/con3215.c
+++ b/drivers/s390/char/con3215.c
@@ -771,35 +771,37 @@ static struct tty_driver *con3215_device(struct console 
*c, int *index)
 }
 
 /*
- * panic() calls con3215_flush through a panic_notifier
- * before the system enters a disabled, endless loop.
+ * The below function is called as a panic/reboot notifier before the
+ * system enters a disabled, endless loop.
+ *
+ * Notice we must use the spin_trylock() alternative, to prevent lockups
+ * in atomic context (panic routine runs with secondary CPUs, local IRQs
+ * and preemption disabled).
  */
-static void con3215_flush(void)
-{
-   struct raw3215_info *raw;
-   unsigned long flags;
-
-   raw = raw3215[0];  /* console 3215 is the first one */
-   spin_lock_irqsave(get_ccwdev_lock(raw->cdev), flags);
-   raw3215_make_room(raw, RAW3215_BUFFER_SIZE);
-   spin_unlock_irqrestore(get_ccwdev_lock(raw->cdev), flags);
-}
-
 static int con3215_notify(struct notifier_block *self,
  unsigned long event, void *data)
 {
-   con3215_flush();
-   return NOTIFY_OK;
+   struct raw3215_info *raw;
+   unsigned long flags;
+
+   raw = raw3215[0];  /* console 3215 is the first one */
+   if (!spin_trylock_irqsave(get_ccwdev_lock(raw->cdev), flags))
+   return NOTIFY_DONE;
+
+   raw3215_make_room(raw, RAW3215_BUFFER_SIZE);
+   spin_unlock_irqrestore(get_ccwdev_lock(raw->cdev), flags);
+
+   return NOTIFY_DONE;
 }
 
 static struct notifier_block on_panic_nb = {
.notifier_call = con3215_notify,
-   .priority = 0,
+   .priority = INT_MIN + 1, /* run the callback late */
 };
 
 static struct notifier_block on_reboot_nb = {
.notifier_call = con3215_notify,
-   .priority = 0,
+   .priority = INT_MIN + 1, /* run the callback late */
 };
 
 /*
diff --git a/drivers/s390/char/con3270.c b/drivers/s390/char/con3270.c
index e4592890f20a..476202f3d8a0 100644
--- a/drivers/s

[PATCH 12/30] parisc: Replace regular spinlock with spin_trylock on panic path

2022-04-27 Thread Guilherme G. Piccoli
The panic notifiers' callbacks execute in an atomic context, with
interrupts/preemption disabled, and all CPUs not running the panic
function are off, so it's very dangerous to wait on a regular
spinlock, there's a risk of deadlock.

This patch refactors the panic notifier of parisc/power driver
to make use of spin_trylock - for that, we've added a second
version of the soft-power function. Also, some comments were
reorganized and trailing white spaces, useless header inclusion
and blank lines were removed.

Cc: Helge Deller 
Cc: "James E.J. Bottomley" 
Signed-off-by: Guilherme G. Piccoli 
---
 arch/parisc/include/asm/pdc.h |  1 +
 arch/parisc/kernel/firmware.c | 27 +++
 drivers/parisc/power.c| 17 ++---
 3 files changed, 34 insertions(+), 11 deletions(-)

diff --git a/arch/parisc/include/asm/pdc.h b/arch/parisc/include/asm/pdc.h
index b643092d4b98..7a106008e258 100644
--- a/arch/parisc/include/asm/pdc.h
+++ b/arch/parisc/include/asm/pdc.h
@@ -83,6 +83,7 @@ int pdc_do_firm_test_reset(unsigned long ftc_bitmap);
 int pdc_do_reset(void);
 int pdc_soft_power_info(unsigned long *power_reg);
 int pdc_soft_power_button(int sw_control);
+int pdc_soft_power_button_panic(int sw_control);
 void pdc_io_reset(void);
 void pdc_io_reset_devices(void);
 int pdc_iodc_getc(void);
diff --git a/arch/parisc/kernel/firmware.c b/arch/parisc/kernel/firmware.c
index 6a7e315bcc2e..0e2f70b592f4 100644
--- a/arch/parisc/kernel/firmware.c
+++ b/arch/parisc/kernel/firmware.c
@@ -1232,15 +1232,18 @@ int __init pdc_soft_power_info(unsigned long *power_reg)
 }
 
 /*
- * pdc_soft_power_button - Control the soft power button behaviour
- * @sw_control: 0 for hardware control, 1 for software control 
+ * pdc_soft_power_button{_panic} - Control the soft power button behaviour
+ * @sw_control: 0 for hardware control, 1 for software control
  *
  *
  * This PDC function places the soft power button under software or
  * hardware control.
- * Under software control the OS may control to when to allow to shut 
- * down the system. Under hardware control pressing the power button 
+ * Under software control the OS may control to when to allow to shut
+ * down the system. Under hardware control pressing the power button
  * powers off the system immediately.
+ *
+ * The _panic version relies in spin_trylock to prevent deadlock
+ * on panic path.
  */
 int pdc_soft_power_button(int sw_control)
 {
@@ -1254,6 +1257,22 @@ int pdc_soft_power_button(int sw_control)
return retval;
 }
 
+int pdc_soft_power_button_panic(int sw_control)
+{
+   int retval;
+   unsigned long flags;
+
+   if (!spin_trylock_irqsave(&pdc_lock, flags)) {
+   pr_emerg("Couldn't enable soft power button\n");
+   return -EBUSY; /* ignored by the panic notifier */
+   }
+
+   retval = mem_pdc_call(PDC_SOFT_POWER, PDC_SOFT_POWER_ENABLE, 
__pa(pdc_result), sw_control);
+   spin_unlock_irqrestore(&pdc_lock, flags);
+
+   return retval;
+}
+
 /*
  * pdc_io_reset - Hack to avoid overlapping range registers of Bridges devices.
  * Primarily a problem on T600 (which parisc-linux doesn't support) but
diff --git a/drivers/parisc/power.c b/drivers/parisc/power.c
index 456776bd8ee6..8512884de2cf 100644
--- a/drivers/parisc/power.c
+++ b/drivers/parisc/power.c
@@ -37,7 +37,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -175,16 +174,21 @@ static void powerfail_interrupt(int code, void *x)
 
 
 
-/* parisc_panic_event() is called by the panic handler.
- * As soon as a panic occurs, our tasklets above will not be
- * executed any longer. This function then re-enables the 
- * soft-power switch and allows the user to switch off the system
+/*
+ * parisc_panic_event() is called by the panic handler.
+ *
+ * As soon as a panic occurs, our tasklets above will not
+ * be executed any longer. This function then re-enables
+ * the soft-power switch and allows the user to switch off
+ * the system. We rely in pdc_soft_power_button_panic()
+ * since this version spin_trylocks (instead of regular
+ * spinlock), preventing deadlocks on panic path.
  */
 static int parisc_panic_event(struct notifier_block *this,
unsigned long event, void *ptr)
 {
/* re-enable the soft-power switch */
-   pdc_soft_power_button(0);
+   pdc_soft_power_button_panic(0);
return NOTIFY_DONE;
 }
 
@@ -193,7 +197,6 @@ static struct notifier_block parisc_panic_block = {
.priority   = INT_MAX,
 };
 
-
 static int __init power_init(void)
 {
unsigned long ret;
-- 
2.36.0



[PATCH 11/30] um: Improve panic notifiers consistency and ordering

2022-04-27 Thread Guilherme G. Piccoli
Currently the panic notifiers from user mode linux don't follow
the convention for most of the other notifiers present in the
kernel (indentation, priority setting, numeric return).
More important, the priorities could be improved, since it's a
special case (userspace), hence we could run the notifiers earlier;
user mode linux shouldn't care much with other panic notifiers but
the ordering among the mconsole and arch notifier is important,
given that the arch one effectively triggers a core dump.

This patch fixes that by running the mconsole notifier as the first
panic notifier, followed by the architecture one (that coredumps).
Also, we remove a useless header inclusion.

Cc: Anton Ivanov 
Cc: Johannes Berg 
Cc: Richard Weinberger 
Signed-off-by: Guilherme G. Piccoli 
---
 arch/um/drivers/mconsole_kern.c | 8 +++-
 arch/um/kernel/um_arch.c| 8 
 2 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/arch/um/drivers/mconsole_kern.c b/arch/um/drivers/mconsole_kern.c
index 8ca67a692683..2ea0421bcc3f 100644
--- a/arch/um/drivers/mconsole_kern.c
+++ b/arch/um/drivers/mconsole_kern.c
@@ -11,7 +11,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -846,13 +845,12 @@ static int notify_panic(struct notifier_block *self, 
unsigned long unused1,
 
mconsole_notify(notify_socket, MCONSOLE_PANIC, message,
strlen(message) + 1);
-   return 0;
+   return NOTIFY_DONE;
 }
 
 static struct notifier_block panic_exit_notifier = {
-   .notifier_call  = notify_panic,
-   .next   = NULL,
-   .priority   = 1
+   .notifier_call  = notify_panic,
+   .priority   = INT_MAX, /* run as soon as possible */
 };
 
 static int add_notifier(void)
diff --git a/arch/um/kernel/um_arch.c b/arch/um/kernel/um_arch.c
index 0760e24f2eba..4485b1a7c8e4 100644
--- a/arch/um/kernel/um_arch.c
+++ b/arch/um/kernel/um_arch.c
@@ -246,13 +246,13 @@ static int panic_exit(struct notifier_block *self, 
unsigned long unused1,
bust_spinlocks(0);
uml_exitcode = 1;
os_dump_core();
-   return 0;
+
+   return NOTIFY_DONE;
 }
 
 static struct notifier_block panic_exit_notifier = {
-   .notifier_call  = panic_exit,
-   .next   = NULL,
-   .priority   = 0
+   .notifier_call  = panic_exit,
+   .priority   = INT_MAX - 1, /* run as 2nd notifier, won't return */
 };
 
 void uml_finishsetup(void)
-- 
2.36.0



[PATCH 10/30] alpha: Clean-up the panic notifier code

2022-04-27 Thread Guilherme G. Piccoli
The alpha panic notifier has some code issues, not following
the conventions of other notifiers. Also, it might halt the
machine but still it is set to run as early as possible, which
doesn't seem to be a good idea.

This patch cleans the code, and set the notifier to run as the
latest, following the same approach other architectures are doing.
Also, we remove the unnecessary include of a header already
included indirectly.

Cc: Ivan Kokshaysky 
Cc: Matt Turner 
Cc: Richard Henderson 
Signed-off-by: Guilherme G. Piccoli 
---
 arch/alpha/kernel/setup.c | 36 +++-
 1 file changed, 15 insertions(+), 21 deletions(-)

diff --git a/arch/alpha/kernel/setup.c b/arch/alpha/kernel/setup.c
index b4fbbba30aa2..d88bdf852753 100644
--- a/arch/alpha/kernel/setup.c
+++ b/arch/alpha/kernel/setup.c
@@ -41,19 +41,11 @@
 #include 
 #include 
 #endif
-#include 
 #include 
 #include 
 #include 
 #include 
 
-static int alpha_panic_event(struct notifier_block *, unsigned long, void *);
-static struct notifier_block alpha_panic_block = {
-   alpha_panic_event,
-NULL,
-INT_MAX /* try to do it first */
-};
-
 #include 
 #include 
 #include 
@@ -435,6 +427,21 @@ static const struct sysrq_key_op srm_sysrq_reboot_op = {
 };
 #endif
 
+static int alpha_panic_event(struct notifier_block *this,
+unsigned long event, void *ptr)
+{
+   /* If we are using SRM and serial console, just hard halt here. */
+   if (alpha_using_srm && srmcons_output)
+   __halt();
+
+   return NOTIFY_DONE;
+}
+
+static struct notifier_block alpha_panic_block = {
+   .notifier_call = alpha_panic_event,
+   .priority = INT_MIN, /* may not return, do it last */
+};
+
 void __init
 setup_arch(char **cmdline_p)
 {
@@ -1427,19 +1434,6 @@ const struct seq_operations cpuinfo_op = {
.show   = show_cpuinfo,
 };
 
-
-static int
-alpha_panic_event(struct notifier_block *this, unsigned long event, void *ptr)
-{
-#if 1
-   /* FIXME FIXME FIXME */
-   /* If we are using SRM and serial console, just hard halt here. */
-   if (alpha_using_srm && srmcons_output)
-   __halt();
-#endif
-return NOTIFY_DONE;
-}
-
 static __init int add_pcspkr(void)
 {
struct platform_device *pd;
-- 
2.36.0



[PATCH 09/30] coresight: cpu-debug: Replace mutex with mutex_trylock on panic notifier

2022-04-27 Thread Guilherme G. Piccoli
The panic notifier infrastructure executes registered callbacks when
a panic event happens - such callbacks are executed in atomic context,
with interrupts and preemption disabled in the running CPU and all other
CPUs disabled. That said, mutexes in such context are not a good idea.

This patch replaces a regular mutex with a mutex_trylock safer approach;
given the nature of the mutex used in the driver, it should be pretty
uncommon being unable to acquire such mutex in the panic path, hence
no functional change should be observed (and if it is, that would be
likely a deadlock with the regular mutex).

Fixes: 2227b7c74634 ("coresight: add support for CPU debug module")
Cc: Leo Yan 
Cc: Mathieu Poirier 
Cc: Mike Leach 
Cc: Suzuki K Poulose 
Signed-off-by: Guilherme G. Piccoli 
---
 drivers/hwtracing/coresight/coresight-cpu-debug.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-cpu-debug.c 
b/drivers/hwtracing/coresight/coresight-cpu-debug.c
index 8845ec4b4402..1874df7c6a73 100644
--- a/drivers/hwtracing/coresight/coresight-cpu-debug.c
+++ b/drivers/hwtracing/coresight/coresight-cpu-debug.c
@@ -380,9 +380,10 @@ static int debug_notifier_call(struct notifier_block *self,
int cpu;
struct debug_drvdata *drvdata;
 
-   mutex_lock(&debug_lock);
+   /* Bail out if we can't acquire the mutex or the functionality is off */
+   if (!mutex_trylock(&debug_lock))
+   return NOTIFY_DONE;
 
-   /* Bail out if the functionality is disabled */
if (!debug_enable)
goto skip_dump;
 
@@ -401,7 +402,7 @@ static int debug_notifier_call(struct notifier_block *self,
 
 skip_dump:
mutex_unlock(&debug_lock);
-   return 0;
+   return NOTIFY_DONE;
 }
 
 static struct notifier_block debug_notifier = {
-- 
2.36.0



[PATCH 08/30] powerpc/setup: Refactor/untangle panic notifiers

2022-04-27 Thread Guilherme G. Piccoli
The panic notifiers infrastructure is a bit limited in the scope of
the callbacks - basically every kind of functionality is dropped
in a list that runs in the same point during the kernel panic path.
This is not really on par with the complexities and particularities
of architecture / hypervisors' needs, and a refactor is ongoing.

As part of this refactor, it was observed that powerpc has 2 notifiers,
with mixed goals: one is just a KASLR offset dumper, whereas the other
aims to hard-disable IRQs (necessary on panic path), warn firmware of
the panic event (fadump) and run low-level platform-specific machinery
that might stop kernel execution and never come back.

Clearly, the 2nd notifier has opposed goals: disable IRQs / fadump
should run earlier while low-level platform actions should
run late since it might not even return. Hence, this patch decouples
the notifiers splitting them in three:

- First one is responsible for hard-disable IRQs and fadump,
should run early;

- The kernel KASLR offset dumper is really an informative notifier,
harmless and may run at any moment in the panic path;

- The last notifier should run last, since it aims to perform
low-level actions for specific platforms, and might never return.
It is also only registered for 2 platforms, pseries and ps3.

The patch better documents the notifiers and clears the code too,
also removing a useless header.

Currently no functionality change should be observed, but after
the planned panic refactor we should expect more panic reliability
with this patch.

Cc: Benjamin Herrenschmidt 
Cc: Hari Bathini 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Paul Mackerras 
Signed-off-by: Guilherme G. Piccoli 
---

We'd like to thanks specially the MiniCloud infrastructure [0] maintainers,
that allow us to test PowerPC code in a very complete, functional and FREE
environment (there's no need even for adding a credit card, like many "free"
clouds require ¬¬ ).

[0] https://openpower.ic.unicamp.br/minicloud

 arch/powerpc/kernel/setup-common.c | 74 ++
 1 file changed, 54 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/kernel/setup-common.c 
b/arch/powerpc/kernel/setup-common.c
index 518ae5aa9410..52f96b209a96 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -23,7 +23,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -680,8 +679,25 @@ int check_legacy_ioport(unsigned long base_port)
 }
 EXPORT_SYMBOL(check_legacy_ioport);
 
-static int ppc_panic_event(struct notifier_block *this,
- unsigned long event, void *ptr)
+/*
+ * Panic notifiers setup
+ *
+ * We have 3 notifiers for powerpc, each one from a different "nature":
+ *
+ * - ppc_panic_fadump_handler() is a hypervisor notifier, which hard-disables
+ *   IRQs and deal with the Firmware-Assisted dump, when it is configured;
+ *   should run early in the panic path.
+ *
+ * - dump_kernel_offset() is an informative notifier, just showing the KASLR
+ *   offset if we have RANDOMIZE_BASE set.
+ *
+ * - ppc_panic_platform_handler() is a low-level handler that's registered
+ *   only if the platform wishes to perform final actions in the panic path,
+ *   hence it should run late and might not even return. Currently, only
+ *   pseries and ps3 platforms register callbacks.
+ */
+static int ppc_panic_fadump_handler(struct notifier_block *this,
+   unsigned long event, void *ptr)
 {
/*
 * panic does a local_irq_disable, but we really
@@ -691,45 +707,63 @@ static int ppc_panic_event(struct notifier_block *this,
 
/*
 * If firmware-assisted dump has been registered then trigger
-* firmware-assisted dump and let firmware handle everything else.
+* its callback and let the firmware handles everything else.
 */
crash_fadump(NULL, ptr);
-   if (ppc_md.panic)
-   ppc_md.panic(ptr);  /* May not return */
+
return NOTIFY_DONE;
 }
 
-static struct notifier_block ppc_panic_block = {
-   .notifier_call = ppc_panic_event,
-   .priority = INT_MIN /* may not return; must be done last */
-};
-
-/*
- * Dump out kernel offset information on panic.
- */
 static int dump_kernel_offset(struct notifier_block *self, unsigned long v,
  void *p)
 {
pr_emerg("Kernel Offset: 0x%lx from 0x%lx\n",
 kaslr_offset(), KERNELBASE);
 
-   return 0;
+   return NOTIFY_DONE;
 }
 
+static int ppc_panic_platform_handler(struct notifier_block *this,
+ unsigned long event, void *ptr)
+{
+   /*
+* This handler is only registered if we have a panic callback
+* on ppc_md, hence NULL check is not needed.
+* Also, it may not return, so it runs really late on panic path.
+*/
+   ppc_md.panic(ptr);
+
+   return NOTIFY_DONE;
+}
+
+static struct notifier_b

[PATCH 07/30] mips: ip22: Reword PANICED to PANICKED and remove useless header

2022-04-27 Thread Guilherme G. Piccoli
Many other place in the kernel prefer the latter, so let's keep
it consistent in MIPS code as well. Also, removes a useless header.

Cc: Thomas Bogendoerfer 
Signed-off-by: Guilherme G. Piccoli 
---
 arch/mips/sgi-ip22/ip22-reset.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/mips/sgi-ip22/ip22-reset.c b/arch/mips/sgi-ip22/ip22-reset.c
index 9028dbbb45dd..8f0861c58080 100644
--- a/arch/mips/sgi-ip22/ip22-reset.c
+++ b/arch/mips/sgi-ip22/ip22-reset.c
@@ -11,7 +11,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -41,7 +40,7 @@
 static struct timer_list power_timer, blink_timer, debounce_timer;
 static unsigned long blink_timer_timeout;
 
-#define MACHINE_PANICED1
+#define MACHINE_PANICKED   1
 #define MACHINE_SHUTTING_DOWN  2
 
 static int machine_state;
@@ -112,7 +111,7 @@ static void debounce(struct timer_list *unused)
return;
}
 
-   if (machine_state & MACHINE_PANICED)
+   if (machine_state & MACHINE_PANICKED)
sgimc->cpuctrl0 |= SGIMC_CCTRL0_SYSINIT;
 
enable_irq(SGI_PANEL_IRQ);
@@ -120,7 +119,7 @@ static void debounce(struct timer_list *unused)
 
 static inline void power_button(void)
 {
-   if (machine_state & MACHINE_PANICED)
+   if (machine_state & MACHINE_PANICKED)
return;
 
if ((machine_state & MACHINE_SHUTTING_DOWN) ||
@@ -167,9 +166,9 @@ static irqreturn_t panel_int(int irq, void *dev_id)
 static int panic_event(struct notifier_block *this, unsigned long event,
  void *ptr)
 {
-   if (machine_state & MACHINE_PANICED)
+   if (machine_state & MACHINE_PANICKED)
return NOTIFY_DONE;
-   machine_state |= MACHINE_PANICED;
+   machine_state |= MACHINE_PANICKED;
 
blink_timer_timeout = PANIC_FREQ;
blink_timeout(&blink_timer);
-- 
2.36.0



Re: [PATCH 24/30] panic: Refactor the panic path

2022-04-27 Thread Randy Dunlap



On 4/27/22 15:49, Guilherme G. Piccoli wrote:
> + crash_kexec_post_notifiers
> + This was DEPRECATED - users should always prefer the

This is DEPRECATED - users should always prefer the

> + parameter "panic_notifiers_level" - check its entry
> + in this documentation for details on how it works.
> + Setting this parameter is exactly the same as setting
> + "panic_notifiers_level=4".

-- 
~Randy


Re: [PATCH net-next 13/14] eth: spider: remove a copy of the NAPI_POLL_WEIGHT define

2022-04-27 Thread Geoff Levand
Hi Jakub,

On 4/27/22 08:41, Jakub Kicinski wrote:
> Defining local versions of NAPI_POLL_WEIGHT with the same
> values in the drivers just makes refactoring harder.

> --- a/drivers/net/ethernet/toshiba/spider_net.c
> +++ b/drivers/net/ethernet/toshiba/spider_net.c
>  
>   netif_napi_add(netdev, &card->napi,
> -spider_net_poll, SPIDER_NET_NAPI_WEIGHT);
> +spider_net_poll, NAPI_POLL_WEIGHT);

This seems fine. Both SPIDER_NET_NAPI_WEIGHT and NAPI_POLL_WEIGHT
are defined as 64.  Thanks for your contribution.

Acked-by: Geoff Levand 


RE: [PATCH v3 3/4] dt-bindings: interrupt-controller: fsl, ls-extirq: convert to YAML

2022-04-27 Thread Leo Li



> -Original Message-
> From: Michael Walle 
> Sent: Wednesday, April 27, 2022 2:54 AM
> To: Rob Herring ; Krzysztof Kozlowski
> 
> Cc: Leo Li ; Michael Walle ; Shawn
> Guo ; Thomas Gleixner ; Marc
> Zyngier ; linuxppc-dev@lists.ozlabs.org; linux-arm-
> ker...@lists.infradead.org; devicet...@vger.kernel.org; linux-
> ker...@vger.kernel.org
> Subject: [PATCH v3 3/4] dt-bindings: interrupt-controller: fsl,ls-extirq:
> convert to YAML
> 
> Convert the fsl,ls-extirq binding to the new YAML format.
> 
> In contrast to the original binding documentation, there are three
> compatibles which are used in their corresponding device trees which have a
> specific compatible and the (already documented) fallback
> compatible:
>  - "fsl,ls1046a-extirq", "fsl,ls1043a-extirq"
>  - "fsl,ls2080a-extirq", "fsl,ls1088a-extirq"
>  - "fsl,lx2160a-extirq", "fsl,ls1088a-extirq"
> 
> Depending on the number of the number of the external IRQs which is
> usually 12 except for the LS1021A where there are only 6, the interrupt-map-
> mask was reduced from 0x to 0xf and 0x7 respectively and the number
> of interrupt-map entries have to match.

I assume this change won't prevent driver to be compatible with older device 
trees using the 0x?  The original 0x should work for both 6/12 
interrupts or whatever reasonable number of interrupts that maybe used in 
future SoCs.  So the purpose of this change is to make the binding more 
specific to catch more errors in device tree?

> 
> Signed-off-by: Michael Walle 
> ---
> changes since v2:
>  - drop $ref to interrupt-controller.yaml
>  - use a more strict interrupt-map-mask and make it conditional on SoC
> 
> changes since v1:
>  - new patch
> 
>  .../interrupt-controller/fsl,ls-extirq.txt|  53 
>  .../interrupt-controller/fsl,ls-extirq.yaml   | 118 ++
>  2 files changed, 118 insertions(+), 53 deletions(-)  delete mode 100644
> Documentation/devicetree/bindings/interrupt-controller/fsl,ls-extirq.txt
>  create mode 100644 Documentation/devicetree/bindings/interrupt-
> controller/fsl,ls-extirq.yaml
> 
> diff --git a/Documentation/devicetree/bindings/interrupt-controller/fsl,ls-
> extirq.txt b/Documentation/devicetree/bindings/interrupt-controller/fsl,ls-
> extirq.txt
> deleted file mode 100644
> index 4d47df1a5c91..
> --- a/Documentation/devicetree/bindings/interrupt-controller/fsl,ls-
> extirq.txt
> +++ /dev/null
> @@ -1,53 +0,0 @@
> -* Freescale Layerscape external IRQs
> -
> -Some Layerscape SOCs (LS1021A, LS1043A, LS1046A -LS1088A, LS208xA,
> LX216xA) support inverting -the polarity of certain external interrupt lines.
> -
> -The device node must be a child of the node representing the -
> Supplemental Configuration Unit (SCFG).
> -
> -Required properties:
> -- compatible: should be "fsl,-extirq", e.g. "fsl,ls1021a-extirq".
> -  "fsl,ls1043a-extirq": for LS1043A, LS1046A.
> -  "fsl,ls1088a-extirq": for LS1088A, LS208xA, LX216xA.
> -- #interrupt-cells: Must be 2. The first element is the index of the
> -  external interrupt line. The second element is the trigger type.
> -- #address-cells: Must be 0.
> -- interrupt-controller: Identifies the node as an interrupt controller
> -- reg: Specifies the Interrupt Polarity Control Register (INTPCR) in
> -  the SCFG or the External Interrupt Control Register (IRQCR) in
> -  the ISC.
> -- interrupt-map: Specifies the mapping from external interrupts to GIC
> -  interrupts.
> -- interrupt-map-mask: Must be <0x 0>.
> -
> -Example:
> - scfg: scfg@157 {
> - compatible = "fsl,ls1021a-scfg", "syscon";
> - reg = <0x0 0x157 0x0 0x1>;
> - big-endian;
> - #address-cells = <1>;
> - #size-cells = <1>;
> - ranges = <0x0 0x0 0x157 0x1>;
> -
> - extirq: interrupt-controller@1ac {
> - compatible = "fsl,ls1021a-extirq";
> - #interrupt-cells = <2>;
> - #address-cells = <0>;
> - interrupt-controller;
> - reg = <0x1ac 4>;
> - interrupt-map =
> - <0 0 &gic GIC_SPI 163
> IRQ_TYPE_LEVEL_HIGH>,
> - <1 0 &gic GIC_SPI 164
> IRQ_TYPE_LEVEL_HIGH>,
> - <2 0 &gic GIC_SPI 165
> IRQ_TYPE_LEVEL_HIGH>,
> - <3 0 &gic GIC_SPI 167
> IRQ_TYPE_LEVEL_HIGH>,
> - <4 0 &gic GIC_SPI 168
> IRQ_TYPE_LEVEL_HIGH>,
> - <5 0 &gic GIC_SPI 169
> IRQ_TYPE_LEVEL_HIGH>;
> - interrupt-map-mask = <0x 0x0>;
> - };
> - };
> -
> -
> - interrupts-extended = <&gic GIC_SPI 88 IRQ_TYPE_LEVEL_HIGH>,
> -   <&extirq 1 IRQ_TYPE_LEVEL_LOW>;
> diff --git a/Documentation/devicetree/bindings/interrupt-controller/fsl,ls-
> extirq.yaml b/Documentation/devicetree/bindings/interrupt-
>

[PATCH net-next v5 18/18] team: Remove use of list iterator variable for list_for_each_entry_from()

2022-04-27 Thread Jakob Koschel
In preparation to limit the scope of the list iterator variable to the
list traversal loop, use a dedicated pointer to iterate through the
list [1].

Since that variable should not be used past the loop iteration, a
separate variable is used to 'remember the current location within the
loop'.

To either continue iterating from that position or skip the iteration
(if the previous iteration was complete) list_prepare_entry() is used.

Link: 
https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=ehreask5sqxpwr9y7k9sa6cwx...@mail.gmail.com/
 [1]
Signed-off-by: Jakob Koschel 
---
 drivers/net/team/team.c | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index b07dde6f0abf..688c4393f099 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -2425,17 +2425,17 @@ static int team_nl_send_options_get(struct team *team, 
u32 portid, u32 seq,
int flags, team_nl_send_func_t *send_func,
struct list_head *sel_opt_inst_list)
 {
+   struct team_option_inst *opt_inst, *tmp = NULL;
struct nlattr *option_list;
struct nlmsghdr *nlh;
void *hdr;
-   struct team_option_inst *opt_inst;
int err;
struct sk_buff *skb = NULL;
bool incomplete;
int i;
 
-   opt_inst = list_first_entry(sel_opt_inst_list,
-   struct team_option_inst, tmp_list);
+   tmp = list_first_entry(sel_opt_inst_list,
+  struct team_option_inst, tmp_list);
 
 start_again:
err = __send_and_alloc_skb(&skb, team, portid, send_func);
@@ -2456,7 +2456,9 @@ static int team_nl_send_options_get(struct team *team, 
u32 portid, u32 seq,
goto nla_put_failure;
 
i = 0;
+   opt_inst = list_prepare_entry(tmp, sel_opt_inst_list, tmp_list);
incomplete = false;
+   tmp = NULL;
list_for_each_entry_from(opt_inst, sel_opt_inst_list, tmp_list) {
err = team_nl_fill_one_option_get(skb, team, opt_inst);
if (err) {
@@ -2464,6 +2466,7 @@ static int team_nl_send_options_get(struct team *team, 
u32 portid, u32 seq,
if (!i)
goto errout;
incomplete = true;
+   tmp = opt_inst;
break;
}
goto errout;
@@ -2707,14 +2710,14 @@ static int team_nl_send_port_list_get(struct team 
*team, u32 portid, u32 seq,
struct nlattr *port_list;
struct nlmsghdr *nlh;
void *hdr;
-   struct team_port *port;
+   struct team_port *port, *tmp = NULL;
int err;
struct sk_buff *skb = NULL;
bool incomplete;
int i;
 
-   port = list_first_entry_or_null(&team->port_list,
-   struct team_port, list);
+   tmp = list_first_entry_or_null(&team->port_list,
+  struct team_port, list);
 
 start_again:
err = __send_and_alloc_skb(&skb, team, portid, send_func);
@@ -2744,7 +2747,9 @@ static int team_nl_send_port_list_get(struct team *team, 
u32 portid, u32 seq,
err = team_nl_fill_one_port_get(skb, one_port);
if (err)
goto errout;
-   } else if (port) {
+   } else {
+   port = list_prepare_entry(tmp, &team->port_list, list);
+   tmp = NULL;
list_for_each_entry_from(port, &team->port_list, list) {
err = team_nl_fill_one_port_get(skb, port);
if (err) {
@@ -2752,6 +2757,7 @@ static int team_nl_send_port_list_get(struct team *team, 
u32 portid, u32 seq,
if (!i)
goto errout;
incomplete = true;
+   tmp = port;
break;
}
goto errout;
-- 
2.25.1



[PATCH net-next v5 17/18] ipvlan: Remove usage of list iterator variable for the loop body

2022-04-27 Thread Jakob Koschel
In preparation to limit the scope of the list iterator variable to the
list traversal loop, use a dedicated pointer to iterate through the
list [1].

Since that variable should not be used past the loop iteration, a
separate variable is used to 'remember the current location within the
loop'.

To either continue iterating from that position or start a new
iteration (if the previous iteration was complete) list_prepare_entry()
is used.

Link: 
https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=ehreask5sqxpwr9y7k9sa6cwx...@mail.gmail.com/
 [1]
Signed-off-by: Jakob Koschel 
---
 drivers/net/ipvlan/ipvlan_main.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index 696e245f6d00..063d7c30e944 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -9,7 +9,7 @@
 static int ipvlan_set_port_mode(struct ipvl_port *port, u16 nval,
struct netlink_ext_ack *extack)
 {
-   struct ipvl_dev *ipvlan;
+   struct ipvl_dev *ipvlan, *tmp = NULL;
unsigned int flags;
int err;
 
@@ -26,8 +26,10 @@ static int ipvlan_set_port_mode(struct ipvl_port *port, u16 
nval,
   flags & ~IFF_NOARP,
   extack);
}
-   if (unlikely(err))
+   if (unlikely(err)) {
+   tmp = ipvlan;
goto fail;
+   }
}
if (nval == IPVLAN_MODE_L3S) {
/* New mode is L3S */
@@ -43,6 +45,7 @@ static int ipvlan_set_port_mode(struct ipvl_port *port, u16 
nval,
return 0;
 
 fail:
+   ipvlan = list_prepare_entry(tmp, &port->ipvlans, pnode);
/* Undo the flags changes that have been done so far. */
list_for_each_entry_continue_reverse(ipvlan, &port->ipvlans, pnode) {
flags = ipvlan->dev->flags;
-- 
2.25.1



[PATCH net-next v5 16/18] ps3_gelic: Replace usage of found with dedicated list iterator variable

2022-04-27 Thread Jakob Koschel
To move the list iterator variable into the list_for_each_entry_*()
macro in the future it should be avoided to use the list iterator
variable after the loop body.

To *never* use the list iterator variable after the loop it was
concluded to use a separate iterator variable instead of a
found boolean [1].

This removes the need to use a found variable and simply checking if
the variable was set, can determine if the break/goto was hit.

Link: 
https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=ehreask5sqxpwr9y7k9sa6cwx...@mail.gmail.com/
 [1]
Signed-off-by: Jakob Koschel 
---
 .../net/ethernet/toshiba/ps3_gelic_wireless.c | 30 +--
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/toshiba/ps3_gelic_wireless.c 
b/drivers/net/ethernet/toshiba/ps3_gelic_wireless.c
index dc14a66583ff..c8a016c902cd 100644
--- a/drivers/net/ethernet/toshiba/ps3_gelic_wireless.c
+++ b/drivers/net/ethernet/toshiba/ps3_gelic_wireless.c
@@ -1495,14 +1495,14 @@ static int gelic_wl_start_scan(struct gelic_wl_info 
*wl, int always_scan,
  */
 static void gelic_wl_scan_complete_event(struct gelic_wl_info *wl)
 {
+   struct gelic_wl_scan_info *target = NULL, *iter, *tmp;
struct gelic_eurus_cmd *cmd = NULL;
-   struct gelic_wl_scan_info *target, *tmp;
struct gelic_wl_scan_info *oldest = NULL;
struct gelic_eurus_scan_info *scan_info;
unsigned int scan_info_size;
union iwreq_data data;
unsigned long this_time = jiffies;
-   unsigned int data_len, i, found, r;
+   unsigned int data_len, i, r;
void *buf;
 
pr_debug("%s:start\n", __func__);
@@ -1539,14 +1539,14 @@ static void gelic_wl_scan_complete_event(struct 
gelic_wl_info *wl)
wl->scan_stat = GELIC_WL_SCAN_STAT_GOT_LIST;
 
/* mark all entries are old */
-   list_for_each_entry_safe(target, tmp, &wl->network_list, list) {
-   target->valid = 0;
+   list_for_each_entry_safe(iter, tmp, &wl->network_list, list) {
+   iter->valid = 0;
/* expire too old entries */
-   if (time_before(target->last_scanned + wl->scan_age,
+   if (time_before(iter->last_scanned + wl->scan_age,
this_time)) {
-   kfree(target->hwinfo);
-   target->hwinfo = NULL;
-   list_move_tail(&target->list, &wl->network_free_list);
+   kfree(iter->hwinfo);
+   iter->hwinfo = NULL;
+   list_move_tail(&iter->list, &wl->network_free_list);
}
}
 
@@ -1569,22 +1569,22 @@ static void gelic_wl_scan_complete_event(struct 
gelic_wl_info *wl)
continue;
}
 
-   found = 0;
+   target = NULL;
oldest = NULL;
-   list_for_each_entry(target, &wl->network_list, list) {
-   if (ether_addr_equal(&target->hwinfo->bssid[2],
+   list_for_each_entry(iter, &wl->network_list, list) {
+   if (ether_addr_equal(&iter->hwinfo->bssid[2],
 &scan_info->bssid[2])) {
-   found = 1;
+   target = iter;
pr_debug("%s: same BBS found scanned list\n",
 __func__);
break;
}
if (!oldest ||
-   (target->last_scanned < oldest->last_scanned))
-   oldest = target;
+   (iter->last_scanned < oldest->last_scanned))
+   oldest = iter;
}
 
-   if (!found) {
+   if (!target) {
/* not found in the list */
if (list_empty(&wl->network_free_list)) {
/* expire oldest */
-- 
2.25.1



[PATCH net-next v5 15/18] net: netcp: Remove usage of list iterator for list_add() after loop body

2022-04-27 Thread Jakob Koschel
In preparation to limit the scope of a list iterator to the list
traversal loop, use a dedicated pointer pointing to the location
where the element should be inserted [1].

Before, the code implicitly used the head when no element was found
when using &next->list. The new 'pos' variable is set to the list head
by default and overwritten if the list exits early, marking the
insertion point for list_add().

Link: 
https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=ehreask5sqxpwr9y7k9sa6cwx...@mail.gmail.com/
 [1]
Signed-off-by: Jakob Koschel 
---
 drivers/net/ethernet/ti/netcp_core.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/ti/netcp_core.c 
b/drivers/net/ethernet/ti/netcp_core.c
index 16507bff652a..f25104b5a31b 100644
--- a/drivers/net/ethernet/ti/netcp_core.c
+++ b/drivers/net/ethernet/ti/netcp_core.c
@@ -471,6 +471,7 @@ struct netcp_hook_list {
 int netcp_register_txhook(struct netcp_intf *netcp_priv, int order,
  netcp_hook_rtn *hook_rtn, void *hook_data)
 {
+   struct list_head *pos = &netcp_priv->txhook_list_head;
struct netcp_hook_list *entry;
struct netcp_hook_list *next;
unsigned long flags;
@@ -485,10 +486,12 @@ int netcp_register_txhook(struct netcp_intf *netcp_priv, 
int order,
 
spin_lock_irqsave(&netcp_priv->lock, flags);
list_for_each_entry(next, &netcp_priv->txhook_list_head, list) {
-   if (next->order > order)
+   if (next->order > order) {
+   pos = &next->list;
break;
+   }
}
-   __list_add(&entry->list, next->list.prev, &next->list);
+   list_add_tail(&entry->list, pos);
spin_unlock_irqrestore(&netcp_priv->lock, flags);
 
return 0;
@@ -520,6 +523,7 @@ EXPORT_SYMBOL_GPL(netcp_unregister_txhook);
 int netcp_register_rxhook(struct netcp_intf *netcp_priv, int order,
  netcp_hook_rtn *hook_rtn, void *hook_data)
 {
+   struct list_head *pos = &netcp_priv->rxhook_list_head;
struct netcp_hook_list *entry;
struct netcp_hook_list *next;
unsigned long flags;
@@ -534,10 +538,12 @@ int netcp_register_rxhook(struct netcp_intf *netcp_priv, 
int order,
 
spin_lock_irqsave(&netcp_priv->lock, flags);
list_for_each_entry(next, &netcp_priv->rxhook_list_head, list) {
-   if (next->order > order)
+   if (next->order > order) {
+   pos = &next->list;
break;
+   }
}
-   __list_add(&entry->list, next->list.prev, &next->list);
+   list_add_tail(&entry->list, pos);
spin_unlock_irqrestore(&netcp_priv->lock, flags);
 
return 0;
-- 
2.25.1



[PATCH net-next v5 14/18] sfc: Remove usage of list iterator for list_add() after the loop body

2022-04-27 Thread Jakob Koschel
In preparation to limit the scope of a list iterator to the list
traversal loop, use a dedicated pointer pointing to the location
where the element should be inserted [1].

Before, the code implicitly used the head when no element was found
when using &new->list. The new 'pos' variable is set to the list head
by default and overwritten if the list exits early, marking the
insertion point for list_add().

Link: 
https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=ehreask5sqxpwr9y7k9sa6cwx...@mail.gmail.com/
 [1]
Signed-off-by: Jakob Koschel 
---
 drivers/net/ethernet/sfc/rx_common.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/sfc/rx_common.c 
b/drivers/net/ethernet/sfc/rx_common.c
index fa8b9aacca11..74c056210e0b 100644
--- a/drivers/net/ethernet/sfc/rx_common.c
+++ b/drivers/net/ethernet/sfc/rx_common.c
@@ -560,14 +560,17 @@ struct efx_rss_context 
*efx_alloc_rss_context_entry(struct efx_nic *efx)
 {
struct list_head *head = &efx->rss_context.list;
struct efx_rss_context *ctx, *new;
+   struct list_head *pos = head;
u32 id = 1; /* Don't use zero, that refers to the master RSS context */
 
WARN_ON(!mutex_is_locked(&efx->rss_lock));
 
/* Search for first gap in the numbering */
list_for_each_entry(ctx, head, list) {
-   if (ctx->user_id != id)
+   if (ctx->user_id != id) {
+   pos = &ctx->list;
break;
+   }
id++;
/* Check for wrap.  If this happens, we have nearly 2^32
 * allocated RSS contexts, which seems unlikely.
@@ -585,7 +588,7 @@ struct efx_rss_context *efx_alloc_rss_context_entry(struct 
efx_nic *efx)
 
/* Insert the new entry into the gap */
new->user_id = id;
-   list_add_tail(&new->list, &ctx->list);
+   list_add_tail(&new->list, pos);
return new;
 }
 
-- 
2.25.1



[PATCH net-next v5 13/18] net: qede: Remove check of list iterator against head past the loop body

2022-04-27 Thread Jakob Koschel
When list_for_each_entry() completes the iteration over the whole list
without breaking the loop, the iterator value will be a bogus pointer
computed based on the head element.

While it is safe to use the pointer to determine if it was computed
based on the head element, either with list_entry_is_head() or
&pos->member == head, using the iterator variable after the loop should
be avoided.

In preparation to limit the scope of a list iterator to the list
traversal loop, use a dedicated pointer to point to the found element [1].

Link: 
https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=ehreask5sqxpwr9y7k9sa6cwx...@mail.gmail.com/
 [1]
Signed-off-by: Jakob Koschel 
---
 drivers/net/ethernet/qlogic/qede/qede_filter.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qede/qede_filter.c 
b/drivers/net/ethernet/qlogic/qede/qede_filter.c
index 3010833ddde3..3d167e37e654 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_filter.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_filter.c
@@ -829,18 +829,21 @@ int qede_configure_vlan_filters(struct qede_dev *edev)
 int qede_vlan_rx_kill_vid(struct net_device *dev, __be16 proto, u16 vid)
 {
struct qede_dev *edev = netdev_priv(dev);
-   struct qede_vlan *vlan;
+   struct qede_vlan *vlan = NULL;
+   struct qede_vlan *iter;
int rc = 0;
 
DP_VERBOSE(edev, NETIF_MSG_IFDOWN, "Removing vlan 0x%04x\n", vid);
 
/* Find whether entry exists */
__qede_lock(edev);
-   list_for_each_entry(vlan, &edev->vlan_list, list)
-   if (vlan->vid == vid)
+   list_for_each_entry(iter, &edev->vlan_list, list)
+   if (iter->vid == vid) {
+   vlan = iter;
break;
+   }
 
-   if (list_entry_is_head(vlan, &edev->vlan_list, list)) {
+   if (!vlan) {
DP_VERBOSE(edev, (NETIF_MSG_IFUP | NETIF_MSG_IFDOWN),
   "Vlan isn't configured\n");
goto out;
-- 
2.25.1



[PATCH net-next v5 12/18] net: qede: Replace usage of found with dedicated list iterator variable

2022-04-27 Thread Jakob Koschel
To move the list iterator variable into the list_for_each_entry_*()
macro in the future it should be avoided to use the list iterator
variable after the loop body.

To *never* use the list iterator variable after the loop it was
concluded to use a separate iterator variable instead of a
found boolean [1].

This removes the need to use a found variable and simply checking if
the variable was set, can determine if the break/goto was hit.

Link: 
https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=ehreask5sqxpwr9y7k9sa6cwx...@mail.gmail.com/
 [1]
Signed-off-by: Jakob Koschel 
---
 drivers/net/ethernet/qlogic/qede/qede_rdma.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qede/qede_rdma.c 
b/drivers/net/ethernet/qlogic/qede/qede_rdma.c
index 6304514a6f2c..2eb03ffe2484 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_rdma.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_rdma.c
@@ -246,18 +246,17 @@ static void qede_rdma_change_mtu(struct qede_dev *edev)
 static struct qede_rdma_event_work *
 qede_rdma_get_free_event_node(struct qede_dev *edev)
 {
-   struct qede_rdma_event_work *event_node = NULL;
-   bool found = false;
+   struct qede_rdma_event_work *event_node = NULL, *iter;
 
-   list_for_each_entry(event_node, &edev->rdma_info.rdma_event_list,
+   list_for_each_entry(iter, &edev->rdma_info.rdma_event_list,
list) {
-   if (!work_pending(&event_node->work)) {
-   found = true;
+   if (!work_pending(&iter->work)) {
+   event_node = iter;
break;
}
}
 
-   if (!found) {
+   if (!event_node) {
event_node = kzalloc(sizeof(*event_node), GFP_ATOMIC);
if (!event_node) {
DP_NOTICE(edev,
-- 
2.25.1



[PATCH net-next v5 11/18] qed: Remove usage of list iterator variable after the loop

2022-04-27 Thread Jakob Koschel
To move the list iterator variable into the list_for_each_entry_*()
macro in the future it should be avoided to use the list iterator
variable after the loop body.

Since "found" and "p_ent" need to be equal, "found" should be used
consistently to limit the scope of "p_ent" to the list traversal in
the future.

Signed-off-by: Jakob Koschel 
---
 drivers/net/ethernet/qlogic/qed/qed_spq.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_spq.c 
b/drivers/net/ethernet/qlogic/qed/qed_spq.c
index d01b9245f811..cbaa2abed660 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_spq.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_spq.c
@@ -934,10 +934,10 @@ int qed_spq_completion(struct qed_hwfn *p_hwfn,
   u8 fw_return_code,
   union event_ring_data *p_data)
 {
+   struct qed_spq_entry*found = NULL;
struct qed_spq  *p_spq;
-   struct qed_spq_entry*p_ent = NULL;
+   struct qed_spq_entry*p_ent;
struct qed_spq_entry*tmp;
-   struct qed_spq_entry*found = NULL;
 
if (!p_hwfn)
return -EINVAL;
@@ -980,7 +980,7 @@ int qed_spq_completion(struct qed_hwfn *p_hwfn,
DP_VERBOSE(p_hwfn, QED_MSG_SPQ,
   "Complete EQE [echo %04x]: func %p cookie %p)\n",
   le16_to_cpu(echo),
-  p_ent->comp_cb.function, p_ent->comp_cb.cookie);
+  found->comp_cb.function, found->comp_cb.cookie);
if (found->comp_cb.function)
found->comp_cb.function(p_hwfn, found->comp_cb.cookie, p_data,
fw_return_code);
-- 
2.25.1



[PATCH net-next v5 10/18] qed: Replace usage of found with dedicated list iterator variable

2022-04-27 Thread Jakob Koschel
To move the list iterator variable into the list_for_each_entry_*()
macro in the future it should be avoided to use the list iterator
variable after the loop body.

To *never* use the list iterator variable after the loop it was
concluded to use a separate iterator variable instead of a
found boolean [1].

This removes the need to use a found variable and simply checking if
the variable was set, can determine if the break/goto was hit.

Link: 
https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=ehreask5sqxpwr9y7k9sa6cwx...@mail.gmail.com/
 [1]
Signed-off-by: Jakob Koschel 
---
 drivers/net/ethernet/qlogic/qed/qed_iwarp.c | 26 ++---
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_iwarp.c 
b/drivers/net/ethernet/qlogic/qed/qed_iwarp.c
index 1d1d4caad680..198c9321bf51 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_iwarp.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_iwarp.c
@@ -1630,38 +1630,36 @@ static struct qed_iwarp_listener *
 qed_iwarp_get_listener(struct qed_hwfn *p_hwfn,
   struct qed_iwarp_cm_info *cm_info)
 {
-   struct qed_iwarp_listener *listener = NULL;
+   struct qed_iwarp_listener *listener = NULL, *iter;
static const u32 ip_zero[4] = { 0, 0, 0, 0 };
-   bool found = false;
 
-   list_for_each_entry(listener,
+   list_for_each_entry(iter,
&p_hwfn->p_rdma_info->iwarp.listen_list,
list_entry) {
-   if (listener->port == cm_info->local_port) {
-   if (!memcmp(listener->ip_addr,
+   if (iter->port == cm_info->local_port) {
+   if (!memcmp(iter->ip_addr,
ip_zero, sizeof(ip_zero))) {
-   found = true;
+   listener = iter;
break;
}
 
-   if (!memcmp(listener->ip_addr,
+   if (!memcmp(iter->ip_addr,
cm_info->local_ip,
sizeof(cm_info->local_ip)) &&
-   (listener->vlan == cm_info->vlan)) {
-   found = true;
+   iter->vlan == cm_info->vlan) {
+   listener = iter;
break;
}
}
}
 
-   if (found) {
+   if (listener)
DP_VERBOSE(p_hwfn, QED_MSG_RDMA, "listener found = %p\n",
   listener);
-   return listener;
-   }
+   else
+   DP_VERBOSE(p_hwfn, QED_MSG_RDMA, "listener not found\n");
 
-   DP_VERBOSE(p_hwfn, QED_MSG_RDMA, "listener not found\n");
-   return NULL;
+   return listener;
 }
 
 static int
-- 
2.25.1



[PATCH net-next v5 09/18] qed: Use dedicated list iterator variable

2022-04-27 Thread Jakob Koschel
To move the list iterator variable into the list_for_each_entry_*()
macro in the future it should be avoided to use the list iterator
variable after the loop body.

To *never* use the list iterator variable after the loop it was
concluded to use a separate iterator variable [1].

Link: 
https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=ehreask5sqxpwr9y7k9sa6cwx...@mail.gmail.com/
 [1]
Signed-off-by: Jakob Koschel 
---
 drivers/net/ethernet/qlogic/qed/qed_dev.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c 
b/drivers/net/ethernet/qlogic/qed/qed_dev.c
index 672480c9d195..e920e7dcf66a 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_dev.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c
@@ -174,7 +174,7 @@ int qed_db_recovery_add(struct qed_dev *cdev,
 int qed_db_recovery_del(struct qed_dev *cdev,
void __iomem *db_addr, void *db_data)
 {
-   struct qed_db_recovery_entry *db_entry = NULL;
+   struct qed_db_recovery_entry *db_entry = NULL, *iter;
struct qed_hwfn *p_hwfn;
int rc = -EINVAL;
 
@@ -190,12 +190,13 @@ int qed_db_recovery_del(struct qed_dev *cdev,
 
/* Protect the list */
spin_lock_bh(&p_hwfn->db_recovery_info.lock);
-   list_for_each_entry(db_entry,
+   list_for_each_entry(iter,
&p_hwfn->db_recovery_info.list, list_entry) {
/* search according to db_data addr since db_addr is not unique 
(roce) */
-   if (db_entry->db_data == db_data) {
-   qed_db_recovery_dp_entry(p_hwfn, db_entry, "Deleting");
-   list_del(&db_entry->list_entry);
+   if (iter->db_data == db_data) {
+   qed_db_recovery_dp_entry(p_hwfn, iter, "Deleting");
+   list_del(&iter->list_entry);
+   db_entry = iter;
rc = 0;
break;
}
-- 
2.25.1



[PATCH net-next v5 08/18] net: sparx5: Replace usage of found with dedicated list iterator variable

2022-04-27 Thread Jakob Koschel
To move the list iterator variable into the list_for_each_entry_*()
macro in the future it should be avoided to use the list iterator
variable after the loop body.

To *never* use the list iterator variable after the loop it was
concluded to use a separate iterator variable instead of a
found boolean [1].

This removes the need to use a found variable and simply checking if
the variable was set, can determine if the break/goto was hit.

Link: 
https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=ehreask5sqxpwr9y7k9sa6cwx...@mail.gmail.com/
 [1]
Signed-off-by: Jakob Koschel 
---
 .../microchip/sparx5/sparx5_mactable.c| 25 +--
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/microchip/sparx5/sparx5_mactable.c 
b/drivers/net/ethernet/microchip/sparx5/sparx5_mactable.c
index a5837dbe0c7e..bb8d9ce79ac2 100644
--- a/drivers/net/ethernet/microchip/sparx5/sparx5_mactable.c
+++ b/drivers/net/ethernet/microchip/sparx5/sparx5_mactable.c
@@ -362,8 +362,7 @@ static void sparx5_mact_handle_entry(struct sparx5 *sparx5,
 unsigned char mac[ETH_ALEN],
 u16 vid, u32 cfg2)
 {
-   struct sparx5_mact_entry *mact_entry;
-   bool found = false;
+   struct sparx5_mact_entry *mact_entry = NULL, *iter;
u16 port;
 
if (LRN_MAC_ACCESS_CFG_2_MAC_ENTRY_ADDR_TYPE_GET(cfg2) !=
@@ -378,28 +377,28 @@ static void sparx5_mact_handle_entry(struct sparx5 
*sparx5,
return;
 
mutex_lock(&sparx5->mact_lock);
-   list_for_each_entry(mact_entry, &sparx5->mact_entries, list) {
-   if (mact_entry->vid == vid &&
-   ether_addr_equal(mac, mact_entry->mac)) {
-   found = true;
-   mact_entry->flags |= MAC_ENT_ALIVE;
-   if (mact_entry->port != port) {
+   list_for_each_entry(iter, &sparx5->mact_entries, list) {
+   if (iter->vid == vid &&
+   ether_addr_equal(mac, iter->mac)) {
+   iter->flags |= MAC_ENT_ALIVE;
+   if (iter->port != port) {
dev_warn(sparx5->dev, "Entry move: %d -> %d\n",
-mact_entry->port, port);
-   mact_entry->port = port;
-   mact_entry->flags |= MAC_ENT_MOVED;
+iter->port, port);
+   iter->port = port;
+   iter->flags |= MAC_ENT_MOVED;
}
/* Entry handled */
+   mact_entry = iter;
break;
}
}
mutex_unlock(&sparx5->mact_lock);
 
-   if (found && !(mact_entry->flags & MAC_ENT_MOVED))
+   if (mact_entry && !(mact_entry->flags & MAC_ENT_MOVED))
/* Present, not moved */
return;
 
-   if (!found) {
+   if (!mact_entry) {
/* Entry not found - now add */
mact_entry = alloc_mact_entry(sparx5, mac, vid, port);
if (!mact_entry)
-- 
2.25.1



[PATCH net-next v5 07/18] net: dsa: Replace usage of found with dedicated list iterator variable

2022-04-27 Thread Jakob Koschel
To move the list iterator variable into the list_for_each_entry_*()
macro in the future it should be avoided to use the list iterator
variable after the loop body.

To *never* use the list iterator variable after the loop it was
concluded to use a separate iterator variable instead of a
found boolean [1].

This removes the need to use a found variable and simply checking if
the variable was set, can determine if the break/goto was hit.

Link: 
https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=ehreask5sqxpwr9y7k9sa6cwx...@mail.gmail.com/
 [1]
Signed-off-by: Jakob Koschel 
Reviewed-by: Vladimir Oltean 
Reviewed-by: Florian Fainelli 
---
 net/dsa/dsa.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 89c6c86e746f..645522c4dd4a 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -112,22 +112,21 @@ const struct dsa_device_ops 
*dsa_find_tagger_by_name(const char *buf)
 
 const struct dsa_device_ops *dsa_tag_driver_get(int tag_protocol)
 {
-   struct dsa_tag_driver *dsa_tag_driver;
+   struct dsa_tag_driver *dsa_tag_driver = NULL, *iter;
const struct dsa_device_ops *ops;
-   bool found = false;
 
request_module("%s%d", DSA_TAG_DRIVER_ALIAS, tag_protocol);
 
mutex_lock(&dsa_tag_drivers_lock);
-   list_for_each_entry(dsa_tag_driver, &dsa_tag_drivers_list, list) {
-   ops = dsa_tag_driver->ops;
+   list_for_each_entry(iter, &dsa_tag_drivers_list, list) {
+   ops = iter->ops;
if (ops->proto == tag_protocol) {
-   found = true;
+   dsa_tag_driver = iter;
break;
}
}
 
-   if (found) {
+   if (dsa_tag_driver) {
if (!try_module_get(dsa_tag_driver->owner))
ops = ERR_PTR(-ENOPROTOOPT);
} else {
-- 
2.25.1



[PATCH net-next v5 06/18] net: dsa: mv88e6xxx: refactor mv88e6xxx_port_vlan()

2022-04-27 Thread Jakob Koschel
From: Vladimir Oltean 

To avoid bugs and speculative execution exploits due to type-confused
pointers at the end of a list_for_each_entry() loop, one measure is to
restrict code to not use the iterator variable outside the loop block.

In the case of mv88e6xxx_port_vlan(), this isn't a problem, as we never
let the loops exit through "natural causes" anyway, by using a "found"
variable and then using the last "dp" iterator prior to the break, which
is a safe thing to do.

Nonetheless, with the expected new syntax, this pattern will no longer
be possible.

Profit off of the occasion and break the two port finding methods into
smaller sub-functions. Somehow, returning a copy of the iterator pointer
is still accepted.

This change makes it redundant to have a "bool found", since the "dp"
from mv88e6xxx_port_vlan() now holds NULL if we haven't found what we
were looking for.

Signed-off-by: Vladimir Oltean 
Signed-off-by: Jakob Koschel 
Reviewed-by: Florian Fainelli 
---
 drivers/net/dsa/mv88e6xxx/chip.c | 54 ++--
 1 file changed, 31 insertions(+), 23 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index b3aa0e5bc842..1f35e89053e6 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -1378,42 +1378,50 @@ static int mv88e6xxx_set_mac_eee(struct dsa_switch *ds, 
int port,
return 0;
 }
 
+static struct dsa_port *mv88e6xxx_find_port(struct dsa_switch_tree *dst,
+   int sw_index, int port)
+{
+   struct dsa_port *dp;
+
+   list_for_each_entry(dp, &dst->ports, list)
+   if (dp->ds->index == sw_index && dp->index == port)
+   return dp;
+
+   return NULL;
+}
+
+static struct dsa_port *
+mv88e6xxx_find_port_by_bridge_num(struct dsa_switch_tree *dst,
+ unsigned int bridge_num)
+{
+   struct dsa_port *dp;
+
+   list_for_each_entry(dp, &dst->ports, list)
+   if (dsa_port_bridge_num_get(dp) == bridge_num)
+   return dp;
+
+   return NULL;
+}
+
 /* Mask of the local ports allowed to receive frames from a given fabric port 
*/
 static u16 mv88e6xxx_port_vlan(struct mv88e6xxx_chip *chip, int dev, int port)
 {
struct dsa_switch *ds = chip->ds;
struct dsa_switch_tree *dst = ds->dst;
struct dsa_port *dp, *other_dp;
-   bool found = false;
u16 pvlan;
 
-   /* dev is a physical switch */
if (dev <= dst->last_switch) {
-   list_for_each_entry(dp, &dst->ports, list) {
-   if (dp->ds->index == dev && dp->index == port) {
-   /* dp might be a DSA link or a user port, so it
-* might or might not have a bridge.
-* Use the "found" variable for both cases.
-*/
-   found = true;
-   break;
-   }
-   }
-   /* dev is a virtual bridge */
+   /* dev is a physical switch */
+   dp = mv88e6xxx_find_port(dst, dev, port);
} else {
-   list_for_each_entry(dp, &dst->ports, list) {
-   unsigned int bridge_num = dsa_port_bridge_num_get(dp);
-
-   if (bridge_num + dst->last_switch != dev)
-   continue;
-
-   found = true;
-   break;
-   }
+   /* dev is a virtual bridge */
+   dp = mv88e6xxx_find_port_by_bridge_num(dst,
+  dev - dst->last_switch);
}
 
/* Prevent frames from unknown switch or virtual bridge */
-   if (!found)
+   if (!dp)
return 0;
 
/* Frames from DSA links and CPU ports can egress any local port */
-- 
2.25.1



[PATCH net-next v5 05/18] net: dsa: mv88e6xxx: remove redundant check in mv88e6xxx_port_vlan()

2022-04-27 Thread Jakob Koschel
From: Vladimir Oltean 

We know that "dev > dst->last_switch" in the "else" block.
In other words, that "dev - dst->last_switch" is > 0.

dsa_port_bridge_num_get(dp) can be 0, but the check
"if (bridge_num + dst->last_switch != dev) continue", rewritten as
"if (bridge_num != dev - dst->last_switch) continue", aka
"if (bridge_num != something which cannot be 0) continue",
makes it redundant to have the extra "if (!bridge_num) continue" logic,
since a bridge_num of zero would have been skipped anyway.

Signed-off-by: Vladimir Oltean 
Signed-off-by: Jakob Koschel 
Reviewed-by: Florian Fainelli 
---
 drivers/net/dsa/mv88e6xxx/chip.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 64f4fdd02902..b3aa0e5bc842 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -1404,9 +1404,6 @@ static u16 mv88e6xxx_port_vlan(struct mv88e6xxx_chip 
*chip, int dev, int port)
list_for_each_entry(dp, &dst->ports, list) {
unsigned int bridge_num = dsa_port_bridge_num_get(dp);
 
-   if (!bridge_num)
-   continue;
-
if (bridge_num + dst->last_switch != dev)
continue;
 
-- 
2.25.1



[PATCH net-next v5 04/18] net: dsa: sja1105: use list_add_tail(pos) instead of list_add(pos->prev)

2022-04-27 Thread Jakob Koschel
From: Vladimir Oltean 

When passed a non-head list element, list_add_tail() actually adds the
new element to its left, which is what we want. Despite the slightly
confusing name, use the dedicated function which does the same thing as
the open-coded list_add(pos->prev).

Suggested-by: Jakub Kicinski 
Signed-off-by: Vladimir Oltean 
Signed-off-by: Jakob Koschel 
Reviewed-by: Florian Fainelli 
---
 drivers/net/dsa/sja1105/sja1105_vl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/dsa/sja1105/sja1105_vl.c 
b/drivers/net/dsa/sja1105/sja1105_vl.c
index e5ea8eb9ec4e..7fe9b18f1cbd 100644
--- a/drivers/net/dsa/sja1105/sja1105_vl.c
+++ b/drivers/net/dsa/sja1105/sja1105_vl.c
@@ -49,7 +49,7 @@ static int sja1105_insert_gate_entry(struct 
sja1105_gating_config *gating_cfg,
e->rule = rule;
e->gate_state = gate_state;
e->interval = entry_time;
-   list_add(&e->list, pos->prev);
+   list_add_tail(&e->list, pos);
 
gating_cfg->num_entries++;
 
-- 
2.25.1



[PATCH net-next v5 03/18] net: dsa: sja1105: reorder sja1105_first_entry_longer_than with memory allocation

2022-04-27 Thread Jakob Koschel
From: Vladimir Oltean 

sja1105_first_entry_longer_than() does not make use of the full struct
sja1105_gate_entry *e, just of e->interval which is set from the passed
entry_time.

This means that if there is a gate conflict, we have allocated e for
nothing, just to free it later. Reorder the memory allocation and the
function call, to avoid that and simplify the error unwind path.

Signed-off-by: Vladimir Oltean 
Signed-off-by: Jakob Koschel 
Reviewed-by: Florian Fainelli 
---
 drivers/net/dsa/sja1105/sja1105_vl.c | 17 +
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/drivers/net/dsa/sja1105/sja1105_vl.c 
b/drivers/net/dsa/sja1105/sja1105_vl.c
index 369be2ac3587..e5ea8eb9ec4e 100644
--- a/drivers/net/dsa/sja1105/sja1105_vl.c
+++ b/drivers/net/dsa/sja1105/sja1105_vl.c
@@ -36,7 +36,11 @@ static int sja1105_insert_gate_entry(struct 
sja1105_gating_config *gating_cfg,
 {
struct sja1105_gate_entry *e;
struct list_head *pos;
-   int rc;
+
+   pos = sja1105_first_entry_longer_than(&gating_cfg->entries,
+ entry_time, extack);
+   if (IS_ERR(pos))
+   return PTR_ERR(pos);
 
e = kzalloc(sizeof(*e), GFP_KERNEL);
if (!e)
@@ -45,22 +49,11 @@ static int sja1105_insert_gate_entry(struct 
sja1105_gating_config *gating_cfg,
e->rule = rule;
e->gate_state = gate_state;
e->interval = entry_time;
-
-   pos = sja1105_first_entry_longer_than(&gating_cfg->entries,
- e->interval, extack);
-   if (IS_ERR(pos)) {
-   rc = PTR_ERR(pos);
-   goto err;
-   }
-
list_add(&e->list, pos->prev);
 
gating_cfg->num_entries++;
 
return 0;
-err:
-   kfree(e);
-   return rc;
 }
 
 /* The gate entries contain absolute times in their e->interval field. Convert
-- 
2.25.1



[PATCH net-next v5 02/18] net: dsa: sja1105: remove use of iterator after list_for_each_entry() loop

2022-04-27 Thread Jakob Koschel
From: Vladimir Oltean 

The link below explains that there is a desire to syntactically change
list_for_each_entry() and list_for_each() such that it becomes
impossible to use the iterator variable outside the scope of the loop.

Although sja1105_insert_gate_entry() makes legitimate use of the
iterator pointer when it breaks out, the pattern it uses may become
illegal, so it needs to change.

It is deemed acceptable to use a copy of the loop iterator, and
sja1105_insert_gate_entry() only needs to know the list_head element
before which the list insertion should be made. So let's profit from the
occasion and refactor the list iteration to a dedicated function.

An additional benefit is given by the fact that with the helper function
in place, we no longer need to special-case the empty list, since it is
equivalent to not having found any gating entry larger than the
specified interval in the list. We just need to insert at the tail of
that list (list_add vs list_add_tail on an empty list does the same
thing).

Link: 
https://patchwork.kernel.org/project/netdevbpf/patch/20220407102900.3086255-3-jakobkosc...@gmail.com/#24810127
Signed-off-by: Vladimir Oltean 
Signed-off-by: Jakob Koschel 
Reviewed-by: Florian Fainelli 
---
 drivers/net/dsa/sja1105/sja1105_vl.c | 46 ++--
 1 file changed, 29 insertions(+), 17 deletions(-)

diff --git a/drivers/net/dsa/sja1105/sja1105_vl.c 
b/drivers/net/dsa/sja1105/sja1105_vl.c
index b7e95d60a6e4..369be2ac3587 100644
--- a/drivers/net/dsa/sja1105/sja1105_vl.c
+++ b/drivers/net/dsa/sja1105/sja1105_vl.c
@@ -7,6 +7,27 @@
 
 #define SJA1105_SIZE_VL_STATUS 8
 
+static struct list_head *
+sja1105_first_entry_longer_than(struct list_head *entries,
+   s64 interval,
+   struct netlink_ext_ack *extack)
+{
+   struct sja1105_gate_entry *p;
+
+   list_for_each_entry(p, entries, list) {
+   if (p->interval == interval) {
+   NL_SET_ERR_MSG_MOD(extack, "Gate conflict");
+   return ERR_PTR(-EBUSY);
+   }
+
+   if (interval < p->interval)
+   return &p->list;
+   }
+
+   /* Empty list, or specified interval is largest within the list */
+   return entries;
+}
+
 /* Insert into the global gate list, sorted by gate action time. */
 static int sja1105_insert_gate_entry(struct sja1105_gating_config *gating_cfg,
 struct sja1105_rule *rule,
@@ -14,6 +35,7 @@ static int sja1105_insert_gate_entry(struct 
sja1105_gating_config *gating_cfg,
 struct netlink_ext_ack *extack)
 {
struct sja1105_gate_entry *e;
+   struct list_head *pos;
int rc;
 
e = kzalloc(sizeof(*e), GFP_KERNEL);
@@ -24,25 +46,15 @@ static int sja1105_insert_gate_entry(struct 
sja1105_gating_config *gating_cfg,
e->gate_state = gate_state;
e->interval = entry_time;
 
-   if (list_empty(&gating_cfg->entries)) {
-   list_add(&e->list, &gating_cfg->entries);
-   } else {
-   struct sja1105_gate_entry *p;
-
-   list_for_each_entry(p, &gating_cfg->entries, list) {
-   if (p->interval == e->interval) {
-   NL_SET_ERR_MSG_MOD(extack,
-  "Gate conflict");
-   rc = -EBUSY;
-   goto err;
-   }
-
-   if (e->interval < p->interval)
-   break;
-   }
-   list_add(&e->list, p->list.prev);
+   pos = sja1105_first_entry_longer_than(&gating_cfg->entries,
+ e->interval, extack);
+   if (IS_ERR(pos)) {
+   rc = PTR_ERR(pos);
+   goto err;
}
 
+   list_add(&e->list, pos->prev);
+
gating_cfg->num_entries++;
 
return 0;
-- 
2.25.1



[PATCH net-next v5 01/18] connector: Replace usage of found with dedicated list iterator variable

2022-04-27 Thread Jakob Koschel
To move the list iterator variable into the list_for_each_entry_*()
macro in the future it should be avoided to use the list iterator
variable after the loop body.

To *never* use the list iterator variable after the loop it was
concluded to use a separate iterator variable instead of a
found boolean [1].

This removes the need to use a found variable and simply checking if
the variable was set, can determine if the break/goto was hit.

Link: 
https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=ehreask5sqxpwr9y7k9sa6cwx...@mail.gmail.com/
 [1]
Signed-off-by: Jakob Koschel 
---
 drivers/connector/cn_queue.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/connector/cn_queue.c b/drivers/connector/cn_queue.c
index 996f025eb63c..ed77599b0b25 100644
--- a/drivers/connector/cn_queue.c
+++ b/drivers/connector/cn_queue.c
@@ -92,20 +92,19 @@ int cn_queue_add_callback(struct cn_queue_dev *dev, const 
char *name,
 
 void cn_queue_del_callback(struct cn_queue_dev *dev, const struct cb_id *id)
 {
-   struct cn_callback_entry *cbq, *n;
-   int found = 0;
+   struct cn_callback_entry *cbq = NULL, *iter, *n;
 
spin_lock_bh(&dev->queue_lock);
-   list_for_each_entry_safe(cbq, n, &dev->queue_list, callback_entry) {
-   if (cn_cb_equal(&cbq->id.id, id)) {
-   list_del(&cbq->callback_entry);
-   found = 1;
+   list_for_each_entry_safe(iter, n, &dev->queue_list, callback_entry) {
+   if (cn_cb_equal(&iter->id.id, id)) {
+   list_del(&iter->callback_entry);
+   cbq = iter;
break;
}
}
spin_unlock_bh(&dev->queue_lock);
 
-   if (found)
+   if (cbq)
cn_queue_release_callback(cbq);
 }
 
-- 
2.25.1



[PATCH net-next v5 00/18] Remove use of list iterator after loop body

2022-04-27 Thread Jakob Koschel
When the list iterator loop does not exit early the list iterator variable
contains a type-confused pointer to a 'bogus' list element computed based
on the head [1].

Often a 'found' variable is used to ensure the list iterator
variable is only accessed after the loop body if the loop did exit early
(using a break or goto).

In other cases that list iterator variable is used in
combination to access the list member which reverses the invocation of
container_of() and brings back a "safe" pointer to the head of the list.

Since, due to this code patten, there were quite a few bugs discovered [2],
Linus concluded that the rule should be to never use the list iterator
after the loop and introduce a dedicated pointer for that [3].

With the new gnu11 standard, it will now be possible to limit the scope
of the list iterator variable to the traversal loop itself by defining
the variable within the for loop.
This, however, requires to remove all uses of the list iterator after
the loop.

Based on input from Paolo Abeni [4], Vinicius Costa Gomes [5], and
Jakub Kicinski [6], I've splitted all the net-next related changes into
two patch sets, where this is part 1.a

v4->v5:
- fix reverse-xmas tree in efx_alloc_rss_context_entry() (Martin Habets)

v3->v4:
- fix build issue in efx_alloc_rss_context_entry() (Jakub Kicinski)

v2->v3:
- fix commit authors and signed-off order regarding Vladimir's patches
  (Sorry about that, wasn't intentional.)

v1->v2:
- Fixed commit message for PATCH 14/18 and used dedicated variable
  pointing to the position (Edward Cree)
- Removed redundant check in mv88e6xxx_port_vlan() (Vladimir Oltean)
- Refactor mv88e6xxx_port_vlan() using separate list iterator functions
  (Vladimir Oltean)
- Refactor sja1105_insert_gate_entry() to use separate list iterator
  functions (Vladimir Oltean)
- Allow early return in sja1105_insert_gate_entry() if
  sja1105_first_entry_longer_than() didn't find any element
  (Vladimir Oltean)
- Use list_add_tail() instead of list_add() in sja1105_insert_gate_entry()
  (Jakub Kicinski)
- net: netcp: also use separate 'pos' variable instead of duplicating list_add()

Link: https://lwn.net/Articles/887097/ [1]
Link: 
https://lore.kernel.org/linux-kernel/20220217184829.1991035-4-jakobkosc...@gmail.com/
 [2]
Link: 
https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=ehreask5sqxpwr9y7k9sa6cwx...@mail.gmail.com/
 [3]
Link: 
https://lore.kernel.org/linux-kernel/7393b673c626fd75f2b4f8509faa5459254fb87c.ca...@redhat.com/
 [4]
Link: https://lore.kernel.org/linux-kernel/877d8a3sww@intel.com/ [5]
Link: https://lore.kernel.org/linux-kernel/20220403205502.1b344...@kernel.org/ 
[6]



[PATCH net-next 13/14] eth: spider: remove a copy of the NAPI_POLL_WEIGHT define

2022-04-27 Thread Jakub Kicinski
Defining local versions of NAPI_POLL_WEIGHT with the same
values in the drivers just makes refactoring harder.

Signed-off-by: Jakub Kicinski 
---
CC: kou.ishiz...@toshiba.co.jp
CC: ge...@infradead.org
CC: linuxppc-dev@lists.ozlabs.org
---
 drivers/net/ethernet/toshiba/spider_net.c | 2 +-
 drivers/net/ethernet/toshiba/spider_net.h | 1 -
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/toshiba/spider_net.c 
b/drivers/net/ethernet/toshiba/spider_net.c
index f47b8358669d..c09cd961edbb 100644
--- a/drivers/net/ethernet/toshiba/spider_net.c
+++ b/drivers/net/ethernet/toshiba/spider_net.c
@@ -2270,7 +2270,7 @@ spider_net_setup_netdev(struct spider_net_card *card)
timer_setup(&card->aneg_timer, spider_net_link_phy, 0);
 
netif_napi_add(netdev, &card->napi,
-  spider_net_poll, SPIDER_NET_NAPI_WEIGHT);
+  spider_net_poll, NAPI_POLL_WEIGHT);
 
spider_net_setup_netdev_ops(netdev);
 
diff --git a/drivers/net/ethernet/toshiba/spider_net.h 
b/drivers/net/ethernet/toshiba/spider_net.h
index 05b1a0736835..51948e2b3a34 100644
--- a/drivers/net/ethernet/toshiba/spider_net.h
+++ b/drivers/net/ethernet/toshiba/spider_net.h
@@ -44,7 +44,6 @@ extern char spider_net_driver_name[];
 #define SPIDER_NET_RX_CSUM_DEFAULT 1
 
 #define SPIDER_NET_WATCHDOG_TIMEOUT50*HZ
-#define SPIDER_NET_NAPI_WEIGHT 64
 
 #define SPIDER_NET_FIRMWARE_SEQS   6
 #define SPIDER_NET_FIRMWARE_SEQWORDS   1024
-- 
2.34.1



Re: serial hang in qemu-system-ppc64 -M pseries

2022-04-27 Thread Thomas Huth

On 27/04/2022 17.27, Thomas Huth wrote:

On 26/04/2022 12.26, Rob Landley wrote:
When I cut and paste 80-ish characters of text into the Linux serial 
console, it
reads 16 characters and stops. When I hit space, it reads another 16 
characters,

and if I keep at it will eventually catch up without losing data. If I type,
every character shows up immediately.


That "16" certainly comes from VTERM_BUFSIZE in hw/char/spapr_vty.c in the 
QEMU sources, I think.



(On other qemu targets and kernels I can cut and paste an entire uuencoded
binary and it goes through just fine in one go, but this target hangs with 
big

pastes until I hit keys.)

Is this a qemu-side bug, or a kernel-side bug?

Kernel config attached (linux 5.18-rc3 or thereabouts), qemu invocation is:

qemu-system-ppc64 -M pseries -vga none -nographic -no-reboot -m 256 -kernel
vmlinux -initrd powerpc64leroot.cpio.gz -append "panic=1 HOST=powerpc64le
console=hvc0"


Which version of QEMU are you using? Which frontend (GTK or terminal?) ... 
this rings a distant bell, but I thought we had fixed these issues long ago 
in the past... e.g.:


https://yhbt.net/lore/all/1380113886-16845-10-git-send-email-mdr...@linux.vnet.ibm.com/ 


https://git.qemu.org/?p=qemu.git;a=commitdiff;h=8a273cbe53221d28

... but maybe my memory also just fails and this has never been properly fixed.


In case you're using GTK, I think I just found the problem that I was 
remembering again:


https://lists.gnu.org/archive/html/qemu-ppc/2016-11/msg00142.html
https://lists.gnu.org/archive/html/qemu-ppc/2016-11/msg00143.html

I assume this has never been properly fixed.

 Thomas



Re: serial hang in qemu-system-ppc64 -M pseries

2022-04-27 Thread Thomas Huth

On 26/04/2022 12.26, Rob Landley wrote:

When I cut and paste 80-ish characters of text into the Linux serial console, it
reads 16 characters and stops. When I hit space, it reads another 16 characters,
and if I keep at it will eventually catch up without losing data. If I type,
every character shows up immediately.


That "16" certainly comes from VTERM_BUFSIZE in hw/char/spapr_vty.c in the 
QEMU sources, I think.



(On other qemu targets and kernels I can cut and paste an entire uuencoded
binary and it goes through just fine in one go, but this target hangs with big
pastes until I hit keys.)

Is this a qemu-side bug, or a kernel-side bug?

Kernel config attached (linux 5.18-rc3 or thereabouts), qemu invocation is:

qemu-system-ppc64 -M pseries -vga none -nographic -no-reboot -m 256 -kernel
vmlinux -initrd powerpc64leroot.cpio.gz -append "panic=1 HOST=powerpc64le
console=hvc0"


Which version of QEMU are you using? Which frontend (GTK or terminal?) ... 
this rings a distant bell, but I thought we had fixed these issues long ago 
in the past... e.g.:


https://yhbt.net/lore/all/1380113886-16845-10-git-send-email-mdr...@linux.vnet.ibm.com/

https://git.qemu.org/?p=qemu.git;a=commitdiff;h=8a273cbe53221d28

... but maybe my memory also just fails and this has never been properly fixed.

 Thomas



Re: [PATCH v3 4/4] dt-bindings: fsl: convert fsl, layerscape-scfg to YAML

2022-04-27 Thread Rob Herring
On Wed, 27 Apr 2022 09:53:38 +0200, Michael Walle wrote:
> Convert the fsl,layerscape-scfg binding to the new YAML format.
> 
> In the device trees, the device node always have a "syscon"
> compatible, which wasn't mentioned in the previous binding.
> 
> Also added, compared to the original binding, is the
> interrupt-controller subnode as used in arch/arm/boot/dts/ls1021a.dtsi
> as well as the litte-endian and big-endian properties.
> 
> Signed-off-by: Michael Walle 
> Reviewed-by: Krzysztof Kozlowski 
> ---
> changes since v2:
>  - none
> 
> changes since v1:
>  - moved to soc/fsl/fsl,layerscape-scfg.yaml
>  - generic name for node in example
>  - mention added "syscon" compatible in commit message
>  - reference specific interrupt controller
> 
>  .../arm/freescale/fsl,layerscape-scfg.txt | 19 --
>  .../bindings/soc/fsl/fsl,layerscape-scfg.yaml | 58 +++
>  2 files changed, 58 insertions(+), 19 deletions(-)
>  delete mode 100644 
> Documentation/devicetree/bindings/arm/freescale/fsl,layerscape-scfg.txt
>  create mode 100644 
> Documentation/devicetree/bindings/soc/fsl/fsl,layerscape-scfg.yaml
> 

Running 'make dtbs_check' with the schema in this patch gives the
following warnings. Consider if they are expected or the schema is
incorrect. These may not be new warnings.

Note that it is not yet a requirement to have 0 warnings for dtbs_check.
This will change in the future.

Full log is available here: https://patchwork.ozlabs.org/patch/


scfg@157: interrupt-controller@1ac:interrupt-map-mask:0:0: 15 was expected
arch/arm64/boot/dts/freescale/fsl-ls1043a-qds.dtb
arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dtb
arch/arm64/boot/dts/freescale/fsl-ls1046a-frwy.dtb
arch/arm64/boot/dts/freescale/fsl-ls1046a-qds.dtb
arch/arm64/boot/dts/freescale/fsl-ls1046a-rdb.dtb

scfg@157: interrupt-controller@1ac:interrupt-map-mask:0:0: 7 was expected
arch/arm/boot/dts/ls1021a-moxa-uc-8410a.dtb
arch/arm/boot/dts/ls1021a-qds.dtb
arch/arm/boot/dts/ls1021a-tsn.dtb
arch/arm/boot/dts/ls1021a-twr.dtb



Re: [PATCH 2/2] recordmcount: Handle sections with no non-weak symbols

2022-04-27 Thread Steven Rostedt
On Wed, 27 Apr 2022 15:01:22 +0530
"Naveen N. Rao"  wrote:

> If one or both of these weak functions are overridden in future, in the
> final vmlinux mcount table, references to these will change over to the
> non-weak variant which has its own mcount location entry. As such, there
> will now be two entries for these functions, both pointing to the same
> non-weak location.

But is that really true in all cases? x86 uses fentry these days, and other
archs do things differently too. But the original mcount (-pg) call
happened *after* the frame setup. That means the offset of the mcount call
would be at different offsets wrt the start of the function. If you have
one of these architectures that still use mcount, and the weak function
doesn't have the same size frame setup as the overriding function, then the
addresses will not be the same.

-- Steve


> We don't need the non-weak locations since they will
> never be executed. Ftrace skips the duplicate entries due to a previous
> commit.



Re: [PATCH 1/2] ftrace: Drop duplicate mcount locations

2022-04-27 Thread Steven Rostedt
On Wed, 27 Apr 2022 15:01:21 +0530
"Naveen N. Rao"  wrote:

> In the absence of section symbols [1], objtool (today) and recordmcount
> (with a subsequent patch) generate __mcount_loc relocation records with
> weak symbols as the base. This works fine as long as those weak symbols
> are not overridden, but if they are, these can result in duplicate
> entries in the final vmlinux mcount location table. This will cause
> ftrace to fail when trying to patch the same location twice. Fix this by
> dropping duplicate locations during ftrace init.
> 
> [1] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d1bcae833b32f1
> 
> Signed-off-by: Naveen N. Rao 
> ---
>  kernel/trace/ftrace.c | 13 -
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index 4f1d2f5e726341..8bc4f282bb3ff4 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -6496,7 +6496,7 @@ static int ftrace_process_locs(struct module *mod,
>   struct dyn_ftrace *rec;
>   unsigned long count;
>   unsigned long *p;
> - unsigned long addr;
> + unsigned long addr, prev_addr = 0;
>   unsigned long flags = 0; /* Shut up gcc */
>   int ret = -ENOMEM;
>  
> @@ -6550,6 +6550,16 @@ static int ftrace_process_locs(struct module *mod,
>   while (p < end) {
>   unsigned long end_offset;
>   addr = ftrace_call_adjust(*p++);
> +
> + /*
> +  * Drop duplicate entries, which can happen when weak
> +  * functions are overridden, and __mcount_loc relocation
> +  * records were generated against function names due to
> +  * absence of non-weak section symbols
> +  */
> + if (addr == prev_addr)
> + addr = 0;

Please don't use the side effect of addr == 0 causing the loop to continue
for this logic. The two are not related. Simply call continue.

if (addr == prev_addr)
continue;


-- Steve


> +
>   /*
>* Some architecture linkers will pad between
>* the different mcount_loc sections of different
> @@ -6569,6 +6579,7 @@ static int ftrace_process_locs(struct module *mod,
>  
>   rec = &pg->records[pg->index++];
>   rec->ip = addr;
> + prev_addr = addr;
>   }
>  
>   /* We should have used all pages */



[PATCH 1/2] ftrace: Drop duplicate mcount locations

2022-04-27 Thread Naveen N. Rao
In the absence of section symbols [1], objtool (today) and recordmcount
(with a subsequent patch) generate __mcount_loc relocation records with
weak symbols as the base. This works fine as long as those weak symbols
are not overridden, but if they are, these can result in duplicate
entries in the final vmlinux mcount location table. This will cause
ftrace to fail when trying to patch the same location twice. Fix this by
dropping duplicate locations during ftrace init.

[1] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d1bcae833b32f1

Signed-off-by: Naveen N. Rao 
---
 kernel/trace/ftrace.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 4f1d2f5e726341..8bc4f282bb3ff4 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -6496,7 +6496,7 @@ static int ftrace_process_locs(struct module *mod,
struct dyn_ftrace *rec;
unsigned long count;
unsigned long *p;
-   unsigned long addr;
+   unsigned long addr, prev_addr = 0;
unsigned long flags = 0; /* Shut up gcc */
int ret = -ENOMEM;
 
@@ -6550,6 +6550,16 @@ static int ftrace_process_locs(struct module *mod,
while (p < end) {
unsigned long end_offset;
addr = ftrace_call_adjust(*p++);
+
+   /*
+* Drop duplicate entries, which can happen when weak
+* functions are overridden, and __mcount_loc relocation
+* records were generated against function names due to
+* absence of non-weak section symbols
+*/
+   if (addr == prev_addr)
+   addr = 0;
+
/*
 * Some architecture linkers will pad between
 * the different mcount_loc sections of different
@@ -6569,6 +6579,7 @@ static int ftrace_process_locs(struct module *mod,
 
rec = &pg->records[pg->index++];
rec->ip = addr;
+   prev_addr = addr;
}
 
/* We should have used all pages */
-- 
2.35.1



[PATCH 0/2] ftrace/recordmcount: Handle object files without section symbols

2022-04-27 Thread Naveen N. Rao
This solves a build issue on powerpc with binutils v2.36 and newer [1].
Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
symbols") [2], binutils started dropping section symbols that it thought
were unused. Due to this, in certain scenarios, recordmcount is unable 
to find a non-weak symbol to generate a relocation record against.

Clang integrated assembler is also aggressive in dropping section 
symbols [3].

In the past, there have been various workarounds to address this. See 
commits 55d5b7dd6451b5 ("initramfs: fix clang build failure") and 
6e7b64b9dd6d96 ("elfcore: fix building with clang") and a recent patch:
https://lore.kernel.org/linuxppc-dev/20220425174128.11455-1-naveen.n@linux.vnet.ibm.com/T/#u

Fix this issue by using the weak symbol in the relocation record. This 
can result in duplicate locations in the mcount table if those weak 
functions are overridden, so have ftrace skip dupicate entries.

Objtool already follows this approach, so patch 2 updates recordmcount 
to do the same. Patch 1 updates ftrace to skip duplicate entries.

- Naveen


[1] https://github.com/linuxppc/issues/issues/388
[2] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d1bcae833b32f1
[3] https://github.com/ClangBuiltLinux/linux/issues/981


Naveen N. Rao (2):
  ftrace: Drop duplicate mcount locations
  recordmcount: Handle sections with no non-weak symbols

 kernel/trace/ftrace.c  | 13 ++-
 scripts/recordmcount.h | 86 +++---
 2 files changed, 85 insertions(+), 14 deletions(-)


base-commit: 83d8a0d166119de813cad27ae7d61f54f9aea707
-- 
2.35.1



[PATCH 2/2] recordmcount: Handle sections with no non-weak symbols

2022-04-27 Thread Naveen N. Rao
Kernel builds on powerpc are failing with the below error [1]:
  CC  kernel/kexec_file.o
Cannot find symbol for section 9: .text.unlikely.
kernel/kexec_file.o: failed

Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
symbols") [2], binutils started dropping section symbols that it thought
were unused.  This isn't an issue in general, but with kexec_file.c, gcc
is placing kexec_arch_apply_relocations[_add] into a separate
.text.unlikely section and the section symbol ".text.unlikely" is being
dropped. Due to this, recordmcount is unable to find a non-weak symbol
in .text.unlikely to generate a relocation record against.

Handle this by falling back to a weak symbol, similar to what objtool
does in commit 44f6a7c0755d8d ("objtool: Fix seg fault with Clang
non-section symbols"). Note that this approach can result in duplicate
addresses in the final vmlinux mcount location table. A previous commit
updated ftrace to skip such duplicate entries.

After this commit, relocation records for __mcount_loc for kexec_file.o
now include two entries with the weak functions
arch_kexec_apply_relocations() and arch_kexec_apply_relocation_add() as
the relocation bases:

  ...
  0080 R_PPC64_ADDR64.text+0x1d34
  0088 R_PPC64_ADDR64.text+0x1fec
  0090 R_PPC64_ADDR64
arch_kexec_apply_relocations_add+0x000c
  0098 R_PPC64_ADDR64
arch_kexec_apply_relocations+0x000c

Powerpc does not override these functions today, so these get converted
to correct offsets in the mcount location table in vmlinux.

If one or both of these weak functions are overridden in future, in the
final vmlinux mcount table, references to these will change over to the
non-weak variant which has its own mcount location entry. As such, there
will now be two entries for these functions, both pointing to the same
non-weak location. We don't need the non-weak locations since they will
never be executed. Ftrace skips the duplicate entries due to a previous
commit.

[1] https://github.com/linuxppc/issues/issues/388
[2] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d1bcae833b32f1

Signed-off-by: Naveen N. Rao 
---
 scripts/recordmcount.h | 86 +++---
 1 file changed, 73 insertions(+), 13 deletions(-)

diff --git a/scripts/recordmcount.h b/scripts/recordmcount.h
index 1e9baa5c4fc6ef..0c79a2d2b47493 100644
--- a/scripts/recordmcount.h
+++ b/scripts/recordmcount.h
@@ -25,6 +25,7 @@
 #undef sift_rel_mcount
 #undef nop_mcount
 #undef find_secsym_ndx
+#undef find_sym_ndx
 #undef __has_rel_mcount
 #undef has_rel_mcount
 #undef tot_relsize
@@ -60,6 +61,7 @@
 # define sift_rel_mcount   sift64_rel_mcount
 # define nop_mcountnop_mcount_64
 # define find_secsym_ndx   find64_secsym_ndx
+# define find_sym_ndx  find64_sym_ndx
 # define __has_rel_mcount  __has64_rel_mcount
 # define has_rel_mcounthas64_rel_mcount
 # define tot_relsize   tot64_relsize
@@ -98,6 +100,7 @@
 # define sift_rel_mcount   sift32_rel_mcount
 # define nop_mcountnop_mcount_32
 # define find_secsym_ndx   find32_secsym_ndx
+# define find_sym_ndx  find32_sym_ndx
 # define __has_rel_mcount  __has32_rel_mcount
 # define has_rel_mcounthas32_rel_mcount
 # define tot_relsize   tot32_relsize
@@ -392,6 +395,51 @@ static void get_sym_str_and_relp(Elf_Shdr const *const 
relhdr,
*relp = rel0;
 }
 
+/*
+ * Find a symbol in the given section containing the instruction offset passed
+ * in r_offset, to be used in generating the relocation record for the mcount
+ * location. This is used if there were no local/global symbols in the given
+ * section to be used as the base. Weak symbols are ok, and are expected to
+ * result in duplicate mcount locations which get dropped by ftrace.
+ */
+static int find_sym_ndx(unsigned const txtndx,
+char const *const txtname,
+uint_t *const recvalp,
+unsigned int *sym_index,
+Elf_Shdr const *const symhdr,
+Elf32_Word const *symtab,
+Elf32_Word const *symtab_shndx,
+unsigned const r_offset,
+Elf_Ehdr const *const ehdr)
+{
+   Elf_Sym const *const sym0 = (Elf_Sym const *)(_w(symhdr->sh_offset)
+   + (void *)ehdr);
+   unsigned const nsym = _w(symhdr->sh_size) / _w(symhdr->sh_entsize);
+   Elf_Sym const *symp;
+   unsigned t;
+
+   for (symp = sym0, t = nsym; t; --t, ++symp) {
+   if (txtndx == get_symindex(symp, symtab, symtab_shndx)) {
+   /* function symbols on ARM have quirks, avoid them */
+   if (w2(ehdr->e_machine) == EM_ARM &&
+   ELF_ST_TYPE(symp->st_info) == STT_FUNC)
+  

[PATCH v6 05/17] powerpc: define get_cycles macro for arch-override

2022-04-27 Thread Jason A. Donenfeld
PowerPC defines a get_cycles() function, but it forgot to do the usual
`#define get_cycles get_cycles` dance, making it impossible for generic
code to see if an arch-specific function was defined.

Cc: Thomas Gleixner 
Cc: Arnd Bergmann 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Signed-off-by: Jason A. Donenfeld 
---
 arch/powerpc/include/asm/timex.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/include/asm/timex.h b/arch/powerpc/include/asm/timex.h
index fa2e76e4093a..14b4489de52c 100644
--- a/arch/powerpc/include/asm/timex.h
+++ b/arch/powerpc/include/asm/timex.h
@@ -19,6 +19,7 @@ static inline cycles_t get_cycles(void)
 {
return mftb();
 }
+#define get_cycles get_cycles
 
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_TIMEX_H */
-- 
2.35.1



[PATCH v3 0/4] dt-bindings: convert freescale extirq and scfg schemas

2022-04-27 Thread Michael Walle
The first two patches will be resend to the soc tree once the schema is
approved/picked up.

Please note, I'm still getting these weird "is too short" errors for
for interrupt-map entries, but it seems to work for you, so.. ;)

Michael Walle (4):
  ARM: dts: ls1021a: reduce the interrupt-map-mask
  arm64: dts: freescale: reduce the interrup-map-mask
  dt-bindings: interrupt-controller: fsl,ls-extirq: convert to YAML
  dt-bindings: fsl: convert fsl,layerscape-scfg to YAML

 .../arm/freescale/fsl,layerscape-scfg.txt |  19 ---
 .../interrupt-controller/fsl,ls-extirq.txt|  53 
 .../interrupt-controller/fsl,ls-extirq.yaml   | 118 ++
 .../bindings/soc/fsl/fsl,layerscape-scfg.yaml |  58 +
 arch/arm/boot/dts/ls1021a.dtsi|   2 +-
 .../arm64/boot/dts/freescale/fsl-ls1043a.dtsi |   2 +-
 .../arm64/boot/dts/freescale/fsl-ls1046a.dtsi |   2 +-
 .../arm64/boot/dts/freescale/fsl-ls1088a.dtsi |   2 +-
 .../arm64/boot/dts/freescale/fsl-ls208xa.dtsi |   2 +-
 .../arm64/boot/dts/freescale/fsl-lx2160a.dtsi |   2 +-
 10 files changed, 182 insertions(+), 78 deletions(-)
 delete mode 100644 
Documentation/devicetree/bindings/arm/freescale/fsl,layerscape-scfg.txt
 delete mode 100644 
Documentation/devicetree/bindings/interrupt-controller/fsl,ls-extirq.txt
 create mode 100644 
Documentation/devicetree/bindings/interrupt-controller/fsl,ls-extirq.yaml
 create mode 100644 
Documentation/devicetree/bindings/soc/fsl/fsl,layerscape-scfg.yaml

-- 
2.30.2



[PATCH v3 1/4] ARM: dts: ls1021a: reduce the interrupt-map-mask

2022-04-27 Thread Michael Walle
Reduce the interrupt-map-mask of the external interrupt controller to
7 to align with the devicetree schema.

Signed-off-by: Michael Walle 
---
 arch/arm/boot/dts/ls1021a.dtsi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/boot/dts/ls1021a.dtsi b/arch/arm/boot/dts/ls1021a.dtsi
index 2e69d6eab4d1..5354104cae12 100644
--- a/arch/arm/boot/dts/ls1021a.dtsi
+++ b/arch/arm/boot/dts/ls1021a.dtsi
@@ -192,7 +192,7 @@ extirq: interrupt-controller@1ac {
<3 0 &gic GIC_SPI 167 
IRQ_TYPE_LEVEL_HIGH>,
<4 0 &gic GIC_SPI 168 
IRQ_TYPE_LEVEL_HIGH>,
<5 0 &gic GIC_SPI 169 
IRQ_TYPE_LEVEL_HIGH>;
-   interrupt-map-mask = <0x 0x0>;
+   interrupt-map-mask = <0x7 0x0>;
};
};
 
-- 
2.30.2



[PATCH v3 2/4] arm64: dts: freescale: reduce the interrup-map-mask

2022-04-27 Thread Michael Walle
Reduce the interrupt-map-mask of the external interrupt controller to
0xf to align with the devicetree schema.

Signed-off-by: Michael Walle 
---
 arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi | 2 +-
 arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 2 +-
 arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi | 2 +-
 arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi | 2 +-
 arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi | 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi 
b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
index 35d1939e690b..46cc8d45ca65 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
@@ -335,7 +335,7 @@ extirq: interrupt-controller@1ac {
<9 0 &gic GIC_SPI 149 
IRQ_TYPE_LEVEL_HIGH>,
<10 0 &gic GIC_SPI 150 
IRQ_TYPE_LEVEL_HIGH>,
<11 0 &gic GIC_SPI 151 
IRQ_TYPE_LEVEL_HIGH>;
-   interrupt-map-mask = <0x 0x0>;
+   interrupt-map-mask = <0xf 0x0>;
};
};
 
diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi 
b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
index 4e7bd04d9798..3e8def8fe1b4 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
@@ -341,7 +341,7 @@ extirq: interrupt-controller@1ac {
<9 0 &gic GIC_SPI 149 
IRQ_TYPE_LEVEL_HIGH>,
<10 0 &gic GIC_SPI 150 
IRQ_TYPE_LEVEL_HIGH>,
<11 0 &gic GIC_SPI 151 
IRQ_TYPE_LEVEL_HIGH>;
-   interrupt-map-mask = <0x 0x0>;
+   interrupt-map-mask = <0xf 0x0>;
};
};
 
diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi 
b/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi
index 18e529118476..33c5ad1b9b96 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi
@@ -265,7 +265,7 @@ extirq: interrupt-controller@14 {
<9 0 &gic GIC_SPI 9 
IRQ_TYPE_LEVEL_HIGH>,
<10 0 &gic GIC_SPI 10 
IRQ_TYPE_LEVEL_HIGH>,
<11 0 &gic GIC_SPI 11 
IRQ_TYPE_LEVEL_HIGH>;
-   interrupt-map-mask = <0x 0x0>;
+   interrupt-map-mask = <0xf 0x0>;
};
};
 
diff --git a/arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi 
b/arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi
index 1282b61da8a5..3f767994b02d 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi
@@ -305,7 +305,7 @@ extirq: interrupt-controller@14 {
<9 0 &gic GIC_SPI 9 
IRQ_TYPE_LEVEL_HIGH>,
<10 0 &gic GIC_SPI 10 
IRQ_TYPE_LEVEL_HIGH>,
<11 0 &gic GIC_SPI 11 
IRQ_TYPE_LEVEL_HIGH>;
-   interrupt-map-mask = <0x 0x0>;
+   interrupt-map-mask = <0xf 0x0>;
};
};
 
diff --git a/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi 
b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
index 82bd8c0f318b..47ea854720ce 100644
--- a/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
@@ -698,7 +698,7 @@ extirq: interrupt-controller@14 {
<9 0 &gic GIC_SPI 9 
IRQ_TYPE_LEVEL_HIGH>,
<10 0 &gic GIC_SPI 10 
IRQ_TYPE_LEVEL_HIGH>,
<11 0 &gic GIC_SPI 11 
IRQ_TYPE_LEVEL_HIGH>;
-   interrupt-map-mask = <0x 0x0>;
+   interrupt-map-mask = <0xf 0x0>;
};
};
 
-- 
2.30.2



[PATCH v3 3/4] dt-bindings: interrupt-controller: fsl, ls-extirq: convert to YAML

2022-04-27 Thread Michael Walle
Convert the fsl,ls-extirq binding to the new YAML format.

In contrast to the original binding documentation, there are three
compatibles which are used in their corresponding device trees which
have a specific compatible and the (already documented) fallback
compatible:
 - "fsl,ls1046a-extirq", "fsl,ls1043a-extirq"
 - "fsl,ls2080a-extirq", "fsl,ls1088a-extirq"
 - "fsl,lx2160a-extirq", "fsl,ls1088a-extirq"

Depending on the number of the number of the external IRQs which is
usually 12 except for the LS1021A where there are only 6, the
interrupt-map-mask was reduced from 0x to 0xf and 0x7
respectively and the number of interrupt-map entries have to
match.

Signed-off-by: Michael Walle 
---
changes since v2:
 - drop $ref to interrupt-controller.yaml
 - use a more strict interrupt-map-mask and make it conditional on SoC

changes since v1:
 - new patch

 .../interrupt-controller/fsl,ls-extirq.txt|  53 
 .../interrupt-controller/fsl,ls-extirq.yaml   | 118 ++
 2 files changed, 118 insertions(+), 53 deletions(-)
 delete mode 100644 
Documentation/devicetree/bindings/interrupt-controller/fsl,ls-extirq.txt
 create mode 100644 
Documentation/devicetree/bindings/interrupt-controller/fsl,ls-extirq.yaml

diff --git 
a/Documentation/devicetree/bindings/interrupt-controller/fsl,ls-extirq.txt 
b/Documentation/devicetree/bindings/interrupt-controller/fsl,ls-extirq.txt
deleted file mode 100644
index 4d47df1a5c91..
--- a/Documentation/devicetree/bindings/interrupt-controller/fsl,ls-extirq.txt
+++ /dev/null
@@ -1,53 +0,0 @@
-* Freescale Layerscape external IRQs
-
-Some Layerscape SOCs (LS1021A, LS1043A, LS1046A
-LS1088A, LS208xA, LX216xA) support inverting
-the polarity of certain external interrupt lines.
-
-The device node must be a child of the node representing the
-Supplemental Configuration Unit (SCFG).
-
-Required properties:
-- compatible: should be "fsl,-extirq", e.g. "fsl,ls1021a-extirq".
-  "fsl,ls1043a-extirq": for LS1043A, LS1046A.
-  "fsl,ls1088a-extirq": for LS1088A, LS208xA, LX216xA.
-- #interrupt-cells: Must be 2. The first element is the index of the
-  external interrupt line. The second element is the trigger type.
-- #address-cells: Must be 0.
-- interrupt-controller: Identifies the node as an interrupt controller
-- reg: Specifies the Interrupt Polarity Control Register (INTPCR) in
-  the SCFG or the External Interrupt Control Register (IRQCR) in
-  the ISC.
-- interrupt-map: Specifies the mapping from external interrupts to GIC
-  interrupts.
-- interrupt-map-mask: Must be <0x 0>.
-
-Example:
-   scfg: scfg@157 {
-   compatible = "fsl,ls1021a-scfg", "syscon";
-   reg = <0x0 0x157 0x0 0x1>;
-   big-endian;
-   #address-cells = <1>;
-   #size-cells = <1>;
-   ranges = <0x0 0x0 0x157 0x1>;
-
-   extirq: interrupt-controller@1ac {
-   compatible = "fsl,ls1021a-extirq";
-   #interrupt-cells = <2>;
-   #address-cells = <0>;
-   interrupt-controller;
-   reg = <0x1ac 4>;
-   interrupt-map =
-   <0 0 &gic GIC_SPI 163 IRQ_TYPE_LEVEL_HIGH>,
-   <1 0 &gic GIC_SPI 164 IRQ_TYPE_LEVEL_HIGH>,
-   <2 0 &gic GIC_SPI 165 IRQ_TYPE_LEVEL_HIGH>,
-   <3 0 &gic GIC_SPI 167 IRQ_TYPE_LEVEL_HIGH>,
-   <4 0 &gic GIC_SPI 168 IRQ_TYPE_LEVEL_HIGH>,
-   <5 0 &gic GIC_SPI 169 IRQ_TYPE_LEVEL_HIGH>;
-   interrupt-map-mask = <0x 0x0>;
-   };
-   };
-
-
-   interrupts-extended = <&gic GIC_SPI 88 IRQ_TYPE_LEVEL_HIGH>,
- <&extirq 1 IRQ_TYPE_LEVEL_LOW>;
diff --git 
a/Documentation/devicetree/bindings/interrupt-controller/fsl,ls-extirq.yaml 
b/Documentation/devicetree/bindings/interrupt-controller/fsl,ls-extirq.yaml
new file mode 100644
index ..887e565b9573
--- /dev/null
+++ b/Documentation/devicetree/bindings/interrupt-controller/fsl,ls-extirq.yaml
@@ -0,0 +1,118 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/interrupt-controller/fsl,ls-extirq.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Freescale Layerscape External Interrupt Controller
+
+maintainers:
+  - Shawn Guo 
+  - Li Yang 
+
+description: |
+  Some Layerscape SOCs (LS1021A, LS1043A, LS1046A LS1088A, LS208xA,
+  LX216xA) support inverting the polarity of certain external interrupt
+  lines.
+
+properties:
+  compatible:
+oneOf:
+  - enum:
+  - fsl,ls1021a-extirq
+  - fsl,ls1043a-extirq
+  - fsl,ls1088a-extirq
+  - items:
+  - enum:
+  - fsl,ls1046a-extirq
+  - const: fsl,ls1043a-extirq
+  - ite

[PATCH v3 4/4] dt-bindings: fsl: convert fsl,layerscape-scfg to YAML

2022-04-27 Thread Michael Walle
Convert the fsl,layerscape-scfg binding to the new YAML format.

In the device trees, the device node always have a "syscon"
compatible, which wasn't mentioned in the previous binding.

Also added, compared to the original binding, is the
interrupt-controller subnode as used in arch/arm/boot/dts/ls1021a.dtsi
as well as the litte-endian and big-endian properties.

Signed-off-by: Michael Walle 
Reviewed-by: Krzysztof Kozlowski 
---
changes since v2:
 - none

changes since v1:
 - moved to soc/fsl/fsl,layerscape-scfg.yaml
 - generic name for node in example
 - mention added "syscon" compatible in commit message
 - reference specific interrupt controller

 .../arm/freescale/fsl,layerscape-scfg.txt | 19 --
 .../bindings/soc/fsl/fsl,layerscape-scfg.yaml | 58 +++
 2 files changed, 58 insertions(+), 19 deletions(-)
 delete mode 100644 
Documentation/devicetree/bindings/arm/freescale/fsl,layerscape-scfg.txt
 create mode 100644 
Documentation/devicetree/bindings/soc/fsl/fsl,layerscape-scfg.yaml

diff --git 
a/Documentation/devicetree/bindings/arm/freescale/fsl,layerscape-scfg.txt 
b/Documentation/devicetree/bindings/arm/freescale/fsl,layerscape-scfg.txt
deleted file mode 100644
index 0ab67b0b216d..
--- a/Documentation/devicetree/bindings/arm/freescale/fsl,layerscape-scfg.txt
+++ /dev/null
@@ -1,19 +0,0 @@
-Freescale SCFG
-
-SCFG is the supplemental configuration unit, that provides SoC specific
-configuration and status registers for the chip. Such as getting PEX port
-status.
-
-Required properties:
-  - compatible: Should contain a chip-specific compatible string,
-   Chip-specific strings are of the form "fsl,-scfg",
-   The following s are known to be supported:
-   ls1012a, ls1021a, ls1043a, ls1046a, ls2080a.
-
-  - reg: should contain base address and length of SCFG memory-mapped registers
-
-Example:
-   scfg: scfg@157 {
-   compatible = "fsl,ls1021a-scfg";
-   reg = <0x0 0x157 0x0 0x1>;
-   };
diff --git a/Documentation/devicetree/bindings/soc/fsl/fsl,layerscape-scfg.yaml 
b/Documentation/devicetree/bindings/soc/fsl/fsl,layerscape-scfg.yaml
new file mode 100644
index ..8d088b5fe823
--- /dev/null
+++ b/Documentation/devicetree/bindings/soc/fsl/fsl,layerscape-scfg.yaml
@@ -0,0 +1,58 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/soc/fsl/fsl,layerscape-scfg.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Freescale Layerscape Supplemental Configuration Unit
+
+maintainers:
+  - Shawn Guo 
+  - Li Yang 
+
+description: |
+  SCFG is the supplemental configuration unit, that provides SoC specific
+  configuration and status registers for the chip. Such as getting PEX port
+  status.
+
+properties:
+  compatible:
+items:
+  - enum:
+  - fsl,ls1012a-scfg
+  - fsl,ls1021a-scfg
+  - fsl,ls1028a-scfg
+  - fsl,ls1043a-scfg
+  - fsl,ls1046a-scfg
+  - const: syscon
+
+  reg:
+maxItems: 1
+
+  little-endian: true
+  big-endian: true
+
+  '#address-cells':
+const: 1
+
+  '#size-cells':
+const: 1
+
+  ranges: true
+
+patternProperties:
+  "^interrupt-controller@[a-z0-9]+$":
+$ref: /schemas/interrupt-controller/fsl,ls-extirq.yaml#
+
+required:
+  - compatible
+  - reg
+
+additionalProperties: false
+
+examples:
+  - |
+syscon@157 {
+compatible = "fsl,ls1021a-scfg", "syscon";
+reg = <0x157 0x1>;
+};
-- 
2.30.2