Re: [PATCH 12/30] parisc: Replace regular spinlock with spin_trylock on panic path

2022-04-28 Thread Helge Deller
On 4/28/22 00:49, Guilherme G. Piccoli wrote:
> The panic notifiers' callbacks execute in an atomic context, with
> interrupts/preemption disabled, and all CPUs not running the panic
> function are off, so it's very dangerous to wait on a regular
> spinlock, there's a risk of deadlock.
>
> This patch refactors the panic notifier of parisc/power driver
> to make use of spin_trylock - for that, we've added a second
> version of the soft-power function. Also, some comments were
> reorganized and trailing white spaces, useless header inclusion
> and blank lines were removed.
>
> Cc: Helge Deller 
> Cc: "James E.J. Bottomley" 
> Signed-off-by: Guilherme G. Piccoli 

You may add:
Acked-by: Helge Deller  # parisc

Helge


> ---
>  arch/parisc/include/asm/pdc.h |  1 +
>  arch/parisc/kernel/firmware.c | 27 +++
>  drivers/parisc/power.c| 17 ++---
>  3 files changed, 34 insertions(+), 11 deletions(-)
>
> diff --git a/arch/parisc/include/asm/pdc.h b/arch/parisc/include/asm/pdc.h
> index b643092d4b98..7a106008e258 100644
> --- a/arch/parisc/include/asm/pdc.h
> +++ b/arch/parisc/include/asm/pdc.h
> @@ -83,6 +83,7 @@ int pdc_do_firm_test_reset(unsigned long ftc_bitmap);
>  int pdc_do_reset(void);
>  int pdc_soft_power_info(unsigned long *power_reg);
>  int pdc_soft_power_button(int sw_control);
> +int pdc_soft_power_button_panic(int sw_control);
>  void pdc_io_reset(void);
>  void pdc_io_reset_devices(void);
>  int pdc_iodc_getc(void);
> diff --git a/arch/parisc/kernel/firmware.c b/arch/parisc/kernel/firmware.c
> index 6a7e315bcc2e..0e2f70b592f4 100644
> --- a/arch/parisc/kernel/firmware.c
> +++ b/arch/parisc/kernel/firmware.c
> @@ -1232,15 +1232,18 @@ int __init pdc_soft_power_info(unsigned long *power_reg)
>  }
>
>  /*
> - * pdc_soft_power_button - Control the soft power button behaviour
> - * @sw_control: 0 for hardware control, 1 for software control
> + * pdc_soft_power_button{_panic} - Control the soft power button behaviour
> + * @sw_control: 0 for hardware control, 1 for software control
>   *
>   *
>   * This PDC function places the soft power button under software or
>   * hardware control.
> - * Under software control the OS may control to when to allow to shut
> - * down the system. Under hardware control pressing the power button
> + * Under software control the OS may control to when to allow to shut
> + * down the system. Under hardware control pressing the power button
>   * powers off the system immediately.
> + *
> + * The _panic version relies on spin_trylock to prevent deadlock
> + * on the panic path.
>   */
>  int pdc_soft_power_button(int sw_control)
>  {
> @@ -1254,6 +1257,22 @@ int pdc_soft_power_button(int sw_control)
>   return retval;
>  }
>
> +int pdc_soft_power_button_panic(int sw_control)
> +{
> + int retval;
> + unsigned long flags;
> +
> + if (!spin_trylock_irqsave(&pdc_lock, flags)) {
> + pr_emerg("Couldn't enable soft power button\n");
> + return -EBUSY; /* ignored by the panic notifier */
> + }
> +
> + retval = mem_pdc_call(PDC_SOFT_POWER, PDC_SOFT_POWER_ENABLE, __pa(pdc_result), sw_control);
> + spin_unlock_irqrestore(&pdc_lock, flags);
> +
> + return retval;
> +}
> +
>  /*
>   * pdc_io_reset - Hack to avoid overlapping range registers of Bridges devices.
>   * Primarily a problem on T600 (which parisc-linux doesn't support) but
> diff --git a/drivers/parisc/power.c b/drivers/parisc/power.c
> index 456776bd8ee6..8512884de2cf 100644
> --- a/drivers/parisc/power.c
> +++ b/drivers/parisc/power.c
> @@ -37,7 +37,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 
> @@ -175,16 +174,21 @@ static void powerfail_interrupt(int code, void *x)
>
>
>
> -/* parisc_panic_event() is called by the panic handler.
> - * As soon as a panic occurs, our tasklets above will not be
> - * executed any longer. This function then re-enables the
> - * soft-power switch and allows the user to switch off the system
> +/*
> + * parisc_panic_event() is called by the panic handler.
> + *
> + * As soon as a panic occurs, our tasklets above will not
> + * be executed any longer. This function then re-enables
> + * the soft-power switch and allows the user to switch off
> + * the system. We rely on pdc_soft_power_button_panic()
> + * since this version uses spin_trylock (instead of a regular
> + * spinlock), preventing deadlocks on the panic path.
>   */
>  static int parisc_panic_event(struct notifier_block *this,
>   unsigned long event, void *ptr)
>  {
>   /* re-enable the soft-power switch */
> - pdc_soft_power_button(0);
> + pdc_soft_power_button_panic(0);
>   return NOTIFY_DONE;
>  }
>
> @@ -193,7 +197,6 @@ static struct notifier_block parisc_panic_block = {
>   .priority   = INT_MAX,
>  };
>
> -
>  static int __init power_init(void)
>  {
>   unsigned long ret;
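
For readers following the thread, here is a minimal userspace sketch of the trylock pattern the commit message describes: on the panic path the code attempts the lock exactly once and gives up with -EBUSY rather than waiting. It is not the kernel code. `fw_lock`, `fw_trylock()`, `fw_unlock()`, `fw_call()` and `soft_power_button_panic()` are hypothetical names, and a C11 `atomic_flag` stands in for the kernel spinlock.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the firmware lock (not the kernel's spinlock_t). */
static atomic_flag fw_lock = ATOMIC_FLAG_INIT;

/* Try to take the lock once; never spin. Returns true if it was taken. */
static bool fw_trylock(void)
{
	return !atomic_flag_test_and_set_explicit(&fw_lock, memory_order_acquire);
}

static void fw_unlock(void)
{
	atomic_flag_clear_explicit(&fw_lock, memory_order_release);
}

/* Hypothetical firmware call; always succeeds in this sketch. */
static int fw_call(int sw_control)
{
	(void)sw_control;
	return 0;
}

/*
 * Panic-safe variant: if the lock holder was stopped by the panic,
 * waiting would never finish, so report -EBUSY and move on.
 */
static int soft_power_button_panic(int sw_control)
{
	int retval;

	if (!fw_trylock())
		return -EBUSY;

	retval = fw_call(sw_control);
	fw_unlock();
	return retval;
}
```

The caller in a panic notifier can ignore the -EBUSY result, as the quoted patch does; the point is only that the notifier always returns.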



Re: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list

2022-04-28 Thread Corey Minyard
On Wed, Apr 27, 2022 at 07:49:15PM -0300, Guilherme G. Piccoli wrote:
> This patch renames the panic_notifier_list to panic_pre_reboot_list;
> the idea is that a subsequent patch will refactor the panic path
> in order to better split the notifiers, running some of them very
> early, some of them not so early [but still before kmsg_dump()] and
> finally, the rest should execute late, after kdump. The latter ones
> are now in the panic pre-reboot list - the name comes from the idea
> that these notifiers execute before panic() attempts rebooting the
> machine (if that option is set).
> 
> We also took the opportunity to clean-up useless header inclusions,
> improve some notifier block declarations (e.g. in ibmasm/heartbeat.c)
> and more important, change some priorities - we hereby set 2 notifiers
> to run late in the list [iss_panic_event() and the IPMI panic_event()]
> due to the risks they offer (may not return, for example).
> Proper documentation is going to be provided in a subsequent patch,
> that effectively refactors the panic path.

For the IPMI portion:

Acked-by: Corey Minyard 

Note that the IPMI panic_event() should always return, but it may take
some time, especially if the IPMI controller is no longer functional.
So the risk of a long delay is there and it makes sense to move it very
late.

-corey
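
To illustrate what "moving a notifier late" means, the sketch below keeps a chain sorted by descending priority, so a callback registered with a very low priority runs after the others. This is a simplified userspace model, not the kernel's notifier implementation; `struct notifier`, `struct chain`, `chain_register()` and `MAX_NOTIFIERS` are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define MAX_NOTIFIERS 8

struct notifier {
	const char *name;
	int priority;		/* higher priority runs earlier */
};

struct chain {
	struct notifier *slot[MAX_NOTIFIERS];
	size_t count;
};

/*
 * Insert a notifier, keeping the chain sorted by descending priority.
 * Entries with equal priority keep their registration order.
 * Returns 0 on success, -1 if the chain is full.
 */
static int chain_register(struct chain *c, struct notifier *n)
{
	size_t i, pos = 0;

	if (c->count == MAX_NOTIFIERS)
		return -1;

	/* Skip every entry whose priority is at least the new one's. */
	while (pos < c->count && c->slot[pos]->priority >= n->priority)
		pos++;

	/* Shift the tail up by one slot to make room at pos. */
	for (i = c->count; i > pos; i--)
		c->slot[i] = c->slot[i - 1];

	c->slot[pos] = n;
	c->count++;
	return 0;
}
```

Walking `slot[0]` through `slot[count - 1]` then calls the slow or risky callback last, regardless of the order in which the notifiers were registered.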

> 
> Cc: Alex Elder 
> Cc: Alexander Gordeev 
> Cc: Anton Ivanov 
> Cc: Benjamin Herrenschmidt 
> Cc: Bjorn Andersson 
> Cc: Boris Ostrovsky 
> Cc: Chris Zankel 
> Cc: Christian Borntraeger 
> Cc: Corey Minyard 
> Cc: Dexuan Cui 
> Cc: "H. Peter Anvin" 
> Cc: Haiyang Zhang 
> Cc: Heiko Carstens 
> Cc: Helge Deller 
> Cc: Ivan Kokshaysky 
> Cc: "James E.J. Bottomley" 
> Cc: James Morse 
> Cc: Johannes Berg 
> Cc: Juergen Gross 
> Cc: "K. Y. Srinivasan" 
> Cc: Mathieu Poirier 
> Cc: Matt Turner 
> Cc: Mauro Carvalho Chehab 
> Cc: Max Filippov 
> Cc: Michael Ellerman 
> Cc: Paul Mackerras 
> Cc: Pavel Machek 
> Cc: Richard Henderson 
> Cc: Richard Weinberger 
> Cc: Robert Richter 
> Cc: Stefano Stabellini 
> Cc: Stephen Hemminger 
> Cc: Sven Schnelle 
> Cc: Tony Luck 
> Cc: Vasily Gorbik 
> Cc: Wei Liu 
> Signed-off-by: Guilherme G. Piccoli 
> ---
> 
> Notice that, with this name change, out-of-tree code that relies on the global
> exported "panic_notifier_list" will fail to build. We could easily keep
> backward compatibility by keeping the old symbol and pointing it to the
> pre_reboot list (or even keeping the old naming).
> 
> But our design choice was to allow the breakage, making users rethink their
> notifiers, adding them in the list that fits best. If that wasn't a good
> decision, we're open to change it, of course.
> Thanks in advance for the review!
> 
>  arch/alpha/kernel/setup.c |  4 ++--
>  arch/parisc/kernel/pdc_chassis.c  |  3 +--
>  arch/powerpc/kernel/setup-common.c|  2 +-
>  arch/s390/kernel/ipl.c|  4 ++--
>  arch/um/drivers/mconsole_kern.c   |  2 +-
>  arch/um/kernel/um_arch.c  |  2 +-
>  arch/x86/xen/enlighten.c  |  2 +-
>  arch/xtensa/platforms/iss/setup.c |  4 ++--
>  drivers/char/ipmi/ipmi_msghandler.c   | 12 +++-
>  drivers/edac/altera_edac.c|  3 +--
>  drivers/hv/vmbus_drv.c|  4 ++--
>  drivers/leds/trigger/ledtrig-panic.c  |  3 +--
>  drivers/misc/ibmasm/heartbeat.c   | 16 +---
>  drivers/net/ipa/ipa_smp2p.c   |  5 ++---
>  drivers/parisc/power.c|  4 ++--
>  drivers/remoteproc/remoteproc_core.c  |  6 --
>  drivers/s390/char/con3215.c   |  2 +-
>  drivers/s390/char/con3270.c   |  2 +-
>  drivers/s390/char/sclp_con.c  |  2 +-
>  drivers/s390/char/sclp_vt220.c|  2 +-
>  drivers/staging/olpc_dcon/olpc_dcon.c |  6 --
>  drivers/video/fbdev/hyperv_fb.c   |  4 ++--
>  include/linux/panic_notifier.h|  2 +-
>  kernel/panic.c|  9 -
>  24 files changed, 54 insertions(+), 51 deletions(-)
> 
> diff --git a/arch/alpha/kernel/setup.c b/arch/alpha/kernel/setup.c
> index d88bdf852753..8ace0d7113b6 100644
> --- a/arch/alpha/kernel/setup.c
> +++ b/arch/alpha/kernel/setup.c
> @@ -472,8 +472,8 @@ setup_arch(char **cmdline_p)
>   }
>  
>   /* Register a call for panic conditions. */
> - atomic_notifier_chain_register(&panic_notifier_list,
> - &alpha_panic_block);
> + atomic_notifier_chain_register(&panic_pre_reboot_list,
> + &alpha_panic_block);
>  
>  #ifndef alpha_using_srm
>   /* Assume that we've booted from SRM if we haven't booted from MILO.
> diff --git a/arch/parisc/kernel/pdc_chassis.c b/arch/parisc/kernel/pdc_chassis.c
> index da154406d368..0fd8d87fb4f9 100644
> --- a/arch/parisc/kernel/pdc_chassis.c
> +++ b/arch/parisc/kernel/pdc_chassis.c
> @@ -22,7 +22,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 
> @@ -135,7 +134,7 @@ void __init 

Re: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list

2022-04-28 Thread Alex Elder

On 4/27/22 5:49 PM, Guilherme G. Piccoli wrote:

This patch renames the panic_notifier_list to panic_pre_reboot_list;
the idea is that a subsequent patch will refactor the panic path
in order to better split the notifiers, running some of them very
early, some of them not so early [but still before kmsg_dump()] and
finally, the rest should execute late, after kdump. The latter ones
are now in the panic pre-reboot list - the name comes from the idea
that these notifiers execute before panic() attempts rebooting the
machine (if that option is set).

We also took the opportunity to clean-up useless header inclusions,
improve some notifier block declarations (e.g. in ibmasm/heartbeat.c)
and more important, change some priorities - we hereby set 2 notifiers
to run late in the list [iss_panic_event() and the IPMI panic_event()]
due to the risks they offer (may not return, for example).
Proper documentation is going to be provided in a subsequent patch,
that effectively refactors the panic path.

Cc: Alex Elder 


For "drivers/net/ipa/ipa_smp2p.c":

Acked-by: Alex Elder 


Cc: Alexander Gordeev 
Cc: Anton Ivanov 
Cc: Benjamin Herrenschmidt 
Cc: Bjorn Andersson 
Cc: Boris Ostrovsky 
Cc: Chris Zankel 
Cc: Christian Borntraeger 
Cc: Corey Minyard 
Cc: Dexuan Cui 
Cc: "H. Peter Anvin" 
Cc: Haiyang Zhang 
Cc: Heiko Carstens 
Cc: Helge Deller 
Cc: Ivan Kokshaysky 
Cc: "James E.J. Bottomley" 
Cc: James Morse 
Cc: Johannes Berg 
Cc: Juergen Gross 
Cc: "K. Y. Srinivasan" 
Cc: Mathieu Poirier 
Cc: Matt Turner 
Cc: Mauro Carvalho Chehab 
Cc: Max Filippov 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: Pavel Machek 
Cc: Richard Henderson 
Cc: Richard Weinberger 
Cc: Robert Richter 
Cc: Stefano Stabellini 
Cc: Stephen Hemminger 
Cc: Sven Schnelle 
Cc: Tony Luck 
Cc: Vasily Gorbik 
Cc: Wei Liu 
Signed-off-by: Guilherme G. Piccoli 
---



. . .


Re: [PATCH v3 4/4] dt-bindings: fsl: convert fsl,layerscape-scfg to YAML

2022-04-28 Thread Rob Herring
On Wed, Apr 27, 2022 at 09:53:38AM +0200, Michael Walle wrote:
> Convert the fsl,layerscape-scfg binding to the new YAML format.
> 
> In the device trees, the device node always has a "syscon"
> compatible, which wasn't mentioned in the previous binding.
> 
> Also added, compared to the original binding, is the
> interrupt-controller subnode as used in arch/arm/boot/dts/ls1021a.dtsi
> as well as the little-endian and big-endian properties.
> 
> Signed-off-by: Michael Walle 
> Reviewed-by: Krzysztof Kozlowski 
> ---
> changes since v2:
>  - none
> 
> changes since v1:
>  - moved to soc/fsl/fsl,layerscape-scfg.yaml
>  - generic name for node in example
>  - mention added "syscon" compatible in commit message
>  - reference specific interrupt controller
> 
>  .../arm/freescale/fsl,layerscape-scfg.txt | 19 --
>  .../bindings/soc/fsl/fsl,layerscape-scfg.yaml | 58 +++
>  2 files changed, 58 insertions(+), 19 deletions(-)
>  delete mode 100644 Documentation/devicetree/bindings/arm/freescale/fsl,layerscape-scfg.txt
>  create mode 100644 Documentation/devicetree/bindings/soc/fsl/fsl,layerscape-scfg.yaml

Applied, thanks!


Re: [PATCH v3 3/4] dt-bindings: interrupt-controller: fsl, ls-extirq: convert to YAML

2022-04-28 Thread Rob Herring
On Wed, 27 Apr 2022 09:53:37 +0200, Michael Walle wrote:
> Convert the fsl,ls-extirq binding to the new YAML format.
> 
> In contrast to the original binding documentation, there are three
> compatibles which are used in their corresponding device trees which
> have a specific compatible and the (already documented) fallback
> compatible:
>  - "fsl,ls1046a-extirq", "fsl,ls1043a-extirq"
>  - "fsl,ls2080a-extirq", "fsl,ls1088a-extirq"
>  - "fsl,lx2160a-extirq", "fsl,ls1088a-extirq"
> 
> Depending on the number of external IRQs, which is usually 12
> except for the LS1021A where there are only 6, the
> interrupt-map-mask was reduced from 0x to 0xf and 0x7
> respectively, and the number of interrupt-map entries has to
> match.
> 
> Signed-off-by: Michael Walle 
> ---
> changes since v2:
>  - drop $ref to interrupt-controller.yaml
>  - use a more strict interrupt-map-mask and make it conditional on SoC
> 
> changes since v1:
>  - new patch
> 
>  .../interrupt-controller/fsl,ls-extirq.txt|  53 
>  .../interrupt-controller/fsl,ls-extirq.yaml   | 118 ++
>  2 files changed, 118 insertions(+), 53 deletions(-)
>  delete mode 100644 Documentation/devicetree/bindings/interrupt-controller/fsl,ls-extirq.txt
>  create mode 100644 Documentation/devicetree/bindings/interrupt-controller/fsl,ls-extirq.yaml
> 

Applied, thanks!


Re: [PATCH v2 2/2] ftrace: recordmcount: Handle sections with no non-weak symbols

2022-04-28 Thread Steven Rostedt
On Thu, 28 Apr 2022 22:49:52 +0530
"Naveen N. Rao"  wrote:

> But, with ppc64 elf abi v1 which only supports the old -pg flag, mcount
> location can differ between the weak and non-weak variants of a
> function. In such scenarios, one of the two mcount entries will be
> invalid. Such architectures need to validate mcount locations by
> ensuring that the instruction(s) at those locations are as expected. On
> powerpc, this can be a simple check to ensure that the instruction is a
> 'bl'. This check can be further tightened as necessary.

I was thinking about this more, and we could create another section,
perhaps __mcount_loc_weak, and place these entries in that section.
That way, we could check each of these symbols to see if there's
already a symbol for it and, if there is, drop it.

-- Steve


Re: [PATCH v6] PCI hotplug: rpaphp: Error out on busy status from get-sensor-state

2022-04-28 Thread Nathan Lynch
Bjorn Helgaas  writes:
> On Tue, Apr 26, 2022 at 11:07:39PM +0530, Mahesh Salgaonkar wrote:
>> +/*
>> + * RTAS call get-sensor-state(DR_ENTITY_SENSE) return values as per PAPR:
>> + *    -1: Hardware Error
>> + *    -2: RTAS_BUSY
>> + *    -3: Invalid sensor. RTAS Parameter Error.
>> + * -9000: Need DR entity to be powered up and unisolated before RTAS call
>> + * -9001: Need DR entity to be powered up, but not unisolated, before RTAS call
>> + * -9002: DR entity unusable
>> + *  990x: Extended delay - where x is a number in the range of 0-5
>> + */
>> +#define RTAS_HARDWARE_ERROR (-1)
>> +#define RTAS_INVALID_SENSOR (-3)
>> +#define SLOT_UNISOLATED (-9000)
>> +#define SLOT_NOT_UNISOLATED (-9001)
>
> I would say "isolated" instead of "not unisolated", but I suppose this
> follows language in the spec.  If so, you should follow the spec.

"not unisolated" is the spec language.


>> +#define SLOT_NOT_USABLE (-9002)
>> +
>> +static int rtas_to_errno(int rtas_rc)
>> +{
>> +int rc;
>> +
>> +switch (rtas_rc) {
>> +case RTAS_HARDWARE_ERROR:
>> +rc = -EIO;
>> +break;
>> +case RTAS_INVALID_SENSOR:
>> +rc = -EINVAL;
>> +break;
>> +case SLOT_UNISOLATED:
>> +case SLOT_NOT_UNISOLATED:
>> +rc = -EFAULT;
>> +break;
>> +case SLOT_NOT_USABLE:
>> +rc = -ENODEV;
>> +break;
>> +case RTAS_BUSY:
>> +case RTAS_EXTENDED_DELAY_MIN...RTAS_EXTENDED_DELAY_MAX:
>> +rc = -EBUSY;
>> +break;
>> +default:
>> +err("%s: unexpected RTAS error %d\n", __func__, rtas_rc);
>> +rc = -ERANGE;
>> +break;
>> +}
>> +return rc;
>
> This basically duplicates rtas_error_rc().  Why do we need two copies?

It treats RTAS_BUSY, RTAS_EXTENDED_DELAY_MIN...RTAS_EXTENDED_DELAY_MAX
differently, which is part of the point of this change.

Aside: rtas_error_rc() (from powerpc's rtas.c) is badly named. Its
conversions make sense for only a handful of RTAS calls. RTAS error
codes have function-specific interpretations.
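
As a rough illustration of the point above, here is a standalone sketch of a get-sensor-state-specific status mapping in which busy and extended-delay statuses become -EBUSY instead of being retried. The numeric values come from the comment in the quoted patch (9900 to 9905 for "990x"). The function name `sensor_status_to_errno()` is hypothetical, and the range is checked with plain comparisons rather than the GNU case-range extension so the sketch compiles as standard C.

```c
#include <assert.h>
#include <errno.h>

/* Status values as listed in the comment of the quoted patch. */
#define RTAS_HARDWARE_ERROR	(-1)
#define RTAS_BUSY		(-2)
#define RTAS_INVALID_SENSOR	(-3)
#define SLOT_UNISOLATED		(-9000)
#define SLOT_NOT_UNISOLATED	(-9001)
#define SLOT_NOT_USABLE		(-9002)
#define RTAS_EXTENDED_DELAY_MIN	9900
#define RTAS_EXTENDED_DELAY_MAX	9905

/* Map a get-sensor-state error status to a negative errno value. */
static int sensor_status_to_errno(int rtas_rc)
{
	/* Busy and extended delay are reported, not retried. */
	if (rtas_rc == RTAS_BUSY ||
	    (rtas_rc >= RTAS_EXTENDED_DELAY_MIN &&
	     rtas_rc <= RTAS_EXTENDED_DELAY_MAX))
		return -EBUSY;

	switch (rtas_rc) {
	case RTAS_HARDWARE_ERROR:
		return -EIO;
	case RTAS_INVALID_SENSOR:
		return -EINVAL;
	case SLOT_UNISOLATED:
	case SLOT_NOT_UNISOLATED:
		return -EFAULT;
	case SLOT_NOT_USABLE:
		return -ENODEV;
	default:
		/* Anything else is unexpected for this call. */
		return -ERANGE;
	}
}
```

Keeping the mapping local to the one RTAS call it describes avoids implying that the same interpretation holds for other calls.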


[PATCH net-next v2 13/15] eth: spider: remove a copy of the NAPI_POLL_WEIGHT define

2022-04-28 Thread Jakub Kicinski
Defining local versions of NAPI_POLL_WEIGHT with the same
values in the drivers just makes refactoring harder.

Acked-by: Geoff Levand 
Signed-off-by: Jakub Kicinski 
---
CC: kou.ishiz...@toshiba.co.jp
CC: linuxppc-dev@lists.ozlabs.org
---
 drivers/net/ethernet/toshiba/spider_net.c | 2 +-
 drivers/net/ethernet/toshiba/spider_net.h | 1 -
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/toshiba/spider_net.c b/drivers/net/ethernet/toshiba/spider_net.c
index f47b8358669d..c09cd961edbb 100644
--- a/drivers/net/ethernet/toshiba/spider_net.c
+++ b/drivers/net/ethernet/toshiba/spider_net.c
@@ -2270,7 +2270,7 @@ spider_net_setup_netdev(struct spider_net_card *card)
timer_setup(&card->aneg_timer, spider_net_link_phy, 0);
 
netif_napi_add(netdev, &card->napi,
-  spider_net_poll, SPIDER_NET_NAPI_WEIGHT);
+  spider_net_poll, NAPI_POLL_WEIGHT);
 
spider_net_setup_netdev_ops(netdev);
 
diff --git a/drivers/net/ethernet/toshiba/spider_net.h b/drivers/net/ethernet/toshiba/spider_net.h
index 05b1a0736835..51948e2b3a34 100644
--- a/drivers/net/ethernet/toshiba/spider_net.h
+++ b/drivers/net/ethernet/toshiba/spider_net.h
@@ -44,7 +44,6 @@ extern char spider_net_driver_name[];
 #define SPIDER_NET_RX_CSUM_DEFAULT 1
 
 #define SPIDER_NET_WATCHDOG_TIMEOUT 50*HZ
-#define SPIDER_NET_NAPI_WEIGHT 64
 
 #define SPIDER_NET_FIRMWARE_SEQS   6
 #define SPIDER_NET_FIRMWARE_SEQWORDS   1024
-- 
2.34.1



Re: [PATCH v6] PCI hotplug: rpaphp: Error out on busy status from get-sensor-state

2022-04-28 Thread Bjorn Helgaas
On Tue, Apr 26, 2022 at 11:07:39PM +0530, Mahesh Salgaonkar wrote:
> When certain PHB HW failure causes phyp to recover PHB, it marks the PE
> state as temporarily unavailable until recovery is complete. This also
> triggers an EEH handler in Linux which needs to notify drivers, and perform
> recovery. But before notifying the driver about the PCI error it uses
> get_adapter_state()->get-sensor-state() operation of the hotplug_slot to
> determine if the slot contains a device or not. if the slot is empty, the
  If
> recovery is skipped entirely.
> 
> However on certain PHB failures, the rtas call get-sensor-state() returns
> extended busy error (9902) until PHB is recovered by phyp. Once PHB is
> recovered, the get-sensor-state() returns success with correct presence
> status. The RTAS call interface rtas_get_sensor() loops over the rtas call
> on extended delay return code (9902) until the return value is either
> success (0) or error (-1). This causes the EEH handler to get stuck for ~6
> seconds before it could notify that the pci error has been detected and
> stop any active operations. Hence with running I/O traffic, during this 6
> seconds, the network driver continues its operation and hits a timeout
> (netdev watchdog). On timeouts, network driver go into ffdc capture mode

I assume ffdc == First Failure Data Capture (please expand and remove
the redundant "capture").  Is this a powerpc thing?  "ffdc" doesn't
occur in drivers/net, so I don't know what network driver this refers
to.

> and reset path assuming the PCI device is in fatal condition. This
> sometimes causes EEH recovery to fail. This impacts the ssh connection and
> leads to the system being inaccessible.
> 
> 
> [52732.244731] DEBUG: ibm_read_slot_reset_state2()
> [52732.244762] DEBUG: ret = 0, rets[0]=5, rets[1]=1, rets[2]=4000, rets[3]=>
> [52732.244798] DEBUG: in eeh_slot_presence_check
> [52732.244804] DEBUG: error state check
> [52732.244807] DEBUG: Is slot hotpluggable
> [52732.244810] DEBUG: hotpluggable ops ?
> [52732.244953] DEBUG: Calling ops->get_adapter_status
> [52732.244958] DEBUG: calling rpaphp_get_sensor_state
> [52736.564262] [ cut here ]
> [52736.564299] NETDEV WATCHDOG: enP64p1s0f3 (tg3): transmit queue 0 timed o>
> [52736.564324] WARNING: CPU: 1442 PID: 0 at net/sched/sch_generic.c:478 dev>
> [...]
> [52736.564505] NIP [c0c32368] dev_watchdog+0x438/0x440
> [52736.564513] LR [c0c32364] dev_watchdog+0x434/0x440
> 
> 
> To avoid this issue, fix the pci hotplug driver (rpaphp) to return an error
> if the slot presence state can not be detected immediately while PE is in
> EEH recovery state. Current implementation uses rtas_get_sensor() API which
> blocks the slot check state until rtas call returns success. Change
> rpaphp_get_sensor_state() to invoke rtas_call(get-sensor-state) directly
> only if the respective pe is in EEH recovery state, and take actions based
> on rtas return status.

I'm not too clear on what the problem is.  I guess you don't want the
netdev watchdog timeout.  Is the NIC still operating?  It's just the
PHB leading to the NIC that has an issue?

Apparently the remedy is to return -ENODEV (from SLOT_NOT_USABLE ==
-9002) from rpaphp_get_sensor_state() instead of doing the retries.
It would be good to explain why *that* is safe.

> In normal cases (non-EEH case) rpaphp_get_sensor_state() will continue to
> invoke rtas_get_sensor() as it was earlier with no change in existing
> behavior.

Nits:
Follow historical convention in subject line.
s/phyp/pHyp/   (or whatever the normal styling is)
s/pe/PE/   (used inconsistently above and in comment)
s/rtas/RTAS/   (Michael mentioned this already, but I guess you missed some)
s/pci/PCI/
s/ffdc/First Failure Data Capture/   (or the correct expansion)
Make similar changes in the comment below.

> Signed-off-by: Mahesh Salgaonkar 
> Reviewed-by: Nathan Lynch 
> ---
> Change in v6:
> - Fixed typo's in the patch description as per review comments.
> 
> Change in v5:
> - Fixup #define macros with parentheses around the values.
> 
> Change in V4:
> - Error out on sensor busy only if pe is going through EEH recovery instead
>   of always error out.
> 
> Change in V3:
> - Invoke rtas_call(get-sensor-state) directly from
>   rpaphp_get_sensor_state() directly and do special handling.
> - See v2 at
>   https://lists.ozlabs.org/pipermail/linuxppc-dev/2021-November/237336.html
> 
> Change in V2:
> - Alternate approach to fix the EEH issue instead of delaying slot presence
>   check proposed at
>   https://lists.ozlabs.org/pipermail/linuxppc-dev/2021-November/236956.html
> 
> Also refer:
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2021-November/237027.html
> ---
>  drivers/pci/hotplug/rpaphp_pci.c |  100 +-
>  1 file changed, 97 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/hotplug/rpaphp_pci.c 
> 

Re: [PATCH V12 00/20] riscv: Add COMPAT mode support for 64BIT

2022-04-28 Thread Palmer Dabbelt

On Thu, 28 Apr 2022 05:25:19 PDT (-0700), guo...@kernel.org wrote:

Hi Palmer,

I see you have taken v12 into your riscv-compat branch and added
asm/signal32.h. Do you need my help putting compat_sigcontext,
compat_ucontext and compat_rt_sigframe into signal32.h? And could we
rename signal32.h to compat_signal.h to match compat_signal.c?

Finally, thanks for taking care of the compat patch series.


No problem.  I was just trying to get something clean through all the
autobuilders before making it look good. I think it didn't fail this
time, so I'll do a bit more refactoring.  Shouldn't be too much longer at
this point.





On Tue, Apr 5, 2022 at 3:13 PM  wrote:


From: Guo Ren 

Currently, most 64-bit architectures (x86, parisc, powerpc, arm64,
s390, mips, sparc) support COMPAT mode, but they all have historical
issues and can't use the standard Linux unistd.h. RISC-V would be the
first standard __SYSCALL_COMPAT user of
include/uapi/asm-generic/unistd.h.

The patchset is based on v5.18-rc1. You can compare rv64-compat
vs. rv32-native in QEMU with the following steps:

 - Prepare rv32 rootfs & fw_jump.bin by buildroot.org
   $ git clone git://git.busybox.net/buildroot
   $ cd buildroot
   $ make qemu_riscv32_virt_defconfig O=qemu_riscv32_virt_defconfig
   $ make -C qemu_riscv32_virt_defconfig
   $ make qemu_riscv64_virt_defconfig O=qemu_riscv64_virt_defconfig
   $ make -C qemu_riscv64_virt_defconfig
   (Got fw_jump.bin & rootfs.ext2 in qemu_riscvXX_virt_defconfig/images)

 - Prepare Linux rv32 & rv64 Image
   $ git clone g...@github.com:c-sky/csky-linux.git -b riscv_compat_v12 linux
   $ cd linux
   $ echo "CONFIG_STRICT_KERNEL_RWX=n" >> arch/riscv/configs/defconfig
   $ echo "CONFIG_STRICT_MODULE_RWX=n" >> arch/riscv/configs/defconfig
   $ make ARCH=riscv CROSS_COMPILE=riscv32-buildroot-linux-gnu- O=../build-rv32/ rv32_defconfig
   $ make ARCH=riscv CROSS_COMPILE=riscv32-buildroot-linux-gnu- O=../build-rv32/ Image
   $ make ARCH=riscv CROSS_COMPILE=riscv64-buildroot-linux-gnu- O=../build-rv64/ defconfig
   $ make ARCH=riscv CROSS_COMPILE=riscv64-buildroot-linux-gnu- O=../build-rv64/ Image

 - Prepare Qemu:
   $ git clone https://gitlab.com/qemu-project/qemu.git -b master qemu
   $ cd qemu
   $ ./configure --target-list="riscv64-softmmu riscv32-softmmu"
   $ make

Now let's compare the rv64-compat and rv32-native memory footprints with
almost the same defconfig, rootfs, and OpenSBI in one QEMU.

 - Run rv64 with rv32 rootfs in compat mode:
   $ ./build/qemu-system-riscv64 -cpu rv64 -M virt -m 64m -nographic \
       -bios qemu_riscv64_virt_defconfig/images/fw_jump.bin \
       -kernel build-rv64/Image \
       -drive file=qemu_riscv32_virt_defconfig/images/rootfs.ext2,format=raw,id=hd0 \
       -device virtio-blk-device,drive=hd0 \
       -append "rootwait root=/dev/vda ro console=ttyS0 earlycon=sbi" \
       -netdev user,id=net0 -device virtio-net-device,netdev=net0

QEMU emulator version 6.2.50 (v6.2.0-29-g196d7182c8)
OpenSBI v0.9
[0.00] Linux version 5.16.0-rc6-00017-g750f87086bdd-dirty (guoren@guoren-Z87-HD3) (riscv64-unknown-linux-gnu-gcc (GCC) 10.2.0, GNU ld (GNU Binutils) 2.37) #96 SMP Tue Dec 28 21:01:55 CST 2021
[0.00] OF: fdt: Ignoring memory range 0x8000 - 0x8020
[0.00] Machine model: riscv-virtio,qemu
[0.00] earlycon: sbi0 at I/O port 0x0 (options '')
[0.00] printk: bootconsole [sbi0] enabled
[0.00] efi: UEFI not found.
[0.00] Zone ranges:
[0.00]   DMA32[mem 0x8020-0x83ff]
[0.00]   Normal   empty
[0.00] Movable zone start for each node
[0.00] Early memory node ranges
[0.00]   node   0: [mem 0x8020-0x83ff]
[0.00] Initmem setup node 0 [mem 0x8020-0x83ff]
[0.00] SBI specification v0.2 detected
[0.00] SBI implementation ID=0x1 Version=0x9
[0.00] SBI TIME extension detected
[0.00] SBI IPI extension detected
[0.00] SBI RFENCE extension detected
[0.00] SBI v0.2 HSM extension detected
[0.00] riscv: ISA extensions acdfhimsu
[0.00] riscv: ELF capabilities acdfim
[0.00] percpu: Embedded 17 pages/cpu s30696 r8192 d30744 u69632
[0.00] Built 1 zonelists, mobility grouping on.  Total pages: 15655
[0.00] Kernel command line: rootwait root=/dev/vda ro console=ttyS0 earlycon=sbi
[0.00] Dentry cache hash table entries: 8192 (order: 4, 65536 bytes, linear)
[0.00] Inode-cache hash table entries: 4096 (order: 3, 32768 bytes, linear)
[0.00] mem auto-init: stack:off, heap alloc:off, heap free:off
[0.00] Virtual kernel memory layout:
[0.00]   fixmap : 0xffcefee0 - 0xffceff00   (2048 kB)
[0.00]   pci io : 0xffceff00 - 0xffcf   (  16 MB)
[0.00]  vmemmap : 0xffcf - 0xffcf   (4095 MB)
[0.00]  vmalloc : 0xffd0 - 0xffdf   (65535 MB)
[

Any technical information for Wind River 7457 board?

2022-04-28 Thread Steven J. Hill
Below is the serial output at power on. Does anyone have any information at 
all? I know the processor is a single 7457 with Marvell/Galileo GT64260A 
host bridge. I think the board was made by Motorola or NXP. It has been 
difficult to track anything without Wind River support.


-Steve



VxWorks 653 System Boot


Copyright 1984-2006  Wind River Systems, Inc.





CPU: wrSbc7457 Power PC
Version: 1.8
BSP version: 1.3/9
Creation date: Jun  9 2006, 11:38:14




Re: [PATCH 2/2] recordmcount: Handle sections with no non-weak symbols

2022-04-28 Thread Naveen N. Rao

Steven Rostedt wrote:

On Thu, 28 Apr 2022 13:15:22 +0530
"Naveen N. Rao"  wrote:

Indeed, plain old -pg will be a problem. I'm not sure there is a generic 
way to address this. I suppose architectures will have to validate the 
mcount locations, something like this?


Perhaps another solution is to make the mcount locations after the linking
is done. The main downside to that is that it takes time to go over the
entire vmlinux, and will slow down a compile that only modified a couple of
files.


Yes, and I think that is also very useful with LTO. So, that would be 
good to consider in the longer term.


For now, I have posted a v2 of this series with your comments addressed.  
It is working well in my tests on powerpc in the different 
configurations, including the older elf v1 abi with -pg. If it looks ok 
to you, we can go with this approach for now.



Thanks,
Naveen


Re: [PATCH 2/2] recordmcount: Handle sections with no non-weak symbols

2022-04-28 Thread Naveen N. Rao

Steven Rostedt wrote:

On Thu, 28 Apr 2022 13:15:22 +0530
"Naveen N. Rao"  wrote:

Indeed, plain old -pg will be a problem. I'm not sure there is a generic 
way to address this. I suppose architectures will have to validate the 
mcount locations, something like this?


Perhaps another solution is to make the mcount locations after the linking
is done. The main downside to that is that it takes time to go over the
entire vmlinux, and will slow down a compile that only modified a couple of
files.


Yes, and I think that is also very useful with LTO. So, that would be a 
good one to consider in the longer term.


For now, I have posted a v2 of this series with your comments addressed.  
It is working well in my tests on powerpc in the different 
configurations, including the older elf abi v1 that uses -pg. If it 
looks ok to you, we can use this approach for now.



Thanks,
Naveen


[PATCH v2 1/2] ftrace: Drop duplicate mcount locations

2022-04-28 Thread Naveen N. Rao
In the absence of section symbols [1], objtool (today) and recordmcount
(with a subsequent patch) generate __mcount_loc relocation records with
weak symbols as the base. This works fine as long as those weak symbols
are not overridden, but if they are, these can result in duplicate
entries in the final vmlinux mcount location table. This will cause
ftrace to fail when trying to patch the same location twice. Fix this by
dropping duplicate locations during ftrace init.

[1] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d1bcae833b32f1

Signed-off-by: Naveen N. Rao 
---
 kernel/trace/ftrace.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 4f1d2f5e726341..038610f1803987 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -6496,7 +6496,7 @@ static int ftrace_process_locs(struct module *mod,
struct dyn_ftrace *rec;
unsigned long count;
unsigned long *p;
-   unsigned long addr;
+   unsigned long addr, prev_addr = 0;
unsigned long flags = 0; /* Shut up gcc */
int ret = -ENOMEM;
 
@@ -6550,6 +6550,7 @@ static int ftrace_process_locs(struct module *mod,
while (p < end) {
unsigned long end_offset;
addr = ftrace_call_adjust(*p++);
+
/*
 * Some architecture linkers will pad between
 * the different mcount_loc sections of different
@@ -6559,6 +6560,15 @@ static int ftrace_process_locs(struct module *mod,
if (!addr)
continue;
 
+   /*
+* Drop duplicate entries, which can happen when weak
+* functions are overridden, and __mcount_loc relocation
+* records were generated against function names due to
+* absence of non-weak section symbols.
+*/
+   if (addr == prev_addr)
+   continue;
+
end_offset = (pg->index+1) * sizeof(pg->records[0]);
if (end_offset > PAGE_SIZE << pg->order) {
/* We should have allocated enough */
@@ -6569,6 +6579,7 @@ static int ftrace_process_locs(struct module *mod,
 
rec = &pg->records[pg->index++];
rec->ip = addr;
+   prev_addr = addr;
}
 
/* We should have used all pages */
-- 
2.35.1
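
The duplicate handling described in this patch can be sketched outside the kernel as follows. This is only an illustration, not the kernel code: the helper name drop_duplicate_locs() and the plain arrays are hypothetical, and it assumes the location table is already sorted so that duplicates are adjacent, which is what comparing against only the previous address relies on.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of the patch): copy the non-zero,
 * non-repeated addresses from a sorted table into out[] and return
 * how many entries were kept.
 */
static size_t drop_duplicate_locs(const unsigned long *locs, size_t n,
				  unsigned long *out)
{
	unsigned long prev_addr = 0;
	size_t kept = 0;

	for (size_t i = 0; i < n; i++) {
		unsigned long addr = locs[i];

		if (!addr)		/* padding between sections */
			continue;
		if (addr == prev_addr)	/* same location as the last kept one */
			continue;

		out[kept++] = addr;
		prev_addr = addr;
	}
	return kept;
}
```

Because only the last kept address is remembered, zero padding between two equal entries does not hide the duplicate, but an unsorted table would.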



[PATCH v2 2/2] ftrace: recordmcount: Handle sections with no non-weak symbols

2022-04-28 Thread Naveen N. Rao
Kernel builds on powerpc are failing with the below error [1]:
  CC  kernel/kexec_file.o
Cannot find symbol for section 9: .text.unlikely.
kernel/kexec_file.o: failed

Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
symbols") [2], binutils started dropping section symbols that it thought
were unused.  This isn't an issue in general, but with kexec_file.c, gcc
is placing arch_kexec_apply_relocations[_add] into a separate
.text.unlikely section and the section symbol ".text.unlikely" is being
dropped. Due to this, recordmcount is unable to find a non-weak symbol
in .text.unlikely to generate a relocation record against.

Handle this by falling back to a weak symbol, similar to what objtool
does in commit 44f6a7c0755d8d ("objtool: Fix seg fault with Clang
non-section symbols"). This approach however can result in duplicate
and/or invalid addresses in the final vmlinux mcount location table.

As an example, with this commit, relocation records for __mcount_loc for
kexec_file.o now include two entries with the weak functions
arch_kexec_apply_relocations() and arch_kexec_apply_relocations_add() as
the relocation bases:

  ...
  0080 R_PPC64_ADDR64    .text+0x1d34
  0088 R_PPC64_ADDR64    .text+0x1fec
  0090 R_PPC64_ADDR64    arch_kexec_apply_relocations_add+0x000c
  0098 R_PPC64_ADDR64    arch_kexec_apply_relocations+0x000c

Powerpc does not override these functions today, so these get converted
to correct offsets in the mcount location table in vmlinux.

If one or both of these weak functions are overridden in future, in the
final vmlinux mcount table, references to these will change over to the
non-weak variant which has its own mcount location entry. As such, there
will now be two entries for these functions.

On ppc32, mcount location is always the third instruction in a function.
On ppc64 with elf abi v2 (ppc64le), mcount location depends on whether
the function has a global entry (fourth instruction) or not (second
instruction), but this is expected to be the same across weak/non-weak
implementations of a function. As such, in both these scenarios, as well
as with other architectures where mcount location is at the same offset
into a function, the two mcount entries will point to the same address.
Ftrace skips the duplicate entries due to a previous commit.

But, with ppc64 elf abi v1 which only supports the old -pg flag, mcount
location can differ between the weak and non-weak variants of a
function. In such scenarios, one of the two mcount entries will be
invalid. Such architectures need to validate mcount locations by
ensuring that the instruction(s) at those locations are as expected. On
powerpc, this can be a simple check to ensure that the instruction is a
'bl'. This check can be further tightened as necessary.

Introduce a config option HAVE_MCOUNT_LOC_VALIDATION that architectures
can select to indicate support for validating the mcount locations
during ftrace initialization. Add a flag (-a) to recordmcount which can
then be passed to allow recordmcount to emit relocation records using
weak symbols as the base.

[1] https://github.com/linuxppc/issues/issues/388
[2] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d1bcae833b32f1

Signed-off-by: Naveen N. Rao 
---
 Makefile   |  4 ++
 arch/powerpc/Kconfig   |  1 +
 arch/powerpc/include/asm/ftrace.h  |  8 +--
 arch/powerpc/kernel/trace/ftrace.c | 11 
 kernel/trace/Kconfig   |  6 ++
 scripts/Makefile.build |  3 +
 scripts/recordmcount.c |  6 +-
 scripts/recordmcount.h | 94 ++
 8 files changed, 113 insertions(+), 20 deletions(-)

diff --git a/Makefile b/Makefile
index 29e273d3f8ccbf..b2a9fdb49815fb 100644
--- a/Makefile
+++ b/Makefile
@@ -858,6 +858,10 @@ ifdef CONFIG_FTRACE_MCOUNT_USE_RECORDMCOUNT
 BUILD_C_RECORDMCOUNT := y
 export BUILD_C_RECORDMCOUNT
   endif
+  ifdef CONFIG_HAVE_MCOUNT_LOC_VALIDATION
+HAVE_MCOUNT_LOC_VALIDATION := y
+export HAVE_MCOUNT_LOC_VALIDATION
+  endif
 endif
 ifdef CONFIG_HAVE_FENTRY
   # s390-linux-gnu-gcc did not support -mfentry until gcc-9.
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 174edabb74fa11..acae4085aa6d6b 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -229,6 +229,7 @@ config PPC
select HAVE_KRETPROBES
select HAVE_LD_DEAD_CODE_DATA_ELIMINATION
select HAVE_LIVEPATCH   if HAVE_DYNAMIC_FTRACE_WITH_REGS
+   select HAVE_MCOUNT_LOC_VALIDATION
select HAVE_MOD_ARCH_SPECIFIC
select HAVE_NMI if PERF_EVENTS || (PPC64 && PPC_BOOK3S)
select HAVE_OPTPROBES
diff --git a/arch/powerpc/include/asm/ftrace.h b/arch/powerpc/include/asm/ftrace.h
index d83758acd1c7c3..d8b104ed2fdf38 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ 
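
The commit message above says that on powerpc the validation can be a simple check that the instruction at the mcount location is a 'bl'. The arch hunk is not visible in this message, so the following is only a sketch based on the Power ISA I-form branch encoding (primary opcode 18, AA = 0, LK = 1); the helper name insn_is_bl() is hypothetical and is not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true when the 32-bit instruction word encodes a
 * relative branch-and-link ('bl'). The mask keeps the 6-bit primary
 * opcode and the AA/LK bits and ignores the 24-bit displacement.
 */
static bool insn_is_bl(uint32_t insn)
{
	return (insn & 0xfc000003u) == 0x48000001u;
}
```

A plain 'b' (LK = 0), a 'bla' (AA = 1) or a nop such as 0x60000000 would all be rejected by this mask.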

[PATCH v2 0/2] ftrace/recordmcount: Handle object files without section symbols

2022-04-28 Thread Naveen N. Rao
This is v2 of the series posted at:
http://lkml.kernel.org/r/cover.1651047542.git.naveen.n@linux.vnet.ibm.com

For v2, the first patch is slightly modified to skip the loop, rather
than depending on addr == 0 to do so. The second patch is updated to
make this behavior opt-in for architectures, so that they can validate
the mcount locations that are read.

- Naveen


Naveen N. Rao (2):
  ftrace: Drop duplicate mcount locations
  ftrace: recordmcount: Handle sections with no non-weak symbols

 Makefile   |  4 ++
 arch/powerpc/Kconfig   |  1 +
 arch/powerpc/include/asm/ftrace.h  |  8 +--
 arch/powerpc/kernel/trace/ftrace.c | 11 
 kernel/trace/Kconfig   |  6 ++
 kernel/trace/ftrace.c  | 13 -
 scripts/Makefile.build |  3 +
 scripts/recordmcount.c |  6 +-
 scripts/recordmcount.h | 94 ++
 9 files changed, 125 insertions(+), 21 deletions(-)


base-commit: 83d8a0d166119de813cad27ae7d61f54f9aea707
-- 
2.35.1



[powerpc] kernel BUG at mm/mmap.c:3164! w/ltp(mmapstress03)

2022-04-28 Thread Sachin Sant
While running LTP tests (specifically mmapstress03) against
5.18.0-rc4-next-20220428 booted on an IBM Power server, the BUG
mentioned in the subject is encountered.

# ./mmapstress03
mmapstress030  TINFO  :  uname.machine=ppc64le kernel is 64bit
mmapstress03: errno = 12: failed to fiddle with brk at the end
mmapstress031  TFAIL  :  mmapstress03.c:212: Test failed
[   32.396145] mmap: mmapstress03 (3023): VmData 18446744073706799104 exceed 
data ulimit 18446744073709551615. Update limits or use boot option 
ignore_rlimit_data.
[   32.396192] [ cut here ]
[   32.396193] kernel BUG at mm/mmap.c:3164!
[   32.396195] Oops: Exception in kernel mode, sig: 5 [#1]
[   32.396210] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
[   32.396213] Modules linked in: dm_mod mptcp_diag xsk_diag tcp_diag udp_diag 
raw_diag inet_diag unix_diag af_packet_diag netlink_diag nft_fib_inet 
nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 
nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 
nf_defrag_ipv4 rfkill ip_set bonding tls nf_tables nfnetlink sunrpc binfmt_misc 
pseries_rng drm drm_panel_orientation_quirks xfs libcrc32c sd_mod t10_pi sr_mod 
crc64_rocksoft_generic cdrom crc64_rocksoft crc64 sg ibmvscsi 
scsi_transport_srp ibmveth xts vmx_crypto fuse
[   32.396262] CPU: 5 PID: 3023 Comm: mmapstress03 Not tainted 
5.18.0-rc4-next-20220428 #16
[   32.396267] NIP:  c04c4750 LR: c04c4730 CTR: c04bf5d0
[   32.396270] REGS: c0001abeb810 TRAP: 0700   Not tainted  
(5.18.0-rc4-next-20220428)
[   32.396274] MSR:  80029033   CR: 22002224  
XER: 
[   32.396283] CFAR: c08af740 IRQMASK: 0 
[   32.396283] GPR00: c04c4730 c0001abebab0 c2a71300 
 
[   32.396283] GPR04: c00079dcd000  0008 
 
[   32.396283] GPR08: 0008 0001  
c00079dcd040 
[   32.396283] GPR12: c00079dcd008 c0087fffa300  
 
[   32.396283] GPR16:    
 
[   32.396283] GPR20:    
c2aaae85 
[   32.396283] GPR24:   7fffaa5c1200 
c00020de3660 
[   32.396283] GPR28: 000c c00020de3600 000d 
 
[   32.396320] NIP [c04c4750] exit_mmap+0x190/0x390
[   32.396327] LR [c04c4730] exit_mmap+0x170/0x390
[   32.396332] Call Trace:
[   32.396334] [c0001abebab0] [c04c4730] exit_mmap+0x170/0x390 
(unreliable)
[   32.396340] [c0001abebbd0] [c01700f4] __mmput+0x54/0x200
[   32.396344] [c0001abebc10] [c017fe5c] exit_mm+0xfc/0x190
[   32.396348] [c0001abebc50] [c018016c] do_exit+0x27c/0x5a0
[   32.396352] [c0001abebcf0] [c018063c] do_group_exit+0x4c/0xd0
[   32.396356] [c0001abebd30] [c01806e4] sys_exit_group+0x24/0x30
[   32.396360] [c0001abebd50] [c0037084] 
system_call_exception+0x254/0x550
[   32.396364] [c0001abebe10] [c000bfe8] 
system_call_vectored_common+0xe8/0x278
[   32.396369] --- interrupt: 3000 at 0x7fffaa318d04
[   32.396374] NIP:  7fffaa318d04 LR:  CTR: 
[   32.396377] REGS: c0001abebe80 TRAP: 3000   Not tainted  
(5.18.0-rc4-next-20220428)
[   32.396380] MSR:  8280f033   CR: 
4200  XER: 
[   32.396389] IRQMASK: 0 
[   32.396389] GPR00: 00ea 7fffe43f3420 7fffaa457100 
0001 
[   32.396389] GPR04:  11a602a0 7fffaa5c1200 
 
[   32.396389] GPR08:    
 
[   32.396389] GPR12:  7fffaa5ca500  
 
[   32.396389] GPR16:    
 
[   32.396389] GPR20:    
0001 
[   32.396389] GPR24: 7fffaa450938  0001 
7fffaa4529f8 
[   32.396389] GPR28: 0001 7fffaa5c3510 f000 
0001 
[   32.396425] NIP [7fffaa318d04] 0x7fffaa318d04
[   32.396427] LR [] 0x0
[   32.396429] --- interrupt: 3000
[   32.396431] Instruction dump:
[   32.396433] 6000 3880 38610020 483eff5d 6000 7c7f1b79 4082ffb8 
813d0058 
[   32.396439] 7d29f278 7d290034 5529d97e 69290001 <0b09> 6000 7fa3eb78 
483e328d 
[   32.396447] ---[ end trace  ]---
[   32.398759] 
[   33.398760] Kernel panic - not syncing: Fatal exception

This problem was introduced with 5.18.0-rc4-next-20220427. I am unable to
complete the git bisect due to a build failure related to the
mapletree-vs-khugepaged issue.

Thanks
-Sachin

Re: [PATCH] KVM: PPC: Book3S HV: Initialize AMOR in nested entry

2022-04-28 Thread Fabiano Rosas
Nicholas Piggin  writes:

> Excerpts from Fabiano Rosas's message of April 26, 2022 12:21 am:
>> The hypervisor always sets AMOR to ~0, but let's ensure we're not
>> passing stale values around.
>> 
>
> Reviewed-by: Nicholas Piggin 
>
> Looks like our L0 doesn't do anything with hvregs.amor ?

It doesn't. And if the HV ever starts clearing bits from AMOR, then we
would need to change any kernel code that writes to and reads from AMR
(such as the KUAP) to take into consideration that we might read a
different value from what we wrote.


[PATCH 2/2] tools/perf/tests: Fix session topology test to skip the test in guest environment

2022-04-28 Thread Athira Rajeev
The session topology test fails on the powerpc pSeries platform.
Test logs:
<<>>
Session topology : FAILED!
<<>>

This testcase checks CPU topology by comparing the core_id and
socket_id stored in perf_env from the perf session against the
topology information in "/sys/devices/system/cpu/cpuX/topology",
such as core_id and physical_package_id. In a virtual environment,
details like physical_package_id are not exposed, so
physical_package_id is set to -1. The testcase fails on such
platforms since socket_id can't be fetched from the topology info.

Skip the testcase on powerpc for pSeries. Use the utility
function "cpuinfo_field" to check the platform from /proc/cpuinfo.

Signed-off-by: Athira Rajeev 
---
 tools/perf/tests/topology.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/tools/perf/tests/topology.c b/tools/perf/tests/topology.c
index ee1e3dcbc0bd..0ddcafa158db 100644
--- a/tools/perf/tests/topology.c
+++ b/tools/perf/tests/topology.c
@@ -109,6 +109,23 @@ static int check_cpu_topology(char *path, struct 
perf_cpu_map *map)
&& strncmp(session->header.env.arch, "aarch64", 7))
return TEST_SKIP;
 
+   /*
+* In powerpc pSeries platform, not all the topology information
+* are exposed via sysfs. Due to restriction, detail like
+* physical_package_id will be set to -1. Hence skip this
+* test for pSeries.
+*/
+   if (strncmp(session->header.env.arch, "powerpc", 7)) {
+   char *cpuinfo_platform = NULL;
+
+   cpuinfo_platform = cpuinfo_field("platform");
+   if (!strcmp(cpuinfo_platform, "pSeries")) {
+   free(cpuinfo_platform);
+   return TEST_SKIP;
+   }
+   free(cpuinfo_platform);
+   }
+
TEST_ASSERT_VAL("Session header CPU map not set", 
session->header.env.cpu);
 
for (i = 0; i < session->header.env.nr_cpus_avail; i++) {
-- 
2.35.1



[PATCH 1/2] tools/perf: Add utility function to read /proc/cpuinfo for any field

2022-04-28 Thread Athira Rajeev
/proc/cpuinfo provides information about the type of processor, the
number of CPUs, etc. Reading /proc/cpuinfo yields lines that pair a
field name such as cpu, platform or model (depending on architecture)
with its value, separated by a colon.

Add a new utility function "cpuinfo_field" in "util/header.c" which
accepts a field name as the input string to search for in /proc/cpuinfo.
It returns the first matching value as a string. For example,
calling cpuinfo_field("platform") on powerpc returns
the platform value. This can be used by other utilities/testcases to
fetch processor information from "cpuinfo".

Signed-off-by: Athira Rajeev 
---
 tools/perf/util/header.c | 54 
 tools/perf/util/header.h |  1 +
 2 files changed, 55 insertions(+)

diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index a27132e5a5ef..0c8dfd0c1e78 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -983,6 +983,60 @@ static int write_dir_format(struct feat_fd *ff,
return do_write(ff, &data->dir.version, sizeof(data->dir.version));
 }
 
+/*
+ * Return entry from /proc/cpuinfo
+ * indicated by "search" parameter.
+ */
+char *cpuinfo_field(const char *search)
+{
+   FILE *file;
+   char *buf = NULL;
+   char *copy_buf = NULL, *p;
+   size_t len = 0;
+   int ret = -1;
+
+   if (!search)
+   return NULL;
+
+   file = fopen("/proc/cpuinfo", "r");
+   if (!file)
+   return NULL;
+
+   while (getline(&buf, &len, file) > 0) {
+   ret = strncmp(buf, search, strlen(search));
+   if (!ret)
+   break;
+   }
+
+   if (ret)
+   goto done;
+
+   /*
+* Trim the new line and separate
+* value for search field from ":"
+* in cpuinfo line output.
+* Example output line:
+* platform : 
+*/
+   copy_buf = buf;
+   p = strchr(copy_buf, ':');
+   if (p && *(p+1) == ' ' && *(p+2))
+   copy_buf = p + 2;
+   p = strchr(copy_buf, '\n');
+   if (p)
+   *p = '\0';
+
+   /* Copy the filtered string to buf */
+   strcpy(buf, copy_buf);
+
+   fclose(file);
+   return buf;
+
+done:
+   free(buf);
+   fclose(file);
+   return NULL;
+}
 /*
  * Check whether a CPU is online
  *
diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
index 0eb4bc29a5a4..b0f754364bd4 100644
--- a/tools/perf/util/header.h
+++ b/tools/perf/util/header.h
@@ -166,4 +166,5 @@ int get_cpuid(char *buffer, size_t sz);
 
 char *get_cpuid_str(struct perf_pmu *pmu __maybe_unused);
 int strcmp_cpuid_str(const char *s1, const char *s2);
+char *cpuinfo_field(const char *search);
 #endif /* __PERF_HEADER_H */
-- 
2.35.1
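
For readers who want to try the "field : value" lookup outside of perf, here is a minimal standalone sketch. It is not the code from the patch: the names line_value() and field_from_stream() are hypothetical, it reads from a caller-supplied stream rather than opening /proc/cpuinfo itself, and it uses fgets() with a fixed buffer, so lines longer than the buffer are not handled.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: if "line" starts with "search", return a pointer
 * to the text after the colon (leading blanks skipped, trailing newline
 * trimmed in place). Return NULL when the line does not match.
 */
static char *line_value(char *line, const char *search)
{
	char *val, *nl;

	if (strncmp(line, search, strlen(search)))
		return NULL;

	val = strchr(line, ':');
	if (!val)
		return NULL;
	val++;
	while (*val == ' ' || *val == '\t')
		val++;

	nl = strchr(val, '\n');
	if (nl)
		*nl = '\0';
	return val;
}

/*
 * Hypothetical helper: return a malloc'ed copy of the first matching
 * value found in "file", or NULL. The caller frees the result.
 */
static char *field_from_stream(FILE *file, const char *search)
{
	char line[512];

	if (!file || !search)
		return NULL;

	while (fgets(line, sizeof(line), file)) {
		char *val = line_value(line, search);
		char *copy;
		size_t len;

		if (!val)
			continue;

		len = strlen(val) + 1;
		copy = malloc(len);
		if (copy)
			memcpy(copy, val, len);
		return copy;
	}
	return NULL;
}
```

The sketch copies the value into a fresh allocation instead of moving it to the front of the line buffer, because strcpy() between overlapping regions of the same buffer is undefined in C. A caller would still need to check the result for NULL before passing it to strcmp().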



[PATCH 0/2] Fix session topology test for powerpc and add utility function to get cpuinfo entries

2022-04-28 Thread Athira Rajeev
The session topology test fails on the powerpc pSeries platform.
Test logs:
<<>>
Session topology : FAILED!
<<>>

This test uses CPU topology information, and on powerpc some
of the topology info is restricted in environments such as
virtualized platforms. Hence this test needs to be skipped
on the pSeries platform for powerpc. The platform information
is available in /proc/cpuinfo.

Patch 1 adds a generic utility function in "util/header.c"
to read any entry from /proc/cpuinfo. Though the testcase
fix only needs the value of the "platform" entry, the
function is kept generic so that it can return the value of
any /proc/cpuinfo entry for future use cases.

Patch 2 uses the newly added utility function to look up the
platform and skip the test on the pSeries platform for powerpc.

Athira Rajeev (2):
  tools/perf: Add utility function to read /proc/cpuinfo for any field
  tools/perf/tests: Fix session topology test to skip the test in guest
environment

 tools/perf/tests/topology.c | 17 
 tools/perf/util/header.c| 54 +
 tools/perf/util/header.h|  1 +
 3 files changed, 72 insertions(+)

-- 
2.35.1



Re: [PATCH 2/2] recordmcount: Handle sections with no non-weak symbols

2022-04-28 Thread Steven Rostedt
On Thu, 28 Apr 2022 13:15:22 +0530
"Naveen N. Rao"  wrote:

> Indeed, plain old -pg will be a problem. I'm not sure there is a generic 
> way to address this. I suppose architectures will have to validate the 
> mcount locations, something like this?

Perhaps another solution is to make the mcount locations after the linking
is done. The main downside to that is that it takes time to go over the
entire vmlinux, and will slow down a compile that only modified a couple of
files.

-- Steve


Re: [PATCH 0/3] perf tools: Tidy up symbol end fixup (v3)

2022-04-28 Thread Arnaldo Carvalho de Melo
Em Mon, Apr 25, 2022 at 01:59:03PM -0700, Ian Rogers escreveu:
> On Fri, Apr 15, 2022 at 5:40 PM Namhyung Kim  wrote:
> >
> > Hello,
> >
> > This work is a follow-up of Ian's previous one at
> >   https://lore.kernel.org/all/20220412154817.2728324-1-irog...@google.com/
> >
> > Fixing up more symbol ends as introduced in:
> >   https://lore.kernel.org/lkml/20220317135536.805-1-mpet...@redhat.com/
> >
> > it caused perf annotate to run into memory limits - every symbol holds
> > all the disassembled code in the annotation, and so making symbols
> > ends further away dramatically increased memory usage (40MB to >1GB).
> >
> > Modify the symbol end fixup logic so that special kernel cases aren't
> > applied in the common case.
> >
> > v3 changes)
> >  * rename is_kernel to is_kallsyms
> >  * move the logic to generic function
> >  * remove arch-specific functions
> >
> > Thanks,
> > Namhyung
> 
> Thanks Namhyung! The series:
> 
> Acked-by: Ian Rogers 

Thanks, applied to perf/urgent.

- Arnaldo

 
> > Namhyung Kim (3):
> >   perf symbol: Pass is_kallsyms to symbols__fixup_end()
> >   perf symbol: Update symbols__fixup_end()
> >   perf symbol: Remove arch__symbols__fixup_end()
> >
> >  tools/perf/arch/arm64/util/machine.c   | 21 ---
> >  tools/perf/arch/powerpc/util/Build |  1 -
> >  tools/perf/arch/powerpc/util/machine.c | 25 -
> >  tools/perf/arch/s390/util/machine.c| 16 ---
> >  tools/perf/util/symbol-elf.c   |  2 +-
> >  tools/perf/util/symbol.c   | 37 +++---
> >  tools/perf/util/symbol.h   |  3 +--
> >  7 files changed, 29 insertions(+), 76 deletions(-)
> >  delete mode 100644 tools/perf/arch/powerpc/util/machine.c
> >
> >
> > base-commit: 41204da4c16071be9090940b18f566832d46becc
> > --
> > 2.36.0.rc0.470.gd361397f0d-goog
> >

-- 

- Arnaldo


[PATCH v4.19 0/2] Custom backports for powerpc SLB issues

2022-04-28 Thread Michael Ellerman
Hi Greg,

Here are two custom backports to v4.19 for some powerpc issues we've discovered.
Both were fixed upstream as part of a large non-backportable rewrite. Other
stable kernel versions are not affected.

cheers

Michael Ellerman (1):
  powerpc/64s: Unmerge EX_LR and EX_DAR

Nicholas Piggin (1):
  powerpc/64/interrupt: Temporarily save PPR on stack to fix register
corruption due to SLB miss

 arch/powerpc/include/asm/exception-64s.h | 37 ++--
 1 file changed, 22 insertions(+), 15 deletions(-)

-- 
2.35.1



[PATCH v4.19 2/2] powerpc/64s: Unmerge EX_LR and EX_DAR

2022-04-28 Thread Michael Ellerman
The SLB miss handler is not fully re-entrant; it is able to work because
we ensure that the SLB entries for the kernel text and data segment, as
well as the kernel stack, are pinned in the SLB. Accesses to kernel data
outside of those areas have to be carefully managed and can only occur in
certain parts of the code. One way we deal with that is by storing some
values in temporary slots in the paca.

In v4.13 in commit dbeea1d6b4bd ("powerpc/64s/paca: EX_LR can be merged
with EX_DAR") we merged the storage for two temporary slots for register
storage during SLB miss handling. That was safe at the time because the
two slots were never used at the same time.

Unfortunately in v4.17 in commit c2b4d8b7417a ("powerpc/mm/hash64:
Increase the VA range") we broke that condition, and introduced a case
where the two slots could be in use at the same time, leading to one
being corrupted.

Specifically in slb_miss_common() when we detect that we're handling a
fault for a large virtual address (> 512TB) we go to the "8" label,
there we store the original fault address into paca->exslb[EX_DAR],
before jumping to large_addr_slb() (using rfid).

We then use the EXCEPTION_PROLOG_COMMON and RECONCILE_IRQ_STATE macros
to do exception setup, before reloading the fault address from
paca->exslb[EX_DAR] and storing it into pt_regs->dar (Data Address
Register).

However the code generated by those macros can cause a recursive SLB
miss on a kernel address in three places.

The first is the saving of the PPR (Program Priority Register), which
happens on all CPUs since Power7; the PPR is saved to the thread struct,
which can be anywhere in memory. There is also the call to
accumulate_stolen_time() if CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y and
CONFIG_PPC_SPLPAR=y, and also the call to trace_hardirqs_off() if
CONFIG_TRACE_IRQFLAGS=y. The latter two call into generic C code and can
lead to accesses anywhere in memory.

On modern 64-bit CPUs we have 1TB segments, so for any of those accesses
to cause an SLB fault they must access memory more than 1TB away from
the kernel text, data and kernel stack. That typically only happens on
machines with more than 1TB of RAM. However it is possible on multi-node
Power9 systems, because memory on the 2nd node begins at 32TB in the
linear mapping.

If we take a recursive SLB fault then we will corrupt the original fault
address with the LR (Link Register) value, because the EX_DAR and EX_LR
slots share storage. Subsequently we will think we're trying to fault
that LR address, which is the wrong address, and will also most likely
lead to a segfault because the LR address will be < 512TB and so will be
rejected by slb_miss_large_addr().

This appears as a spurious segfault to userspace, and if
show_unhandled_signals is enabled you will see a fault reported in dmesg
with the LR address, not the expected fault address, eg:

  prog[123]: segfault (11) at 128a61808 nip 128a618cc lr 128a61808 code 3 in 
prog[128a6+1]
  prog[123]: code: 4ba4 39200040 3ce4 7d2903a6 3c000200 78e707c6 
780083e4 7d3b4b78
  prog[123]: code: 7d455378 7d7d5b78 7d9f6378 7da46b78  7d3a4b78 
7d465378 7d7c5b78

Notice that the fault address == the LR, and the faulting instruction is
a simple store that should never use LR.

In upstream this was fixed in v4.20 in commit
48e7b7695745 ("powerpc/64s/hash: Convert SLB miss handlers to C"),
however that is a huge rewrite and not backportable.

The minimal fix for stable is to just unmerge the EX_LR and EX_DAR slots
again, avoiding the corruption of the DAR value. This uses an extra 8
bytes per CPU, which is negligible.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/include/asm/exception-64s.h | 15 ---
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h 
b/arch/powerpc/include/asm/exception-64s.h
index f0424c6fdeca..4fdae1c182df 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -48,11 +48,12 @@
 #define EX_CCR 52
 #define EX_CFAR56
 #define EX_PPR 64
+#define EX_LR  72
 #if defined(CONFIG_RELOCATABLE)
-#define EX_CTR 72
-#define EX_SIZE10  /* size in u64 units */
+#define EX_CTR 80
+#define EX_SIZE11  /* size in u64 units */
 #else
-#define EX_SIZE9   /* size in u64 units */
+#define EX_SIZE10  /* size in u64 units */
 #endif
 
 /*
@@ -60,14 +61,6 @@
  */
 #define MAX_MCE_DEPTH  4
 
-/*
- * EX_LR is only used in EXSLB and where it does not overlap with EX_DAR
- * EX_CCR similarly with DSISR, but being 4 byte registers there is a hole
- * in the save area so it's not necessary to overlap them. Could be used
- * for future savings though if another 4 byte register was to be saved.
- */
-#define EX_LR  EX_DAR
-
 /*
  * EX_R3 is only used by the bad_stack handler. bad_stack reloads and
  * saves DAR from SPRN_DAR, and 
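
The corruption described in the commit message can be modelled in a few lines of userspace C. This is a toy model under stated assumptions, not kernel code: the save area is a plain byte array, EX_DAR is assumed to sit at offset 40 (that offset is not shown in the hunk above), the unmerged EX_LR offset of 72 and the 11-slot size come from the patch (relocatable case), and dar_after_nested_miss() is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

#define SAVE_AREA_BYTES	88	/* EX_SIZE 11, in u64 units, from the patch */
#define EX_DAR		40	/* assumed offset, not shown in the hunk */
#define EX_LR_MERGED	EX_DAR	/* old layout: LR shares the DAR slot */
#define EX_LR_UNMERGED	72	/* new layout, from the patch */

struct save_area {
	unsigned char bytes[SAVE_AREA_BYTES];
};

static void put64(struct save_area *a, int off, uint64_t v)
{
	memcpy(a->bytes + off, &v, sizeof(v));
}

static uint64_t get64(const struct save_area *a, int off)
{
	uint64_t v;

	memcpy(&v, a->bytes + off, sizeof(v));
	return v;
}

/*
 * Model of the sequence: the outer handler stashes the fault address,
 * a nested SLB miss saves LR at ex_lr_off, then the outer handler
 * reloads what it believes is the fault address.
 */
static uint64_t dar_after_nested_miss(int ex_lr_off, uint64_t dar,
				      uint64_t lr)
{
	struct save_area exslb;

	memset(&exslb, 0, sizeof(exslb));
	put64(&exslb, EX_DAR, dar);	/* outer: store fault address */
	put64(&exslb, ex_lr_off, lr);	/* nested miss: save LR */
	return get64(&exslb, EX_DAR);	/* outer: reload fault address */
}
```

With the merged layout the reload returns the LR value, which matches the symptom of a reported fault address equal to the LR; with the unmerged layout the original fault address survives.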

[PATCH v4.19 1/2] powerpc/64/interrupt: Temporarily save PPR on stack to fix register corruption due to SLB miss

2022-04-28 Thread Michael Ellerman
From: Nicholas Piggin 

This is a minimal stable kernel fix for the problem solved by
4c2de74cc869 ("powerpc/64: Interrupts save PPR on stack rather than
thread_struct").

Upstream kernels between 4.17-4.20 have this bug, so I propose this
patch for 4.19 stable.

Longer description from mpe:

In commit f384796c4 ("powerpc/mm: Add support for handling > 512TB
address in SLB miss") we added support for using multiple context ids
per process. Previously accessing past the first context id was a fatal
error for the process. With the new support it became non-fatal, and so
the previous "bad_addr_slb" handler was changed to be the
"large_addr_slb" handler.

That handler uses the EXCEPTION_PROLOG_COMMON() macro, which in turn
calls the SAVE_PPR() macro. At the point where SAVE_PPR() is used, the
r9-r13 register values from the original user fault are saved in
paca->exslb. It's not until later in EXCEPTION_PROLOG_COMMON_2() that
they are saved from paca->exslb onto the kernel stack.

The PPR is saved into current->thread.ppr, which is notably not on the
kernel stack the way pt_regs are. This means we can take an SLB miss on
current->thread.ppr. If that happens in the "large_addr_slb" case we
will clobber the saved user r9-r13 in paca->exslb with kernel values.
Later we will save those clobbered values into the pt_regs on the stack,
and when we return to userspace those kernel values will be restored.

Typically this appears as some sort of segfault in userspace, with an
address that looks like a kernel address. In dmesg it can appear as:

  [19117.440331] some_program[1869625]: unhandled signal 11 at cf6bda10 
nip 7fff780d559c lr 7fff781ae56c code 30001

The upstream fix for this issue was to move PPR into pt_regs, on the
kernel stack, avoiding the possibility of an SLB fault when saving it.

However changing the size of pt_regs is an intrusive change, and has
side effects in other parts of the kernel. A minimal fix is to
temporarily save the PPR in an unused part of pt_regs, then save the
user register values from paca->exslb into pt_regs, and then move the
saved PPR into thread.ppr.

Fixes: f384796c40dc ("powerpc/mm: Add support for handling > 512TB address in 
SLB miss")
Signed-off-by: Nicholas Piggin 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20220316033235.903657-1-npig...@gmail.com
---
 arch/powerpc/include/asm/exception-64s.h | 22 ++
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h 
b/arch/powerpc/include/asm/exception-64s.h
index 35fb5b11955a..f0424c6fdeca 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -243,10 +243,22 @@
  * PPR save/restore macros used in exceptions_64s.S  
  * Used for P7 or later processors
  */
-#define SAVE_PPR(area, ra, rb) \
+#define SAVE_PPR(area, ra) \
+BEGIN_FTR_SECTION_NESTED(940)  \
+   ld  ra,area+EX_PPR(r13);/* Read PPR from paca */\
+   std ra,RESULT(r1);  /* Store PPR in RESULT for now */ \
+END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,940)
+
+/*
+ * This is called after we are finished accessing 'area', so we can now take
+ * SLB faults accessing the thread struct, which will use PACA_EXSLB area.
+ * This is required because the large_addr_slb handler uses EXSLB and it also
+ * uses the common exception macros including this PPR saving.
+ */
+#define MOVE_PPR_TO_THREAD(ra, rb) \
 BEGIN_FTR_SECTION_NESTED(940)  \
ld  ra,PACACURRENT(r13);\
-   ld  rb,area+EX_PPR(r13);/* Read PPR from paca */\
+   ld  rb,RESULT(r1);  /* Read PPR from stack */   \
std rb,TASKTHREADPPR(ra);   \
 END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,940)
 
@@ -515,9 +527,11 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 3: EXCEPTION_PROLOG_COMMON_1();   \
beq 4f; /* if from kernel mode  */ \
ACCOUNT_CPU_USER_ENTRY(r13, r9, r10);  \
-   SAVE_PPR(area, r9, r10);   \
+   SAVE_PPR(area, r9);\
 4: EXCEPTION_PROLOG_COMMON_2(area)\
-   EXCEPTION_PROLOG_COMMON_3(n)   \
+   beq 5f; /* if from kernel mode  */ \
+   MOVE_PPR_TO_THREAD(r9, r10);   \
+5: EXCEPTION_PROLOG_COMMON_3(n)   \
ACCOUNT_STOLEN_TIME
 
 /* Save original regs values from save area to stack 

Re: [PATCH V12 00/20] riscv: Add COMPAT mode support for 64BIT

2022-04-28 Thread Guo Ren
Hi Palmer,

I see you have taken v12 into your riscv-compat branch and added
asm/signal32.h. Do you need me to help put compat_sigcontext,
compat_ucontext and compat_rt_sigframe into signal32.h? And could we
rename signal32.h to compat_signal.h to match compat_signal.c?

Finally, thanks for taking care of the compat patch series.


On Tue, Apr 5, 2022 at 3:13 PM  wrote:
>
> From: Guo Ren 
>
> Currently, most 64-bit architectures (x86, parisc, powerpc, arm64,
> s390, mips, sparc) have supported COMPAT mode. But they all have
> history issues and can't use standard linux unistd.h. RISC-V would
> be first standard __SYSCALL_COMPAT user of include/uapi/asm-generic
> /unistd.h.
>
> The patchset are based on v5.18-rc1, you can compare rv64-compat
> v.s. rv32-native in qemu with following steps:
>
>  - Prepare rv32 rootfs & fw_jump.bin by buildroot.org
>$ git clone git://git.busybox.net/buildroot
>$ cd buildroot
>$ make qemu_riscv32_virt_defconfig O=qemu_riscv32_virt_defconfig
>$ make -C qemu_riscv32_virt_defconfig
>$ make qemu_riscv64_virt_defconfig O=qemu_riscv64_virt_defconfig
>$ make -C qemu_riscv64_virt_defconfig
>(Got fw_jump.bin & rootfs.ext2 in qemu_riscvXX_virt_defconfig/images)
>
>  - Prepare Linux rv32 & rv64 Image
>$ git clone g...@github.com:c-sky/csky-linux.git -b riscv_compat_v12 linux
>$ cd linux
>$ echo "CONFIG_STRICT_KERNEL_RWX=n" >> arch/riscv/configs/defconfig
>$ echo "CONFIG_STRICT_MODULE_RWX=n" >> arch/riscv/configs/defconfig
>$ make ARCH=riscv CROSS_COMPILE=riscv32-buildroot-linux-gnu- 
> O=../build-rv32/ rv32_defconfig
>$ make ARCH=riscv CROSS_COMPILE=riscv32-buildroot-linux-gnu- 
> O=../build-rv32/ Image
>$ make ARCH=riscv CROSS_COMPILE=riscv64-buildroot-linux-gnu- 
> O=../build-rv64/ defconfig
>$ make ARCH=riscv CROSS_COMPILE=riscv64-buildroot-linux-gnu- 
> O=../build-rv64/ Image
>
>  - Prepare Qemu:
>$ git clone https://gitlab.com/qemu-project/qemu.git -b master linux
>$ cd qemu
>$ ./configure --target-list="riscv64-softmmu riscv32-softmmu"
>$ make
>
> Now let's compare rv64-compat with rv32-native memory footprint with almost 
> the same
> defconfig, rootfs, opensbi in one qemu.
>
>  - Run rv64 with rv32 rootfs in compat mode:
>$ ./build/qemu-system-riscv64 -cpu rv64 -M virt -m 64m -nographic -bios 
> qemu_riscv64_virt_defconfig/images/fw_jump.bin -kernel build-rv64/Image 
> -drive file=qemu_riscv32_virt_defconfig/images/rootfs.ext2,format=raw,id=hd0 
> -device virtio-blk-device,drive=hd0 -append "rootwait root=/dev/vda ro 
> console=ttyS0 earlycon=sbi" -netdev user,id=net0 -device 
> virtio-net-device,netdev=net0
>
> QEMU emulator version 6.2.50 (v6.2.0-29-g196d7182c8)
> OpenSBI v0.9
> [0.00] Linux version 5.16.0-rc6-00017-g750f87086bdd-dirty 
> (guoren@guoren-Z87-HD3) (riscv64-unknown-linux-gnu-gcc (GCC) 10.2.0, GNU ld 
> (GNU Binutils) 2.37) #96 SMP Tue Dec 28 21:01:55 CST 2021
> [0.00] OF: fdt: Ignoring memory range 0x8000 - 0x8020
> [0.00] Machine model: riscv-virtio,qemu
> [0.00] earlycon: sbi0 at I/O port 0x0 (options '')
> [0.00] printk: bootconsole [sbi0] enabled
> [0.00] efi: UEFI not found.
> [0.00] Zone ranges:
> [0.00]   DMA32[mem 0x8020-0x83ff]
> [0.00]   Normal   empty
> [0.00] Movable zone start for each node
> [0.00] Early memory node ranges
> [0.00]   node   0: [mem 0x8020-0x83ff]
> [0.00] Initmem setup node 0 [mem 
> 0x8020-0x83ff]
> [0.00] SBI specification v0.2 detected
> [0.00] SBI implementation ID=0x1 Version=0x9
> [0.00] SBI TIME extension detected
> [0.00] SBI IPI extension detected
> [0.00] SBI RFENCE extension detected
> [0.00] SBI v0.2 HSM extension detected
> [0.00] riscv: ISA extensions acdfhimsu
> [0.00] riscv: ELF capabilities acdfim
> [0.00] percpu: Embedded 17 pages/cpu s30696 r8192 d30744 u69632
> [0.00] Built 1 zonelists, mobility grouping on.  Total pages: 15655
> [0.00] Kernel command line: rootwait root=/dev/vda ro console=ttyS0 
> earlycon=sbi
> [0.00] Dentry cache hash table entries: 8192 (order: 4, 65536 bytes, 
> linear)
> [0.00] Inode-cache hash table entries: 4096 (order: 3, 32768 bytes, 
> linear)
> [0.00] mem auto-init: stack:off, heap alloc:off, heap free:off
> [0.00] Virtual kernel memory layout:
> [0.00]   fixmap : 0xffcefee0 - 0xffceff00   (2048 
> kB)
> [0.00]   pci io : 0xffceff00 - 0xffcf   (  16 
> MB)
> [0.00]  vmemmap : 0xffcf - 0xffcf   (4095 
> MB)
> [0.00]  vmalloc : 0xffd0 - 0xffdf   
> (65535 MB)
> [0.00]   lowmem : 0xffe0 - 0xffe003e0   (  62 
> MB)
> [0.00]   kernel : 

Re: [PATCH net-next v5 08/18] net: sparx5: Replace usage of found with dedicated list iterator variable

2022-04-28 Thread Paolo Abeni
Hello,

On Wed, 2022-04-27 at 18:06 +0200, Jakob Koschel wrote:
> To move the list iterator variable into the list_for_each_entry_*()
> macro in the future it should be avoided to use the list iterator
> variable after the loop body.
> 
> To *never* use the list iterator variable after the loop it was
> concluded to use a separate iterator variable instead of a
> found boolean [1].
> 
> This removes the need for a found variable; simply checking whether
> the variable was set determines if the break/goto was hit.
> 
> Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=ehreask5sqxpwr9y7k9sa6cwx...@mail.gmail.com/ [1]
> Signed-off-by: Jakob Koschel 
> ---
>  .../microchip/sparx5/sparx5_mactable.c| 25 +--
>  1 file changed, 12 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/net/ethernet/microchip/sparx5/sparx5_mactable.c 
> b/drivers/net/ethernet/microchip/sparx5/sparx5_mactable.c
> index a5837dbe0c7e..bb8d9ce79ac2 100644
> --- a/drivers/net/ethernet/microchip/sparx5/sparx5_mactable.c
> +++ b/drivers/net/ethernet/microchip/sparx5/sparx5_mactable.c
> @@ -362,8 +362,7 @@ static void sparx5_mact_handle_entry(struct sparx5 
> *sparx5,
>unsigned char mac[ETH_ALEN],
>u16 vid, u32 cfg2)
>  {
> - struct sparx5_mact_entry *mact_entry;
> - bool found = false;
> + struct sparx5_mact_entry *mact_entry = NULL, *iter;
>   u16 port;
>  
>   if (LRN_MAC_ACCESS_CFG_2_MAC_ENTRY_ADDR_TYPE_GET(cfg2) !=
> @@ -378,28 +377,28 @@ static void sparx5_mact_handle_entry(struct sparx5 
> *sparx5,
>   return;
>  
>   mutex_lock(&sparx5->mact_lock);
> - list_for_each_entry(mact_entry, &sparx5->mact_entries, list) {
> - if (mact_entry->vid == vid &&
> - ether_addr_equal(mac, mact_entry->mac)) {
> - found = true;
> - mact_entry->flags |= MAC_ENT_ALIVE;
> - if (mact_entry->port != port) {
> + list_for_each_entry(iter, &sparx5->mact_entries, list) {
> + if (iter->vid == vid &&
> + ether_addr_equal(mac, iter->mac)) {

I'm sorry for the late feedback.

If you move the 'mact_entry = iter;' statement here, the diffstat will
be slightly smaller and the patch more readable, IMHO.

There is a similar situation in the next patch.
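
To illustrate the suggestion above, here is a minimal user-space sketch of the pattern. The `struct entry` type and `find_entry()` helper are hypothetical stand-ins (a plain singly linked list instead of the kernel's `list_for_each_entry()`), so this is not the driver code, only the shape of it: the result pointer is assigned as the first statement after a match, so the remainder of the loop body can keep using the original variable name and the iterator is never touched after the loop.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct sparx5_mact_entry; not the driver's type. */
struct entry {
    unsigned short vid;
    unsigned char mac[6];
    struct entry *next;
};

/* Return the matching entry, or NULL if the loop ran to completion. */
static struct entry *find_entry(struct entry *head, unsigned short vid,
                                const unsigned char mac[6])
{
    struct entry *found = NULL, *iter;

    for (iter = head; iter != NULL; iter = iter->next) {
        if (iter->vid == vid && memcmp(iter->mac, mac, 6) == 0) {
            /*
             * Assign right at the match: any further work in the
             * loop body can go through 'found', which keeps the
             * diff against the old code small.
             */
            found = iter;
            break;
        }
    }

    /* Only 'found' is used after the loop, never 'iter'. */
    return found;
}
```

A caller then tests the returned pointer against NULL instead of consulting a separate boolean.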

Cheers,

Paolo



Re: [PATCH 20/30] panic: Add the panic informational notifier list

2022-04-28 Thread Suzuki K Poulose

On 27/04/2022 23:49, Guilherme G. Piccoli wrote:

The goal of this new panic notifier is to allow its users to
register callbacks to run earlier in the panic path than they
currently do. This aims at informational mechanisms, like dumping
kernel offsets and showing device error data (in case it's simple
registers reading, for example) as well as mechanisms to disable
log flooding (like hung_task detector / RCU warnings) and the
tracing dump_on_oops (when enabled).

Any (non-invasive) information that should be provided before
kmsg_dump() as well as log flooding preventing code should fit
here, as long as it offers relatively low risk for kdump.

For now, the patch is almost a no-op, although it changes a bit
the ordering in which some panic notifiers are executed - especially
affected by this are the notifiers responsible for disabling the
hung_task detector / RCU warnings, which now run first. In a
subsequent patch, the panic path will be refactored, then the
panic informational notifiers will effectively run earlier,
before kmsg_dump() (and usually before kdump as well).

We also defer documenting it all properly in the subsequent
refactor patch. Finally, while at it, we removed some useless
header inclusions too.

Cc: Benjamin Herrenschmidt 
Cc: Catalin Marinas 
Cc: Florian Fainelli 
Cc: Frederic Weisbecker 
Cc: "H. Peter Anvin" 
Cc: Hari Bathini 
Cc: Joel Fernandes 
Cc: Jonathan Hunter 
Cc: Josh Triplett 
Cc: Lai Jiangshan 
Cc: Leo Yan 
Cc: Mathieu Desnoyers 
Cc: Mathieu Poirier 
Cc: Michael Ellerman 
Cc: Mike Leach 
Cc: Mikko Perttunen 
Cc: Neeraj Upadhyay 
Cc: Nicholas Piggin 
Cc: Paul Mackerras 
Cc: Suzuki K Poulose 
Cc: Thierry Reding 
Cc: Thomas Bogendoerfer 
Signed-off-by: Guilherme G. Piccoli 
---
  arch/arm64/kernel/setup.c | 2 +-
  arch/mips/kernel/relocate.c   | 2 +-
  arch/powerpc/kernel/setup-common.c| 2 +-
  arch/x86/kernel/setup.c   | 2 +-
  drivers/bus/brcmstb_gisb.c| 2 +-
  drivers/hwtracing/coresight/coresight-cpu-debug.c | 4 ++--
  drivers/soc/tegra/ari-tegra186.c  | 3 ++-
  include/linux/panic_notifier.h| 1 +
  kernel/hung_task.c| 3 ++-
  kernel/panic.c| 4 
  kernel/rcu/tree.c | 1 -
  kernel/rcu/tree_stall.h   | 3 ++-
  kernel/trace/trace.c  | 2 +-
  13 files changed, 19 insertions(+), 12 deletions(-)



...


diff --git a/drivers/hwtracing/coresight/coresight-cpu-debug.c 
b/drivers/hwtracing/coresight/coresight-cpu-debug.c
index 1874df7c6a73..7b1012454525 100644
--- a/drivers/hwtracing/coresight/coresight-cpu-debug.c
+++ b/drivers/hwtracing/coresight/coresight-cpu-debug.c
@@ -535,7 +535,7 @@ static int debug_func_init(void)
&debug_func_knob_fops);
  
  	/* Register function to be called for panic */

-   ret = atomic_notifier_chain_register(&panic_notifier_list,
+   ret = atomic_notifier_chain_register(&panic_info_list,
 &debug_notifier);
if (ret) {
pr_err("%s: unable to register notifier: %d\n",
@@ -552,7 +552,7 @@ static int debug_func_init(void)
  
  static void debug_func_exit(void)

  {
-   atomic_notifier_chain_unregister(&panic_notifier_list,
+   atomic_notifier_chain_unregister(&panic_info_list,
 &debug_notifier);
debugfs_remove_recursive(debug_debugfs_dir);
  }


Acked-by: Suzuki K Poulose 



Re: [PATCH 09/30] coresight: cpu-debug: Replace mutex with mutex_trylock on panic notifier

2022-04-28 Thread Suzuki K Poulose

Hi Guilherme,

On 27/04/2022 23:49, Guilherme G. Piccoli wrote:

The panic notifier infrastructure executes registered callbacks when
a panic event happens - such callbacks are executed in atomic context,
with interrupts and preemption disabled in the running CPU and all other
CPUs disabled. That said, mutexes in such context are not a good idea.

This patch replaces a regular mutex with the safer mutex_trylock approach;
given the nature of the mutex used in the driver, it should be pretty
uncommon to fail to acquire it in the panic path, hence
no functional change should be observed (and if one is, it would
likely have been a deadlock with the regular mutex).

Fixes: 2227b7c74634 ("coresight: add support for CPU debug module")
Cc: Leo Yan 
Cc: Mathieu Poirier 
Cc: Mike Leach 
Cc: Suzuki K Poulose 
Signed-off-by: Guilherme G. Piccoli 


How would you like to proceed with queuing this? I am happy
either way. In case you plan to push this as part of this
series (I don't see any potential conflicts) :

Reviewed-by: Suzuki K Poulose 
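
For readers following along, the try-lock-and-bail idea from the commit message can be sketched in user space. This is only an analogue under stated assumptions: a C11 `atomic_flag` stands in for the kernel mutex, and the names `debug_trylock()`, `debug_unlock()`, `panic_style_notifier()` and `dumps_done` are hypothetical, not taken from the coresight driver. The point it shows is that the handler never waits: if the lock is already held, it skips its work instead of blocking in a context where the holder can no longer run.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for the driver's mutex (an assumption for this sketch). */
static atomic_flag debug_lock = ATOMIC_FLAG_INIT;

/* Counts how many times the handler actually did its work. */
static int dumps_done;

/* Try to take the lock once; never spins or sleeps. */
static bool debug_trylock(void)
{
    /* test_and_set returns the previous value: false means we got it. */
    return !atomic_flag_test_and_set_explicit(&debug_lock,
                                              memory_order_acquire);
}

static void debug_unlock(void)
{
    atomic_flag_clear_explicit(&debug_lock, memory_order_release);
}

/* Hypothetical panic-style callback: returns true if the dump ran. */
static bool panic_style_notifier(void)
{
    if (!debug_trylock())
        return false;   /* lock held elsewhere: skip, do not wait */

    dumps_done++;       /* placeholder for the real debug dump */
    debug_unlock();
    return true;
}
```

The trade-off is the one the commit message describes: in the rare case where the lock is contended, the dump is lost, but the alternative with a blocking lock would likely have been a hang.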


Re: [PATCH 2/2] recordmcount: Handle sections with no non-weak symbols

2022-04-28 Thread Naveen N. Rao

Steven Rostedt wrote:

On Wed, 27 Apr 2022 15:01:22 +0530
"Naveen N. Rao"  wrote:


If one or both of these weak functions are overridden in future, in the
final vmlinux mcount table, references to these will change over to the
non-weak variant which has its own mcount location entry. As such, there
will now be two entries for these functions, both pointing to the same
non-weak location.


But is that really true in all cases? x86 uses fentry these days, and other
archs do things differently too. But the original mcount (-pg) call
happened *after* the frame setup. That means the offset of the mcount call
would be at different offsets wrt the start of the function. If you have
one of these architectures that still use mcount, and the weak function
doesn't have the same size frame setup as the overriding function, then the
addresses will not be the same.


Indeed, plain old -pg will be a problem. I'm not sure there is a generic 
way to address this. I suppose architectures will have to validate the 
mcount locations, something like this?


diff --git a/arch/powerpc/include/asm/ftrace.h 
b/arch/powerpc/include/asm/ftrace.h
index d83758acd1c7c3..d8b104ed2fdf38 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -12,13 +12,7 @@

#ifndef __ASSEMBLY__
extern void _mcount(void);
-
-static inline unsigned long ftrace_call_adjust(unsigned long addr)
-{
-   /* relocation of mcount call site is the same as the address */
-   return addr;
-}
-
+unsigned long ftrace_call_adjust(unsigned long addr);
unsigned long prepare_ftrace_return(unsigned long parent, unsigned long ip,
   unsigned long sp);

diff --git a/arch/powerpc/kernel/trace/ftrace.c 
b/arch/powerpc/kernel/trace/ftrace.c
index 4ee04aacf9f13c..976c08cd0573f7 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -858,6 +858,17 @@ void arch_ftrace_update_code(int command)
   ftrace_modify_all_code(command);
}

+unsigned long ftrace_call_adjust(unsigned long addr)
+{
+   ppc_inst_t op = ppc_inst_read((u32 *)addr);
+
+   if (!is_bl_op(op))
+   return 0;
+
+   /* relocation of mcount call site is the same as the address */
+   return addr;
+}
+
#ifdef CONFIG_PPC64
#define PACATOC offsetof(struct paca_struct, kernel_toc)


We can tighten those checks as necessary, but it will be up to the 
architectures to validate the mcount locations. This all will have to be 
opt-in so that only architectures doing necessary validation will allow 
mcount relocations against weak symbols.
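
As a rough illustration of the kind of validation discussed above, the sketch below checks whether a 32-bit PowerPC instruction word is a relative `bl` (primary opcode 18 with AA=0 and LK=1). The helper names `is_bl_insn()` and `call_adjust()` are hypothetical and work on a plain `uint32_t` in user space; they only mirror the shape of the proposed `ftrace_call_adjust()` and are not the kernel's helpers.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * PowerPC I-form branch: the top 6 bits hold primary opcode 18,
 * bit 1 (from the LSB) is AA and bit 0 is LK. A relative 'bl'
 * therefore matches 0x48000001 under the mask 0xfc000003.
 */
static bool is_bl_insn(uint32_t insn)
{
    return (insn & 0xfc000003u) == 0x48000001u;
}

/*
 * Hypothetical analogue of the proposed check: return the call
 * site address if it holds a 'bl', or 0 to mark it as invalid.
 */
static unsigned long call_adjust(const uint32_t *site)
{
    if (!is_bl_insn(*site))
        return 0;

    return (unsigned long)site;
}
```

A location that does not hold a `bl` (for example a `nop` or an unconditional `b`) yields 0 here, which is how a stale entry for an overridden weak function would be rejected in this sketch.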



- Naveen


[PATCH kernel] KVM: PPC: Book3s: Retire H_PUT_TCE/etc real mode handlers

2022-04-28 Thread Alexey Kardashevskiy
LoPAPR defines a guest-visible IOMMU along with hypercalls to use it
(H_PUT_TCE and related calls). These were first implemented on POWER7,
where hypercalls would trap into KVM in real mode (with the MMU off). The
problem with real mode is that some memory is not available and some API
usage crashed the host, but enabling the MMU was an expensive operation.

The problems with the real mode handlers are:
1. Occasionally these cannot complete the request so the code is
copied+modified to work in the virtual mode, very little is shared;
2. The real mode handlers have to be linked into vmlinux to work;
3. An exception in real mode immediately reboots the machine.

If the small DMA window is used, the real mode handlers bring better
performance. However since POWER8, there has always been a bigger DMA
window which VMs use to map the entire VM memory to avoid calling
H_PUT_TCE. Such a 1:1 mapping happens once and uses H_PUT_TCE_INDIRECT
(a bulk version of H_PUT_TCE), whose virtual mode handler is even closer
to its real mode version.

On POWER9 hypercalls trap straight to the virtual mode so the real mode
handlers never execute on POWER9 and later CPUs.

So with the current use of the DMA windows and MMU improvements in
POWER9 and later, there is no point in duplicating the code.
32-bit passed-through devices may slow down, but we do not have many
of these in practice. For example, with this applied, a 1Gbit ethernet
adapter still demonstrates above 800Mbit/s of actual throughput.

This removes the real mode handlers from KVM and related code from
the powernv platform.

This changes ABI - kvmppc_h_get_tce() moves to the KVM module and
kvmppc_find_table() is static now.

Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/kvm/Makefile |   3 -
 arch/powerpc/include/asm/iommu.h  |   6 +-
 arch/powerpc/include/asm/kvm_ppc.h|   2 -
 arch/powerpc/include/asm/mmu_context.h|   5 -
 arch/powerpc/platforms/powernv/pci.h  |   3 +-
 arch/powerpc/kernel/iommu.c   |   4 +-
 arch/powerpc/kvm/book3s_64_vio.c  |  43 ++
 arch/powerpc/kvm/book3s_64_vio_hv.c   | 672 --
 arch/powerpc/mm/book3s64/iommu_api.c  |  68 --
 arch/powerpc/platforms/powernv/pci-ioda-tce.c |   5 +-
 arch/powerpc/platforms/powernv/pci-ioda.c |  46 +-
 arch/powerpc/platforms/pseries/iommu.c|   3 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   |  10 -
 13 files changed, 69 insertions(+), 801 deletions(-)
 delete mode 100644 arch/powerpc/kvm/book3s_64_vio_hv.c

diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index 9bdfc8b50899..8e3681a86074 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -37,9 +37,6 @@ kvm-e500mc-objs := \
e500_emulate.o
 kvm-objs-$(CONFIG_KVM_E500MC) := $(kvm-e500mc-objs)
 
-kvm-book3s_64-builtin-objs-$(CONFIG_SPAPR_TCE_IOMMU) := \
-   book3s_64_vio_hv.o
-
 kvm-pr-y := \
fpu.o \
emulate.o \
diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index d7912b66c874..7e29c73e3dd4 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -51,13 +51,11 @@ struct iommu_table_ops {
int (*xchg_no_kill)(struct iommu_table *tbl,
long index,
unsigned long *hpa,
-   enum dma_data_direction *direction,
-   bool realmode);
+   enum dma_data_direction *direction);
 
void (*tce_kill)(struct iommu_table *tbl,
unsigned long index,
-   unsigned long pages,
-   bool realmode);
+   unsigned long pages);
 
__be64 *(*useraddrptr)(struct iommu_table *tbl, long index, bool alloc);
 #endif
diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index 838d4cb460b7..44200a27371b 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -177,8 +177,6 @@ extern void kvmppc_setup_partition_table(struct kvm *kvm);
 
 extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
struct kvm_create_spapr_tce_64 *args);
-extern struct kvmppc_spapr_tce_table *kvmppc_find_table(
-   struct kvm *kvm, unsigned long liobn);
 #define kvmppc_ioba_validate(stt, ioba, npages) \
(iommu_tce_check_ioba((stt)->page_shift, (stt)->offset, \
(stt)->size, (ioba), (npages)) ?\
diff --git a/arch/powerpc/include/asm/mmu_context.h 
b/arch/powerpc/include/asm/mmu_context.h
index b8527a74bd4d..3f25bd3e14eb 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -34,15 +34,10 @@ extern void mm_iommu_init(struct mm_struct *mm);
 extern void mm_iommu_cleanup(struct mm_struct *mm);
 extern struct mm_iommu_table_group_mem_t *mm_iommu_lookup(struct 

Re: [PATCH v3 3/4] dt-bindings: interrupt-controller: fsl, ls-extirq: convert to YAML

2022-04-28 Thread Krzysztof Kozlowski
On 27/04/2022 09:53, Michael Walle wrote:
> Convert the fsl,ls-extirq binding to the new YAML format.
> 
> In contrast to the original binding documentation, there are three
> compatibles which are used in their corresponding device trees which
> have a specific compatible and the (already documented) fallback
> compatible:
>  - "fsl,ls1046a-extirq", "fsl,ls1043a-extirq"
>  - "fsl,ls2080a-extirq", "fsl,ls1088a-extirq"
>  - "fsl,lx2160a-extirq", "fsl,ls1088a-extirq"
> 
> Depending on the number of external IRQs, which is
> usually 12 except for the LS1021A where there are only 6, the
> interrupt-map-mask was reduced from 0x to 0xf and 0x7
> respectively and the number of interrupt-map entries have to
> match.
> 
> Signed-off-by: Michael Walle 
> ---
> changes since v2:
>  - drop $ref to interrupt-controller.yaml
>  - use a more strict interrupt-map-mask and make it conditional on SoC
> 
> changes since v1:
>  - new patch
> 
>  .../interrupt-controller/fsl,ls-extirq.txt|  53 
>  .../interrupt-controller/fsl,ls-extirq.yaml   | 118 ++
>  2 files changed, 118 insertions(+), 53 deletions(-)
>  delete mode 100644 
> Documentation/devicetree/bindings/interrupt-controller/fsl,ls-extirq.txt
>  create mode 100644 
> Documentation/devicetree/bindings/interrupt-controller/fsl,ls-extirq.yaml
> 


Reviewed-by: Krzysztof Kozlowski 


Best regards,
Krzysztof


Re: [PATCH v3 3/4] dt-bindings: interrupt-controller: fsl, ls-extirq: convert to YAML

2022-04-28 Thread Krzysztof Kozlowski
On 27/04/2022 22:08, Leo Li wrote:
>> Convert the fsl,ls-extirq binding to the new YAML format.
>>
>> In contrast to the original binding documentation, there are three
>> compatibles which are used in their corresponding device trees which have a
>> specific compatible and the (already documented) fallback
>> compatible:
>>  - "fsl,ls1046a-extirq", "fsl,ls1043a-extirq"
>>  - "fsl,ls2080a-extirq", "fsl,ls1088a-extirq"
>>  - "fsl,lx2160a-extirq", "fsl,ls1088a-extirq"
>>
>> Depending on the number of external IRQs, which is
>> usually 12 except for the LS1021A where there are only 6, the interrupt-map-
>> mask was reduced from 0x to 0xf and 0x7 respectively and the number
>> of interrupt-map entries have to match.
> 
> I assume this change won't prevent the driver from being compatible with older device 
> trees using the 0x?  The original 0x should work for both 
> 6/12 interrupts or whatever reasonable number of interrupts that may be used 
> in future SoCs.  So the purpose of this change is to make the binding more 
> specific to catch more errors in device tree?

Yes.

Best regards,
Krzysztof
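
One way to read the 0xf and 0x7 masks in the quoted commit message is as the narrowest all-ones mask that still covers every valid interrupt index (0..11 for 12 IRQs, 0..5 for 6 IRQs). The sketch below computes that; `min_irq_mask()` is a hypothetical helper for illustration only and is not part of the binding or the driver.

```c
#include <stdint.h>

/*
 * Return the smallest mask of the form 2^k - 1 that covers all
 * interrupt indices 0 .. num_irqs - 1. Returns 0 when there is at
 * most one interrupt, since index 0 needs no mask bits.
 */
static uint32_t min_irq_mask(unsigned int num_irqs)
{
    uint32_t mask = 0;

    /* Widen the mask one bit at a time until it covers the top index. */
    while (num_irqs > 0 && mask < num_irqs - 1)
        mask = (mask << 1) | 1;

    return mask;
}
```

With 12 interrupts this gives 0xf and with 6 it gives 0x7, matching the stricter values the binding now requires; a wider mask would still select the right entries, which is why the change is about catching device tree errors and not about driver behaviour.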


Re: serial hang in qemu-system-ppc64 -M pseries

2022-04-28 Thread Rob Landley



On 4/28/22 00:41, Rob Landley wrote:
> On 4/27/22 10:27, Thomas Huth wrote:
>> On 26/04/2022 12.26, Rob Landley wrote:
>>> When I cut and paste 80-ish characters of text into the Linux serial 
>>> console, it
>>> reads 16 characters and stops. When I hit space, it reads another 16 
>>> characters,
>>> and if I keep at it will eventually catch up without losing data. If I type,
>>> every character shows up immediately.
>> 
>> That "16" certainly comes from VTERM_BUFSIZE in hw/char/spapr_vty.c in the 
>> QEMU sources, I think.
>> 
>>> (On other qemu targets and kernels I can cut and paste an entire uuencoded
>>> binary and it goes through just fine in one go, but this target hangs with 
>>> big
>>> pastes until I hit keys.)
>>> 
>>> Is this a qemu-side bug, or a kernel-side bug?
>>> 
>>> Kernel config attached (linux 5.18-rc3 or thereabouts), qemu invocation is:
>>> 
>>> qemu-system-ppc64 -M pseries -vga none -nographic -no-reboot -m 256 -kernel
>>> vmlinux -initrd powerpc64leroot.cpio.gz -append "panic=1 HOST=powerpc64le
>>> console=hvc0"
>> 
>> Which version of QEMU are you using?
> 
> $ qemu-system-ppc64 --version
> QEMU emulator version 6.2.92 (v6.2.0-rc2)
> Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers

Just confirmed it behaves the same with current git (commit cf6f26d6f9b2).

Rob