Re: [PATCH 12/30] parisc: Replace regular spinlock with spin_trylock on panic path
On 4/28/22 00:49, Guilherme G. Piccoli wrote: > The panic notifiers' callbacks execute in an atomic context, with > interrupts/preemption disabled, and all CPUs not running the panic > function are off, so it's very dangerous to wait on a regular > spinlock, there's a risk of deadlock. > > This patch refactors the panic notifier of parisc/power driver > to make use of spin_trylock - for that, we've added a second > version of the soft-power function. Also, some comments were > reorganized and trailing white spaces, useless header inclusion > and blank lines were removed. > > Cc: Helge Deller > Cc: "James E.J. Bottomley" > Signed-off-by: Guilherme G. Piccoli You may add: Acked-by: Helge Deller # parisc Helge > --- > arch/parisc/include/asm/pdc.h | 1 + > arch/parisc/kernel/firmware.c | 27 +++ > drivers/parisc/power.c| 17 ++--- > 3 files changed, 34 insertions(+), 11 deletions(-) > > diff --git a/arch/parisc/include/asm/pdc.h b/arch/parisc/include/asm/pdc.h > index b643092d4b98..7a106008e258 100644 > --- a/arch/parisc/include/asm/pdc.h > +++ b/arch/parisc/include/asm/pdc.h > @@ -83,6 +83,7 @@ int pdc_do_firm_test_reset(unsigned long ftc_bitmap); > int pdc_do_reset(void); > int pdc_soft_power_info(unsigned long *power_reg); > int pdc_soft_power_button(int sw_control); > +int pdc_soft_power_button_panic(int sw_control); > void pdc_io_reset(void); > void pdc_io_reset_devices(void); > int pdc_iodc_getc(void); > diff --git a/arch/parisc/kernel/firmware.c b/arch/parisc/kernel/firmware.c > index 6a7e315bcc2e..0e2f70b592f4 100644 > --- a/arch/parisc/kernel/firmware.c > +++ b/arch/parisc/kernel/firmware.c > @@ -1232,15 +1232,18 @@ int __init pdc_soft_power_info(unsigned long > *power_reg) > } > > /* > - * pdc_soft_power_button - Control the soft power button behaviour > - * @sw_control: 0 for hardware control, 1 for software control > + * pdc_soft_power_button{_panic} - Control the soft power button behaviour > + * @sw_control: 0 for hardware control, 1 for software control > 
* > * > * This PDC function places the soft power button under software or > * hardware control. > - * Under software control the OS may control to when to allow to shut > - * down the system. Under hardware control pressing the power button > + * Under software control the OS may control to when to allow to shut > + * down the system. Under hardware control pressing the power button > * powers off the system immediately. > + * > + * The _panic version relies in spin_trylock to prevent deadlock > + * on panic path. > */ > int pdc_soft_power_button(int sw_control) > { > @@ -1254,6 +1257,22 @@ int pdc_soft_power_button(int sw_control) > return retval; > } > > +int pdc_soft_power_button_panic(int sw_control) > +{ > + int retval; > + unsigned long flags; > + > + if (!spin_trylock_irqsave(_lock, flags)) { > + pr_emerg("Couldn't enable soft power button\n"); > + return -EBUSY; /* ignored by the panic notifier */ > + } > + > + retval = mem_pdc_call(PDC_SOFT_POWER, PDC_SOFT_POWER_ENABLE, > __pa(pdc_result), sw_control); > + spin_unlock_irqrestore(_lock, flags); > + > + return retval; > +} > + > /* > * pdc_io_reset - Hack to avoid overlapping range registers of Bridges > devices. > * Primarily a problem on T600 (which parisc-linux doesn't support) but > diff --git a/drivers/parisc/power.c b/drivers/parisc/power.c > index 456776bd8ee6..8512884de2cf 100644 > --- a/drivers/parisc/power.c > +++ b/drivers/parisc/power.c > @@ -37,7 +37,6 @@ > #include > #include > #include > -#include > #include > #include > #include > @@ -175,16 +174,21 @@ static void powerfail_interrupt(int code, void *x) > > > > -/* parisc_panic_event() is called by the panic handler. > - * As soon as a panic occurs, our tasklets above will not be > - * executed any longer. This function then re-enables the > - * soft-power switch and allows the user to switch off the system > +/* > + * parisc_panic_event() is called by the panic handler. 
> + * > + * As soon as a panic occurs, our tasklets above will not > + * be executed any longer. This function then re-enables > + * the soft-power switch and allows the user to switch off > + * the system. We rely in pdc_soft_power_button_panic() > + * since this version spin_trylocks (instead of regular > + * spinlock), preventing deadlocks on panic path. > */ > static int parisc_panic_event(struct notifier_block *this, > unsigned long event, void *ptr) > { > /* re-enable the soft-power switch */ > - pdc_soft_power_button(0); > + pdc_soft_power_button_panic(0); > return NOTIFY_DONE; > } > > @@ -193,7 +197,6 @@ static struct notifier_block parisc_panic_block = { > .priority = INT_MAX, > }; > > - > static int __init power_init(void) > { > unsigned long ret;
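Editorial sketch of the trylock idea discussed in this patch, written as self-contained userspace C rather than kernel code. Everything here is an assumption made for illustration: `fw_lock_flag`, `fw_call()`, `fw_spin_lock()` and friends are hypothetical stand-ins, and a C11 `atomic_flag` plays the role of the kernel spinlock. It only shows the shape of the change: the normal path may wait for the lock, while the panic path makes a single attempt and gives up with -EBUSY.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the lock that serializes firmware calls. */
static atomic_flag fw_lock_flag = ATOMIC_FLAG_INIT;

/* One attempt to take the lock; returns true on success, never waits. */
bool fw_spin_trylock(void)
{
    return !atomic_flag_test_and_set_explicit(&fw_lock_flag,
                                              memory_order_acquire);
}

/* Regular lock: spins until the current holder releases it. */
void fw_spin_lock(void)
{
    while (!fw_spin_trylock())
        ;
}

void fw_spin_unlock(void)
{
    atomic_flag_clear_explicit(&fw_lock_flag, memory_order_release);
}

/* Hypothetical stand-in for the firmware call made under the lock. */
static int fw_call(int sw_control)
{
    (void)sw_control;
    return 0;
}

/* Normal path: waiting is acceptable because the holder can still run. */
int soft_power_button(int sw_control)
{
    int ret;

    fw_spin_lock();
    ret = fw_call(sw_control);
    fw_spin_unlock();
    return ret;
}

/*
 * Panic path: the holder may be on a CPU that has been stopped, so
 * spinning could never finish. Try once and report -EBUSY instead.
 */
int soft_power_button_panic(int sw_control)
{
    int ret;

    if (!fw_spin_trylock()) {
        fprintf(stderr, "Couldn't enable soft power button\n");
        return -EBUSY;
    }
    ret = fw_call(sw_control);
    fw_spin_unlock();
    return ret;
}
```

A caller on the panic path would ignore the -EBUSY result, as the patch comment says the notifier does; the point is only that the call returns instead of spinning forever.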
Re: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list
On Wed, Apr 27, 2022 at 07:49:15PM -0300, Guilherme G. Piccoli wrote: > This patch renames the panic_notifier_list to panic_pre_reboot_list; > the idea is that a subsequent patch will refactor the panic path > in order to better split the notifiers, running some of them very > early, some of them not so early [but still before kmsg_dump()] and > finally, the rest should execute late, after kdump. The latter ones > are now in the panic pre-reboot list - the name comes from the idea > that these notifiers execute before panic() attempts rebooting the > machine (if that option is set). > > We also took the opportunity to clean-up useless header inclusions, > improve some notifier block declarations (e.g. in ibmasm/heartbeat.c) > and more important, change some priorities - we hereby set 2 notifiers > to run late in the list [iss_panic_event() and the IPMI panic_event()] > due to the risks they offer (may not return, for example). > Proper documentation is going to be provided in a subsequent patch, > that effectively refactors the panic path. For the IPMI portion: Acked-by: Corey Minyard Note that the IPMI panic_event() should always return, but it may take some time, especially if the IPMI controller is no longer functional. So the risk of a long delay is there and it makes sense to move it very late. -corey > > Cc: Alex Elder > Cc: Alexander Gordeev > Cc: Anton Ivanov > Cc: Benjamin Herrenschmidt > Cc: Bjorn Andersson > Cc: Boris Ostrovsky > Cc: Chris Zankel > Cc: Christian Borntraeger > Cc: Corey Minyard > Cc: Dexuan Cui > Cc: "H. Peter Anvin" > Cc: Haiyang Zhang > Cc: Heiko Carstens > Cc: Helge Deller > Cc: Ivan Kokshaysky > Cc: "James E.J. Bottomley" > Cc: James Morse > Cc: Johannes Berg > Cc: Juergen Gross > Cc: "K. Y. 
Srinivasan" > Cc: Mathieu Poirier > Cc: Matt Turner > Cc: Mauro Carvalho Chehab > Cc: Max Filippov > Cc: Michael Ellerman > Cc: Paul Mackerras > Cc: Pavel Machek > Cc: Richard Henderson > Cc: Richard Weinberger > Cc: Robert Richter > Cc: Stefano Stabellini > Cc: Stephen Hemminger > Cc: Sven Schnelle > Cc: Tony Luck > Cc: Vasily Gorbik > Cc: Wei Liu > Signed-off-by: Guilherme G. Piccoli > --- > > Notice that, with this name change, out-of-tree code that relies in the global > exported "panic_notifier_list" will fail to build. We could easily keep the > retro-compatibility by making the old symbol to still exist and point to the > pre_reboot list (or even, keep the old naming). > > But our design choice was to allow the breakage, making users rethink their > notifiers, adding them in the list that fits best. If that wasn't a good > decision, we're open to change it, of course. > Thanks in advance for the review! > > arch/alpha/kernel/setup.c | 4 ++-- > arch/parisc/kernel/pdc_chassis.c | 3 +-- > arch/powerpc/kernel/setup-common.c| 2 +- > arch/s390/kernel/ipl.c| 4 ++-- > arch/um/drivers/mconsole_kern.c | 2 +- > arch/um/kernel/um_arch.c | 2 +- > arch/x86/xen/enlighten.c | 2 +- > arch/xtensa/platforms/iss/setup.c | 4 ++-- > drivers/char/ipmi/ipmi_msghandler.c | 12 +++- > drivers/edac/altera_edac.c| 3 +-- > drivers/hv/vmbus_drv.c| 4 ++-- > drivers/leds/trigger/ledtrig-panic.c | 3 +-- > drivers/misc/ibmasm/heartbeat.c | 16 +--- > drivers/net/ipa/ipa_smp2p.c | 5 ++--- > drivers/parisc/power.c| 4 ++-- > drivers/remoteproc/remoteproc_core.c | 6 -- > drivers/s390/char/con3215.c | 2 +- > drivers/s390/char/con3270.c | 2 +- > drivers/s390/char/sclp_con.c | 2 +- > drivers/s390/char/sclp_vt220.c| 2 +- > drivers/staging/olpc_dcon/olpc_dcon.c | 6 -- > drivers/video/fbdev/hyperv_fb.c | 4 ++-- > include/linux/panic_notifier.h| 2 +- > kernel/panic.c| 9 - > 24 files changed, 54 insertions(+), 51 deletions(-) > > diff --git a/arch/alpha/kernel/setup.c b/arch/alpha/kernel/setup.c > index 
d88bdf852753..8ace0d7113b6 100644 > --- a/arch/alpha/kernel/setup.c > +++ b/arch/alpha/kernel/setup.c > @@ -472,8 +472,8 @@ setup_arch(char **cmdline_p) > } > > /* Register a call for panic conditions. */ > - atomic_notifier_chain_register(_notifier_list, > - _panic_block); > + atomic_notifier_chain_register(_pre_reboot_list, > + _panic_block); > > #ifndef alpha_using_srm > /* Assume that we've booted from SRM if we haven't booted from MILO. > diff --git a/arch/parisc/kernel/pdc_chassis.c > b/arch/parisc/kernel/pdc_chassis.c > index da154406d368..0fd8d87fb4f9 100644 > --- a/arch/parisc/kernel/pdc_chassis.c > +++ b/arch/parisc/kernel/pdc_chassis.c > @@ -22,7 +22,6 @@ > #include > #include > #include > -#include > #include > #include > #include > @@ -135,7 +134,7 @@ void __init
Re: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list
On 4/27/22 5:49 PM, Guilherme G. Piccoli wrote: This patch renames the panic_notifier_list to panic_pre_reboot_list; the idea is that a subsequent patch will refactor the panic path in order to better split the notifiers, running some of them very early, some of them not so early [but still before kmsg_dump()] and finally, the rest should execute late, after kdump. The latter ones are now in the panic pre-reboot list - the name comes from the idea that these notifiers execute before panic() attempts rebooting the machine (if that option is set). We also took the opportunity to clean-up useless header inclusions, improve some notifier block declarations (e.g. in ibmasm/heartbeat.c) and more important, change some priorities - we hereby set 2 notifiers to run late in the list [iss_panic_event() and the IPMI panic_event()] due to the risks they offer (may not return, for example). Proper documentation is going to be provided in a subsequent patch, that effectively refactors the panic path. Cc: Alex Elder For "drivers/net/ipa/ipa_smp2p.c": Acked-by: Alex Elder Cc: Alexander Gordeev Cc: Anton Ivanov Cc: Benjamin Herrenschmidt Cc: Bjorn Andersson Cc: Boris Ostrovsky Cc: Chris Zankel Cc: Christian Borntraeger Cc: Corey Minyard Cc: Dexuan Cui Cc: "H. Peter Anvin" Cc: Haiyang Zhang Cc: Heiko Carstens Cc: Helge Deller Cc: Ivan Kokshaysky Cc: "James E.J. Bottomley" Cc: James Morse Cc: Johannes Berg Cc: Juergen Gross Cc: "K. Y. Srinivasan" Cc: Mathieu Poirier Cc: Matt Turner Cc: Mauro Carvalho Chehab Cc: Max Filippov Cc: Michael Ellerman Cc: Paul Mackerras Cc: Pavel Machek Cc: Richard Henderson Cc: Richard Weinberger Cc: Robert Richter Cc: Stefano Stabellini Cc: Stephen Hemminger Cc: Sven Schnelle Cc: Tony Luck Cc: Vasily Gorbik Cc: Wei Liu Signed-off-by: Guilherme G. Piccoli --- . . .
Re: [PATCH v3 4/4] dt-bindings: fsl: convert fsl,layerscape-scfg to YAML
On Wed, Apr 27, 2022 at 09:53:38AM +0200, Michael Walle wrote:
> Convert the fsl,layerscape-scfg binding to the new YAML format.
>
> In the device trees, the device node always has a "syscon"
> compatible, which wasn't mentioned in the previous binding.
>
> Also added, compared to the original binding, is the
> interrupt-controller subnode as used in arch/arm/boot/dts/ls1021a.dtsi
> as well as the little-endian and big-endian properties.
>
> Signed-off-by: Michael Walle
> Reviewed-by: Krzysztof Kozlowski
> ---
> changes since v2:
>  - none
>
> changes since v1:
>  - moved to soc/fsl/fsl,layerscape-scfg.yaml
>  - generic name for node in example
>  - mention added "syscon" compatible in commit message
>  - reference specific interrupt controller
>
>  .../arm/freescale/fsl,layerscape-scfg.txt     | 19 --
>  .../bindings/soc/fsl/fsl,layerscape-scfg.yaml | 58 +++
>  2 files changed, 58 insertions(+), 19 deletions(-)
>  delete mode 100644 Documentation/devicetree/bindings/arm/freescale/fsl,layerscape-scfg.txt
>  create mode 100644 Documentation/devicetree/bindings/soc/fsl/fsl,layerscape-scfg.yaml

Applied, thanks!
Re: [PATCH v3 3/4] dt-bindings: interrupt-controller: fsl, ls-extirq: convert to YAML
On Wed, 27 Apr 2022 09:53:37 +0200, Michael Walle wrote:
> Convert the fsl,ls-extirq binding to the new YAML format.
>
> In contrast to the original binding documentation, there are three
> compatibles which are used in their corresponding device trees which
> have a specific compatible and the (already documented) fallback
> compatible:
>  - "fsl,ls1046a-extirq", "fsl,ls1043a-extirq"
>  - "fsl,ls2080a-extirq", "fsl,ls1088a-extirq"
>  - "fsl,lx2160a-extirq", "fsl,ls1088a-extirq"
>
> Depending on the number of external IRQs, which is usually 12 except
> for the LS1021A where there are only 6, the interrupt-map-mask was
> reduced from 0x to 0xf and 0x7 respectively and the number of
> interrupt-map entries has to match.
>
> Signed-off-by: Michael Walle
> ---
> changes since v2:
>  - drop $ref to interrupt-controller.yaml
>  - use a more strict interrupt-map-mask and make it conditional on SoC
>
> changes since v1:
>  - new patch
>
>  .../interrupt-controller/fsl,ls-extirq.txt    | 53
>  .../interrupt-controller/fsl,ls-extirq.yaml   | 118 ++
>  2 files changed, 118 insertions(+), 53 deletions(-)
>  delete mode 100644 Documentation/devicetree/bindings/interrupt-controller/fsl,ls-extirq.txt
>  create mode 100644 Documentation/devicetree/bindings/interrupt-controller/fsl,ls-extirq.yaml

Applied, thanks!
Re: [PATCH v2 2/2] ftrace: recordmcount: Handle sections with no non-weak symbols
On Thu, 28 Apr 2022 22:49:52 +0530 "Naveen N. Rao" wrote:

> But, with ppc64 elf abi v1 which only supports the old -pg flag, mcount
> location can differ between the weak and non-weak variants of a
> function. In such scenarios, one of the two mcount entries will be
> invalid. Such architectures need to validate mcount locations by
> ensuring that the instruction(s) at those locations are as expected. On
> powerpc, this can be a simple check to ensure that the instruction is a
> 'bl'. This check can be further tightened as necessary.

I was thinking about this more, and we could create another section,
perhaps __mcount_loc_weak, and place these in that section. That way, we
could check these symbols to see if there's already a symbol for each of
them, and if there is, drop it.

-- Steve
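Editorial sketch of the "is the instruction a 'bl'" check mentioned in the quoted text. This is not taken from the patch. It assumes only the PowerPC I-form branch encoding (primary opcode 18 in the top six bits, AA in bit 1, LK in bit 0), and the names `PPC_BRANCH_MASK`, `PPC_INST_BL` and `insn_is_bl()` are made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Top six bits select the primary opcode; the low two bits are AA and LK. */
#define PPC_BRANCH_MASK 0xfc000003u
/* Primary opcode 18 with AA = 0 (relative) and LK = 1 (set link register). */
#define PPC_INST_BL     0x48000001u

/* True if the 32-bit instruction word is a relative branch-and-link. */
bool insn_is_bl(uint32_t insn)
{
    return (insn & PPC_BRANCH_MASK) == PPC_INST_BL;
}
```

The branch displacement bits are masked out, so any `bl` target passes. A tighter check, as the quoted text suggests is possible, would also have to look at where the branch goes.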
Re: [PATCH v6] PCI hotplug: rpaphp: Error out on busy status from get-sensor-state
Bjorn Helgaas writes:
> On Tue, Apr 26, 2022 at 11:07:39PM +0530, Mahesh Salgaonkar wrote:
>> +/*
>> + * RTAS call get-sensor-state(DR_ENTITY_SENSE) return values as per PAPR:
>> + *    -1: Hardware Error
>> + *    -2: RTAS_BUSY
>> + *    -3: Invalid sensor. RTAS Parameter Error.
>> + * -9000: Need DR entity to be powered up and unisolated before RTAS call
>> + * -9001: Need DR entity to be powered up, but not unisolated, before RTAS call
>> + * -9002: DR entity unusable
>> + *  990x: Extended delay - where x is a number in the range of 0-5
>> + */
>> +#define RTAS_HARDWARE_ERROR	(-1)
>> +#define RTAS_INVALID_SENSOR	(-3)
>> +#define SLOT_UNISOLATED	(-9000)
>> +#define SLOT_NOT_UNISOLATED	(-9001)
>
> I would say "isolated" instead of "not unisolated", but I suppose this
> follows language in the spec. If so, you should follow the spec.

"not unisolated" is the spec language.

>> +#define SLOT_NOT_USABLE	(-9002)
>> +
>> +static int rtas_to_errno(int rtas_rc)
>> +{
>> +	int rc;
>> +
>> +	switch (rtas_rc) {
>> +	case RTAS_HARDWARE_ERROR:
>> +		rc = -EIO;
>> +		break;
>> +	case RTAS_INVALID_SENSOR:
>> +		rc = -EINVAL;
>> +		break;
>> +	case SLOT_UNISOLATED:
>> +	case SLOT_NOT_UNISOLATED:
>> +		rc = -EFAULT;
>> +		break;
>> +	case SLOT_NOT_USABLE:
>> +		rc = -ENODEV;
>> +		break;
>> +	case RTAS_BUSY:
>> +	case RTAS_EXTENDED_DELAY_MIN...RTAS_EXTENDED_DELAY_MAX:
>> +		rc = -EBUSY;
>> +		break;
>> +	default:
>> +		err("%s: unexpected RTAS error %d\n", __func__, rtas_rc);
>> +		rc = -ERANGE;
>> +		break;
>> +	}
>> +	return rc;
>
> This basically duplicates rtas_error_rc(). Why do we need two copies?

It treats RTAS_BUSY, RTAS_EXTENDED_DELAY_MIN...RTAS_EXTENDED_DELAY_MAX
differently, which is part of the point of this change.

Aside: rtas_error_rc() (from powerpc's rtas.c) is badly named. Its
conversions make sense for only a handful of RTAS calls. RTAS error codes
have function-specific interpretations.
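Editorial sketch: a userspace rendering of the quoted rtas_to_errno() mapping, for readers who want to try the status values by hand. The numeric values come from the comment block quoted above; the values for RTAS_BUSY and the extended-delay range are assumed from that same comment (-2 and 9900 to 9905). The GNU case-range syntax is replaced with a plain range test so the sketch builds with any C compiler, and the kernel's err() call is replaced with fprintf().

```c
#include <errno.h>
#include <stdio.h>

/* Status values as listed in the quoted comment block. */
#define RTAS_HARDWARE_ERROR      (-1)
#define RTAS_BUSY                (-2)
#define RTAS_INVALID_SENSOR      (-3)
#define SLOT_UNISOLATED          (-9000)
#define SLOT_NOT_UNISOLATED      (-9001)
#define SLOT_NOT_USABLE          (-9002)
#define RTAS_EXTENDED_DELAY_MIN  9900
#define RTAS_EXTENDED_DELAY_MAX  9905

/* Convert a get-sensor-state status into a negative errno value. */
int rtas_to_errno(int rtas_rc)
{
    /* Busy and extended-delay statuses are reported, not retried. */
    if (rtas_rc == RTAS_BUSY ||
        (rtas_rc >= RTAS_EXTENDED_DELAY_MIN &&
         rtas_rc <= RTAS_EXTENDED_DELAY_MAX))
        return -EBUSY;

    switch (rtas_rc) {
    case RTAS_HARDWARE_ERROR:
        return -EIO;
    case RTAS_INVALID_SENSOR:
        return -EINVAL;
    case SLOT_UNISOLATED:
    case SLOT_NOT_UNISOLATED:
        return -EFAULT;
    case SLOT_NOT_USABLE:
        return -ENODEV;
    default:
        fprintf(stderr, "%s: unexpected RTAS error %d\n", __func__, rtas_rc);
        return -ERANGE;
    }
}
```

With this mapping a status of 9902, the extended-busy value described in the patch description, comes back as -EBUSY right away, which is the behaviour the reply above says distinguishes this helper.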
[PATCH net-next v2 13/15] eth: spider: remove a copy of the NAPI_POLL_WEIGHT define
Defining local versions of NAPI_POLL_WEIGHT with the same values in the drivers just makes refactoring harder. Acked-by: Geoff Levand Signed-off-by: Jakub Kicinski --- CC: kou.ishiz...@toshiba.co.jp CC: linuxppc-dev@lists.ozlabs.org --- drivers/net/ethernet/toshiba/spider_net.c | 2 +- drivers/net/ethernet/toshiba/spider_net.h | 1 - 2 files changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/net/ethernet/toshiba/spider_net.c b/drivers/net/ethernet/toshiba/spider_net.c index f47b8358669d..c09cd961edbb 100644 --- a/drivers/net/ethernet/toshiba/spider_net.c +++ b/drivers/net/ethernet/toshiba/spider_net.c @@ -2270,7 +2270,7 @@ spider_net_setup_netdev(struct spider_net_card *card) timer_setup(>aneg_timer, spider_net_link_phy, 0); netif_napi_add(netdev, >napi, - spider_net_poll, SPIDER_NET_NAPI_WEIGHT); + spider_net_poll, NAPI_POLL_WEIGHT); spider_net_setup_netdev_ops(netdev); diff --git a/drivers/net/ethernet/toshiba/spider_net.h b/drivers/net/ethernet/toshiba/spider_net.h index 05b1a0736835..51948e2b3a34 100644 --- a/drivers/net/ethernet/toshiba/spider_net.h +++ b/drivers/net/ethernet/toshiba/spider_net.h @@ -44,7 +44,6 @@ extern char spider_net_driver_name[]; #define SPIDER_NET_RX_CSUM_DEFAULT 1 #define SPIDER_NET_WATCHDOG_TIMEOUT50*HZ -#define SPIDER_NET_NAPI_WEIGHT 64 #define SPIDER_NET_FIRMWARE_SEQS 6 #define SPIDER_NET_FIRMWARE_SEQWORDS 1024 -- 2.34.1
Re: [PATCH v6] PCI hotplug: rpaphp: Error out on busy status from get-sensor-state
On Tue, Apr 26, 2022 at 11:07:39PM +0530, Mahesh Salgaonkar wrote:
> When certain PHB HW failure causes phyp to recover PHB, it marks the PE
> state as temporarily unavailable until recovery is complete. This also
> triggers an EEH handler in Linux which needs to notify drivers, and perform
> recovery. But before notifying the driver about the PCI error it uses
> get_adapter_state()->get-sensor-state() operation of the hotplug_slot to
> determine if the slot contains a device or not. if the slot is empty, the
                                                  If
> recovery is skipped entirely.
>
> However on certain PHB failures, the rtas call get-sensor-state() returns
> extended busy error (9902) until PHB is recovered by phyp. Once PHB is
> recovered, the get-sensor-state() returns success with correct presence
> status. The RTAS call interface rtas_get_sensor() loops over the rtas call
> on extended delay return code (9902) until the return value is either
> success (0) or error (-1). This causes the EEH handler to get stuck for ~6
> seconds before it could notify that the pci error has been detected and
> stop any active operations. Hence with running I/O traffic, during this 6
> seconds, the network driver continues its operation and hits a timeout
> (netdev watchdog). On timeouts, network driver go into ffdc capture mode

I assume ffdc == First Failure Data Capture (please expand and remove the
redundant "capture")

Is this a powerpc thing? "ffdc" doesn't occur in drivers/net, so I don't
know what network driver this refers to.

> and reset path assuming the PCI device is in fatal condition. This
> sometimes causes EEH recovery to fail. This impacts the ssh connection and
> leads to the system being inaccessible.
> > > [52732.244731] DEBUG: ibm_read_slot_reset_state2() > [52732.244762] DEBUG: ret = 0, rets[0]=5, rets[1]=1, rets[2]=4000, rets[3]=> > [52732.244798] DEBUG: in eeh_slot_presence_check > [52732.244804] DEBUG: error state check > [52732.244807] DEBUG: Is slot hotpluggable > [52732.244810] DEBUG: hotpluggable ops ? > [52732.244953] DEBUG: Calling ops->get_adapter_status > [52732.244958] DEBUG: calling rpaphp_get_sensor_state > [52736.564262] [ cut here ] > [52736.564299] NETDEV WATCHDOG: enP64p1s0f3 (tg3): transmit queue 0 timed o> > [52736.564324] WARNING: CPU: 1442 PID: 0 at net/sched/sch_generic.c:478 dev> > [...] > [52736.564505] NIP [c0c32368] dev_watchdog+0x438/0x440 > [52736.564513] LR [c0c32364] dev_watchdog+0x434/0x440 > > > To avoid this issue, fix the pci hotplug driver (rpaphp) to return an error > if the slot presence state can not be detected immediately while PE is in > EEH recovery state. Current implementation uses rtas_get_sensor() API which > blocks the slot check state until rtas call returns success. Change > rpaphp_get_sensor_state() to invoke rtas_call(get-sensor-state) directly > only if the respective pe is in EEH recovery state, and take actions based > on rtas return status. I'm not too clear on what the problem is. I guess you don't want the netdev watchdog timeout. Is the NIC still operating? It's just the PHB leading to the NIC that has an issue? Apparently the remedy is to return -ENODEV (from SLOT_NOT_USABLE == -9002) from rpaphp_get_sensor_state() instead of doing the retries. It would be good to explain why *that* is safe. > In normal cases (non-EEH case) rpaphp_get_sensor_state() will continue to > invoke rtas_get_sensor() as it was earlier with no change in existing > behavior. Nits: Follow historical convention in subject line. 
s/phyp/pHyp/ (or whatever the normal styling is) s/pe/PE/ (used inconsistently above and in comment) s/rtas/RTAS/ (Michael mentioned this already, but I guess you missed some) s/pci/PCI/ s/ffdc/First Failure Data Capture/ (or the correct expansion) Make similar changes in the comment below. > Signed-off-by: Mahesh Salgaonkar > Reviewed-by: Nathan Lynch > --- > Change in v6: > - Fixed typo's in the patch description as per review comments. > > Change in v5: > - Fixup #define macros with parentheses around the values. > > Change in V4: > - Error out on sensor busy only if pe is going through EEH recovery instead > of always error out. > > Change in V3: > - Invoke rtas_call(get-sensor-state) directly from > rpaphp_get_sensor_state() directly and do special handling. > - See v2 at > https://lists.ozlabs.org/pipermail/linuxppc-dev/2021-November/237336.html > > Change in V2: > - Alternate approach to fix the EEH issue instead of delaying slot presence > check proposed at > https://lists.ozlabs.org/pipermail/linuxppc-dev/2021-November/236956.html > > Also refer: > https://lists.ozlabs.org/pipermail/linuxppc-dev/2021-November/237027.html > --- > drivers/pci/hotplug/rpaphp_pci.c | 100 > +- > 1 file changed, 97 insertions(+), 3 deletions(-) > > diff --git a/drivers/pci/hotplug/rpaphp_pci.c >
Re: [PATCH V12 00/20] riscv: Add COMPAT mode support for 64BIT
On Thu, 28 Apr 2022 05:25:19 PDT (-0700), guo...@kernel.org wrote: Hi Palmer, I see you have taken v12 into your riscv-compat branch and added asm/signal32.h. Do you need me help put compat_sigcontext & compat_ucontext & compat_rt_sigframe into signal32.h? And could we rename signal32.h to compat_signal.h to match compat_signal.c? In the end, thx for taking care of compat patch series. No problem. I was just trying to get something clean through all the autobuilders before making it look good, I think it didn't fail this time so I'll do a bit more refactoring. Shouldn't be too much longer at this point. On Tue, Apr 5, 2022 at 3:13 PM wrote: From: Guo Ren Currently, most 64-bit architectures (x86, parisc, powerpc, arm64, s390, mips, sparc) have supported COMPAT mode. But they all have history issues and can't use standard linux unistd.h. RISC-V would be first standard __SYSCALL_COMPAT user of include/uapi/asm-generic /unistd.h. The patchset are based on v5.18-rc1, you can compare rv64-compat v.s. 
rv32-native in qemu with following steps:

 - Prepare rv32 rootfs & fw_jump.bin by buildroot.org
   $ git clone git://git.busybox.net/buildroot
   $ cd buildroot
   $ make qemu_riscv32_virt_defconfig O=qemu_riscv32_virt_defconfig
   $ make -C qemu_riscv32_virt_defconfig
   $ make qemu_riscv64_virt_defconfig O=qemu_riscv64_virt_defconfig
   $ make -C qemu_riscv64_virt_defconfig
   (Got fw_jump.bin & rootfs.ext2 in qemu_riscvXX_virt_defconfig/images)

 - Prepare Linux rv32 & rv64 Image
   $ git clone g...@github.com:c-sky/csky-linux.git -b riscv_compat_v12 linux
   $ cd linux
   $ echo "CONFIG_STRICT_KERNEL_RWX=n" >> arch/riscv/configs/defconfig
   $ echo "CONFIG_STRICT_MODULE_RWX=n" >> arch/riscv/configs/defconfig
   $ make ARCH=riscv CROSS_COMPILE=riscv32-buildroot-linux-gnu- O=../build-rv32/ rv32_defconfig
   $ make ARCH=riscv CROSS_COMPILE=riscv32-buildroot-linux-gnu- O=../build-rv32/ Image
   $ make ARCH=riscv CROSS_COMPILE=riscv64-buildroot-linux-gnu- O=../build-rv64/ defconfig
   $ make ARCH=riscv CROSS_COMPILE=riscv64-buildroot-linux-gnu- O=../build-rv64/ Image

 - Prepare Qemu:
   $ git clone https://gitlab.com/qemu-project/qemu.git -b master qemu
   $ cd qemu
   $ ./configure --target-list="riscv64-softmmu riscv32-softmmu"
   $ make

Now let's compare rv64-compat with rv32-native memory footprint with almost the same defconfig, rootfs, opensbi in one qemu.
- Run rv64 with rv32 rootfs in compat mode: $ ./build/qemu-system-riscv64 -cpu rv64 -M virt -m 64m -nographic -bios qemu_riscv64_virt_defconfig/images/fw_jump.bin -kernel build-rv64/Image -drive file qemu_riscv32_virt_defconfig/images/rootfs.ext2,format=raw,id=hd0 -device virtio-blk-device,drive=hd0 -append "rootwait root=/dev/vda ro console=ttyS0 earlycon=sbi" -netdev user,id=net0 -device virtio-net-device,netdev=net0 QEMU emulator version 6.2.50 (v6.2.0-29-g196d7182c8) OpenSBI v0.9 [0.00] Linux version 5.16.0-rc6-00017-g750f87086bdd-dirty (guoren@guoren-Z87-HD3) (riscv64-unknown-linux-gnu-gcc (GCC) 10.2.0, GNU ld (GNU Binutils) 2.37) #96 SMP Tue Dec 28 21:01:55 CST 2021 [0.00] OF: fdt: Ignoring memory range 0x8000 - 0x8020 [0.00] Machine model: riscv-virtio,qemu [0.00] earlycon: sbi0 at I/O port 0x0 (options '') [0.00] printk: bootconsole [sbi0] enabled [0.00] efi: UEFI not found. [0.00] Zone ranges: [0.00] DMA32[mem 0x8020-0x83ff] [0.00] Normal empty [0.00] Movable zone start for each node [0.00] Early memory node ranges [0.00] node 0: [mem 0x8020-0x83ff] [0.00] Initmem setup node 0 [mem 0x8020-0x83ff] [0.00] SBI specification v0.2 detected [0.00] SBI implementation ID=0x1 Version=0x9 [0.00] SBI TIME extension detected [0.00] SBI IPI extension detected [0.00] SBI RFENCE extension detected [0.00] SBI v0.2 HSM extension detected [0.00] riscv: ISA extensions acdfhimsu [0.00] riscv: ELF capabilities acdfim [0.00] percpu: Embedded 17 pages/cpu s30696 r8192 d30744 u69632 [0.00] Built 1 zonelists, mobility grouping on. 
Total pages: 15655 [0.00] Kernel command line: rootwait root=/dev/vda ro console=ttyS0 earlycon=sbi [0.00] Dentry cache hash table entries: 8192 (order: 4, 65536 bytes, linear) [0.00] Inode-cache hash table entries: 4096 (order: 3, 32768 bytes, linear) [0.00] mem auto-init: stack:off, heap alloc:off, heap free:off [0.00] Virtual kernel memory layout: [0.00] fixmap : 0xffcefee0 - 0xffceff00 (2048 kB) [0.00] pci io : 0xffceff00 - 0xffcf ( 16 MB) [0.00] vmemmap : 0xffcf - 0xffcf (4095 MB) [0.00] vmalloc : 0xffd0 - 0xffdf (65535 MB) [
Any technical information for Wind River 7457 board?
Below is the serial output at power on. Does anyone have any information at all? I know the processor is a single 7457 with a Marvell/Galileo GT64260A host bridge. I think the board was made by Motorola or NXP. It has been difficult to track anything down without Wind River support.

-Steve

VxWorks 653 System Boot
Copyright 1984-2006 Wind River Systems, Inc.
CPU: wrSbc7457 Power PC
Version: 1.8
BSP version: 1.3/9
Creation date: Jun 9 2006, 11:38:14
Re: [PATCH 2/2] recordmcount: Handle sections with no non-weak symbols
Steven Rostedt wrote:
> On Thu, 28 Apr 2022 13:15:22 +0530 "Naveen N. Rao" wrote:
>
>> Indeed, plain old -pg will be a problem. I'm not sure there is a generic
>> way to address this. I suppose architectures will have to validate the
>> mcount locations, something like this?
>
> Perhaps another solution is to make the mcount locations after the linking
> is done. The main downside to that is that it takes time to go over the
> entire vmlinux, and will slow down a compile that only modified a couple
> of files.

Yes, and I think that is also very useful with LTO. So, that would be good
to consider in the longer term.

For now, I have posted a v2 of this series with your comments addressed. It
is working well in my tests on powerpc in the different configurations,
including the older elf v1 abi with -pg. If it looks ok to you, we can go
with this approach for now.

Thanks,
Naveen
Re: [PATCH 2/2] recordmcount: Handle sections with no non-weak symbols
Steven Rostedt wrote:
> On Thu, 28 Apr 2022 13:15:22 +0530 "Naveen N. Rao" wrote:
>
>> Indeed, plain old -pg will be a problem. I'm not sure there is a generic
>> way to address this. I suppose architectures will have to validate the
>> mcount locations, something like this?
>
> Perhaps another solution is to make the mcount locations after the linking
> is done. The main downside to that is that it takes time to go over the
> entire vmlinux, and will slow down a compile that only modified a couple
> of files.

Yes, and I think that is also very useful with LTO. So, that would be a
good one to consider in the longer term.

For now, I have posted a v2 of this series with your comments addressed. It
is working well in my tests on powerpc in the different configurations,
including the older elf abi v1 that uses -pg. If it looks ok to you, we can
use this approach for now.

Thanks,
Naveen
[PATCH v2 1/2] ftrace: Drop duplicate mcount locations
In the absence of section symbols [1], objtool (today) and recordmcount (with a subsequent patch) generate __mcount_loc relocation records with weak symbols as the base. This works fine as long as those weak symbols are not overridden, but if they are, these can result in duplicate entries in the final vmlinux mcount location table. This will cause ftrace to fail when trying to patch the same location twice. Fix this by dropping duplicate locations during ftrace init. [1] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d1bcae833b32f1 Signed-off-by: Naveen N. Rao --- kernel/trace/ftrace.c | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c index 4f1d2f5e726341..038610f1803987 100644 --- a/kernel/trace/ftrace.c +++ b/kernel/trace/ftrace.c @@ -6496,7 +6496,7 @@ static int ftrace_process_locs(struct module *mod, struct dyn_ftrace *rec; unsigned long count; unsigned long *p; - unsigned long addr; + unsigned long addr, prev_addr = 0; unsigned long flags = 0; /* Shut up gcc */ int ret = -ENOMEM; @@ -6550,6 +6550,7 @@ static int ftrace_process_locs(struct module *mod, while (p < end) { unsigned long end_offset; addr = ftrace_call_adjust(*p++); + /* * Some architecture linkers will pad between * the different mcount_loc sections of different @@ -6559,6 +6560,15 @@ static int ftrace_process_locs(struct module *mod, if (!addr) continue; + /* +* Drop duplicate entries, which can happen when weak +* functions are overridden, and __mcount_loc relocation +* records were generated against function names due to +* absence of non-weak section symbols. +*/ + if (addr == prev_addr) + continue; + end_offset = (pg->index+1) * sizeof(pg->records[0]); if (end_offset > PAGE_SIZE << pg->order) { /* We should have allocated enough */ @@ -6569,6 +6579,7 @@ static int ftrace_process_locs(struct module *mod, rec = >records[pg->index++]; rec->ip = addr; + prev_addr = addr; } /* We should have used all pages */ -- 2.35.1
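Editorial sketch of the duplicate-dropping loop from this patch, reduced to plain userspace C. It is an illustration, not the kernel function: `collect_unique_locs()` is a hypothetical name, the record pages are replaced by a flat output array, and the input is assumed to be sorted so that duplicate addresses sit next to each other.

```c
#include <stddef.h>

/*
 * Copy mcount locations from locs[0..n) into out[], skipping zero
 * entries (linker padding) and entries equal to the previously kept
 * address. Returns the number of entries written to out[].
 *
 * Assumes locs[] is sorted, so duplicates are always adjacent.
 */
size_t collect_unique_locs(const unsigned long *locs, size_t n,
                           unsigned long *out)
{
    unsigned long prev_addr = 0;
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned long addr = locs[i];

        if (!addr)
            continue;           /* padding between sections */
        if (addr == prev_addr)
            continue;           /* same location recorded twice */

        out[kept++] = addr;
        prev_addr = addr;
    }
    return kept;
}
```

Comparing only against the last kept address keeps the loop at one pass with no extra storage, which matches the single `prev_addr` variable the patch adds.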
[PATCH v2 2/2] ftrace: recordmcount: Handle sections with no non-weak symbols
Kernel builds on powerpc are failing with the below error [1]: CC kernel/kexec_file.o Cannot find symbol for section 9: .text.unlikely. kernel/kexec_file.o: failed Since commit d1bcae833b32f1 ("ELF: Don't generate unused section symbols") [2], binutils started dropping section symbols that it thought were unused. This isn't an issue in general, but with kexec_file.c, gcc is placing kexec_arch_apply_relocations[_add] into a separate .text.unlikely section and the section symbol ".text.unlikely" is being dropped. Due to this, recordmcount is unable to find a non-weak symbol in .text.unlikely to generate a relocation record against. Handle this by falling back to a weak symbol, similar to what objtool does in commit 44f6a7c0755d8d ("objtool: Fix seg fault with Clang non-section symbols"). This approach however can result in duplicate and/or invalid addresses in the final vmlinux mcount location table. As an example, with this commit, relocation records for __mcount_loc for kexec_file.o now include two entries with the weak functions arch_kexec_apply_relocations() and arch_kexec_apply_relocation_add() as the relocation bases: ... 0080 R_PPC64_ADDR64.text+0x1d34 0088 R_PPC64_ADDR64.text+0x1fec 0090 R_PPC64_ADDR64 arch_kexec_apply_relocations_add+0x000c 0098 R_PPC64_ADDR64 arch_kexec_apply_relocations+0x000c Powerpc does not override these functions today, so these get converted to correct offsets in the mcount location table in vmlinux. If one or both of these weak functions are overridden in future, in the final vmlinux mcount table, references to these will change over to the non-weak variant which has its own mcount location entry. As such, there will now be two entries for these functions. On ppc32, mcount location is always the third instruction in a function. 
On ppc64 with elf abi v2 (ppc64le), mcount location depends on whether the function has a global entry (fourth instruction) or not (second instruction), but this is expected to be the same across weak/non-weak implementations of a function. As such, in both these scenarios, as well as with other architectures where mcount location is at the same offset into a function, the two mcount entries will point to the same address. Ftrace skips the duplicate entries due to a previous commit. But, with ppc64 elf abi v1 which only supports the old -pg flag, mcount location can differ between the weak and non-weak variants of a function. In such scenarios, one of the two mcount entries will be invalid. Such architectures need to validate mcount locations by ensuring that the instruction(s) at those locations are as expected. On powerpc, this can be a simple check to ensure that the instruction is a 'bl'. This check can be further tightened as necessary. Introduce a config option HAVE_MCOUNT_LOC_VALIDATION that architectures can select to indicate support for validating the mcount locations during ftrace initialization. Add a flag (-a) to recordmcount which can then be passed to allow recordmcount to emit relocation records using weak symbols as the base. [1] https://github.com/linuxppc/issues/issues/388 [2] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d1bcae833b32f1 Signed-off-by: Naveen N. 
Rao --- Makefile | 4 ++ arch/powerpc/Kconfig | 1 + arch/powerpc/include/asm/ftrace.h | 8 +-- arch/powerpc/kernel/trace/ftrace.c | 11 kernel/trace/Kconfig | 6 ++ scripts/Makefile.build | 3 + scripts/recordmcount.c | 6 +- scripts/recordmcount.h | 94 ++ 8 files changed, 113 insertions(+), 20 deletions(-) diff --git a/Makefile b/Makefile index 29e273d3f8ccbf..b2a9fdb49815fb 100644 --- a/Makefile +++ b/Makefile @@ -858,6 +858,10 @@ ifdef CONFIG_FTRACE_MCOUNT_USE_RECORDMCOUNT BUILD_C_RECORDMCOUNT := y export BUILD_C_RECORDMCOUNT endif + ifdef CONFIG_HAVE_MCOUNT_LOC_VALIDATION +HAVE_MCOUNT_LOC_VALIDATION := y +export HAVE_MCOUNT_LOC_VALIDATION + endif endif ifdef CONFIG_HAVE_FENTRY # s390-linux-gnu-gcc did not support -mfentry until gcc-9. diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 174edabb74fa11..acae4085aa6d6b 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -229,6 +229,7 @@ config PPC select HAVE_KRETPROBES select HAVE_LD_DEAD_CODE_DATA_ELIMINATION select HAVE_LIVEPATCH if HAVE_DYNAMIC_FTRACE_WITH_REGS + select HAVE_MCOUNT_LOC_VALIDATION select HAVE_MOD_ARCH_SPECIFIC select HAVE_NMI if PERF_EVENTS || (PPC64 && PPC_BOOK3S) select HAVE_OPTPROBES diff --git a/arch/powerpc/include/asm/ftrace.h b/arch/powerpc/include/asm/ftrace.h index d83758acd1c7c3..d8b104ed2fdf38 100644 --- a/arch/powerpc/include/asm/ftrace.h +++
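The "simple check to ensure that the instruction is a 'bl'" mentioned in the commit message above could look roughly like the sketch below. It is a userspace illustration only, not the code from this patch: insn_is_bl() is a hypothetical helper name, and the mask and value are derived from the Power ISA I-form branch encoding (primary opcode 18, AA = 0, LK = 1).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if insn is a relative branch-and-link.
 * Bits tested: the 6-bit primary opcode (18 -> 0x48000000) plus the
 * AA and LK bits in the two low-order positions (AA=0, LK=1).
 * The 24-bit displacement in between is ignored.
 */
static bool insn_is_bl(uint32_t insn)
{
	return (insn & 0xfc000003u) == 0x48000001u;
}
```

A stricter variant could also decode the displacement and confirm that the branch target is the mcount/ftrace entry point, which is the kind of tightening the commit message leaves open.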
[PATCH v2 0/2] ftrace/recordmcount: Handle object files without section symbols
This is v2 of the series posted at: http://lkml.kernel.org/r/cover.1651047542.git.naveen.n@linux.vnet.ibm.com For v2, the first patch is slightly modified to skip the loop, rather than depending on addr == 0 to do so. The second patch is updated to make this behavior be opt-in by architectures so that they can validate the read mcount locations. - Naveen Naveen N. Rao (2): ftrace: Drop duplicate mcount locations ftrace: recordmcount: Handle sections with no non-weak symbols Makefile | 4 ++ arch/powerpc/Kconfig | 1 + arch/powerpc/include/asm/ftrace.h | 8 +-- arch/powerpc/kernel/trace/ftrace.c | 11 kernel/trace/Kconfig | 6 ++ kernel/trace/ftrace.c | 13 - scripts/Makefile.build | 3 + scripts/recordmcount.c | 6 +- scripts/recordmcount.h | 94 ++ 9 files changed, 125 insertions(+), 21 deletions(-) base-commit: 83d8a0d166119de813cad27ae7d61f54f9aea707 -- 2.35.1
[powerpc] kernel BUG at mm/mmap.c:3164! w/ltp(mmapstress03)
While running LTP tests (mmapstress03 specifically) against 5.18.0-rc4-next-20220428 booted on IBM Power server mentioned BUG is encountered. # ./mmapstress03 mmapstress030 TINFO : uname.machine=ppc64le kernel is 64bit mmapstress03: errno = 12: failed to fiddle with brk at the end mmapstress031 TFAIL : mmapstress03.c:212: Test failed [ 32.396145] mmap: mmapstress03 (3023): VmData 18446744073706799104 exceed data ulimit 18446744073709551615. Update limits or use boot option ignore_rlimit_data. [ 32.396192] [ cut here ] [ 32.396193] kernel BUG at mm/mmap.c:3164! [ 32.396195] Oops: Exception in kernel mode, sig: 5 [#1] [ 32.396210] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries [ 32.396213] Modules linked in: dm_mod mptcp_diag xsk_diag tcp_diag udp_diag raw_diag inet_diag unix_diag af_packet_diag netlink_diag nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set bonding tls nf_tables nfnetlink sunrpc binfmt_misc pseries_rng drm drm_panel_orientation_quirks xfs libcrc32c sd_mod t10_pi sr_mod crc64_rocksoft_generic cdrom crc64_rocksoft crc64 sg ibmvscsi scsi_transport_srp ibmveth xts vmx_crypto fuse [ 32.396262] CPU: 5 PID: 3023 Comm: mmapstress03 Not tainted 5.18.0-rc4-next-20220428 #16 [ 32.396267] NIP: c04c4750 LR: c04c4730 CTR: c04bf5d0 [ 32.396270] REGS: c0001abeb810 TRAP: 0700 Not tainted (5.18.0-rc4-next-20220428) [ 32.396274] MSR: 80029033 CR: 22002224 XER: [ 32.396283] CFAR: c08af740 IRQMASK: 0 [ 32.396283] GPR00: c04c4730 c0001abebab0 c2a71300 [ 32.396283] GPR04: c00079dcd000 0008 [ 32.396283] GPR08: 0008 0001 c00079dcd040 [ 32.396283] GPR12: c00079dcd008 c0087fffa300 [ 32.396283] GPR16: [ 32.396283] GPR20: c2aaae85 [ 32.396283] GPR24: 7fffaa5c1200 c00020de3660 [ 32.396283] GPR28: 000c c00020de3600 000d [ 32.396320] NIP [c04c4750] exit_mmap+0x190/0x390 [ 32.396327] LR [c04c4730] exit_mmap+0x170/0x390 [ 32.396332] Call 
Trace: [ 32.396334] [c0001abebab0] [c04c4730] exit_mmap+0x170/0x390 (unreliable) [ 32.396340] [c0001abebbd0] [c01700f4] __mmput+0x54/0x200 [ 32.396344] [c0001abebc10] [c017fe5c] exit_mm+0xfc/0x190 [ 32.396348] [c0001abebc50] [c018016c] do_exit+0x27c/0x5a0 [ 32.396352] [c0001abebcf0] [c018063c] do_group_exit+0x4c/0xd0 [ 32.396356] [c0001abebd30] [c01806e4] sys_exit_group+0x24/0x30 [ 32.396360] [c0001abebd50] [c0037084] system_call_exception+0x254/0x550 [ 32.396364] [c0001abebe10] [c000bfe8] system_call_vectored_common+0xe8/0x278 [ 32.396369] --- interrupt: 3000 at 0x7fffaa318d04 [ 32.396374] NIP: 7fffaa318d04 LR: CTR: [ 32.396377] REGS: c0001abebe80 TRAP: 3000 Not tainted (5.18.0-rc4-next-20220428) [ 32.396380] MSR: 8280f033 CR: 4200 XER: [ 32.396389] IRQMASK: 0 [ 32.396389] GPR00: 00ea 7fffe43f3420 7fffaa457100 0001 [ 32.396389] GPR04: 11a602a0 7fffaa5c1200 [ 32.396389] GPR08: [ 32.396389] GPR12: 7fffaa5ca500 [ 32.396389] GPR16: [ 32.396389] GPR20: 0001 [ 32.396389] GPR24: 7fffaa450938 0001 7fffaa4529f8 [ 32.396389] GPR28: 0001 7fffaa5c3510 f000 0001 [ 32.396425] NIP [7fffaa318d04] 0x7fffaa318d04 [ 32.396427] LR [] 0x0 [ 32.396429] --- interrupt: 3000 [ 32.396431] Instruction dump: [ 32.396433] 6000 3880 38610020 483eff5d 6000 7c7f1b79 4082ffb8 813d0058 [ 32.396439] 7d29f278 7d290034 5529d97e 69290001 <0b09> 6000 7fa3eb78 483e328d [ 32.396447] ---[ end trace ]--- [ 32.398759] [ 33.398760] Kernel panic - not syncing: Fatal exception This problem was introduced with 5.18.0-rc4-next-20220427. I am unable to complete the git bisect due to build failure related to mapletree-vs-khugepaged issue. Thanks -Sachin
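One observation on the log above, offered as arithmetic rather than a diagnosis: the reported VmData value, 18446744073706799104, is what a small negative number looks like when printed as an unsigned 64-bit quantity. The sketch below (plain C, helper name is mine) computes the distance to the 2^64 wrap point and expresses it in 64K pages, the PAGE_SIZE shown in the oops.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Distance from v up to the 2^64 wrap point. Unsigned subtraction is
 * well defined in C and wraps modulo 2^64, so 0 - v == 2^64 - v.
 */
static uint64_t wrap_deficit(uint64_t v)
{
	return (uint64_t)0 - v;
}

/* Number of whole pages in bytes, asserting there is no remainder. */
static uint64_t whole_pages(uint64_t bytes, uint64_t page_size)
{
	assert(page_size && bytes % page_size == 0);
	return bytes / page_size;
}
```

For the value in the log the deficit is 2752512 bytes, which is exactly 42 pages of 64K. That is consistent with a page counter having gone below zero, but the log alone does not show where that would have happened.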
Re: [PATCH] KVM: PPC: Book3S HV: Initialize AMOR in nested entry
Nicholas Piggin writes: > Excerpts from Fabiano Rosas's message of April 26, 2022 12:21 am: >> The hypervisor always sets AMOR to ~0, but let's ensure we're not >> passing stale values around. >> > > Reviewed-by: Nicholas Piggin > > Looks like our L0 doesn't do anything with hvregs.amor ? It doesn't. And if the HV ever starts clearing bits from AMOR, then we would need to change any kernel code that writes and reads from AMR (such as KUAP) to take into consideration that we might read a different value from what we wrote.
[PATCH 2/2] tools/perf/tests: Fix session topology test to skip the test in guest environment
The session topology test fails in powerpc pSeries platform. Test logs: <<>> Session topology : FAILED! <<>> This testcases tests cpu topology by checking the core_id and socket_id stored in perf_env from perf session. The data from perf session is compared with the cpu topology information from "/sys/devices/system/cpu/cpuX/topology" like core_id, physical_package_id. In case of virtual environment, detail like physical_package_id is restricted to be exposed. Hence physical_package_id is set to -1. The testcase fails on such platforms since socket_id can't be fetched from topology info. Skip the testcase in powerpc for pSeries. Use the utility function "cpuinfo_field" to check platform from /proc/cpuinfo. Signed-off-by: Athira Rajeev --- tools/perf/tests/topology.c | 17 + 1 file changed, 17 insertions(+) diff --git a/tools/perf/tests/topology.c b/tools/perf/tests/topology.c index ee1e3dcbc0bd..0ddcafa158db 100644 --- a/tools/perf/tests/topology.c +++ b/tools/perf/tests/topology.c @@ -109,6 +109,23 @@ static int check_cpu_topology(char *path, struct perf_cpu_map *map) && strncmp(session->header.env.arch, "aarch64", 7)) return TEST_SKIP; + /* +* In powerpc pSeries platform, not all the topology information +* are exposed via sysfs. Due to restriction, detail like +* physical_package_id will be set to -1. Hence skip this +* test for pSeries. +*/ + if (strncmp(session->header.env.arch, "powerpc", 7)) { + char *cpuinfo_platform = NULL; + + cpuinfo_platform = cpuinfo_field("platform"); + if (!strcmp(cpuinfo_platform, "pSeries")) { + free(cpuinfo_platform); + return TEST_SKIP; + } + free(cpuinfo_platform); + } + TEST_ASSERT_VAL("Session header CPU map not set", session->header.env.cpu); for (i = 0; i < session->header.env.nr_cpus_avail; i++) { -- 2.35.1
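Two details of the check in the hunk above are worth spelling out: strncmp() returns 0 on a match, so a "this is powerpc" test needs the result negated, and a lookup helper that can return NULL has to be guarded before it is passed to strcmp(). The sketch below shows one way to express the intended condition. It is an illustration under assumptions, not a replacement for the patch: skip_on_pseries() is a hypothetical name, both strings are plain parameters, and the "powerpc" prefix is simply taken from the patch (which arch string perf actually records on powerpc is not confirmed here).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: true only when arch starts with "powerpc" and
 * platform is exactly "pSeries". Either argument may be NULL.
 */
static bool skip_on_pseries(const char *arch, const char *platform)
{
	/* strncmp() == 0 means the first 7 characters match. */
	if (!arch || strncmp(arch, "powerpc", 7) != 0)
		return false;

	/* platform is NULL when the field could not be read. */
	return platform && strcmp(platform, "pSeries") == 0;
}
```

With a helper of this shape, the caller can free the looked-up string once, after the decision has been made, instead of on two separate paths.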
[PATCH 1/2] tools/perf: Add utility function to read /proc/cpuinfo for any field
/proc/cpuinfo provides information about the type of processor, number of CPUs etc. Reading the /proc/cpuinfo file outputs useful information by field name like cpu, platform, model (depending on architecture) and its value separated by a colon. Add new utility function "cpuinfo_field" in "util/header.c" which accepts a field name as input string to search in /proc/cpuinfo content. This returns the first matching value as the resulting string. For example, calling cpuinfo_field("platform") on powerpc returns the platform value. This can be used to fetch processor information from "cpuinfo" by other utilities/testcases. Signed-off-by: Athira Rajeev --- tools/perf/util/header.c | 54 tools/perf/util/header.h | 1 + 2 files changed, 55 insertions(+) diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c index a27132e5a5ef..0c8dfd0c1e78 100644 --- a/tools/perf/util/header.c +++ b/tools/perf/util/header.c @@ -983,6 +983,60 @@ static int write_dir_format(struct feat_fd *ff, return do_write(ff, &data->dir.version, sizeof(data->dir.version)); } +/* + * Return entry from /proc/cpuinfo + * indicated by "search" parameter. + */ +char *cpuinfo_field(const char *search) +{ + FILE *file; + char *buf = NULL; + char *copy_buf = NULL, *p; + size_t len = 0; + int ret = -1; + + if (!search) + return NULL; + + file = fopen("/proc/cpuinfo", "r"); + if (!file) + return NULL; + + while (getline(&buf, &len, file) > 0) { + ret = strncmp(buf, search, strlen(search)); + if (!ret) + break; + } + + if (ret) + goto done; + + /* +* Trim the new line and separate +* value for search field from ":" +* in cpuinfo line output. 
+* Example output line: +* platform : +*/ + copy_buf = buf; + p = strchr(copy_buf, ':'); + if (p && *(p+1) == ' ' && *(p+2)) + copy_buf = p + 2; + p = strchr(copy_buf, '\n'); + if (p) + *p = '\0'; + + /* Copy the filtered string to buf */ + strcpy(buf, copy_buf); + + fclose(file); + return buf; + +done: + free(buf); + fclose(file); + return NULL; +} /* * Check whether a CPU is online * diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h index 0eb4bc29a5a4..b0f754364bd4 100644 --- a/tools/perf/util/header.h +++ b/tools/perf/util/header.h @@ -166,4 +166,5 @@ int get_cpuid(char *buffer, size_t sz); char *get_cpuid_str(struct perf_pmu *pmu __maybe_unused); int strcmp_cpuid_str(const char *s1, const char *s2); +char *cpuinfo_field(const char *search); #endif /* __PERF_HEADER_H */ -- 2.35.1
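For readers who want to try the parsing logic of cpuinfo_field() in isolation, here is a minimal sketch. It differs from the patch in ways that are my own choices, not the author's: field_from_stream() is a hypothetical name, it reads from an already opened stream so it can be run without /proc/cpuinfo, and it shifts the value to the start of the buffer with memmove(), since the source and destination overlap and strcpy() does not define behaviour for overlapping copies.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical variant of cpuinfo_field() working on a stream.
 * Returns a malloc()ed string that the caller must free(), or NULL
 * if no line starts with "search".
 */
static char *field_from_stream(FILE *file, const char *search)
{
	char *buf = NULL, *val, *p, *ret = NULL;
	size_t len = 0, slen;

	if (!file || !search)
		return NULL;
	slen = strlen(search);

	while (getline(&buf, &len, file) > 0) {
		if (strncmp(buf, search, slen))
			continue;

		/* The value starts after ": "; else keep the whole line. */
		val = buf;
		p = strchr(buf, ':');
		if (p && p[1] == ' ' && p[2])
			val = p + 2;

		/* Trim the trailing newline, if any. */
		p = strchr(val, '\n');
		if (p)
			*p = '\0';

		/* Overlapping copy inside one buffer: memmove(). */
		memmove(buf, val, strlen(val) + 1);
		ret = buf;
		buf = NULL;	/* ownership moves to the caller */
		break;
	}

	free(buf);
	return ret;
}
```

As in the patch, matching is by prefix, so searching for "cpu" would also match a line that starts with "cpu MHz"; callers that need an exact field name would have to check the character following the match.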
[PATCH 0/2] Fix session topology test for powerpc and add utility function to get cpuinfo entries
The session topology test fails in powerpc pSeries platform. Test logs: <<>> Session topology : FAILED! <<>> This test uses cpu topology information and in powerpc, some of the topology info is restricted in environment like virtualized platform. Hence this test needs to be skipped in pSeries platform for powerpc. The information about platform is available in /proc/cpuinfo. Patch 1 adds generic utility function in "util/header.c" to read /proc/cpuinfo for any entry. Though the testcase fix needs value from "platform" entry, making this as a generic function to return value for any entry from the /proc/cpuinfo file which can be used commonly in future usecases. Patch 2 uses the newly added utility function to look for platform and skip the test in pSeries platform for powerpc. Athira Rajeev (2): tools/perf: Add utility function to read /proc/cpuinfo for any field tools/perf/tests: Fix session topology test to skip the test in guest environment tools/perf/tests/topology.c | 17 tools/perf/util/header.c| 54 + tools/perf/util/header.h| 1 + 3 files changed, 72 insertions(+) -- 2.35.1
Re: [PATCH 2/2] recordmcount: Handle sections with no non-weak symbols
On Thu, 28 Apr 2022 13:15:22 +0530 "Naveen N. Rao" wrote: > Indeed, plain old -pg will be a problem. I'm not sure there is a generic > way to address this. I suppose architectures will have to validate the > mcount locations, something like this? Perhaps another solution is to make the mcount locations after the linking is done. The main downside to that is that it takes time to go over the entire vmlinux, and will slow down a compile that only modified a couple of files. -- Steve
Re: [PATCH 0/3] perf tools: Tidy up symbol end fixup (v3)
Em Mon, Apr 25, 2022 at 01:59:03PM -0700, Ian Rogers escreveu: > On Fri, Apr 15, 2022 at 5:40 PM Namhyung Kim wrote: > > > > Hello, > > > > This work is a follow-up of Ian's previous one at > > https://lore.kernel.org/all/20220412154817.2728324-1-irog...@google.com/ > > > > Fixing up more symbol ends as introduced in: > > https://lore.kernel.org/lkml/20220317135536.805-1-mpet...@redhat.com/ > > > > it caused perf annotate to run into memory limits - every symbol holds > > all the disassembled code in the annotation, and so making symbols > > ends further away dramatically increased memory usage (40MB to >1GB). > > > > Modify the symbol end fixup logic so that special kernel cases aren't > > applied in the common case. > > > > v3 changes) > > * rename is_kernel to is_kallsyms > > * move the logic to generic function > > * remove arch-specific functions > > > > Thanks, > > Namhyung > > Thanks Namhyung! The series: > > Acked-by: Ian Rogers Thanks, applied to perf/urgent. - Arnaldo > > Namhyung Kim (3): > > perf symbol: Pass is_kallsyms to symbols__fixup_end() > > perf symbol: Update symbols__fixup_end() > > perf symbol: Remove arch__symbols__fixup_end() > > > > tools/perf/arch/arm64/util/machine.c | 21 --- > > tools/perf/arch/powerpc/util/Build | 1 - > > tools/perf/arch/powerpc/util/machine.c | 25 - > > tools/perf/arch/s390/util/machine.c| 16 --- > > tools/perf/util/symbol-elf.c | 2 +- > > tools/perf/util/symbol.c | 37 +++--- > > tools/perf/util/symbol.h | 3 +-- > > 7 files changed, 29 insertions(+), 76 deletions(-) > > delete mode 100644 tools/perf/arch/powerpc/util/machine.c > > > > > > base-commit: 41204da4c16071be9090940b18f566832d46becc > > -- > > 2.36.0.rc0.470.gd361397f0d-goog > > -- - Arnaldo
[PATCH v4.19 0/2] Custom backports for powerpc SLB issues
Hi Greg, Here are two custom backports to v4.19 for some powerpc issues we've discovered. Both were fixed upstream as part of a large non-backportable rewrite. Other stable kernel versions are not affected. cheers Michael Ellerman (1): powerpc/64s: Unmerge EX_LR and EX_DAR Nicholas Piggin (1): powerpc/64/interrupt: Temporarily save PPR on stack to fix register corruption due to SLB miss arch/powerpc/include/asm/exception-64s.h | 37 ++-- 1 file changed, 22 insertions(+), 15 deletions(-) -- 2.35.1
[PATCH v4.19 2/2] powerpc/64s: Unmerge EX_LR and EX_DAR
The SLB miss handler is not fully re-entrant, it is able to work because we ensure that the SLB entries for the kernel text and data segment, as well as the kernel stack are pinned in the SLB. Accesses to kernel data outside of those areas have to be carefully managed and can only occur in certain parts of the code. One way we deal with that is by storing some values in temporary slots in the paca. In v4.13 in commit dbeea1d6b4bd ("powerpc/64s/paca: EX_LR can be merged with EX_DAR") we merged the storage for two temporary slots for register storage during SLB miss handling. That was safe at the time because the two slots were never used at the same time. Unfortunately in v4.17 in commit c2b4d8b7417a ("powerpc/mm/hash64: Increase the VA range") we broke that condition, and introduced a case where the two slots could be in use at the same time, leading to one being corrupted. Specifically in slb_miss_common() when we detect that we're handling a fault for a large virtual address (> 512TB) we go to the "8" label, there we store the original fault address into paca->exslb[EX_DAR], before jumping to large_addr_slb() (using rfid). We then use the EXCEPTION_PROLOG_COMMON and RECONCILE_IRQ_STATE macros to do exception setup, before reloading the fault address from paca->exslb[EX_DAR] and storing it into pt_regs->dar (Data Address Register). However the code generated by those macros can cause a recursive SLB miss on a kernel address in three places. The first is the saving of the PPR (Program Priority Register), which happens on all CPUs since Power7, the PPR is saved to the thread struct which can be anywhere in memory. There is also the call to accumulate_stolen_time() if CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y and CONFIG_PPC_SPLPAR=y, and also the call to trace_hardirqs_off() if CONFIG_TRACE_IRQFLAGS=y. The latter two call into generic C code and can lead to accesses anywhere in memory. 
On modern 64-bit CPUs we have 1TB segments, so for any of those accesses to cause an SLB fault they must access memory more than 1TB away from the kernel text, data and kernel stack. That typically only happens on machines with more than 1TB of RAM. However it is possible on multi-node Power9 systems, because memory on the 2nd node begins at 32TB in the linear mapping. If we take a recursive SLB fault then we will corrupt the original fault address with the LR (Link Register) value, because the EX_DAR and EX_LR slots share storage. Subsequently we will think we're trying to fault that LR address, which is the wrong address, and will also most likely lead to a segfault because the LR address will be < 512TB and so will be rejected by slb_miss_large_addr(). This appears as a spurious segfault to userspace, and if show_unhandled_signals is enabled you will see a fault reported in dmesg with the LR address, not the expected fault address, eg: prog[123]: segfault (11) at 128a61808 nip 128a618cc lr 128a61808 code 3 in prog[128a6+1] prog[123]: code: 4ba4 39200040 3ce4 7d2903a6 3c000200 78e707c6 780083e4 7d3b4b78 prog[123]: code: 7d455378 7d7d5b78 7d9f6378 7da46b78 7d3a4b78 7d465378 7d7c5b78 Notice that the fault address == the LR, and the faulting instruction is a simple store that should never use LR. In upstream this was fixed in v4.20 in commit 48e7b7695745 ("powerpc/64s/hash: Convert SLB miss handlers to C"), however that is a huge rewrite and not backportable. The minimal fix for stable is to just unmerge the EX_LR and EX_DAR slots again, avoiding the corruption of the DAR value. This uses an extra 8 bytes per CPU, which is negligible. 
Signed-off-by: Michael Ellerman --- arch/powerpc/include/asm/exception-64s.h | 15 --- 1 file changed, 4 insertions(+), 11 deletions(-) diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h index f0424c6fdeca..4fdae1c182df 100644 --- a/arch/powerpc/include/asm/exception-64s.h +++ b/arch/powerpc/include/asm/exception-64s.h @@ -48,11 +48,12 @@ #define EX_CCR 52 #define EX_CFAR56 #define EX_PPR 64 +#define EX_LR 72 #if defined(CONFIG_RELOCATABLE) -#define EX_CTR 72 -#define EX_SIZE10 /* size in u64 units */ +#define EX_CTR 80 +#define EX_SIZE11 /* size in u64 units */ #else -#define EX_SIZE9 /* size in u64 units */ +#define EX_SIZE10 /* size in u64 units */ #endif /* @@ -60,14 +61,6 @@ */ #define MAX_MCE_DEPTH 4 -/* - * EX_LR is only used in EXSLB and where it does not overlap with EX_DAR - * EX_CCR similarly with DSISR, but being 4 byte registers there is a hole - * in the save area so it's not necessary to overlap them. Could be used - * for future savings though if another 4 byte register was to be saved. - */ -#define EX_LR EX_DAR - /* * EX_R3 is only used by the bad_stack handler. bad_stack reloads and * saves DAR from SPRN_DAR, and
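The slot aliasing described in the commit message can be modelled in a few lines of userspace C. This is only a toy model under stated assumptions: the save area is a plain array of 64-bit slots, the helper names are mine, EX_LR = 72 and EX_SIZE = 10 are taken from the hunk above (the !CONFIG_RELOCATABLE case), and EX_DAR = 40 is an assumption because its definition is not part of the quoted diff.

```c
#include <assert.h>
#include <stdint.h>

#define EX_DAR		40	/* assumed: not shown in the quoted hunk */
#define EX_LR_MERGED	EX_DAR	/* layout before this patch */
#define EX_LR_UNMERGED	72	/* layout after this patch */
#define EX_SIZE		10	/* u64 units, !CONFIG_RELOCATABLE */

/* Toy model of one per-CPU exception save area. */
struct save_area {
	uint64_t slot[EX_SIZE];
};

static void ex_store(struct save_area *a, unsigned int off, uint64_t val)
{
	assert(off % 8 == 0 && off / 8 < EX_SIZE);
	a->slot[off / 8] = val;
}

static uint64_t ex_load(const struct save_area *a, unsigned int off)
{
	assert(off % 8 == 0 && off / 8 < EX_SIZE);
	return a->slot[off / 8];
}

/*
 * Model the failing sequence: the fault address is stored in EX_DAR,
 * then a nested SLB miss stores LR at offset ex_lr, and finally the
 * outer handler reloads EX_DAR. Returns what the outer handler sees.
 */
static uint64_t dar_after_nested_miss(unsigned int ex_lr,
				      uint64_t dar, uint64_t lr)
{
	struct save_area a = { { 0 } };

	ex_store(&a, EX_DAR, dar);
	ex_store(&a, ex_lr, lr);
	return ex_load(&a, EX_DAR);
}
```

With the merged layout the reloaded "fault address" is the LR value, matching the dmesg example where the fault address equals lr; with the unmerged layout the original address survives, at the cost of one more 8-byte slot.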
[PATCH v4.19 1/2] powerpc/64/interrupt: Temporarily save PPR on stack to fix register corruption due to SLB miss
From: Nicholas Piggin This is a minimal stable kernel fix for the problem solved by 4c2de74cc869 ("powerpc/64: Interrupts save PPR on stack rather than thread_struct"). Upstream kernels between 4.17-4.20 have this bug, so I propose this patch for 4.19 stable. Longer description from mpe: In commit f384796c4 ("powerpc/mm: Add support for handling > 512TB address in SLB miss") we added support for using multiple context ids per process. Previously accessing past the first context id was a fatal error for the process. With the new support it became non-fatal, and so the previous "bad_addr_slb" handler was changed to be the "large_addr_slb" handler. That handler uses the EXCEPTION_PROLOG_COMMON() macro, which in-turn calls the SAVE_PPR() macro. At the point where SAVE_PPR() is used, the r9-13 register values from the original user fault are saved in paca->exslb. It's not until later in EXCEPTION_PROLOG_COMMON_2() that they are saved from paca->exslb onto the kernel stack. The PPR is saved into current->thread.ppr, which is notably not on the kernel stack the way pt_regs are. This means we can take an SLB miss on current->thread.ppr. If that happens in the "large_addr_slb" case we will clobber the saved user r9-r13 in paca->exslb with kernel values. Later we will save those clobbered values into the pt_regs on the stack, and when we return to userspace those kernel values will be restored. Typically this appears as some sort of segfault in userspace, with an address that looks like a kernel address. In dmesg it can appear as: [19117.440331] some_program[1869625]: unhandled signal 11 at cf6bda10 nip 7fff780d559c lr 7fff781ae56c code 30001 The upstream fix for this issue was to move PPR into pt_regs, on the kernel stack, avoiding the possibility of an SLB fault when saving it. However changing the size of pt_regs is an intrusive change, and has side effects in other parts of the kernel. 
A minimal fix is to temporarily save the PPR in an unused part of pt_regs, then save the user register values from paca->exslb into pt_regs, and then move the saved PPR into thread.ppr. Fixes: f384796c40dc ("powerpc/mm: Add support for handling > 512TB address in SLB miss") Signed-off-by: Nicholas Piggin Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20220316033235.903657-1-npig...@gmail.com --- arch/powerpc/include/asm/exception-64s.h | 22 ++ 1 file changed, 18 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h index 35fb5b11955a..f0424c6fdeca 100644 --- a/arch/powerpc/include/asm/exception-64s.h +++ b/arch/powerpc/include/asm/exception-64s.h @@ -243,10 +243,22 @@ * PPR save/restore macros used in exceptions_64s.S * Used for P7 or later processors */ -#define SAVE_PPR(area, ra, rb) \ +#define SAVE_PPR(area, ra) \ +BEGIN_FTR_SECTION_NESTED(940) \ + ld ra,area+EX_PPR(r13);/* Read PPR from paca */\ + std ra,RESULT(r1); /* Store PPR in RESULT for now */ \ +END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,940) + +/* + * This is called after we are finished accessing 'area', so we can now take + * SLB faults accessing the thread struct, which will use PACA_EXSLB area. + * This is required because the large_addr_slb handler uses EXSLB and it also + * uses the common exception macros including this PPR saving. 
+ */ +#define MOVE_PPR_TO_THREAD(ra, rb) \ BEGIN_FTR_SECTION_NESTED(940) \ ld ra,PACACURRENT(r13);\ - ld rb,area+EX_PPR(r13);/* Read PPR from paca */\ + ld rb,RESULT(r1); /* Read PPR from stack */ \ std rb,TASKTHREADPPR(ra); \ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,940) @@ -515,9 +527,11 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) 3: EXCEPTION_PROLOG_COMMON_1(); \ beq 4f; /* if from kernel mode */ \ ACCOUNT_CPU_USER_ENTRY(r13, r9, r10); \ - SAVE_PPR(area, r9, r10); \ + SAVE_PPR(area, r9);\ 4: EXCEPTION_PROLOG_COMMON_2(area)\ - EXCEPTION_PROLOG_COMMON_3(n) \ + beq 5f; /* if from kernel mode */ \ + MOVE_PPR_TO_THREAD(r9, r10); \ +5: EXCEPTION_PROLOG_COMMON_3(n) \ ACCOUNT_STOLEN_TIME /* Save original regs values from save area to stack
Re: [PATCH V12 00/20] riscv: Add COMPAT mode support for 64BIT
Hi Palmer, I see you have taken v12 into your riscv-compat branch and added asm/signal32.h. Do you need me to help put compat_sigcontext & compat_ucontext & compat_rt_sigframe into signal32.h? And could we rename signal32.h to compat_signal.h to match compat_signal.c? Finally, thanks for taking care of the compat patch series. On Tue, Apr 5, 2022 at 3:13 PM wrote: > > From: Guo Ren > > Currently, most 64-bit architectures (x86, parisc, powerpc, arm64, > s390, mips, sparc) have supported COMPAT mode. But they all have > history issues and can't use standard linux unistd.h. RISC-V would > be first standard __SYSCALL_COMPAT user of include/uapi/asm-generic > /unistd.h. > > The patchset are based on v5.18-rc1, you can compare rv64-compat > v.s. rv32-native in qemu with following steps: > > - Prepare rv32 rootfs & fw_jump.bin by buildroot.org >$ git clone git://git.busybox.net/buildroot >$ cd buildroot >$ make qemu_riscv32_virt_defconfig O=qemu_riscv32_virt_defconfig >$ make -C qemu_riscv32_virt_defconfig >$ make qemu_riscv64_virt_defconfig O=qemu_riscv64_virt_defconfig >$ make -C qemu_riscv64_virt_defconfig >(Got fw_jump.bin & rootfs.ext2 in qemu_riscvXX_virt_defconfig/images) > > - Prepare Linux rv32 & rv64 Image >$ git clone g...@github.com:c-sky/csky-linux.git -b riscv_compat_v12 linux >$ cd linux >$ echo "CONFIG_STRICT_KERNEL_RWX=n" >> arch/riscv/configs/defconfig >$ echo "CONFIG_STRICT_MODULE_RWX=n" >> arch/riscv/configs/defconfig >$ make ARCH=riscv CROSS_COMPILE=riscv32-buildroot-linux-gnu- > O=../build-rv32/ rv32_defconfig >$ make ARCH=riscv CROSS_COMPILE=riscv32-buildroot-linux-gnu- > O=../build-rv32/ Image >$ make ARCH=riscv CROSS_COMPILE=riscv64-buildroot-linux-gnu- > O=../build-rv64/ defconfig >$ make ARCH=riscv CROSS_COMPILE=riscv64-buildroot-linux-gnu- > O=../build-rv64/ Image > > - Prepare Qemu: >$ git clone https://gitlab.com/qemu-project/qemu.git -b master linux >$ cd qemu >$ ./configure --target-list="riscv64-softmmu riscv32-softmmu" >$ make > > Now let's 
compare rv64-compat with rv32-native memory footprint with almost > the same > defconfig, rootfs, opensbi in one qemu. > > - Run rv64 with rv32 rootfs in compat mode: >$ ./build/qemu-system-riscv64 -cpu rv64 -M virt -m 64m -nographic -bios > qemu_riscv64_virt_defconfig/images/fw_jump.bin -kernel build-rv64/Image > -drive file qemu_riscv32_virt_defconfig/images/rootfs.ext2,format=raw,id=hd0 > -device virtio-blk-device,drive=hd0 -append "rootwait root=/dev/vda ro > console=ttyS0 earlycon=sbi" -netdev user,id=net0 -device > virtio-net-device,netdev=net0 > > QEMU emulator version 6.2.50 (v6.2.0-29-g196d7182c8) > OpenSBI v0.9 > [0.00] Linux version 5.16.0-rc6-00017-g750f87086bdd-dirty > (guoren@guoren-Z87-HD3) (riscv64-unknown-linux-gnu-gcc (GCC) 10.2.0, GNU ld > (GNU Binutils) 2.37) #96 SMP Tue Dec 28 21:01:55 CST 2021 > [0.00] OF: fdt: Ignoring memory range 0x8000 - 0x8020 > [0.00] Machine model: riscv-virtio,qemu > [0.00] earlycon: sbi0 at I/O port 0x0 (options '') > [0.00] printk: bootconsole [sbi0] enabled > [0.00] efi: UEFI not found. > [0.00] Zone ranges: > [0.00] DMA32[mem 0x8020-0x83ff] > [0.00] Normal empty > [0.00] Movable zone start for each node > [0.00] Early memory node ranges > [0.00] node 0: [mem 0x8020-0x83ff] > [0.00] Initmem setup node 0 [mem > 0x8020-0x83ff] > [0.00] SBI specification v0.2 detected > [0.00] SBI implementation ID=0x1 Version=0x9 > [0.00] SBI TIME extension detected > [0.00] SBI IPI extension detected > [0.00] SBI RFENCE extension detected > [0.00] SBI v0.2 HSM extension detected > [0.00] riscv: ISA extensions acdfhimsu > [0.00] riscv: ELF capabilities acdfim > [0.00] percpu: Embedded 17 pages/cpu s30696 r8192 d30744 u69632 > [0.00] Built 1 zonelists, mobility grouping on. 
Total pages: 15655 > [0.00] Kernel command line: rootwait root=/dev/vda ro console=ttyS0 > earlycon=sbi > [0.00] Dentry cache hash table entries: 8192 (order: 4, 65536 bytes, > linear) > [0.00] Inode-cache hash table entries: 4096 (order: 3, 32768 bytes, > linear) > [0.00] mem auto-init: stack:off, heap alloc:off, heap free:off > [0.00] Virtual kernel memory layout: > [0.00] fixmap : 0xffcefee0 - 0xffceff00 (2048 > kB) > [0.00] pci io : 0xffceff00 - 0xffcf ( 16 > MB) > [0.00] vmemmap : 0xffcf - 0xffcf (4095 > MB) > [0.00] vmalloc : 0xffd0 - 0xffdf > (65535 MB) > [0.00] lowmem : 0xffe0 - 0xffe003e0 ( 62 > MB) > [0.00] kernel :
Re: [PATCH net-next v5 08/18] net: sparx5: Replace usage of found with dedicated list iterator variable
Hello, On Wed, 2022-04-27 at 18:06 +0200, Jakob Koschel wrote: > To move the list iterator variable into the list_for_each_entry_*() > macro in the future it should be avoided to use the list iterator > variable after the loop body. > > To *never* use the list iterator variable after the loop it was > concluded to use a separate iterator variable instead of a > found boolean [1]. > > This removes the need to use a found variable and simply checking if > the variable was set, can determine if the break/goto was hit. > > Link: > https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=ehreask5sqxpwr9y7k9sa6cwx...@mail.gmail.com/ > [1] > Signed-off-by: Jakob Koschel > --- > .../microchip/sparx5/sparx5_mactable.c| 25 +-- > 1 file changed, 12 insertions(+), 13 deletions(-) > > diff --git a/drivers/net/ethernet/microchip/sparx5/sparx5_mactable.c > b/drivers/net/ethernet/microchip/sparx5/sparx5_mactable.c > index a5837dbe0c7e..bb8d9ce79ac2 100644 > --- a/drivers/net/ethernet/microchip/sparx5/sparx5_mactable.c > +++ b/drivers/net/ethernet/microchip/sparx5/sparx5_mactable.c > @@ -362,8 +362,7 @@ static void sparx5_mact_handle_entry(struct sparx5 > *sparx5, >unsigned char mac[ETH_ALEN], >u16 vid, u32 cfg2) > { > - struct sparx5_mact_entry *mact_entry; > - bool found = false; > + struct sparx5_mact_entry *mact_entry = NULL, *iter; > u16 port; > > if (LRN_MAC_ACCESS_CFG_2_MAC_ENTRY_ADDR_TYPE_GET(cfg2) != > @@ -378,28 +377,28 @@ static void sparx5_mact_handle_entry(struct sparx5 > *sparx5, > return; > > mutex_lock(>mact_lock); > - list_for_each_entry(mact_entry, >mact_entries, list) { > - if (mact_entry->vid == vid && > - ether_addr_equal(mac, mact_entry->mac)) { > - found = true; > - mact_entry->flags |= MAC_ENT_ALIVE; > - if (mact_entry->port != port) { > + list_for_each_entry(iter, >mact_entries, list) { > + if (iter->vid == vid && > + ether_addr_equal(mac, iter->mac)) { I'm sorry for the late feedback. 
If you move the 'mact_entry = iter;' statement here, the diffstat will be slightly smaller and the patch more readable, IMHO. There is a similar situation in the next patch. Cheers, Paolo
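For readers following the thread, the pattern under discussion can be sketched in plain userspace C. This is not the sparx5 code: struct mact_entry, its next pointer, the MAC_ENT_ALIVE value and find_entry() are hypothetical stand-ins, and a hand-rolled singly linked list replaces list_for_each_entry(). The sketch places the 'mact_entry = iter;' assignment at the top of the match branch, as suggested above, so the rest of the branch can keep using mact_entry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ETH_ALEN 6
#define MAC_ENT_ALIVE 0x1	/* hypothetical flag value for this sketch */

/* Hypothetical stand-in for struct sparx5_mact_entry. */
struct mact_entry {
	struct mact_entry *next;
	unsigned char mac[ETH_ALEN];
	unsigned short vid;
	unsigned int flags;
};

/*
 * Walk the list with a dedicated iterator and copy it into the result
 * pointer only on a match, so the iterator is never used after the loop.
 */
static struct mact_entry *find_entry(struct mact_entry *head,
				     const unsigned char mac[ETH_ALEN],
				     unsigned short vid)
{
	struct mact_entry *mact_entry = NULL, *iter;

	assert(mac != NULL);

	for (iter = head; iter; iter = iter->next) {
		if (iter->vid == vid && !memcmp(mac, iter->mac, ETH_ALEN)) {
			/* Assign first, then keep using mact_entry as before. */
			mact_entry = iter;
			mact_entry->flags |= MAC_ENT_ALIVE;
			break;
		}
	}

	/* NULL means the break was never hit; this replaces the found flag. */
	return mact_entry;
}
```

A caller then tests the returned pointer against NULL where the old code tested the found boolean.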
Re: [PATCH 20/30] panic: Add the panic informational notifier list
On 27/04/2022 23:49, Guilherme G. Piccoli wrote: The goal of this new panic notifier is to allow its users to register callbacks to run earlier in the panic path than they currently do. This aims at informational mechanisms, like dumping kernel offsets and showing device error data (in case it's a simple register read, for example) as well as mechanisms to disable log flooding (like hung_task detector / RCU warnings) and the tracing dump_on_oops (when enabled). Any (non-invasive) information that should be provided before kmsg_dump(), as well as code that prevents log flooding, should fit here, as long as it offers relatively low risk for kdump. For now, the patch is almost a no-op, although it slightly changes the ordering in which some panic notifiers are executed - especially affected by this are the notifiers responsible for disabling the hung_task detector / RCU warnings, which now run first. In a subsequent patch, the panic path will be refactored, then the panic informational notifiers will effectively run earlier, before kmsg_dump() (and usually before kdump as well). We also defer documenting it all properly in the subsequent refactor patch. Finally, while at it, we removed some useless header inclusions too. Cc: Benjamin Herrenschmidt Cc: Catalin Marinas Cc: Florian Fainelli Cc: Frederic Weisbecker Cc: "H. Peter Anvin" Cc: Hari Bathini Cc: Joel Fernandes Cc: Jonathan Hunter Cc: Josh Triplett Cc: Lai Jiangshan Cc: Leo Yan Cc: Mathieu Desnoyers Cc: Mathieu Poirier Cc: Michael Ellerman Cc: Mike Leach Cc: Mikko Perttunen Cc: Neeraj Upadhyay Cc: Nicholas Piggin Cc: Paul Mackerras Cc: Suzuki K Poulose Cc: Thierry Reding Cc: Thomas Bogendoerfer Signed-off-by: Guilherme G. 
Piccoli --- arch/arm64/kernel/setup.c | 2 +- arch/mips/kernel/relocate.c | 2 +- arch/powerpc/kernel/setup-common.c| 2 +- arch/x86/kernel/setup.c | 2 +- drivers/bus/brcmstb_gisb.c| 2 +- drivers/hwtracing/coresight/coresight-cpu-debug.c | 4 ++-- drivers/soc/tegra/ari-tegra186.c | 3 ++- include/linux/panic_notifier.h| 1 + kernel/hung_task.c| 3 ++- kernel/panic.c| 4 kernel/rcu/tree.c | 1 - kernel/rcu/tree_stall.h | 3 ++- kernel/trace/trace.c | 2 +- 13 files changed, 19 insertions(+), 12 deletions(-) ... diff --git a/drivers/hwtracing/coresight/coresight-cpu-debug.c b/drivers/hwtracing/coresight/coresight-cpu-debug.c index 1874df7c6a73..7b1012454525 100644 --- a/drivers/hwtracing/coresight/coresight-cpu-debug.c +++ b/drivers/hwtracing/coresight/coresight-cpu-debug.c @@ -535,7 +535,7 @@ static int debug_func_init(void) _func_knob_fops); /* Register function to be called for panic */ - ret = atomic_notifier_chain_register(_notifier_list, + ret = atomic_notifier_chain_register(_info_list, _notifier); if (ret) { pr_err("%s: unable to register notifier: %d\n", @@ -552,7 +552,7 @@ static int debug_func_init(void) static void debug_func_exit(void) { - atomic_notifier_chain_unregister(_notifier_list, + atomic_notifier_chain_unregister(_info_list, _notifier); debugfs_remove_recursive(debug_debugfs_dir); } Acked-by: Suzuki K Poulose
Re: [PATCH 09/30] coresight: cpu-debug: Replace mutex with mutex_trylock on panic notifier
Hi Guilherme, On 27/04/2022 23:49, Guilherme G. Piccoli wrote: The panic notifier infrastructure executes registered callbacks when a panic event happens - such callbacks are executed in atomic context, with interrupts and preemption disabled in the running CPU and all other CPUs disabled. That said, mutexes in such context are not a good idea. This patch replaces a regular mutex with a safer mutex_trylock approach; given the nature of the mutex used in the driver, it should be pretty uncommon to be unable to acquire such a mutex in the panic path, hence no functional change should be observed (and if it is, that would likely be a deadlock with the regular mutex). Fixes: 2227b7c74634 ("coresight: add support for CPU debug module") Cc: Leo Yan Cc: Mathieu Poirier Cc: Mike Leach Cc: Suzuki K Poulose Signed-off-by: Guilherme G. Piccoli How would you like to proceed with queuing this? I am happy either way. In case you plan to push this as part of this series (I don't see any potential conflicts): Reviewed-by: Suzuki K Poulose
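To illustrate why a trylock is the safer choice on the panic path, here is a minimal userspace sketch. It does not use the kernel mutex API: a C11 atomic_flag stands in for the driver's mutex, and debug_trylock(), debug_unlock() and panic_dump() are hypothetical names chosen for the example. The point is only that the callback gives up when the lock is busy instead of waiting on a holder that may never run again.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock standing in for the driver's mutex. */
static atomic_flag debug_lock = ATOMIC_FLAG_INIT;

static bool debug_trylock(void)
{
	/* test_and_set returns the previous value: false means we took it. */
	return !atomic_flag_test_and_set(&debug_lock);
}

static void debug_unlock(void)
{
	atomic_flag_clear(&debug_lock);
}

/*
 * Panic-path callback: never wait for the lock, because the holder may
 * be a CPU that has already been stopped and will never release it.
 * Returns true if the dump ran, false if it was skipped.
 */
static bool panic_dump(int *dumps_done)
{
	assert(dumps_done != NULL);

	if (!debug_trylock())
		return false;	/* lock busy: skip the dump, do not deadlock */

	(*dumps_done)++;	/* stands in for reading the debug registers */
	debug_unlock();
	return true;
}
```

With a blocking lock, the busy case would hang the panic path forever; with the trylock, the only cost is a skipped dump.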
Re: [PATCH 2/2] recordmcount: Handle sections with no non-weak symbols
Steven Rostedt wrote: On Wed, 27 Apr 2022 15:01:22 +0530 "Naveen N. Rao" wrote: If one or both of these weak functions are overridden in future, in the final vmlinux mcount table, references to these will change over to the non-weak variant which has its own mcount location entry. As such, there will now be two entries for these functions, both pointing to the same non-weak location. But is that really true in all cases? x86 uses fentry these days, and other archs do things differently too. But the original mcount (-pg) call happened *after* the frame setup. That means the offset of the mcount call would be at different offsets wrt the start of the function. If you have one of these architectures that still use mcount, and the weak function doesn't have the same size frame setup as the overriding function, then the addresses will not be the same. Indeed, plain old -pg will be a problem. I'm not sure there is a generic way to address this. I suppose architectures will have to validate the mcount locations, something like this? 
diff --git a/arch/powerpc/include/asm/ftrace.h b/arch/powerpc/include/asm/ftrace.h
index d83758acd1c7c3..d8b104ed2fdf38 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -12,13 +12,7 @@
 #ifndef __ASSEMBLY__
 extern void _mcount(void);
-
-static inline unsigned long ftrace_call_adjust(unsigned long addr)
-{
-	/* relocation of mcount call site is the same as the address */
-	return addr;
-}
-
+unsigned long ftrace_call_adjust(unsigned long addr);
 unsigned long prepare_ftrace_return(unsigned long parent, unsigned long ip,
 				    unsigned long sp);
diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index 4ee04aacf9f13c..976c08cd0573f7 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -858,6 +858,17 @@ void arch_ftrace_update_code(int command)
 	ftrace_modify_all_code(command);
 }
+unsigned long ftrace_call_adjust(unsigned long addr)
+{
+	ppc_inst_t op = ppc_inst_read((u32 *)addr);
+
+	if (!is_bl_op(op))
+		return 0;
+
+	/* relocation of mcount call site is the same as the address */
+	return addr;
+}
+
 #ifdef CONFIG_PPC64
 #define PACATOC offsetof(struct paca_struct, kernel_toc)

We can tighten those checks as necessary, but it will be up to the architectures to validate the mcount locations. This will all have to be opt-in, so that only architectures doing the necessary validation allow mcount relocations against weak symbols.

- Naveen
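The validation idea in the diff above can be tried out in userspace with a small sketch. The names is_bl_insn() and call_adjust() are hypothetical, and instructions are treated as host-order 32-bit integers rather than read with ppc_inst_read(). The mask and value are, as far as I know, the same test the powerpc is_bl_op() helper applies: primary opcode 18 with AA=0 and LK=1, i.e. a relative branch-and-link.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Relative branch-and-link: primary opcode 18, AA=0, LK=1. */
static int is_bl_insn(uint32_t insn)
{
	return (insn & 0xfc000003u) == 0x48000001u;
}

/*
 * Hypothetical userspace analogue of the proposed ftrace_call_adjust():
 * accept a candidate mcount location only if it holds a bl instruction.
 * Returning 0 signals that the entry should be skipped.
 */
static uintptr_t call_adjust(const uint32_t *site)
{
	assert(site != NULL);

	if (!is_bl_insn(*site))
		return 0;	/* not a call site: reject this location */

	/* the mcount call site is the same as the address */
	return (uintptr_t)site;
}
```

A stale location left behind by an overridden weak function would fail this check whenever the word at that address is not a bl, which is the "tighten as necessary" starting point described above.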
[PATCH kernel] KVM: PPC: Book3s: Retire H_PUT_TCE/etc real mode handlers
LoPAPR defines a guest-visible IOMMU along with the hypercalls to use it (H_PUT_TCE etc.). It was first implemented on POWER7, where hypercalls would trap into KVM in real mode (with the MMU off). The problem with real mode is that some memory is not available and some API usage crashed the host, but enabling the MMU was an expensive operation.

The problems with the real mode handlers are:
1. Occasionally these cannot complete the request, so the code is copied and modified to work in virtual mode; very little is shared;
2. The real mode handlers have to be linked into vmlinux to work;
3. An exception in real mode immediately reboots the machine.

If the small DMA window is used, the real mode handlers bring better performance. However, since POWER8 there has always been a bigger DMA window, which VMs use to map the entire VM memory to avoid calling H_PUT_TCE. Such a 1:1 mapping happens once and uses H_PUT_TCE_INDIRECT (a bulk version of H_PUT_TCE), whose virtual mode handler is even closer to its real mode version.

On POWER9, hypercalls trap straight to virtual mode, so the real mode handlers never execute on POWER9 and later CPUs.

So with the current use of the DMA windows and the MMU improvements in POWER9 and later, there is no point in duplicating the code. 32-bit passed-through devices may slow down, but we do not have many of these in practice. For example, with this applied, a 1Gbit ethernet adapter still demonstrates above 800Mbit/s of actual throughput.

This removes the real mode handlers from KVM and the related code from the powernv platform.

This changes the ABI: kvmppc_h_get_tce() moves to the KVM module and kvmppc_find_table() is now static.
Signed-off-by: Alexey Kardashevskiy --- arch/powerpc/kvm/Makefile | 3 - arch/powerpc/include/asm/iommu.h | 6 +- arch/powerpc/include/asm/kvm_ppc.h| 2 - arch/powerpc/include/asm/mmu_context.h| 5 - arch/powerpc/platforms/powernv/pci.h | 3 +- arch/powerpc/kernel/iommu.c | 4 +- arch/powerpc/kvm/book3s_64_vio.c | 43 ++ arch/powerpc/kvm/book3s_64_vio_hv.c | 672 -- arch/powerpc/mm/book3s64/iommu_api.c | 68 -- arch/powerpc/platforms/powernv/pci-ioda-tce.c | 5 +- arch/powerpc/platforms/powernv/pci-ioda.c | 46 +- arch/powerpc/platforms/pseries/iommu.c| 3 +- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 10 - 13 files changed, 69 insertions(+), 801 deletions(-) delete mode 100644 arch/powerpc/kvm/book3s_64_vio_hv.c diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile index 9bdfc8b50899..8e3681a86074 100644 --- a/arch/powerpc/kvm/Makefile +++ b/arch/powerpc/kvm/Makefile @@ -37,9 +37,6 @@ kvm-e500mc-objs := \ e500_emulate.o kvm-objs-$(CONFIG_KVM_E500MC) := $(kvm-e500mc-objs) -kvm-book3s_64-builtin-objs-$(CONFIG_SPAPR_TCE_IOMMU) := \ - book3s_64_vio_hv.o - kvm-pr-y := \ fpu.o \ emulate.o \ diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h index d7912b66c874..7e29c73e3dd4 100644 --- a/arch/powerpc/include/asm/iommu.h +++ b/arch/powerpc/include/asm/iommu.h @@ -51,13 +51,11 @@ struct iommu_table_ops { int (*xchg_no_kill)(struct iommu_table *tbl, long index, unsigned long *hpa, - enum dma_data_direction *direction, - bool realmode); + enum dma_data_direction *direction); void (*tce_kill)(struct iommu_table *tbl, unsigned long index, - unsigned long pages, - bool realmode); + unsigned long pages); __be64 *(*useraddrptr)(struct iommu_table *tbl, long index, bool alloc); #endif diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 838d4cb460b7..44200a27371b 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -177,8 +177,6 @@ extern void 
kvmppc_setup_partition_table(struct kvm *kvm); extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm, struct kvm_create_spapr_tce_64 *args); -extern struct kvmppc_spapr_tce_table *kvmppc_find_table( - struct kvm *kvm, unsigned long liobn); #define kvmppc_ioba_validate(stt, ioba, npages) \ (iommu_tce_check_ioba((stt)->page_shift, (stt)->offset, \ (stt)->size, (ioba), (npages)) ?\ diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h index b8527a74bd4d..3f25bd3e14eb 100644 --- a/arch/powerpc/include/asm/mmu_context.h +++ b/arch/powerpc/include/asm/mmu_context.h @@ -34,15 +34,10 @@ extern void mm_iommu_init(struct mm_struct *mm); extern void mm_iommu_cleanup(struct mm_struct *mm); extern struct mm_iommu_table_group_mem_t *mm_iommu_lookup(struct
Re: [PATCH v3 3/4] dt-bindings: interrupt-controller: fsl, ls-extirq: convert to YAML
On 27/04/2022 09:53, Michael Walle wrote: > Convert the fsl,ls-extirq binding to the new YAML format. > > In contrast to the original binding documentation, there are three > compatibles which are used in their corresponding device trees which > have a specific compatible and the (already documented) fallback > compatible: > - "fsl,ls1046a-extirq", "fsl,ls1043a-extirq" > - "fsl,ls2080a-extirq", "fsl,ls1088a-extirq" > - "fsl,lx2160a-extirq", "fsl,ls1088a-extirq" > > Depending on the number of external IRQs, which is > usually 12 except for the LS1021A where there are only 6, the > interrupt-map-mask was reduced from 0x to 0xf and 0x7 > respectively and the number of interrupt-map entries has to > match. > > Signed-off-by: Michael Walle > --- > changes since v2: > - drop $ref to interrupt-controller.yaml > - use a more strict interrupt-map-mask and make it conditional on SoC > > changes since v1: > - new patch > > .../interrupt-controller/fsl,ls-extirq.txt| 53 > .../interrupt-controller/fsl,ls-extirq.yaml | 118 ++ > 2 files changed, 118 insertions(+), 53 deletions(-) > delete mode 100644 > Documentation/devicetree/bindings/interrupt-controller/fsl,ls-extirq.txt > create mode 100644 > Documentation/devicetree/bindings/interrupt-controller/fsl,ls-extirq.yaml > Reviewed-by: Krzysztof Kozlowski Best regards, Krzysztof
Re: [PATCH v3 3/4] dt-bindings: interrupt-controller: fsl, ls-extirq: convert to YAML
On 27/04/2022 22:08, Leo Li wrote: >> Convert the fsl,ls-extirq binding to the new YAML format. >> >> In contrast to the original binding documentation, there are three >> compatibles which are used in their corresponding device trees which have a >> specific compatible and the (already documented) fallback >> compatible: >> - "fsl,ls1046a-extirq", "fsl,ls1043a-extirq" >> - "fsl,ls2080a-extirq", "fsl,ls1088a-extirq" >> - "fsl,lx2160a-extirq", "fsl,ls1088a-extirq" >> >> Depending on the number of external IRQs, which is >> usually 12 except for the LS1021A where there are only 6, the >> interrupt-map-mask was reduced from 0x to 0xf and 0x7 respectively and the number >> of interrupt-map entries has to match. > > I assume this change won't prevent the driver from being compatible with older device > trees using the 0x? The original 0x should work for both > 6/12 interrupts or whatever reasonable number of interrupts may be used > in future SoCs. So the purpose of this change is to make the binding more > specific, to catch more errors in device trees? Yes. Best regards, Krzysztof
Re: serial hang in qemu-system-ppc64 -M pseries
On 4/28/22 00:41, Rob Landley wrote: > On 4/27/22 10:27, Thomas Huth wrote: >> On 26/04/2022 12.26, Rob Landley wrote: >>> When I cut and paste 80-ish characters of text into the Linux serial >>> console, it >>> reads 16 characters and stops. When I hit space, it reads another 16 >>> characters, >>> and if I keep at it will eventually catch up without losing data. If I type, >>> every character shows up immediately. >> >> That "16" certainly comes from VTERM_BUFSIZE in hw/char/spapr_vty.c in the >> QEMU sources, I think. >> >>> (On other qemu targets and kernels I can cut and paste an entire uuencoded >>> binary and it goes through just fine in one go, but this target hangs with >>> big >>> pastes until I hit keys.) >>> >>> Is this a qemu-side bug, or a kernel-side bug? >>> >>> Kernel config attached (linux 5.18-rc3 or thereabouts), qemu invocation is: >>> >>> qemu-system-ppc64 -M pseries -vga none -nographic -no-reboot -m 256 -kernel >>> vmlinux -initrd powerpc64leroot.cpio.gz -append "panic=1 HOST=powerpc64le >>> console=hvc0" >> >> Which version of QEMU are you using? > > $ qemu-system-ppc64 --version > QEMU emulator version 6.2.92 (v6.2.0-rc2) > Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers Just confirmed it behaves the same with current git (commit cf6f26d6f9b2). Rob