Re: [PATCH -next] powerpc/book3e: Fix build error
YueHaibing writes: > On 2022/5/17 18:45, Christophe Leroy wrote: >> Le 17/05/2022 à 11:48, YueHaibing a écrit : >>> arch/powerpc/mm/nohash/fsl_book3e.c: In function ‘relocate_init’: >>> arch/powerpc/mm/nohash/fsl_book3e.c:348:2: error: implicit declaration of >>> function ‘early_get_first_memblock_info’ >>> [-Werror=implicit-function-declaration] >>>early_get_first_memblock_info(__va(dt_ptr), ); >>>^ >>> >>> Add missing include file linux/of_fdt.h to fix this. >>> >>> Signed-off-by: YueHaibing >> >> Thats for fixing that. >> >> Reviewed-by: Christophe Leroy >> >> It means we don't have any defconfig for 32 bits booke with >> CONFIG_RELOCATABLE ? > > Indeed, there is no defconfig with CONFIG_RELOCATABLE under > arch/powerpc/configs It's selected by CRASH_DUMP, which is in ppc64_defconfig. But it's not enabled in corenet32_smp_defconfig which is what I build, or any of the other 85xx configs. I guess it should be, I think it's true that RELOCATABLE=y exercises more interesting code paths? cheers
Re: [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]
"Eric W. Biederman" writes: > Looking at this the pr_err is absolutely needed. If an unsupported case > winds up in the purgatory blob and the code can't handle it things > will fail silently much worse later. It won't fail later, it will fail the syscall. sys_kexec_file_load() kimage_file_alloc_init() kimage_file_prepare_segments() arch_kexec_kernel_image_load() kexec_image_load_default() image->fops->load() elf64_load()# powerpc bzImage64_load()# x86 kexec_load_purgatory() kexec_apply_relocations() Which does: if (relsec->sh_type == SHT_RELA) ret = arch_kexec_apply_relocations_add(pi, section, relsec, symtab); else if (relsec->sh_type == SHT_REL) ret = arch_kexec_apply_relocations(pi, section, relsec, symtab); if (ret) return ret; And that error is bubbled all the way back up. So as long as arch_kexec_apply_relocations() returns an error the syscall will fail back to userspace and there'll be an error message at that level. It's true that having nothing printed in dmesg makes it harder to work out why the syscall failed. But it's a kernel bug if there are unhandled relocations in the kernel-supplied purgatory code, so a user really has no way to do anything about the error even if it is printed. > "Naveen N. Rao" writes: > >> Baoquan He wrote: >>> On 04/25/22 at 11:11pm, Naveen N. Rao wrote: kexec_load_purgatory() can fail for many reasons - there is no need to print an error when encountering unsupported relocations. This solves a build issue on powerpc with binutils v2.36 and newer [1]. Since commit d1bcae833b32f1 ("ELF: Don't generate unused section symbols") [2], binutils started dropping section symbols that it thought >>> I am not familiar with binutils, while wondering if this exists in other >>> ARCHes except of ppc. Arm64 doesn't have the ARCH override either, do we >>> have problem with it? >> >> I'm not aware of this specific file causing a problem on other architectures >> - >> perhaps the config options differ enough. There are however more reports of >> similar issues affecting other architectures with the llvm integrated >> assembler: >> https://github.com/ClangBuiltLinux/linux/issues/981 >> >>> were unused. This isn't an issue in general, but with kexec_file.c, gcc is placing kexec_arch_apply_relocations[_add] into a separate .text.unlikely section and the section symbol ".text.unlikely" is being dropped. Due to this, recordmcount is unable to find a non-weak symbol >>> But arch_kexec_apply_relocations_add is weak symbol on ppc. >> >> Yes. Note that it is just the section symbol that gets dropped. The section >> is >> still present and will continue to hold the symbols for the functions >> themselves. > > So we have a case where binutils thinks it is doing something useful > and our kernel specific tool gets tripped up by it. It's not just binutils, the LLVM assembler has the same behavior. > Reading the recordmcount code it looks like it is finding any symbol > within a section but ignoring weak symbols. So I suspect the only > remaining symbol in the section is __weak and that confuses > recordmcount. > > Does removing the __weak annotation on those functions fix the build > error? If so we can restructure the kexec code to simply not use __weak > symbols. > > Otherwise the fix needs to be in recordmcount or binutils, and we should > loop whoever maintains recordmcount in to see what they can do. It seems that recordmcount is not really maintained anymore now that x86 uses objtool? There've been several threads about fixing recordmcount, but none of them seem to have lead to a solution. These weak symbol vs recordmcount problems have been worked around going back as far as 2020: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/linux/elfcore.h?id=6e7b64b9dd6d96537d816ea07ec26b7dedd397b9 cheers
[powerpc:topic/ppc-kvm] BUILD SUCCESS 2852ebfa10afdcefff35ec72c8da97141df9845c
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git topic/ppc-kvm branch HEAD: 2852ebfa10afdcefff35ec72c8da97141df9845c KVM: PPC: Book3S HV Nested: L2 LPCR should inherit L1 LPES setting elapsed time: 724m configs tested: 119 configs skipped: 104 The following configs have been built successfully. More configs may be tested in the coming days. gcc tested configs: arm64 defconfig arm64allyesconfig arm allmodconfig arm defconfig arm allyesconfig i386 randconfig-c001-20220516 m68km5407c3_defconfig m68k allmodconfig sparc sparc64_defconfig mipsar7_defconfig s390 allyesconfig sh se7712_defconfig arm pxa3xx_defconfig sh se7206_defconfig sh sh7724_generic_defconfig m68k m5249evb_defconfig arm h5000_defconfig sh se7724_defconfig powerpc mpc85xx_cds_defconfig riscv nommu_k210_sdcard_defconfig m68k m5208evb_defconfig sh sh03_defconfig shecovec24-romimage_defconfig sh r7780mp_defconfig powerpc ppc64_defconfig arcnsimosci_defconfig powerpc pq2fads_defconfig sh ecovec24_defconfig mipsgpr_defconfig m68k bvme6000_defconfig arm randconfig-c002-20220516 x86_64 randconfig-c001-20220516 ia64defconfig m68k allyesconfig m68kdefconfig cskydefconfig nios2allyesconfig alpha defconfig alphaallyesconfig nios2 defconfig arc allyesconfig h8300allyesconfig xtensa allyesconfig arc defconfig sh allmodconfig s390defconfig s390 allmodconfig parisc defconfig parisc64defconfig parisc allyesconfig sparc defconfig i386 allyesconfig sparcallyesconfig i386defconfig i386 debian-10.3-kselftests i386 debian-10.3 mips allyesconfig mips allmodconfig powerpc allyesconfig powerpc allnoconfig powerpc allmodconfig x86_64 randconfig-a012-20220516 x86_64 randconfig-a016-20220516 x86_64 randconfig-a011-20220516 x86_64 randconfig-a014-20220516 x86_64 randconfig-a013-20220516 x86_64 randconfig-a015-20220516 i386 randconfig-a016-20220516 i386 randconfig-a013-20220516 i386 randconfig-a015-20220516 i386 randconfig-a012-20220516 i386 randconfig-a014-20220516 i386 randconfig-a011-20220516 i386 randconfig-a012 i386 randconfig-a014 i386 randconfig-a016 s390 randconfig-r044-20220516 riscvrandconfig-r042-20220516 arc randconfig-r043-20220516 riscv defconfig riscvnommu_virt_defconfig riscv rv32_defconfig riscvnommu_k210_defconfig riscv allnoconfig riscvallmodconfig riscvallyesconfig x86_64rhel-8.3-kselftests um x86_64_defconfig um i386_defconfig x86_64 rhel-8.3-func x86_64 rhel-8.3-syz x86_64 kexec x86_64 defconfig x86_64 allyesconfig x86_64 rhel-8.3-kunit x86_64 rhel-8.3 clang tested configs: powerpc randconfig-c003-20220516 riscvrandconfig-c006-20220516 mips randconfig-c004-20220516 arm
Re: [PATCH v4] locking/csd_lock: change csdlock_debug from early_param to __setup
On Tue, May 17, 2022 at 11:22:04AM +0800, Chen Zhongjin wrote: > On 2022/5/10 17:46, Chen Zhongjin wrote: > > csdlock_debug uses early_param and static_branch_enable() to enable > > csd_lock_wait feature, which triggers a panic on arm64 with config: > > CONFIG_SPARSEMEM=y > > CONFIG_SPARSEMEM_VMEMMAP=n > > > > With CONFIG_SPARSEMEM_VMEMMAP=n, __nr_to_section is called in > > static_key_enable() and returns NULL which makes NULL dereference > > because mem_section is initialized in sparse_init() which is later > > than parse_early_param() stage. > > > > For powerpc this is also broken, because early_param stage is > > earlier than jump_label_init() so static_key_enable won't work. > > powerpc throws an warning: "static key 'xxx' used before call > > to jump_label_init()". > > > > Thus, early_param is too early for csd_lock_wait to run > > static_branch_enable(), so changes it to __setup to fix these. > > > > Fixes: 8d0968cc6b8f ("locking/csd_lock: Add boot parameter for controlling > > CSD lock debugging") > > Cc: sta...@vger.kernel.org > > Reported-by: Chen jingwen > > Signed-off-by: Chen Zhongjin > > --- > > Change v3 -> v4: > > Fix title and description because this fix is also applied > > to powerpc. > > For more detailed arm64 bug report see: > > https://lore.kernel.org/linux-arm-kernel/e8715911-f835-059d-27f8-cc5f5ad30...@huawei.com/t/ > > > > Change v2 -> v3: > > Add module name in title > > > > Change v1 -> v2: > > Fix return 1 for __setup > > --- > > kernel/smp.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/kernel/smp.c b/kernel/smp.c > > index 65a630f62363..381eb15cd28f 100644 > > --- a/kernel/smp.c > > +++ b/kernel/smp.c > > @@ -174,9 +174,9 @@ static int __init csdlock_debug(char *str) > > if (val) > > static_branch_enable(_debug_enabled); > > > > - return 0; > > + return 1; > > } > > -early_param("csdlock_debug", csdlock_debug); > > +__setup("csdlock_debug=", csdlock_debug); > > > > static DEFINE_PER_CPU(call_single_data_t *, cur_csd); > > static DEFINE_PER_CPU(smp_call_func_t, cur_csd_func); > > Ping for review. Thanks! I have pulled it into -rcu for testing and further review. It might well need to go through some other path, though. Thanx, Paul
Re: [PATCH] net: unexport csum_and_copy_{from,to}_user
Al Viro writes: > On Thu, Apr 21, 2022 at 09:04:40AM +0200, Christoph Hellwig wrote: >> csum_and_copy_from_user and csum_and_copy_to_user are exported by >> a few architectures, but not actually used in modular code. Drop >> the exports. >> >> Signed-off-by: Christoph Hellwig > > Acked-by: Al Viro > > Not sure which tree should it go through - Arnd's, perhaps? It's already in akpm's tree: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-nonmm-stable=6308499b5e99c0c903fde2c605e41d9a86c4be6c cheers
RE: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list
> What I'm planning to do in the altera_edac notifier is: > > if (kdump_is_set) > return; Yes. That's what I think should happen. -Tony
Re: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list
On 17/05/2022 14:02, Luck, Tony wrote: >> Tony / Dinh - can I just *skip* this notifier *if kdump* is set or else >> we run the code as-is? Does that make sense to you? > > The "skip" option sounds like it needs some special flag associated with > an entry on the notifier chain. But there are other notifier chains ... so > that > sounds messy to me. > > Just all the notifiers in priority order. If any want to take different > actions > based on kdump status, change the code. That seems more flexible than > an "all or nothing" approach by skipping. > > -Tony I guess I've expressed myself in a poor way - sorry! What I'm planning to do in the altera_edac notifier is: if (kdump_is_set) return; /* regular code */ In other words: if the kdump is set, this notifier will be effectively a nop (although it's gonna be called). Lemme know your thoughts Tony, if that makes sense. Thanks, Guilherme
RE: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list
> Tony / Dinh - can I just *skip* this notifier *if kdump* is set or else > we run the code as-is? Does that make sense to you? The "skip" option sounds like it needs some special flag associated with an entry on the notifier chain. But there are other notifier chains ... so that sounds messy to me. Just all the notifiers in priority order. If any want to take different actions based on kdump status, change the code. That seems more flexible than an "all or nothing" approach by skipping. -Tony
Re: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list
On 17/05/2022 11:11, Petr Mladek wrote: > [...] >>> Then notifiers could make an informed choice on whether to deep dive to >>> get all the possible details (when there is no kdump) or just skim the high >>> level stuff (to maximize chance of getting a successful kdump). >>> >>> -Tony >> >> Good idea Tony! What if I wire a kexec_crash_loaded() in the notifier? > > I like this idea. > > One small problem is that kexec_crash_loaded() has valid result > only under kexec_mutex. On the other hand, it should stay true > once loaded so that the small race window should be innocent. > >> With that, are you/Petr/Dinh OK in moving it for the info list? > > Sounds good to me. > > Best Regards, > Petr Perfect, I'll do that for V2 then =) Tony / Dinh - can I just *skip* this notifier *if kdump* is set or else we run the code as-is? Does that make sense to you? I'll postpone it to run almost in the end of info list (last position is for panic_print). Thanks, Guilherme
Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list
On 17/05/2022 10:57, Petr Mladek wrote: > [...] --- a/drivers/misc/bcm-vk/bcm_vk_dev.c +++ b/drivers/misc/bcm-vk/bcm_vk_dev.c @@ -1446,7 +1446,7 @@ static int bcm_vk_probe(struct pci_dev *pdev, const struct pci_device_id *ent) >>> [... snip ...] >>> It seems to reset some hardware or so. IMHO, it should go into the >>> pre-reboot list. >> >> Mixed feelings here, I'm looping Broadcom maintainers to comment. >> (CC Scott and Broadcom list) >> >> I'm afraid it breaks kdump if this device is not reset beforehand - it's >> a doorbell write, so not high risk I think... >> >> But in case the not-reset device can be probed normally in kdump kernel, >> then I'm fine in moving this to the reboot list! I don't have the HW to >> test myself. > > Good question. Well, it if has to be called before kdump then > even "hypervisor" list is a wrong place because is not always > called before kdump. Agreed! I'll defer that to Scott and Broadcom folks to comment. If it's not strictly necessary, I'll happily move it to the reboot list. If necessary, we could use the machine_crash_kexec() approach, but we'll fall into the case arm64 doesn't support it and I'm not sure if this device is available for arm - again a question for the maintainers. > [...] --- a/drivers/power/reset/ltc2952-poweroff.c +++ b/drivers/power/reset/ltc2952-poweroff.c >> [...] >> This is setting a variable only, and once it's set (data->kernel_panic >> is the bool's name), it just bails out the IRQ handler and a timer >> setting - this timer seems kinda tricky, so bailing out ASAP makes sense >> IMHO. > > IMHO, the timer informs the hardware that the system is still alive > in the middle of panic(). If the timer is not working then the > hardware (chip) will think that the system frozen in panic() > and will power off the system. See the comments in > drivers/power/reset/ltc2952-poweroff.c: > [ snip ...] > IMHO, we really have to keep it alive until we reach the reboot stage. > > Another question is how it actually works when the interrupts are > disabled during panic() and the timer callbacks are not handled. Agreed here! Guess I can move this one the reboot list, fine by me. Unless PM folks think otherwise. > [...] >> Disagree here, I'm CCing Florian for information. >> >> This notifier preserves RAM so it's *very interesting* if we have >> kmsg_dump() for example, but maybe might be also relevant in case kdump >> kernel is configured to store something in a persistent RAM (then, >> without this notifier, after kdump reboots the system data would be lost). > > I see. It is actually similar problem as with > drivers/firmware/google/gsmi.c. > > I does similar things like kmsg_dump() so it should be called in > the same location (after info notifier list and before kdump). > > A solution might be to put it at these notifiers at the very > end of the "info" list or make extra "dump" notifier list. Here I still disagree. I've commented in the other response thread (about Google gsmi) about the semantics of the hypervisor list, but again: this list should contain callbacks that (a) Should run early, _by default_ before a kdump; (b) Communicate with the firmware/hypervisor in a "low-risk" way; Imagine a scenario where users configure kdump kernel to save something in a persistent form in DRAM - it'd be like a late pstore, in the next kernel. This callback enables that, it's meant to inform FW "hey, panic happened, please from now on don't clear the RAM in the next FW-reboot". I don't see a reason to postpone that - let's see if the maintainers have an opinion. Cheers, Guilherme
Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list
On 17/05/2022 10:28, Petr Mladek wrote: > [...] >>> Disagree here. I'm looping Google maintainers, so they can comment. >>> (CCed Evan, David, Julius) >>> >>> This notifier is clearly a hypervisor notification mechanism. I've fixed >>> a locking stuff there (in previous patch), I feel it's low-risk but even >>> if it's mid-risk, the class of such callback remains a perfect fit with >>> the hypervisor list IMHO. >> >> This logs a panic to our "eventlog", a tiny logging area in SPI flash >> for critical and power-related events. In some cases this ends up >> being the only clue we get in a Chromebook feedback report that a >> panic occurred, so from my perspective moving it to the front of the >> line seems like a good idea. > > IMHO, this would really better fit into the pre-reboot notifier list: > >+ the callback stores the log so it is similar to kmsg_dump() > or console_flush_on_panic() > >+ the callback should be proceed after "info" notifiers > that might add some other useful information. > > Honestly, I am not sure what exactly hypervisor callbacks do. But I > think that they do not try to extract the kernel log because they > would need to handle the internal format. > I guess the main point in your response is : "I am not sure what exactly hypervisor callbacks do". We need to be sure about the semantics of such list, and agree on that. So, my opinion about this first list, that we call "hypervisor list", is: it contains callbacks that (1) should run early, preferably before kdump (or even if kdump isn't set, should run ASAP); (2) these callbacks perform some communication with an abstraction that runs "below" the kernel, like a firmware or hypervisor. Classic example: pvpanic, that communicates with VMM (usually qemu) and allow such VMM to snapshot the full guest memory, for example. (3) Should be low-risk. What defines risk is the level of reliability of subsequent operations - if the callback have 50% of chance of "bricking" the system totally and prevent kdump / kmsg_dump() / reboot , this is high risk one for example. Some good fits IMO: pvpanic, sstate_panic_event() [sparc], fadump in powerpc, etc. So, this is a good case for the Google notifier as well - it's not collecting data like the dmesg (hence your second bullet seems to not apply here, info notifiers won't add info to be collected by gsmi). It is a firmware/hypervisor/whatever-gsmi-is notification mechanism, that tells such "lower" abstraction a panic occurred. It seems low risk and we want it to run ASAP, if possible. So, I'd like to keep it here, unless gsmi maintainers disagree or I'm perhaps misunderstanding the meaning of this first list. Cheers, Guilherme
Re: [PATCH 05/30] misc/pvpanic: Convert regular spinlock into trylock on panic path
On 17/05/2022 07:58, Petr Mladek wrote: > [...] >> Thanks for the review Petr. Patch was already merged - my goal was to be >> concise, i.e., a patch per driver / module, so the patch kinda fixes >> whatever I think is wrong with the driver with regards panic handling. >> >> Do you think it worth to remove this patch from Greg's branch just to >> split it in 2? Personally I think it's not worth, but opinions are welcome. > > No problem. It is not worth the effort. > OK, perfect! Cheers, Guilherme
Re: [PATCH v6 15/29] x86/hpet: Add helper function hpet_set_comparator_periodic()
On Sat, May 14, 2022 at 10:17:38AM +0200, Thomas Gleixner wrote: > On Fri, May 13 2022 at 14:19, Ricardo Neri wrote: > > On Fri, May 06, 2022 at 11:41:13PM +0200, Thomas Gleixner wrote: > >> The argument about not bloating the code > >> with an "obvious???" function which is quite small is slightly beyond my > >> comprehension level. > > > > That obvious function would look like this: > > > > void hpet_set_comparator_one_shot(int channel, u32 delta) > > { > > u32 count; > > > > count = hpet_readl(HPET_COUNTER); > > count += delta; > > hpet_writel(count, HPET_Tn_CMP(channel)); > > } > > This function only works reliably when the delta is large. See > hpet_clkevt_set_next_event(). That is a good point. One more reason to not have a hpet_set_comparator_one_shot(), IMO. Thanks and BR, Ricardo
Re: [PATCH v6 28/29] x86/tsc: Restart NMI watchdog after refining tsc_khz
On Tue, May 10, 2022 at 01:44:05PM +0200, Thomas Gleixner wrote: > On Tue, May 10 2022 at 21:16, Nicholas Piggin wrote: > > Excerpts from Ricardo Neri's message of May 6, 2022 10:00 am: > >> + /* > >> + * If in use, the HPET hardlockup detector relies on tsc_khz. > >> + * Reconfigure it to make use of the refined tsc_khz. > >> + */ > >> + lockup_detector_reconfigure(); > > > > I don't know if the API is conceptually good. > > > > You change something that the lockup detector is currently using, > > *while* the detector is running asynchronously, and then reconfigure > > it. What happens in the window? If this code is only used for small > > adjustments maybe it does not really matter but in principle it's > > a bad API to export. > > > > lockup_detector_reconfigure as an internal API is okay because it > > reconfigures things while the watchdog is stopped [actually that > > looks untrue for soft dog which uses watchdog_thresh in > > is_softlockup(), but that should be fixed]. > > > > You're the arch so you're allowed to stop the watchdog and configure > > it, e.g., hardlockup_detector_perf_stop() is called in arch/. > > > > So you want to disable HPET watchdog if it was enabled, then update > > wherever you're using tsc_khz, then re-enable. > > The real question is whether making this refined tsc_khz value > immediately effective matters at all. IMO, it does not because up to > that point the watchdog was happily using the coarse calibrated value > and the whole use TSC to assess whether the HPET fired mechanism is just > a guestimate anyway. So what's the point of trying to guess 'more > correct'. In some of my test systems I observed that, the TSC value does not fall within the expected error window the first time the HPET channel expires. I inferred that the error computed using the coarser tsc_khz was wrong. Recalculating the error window with refined tsc_khz would correct it. However, restarting the timer has the side-effect of kicking the timer and, therefore pushing the first HPET NMI further in the future. Perhaps kicking HPET channel, not recomputing the error window, corrected (masked?) the problem. I will investigate further and rework or drop this patch as needed. Thanks and BR, Ricardo
Re: [PATCH v3] PCI/AER: Handle Multi UnCorrectable/Correctable errors properly
On Wed, May 11, 2022 at 05:29:45PM -0700, Sathyanarayanan Kuppuswamy wrote: > > > On 5/11/22 4:40 PM, Bjorn Helgaas wrote: > > On Mon, Apr 18, 2022 at 03:02:37PM +, Kuppuswamy Sathyanarayanan wrote: > > > Currently the aer_irq() handler returns IRQ_NONE for cases without bits > > > PCI_ERR_ROOT_UNCOR_RCV or PCI_ERR_ROOT_COR_RCV are set. But this > > > assumption is incorrect. > > > > > > Consider a scenario where aer_irq() is triggered for a correctable > > > error, and while we process the error and before we clear the error > > > status in "Root Error Status" register, if the same kind of error > > > is triggered again, since aer_irq() only clears events it saw, the > > > multi-bit error is left in tact. This will cause the interrupt to fire > > > again, resulting in entering aer_irq() with just the multi-bit error > > > logged in the "Root Error Status" register. > > > > > > Repeated AER recovery test has revealed this condition does happen > > > and this prevents any new interrupt from being triggered. Allow to > > > process interrupt even if only multi-correctable (BIT 1) or > > > multi-uncorrectable bit (BIT 3) is set. > > > > > > Also note that, for cases with only multi-bit error is set, since this > > > is not the first occurrence of the error, PCI_ERR_ROOT_ERR_SRC may have > > > zero or some junk value. So we cannot cleanly process this error > > > information using aer_isr_one_error(). All we are attempting with this > > > fix is to make sure error interrupt processing can continue in this > > > scenario. > > > > > > This error can be reproduced by making following changes to the > > > aer_irq() function and by executing the given test commands. > > > > > > static irqreturn_t aer_irq(int irq, void *context) > > > struct aer_err_source e_src = {}; > > > > > > pci_read_config_dword(rp, aer + PCI_ERR_ROOT_STATUS, > > > _src.status); > > > + pci_dbg(pdev->port, "Root Error Status: %04x\n", > > > + e_src.status); > > > if (!(e_src.status & AER_ERR_STATUS_MASK)) > > > > Do you mean > > > >if (!(e_src.status & (PCI_ERR_ROOT_UNCOR_RCV|PCI_ERR_ROOT_COR_RCV))) > > > > here? AER_ERR_STATUS_MASK would be after this fix. > > Yes. You are correct. Do you want me to update it and Fixes tag > and send next version? I moved the repro details to a bugzilla, updated the commit log as below, and applied to pci/error for v5.19, thanks! commit 203926da2bff ("PCI/AER: Clear MULTI_ERR_COR/UNCOR_RCV bits") Author: Kuppuswamy Sathyanarayanan Date: Mon Apr 18 15:02:37 2022 + PCI/AER: Clear MULTI_ERR_COR/UNCOR_RCV bits When a Root Port or Root Complex Event Collector receives an error Message e.g., ERR_COR, it sets PCI_ERR_ROOT_COR_RCV in the Root Error Status register and logs the Requester ID in the Error Source Identification register. If it receives a second ERR_COR Message before software clears PCI_ERR_ROOT_COR_RCV, hardware sets PCI_ERR_ROOT_MULTI_COR_RCV and the Requester ID is lost. In the following scenario, PCI_ERR_ROOT_MULTI_COR_RCV was never cleared: - hardware receives ERR_COR message - hardware sets PCI_ERR_ROOT_COR_RCV - aer_irq() entered - aer_irq(): status = pci_read_config_dword(PCI_ERR_ROOT_STATUS) - aer_irq(): now status == PCI_ERR_ROOT_COR_RCV - hardware receives second ERR_COR message - hardware sets PCI_ERR_ROOT_MULTI_COR_RCV - aer_irq(): pci_write_config_dword(PCI_ERR_ROOT_STATUS, status) - PCI_ERR_ROOT_COR_RCV is cleared; PCI_ERR_ROOT_MULTI_COR_RCV is set - aer_irq() entered again - aer_irq(): status = pci_read_config_dword(PCI_ERR_ROOT_STATUS) - aer_irq(): now status == PCI_ERR_ROOT_MULTI_COR_RCV - aer_irq() exits because PCI_ERR_ROOT_COR_RCV not set - PCI_ERR_ROOT_MULTI_COR_RCV is still set The same problem occurred with ERR_NONFATAL/ERR_FATAL Messages and PCI_ERR_ROOT_UNCOR_RCV and PCI_ERR_ROOT_MULTI_UNCOR_RCV. Fix the problem by queueing an AER event and clearing the Root Error Status bits when any of these bits are set: PCI_ERR_ROOT_COR_RCV PCI_ERR_ROOT_UNCOR_RCV PCI_ERR_ROOT_MULTI_COR_RCV PCI_ERR_ROOT_MULTI_UNCOR_RCV See the bugzilla link for details from Eric about how to reproduce this problem. [bhelgaas: commit log, move repro details to bugzilla] Fixes: e167bfcaa4cd ("PCI: aerdrv: remove magical ROOT_ERR_STATUS_MASKS") Link: https://bugzilla.kernel.org/show_bug.cgi?id=215992 Link: https://lore.kernel.org/r/20220418150237.1021519-1-sathyanarayanan.kuppusw...@linux.intel.com Reported-by: Eric Badger Signed-off-by: Kuppuswamy Sathyanarayanan Signed-off-by: Bjorn Helgaas Reviewed-by: Ashok Raj
Re: [PATCH] net: unexport csum_and_copy_{from,to}_user
On Thu, Apr 21, 2022 at 09:04:40AM +0200, Christoph Hellwig wrote: > csum_and_copy_from_user and csum_and_copy_to_user are exported by > a few architectures, but not actually used in modular code. Drop > the exports. > > Signed-off-by: Christoph Hellwig Acked-by: Al Viro Not sure which tree should it go through - Arnd's, perhaps?
Re: [PATCH v6 28/29] x86/tsc: Restart NMI watchdog after refining tsc_khz
On Tue, May 10, 2022 at 09:16:21PM +1000, Nicholas Piggin wrote: > Excerpts from Ricardo Neri's message of May 6, 2022 10:00 am: > > The HPET hardlockup detector relies on tsc_khz to estimate the value of > > that the TSC will have when its HPET channel fires. A refined tsc_khz > > helps to estimate better the expected TSC value. > > > > Using the early value of tsc_khz may lead to a large error in the expected > > TSC value. Restarting the NMI watchdog detector has the effect of kicking > > its HPET channel and make use of the refined tsc_khz. > > > > When the HPET hardlockup is not in use, restarting the NMI watchdog is > > a noop. > > > > Cc: Andi Kleen > > Cc: Stephane Eranian > > Cc: "Ravi V. Shankar" > > Cc: io...@lists.linux-foundation.org > > Cc: linuxppc-dev@lists.ozlabs.org > > Cc: x...@kernel.org > > Signed-off-by: Ricardo Neri > > --- > > Changes since v5: > > * Introduced this patch > > > > Changes since v4 > > * N/A > > > > Changes since v3 > > * N/A > > > > Changes since v2: > > * N/A > > > > Changes since v1: > > * N/A > > --- > > arch/x86/kernel/tsc.c | 6 ++ > > 1 file changed, 6 insertions(+) > > > > diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c > > index cafacb2e58cc..cc1843044d88 100644 > > --- a/arch/x86/kernel/tsc.c > > +++ b/arch/x86/kernel/tsc.c > > @@ -1386,6 +1386,12 @@ static void tsc_refine_calibration_work(struct > > work_struct *work) > > /* Inform the TSC deadline clockevent devices about the recalibration */ > > lapic_update_tsc_freq(); > > > > + /* > > +* If in use, the HPET hardlockup detector relies on tsc_khz. > > +* Reconfigure it to make use of the refined tsc_khz. > > +*/ > > + lockup_detector_reconfigure(); > > I don't know if the API is conceptually good. > > You change something that the lockup detector is currently using, > *while* the detector is running asynchronously, and then reconfigure > it. Yes, this is what happens. > What happens in the window? If this code is only used for small > adjustments maybe it does not really matter Please see my comment > but in principle it's a bad API to export. > > lockup_detector_reconfigure as an internal API is okay because it > reconfigures things while the watchdog is stopped I see. > [actually that looks untrue for soft dog which uses watchdog_thresh in > is_softlockup(), but that should be fixed]. Perhaps there should be a watchdog_thresh_user. When the user updates it, the detector is stopped, watchdog_thresh is updated, and then the detector is resumed. > > You're the arch so you're allowed to stop the watchdog and configure > it, e.g., hardlockup_detector_perf_stop() is called in arch/. I had it like this but it did not look right to me. You are right, however, I can stop/restart the watchdog when needed. Thanks and BR, Ricardo
[powerpc:next-test 139/145] arch/powerpc/sysdev/mpc5xxx_clocks.c:26:2: error: call to undeclared function 'fwnode_for_each_parent_node'; ISO C99 and later do not support implicit function declarations
tree: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next-test head: 2e4a9942261f89ad204a8189634029a4b1f0efb6 commit: f9724684ea2198caaab132514ca104175ed2f15f [139/145] powerpc/mpc5xxx: Switch mpc5xxx_get_bus_frequency() to use fwnode config: powerpc-mpc512x_defconfig (https://download.01.org/0day-ci/archive/20220518/202205180443.cfi2jikj-...@intel.com/config) compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 853fa8ee225edf2d0de94b0dcbd31bea916e825e) reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # install powerpc cross compiling tool for clang build # apt-get install binutils-powerpc-linux-gnu # https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?id=f9724684ea2198caaab132514ca104175ed2f15f git remote add powerpc https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git git fetch --no-tags powerpc next-test git checkout f9724684ea2198caaab132514ca104175ed2f15f # save the config file mkdir build_dir && cp config build_dir/.config COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=powerpc SHELL=/bin/bash If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot All errors (new ones prefixed by >>): >> arch/powerpc/sysdev/mpc5xxx_clocks.c:26:2: error: call to undeclared >> function 'fwnode_for_each_parent_node'; ISO C99 and later do not support >> implicit function declarations [-Wimplicit-function-declaration] fwnode_for_each_parent_node(fwnode, parent) { ^ >> arch/powerpc/sysdev/mpc5xxx_clocks.c:26:45: error: expected ';' after >> expression fwnode_for_each_parent_node(fwnode, parent) { ^ ; 2 errors generated. vim +/fwnode_for_each_parent_node +26 arch/powerpc/sysdev/mpc5xxx_clocks.c 8 9 /** 10 * mpc5xxx_fwnode_get_bus_frequency - Find the bus frequency for a firmware node 11 * @fwnode: firmware node 12 * 13 * Returns bus frequency (IPS on MPC512x, IPB on MPC52xx), 14 * or 0 if the bus frequency cannot be found. 15 */ 16 unsigned long mpc5xxx_fwnode_get_bus_frequency(struct fwnode_handle *fwnode) 17 { 18 struct fwnode_handle *parent; 19 u32 bus_freq; 20 int ret; 21 22 ret = fwnode_property_read_u32(fwnode, "bus-frequency", _freq); 23 if (!ret) 24 return bus_freq; 25 > 26 fwnode_for_each_parent_node(fwnode, parent) { -- 0-DAY CI Kernel Test Service https://01.org/lkp
Looking for old boards with Galileo / Marvell Discovery chips on them.
Hello. Currently looking for old EV64260 and EV64360 development boards. If you have any to sell, please contact me. Tried contacting Mark Greer of list, but not sure if he got any of my email. Oh, also looking for a BDI2000 or BDI3000 if you have one to sell. Thank you. -Steve OpenPGP_signature Description: OpenPGP digital signature
Re: [PATCH v6 21/29] x86/nmi: Add an NMI_WATCHDOG NMI handler category
On Mon, May 09, 2022 at 03:59:40PM +0200, Thomas Gleixner wrote: > On Thu, May 05 2022 at 17:00, Ricardo Neri wrote: > > Add a NMI_WATCHDOG as a new category of NMI handler. This new category > > is to be used with the HPET-based hardlockup detector. This detector > > does not have a direct way of checking if the HPET timer is the source of > > the NMI. Instead, it indirectly estimates it using the time-stamp counter. > > > > Therefore, we may have false-positives in case another NMI occurs within > > the estimated time window. For this reason, we want the handler of the > > detector to be called after all the NMI_LOCAL handlers. A simple way > > of achieving this with a new NMI handler category. > > > > @@ -379,6 +385,10 @@ static noinstr void default_do_nmi(struct pt_regs > > *regs) > > } > > raw_spin_unlock(_reason_lock); > > > > + handled = nmi_handle(NMI_WATCHDOG, regs); > > + if (handled == NMI_HANDLED) > > + goto out; > > + > > How is this supposed to work reliably? > > If perf is active and the HPET NMI and the perf NMI come in around the > same time, then nmi_handle(LOCAL) can swallow the NMI and the watchdog > won't be checked. Because MSI is strictly edge and the message is only > sent once, this can result in a stale watchdog, no? This is true. Instead, at the end of each NMI I should _also_ check if the TSC is within the expected value of the HPET NMI watchdog. In this way, unrelated NMIs (e.g., perf NMI) are handled and we don't miss the NMI from the HPET channel. Thanks and BR, Ricardo
[powerpc:next-test 141/145] arch/powerpc/kernel/trace/ftrace.c:714:6: error: redefinition of 'ftrace_free_init_tramp'
tree: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next-test head: 2e4a9942261f89ad204a8189634029a4b1f0efb6 commit: 0cad223e88229b1f021f9e167bdb2ba7b0d0c369 [141/145] powerpc/ftrace: Remove ftrace init tramp once kernel init is complete config: powerpc-buildonly-randconfig-r005-20220516 (https://download.01.org/0day-ci/archive/20220518/202205180129.raxtjqf6-...@intel.com/config) compiler: powerpc-linux-gcc (GCC) 11.3.0 reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?id=0cad223e88229b1f021f9e167bdb2ba7b0d0c369 git remote add powerpc https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git git fetch --no-tags powerpc next-test git checkout 0cad223e88229b1f021f9e167bdb2ba7b0d0c369 # save the config file mkdir build_dir && cp config build_dir/.config COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross W=1 O=build_dir ARCH=powerpc SHELL=/bin/bash If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot All errors (new ones prefixed by >>): >> arch/powerpc/kernel/trace/ftrace.c:714:6: error: redefinition of >> 'ftrace_free_init_tramp' 714 | void ftrace_free_init_tramp(void) | ^~ In file included from include/linux/ftrace.h:23, from arch/powerpc/kernel/trace/ftrace.c:20: arch/powerpc/include/asm/ftrace.h:119:20: note: previous definition of 'ftrace_free_init_tramp' with type 'void(void)' 119 | static inline void ftrace_free_init_tramp(void) { } |^~ vim +/ftrace_free_init_tramp +714 arch/powerpc/kernel/trace/ftrace.c 713 > 714 void ftrace_free_init_tramp(void) 715 { 716 int i; 717 718 for (i = 0; i < NUM_FTRACE_TRAMPS && ftrace_tramps[i]; i++) 719 if (ftrace_tramps[i] == (unsigned long)ftrace_tramp_init) { 720 ftrace_tramps[i] = 0; 721 return; 722 } 723 } 724 -- 0-DAY CI Kernel Test Service https://01.org/lkp
Re: [PATCH v2 4/4] powerpc/52xx: Convert to use fwnode API
On Tue, May 17, 2022 at 09:38:56AM +1000, Michael Ellerman wrote: > Andy Shevchenko writes: > > On Mon, May 16, 2022 at 05:05:12PM +0300, Andy Shevchenko wrote: > >> On Mon, May 16, 2022 at 11:48:05PM +1000, Michael Ellerman wrote: > >> > Andy Shevchenko writes: > >> > > We may convert the GPT driver to use fwnode API for the sake > >> > > of consistency of the used APIs inside the driver. > >> > > >> > I'm not sure about this one. > >> > > >> > It's more consistent to use fwnode in this driver, but it's very > >> > inconsistent with the rest of the powerpc code. We have basically no > >> > uses of the fwnode APIs at the moment. > >> > >> Fair point! > >> > >> > It seems like a pretty straight-forward conversion, but there could > >> > easily be a bug in there, I don't have any way to test it. Do you? > >> > >> Nope, only compile testing. The important part of this series is to > >> clean up of_node from GPIO library, so since here it's a user of > >> it I want to do that. This patch is just ad-hoc conversion that I > >> noticed is possible. But there is no any requirement to do so. > >> > >> Lemme drop this from v3. > > > > I just realize that there is no point to send a v3. You can just apply > > first 3 patches. Or is your comment against entire series? > > No, my comment is just about this patch. > > I don't mind converting to new APIs when it's blocking some other > cleanup. But given the age of this code I think it's probably better to > just leave the rest of it as-is, unless someone volunteers to test it. > > So yeah I'll just take patches 1-3 of this v2 series, no need to resend. Thanks! One note though, the fwnode_for_each_parent_node() is not yet available in upstream, but will be after v5.19-rc1. It means the patch 3 can't be applied without that. That's why LKP complained on patch 4 in this series. That said, the easiest way is to postpone it till v5.19-rc1 is out. -- With Best Regards, Andy Shevchenko
Re: [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]
Looking at this the pr_err is absolutely needed. If an unsupported case winds up in the purgatory blob and the code can't handle it things will fail silently much worse later. So the proposed patch is unfortunately the wrong direction. "Naveen N. Rao" writes: > Baoquan He wrote: >> On 04/25/22 at 11:11pm, Naveen N. Rao wrote: >>> kexec_load_purgatory() can fail for many reasons - there is no need to >>> print an error when encountering unsupported relocations. >>> This solves a build issue on powerpc with binutils v2.36 and newer [1]. >>> Since commit d1bcae833b32f1 ("ELF: Don't generate unused section >>> symbols") [2], binutils started dropping section symbols that it thought >> I am not familiar with binutils, while wondering if this exists in other >> ARCHes except of ppc. Arm64 doesn't have the ARCH override either, do we >> have problem with it? > > I'm not aware of this specific file causing a problem on other architectures - > perhaps the config options differ enough. There are however more reports of > similar issues affecting other architectures with the llvm integrated > assembler: > https://github.com/ClangBuiltLinux/linux/issues/981 > >> >>> were unused. This isn't an issue in general, but with kexec_file.c, gcc >>> is placing kexec_arch_apply_relocations[_add] into a separate >>> .text.unlikely section and the section symbol ".text.unlikely" is being >>> dropped. Due to this, recordmcount is unable to find a non-weak symbol >> But arch_kexec_apply_relocations_add is weak symbol on ppc. > > Yes. Note that it is just the section symbol that gets dropped. The section is > still present and will continue to hold the symbols for the functions > themselves. So we have a case where binutils thinks it is doing something useful and our kernel specific tool gets tripped up by it. Reading the recordmcount code it looks like it is finding any symbol within a section but ignoring weak symbols. So I suspect the only remaining symbol in the section is __weak and that confuses recordmcount. Does removing the __weak annotation on those functions fix the build error? If so we can restructure the kexec code to simply not use __weak symbols. Otherwise the fix needs to be in recordmcount or binutils, and we should loop whoever maintains recordmcount in to see what they can do. Eric
Re: [PATCH 17/30] tracing: Improve panic/die notifiers
On 11/05/2022 08:45, Petr Mladek wrote: > [...] > DIE_OOPS and PANIC_NOTIFIER are from different enum. > It feels like comparing apples with oranges here. > > IMHO, the proper way to unify the two notifiers is > a check of the @self parameter. Something like: > > static int trace_die_panic_handler(struct notifier_block *self, > unsigned long ev, void *unused) > { > if (self == trace_die_notifier && val != DIE_OOPS) > goto out; > > ftrace_dump(ftrace_dump_on_oops); > out: > return NOTIFY_DONE; > } > > Best Regards, > Petr OK Petr, thanks - will implement your suggestion in V2 (CC Steven) Cheers!
Re: [PATCH 15/30] bus: brcmstb_gisb: Clean-up panic/die notifiers
On 10/05/2022 12:28, Petr Mladek wrote: > [...] > IMHO, the check of the @self parameter was the proper solution. > > "gisb_die_notifier" list uses @val from enum die_val. > "gisb_panic_notifier" list uses @val from enum panic_notifier_val. > > These are unrelated types. It might easily break when > someone defines the same constant also in enum die_val. > > Best Regards, > Petr OK Petr, I'll drop this idea for V2 - will just remove the useless header / prototype then. (CC Florian) Cheers, Guilherme
Re: [PATCH 14/30] panic: Properly identify the panic event to the notifiers' callbacks
On 17/05/2022 10:11, Petr Mladek wrote: > [...] >> You mentioned 2 cases: >> >> (a) Same notifier_list used in different situations; >> >> (b) Same *notifier callback* used in different lists; >> >> Mine is case (b), right? Can you show me an example of case (a)? > > There are many examples of case (a): > > [... snip ...] > These all call the same list/chain in different situations. > The situation is distinguished by @val. > > >> You can see in the following patches (or grep the kernel) that people are >> using >> this identification parameter to determine which kind of OOPS trigger >> the callback to condition the execution of the function to specific >> cases. > > Could you please show me some existing code for case (b)? > I am not able to find any except in your patches. > Hi Petr, thanks for the examples - I agree with you. In the end, seems I'm kind of abusing the API. This id is used to distinguish different situations in which the callback is called, but in the same "realm"/notifier list. In my case I have different list calling the same callback and (ab-)using the id to make distinction. I can rework the patches using pointer comparison, it's fine =) So, I'll drop this patch in V2. > Anyway, the solution in 16th patch is bad, definitely. > hv_die_panic_notify_crash() uses "val" to disinguish > both: > > + "panic_notifier_list" vs "die_chain" > + die_val when callen via "die_chain" > > The API around "die_chain" API is not aware of enum panic_notifier_val > and the API using "panic_notifier_list" is not aware of enum die_val. > As I said, it is mixing apples and oranges and it is error prone. > OK, I'll re-work that patch - there's more there to be changed, that one is complex heheh Cheers!
Re: [PATCH 2/2] powerpc/perf: Fix the threshold compare group constraint for power9
> On 06-May-2022, at 11:40 AM, Kajol Jain wrote: > > Thresh compare bits for a event is used to program thresh compare > field in Monitor Mode Control Register A (MMCRA: 9-18 bits for power9). > When scheduling events as a group, all events in that group should > match value in threshold bits (like thresh compare, thresh control, > thresh select). Otherwise event open for the sibling events should fail. > But in the current code, incase thresh compare bits are not valid, > we are not failing in group_constraint function which can result > in invalid group schduling. > > Fix the issue by returning -1 incase event is threshold and threshold > compare value is not valid. > > Thresh control bits in the event code is used to program thresh_ctl > field in Monitor Mode Control Register A (MMCRA: 48-55). In below example, > the scheduling of group events PM_MRK_INST_CMPL (873534401e0) and > PM_THRESH_MET (8734340101ec) is expected to fail as both event > request different thresh control bits and invalid thresh compare value. > > Result before the patch changes: > > [command]# perf stat -e "{r8735340401e0,r8734340101ec}" sleep 1 > > Performance counter stats for 'sleep 1': > >11,048 r8735340401e0 > 1,967 r8734340101ec > > 1.001354036 seconds time elapsed > > 0.001421000 seconds user > 0.0 seconds sys > > Result after the patch changes: > > [command]# perf stat -e "{r8735340401e0,r8734340101ec}" sleep 1 > Error: > The sys_perf_event_open() syscall returned with 22 (Invalid argument) > for event (r8735340401e0). > /bin/dmesg | grep -i perf may provide additional information. > > Fixes: 78a16d9fc1206 ("powerpc/perf: Avoid FAB_*_MATCH checks for power9") > Signed-off-by: Kajol Jain Reviewed-by: Athira Rajeev Thanks Athira > --- > arch/powerpc/perf/isa207-common.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/perf/isa207-common.c > b/arch/powerpc/perf/isa207-common.c > index 013b06af6fe6..bb5d64862bc9 100644 > --- a/arch/powerpc/perf/isa207-common.c > +++ b/arch/powerpc/perf/isa207-common.c > @@ -508,7 +508,8 @@ int isa207_get_constraint(u64 event, unsigned long > *maskp, unsigned long *valp, > if (event_is_threshold(event) && is_thresh_cmp_valid(event)) { > mask |= CNST_THRESH_MASK; > value |= CNST_THRESH_VAL(event >> EVENT_THRESH_SHIFT); > - } > + } else if (event_is_threshold(event)) > + return -1; > } else { > /* >* Special case for PM_MRK_FAB_RSP_MATCH and > PM_MRK_FAB_RSP_MATCH_CYC, > -- > 2.31.1 >
Re: [PATCH] powerpc/irq: Remove arch_local_irq_restore() for !CONFIG_CC_HAS_ASM_GOTO
Christophe Leroy writes: > All supported versions of GCC support asm goto. I thought clang was the one that only recently added support for asm goto. Apparently clang added support in 2019, in clang 9. The earliest clang we claim to support is 11. So this patch is good, I'll just adjust the change log to say GCC/clang. cheers > diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c > index c768cde03e36..dd09919c3c66 100644 > --- a/arch/powerpc/kernel/irq.c > +++ b/arch/powerpc/kernel/irq.c > @@ -216,7 +216,6 @@ static inline void replay_soft_interrupts_irqrestore(void) > #define replay_soft_interrupts_irqrestore() replay_soft_interrupts() > #endif > > -#ifdef CONFIG_CC_HAS_ASM_GOTO > notrace void arch_local_irq_restore(unsigned long mask) > { > unsigned char irq_happened; > @@ -312,82 +311,6 @@ notrace void arch_local_irq_restore(unsigned long mask) > __hard_irq_enable(); > preempt_enable(); > } > -#else > -notrace void arch_local_irq_restore(unsigned long mask) > -{ > - unsigned char irq_happened; > - > - /* Write the new soft-enabled value */ > - irq_soft_mask_set(mask); > - if (mask) > - return; > - > - if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG)) > - WARN_ON_ONCE(in_nmi() || in_hardirq()); > - > - /* > - * From this point onward, we can take interrupts, preempt, > - * etc... unless we got hard-disabled. We check if an event > - * happened. If none happened, we know we can just return. > - * > - * We may have preempted before the check below, in which case > - * we are checking the "new" CPU instead of the old one. This > - * is only a problem if an event happened on the "old" CPU. > - * > - * External interrupt events will have caused interrupts to > - * be hard-disabled, so there is no problem, we > - * cannot have preempted. > - */ > - irq_happened = get_irq_happened(); > - if (!irq_happened) { > - if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG)) > - WARN_ON_ONCE(!(mfmsr() & MSR_EE)); > - return; > - } > - > - /* We need to hard disable to replay. */ > - if (!(irq_happened & PACA_IRQ_HARD_DIS)) { > - if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG)) > - WARN_ON_ONCE(!(mfmsr() & MSR_EE)); > - __hard_irq_disable(); > - local_paca->irq_happened |= PACA_IRQ_HARD_DIS; > - } else { > - /* > - * We should already be hard disabled here. We had bugs > - * where that wasn't the case so let's dbl check it and > - * warn if we are wrong. Only do that when IRQ tracing > - * is enabled as mfmsr() can be costly. > - */ > - if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG)) { > - if (WARN_ON_ONCE(mfmsr() & MSR_EE)) > - __hard_irq_disable(); > - } > - > - if (irq_happened == PACA_IRQ_HARD_DIS) { > - local_paca->irq_happened = 0; > - __hard_irq_enable(); > - return; > - } > - } > - > - /* > - * Disable preempt here, so that the below preempt_enable will > - * perform resched if required (a replayed interrupt may set > - * need_resched). > - */ > - preempt_disable(); > - irq_soft_mask_set(IRQS_ALL_DISABLED); > - trace_hardirqs_off(); > - > - replay_soft_interrupts_irqrestore(); > - local_paca->irq_happened = 0; > - > - trace_hardirqs_on(); > - irq_soft_mask_set(IRQS_ENABLED); > - __hard_irq_enable(); > - preempt_enable(); > -} > -#endif > EXPORT_SYMBOL(arch_local_irq_restore); > > /* > -- > 2.35.1
Re: [PATCH v1 0/4] Kill the time spent in patch_instruction()
Christophe Leroy writes: > Le 15/05/2022 à 12:28, Michael Ellerman a écrit : >> On Tue, 22 Mar 2022 16:40:17 +0100, Christophe Leroy wrote: >>> This series reduces by 70% the time required to activate >>> ftrace on an 8xx with CONFIG_STRICT_KERNEL_RWX. >>> >>> Measure is performed in function ftrace_replace_code() using mftb() >>> around the loop. >>> >>> With the series, >>> - Without CONFIG_STRICT_KERNEL_RWX, 416000 TB ticks are measured. >>> - With CONFIG_STRICT_KERNEL_RWX, 546000 TB ticks are measured. >>> >>> [...] >> >> Patches 1, 3 and 4 applied to powerpc/next. >> >> [1/4] powerpc/code-patching: Don't call is_vmalloc_or_module_addr() without >> CONFIG_MODULES >> >> https://git.kernel.org/powerpc/c/cb3ac45214c03852430979a43180371a44b74596 >> [3/4] powerpc/code-patching: Use jump_label for testing freed initmem >> >> https://git.kernel.org/powerpc/c/b033767848c4115e486b1a51946de3bee2ac0fa6 >> [4/4] powerpc/code-patching: Use jump_label to check if poking_init() is done >> >> https://git.kernel.org/powerpc/c/1751289268ef959db68b0b6f798d904d6403309a >> > > Patch 2 was the keystone of this series. What happened to it ? It broke on 64-bit. I think I know why but I haven't had time to test it. Will try and get it fixed in the next day or two. cheers
Re: [PATCH] powerpc/vdso: Fix incorrect CFI in gettimeofday.S
"Naveen N. Rao" writes: > Michael Ellerman wrote: >> >> diff --git a/arch/powerpc/kernel/vdso/gettimeofday.S >> b/arch/powerpc/kernel/vdso/gettimeofday.S >> index eb9c81e1c218..0aee255e9cbb 100644 >> --- a/arch/powerpc/kernel/vdso/gettimeofday.S >> +++ b/arch/powerpc/kernel/vdso/gettimeofday.S >> @@ -22,12 +22,15 @@ >> .macro cvdso_call funct call_time=0 >>.cfi_startproc >> PPC_STLUr1, -PPC_MIN_STKFRM(r1) >> + .cfi_adjust_cfa_offset PPC_MIN_STKFRM >> mflrr0 >> - .cfi_register lr, r0 >> PPC_STLUr1, -PPC_MIN_STKFRM(r1) >> + .cfi_adjust_cfa_offset PPC_MIN_STKFRM >> PPC_STL r0, PPC_MIN_STKFRM + PPC_LR_STKOFF(r1) > > > >> @@ -46,6 +50,7 @@ >> mtlrr0 >>.cfi_restore lr >> addir1, r1, 2 * PPC_MIN_STKFRM >> + .cfi_def_cfa_offset 0 > > Should this be .cfi_adjust_cfa_offset, given that we used that at the > start of the function? AIUI "adjust x" is offset += x, whereas "def x" is offset = x. So we could use adjust here, but we'd need to adjust by -(2 * PPC_MIN_STKFRM). It seemed clearer to just set the offset back to 0, which is what it is at the start of the function. But I'm not a CFI expert at all, so I'll defer to anyone else who has an opinion :) cheers
Re: [PATCH -next] powerpc/book3e: Fix build error
Le 17/05/2022 à 11:48, YueHaibing a écrit : > arch/powerpc/mm/nohash/fsl_book3e.c: In function ‘relocate_init’: > arch/powerpc/mm/nohash/fsl_book3e.c:348:2: error: implicit declaration of > function ‘early_get_first_memblock_info’ > [-Werror=implicit-function-declaration] >early_get_first_memblock_info(__va(dt_ptr), ); >^ > > Add missing include file linux/of_fdt.h to fix this. > > Signed-off-by: YueHaibing Thats for fixing that. Reviewed-by: Christophe Leroy It means we don't have any defconfig for 32 bits booke with CONFIG_RELOCATABLE ? > --- > arch/powerpc/mm/nohash/fsl_book3e.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/arch/powerpc/mm/nohash/fsl_book3e.c > b/arch/powerpc/mm/nohash/fsl_book3e.c > index 08a984e29433..093da4dbdee1 100644 > --- a/arch/powerpc/mm/nohash/fsl_book3e.c > +++ b/arch/powerpc/mm/nohash/fsl_book3e.c > @@ -36,6 +36,7 @@ > #include > #include > #include > +#include > > #include > #include
Re: [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]
Baoquan He wrote: On 04/25/22 at 11:11pm, Naveen N. Rao wrote: kexec_load_purgatory() can fail for many reasons - there is no need to print an error when encountering unsupported relocations. This solves a build issue on powerpc with binutils v2.36 and newer [1]. Since commit d1bcae833b32f1 ("ELF: Don't generate unused section symbols") [2], binutils started dropping section symbols that it thought I am not familiar with binutils, while wondering if this exists in other ARCHes except of ppc. Arm64 doesn't have the ARCH override either, do we have problem with it? I'm not aware of this specific file causing a problem on other architectures - perhaps the config options differ enough. There are however more reports of similar issues affecting other architectures with the llvm integrated assembler: https://github.com/ClangBuiltLinux/linux/issues/981 were unused. This isn't an issue in general, but with kexec_file.c, gcc is placing kexec_arch_apply_relocations[_add] into a separate .text.unlikely section and the section symbol ".text.unlikely" is being dropped. Due to this, recordmcount is unable to find a non-weak symbol But arch_kexec_apply_relocations_add is weak symbol on ppc. Yes. Note that it is just the section symbol that gets dropped. The section is still present and will continue to hold the symbols for the functions themselves. in .text.unlikely to generate a relocation record against. Dropping pr_err() calls results in these functions being left in .text section, Why dropping pr_err() can make arch_kexec_apply_relocations_add put in .text? I'm not actually sure, though Josh suspected that printk() might be cold: http://lkml.kernel.org/r/20210214155147.3owdimqv2lyhu6by@treble - Naveen
Re: [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]
On 04/25/22 at 11:11pm, Naveen N. Rao wrote: > kexec_load_purgatory() can fail for many reasons - there is no need to > print an error when encountering unsupported relocations. > > This solves a build issue on powerpc with binutils v2.36 and newer [1]. > Since commit d1bcae833b32f1 ("ELF: Don't generate unused section > symbols") [2], binutils started dropping section symbols that it thought I am not familiar with binutils, while wondering if this exists in other ARCHes except of ppc. Arm64 doesn't have the ARCH override either, do we have problem with it? > were unused. This isn't an issue in general, but with kexec_file.c, gcc > is placing kexec_arch_apply_relocations[_add] into a separate > .text.unlikely section and the section symbol ".text.unlikely" is being > dropped. Due to this, recordmcount is unable to find a non-weak symbol But arch_kexec_apply_relocations_add is weak symbol on ppc. > in .text.unlikely to generate a relocation record against. Dropping > pr_err() calls results in these functions being left in .text section, Why dropping pr_err() can make arch_kexec_apply_relocations_add put in .text? > enabling recordmcount to emit a proper relocation record. > > [1] https://github.com/linuxppc/issues/issues/388 > [2] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d1bcae833b32f1 > > Signed-off-by: Naveen N. Rao > --- > kernel/kexec_file.c | 2 -- > 1 file changed, 2 deletions(-) > > diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c > index 8347fc158d2b96..55d144c58b5278 100644 > --- a/kernel/kexec_file.c > +++ b/kernel/kexec_file.c > @@ -121,7 +121,6 @@ int __weak > arch_kexec_apply_relocations_add(struct purgatory_info *pi, Elf_Shdr > *section, >const Elf_Shdr *relsec, const Elf_Shdr *symtab) > { > - pr_err("RELA relocation unsupported.\n"); > return -ENOEXEC; > } > > @@ -138,7 +137,6 @@ int __weak > arch_kexec_apply_relocations(struct purgatory_info *pi, Elf_Shdr *section, >const Elf_Shdr *relsec, const Elf_Shdr *symtab) > { > - pr_err("REL relocation unsupported.\n"); > return -ENOEXEC; > } > > > base-commit: 83d8a0d166119de813cad27ae7d61f54f9aea707 > -- > 2.35.1 >
[Bug 215389] pagealloc: memory corruption at building glibc-2.33 and running its' testsuite
https://bugzilla.kernel.org/show_bug.cgi?id=215389 --- Comment #26 from Christophe Leroy (christophe.le...@csgroup.eu) --- Note that THREAD_SHIFT is set to 14 when using KASAN: config THREAD_SHIFT int "Thread shift" if EXPERT range 13 15 default "15" if PPC_256K_PAGES default "14" if PPC64 default "14" if KASAN default "13" help Used to define the stack size. The default is almost always what you want. Only change this if you know what you are doing. -- You may reply to this email to add a comment. You are receiving this mail because: You are watching the assignee of the bug.
[Bug 215389] pagealloc: memory corruption at building glibc-2.33 and running its' testsuite
https://bugzilla.kernel.org/show_bug.cgi?id=215389 --- Comment #25 from Christophe Leroy (christophe.le...@csgroup.eu) --- The Kernel stack overflow looks odd. Value of R1 is wrong and LR is NULL. Don't know how we ended up here, but probably not by a real stack overflow. -- You may reply to this email to add a comment. You are receiving this mail because: You are watching the assignee of the bug.
[Bug 215389] pagealloc: memory corruption at building glibc-2.33 and running its' testsuite
https://bugzilla.kernel.org/show_bug.cgi?id=215389 --- Comment #24 from Christophe Leroy (christophe.le...@csgroup.eu) --- Seems like with Inline KASAN your kernel is far too big compared to what we support at the time being: c2468000 T __end_rodata c280 T __init_begin c280 T _sinittext c2801644 T prom_init The init text section is behind the 32Mbytes boundary, it means that prom_init and other functions are not called anymore directly but via a trampoline. c00c <__start>: c00c: 2c 05 00 00 cmpwi r5,0 c010: 41 82 00 1c beq c02c <__start+0x20> c014: 42 9f 00 05 bcl 20,4*cr7+so,c018 <__start+0xc> c018: 7d 08 02 a6 mflrr8 c01c: 3d 08 00 00 addis r8,r8,0 c020: 39 08 ff e8 addir8,r8,-24 c024: 48 00 38 e5 bl c0003908 ... c0003908: 3d 80 c2 80 lis r12,-15744 c000390c: 39 8c 16 44 addir12,r12,5700 c0003910: 7d 89 03 a6 mtctr r12 c0003914: 4e 80 04 20 bctr And it cannot work because at that time the kernel is not yet relocated to its final location. There was the same problem with PPC64 and it was fix by 24d33ac5b8ff ("powerpc/64s: Make prom_init require RELOCATABLE"). Don't know if a similar approach could work. -- You may reply to this email to add a comment. You are receiving this mail because: You are watching the assignee of the bug.
Re: [PATCH 3/3] [RFC] powerpc: Book3S 64-bit outline-only KASAN support
Le 17/05/2022 à 09:31, Paul Mackerras a écrit : > On Sun, May 15, 2022 at 07:33:52AM +, Christophe Leroy wrote: >> >> >> Le 11/05/2022 à 09:28, Paul Mackerras a écrit : >>> From: Daniel Axtens >>> >>> Implement a limited form of KASAN for Book3S 64-bit machines running under >>> the Radix MMU, supporting only outline mode. > > [snip] > >>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig >>> index b779603978e1..0bf76e40c9c2 100644 >>> --- a/arch/powerpc/Kconfig >>> +++ b/arch/powerpc/Kconfig >>> @@ -105,6 +105,7 @@ config PPC >>> # Please keep this list sorted alphabetically. >>> # >>> select ARCH_32BIT_OFF_T if PPC32 >>> + select ARCH_DISABLE_KASAN_INLINEif PPC_RADIX_MMU >>> select ARCH_ENABLE_MEMORY_HOTPLUG >>> select ARCH_ENABLE_MEMORY_HOTREMOVE >>> select ARCH_HAS_COPY_MC if PPC64 >>> @@ -152,6 +153,7 @@ config PPC >>> select ARCH_WANT_IPC_PARSE_VERSION >>> select ARCH_WANT_IRQS_OFF_ACTIVATE_MM >>> select ARCH_WANT_LD_ORPHAN_WARN >>> + select ARCH_WANTS_NO_INSTR >> >> Can you explain why we need that ? > > The help text for the option says: > > An architecture should select this if the noinstr macro is being used on > functions to denote that the toolchain should avoid instrumenting such > functions and is required for correctness. > > All it really seems to do is restrict the conditions under which the > GCOV and KCOV options can be enabled. > >> Is it tied to KASAN ? Is yes, why didn't we have it for PPC32 until now ? > > Since we really do need to avoid instrumenting certain functions on > ppc64 (as in things will break if we do instrument them), I think we > need to select ARCH_WANTS_NO_INSTR. > > For ppc32, as far as I recall there is much less code that runs in > real mode and it is mostly assembler (except for some boot code), > mostly because all addresses have to be explicitly translated to > physical addresses for 32-bit real-mode code, unlike ppc64 where we > can use access memory in the linear mapping using virtual addresses > because of the fact that the CPU ignores the top 4 bits of the > effective address in real mode. That said, there is a lot less code > that runs in real mode on ppc64 than there used to be. > >> Maybe that's independant of KASAN and would be worth a separate patch ? > > Yes, possibly, though KASAN does appear to be the only user of noinstr > in arch/powerpc. > > [snip] > >>> diff --git a/arch/powerpc/include/asm/book3s/64/radix.h >>> b/arch/powerpc/include/asm/book3s/64/radix.h >>> index d090d9612348..bafc9869afcd 100644 >>> --- a/arch/powerpc/include/asm/book3s/64/radix.h >>> +++ b/arch/powerpc/include/asm/book3s/64/radix.h >>> @@ -35,6 +35,11 @@ >>>#define RADIX_PMD_SHIFT (PAGE_SHIFT + RADIX_PTE_INDEX_SIZE) >>>#define RADIX_PUD_SHIFT (RADIX_PMD_SHIFT + RADIX_PMD_INDEX_SIZE) >>>#define RADIX_PGD_SHIFT (RADIX_PUD_SHIFT + RADIX_PUD_INDEX_SIZE) >>> + >>> +#define R_PTRS_PER_PTE (1 << RADIX_PTE_INDEX_SIZE) >>> +#define R_PTRS_PER_PMD (1 << RADIX_PMD_INDEX_SIZE) >>> +#define R_PTRS_PER_PUD (1 << RADIX_PUD_INDEX_SIZE) >>> + >>>/* >>> * Size of EA range mapped by our pagetables. >>> */ >>> @@ -68,11 +73,11 @@ >>> * >>> * >>> * 3rd quadrant expanded: >>> - * +--+ >>> + * +--+ Highest address (0xc010) >>> + * +--+ KASAN shadow end (0xc00fc000) >>> * | | >>> * | | >>> - * | | >>> - * +--+ Kernel vmemmap end >>> (0xc010) >>> + * +--+ Kernel vmemmap end/shadow start >>> (0xc00e) >>> * | | >>> * | 512TB | >>> * | | >>> @@ -126,6 +131,8 @@ >>>#define RADIX_VMEMMAP_SIZE RADIX_KERN_MAP_SIZE >>>#define RADIX_VMEMMAP_END(RADIX_VMEMMAP_START + >>> RADIX_VMEMMAP_SIZE) >>> >>> +/* For the sizes of the shadow area, see kasan.h */ >>> + >> >> Why does this comment pops up here ? > > Do you mean, why is it there at all, or do you mean why is it in that > particular place rather than closer to the "3rd quadrant expanded" > diagram or something? Why at this place mainly, I mean I can't see the relationship between the added comment and the following lines starting with a #ifndef ASSEMBLY. If the comment is to be usefull, it should be added to a related place I guess. > >>>#ifndef __ASSEMBLY__ >>>#define RADIX_PTE_TABLE_SIZE (sizeof(pte_t) << RADIX_PTE_INDEX_SIZE) >>>#define RADIX_PMD_TABLE_SIZE (sizeof(pmd_t) << RADIX_PMD_INDEX_SIZE) >>> diff --git a/arch/powerpc/include/asm/interrupt.h >>> b/arch/powerpc/include/asm/interrupt.h >>> index
Re: [PATCH] powerpc/vdso: Fix incorrect CFI in gettimeofday.S
Michael Ellerman wrote: diff --git a/arch/powerpc/kernel/vdso/gettimeofday.S b/arch/powerpc/kernel/vdso/gettimeofday.S index eb9c81e1c218..0aee255e9cbb 100644 --- a/arch/powerpc/kernel/vdso/gettimeofday.S +++ b/arch/powerpc/kernel/vdso/gettimeofday.S @@ -22,12 +22,15 @@ .macro cvdso_call funct call_time=0 .cfi_startproc PPC_STLUr1, -PPC_MIN_STKFRM(r1) + .cfi_adjust_cfa_offset PPC_MIN_STKFRM mflrr0 - .cfi_register lr, r0 PPC_STLUr1, -PPC_MIN_STKFRM(r1) + .cfi_adjust_cfa_offset PPC_MIN_STKFRM PPC_STL r0, PPC_MIN_STKFRM + PPC_LR_STKOFF(r1) @@ -46,6 +50,7 @@ mtlrr0 .cfi_restore lr addir1, r1, 2 * PPC_MIN_STKFRM + .cfi_def_cfa_offset 0 Should this be .cfi_adjust_cfa_offset, given that we used that at the start of the function? - Naveen
Re: [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]
Naveen N. Rao wrote: kexec_load_purgatory() can fail for many reasons - there is no need to print an error when encountering unsupported relocations. This solves a build issue on powerpc with binutils v2.36 and newer [1]. Since commit d1bcae833b32f1 ("ELF: Don't generate unused section symbols") [2], binutils started dropping section symbols that it thought were unused. This isn't an issue in general, but with kexec_file.c, gcc is placing kexec_arch_apply_relocations[_add] into a separate .text.unlikely section and the section symbol ".text.unlikely" is being dropped. Due to this, recordmcount is unable to find a non-weak symbol in .text.unlikely to generate a relocation record against. Dropping pr_err() calls results in these functions being left in .text section, enabling recordmcount to emit a proper relocation record. [1] https://github.com/linuxppc/issues/issues/388 [2] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d1bcae833b32f1 Signed-off-by: Naveen N. Rao --- kernel/kexec_file.c | 2 -- 1 file changed, 2 deletions(-) Eric, Any comments on this? There have been many reports of build breakages due to this. FWIW, there have been similar fixes in the past: - https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/init/initramfs.c?id=55d5b7dd6451b58489ce384282ca5a4a289eb8d5 - https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/linux/elfcore.h?id=6e7b64b9dd6d96537d816ea07ec26b7dedd397b9 - Naveen
Re: [PATCH 3/3] [RFC] powerpc: Book3S 64-bit outline-only KASAN support
On Sun, May 15, 2022 at 07:33:52AM +, Christophe Leroy wrote: > > > Le 11/05/2022 à 09:28, Paul Mackerras a écrit : > > From: Daniel Axtens > > > > Implement a limited form of KASAN for Book3S 64-bit machines running under > > the Radix MMU, supporting only outline mode. [snip] > > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig > > index b779603978e1..0bf76e40c9c2 100644 > > --- a/arch/powerpc/Kconfig > > +++ b/arch/powerpc/Kconfig > > @@ -105,6 +105,7 @@ config PPC > > # Please keep this list sorted alphabetically. > > # > > select ARCH_32BIT_OFF_T if PPC32 > > + select ARCH_DISABLE_KASAN_INLINEif PPC_RADIX_MMU > > select ARCH_ENABLE_MEMORY_HOTPLUG > > select ARCH_ENABLE_MEMORY_HOTREMOVE > > select ARCH_HAS_COPY_MC if PPC64 > > @@ -152,6 +153,7 @@ config PPC > > select ARCH_WANT_IPC_PARSE_VERSION > > select ARCH_WANT_IRQS_OFF_ACTIVATE_MM > > select ARCH_WANT_LD_ORPHAN_WARN > > + select ARCH_WANTS_NO_INSTR > > Can you explain why we need that ? The help text for the option says: An architecture should select this if the noinstr macro is being used on functions to denote that the toolchain should avoid instrumenting such functions and is required for correctness. All it really seems to do is restrict the conditions under which the GCOV and KCOV options can be enabled. > Is it tied to KASAN ? Is yes, why didn't we have it for PPC32 until now ? Since we really do need to avoid instrumenting certain functions on ppc64 (as in things will break if we do instrument them), I think we need to select ARCH_WANTS_NO_INSTR. For ppc32, as far as I recall there is much less code that runs in real mode and it is mostly assembler (except for some boot code), mostly because all addresses have to be explicitly translated to physical addresses for 32-bit real-mode code, unlike ppc64 where we can use access memory in the linear mapping using virtual addresses because of the fact that the CPU ignores the top 4 bits of the effective address in real mode. That said, there is a lot less code that runs in real mode on ppc64 than there used to be. > Maybe that's independant of KASAN and would be worth a separate patch ? Yes, possibly, though KASAN does appear to be the only user of noinstr in arch/powerpc. [snip] > > diff --git a/arch/powerpc/include/asm/book3s/64/radix.h > > b/arch/powerpc/include/asm/book3s/64/radix.h > > index d090d9612348..bafc9869afcd 100644 > > --- a/arch/powerpc/include/asm/book3s/64/radix.h > > +++ b/arch/powerpc/include/asm/book3s/64/radix.h > > @@ -35,6 +35,11 @@ > > #define RADIX_PMD_SHIFT (PAGE_SHIFT + RADIX_PTE_INDEX_SIZE) > > #define RADIX_PUD_SHIFT (RADIX_PMD_SHIFT + RADIX_PMD_INDEX_SIZE) > > #define RADIX_PGD_SHIFT (RADIX_PUD_SHIFT + RADIX_PUD_INDEX_SIZE) > > + > > +#define R_PTRS_PER_PTE (1 << RADIX_PTE_INDEX_SIZE) > > +#define R_PTRS_PER_PMD (1 << RADIX_PMD_INDEX_SIZE) > > +#define R_PTRS_PER_PUD (1 << RADIX_PUD_INDEX_SIZE) > > + > > /* > >* Size of EA range mapped by our pagetables. > >*/ > > @@ -68,11 +73,11 @@ > >* > >* > >* 3rd quadrant expanded: > > - * +--+ > > + * +--+ Highest address (0xc010) > > + * +--+ KASAN shadow end (0xc00fc000) > >* | | > >* | | > > - * | | > > - * +--+ Kernel vmemmap end > > (0xc010) > > + * +--+ Kernel vmemmap end/shadow start > > (0xc00e) > >* | | > >* | 512TB| > >* | | > > @@ -126,6 +131,8 @@ > > #define RADIX_VMEMMAP_SIZERADIX_KERN_MAP_SIZE > > #define RADIX_VMEMMAP_END (RADIX_VMEMMAP_START + RADIX_VMEMMAP_SIZE) > > > > +/* For the sizes of the shadow area, see kasan.h */ > > + > > Why does this comment pops up here ? Do you mean, why is it there at all, or do you mean why is it in that particular place rather than closer to the "3rd quadrant expanded" diagram or something? > > #ifndef __ASSEMBLY__ > > #define RADIX_PTE_TABLE_SIZE (sizeof(pte_t) << RADIX_PTE_INDEX_SIZE) > > #define RADIX_PMD_TABLE_SIZE (sizeof(pmd_t) << RADIX_PMD_INDEX_SIZE) > > diff --git a/arch/powerpc/include/asm/interrupt.h > > b/arch/powerpc/include/asm/interrupt.h > > index fc28f46d2f9d..fb244b6ca7f0 100644 > > --- a/arch/powerpc/include/asm/interrupt.h > > +++ b/arch/powerpc/include/asm/interrupt.h > > @@ -327,22 +327,46 @@ static inline void interrupt_nmi_enter_prepare(struct > > pt_regs *regs, struct inte > > } > > #endif > > > > + /* If data relocations are enabled, it's safe to use nmi_enter() */ > > + if (mfmsr() & MSR_DR) { > > + nmi_enter(); > > +
Re: [PATCH v1 0/4] Kill the time spent in patch_instruction()
Le 15/05/2022 à 12:28, Michael Ellerman a écrit : > On Tue, 22 Mar 2022 16:40:17 +0100, Christophe Leroy wrote: >> This series reduces by 70% the time required to activate >> ftrace on an 8xx with CONFIG_STRICT_KERNEL_RWX. >> >> Measure is performed in function ftrace_replace_code() using mftb() >> around the loop. >> >> With the series, >> - Without CONFIG_STRICT_KERNEL_RWX, 416000 TB ticks are measured. >> - With CONFIG_STRICT_KERNEL_RWX, 546000 TB ticks are measured. >> >> [...] > > Patches 1, 3 and 4 applied to powerpc/next. > > [1/4] powerpc/code-patching: Don't call is_vmalloc_or_module_addr() without > CONFIG_MODULES > > https://git.kernel.org/powerpc/c/cb3ac45214c03852430979a43180371a44b74596 > [3/4] powerpc/code-patching: Use jump_label for testing freed initmem > > https://git.kernel.org/powerpc/c/b033767848c4115e486b1a51946de3bee2ac0fa6 > [4/4] powerpc/code-patching: Use jump_label to check if poking_init() is done > > https://git.kernel.org/powerpc/c/1751289268ef959db68b0b6f798d904d6403309a > Patch 2 was the keystone of this series. What happened to it ? Christophe