Re: Linux kernel: powerpc: KVM guest can trigger host crash on Power8
John Paul Adrian Glaubitz writes:
> Hi Michael!
>
> On 10/28/21 08:39, Michael Ellerman wrote:
>> That completed fine on my BE VM here.
>>
>> I ran these in two tmux windows:
>>   $ sbuild -d sid --arch=powerpc --no-arch-all gcc-11_11.2.0-10.dsc
>>   $ sbuild -d sid --arch=ppc64 --no-arch-all gcc-11_11.2.0-10.dsc
>
> Could you try gcc-10 instead? Its testsuite has crashed the host for me
> with a patched kernel twice now.
>
> $ dget -u https://deb.debian.org/debian/pool/main/g/gcc-10/gcc-10_10.3.0-12.dsc
> $ sbuild -d sid --arch=powerpc --no-arch-all gcc-10_10.3.0-12.dsc
> $ sbuild -d sid --arch=ppc64 --no-arch-all gcc-10_10.3.0-12.dsc

Sure, will give that a try.

I was able to crash my machine over the weekend, building openjdk, but I
haven't been able to reproduce it for ~24 hours now (I didn't change
anything).

Can you try running your guests with no SMT threads?

I think one of your guests was using:

  -smp 32,sockets=1,dies=1,cores=8,threads=4

Can you change that to:

  -smp 8,sockets=1,dies=1,cores=8,threads=1

And something similar for the other guest(s).

If the system is stable with those settings that would be useful
information, and would also mean you could use the system without it
crashing semi-regularly.

cheers
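For reference, the vCPU counts in the two -smp lines above match the
product of their topology fields: 1 * 1 * 8 * 4 = 32 with SMT4 and
1 * 1 * 8 * 1 = 8 without SMT. A minimal C sketch of that arithmetic
follows; the struct and function names are made up for illustration and
are not taken from QEMU.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical mirror of the fields in
 * "-smp N,sockets=S,dies=D,cores=C,threads=T". Not a QEMU type.
 */
struct smp_topology {
	unsigned int sockets;
	unsigned int dies;
	unsigned int cores;
	unsigned int threads;
};

/* Total vCPUs implied by a topology: sockets * dies * cores * threads. */
unsigned int smp_vcpus(const struct smp_topology *t)
{
	assert(t != NULL);
	return t->sockets * t->dies * t->cores * t->threads;
}
```

Filling the struct with {1, 1, 8, 4} gives the SMT4 guest and
{1, 1, 8, 1} gives the suggested no-SMT guest.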
Re: [PATCH v2 12/45] csky: Use do_kernel_power_off()
Only for this patch, Acked-by: Guo Ren

On Thu, Oct 28, 2021 at 5:18 AM Dmitry Osipenko wrote:
>
> Kernel now supports chained power-off handlers. Use do_kernel_power_off()
> that invokes chained power-off handlers. It also invokes legacy
> pm_power_off() for now, which will be removed once all drivers will
> be converted to the new power-off API.
>
> Signed-off-by: Dmitry Osipenko
> ---
>  arch/csky/kernel/power.c | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/arch/csky/kernel/power.c b/arch/csky/kernel/power.c
> index 923ee4e381b8..86ee202906f8 100644
> --- a/arch/csky/kernel/power.c
> +++ b/arch/csky/kernel/power.c
> @@ -9,16 +9,14 @@ EXPORT_SYMBOL(pm_power_off);
>  void machine_power_off(void)
>  {
>         local_irq_disable();
> -       if (pm_power_off)
> -               pm_power_off();
> +       do_kernel_power_off();
>         asm volatile ("bkpt");
>  }
>
>  void machine_halt(void)
>  {
>         local_irq_disable();
> -       if (pm_power_off)
> -               pm_power_off();
> +       do_kernel_power_off();
>         asm volatile ("bkpt");
>  }
>
> --
> 2.33.1
>

--
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/
ppc64le STRICT_MODULE_RWX and livepatch apply_relocate_add() crashes
Starting with 5.14 kernels, I can reliably reproduce a crash [1] on
ppc64le when loading livepatches containing late klp-relocations [2].
These are relocations, specific to livepatching, that are resolved not
when a livepatch module is loaded, but only when a livepatch-target
module is loaded.

There was previously related work by Josh and Peter [3] to simplify a
lot of x86 and s390x code (at the time, the only two arches to
HAVE_LIVEPATCH and STRICT_MODULE_RWX) as part of disallowing writable
executable mappings. Now that Power has STRICT_MODULE_RWX, I think we
will need to consider this architecture as well.

The crash was originally spotted by the external kpatch-build tool [4]
when building its integration tests on rhel-9-beta. It can also be
reproduced by the endless-WIP klp-convert patchset [5], which brings
klp-relocation creation from kpatch-build to the upstream build.

I further verified:

- turning STRICT_MODULE_RWX off resulted in no crash
- alternatively, reverting the following commits resulted in no crash:

    d556e1be3332 ("livepatch: Remove module_disable_ro() usage")
    0d9fbf78fefb ("module: Remove module_disable_ro()")

I haven't started looking at a fix yet, but in the case of the x86 code
update, its apply_relocate_add() implementation was modified to use a
common text_poke() function, which allowed us to drop the
module_{en,dis}able_ro() games in the livepatching code.

I can take a closer look this week, but thought I'd send out a report
in case this may be a known todo for STRICT_MODULE_RWX on Power.
-- Joe

[1] crashing kernel log

[ 84.837986] = TEST: klp-convert symbols =
[ 84.858937] % modprobe test_klp_convert_mod
[ 84.879040] % modprobe test_klp_convert1
[ 84.908056] BUG: Unable to handle kernel data access on write at 0xc008018402f0
[ 84.908067] Faulting instruction address: 0xc0056b58
[ 84.908072] Oops: Kernel access of bad area, sig: 11 [#1]
[ 84.908077] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[ 84.908082] Modules linked in: test_klp_convert1(K+) test_klp_convert_mod bonding tls rfkill pseries_rng drm fuse drm_panel_orientation_quirks xfs libcrc32c sd_mod t10_pi sg ibmvscsi ibmveth scsi_transport_srp vmx_crypto dm_mirror dm_region_hash dm_log dm_mod [last unloaded: test_klp_atomic_replace]
[ 84.908114] CPU: 1 PID: 4205 Comm: modprobe Kdump: loaded Tainted: G K 5.14.0+ #2
[ 84.908121] NIP: c0056b58 LR: c0056b1c CTR: 0009
[ 84.908127] REGS: c0005dce3480 TRAP: 0300 Tainted: G K (5.14.0+)
[ 84.908132] MSR: 8280b033 CR: 24224484 XER:
[ 84.908147] CFAR: c0056a68 DAR: c008018402f0 DSISR: 0a00 IRQMASK: 0
  GPR00: c0056b1c c0005dce3720 c2a2af00
  GPR04: c008018402f0 396b3d62 e98b0020f8410018
  GPR08: 4e8004207d8903a6 8000 c008018382f0 000d
  GPR12: 4000 c7fcf480 c0004d7e2000 c008018706d8
  GPR16: c00801850228 c0004d7e2c00 c10d6248
  GPR20: c298c1c8 c00801860380 c008018706f0 aaab
  GPR24: c0004d7e2b40 c0080187 c0080184005c 008c
  GPR28: c00801860380 c00800770008 c0004d7e2000 c008018402f0
[ 84.908209] NIP [c0056b58] create_stub+0x78/0x240
[ 84.908217] LR [c0056b1c] create_stub+0x3c/0x240
[ 84.908223] Call Trace:
[ 84.908225] [c0005dce3720] [c0004d7e2b40] 0xc0004d7e2b40 (unreliable)
[ 84.908232] [c0005dce37a0] [c0056e0c] stub_for_addr+0xec/0x120
[ 84.908240] [c0005dce37d0] [c0057f14] apply_relocate_add+0x814/0x9a0
[ 84.908247] [c0005dce38d0] [c021ca38] klp_apply_section_relocs+0x208/0x2d0
[ 84.908255] [c0005dce39c0] [c021cb90] klp_init_object_loaded+0x90/0x1d0
[ 84.908262] [c0005dce3a50] [c021d2dc] klp_enable_patch+0x32c/0x540
[ 84.908269] [c0005dce3b10] [c00801840030] test_klp_convert_init+0x28/0x48 [test_klp_convert1]
[ 84.908277] [c0005dce3b30] [c0012230] do_one_initcall+0x60/0x2c0
[ 84.908284] [c0005dce3c00] [c026012c] do_init_module+0x7c/0x3b0
[ 84.908290] [c0005dce3c90] [c0262b74] __do_sys_finit_module+0xd4/0x160
[ 84.908296] [c0005dce3db0] [c0030664] system_call_exception+0x144/0x280
[ 84.908303] [c0005dce3e10] [c000bff0] system_call_vectored_common+0xf0/0x280
[ 84.908310] --- interrupt: 3000 at 0x7fffa06d6b9c
[ 84.908315] NIP: 7fffa06d6b9c LR: CTR:
[ 84.908320] REGS: c0005dce3e80 TRAP: 3000 Tainted: G K (5.14.0+)
[ 84.908325] MSR: 8280f033 CR: 28224244 XER:
[ 84.908340] IRQMASK: 0
  GPR00: 0161 7fffc4f74ad0 7fffa07d7100 0005
  GPR04: 00012a926ca0 000
Re: [PATCH 04/13] nvdimm/btt: use goto error labels on btt_blk_init()
On Fri, Oct 15, 2021 at 4:53 PM Luis Chamberlain wrote:
>
> This will make it easier to share common error paths.
>
> Signed-off-by: Luis Chamberlain
> ---
>  drivers/nvdimm/btt.c | 19 ---
>  1 file changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
> index 29cc7325e890..23ee8c005db5 100644
> --- a/drivers/nvdimm/btt.c
> +++ b/drivers/nvdimm/btt.c
> @@ -1520,10 +1520,11 @@ static int btt_blk_init(struct btt *btt)
>  {
>         struct nd_btt *nd_btt = btt->nd_btt;
>         struct nd_namespace_common *ndns = nd_btt->ndns;
> +       int rc = -ENOMEM;
>
>         btt->btt_disk = blk_alloc_disk(NUMA_NO_NODE);
>         if (!btt->btt_disk)
> -               return -ENOMEM;
> +               goto out;

I tend to not use a goto when there is nothing to unwind.

The rest looks good to me.

After dropping "goto out;" you can add:

Reviewed-by: Dan Williams
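To illustrate the review comment about only using goto when there is
something to unwind, here is a minimal userspace C sketch. It is not the
btt code: struct demo_dev, demo_dev_init() and the fail_second knob are
hypothetical, and malloc() stands in for the kernel allocations.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical device with two resources acquired in order. */
struct demo_dev {
	char *disk;	/* stands in for the first allocation */
	char *queue;	/* stands in for a later allocation */
};

/*
 * fail_second is an illustrative knob that forces the second step to
 * fail so the unwind path can be exercised.
 */
int demo_dev_init(struct demo_dev *dev, int fail_second)
{
	int rc;

	assert(dev != NULL);

	dev->disk = malloc(64);
	if (!dev->disk)
		return -ENOMEM;	/* nothing acquired yet: return directly */

	dev->queue = fail_second ? NULL : malloc(64);
	if (!dev->queue) {
		rc = -ENOMEM;
		goto out_free_disk;	/* first resource must be released */
	}

	return 0;

out_free_disk:
	free(dev->disk);
	dev->disk = NULL;
	return rc;
}
```

The first failure returns directly because no cleanup is pending; the
label only appears once a later step has something to release.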
Re: [PATCH 03/13] nvdimm/btt: do not call del_gendisk() if not needed
On Fri, Oct 15, 2021 at 4:53 PM Luis Chamberlain wrote:
>
> We know we don't need del_gendisk() if we haven't added
> the disk, so just skip it. This should fix a bug on older
> kernels, as del_gendisk() became able to deal with
> disks not added only recently, after the patch titled
> "block: add flag for add_disk() completion notation".

Perhaps put this in:

    commit $abbrev_commit ("block: add flag for add_disk() completion notation")

...format, but I can't seem to find that commit?

If you're touching the changelog how about one that clarifies the
impact and drops "we"?

"del_gendisk() is not required if the disk has not been added. On
kernels prior to commit $abbrev_commit ("block: add flag for
add_disk() completion notation") it is mandatory to not call
del_gendisk() if the underlying device has not been through
device_add()."

Fixes: 41cd8b70c37a ("libnvdimm, btt: add support for blk integrity")

With that you can add:

Reviewed-by: Dan Williams
[powerpc:next] BUILD SUCCESS 81291383ffde08b23bce75e7d6b2575ce9d3475c
                    ecovec24_defconfig
arm                 shannon_defconfig
powerpc             tqm8541_defconfig
sh                  j2_defconfig
arm                 multi_v5_defconfig
mips                allyesconfig
powerpc             powernv_defconfig
powerpc             mpc8560_ads_defconfig
arm                 stm32_defconfig
microblaze          mmu_defconfig
openrisc            or1klitex_defconfig
sh                  sh2007_defconfig
arm                 ixp4xx_defconfig
powerpc             iss476-smp_defconfig
xtensa              smp_lx200_defconfig
arm                 omap1_defconfig
powerpc             katmai_defconfig
powerpc             warp_defconfig
arm                 oxnas_v6_defconfig
powerpc             mpc885_ads_defconfig
powerpc             ppc64e_defconfig
microblaze          defconfig
sh                  edosk7705_defconfig
powerpc             mpc837x_rdb_defconfig
mips                db1xxx_defconfig
sh                  r7785rp_defconfig
ia64                alldefconfig
sh                  urquell_defconfig
arm                 h5000_defconfig
mips                gpr_defconfig
powerpc             ge_imp3a_defconfig
arm64               alldefconfig
powerpc             tqm8548_defconfig
s390                debug_defconfig
m68k                amiga_defconfig
sh                  sh7763rdp_defconfig
powerpc             tqm8560_defconfig
mips                lemote2f_defconfig
arm                 eseries_pxa_defconfig
arm                 footbridge_defconfig
arm                 randconfig-c002-20211031
arm                 randconfig-c002-20211028
ia64                allmodconfig
ia64                defconfig
ia64                allyesconfig
m68k                defconfig
m68k                allmodconfig
m68k                allyesconfig
nios2               defconfig
nds32               allnoconfig
arc                 allyesconfig
nds32               defconfig
alpha               defconfig
alpha               allyesconfig
nios2               allyesconfig
h8300               allyesconfig
arc                 defconfig
sh                  allmodconfig
xtensa              allyesconfig
parisc              defconfig
s390                defconfig
parisc              allyesconfig
s390                allyesconfig
s390                allmodconfig
sparc               allyesconfig
sparc               defconfig
i386                defconfig
i386                debian-10.3
i386                allyesconfig
mips                allmodconfig
powerpc             allnoconfig
powerpc             allyesconfig
powerpc             allmodconfig
x86_64              randconfig-a002-20211028
x86_64              randconfig-a004-20211028
x86_64              randconfig-a005-20211028
x86_64              randconfig-a001-20211028
x86_64              randconfig-a006-20211028
x86_64              randconfig-a003-20211028
i386                randconfig-a004-20211028
i386                randconfig-a003-20211028
i386                randconfig-a002-20211028
i386                randconfig-a006-20211028
i386                randconfig-a001-20211028
i386                randconfig-a005-20211028
i386                randconfig-a003-20211031
i386                randconfig-a006-20211031
i386                randconfig-a002-20211031
i386                randconfig-a005-20211031
i386                randconfig-a001-20211031
i386                randconfig-a004-20211031
x86_64              randconfig-a015-20211029
x86_64              randconfig-a013-20211029
x86_64              randconfig-a011-20211029
x86_64              randconfig-a014-20211029
x86_64              randconfig-a012-20211029
x86_64              randconfig-a016-20211029
i386                randconfig-a012-20211029
i386                randconfig-a013-20211029
i386                randconfig-a011-20211029
i386                randconfig-a015-20211029
i386                randconfig-a016-20211029
i386                randconfig-a014-20211029
x86_64              randconfig-a005-20211031
x86_64              randconfig-a004-20211031
x86_64              randconfig-a002-20211031
x86_64              randconfig-a003-20211031
x86_64              randconfig-a001-20211031
x86_64              randconfig-a006