Nicholas Piggin <npig...@gmail.com> writes:
> The early machine check runs in real mode, so locking is unnecessary.
> Worse, the windup does not restore AMR, so this can result in a false
> KUAP fault after a recoverable machine check hits inside a user copy
> operation.
>
> Fix this similarly to HMI by just avoiding the kuap lock in the
> early machine check handler (it will be set by the late handler that
> runs in virtual mode if that runs). If the virtual mode handler is
> reached, it will lock and restore the AMR.

For the archives, this is how I tested this.

Build with KUAP enabled, disassemble load_elf_binary(), in there is a
call to __copy_tofrom_user(), preceded by a write to AMR, eg:

  c00000000045eec8:  a6 03 3d 7d  mtspr  29,r9
  c00000000045eecc:  2c 01 00 4c  isync
  c00000000045eed0:  78 93 44 7e  mr     r4,r18
  c00000000045eed4:  78 e3 83 7f  mr     r3,r28
  c00000000045eed8:  b1 c1 c3 4b  bl     c00000000009b088 <__copy_tofrom_user+0x8>

Boot mambo using skiboot.tcl, break into the mambo shell. Add a
breakpoint at the branch to __copy_tofrom_user():

  systemsim % b 0xc00000000045eed8
  breakpoint set at [0:0:0]: 0xc00000000045eed8 (0xC00000000045EED8) Enc:0x00000000 : INVALID

Continue, run `ls` in the system shell and it should break at your
breakpoint:

  systemsim % c
  4439260000000: [0:0]: (PC:0x00007FFFB43B2F00) : 2.1 Mega-Inst/Sec : 2.1 Mega-Cycles/Sec [1 Zaps 0 PA-Zaps] *ON* [0:0] pri=4 extra=0
  4440009381609: (7208208132): # ls
  [0:0:0]: 0xC00000000045EED8 (0x000000000045EED8) Enc:0xB1C1C34B : bl $-0x3C3E50
  INFO: 4440936223969: (8135050536): ** Execution stopped: user (tcl), **
  4440936223969: ** finished running 8135050536 instructions **

Print the AMR, it has been cleared:

  systemsim % p amr
  0x0000000000000000

Then inject a machine check exception, and continue:

  systemsim % exc_mce
  systemsim % c
  4440936231861: (8135058428): [ 8673.510176] Disabling lock debugging due to kernel taint
  4440936246871: (8135073438): [ 8673.510205] MCE: CPU0: machine check (Warning) Host TLB Multihit [Recovered]
  4440936266680: (8135093247): [ 8673.510244] MCE: CPU0: NIP: [c00000000045eed8] load_elf_binary+0xef8/0x1970
  4440936282657: (8135109224): [ 8673.510275] MCE: CPU0: Probable Software error (some chance of hardware cause)
  [0:0:0]: 0xC00000000045EED8 (0x000000000045EED8) Enc:0xB1C1C34B : bl $-0x3C3E50
  INFO: 4440936296116: (8135122683): ** Execution stopped: user (tcl), **
  4440936296116: ** finished running 8135122683 instructions **

Now we're back at our breakpoint.

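As a side check, the two encodings quoted above can be decoded by hand to confirm that the mtspr really targets SPR 29 (the AMR) and that the branch lands at c00000000009b088. This is a minimal Python sketch, not part of the original test; the helper names are hypothetical, and it assumes the byte columns in the disassembly are in little-endian memory order, which matches the LE bit in the MSR.

```python
def load_le_word(hex_bytes: str) -> int:
    """Turn the objdump byte column (little-endian) into a 32-bit word."""
    return int.from_bytes(bytes.fromhex(hex_bytes), "little")


def decode_mtspr(word: int):
    """Split an XFX-form word into (primary opcode, RS, SPR number, XO)."""
    opcode = word >> 26
    rs = (word >> 21) & 0x1F
    field = (word >> 11) & 0x3FF
    # The two 5-bit halves of the SPR number are stored swapped in the field.
    spr = ((field & 0x1F) << 5) | (field >> 5)
    xo = (word >> 1) & 0x3FF
    return opcode, rs, spr, xo


def decode_branch(word: int):
    """Split an I-form word into (primary opcode, signed offset, AA, LK)."""
    opcode = word >> 26
    disp = word & 0x03FFFFFC
    if disp & 0x02000000:  # sign-extend the 26-bit displacement
        disp -= 0x04000000
    return opcode, disp, (word >> 1) & 1, word & 1


mtspr_word = load_le_word("a6 03 3d 7d")
bl_word = load_le_word("b1 c1 c3 4b")

print(decode_mtspr(mtspr_word))  # (31, 9, 29, 467): mtspr 29,r9

opcode, disp, aa, lk = decode_branch(bl_word)
target = 0xC00000000045EED8 + disp  # relative branch, since AA is 0
print(opcode, hex(disp), aa, lk, hex(target))
```

The offset comes out as -0x3c3e50, the same value mambo prints next to the breakpoint address.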
Continue again and we should get an oops due to a bad AMR fault:

  systemsim % c
  4440936301692: (8135128259): [ 8673.510312] ------------[ cut here ]------------
  4440936321016: (8135147583): [ 8673.510336] Bug: Write fault blocked by AMR!
  4440936331347: (8135157914): [ 8673.510350] WARNING: CPU: 0 PID: 95 at arch/powerpc/include/asm/book3s/64/kup-radix.h:102 __do_page_fault+0x604/0xe60
  4440936352510: (8135179077): [ 8673.510410] Modules linked in:
  4440936365222: (8135191789): [ 8673.510436] CPU: 0 PID: 95 Comm: ls Tainted: G M 5.2.0-rc2-gcc-8.2.0 #273
  4440936383775: (8135210342): [ 8673.510473] NIP: c0000000000716b4 LR: c0000000000716b0 CTR: c000000000ca88b0
  4440936401995: (8135228562): [ 8673.510508] REGS: c0000000ec883530 TRAP: 0700 Tainted: G M (5.2.0-rc2-gcc-8.2.0)
  4440936430641: (8135257208): [ 8673.510545] MSR: 9000000000021033 <SF,HV,ME,IR,DR,RI,LE> CR: 28002422 XER: 20040000
  4440936498754: (8135325321): [ 8673.510597] CFAR: c00000000011b8e4 IRQMASK: 1
  4440936505159: (8135331726): [ 8673.510597] GPR00: c0000000000716b0 c0000000ec8837c0 c0000000015f4900 0000000000000020
  4440936515814: (8135342381): [ 8673.510597] GPR04: c000000001824550 0000000000000000 746c756166206574 64656b636f6c6220
  4440936528594: (8135355161): [ 8673.510597] GPR08: 00000000fed30000 c000000001130de8 0000000000000000 9000000030001033
  4440936541374: (8135367941): [ 8673.510597] GPR12: 0000000000002000 c0000000018e0000 0000000080000000 00007fffe2e3de09
  4440936554154: (8135380721): [ 8673.510597] GPR16: c000000000ed2c50 0000000010000000 c000000000ed2c50 00000000100d3648
  4440936564809: (8135391376): [ 8673.510597] GPR20: c0000000f0968b00 00000000100e3648 00007fff930a0000 0000000002000000
  4440936577589: (8135404156): [ 8673.510597] GPR24: 0000000002000000 c0000000ee830600 0000000000000301 00007fffe2e3de09
  4440936590369: (8135416936): [ 8673.510597] GPR28: 0000000000000000 000000000a000000 0000000000000000 c0000000ec883900
  4440936611699: (8135438266): [ 8673.510918] NIP [c0000000000716b4] __do_page_fault+0x604/0xe60
  4440936628747: (8135455314): [ 8673.510951] LR [c0000000000716b0] __do_page_fault+0x600/0xe60
  4440936642325: (8135468892): [ 8673.510978] Call Trace:
  4440936655614: (8135482181): [ 8673.511000] [c0000000ec8837c0] [c0000000000716b0] __do_page_fault+0x600/0xe60 (unreliable)
  4440936677874: (8135504441): [ 8673.511045] [c0000000ec883890] [c00000000000b0d4] handle_page_fault+0x18/0x38
  4440936700658: (8135527225): [ 8673.511091] --- interrupt: 301 at __copy_tofrom_user_power7+0x230/0x7ac
  4440936709188: (8135535755): [ 8673.511091] LR = load_elf_binary+0xefc/0x1970
  4440936728082: (8135554649): [ 8673.511142] [c0000000ec883b90] [c00000000045ee80] load_elf_binary+0xea0/0x1970 (unreliable)
  4440936750368: (8135576935): [ 8673.511187] [c0000000ec883c90] [c0000000003d2f88] search_binary_handler.part.12+0xb8/0x2b0
  4440936772446: (8135599013): [ 8673.511230] [c0000000ec883d20] [c0000000003d3934] __do_execve_file.isra.14+0x684/0xa10
  4440936793891: (8135620458): [ 8673.511272] [c0000000ec883df0] [c0000000003d41b8] sys_execve+0x38/0x50
  4440936813829: (8135640396): [ 8673.511311] [c0000000ec883e20] [c00000000000bdf4] system_call+0x5c/0x70
  4440936828817: (8135655384): [ 8673.511340] Instruction dump:
  4440936848134: (8135674701): [ 8673.511361] 60000000 2fb70000 e93f0168 419e0620 2fa90000 409cfba4 3c82ff8e 38846b88
  4440936874244: (8135700811): [ 8673.511412] 3c62ff8e 38636c98 480aa1d1 60000000 <0fe00000> e80100e0 3b80000b eae10088
  4440936891327: (8135717894): [ 8673.511464] ---[ end trace 0698ac8ff1068918 ]---
  4440938377906: (8137204473): Segmentation fault

Apply the fix, retest, and no oops is seen.

cheers