On Fri, 2016-08-19 at 19:21 +0200, Laurent Dufour wrote:
> Hi,
>
> While working on the TM support for CRIU, I faced a TM Bad Thing
> exception.
>
> Digging further, I found that it is *easy* to raised it from the user
> space. I attached below a simple program which raise it all the time,
> like this :
>
> [12045.221359] Kernel BUG at c000000000050a40 [verbose debug info
> unavailable]
> [12045.221470] Unexpected TM Bad Thing exception at c000000000050a40
> (msr 0x201033)
> [12045.221540] Oops: Unrecoverable exception, sig: 6 [#1]
> [12045.221586] SMP NR_CPUS=2048 NUMA PowerNV
> [12045.221634] Modules linked in: xt_CHECKSUM iptable_mangle
> ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat
> nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT
> nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables
> ip6table_filter ip6_tables iptable_filter ip_tables x_tables kvm_hv
> kvm
> uio_pdrv_genirq ipmi_powernv uio powernv_rng ipmi_msghandler autofs4
> ses
> enclosure scsi_transport_sas bnx2x ipr mdio libcrc32c
> [12045.222167] CPU: 68 PID: 6178 Comm: sigreturnpanic Not tainted
> 4.7.0 #34
> [12045.222224] task: c0000000fce38600 ti: c0000000fceb4000 task.ti:
> c0000000fceb4000
> [12045.222293] NIP: c000000000050a40 LR: c0000000000163bc CTR:
> 0000000000000000
> [12045.222361] REGS: c0000000fceb7ac0 TRAP: 0700 Not
> tainted (4.7.0)
> [12045.222418] MSR: 9000000300201033
> <SF,HV,ME,IR,DR,RI,LE,TM[SE]> CR:
> 28444280 XER: 20000000
> [12045.222625] CFAR: c0000000000163b8 SOFTE: 0
> PACATMSCRATCH: 900000014280f033
> GPR00: 01100000b8000001 c0000000fceb7d40 c00000000139c100
> c0000000fce390d0
> GPR04: 900000034280f033 0000000000000000 0000000000000000
> 0000000000000000
> GPR08: 0000000000000000 b000000000001033 0000000000000001
> 0000000000000000
> GPR12: 0000000000000000 c000000002926400 0000000000000000
> 0000000000000000
> GPR16: 0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> GPR20: 0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> GPR24: 0000000000000000 00003ffff98cadd0 00003ffff98cb470
> 0000000000000000
> GPR28: 900000034280f033 c0000000fceb7ea0 0000000000000001
> c0000000fce390d0
> [12045.223535] NIP [c000000000050a40] tm_restore_sprs+0xc/0x1c
> [12045.223584] LR [c0000000000163bc] tm_recheckpoint+0x5c/0xa0
> [12045.223630] Call Trace:
> [12045.223655] [c0000000fceb7d80] [c000000000026e74]
> sys_rt_sigreturn+0x494/0x6c0
> [12045.223738] [c0000000fceb7e30] [c0000000000092e0]
> system_call+0x38/0x108
> [12045.223806] Instruction dump:
> [12045.223841] 7c800164 4e800020 7c0022a6 f80304a8 7c0222a6 f80304b0
> 7c0122a6 f80304b8
> [12045.223955] 4e800020 e80304a8 7c0023a6 e80304b0 <7c0223a6>
> e80304b8
> 7c0123a6 4e800020
> [12045.224074] ---[ end trace cb8002ee240bae76 ]---
>
> The exception is raised when the kernel is restoring the TM SPRS from
> the signal stack. But this operation is not allowed while in a
> transaction.
>
> The sampler test is ending the signal handler with a pending
> transaction
> while the signal got caught during a transaction itself.
>
> I can't see any straight way to get rid of that, except by clearing
> the
> transactional state in the path of sigreturn....
>

This is correct - I have a patch.

> Please advise.
>

Do you want to send your test case in as a selftest/powerpc? I'm happy
to do it if you don't have time (I pretty much already have one for my
testing). It is good to have these to guard against regressions, as
these kinds of paths aren't often exercised.

> Cheers,
> Laurent.
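
As an aside for anyone reading the oops: the MSR value in the register dump can be decoded by hand to confirm that the thread was in suspended transactional state when tm_restore_sprs ran. Below is a minimal sketch in C, assuming the bit positions 32 (TM enabled), 33 (suspended) and 34 (transactional) as I understand them from arch/powerpc/include/asm/reg.h. The helpers decode_tm_bits and msr_tm_active are hypothetical names for illustration, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed bit positions (believed to match asm/reg.h on powerpc). */
#define MSR_TM_LG    32  /* TM facility enabled */
#define MSR_TS_S_LG  33  /* transaction state: suspended */
#define MSR_TS_T_LG  34  /* transaction state: transactional */

#define MSR_TM    (1ULL << MSR_TM_LG)
#define MSR_TS_S  (1ULL << MSR_TS_S_LG)
#define MSR_TS_T  (1ULL << MSR_TS_T_LG)

/*
 * Hypothetical helper: writes the letters shown inside "TM[...]" in the
 * oops (T, S, E in that order) into buf, which must hold 4 bytes.
 */
static const char *decode_tm_bits(uint64_t msr, char buf[4])
{
    size_t n = 0;

    assert(buf != NULL);
    if (msr & MSR_TS_T)
        buf[n++] = 'T';
    if (msr & MSR_TS_S)
        buf[n++] = 'S';
    if (msr & MSR_TM)
        buf[n++] = 'E';
    buf[n] = '\0';
    return buf;
}

/* Hypothetical helper: non-zero if a transaction is active or suspended. */
static int msr_tm_active(uint64_t msr)
{
    return (msr & (MSR_TS_S | MSR_TS_T)) != 0;
}
```

Fed the MSR from the dump, 0x9000000300201033, decode_tm_bits yields "SE", matching the TM[SE] the kernel printed, and msr_tm_active reports a live transaction. The PACATMSCRATCH value, 0x900000014280f033, decodes to "E" only.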