On Fri, 2016-08-19 at 19:21 +0200, Laurent Dufour wrote:
> Hi,
> 
> While working on the TM support for CRIU, I faced a TM Bad Thing
> exception.
> 
> Digging further, I found that it is *easy* to raise it from user
> space. I attached below a simple program which raises it every time,
> like this:
> 
> [12045.221359] Kernel BUG at c000000000050a40 [verbose debug info
> unavailable]
> [12045.221470] Unexpected TM Bad Thing exception at c000000000050a40
> (msr 0x201033)
> [12045.221540] Oops: Unrecoverable exception, sig: 6 [#1]
> [12045.221586] SMP NR_CPUS=2048 NUMA PowerNV
> [12045.221634] Modules linked in: xt_CHECKSUM iptable_mangle
> ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat
> nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT
> nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables
> ip6table_filter ip6_tables iptable_filter ip_tables x_tables kvm_hv
> kvm
> uio_pdrv_genirq ipmi_powernv uio powernv_rng ipmi_msghandler autofs4
> ses
> enclosure scsi_transport_sas bnx2x ipr mdio libcrc32c
> [12045.222167] CPU: 68 PID: 6178 Comm: sigreturnpanic Not tainted
> 4.7.0 #34
> [12045.222224] task: c0000000fce38600 ti: c0000000fceb4000 task.ti:
> c0000000fceb4000
> [12045.222293] NIP: c000000000050a40 LR: c0000000000163bc CTR:
> 0000000000000000
> [12045.222361] REGS: c0000000fceb7ac0 TRAP: 0700   Not
> tainted  (4.7.0)
> [12045.222418] MSR: 9000000300201033
> <SF,HV,ME,IR,DR,RI,LE,TM[SE]>  CR:
> 28444280  XER: 20000000
> [12045.222625] CFAR: c0000000000163b8 SOFTE: 0
> PACATMSCRATCH: 900000014280f033
> GPR00: 01100000b8000001 c0000000fceb7d40 c00000000139c100
> c0000000fce390d0
> GPR04: 900000034280f033 0000000000000000 0000000000000000
> 0000000000000000
> GPR08: 0000000000000000 b000000000001033 0000000000000001
> 0000000000000000
> GPR12: 0000000000000000 c000000002926400 0000000000000000
> 0000000000000000
> GPR16: 0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> GPR20: 0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> GPR24: 0000000000000000 00003ffff98cadd0 00003ffff98cb470
> 0000000000000000
> GPR28: 900000034280f033 c0000000fceb7ea0 0000000000000001
> c0000000fce390d0
> [12045.223535] NIP [c000000000050a40] tm_restore_sprs+0xc/0x1c
> [12045.223584] LR [c0000000000163bc] tm_recheckpoint+0x5c/0xa0
> [12045.223630] Call Trace:
> [12045.223655] [c0000000fceb7d80] [c000000000026e74]
> sys_rt_sigreturn+0x494/0x6c0
> [12045.223738] [c0000000fceb7e30] [c0000000000092e0]
> system_call+0x38/0x108
> [12045.223806] Instruction dump:
> [12045.223841] 7c800164 4e800020 7c0022a6 f80304a8 7c0222a6 f80304b0
> 7c0122a6 f80304b8
> [12045.223955] 4e800020 e80304a8 7c0023a6 e80304b0 <7c0223a6>
> e80304b8
> 7c0123a6 4e800020
> [12045.224074] ---[ end trace cb8002ee240bae76 ]---
> 
> The exception is raised when the kernel is restoring the TM SPRs from
> the signal stack. But this operation is not allowed while in a
> transaction.
> 
> The sample test ends its signal handler with a transaction still
> pending, while the signal itself was caught during a transaction.
> 
> I can't see any straightforward way to get rid of that, except by
> clearing the transactional state in the sigreturn path.
> 

This is correct - I have a patch.

> Please advise.
> 

I'm happy to do it if you don't have time (I pretty much already have
for my testing). Do you want to send your test case in as a
selftest/powerpc? It is good to have these to guard against regressions,
as these kinds of paths aren't often exercised.

> Cheers,
> Laurent.
