Re: [PATCH v3 1/3] powerpc/tm: Fix clearing MSR[TS] in current when reclaiming on signal delivery

2020-02-19 Thread Michael Ellerman
On Tue, 2020-02-11 at 03:38:29 UTC, Gustavo Luiz Duarte wrote:
> After a treclaim, we expect to be in non-transactional state. If we don't 
> clear
> the current thread's MSR[TS] before we get preempted, then
> tm_recheckpoint_new_task() will recheckpoint and we get rescheduled in
> suspended transaction state.
> 
> When handling a signal caught in transactional state, handle_rt_signal64()
> calls get_tm_stackpointer() that treclaims the transaction using
> tm_reclaim_current() but without clearing the thread's MSR[TS]. This can cause
> the TM Bad Thing exception below if later we pagefault and get preempted 
> trying
> to access the user's sigframe, using __put_user(). Afterwards, when we are
> rescheduled back into do_page_fault() (but now in suspended state since the
> thread's MSR[TS] was not cleared), upon executing 'rfid' after completion of
> the page fault handling, the exception is raised because a transition from
> suspended to non-transactional state is invalid.
> 
>   Unexpected TM Bad Thing exception at c000de44 (msr 
> 0x800302a03031) tm_scratch=80010280b033
>   Oops: Unrecoverable exception, sig: 6 [#1]
>   LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
>   Modules linked in: nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 
> nf_defrag_ipv4 ip6_tables ip_tables nft_compat ip_set nf_tables nfnetlink xts 
> vmx_crypto sg virtio_balloon
>   r_mod cdrom virtio_net net_failover virtio_blk virtio_scsi failover 
> dm_mirror dm_region_hash dm_log dm_mod
>   CPU: 25 PID: 15547 Comm: a.out Not tainted 5.4.0-rc2 #32
>   NIP:  c000de44 LR: c0034728 CTR: 
>   REGS: c0003fe7bd70 TRAP: 0700   Not tainted  (5.4.0-rc2)
>   MSR:  800302a03031   CR: 44000884 
>  XER: 
>   CFAR: c000dda4 IRQMASK: 0
>   PACATMSCRATCH: 80010280b033
>   GPR00: c0034728 c00f65a17c80 c1662800 
> 7fffacf3fd78
>   GPR04: 1000 1000  
> c00f611f8af0
>   GPR08:  78006001  
> 000c
>   GPR12: c00f611f84b0 c0003ffcb200  
> 
>   GPR16:    
> 
>   GPR20:    
> c00f611f8140
>   GPR24:  7fffacf3fd68 c00f65a17d90 
> c00f611f7800
>   GPR28: c00f65a17e90 c00f65a17e90 c1685e18 
> 7fffacf3f000
>   NIP [c000de44] fast_exception_return+0xf4/0x1b0
>   LR [c0034728] handle_rt_signal64+0x78/0xc50
>   Call Trace:
>   [c00f65a17c80] [c0034710] handle_rt_signal64+0x60/0xc50 
> (unreliable)
>   [c00f65a17d30] [c0023640] do_notify_resume+0x330/0x460
>   [c00f65a17e20] [c000dcc4] ret_from_except_lite+0x70/0x74
>   Instruction dump:
>   7c4ff120 e8410170 7c5a03a6 3840 f8410060 e8010070 e8410080 e8610088
>   6000 6000 e8810090 e8210078 <4c24> 4800 e8610178 
> 88ed0989
>   ---[ end trace 93094aa44b442f87 ]---
> 
> The simplified sequence of events that triggers the above exception is:
> 
>   ... # userspace in NON-TRANSACTIONAL state
>   tbegin  # userspace in TRANSACTIONAL state
>   signal delivery # kernelspace in SUSPENDED state
>   handle_rt_signal64()
> get_tm_stackpointer()
>   treclaim# kernelspace in NON-TRANSACTIONAL state
> __put_user()
>   page fault happens. We will never get back here because of the TM Bad 
> Thing exception.
> 
>   page fault handling kicks in and we voluntarily preempt ourselves
>   do_page_fault()
> __schedule()
>   __switch_to(other_task)
> 
>   our task is rescheduled and we recheckpoint because the thread's MSR[TS] 
> was not cleared
>   __switch_to(our_task)
> switch_to_tm()
>   tm_recheckpoint_new_task()
> trechkpt  # kernelspace in SUSPENDED state
> 
>   The page fault handling resumes, but now we are in suspended transaction 
> state
>   do_page_fault()completes
>   rfid <- trying to get back where the page fault happened (we were 
> non-transactional back then)
>   TM Bad Thing# illegal transition from suspended to 
> non-transactional
> 
> This patch fixes that issue by clearing the current thread's MSR[TS] just 
> after
> treclaim in get_tm_stackpointer() so that we stay in non-transactional state 
> in
> case we are preempted. In order to make treclaim and clearing the thread's
> MSR[TS] atomic from a preemption perspective when CONFIG_PREEMPT is set,
> preempt_disable/enable() is used. It's also necessary to save the previous
> value of the thread's MSR before get_tm_stackpointer() is called so that it 
> can
> be exposed to the signal handler later in 

Re: [PATCH v3 1/3] powerpc/tm: Fix clearing MSR[TS] in current when reclaiming on signal delivery

2020-02-11 Thread Michael Neuling
> Found with tm-signal-context-force-tm kernel selftest.
> 
> v3: Subject and comment improvements.
> v2: Fix build failure when tm is disabled.
> 
> Fixes: 2b0a576d15e0 ("powerpc: Add new transactional memory state to the
> signal context")
> Cc: sta...@vger.kernel.org # v3.9
> Signed-off-by: Gustavo Luiz Duarte 

Acked-By: Michael Neuling 


Re: [PATCH v3 1/3] powerpc/tm: Fix clearing MSR[TS] in current when reclaiming on signal delivery

2020-02-11 Thread Sasha Levin
Hi,

[This is an automated email]

This commit has been processed because it contains a "Fixes:" tag,
fixing commit: 2b0a576d15e0 ("powerpc: Add new transactional memory state to 
the signal context").

The bot has tested the following trees: v5.5.2, v5.4.18, v4.19.102, v4.14.170, 
v4.9.213, v4.4.213.

v5.5.2: Build OK!
v4.19.102: Build OK!
v4.14.170: Failed to apply! Possible dependencies:
1c200e63d055 ("powerpc/tm: Fix endianness flip on trap")
92fb8690bd04 ("powerpc/tm: P9 disable transactionally suspended 
sigcontexts")

v4.9.213: Failed to apply! Possible dependencies:
1c200e63d055 ("powerpc/tm: Fix endianness flip on trap")
92fb8690bd04 ("powerpc/tm: P9 disable transactionally suspended 
sigcontexts")

v4.4.213: Failed to apply! Possible dependencies:
1c200e63d055 ("powerpc/tm: Fix endianness flip on trap")
92fb8690bd04 ("powerpc/tm: P9 disable transactionally suspended 
sigcontexts")
a7d623d4d053 ("powerpc: Move part of giveup_vsx into c")
b86fd2bd0302 ("powerpc: Simplify TM restore checks")
d11994314b2b ("powerpc: signals: Stop using current in signal code")
d96f234f47af ("powerpc: Avoid load hit store in setup_sigcontext()")
e1c0d66fcb17 ("powerpc: Set used_(vsr|vr|spe) in sigreturn path when MSR 
bits are active")


NOTE: The patch will not be queued to stable trees until it is upstream.

How should we proceed with this patch?

-- 
Thanks,
Sasha