[RFC v2 00/19] utrace for 3.0 kernel
Another attempt. This version tries to decouple utrace and ptrace; this way it is much simpler to follow the upstream changes, afaics.

TODO:

- The single-stepping updates in ptrace_resume() can race with utrace_reset()->user_disable_single_step(). This was fixed by 20/20, but I noticed that this patch is buggy right before sending.

- Perhaps the PTRACE_SYSEMU/TIF_SYSCALL_EMU logic was broken, I need to recheck.

- Testing.

Oleg.
[PATCH 06/19] wait_task_inactive: treat task->state and match_state as bitmasks
Change wait_task_inactive() to check "state & match_state" instead of "state == match_state". This should not make any difference, but this allows us to add more stopped bits which can be set or cleared independently.

IOW. wait_task_inactive() assumes that if task->state != 0, it can only be changed to TASK_RUNNING. Currently this is true, and in this case "state & match_state" continues to work. But, unlike the current check, it also works if task->state has other bits set while the caller is only interested in, say, __TASK_TRACED.

Note: I think wait_task_inactive() should be cleaned up upstream anyway; nowadays we have TASK_WAKING, and task->state != 0 doesn't necessarily mean it is TASK_RUNNING. It also makes sense to exclude the !TASK_REPORT bits during the check.

Finally, probably this patch makes sense anyway even without utrace. For example, a stopped _and_ traced thread could have task->state = TASK_STOPPED | TASK_TRACED, this can be useful.

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 kernel/sched.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 3f2e502..ade7997 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2277,7 +2277,7 @@ unsigned long wait_task_inactive(struct task_struct *p, long match_state)
 		 * is actually now running somewhere else!
 		 */
 		while (task_running(rq, p)) {
-			if (match_state && unlikely(p->state != match_state))
+			if (match_state && !likely(p->state & match_state))
 				return 0;
 			cpu_relax();
 		}
--
1.5.5.1
[PATCH 04/19] introduce wake_up_quiescent()
No functional changes. Add the new helper, wake_up_quiescent(task, state), which simply returns wake_up_state(task, state). Change all callers which do wake_up_state(STOPPED/TRACED) to use the new helper.

ptrace_stop() is a bit special, it does __set_current_state(RUNNING) in the very unlikely case, change it as well.

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 include/linux/signal.h |    2 ++
 kernel/ptrace.c        |    2 +-
 kernel/signal.c        |   12 ++++++++++--
 kernel/utrace.c        |    2 +-
 4 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/include/linux/signal.h b/include/linux/signal.h
index a822300..2be3712 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -239,6 +239,8 @@ static inline int valid_signal(unsigned long sig)
 struct timespec;
 struct pt_regs;
 
+extern int wake_up_quiescent(struct task_struct *p, unsigned int state);
+
 extern int next_signal(struct sigpending *pending, sigset_t *mask);
 extern int do_send_sig_info(int sig, struct siginfo *info,
 				struct task_struct *p, bool group);
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 9988b13..26ae214 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -566,7 +566,7 @@ static int ptrace_resume(struct task_struct *child, long request,
 	child->exit_code = data;
 
 	if (lock_task_sighand(child, &flags)) {
-		wake_up_state(child, __TASK_TRACED);
+		wake_up_quiescent(child, __TASK_TRACED);
 		unlock_task_sighand(child, &flags);
 	}
 
diff --git a/kernel/signal.c b/kernel/signal.c
index 2138cee..4fcf1c7 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -652,6 +652,14 @@ void signal_wake_up(struct task_struct *t, int resume)
 }
 
 /*
+ * wakes up the STOPPED/TRACED task, must be called with ->siglock held.
+ */
+int wake_up_quiescent(struct task_struct *p, unsigned int state)
+{
+	return wake_up_state(p, state);
+}
+
+/*
  * Remove signals in mask from the pending set and queue.
  * Returns 1 if any signals were found.
  *
@@ -811,7 +819,7 @@ static int prepare_signal(int sig, struct task_struct *p, int from_ancestor_ns)
 		do {
 			task_clear_group_stop_pending(t);
 			rm_from_queue(SIG_KERNEL_STOP_MASK, &t->pending);
-			wake_up_state(t, __TASK_STOPPED);
+			wake_up_quiescent(t, __TASK_STOPPED);
 		} while_each_thread(p, t);
 
 		/*
@@ -1800,7 +1808,7 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
 			do_notify_parent_cldstop(current, false, why);
 
 		spin_lock_irq(&current->sighand->siglock);
-		__set_current_state(TASK_RUNNING);
+		wake_up_quiescent(current, __TASK_TRACED);
 		spin_unlock_irq(&current->sighand->siglock);
 	}
 
 	if (clear_code)
diff --git a/kernel/utrace.c b/kernel/utrace.c
index 6e7fafb..d7c547c 100644
--- a/kernel/utrace.c
+++ b/kernel/utrace.c
@@ -648,7 +648,7 @@ static void utrace_wakeup(struct task_struct *target, struct utrace *utrace)
 {
 	lockdep_assert_held(&utrace->lock);
 	spin_lock_irq(&target->sighand->siglock);
-	wake_up_state(target, __TASK_TRACED);
+	wake_up_quiescent(target, __TASK_TRACED);
 	spin_unlock_irq(&target->sighand->siglock);
 }
--
1.5.5.1
[PATCH 09/19] tracehooks: kill tracehook_finish_jctl(), add tracehook_finish_stop()
tracehook_finish_jctl() is needed to avoid the races with SIGKILL which wakes up a UTRACED task, and thus it should be called every time after the STOPPED/TRACED/UTRACED task returns from schedule(); remember that TASK_UTRACED can be added while the task is STOPPED/UTRACED.

- rename it to tracehook_finish_stop(), "jctl" no longer matches the reality.

- change do_signal_stop() to call this helper right after schedule(), otherwise this logic is broken by the upstream changes.

- now that utrace doesn't control the TASK_TRACED bit, ptrace_stop() must call this helper too.

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 include/linux/tracehook.h |    6 +++---
 kernel/signal.c           |    5 +++--
 kernel/utrace.c           |    2 +-
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 7d7bdde..3c7b6b3 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -528,11 +528,11 @@ static inline int tracehook_get_signal(struct task_struct *task,
 }
 
 /**
- * tracehook_finish_jctl - report about return from job control stop
+ * tracehook_finish_stop - report about return from STOPPED/TRACED
  *
- * This is called by do_signal_stop() after wakeup.
+ * This is called by do_signal_stop() and ptrace_stop() after wakeup.
  */
-static inline void tracehook_finish_jctl(void)
+static inline void tracehook_finish_stop(void)
 {
 	if (task_utrace_flags(current))
 		utrace_finish_stop();
diff --git a/kernel/signal.c b/kernel/signal.c
index 4fcf1c7..a7979ad 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1816,6 +1816,7 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
 		read_unlock(&tasklist_lock);
 	}
 
+	tracehook_finish_stop();
 	/*
 	 * While in TASK_TRACED, we were considered "frozen enough".
 	 * Now that we woke up, it's crucial if we're supposed to be
@@ -1952,6 +1953,8 @@ retry:
 		/* Now we don't run again until woken by SIGCONT or SIGKILL */
 		schedule();
 
+		tracehook_finish_stop();
+
 		spin_lock_irq(&current->sighand->siglock);
 	} else {
 		ptrace_stop(current->group_stop & GROUP_STOP_SIGMASK,
@@ -1974,8 +1977,6 @@ retry:
 
 	spin_unlock_irq(&current->sighand->siglock);
 
-	tracehook_finish_jctl();
-
 	return 1;
 }
 
diff --git a/kernel/utrace.c b/kernel/utrace.c
index be98607..daa96b9 100644
--- a/kernel/utrace.c
+++ b/kernel/utrace.c
@@ -741,7 +741,7 @@ static bool utrace_reset(struct task_struct *task, struct utrace *utrace)
 void utrace_finish_stop(void)
 {
 	/*
-	 * If we were task_is_traced() and then SIGKILL'ed, make
+	 * If we were task_is_utraced() and then SIGKILL'ed, make
 	 * sure we do nothing until the tracer drops utrace->lock.
 	 */
 	if (unlikely(__fatal_signal_pending(current))) {
--
1.5.5.1
[PATCH 10/19] teach wake_up_quiescent() to do selective wake_up
Both utrace and ptrace can want the same thread to be quiescent, in this case its state is TASK_TRACED | TASK_UTRACED. And this also means that this task must not run unless both utrace and ptrace resume it.

Change wake_up_quiescent(p, state) to do "p->state &= ~state" and return false unless there are no more quiescent bits left in task->state.

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 kernel/signal.c |   15 +++++++++++++++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index a7979ad..57552e6 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -651,11 +651,26 @@ void signal_wake_up(struct task_struct *t, int resume)
 		kick_process(t);
 }
 
+#define STATE_QUIESCENT	(__TASK_STOPPED | __TASK_TRACED | __TASK_UTRACED)
 /*
  * wakes up the STOPPED/TRACED task, must be called with ->siglock held.
  */
 int wake_up_quiescent(struct task_struct *p, unsigned int state)
 {
+	unsigned int quiescent = (p->state & STATE_QUIESCENT);
+
+	WARN_ON(state & ~(STATE_QUIESCENT | TASK_INTERRUPTIBLE));
+
+	if (quiescent) {
+		state &= ~TASK_INTERRUPTIBLE;
+		if ((quiescent & ~state) != 0) {
+			p->state &= ~state;
+			WARN_ON(!(p->state & STATE_QUIESCENT));
+			WARN_ON(!(p->state & TASK_WAKEKILL));
+			return 0;
+		}
+	}
+
 	return wake_up_state(p, state);
 }
--
1.5.5.1
[PATCH 11/19] ptrace_stop: do not assume the task is running after wake_up_quiescent()
If ptrace_stop() sets TASK_TRACED and then detects we should not stop, it can race with utrace_do_stop() which can see TASK_TRACED and add TASK_UTRACED. In this case we should stop for utrace needs.

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 kernel/signal.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 57552e6..89e691d 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1829,6 +1829,14 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
 		if (clear_code)
 			current->exit_code = 0;
 		read_unlock(&tasklist_lock);
+
+		/*
+		 * It is possible that __TASK_UTRACED was added by utrace
+		 * while we were __TASK_TRACED and before we take ->siglock
+		 * for wake_up_quiescent(), we need to block in this case.
+		 * Otherwise this is unnecessary but absolutely harmless.
+		 */
+		schedule();
 	}
 
 	tracehook_finish_stop();
--
1.5.5.1
[PATCH 17/19] teach ptrace_set_syscall_trace() to play well with utrace
1. ptrace_set_syscall_trace(true)->set_tsk_thread_flag(TIF_SYSCALL_TRACE) can race with the utrace_control()->utrace_reset() path which can miss PT_SYSCALL_TRACE and clear TIF_SYSCALL_TRACE after it was already set.

2. ptrace_set_syscall_trace(false) clears TIF_SYSCALL_TRACE and this is not utrace-friendly, utrace can need this flag.

Change ptrace_set_syscall_trace() to take task_utrace_lock(), this is enough to fix the 1st problem. Check task_utrace_flags() before clearing TIF_SYSCALL_TRACE, this fixes the 2nd.

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 kernel/ptrace.c |    6 +-----
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 0825a01..209ea2d 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -23,6 +23,7 @@
 #include <linux/uaccess.h>
 #include <linux/regset.h>
 #include <linux/hw_breakpoint.h>
+#include <linux/utrace.h>
 
 static void ptrace_signal_wake_up(struct task_struct *p, int quiescent)
 {
@@ -39,13 +40,16 @@ static void ptrace_signal_wake_up(struct task_struct *p, int quiescent)
 
 static void ptrace_set_syscall_trace(struct task_struct *p, bool on)
 {
+	task_utrace_lock(p);
 	if (on) {
 		p->ptrace |= PT_SYSCALL_TRACE;
 		set_tsk_thread_flag(p, TIF_SYSCALL_TRACE);
 	} else {
 		p->ptrace &= ~PT_SYSCALL_TRACE;
-		clear_tsk_thread_flag(p, TIF_SYSCALL_TRACE);
+		if (!(task_utrace_flags(p) & UTRACE_EVENT_SYSCALL))
+			clear_tsk_thread_flag(p, TIF_SYSCALL_TRACE);
 	}
+	task_utrace_unlock(p);
 }
 
 /*
--
1.5.5.1
[PATCH 18/19] introduce PT_SINGLE_STEP and PT_SINGLE_BLOCK
Add the new internal ptrace flags, PT_SINGLE_STEP and PT_SINGLE_BLOCK. Like PT_SYSCALL_TRACE, this is needed to avoid the unnecessary ptrace reports when TIF_SINGLESTEP was set by another engine, not by ptrace. Also, we need these bits to coordinate the user_*_single_step() calls from ptrace and utrace.

TODO: update the !x86 ptrace code which does user_disable_single_step().

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 arch/x86/kernel/ptrace.c  |    1 +
 include/linux/ptrace.h    |    5 ++++-
 include/linux/tracehook.h |    7 +++++--
 kernel/ptrace.c           |    3 +++
 4 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 807c2a2..7ab475f 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -807,6 +807,7 @@ static int ioperm_get(struct task_struct *target,
 */
 void ptrace_disable(struct task_struct *child)
 {
+	child->ptrace &= ~(PT_SINGLE_STEP | PT_SINGLE_BLOCK);
 	user_disable_single_step(child);
 #ifdef TIF_SYSCALL_EMU
 	clear_tsk_thread_flag(child, TIF_SYSCALL_EMU);
diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 98d995d..65b1e4f 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -91,6 +91,8 @@
 #define PT_TRACE_MASK	0x03f4
 
 #define PT_SYSCALL_TRACE	0x0001
+#define PT_SINGLE_STEP		0x0002
+#define PT_SINGLE_BLOCK		0x0004
 
 /* single stepping state bits (used on ARM and PA-RISC) */
 #define PT_SINGLESTEP_BIT	31
@@ -189,7 +191,8 @@ static inline void ptrace_init_task(struct task_struct *child, bool ptrace)
 	child->ptrace = 0;
 	if (unlikely(ptrace) && (current->ptrace & PT_PTRACED)) {
 		child->ptrace = current->ptrace;
-		child->ptrace &= ~PT_SYSCALL_TRACE;
+		child->ptrace &=
+			~(PT_SYSCALL_TRACE | PT_SINGLE_STEP | PT_SINGLE_BLOCK);
 		__ptrace_link(child, current->parent);
 	}
 
diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 6ce7a37..06edb52 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -121,6 +121,9 @@ static inline __must_check int tracehook_report_syscall_entry(
 	return 0;
 }
 
+#define ptrace_wants_step()	\
+	(current->ptrace & (PT_SINGLE_STEP | PT_SINGLE_BLOCK))
+
 /**
  * tracehook_report_syscall_exit - task has just finished a system call
  * @regs: user register state of current task
@@ -143,7 +146,7 @@ static inline void tracehook_report_syscall_exit(struct pt_regs *regs, int step)
 	if (task_utrace_flags(current) & UTRACE_EVENT(SYSCALL_EXIT))
 		utrace_report_syscall_exit(regs);
 
-	if (step) {
+	if (step && ptrace_wants_step()) {
 		siginfo_t info;
 		user_single_step_siginfo(current, regs, &info);
 		force_sig_info(SIGTRAP, &info, current);
@@ -436,7 +439,7 @@ static inline void tracehook_signal_handler(int sig, siginfo_t *info,
 {
 	if (task_utrace_flags(current))
 		utrace_signal_handler(current, stepping);
-	if (stepping)
+	if (stepping && ptrace_wants_step())
 		ptrace_notify(SIGTRAP);
 }
 
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 209ea2d..44908d0 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -575,13 +575,16 @@ static int ptrace_resume(struct task_struct *child, long request,
 		clear_tsk_thread_flag(child, TIF_SYSCALL_EMU);
 #endif
 
+	child->ptrace &= ~(PT_SINGLE_STEP | PT_SINGLE_BLOCK);
 	if (is_singleblock(request)) {
 		if (unlikely(!arch_has_block_step()))
 			return -EIO;
+		child->ptrace |= PT_SINGLE_BLOCK;
 		user_enable_block_step(child);
 	} else if (is_singlestep(request) || is_sysemu_singlestep(request)) {
 		if (unlikely(!arch_has_single_step()))
 			return -EIO;
+		child->ptrace |= PT_SINGLE_STEP;
 		user_enable_single_step(child);
 	} else {
 		user_disable_single_step(child);
--
1.5.5.1
[PATCH 12/19] get_signal_to_deliver: restructure utrace/ptrace signal reporting
get_signal_to_deliver() assumes that either tracehook_get_signal() does nothing (without CONFIG_UTRACE), or it also reports the signal to the ptrace engine implemented on top of utrace. Now that ptrace works independently this doesn't work.

Change the code to call ptrace_signal() after tracehook_get_signal(). Move the ->ptrace check from ptrace_signal() to get_signal_to_deliver(), we do not want to change *return_ka if it was initialized by utrace and the task is not traced.

IOW, roughly, ptrace acts as if it is the last attached engine, it takes the final decision about the signal.

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 kernel/signal.c |   24 +++++++++++-------------
 1 files changed, 11 insertions(+), 13 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 89e691d..d0e0c67 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2006,9 +2006,6 @@ retry:
 static int ptrace_signal(int signr, siginfo_t *info,
 			 struct pt_regs *regs, void *cookie)
 {
-	if (!task_ptrace(current))
-		return signr;
-
 	ptrace_signal_deliver(regs, cookie);
 
 	/* Let the debugger run.  */
@@ -2110,6 +2107,7 @@ relock:
 		signr = tracehook_get_signal(current, regs, info, return_ka);
 		if (unlikely(signr < 0))
 			goto relock;
+
 		if (unlikely(signr != 0))
 			ka = return_ka;
 		else {
@@ -2117,18 +2115,18 @@ relock:
 					    GROUP_STOP_PENDING) &&
 			    do_signal_stop(0))
 				goto relock;
 
-			signr = dequeue_signal(current, &current->blocked,
-					       info);
+			signr = dequeue_signal(current, &current->blocked, info);
 
-			if (!signr)
-				break; /* will return 0 */
+			ka = &sighand->action[signr-1];
+		}
 
-			if (signr != SIGKILL) {
-				signr = ptrace_signal(signr, info,
-						      regs, cookie);
-				if (!signr)
-					continue;
-			}
+		if (!signr)
+			break; /* will return 0 */
+
+		if (signr != SIGKILL && current->ptrace) {
+			signr = ptrace_signal(signr, info, regs, cookie);
+			if (!signr)
+				continue;
 
 			ka = &sighand->action[signr-1];
 		}
--
1.5.5.1
Re: [PATCH 1/4] ptrace: temporary revert the recent ptrace/jobctl rework
On 06/21, Roland McGrath wrote:
>
> > OK, so here's my (hacky) idea:
> >
> > (1) Forget ptrace-via-utrace. Have utrace be a separate thing. This way
> >     the recent ptrace changes won't matter.

This is what V2 does.

> > (2) But, what about ptrace co-existing well with utrace? Make them
> >     mutually exclusive - a ptraced process can't be utraced and a
> >     utraced process can't be ptraced.
>
> We had this situation before for a while. It has the substantial
> downside that e.g. you cannot do any system-wide systemtap tracing
> without making all strace and gdb use impossible.

Yes, we can't make them mutually exclusive, this can't work. So V2 tries
to teach them to play well together.

> > Assuming the above is a semi-reasonable idea, it might be a lot less
> > work than updating the ptrace-via-utrace code to handle the new
> > ptrace changes.
>
> That's for Oleg to say.  (Sorry, Oleg. ;-)

Oh, I am not sure what is simpler ;)

Oleg.