Re: [PATCH 0/31] utrace for 3.1 kernel
On 02/06, Josh Stone wrote: On 02/06/2012 06:24 PM, Ying, Victor wrote: I have problem to checkout utrace via git clone git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc.git utrace-3.0, it always reports fatal: the remote end hung up unexpectedly; I tried both utrace-3.0 and utrace-3.1, none of them works. I don't think Oleg has restored his account since kernel.org was rebuilt. He's been posting utrace on github instead, so try: git://github.com/utrace/linux.git Yes, thanks Josh. There are utrace-3.0 utrace-3.1 utrace-3.2 utrace-3.3 branches for different kernel versions. Oleg.
[PATCH 0/1] (Was: utrace for 3.3 kernel)
On 01/25, Josh Boyer wrote: I'll get this into rawhide this evening. Thanks Josh. There is another simple change, I forgout about it. It is purely internal, has no effect outside of utrace.c. Oleg.
[PATCH utrace-3.1 32/31] ptrace_report_syscall: check if TIF_SYSCALL_EMU is defined
From: Tony Breeds t...@bakeyournoodle.com TIF_SYSCALL_EMU is x86 only, add ifdef into ptrace_report_syscall(). Signed-off-by: Oleg Nesterov o...@redhat.com --- include/linux/tracehook.h |8 ++-- 1 files changed, 6 insertions(+), 2 deletions(-) diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h index 90ca578..a1bac95 100644 --- a/include/linux/tracehook.h +++ b/include/linux/tracehook.h @@ -59,8 +59,12 @@ static inline void ptrace_report_syscall(struct pt_regs *regs) { int ptrace = current-ptrace; - if (!(ptrace PT_SYSCALL_TRACE) !test_thread_flag(TIF_SYSCALL_EMU)) - return; + if (!(ptrace PT_SYSCALL_TRACE)) { +#ifdef TIF_SYSCALL_EMU + if (!test_thread_flag(TIF_SYSCALL_EMU)) +#endif + return; + } ptrace_notify(SIGTRAP | ((ptrace PT_TRACESYSGOOD) ? 0x80 : 0)); -- 1.5.5.1
[PATCH 0/31] utrace for 3.1 kernel
On 08/02, Oleg Nesterov wrote: utrace patches for 3.1 kernel. Untested, will try to do some tests tomorrow. I tried to test it a bit, seems to work. But see the new [PATCH 31/31] utrace_resume: check irqs_disabled() to shut up lockdep. The whole series is available in git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc.git utrace-3.1 Oleg.
[PATCH 04/31] tracehooks: reintroduce tracehook_consider_fatal_signal()
Add the killed tracehook_consider_fatal_signal() back. It has multiple callers and it is not easy add the necessary checks inline. Signed-off-by: Oleg Nesterov o...@redhat.com --- arch/s390/kernel/traps.c |4 ++-- include/linux/tracehook.h | 22 ++ kernel/signal.c |4 ++-- 3 files changed, 26 insertions(+), 4 deletions(-) diff --git a/arch/s390/kernel/traps.c b/arch/s390/kernel/traps.c index ffabcd9..1018ab6 100644 --- a/arch/s390/kernel/traps.c +++ b/arch/s390/kernel/traps.c @@ -329,7 +329,7 @@ void __kprobes do_per_trap(struct pt_regs *regs) if (notify_die(DIE_SSTEP, sstep, regs, 0, 0, SIGTRAP) == NOTIFY_STOP) return; - if (!current-ptrace) + if (!tracehook_consider_fatal_signal(current, SIGTRAP)) return; info.si_signo = SIGTRAP; info.si_errno = 0; @@ -428,7 +428,7 @@ static void __kprobes illegal_op(struct pt_regs *regs, long pgm_int_code, if (get_user(*((__u16 *) opcode), (__u16 __user *) location)) return; if (*((__u16 *) opcode) == S390_BREAKPOINT_U16) { - if (current-ptrace) { + if (tracehook_consider_fatal_signal(current, SIGTRAP)) { info.si_signo = SIGTRAP; info.si_errno = 0; info.si_code = TRAP_BRKPT; diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h index 8cc28bc..ec2af67 100644 --- a/include/linux/tracehook.h +++ b/include/linux/tracehook.h @@ -161,6 +161,28 @@ static inline void tracehook_signal_handler(int sig, siginfo_t *info, ptrace_notify(SIGTRAP); } +/** + * tracehook_consider_fatal_signal - suppress special handling of fatal signal + * @task: task receiving the signal + * @sig: signal number being sent + * + * Return nonzero to prevent special handling of this termination signal. + * Normally handler for signal is %SIG_DFL. It can be %SIG_IGN if @sig is + * ignored, in which case force_sig() is about to reset it to %SIG_DFL. + * When this returns zero, this signal might cause a quick termination + * that does not give the debugger a chance to intercept the signal. + * + * Called with or without @task-sighand-siglock held. + */ +static inline int tracehook_consider_fatal_signal(struct task_struct *task, + int sig) +{ + if (unlikely(task_utrace_flags(task) (UTRACE_EVENT(SIGNAL_TERM) | + UTRACE_EVENT(SIGNAL_CORE + return 1; + return task-ptrace != 0; +} + #ifdef TIF_NOTIFY_RESUME /** * set_notify_resume - cause tracehook_notify_resume() to be called diff --git a/kernel/signal.c b/kernel/signal.c index 291c970..d7ef0da 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -494,7 +494,7 @@ int unhandled_signal(struct task_struct *tsk, int sig) if (handler != SIG_IGN handler != SIG_DFL) return 0; /* if ptraced, let the tracer determine */ - return !tsk-ptrace; + return !tracehook_consider_fatal_signal(tsk, sig); } /* @@ -982,7 +982,7 @@ static void complete_signal(int sig, struct task_struct *p, int group) if (sig_fatal(p, sig) !(signal-flags (SIGNAL_UNKILLABLE | SIGNAL_GROUP_EXIT)) !sigismember(t-real_blocked, sig) - (sig == SIGKILL || !t-ptrace)) { + (sig == SIGKILL || !tracehook_consider_fatal_signal(t, sig))) { /* * This signal will be fatal to the whole group. */ -- 1.5.5.1
[PATCH 08/31] restore the DEATH/REAP utrace hooks
Restore the necessary hooks in release_task() and exit_notify(), add the corresponding helpers into utrace.h. Note: the @signal argument passed to -report_death() does not match the previous behaviour. I think this shouldn't affect the current users, and I bet nobody can really understand what this magic argument should actually mean anyway. Signed-off-by: Oleg Nesterov o...@redhat.com --- include/linux/utrace.h | 22 ++ kernel/exit.c |4 2 files changed, 26 insertions(+), 0 deletions(-) diff --git a/include/linux/utrace.h b/include/linux/utrace.h index 9a2e2f4..cf13839 100644 --- a/include/linux/utrace.h +++ b/include/linux/utrace.h @@ -697,4 +697,26 @@ static inline __must_check int utrace_barrier_pid(struct pid *pid, #endif /* CONFIG_UTRACE */ +static inline void utrace_release_task(struct task_struct *task) +{ + /* see utrace_add_engine() about this barrier */ + smp_mb(); + if (task_utrace_flags(task)) + utrace_maybe_reap(task, task_utrace_struct(task), true); +} + +static inline void utrace_exit_notify(struct task_struct *task, + int signal, int group_dead) +{ + /* +* If utrace_set_events() was just called to enable +* UTRACE_EVENT(DEATH), then we are obliged to call +* utrace_report_death() and not miss it. utrace_set_events() +* checks @task-exit_state under tasklist_lock to synchronize +* with exit_notify(), the caller. +*/ + if (task_utrace_flags(task) _UTRACE_DEATH_EVENTS) + utrace_report_death(task, group_dead, signal); +} + #endif /* linux/utrace.h */ diff --git a/kernel/exit.c b/kernel/exit.c index c1b0ab6..ba5ba22 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -168,6 +168,8 @@ void release_task(struct task_struct * p) struct task_struct *leader; int zap_leader; repeat: + utrace_release_task(p); + /* don't need to get the RCU readlock here - the process is dead and * can't be modifying its own credentials. But shut RCU-lockdep up */ rcu_read_lock(); @@ -860,6 +862,8 @@ static void exit_notify(struct task_struct *tsk, int group_dead) wake_up_process(tsk-signal-group_exit_task); write_unlock_irq(tasklist_lock); + utrace_exit_notify(tsk, autoreap ? -1 : SIGCHLD, group_dead); + /* If the process is dead, release it - nobody will wait for it */ if (autoreap) release_task(tsk); -- 1.5.5.1
[PATCH 09/31] utrace: remove jobctl bits
- change utrace_get_signal() to check JOBCTL_STOP_PENDING instead of signal-group_stop_count. With the recent changes group_stop_count doesn't necessarily mean this task should participate in group stop. - remove the participate in group stop code from utrace_wakeup() and utrace_stop(), this is no longer needed and wrong. Signed-off-by: Oleg Nesterov o...@redhat.com --- kernel/utrace.c | 16 ++-- 1 files changed, 2 insertions(+), 14 deletions(-) diff --git a/kernel/utrace.c b/kernel/utrace.c index 1e750ad..5d3974e 100644 --- a/kernel/utrace.c +++ b/kernel/utrace.c @@ -648,11 +648,7 @@ static void utrace_wakeup(struct task_struct *target, struct utrace *utrace) { lockdep_assert_held(utrace-lock); spin_lock_irq(target-sighand-siglock); - if (target-signal-flags SIGNAL_STOP_STOPPED || - target-signal-group_stop_count) - target-state = TASK_STOPPED; - else - wake_up_state(target, __TASK_TRACED); + wake_up_state(target, __TASK_TRACED); spin_unlock_irq(target-sighand-siglock); } @@ -805,14 +801,6 @@ relock: __set_current_state(TASK_TRACED); - /* -* If there is a group stop in progress, -* we must participate in the bookkeeping. -*/ - if (unlikely(task-signal-group_stop_count) - !--task-signal-group_stop_count) - task-signal-flags = SIGNAL_STOP_STOPPED; - spin_unlock_irq(task-sighand-siglock); spin_unlock(utrace-lock); @@ -2037,7 +2025,7 @@ int utrace_get_signal(struct task_struct *task, struct pt_regs *regs, ka = NULL; memset(return_ka, 0, sizeof *return_ka); } else if (!(task-utrace_flags UTRACE_EVENT_SIGNAL_ALL) || - unlikely(task-signal-group_stop_count)) { + unlikely(task-jobctl JOBCTL_STOP_PENDING)) { /* * If no engine is interested in intercepting signals or * we must stop, let the caller just dequeue them normally -- 1.5.5.1
[PATCH 05/31] add utrace hooks into sig_ignored() and recalc_sigpending()
Add the necessary and somewhat special hooks into sig_ignored() and recalc_sigpending(). Basically this restores _force_sigpending() and _consider_ignored_signal() tracehook logic. Signed-off-by: Oleg Nesterov o...@redhat.com --- include/linux/utrace.h |2 ++ kernel/signal.c|7 ++- 2 files changed, 8 insertions(+), 1 deletions(-) diff --git a/include/linux/utrace.h b/include/linux/utrace.h index f251efe..1b8da1c 100644 --- a/include/linux/utrace.h +++ b/include/linux/utrace.h @@ -107,6 +107,8 @@ bool utrace_report_syscall_entry(struct pt_regs *); void utrace_report_syscall_exit(struct pt_regs *); void utrace_signal_handler(struct task_struct *, int); +#define UTRACE_FLAG(task, ev) (task_utrace_flags(task) UTRACE_EVENT(ev)) + #ifndef CONFIG_UTRACE /* diff --git a/kernel/signal.c b/kernel/signal.c index d7ef0da..0f9af0b 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -87,7 +87,7 @@ static int sig_ignored(struct task_struct *t, int sig, int from_ancestor_ns) /* * Tracers may want to know about even ignored signals. */ - return !t-ptrace; + return !t-ptrace !UTRACE_FLAG(t, SIGNAL_IGN); } /* @@ -150,6 +150,11 @@ void recalc_sigpending_and_wake(struct task_struct *t) void recalc_sigpending(void) { + if (task_utrace_flags(current) utrace_interrupt_pending()) { + set_thread_flag(TIF_SIGPENDING); + return; + } + if (!recalc_sigpending_tsk(current) !freezing(current)) clear_thread_flag(TIF_SIGPENDING); -- 1.5.5.1
[PATCH 11/31] introduce wake_up_quiescent()
No functional changes. Add the new helper, wake_up_quiescent(task, state), which simply returns wake_up_state(task, state). Change all callers which do wake_up_state(STOPPED/TRACED) to use the new helper. ptrace_stop() is a bit special, it does __set_current_state(RUNNING) in the very unlikely case, change it as well. Signed-off-by: Oleg Nesterov o...@redhat.com --- include/linux/signal.h |2 ++ kernel/ptrace.c|2 +- kernel/signal.c| 12 ++-- kernel/utrace.c|2 +- 4 files changed, 14 insertions(+), 4 deletions(-) diff --git a/include/linux/signal.h b/include/linux/signal.h index a822300..2be3712 100644 --- a/include/linux/signal.h +++ b/include/linux/signal.h @@ -239,6 +239,8 @@ static inline int valid_signal(unsigned long sig) struct timespec; struct pt_regs; +extern int wake_up_quiescent(struct task_struct *p, unsigned int state); + extern int next_signal(struct sigpending *pending, sigset_t *mask); extern int do_send_sig_info(int sig, struct siginfo *info, struct task_struct *p, bool group); diff --git a/kernel/ptrace.c b/kernel/ptrace.c index 56b8fc1..4194664 100644 --- a/kernel/ptrace.c +++ b/kernel/ptrace.c @@ -621,7 +621,7 @@ static int ptrace_resume(struct task_struct *child, long request, child-exit_code = data; if (lock_task_sighand(child, flags)) { - wake_up_state(child, __TASK_TRACED); + wake_up_quiescent(child, __TASK_TRACED); unlock_task_sighand(child, flags); } diff --git a/kernel/signal.c b/kernel/signal.c index 71f5cca..3e8e0b1 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -702,6 +702,14 @@ void signal_wake_up(struct task_struct *t, int resume) } /* + * wakes up the STOPPED/TRACED task, must be called with -siglock held. + */ +int wake_up_quiescent(struct task_struct *p, unsigned int state) +{ + return wake_up_state(p, state); +} + +/* * Remove signals in mask from the pending set and queue. * Returns 1 if any signals were found. * @@ -888,7 +896,7 @@ static int prepare_signal(int sig, struct task_struct *p, int from_ancestor_ns) task_clear_jobctl_pending(t, JOBCTL_STOP_PENDING); rm_from_queue(SIG_KERNEL_STOP_MASK, t-pending); if (likely(!(t-ptrace PT_SEIZED))) - wake_up_state(t, __TASK_STOPPED); + wake_up_quiescent(t, __TASK_STOPPED); else ptrace_trap_notify(t); } while_each_thread(p, t); @@ -1879,7 +1887,7 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info) do_notify_parent_cldstop(current, false, why); spin_lock_irq(current-sighand-siglock); - __set_current_state(TASK_RUNNING); + wake_up_quiescent(current, __TASK_TRACED); spin_unlock_irq(current-sighand-siglock); if (clear_code) diff --git a/kernel/utrace.c b/kernel/utrace.c index 5d3974e..cebc390 100644 --- a/kernel/utrace.c +++ b/kernel/utrace.c @@ -648,7 +648,7 @@ static void utrace_wakeup(struct task_struct *target, struct utrace *utrace) { lockdep_assert_held(utrace-lock); spin_lock_irq(target-sighand-siglock); - wake_up_state(target, __TASK_TRACED); + wake_up_quiescent(target, __TASK_TRACED); spin_unlock_irq(target-sighand-siglock); } -- 1.5.5.1
[PATCH 16/31] reintroduce tracehook_finish_jctl() as utrace_end_stop()
utrace_finish_stop() is needed to avoid the races with SIGKILL which wakes up UTRACED task, and thus it should be called every time after the STOPPED/TRACED/UTRACED returns from schedule(), remember that TASK_UTRACED can be added while the task is STOPPED/UTRACED. - change do_signal_state() to call this helper right after schedule(), otherwise this logic is broken by the upstream changes - now that utrace doesn't control TASK_TRACED bit, ptrace_stop() must call this helper too. Signed-off-by: Oleg Nesterov o...@redhat.com --- include/linux/utrace.h | 11 +++ kernel/signal.c|5 + kernel/utrace.c|2 +- 3 files changed, 17 insertions(+), 1 deletions(-) diff --git a/include/linux/utrace.h b/include/linux/utrace.h index cf13839..0279c74 100644 --- a/include/linux/utrace.h +++ b/include/linux/utrace.h @@ -719,4 +719,15 @@ static inline void utrace_exit_notify(struct task_struct *task, utrace_report_death(task, group_dead, signal); } +/** + * utrace_end_stop - report about return from STOPPED/TRACED + * + * This is called by do_signal_stop() and ptrace_stop after wakeup. + */ +static inline void utrace_end_stop(void) +{ + if (task_utrace_flags(current)) + utrace_finish_stop(); +} + #endif /* linux/utrace.h */ diff --git a/kernel/signal.c b/kernel/signal.c index 0dc6abb..a625309 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -1895,6 +1895,8 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info) read_unlock(tasklist_lock); } + utrace_end_stop(); + /* * While in TASK_TRACED, we were considered frozen enough. * Now that we woke up, it's crucial if we're supposed to be @@ -2059,6 +2061,9 @@ static bool do_signal_stop(int signr) /* Now we don't run again until woken by SIGCONT or SIGKILL */ schedule(); + + utrace_end_stop(); + return true; } else { /* diff --git a/kernel/utrace.c b/kernel/utrace.c index 2097103..d41b982 100644 --- a/kernel/utrace.c +++ b/kernel/utrace.c @@ -741,7 +741,7 @@ static bool utrace_reset(struct task_struct *task, struct utrace *utrace) void utrace_finish_stop(void) { /* -* If we were task_is_traced() and then SIGKILL'ed, make +* If we were task_is_utraced() and then SIGKILL'ed, make * sure we do nothing until the tracer drops utrace-lock. */ if (unlikely(__fatal_signal_pending(current))) { -- 1.5.5.1
[PATCH 15/31] utrace: use TASK_UTRACED instead of TASK_TRACED
Change utrace.c to use TASK_UTRACED instead of TASK_TRACED. - utrace_stop/utrace_wakeup: simply use the new state - utrace_do_stop: do not clear STOPPED/TRACED, but add the new __TASK_UTRACED bit to state the fact that both ptrace and utrace want this task to be stopped - naturally, do not use task_is_traced() to check if this task was stopped by utrace, use the new task_is_utraced() helper which checks __TASK_UTRACED. Signed-off-by: Oleg Nesterov o...@redhat.com --- kernel/utrace.c | 26 ++ 1 files changed, 14 insertions(+), 12 deletions(-) diff --git a/kernel/utrace.c b/kernel/utrace.c index cebc390..2097103 100644 --- a/kernel/utrace.c +++ b/kernel/utrace.c @@ -462,6 +462,8 @@ static void put_detached_list(struct list_head *list) */ #define ENGINE_STOP(1UL _UTRACE_NEVENTS) +#define task_is_utraced(task) ((task-state __TASK_UTRACED) != 0) + static void mark_engine_wants_stop(struct task_struct *task, struct utrace_engine *engine) { @@ -576,7 +578,7 @@ int utrace_set_events(struct task_struct *target, ret = 0; if ((old_flags ~events) target != current - !task_is_stopped_or_traced(target) !target-exit_state) { + !task_is_utraced(target) !target-exit_state) { /* * This barrier ensures that our engine-flags changes * have hit before we examine utrace-reporting, @@ -623,21 +625,21 @@ static void mark_engine_detached(struct utrace_engine *engine) */ static bool utrace_do_stop(struct task_struct *target, struct utrace *utrace) { - if (task_is_stopped(target)) { + if (task_is_stopped_or_traced(target)) { /* * Stopped is considered quiescent; when it wakes up, it will * go through utrace_finish_stop() before doing anything else. */ spin_lock_irq(target-sighand-siglock); - if (likely(task_is_stopped(target))) - __set_task_state(target, TASK_TRACED); + if (likely(task_is_stopped_or_traced(target))) + target-state |= TASK_UTRACED; spin_unlock_irq(target-sighand-siglock); } else if (utrace-resume UTRACE_REPORT) { utrace-resume = UTRACE_REPORT; set_notify_resume(target); } - return task_is_traced(target); + return task_is_utraced(target); } /* @@ -648,7 +650,7 @@ static void utrace_wakeup(struct task_struct *target, struct utrace *utrace) { lockdep_assert_held(utrace-lock); spin_lock_irq(target-sighand-siglock); - wake_up_quiescent(target, __TASK_TRACED); + wake_up_quiescent(target, __TASK_UTRACED); spin_unlock_irq(target-sighand-siglock); } @@ -710,7 +712,7 @@ static bool utrace_reset(struct task_struct *task, struct utrace *utrace) /* * If no more engines want it stopped, wake it up. */ - if (task_is_traced(task) !(flags ENGINE_STOP)) { + if (task_is_utraced(task) !(flags ENGINE_STOP)) { /* * It just resumes, so make sure single-step * is not left set. @@ -749,7 +751,7 @@ void utrace_finish_stop(void) } /* - * Perform %UTRACE_STOP, i.e. block in TASK_TRACED until woken up. + * Perform %UTRACE_STOP, i.e. block in TASK_UTRACED until woken up. * @task == current, @utrace == current-utrace, which is not locked. * Return true if we were woken up by SIGKILL even though some utrace * engine may still want us to stay stopped. @@ -799,7 +801,7 @@ relock: return; } - __set_current_state(TASK_TRACED); + __set_current_state(TASK_UTRACED); spin_unlock_irq(task-sighand-siglock); spin_unlock(utrace-lock); @@ -809,14 +811,14 @@ relock: utrace_finish_stop(); /* -* While in TASK_TRACED, we were considered frozen enough. +* While in TASK_UTRACED, we were considered frozen enough. * Now that we woke up, it's crucial if we're supposed to be * frozen that we freeze now before running anything substantial. */ try_to_freeze(); /* -* While we were in TASK_TRACED, complete_signal() considered +* While we were in TASK_UTRACED, complete_signal() considered * us uninterested in signal wakeups. Now make sure our * TIF_SIGPENDING state is correct for normal running. */ @@ -1087,7 +1089,7 @@ int utrace_control(struct task_struct *target, if (unlikely(IS_ERR(utrace))) return PTR_ERR(utrace); - reset = task_is_traced(target); + reset = task_is_utraced(target); ret = 0; /* -- 1.5.5.1
[PATCH 12/31] introduce ptrace_signal_wake_up()
Add the new helper, ptrace_signal_wake_up(), change ptrace.c/signal.c to use it instead of signal_wake_up() to wake up a STOPPED/TRACED task. The new helper does almost the same, except: - it doesn't use the TASK_WAKEKILL bit to wake up the TRACED or STOPPED task, it uses __TASK_STOPPED | __TASK_TRACED explicitly. This is what ptrace actually wants, it should never wake up a TASK_KILLABLE task. This should be cleanuped upatream, signal_wake_up() should take the state as an argument, not a boolean. Until then we add a new static helper. - it uses wake_up_quiescent() instead of wake_up_state(). Thereafter every change from STOPPED/TRACED to RUNNING is done via wake_up_quiescent(). Signed-off-by: Oleg Nesterov o...@redhat.com --- include/linux/ptrace.h |1 + kernel/ptrace.c| 20 kernel/signal.c|2 +- 3 files changed, 18 insertions(+), 5 deletions(-) diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h index 800f113..6d9282a 100644 --- a/include/linux/ptrace.h +++ b/include/linux/ptrace.h @@ -113,6 +113,7 @@ #include linux/compiler.h/* For unlikely. */ #include linux/sched.h /* For struct task_struct. */ +extern void ptrace_signal_wake_up(struct task_struct *p, int quiescent); extern long arch_ptrace(struct task_struct *child, long request, unsigned long addr, unsigned long data); diff --git a/kernel/ptrace.c b/kernel/ptrace.c index 4194664..1a50090 100644 --- a/kernel/ptrace.c +++ b/kernel/ptrace.c @@ -25,6 +25,18 @@ #include linux/hw_breakpoint.h #include linux/cn_proc.h +void ptrace_signal_wake_up(struct task_struct *p, int quiescent) +{ + unsigned int state; + + set_tsk_thread_flag(p, TIF_SIGPENDING); + + state = TASK_INTERRUPTIBLE; + if (quiescent) + state |= (__TASK_STOPPED | __TASK_TRACED); + if (!wake_up_quiescent(p, state)) + kick_process(p); +} static int ptrace_trapping_sleep_fn(void *flags) { @@ -106,7 +118,7 @@ void __ptrace_unlink(struct task_struct *child) * TASK_KILLABLE sleeps. */ if (child-jobctl JOBCTL_STOP_PENDING || task_is_traced(child)) - signal_wake_up(child, task_is_traced(child)); + ptrace_signal_wake_up(child, task_is_traced(child)); spin_unlock(child-sighand-siglock); } @@ -296,7 +308,7 @@ static int ptrace_attach(struct task_struct *task, long request, */ if (task_is_stopped(task) task_set_jobctl_pending(task, JOBCTL_TRAP_STOP | JOBCTL_TRAPPING)) - signal_wake_up(task, 1); + ptrace_signal_wake_up(task, 1); spin_unlock(task-sighand-siglock); @@ -731,7 +743,7 @@ int ptrace_request(struct task_struct *child, long request, * tracee into STOP. */ if (likely(task_set_jobctl_pending(child, JOBCTL_TRAP_STOP))) - signal_wake_up(child, child-jobctl JOBCTL_LISTENING); + ptrace_signal_wake_up(child, child-jobctl JOBCTL_LISTENING); unlock_task_sighand(child, flags); ret = 0; @@ -760,7 +772,7 @@ int ptrace_request(struct task_struct *child, long request, * of this trap and now. Trigger re-trap immediately. */ if (child-jobctl JOBCTL_TRAP_NOTIFY) - signal_wake_up(child, true); + ptrace_signal_wake_up(child, true); unlock_task_sighand(child, flags); ret = 0; diff --git a/kernel/signal.c b/kernel/signal.c index 3e8e0b1..0dc6abb 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -854,7 +854,7 @@ static void ptrace_trap_notify(struct task_struct *t) assert_spin_locked(t-sighand-siglock); task_set_jobctl_pending(t, JOBCTL_TRAP_NOTIFY); - signal_wake_up(t, t-jobctl JOBCTL_LISTENING); + ptrace_signal_wake_up(t, t-jobctl JOBCTL_LISTENING); } /* -- 1.5.5.1
[PATCH 18/31] ptrace_stop: do not assume the task is running after wake_up_quiescent()
If ptrace_stop() sets TASK_TRACED and then detects we should not stop, it can race with utrace_do_stop() which can see TASK_TRACED and add TASK_UTRACED. In this case we should stop for utrace needs. Signed-off-by: Oleg Nesterov o...@redhat.com --- kernel/signal.c |8 1 files changed, 8 insertions(+), 0 deletions(-) diff --git a/kernel/signal.c b/kernel/signal.c index 0d1675a..249760f 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -1908,6 +1908,14 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info) if (clear_code) current-exit_code = 0; read_unlock(tasklist_lock); + + /* +* It is possible that __TASK_UTRACED was added by utrace +* while we were __TASK_TRACED and before we take -siglock +* for wake_up_quiescent(), we need to block in this case. +* Otherwise this is unnecessary but absolutely harmless. +*/ + schedule(); } utrace_end_stop(); -- 1.5.5.1
[PATCH 14/31] introduce TASK_UTRACED state
Introduce TASK_UTRACED state, will be used by utrace instead of TASK_TRACED. Note: this state is reported as t (tracing stop) to the user-space to avoid the confusion. IOW, it looks like TASK_TRACED in /proc/pid/status. Signed-off-by: Oleg Nesterov o...@redhat.com --- fs/proc/array.c | 11 ++- include/linux/sched.h | 20 +++- 2 files changed, 17 insertions(+), 14 deletions(-) diff --git a/fs/proc/array.c b/fs/proc/array.c index f0c0ea2..e0daec4 100644 --- a/fs/proc/array.c +++ b/fs/proc/array.c @@ -138,11 +138,12 @@ static const char * const task_state_array[] = { D (disk sleep), /* 2 */ T (stopped), /* 4 */ t (tracing stop), /* 8 */ - Z (zombie), /* 16 */ - X (dead), /* 32 */ - x (dead), /* 64 */ - K (wakekill), /* 128 */ - W (waking), /* 256 */ + t (tracing stop), /* 16 (stopped by utrace) */ + Z (zombie), /* 32 */ + X (dead), /* 64 */ + x (dead), /* 128 */ + K (wakekill), /* 256 */ + W (waking), /* 512 */ }; static inline const char *get_task_state(struct task_struct *tsk) diff --git a/include/linux/sched.h b/include/linux/sched.h index c6d79af..f3f0a77 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -184,16 +184,17 @@ print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq) #define TASK_UNINTERRUPTIBLE 2 #define __TASK_STOPPED 4 #define __TASK_TRACED 8 +#define __TASK_UTRACED 16 /* in tsk-exit_state */ -#define EXIT_ZOMBIE16 -#define EXIT_DEAD 32 +#define EXIT_ZOMBIE32 +#define EXIT_DEAD 64 /* in tsk-state again */ -#define TASK_DEAD 64 -#define TASK_WAKEKILL 128 -#define TASK_WAKING256 -#define TASK_STATE_MAX 512 +#define TASK_DEAD 128 +#define TASK_WAKEKILL 256 +#define TASK_WAKING512 +#define TASK_STATE_MAX 1024 -#define TASK_STATE_TO_CHAR_STR RSDTtZXxKW +#define TASK_STATE_TO_CHAR_STR RSDTtUZXxKW extern char ___assert_task_state[1 - 2*!!( sizeof(TASK_STATE_TO_CHAR_STR)-1 != ilog2(TASK_STATE_MAX)+1)]; @@ -202,15 +203,16 @@ extern char ___assert_task_state[1 - 2*!!( #define TASK_KILLABLE (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE) #define TASK_STOPPED (TASK_WAKEKILL | __TASK_STOPPED) #define TASK_TRACED(TASK_WAKEKILL | __TASK_TRACED) +#define TASK_UTRACED (TASK_WAKEKILL | __TASK_UTRACED) /* Convenience macros for the sake of wake_up */ #define TASK_NORMAL(TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE) -#define TASK_ALL (TASK_NORMAL | __TASK_STOPPED | __TASK_TRACED) +#define TASK_ALL (TASK_NORMAL | __TASK_STOPPED | __TASK_TRACED | __TASK_UTRACED) /* get_task_state() */ #define TASK_REPORT(TASK_RUNNING | TASK_INTERRUPTIBLE | \ TASK_UNINTERRUPTIBLE | __TASK_STOPPED | \ -__TASK_TRACED) +__TASK_TRACED | __TASK_UTRACED) #define task_is_traced(task) ((task-state __TASK_TRACED) != 0) #define task_is_stopped(task) ((task-state __TASK_STOPPED) != 0) -- 1.5.5.1
[PATCH 19/31] get_signal_to_deliver: restore/restructure utrace/ptrace signal reporting
- Reintroduce tracehook_get_signal() as utrace_hook_signal(). - Change get_signal_to_deliver() to call utrace_hook_signal() first, before dequeue_signal() - Always call ptrace_signal() if signal != SIGKILL, no matter whether this signal comes from utrace or not. Since this can change signr again, update struct k_sigaction *ka in this case. IOW, roughly, ptrace acts as if it is the last attached engine, it takes the final decision about the signal. Signed-off-by: Oleg Nesterov o...@redhat.com --- include/linux/utrace.h | 31 +++ kernel/signal.c| 30 -- 2 files changed, 51 insertions(+), 10 deletions(-) diff --git a/include/linux/utrace.h b/include/linux/utrace.h index 0279c74..63103e2 100644 --- a/include/linux/utrace.h +++ b/include/linux/utrace.h @@ -730,4 +730,35 @@ static inline void utrace_end_stop(void) utrace_finish_stop(); } +/** + * utrace_hook_signal - deliver synthetic signal to traced task + * @task: @current + * @regs: task_pt_regs(@current) + * @info: details of synthetic signal + * @return_ka: sigaction for synthetic signal + * + * Return zero to check for a real pending signal normally. + * Return -1 after releasing the siglock to repeat the check. + * Return a signal number to induce an artificial signal delivery, + * setting *@info and *@return_ka to specify its details and behavior. + * + * The @return_ka-sa_handler value controls the disposition of the + * signal, no matter the signal number. For %SIG_DFL, the return value + * is a representative signal to indicate the behavior (e.g. %SIGTERM + * for death, %SIGQUIT for core dump, %SIGSTOP for job control stop, + * %SIGTSTP for stop unless in an orphaned pgrp), but the signal number + * reported will be @info-si_signo instead. + * + * Called with @task-sighand-siglock held, before dequeuing pending signals. + */ +static inline int utrace_hook_signal(struct task_struct *task, + struct pt_regs *regs, + siginfo_t *info, + struct k_sigaction *return_ka) +{ + if (unlikely(task_utrace_flags(task))) + return utrace_get_signal(task, regs, info, return_ka); + return 0; +} + #endif /* linux/utrace.h */ diff --git a/kernel/signal.c b/kernel/signal.c index 249760f..3c783d3 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -2234,17 +2234,27 @@ relock: for (;;) { struct k_sigaction *ka; - if (unlikely(current-jobctl JOBCTL_STOP_PENDING) - do_signal_stop(0)) + signr = utrace_hook_signal(current, regs, info, return_ka); + if (unlikely(signr 0)) goto relock; - if (unlikely(current-jobctl JOBCTL_TRAP_MASK)) { - do_jobctl_trap(); - spin_unlock_irq(sighand-siglock); - goto relock; - } + if (unlikely(signr != 0)) + ka = return_ka; + else { + if (unlikely(current-jobctl JOBCTL_STOP_PENDING) + do_signal_stop(0)) + goto relock; - signr = dequeue_signal(current, current-blocked, info); + if (unlikely(current-jobctl JOBCTL_TRAP_MASK)) { + do_jobctl_trap(); + spin_unlock_irq(sighand-siglock); + goto relock; + } + + signr = dequeue_signal(current, current-blocked, info); + + ka = sighand-action[signr-1]; + } if (!signr) break; /* will return 0 */ @@ -2254,9 +2264,9 @@ relock: regs, cookie); if (!signr) continue; - } - ka = sighand-action[signr-1]; + ka = sighand-action[signr-1]; + } /* Trace actually delivered signals. */ trace_signal_deliver(signr, info, ka); -- 1.5.5.1
[PATCH 20/31] utrace_get_signal: s/JOBCTL_STOP_PENDING/JOBCTL_PENDING_MASK/
utrace_get_signal() checks JOBCTL_STOP_PENDING to detect the case when we should not try to dequeue the signal but should try to participate in the group-stop. With the recent changes this is not enough, everything which contrbutes to recalc_sigpending_tsk() should be respected. Check JOBCTL_PENDING_MASK instead. This matches the JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK code in the caller, get_signal_to_deliver(). Note that this code won't run if utrace_get_signal() returns signr 0. Signed-off-by: Oleg Nesterov o...@redhat.com --- kernel/utrace.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/kernel/utrace.c b/kernel/utrace.c index d41b982..0bb0a06 100644 --- a/kernel/utrace.c +++ b/kernel/utrace.c @@ -2027,7 +2027,7 @@ int utrace_get_signal(struct task_struct *task, struct pt_regs *regs, ka = NULL; memset(return_ka, 0, sizeof *return_ka); } else if (!(task-utrace_flags UTRACE_EVENT_SIGNAL_ALL) || - unlikely(task-jobctl JOBCTL_STOP_PENDING)) { + unlikely(task-jobctl JOBCTL_PENDING_MASK)) { /* * If no engine is interested in intercepting signals or * we must stop, let the caller just dequeue them normally -- 1.5.5.1
[PATCH 17/31] teach wake_up_quiescent() to do selective wake_up
Both utrace and ptrace can want the same thread to be quiescent, in this case its state is TASK_TRACED | TASK_UTRACED. And this also means that this task must not run unless both utrace and ptrace resume it. Change wake_up_quiescent(p, state) to do p-state = ~state and return false unless there is no more quiescent bits in task-state. Signed-off-by: Oleg Nesterov o...@redhat.com --- kernel/signal.c | 15 +++ 1 files changed, 15 insertions(+), 0 deletions(-) diff --git a/kernel/signal.c b/kernel/signal.c index a625309..0d1675a 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -701,11 +701,26 @@ void signal_wake_up(struct task_struct *t, int resume) kick_process(t); } +#define STATE_QUIESCENT(__TASK_STOPPED | __TASK_TRACED | __TASK_UTRACED) /* * wakes up the STOPPED/TRACED task, must be called with -siglock held. */ int wake_up_quiescent(struct task_struct *p, unsigned int state) { + unsigned int quiescent = (p-state STATE_QUIESCENT); + + WARN_ON(state ~(STATE_QUIESCENT | TASK_INTERRUPTIBLE)); + + if (quiescent) { + state = ~TASK_INTERRUPTIBLE; + if ((quiescent ~state) != 0) { + p-state = ~state; + WARN_ON(!(p-state STATE_QUIESCENT)); + WARN_ON(!(p-state TASK_WAKEKILL)); + return 0; + } + } + return wake_up_state(p, state); } -- 1.5.5.1
[PATCH 22/31] introduce PT_SYSCALL_TRACE flag
Currently tracehooks assume that if the ptraced task has TIF_SYSCALL_TRACE set, the tracee should report the syscall. This is not true, this thread flag can be set by utrace. Add the new internal ptrace flag, PT_SYSCALL_TRACE. Change ptrace_set_syscall_trace() to set/clear this bit along with TIF_SYSCALL_TRACE, change ptrace_report_syscall() to check this flag instead of PT_PTRACED. Signed-off-by: Oleg Nesterov o...@redhat.com --- include/linux/ptrace.h|3 +++ include/linux/tracehook.h |2 +- kernel/ptrace.c |7 +-- 3 files changed, 9 insertions(+), 3 deletions(-) diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h index 6d9282a..c10f610 100644 --- a/include/linux/ptrace.h +++ b/include/linux/ptrace.h @@ -104,6 +104,8 @@ #define PT_TRACE_MASK 0x03f4 +#define PT_SYSCALL_TRACE 0x0002 + /* single stepping state bits (used on ARM and PA-RISC) */ #define PT_SINGLESTEP_BIT 31 #define PT_SINGLESTEP (1PT_SINGLESTEP_BIT) @@ -227,6 +229,7 @@ static inline void ptrace_init_task(struct task_struct *child, bool ptrace) if (unlikely(ptrace) current-ptrace) { child-ptrace = current-ptrace; + child-ptrace = ~PT_SYSCALL_TRACE; __ptrace_link(child, current-parent); if (child-ptrace PT_SEIZED) diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h index ec2af67..eb9fe30 100644 --- a/include/linux/tracehook.h +++ b/include/linux/tracehook.h @@ -59,7 +59,7 @@ static inline void ptrace_report_syscall(struct pt_regs *regs) { int ptrace = current-ptrace; - if (!(ptrace PT_PTRACED)) + if (!(ptrace PT_SYSCALL_TRACE)) return; ptrace_notify(SIGTRAP | ((ptrace PT_TRACESYSGOOD) ? 0x80 : 0)); diff --git a/kernel/ptrace.c b/kernel/ptrace.c index dc2ad34..7deb292 100644 --- a/kernel/ptrace.c +++ b/kernel/ptrace.c @@ -40,10 +40,13 @@ void ptrace_signal_wake_up(struct task_struct *p, int quiescent) static void ptrace_set_syscall_trace(struct task_struct *p, bool on) { - if (on) + if (on) { + p-ptrace |= PT_SYSCALL_TRACE; set_tsk_thread_flag(p, TIF_SYSCALL_TRACE); - else + } else { + p-ptrace = ~PT_SYSCALL_TRACE; clear_tsk_thread_flag(p, TIF_SYSCALL_TRACE); + } } static int ptrace_trapping_sleep_fn(void *flags) -- 1.5.5.1
[PATCH 21/31] introduce ptrace_set_syscall_trace()
No functional changes. Add the new helper, ptrace_set_syscall_trace(), which should be used to set/clear TIF_SYSCALL_TRACE in ptrace code. Currently it does nothing more. Signed-off-by: Oleg Nesterov o...@redhat.com --- kernel/ptrace.c | 15 ++- 1 files changed, 10 insertions(+), 5 deletions(-) diff --git a/kernel/ptrace.c b/kernel/ptrace.c index 1a50090..dc2ad34 100644 --- a/kernel/ptrace.c +++ b/kernel/ptrace.c @@ -38,6 +38,14 @@ void ptrace_signal_wake_up(struct task_struct *p, int quiescent) kick_process(p); } +static void ptrace_set_syscall_trace(struct task_struct *p, bool on) +{ + if (on) + set_tsk_thread_flag(p, TIF_SYSCALL_TRACE); + else + clear_tsk_thread_flag(p, TIF_SYSCALL_TRACE); +} + static int ptrace_trapping_sleep_fn(void *flags) { schedule(); @@ -418,7 +426,7 @@ static int ptrace_detach(struct task_struct *child, unsigned int data) /* Architecture-specific hardware disable .. */ ptrace_disable(child); - clear_tsk_thread_flag(child, TIF_SYSCALL_TRACE); + ptrace_set_syscall_trace(child, false); write_lock_irq(tasklist_lock); /* @@ -606,10 +614,7 @@ static int ptrace_resume(struct task_struct *child, long request, if (!valid_signal(data)) return -EIO; - if (request == PTRACE_SYSCALL) - set_tsk_thread_flag(child, TIF_SYSCALL_TRACE); - else - clear_tsk_thread_flag(child, TIF_SYSCALL_TRACE); + ptrace_set_syscall_trace(child, request == PTRACE_SYSCALL); #ifdef TIF_SYSCALL_EMU if (request == PTRACE_SYSEMU || request == PTRACE_SYSEMU_SINGLESTEP) -- 1.5.5.1
[PATCH 23/31] utrace: don't clear TIF_SYSCALL_TRACE if it is needed by ptrace
TIF_SYSCALL_TRACE should be cleared only if both ptrace and utrace do not want it, change utrace_reset() to check PT_SYSCALL_TRACE before clear_tsk_thread_flag(TIF_SYSCALL_TRACE). Signed-off-by: Oleg Nesterov o...@redhat.com --- kernel/utrace.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/kernel/utrace.c b/kernel/utrace.c index 0bb0a06..bebf6de 100644 --- a/kernel/utrace.c +++ b/kernel/utrace.c @@ -697,6 +697,7 @@ static bool utrace_reset(struct task_struct *task, struct utrace *utrace) BUG_ON(utrace-death); flags = UTRACE_EVENT(REAP); } else if (!(flags UTRACE_EVENT_SYSCALL) + !(task-ptrace PT_SYSCALL_TRACE) test_tsk_thread_flag(task, TIF_SYSCALL_TRACE)) { clear_tsk_thread_flag(task, TIF_SYSCALL_TRACE); } -- 1.5.5.1
[PATCH 02/31] utrace: add utrace_init_task/utrace_free_task calls
Add the necessary copy_process()-utrace_init_task() and free_task()-utrace_free_task() calls. Originally this was the part of utrace core patch, but since tracehooks are dying it doesn't make sense to reintroduce them. Instead, just call the utrace_ helpers directly. This is fine even without CONFIG_UTRACE, gcc is smart enough to optimize out the code in free_task(). Signed-off-by: Oleg Nesterov o...@redhat.com --- kernel/fork.c |5 + 1 files changed, 5 insertions(+), 0 deletions(-) diff --git a/kernel/fork.c b/kernel/fork.c index e7ceaca..a9891da 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -66,6 +66,7 @@ #include linux/user-return-notifier.h #include linux/oom.h #include linux/khugepaged.h +#include linux/utrace.h #include asm/pgtable.h #include asm/pgalloc.h @@ -167,6 +168,8 @@ void free_task(struct task_struct *tsk) free_thread_info(tsk-stack); rt_mutex_debug_task_free(tsk); ftrace_graph_exit_task(tsk); + if (task_utrace_struct(tsk)) + utrace_free_task(tsk); free_task_struct(tsk); } EXPORT_SYMBOL(free_task); @@ -1096,6 +1099,8 @@ static struct task_struct *copy_process(unsigned long clone_flags, if (!p) goto fork_out; + utrace_init_task(p); + ftrace_graph_init_task(p); rt_mutex_init_task(p); -- 1.5.5.1
[PATCH 24/31] introduce task_utrace_lock/task_utrace_unlock
- Add task_utrace_lock(task). It simply takes task-utrace-lock if this task was ever utraced. Otherwise it takes task_lock(), this serializes with utrace_attach_task()-utrace_task_alloc(). In both case the caller can be sure it can't race with anything which needs utrace-lock. - Add task_utrace_unlock(task), it releases the corresponding lock. Signed-off-by: Oleg Nesterov o...@redhat.com --- include/linux/utrace.h |9 + kernel/utrace.c| 26 ++ 2 files changed, 35 insertions(+), 0 deletions(-) diff --git a/include/linux/utrace.h b/include/linux/utrace.h index 63103e2..f37373b 100644 --- a/include/linux/utrace.h +++ b/include/linux/utrace.h @@ -117,6 +117,12 @@ void utrace_signal_handler(struct task_struct *, int); #ifndef CONFIG_UTRACE +static inline void task_utrace_lock(struct task_struct *task) +{ +} +static inline void task_utrace_unlock(struct task_struct *task) +{ +} /* * linux/tracehook.h uses these accessors to avoid #ifdef CONFIG_UTRACE. */ @@ -139,6 +145,9 @@ static inline void task_utrace_proc_status(struct seq_file *m, #else /* CONFIG_UTRACE */ +extern void task_utrace_lock(struct task_struct *task); +extern void task_utrace_unlock(struct task_struct *task); + static inline unsigned long task_utrace_flags(struct task_struct *task) { return task-utrace_flags; diff --git a/kernel/utrace.c b/kernel/utrace.c index bebf6de..960dd9e 100644 --- a/kernel/utrace.c +++ b/kernel/utrace.c @@ -79,6 +79,32 @@ static int __init utrace_init(void) } module_init(utrace_init); +void task_utrace_lock(struct task_struct *task) +{ + struct utrace *utrace = task_utrace_struct(task); + + if (!utrace) { + task_lock(task); + utrace = task_utrace_struct(task); + if (!utrace) + return; + + task_unlock(task); + } + + spin_lock(utrace-lock); +} + +void task_utrace_unlock(struct task_struct *task) +{ + struct utrace *utrace = task_utrace_struct(task); + + if (utrace) + spin_unlock(utrace-lock); + else + task_unlock(task); +} + /* * Set up @task.utrace for the first time. We can have races * between two utrace_attach_task() calls here. The task_lock() -- 1.5.5.1
[PATCH 27/31] utrace: finish_resume_report: don't do user_xxx_step() if ptrace_wants_step()
finish_resume_report() should not enable/disable the stepping if ptrace_wants_step() == T. If ptrace wants block_step while utrace wants single_step we could promote the stepping, but I do not think this really makes sense. Unless the tracee is killed this can't race with ptrace, this is called by the tracee itself. If it is killed we do not care. Signed-off-by: Oleg Nesterov o...@redhat.com --- include/linux/tracehook.h |8 kernel/utrace.c |9 ++--- 2 files changed, 10 insertions(+), 7 deletions(-) diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h index 21c8ca2..b6812d4 100644 --- a/include/linux/tracehook.h +++ b/include/linux/tracehook.h @@ -104,8 +104,8 @@ static inline __must_check int tracehook_report_syscall_entry( return 0; } -#define ptrace_wants_step()\ - (current-ptrace (PT_SINGLE_STEP | PT_SINGLE_BLOCK)) +#define ptrace_wants_step(task)\ + ((task)-ptrace (PT_SINGLE_STEP | PT_SINGLE_BLOCK)) /** * tracehook_report_syscall_exit - task has just finished a system call @@ -129,7 +129,7 @@ static inline void tracehook_report_syscall_exit(struct pt_regs *regs, int step) if (task_utrace_flags(current) UTRACE_EVENT(SYSCALL_EXIT)) utrace_report_syscall_exit(regs); - if (step ptrace_wants_step()) { + if (step ptrace_wants_step(current)) { siginfo_t info; user_single_step_siginfo(current, regs, info); force_sig_info(SIGTRAP, info, current); @@ -160,7 +160,7 @@ static inline void tracehook_signal_handler(int sig, siginfo_t *info, { if (task_utrace_flags(current)) utrace_signal_handler(current, stepping); - if (stepping ptrace_wants_step()) + if (stepping ptrace_wants_step(current)) ptrace_notify(SIGTRAP); } diff --git a/kernel/utrace.c b/kernel/utrace.c index 960dd9e..05e8532 100644 --- a/kernel/utrace.c +++ b/kernel/utrace.c @@ -1829,7 +1829,8 @@ static void finish_resume_report(struct task_struct *task, case UTRACE_BLOCKSTEP: if (likely(arch_has_block_step())) { - user_enable_block_step(task); + if (!ptrace_wants_step(task)) + user_enable_block_step(task); break; } @@ -1842,7 +1843,8 @@ static void finish_resume_report(struct task_struct *task, case UTRACE_SINGLESTEP: if (likely(arch_has_single_step())) { - user_enable_single_step(task); + if (!ptrace_wants_step(task)) + user_enable_single_step(task); } else { /* * This means some callback is to blame for failing @@ -1857,7 +1859,8 @@ static void finish_resume_report(struct task_struct *task, case UTRACE_REPORT: case UTRACE_RESUME: default: - user_disable_single_step(task); + if (!ptrace_wants_step(task)) + user_disable_single_step(task); break; } } -- 1.5.5.1
[PATCH 30/31] ptrace_report_syscall: check TIF_SYSCALL_EMU
4d16a64 introduce PT_SYSCALL_TRACE flag breaks PTRACE_SYSEMU which doesn't set PT_SYSCALL_TRACE. Change ptrace_report_syscall() to check TIF_SYSCALL_EMU as well. This can't conflict with utrace, this flag can only be set by ptrace. Signed-off-by: Oleg Nesterov o...@redhat.com --- include/linux/tracehook.h |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h index b6812d4..90ca578 100644 --- a/include/linux/tracehook.h +++ b/include/linux/tracehook.h @@ -59,7 +59,7 @@ static inline void ptrace_report_syscall(struct pt_regs *regs) { int ptrace = current-ptrace; - if (!(ptrace PT_SYSCALL_TRACE)) + if (!(ptrace PT_SYSCALL_TRACE) !test_thread_flag(TIF_SYSCALL_EMU)) return; ptrace_notify(SIGTRAP | ((ptrace PT_TRACESYSGOOD) ? 0x80 : 0)); -- 1.5.5.1
[PATCHSET] utrace for 3.1 kernel
Hello. utrace patches for 3.1 kernel. Untested, will try to do some tests tomorrow. I do not want to spam you all and the lists, please look at git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc.git utrace-3.1 Oleg Nesterov (29): utrace: add utrace_init_task/utrace_free_task calls tracehooks: add utrace hooks tracehooks: reintroduce tracehook_consider_fatal_signal() add utrace hooks into sig_ignored() and recalc_sigpending() restore the EXEC/EXIT/CLONE utrace hooks utrace: utrace_report_death() can use task_utrace_struct() restore the DEATH/REAP utrace hooks utrace: remove jobctl bits ptrace: take -siglock around s/TRACED/RUNNING/ introduce wake_up_quiescent() introduce ptrace_signal_wake_up() wait_task_inactive: treat task-state and match_state as bitmasks introduce TASK_UTRACED state utrace: use TASK_UTRACED instead of TASK_TRACED reintroduce tracehook_finish_jctl() as utrace_end_stop() teach wake_up_quiescent() to do selective wake_up ptrace_stop: do not assume the task is running after wake_up_quiescent() get_signal_to_deliver: restore/restructure utrace/ptrace signal reporting utrace_get_signal: s/JOBCTL_STOP_PENDING/JOBCTL_PENDING_MASK/ introduce ptrace_set_syscall_trace() introduce PT_SYSCALL_TRACE flag utrace: don't clear TIF_SYSCALL_TRACE if it is needed by ptrace introduce task_utrace_lock/task_utrace_unlock teach ptrace_set_syscall_trace() to play well with utrace introduce PT_SINGLE_STEP and PT_SINGLE_BLOCK utrace: finish_resume_report: don't do user_xxx_step() if ptrace_wants_step() ptrace: shift user_*_step() from ptrace_resume() to ptrace_stop() ptrace_disable: no need to disable stepping ptrace_report_syscall: check TIF_SYSCALL_EMU Roland McGrath (1): utrace core Documentation/DocBook/Makefile|2 +- Documentation/DocBook/utrace.tmpl | 589 + arch/s390/kernel/traps.c |4 +- arch/x86/kernel/ptrace.c |1 - fs/exec.c |5 +- fs/proc/array.c | 14 +- include/linux/ptrace.h|7 + include/linux/sched.h | 25 +- include/linux/signal.h|2 + include/linux/tracehook.h | 53 +- include/linux/utrace.h| 773 init/Kconfig |9 + kernel/Makefile |1 + kernel/exit.c |5 + kernel/fork.c |9 + kernel/ptrace.c | 57 +- kernel/sched.c|2 +- kernel/signal.c | 97 ++- kernel/utrace.c | 2461 + 19 files changed, 4062 insertions(+), 54 deletions(-) create mode 100644 Documentation/DocBook/utrace.tmpl create mode 100644 include/linux/utrace.h create mode 100644 kernel/utrace.c
[PATCH 0/2] update ptrace_disable() + PTRACE_SYSEMU/TIF_SYSCALL_EMU fix
On 07/01, Oleg Nesterov wrote: - Perhaps PTRACE_SYSEMU/TIF_SYSCALL_EMU logic was broken, I need to recheck. Yes, it was. Fixed by 2/2. ptrace: shift user_*_step() from ptrace_resume() to ptrace_stop() forgot to change ptrace_disable(), see 1/2. Oleg.
[PATCH 2/2] ptrace_report_syscall: check TIF_SYSCALL_EMU
4d16a64 introduce PT_SYSCALL_TRACE flag breaks PTRACE_SYSEMU which doesn't set PT_SYSCALL_TRACE. Change ptrace_report_syscall() to check TIF_SYSCALL_EMU as well. This can't conflict with utrace, this flag can only be set by ptrace. Signed-off-by: Oleg Nesterov o...@redhat.com --- include/linux/tracehook.h |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h index 5612d2d..ac833de 100644 --- a/include/linux/tracehook.h +++ b/include/linux/tracehook.h @@ -76,7 +76,7 @@ static inline void ptrace_report_syscall(struct pt_regs *regs) { int ptrace = task_ptrace(current); - if (!(ptrace PT_SYSCALL_TRACE)) + if (!(ptrace PT_SYSCALL_TRACE) !test_thread_flag(TIF_SYSCALL_EMU)) return; ptrace_notify(SIGTRAP | ((ptrace PT_TRACESYSGOOD) ? 0x80 : 0)); -- 1.5.5.1
[PATCH 1/2] utrace: finish_resume_report: don't do user_xxx_step() if ptrace_wants_step()
finish_resume_report() should not enable/disable the stepping if ptrace_wants_step() == T. If ptrace wants block_step while utrace wants single_step we could promote the stepping, but I do not think this really makes sense. Unless the tracee is killed this can't race with ptrace, this is called by the tracee itself. If it is killed we do not care. Signed-off-by: Oleg Nesterov o...@redhat.com --- include/linux/tracehook.h |8 kernel/utrace.c |9 ++--- 2 files changed, 10 insertions(+), 7 deletions(-) diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h index 06edb52..5612d2d 100644 --- a/include/linux/tracehook.h +++ b/include/linux/tracehook.h @@ -121,8 +121,8 @@ static inline __must_check int tracehook_report_syscall_entry( return 0; } -#define ptrace_wants_step()\ - (current-ptrace (PT_SINGLE_STEP | PT_SINGLE_BLOCK)) +#define ptrace_wants_step(task)\ + ((task)-ptrace (PT_SINGLE_STEP | PT_SINGLE_BLOCK)) /** * tracehook_report_syscall_exit - task has just finished a system call @@ -146,7 +146,7 @@ static inline void tracehook_report_syscall_exit(struct pt_regs *regs, int step) if (task_utrace_flags(current) UTRACE_EVENT(SYSCALL_EXIT)) utrace_report_syscall_exit(regs); - if (step ptrace_wants_step()) { + if (step ptrace_wants_step(current)) { siginfo_t info; user_single_step_siginfo(current, regs, info); force_sig_info(SIGTRAP, info, current); @@ -439,7 +439,7 @@ static inline void tracehook_signal_handler(int sig, siginfo_t *info, { if (task_utrace_flags(current)) utrace_signal_handler(current, stepping); - if (stepping ptrace_wants_step()) + if (stepping ptrace_wants_step(current)) ptrace_notify(SIGTRAP); } diff --git a/kernel/utrace.c b/kernel/utrace.c index 508c13c..fbabb81 100644 --- a/kernel/utrace.c +++ b/kernel/utrace.c @@ -1828,7 +1828,8 @@ static void finish_resume_report(struct task_struct *task, case UTRACE_BLOCKSTEP: if (likely(arch_has_block_step())) { - user_enable_block_step(task); + if (!ptrace_wants_step(task)) + user_enable_block_step(task); break; } @@ -1841,7 +1842,8 @@ static void finish_resume_report(struct task_struct *task, case UTRACE_SINGLESTEP: if (likely(arch_has_single_step())) { - user_enable_single_step(task); + if (!ptrace_wants_step(task)) + user_enable_single_step(task); } else { /* * This means some callback is to blame for failing @@ -1856,7 +1858,8 @@ static void finish_resume_report(struct task_struct *task, case UTRACE_REPORT: case UTRACE_RESUME: default: - user_disable_single_step(task); + if (!ptrace_wants_step(task)) + user_disable_single_step(task); break; } } -- 1.5.5.1
[PATCH 2/2] ptrace: shift user_*_step() from ptrace_resume() to ptrace_stop()
1. ptrace_resume() plays with the stopped task which can be also task_is_utraced(). In this case user_enable_xxx_step() can race with utrace_reset()-user_disable_single_step(). We could change utrace_reset() to check ptrace_wants_step() and add the task_utrace_lock + unlock barrier after ptrace_resume() sets PT_SINGLE_STEP. But it is better to reassign enable/desable from the tracer to the tracee, it can check its PT_SINGLE_ bits after wakeup. This also makes sense because enable_step(task) can be faster if task == current. 2. ptrace can do user_disable_single_step() while utrace needs the stepping. Set TIF_NOTIFY_RESUME after user_disable_single_step() to ensure the tracee can't return to the user-space without utrace_resume(). Any correct engine which wants the stepping should reassert it. Signed-off-by: Oleg Nesterov o...@redhat.com --- kernel/ptrace.c |4 kernel/signal.c | 12 2 files changed, 12 insertions(+), 4 deletions(-) diff --git a/kernel/ptrace.c b/kernel/ptrace.c index 44908d0..d37b30d 100644 --- a/kernel/ptrace.c +++ b/kernel/ptrace.c @@ -580,14 +580,10 @@ static int ptrace_resume(struct task_struct *child, long request, if (unlikely(!arch_has_block_step())) return -EIO; child-ptrace |= PT_SINGLE_BLOCK; - user_enable_block_step(child); } else if (is_singlestep(request) || is_sysemu_singlestep(request)) { if (unlikely(!arch_has_single_step())) return -EIO; child-ptrace |= PT_SINGLE_STEP; - user_enable_single_step(child); - } else { - user_disable_single_step(child); } child-exit_code = data; diff --git a/kernel/signal.c b/kernel/signal.c index d0e0c67..bc40bd7 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -1840,6 +1840,18 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info) } tracehook_finish_stop(); + + if (current-ptrace PT_SINGLE_BLOCK) + user_enable_block_step(current); + else if (current-ptrace PT_SINGLE_STEP) + user_enable_single_step(current); + else { + user_disable_single_step(current); + /* if utrace needs the stepping it should reassert */ + if (task_utrace_flags(current)) + set_thread_flag(TIF_NOTIFY_RESUME); + } + /* * While in TASK_TRACED, we were considered frozen enough. * Now that we woke up, it's crucial if we're supposed to be -- 1.5.5.1
[RFC v2 00/19] utrace for 3.0 kernel
Another attempt. This version tries to decouple utrace and ptrace. This way it is much simpler to follow the upstream changes, afaics. TODO: - The single-stepping updates in ptrace_resume() can race with utrace_reset()-user_disable_single_step(). This was fixed by 20/20, but I noticed that this patch is buggy right before sending. - Perhaps PTRACE_SYSEMU/TIF_SYSCALL_EMU logic was broken, I need to recheck. - Testing. Oleg.
[PATCH 06/19] wait_task_inactive: treat task-state and match_state as bitmasks
Change wait_task_inactive() to check state match_state instead of state == match_state. This should not make any difference, but this allows us to add more stopped bits which can be set or cleared independently. IOW. wait_task_inactive() assumes that if task-state != 0, it can only be changed to TASK_RUNNING. Currently this is true, and in this case state match_state continues to work. But, unlike the current check, it also works if task-state has other bits set while the caller is only interested in, say, __TASK_TRACED. Note: I think wait_task_inactive() should be cleanuped upstrean anyway, nowadays we have TASK_WAKING and task-state != 0 doesn't necessarily mean it is TASK_RUNNING. It also makes sense to exclude the !TASK_REPORT bits during the check. Finally, probably this patch makes sense anyway even without utrace. For example, a stopped _and_ traced thread could have task-state = TASK_STOPPED | TASK_TRACED, this can be useful. Signed-off-by: Oleg Nesterov o...@redhat.com --- kernel/sched.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/kernel/sched.c b/kernel/sched.c index 3f2e502..ade7997 100644 --- a/kernel/sched.c +++ b/kernel/sched.c @@ -2277,7 +2277,7 @@ unsigned long wait_task_inactive(struct task_struct *p, long match_state) * is actually now running somewhere else! */ while (task_running(rq, p)) { - if (match_state unlikely(p-state != match_state)) + if (match_state !likely(p-state match_state)) return 0; cpu_relax(); } -- 1.5.5.1
[PATCH 04/19] introduce wake_up_quiescent()
No functional changes. Add the new helper, wake_up_quiescent(task, state), which simply returns wake_up_state(task, state). Change all callers which do wake_up_state(STOPPED/TRACED) to use the new helper. ptrace_stop() is a bit special, it does __set_current_state(RUNNING) in the very unlikely case, change it as well. Signed-off-by: Oleg Nesterov o...@redhat.com --- include/linux/signal.h |2 ++ kernel/ptrace.c|2 +- kernel/signal.c| 12 ++-- kernel/utrace.c|2 +- 4 files changed, 14 insertions(+), 4 deletions(-) diff --git a/include/linux/signal.h b/include/linux/signal.h index a822300..2be3712 100644 --- a/include/linux/signal.h +++ b/include/linux/signal.h @@ -239,6 +239,8 @@ static inline int valid_signal(unsigned long sig) struct timespec; struct pt_regs; +extern int wake_up_quiescent(struct task_struct *p, unsigned int state); + extern int next_signal(struct sigpending *pending, sigset_t *mask); extern int do_send_sig_info(int sig, struct siginfo *info, struct task_struct *p, bool group); diff --git a/kernel/ptrace.c b/kernel/ptrace.c index 9988b13..26ae214 100644 --- a/kernel/ptrace.c +++ b/kernel/ptrace.c @@ -566,7 +566,7 @@ static int ptrace_resume(struct task_struct *child, long request, child-exit_code = data; if (lock_task_sighand(child, flags)) { - wake_up_state(child, __TASK_TRACED); + wake_up_quiescent(child, __TASK_TRACED); unlock_task_sighand(child, flags); } diff --git a/kernel/signal.c b/kernel/signal.c index 2138cee..4fcf1c7 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -652,6 +652,14 @@ void signal_wake_up(struct task_struct *t, int resume) } /* + * wakes up the STOPPED/TRACED task, must be called with -siglock held. + */ +int wake_up_quiescent(struct task_struct *p, unsigned int state) +{ + return wake_up_state(p, state); +} + +/* * Remove signals in mask from the pending set and queue. * Returns 1 if any signals were found. * @@ -811,7 +819,7 @@ static int prepare_signal(int sig, struct task_struct *p, int from_ancestor_ns) do { task_clear_group_stop_pending(t); rm_from_queue(SIG_KERNEL_STOP_MASK, t-pending); - wake_up_state(t, __TASK_STOPPED); + wake_up_quiescent(t, __TASK_STOPPED); } while_each_thread(p, t); /* @@ -1800,7 +1808,7 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info) do_notify_parent_cldstop(current, false, why); spin_lock_irq(current-sighand-siglock); - __set_current_state(TASK_RUNNING); + wake_up_quiescent(current, __TASK_TRACED); spin_unlock_irq(current-sighand-siglock); if (clear_code) diff --git a/kernel/utrace.c b/kernel/utrace.c index 6e7fafb..d7c547c 100644 --- a/kernel/utrace.c +++ b/kernel/utrace.c @@ -648,7 +648,7 @@ static void utrace_wakeup(struct task_struct *target, struct utrace *utrace) { lockdep_assert_held(utrace-lock); spin_lock_irq(target-sighand-siglock); - wake_up_state(target, __TASK_TRACED); + wake_up_quiescent(target, __TASK_TRACED); spin_unlock_irq(target-sighand-siglock); } -- 1.5.5.1
[PATCH 09/19] tracehooks: kill tracehook_finish_jctl(), add tracehook_finish_stop()
tracehook_finish_jctl() is needed to avoid the races with SIGKILL which wakes up UTRACED task, and thus it should be called every time after the STOPPED/TRACED/UTRACED returns from schedule(), remember that TASK_UTRACED can be added while the task is STOPPED/UTRACED. - rename it to tracehook_finish_stop(),jctl no longer matches the reality. - change do_signal_state() to call this helper right after schedule(), otherwise this logic is broken by the upstream changes - now that utrace doesn't control TASK_TRACED bit, ptrace_stop() must call this helper too. Signed-off-by: Oleg Nesterov o...@redhat.com --- include/linux/tracehook.h |6 +++--- kernel/signal.c |5 +++-- kernel/utrace.c |2 +- 3 files changed, 7 insertions(+), 6 deletions(-) diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h index 7d7bdde..3c7b6b3 100644 --- a/include/linux/tracehook.h +++ b/include/linux/tracehook.h @@ -528,11 +528,11 @@ static inline int tracehook_get_signal(struct task_struct *task, } /** - * tracehook_finish_jctl - report about return from job control stop + * tracehook_finish_stop - report about return from STOPPED/TRACED * - * This is called by do_signal_stop() after wakeup. + * This is called by do_signal_stop() and ptrace_stop after wakeup. */ -static inline void tracehook_finish_jctl(void) +static inline void tracehook_finish_stop(void) { if (task_utrace_flags(current)) utrace_finish_stop(); diff --git a/kernel/signal.c b/kernel/signal.c index 4fcf1c7..a7979ad 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -1816,6 +1816,7 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info) read_unlock(tasklist_lock); } + tracehook_finish_stop(); /* * While in TASK_TRACED, we were considered frozen enough. * Now that we woke up, it's crucial if we're supposed to be @@ -1952,6 +1953,8 @@ retry: /* Now we don't run again until woken by SIGCONT or SIGKILL */ schedule(); + tracehook_finish_stop(); + spin_lock_irq(current-sighand-siglock); } else { ptrace_stop(current-group_stop GROUP_STOP_SIGMASK, @@ -1974,8 +1977,6 @@ retry: spin_unlock_irq(current-sighand-siglock); - tracehook_finish_jctl(); - return 1; } diff --git a/kernel/utrace.c b/kernel/utrace.c index be98607..daa96b9 100644 --- a/kernel/utrace.c +++ b/kernel/utrace.c @@ -741,7 +741,7 @@ static bool utrace_reset(struct task_struct *task, struct utrace *utrace) void utrace_finish_stop(void) { /* -* If we were task_is_traced() and then SIGKILL'ed, make +* If we were task_is_utraced() and then SIGKILL'ed, make * sure we do nothing until the tracer drops utrace-lock. */ if (unlikely(__fatal_signal_pending(current))) { -- 1.5.5.1
[PATCH 10/19] teach wake_up_quiescent() to do selective wake_up
Both utrace and ptrace can want the same thread to be quiescent, in this case its state is TASK_TRACED | TASK_UTRACED. And this also means that this task must not run unless both utrace and ptrace resume it. Change wake_up_quiescent(p, state) to do p-state = ~state and return false unless there is no more quiescent bits in task-state. Signed-off-by: Oleg Nesterov o...@redhat.com --- kernel/signal.c | 15 +++ 1 files changed, 15 insertions(+), 0 deletions(-) diff --git a/kernel/signal.c b/kernel/signal.c index a7979ad..57552e6 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -651,11 +651,26 @@ void signal_wake_up(struct task_struct *t, int resume) kick_process(t); } +#define STATE_QUIESCENT(__TASK_STOPPED | __TASK_TRACED | __TASK_UTRACED) /* * wakes up the STOPPED/TRACED task, must be called with -siglock held. */ int wake_up_quiescent(struct task_struct *p, unsigned int state) { + unsigned int quiescent = (p-state STATE_QUIESCENT); + + WARN_ON(state ~(STATE_QUIESCENT | TASK_INTERRUPTIBLE)); + + if (quiescent) { + state = ~TASK_INTERRUPTIBLE; + if ((quiescent ~state) != 0) { + p-state = ~state; + WARN_ON(!(p-state STATE_QUIESCENT)); + WARN_ON(!(p-state TASK_WAKEKILL)); + return 0; + } + } + return wake_up_state(p, state); } -- 1.5.5.1
[PATCH 11/19] ptrace_stop: do not assume the task is running after wake_up_quiescent()
If ptrace_stop() sets TASK_TRACED and then detects we should not stop, it can race with utrace_do_stop() which can see TASK_TRACED and add TASK_UTRACED. In this case we should stop for utrace needs. Signed-off-by: Oleg Nesterov o...@redhat.com --- kernel/signal.c |8 1 files changed, 8 insertions(+), 0 deletions(-) diff --git a/kernel/signal.c b/kernel/signal.c index 57552e6..89e691d 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -1829,6 +1829,14 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info) if (clear_code) current-exit_code = 0; read_unlock(tasklist_lock); + + /* +* It is possible that __TASK_UTRACED was added by utrace +* while we were __TASK_TRACED and before we take -siglock +* for wake_up_quiescent(), we need to block in this case. +* Otherwise this is unnecessary but absolutely harmless. +*/ + schedule(); } tracehook_finish_stop(); -- 1.5.5.1
[PATCH 17/19] teach ptrace_set_syscall_trace() to play well with utrace
1. ptrace_set_syscall_trace(true)-set_tsk_thread_flag(TIF_SYSCALL_TRACE) can race with utrace_control()-utrace_reset() path which can miss PT_SYSCALL_TRACE and clear TIF_SYSCALL_TRACE after it was already set. 2. ptrace_set_syscall_trace(false) clears TIF_SYSCALL_TRACE and this is not utrace-friendly, it can need this flag. Change ptrace_set_syscall_trace() to take task_utrace_lock(), this is enough to fix the 1st problem. Check task_utrace_flags() before clearing TIF_SYSCALL_TRACE, this fixes 2. Signed-off-by: Oleg Nesterov o...@redhat.com --- kernel/ptrace.c |6 +- 1 files changed, 5 insertions(+), 1 deletions(-) diff --git a/kernel/ptrace.c b/kernel/ptrace.c index 0825a01..209ea2d 100644 --- a/kernel/ptrace.c +++ b/kernel/ptrace.c @@ -23,6 +23,7 @@ #include linux/uaccess.h #include linux/regset.h #include linux/hw_breakpoint.h +#include linux/utrace.h static void ptrace_signal_wake_up(struct task_struct *p, int quiescent) { @@ -39,13 +40,16 @@ static void ptrace_signal_wake_up(struct task_struct *p, int quiescent) static void ptrace_set_syscall_trace(struct task_struct *p, bool on) { + task_utrace_lock(p); if (on) { p-ptrace |= PT_SYSCALL_TRACE; set_tsk_thread_flag(p, TIF_SYSCALL_TRACE); } else { p-ptrace = ~PT_SYSCALL_TRACE; - clear_tsk_thread_flag(p, TIF_SYSCALL_TRACE); + if (!(task_utrace_flags(p) UTRACE_EVENT_SYSCALL)) + clear_tsk_thread_flag(p, TIF_SYSCALL_TRACE); } + task_utrace_unlock(p); } /* -- 1.5.5.1
[PATCH 18/19] introduce PT_SINGLE_STEP and PT_SINGLE_BLOCK
Add the new internal ptrace flags, PT_SINGLE_STEP and PT_SINGLE_BLOCK. Like PT_SYSCALL_TRACE, this is needed to avoid the unnecessary ptrace reports when TIF_SINGLESTEP was set by another engine, not by ptrace. Also, we need these bits to coordinate the user_*_single_step() calls from ptrace and utrace. TODO: update the !x86 ptrace code which does user_disable_single_step(). Signed-off-by: Oleg Nesterov o...@redhat.com --- arch/x86/kernel/ptrace.c |1 + include/linux/ptrace.h|5 - include/linux/tracehook.h |7 +-- kernel/ptrace.c |3 +++ 4 files changed, 13 insertions(+), 3 deletions(-) diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c index 807c2a2..7ab475f 100644 --- a/arch/x86/kernel/ptrace.c +++ b/arch/x86/kernel/ptrace.c @@ -807,6 +807,7 @@ static int ioperm_get(struct task_struct *target, */ void ptrace_disable(struct task_struct *child) { + child-ptrace = ~(PT_SINGLE_STEP | PT_SINGLE_BLOCK); user_disable_single_step(child); #ifdef TIF_SYSCALL_EMU clear_tsk_thread_flag(child, TIF_SYSCALL_EMU); diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h index 98d995d..65b1e4f 100644 --- a/include/linux/ptrace.h +++ b/include/linux/ptrace.h @@ -91,6 +91,8 @@ #define PT_TRACE_MASK 0x03f4 #define PT_SYSCALL_TRACE 0x0001 +#define PT_SINGLE_STEP 0x0002 +#define PT_SINGLE_BLOCK0x0004 /* single stepping state bits (used on ARM and PA-RISC) */ #define PT_SINGLESTEP_BIT 31 @@ -189,7 +191,8 @@ static inline void ptrace_init_task(struct task_struct *child, bool ptrace) child-ptrace = 0; if (unlikely(ptrace) (current-ptrace PT_PTRACED)) { child-ptrace = current-ptrace; - child-ptrace = ~PT_SYSCALL_TRACE; + child-ptrace = + ~(PT_SYSCALL_TRACE | PT_SINGLE_STEP | PT_SINGLE_BLOCK); __ptrace_link(child, current-parent); } diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h index 6ce7a37..06edb52 100644 --- a/include/linux/tracehook.h +++ b/include/linux/tracehook.h @@ -121,6 +121,9 @@ static inline __must_check int tracehook_report_syscall_entry( return 0; } +#define ptrace_wants_step()\ + (current-ptrace (PT_SINGLE_STEP | PT_SINGLE_BLOCK)) + /** * tracehook_report_syscall_exit - task has just finished a system call * @regs: user register state of current task @@ -143,7 +146,7 @@ static inline void tracehook_report_syscall_exit(struct pt_regs *regs, int step) if (task_utrace_flags(current) UTRACE_EVENT(SYSCALL_EXIT)) utrace_report_syscall_exit(regs); - if (step) { + if (step ptrace_wants_step()) { siginfo_t info; user_single_step_siginfo(current, regs, info); force_sig_info(SIGTRAP, info, current); @@ -436,7 +439,7 @@ static inline void tracehook_signal_handler(int sig, siginfo_t *info, { if (task_utrace_flags(current)) utrace_signal_handler(current, stepping); - if (stepping) + if (stepping ptrace_wants_step()) ptrace_notify(SIGTRAP); } diff --git a/kernel/ptrace.c b/kernel/ptrace.c index 209ea2d..44908d0 100644 --- a/kernel/ptrace.c +++ b/kernel/ptrace.c @@ -575,13 +575,16 @@ static int ptrace_resume(struct task_struct *child, long request, clear_tsk_thread_flag(child, TIF_SYSCALL_EMU); #endif + child-ptrace = ~(PT_SINGLE_STEP | PT_SINGLE_BLOCK); if (is_singleblock(request)) { if (unlikely(!arch_has_block_step())) return -EIO; + child-ptrace |= PT_SINGLE_BLOCK; user_enable_block_step(child); } else if (is_singlestep(request) || is_sysemu_singlestep(request)) { if (unlikely(!arch_has_single_step())) return -EIO; + child-ptrace |= PT_SINGLE_STEP; user_enable_single_step(child); } else { user_disable_single_step(child); -- 1.5.5.1
[PATCH 12/19] get_signal_to_deliver: restructure utrace/ptrace signal reporting
get_signal_to_deliver() assumes that either tracehook_get_signal() does nothing (without CONFIG_UTRACE), or it also reports the signal to ptrace engine implemented on top of utrace. Now that ptrace works independently this doesn't work. Change the code to call ptrace_signal() after tracehook_get_signal(). Move -ptrace check from ptrace_signal() to get_signal_to_deliver(), we do not want to change *return_ka if it was initialized by utrace and the task is not traced. IOW, roughly, ptrace acts as if it is the last attached engine, it takes the final decision about the signal. Signed-off-by: Oleg Nesterov o...@redhat.com --- kernel/signal.c | 24 +++- 1 files changed, 11 insertions(+), 13 deletions(-) diff --git a/kernel/signal.c b/kernel/signal.c index 89e691d..d0e0c67 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -2006,9 +2006,6 @@ retry: static int ptrace_signal(int signr, siginfo_t *info, struct pt_regs *regs, void *cookie) { - if (!task_ptrace(current)) - return signr; - ptrace_signal_deliver(regs, cookie); /* Let the debugger run. */ @@ -2110,6 +2107,7 @@ relock: signr = tracehook_get_signal(current, regs, info, return_ka); if (unlikely(signr 0)) goto relock; + if (unlikely(signr != 0)) ka = return_ka; else { @@ -2117,18 +2115,18 @@ relock: GROUP_STOP_PENDING) do_signal_stop(0)) goto relock; - signr = dequeue_signal(current, current-blocked, - info); + signr = dequeue_signal(current, current-blocked, info); - if (!signr) - break; /* will return 0 */ + ka = sighand-action[signr-1]; + } - if (signr != SIGKILL) { - signr = ptrace_signal(signr, info, - regs, cookie); - if (!signr) - continue; - } + if (!signr) + break; /* will return 0 */ + + if (signr != SIGKILL current-ptrace) { + signr = ptrace_signal(signr, info, regs, cookie); + if (!signr) + continue; ka = sighand-action[signr-1]; } -- 1.5.5.1
Re: [PATCH 1/4] ptrace: temporary revert the recent ptrace/jobctl rework
On 06/21, Roland McGrath wrote: OK, so here's my (hacky) idea: (1) Forget ptrace-via-utrace. Have utrace be a separate thing. This way the recent ptrace changes won't matter. This is what V2 does. (2) But, what about ptrace co-existing well with utrace? Make them mutually exclusive - a ptraced-process can't be utraced and a utraced-process can't be ptraced. We had this situation before for a while. It has the substantial downside that e.g. you cannot do any system-wide systemtap tracing without making all strace and gdb use impossible. Yes, we can't make them mutually exclusive, this can't work. So V2 tries to teach them play well together. Assuming the above is a semi-reasonable idea, it might be a lot less work than updating the ptrace-via-utrace code to handle the new ptrace changes. That's for Oleg to say. (Sorry, Oleg. ;-) Oh, I am not sure what is simpler ;) Oleg.
[PATCH 0/4] utrace for 3.0 kernel
Hello. Utrace patches for 3.0 kernel 0001-ptrace-temporary-revert-the-recent-ptrace-jobctl-re.patch 0002-tracehooks-preparation-for-ptrace-utrace.patch 0003-utrace-core.patch 0004-implement-utrace-ptrace.patch also available in the following git branch git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc.git utrace-3.0 Note: 1/4 is the temporary hack to keep utrace working, we need to completely rework ptrace/utrace interaction. Oleg.
[PATCH 1/4] ptrace: temporary revert the recent ptrace/jobctl rework
Temporary revert the following patches to keep utrace/utrace-ptrace working: 40ae717d1e78d982bd469b2013a4cbf4ec1ca434 ptrace: fix signal-wait_chldexit usage in task_clear_group_stop_trapping() 321fb561971ba0f10ce18c0f8a4b9fbfc7cef4b9 ptrace: ptrace_check_attach() should not do s/STOPPED/TRACED/ ee77f075921730b2b465880f9fd4367003bdab39 signal: Turn SIGNAL_STOP_DEQUEUED into GROUP_STOP_DEQUEUED 780006eac2fe7f4d2582da16a096e5a44c4767ff signal: do_signal_stop: Remove the unneeded task_clear_group_stop_pending() 244056f9dbbc6dc4126a301c745fa3dd67d8af3c job control: Don't send duplicate job control stop notification while ptraced ceb6bd67f9b9db765e1c29405f26e8460391badd job control: Notify the real parent of job control events regardless of ptrace 62bcf9d992ecc19ea4f37ff57ee0be3e843e job control: Job control stop notifications should always go to the real parent 75b95953a56969a990e6ce154b260be83818fe71 job control: Add @for_ptrace to do_notify_parent_cldstop() 45cb24a1da53beb70f09efccc0373f6a47a9efe0 job control: Allow access to job control events through ptracees 9b84cca2564b9a5b2d064fb44d2a55a5b44473a0 job control: Fix ptracer wait(2) hang and explain notask_error clearing 408a37de6c95832a4880a88a742f89f0cc554d06 job control: Don't set group_stop exit_code if re-entering job control stop 0e9f0a4abfd80f8adca624538d479d95159b16d8 ptrace: Always put ptracee into appropriate execution state e3bd058f62896ec7a2c605ed62a0a811e9147947 ptrace: Collapse ptrace_untrace() into __ptrace_unlink() d79fdd6d96f46fabb779d86332e3677c6f5c2a4f ptrace: Clean transitions between TASK_STOPPED and TRACED 5224fa3660ad3881d2f2ad726d22614117963f10 ptrace: Make do_signal_stop() use ptrace_stop() if the task is being ptraced 0ae8ce1c8c5b9007ce6bfc83ec2aa0dfce5bbed3 ptrace: Participate in group stop from ptrace_stop() iff the task is trapping for group stop 39efa3ef3a376a4e53de2f82fc91182459d34200 signal: Use GROUP_STOP_PENDING to stop once for a single group stop e5c1902e9260a0075ea52cb5ef627a8d9aaede89 signal: Fix premature completion of group stop when interfered by ptrace fe1bc6a0954611b806f9e158eb0817cf8ba21660 ptrace: Add @why to ptrace_stop() edf2ed153bcae52de70db00a98b0e81a5668e563 ptrace: Kill tracehook_notify_jctl() This obviously reverts some user-visible fixes, but the fixed problems are very old and minor, they were never reported. In the long term we need another solution. Signed-off-by: Oleg Nesterov o...@redhat.com --- fs/exec.c |1 - include/linux/sched.h | 17 +-- include/linux/tracehook.h | 27 kernel/exit.c | 77 ++- kernel/ptrace.c | 116 +--- kernel/signal.c | 339 + 6 files changed, 148 insertions(+), 429 deletions(-) diff --git a/fs/exec.c b/fs/exec.c index 6075a1e..82b5379 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1769,7 +1769,6 @@ static int zap_process(struct task_struct *start, int exit_code) t = start; do { - task_clear_group_stop_pending(t); if (t != current t-mm) { sigaddset(t-pending.signal, SIGKILL); signal_wake_up(t, 1); diff --git a/include/linux/sched.h b/include/linux/sched.h index a837b20..6c42e24 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -663,8 +663,9 @@ struct signal_struct { * Bits in flags field of signal_struct. */ #define SIGNAL_STOP_STOPPED0x0001 /* job control stop in effect */ -#define SIGNAL_STOP_CONTINUED 0x0002 /* SIGCONT since WCONTINUED reap */ -#define SIGNAL_GROUP_EXIT 0x0004 /* group exit in progress */ +#define SIGNAL_STOP_DEQUEUED 0x0002 /* stop signal dequeued */ +#define SIGNAL_STOP_CONTINUED 0x0004 /* SIGCONT since WCONTINUED reap */ +#define SIGNAL_GROUP_EXIT 0x0008 /* group exit in progress */ /* * Pending notifications to parent. */ @@ -1283,7 +1284,6 @@ struct task_struct { int exit_state; int exit_code, exit_signal; int pdeath_signal; /* The signal sent when the parent dies */ - unsigned int group_stop;/* GROUP_STOP_*, siglock protected */ /* ??? */ unsigned int personality; unsigned did_exec:1; @@ -1803,17 +1803,6 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t * #define tsk_used_math(p) ((p)-flags PF_USED_MATH) #define used_math() tsk_used_math(current) -/* - * task-group_stop flags - */ -#define GROUP_STOP_SIGMASK 0x/* signr of the last group stop */ -#define GROUP_STOP_PENDING (1 16) /* task should
[PATCH 4/4] implement utrace-ptrace
The patch adds the new file, kernel/ptrace-utrace.c, which contains the new implementation of ptrace over utrace. It's supposed to be an invisible implementation change, nothing should change to userland when CONFIG_UTRACE is enabled. Signed-off-by: Roland McGrath rol...@redhat.com Signed-off-by: Oleg Nesterov o...@redhat.com --- include/linux/ptrace.h |3 + kernel/Makefile|1 + kernel/ptrace-utrace.c | 1173 kernel/ptrace.c| 638 +- kernel/utrace.c| 16 + 5 files changed, 1513 insertions(+), 318 deletions(-) create mode 100644 kernel/ptrace-utrace.c diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h index 446ed8f..5b0562e 100644 --- a/include/linux/ptrace.h +++ b/include/linux/ptrace.h @@ -101,6 +101,9 @@ extern bool __ptrace_detach(struct task_struct *tracer, struct task_struct *tracee); +extern int ptrace_traceme(void); +extern int ptrace_attach(struct task_struct *tsk); +extern void ptrace_notify_stop(struct task_struct *tracee); extern long arch_ptrace(struct task_struct *child, long request, unsigned long addr, unsigned long data); diff --git a/kernel/Makefile b/kernel/Makefile index 4a22e81..5c280dc 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -69,6 +69,7 @@ obj-$(CONFIG_RESOURCE_COUNTERS) += res_counter.o obj-$(CONFIG_SMP) += stop_machine.o obj-$(CONFIG_KPROBES_SANITY_TEST) += test_kprobes.o obj-$(CONFIG_UTRACE) += utrace.o +obj-$(CONFIG_UTRACE) += ptrace-utrace.o obj-$(CONFIG_AUDIT) += audit.o auditfilter.o obj-$(CONFIG_AUDITSYSCALL) += auditsc.o obj-$(CONFIG_AUDIT_WATCH) += audit_watch.o diff --git a/kernel/ptrace-utrace.c b/kernel/ptrace-utrace.c new file mode 100644 index 000..7a9b396 --- /dev/null +++ b/kernel/ptrace-utrace.c @@ -0,0 +1,1173 @@ +/* + * linux/kernel/ptrace.c + * + * (C) Copyright 1999 Linus Torvalds + * + * Common interfaces for ptrace() which we do not want + * to continually duplicate across every architecture. + */ + +#include linux/capability.h +#include linux/module.h +#include linux/sched.h +#include linux/errno.h +#include linux/mm.h +#include linux/highmem.h +#include linux/pagemap.h +#include linux/ptrace.h +#include linux/utrace.h +#include linux/security.h +#include linux/signal.h +#include linux/audit.h +#include linux/pid_namespace.h +#include linux/syscalls.h +#include linux/uaccess.h + +/* + * unptrace a task: move it back to its original parent and + * remove it from the ptrace list. + * + * Must be called with the tasklist lock write-held. + */ +void __ptrace_unlink(struct task_struct *child) +{ + BUG_ON(!child-ptrace); + + child-ptrace = 0; + child-parent = child-real_parent; + list_del_init(child-ptrace_entry); +} + +struct ptrace_context { + int options; + + int signr; + siginfo_t *siginfo; + + int stop_code; + unsigned long eventmsg; + + enum utrace_resume_action resume; +}; + +#define PT_UTRACED 0x1000 + +#define PTRACE_O_SYSEMU0x100 +#define PTRACE_O_DETACHED 0x200 + +#define PTRACE_EVENT_SYSCALL (1 16) +#define PTRACE_EVENT_SIGTRAP (2 16) +#define PTRACE_EVENT_SIGNAL(3 16) +/* events visible to user-space */ +#define PTRACE_EVENT_MASK 0x + +static inline bool ptrace_event_pending(struct ptrace_context *ctx) +{ + return ctx-stop_code != 0; +} + +static inline int get_stop_event(struct ptrace_context *ctx) +{ + return ctx-stop_code 8; +} + +static inline void set_stop_code(struct ptrace_context *ctx, int event) +{ + ctx-stop_code = (event 8) | SIGTRAP; +} + +static inline struct ptrace_context * +ptrace_context(struct utrace_engine *engine) +{ + return engine-data; +} + +static const struct utrace_engine_ops ptrace_utrace_ops; /* forward decl */ + +static struct utrace_engine *ptrace_lookup_engine(struct task_struct *tracee) +{ + return utrace_attach_task(tracee, UTRACE_ATTACH_MATCH_OPS, + ptrace_utrace_ops, NULL); +} + +static int utrace_barrier_uninterruptible(struct task_struct *target, + struct utrace_engine *engine) +{ + for (;;) { + int err = utrace_barrier(target, engine); + + if (err != -ERESTARTSYS) + return err; + + schedule_timeout_uninterruptible(1); + } +} + +static struct utrace_engine * +ptrace_reuse_engine(struct task_struct *tracee) +{ + struct utrace_engine *engine; + struct ptrace_context *ctx; + int err = -EPERM; + + engine = ptrace_lookup_engine(tracee); + if (IS_ERR(engine)) + return engine; + + ctx
Re: [PATCH 1/4] ptrace: temporary revert the recent ptrace/jobctl rework
On 06/20, Kyle McMartin wrote: On Mon, Jun 20, 2011 at 06:07:44PM +0200, Oleg Nesterov wrote: Temporary revert the following patches to keep utrace/utrace-ptrace working: huge list of patches here This obviously reverts some user-visible fixes, but the fixed problems are very old and minor, they were never reported. In the long term we need another solution. Dude, that's just not acceptable, that's way too much offset to deal with against upstream, Yes, this reverts 20 patches. But they only touch the ptrace/stop paths, there won't be more changes in this area until 3.1. especially since it's looking like uprobes will get merged in 3.1... Probably yes. In any case, this series should be dropped when fedora switches to 3.1. I'll try to do something more clever for 3.1 if utrace is still needed. Until then we need something for systemtap... Oleg.
[PATCH 0/5] ptrace-utrace: fix exit_ptrace() vs ptrace_report_signal() races
In short: exit_ptrace()-ptrace_detach_task() is very wrong when it tries to detach the !stopped tracee, we can not trust get_stop_event() in this case. This means that in the case like ptrace(PTRACE_CONT, ..., SIGXXX); exit(); // --- calls ptrace_detach_task() the tracee can miss SIGXXX if ptrace_detach_task() does utrace_control(UTRACE_DETACH) before the tracee calls -report_signal(). 5/5 is the actual fix. 1-4 are preparations to simplify the review and document the changes. Oleg.
[PATCH 3/5] ptrace-utrace: don't assume resume != UTRACE_RESUME means stepping
No functional changes. Currently ptrace_report_syscall_exit() and ptrace_report_signal(UTRACE_SIGNAL_HANDLER) can see either UTRACE_RESUME or UTRACE_BLOCKSTEP/UTRACE_SINGLESTEP, but we are going to change this. Change the code to not assume that resume != UTRACE_RESUME means stepping. Add the trivial helper, is_step_resume(), which does the check. Signed-off-by: Oleg Nesterov o...@redhat.com --- kernel/ptrace-utrace.c | 15 +++ 1 file changed, 7 insertions(+), 8 deletions(-) --- kstub/kernel/ptrace-utrace.c~3_resume_step_helper 2010-09-20 03:53:31.0 +0200 +++ kstub/kernel/ptrace-utrace.c2010-09-20 03:53:31.0 +0200 @@ -418,6 +418,11 @@ static u32 ptrace_report_syscall_entry(u return UTRACE_SYSCALL_RUN | UTRACE_STOP; } +static inline bool is_step_resume(enum utrace_resume_action resume) +{ + return resume == UTRACE_BLOCKSTEP || resume == UTRACE_SINGLESTEP; +} + static u32 ptrace_report_syscall_exit(u32 action, struct utrace_engine *engine, struct pt_regs *regs) { @@ -426,10 +431,7 @@ static u32 ptrace_report_syscall_exit(u3 if (ptrace_event_pending(ctx)) return UTRACE_STOP; - if (ctx-resume != UTRACE_RESUME) { - WARN_ON(ctx-resume != UTRACE_BLOCKSTEP - ctx-resume != UTRACE_SINGLESTEP); - + if (is_step_resume(ctx-resume)) { ctx-signr = SIGTRAP; return UTRACE_INTERRUPT; } @@ -517,10 +519,7 @@ static u32 ptrace_report_signal(u32 acti if (WARN_ON(ctx-siginfo)) ctx-siginfo = NULL; - if (resume != UTRACE_RESUME) { - WARN_ON(resume != UTRACE_BLOCKSTEP - resume != UTRACE_SINGLESTEP); - + if (is_step_resume(resume)) { set_stop_code(ctx, PTRACE_EVENT_SIGTRAP); return UTRACE_STOP | UTRACE_SIGNAL_IGN; }
[PATCH 5/5] ptrace-utrace: fix exit_ptrace() vs ptrace_report_signal() races
Finally, the actual fix. ptrace_detach_task(sig = -1) is very buggy. Somehow I completely forgot that implicit detach can race with the running tracee. Depending on how exactly it races with ptrace_report_signal() we can have the following problems: 1) If the tracer exits right after ptrace(PTRACE_CONT, SIGXXX) the tracee can miss this signal. 2) the tracee can report a signal and stop after it was already detached. Change ptrace_detach_task(sig = -1) to set ctx-resume = UTRACE_DETACH before anything else, change ptrace_report_signal() to check ctx-resume before reporting a signal. This fixes the 2nd problem. To fix the 1st problem, change !explicit case to check -siginfo != NULL instead of relying on get_stop_event() which can't work with the running tracee. Reported-by: Glen Johnson bugpr...@us.ibm.com Signed-off-by: Oleg Nesterov o...@redhat.com --- kernel/ptrace-utrace.c | 41 ++--- 1 file changed, 34 insertions(+), 7 deletions(-) --- kstub/kernel/ptrace-utrace.c~5_implicit_detach_races2010-09-20 03:53:32.0 +0200 +++ kstub/kernel/ptrace-utrace.c2010-09-20 03:53:32.0 +0200 @@ -262,7 +262,7 @@ static void ptrace_detach_task(struct ta * If true, the caller is PTRACE_DETACH, otherwise * the tracer detaches implicitly during exit. */ - bool voluntary = (sig = 0); + bool explicit = (sig = 0); struct utrace_engine *engine = ptrace_lookup_engine(tracee); enum utrace_resume_action action = UTRACE_DETACH; struct ptrace_context *ctx; @@ -272,24 +272,46 @@ static void ptrace_detach_task(struct ta ctx = ptrace_context(engine); - if (sig) { + if (!explicit) { + int err; + /* +* We are going to detach, the tracee can be running. +* Ensure ptrace_report_signal() won't report a signal. +*/ + ctx-resume = UTRACE_DETACH; + err = utrace_barrier_uninterruptible(tracee, engine); + + if (!err ctx-siginfo) { + /* +* The tracee has already reported a signal +* before utrace_barrier(). +* +* Resume it like we do in PTRACE_EVENT_SIGNAL +* case below. The difference is that we can race +* with ptrace_report_signal() if the tracee is +* running but this doesn't matter. In any case +* UTRACE_SIGNAL_REPORT must be pending and it +* can return nothing but UTRACE_DETACH. +*/ + action = UTRACE_RESUME; + } + + } else if (sig) { switch (get_stop_event(ctx)) { case PTRACE_EVENT_SYSCALL: - if (voluntary) - send_sig_info(sig, SEND_SIG_PRIV, tracee); + send_sig_info(sig, SEND_SIG_PRIV, tracee); break; case PTRACE_EVENT_SIGNAL: - if (voluntary) - ctx-signr = sig; + ctx-signr = sig; ctx-resume = UTRACE_DETACH; action = UTRACE_RESUME; break; } } - ptrace_wake_up(tracee, engine, action, voluntary); + ptrace_wake_up(tracee, engine, action, explicit); if (action != UTRACE_DETACH) ctx-options = PTRACE_O_DETACHED; @@ -553,6 +575,11 @@ static u32 ptrace_report_signal(u32 acti } WARN_ON(ctx-siginfo); + + /* Raced with the exiting tracer ? */ + if (resume == UTRACE_DETACH) + return action; + ctx-siginfo = info; /* * ctx-siginfo points to the caller's stack.
Re: [PATCH] utrace_barrier(detached_engine) must not sping without checking -reporting
On 10/22, Oleg Nesterov wrote: On 10/19, Roland McGrath wrote: Yes. But, with this patch, in this case utrace_barrier()-get_utrace_lock(attached = false) returns success and then we check utrace-reporting == engine. I guess that's OK if -reporting == engine is always set when any kind of callback might be in progress. I think utrace_maybe_reap() is the only exception, (Hmm. Probably utrace-reap = T case needs more attention, utrace_maybe_reap() doesn't set utrace-reporting). That would be a problem. Yes, but the current code has the same problem? Hmm, I need to recheck this all once again tomorrow. Suddenly I started to suspect that even -ops == NULL case is not right. And the more I read this code now, the more I confused. Yes. And get_utrace_lock() must not succeed if utrace-reap == T c93fecc9 makes things worse. With this patch utrace_barrier() can return -ESRCH while -report_death() or report_reap() is in progress. But even before that comment, I do not think that !engine-ops check can prevent the race with -report_reap(), we need at least mb() in utrace_maybe_reap() before engine-ops = NULL. But in any case. If engine-ops == utrace_detached_ops and utrace-reporting != engine, I do not see why utrace_barrier() can't return success, even if utrace-reap is set. No! it must not return success! I still think it should not spin if -reporting != engine, but it should not return zero. Somehow I forgot that utrace_barrier() != reporting != engine. This means that my patch is very wrong. Oleg.
Re: [PATCH] utrace: utrace_reset() should clear TIF_SINGLESTEP if no more engines
On 10/22, Roland McGrath wrote: Hmm. If task is not stopped then it is current (except utrace_control(DETACH) can play with the dying task). Right, asynchronous detach was the problematic case I was concerned with. but asynchronous detach doesn't do utrace_reset(), unless the tracee is stopped or exiting (-exit_state != 0). But the whole problem we have is that we aren't getting to that path when we've done a detach, right? Confused. We already discussed this before, OK, I'll do some testing and resend right now. In UTRACE_DETACH case reset can be true but the tracee is running. But I don't think it makes sense to check target-exit_state == 0, correct? I think it's OK if it's running with -exit_state set (or even with just PF_EXITING set), i.e. already in kernel mode and never going back to user mode. (Then it's essentially equivalent to calling user_*_step while racing with a SIGKILL death, which has to be OK.) Any other kind of running would not be OK. Probably we misunderstood each other. In any case I agree, that patch was not good. Oleg.
Re: utrace unwanted traps
On 10/19, Roland McGrath wrote: But that's how it used to be, and then we changed it. So it merits digging up our old discussion threads that led to doing that and see what all we had in mind then. I tried to grep my mail archives and google, but failed to find anything related. Based on the date of the commit you just mentioned, I browsed in: https://www.redhat.com/archives/utrace-devel/2009-October/thread.html and saw: https://www.redhat.com/archives/utrace-devel/2009-October/msg00287.html which is the day before that commit. I didn't read through all that to refamiliarize myself with all the details. Yes, I saw this thread, but didn't read it carefully. When I read it again, I started to believe that yes, it can explain the current -resume logic. However, I think this is false alarm. With a lot of efforts, I seem to fully understand our discussion. In short: I believe we never discussed why utrace-resume should be *STEP, especially if task is stopped or it is going to stop. Although I recall I thought this is obviously nice when I tried to review utrace: sticky resume action. As for this thread. Yes, I tried to suggest the (ugly) utrace-set_singlestep which should be checked after wakeup. But my motivation was quite different, what I was trying to preserve was TIF_SINGLESTEP, not resume_action. This is why the top email says I no longer think utrace_control() should just turn SINGLESTEP into REPORT silently. Please recall that, by that time 1. utrace_control(SINGLESTEP) called user_enable_single_step() 2. To handle the stepping over syscall correctly, both ptrace and ptrace-utrace relied on syscall_trace_leave() paths which checked TIF_SINGLESTEP to send SIGTRAP if needed. 1 was wrong, user_enable_single_step() was called under utrace-lock. It was decided that the tracee itself should call this helper before it returns to user-mode. To implement this, we could change utrace_control(SINGLESTEP) to set TIF_NOTIFY_RESUME, so that -report_quiesce() could return UTRACE_SINGLESTEP. However, this breaks 2, the stepping over syscall. By the time -report_quiesce() is called we already passed syscall_trace_leave() without TIF_SINGLESTEP, and -report_quiesce() can't just return UTRACE_SINGLESTEP but otoh it can't know that we should synthesize a trap before return to user-mode. So. I do not think there is any reason to ever keep UTRACE_*STEP in utrace-resume. Except, if utrace_control(STEP) sets -resume = STEP before wakeup, we avoid the unnecessary reporting loop in utrace_resume. But at the same time, this leads to the problems we are discussing. I even tested the kernel with the patch below, everything _seems_ to work (not that I think this can proves something). What do you think? Oleg. --- kstub/kernel/utrace.c~XXX_RESUME2010-10-20 18:18:40.0 +0200 +++ kstub/kernel/utrace.c 2010-10-21 18:48:52.0 +0200 @@ -768,11 +768,13 @@ relock: /* * Ensure a reporting pass when we're resumed. */ - utrace-resume = action; if (action == UTRACE_INTERRUPT) set_thread_flag(TIF_SIGPENDING); - else + else { + action = UTRACE_REPORT; set_thread_flag(TIF_NOTIFY_RESUME); + } + utrace-resume = action; } /* @@ -1195,6 +1197,7 @@ int utrace_control(struct task_struct *t * In that case, utrace_get_signal() will be reporting soon. */ clear_engine_wants_stop(engine); + action = UTRACE_REPORT; if (action utrace-resume) { utrace-resume = action; set_notify_resume(target); @@ -1381,11 +1384,13 @@ static void finish_report(struct task_st if (resume utrace-resume) { spin_lock(utrace-lock); - utrace-resume = resume; if (resume == UTRACE_INTERRUPT) set_tsk_thread_flag(task, TIF_SIGPENDING); - else + else { + resume = UTRACE_REPORT; set_tsk_thread_flag(task, TIF_NOTIFY_RESUME); + } + utrace-resume = resume; spin_unlock(utrace-lock); }
Re: utrace unwanted traps
On 10/14, Roland McGrath wrote: In short: I no longer understand why utrace-resume can be UTRACE_*STEP when the tracee is stopped, and in fact I think this is not right. I think we didn't have that originally. Yes. The rationale that I remember is the idea that you should be able to implement a debugger interface feature like PTRACE_SINGLESTEP without noticeably more overhead than it had in pre-utrace ptrace. That is, in vanilla ptrace, the tracer thread calls user_enable_single_step itself and then just lets the tracee resume directly to user mode. We wanted to avoid always having an additional trip through tracehook_notify_resume and the utrace reporting loop and then back through the return-to-user assembly path a second time, between the tracee waking up and actually going to user mode. Probably, I am not sure. I can't recall the previous discussions. IIRC, the main problem with user_enable_single_step() was: we wanted to move the callsite from tracer to tracee by many reasons. I can't recall if avoid another reporting loop was the target, and additional trip through tracehook_notify_resume() is not avoidable anyway. Afaics, this was introduced by 26fefca955cc7c3c83014be2e10b773209d32eea utrace: sticky resume action. Before that patch the stopped tracee could only remember the pending UTRACE_REPORT or UTRACE_INTERRUPT. Right. Said another way, any time an engine that previously used UTRACE_*STEP now uses UTRACE_RESUME, we must degrade utrace-resume back to UTRACE_REPORT at most. Since we don't keep track of which engine it was that put UTRACE_*STEP into utrace-resume before, we'd have to just always degrade it any time anybody uses UTRACE_RESUME. Agreed. That has almost the whole effect of punting utrace-resume back to just a utrace-report flag. (see below) But that's how it used to be, and then we changed it. So it merits digging up our old discussion threads that led to doing that and see what all we had in mind then. I tried to grep my mail archives and google, but failed to find anything related. _Perhaps_ this was not intended actually. Before the commit above, another problem was fixed by 8ad60bbd4c665a11be179a0bff41483cca3b3560 utrace_stop: preserve report/interrupt requests across stop/resume, until this one it was possible to lose UTRACE_INTERRUPT request if it races with UTRACE_STOP from another engine. So, perhaps cleanup was the main reason for 26fefca9. See also http://www.mail-archive.com/utrace-devel@redhat.com/msg01647.html Oleg.
Re: [PATCH] utrace_barrier(detached_engine) must not sping without checking -reporting
On 10/12, Roland McGrath wrote: If engine is detached (has utrace_detached_ops), utrace_barrier(engine) spins until engine-ops becomes NULL. This is just wrong. Yes, this happens because get_utrace_lock returns -ERESTARTSYS and utrace_barrier checks for this and loops. I agree these long-spin scenarios would be wrong. The reason it tries to wait for fully detached state is that after utrace_control(task,engine,UTRACE_DETACH), @task could still be in the middle of a callback for @engine. Yes. But, with this patch, in this case utrace_barrier()-get_utrace_lock(attached = false) returns success and then we check utrace-reporting == engine. This patch only makes a difference if ops == utrace_detached_ops, Before this patch spin until -ops goes away After this patch spin until -ops goes away OR -reporting != engine (Hmm. Probably utrace-reap = T case needs more attention, utrace_maybe_reap() doesn't set utrace-reporting). Also, it is not clear why utrace_barrier() needs utrace-lock, except to ensure it is safe to dereference target/utrace. Well, wouldn't that be reason enough? The comment in utrace_barrier talks about needing the lock. This corresponds to the comment in the UTRACE_DETACH case of finish_callback_report. Yes agreed. Honestly, I can't even recall what I meant. Note: we should also reconsider() utrace_barrier()-signal_pending() check. IMHO it is badly wrong to have utrace_barrier do an uninterruptible wait (even moreso since it's really a spin). If a buggy callback gets stuck blocking or spinning and fails to return promptly, then you wedge any debugger thread trying to synchronize with it via utrace_barrier. If you can't even interrupt that debugger thread, then there will really be no chance to recover from the deadlock. Completely agreed. But, unfortunately, this signal_pending() check assumes the caller can always handle this ERESTARTSYS. But this is not true, one example is the exiting debugger doing exit_ptrace(). Oleg.
Re: gdbstub initial code, v7
On 10/13, Roland McGrath wrote: On 09/10, Roland McGrath wrote: ugdb sets please stop flag and does utrace_control(INTERRUPT). However, in unlikely case the tracee can stop before -report_signal() reporting I don't think this is the right thing to do. When the intent is explicitly to interrupt, there is no reason to stop before the interruption is complete, i.e. report_signal. This means that ugdb_report_quiesce() should never return UTRACE_STOP, and that is all. I'm not sure about this. Ignoring the problems below, why? But what about multitracing? Suppose that (gdb) interrupt happens just before, say, do_report_syscall_entry() and another engine wants to stop. If ugdb_report_quiesce() doesn't return UTRACE_STOP, then gdb will wait until another debugger resumes the tracee. Yes, I do think that's a problem. We want gdb to report back promptly. One possibility is to have report_quiesce notice its argument is UTRACE_EVENT(SYSCALL_ENTRY) and roll back to before the syscall. That is, it enables SYSCALL_ENTRY and SYSCALL_EXIT reporting, then its report_syscall_entry uses UTRACE_SIGNAL_ABORT, report_syscall_exit does syscall_set_return_value(-ERESTARTNOHAND, 0) and then returns UTRACE_INTERRUPT. Now, we'll reenter a UTRACE_SIGNAL_REPORT callback before the system call, and we can stop there without being in any sticky situation. Well, but this doesn't look friendly to other engines... And at first glance this looks a bit too hairy. And, this doesn't cover another case: gdb asks to stop the tracee when it is already stopped by another engine and sleeps in utrace_resume() path. So, I think ugdb should be changed so that signal SIG always works (without reporing this signal) even when the stopped tracee doesn't have the signal context. As for $_siginfo (qXfer:siginfo:read::), I do not know what ugdb should do. Probably it can just report the all-zeroes siginfo or report si_signo = SIGSTOP. Oleg.
[PATCH] utrace: utrace_resume()-start_callback() must clear -reporting
utrace_resume()-start_callback() can return without clearing -reporting, this is very wrong. The bug was introduced by me in 47c593ee avoid the unnecessary utrace_resume()-utrace_reset() commit. Revert this patch and change start_callback() to check event right after we call -report_quiesce(). If it is zero we can just clear -spurious and return without playing with -reporting and -flags. No need to worry about ENGINE_STOP, finish_callback() has already updated engine-flags and report-action correctly. Signed-off-by: Oleg Nesterov o...@redhat.com --- kernel/utrace.c |8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) --- kstub/kernel/utrace.c~10_47c593ee_fix 2010-10-11 12:48:51.0 +0200 +++ kstub/kernel/utrace.c 2010-10-12 21:19:33.0 +0200 @@ -1528,6 +1528,12 @@ static const struct utrace_engine_ops *s engine, event))) return NULL; + if (!event) { + /* We only got here to report QUIESCE */ + report-spurious = false; + return NULL; + } + /* * finish_callback() reset utrace-reporting after the * quiesce callback. Now we set it again (as above) @@ -1543,7 +1549,7 @@ static const struct utrace_engine_ops *s if (want ENGINE_STOP) report-action = UTRACE_STOP; - if (want (event ?: UTRACE_EVENT(QUIESCE))) { + if (want event) { report-spurious = false; return ops; }
Re: [PATCH?] avoid the unnecessary utrace_resume()-utrace_reset()
On 08/18, Oleg Nesterov wrote: utrace_resume(UTRACE_REPORT) always calls utrace_reset() because start_callback() obviously can't clear report-spurious when event == 0. Change start_callback() to correctly clear -spurious in this case. Correctly? I am stupid, and this patch is wrong (47c593ee in your tree). --- kstub/kernel/utrace.c~10_utrace_resume_and_spurious 2010-08-18 19:00:50.0 +0200 +++ kstub/kernel/utrace.c 2010-08-18 19:41:05.0 +0200 @@ -1540,7 +1540,7 @@ static const struct utrace_engine_ops *s if (want ENGINE_STOP) report-action = UTRACE_STOP; - if (want event) { + if (want (event ?: UTRACE_EVENT(QUIESCE))) { report-spurious = false; return ops; } with this change utrace_resume()-start_callback() returns with utrace-reporting != NULL !!! This obviously breaks utrace_barrier(), it can hang forever. I noticed this by accident, when I was trying to understand the problems with vCont changes. I'll send the fix tomorrow. Damn, the fix is trivial but I'd like to avoid another ugly check in start_callback(), and I don't think utrace_resume() should clear -reporting. Oleg.
Re: gdbstub initial code, v13
On 10/08, Oleg Nesterov wrote: - full vCont support. Well, this was easy, except even after 3 hours of debugging I can't understand why this change breaks the stepping over pthread_create :/ Oh, it doesn't break. I did a lot of mistakes when I was trying to understand what happens. In short, GDBCAT was buggy, it used the wrong socket for TCP_NODELAY. Damn, it took me almost 2 days. Oleg.
Re: BUG: gdb notification packets (Was: gdbstub initial code, v12)
On 10/05, Pedro Alves wrote: (reordered) On Tuesday 05 October 2010 18:27:29, Oleg Nesterov wrote: So, I strongly believe gdb is buggy and should be fixed. Fix your stub to implement vCont;s/c(/S/C). First of all, I confirm that when I added the (incomplete right now) support for vCont;s the problem goes away, gdb never forgets to send the necessary vStopped to consume all stop-reply packets. Thanks Pedro. The more or less typical transcript is: [... snip ...] = s This is already wrong. The stub must support @samp{vCont} if it reports support for multiprocess extensions (@pxref{multiprocess extensions}). Cough. Previously I was told here (on arc...@sourceware.org) that Hc + s/c is enough and I shouldn't worry about vCont;s/c ;) Currently ugdb only supports vCont;t because otherwise there is obviously no way to stop all threads. The stub must also support vCont for non-stop, though I'll give you that it doesn't appear to be mentioned in the manual, Yes, the manual doesn't explain this. Quite contrary, the decsription of 'vCont?' definitely looks as if the stub is not obliged to implement all vCont commands. And, if the stub must support vCont for non-stop, then why gdb doesn't complain after 'vCont?' but falls back to '$s' ? Look at remote.c:remote_resume, and you'll see that gdb does not wait for the OK after 'c'/'s'/'S'/'C' in non-stop mode. Then gdbserver should be fixed? It does send OK in response to '$s', that is why ugdb does this. And what should be replied back to '$s', nothing? Very strange. But seems to work... And yes, this explains the hang, gdb thinks that this OK is connected to vStopped. Again, the documentation is very confusing. Looking at remote_resume()-remote_vcont_resume()-getpkt() I think that vCont;s needs OK. Looking at D.3 Stop Reply Packets in gdb.info I do not see any difference between `s' and `vCont'. So. I still belive that something is wrong with gdb/gdbserever but I don't care. In any case ugdb should fully support vCont, hopefully I'll finish this tomorrow. Could you answer a couple of questions? 1. Say, $vCont;s or $vCont;s:p-1.-1 I assume, this should ignore the running threads, correct? IOW, iiuc this 's' applies to all threads which we already reported as stopped. 2. Say, $vCont;c:pPID.TID;s:p-1.-1 Can I assume that gdb can never send this request as $vCont;s:p-1.-1;c:pPID.TID ? If yes, then the implementation will be much simpler, I can add something like gencounters to ugdb_thread/process. Otherwise this needs more complications to figure out what should be done with each tracee. Thanks, Oleg.
utrace unwanted traps
On 09/23, Roland McGrath wrote: OK, finish_callback_report() and utrace_control(DETACH) can set TIF_NOTIFY_RESUME. Right. Those utrace_resume has the report.action==UTRACE_RESUME bail-out case. So either that would change or detach would also do UTRACE_REPORT. But what if there are no more attached engines? Looks like, utrace_resume(UTRACE_RESUME) needs to handle this special case. And utrace_reset() shouldn't clear task-utrace_flags, otherwise utrace_resume/utrace_get_signal won't be called. Right. Or else tracehook_notify_resume could call utrace_resume unconditionally, but I'm not at all sure that is not worse. The original theory was that it should always be OK to have some utrace_flags bits stay set when they are stale, because any kind of reporting pass that got enabled would hit the report-spurious case and clean the state up synchronously when it's safe. I tried to make the patch which addresses this issue, but surprisingly I failed. Because I think there are more (minor) problems here. When it comes to multitracing, we can have the unwanted trap even without detaching. In short: I no longer understand why utrace-resume can be UTRACE_*STEP when the tracee is stopped, and in fact I think this is not right. For example. Consider two engines, E1 and E2. To simplify the discussion suppose that both engines have engine-data-resume and -report_quiesce() just returns -resume. So, if the debugger wants, say, single-step, it does engine-data-resume = UTRACE_SINGLESTEP; utrace_control(UTRACE_SINGLESTEP); Actually, any engine should do something like this. Now suppose that E1 wants UTRACE_SINGLESTEP, E2 asks for UTRACE_STOP. In this case utrace_resume() results in utrace_stop() which sets utrace-resume = UTRACE_SINGLESTEP before sleeping. First of all - why? Yes, the theory is that we have another reporting loop after wake_up, but utrace-resume = UTRACE_REPORT is enough, E1 should always acquire UTRACE_SINGLESTEP if needed or it is buggy. But more importantly, this doesn't look right or I missed something. Suppose that, before E2 resumes the tracee, E1 does utrace_control(STOP) which immediately succeeds and allows E1 to play with the stopped tracee. Again, to simplify the discussion, suppose that E2 does UTRACE_RESUME and nothing more after that. Now. E1 wants to resume the tracee too and it does engine-data-resume = UTRACE_RESUME; utrace_control(UTRACE_RESUME); utrace_control(RESUME) correctly clears TIF_SINGLESTEP, but it doesn't change utrace-resume. This means utrace_resume() doesn't do the reporting loop, it just calls user_enable_single_step() and returns to user-mode. So. Could you explain why utrace_stop() should ever keep UTRACE_STEP in utrace-resume? finish_report() does the same but this looks fine. Also. I am not sure utrace_control(UTRACE_STEP) should always set utrace-resume = UTRACE_STEP. If the tracee is stopped but there is another ENGINE_STOP engine (iow, the tracee won't be resumed right now) we have the same problem. Off-topic, #define UTRACE_RESUME_BITS (ilog2(UTRACE_RESUME_MAX) + 1) why UTRACE_RESUME_MAX? it should be (ilog2(UTRACE_RESUME_MAX - 1) + 1), no? Oleg.
Re: gdbstub initial code, v11
On 09/22, Frank Ch. Eigler wrote: oleg wrote: [...] Honestly, I don't really know how do the right thing here. Anyway, most probably this code will be changed. Like ptrace, ugdb uses -report_syscall_exit() to synthesize a trap. Unlike ptrace, ugdb_report_signal() doesn't send SIGTRAP to itself but reports SIGTRAP using siginfo_t we have. In any case, whatever we do, multiple tracers can confuse each other. (It seems to me that a pure gdb report, without a synthetic self-injected SIGTRAP, should be fine.) What do you mean? Next: fully implement g/G/p/P, currently I am a bit confused... But what about features? [...] You could dig out the old fishing plan. One demonstrated improvement was from simulating (software) watchpoints within the gdb stub, instead of having gdb fall back to issing countless single-steps with memory-fetch inquiries in between. When I do 'watch', gdb sends '$Z2'. I am a bit confused, iirc it was decided I shouldn't play with Z packets now. But I won't argue. Oleg.
Re: [PATCH] utrace: utrace_reset() should clear TIF_SINGLESTEP if no more engines
On 09/21, Roland McGrath wrote: OK, so what should we do for now? If we can't come to a straightforward answer for the general case fairly quickly, then we can do the simple change to start with. I think we should start with changing utrace_control(DETACH) anyway, then try to improve. I'll ressend the one-liner I already showed. Without the multitracing utrace_resume()-user_disable_single_step() fixes the problem. Otherwise we probably need ENGINE_WANTS_SINGLESTEP. No, no, we don't want that. We don't need more state. And, remember, we really don't need to microoptimize cases when single-step was used recently--the cost of taking one more single-step trap and percolating through the signal paths was going to be pretty large anyway. OK, It's better to have a spurious report (preferably just spurious TIF_NOTIFY_RESUME with no actual callbacks) following any detach, or whatever it takes to ensure user_disable_single_step() always runs if user_enable_*_step did before and there is no engine around to see a report_quiesce callback pass. If there is a report_quiesce or report_signal callback pass where nobody returns UTRACE_*STEP, then we should never leave stepping enabled when we return to user mode. Hmm. I'll try to think more. Right now I don't really understand how to do this correctly. OK, finish_callback_report() and utrace_control(DETACH) can set TIF_NOTIFY_RESUME. But what if there are no more attached engines? Looks like, utrace_resume(UTRACE_RESUME) needs to handle this special case. And utrace_reset() shouldn't clear task-utrace_flags, otherwise utrace_resume/utrace_get_signal won't be called. So, probably detach should set TIF_NOTIFY_RESUME, but utrace_reset() should do user_disable_single_step() too if no more engines. Confused. And in fact I don't understand why this is important. When it comes to multiracing, any engine can hit the unwanted/unexpected trap because another engine can ask for UTRACE_*STEP. The only really important (I think) case is when the last engine detaches. IOW. Suppose that eninge E does utrace_control(STEP) or its callback returns UTRACE_*STEP. If we do not detach this engine, other engines will see the trap. So why it is so important to clear X86_EFLAGS_TF if we detach E ? Oleg.
Re: [PATCH] utrace: utrace_reset() should clear TIF_SINGLESTEP if no more engines
On 09/23, Oleg Nesterov wrote: On 09/21, Roland McGrath wrote: It's better to have a spurious report (preferably just spurious TIF_NOTIFY_RESUME with no actual callbacks) following any detach, or whatever it takes to ensure user_disable_single_step() always runs if user_enable_*_step did before and there is no engine around to see a report_quiesce callback pass. If there is a report_quiesce or report_signal callback pass where nobody returns UTRACE_*STEP, then we should never leave stepping enabled when we return to user mode. Hmm. I'll try to think more. Right now I don't really understand how to do this correctly. OK, finish_callback_report() and utrace_control(DETACH) can set TIF_NOTIFY_RESUME. But what if there are no more attached engines? Looks like, utrace_resume(UTRACE_RESUME) needs to handle this special case. And utrace_reset() shouldn't clear task-utrace_flags, otherwise utrace_resume/utrace_get_signal won't be called. So, probably detach should set TIF_NOTIFY_RESUME, but utrace_reset() should do user_disable_single_step() too if no more engines. Confused. Or, detach can always do user_disable_single_step() and set TIF_NOTIFY_RESUME. If another engine wants stepping its report_quiesce() should re-ack UTRACE_SINGLESTEP anyway, otherwise it is buggy. And in fact I don't understand why this is important. When it comes to multiracing, any engine can hit the unwanted/unexpected trap because another engine can ask for UTRACE_*STEP. The only really important (I think) case is when the last engine detaches. Aaah. please ignore. Somehow I assumed every engine should hook SIGTRAP. Oleg.
Re: [PATCH] utrace: utrace_reset() should clear TIF_SINGLESTEP if no more engines
On 09/21, Roland McGrath wrote: Change utrace_reset() to do user_disable_single_step(). Alternatively we could change ptrace layer, but I think this is not ptrace specific. Yes, it's right to fix this in the utrace layer. But I'm not sure that's the right place in the code to fix it. The expectation is that either we'll get to finish_resume_report, which calls user_*_step, or that utrace_control is resuming us without any expectation of a report. For UTRACE_RESUME in that case, utrace_control calls user_disable_single_step. So probably the UTRACE_DETACH case there should just do it to. Yes, initially I was going to change utrace_control(DETACH), and this certainly looks more clean. I was worried about the case when the TIF_SINGLESTEP tracee detaches itself and escapes finish_resume_report()-user_disable_single_step(), say, utrace_report_exec(). But probably we can ignore this. Another concern was implicit detach, but thinking more I do not see why utrace_resume() is better. OK, I'll do some testing and resend right now. In UTRACE_DETACH case reset can be true but the tracee is running. But I don't think it makes sense to check target-exit_state == 0, correct? Oleg.
Re: [PATCH] utrace: utrace_reset() should clear TIF_SINGLESTEP if no more engines
On 09/21, Roland McGrath wrote: I was worried about the case when the TIF_SINGLESTEP tracee detaches itself and escapes finish_resume_report()-user_disable_single_step(), say, utrace_report_exec(). But probably we can ignore this. No, I think that is indeed a problem. If no engine is still attached whose last callback returned UTRACE_SINGLESTEP, we should never return to user mode with single-step enabled. OK, so what should we do for now? Without the multitracing utrace_resume()-user_disable_single_step() fixes the problem. Otherwise we probably need ENGINE_WANTS_SINGLESTEP. Another concern was implicit detach, but thinking more I do not see why utrace_resume() is better. OK, I'll do some testing and resend right now. In UTRACE_DETACH case reset can be true but the tracee is running. But I don't think it makes sense to check target-exit_state == 0, correct? I think it's OK if it's running with -exit_state set (or even with just PF_EXITING set), i.e. already in kernel mode and never going back to user mode. (Then it's essentially equivalent to calling user_*_step while racing with a SIGKILL death, which has to be OK.) Any other kind of running would not be OK. Yes, my concern was running and !current, not exiting. This is OK on x86 but user_disable_single_step() is arch specific. Oleg.
Re: [PATCH] utrace: utrace_reset() should clear TIF_SINGLESTEP if no more engines
On 09/22, Oleg Nesterov wrote: On 09/21, Roland McGrath wrote: I was worried about the case when the TIF_SINGLESTEP tracee detaches itself and escapes finish_resume_report()-user_disable_single_step(), say, utrace_report_exec(). But probably we can ignore this. No, I think that is indeed a problem. If no engine is still attached whose last callback returned UTRACE_SINGLESTEP, we should never return to user mode with single-step enabled. OK, so what should we do for now? Without the multitracing utrace_resume()-user_disable_single_step() fixes the problem. Otherwise we probably need ENGINE_WANTS_SINGLESTEP. As expected, the patch below equally fixes the problem and passes ptrace-tests. Which one do you prefer? Note that, since we are going to change UTRACE_RESUME to remove ENGINE_STOP from utrace_flags, we can probably unify DETACH/RESUME case UTRACE_DETACH: mark_engine_detached(engine); reset = reset || utrace_do_stop(target, utrace); if (!reset) { /* * As in utrace_set_events(), this barrier ensures * that our engine-flags changes have hit before we * examine utrace-reporting, pairing with the barrier * in start_callback(). If @target has not yet hit * finish_callback() to clear utrace-reporting, we * might be in the middle of a callback to @engine. */ smp_mb(); if (utrace-reporting == engine) ret = -EINPROGRESS; } /* fall */ case UTRACE_RESUME: clear_engine_wants_stop(engine); target-utrace_flags = ~ENGINE_STOP; /* * This and all other cases imply resuming if stopped. * There might not be another report before it just * resumes, so make sure single-step is not left set. */ if (likely(reset)) user_disable_single_step(target); break; --- kstub/kernel/utrace.c~10_utrace_reset_should_clear_ss 2010-09-20 03:55:12.0 +0200 +++ kstub/kernel/utrace.c 2010-09-22 01:54:24.0 +0200 @@ -1139,7 +1139,9 @@ int utrace_control(struct task_struct *t target-utrace_flags = ~ENGINE_STOP; mark_engine_detached(engine); reset = reset || utrace_do_stop(target, utrace); - if (!reset) { + if (reset) { + user_disable_single_step(target); + } else { /* * As in utrace_set_events(), this barrier ensures * that our engine-flags changes have hit before we
gdbstub initial code, v11
Changes: syscall stepping + minor cleanups. Seems to work, more or less, but surely there are some bugs. Honestly, I don't really know how do the right thing here. Anyway, most probably this code will be changed. Like ptrace, ugdb uses -report_syscall_exit() to synthesize a trap. Unlike ptrace, ugdb_report_signal() doesn't send SIGTRAP to itself but reports SIGTRAP using siginfo_t we have. In any case, whatever we do, multiple tracers can confuse each other. Next: fully implement g/G/p/P, currently I am a bit confused... But what about features? What should I do next? all-stop, thread-specific breakpoints (currently I have no idea what this means), or what? Oleg. #include linux/module.h #include linux/proc_fs.h #include linux/utrace.h #include linux/poll.h #include linux/mm.h #include linux/regset.h #include asm/uaccess.h static int o_remote_debug; module_param_named(echo, o_remote_debug, bool, 0); #define BUFFER_SIZE 1024 #define PACKET_SIZE 1024 struct pbuf { char*cur, *pkt; charbuf[BUFFER_SIZE]; }; static inline void pb_init(struct pbuf *pb) { pb-cur = pb-buf; pb-pkt = NULL; } enum { U_STOP_IDLE = 0, U_STOP_PENDING, U_STOP_SENT, }; struct ugdb { struct list_headu_processes; struct list_headu_stopped; int u_stop_state; struct mutexu_mutex; spinlock_t u_slock; struct ugdb_thread *u_cur_tinfo, *u_cur_hg, *u_cur_hc; wait_queue_head_t u_wait; int u_err; struct pbuf u_pbuf; charu_cbuf[PACKET_SIZE]; int u_clen; sigset_tu_sig_ign; unsigned int u_no_ack:1, u_allstop:1; }; static inline void ugdb_ck_stopped(struct ugdb *ugdb) { spin_lock(ugdb-u_slock); WARN_ON(!list_empty(ugdb-u_stopped) ugdb-u_stop_state == U_STOP_IDLE); WARN_ON(list_empty(ugdb-u_stopped) ugdb-u_stop_state == U_STOP_PENDING); spin_unlock(ugdb-u_slock); } static struct ugdb *ugdb_create(void) { struct ugdb *ugdb; int err; err = -ENODEV; // XXX: ugly. proc_reg_open() should take care. if (!try_module_get(THIS_MODULE)) goto out; err = -ENOMEM; ugdb = kzalloc(sizeof(*ugdb), GFP_KERNEL); if (!ugdb) goto put_module; INIT_LIST_HEAD(ugdb-u_processes); INIT_LIST_HEAD(ugdb-u_stopped); mutex_init(ugdb-u_mutex); spin_lock_init(ugdb-u_slock); init_waitqueue_head(ugdb-u_wait); pb_init(ugdb-u_pbuf); return ugdb; put_module: module_put(THIS_MODULE); out: return ERR_PTR(err); } #define P_DETACHING (1 1) #define P_ZOMBIE(1 2) struct ugdb_process { int p_pid; int p_state; struct list_headp_threads; struct ugdb *p_ugdb; struct list_headp_processes; }; static inline bool process_alive(struct ugdb_process *process) { return !(process-p_state P_ZOMBIE); } static inline void mark_process_dead(struct ugdb_process *process) { process-p_state |= P_ZOMBIE; } static struct ugdb_process *ugdb_create_process(struct ugdb *ugdb, int pid_nr) { struct ugdb_process *process; process = kzalloc(sizeof(*process), GFP_KERNEL); if (!process) return NULL; process-p_pid = pid_nr; process-p_ugdb = ugdb; INIT_LIST_HEAD(process-p_threads); list_add_tail(process-p_processes, ugdb-u_processes); return process; } #define T_STOP_RUN 0 #define T_STOP_REQ (1 0)/* requested by gdb */ #define T_STOP_ALL (1 1)/* vCont;c:pX.-1, for report_clone */ #define T_STOP_ACK (1 2)/* visible to vStopped */ #define T_STOP_STOPPED (1 3)/* reported as stopped to gdb */ /* TASK_TRACED + deactivated ? */ #define T_EV_NONE 0 #define T_EV_EXIT (1 24) #define T_EV_SIGN (2 24) #define T_EV_TYPE(event)((0xff 24) (event)) #define T_EV_DATA(event)(~(0xff 24) (event)) struct ugdb_thread { int t_tid; int t_stop_state; int t_stop_event; boolt_step; // XXX: move me into t_stop_event siginfo_t *t_siginfo; struct ugdb *t_ugdb; struct ugdb_process *t_process; struct
[PATCH] utrace: utrace_reset() should clear TIF_SINGLESTEP if no more engines
Test-case: #include stdio.h #include signal.h #include unistd.h #include sys/ptrace.h #include sys/wait.h #include assert.h int main(void) { int pid, status; pid = fork(); if (!pid) { assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0); kill(getpid(), SIGSTOP); return 0x23; } assert(wait(status) == pid); assert(WIFSTOPPED(status) WSTOPSIG(status) == SIGSTOP); assert(ptrace(PTRACE_SINGLESTEP, pid, 0,0) == 0); assert(wait(status) == pid); assert(WIFSTOPPED(status) WSTOPSIG(status) == SIGTRAP); assert(ptrace(PTRACE_DETACH, pid, 0,0) == 0); assert(wait(status) == pid); if (WIFEXITED(status) WEXITSTATUS(status) == 0x23) return 0; printf(ERR!! exit status: %04x\n, status); return 1; } It fails because TIF_SINGLESTEP is not cleared after PTRACE_DETACH and the tracee gets another trap after resume. Change utrace_reset() to do user_disable_single_step(). Alternatively we could change ptrace layer, but I think this is not ptrace specific. Signed-off-by: Oleg Nesterov o...@redhat.com --- kernel/utrace.c |1 + 1 file changed, 1 insertion(+) --- kstub/kernel/utrace.c~10_utrace_reset_should_clear_ss 2010-09-20 03:55:12.0 +0200 +++ kstub/kernel/utrace.c 2010-09-20 21:33:47.0 +0200 @@ -709,6 +709,7 @@ static bool utrace_reset(struct task_str */ utrace-resume = UTRACE_RESUME; utrace-signal_handler = 0; + user_disable_single_step(task); } /*
gdbstub initial code, v10
Back to ugdb. Changes: implement stepi, this also means breakpoints work too. Notes: - almost untested, probably needs more fixes - syscall-stepping doesn't work correctly (should be simple) - don't look at handle_setregs/handle_set_one_reg, I did this on purpose to be sure I really understand what gdb actually does Well. This wasn't simple because I nearly got heart attack twice when I was writing this change ;) However, it turns out that ptrace-utrace (old or recently changed) is innocent. I found 2 problems, the 1st one is /usr/bin/gdb bug, another one (I believe) is utrace core bug. I'll report tomorrow. Oleg. #include linux/module.h #include linux/proc_fs.h #include linux/utrace.h #include linux/poll.h #include linux/mm.h #include linux/regset.h #include asm/uaccess.h static int o_remote_debug; module_param_named(echo, o_remote_debug, bool, 0); #define BUFFER_SIZE 1024 #define PACKET_SIZE 1024 struct pbuf { char*cur, *pkt; charbuf[BUFFER_SIZE]; }; static inline void pb_init(struct pbuf *pb) { pb-cur = pb-buf; pb-pkt = NULL; } enum { U_STOP_IDLE = 0, U_STOP_PENDING, U_STOP_SENT, }; struct ugdb { struct list_headu_processes; struct list_headu_stopped; int u_stop_state; struct mutexu_mutex; spinlock_t u_slock; struct ugdb_thread *u_cur_tinfo, *u_cur_hg, *u_cur_hc; wait_queue_head_t u_wait; int u_err; struct pbuf u_pbuf; charu_cbuf[PACKET_SIZE]; int u_clen; sigset_tu_sig_ign; unsigned int u_no_ack:1, u_allstop:1; }; static inline void ugdb_ck_stopped(struct ugdb *ugdb) { spin_lock(ugdb-u_slock); WARN_ON(!list_empty(ugdb-u_stopped) ugdb-u_stop_state == U_STOP_IDLE); WARN_ON(list_empty(ugdb-u_stopped) ugdb-u_stop_state == U_STOP_PENDING); spin_unlock(ugdb-u_slock); } static struct ugdb *ugdb_create(void) { struct ugdb *ugdb; int err; err = -ENODEV; // XXX: ugly. proc_reg_open() should take care. if (!try_module_get(THIS_MODULE)) goto out; err = -ENOMEM; ugdb = kzalloc(sizeof(*ugdb), GFP_KERNEL); if (!ugdb) goto put_module; INIT_LIST_HEAD(ugdb-u_processes); INIT_LIST_HEAD(ugdb-u_stopped); mutex_init(ugdb-u_mutex); spin_lock_init(ugdb-u_slock); init_waitqueue_head(ugdb-u_wait); pb_init(ugdb-u_pbuf); return ugdb; put_module: module_put(THIS_MODULE); out: return ERR_PTR(err); } #define P_DETACHING (1 1) #define P_ZOMBIE(1 2) struct ugdb_process { int p_pid; int p_state; struct list_headp_threads; struct ugdb *p_ugdb; struct list_headp_processes; }; static inline bool process_alive(struct ugdb_process *process) { return !(process-p_state P_ZOMBIE); } static inline void mark_process_dead(struct ugdb_process *process) { process-p_state |= P_ZOMBIE; } static struct ugdb_process *ugdb_create_process(struct ugdb *ugdb, int pid_nr) { struct ugdb_process *process; process = kzalloc(sizeof(*process), GFP_KERNEL); if (!process) return NULL; process-p_pid = pid_nr; process-p_ugdb = ugdb; INIT_LIST_HEAD(process-p_threads); list_add_tail(process-p_processes, ugdb-u_processes); return process; } #define T_STOP_RUN 0 #define T_STOP_REQ (1 0)/* requested by gdb */ #define T_STOP_ALL (1 1)/* vCont;c:pX.-1, for report_clone */ #define T_STOP_ACK (1 2)/* visible to vStopped */ #define T_STOP_STOPPED (1 3)/* reported as stopped to gdb */ /* TASK_TRACED + deactivated ? */ #define T_EV_NONE 0 #define T_EV_EXIT (1 24) #define T_EV_SIGN (2 24) #define T_EV_TYPE(event)((0xff 24) (event)) #define T_EV_DATA(event)(~(0xff 24) (event)) struct ugdb_thread { int t_tid; int t_stop_state; int t_stop_event; boolt_step; // XXX: move me into t_stop_event siginfo_t *t_siginfo; struct ugdb *t_ugdb; struct ugdb_process *t_process; struct
Re: exit_ptrace() ptrace_report_signal() problems
On 09/14, Oleg Nesterov wrote: Oh, please take a look at this (untested) patch. I tried to test the cleanuped version, seems to work. But I think I should make another attempt, probably spinlock (-siglock ?) would be cleaner. No. I think more locking buys nothing. I'll split this patch and resend today. Oleg.
Re: exit_ptrace() ptrace_report_signal() problems
On 09/14, Oleg Nesterov wrote: On 09/13, Roland McGrath wrote: Locks just between the tracer and ptrace_report_signal are not bad. OK, but let me think a bit more. I'd really like to avoid adding the new lock to avoid the very unlikely race with the exiting tracer. Oh, please take a look at this (untested) patch. But I think I should make another attempt, probably spinlock (-siglock ?) would be cleaner. Oleg. --- kstub/kernel/ptrace-utrace.c~9_EXIT_PTRACE_CAN_LOSE_SIG 2010-08-24 14:31:17.0 +0200 +++ kstub/kernel/ptrace-utrace.c2010-09-14 23:37:59.0 +0200 @@ -67,6 +67,7 @@ struct ptrace_context { #define PT_UTRACED 0x1000 #define PTRACE_O_SYSEMU0x100 +#define PTRACE_O_REUSABLE 0x200 #define PTRACE_EVENT_SYSCALL (1 16) #define PTRACE_EVENT_SIGTRAP (2 16) @@ -115,7 +116,7 @@ ptrace_reuse_engine(struct task_struct * return engine; ctx = ptrace_context(engine); - if (unlikely(ctx-resume == UTRACE_DETACH)) { + if (unlikely(ctx-options == PTRACE_O_REUSABLE)) { /* * Try to reuse this self-detaching engine. * The only caller which can hit this case is ptrace_attach(), @@ -246,24 +247,48 @@ static void ptrace_detach_task(struct ta */ bool voluntary = (sig = 0); struct utrace_engine *engine = ptrace_lookup_engine(tracee); + struct ptrace_context *ctx = ptrace_context(engine); enum utrace_resume_action action = UTRACE_DETACH; if (unlikely(IS_ERR(engine))) return; - if (sig) { - struct ptrace_context *ctx = ptrace_context(engine); + if (!voluntary) { + int err; + + ctx-resume = UTRACE_DETACH; + /* synchronize with ptrace_report_signal() */ + do { + err = utrace_barrier(tracee, engine); + } while (err == -ERESTARTSYS); + + if (!err) { + /* +* The tracee has already reported a signal. If we +* see -siginfo != NULL it is safe to mark this ctx +* as reusable and resume the tracee. We can race +* with ptrace_report_signal(UTRACE_SIGNAL_REPORT) +* already in progress but this doesn't matter, it +* must see -resume = UTRACE_DETACH. +*/ + if (ctx-siginfo) { + smp_wmb(); + ctx-options = PTRACE_O_REUSABLE; + action = UTRACE_RESUME; + } + } + } else if (sig) { switch (get_stop_event(ctx)) { case PTRACE_EVENT_SYSCALL: - if (voluntary) - send_sig_info(sig, SEND_SIG_PRIV, tracee); + send_sig_info(sig, SEND_SIG_PRIV, tracee); break; case PTRACE_EVENT_SIGNAL: - if (voluntary) - ctx-signr = sig; + ctx-signr = sig; ctx-resume = UTRACE_DETACH; + smp_wmb(); + ctx-options = PTRACE_O_REUSABLE; action = UTRACE_RESUME; break; } @@ -410,6 +435,9 @@ static u32 ptrace_report_syscall_exit(u3 return UTRACE_STOP; if (ctx-resume != UTRACE_RESUME) { + if (ctx-resume == UTRACE_DETACH) + return UTRACE_RESUME; + WARN_ON(ctx-resume != UTRACE_BLOCKSTEP ctx-resume != UTRACE_SINGLESTEP); ctx-resume = UTRACE_RESUME; @@ -502,6 +530,9 @@ static u32 ptrace_report_signal(u32 acti ctx-siginfo = NULL; if (resume != UTRACE_RESUME) { + if (resume == UTRACE_DETACH) + return action; + WARN_ON(resume != UTRACE_BLOCKSTEP resume != UTRACE_SINGLESTEP); @@ -531,6 +562,11 @@ static u32 ptrace_report_signal(u32 acti } WARN_ON(ctx-siginfo); + + /* Raced with the exiting tracer ? */ + if (resume == UTRACE_DETACH) + return action; + ctx-siginfo = info; /* * ctx-siginfo points to the caller's stack. @@ -759,7 +795,7 @@ void exit_ptrace(struct task_struct *tra static int ptrace_set_options(struct task_struct *tracee, struct utrace_engine *engine, long data) { - BUILD_BUG_ON(PTRACE_O_MASK PTRACE_O_SYSEMU); + BUILD_BUG_ON(PTRACE_O_MASK (PTRACE_O_SYSEMU | PTRACE_O_REUSABLE
Re: gdbstub initial code, v9
On 09/10, Roland McGrath wrote: Oleg == Oleg Nesterov o...@redhat.com writes: Oleg (gdb) set var 0 You need: set variable var = 0 The variable can be abbreviated. I've always just used: (gdb) set var=0 No, I tried this too, doesn't work. (gdb) set var=0 A syntax error in expression, near `=0'. But, it turns out I choosed a bad name for the variable when I tested the fix in unxex(). (gdb) set xxx=0 This works. (gdb) set var var=0 This works too. I guess, when gdb sees set var it expects the full set variable ... command. Oleg.
Re: gdbstub initial code, v7
On 09/10, Roland McGrath wrote: But I meant another case, when the stopped tracee doesn't have siginfo. Currently ugdb just sends this signal to tracee, and then it will be reported to gdb. Not sure if this is right or not, I can change this. (or perhap this doesn't matter, I dunno). What do you mean by doesn't have siginfo? You mean non-signal stops? Yes. What non-signal stops does ugdb report? (gdb) interrupt ugdb sets please stop flag and does utrace_control(INTERRUPT). However, in unlikely case the tracee can stop before -report_signal() reporting loop (especially in multitracing case). Or it can be already stopped (note: this needs a separate discussion, currently ugdb intentionally doesn't handle this case). And. With the current implementation, even if the tracee stops after ugdb_report_signal() was called, it doesn't setup -t_siginfo. IOW. If the tracee actually recieves a signal, then - qXfer:siginfo:read works - signal SIG works as expected (delivered to tracee) Otherwise - qXfer:siginfo:read reports E01 - signal XX means TXX report. Once again, this can be changed (fixed?), but I am not sure this should be changed. Oleg.
Re: gdbstub initial code, v8 ptrace
On 09/10, Roland McGrath wrote: I am a bit confused... OK, ugdb is wrong wrt multitracing. UTRACE_SIGNAL_REPORT case shouldn't return UTRACE_STOP | UTRACE_SIGNAL_IGN, it should return UTRACE_STOP | UTRACE_SIGNAL_REPORT to keep report-result. No, UTRACE_SIGNAL_REPORT is not meant for a return value. Its only use is in the incoming argument to tell you that a given report_signal call is standing in for a report_quiesce(0) call. Yes, that is why initially ugdb returned | UTRACE_SIGNAL_IGN. But you misunerstood my concerns (or me your reply ;) But. Suppose we have to attached engines. The first engine gets UTRACE_SIGNAL_REPORT and returns UTRACE_STOP | UTRACE_SIGNAL_IGN. Or, But it needs to return UTRACE_SIGNAL_DELIVER? That's what you return when you want the signal delivered. or yes, it returns UTRACE_SIGNAL_DELIVER after gdb does signal XX. Now. The second engine gets UTRACE_SIGNAL_DELIVER or UTRACE_SIGNAL_IGN, not UTRACE_SIGNAL_REPORT. That is why ugdb_signal_report(UTRACE_SIGNAL_REPORT) tries to return UTRACE_STOP | utrace_signal_action(action) to not change report-result (passed to the next tracee) inside the reporting loop. The worst case is UTRACE_SIGNAL_HANDLER. Single-stepping should know about this case. This means that any engine should always return UTRACE_resume_action | UTRACE_SIGNAL_HANDLER. Unless we are going to change utrace_get_signal(). Probably we can check orig_ka != NULL and treat orig_ka == NULL as UTRACE_SIGNAL_REPORT. Hopefully this can't be confused with UTRACE_SIGNAL_HANDLER. I'm not really sure what you mean here. If -report_signal(orig_ka) was calles with orig_ka == NULL, then, whatever utrace_signal_action(action) we see, originally it was either UTRACE_SIGNAL_HANDLER or UTRACE_SIGNAL_REPORT but another engine returned, say, UTRACE_SIGNAL_DELIVER/UTRACE_SIGNAL_IGN to deliver/stop. Not sure about UTRACE_SIGNAL_HOLD, but this is very unlikely race. You probably don't want to use that, but I'm not entirely sure. ptrace doesn't use it, and the signal interception model is pretty much the same. Yes, agreed. But there is one corner case. Until we generalize utrace_stop()-ptrace_notify_stop() hook, the tracee can be reported as stopped to gdb, but before it actually stops it can recieve a signal. The remote protocol doesn't allow to send TSIG after we alreay sent T00 (at least this actually confuses gdb), and we can't just stop, the tracee should report this signal to debugger. So, currently ugdb stops but uses UTRACE_SIGNAL_HOLD to report this signal later. Oleg.
Re: gdbstub initial code, v7
On 09/10, Roland McGrath wrote: ugdb sets please stop flag and does utrace_control(INTERRUPT). However, in unlikely case the tracee can stop before -report_signal() reporting I don't think this is the right thing to do. When the intent is explicitly to interrupt, there is no reason to stop before the interruption is complete, i.e. report_signal. This means that ugdb_report_quiesce() should never return UTRACE_STOP, and that is all. But what about multitracing? Suppose that (gdb) interrupt happens just before, say, do_report_syscall_entry() and another engine wants to stop. If ugdb_report_quiesce() doesn't return UTRACE_STOP, then gdb will wait until another debugger resumes the tracee. What do you think? If you only stop there, then you can always process a signal injection with complete flexibility. Yes, sure (again, currently ugdb does not injection a signal even if the tracee was stopped in report_signal, but of course we can change this). Oleg.
Re: gdbstub initial code, v9
On 09/09, Frank Ch. Eigler wrote: Oleg Nesterov o...@redhat.com writes: [...] But, Jan. Implementing the memory writes does not mean breakpoints automatically start to work! It approximately should though. Yes, gdb writes cc, and yes the tracee reports SIGTRAP. But after that continue does nothing except $c, and the tracee naturally gets SIGILL. I expected that, since ugdb doesn't even know the code was changed, gdb should write the original byte back before continue, but this doesn't happen. In normal all-stop mode, Currently ugdb only supports non-stop gdb does normally replace the old instruction, in order to single-step over it with the 's' packet. Yes, probably single-stepping is needed... I am still trying to understand how this works with gdbserver, but I see vCont:s packets. Perhaps you're testing some buggy non-stop aspect that only works with 'Z' breakpoint management packets? No. Just a trivial test-case which printfs in a loop. A fuller packet trace would help explain. Please see below. But the only important part is: $M4005ba,1:cc --- set bp $c --- resume of course, this can't work. Full trace: = qSupported:multiprocess+ = PacketSize=400;QStartNoAckMode+;QNonStop+;multiprocess+;QPassS... = QStartNoAckMode = OK = ! = OK = Hgp0.0 = E01 = QNonStop:1 = OK = qfThreadInfo = E01 = ? = OK = qSymbol:: = = vAttach;95b = OK = qfThreadInfo = mp95b.95b = qsThreadInfo = l = Hgp95b.95b = OK = vCont? = vCont;t = vCont;t:p95b.-1 = OK = %Stop:T00thread:p95b.95b; = vStopped = OK = g = fcfd90ad5329ff7f00... = m600880,8 = 403c6d7d007f = m7f007d6d3c48,8 = 00106d7d007f = m7f007d6d1000,28 = f6e04c7d007fe807600080156d7d007f00... = m7f007d6d1580,28 = 00f0ef29ff7ff6e04c7d007f50f45f29ff7f00c06c7d007f00... = m7f007d4ce0f4,4 = 090a0069 = m7f007d6cc000,28 = 0030167d007f781f6d7d007f400b4b7d007fe8346d7d007f00... = m7f007d6d1f78,4 = 2f6c6962 = m7f007d6d1f7c,4 = 2f6c6962 = m7f007d6d1f80,4 = 632e736f = m7f007d6d1f84,4 = 2e36 = m7f007d6d34e8,28 = 00704b7d007f0002482e6d7d007f00... = m400200,4 = 2f6c6962 = m400204,4 = 2f6c642d = m400208,4 = 6c696e75 = m40020c,4 = 782d7838 = m400210,4 = 362d3634 = m400214,4 = 2e736f2e = m400218,4 = 3200 = m7f007d6d3c40,4 = 0100 = m7f007d6d3c48,8 = 00106d7d007f = m7f007d6d3c50,8 = c04e4c7d007f = Z0,7f007d4c4ec0,1 = = m7f007d4c4ec0,1 = f3 = X7f007d4c4ec0,0: = = M7f007d4c4ec0,1:cc = OK = m600880,8 = 403c6d7d007f = m7f007d6d3c48,8 = 00106d7d007f = m7f007d6d1000,28 = f6e04c7d007fe807600080156d7d007f00... = m7f007d6d1580,28 = 00f0ef29ff7ff6e04c7d007f50f45f29ff7f00c06c7d007f00... = m7f007d4ce0f4,4 = 090a0069 = m7f007d6cc000,28 = 0030167d007f781f6d7d007f400b4b7d007fe8346d7d007f00... = m7f007d6d1f78,4 = 2f6c6962 = m7f007d6d1f7c,4 = 2f6c6962 = m7f007d6d1f80,4 = 632e736f = m7f007d6d1f84,4 = 2e36 = m7f007d6d34e8,28 = 00704b7d007f0002482e6d7d007f00... = m400200,4 = 2f6c6962 = m400204,4 = 2f6c642d = m400208,4 = 6c696e75 = m40020c,4 = 782d7838 = m400210,4 = 362d3634 = m400214,4 = 2e736f2e = m400218,4 = 3200 = m7f007d6d3c40,4 = 0100 = vCont;t:p95b.-1 = OK = m7f007d201f40,1 = 48 = m7f007d201f40,1 = 48 = g = fcfd90ad5329ff7f00... = m7f007d201f40,1 = 48 = m7f007d201f40,1 = 48 = m40056c,12 = 554889e5e8e3fe89c6ba0700bfdc = m40056c,1 = 55 = m40056d,3 = 4889e5 = m40056c,12 = 554889e5e8e3fe89c6ba0700bfdc = m40056c,1 = 55 = m40056d,3 = 4889e5 = m4005ba,1 = e8 = m4005ba,1 = e8 (gdb) b BP.c:13 Breakpoint 1 at 0x4005ba: file BP.c
Re: gdbstub initial code, v9
On 09/09, Oleg Nesterov wrote: the tracee hits this bp and reports SIGTRAP = vStopped = OK = g = 00064000401f207d007f00... = P10=ba054000 = = G00064000401f207d007f0... = May be this can explain... Probably I need to implement G/P first, otherwise gdb can't change ip. Still, I'd appreciate if someone can explain me what gdb needs/expects to handle breakpoints before I start to read the sources. Oleg.
gdbstub initial code, v9
Changes: - partly fix the multitracing problems. ugdb still can't work with ptrace, ptrace-utrace.c needs changes. But at least multiple udgb tracers can coexist, more or less. But of course they can confuse each other anyway, no matter what ugdb does. - implement memory writes ($M). - refactor memory reads to avoid the Too late to report the error case. But, Jan. Implementing the memory writes does not mean breakpoints automatically start to work! Yes, gdb writes cc, and yes the tracee reports SIGTRAP. But after that continue does nothing except $c, and the tracee naturally gets SIGILL. I expected that, since ugdb doesn't even know the code was changed, gdb should write the original byte back before continue, but this doesn't happen. Tried to understand how this works with gdbserver, but failed so far. Will continue, but any hint is very much appreciated ;) Oleg. #include linux/module.h #include linux/proc_fs.h #include linux/utrace.h #include linux/poll.h #include linux/mm.h #include linux/regset.h #include asm/uaccess.h static int o_remote_debug; module_param_named(echo, o_remote_debug, bool, 0); #define BUFFER_SIZE 1024 #define PACKET_SIZE 1024 struct pbuf { char*cur, *pkt; charbuf[BUFFER_SIZE]; }; static inline void pb_init(struct pbuf *pb) { pb-cur = pb-buf; pb-pkt = NULL; } enum { U_STOP_IDLE = 0, U_STOP_PENDING, U_STOP_SENT, }; struct ugdb { struct list_headu_processes; struct list_headu_stopped; int u_stop_state; struct mutexu_mutex; spinlock_t u_slock; struct ugdb_thread *u_cur_tinfo, *u_cur_hg, *u_cur_hc; wait_queue_head_t u_wait; int u_err; struct pbuf u_pbuf; charu_cbuf[PACKET_SIZE]; int u_clen; sigset_tu_sig_ign; unsigned int u_no_ack:1, u_allstop:1; }; static inline void ugdb_ck_stopped(struct ugdb *ugdb) { spin_lock(ugdb-u_slock); WARN_ON(!list_empty(ugdb-u_stopped) ugdb-u_stop_state == U_STOP_IDLE); WARN_ON(list_empty(ugdb-u_stopped) ugdb-u_stop_state == U_STOP_PENDING); spin_unlock(ugdb-u_slock); } static struct ugdb *ugdb_create(void) { struct ugdb *ugdb; int err; err = -ENODEV; // XXX: ugly. proc_reg_open() should take care. if (!try_module_get(THIS_MODULE)) goto out; err = -ENOMEM; ugdb = kzalloc(sizeof(*ugdb), GFP_KERNEL); if (!ugdb) goto put_module; INIT_LIST_HEAD(ugdb-u_processes); INIT_LIST_HEAD(ugdb-u_stopped); mutex_init(ugdb-u_mutex); spin_lock_init(ugdb-u_slock); init_waitqueue_head(ugdb-u_wait); pb_init(ugdb-u_pbuf); return ugdb; put_module: module_put(THIS_MODULE); out: return ERR_PTR(err); } #define P_DETACHING (1 1) #define P_ZOMBIE(1 2) struct ugdb_process { int p_pid; int p_state; struct list_headp_threads; struct ugdb *p_ugdb; struct list_headp_processes; }; static inline bool process_alive(struct ugdb_process *process) { return !(process-p_state P_ZOMBIE); } static inline void mark_process_dead(struct ugdb_process *process) { process-p_state |= P_ZOMBIE; } static struct ugdb_process *ugdb_create_process(struct ugdb *ugdb, int pid_nr) { struct ugdb_process *process; process = kzalloc(sizeof(*process), GFP_KERNEL); if (!process) return NULL; process-p_pid = pid_nr; process-p_ugdb = ugdb; INIT_LIST_HEAD(process-p_threads); list_add_tail(process-p_processes, ugdb-u_processes); return process; } #define T_STOP_RUN 0 #define T_STOP_REQ (1 0)/* requested by gdb */ #define T_STOP_ALL (1 1)/* vCont;c:pX.-1, for report_clone */ #define T_STOP_ACK (1 2)/* visible to vStopped */ #define T_STOP_STOPPED (1 3)/* reported as stopped to gdb */ /* TASK_TRACED + deactivated ? */ #define T_EV_NONE 0 #define T_EV_EXIT (1 24) #define T_EV_SIGN (2 24) #define T_EV_TYPE(event)((0xff 24) (event)) #define T_EV_DATA(event)(~(0xff 24) (event)) struct ugdb_thread { int t_tid; int
Re: gdbstub initial code, v8
On 09/06, Frank Ch. Eigler wrote: Oleg Nesterov o...@redhat.com writes: [...] Therefore until you track some ugdb-specific software(*) breakpoints ugdb does not need to support Z0 IMO. I guess ugdb will never have to support these: thread-related(?) and tracepoint ones. Good! I thought ugdb should somehow handle this all transparently for gdb. I thought (I don't know why) that writing int 3 from gdb side should be avoided in favour of some better method unknown to me. Please note that last year's gdbstub prototype used kernel uprobes as an optional gdb breakpoint implementation (i.e., a backend for the Z packets). When/if the lkml uprobes patches actually get merged, ugdb should also use them. Yes, agreed. Oleg.
Re: [PATCH 0/3] UTRACE_DETACH fixes
On 09/03, Roland McGrath wrote: So, it is technically kosher enough to use UTRACE_INTERRUPT without UTRACE_EVENT(QUIESCE) set, if you really know the situation and are sure about all those caveats above. How? see below. I mentioned the examples. I misunderstood the UTRACE_INTERRUPT without UTRACE_EVENT(QUIESCE) above as if you mean it is possible to get UTRACE_SIGNAL_REPORT without QUIESCE. But, it's only UTRACE_EVENT(QUIESCE) that gives you any expectation of an extra callback for no event from either UTRACE_REPORT or UTRACE_INTERRUPT. Yes, this is clear too. Apparently not. ;-) Perhaps ;) At least I certainly missed your point. QUIESCE is the only event type for no event. If you only asked for signal events, then you only get a callback when there is an actual signal event. If you want an extra callback before dequeuing any signal, then that is what QUIESCE is for. OK. This means ugdb should set QUIESCE, just to ensure its -report_signal() will be called. Once again, I am not asking to change utrace now, but could you explain what is wrong with the patch below? That gives an extra unrequested report_signal callback to every engine that is only interested in some signal event. Consider a crash-catcher engine. It would only track UTRACE_EVENT(SIGNAL_CORE) (and perhaps CLONE for its bookkeeping). This engine never asked for a callback before each and every attempt to dequeue any signal, I see your point (I hope). which is what you would give it with your change. No ;) My change was wrong. event == 0 in this case, and then later utrace_get_signal() checks if ((want (event | UTRACE_EVENT(QUIESCE))) == 0) continue; OK, please forget. I must admit, this still looks a bit, well, unnatural to me. May be it makes sense to add _UTRACE_EVENT_SIGNAL_REPORT, _UTRACE_EVENT_SIGNAL_HANDLER, into utrace_events. Then ugdb could ask for UTRACE_SIGNAL_REPORT and avoid the unnecessary -report_quiesce() calls. But at least I can see the logic now, thanks. Roland, could you also comment this patch? https://www.redhat.com/archives/utrace-devel/2010-August/msg00108.html Again, this looks like a bug to me, but I won't insist if it is not. Oleg.
Re: gdbstub initial code, v8
On 09/05, Jan Kratochvil wrote: On Sat, 04 Sep 2010 00:40:47 +0200, Oleg Nesterov wrote: - implement qXfer:siginfo:read - implement continue with signal. OK, thanks, just it was a bit premature to ask for it I see. I miss at least memory writes Yes. This is simple. (also to put in breakpoints): And this is not clear to me, I need your help ;) What should ugdb know about breakpoints? I played with the real gdbserver, and afaics gdb just changes the tracee's memory and inserts 0xcc (int 3). But gdb.info mentions Z TYPE,ADDR,KIND packets. So, what should ugdb do? Just implement memory write (M or X) and then report SIGTRAP like gdbserver does? Oleg.
Re: gdbstub initial code, v8
On 09/06, Jan Kratochvil wrote: On Mon, 06 Sep 2010 20:18:08 +0200, Oleg Nesterov wrote: On 09/05, Jan Kratochvil wrote: (also to put in breakpoints): And this is not clear to me, I need your help ;) Sorry, I just meant that by implementing the memory writes breakpoints automatically start to work. What should ugdb know about breakpoints? I played with the real gdbserver, and afaics gdb just changes the tracee's memory and inserts 0xcc (int 3). But gdb.info mentions Z TYPE,ADDR,KIND packets. I believe it is described by: /* If GDB wanted this thread to single step, we always want to report the SIGTRAP, and let GDB handle it. Watchpoints should always be reported. So should signals we can't explain. A SIGTRAP we can't explain could be a GDB breakpoint --- we may or not support Z0 breakpoints. If we do, we're be able to handle GDB breakpoints on top of internal breakpoints, by handling the internal breakpoint and still reporting the event to GDB. If we don't, we're out of luck, GDB won't see the breakpoint hit. */ So, what should ugdb do? Just implement memory write (M or X) and then report SIGTRAP like gdbserver does? Therefore until you track some ugdb-specific software(*) breakpoints ugdb does not need to support Z0 IMO. I guess ugdb will never have to support these: thread-related(?) and tracepoint ones. Good! I thought ugdb should somehow handle this all transparently for gdb. I thought (I don't know why) that writing int 3 from gdb side should be avoided in favour of some better method unknown to me. Thanks Jan. Oleg.
Re: gdbstub initial code, v7
Sorry for the delay, I was distracted. Trying to switch back to ugdb. On 08/31, Jan Kratochvil wrote: ugdb should support qXfer:siginfo, currently accessible only via $_siginfo print/set, though. Still sure this feature should be also implemented one day. Yes sure. This should be simple, although I didn't expect qXfer needs remote_escape_output() and x86_siginfo_fixup(). I assume that qXfer:siginfo:read always mean Hg thread. It is not clear to me what should ugdb report if there is no a valid siginfo. linux_xfer_siginfo() return E01, but gdbserver uses SIGSTOP to stop the tracee, so it always has something to report. But ugdb stop the tracee somewhere else, not in get_signal_to_deliver() path. Likewise, it is not clear what should ugdb do if gdb sends $CSIG in this case. But this all is minor, I think. I was going to send v8 which implements qXfer:siginfo:read and continue with signal, but (oh, as always) hit the unexpected problems. To the point, I had to add printk's to utrace.c to understand what is wrong. Hopefully tomorrow. Oleg.
Re: [PATCH 0/3] UTRACE_DETACH fixes
I was going to ping you ;) This is connected to the problem I hit today. Despite the fact I already complained about utrace_get_signal() QUIESCE, I forgot about this and figured out it doesn't work in practice. On 09/02, Roland McGrath wrote: If an engine used UTRACE_INTERRUPT without having first set QUIESCE, then it has no rightful expectation of getting any callback. Hmm. This is certainly new to me. Could you confirm? Well, I didn't say it precisely correctly. UTRACE_INTERRUPT serves two purposes. First, it just interrupts things like a signal would. You could use that on its own without even tracing any events at all, just do achieve fault injection or similar. Second, it's like UTRACE_REPORT. Yes, this is clear. As with UTRACE_REPORT, after UTRACE_INTERRUPT you can expect to get some report_quiesce callback even if there is no other event you were tracing that happens. For this to actually happen, you need to have UTRACE_EVENT(QUIESCE) set. Hmm. I am not sure I understand. If -report_signal != NULL, then you can't expect -report_quiesce() after utrace_control(INTERRUPT) ? So, it is technically kosher enough to use UTRACE_INTERRUPT without UTRACE_EVENT(QUIESCE) set, if you really know the situation and are sure about all those caveats above. How? see below. But, it's only UTRACE_EVENT(QUIESCE) that gives you any expectation of an extra callback for no event from either UTRACE_REPORT or UTRACE_INTERRUPT. Yes, this is clear too. So. Now I am even more confused. First of all, ugdb (at least currently) does not need report_quiesce() and UTRACE_EVENT(QUIESCE) at all. OK, this is not 100% true due to multitracing/etc, lets ignore this. Anyway it makes sense to clear QUIESCE after $c to avoid the unnecessary callbacks. All it needs is -report_signal(). But, the problem is, it is not called after utrace_control(INTERRUPT) ! You need QUIESCE to ensure report_signal() will be called, and this looks at least strange to me. Once again, I am not asking to change utrace now, but could you explain what is wrong with the patch below? Oleg. --- x/kernel/utrace.c +++ x/kernel/utrace.c @@ -2029,7 +2029,8 @@ int utrace_get_signal(struct task_struct report.spurious = false; finish_resume_report(task, utrace, report); return -1; - } else if (!(task-utrace_flags UTRACE_EVENT(QUIESCE))) { + } else if (!(task-utrace_flags + (UTRACE_EVENT(QUIESCE) | UTRACE_EVENT_SIGNAL_ALL))) { /* * We only got here to clear utrace-signal_handler. */
gdbstub initial code, v7
Changes: - report signals. A bit more code changes than I expected. - implement QPassSignals, trivial. Note: $CSIG is not supported yet, and I am not sure I understand how it should work. Next time... #include linux/module.h #include linux/proc_fs.h #include linux/utrace.h #include linux/poll.h #include linux/mm.h #include linux/regset.h #include asm/uaccess.h static int o_remote_debug; module_param_named(echo, o_remote_debug, bool, 0); #define BUFFER_SIZE 1024 #define PACKET_SIZE 1024 struct pbuf { char*cur, *pkt; charbuf[BUFFER_SIZE]; }; static inline void pb_init(struct pbuf *pb) { pb-cur = pb-buf; pb-pkt = NULL; } enum { U_STOP_IDLE = 0, U_STOP_PENDING, U_STOP_SENT, }; struct ugdb { struct list_headu_processes; struct list_headu_stopped; int u_stop_state; struct mutexu_mutex; spinlock_t u_slock; struct ugdb_thread *u_cur_tinfo, *u_cur_hg, *u_cur_hc; wait_queue_head_t u_wait; int u_err; struct pbuf u_pbuf; charu_cbuf[PACKET_SIZE]; int u_clen; sigset_tu_sig_ign; unsigned int u_no_ack:1, u_allstop:1; }; static inline void ugdb_ck_stopped(struct ugdb *ugdb) { spin_lock(ugdb-u_slock); WARN_ON(!list_empty(ugdb-u_stopped) ugdb-u_stop_state == U_STOP_IDLE); WARN_ON(list_empty(ugdb-u_stopped) ugdb-u_stop_state == U_STOP_PENDING); spin_unlock(ugdb-u_slock); } static struct ugdb *ugdb_create(void) { struct ugdb *ugdb; int err; err = -ENODEV; // XXX: ugly. proc_reg_open() should take care. if (!try_module_get(THIS_MODULE)) goto out; err = -ENOMEM; ugdb = kzalloc(sizeof(*ugdb), GFP_KERNEL); if (!ugdb) goto put_module; INIT_LIST_HEAD(ugdb-u_processes); INIT_LIST_HEAD(ugdb-u_stopped); mutex_init(ugdb-u_mutex); spin_lock_init(ugdb-u_slock); init_waitqueue_head(ugdb-u_wait); pb_init(ugdb-u_pbuf); return ugdb; put_module: module_put(THIS_MODULE); out: return ERR_PTR(err); } #define P_DETACHING (1 1) #define P_ZOMBIE(1 2) struct ugdb_process { int p_pid; int p_state; struct list_headp_threads; struct ugdb *p_ugdb; struct list_headp_processes; }; static inline bool process_alive(struct ugdb_process *process) { return !(process-p_state P_ZOMBIE); } static inline void mark_process_dead(struct ugdb_process *process) { process-p_state |= P_ZOMBIE; } static struct ugdb_process *ugdb_create_process(struct ugdb *ugdb, int pid_nr) { struct ugdb_process *process; process = kzalloc(sizeof(*process), GFP_KERNEL); if (!process) return NULL; process-p_pid = pid_nr; process-p_ugdb = ugdb; INIT_LIST_HEAD(process-p_threads); list_add_tail(process-p_processes, ugdb-u_processes); return process; } #define T_STOP_RUN 0 #define T_STOP_REQ (1 0)/* requested by gdb */ #define T_STOP_ALL (1 1)/* vCont;c:pX.-1, for report_clone */ #define T_STOP_ACK (1 2)/* visible to vStopped */ #define T_STOP_STOPPED (1 3)/* reported as stopped to gdb */ /* TASK_TRACED + deactivated ? */ #define T_EV_NONE 0 #define T_EV_EXIT (1 24) #define T_EV_SIGN (2 24) #define T_EV_TYPE(event)((0xff 24) (event)) #define T_EV_DATA(event)(~(0xff 24) (event)) struct ugdb_thread { int t_tid; int t_stop_state; int t_stop_event; struct ugdb *t_ugdb; struct ugdb_process *t_process; struct list_headt_threads; struct list_headt_stopped; struct pid *t_spid; struct utrace_engine*t_engine; }; static inline bool thread_alive(struct ugdb_thread *thread) { WARN_ON((thread-t_tid != 0) != process_alive(thread-t_process)); return thread-t_tid != 0; } static inline void mark_thread_dead(struct ugdb_thread *thread) { mark_process_dead(thread-t_process); thread-t_tid = 0; } /* * The thread should be alive, and it can't pass ugdb_report_death() * if the caller holds ugdb-u_mutex. However
gdbstub initial code, v6
Changes: - adapt to no more utrace changes. Whatever I think about utrace I have to agree, it is not practical to change utrace now - attach to the thread-group with the dead leader - fix the double-attaching bug - do not assume we can trust pid_task(), the tracee can be reaped even if it can't pass -report_death() - size_t is unsigned! -EAGAIN from -read() makes this fd unusable from rw_verify_area() pov Next: report signals. #include linux/module.h #include linux/proc_fs.h #include linux/utrace.h #include linux/poll.h #include linux/mm.h #include linux/regset.h #include asm/uaccess.h static int o_remote_debug; module_param_named(echo, o_remote_debug, bool, 0); #define BUFFER_SIZE 1024 #define PACKET_SIZE 1024 struct pbuf { char*cur, *pkt; charbuf[BUFFER_SIZE]; }; static inline void pb_init(struct pbuf *pb) { pb-cur = pb-buf; pb-pkt = NULL; } enum { U_STOP_IDLE = 0, U_STOP_PENDING, U_STOP_SENT, }; struct ugdb { struct list_headu_processes; struct list_headu_stopped; int u_stop_state; struct mutexu_mutex; spinlock_t u_slock; struct ugdb_thread *u_cur_tinfo, *u_cur_hg, *u_cur_hc; wait_queue_head_t u_wait; int u_err; struct pbuf u_pbuf; charu_cbuf[PACKET_SIZE]; int u_clen; unsigned int u_no_ack:1, u_allstop:1; }; static inline void ugdb_ck_stopped(struct ugdb *ugdb) { spin_lock(ugdb-u_slock); WARN_ON(!list_empty(ugdb-u_stopped) ugdb-u_stop_state == U_STOP_IDLE); WARN_ON(list_empty(ugdb-u_stopped) ugdb-u_stop_state == U_STOP_PENDING); spin_unlock(ugdb-u_slock); } static struct ugdb *ugdb_create(void) { struct ugdb *ugdb; int err; err = -ENODEV; // XXX: ugly. proc_reg_open() should take care. if (!try_module_get(THIS_MODULE)) goto out; err = -ENOMEM; ugdb = kzalloc(sizeof(*ugdb), GFP_KERNEL); if (!ugdb) goto put_module; INIT_LIST_HEAD(ugdb-u_processes); INIT_LIST_HEAD(ugdb-u_stopped); mutex_init(ugdb-u_mutex); spin_lock_init(ugdb-u_slock); init_waitqueue_head(ugdb-u_wait); pb_init(ugdb-u_pbuf); return ugdb; put_module: module_put(THIS_MODULE); out: return ERR_PTR(err); } #define P_DETACHING (1 1) #define P_ZOMBIE(1 2) struct ugdb_process { int p_pid; int p_state; struct list_headp_threads; struct ugdb *p_ugdb; struct list_headp_processes; }; static inline bool process_alive(struct ugdb_process *process) { return !(process-p_state P_ZOMBIE); } static inline void mark_process_dead(struct ugdb_process *process) { process-p_state |= P_ZOMBIE; } static struct ugdb_process *ugdb_create_process(struct ugdb *ugdb, int pid_nr) { struct ugdb_process *process; process = kzalloc(sizeof(*process), GFP_KERNEL); if (!process) return NULL; process-p_pid = pid_nr; process-p_ugdb = ugdb; INIT_LIST_HEAD(process-p_threads); list_add_tail(process-p_processes, ugdb-u_processes); return process; } #define T_STOP_RUN 0 #define T_STOP_REQ (1 0)/* requested by gdb */ #define T_STOP_ALL (1 1)/* vCont;c:pX.-1, for report_clone */ #define T_STOP_ACK (1 2)/* visible to vStopped */ #define T_STOP_STOPPED (1 3)/* reported as stopped to gdb */ /* TASK_TRACED + deactivated ? */ struct ugdb_thread { int t_tid; int t_stop_state; int t_stop_code; struct ugdb *t_ugdb; struct ugdb_process *t_process; struct list_headt_threads; struct list_headt_stopped; struct pid *t_spid; struct utrace_engine*t_engine; }; static inline bool thread_alive(struct ugdb_thread *thread) { WARN_ON((thread-t_tid != 0) != process_alive(thread-t_process)); return thread-t_tid != 0; } static inline void mark_thread_dead(struct ugdb_thread *thread) { mark_process_dead(thread-t_process); thread-t_tid = 0; } /* * The thread should be alive, and it can't pass ugdb_report_death() * if the caller
Re: [PATCH 3/4] utrace_set_events: fix UTRACE_EVENT(REAP) case
On 08/19, Roland McGrath wrote: On 08/16, Roland McGrath wrote: Now that you merged c93fecc925ea7567168f0c94414b9021de2708c5 get_utrace_lock() must not succeed if utrace-reap == T, this becomes a bit off-topic. However, I thought about relaxing the dead check in get_utrace_lock(), instead of utrace-reap we could check utrace-reap !utrace-death. In fact, initially I was going to do this, but then decided to make the simpler patch for now. When utrace-reap is set, it means release_task() has been called. The API caller has no way to know reaping hasn't already begun--except if its report_death callback has not returned yet. No! the task can be already reaped even before -report_death() was called. The task_struct itself can't go away, but that is all (perhaps you meant this). And this is the problem for ugdb. Damn, I knew this, that is why ugdb_process_exit() doesn't try to read -group_exit_code. Still I managed to forget that this has other implications. But I think that the API definition of once release_task() has been called (i.e. entered, maybe not returned yet), any utrace call will get -ESRCH, is a clear and comprehensible constraint. We should not open any holes in that. I am not sure. I had another reason in mind to check reap !death in get_utrace_lock(), synchronization with -report_death() callback. But OK. Let's forget about more utrace changes for now. Oleg.
Re: [PATCH 2/3] fix utrace_control(DETACH) utrace-death interaction
On 08/19, Roland McGrath wrote: Wait. It doesn't break this. It only breaks -EALREADY contract. And I don't understand why this is bad. The -EALREADY contract lets you have a report_death callback that does all your cleanup and then returns UTRACE_DETACH. I think this is possible without the current -EALREADY contract. To have that plan, you need a way for an asynchronous detach attempt to be excluded once report_death has begun, and for that asynchronous caller to know that's the situation. The problem is, with the current conract detach is never simple if you have report_death. You always have to synchronize with this callback. But see below. It breaks the documented API. I am of course open to changing the API. But we need to get completely clear on what the new range of plans we'll support is, and make the documentation and the implementation match. Yes. I thought it makes sense to change the API and docs if we can improve things, before utrace is widely used. But OK, lets's forget about this. Why? What is the point? What makes UTRACE_EVENT(DEATH) so special? I do not see the logic at all. It is special because it is the last interesting event to a traditional user such as ptrace. Hence it is attractive to write a report_death callback that returns UTRACE_DETACH spontaneously, i.e. without an asynchronous caller (i.e. debugger thread) having requested the detach. Sure. And this is still possible. If -report_death() does something which the caller of utrace_control() should know, then they should take care of synchronization anyway. With or without this patch, utrace_control(DEATH) can return 0 after -report_death() was already called. If your engine's report_death always returns UTRACE_DETACH, then utrace_control(UTRACE_DETACH) can never return 0 once report_death is about to be called nor any time thereafter. I don't undestand this. OK, probably I missed something, this doesn't matter. But, the current state of play from the broader perspective is that we really have no idea about the future viability of the utrace API. I don't think it makes a lot of sense to put a great deal of effort into perfecting the corners of the API much further when we don't really have any good reasons to think that this API will be the basis of anything that survives in the long run. Yes. I have to agree, this is good point. So, lets forget about these changes, at least for now. The current focus of work is your ugdb prototype. If some utrace changes make it greatly easier to get that to kind-of-working status, then they are worthwhile. No. I didn't think about ugdb at all. I'll find the workaround for ugdb. If nothing else, I can use utrace_engine_ops-release(). Or something else. If it works in practice but has disturbing robustness holes, it is OK to just comment that loudly now and move on to the next battle, IMHO. OK, agreed. Oleg.
Re: [PATCH 0/3] UTRACE_DETACH fixes
On 08/19, Roland McGrath wrote: OK, instead of filling utrace_detached_ops{} we can change start_callback() - if (want UTRACE_EVENT(QUIESCE)) { + if ((want UTRACE_EVENT(QUIESCE)) || ops == detached_ops) { If we're going to have a special-case check for utrace_detached_ops, then it can just short-circuit entirely and there is no need for any callbacks in utrace_detached_ops. (Then we might even use NULL or (void *)-1l instead of utrace_detached_ops.) All it needs to do is: report-detaches = true; utrace-reporting = NULL; return NULL; Yes, I thought about this too, see the changelog. But probably we need utrace_detached_ops-report_reap() anyway. -report_death can return UTRACE_DETACH, and after that utrace_report_death() can skip utrace_reset() and do -report_reap(). And, in particular, I don't understand this code in utrace_get_signal: } else if (!(task-utrace_flags UTRACE_EVENT(QUIESCE))) { /* * We only got here to clear utrace-signal_handler. */ return -1; } How so? What if we were called because of utrace_control(INTERRUPT) and QUIESCE is not set? If an engine used UTRACE_INTERRUPT without having first set QUIESCE, then it has no rightful expectation of getting any callback. Hmm. This is certainly new to me. Could you confirm? If yes, shouldn't we change utrace_control() - if (action = UTRACE_REPORT action UTRACE_RESUME + if (action = UTRACE_INTERRUPT action UTRACE_RESUME unlikely(!(engine-flags UTRACE_EVENT(QUIESCE { then? I must admit, I do not understand why INTERRUPT needs QUIESCE. utrace_get_signal() should respect QUIESCE, this is clear. But why it is needed? Oleg.
Re: [PATCH] fix mark_engine_detached() vs start_callback() race
On 08/19, Roland McGrath wrote: I think in the longer term mark_engine_detached() should not change engine-flags at all but add QUIESCE to -utrace_flags. However, this breaks utrace_maybe_reap(reap = true) and we should avoid the race with finish_callback() which clears -reporting after report_quiesce(). What's the benefit to adding QUIESCE? If any utrace code gets entered at all, then any callback run will be able to do the special-case ops check for detached engines. Well yes, but otoh it could be the last engine. We shouldn't miss, say, start_callback() from utrace_resume() (although currently -spurious helps). Anyway, this needs more thinking, and probably you are right and QUIESCE is not needed. Oleg.
Re: gdbstub initial code, v5
On 08/23, Oleg Nesterov wrote: However. I spent all Monday trying to resolve the new bug, and so far I do not understand what happens. Extremely hard to reproduce, and the kernel just hangs silently, without any message. So far I suspect the proble in utrace.c, but this time I am not sure. Solved. This was scheduler bug fixed in 2.6.35, but I used 2.6.34. This is really funny. This bug (PF_STARTING lockup) was found and fixed by me Peter. Oh. But I hit yet another problem, BUG_ON() in __utrace_engine_release(). Again, it is not reproducible, I saw it only once in dmesg and I do not even know for sure what I was doing. I'll contiue tomorrow, but if I won't be able to quickly resolve this problem I am going to ignore it for now. This time I think ugdb is wrong. Oleg.
Re: gdbstub initial code, v5
Just a small report to explain what I am doing... On 08/20, Oleg Nesterov wrote: - I forgot to implement the attach to the thread group with the dead leader. Next time. Almost done, but we should avoid the races with exec somehow. But this is minor. I tried to test this code as much as I can. Again, I do not use gdb at all, I am using the scripts which try to really stress ugdb. Found 2 bugs in ugdb.ko, the second one is not nice but at least I have the temporary fix. However. I spent all Monday trying to resolve the new bug, and so far I do not understand what happens. Extremely hard to reproduce, and the kernel just hangs silently, without any message. So far I suspect the proble in utrace.c, but this time I am not sure. Will continue tomorrow... Oleg.
gdbstub initial code, v5
On 08/19, Oleg Nesterov wrote: Next step: handle exit correctly and report W/S. I misunderstood what gdbserver does when the main thread exits, it is not stupid as I wrongly thought. Yes, in non-stop mode gdbserver reports W/X;process:PID when the last thread exits. This makes sense, so does ugdb. But, == All, please ack/nack the behavioral difference! When the main thread exits, gdbserver still exposes it to gdb as a running process. It is visible via info threads, you can switch to this thread, $Tp or $Hx result in OK as if this thread is alive. gdbserver even pretends that $vCont;x:DEAD_THEAD works, although this thread obviously can never report something. I don't think this is really right. This just confuses the user, and imho this should be considered like the minor bug. ugdb doesn't do this. If the main thread exits - it exits like any other thread. I played with gdb, it seems to handle this case fine. == Problems: - I forgot to implement the attach to the thread group with the dead leader. Next time. - The exit code (Wxx) can be wrong in mt-case. The problem is, -report_death can't safely access -group_exit_code with kernel 2.6.35. This is solveable. Roland, sorry, I ignored your emails for today. It is not easy to me to switch between ugdb an utrace ;) Oleg. #include linux/module.h #include linux/proc_fs.h #include linux/utrace.h #include linux/poll.h #include linux/mm.h #include linux/regset.h #include asm/uaccess.h static int o_remote_debug; module_param_named(echo, o_remote_debug, bool, 0); #define BUFFER_SIZE 1024 #define PACKET_SIZE 1024 struct pbuf { char*cur, *pkt; charbuf[BUFFER_SIZE]; }; static inline void pb_init(struct pbuf *pb) { pb-cur = pb-buf; pb-pkt = NULL; } enum { U_STOP_IDLE = 0, U_STOP_PENDING, U_STOP_SENT, }; struct ugdb { struct list_headu_processes; struct list_headu_stopped; int u_stop_state; struct mutexu_mutex; spinlock_t u_slock; struct ugdb_thread *u_cur_tinfo, *u_cur_hg, *u_cur_hc; wait_queue_head_t u_wait; int u_err; struct pbuf u_pbuf; charu_cbuf[PACKET_SIZE]; int u_clen; unsigned int u_no_ack:1, u_allstop:1; }; static inline void ugdb_ck_stopped(struct ugdb *ugdb) { // XXX: temporary racy check WARN_ON(!list_empty(ugdb-u_stopped) ugdb-u_stop_state == U_STOP_IDLE); WARN_ON(list_empty(ugdb-u_stopped) ugdb-u_stop_state == U_STOP_PENDING); } static struct ugdb *ugdb_create(void) { struct ugdb *ugdb; int err; err = -ENODEV; // XXX: ugly. proc_reg_open() should take care. if (!try_module_get(THIS_MODULE)) goto out; err = -ENOMEM; ugdb = kzalloc(sizeof(*ugdb), GFP_KERNEL); if (!ugdb) goto put_module; INIT_LIST_HEAD(ugdb-u_processes); INIT_LIST_HEAD(ugdb-u_stopped); mutex_init(ugdb-u_mutex); spin_lock_init(ugdb-u_slock); init_waitqueue_head(ugdb-u_wait); pb_init(ugdb-u_pbuf); return ugdb; put_module: module_put(THIS_MODULE); out: return ERR_PTR(err); } #define P_DETACHING (1 1) #define P_ZOMBIE(1 2) struct ugdb_process { int p_pid; int p_state; struct list_headp_threads; struct ugdb *p_ugdb; struct list_headp_processes; }; static inline bool process_alive(struct ugdb_process *process) { return !(process-p_state P_ZOMBIE); } static inline void mark_process_dead(struct ugdb_process *process) { process-p_state |= P_ZOMBIE; } static struct ugdb_process *ugdb_create_process(struct ugdb *ugdb, int pid) { struct ugdb_process *process; process = kzalloc(sizeof(*process), GFP_KERNEL); if (!process) return NULL; process-p_pid = pid; process-p_ugdb = ugdb; INIT_LIST_HEAD(process-p_threads); list_add_tail(process-p_processes, ugdb-u_processes); return process; } #define T_STOP_RUN 0 #define T_STOP_REQ (1 0)/* requested by gdb */ #define T_STOP_ALL (1 1)/* vCont;c:pX.-1, for report_clone */ #define T_STOP_ACK (1 2)/* visible to vStopped */ #define T_STOP_STOPPED (1
Re: gdbstub initial code, v4
On 08/19, Roland McGrath wrote: Note the second attachment, GDBCAT. It is just the simple perl script which connects /proc/ugdb to tcp port. It can be used for remote debugging via tcp, or with (unpatched) gdb which can't do select() on /proc/ugdb. bash$ nc -l 2000 /proc/ugdb Yes, this works, Actually, it is more convenient to use it in any case, at least for logging purposes. (gdb) set remote debug 1 set debug remote 1. Yes, I tried this. Very, very inconvenient. In this case the debugging output is mixed with the normal output, Also: you can't save to file, can't filter packets, can't add the packet for debugging (not implemented yet). But yes, it is not strictly necessary, nc should work too. Oleg.
Re: [PATCH 2/3] fix utrace_control(DETACH) utrace-death interaction
On 08/17, Oleg Nesterov wrote: On 08/16, Roland McGrath wrote: The problem is, utrace_control(DETACH) does nothing and returns -EALREADY if utrace-death is set, this is not right. We can and should detach in this case, we only should skip utrace_reset() to avoid the race with utrace_report_death()-REPORT_CALLBACKS(). This behavior is the original (minimal) synchronization scheme from before we had utrace_barrier. See Interlock with final callbacks in Documentation/DocBook/utrace.tmpl. OK, I agree my patch breaks this part Normally functionutrace_control/function called with constantUTRACE_DETACH/constant returns zero, and this means that no more callbacks will be made. of documentation (which I never read ;) Wait. It doesn't break this. It only breaks -EALREADY contract. And I don't understand why this is bad. But in any case, personally I dislike the current behaviour anyway, I think this certainly complicates the life for module writers. Instead of simple detach + barrier, you always need the nontrivial code if report_quiesce/death ever touches engine-data! This can't be good. Yes. Roland, could you explain once again why do you dislike this patch? Once again. Currently utrace_control(DETACH) refuses to even try to detach the engine if utrace-death is set. Why? What is the point? What makes UTRACE_EVENT(DEATH) so special? I do not see the logic at all. If -report_death() does something which the caller of utrace_control() should know, then they should take care of synchronization anyway. With or without this patch, utrace_control(DEATH) can return 0 after -report_death() was already called. Currently, utrace_control(DETACH) returns -EALREADY when this callback was alredy called, or it can be called later, or it may be running. And utrace_barrier() can't help. With this patch utrace_control(DETACH) returns either 0 with the same meaning (will not be called later, but probably was already called before utrace_control), or -EINPROGRESS which suggests to use utrace_barrier(). Please correct me, but I think this certainly makes things much simpler. Otherwise UTRACE_DETACH is never trivial (and iiuc it is better to avoid utrace_engine_ops-release() hook). What do you think? Oleg.
[PATCH] fix mark_engine_detached() vs start_callback() race
Suppose that engine-flags == UTRACE_EVENT(EXEC), QUIESCE bit is not set. 1. start_callback() reads want = engine-flags (== EXEC) 2. mark_engine_detached() sets engine-ops = utrace_detached_ops 3. start_callback() gets ops = utrace_detached_ops After that start_callback() skips if (want UTRACE_EVENT(QUIESCE)) block and returns utrace_detached_ops, then -report_exec == NULL will be called. This is the minimal temporary ugly fix for now, we should certainly cleanup and simplify this logic. The barriers in mark_engine_detached() and in start_callback() can't help and should be removed. If we ignore utrace_get_signal() we do not even need utrace_detached_quiesce(), start_callback() could just do ops = engine-ops; if (ops == utrace_detached_ops) { report-detaches = true; return NULL; } I think in the longer term mark_engine_detached() should not change engine-flags at all but add QUIESCE to -utrace_flags. However, this breaks utrace_maybe_reap(reap = true) and we should avoid the race with finish_callback() which clears -reporting after report_quiesce(). A bit off-topic, but I don't think finish_callback() should check engine-ops == utrace_detached_ops before return. Instead we should change finish_callback_report() to return the boolean. We shouldn't return true without setting report-detaches. Signed-off-by: Oleg Nesterov o...@redhat.com --- kernel/utrace.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- kstub/kernel/utrace.c~8_fix_mark_detached_without_quiesce 2010-08-18 16:46:08.0 +0200 +++ kstub/kernel/utrace.c 2010-08-18 17:47:53.0 +0200 @@ -1522,7 +1522,7 @@ static const struct utrace_engine_ops *s smp_rmb(); ops = engine-ops; - if (want UTRACE_EVENT(QUIESCE)) { + if ((want UTRACE_EVENT(QUIESCE)) || ops == utrace_detached_ops) { if (finish_callback(task, utrace, report, engine, (*ops-report_quiesce)(report-action, engine, event)))
[PATCH?] avoid the unnecessary utrace_resume()-utrace_reset()
utrace_resume(UTRACE_REPORT) always calls utrace_reset() because start_callback() obviously can't clear report-spurious when event == 0. Change start_callback() to correctly clear -spurious in this case. We could probably clear it in utrace_resume() unconditionally after reporting loop, but this is not exactly right if there are no engines which have QUIESCE in engine-flags. utrace_get_signal() probably has the same problem. Note: utrace_control(DETACH) does utrace_do_stop() and sets UTRACE_REPORT if the tracee is not stopped. It also does mark_engine_detached() which does not set QUIESCE in target-utrace_flags. This means we rely on report.spurious which should provoke utrace_reset() from utrace_resume() if target-utrace_flags doesn't have QUIESCE. A bit too subtle, imho. Also, UTRACE_REPORT can be lost because of UTRACE_INTERRUPT or normal signal: utrace_get_signal() checks utrace_flags UTRACE_EVENT(QUIESCE) and returns otherwise. This should be fixed somehow. This check is wrong anyway, but it is not clear how we can fix the race with DETACH. Signed-off-by: Oleg Nesterov o...@redhat.com --- kernel/utrace.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- kstub/kernel/utrace.c~10_utrace_resume_and_spurious 2010-08-18 19:00:50.0 +0200 +++ kstub/kernel/utrace.c 2010-08-18 19:41:05.0 +0200 @@ -1540,7 +1540,7 @@ static const struct utrace_engine_ops *s if (want ENGINE_STOP) report-action = UTRACE_STOP; - if (want event) { + if (want (event ?: UTRACE_EVENT(QUIESCE))) { report-spurious = false; return ops; }
Re: [PATCH 3/4] utrace_set_events: fix UTRACE_EVENT(REAP) case
On 08/16, Roland McGrath wrote: - It is possible that both -death and -reap are true. In this case it is OK to clear UTRACE_EVENT(REAP), but set_events fails. No, it's not OK to clear it. Once -reap is set, then the engine's ops-report_reap might or might not have been called already. Afaics - no. If utrace-death is set (and we check it under utrace-lock) we can ignore utrace-reap. In short, if ops-report_reap can be called before -death is cleared, then 2 possible callers of utrace_maybe_reap() can race with each other, but this can't happen. utrace-death == T means: - (utrace_flags _UTRACE_DEATH_EVENTS) == T - utrace-death was set by utrace_report_death() which will take utrace-lock later and clear -death, only then it may call ops-report_reap(). - until utrace_report_death() clears -death, _UTRACE_DEATH_EVENTS must be set in -utrace_flags, otherwise utrace_maybe_reap(true) is buggy. Note that both utrace-death and _UTRACE_DEATH_EVENTS are cleared atomically from utrace-lock pov. IOW. utrace-death is true, then IF (utrace-reap) tracehook_prepare_releas()-utrace_maybe_reap(true) was already called, this is how utrace-reap was set. But utrace_maybe_reap() did nothing and returned. We rely on the subsequent utrace_maybe_reap(false) from utrace_report_death() - but this can't happen until we drop utrace-lock ELSE utrace-reap can't be set until we drop utrace-lock. Now that you merged c93fecc925ea7567168f0c94414b9021de2708c5 get_utrace_lock() must not succeed if utrace-reap == T, this becomes a bit off-topic. However, I thought about relaxing the dead check in get_utrace_lock(), instead of utrace-reap we could check utrace-reap !utrace-death. In fact, initially I was going to do this, but then decided to make the simpler patch for now. Oleg.
Re: [PATCH 2/4] utrace_set_events: consolidate setting _UTRACE_DEATH_EVENTS checks
On 08/16, Roland McGrath wrote: In short: this double check allows to avoid tasklist when utrace_flags already has these bits while engine-flags doesn't. But multitracing is unlikely case, in the likely case if we add _UTRACE_DEATH_EVENTS to engine-flags we set these bits in utrace_flags too. I think it makes sense to consolidate these checks to make the code a bit more understandable. I don't really agree about unlikely case. In many uses, the systemtap task-finder will have a utrace engine on every task in the system, for example. Moreover, this is an importantly distinct particular kind of micro-optimization to be doing here: avoiding a system-wide lock. Any place that we take tasklist_lock at all, we are introducing a system-wide slowdown or limitation on scaling, just because of our tracing of one task. So optimizing the minimize that is qualitatively different and really much more important than avoiding taking the utrace lock, for example. But with that note in your mind, I am happy to take this patch now as a simplification and let reoptimization come back later. OK, agreed. Oleg.
Re: [PATCH 0/3] UTRACE_DETACH fixes
On 08/17, Oleg Nesterov wrote: Suppose that, say, engine-flags == UTRACE_EVENT(EXEC) (no QUIESCE). 1. start_callback() reads want = engine-flags (== EXEC) 2. mark_engine_detached() sets engine-ops = utrace_detached_ops 3. start_callback() gets ops = utrace_detached_ops After that start_callback() skips if (want UTRACE_EVENT(QUIESCE)) block and returns utrace_detached_ops, utrace_detached_ops-report_exec == NULL will be called. OK, instead of filling utrace_detached_ops{} we can change start_callback() - if (want UTRACE_EVENT(QUIESCE)) { + if ((want UTRACE_EVENT(QUIESCE)) || ops == detached_ops) { But this is minor. Roland, I can't explain this, but I have the strong gut feeling there is something else we should worry about. I am still reading this code... Or not... I'll recheck once again tomorrow and try to make the patch. In particular, I'd like to think if we can simplify this somehow. Say, avoid the barriers somewhere, or avoid changing engine-flags in mark_engine_detached(). I am worried that mark_engine_detached() sets engine-flags, but it doesn't add QUIESCE to utrace_flags. I can't convince myself that utrace_do_stop() closes all possible races. And, in particular, I don't understand this code in utrace_get_signal: } else if (!(task-utrace_flags UTRACE_EVENT(QUIESCE))) { /* * We only got here to clear utrace-signal_handler. */ return -1; } How so? What if we were called because of utrace_control(INTERRUPT) and QUIESCE is not set? Oleg.
[PATCH 2/3] fix utrace_control(DETACH) utrace-death interaction
Suppose that we want to detach the engine and free engine-data. To avoid the races with our calbacks which can use -data we should do either err = utrace_set_events(0); if (err == ...) utrace_barrier(); utrace_control(DETACH); or err = utrace_control(DETACH); if (err = ...) utrace_barrier(); Neither variant works if we should care about report_quiesce() or report_death(). Let's discuss the 2nd case, the 1st one have the similar problems. The problem is, utrace_control(DETACH) does nothing and returns -EALREADY if utrace-death is set, this is not right. We can and should detach in this case, we only should skip utrace_reset() to avoid the race with utrace_report_death()-REPORT_CALLBACKS(). This was the main problem I hit during the testing. Consider the exiting tracee with the valid engine-ops, suppose that utrace_control(DETACH) is called right after utrace_report_death() unlocks utrace-lock and before it does REPORT_CALLBACKS(). utrace_control() fails, but utrace_barrier() does not help after that. It takes get_utrace_lock() successfully and returns 0. The caller frees engine-data, but utrace_report_death() calls -report_death() after that. Kill utrace_control_dead(). Change utrace_control() to always allow UTRACE_DETACH if engine has the valid -ops. This means that mark_engine_detached() can race with REPORT_CALLBACKS() but there is nothing new. utrace_do_stop() is unnecessary and pointless in this case, but it doesn't hurt. Signed-off-by: Oleg Nesterov o...@redhat.com --- kernel/utrace.c | 49 ++--- 1 file changed, 10 insertions(+), 39 deletions(-) --- kstub/kernel/utrace.c~6_fix_control_dead2010-08-16 11:17:05.0 +0200 +++ kstub/kernel/utrace.c 2010-08-16 11:17:51.0 +0200 @@ -928,37 +928,6 @@ void utrace_maybe_reap(struct task_struc } } -/* - * You can't do anything to a dead task but detach it. - * If release_task() has been called, you can't do that. - * - * On the exit path, DEATH and QUIESCE event bits are set only - * before utrace_report_death() has taken the lock. At that point, - * the death report will come soon, so disallow detach until it's - * done. This prevents us from racing with it detaching itself. - * - * Called only when @target-exit_state is nonzero. - */ -static inline int utrace_control_dead(struct task_struct *target, - struct utrace *utrace, - enum utrace_resume_action action) -{ - lockdep_assert_held(utrace-lock); - - if (action != UTRACE_DETACH || unlikely(utrace-reap)) - return -ESRCH; - - if (unlikely(utrace-death)) - /* -* We have already started the death report. We can't -* prevent the report_death and report_reap callbacks, -* so tell the caller they will happen. -*/ - return -EALREADY; - - return 0; -} - /** * utrace_control - control a thread being traced by a tracing engine * @target:thread to affect @@ -1115,18 +1084,20 @@ int utrace_control(struct task_struct *t ret = 0; /* -* -exit_state can change under us, this doesn't matter. -* We do not care about -exit_state in fact, but we do -* care about -reap and -death. If either flag is set, -* we must also see -exit_state != 0. +* -exit_state can change under us, this doesn't matter. But, +* if utrace-death is set we must also see -exit_state != 0, +* this is what we care about. */ if (unlikely(target-exit_state)) { - ret = utrace_control_dead(target, utrace, action); - if (ret) { + /* You can't do anything to a dead task but detach it */ + if (action != UTRACE_DETACH) { spin_unlock(utrace-lock); - return ret; + return -ESRCH; } - reset = true; + + /* If it is not set, we can safely do utrace_reset() */ + if (likely(!utrace-death)) + reset = true; } switch (action) {
[PATCH 3/3] utrace_set_events: kill now unnecessary exit_state/reap checks
Now that get_utrace_lock() can never succeed if utrace-reap is true, we can kill this check in utrace_set_events(). This also means we can kill the target-exit_state check, it buys nothing now. Signed-off-by: Oleg Nesterov o...@redhat.com --- kernel/utrace.c | 13 - 1 file changed, 4 insertions(+), 9 deletions(-) --- kstub/kernel/utrace.c~7_set_events_kill_reap2010-08-16 11:17:51.0 +0200 +++ kstub/kernel/utrace.c 2010-08-16 11:28:00.0 +0200 @@ -539,15 +539,10 @@ int utrace_set_events(struct task_struct old_utrace_flags = target-utrace_flags; old_flags = engine-flags ~ENGINE_STOP; - /* If -death or -reap is true we must see exit_state != 0. */ - if (target-exit_state) { - if (utrace-death) { - if ((old_flags ~events) _UTRACE_DEATH_EVENTS) - goto unlock; - } else if (utrace-reap) { - if ((old_flags ^ events) UTRACE_EVENT(REAP)) - goto unlock; - } + if ((old_flags ~events) _UTRACE_DEATH_EVENTS) { + /* Too late if utrace_report_death() is in progress */ + if (utrace-death) + goto unlock; } /*
Re: [PATCH 0/3] UTRACE_DETACH fixes
On 08/16, Oleg Nesterov wrote: Well, I tried to test these changes, seems to work. But... From 2/3 changelog: This means that mark_engine_detached() can race with REPORT_CALLBACKS() but there is nothing new. Yes, there is nothing new. But, Roland, this is WRONG! With or without these fixes utrace_control()-mark_engine_detached() was always wrong if it races with REPORT/REPORT_CALLBACKS. Not sure what we were thinking about :/ Look. Suppose that utrace_control(DETACH) is called right after start_callback() returns ops != NULL. After that finish_callback(..., ops-callback(...)) can hit -callback == NULL! Cough. I guess I spent too much time with utrace and testing today ;) Let me try again. mark_engine_detached() does engine-ops = utrace_detached_ops; smp_wmb(); engine-flags = UTRACE_EVENT(QUIESCE); Whatever we do, start_callback() can see the old engine-flags but the new -ops = utrace_detached_ops. Just suppose that the caller of UTRACE_DETACH is interrupted right after setting engine-ops. The obvious and simplest solution: utrace_detached_ops{} should have the dummy handler for every callback. But I'd like to think a bit more. Yes. Perhaps (unlikely) we can reverse the order, but I can no longer think properly today. Or do you think I miss something and this is false alarm? Oleg.