[RFC v2 00/19] utrace for 3.0 kernel

2011-06-30 Thread Oleg Nesterov
Another attempt. This version tries to decouple utrace and ptrace.
This way it is much simpler to follow the upstream changes, afaics.

TODO:

- The single-stepping updates in ptrace_resume() can race
  with utrace_reset()-user_disable_single_step().

  This was fixed by 20/20, but I noticed that this patch
  is buggy right before sending.

- Perhaps PTRACE_SYSEMU/TIF_SYSCALL_EMU logic was broken,
  I need to recheck.

- Testing.

Oleg.



[PATCH 06/19] wait_task_inactive: treat task-state and match_state as bitmasks

2011-06-30 Thread Oleg Nesterov
Change wait_task_inactive() to check state  match_state instead of
state == match_state. This should not make any difference, but this
allows us to add more stopped bits which can be set or cleared
independently.

IOW. wait_task_inactive() assumes that if task-state != 0, it can
only be changed to TASK_RUNNING. Currently this is true, and in this
case state  match_state continues to work. But, unlike the current
check, it also works if task-state has other bits set while the caller
is only interested in, say, __TASK_TRACED.

Note: I think wait_task_inactive() should be cleanuped upstrean anyway,
nowadays we have TASK_WAKING and task-state != 0 doesn't necessarily
mean it is TASK_RUNNING. It also makes sense to exclude the !TASK_REPORT
bits during the check. Finally, probably this patch makes sense anyway
even without utrace. For example, a stopped _and_ traced thread could
have task-state = TASK_STOPPED | TASK_TRACED, this can be useful.

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 kernel/sched.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 3f2e502..ade7997 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2277,7 +2277,7 @@ unsigned long wait_task_inactive(struct task_struct *p, 
long match_state)
 * is actually now running somewhere else!
 */
while (task_running(rq, p)) {
-   if (match_state  unlikely(p-state != match_state))
+   if (match_state  !likely(p-state  match_state))
return 0;
cpu_relax();
}
-- 
1.5.5.1




[PATCH 04/19] introduce wake_up_quiescent()

2011-06-30 Thread Oleg Nesterov
No functional changes. Add the new helper, wake_up_quiescent(task, state),
which simply returns wake_up_state(task, state). Change all callers which
do wake_up_state(STOPPED/TRACED) to use the new helper. ptrace_stop() is
a bit special, it does __set_current_state(RUNNING) in the very unlikely
case, change it as well.

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 include/linux/signal.h |2 ++
 kernel/ptrace.c|2 +-
 kernel/signal.c|   12 ++--
 kernel/utrace.c|2 +-
 4 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/include/linux/signal.h b/include/linux/signal.h
index a822300..2be3712 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -239,6 +239,8 @@ static inline int valid_signal(unsigned long sig)
 struct timespec;
 struct pt_regs;
 
+extern int wake_up_quiescent(struct task_struct *p, unsigned int state);
+
 extern int next_signal(struct sigpending *pending, sigset_t *mask);
 extern int do_send_sig_info(int sig, struct siginfo *info,
struct task_struct *p, bool group);
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 9988b13..26ae214 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -566,7 +566,7 @@ static int ptrace_resume(struct task_struct *child, long 
request,
child-exit_code = data;
 
if (lock_task_sighand(child, flags)) {
-   wake_up_state(child, __TASK_TRACED);
+   wake_up_quiescent(child, __TASK_TRACED);
unlock_task_sighand(child, flags);
}
 
diff --git a/kernel/signal.c b/kernel/signal.c
index 2138cee..4fcf1c7 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -652,6 +652,14 @@ void signal_wake_up(struct task_struct *t, int resume)
 }
 
 /*
+ * wakes up the STOPPED/TRACED task, must be called with -siglock held.
+ */
+int wake_up_quiescent(struct task_struct *p, unsigned int state)
+{
+   return wake_up_state(p, state);
+}
+
+/*
  * Remove signals in mask from the pending set and queue.
  * Returns 1 if any signals were found.
  *
@@ -811,7 +819,7 @@ static int prepare_signal(int sig, struct task_struct *p, 
int from_ancestor_ns)
do {
task_clear_group_stop_pending(t);
rm_from_queue(SIG_KERNEL_STOP_MASK, t-pending);
-   wake_up_state(t, __TASK_STOPPED);
+   wake_up_quiescent(t, __TASK_STOPPED);
} while_each_thread(p, t);
 
/*
@@ -1800,7 +1808,7 @@ static void ptrace_stop(int exit_code, int why, int 
clear_code, siginfo_t *info)
do_notify_parent_cldstop(current, false, why);
 
spin_lock_irq(current-sighand-siglock);
-   __set_current_state(TASK_RUNNING);
+   wake_up_quiescent(current, __TASK_TRACED);
spin_unlock_irq(current-sighand-siglock);
 
if (clear_code)
diff --git a/kernel/utrace.c b/kernel/utrace.c
index 6e7fafb..d7c547c 100644
--- a/kernel/utrace.c
+++ b/kernel/utrace.c
@@ -648,7 +648,7 @@ static void utrace_wakeup(struct task_struct *target, 
struct utrace *utrace)
 {
lockdep_assert_held(utrace-lock);
spin_lock_irq(target-sighand-siglock);
-   wake_up_state(target, __TASK_TRACED);
+   wake_up_quiescent(target, __TASK_TRACED);
spin_unlock_irq(target-sighand-siglock);
 }
 
-- 
1.5.5.1




[PATCH 09/19] tracehooks: kill tracehook_finish_jctl(), add tracehook_finish_stop()

2011-06-30 Thread Oleg Nesterov
tracehook_finish_jctl() is needed to avoid the races with SIGKILL
which wakes up UTRACED task, and thus it should be called every time
after the STOPPED/TRACED/UTRACED returns from schedule(), remember
that TASK_UTRACED can be added while the task is STOPPED/UTRACED.

- rename it to tracehook_finish_stop(),jctl no longer matches the
  reality.

- change do_signal_state() to call this helper right after schedule(),
  otherwise this logic is broken by the upstream changes

- now that utrace doesn't control TASK_TRACED bit, ptrace_stop() must
  call this helper too.

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 include/linux/tracehook.h |6 +++---
 kernel/signal.c   |5 +++--
 kernel/utrace.c   |2 +-
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 7d7bdde..3c7b6b3 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -528,11 +528,11 @@ static inline int tracehook_get_signal(struct task_struct 
*task,
 }
 
 /**
- * tracehook_finish_jctl - report about return from job control stop
+ * tracehook_finish_stop - report about return from STOPPED/TRACED
  *
- * This is called by do_signal_stop() after wakeup.
+ * This is called by do_signal_stop() and ptrace_stop after wakeup.
  */
-static inline void tracehook_finish_jctl(void)
+static inline void tracehook_finish_stop(void)
 {
if (task_utrace_flags(current))
utrace_finish_stop();
diff --git a/kernel/signal.c b/kernel/signal.c
index 4fcf1c7..a7979ad 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1816,6 +1816,7 @@ static void ptrace_stop(int exit_code, int why, int 
clear_code, siginfo_t *info)
read_unlock(tasklist_lock);
}
 
+   tracehook_finish_stop();
/*
 * While in TASK_TRACED, we were considered frozen enough.
 * Now that we woke up, it's crucial if we're supposed to be
@@ -1952,6 +1953,8 @@ retry:
/* Now we don't run again until woken by SIGCONT or SIGKILL */
schedule();
 
+   tracehook_finish_stop();
+
spin_lock_irq(current-sighand-siglock);
} else {
ptrace_stop(current-group_stop  GROUP_STOP_SIGMASK,
@@ -1974,8 +1977,6 @@ retry:
 
spin_unlock_irq(current-sighand-siglock);
 
-   tracehook_finish_jctl();
-
return 1;
 }
 
diff --git a/kernel/utrace.c b/kernel/utrace.c
index be98607..daa96b9 100644
--- a/kernel/utrace.c
+++ b/kernel/utrace.c
@@ -741,7 +741,7 @@ static bool utrace_reset(struct task_struct *task, struct 
utrace *utrace)
 void utrace_finish_stop(void)
 {
/*
-* If we were task_is_traced() and then SIGKILL'ed, make
+* If we were task_is_utraced() and then SIGKILL'ed, make
 * sure we do nothing until the tracer drops utrace-lock.
 */
if (unlikely(__fatal_signal_pending(current))) {
-- 
1.5.5.1




[PATCH 10/19] teach wake_up_quiescent() to do selective wake_up

2011-06-30 Thread Oleg Nesterov
Both utrace and ptrace can want the same thread to be quiescent, in this
case its state is TASK_TRACED | TASK_UTRACED. And this also means that
this task must not run unless both utrace and ptrace resume it.

Change wake_up_quiescent(p, state) to do p-state = ~state and return
false unless there is no more quiescent bits in task-state.

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 kernel/signal.c |   15 +++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index a7979ad..57552e6 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -651,11 +651,26 @@ void signal_wake_up(struct task_struct *t, int resume)
kick_process(t);
 }
 
+#define STATE_QUIESCENT(__TASK_STOPPED | __TASK_TRACED | 
__TASK_UTRACED)
 /*
  * wakes up the STOPPED/TRACED task, must be called with -siglock held.
  */
 int wake_up_quiescent(struct task_struct *p, unsigned int state)
 {
+   unsigned int quiescent = (p-state  STATE_QUIESCENT);
+
+   WARN_ON(state  ~(STATE_QUIESCENT | TASK_INTERRUPTIBLE));
+
+   if (quiescent) {
+   state = ~TASK_INTERRUPTIBLE;
+   if ((quiescent  ~state) != 0) {
+   p-state = ~state;
+   WARN_ON(!(p-state  STATE_QUIESCENT));
+   WARN_ON(!(p-state  TASK_WAKEKILL));
+   return 0;
+   }
+   }
+
return wake_up_state(p, state);
 }
 
-- 
1.5.5.1




[PATCH 11/19] ptrace_stop: do not assume the task is running after wake_up_quiescent()

2011-06-30 Thread Oleg Nesterov
If ptrace_stop() sets TASK_TRACED and then detects we should not stop,
it can race with utrace_do_stop() which can see TASK_TRACED and add
TASK_UTRACED. In this case we should stop for utrace needs.

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 kernel/signal.c |8 
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 57552e6..89e691d 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1829,6 +1829,14 @@ static void ptrace_stop(int exit_code, int why, int 
clear_code, siginfo_t *info)
if (clear_code)
current-exit_code = 0;
read_unlock(tasklist_lock);
+
+   /*
+* It is possible that __TASK_UTRACED was added by utrace
+* while we were __TASK_TRACED and before we take -siglock
+* for wake_up_quiescent(), we need to block in this case.
+* Otherwise this is unnecessary but absolutely harmless.
+*/
+   schedule();
}
 
tracehook_finish_stop();
-- 
1.5.5.1




[PATCH 17/19] teach ptrace_set_syscall_trace() to play well with utrace

2011-06-30 Thread Oleg Nesterov
1. ptrace_set_syscall_trace(true)-set_tsk_thread_flag(TIF_SYSCALL_TRACE)
   can race with utrace_control()-utrace_reset() path which can miss
   PT_SYSCALL_TRACE and clear TIF_SYSCALL_TRACE after it was already set.

2. ptrace_set_syscall_trace(false) clears TIF_SYSCALL_TRACE and this is
   not utrace-friendly, it can need this flag.

Change ptrace_set_syscall_trace() to take task_utrace_lock(), this is
enough to fix the 1st problem. Check task_utrace_flags() before clearing
TIF_SYSCALL_TRACE, this fixes 2.

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 kernel/ptrace.c |6 +-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 0825a01..209ea2d 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -23,6 +23,7 @@
 #include linux/uaccess.h
 #include linux/regset.h
 #include linux/hw_breakpoint.h
+#include linux/utrace.h
 
 static void ptrace_signal_wake_up(struct task_struct *p, int quiescent)
 {
@@ -39,13 +40,16 @@ static void ptrace_signal_wake_up(struct task_struct *p, 
int quiescent)
 
 static void ptrace_set_syscall_trace(struct task_struct *p, bool on)
 {
+   task_utrace_lock(p);
if (on) {
p-ptrace |= PT_SYSCALL_TRACE;
set_tsk_thread_flag(p, TIF_SYSCALL_TRACE);
} else {
p-ptrace = ~PT_SYSCALL_TRACE;
-   clear_tsk_thread_flag(p, TIF_SYSCALL_TRACE);
+   if (!(task_utrace_flags(p)  UTRACE_EVENT_SYSCALL))
+   clear_tsk_thread_flag(p, TIF_SYSCALL_TRACE);
}
+   task_utrace_unlock(p);
 }
 
 /*
-- 
1.5.5.1




[PATCH 18/19] introduce PT_SINGLE_STEP and PT_SINGLE_BLOCK

2011-06-30 Thread Oleg Nesterov
Add the new internal ptrace flags, PT_SINGLE_STEP and PT_SINGLE_BLOCK.

Like PT_SYSCALL_TRACE, this is needed to avoid the unnecessary ptrace
reports when TIF_SINGLESTEP was set by another engine, not by ptrace.
Also, we need these bits to coordinate the user_*_single_step() calls
from ptrace and utrace.

TODO: update the !x86 ptrace code which does user_disable_single_step().

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 arch/x86/kernel/ptrace.c  |1 +
 include/linux/ptrace.h|5 -
 include/linux/tracehook.h |7 +--
 kernel/ptrace.c   |3 +++
 4 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 807c2a2..7ab475f 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -807,6 +807,7 @@ static int ioperm_get(struct task_struct *target,
  */
 void ptrace_disable(struct task_struct *child)
 {
+   child-ptrace = ~(PT_SINGLE_STEP | PT_SINGLE_BLOCK);
user_disable_single_step(child);
 #ifdef TIF_SYSCALL_EMU
clear_tsk_thread_flag(child, TIF_SYSCALL_EMU);
diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 98d995d..65b1e4f 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -91,6 +91,8 @@
 #define PT_TRACE_MASK  0x03f4
 
 #define PT_SYSCALL_TRACE   0x0001
+#define PT_SINGLE_STEP 0x0002
+#define PT_SINGLE_BLOCK0x0004
 
 /* single stepping state bits (used on ARM and PA-RISC) */
 #define PT_SINGLESTEP_BIT  31
@@ -189,7 +191,8 @@ static inline void ptrace_init_task(struct task_struct 
*child, bool ptrace)
child-ptrace = 0;
if (unlikely(ptrace)  (current-ptrace  PT_PTRACED)) {
child-ptrace = current-ptrace;
-   child-ptrace = ~PT_SYSCALL_TRACE;
+   child-ptrace =
+   ~(PT_SYSCALL_TRACE | PT_SINGLE_STEP | PT_SINGLE_BLOCK);
__ptrace_link(child, current-parent);
}
 
diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 6ce7a37..06edb52 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -121,6 +121,9 @@ static inline __must_check int 
tracehook_report_syscall_entry(
return 0;
 }
 
+#define ptrace_wants_step()\
+   (current-ptrace  (PT_SINGLE_STEP | PT_SINGLE_BLOCK))
+
 /**
  * tracehook_report_syscall_exit - task has just finished a system call
  * @regs:  user register state of current task
@@ -143,7 +146,7 @@ static inline void tracehook_report_syscall_exit(struct 
pt_regs *regs, int step)
if (task_utrace_flags(current)  UTRACE_EVENT(SYSCALL_EXIT))
utrace_report_syscall_exit(regs);
 
-   if (step) {
+   if (step  ptrace_wants_step()) {
siginfo_t info;
user_single_step_siginfo(current, regs, info);
force_sig_info(SIGTRAP, info, current);
@@ -436,7 +439,7 @@ static inline void tracehook_signal_handler(int sig, 
siginfo_t *info,
 {
if (task_utrace_flags(current))
utrace_signal_handler(current, stepping);
-   if (stepping)
+   if (stepping  ptrace_wants_step())
ptrace_notify(SIGTRAP);
 }
 
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 209ea2d..44908d0 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -575,13 +575,16 @@ static int ptrace_resume(struct task_struct *child, long 
request,
clear_tsk_thread_flag(child, TIF_SYSCALL_EMU);
 #endif
 
+   child-ptrace = ~(PT_SINGLE_STEP | PT_SINGLE_BLOCK);
if (is_singleblock(request)) {
if (unlikely(!arch_has_block_step()))
return -EIO;
+   child-ptrace |= PT_SINGLE_BLOCK;
user_enable_block_step(child);
} else if (is_singlestep(request) || is_sysemu_singlestep(request)) {
if (unlikely(!arch_has_single_step()))
return -EIO;
+   child-ptrace |= PT_SINGLE_STEP;
user_enable_single_step(child);
} else {
user_disable_single_step(child);
-- 
1.5.5.1




[PATCH 12/19] get_signal_to_deliver: restructure utrace/ptrace signal reporting

2011-06-30 Thread Oleg Nesterov
get_signal_to_deliver() assumes that either tracehook_get_signal() does
nothing (without CONFIG_UTRACE), or it also reports the signal to ptrace
engine implemented on top of utrace. Now that ptrace works independently
this doesn't work.

Change the code to call ptrace_signal() after tracehook_get_signal().

Move -ptrace check from ptrace_signal() to get_signal_to_deliver(),
we do not want to change *return_ka if it was initialized by utrace
and the task is not traced.

IOW, roughly, ptrace acts as if it is the last attached engine, it
takes the final decision about the signal.

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 kernel/signal.c |   24 +++-
 1 files changed, 11 insertions(+), 13 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 89e691d..d0e0c67 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2006,9 +2006,6 @@ retry:
 static int ptrace_signal(int signr, siginfo_t *info,
 struct pt_regs *regs, void *cookie)
 {
-   if (!task_ptrace(current))
-   return signr;
-
ptrace_signal_deliver(regs, cookie);
 
/* Let the debugger run.  */
@@ -2110,6 +2107,7 @@ relock:
signr = tracehook_get_signal(current, regs, info, return_ka);
if (unlikely(signr  0))
goto relock;
+
if (unlikely(signr != 0))
ka = return_ka;
else {
@@ -2117,18 +2115,18 @@ relock:
 GROUP_STOP_PENDING)  do_signal_stop(0))
goto relock;
 
-   signr = dequeue_signal(current, current-blocked,
-  info);
+   signr = dequeue_signal(current, current-blocked, 
info);
 
-   if (!signr)
-   break; /* will return 0 */
+   ka = sighand-action[signr-1];
+   }
 
-   if (signr != SIGKILL) {
-   signr = ptrace_signal(signr, info,
- regs, cookie);
-   if (!signr)
-   continue;
-   }
+   if (!signr)
+   break; /* will return 0 */
+
+   if (signr != SIGKILL  current-ptrace) {
+   signr = ptrace_signal(signr, info, regs, cookie);
+   if (!signr)
+   continue;
 
ka = sighand-action[signr-1];
}
-- 
1.5.5.1




Re: [PATCH 1/4] ptrace: temporary revert the recent ptrace/jobctl rework

2011-06-30 Thread Oleg Nesterov
On 06/21, Roland McGrath wrote:

  OK, so here's my (hacky) idea:
  (1) Forget ptrace-via-utrace.  Have utrace be a separate thing.  This
  way the recent ptrace changes won't matter.

This is what V2 does.

  (2) But, what about ptrace co-existing well with utrace?  Make them
  mutually exclusive - a ptraced-process can't be utraced and a
  utraced-process can't be ptraced.

 We had this situation before for a while.  It has the substantial downside
 that e.g. you cannot do any system-wide systemtap tracing without making
 all strace and gdb use impossible.

Yes, we can't make them mutually exclusive, this can't work. So V2 tries
to teach them play well together.

  Assuming the above is a semi-reasonable idea, it might be a lot less
  work than updating the ptrace-via-utrace code to handle the new ptrace
  changes.

 That's for Oleg to say.  (Sorry, Oleg. ;-)

Oh, I am not sure what is simpler ;)

Oleg.