Re: [PATCH 0/31] utrace for 3.1 kernel

2012-02-07 Thread Oleg Nesterov
On 02/06, Josh Stone wrote:

 On 02/06/2012 06:24 PM, Ying, Victor wrote:
  I have problem to checkout utrace via git clone
  git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc.git
  utrace-3.0, it always reports fatal: the remote end hung up
  unexpectedly; I tried both utrace-3.0 and utrace-3.1, none of them
  works.

 I don't think Oleg has restored his account since kernel.org was
 rebuilt.  He's been posting utrace on github instead, so try:
   git://github.com/utrace/linux.git

Yes, thanks Josh.

There are

utrace-3.0
utrace-3.1
utrace-3.2
utrace-3.3

branches for different kernel versions.

Oleg.



[PATCH 0/1] (Was: utrace for 3.3 kernel)

2012-01-30 Thread Oleg Nesterov
On 01/25, Josh Boyer wrote:

 I'll get this into rawhide this evening.

Thanks Josh.

There is another simple change, I forgout about it. It is purely
internal, has no effect outside of utrace.c.

Oleg.



[PATCH utrace-3.1 32/31] ptrace_report_syscall: check if TIF_SYSCALL_EMU is defined

2011-08-09 Thread Oleg Nesterov
From: Tony Breeds t...@bakeyournoodle.com

TIF_SYSCALL_EMU is x86 only, add ifdef into ptrace_report_syscall().

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 include/linux/tracehook.h |8 ++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 90ca578..a1bac95 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -59,8 +59,12 @@ static inline void ptrace_report_syscall(struct pt_regs 
*regs)
 {
int ptrace = current-ptrace;
 
-   if (!(ptrace  PT_SYSCALL_TRACE)  !test_thread_flag(TIF_SYSCALL_EMU))
-   return;
+   if (!(ptrace  PT_SYSCALL_TRACE)) {
+#ifdef TIF_SYSCALL_EMU
+   if (!test_thread_flag(TIF_SYSCALL_EMU))
+#endif
+   return;
+   }
 
ptrace_notify(SIGTRAP | ((ptrace  PT_TRACESYSGOOD) ? 0x80 : 0));
 
-- 
1.5.5.1




[PATCH 0/31] utrace for 3.1 kernel

2011-08-03 Thread Oleg Nesterov
On 08/02, Oleg Nesterov wrote:

 utrace patches for 3.1 kernel. Untested, will try to do some tests
 tomorrow.

I tried to test it a bit, seems to work. But see the new
[PATCH 31/31] utrace_resume: check irqs_disabled() to shut up lockdep.

The whole series is available in

git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc.git utrace-3.1

Oleg.



[PATCH 04/31] tracehooks: reintroduce tracehook_consider_fatal_signal()

2011-08-03 Thread Oleg Nesterov
Add the killed tracehook_consider_fatal_signal() back. It has multiple
callers and it is not easy add the necessary checks inline.

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 arch/s390/kernel/traps.c  |4 ++--
 include/linux/tracehook.h |   22 ++
 kernel/signal.c   |4 ++--
 3 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/arch/s390/kernel/traps.c b/arch/s390/kernel/traps.c
index ffabcd9..1018ab6 100644
--- a/arch/s390/kernel/traps.c
+++ b/arch/s390/kernel/traps.c
@@ -329,7 +329,7 @@ void __kprobes do_per_trap(struct pt_regs *regs)
 
if (notify_die(DIE_SSTEP, sstep, regs, 0, 0, SIGTRAP) == NOTIFY_STOP)
return;
-   if (!current-ptrace)
+   if (!tracehook_consider_fatal_signal(current, SIGTRAP))
return;
info.si_signo = SIGTRAP;
info.si_errno = 0;
@@ -428,7 +428,7 @@ static void __kprobes illegal_op(struct pt_regs *regs, long 
pgm_int_code,
if (get_user(*((__u16 *) opcode), (__u16 __user *) location))
return;
if (*((__u16 *) opcode) == S390_BREAKPOINT_U16) {
-   if (current-ptrace) {
+   if (tracehook_consider_fatal_signal(current, SIGTRAP)) {
info.si_signo = SIGTRAP;
info.si_errno = 0;
info.si_code = TRAP_BRKPT;
diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 8cc28bc..ec2af67 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -161,6 +161,28 @@ static inline void tracehook_signal_handler(int sig, 
siginfo_t *info,
ptrace_notify(SIGTRAP);
 }
 
+/**
+ * tracehook_consider_fatal_signal - suppress special handling of fatal signal
+ * @task:  task receiving the signal
+ * @sig:   signal number being sent
+ *
+ * Return nonzero to prevent special handling of this termination signal.
+ * Normally handler for signal is %SIG_DFL.  It can be %SIG_IGN if @sig is
+ * ignored, in which case force_sig() is about to reset it to %SIG_DFL.
+ * When this returns zero, this signal might cause a quick termination
+ * that does not give the debugger a chance to intercept the signal.
+ *
+ * Called with or without @task-sighand-siglock held.
+ */
+static inline int tracehook_consider_fatal_signal(struct task_struct *task,
+ int sig)
+{
+   if (unlikely(task_utrace_flags(task)  (UTRACE_EVENT(SIGNAL_TERM) |
+   UTRACE_EVENT(SIGNAL_CORE
+   return 1;
+   return task-ptrace != 0;
+}
+
 #ifdef TIF_NOTIFY_RESUME
 /**
  * set_notify_resume - cause tracehook_notify_resume() to be called
diff --git a/kernel/signal.c b/kernel/signal.c
index 291c970..d7ef0da 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -494,7 +494,7 @@ int unhandled_signal(struct task_struct *tsk, int sig)
if (handler != SIG_IGN  handler != SIG_DFL)
return 0;
/* if ptraced, let the tracer determine */
-   return !tsk-ptrace;
+   return !tracehook_consider_fatal_signal(tsk, sig);
 }
 
 /*
@@ -982,7 +982,7 @@ static void complete_signal(int sig, struct task_struct *p, 
int group)
if (sig_fatal(p, sig) 
!(signal-flags  (SIGNAL_UNKILLABLE | SIGNAL_GROUP_EXIT)) 
!sigismember(t-real_blocked, sig) 
-   (sig == SIGKILL || !t-ptrace)) {
+   (sig == SIGKILL || !tracehook_consider_fatal_signal(t, sig))) {
/*
 * This signal will be fatal to the whole group.
 */
-- 
1.5.5.1




[PATCH 08/31] restore the DEATH/REAP utrace hooks

2011-08-03 Thread Oleg Nesterov
Restore the necessary hooks in release_task() and exit_notify(),
add the corresponding helpers into utrace.h.

Note: the @signal argument passed to -report_death() does not
match the previous behaviour. I think this shouldn't affect the
current users, and I bet nobody can really understand what this
magic argument should actually mean anyway.

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 include/linux/utrace.h |   22 ++
 kernel/exit.c  |4 
 2 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/include/linux/utrace.h b/include/linux/utrace.h
index 9a2e2f4..cf13839 100644
--- a/include/linux/utrace.h
+++ b/include/linux/utrace.h
@@ -697,4 +697,26 @@ static inline __must_check int utrace_barrier_pid(struct 
pid *pid,
 
 #endif /* CONFIG_UTRACE */
 
+static inline void utrace_release_task(struct task_struct *task)
+{
+   /* see utrace_add_engine() about this barrier */
+   smp_mb();
+   if (task_utrace_flags(task))
+   utrace_maybe_reap(task, task_utrace_struct(task), true);
+}
+
+static inline void utrace_exit_notify(struct task_struct *task,
+ int signal, int group_dead)
+{
+   /*
+* If utrace_set_events() was just called to enable
+* UTRACE_EVENT(DEATH), then we are obliged to call
+* utrace_report_death() and not miss it.  utrace_set_events()
+* checks @task-exit_state under tasklist_lock to synchronize
+* with exit_notify(), the caller.
+*/
+   if (task_utrace_flags(task)  _UTRACE_DEATH_EVENTS)
+   utrace_report_death(task, group_dead, signal);
+}
+
 #endif /* linux/utrace.h */
diff --git a/kernel/exit.c b/kernel/exit.c
index c1b0ab6..ba5ba22 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -168,6 +168,8 @@ void release_task(struct task_struct * p)
struct task_struct *leader;
int zap_leader;
 repeat:
+   utrace_release_task(p);
+
/* don't need to get the RCU readlock here - the process is dead and
 * can't be modifying its own credentials. But shut RCU-lockdep up */
rcu_read_lock();
@@ -860,6 +862,8 @@ static void exit_notify(struct task_struct *tsk, int 
group_dead)
wake_up_process(tsk-signal-group_exit_task);
write_unlock_irq(tasklist_lock);
 
+   utrace_exit_notify(tsk, autoreap ? -1 : SIGCHLD, group_dead);
+
/* If the process is dead, release it - nobody will wait for it */
if (autoreap)
release_task(tsk);
-- 
1.5.5.1




[PATCH 09/31] utrace: remove jobctl bits

2011-08-03 Thread Oleg Nesterov
- change utrace_get_signal() to check JOBCTL_STOP_PENDING instead of
  signal-group_stop_count. With the recent changes group_stop_count
  doesn't necessarily mean this task should participate in group stop.

- remove the participate in group stop code from utrace_wakeup() and
  utrace_stop(), this is no longer needed and wrong.

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 kernel/utrace.c |   16 ++--
 1 files changed, 2 insertions(+), 14 deletions(-)

diff --git a/kernel/utrace.c b/kernel/utrace.c
index 1e750ad..5d3974e 100644
--- a/kernel/utrace.c
+++ b/kernel/utrace.c
@@ -648,11 +648,7 @@ static void utrace_wakeup(struct task_struct *target, 
struct utrace *utrace)
 {
lockdep_assert_held(utrace-lock);
spin_lock_irq(target-sighand-siglock);
-   if (target-signal-flags  SIGNAL_STOP_STOPPED ||
-   target-signal-group_stop_count)
-   target-state = TASK_STOPPED;
-   else
-   wake_up_state(target, __TASK_TRACED);
+   wake_up_state(target, __TASK_TRACED);
spin_unlock_irq(target-sighand-siglock);
 }
 
@@ -805,14 +801,6 @@ relock:
 
__set_current_state(TASK_TRACED);
 
-   /*
-* If there is a group stop in progress,
-* we must participate in the bookkeeping.
-*/
-   if (unlikely(task-signal-group_stop_count) 
-   !--task-signal-group_stop_count)
-   task-signal-flags = SIGNAL_STOP_STOPPED;
-
spin_unlock_irq(task-sighand-siglock);
spin_unlock(utrace-lock);
 
@@ -2037,7 +2025,7 @@ int utrace_get_signal(struct task_struct *task, struct 
pt_regs *regs,
ka = NULL;
memset(return_ka, 0, sizeof *return_ka);
} else if (!(task-utrace_flags  UTRACE_EVENT_SIGNAL_ALL) ||
-  unlikely(task-signal-group_stop_count)) {
+  unlikely(task-jobctl  JOBCTL_STOP_PENDING)) {
/*
 * If no engine is interested in intercepting signals or
 * we must stop, let the caller just dequeue them normally
-- 
1.5.5.1




[PATCH 05/31] add utrace hooks into sig_ignored() and recalc_sigpending()

2011-08-03 Thread Oleg Nesterov
Add the necessary and somewhat special hooks into sig_ignored() and
recalc_sigpending(). Basically this restores _force_sigpending() and
_consider_ignored_signal() tracehook logic.

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 include/linux/utrace.h |2 ++
 kernel/signal.c|7 ++-
 2 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/include/linux/utrace.h b/include/linux/utrace.h
index f251efe..1b8da1c 100644
--- a/include/linux/utrace.h
+++ b/include/linux/utrace.h
@@ -107,6 +107,8 @@ bool utrace_report_syscall_entry(struct pt_regs *);
 void utrace_report_syscall_exit(struct pt_regs *);
 void utrace_signal_handler(struct task_struct *, int);
 
+#define UTRACE_FLAG(task, ev)  (task_utrace_flags(task)  UTRACE_EVENT(ev))
+
 #ifndef CONFIG_UTRACE
 
 /*
diff --git a/kernel/signal.c b/kernel/signal.c
index d7ef0da..0f9af0b 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -87,7 +87,7 @@ static int sig_ignored(struct task_struct *t, int sig, int 
from_ancestor_ns)
/*
 * Tracers may want to know about even ignored signals.
 */
-   return !t-ptrace;
+   return !t-ptrace  !UTRACE_FLAG(t, SIGNAL_IGN);
 }
 
 /*
@@ -150,6 +150,11 @@ void recalc_sigpending_and_wake(struct task_struct *t)
 
 void recalc_sigpending(void)
 {
+   if (task_utrace_flags(current)  utrace_interrupt_pending()) {
+   set_thread_flag(TIF_SIGPENDING);
+   return;
+   }
+
if (!recalc_sigpending_tsk(current)  !freezing(current))
clear_thread_flag(TIF_SIGPENDING);
 
-- 
1.5.5.1




[PATCH 11/31] introduce wake_up_quiescent()

2011-08-03 Thread Oleg Nesterov
No functional changes. Add the new helper, wake_up_quiescent(task, state),
which simply returns wake_up_state(task, state). Change all callers which
do wake_up_state(STOPPED/TRACED) to use the new helper. ptrace_stop() is
a bit special, it does __set_current_state(RUNNING) in the very unlikely
case, change it as well.

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 include/linux/signal.h |2 ++
 kernel/ptrace.c|2 +-
 kernel/signal.c|   12 ++--
 kernel/utrace.c|2 +-
 4 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/include/linux/signal.h b/include/linux/signal.h
index a822300..2be3712 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -239,6 +239,8 @@ static inline int valid_signal(unsigned long sig)
 struct timespec;
 struct pt_regs;
 
+extern int wake_up_quiescent(struct task_struct *p, unsigned int state);
+
 extern int next_signal(struct sigpending *pending, sigset_t *mask);
 extern int do_send_sig_info(int sig, struct siginfo *info,
struct task_struct *p, bool group);
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 56b8fc1..4194664 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -621,7 +621,7 @@ static int ptrace_resume(struct task_struct *child, long 
request,
child-exit_code = data;
 
if (lock_task_sighand(child, flags)) {
-   wake_up_state(child, __TASK_TRACED);
+   wake_up_quiescent(child, __TASK_TRACED);
unlock_task_sighand(child, flags);
}
 
diff --git a/kernel/signal.c b/kernel/signal.c
index 71f5cca..3e8e0b1 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -702,6 +702,14 @@ void signal_wake_up(struct task_struct *t, int resume)
 }
 
 /*
+ * wakes up the STOPPED/TRACED task, must be called with -siglock held.
+ */
+int wake_up_quiescent(struct task_struct *p, unsigned int state)
+{
+   return wake_up_state(p, state);
+}
+
+/*
  * Remove signals in mask from the pending set and queue.
  * Returns 1 if any signals were found.
  *
@@ -888,7 +896,7 @@ static int prepare_signal(int sig, struct task_struct *p, 
int from_ancestor_ns)
task_clear_jobctl_pending(t, JOBCTL_STOP_PENDING);
rm_from_queue(SIG_KERNEL_STOP_MASK, t-pending);
if (likely(!(t-ptrace  PT_SEIZED)))
-   wake_up_state(t, __TASK_STOPPED);
+   wake_up_quiescent(t, __TASK_STOPPED);
else
ptrace_trap_notify(t);
} while_each_thread(p, t);
@@ -1879,7 +1887,7 @@ static void ptrace_stop(int exit_code, int why, int 
clear_code, siginfo_t *info)
do_notify_parent_cldstop(current, false, why);
 
spin_lock_irq(current-sighand-siglock);
-   __set_current_state(TASK_RUNNING);
+   wake_up_quiescent(current, __TASK_TRACED);
spin_unlock_irq(current-sighand-siglock);
 
if (clear_code)
diff --git a/kernel/utrace.c b/kernel/utrace.c
index 5d3974e..cebc390 100644
--- a/kernel/utrace.c
+++ b/kernel/utrace.c
@@ -648,7 +648,7 @@ static void utrace_wakeup(struct task_struct *target, 
struct utrace *utrace)
 {
lockdep_assert_held(utrace-lock);
spin_lock_irq(target-sighand-siglock);
-   wake_up_state(target, __TASK_TRACED);
+   wake_up_quiescent(target, __TASK_TRACED);
spin_unlock_irq(target-sighand-siglock);
 }
 
-- 
1.5.5.1




[PATCH 16/31] reintroduce tracehook_finish_jctl() as utrace_end_stop()

2011-08-03 Thread Oleg Nesterov
utrace_finish_stop() is needed to avoid the races with SIGKILL which
wakes up UTRACED task, and thus it should be called every time after
the STOPPED/TRACED/UTRACED returns from schedule(), remember that
TASK_UTRACED can be added while the task is STOPPED/UTRACED.

- change do_signal_state() to call this helper right after schedule(),
  otherwise this logic is broken by the upstream changes

- now that utrace doesn't control TASK_TRACED bit, ptrace_stop() must
  call this helper too.

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 include/linux/utrace.h |   11 +++
 kernel/signal.c|5 +
 kernel/utrace.c|2 +-
 3 files changed, 17 insertions(+), 1 deletions(-)

diff --git a/include/linux/utrace.h b/include/linux/utrace.h
index cf13839..0279c74 100644
--- a/include/linux/utrace.h
+++ b/include/linux/utrace.h
@@ -719,4 +719,15 @@ static inline void utrace_exit_notify(struct task_struct 
*task,
utrace_report_death(task, group_dead, signal);
 }
 
+/**
+ * utrace_end_stop - report about return from STOPPED/TRACED
+ *
+ * This is called by do_signal_stop() and ptrace_stop after wakeup.
+ */
+static inline void utrace_end_stop(void)
+{
+   if (task_utrace_flags(current))
+   utrace_finish_stop();
+}
+
 #endif /* linux/utrace.h */
diff --git a/kernel/signal.c b/kernel/signal.c
index 0dc6abb..a625309 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1895,6 +1895,8 @@ static void ptrace_stop(int exit_code, int why, int 
clear_code, siginfo_t *info)
read_unlock(tasklist_lock);
}
 
+   utrace_end_stop();
+
/*
 * While in TASK_TRACED, we were considered frozen enough.
 * Now that we woke up, it's crucial if we're supposed to be
@@ -2059,6 +2061,9 @@ static bool do_signal_stop(int signr)
 
/* Now we don't run again until woken by SIGCONT or SIGKILL */
schedule();
+
+   utrace_end_stop();
+
return true;
} else {
/*
diff --git a/kernel/utrace.c b/kernel/utrace.c
index 2097103..d41b982 100644
--- a/kernel/utrace.c
+++ b/kernel/utrace.c
@@ -741,7 +741,7 @@ static bool utrace_reset(struct task_struct *task, struct 
utrace *utrace)
 void utrace_finish_stop(void)
 {
/*
-* If we were task_is_traced() and then SIGKILL'ed, make
+* If we were task_is_utraced() and then SIGKILL'ed, make
 * sure we do nothing until the tracer drops utrace-lock.
 */
if (unlikely(__fatal_signal_pending(current))) {
-- 
1.5.5.1




[PATCH 15/31] utrace: use TASK_UTRACED instead of TASK_TRACED

2011-08-03 Thread Oleg Nesterov
Change utrace.c to use TASK_UTRACED instead of TASK_TRACED.

- utrace_stop/utrace_wakeup: simply use the new state

- utrace_do_stop: do not clear STOPPED/TRACED, but add the new
  __TASK_UTRACED bit to state the fact that both ptrace and utrace
  want this task to be stopped

- naturally, do not use task_is_traced() to check if this task was
  stopped by utrace, use the new task_is_utraced() helper which
  checks __TASK_UTRACED.

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 kernel/utrace.c |   26 ++
 1 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/kernel/utrace.c b/kernel/utrace.c
index cebc390..2097103 100644
--- a/kernel/utrace.c
+++ b/kernel/utrace.c
@@ -462,6 +462,8 @@ static void put_detached_list(struct list_head *list)
  */
 #define ENGINE_STOP(1UL  _UTRACE_NEVENTS)
 
+#define task_is_utraced(task)  ((task-state  __TASK_UTRACED) != 0)
+
 static void mark_engine_wants_stop(struct task_struct *task,
   struct utrace_engine *engine)
 {
@@ -576,7 +578,7 @@ int utrace_set_events(struct task_struct *target,
 
ret = 0;
if ((old_flags  ~events)  target != current 
-   !task_is_stopped_or_traced(target)  !target-exit_state) {
+   !task_is_utraced(target)  !target-exit_state) {
/*
 * This barrier ensures that our engine-flags changes
 * have hit before we examine utrace-reporting,
@@ -623,21 +625,21 @@ static void mark_engine_detached(struct utrace_engine 
*engine)
  */
 static bool utrace_do_stop(struct task_struct *target, struct utrace *utrace)
 {
-   if (task_is_stopped(target)) {
+   if (task_is_stopped_or_traced(target)) {
/*
 * Stopped is considered quiescent; when it wakes up, it will
 * go through utrace_finish_stop() before doing anything else.
 */
spin_lock_irq(target-sighand-siglock);
-   if (likely(task_is_stopped(target)))
-   __set_task_state(target, TASK_TRACED);
+   if (likely(task_is_stopped_or_traced(target)))
+   target-state |= TASK_UTRACED;
spin_unlock_irq(target-sighand-siglock);
} else if (utrace-resume  UTRACE_REPORT) {
utrace-resume = UTRACE_REPORT;
set_notify_resume(target);
}
 
-   return task_is_traced(target);
+   return task_is_utraced(target);
 }
 
 /*
@@ -648,7 +650,7 @@ static void utrace_wakeup(struct task_struct *target, 
struct utrace *utrace)
 {
lockdep_assert_held(utrace-lock);
spin_lock_irq(target-sighand-siglock);
-   wake_up_quiescent(target, __TASK_TRACED);
+   wake_up_quiescent(target, __TASK_UTRACED);
spin_unlock_irq(target-sighand-siglock);
 }
 
@@ -710,7 +712,7 @@ static bool utrace_reset(struct task_struct *task, struct 
utrace *utrace)
/*
 * If no more engines want it stopped, wake it up.
 */
-   if (task_is_traced(task)  !(flags  ENGINE_STOP)) {
+   if (task_is_utraced(task)  !(flags  ENGINE_STOP)) {
/*
 * It just resumes, so make sure single-step
 * is not left set.
@@ -749,7 +751,7 @@ void utrace_finish_stop(void)
 }
 
 /*
- * Perform %UTRACE_STOP, i.e. block in TASK_TRACED until woken up.
+ * Perform %UTRACE_STOP, i.e. block in TASK_UTRACED until woken up.
  * @task == current, @utrace == current-utrace, which is not locked.
  * Return true if we were woken up by SIGKILL even though some utrace
  * engine may still want us to stay stopped.
@@ -799,7 +801,7 @@ relock:
return;
}
 
-   __set_current_state(TASK_TRACED);
+   __set_current_state(TASK_UTRACED);
 
spin_unlock_irq(task-sighand-siglock);
spin_unlock(utrace-lock);
@@ -809,14 +811,14 @@ relock:
utrace_finish_stop();
 
/*
-* While in TASK_TRACED, we were considered frozen enough.
+* While in TASK_UTRACED, we were considered frozen enough.
 * Now that we woke up, it's crucial if we're supposed to be
 * frozen that we freeze now before running anything substantial.
 */
try_to_freeze();
 
/*
-* While we were in TASK_TRACED, complete_signal() considered
+* While we were in TASK_UTRACED, complete_signal() considered
 * us uninterested in signal wakeups.  Now make sure our
 * TIF_SIGPENDING state is correct for normal running.
 */
@@ -1087,7 +1089,7 @@ int utrace_control(struct task_struct *target,
if (unlikely(IS_ERR(utrace)))
return PTR_ERR(utrace);
 
-   reset = task_is_traced(target);
+   reset = task_is_utraced(target);
ret = 0;
 
/*
-- 
1.5.5.1




[PATCH 12/31] introduce ptrace_signal_wake_up()

2011-08-03 Thread Oleg Nesterov
Add the new helper, ptrace_signal_wake_up(), change ptrace.c/signal.c
to use it instead of signal_wake_up() to wake up a STOPPED/TRACED task.

The new helper does almost the same, except:

- it doesn't use the TASK_WAKEKILL bit to wake up the TRACED
  or STOPPED task, it uses __TASK_STOPPED | __TASK_TRACED
  explicitly. This is what ptrace actually wants, it should
  never wake up a TASK_KILLABLE task.

  This should be cleanuped upatream, signal_wake_up() should
  take the state as an argument, not a boolean. Until then
  we add a new static helper.

- it uses wake_up_quiescent() instead of wake_up_state().

Thereafter every change from STOPPED/TRACED to RUNNING is done via
wake_up_quiescent().

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 include/linux/ptrace.h |1 +
 kernel/ptrace.c|   20 
 kernel/signal.c|2 +-
 3 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 800f113..6d9282a 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -113,6 +113,7 @@
 #include linux/compiler.h/* For unlikely.  */
 #include linux/sched.h   /* For struct task_struct.  */
 
+extern void ptrace_signal_wake_up(struct task_struct *p, int quiescent);
 
 extern long arch_ptrace(struct task_struct *child, long request,
unsigned long addr, unsigned long data);
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 4194664..1a50090 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -25,6 +25,18 @@
 #include linux/hw_breakpoint.h
 #include linux/cn_proc.h
 
+void ptrace_signal_wake_up(struct task_struct *p, int quiescent)
+{
+   unsigned int state;
+
+   set_tsk_thread_flag(p, TIF_SIGPENDING);
+
+   state = TASK_INTERRUPTIBLE;
+   if (quiescent)
+   state |= (__TASK_STOPPED | __TASK_TRACED);
+   if (!wake_up_quiescent(p, state))
+   kick_process(p);
+}
 
 static int ptrace_trapping_sleep_fn(void *flags)
 {
@@ -106,7 +118,7 @@ void __ptrace_unlink(struct task_struct *child)
 * TASK_KILLABLE sleeps.
 */
if (child-jobctl  JOBCTL_STOP_PENDING || task_is_traced(child))
-   signal_wake_up(child, task_is_traced(child));
+   ptrace_signal_wake_up(child, task_is_traced(child));
 
spin_unlock(child-sighand-siglock);
 }
@@ -296,7 +308,7 @@ static int ptrace_attach(struct task_struct *task, long 
request,
 */
if (task_is_stopped(task) 
task_set_jobctl_pending(task, JOBCTL_TRAP_STOP | JOBCTL_TRAPPING))
-   signal_wake_up(task, 1);
+   ptrace_signal_wake_up(task, 1);
 
spin_unlock(task-sighand-siglock);
 
@@ -731,7 +743,7 @@ int ptrace_request(struct task_struct *child, long request,
 * tracee into STOP.
 */
if (likely(task_set_jobctl_pending(child, JOBCTL_TRAP_STOP)))
-   signal_wake_up(child, child-jobctl  JOBCTL_LISTENING);
+   ptrace_signal_wake_up(child, child-jobctl  
JOBCTL_LISTENING);
 
unlock_task_sighand(child, flags);
ret = 0;
@@ -760,7 +772,7 @@ int ptrace_request(struct task_struct *child, long request,
 * of this trap and now.  Trigger re-trap immediately.
 */
if (child-jobctl  JOBCTL_TRAP_NOTIFY)
-   signal_wake_up(child, true);
+   ptrace_signal_wake_up(child, true);
 
unlock_task_sighand(child, flags);
ret = 0;
diff --git a/kernel/signal.c b/kernel/signal.c
index 3e8e0b1..0dc6abb 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -854,7 +854,7 @@ static void ptrace_trap_notify(struct task_struct *t)
assert_spin_locked(t-sighand-siglock);
 
task_set_jobctl_pending(t, JOBCTL_TRAP_NOTIFY);
-   signal_wake_up(t, t-jobctl  JOBCTL_LISTENING);
+   ptrace_signal_wake_up(t, t-jobctl  JOBCTL_LISTENING);
 }
 
 /*
-- 
1.5.5.1




[PATCH 18/31] ptrace_stop: do not assume the task is running after wake_up_quiescent()

2011-08-03 Thread Oleg Nesterov
If ptrace_stop() sets TASK_TRACED and then detects we should not stop,
it can race with utrace_do_stop() which can see TASK_TRACED and add
TASK_UTRACED. In this case we should stop for utrace needs.

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 kernel/signal.c |8 
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 0d1675a..249760f 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1908,6 +1908,14 @@ static void ptrace_stop(int exit_code, int why, int 
clear_code, siginfo_t *info)
if (clear_code)
current-exit_code = 0;
read_unlock(tasklist_lock);
+
+   /*
+* It is possible that __TASK_UTRACED was added by utrace
+* while we were __TASK_TRACED and before we take -siglock
+* for wake_up_quiescent(), we need to block in this case.
+* Otherwise this is unnecessary but absolutely harmless.
+*/
+   schedule();
}
 
utrace_end_stop();
-- 
1.5.5.1




[PATCH 14/31] introduce TASK_UTRACED state

2011-08-03 Thread Oleg Nesterov
Introduce TASK_UTRACED state, will be used by utrace instead of TASK_TRACED.

Note: this state is reported as t (tracing stop) to the user-space to
avoid the confusion. IOW, it looks like TASK_TRACED in /proc/pid/status.

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 fs/proc/array.c   |   11 ++-
 include/linux/sched.h |   20 +++-
 2 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index f0c0ea2..e0daec4 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -138,11 +138,12 @@ static const char * const task_state_array[] = {
D (disk sleep),   /*   2 */
T (stopped),  /*   4 */
t (tracing stop), /*   8 */
-   Z (zombie),   /*  16 */
-   X (dead), /*  32 */
-   x (dead), /*  64 */
-   K (wakekill), /* 128 */
-   W (waking),   /* 256 */
+   t (tracing stop), /*  16 (stopped by utrace) */
+   Z (zombie),   /*  32 */
+   X (dead), /*  64 */
+   x (dead), /* 128 */
+   K (wakekill), /* 256 */
+   W (waking),   /* 512 */
 };
 
 static inline const char *get_task_state(struct task_struct *tsk)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index c6d79af..f3f0a77 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -184,16 +184,17 @@ print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq 
*cfs_rq)
 #define TASK_UNINTERRUPTIBLE   2
 #define __TASK_STOPPED 4
 #define __TASK_TRACED  8
+#define __TASK_UTRACED 16
 /* in tsk-exit_state */
-#define EXIT_ZOMBIE16
-#define EXIT_DEAD  32
+#define EXIT_ZOMBIE32
+#define EXIT_DEAD  64
 /* in tsk-state again */
-#define TASK_DEAD  64
-#define TASK_WAKEKILL  128
-#define TASK_WAKING256
-#define TASK_STATE_MAX 512
+#define TASK_DEAD  128
+#define TASK_WAKEKILL  256
+#define TASK_WAKING512
+#define TASK_STATE_MAX 1024
 
-#define TASK_STATE_TO_CHAR_STR RSDTtZXxKW
+#define TASK_STATE_TO_CHAR_STR RSDTtUZXxKW
 
 extern char ___assert_task_state[1 - 2*!!(
sizeof(TASK_STATE_TO_CHAR_STR)-1 != ilog2(TASK_STATE_MAX)+1)];
@@ -202,15 +203,16 @@ extern char ___assert_task_state[1 - 2*!!(
 #define TASK_KILLABLE  (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
 #define TASK_STOPPED   (TASK_WAKEKILL | __TASK_STOPPED)
 #define TASK_TRACED(TASK_WAKEKILL | __TASK_TRACED)
+#define TASK_UTRACED   (TASK_WAKEKILL | __TASK_UTRACED)
 
 /* Convenience macros for the sake of wake_up */
 #define TASK_NORMAL(TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE)
-#define TASK_ALL   (TASK_NORMAL | __TASK_STOPPED | __TASK_TRACED)
+#define TASK_ALL   (TASK_NORMAL | __TASK_STOPPED | __TASK_TRACED | 
__TASK_UTRACED)
 
 /* get_task_state() */
 #define TASK_REPORT(TASK_RUNNING | TASK_INTERRUPTIBLE | \
 TASK_UNINTERRUPTIBLE | __TASK_STOPPED | \
-__TASK_TRACED)
+__TASK_TRACED | __TASK_UTRACED)
 
 #define task_is_traced(task)   ((task-state  __TASK_TRACED) != 0)
 #define task_is_stopped(task)  ((task-state  __TASK_STOPPED) != 0)
-- 
1.5.5.1




[PATCH 19/31] get_signal_to_deliver: restore/restructure utrace/ptrace signal reporting

2011-08-03 Thread Oleg Nesterov
- Reintroduce tracehook_get_signal() as utrace_hook_signal().

- Change get_signal_to_deliver() to call utrace_hook_signal() first,
  before dequeue_signal()

- Always call ptrace_signal() if signal != SIGKILL, no matter whether
  this signal comes from utrace or not.

  Since this can change signr again, update struct k_sigaction *ka
  in this case.

IOW, roughly, ptrace acts as if it is the last attached engine, it
takes the final decision about the signal.

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 include/linux/utrace.h |   31 +++
 kernel/signal.c|   30 --
 2 files changed, 51 insertions(+), 10 deletions(-)

diff --git a/include/linux/utrace.h b/include/linux/utrace.h
index 0279c74..63103e2 100644
--- a/include/linux/utrace.h
+++ b/include/linux/utrace.h
@@ -730,4 +730,35 @@ static inline void utrace_end_stop(void)
utrace_finish_stop();
 }
 
+/**
+ * utrace_hook_signal - deliver synthetic signal to traced task
+ * @task:  @current
+ * @regs:  task_pt_regs(@current)
+ * @info:  details of synthetic signal
+ * @return_ka: sigaction for synthetic signal
+ *
+ * Return zero to check for a real pending signal normally.
+ * Return -1 after releasing the siglock to repeat the check.
+ * Return a signal number to induce an artificial signal delivery,
+ * setting *@info and *@return_ka to specify its details and behavior.
+ *
+ * The @return_ka-sa_handler value controls the disposition of the
+ * signal, no matter the signal number.  For %SIG_DFL, the return value
+ * is a representative signal to indicate the behavior (e.g. %SIGTERM
+ * for death, %SIGQUIT for core dump, %SIGSTOP for job control stop,
+ * %SIGTSTP for stop unless in an orphaned pgrp), but the signal number
+ * reported will be @info-si_signo instead.
+ *
+ * Called with @task-sighand-siglock held, before dequeuing pending signals.
+ */
+static inline int utrace_hook_signal(struct task_struct *task,
+  struct pt_regs *regs,
+  siginfo_t *info,
+  struct k_sigaction *return_ka)
+{
+   if (unlikely(task_utrace_flags(task)))
+   return utrace_get_signal(task, regs, info, return_ka);
+   return 0;
+}
+
 #endif /* linux/utrace.h */
diff --git a/kernel/signal.c b/kernel/signal.c
index 249760f..3c783d3 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2234,17 +2234,27 @@ relock:
for (;;) {
struct k_sigaction *ka;
 
-   if (unlikely(current-jobctl  JOBCTL_STOP_PENDING) 
-   do_signal_stop(0))
+   signr = utrace_hook_signal(current, regs, info, return_ka);
+   if (unlikely(signr  0))
goto relock;
 
-   if (unlikely(current-jobctl  JOBCTL_TRAP_MASK)) {
-   do_jobctl_trap();
-   spin_unlock_irq(sighand-siglock);
-   goto relock;
-   }
+   if (unlikely(signr != 0))
+   ka = return_ka;
+   else {
+   if (unlikely(current-jobctl  JOBCTL_STOP_PENDING) 
+   do_signal_stop(0))
+   goto relock;
 
-   signr = dequeue_signal(current, current-blocked, info);
+   if (unlikely(current-jobctl  JOBCTL_TRAP_MASK)) {
+   do_jobctl_trap();
+   spin_unlock_irq(sighand-siglock);
+   goto relock;
+   }
+
+   signr = dequeue_signal(current, current-blocked, 
info);
+
+   ka = sighand-action[signr-1];
+   }
 
if (!signr)
break; /* will return 0 */
@@ -2254,9 +2264,9 @@ relock:
  regs, cookie);
if (!signr)
continue;
-   }
 
-   ka = sighand-action[signr-1];
+   ka = sighand-action[signr-1];
+   }
 
/* Trace actually delivered signals. */
trace_signal_deliver(signr, info, ka);
-- 
1.5.5.1




[PATCH 20/31] utrace_get_signal: s/JOBCTL_STOP_PENDING/JOBCTL_PENDING_MASK/

2011-08-03 Thread Oleg Nesterov
utrace_get_signal() checks JOBCTL_STOP_PENDING to detect the
case when we should not try to dequeue the signal but should
try to participate in the group-stop.

With the recent changes this is not enough, everything which
contrbutes to recalc_sigpending_tsk() should be respected.

Check JOBCTL_PENDING_MASK instead. This matches the
JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK code in the caller,
get_signal_to_deliver(). Note that this code won't run if
utrace_get_signal() returns signr  0.

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 kernel/utrace.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/utrace.c b/kernel/utrace.c
index d41b982..0bb0a06 100644
--- a/kernel/utrace.c
+++ b/kernel/utrace.c
@@ -2027,7 +2027,7 @@ int utrace_get_signal(struct task_struct *task, struct 
pt_regs *regs,
ka = NULL;
memset(return_ka, 0, sizeof *return_ka);
} else if (!(task-utrace_flags  UTRACE_EVENT_SIGNAL_ALL) ||
-  unlikely(task-jobctl  JOBCTL_STOP_PENDING)) {
+  unlikely(task-jobctl  JOBCTL_PENDING_MASK)) {
/*
 * If no engine is interested in intercepting signals or
 * we must stop, let the caller just dequeue them normally
-- 
1.5.5.1




[PATCH 17/31] teach wake_up_quiescent() to do selective wake_up

2011-08-03 Thread Oleg Nesterov
Both utrace and ptrace can want the same thread to be quiescent, in this
case its state is TASK_TRACED | TASK_UTRACED. And this also means that
this task must not run unless both utrace and ptrace resume it.

Change wake_up_quiescent(p, state) to do p-state = ~state and return
false unless there is no more quiescent bits in task-state.

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 kernel/signal.c |   15 +++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index a625309..0d1675a 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -701,11 +701,26 @@ void signal_wake_up(struct task_struct *t, int resume)
kick_process(t);
 }
 
+#define STATE_QUIESCENT(__TASK_STOPPED | __TASK_TRACED | 
__TASK_UTRACED)
 /*
  * wakes up the STOPPED/TRACED task, must be called with -siglock held.
  */
 int wake_up_quiescent(struct task_struct *p, unsigned int state)
 {
+   unsigned int quiescent = (p-state  STATE_QUIESCENT);
+
+   WARN_ON(state  ~(STATE_QUIESCENT | TASK_INTERRUPTIBLE));
+
+   if (quiescent) {
+   state = ~TASK_INTERRUPTIBLE;
+   if ((quiescent  ~state) != 0) {
+   p-state = ~state;
+   WARN_ON(!(p-state  STATE_QUIESCENT));
+   WARN_ON(!(p-state  TASK_WAKEKILL));
+   return 0;
+   }
+   }
+
return wake_up_state(p, state);
 }
 
-- 
1.5.5.1




[PATCH 22/31] introduce PT_SYSCALL_TRACE flag

2011-08-03 Thread Oleg Nesterov
Currently tracehooks assume that if the ptraced task has
TIF_SYSCALL_TRACE set, the tracee should report the syscall.
This is not true, this thread flag can be set by utrace.

Add the new internal ptrace flag, PT_SYSCALL_TRACE. Change
ptrace_set_syscall_trace() to set/clear this bit along with
TIF_SYSCALL_TRACE, change ptrace_report_syscall() to check
this flag instead of PT_PTRACED.

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 include/linux/ptrace.h|3 +++
 include/linux/tracehook.h |2 +-
 kernel/ptrace.c   |7 +--
 3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 6d9282a..c10f610 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -104,6 +104,8 @@
 
 #define PT_TRACE_MASK  0x03f4
 
+#define PT_SYSCALL_TRACE   0x0002
+
 /* single stepping state bits (used on ARM and PA-RISC) */
 #define PT_SINGLESTEP_BIT  31
 #define PT_SINGLESTEP  (1PT_SINGLESTEP_BIT)
@@ -227,6 +229,7 @@ static inline void ptrace_init_task(struct task_struct 
*child, bool ptrace)
 
if (unlikely(ptrace)  current-ptrace) {
child-ptrace = current-ptrace;
+   child-ptrace = ~PT_SYSCALL_TRACE;
__ptrace_link(child, current-parent);
 
if (child-ptrace  PT_SEIZED)
diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index ec2af67..eb9fe30 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -59,7 +59,7 @@ static inline void ptrace_report_syscall(struct pt_regs *regs)
 {
int ptrace = current-ptrace;
 
-   if (!(ptrace  PT_PTRACED))
+   if (!(ptrace  PT_SYSCALL_TRACE))
return;
 
ptrace_notify(SIGTRAP | ((ptrace  PT_TRACESYSGOOD) ? 0x80 : 0));
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index dc2ad34..7deb292 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -40,10 +40,13 @@ void ptrace_signal_wake_up(struct task_struct *p, int 
quiescent)
 
 static void ptrace_set_syscall_trace(struct task_struct *p, bool on)
 {
-   if (on)
+   if (on) {
+   p-ptrace |= PT_SYSCALL_TRACE;
set_tsk_thread_flag(p, TIF_SYSCALL_TRACE);
-   else
+   } else {
+   p-ptrace = ~PT_SYSCALL_TRACE;
clear_tsk_thread_flag(p, TIF_SYSCALL_TRACE);
+   }
 }
 
 static int ptrace_trapping_sleep_fn(void *flags)
-- 
1.5.5.1




[PATCH 21/31] introduce ptrace_set_syscall_trace()

2011-08-03 Thread Oleg Nesterov
No functional changes. Add the new helper, ptrace_set_syscall_trace(),
which should be used to set/clear TIF_SYSCALL_TRACE in ptrace code.
Currently it does nothing more.

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 kernel/ptrace.c |   15 ++-
 1 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 1a50090..dc2ad34 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -38,6 +38,14 @@ void ptrace_signal_wake_up(struct task_struct *p, int 
quiescent)
kick_process(p);
 }
 
+static void ptrace_set_syscall_trace(struct task_struct *p, bool on)
+{
+   if (on)
+   set_tsk_thread_flag(p, TIF_SYSCALL_TRACE);
+   else
+   clear_tsk_thread_flag(p, TIF_SYSCALL_TRACE);
+}
+
 static int ptrace_trapping_sleep_fn(void *flags)
 {
schedule();
@@ -418,7 +426,7 @@ static int ptrace_detach(struct task_struct *child, 
unsigned int data)
 
/* Architecture-specific hardware disable .. */
ptrace_disable(child);
-   clear_tsk_thread_flag(child, TIF_SYSCALL_TRACE);
+   ptrace_set_syscall_trace(child, false);
 
write_lock_irq(tasklist_lock);
/*
@@ -606,10 +614,7 @@ static int ptrace_resume(struct task_struct *child, long 
request,
if (!valid_signal(data))
return -EIO;
 
-   if (request == PTRACE_SYSCALL)
-   set_tsk_thread_flag(child, TIF_SYSCALL_TRACE);
-   else
-   clear_tsk_thread_flag(child, TIF_SYSCALL_TRACE);
+   ptrace_set_syscall_trace(child, request == PTRACE_SYSCALL);
 
 #ifdef TIF_SYSCALL_EMU
if (request == PTRACE_SYSEMU || request == PTRACE_SYSEMU_SINGLESTEP)
-- 
1.5.5.1




[PATCH 23/31] utrace: don't clear TIF_SYSCALL_TRACE if it is needed by ptrace

2011-08-03 Thread Oleg Nesterov
TIF_SYSCALL_TRACE should be cleared only if both ptrace and utrace do
not want it, change utrace_reset() to check PT_SYSCALL_TRACE before
clear_tsk_thread_flag(TIF_SYSCALL_TRACE).

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 kernel/utrace.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/kernel/utrace.c b/kernel/utrace.c
index 0bb0a06..bebf6de 100644
--- a/kernel/utrace.c
+++ b/kernel/utrace.c
@@ -697,6 +697,7 @@ static bool utrace_reset(struct task_struct *task, struct 
utrace *utrace)
BUG_ON(utrace-death);
flags = UTRACE_EVENT(REAP);
} else if (!(flags  UTRACE_EVENT_SYSCALL) 
+  !(task-ptrace  PT_SYSCALL_TRACE) 
   test_tsk_thread_flag(task, TIF_SYSCALL_TRACE)) {
clear_tsk_thread_flag(task, TIF_SYSCALL_TRACE);
}
-- 
1.5.5.1




[PATCH 02/31] utrace: add utrace_init_task/utrace_free_task calls

2011-08-03 Thread Oleg Nesterov
Add the necessary copy_process()-utrace_init_task() and
free_task()-utrace_free_task() calls.

Originally this was the part of utrace core patch, but since
tracehooks are dying it doesn't make sense to reintroduce them.
Instead, just call the utrace_ helpers directly. This is fine
even without CONFIG_UTRACE, gcc is smart enough to optimize out
the code in free_task().

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 kernel/fork.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index e7ceaca..a9891da 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -66,6 +66,7 @@
 #include linux/user-return-notifier.h
 #include linux/oom.h
 #include linux/khugepaged.h
+#include linux/utrace.h
 
 #include asm/pgtable.h
 #include asm/pgalloc.h
@@ -167,6 +168,8 @@ void free_task(struct task_struct *tsk)
free_thread_info(tsk-stack);
rt_mutex_debug_task_free(tsk);
ftrace_graph_exit_task(tsk);
+   if (task_utrace_struct(tsk))
+   utrace_free_task(tsk);
free_task_struct(tsk);
 }
 EXPORT_SYMBOL(free_task);
@@ -1096,6 +1099,8 @@ static struct task_struct *copy_process(unsigned long 
clone_flags,
if (!p)
goto fork_out;
 
+   utrace_init_task(p);
+
ftrace_graph_init_task(p);
 
rt_mutex_init_task(p);
-- 
1.5.5.1




[PATCH 24/31] introduce task_utrace_lock/task_utrace_unlock

2011-08-03 Thread Oleg Nesterov
- Add task_utrace_lock(task). It simply takes task-utrace-lock if
  this task was ever utraced. Otherwise it takes task_lock(), this
  serializes with utrace_attach_task()-utrace_task_alloc(). In both
  case the caller can be sure it can't race with anything which needs
  utrace-lock.

- Add task_utrace_unlock(task), it releases the corresponding lock.

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 include/linux/utrace.h |9 +
 kernel/utrace.c|   26 ++
 2 files changed, 35 insertions(+), 0 deletions(-)

diff --git a/include/linux/utrace.h b/include/linux/utrace.h
index 63103e2..f37373b 100644
--- a/include/linux/utrace.h
+++ b/include/linux/utrace.h
@@ -117,6 +117,12 @@ void utrace_signal_handler(struct task_struct *, int);
 
 #ifndef CONFIG_UTRACE
 
+static inline void task_utrace_lock(struct task_struct *task)
+{
+}
+static inline void task_utrace_unlock(struct task_struct *task)
+{
+}
 /*
  * linux/tracehook.h uses these accessors to avoid #ifdef CONFIG_UTRACE.
  */
@@ -139,6 +145,9 @@ static inline void task_utrace_proc_status(struct seq_file 
*m,
 
 #else  /* CONFIG_UTRACE */
 
+extern void task_utrace_lock(struct task_struct *task);
+extern void task_utrace_unlock(struct task_struct *task);
+
 static inline unsigned long task_utrace_flags(struct task_struct *task)
 {
return task-utrace_flags;
diff --git a/kernel/utrace.c b/kernel/utrace.c
index bebf6de..960dd9e 100644
--- a/kernel/utrace.c
+++ b/kernel/utrace.c
@@ -79,6 +79,32 @@ static int __init utrace_init(void)
 }
 module_init(utrace_init);
 
+void task_utrace_lock(struct task_struct *task)
+{
+   struct utrace *utrace = task_utrace_struct(task);
+
+   if (!utrace) {
+   task_lock(task);
+   utrace = task_utrace_struct(task);
+   if (!utrace)
+   return;
+
+   task_unlock(task);
+   }
+
+   spin_lock(utrace-lock);
+}
+
+void task_utrace_unlock(struct task_struct *task)
+{
+   struct utrace *utrace = task_utrace_struct(task);
+
+   if (utrace)
+   spin_unlock(utrace-lock);
+   else
+   task_unlock(task);
+}
+
 /*
  * Set up @task.utrace for the first time.  We can have races
  * between two utrace_attach_task() calls here.  The task_lock()
-- 
1.5.5.1




[PATCH 27/31] utrace: finish_resume_report: don't do user_xxx_step() if ptrace_wants_step()

2011-08-03 Thread Oleg Nesterov
finish_resume_report() should not enable/disable the stepping if
ptrace_wants_step() == T. If ptrace wants block_step while utrace
wants single_step we could promote the stepping, but I do not
think this really makes sense.

Unless the tracee is killed this can't race with ptrace, this is
called by the tracee itself. If it is killed we do not care.

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 include/linux/tracehook.h |8 
 kernel/utrace.c   |9 ++---
 2 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 21c8ca2..b6812d4 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -104,8 +104,8 @@ static inline __must_check int 
tracehook_report_syscall_entry(
return 0;
 }
 
-#define ptrace_wants_step()\
-   (current-ptrace  (PT_SINGLE_STEP | PT_SINGLE_BLOCK))
+#define ptrace_wants_step(task)\
+   ((task)-ptrace  (PT_SINGLE_STEP | PT_SINGLE_BLOCK))
 
 /**
  * tracehook_report_syscall_exit - task has just finished a system call
@@ -129,7 +129,7 @@ static inline void tracehook_report_syscall_exit(struct 
pt_regs *regs, int step)
if (task_utrace_flags(current)  UTRACE_EVENT(SYSCALL_EXIT))
utrace_report_syscall_exit(regs);
 
-   if (step  ptrace_wants_step()) {
+   if (step  ptrace_wants_step(current)) {
siginfo_t info;
user_single_step_siginfo(current, regs, info);
force_sig_info(SIGTRAP, info, current);
@@ -160,7 +160,7 @@ static inline void tracehook_signal_handler(int sig, 
siginfo_t *info,
 {
if (task_utrace_flags(current))
utrace_signal_handler(current, stepping);
-   if (stepping  ptrace_wants_step())
+   if (stepping  ptrace_wants_step(current))
ptrace_notify(SIGTRAP);
 }
 
diff --git a/kernel/utrace.c b/kernel/utrace.c
index 960dd9e..05e8532 100644
--- a/kernel/utrace.c
+++ b/kernel/utrace.c
@@ -1829,7 +1829,8 @@ static void finish_resume_report(struct task_struct *task,
 
case UTRACE_BLOCKSTEP:
if (likely(arch_has_block_step())) {
-   user_enable_block_step(task);
+   if (!ptrace_wants_step(task))
+   user_enable_block_step(task);
break;
}
 
@@ -1842,7 +1843,8 @@ static void finish_resume_report(struct task_struct *task,
 
case UTRACE_SINGLESTEP:
if (likely(arch_has_single_step())) {
-   user_enable_single_step(task);
+   if (!ptrace_wants_step(task))
+   user_enable_single_step(task);
} else {
/*
 * This means some callback is to blame for failing
@@ -1857,7 +1859,8 @@ static void finish_resume_report(struct task_struct *task,
case UTRACE_REPORT:
case UTRACE_RESUME:
default:
-   user_disable_single_step(task);
+   if (!ptrace_wants_step(task))
+   user_disable_single_step(task);
break;
}
 }
-- 
1.5.5.1




[PATCH 30/31] ptrace_report_syscall: check TIF_SYSCALL_EMU

2011-08-03 Thread Oleg Nesterov
4d16a64 introduce PT_SYSCALL_TRACE flag breaks PTRACE_SYSEMU
which doesn't set PT_SYSCALL_TRACE.

Change ptrace_report_syscall() to check TIF_SYSCALL_EMU as well.
This can't conflict with utrace, this flag can only be set by
ptrace.

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 include/linux/tracehook.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index b6812d4..90ca578 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -59,7 +59,7 @@ static inline void ptrace_report_syscall(struct pt_regs *regs)
 {
int ptrace = current-ptrace;
 
-   if (!(ptrace  PT_SYSCALL_TRACE))
+   if (!(ptrace  PT_SYSCALL_TRACE)  !test_thread_flag(TIF_SYSCALL_EMU))
return;
 
ptrace_notify(SIGTRAP | ((ptrace  PT_TRACESYSGOOD) ? 0x80 : 0));
-- 
1.5.5.1




[PATCHSET] utrace for 3.1 kernel

2011-08-02 Thread Oleg Nesterov
Hello.

utrace patches for 3.1 kernel. Untested, will try to do some tests
tomorrow.

I do not want to spam you all and the lists, please look at

git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc.git utrace-3.1

Oleg Nesterov (29):
  utrace: add utrace_init_task/utrace_free_task calls
  tracehooks: add utrace hooks
  tracehooks: reintroduce tracehook_consider_fatal_signal()
  add utrace hooks into sig_ignored() and recalc_sigpending()
  restore the EXEC/EXIT/CLONE utrace hooks
  utrace: utrace_report_death() can use task_utrace_struct()
  restore the DEATH/REAP utrace hooks
  utrace: remove jobctl bits
  ptrace: take -siglock around s/TRACED/RUNNING/
  introduce wake_up_quiescent()
  introduce ptrace_signal_wake_up()
  wait_task_inactive: treat task-state and match_state as bitmasks
  introduce TASK_UTRACED state
  utrace: use TASK_UTRACED instead of TASK_TRACED
  reintroduce tracehook_finish_jctl() as utrace_end_stop()
  teach wake_up_quiescent() to do selective wake_up
  ptrace_stop: do not assume the task is running after wake_up_quiescent()
  get_signal_to_deliver: restore/restructure utrace/ptrace signal reporting
  utrace_get_signal: s/JOBCTL_STOP_PENDING/JOBCTL_PENDING_MASK/
  introduce ptrace_set_syscall_trace()
  introduce PT_SYSCALL_TRACE flag
  utrace: don't clear TIF_SYSCALL_TRACE if it is needed by ptrace
  introduce task_utrace_lock/task_utrace_unlock
  teach ptrace_set_syscall_trace() to play well with utrace
  introduce PT_SINGLE_STEP and PT_SINGLE_BLOCK
  utrace: finish_resume_report: don't do user_xxx_step() if 
ptrace_wants_step()
  ptrace: shift user_*_step() from ptrace_resume() to ptrace_stop()
  ptrace_disable: no need to disable stepping
  ptrace_report_syscall: check TIF_SYSCALL_EMU

Roland McGrath (1):
  utrace core

 Documentation/DocBook/Makefile|2 +-
 Documentation/DocBook/utrace.tmpl |  589 +
 arch/s390/kernel/traps.c  |4 +-
 arch/x86/kernel/ptrace.c  |1 -
 fs/exec.c |5 +-
 fs/proc/array.c   |   14 +-
 include/linux/ptrace.h|7 +
 include/linux/sched.h |   25 +-
 include/linux/signal.h|2 +
 include/linux/tracehook.h |   53 +-
 include/linux/utrace.h|  773 
 init/Kconfig  |9 +
 kernel/Makefile   |1 +
 kernel/exit.c |5 +
 kernel/fork.c |9 +
 kernel/ptrace.c   |   57 +-
 kernel/sched.c|2 +-
 kernel/signal.c   |   97 ++-
 kernel/utrace.c   | 2461 +
 19 files changed, 4062 insertions(+), 54 deletions(-)
 create mode 100644 Documentation/DocBook/utrace.tmpl
 create mode 100644 include/linux/utrace.h
 create mode 100644 kernel/utrace.c



[PATCH 0/2] update ptrace_disable() + PTRACE_SYSEMU/TIF_SYSCALL_EMU fix

2011-07-03 Thread Oleg Nesterov
On 07/01, Oleg Nesterov wrote:

   - Perhaps PTRACE_SYSEMU/TIF_SYSCALL_EMU logic was broken,
 I need to recheck.

Yes, it was. Fixed by 2/2.

ptrace: shift user_*_step() from ptrace_resume() to ptrace_stop()
forgot to change ptrace_disable(), see 1/2.

Oleg.



[PATCH 2/2] ptrace_report_syscall: check TIF_SYSCALL_EMU

2011-07-03 Thread Oleg Nesterov
4d16a64 introduce PT_SYSCALL_TRACE flag breaks PTRACE_SYSEMU
which doesn't set PT_SYSCALL_TRACE.

Change ptrace_report_syscall() to check TIF_SYSCALL_EMU as well.
This can't conflict with utrace, this flag can only be set by
ptrace.

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 include/linux/tracehook.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 5612d2d..ac833de 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -76,7 +76,7 @@ static inline void ptrace_report_syscall(struct pt_regs *regs)
 {
int ptrace = task_ptrace(current);
 
-   if (!(ptrace  PT_SYSCALL_TRACE))
+   if (!(ptrace  PT_SYSCALL_TRACE)  !test_thread_flag(TIF_SYSCALL_EMU))
return;
 
ptrace_notify(SIGTRAP | ((ptrace  PT_TRACESYSGOOD) ? 0x80 : 0));
-- 
1.5.5.1




[PATCH 1/2] utrace: finish_resume_report: don't do user_xxx_step() if ptrace_wants_step()

2011-07-01 Thread Oleg Nesterov
finish_resume_report() should not enable/disable the stepping if
ptrace_wants_step() == T. If ptrace wants block_step while utrace
wants single_step we could promote the stepping, but I do not
think this really makes sense.

Unless the tracee is killed this can't race with ptrace, this is
called by the tracee itself. If it is killed we do not care.

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 include/linux/tracehook.h |8 
 kernel/utrace.c   |9 ++---
 2 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 06edb52..5612d2d 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -121,8 +121,8 @@ static inline __must_check int 
tracehook_report_syscall_entry(
return 0;
 }
 
-#define ptrace_wants_step()\
-   (current-ptrace  (PT_SINGLE_STEP | PT_SINGLE_BLOCK))
+#define ptrace_wants_step(task)\
+   ((task)-ptrace  (PT_SINGLE_STEP | PT_SINGLE_BLOCK))
 
 /**
  * tracehook_report_syscall_exit - task has just finished a system call
@@ -146,7 +146,7 @@ static inline void tracehook_report_syscall_exit(struct 
pt_regs *regs, int step)
if (task_utrace_flags(current)  UTRACE_EVENT(SYSCALL_EXIT))
utrace_report_syscall_exit(regs);
 
-   if (step  ptrace_wants_step()) {
+   if (step  ptrace_wants_step(current)) {
siginfo_t info;
user_single_step_siginfo(current, regs, info);
force_sig_info(SIGTRAP, info, current);
@@ -439,7 +439,7 @@ static inline void tracehook_signal_handler(int sig, 
siginfo_t *info,
 {
if (task_utrace_flags(current))
utrace_signal_handler(current, stepping);
-   if (stepping  ptrace_wants_step())
+   if (stepping  ptrace_wants_step(current))
ptrace_notify(SIGTRAP);
 }
 
diff --git a/kernel/utrace.c b/kernel/utrace.c
index 508c13c..fbabb81 100644
--- a/kernel/utrace.c
+++ b/kernel/utrace.c
@@ -1828,7 +1828,8 @@ static void finish_resume_report(struct task_struct *task,
 
case UTRACE_BLOCKSTEP:
if (likely(arch_has_block_step())) {
-   user_enable_block_step(task);
+   if (!ptrace_wants_step(task))
+   user_enable_block_step(task);
break;
}
 
@@ -1841,7 +1842,8 @@ static void finish_resume_report(struct task_struct *task,
 
case UTRACE_SINGLESTEP:
if (likely(arch_has_single_step())) {
-   user_enable_single_step(task);
+   if (!ptrace_wants_step(task))
+   user_enable_single_step(task);
} else {
/*
 * This means some callback is to blame for failing
@@ -1856,7 +1858,8 @@ static void finish_resume_report(struct task_struct *task,
case UTRACE_REPORT:
case UTRACE_RESUME:
default:
-   user_disable_single_step(task);
+   if (!ptrace_wants_step(task))
+   user_disable_single_step(task);
break;
}
 }
-- 
1.5.5.1




[PATCH 2/2] ptrace: shift user_*_step() from ptrace_resume() to ptrace_stop()

2011-07-01 Thread Oleg Nesterov
1. ptrace_resume() plays with the stopped task which can be also
   task_is_utraced(). In this case user_enable_xxx_step() can race
   with utrace_reset()-user_disable_single_step().

   We could change utrace_reset() to check ptrace_wants_step() and
   add the task_utrace_lock + unlock barrier after ptrace_resume()
   sets PT_SINGLE_STEP.

   But it is better to reassign enable/desable from the tracer to
   the tracee, it can check its PT_SINGLE_ bits after wakeup. This
   also makes sense because enable_step(task) can be faster if
   task == current.

2. ptrace can do user_disable_single_step() while utrace needs the
   stepping.

   Set TIF_NOTIFY_RESUME after user_disable_single_step() to ensure
   the tracee can't return to the user-space without utrace_resume().
   Any correct engine which wants the stepping should reassert it.

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 kernel/ptrace.c |4 
 kernel/signal.c |   12 
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 44908d0..d37b30d 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -580,14 +580,10 @@ static int ptrace_resume(struct task_struct *child, long 
request,
if (unlikely(!arch_has_block_step()))
return -EIO;
child-ptrace |= PT_SINGLE_BLOCK;
-   user_enable_block_step(child);
} else if (is_singlestep(request) || is_sysemu_singlestep(request)) {
if (unlikely(!arch_has_single_step()))
return -EIO;
child-ptrace |= PT_SINGLE_STEP;
-   user_enable_single_step(child);
-   } else {
-   user_disable_single_step(child);
}
 
child-exit_code = data;
diff --git a/kernel/signal.c b/kernel/signal.c
index d0e0c67..bc40bd7 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1840,6 +1840,18 @@ static void ptrace_stop(int exit_code, int why, int 
clear_code, siginfo_t *info)
}
 
tracehook_finish_stop();
+
+   if (current-ptrace  PT_SINGLE_BLOCK)
+   user_enable_block_step(current);
+   else if (current-ptrace  PT_SINGLE_STEP)
+   user_enable_single_step(current);
+   else {
+   user_disable_single_step(current);
+   /* if utrace needs the stepping it should reassert */
+   if (task_utrace_flags(current))
+   set_thread_flag(TIF_NOTIFY_RESUME);
+   }
+
/*
 * While in TASK_TRACED, we were considered frozen enough.
 * Now that we woke up, it's crucial if we're supposed to be
-- 
1.5.5.1




[RFC v2 00/19] utrace for 3.0 kernel

2011-06-30 Thread Oleg Nesterov
Another attempt. This version tries to decouple utrace and ptrace.
This way it is much simpler to follow the upstream changes, afaics.

TODO:

- The single-stepping updates in ptrace_resume() can race
  with utrace_reset()-user_disable_single_step().

  This was fixed by 20/20, but I noticed that this patch
  is buggy right before sending.

- Perhaps PTRACE_SYSEMU/TIF_SYSCALL_EMU logic was broken,
  I need to recheck.

- Testing.

Oleg.



[PATCH 06/19] wait_task_inactive: treat task-state and match_state as bitmasks

2011-06-30 Thread Oleg Nesterov
Change wait_task_inactive() to check state  match_state instead of
state == match_state. This should not make any difference, but this
allows us to add more stopped bits which can be set or cleared
independently.

IOW. wait_task_inactive() assumes that if task-state != 0, it can
only be changed to TASK_RUNNING. Currently this is true, and in this
case state  match_state continues to work. But, unlike the current
check, it also works if task-state has other bits set while the caller
is only interested in, say, __TASK_TRACED.

Note: I think wait_task_inactive() should be cleanuped upstrean anyway,
nowadays we have TASK_WAKING and task-state != 0 doesn't necessarily
mean it is TASK_RUNNING. It also makes sense to exclude the !TASK_REPORT
bits during the check. Finally, probably this patch makes sense anyway
even without utrace. For example, a stopped _and_ traced thread could
have task-state = TASK_STOPPED | TASK_TRACED, this can be useful.

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 kernel/sched.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 3f2e502..ade7997 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2277,7 +2277,7 @@ unsigned long wait_task_inactive(struct task_struct *p, 
long match_state)
 * is actually now running somewhere else!
 */
while (task_running(rq, p)) {
-   if (match_state  unlikely(p-state != match_state))
+   if (match_state  !likely(p-state  match_state))
return 0;
cpu_relax();
}
-- 
1.5.5.1




[PATCH 04/19] introduce wake_up_quiescent()

2011-06-30 Thread Oleg Nesterov
No functional changes. Add the new helper, wake_up_quiescent(task, state),
which simply returns wake_up_state(task, state). Change all callers which
do wake_up_state(STOPPED/TRACED) to use the new helper. ptrace_stop() is
a bit special, it does __set_current_state(RUNNING) in the very unlikely
case, change it as well.

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 include/linux/signal.h |2 ++
 kernel/ptrace.c|2 +-
 kernel/signal.c|   12 ++--
 kernel/utrace.c|2 +-
 4 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/include/linux/signal.h b/include/linux/signal.h
index a822300..2be3712 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -239,6 +239,8 @@ static inline int valid_signal(unsigned long sig)
 struct timespec;
 struct pt_regs;
 
+extern int wake_up_quiescent(struct task_struct *p, unsigned int state);
+
 extern int next_signal(struct sigpending *pending, sigset_t *mask);
 extern int do_send_sig_info(int sig, struct siginfo *info,
struct task_struct *p, bool group);
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 9988b13..26ae214 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -566,7 +566,7 @@ static int ptrace_resume(struct task_struct *child, long 
request,
child-exit_code = data;
 
if (lock_task_sighand(child, flags)) {
-   wake_up_state(child, __TASK_TRACED);
+   wake_up_quiescent(child, __TASK_TRACED);
unlock_task_sighand(child, flags);
}
 
diff --git a/kernel/signal.c b/kernel/signal.c
index 2138cee..4fcf1c7 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -652,6 +652,14 @@ void signal_wake_up(struct task_struct *t, int resume)
 }
 
 /*
+ * wakes up the STOPPED/TRACED task, must be called with -siglock held.
+ */
+int wake_up_quiescent(struct task_struct *p, unsigned int state)
+{
+   return wake_up_state(p, state);
+}
+
+/*
  * Remove signals in mask from the pending set and queue.
  * Returns 1 if any signals were found.
  *
@@ -811,7 +819,7 @@ static int prepare_signal(int sig, struct task_struct *p, 
int from_ancestor_ns)
do {
task_clear_group_stop_pending(t);
rm_from_queue(SIG_KERNEL_STOP_MASK, t-pending);
-   wake_up_state(t, __TASK_STOPPED);
+   wake_up_quiescent(t, __TASK_STOPPED);
} while_each_thread(p, t);
 
/*
@@ -1800,7 +1808,7 @@ static void ptrace_stop(int exit_code, int why, int 
clear_code, siginfo_t *info)
do_notify_parent_cldstop(current, false, why);
 
spin_lock_irq(current-sighand-siglock);
-   __set_current_state(TASK_RUNNING);
+   wake_up_quiescent(current, __TASK_TRACED);
spin_unlock_irq(current-sighand-siglock);
 
if (clear_code)
diff --git a/kernel/utrace.c b/kernel/utrace.c
index 6e7fafb..d7c547c 100644
--- a/kernel/utrace.c
+++ b/kernel/utrace.c
@@ -648,7 +648,7 @@ static void utrace_wakeup(struct task_struct *target, 
struct utrace *utrace)
 {
lockdep_assert_held(utrace-lock);
spin_lock_irq(target-sighand-siglock);
-   wake_up_state(target, __TASK_TRACED);
+   wake_up_quiescent(target, __TASK_TRACED);
spin_unlock_irq(target-sighand-siglock);
 }
 
-- 
1.5.5.1




[PATCH 09/19] tracehooks: kill tracehook_finish_jctl(), add tracehook_finish_stop()

2011-06-30 Thread Oleg Nesterov
tracehook_finish_jctl() is needed to avoid the races with SIGKILL
which wakes up UTRACED task, and thus it should be called every time
after the STOPPED/TRACED/UTRACED returns from schedule(), remember
that TASK_UTRACED can be added while the task is STOPPED/UTRACED.

- rename it to tracehook_finish_stop(),jctl no longer matches the
  reality.

- change do_signal_state() to call this helper right after schedule(),
  otherwise this logic is broken by the upstream changes

- now that utrace doesn't control TASK_TRACED bit, ptrace_stop() must
  call this helper too.

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 include/linux/tracehook.h |6 +++---
 kernel/signal.c   |5 +++--
 kernel/utrace.c   |2 +-
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 7d7bdde..3c7b6b3 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -528,11 +528,11 @@ static inline int tracehook_get_signal(struct task_struct 
*task,
 }
 
 /**
- * tracehook_finish_jctl - report about return from job control stop
+ * tracehook_finish_stop - report about return from STOPPED/TRACED
  *
- * This is called by do_signal_stop() after wakeup.
+ * This is called by do_signal_stop() and ptrace_stop after wakeup.
  */
-static inline void tracehook_finish_jctl(void)
+static inline void tracehook_finish_stop(void)
 {
if (task_utrace_flags(current))
utrace_finish_stop();
diff --git a/kernel/signal.c b/kernel/signal.c
index 4fcf1c7..a7979ad 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1816,6 +1816,7 @@ static void ptrace_stop(int exit_code, int why, int 
clear_code, siginfo_t *info)
read_unlock(tasklist_lock);
}
 
+   tracehook_finish_stop();
/*
 * While in TASK_TRACED, we were considered frozen enough.
 * Now that we woke up, it's crucial if we're supposed to be
@@ -1952,6 +1953,8 @@ retry:
/* Now we don't run again until woken by SIGCONT or SIGKILL */
schedule();
 
+   tracehook_finish_stop();
+
spin_lock_irq(current-sighand-siglock);
} else {
ptrace_stop(current-group_stop  GROUP_STOP_SIGMASK,
@@ -1974,8 +1977,6 @@ retry:
 
spin_unlock_irq(current-sighand-siglock);
 
-   tracehook_finish_jctl();
-
return 1;
 }
 
diff --git a/kernel/utrace.c b/kernel/utrace.c
index be98607..daa96b9 100644
--- a/kernel/utrace.c
+++ b/kernel/utrace.c
@@ -741,7 +741,7 @@ static bool utrace_reset(struct task_struct *task, struct 
utrace *utrace)
 void utrace_finish_stop(void)
 {
/*
-* If we were task_is_traced() and then SIGKILL'ed, make
+* If we were task_is_utraced() and then SIGKILL'ed, make
 * sure we do nothing until the tracer drops utrace-lock.
 */
if (unlikely(__fatal_signal_pending(current))) {
-- 
1.5.5.1




[PATCH 10/19] teach wake_up_quiescent() to do selective wake_up

2011-06-30 Thread Oleg Nesterov
Both utrace and ptrace can want the same thread to be quiescent, in this
case its state is TASK_TRACED | TASK_UTRACED. And this also means that
this task must not run unless both utrace and ptrace resume it.

Change wake_up_quiescent(p, state) to do p-state = ~state and return
false unless there is no more quiescent bits in task-state.

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 kernel/signal.c |   15 +++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index a7979ad..57552e6 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -651,11 +651,26 @@ void signal_wake_up(struct task_struct *t, int resume)
kick_process(t);
 }
 
+#define STATE_QUIESCENT(__TASK_STOPPED | __TASK_TRACED | 
__TASK_UTRACED)
 /*
  * wakes up the STOPPED/TRACED task, must be called with -siglock held.
  */
 int wake_up_quiescent(struct task_struct *p, unsigned int state)
 {
+   unsigned int quiescent = (p-state  STATE_QUIESCENT);
+
+   WARN_ON(state  ~(STATE_QUIESCENT | TASK_INTERRUPTIBLE));
+
+   if (quiescent) {
+   state = ~TASK_INTERRUPTIBLE;
+   if ((quiescent  ~state) != 0) {
+   p-state = ~state;
+   WARN_ON(!(p-state  STATE_QUIESCENT));
+   WARN_ON(!(p-state  TASK_WAKEKILL));
+   return 0;
+   }
+   }
+
return wake_up_state(p, state);
 }
 
-- 
1.5.5.1




[PATCH 11/19] ptrace_stop: do not assume the task is running after wake_up_quiescent()

2011-06-30 Thread Oleg Nesterov
If ptrace_stop() sets TASK_TRACED and then detects we should not stop,
it can race with utrace_do_stop() which can see TASK_TRACED and add
TASK_UTRACED. In this case we should stop for utrace needs.

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 kernel/signal.c |8 
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 57552e6..89e691d 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1829,6 +1829,14 @@ static void ptrace_stop(int exit_code, int why, int 
clear_code, siginfo_t *info)
if (clear_code)
current-exit_code = 0;
read_unlock(tasklist_lock);
+
+   /*
+* It is possible that __TASK_UTRACED was added by utrace
+* while we were __TASK_TRACED and before we take -siglock
+* for wake_up_quiescent(), we need to block in this case.
+* Otherwise this is unnecessary but absolutely harmless.
+*/
+   schedule();
}
 
tracehook_finish_stop();
-- 
1.5.5.1




[PATCH 17/19] teach ptrace_set_syscall_trace() to play well with utrace

2011-06-30 Thread Oleg Nesterov
1. ptrace_set_syscall_trace(true)-set_tsk_thread_flag(TIF_SYSCALL_TRACE)
   can race with utrace_control()-utrace_reset() path which can miss
   PT_SYSCALL_TRACE and clear TIF_SYSCALL_TRACE after it was already set.

2. ptrace_set_syscall_trace(false) clears TIF_SYSCALL_TRACE and this is
   not utrace-friendly, it can need this flag.

Change ptrace_set_syscall_trace() to take task_utrace_lock(), this is
enough to fix the 1st problem. Check task_utrace_flags() before clearing
TIF_SYSCALL_TRACE, this fixes 2.

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 kernel/ptrace.c |6 +-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 0825a01..209ea2d 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -23,6 +23,7 @@
 #include linux/uaccess.h
 #include linux/regset.h
 #include linux/hw_breakpoint.h
+#include linux/utrace.h
 
 static void ptrace_signal_wake_up(struct task_struct *p, int quiescent)
 {
@@ -39,13 +40,16 @@ static void ptrace_signal_wake_up(struct task_struct *p, 
int quiescent)
 
 static void ptrace_set_syscall_trace(struct task_struct *p, bool on)
 {
+   task_utrace_lock(p);
if (on) {
p-ptrace |= PT_SYSCALL_TRACE;
set_tsk_thread_flag(p, TIF_SYSCALL_TRACE);
} else {
p-ptrace = ~PT_SYSCALL_TRACE;
-   clear_tsk_thread_flag(p, TIF_SYSCALL_TRACE);
+   if (!(task_utrace_flags(p)  UTRACE_EVENT_SYSCALL))
+   clear_tsk_thread_flag(p, TIF_SYSCALL_TRACE);
}
+   task_utrace_unlock(p);
 }
 
 /*
-- 
1.5.5.1




[PATCH 18/19] introduce PT_SINGLE_STEP and PT_SINGLE_BLOCK

2011-06-30 Thread Oleg Nesterov
Add the new internal ptrace flags, PT_SINGLE_STEP and PT_SINGLE_BLOCK.

Like PT_SYSCALL_TRACE, this is needed to avoid the unnecessary ptrace
reports when TIF_SINGLESTEP was set by another engine, not by ptrace.
Also, we need these bits to coordinate the user_*_single_step() calls
from ptrace and utrace.

TODO: update the !x86 ptrace code which does user_disable_single_step().

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 arch/x86/kernel/ptrace.c  |1 +
 include/linux/ptrace.h|5 -
 include/linux/tracehook.h |7 +--
 kernel/ptrace.c   |3 +++
 4 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 807c2a2..7ab475f 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -807,6 +807,7 @@ static int ioperm_get(struct task_struct *target,
  */
 void ptrace_disable(struct task_struct *child)
 {
+   child-ptrace = ~(PT_SINGLE_STEP | PT_SINGLE_BLOCK);
user_disable_single_step(child);
 #ifdef TIF_SYSCALL_EMU
clear_tsk_thread_flag(child, TIF_SYSCALL_EMU);
diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 98d995d..65b1e4f 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -91,6 +91,8 @@
 #define PT_TRACE_MASK  0x03f4
 
 #define PT_SYSCALL_TRACE   0x0001
+#define PT_SINGLE_STEP 0x0002
+#define PT_SINGLE_BLOCK0x0004
 
 /* single stepping state bits (used on ARM and PA-RISC) */
 #define PT_SINGLESTEP_BIT  31
@@ -189,7 +191,8 @@ static inline void ptrace_init_task(struct task_struct 
*child, bool ptrace)
child-ptrace = 0;
if (unlikely(ptrace)  (current-ptrace  PT_PTRACED)) {
child-ptrace = current-ptrace;
-   child-ptrace = ~PT_SYSCALL_TRACE;
+   child-ptrace =
+   ~(PT_SYSCALL_TRACE | PT_SINGLE_STEP | PT_SINGLE_BLOCK);
__ptrace_link(child, current-parent);
}
 
diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 6ce7a37..06edb52 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -121,6 +121,9 @@ static inline __must_check int 
tracehook_report_syscall_entry(
return 0;
 }
 
+#define ptrace_wants_step()\
+   (current-ptrace  (PT_SINGLE_STEP | PT_SINGLE_BLOCK))
+
 /**
  * tracehook_report_syscall_exit - task has just finished a system call
  * @regs:  user register state of current task
@@ -143,7 +146,7 @@ static inline void tracehook_report_syscall_exit(struct 
pt_regs *regs, int step)
if (task_utrace_flags(current)  UTRACE_EVENT(SYSCALL_EXIT))
utrace_report_syscall_exit(regs);
 
-   if (step) {
+   if (step  ptrace_wants_step()) {
siginfo_t info;
user_single_step_siginfo(current, regs, info);
force_sig_info(SIGTRAP, info, current);
@@ -436,7 +439,7 @@ static inline void tracehook_signal_handler(int sig, 
siginfo_t *info,
 {
if (task_utrace_flags(current))
utrace_signal_handler(current, stepping);
-   if (stepping)
+   if (stepping  ptrace_wants_step())
ptrace_notify(SIGTRAP);
 }
 
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 209ea2d..44908d0 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -575,13 +575,16 @@ static int ptrace_resume(struct task_struct *child, long 
request,
clear_tsk_thread_flag(child, TIF_SYSCALL_EMU);
 #endif
 
+   child-ptrace = ~(PT_SINGLE_STEP | PT_SINGLE_BLOCK);
if (is_singleblock(request)) {
if (unlikely(!arch_has_block_step()))
return -EIO;
+   child-ptrace |= PT_SINGLE_BLOCK;
user_enable_block_step(child);
} else if (is_singlestep(request) || is_sysemu_singlestep(request)) {
if (unlikely(!arch_has_single_step()))
return -EIO;
+   child-ptrace |= PT_SINGLE_STEP;
user_enable_single_step(child);
} else {
user_disable_single_step(child);
-- 
1.5.5.1




[PATCH 12/19] get_signal_to_deliver: restructure utrace/ptrace signal reporting

2011-06-30 Thread Oleg Nesterov
get_signal_to_deliver() assumes that either tracehook_get_signal() does
nothing (without CONFIG_UTRACE), or it also reports the signal to ptrace
engine implemented on top of utrace. Now that ptrace works independently
this doesn't work.

Change the code to call ptrace_signal() after tracehook_get_signal().

Move -ptrace check from ptrace_signal() to get_signal_to_deliver(),
we do not want to change *return_ka if it was initialized by utrace
and the task is not traced.

IOW, roughly, ptrace acts as if it is the last attached engine, it
takes the final decision about the signal.

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 kernel/signal.c |   24 +++-
 1 files changed, 11 insertions(+), 13 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 89e691d..d0e0c67 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2006,9 +2006,6 @@ retry:
 static int ptrace_signal(int signr, siginfo_t *info,
 struct pt_regs *regs, void *cookie)
 {
-   if (!task_ptrace(current))
-   return signr;
-
ptrace_signal_deliver(regs, cookie);
 
/* Let the debugger run.  */
@@ -2110,6 +2107,7 @@ relock:
signr = tracehook_get_signal(current, regs, info, return_ka);
if (unlikely(signr  0))
goto relock;
+
if (unlikely(signr != 0))
ka = return_ka;
else {
@@ -2117,18 +2115,18 @@ relock:
 GROUP_STOP_PENDING)  do_signal_stop(0))
goto relock;
 
-   signr = dequeue_signal(current, current-blocked,
-  info);
+   signr = dequeue_signal(current, current-blocked, 
info);
 
-   if (!signr)
-   break; /* will return 0 */
+   ka = sighand-action[signr-1];
+   }
 
-   if (signr != SIGKILL) {
-   signr = ptrace_signal(signr, info,
- regs, cookie);
-   if (!signr)
-   continue;
-   }
+   if (!signr)
+   break; /* will return 0 */
+
+   if (signr != SIGKILL  current-ptrace) {
+   signr = ptrace_signal(signr, info, regs, cookie);
+   if (!signr)
+   continue;
 
ka = sighand-action[signr-1];
}
-- 
1.5.5.1




Re: [PATCH 1/4] ptrace: temporary revert the recent ptrace/jobctl rework

2011-06-30 Thread Oleg Nesterov
On 06/21, Roland McGrath wrote:

  OK, so here's my (hacky) idea:
  (1) Forget ptrace-via-utrace.  Have utrace be a separate thing.  This
  way the recent ptrace changes won't matter.

This is what V2 does.

  (2) But, what about ptrace co-existing well with utrace?  Make them
  mutually exclusive - a ptraced-process can't be utraced and a
  utraced-process can't be ptraced.

 We had this situation before for a while.  It has the substantial downside
 that e.g. you cannot do any system-wide systemtap tracing without making
 all strace and gdb use impossible.

Yes, we can't make them mutually exclusive, this can't work. So V2 tries
to teach them play well together.

  Assuming the above is a semi-reasonable idea, it might be a lot less
  work than updating the ptrace-via-utrace code to handle the new ptrace
  changes.

 That's for Oleg to say.  (Sorry, Oleg. ;-)

Oh, I am not sure what is simpler ;)

Oleg.



[PATCH 0/4] utrace for 3.0 kernel

2011-06-20 Thread Oleg Nesterov
Hello.

Utrace patches for 3.0 kernel

0001-ptrace-temporary-revert-the-recent-ptrace-jobctl-re.patch
0002-tracehooks-preparation-for-ptrace-utrace.patch
0003-utrace-core.patch
0004-implement-utrace-ptrace.patch

also available in the following git branch

git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc.git utrace-3.0

Note: 1/4 is the temporary hack to keep utrace working, we need to
completely rework ptrace/utrace interaction.

Oleg.



[PATCH 1/4] ptrace: temporary revert the recent ptrace/jobctl rework

2011-06-20 Thread Oleg Nesterov
Temporary revert the following patches to keep utrace/utrace-ptrace working:

40ae717d1e78d982bd469b2013a4cbf4ec1ca434
ptrace: fix signal-wait_chldexit usage in 
task_clear_group_stop_trapping()

321fb561971ba0f10ce18c0f8a4b9fbfc7cef4b9
ptrace: ptrace_check_attach() should not do s/STOPPED/TRACED/

ee77f075921730b2b465880f9fd4367003bdab39
signal: Turn SIGNAL_STOP_DEQUEUED into GROUP_STOP_DEQUEUED

780006eac2fe7f4d2582da16a096e5a44c4767ff
signal: do_signal_stop: Remove the unneeded 
task_clear_group_stop_pending()

244056f9dbbc6dc4126a301c745fa3dd67d8af3c
job control: Don't send duplicate job control stop notification while 
ptraced

ceb6bd67f9b9db765e1c29405f26e8460391badd
job control: Notify the real parent of job control events regardless of 
ptrace

62bcf9d992ecc19ea4f37ff57ee0be3e843e
job control: Job control stop notifications should always go to the 
real parent

75b95953a56969a990e6ce154b260be83818fe71
job control: Add @for_ptrace to do_notify_parent_cldstop()

45cb24a1da53beb70f09efccc0373f6a47a9efe0
job control: Allow access to job control events through ptracees

9b84cca2564b9a5b2d064fb44d2a55a5b44473a0
job control: Fix ptracer wait(2) hang and explain notask_error clearing

408a37de6c95832a4880a88a742f89f0cc554d06
job control: Don't set group_stop exit_code if re-entering job control 
stop

0e9f0a4abfd80f8adca624538d479d95159b16d8
ptrace: Always put ptracee into appropriate execution state

e3bd058f62896ec7a2c605ed62a0a811e9147947
ptrace: Collapse ptrace_untrace() into __ptrace_unlink()

d79fdd6d96f46fabb779d86332e3677c6f5c2a4f
ptrace: Clean transitions between TASK_STOPPED and TRACED

5224fa3660ad3881d2f2ad726d22614117963f10
ptrace: Make do_signal_stop() use ptrace_stop() if the task is being 
ptraced

0ae8ce1c8c5b9007ce6bfc83ec2aa0dfce5bbed3
ptrace: Participate in group stop from ptrace_stop() iff the task is 
trapping for group stop

39efa3ef3a376a4e53de2f82fc91182459d34200
signal: Use GROUP_STOP_PENDING to stop once for a single group stop

e5c1902e9260a0075ea52cb5ef627a8d9aaede89
signal: Fix premature completion of group stop when interfered by ptrace

fe1bc6a0954611b806f9e158eb0817cf8ba21660
ptrace: Add @why to ptrace_stop()

edf2ed153bcae52de70db00a98b0e81a5668e563
ptrace: Kill tracehook_notify_jctl()

This obviously reverts some user-visible fixes, but the fixed problems
are very old and minor, they were never reported. In the long term we
need another solution.

Signed-off-by: Oleg Nesterov o...@redhat.com
---
 fs/exec.c |1 -
 include/linux/sched.h |   17 +--
 include/linux/tracehook.h |   27 
 kernel/exit.c |   77 ++-
 kernel/ptrace.c   |  116 +---
 kernel/signal.c   |  339 +
 6 files changed, 148 insertions(+), 429 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 6075a1e..82b5379 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1769,7 +1769,6 @@ static int zap_process(struct task_struct *start, int 
exit_code)
 
t = start;
do {
-   task_clear_group_stop_pending(t);
if (t != current  t-mm) {
sigaddset(t-pending.signal, SIGKILL);
signal_wake_up(t, 1);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index a837b20..6c42e24 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -663,8 +663,9 @@ struct signal_struct {
  * Bits in flags field of signal_struct.
  */
 #define SIGNAL_STOP_STOPPED0x0001 /* job control stop in effect */
-#define SIGNAL_STOP_CONTINUED  0x0002 /* SIGCONT since WCONTINUED reap */
-#define SIGNAL_GROUP_EXIT  0x0004 /* group exit in progress */
+#define SIGNAL_STOP_DEQUEUED   0x0002 /* stop signal dequeued */
+#define SIGNAL_STOP_CONTINUED  0x0004 /* SIGCONT since WCONTINUED reap */
+#define SIGNAL_GROUP_EXIT  0x0008 /* group exit in progress */
 /*
  * Pending notifications to parent.
  */
@@ -1283,7 +1284,6 @@ struct task_struct {
int exit_state;
int exit_code, exit_signal;
int pdeath_signal;  /*  The signal sent when the parent dies  */
-   unsigned int group_stop;/* GROUP_STOP_*, siglock protected */
/* ??? */
unsigned int personality;
unsigned did_exec:1;
@@ -1803,17 +1803,6 @@ extern void thread_group_times(struct task_struct *p, 
cputime_t *ut, cputime_t *
 #define tsk_used_math(p) ((p)-flags  PF_USED_MATH)
 #define used_math() tsk_used_math(current)
 
-/*
- * task-group_stop flags
- */
-#define GROUP_STOP_SIGMASK 0x/* signr of the last group stop */
-#define GROUP_STOP_PENDING (1  16) /* task should

[PATCH 4/4] implement utrace-ptrace

2011-06-20 Thread Oleg Nesterov
The patch adds the new file, kernel/ptrace-utrace.c, which contains
the new implementation of ptrace over utrace.

It's supposed to be an invisible implementation change, nothing should
change to userland when CONFIG_UTRACE is enabled.

Signed-off-by: Roland McGrath rol...@redhat.com
Signed-off-by: Oleg Nesterov o...@redhat.com
---
 include/linux/ptrace.h |3 +
 kernel/Makefile|1 +
 kernel/ptrace-utrace.c | 1173 
 kernel/ptrace.c|  638 +-
 kernel/utrace.c|   16 +
 5 files changed, 1513 insertions(+), 318 deletions(-)
 create mode 100644 kernel/ptrace-utrace.c

diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 446ed8f..5b0562e 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -101,6 +101,9 @@
 
 extern bool __ptrace_detach(struct task_struct *tracer,
struct task_struct *tracee);
+extern int ptrace_traceme(void);
+extern int ptrace_attach(struct task_struct *tsk);
+extern void ptrace_notify_stop(struct task_struct *tracee);
 
 extern long arch_ptrace(struct task_struct *child, long request,
unsigned long addr, unsigned long data);
diff --git a/kernel/Makefile b/kernel/Makefile
index 4a22e81..5c280dc 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -69,6 +69,7 @@ obj-$(CONFIG_RESOURCE_COUNTERS) += res_counter.o
 obj-$(CONFIG_SMP) += stop_machine.o
 obj-$(CONFIG_KPROBES_SANITY_TEST) += test_kprobes.o
 obj-$(CONFIG_UTRACE) += utrace.o
+obj-$(CONFIG_UTRACE) += ptrace-utrace.o
 obj-$(CONFIG_AUDIT) += audit.o auditfilter.o
 obj-$(CONFIG_AUDITSYSCALL) += auditsc.o
 obj-$(CONFIG_AUDIT_WATCH) += audit_watch.o
diff --git a/kernel/ptrace-utrace.c b/kernel/ptrace-utrace.c
new file mode 100644
index 000..7a9b396
--- /dev/null
+++ b/kernel/ptrace-utrace.c
@@ -0,0 +1,1173 @@
+/*
+ * linux/kernel/ptrace.c
+ *
+ * (C) Copyright 1999 Linus Torvalds
+ *
+ * Common interfaces for ptrace() which we do not want
+ * to continually duplicate across every architecture.
+ */
+
+#include linux/capability.h
+#include linux/module.h
+#include linux/sched.h
+#include linux/errno.h
+#include linux/mm.h
+#include linux/highmem.h
+#include linux/pagemap.h
+#include linux/ptrace.h
+#include linux/utrace.h
+#include linux/security.h
+#include linux/signal.h
+#include linux/audit.h
+#include linux/pid_namespace.h
+#include linux/syscalls.h
+#include linux/uaccess.h
+
+/*
+ * unptrace a task: move it back to its original parent and
+ * remove it from the ptrace list.
+ *
+ * Must be called with the tasklist lock write-held.
+ */
+void __ptrace_unlink(struct task_struct *child)
+{
+   BUG_ON(!child-ptrace);
+
+   child-ptrace = 0;
+   child-parent = child-real_parent;
+   list_del_init(child-ptrace_entry);
+}
+
+struct ptrace_context {
+   int options;
+
+   int signr;
+   siginfo_t   *siginfo;
+
+   int stop_code;
+   unsigned long   eventmsg;
+
+   enum utrace_resume_action   resume;
+};
+
+#define PT_UTRACED 0x1000
+
+#define PTRACE_O_SYSEMU0x100
+#define PTRACE_O_DETACHED  0x200
+
+#define PTRACE_EVENT_SYSCALL   (1  16)
+#define PTRACE_EVENT_SIGTRAP   (2  16)
+#define PTRACE_EVENT_SIGNAL(3  16)
+/* events visible to user-space */
+#define PTRACE_EVENT_MASK  0x
+
+static inline bool ptrace_event_pending(struct ptrace_context *ctx)
+{
+   return ctx-stop_code != 0;
+}
+
+static inline int get_stop_event(struct ptrace_context *ctx)
+{
+   return ctx-stop_code  8;
+}
+
+static inline void set_stop_code(struct ptrace_context *ctx, int event)
+{
+   ctx-stop_code = (event  8) | SIGTRAP;
+}
+
+static inline struct ptrace_context *
+ptrace_context(struct utrace_engine *engine)
+{
+   return engine-data;
+}
+
+static const struct utrace_engine_ops ptrace_utrace_ops; /* forward decl */
+
+static struct utrace_engine *ptrace_lookup_engine(struct task_struct *tracee)
+{
+   return utrace_attach_task(tracee, UTRACE_ATTACH_MATCH_OPS,
+   ptrace_utrace_ops, NULL);
+}
+
+static int utrace_barrier_uninterruptible(struct task_struct *target,
+   struct utrace_engine *engine)
+{
+   for (;;) {
+   int err = utrace_barrier(target, engine);
+
+   if (err != -ERESTARTSYS)
+   return err;
+
+   schedule_timeout_uninterruptible(1);
+   }
+}
+
+static struct utrace_engine *
+ptrace_reuse_engine(struct task_struct *tracee)
+{
+   struct utrace_engine *engine;
+   struct ptrace_context *ctx;
+   int err = -EPERM;
+
+   engine = ptrace_lookup_engine(tracee);
+   if (IS_ERR(engine))
+   return engine;
+
+   ctx

Re: [PATCH 1/4] ptrace: temporary revert the recent ptrace/jobctl rework

2011-06-20 Thread Oleg Nesterov
On 06/20, Kyle McMartin wrote:

 On Mon, Jun 20, 2011 at 06:07:44PM +0200, Oleg Nesterov wrote:
  Temporary revert the following patches to keep utrace/utrace-ptrace working:
 
  huge list of patches here
 
  This obviously reverts some user-visible fixes, but the fixed problems
  are very old and minor, they were never reported. In the long term we
  need another solution.
 

 Dude, that's just not acceptable, that's way too much offset to deal with
 against upstream,

Yes, this reverts 20 patches. But they only touch the ptrace/stop paths,
there won't be more changes in this area until 3.1.

 especially since it's looking like uprobes will get
 merged in 3.1...

Probably yes.

In any case, this series should be dropped when fedora switches to 3.1.
I'll try to do something more clever for 3.1 if utrace is still needed.
Until then we need something for systemtap...

Oleg.



[PATCH 0/5] ptrace-utrace: fix exit_ptrace() vs ptrace_report_signal() races

2010-12-10 Thread Oleg Nesterov
In short: exit_ptrace()-ptrace_detach_task() is very wrong when it
tries to detach the !stopped tracee, we can not trust get_stop_event()
in this case.

This means that in the case like

ptrace(PTRACE_CONT, ..., SIGXXX);
exit(); // --- calls ptrace_detach_task()

the tracee can miss SIGXXX if ptrace_detach_task() does
utrace_control(UTRACE_DETACH) before the tracee calls -report_signal().

5/5 is the actual fix. 1-4 are preparations to simplify the review
and document the changes.

Oleg.



[PATCH 3/5] ptrace-utrace: don't assume resume != UTRACE_RESUME means stepping

2010-12-10 Thread Oleg Nesterov
No functional changes. Currently ptrace_report_syscall_exit() and
ptrace_report_signal(UTRACE_SIGNAL_HANDLER) can see either UTRACE_RESUME
or UTRACE_BLOCKSTEP/UTRACE_SINGLESTEP, but we are going to change this.
Change the code to not assume that resume != UTRACE_RESUME means stepping.
Add the trivial helper, is_step_resume(), which does the check.

Signed-off-by: Oleg Nesterov o...@redhat.com
---

 kernel/ptrace-utrace.c |   15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

--- kstub/kernel/ptrace-utrace.c~3_resume_step_helper   2010-09-20 
03:53:31.0 +0200
+++ kstub/kernel/ptrace-utrace.c2010-09-20 03:53:31.0 +0200
@@ -418,6 +418,11 @@ static u32 ptrace_report_syscall_entry(u
return UTRACE_SYSCALL_RUN | UTRACE_STOP;
 }
 
+static inline bool is_step_resume(enum utrace_resume_action resume)
+{
+   return resume == UTRACE_BLOCKSTEP || resume == UTRACE_SINGLESTEP;
+}
+
 static u32 ptrace_report_syscall_exit(u32 action, struct utrace_engine *engine,
  struct pt_regs *regs)
 {
@@ -426,10 +431,7 @@ static u32 ptrace_report_syscall_exit(u3
if (ptrace_event_pending(ctx))
return UTRACE_STOP;
 
-   if (ctx-resume != UTRACE_RESUME) {
-   WARN_ON(ctx-resume != UTRACE_BLOCKSTEP 
-   ctx-resume != UTRACE_SINGLESTEP);
-
+   if (is_step_resume(ctx-resume)) {
ctx-signr = SIGTRAP;
return UTRACE_INTERRUPT;
}
@@ -517,10 +519,7 @@ static u32 ptrace_report_signal(u32 acti
if (WARN_ON(ctx-siginfo))
ctx-siginfo = NULL;
 
-   if (resume != UTRACE_RESUME) {
-   WARN_ON(resume != UTRACE_BLOCKSTEP 
-   resume != UTRACE_SINGLESTEP);
-
+   if (is_step_resume(resume)) {
set_stop_code(ctx, PTRACE_EVENT_SIGTRAP);
return UTRACE_STOP | UTRACE_SIGNAL_IGN;
}



[PATCH 5/5] ptrace-utrace: fix exit_ptrace() vs ptrace_report_signal() races

2010-12-10 Thread Oleg Nesterov
Finally, the actual fix.

ptrace_detach_task(sig = -1) is very buggy. Somehow I completely forgot
that implicit detach can race with the running tracee. Depending on how
exactly it races with ptrace_report_signal() we can have the following
problems:

1) If the tracer exits right after ptrace(PTRACE_CONT, SIGXXX)
   the tracee can miss this signal.

2) the tracee can report a signal and stop after it was already
   detached.

Change ptrace_detach_task(sig = -1) to set ctx-resume = UTRACE_DETACH
before anything else, change ptrace_report_signal() to check ctx-resume
before reporting a signal. This fixes the 2nd problem.

To fix the 1st problem, change !explicit case to check -siginfo != NULL
instead of relying on get_stop_event() which can't work with the running
tracee.

Reported-by: Glen Johnson bugpr...@us.ibm.com
Signed-off-by: Oleg Nesterov o...@redhat.com
---

 kernel/ptrace-utrace.c |   41 ++---
 1 file changed, 34 insertions(+), 7 deletions(-)

--- kstub/kernel/ptrace-utrace.c~5_implicit_detach_races2010-09-20 
03:53:32.0 +0200
+++ kstub/kernel/ptrace-utrace.c2010-09-20 03:53:32.0 +0200
@@ -262,7 +262,7 @@ static void ptrace_detach_task(struct ta
 * If true, the caller is PTRACE_DETACH, otherwise
 * the tracer detaches implicitly during exit.
 */
-   bool voluntary = (sig = 0);
+   bool explicit = (sig = 0);
struct utrace_engine *engine = ptrace_lookup_engine(tracee);
enum utrace_resume_action action = UTRACE_DETACH;
struct ptrace_context *ctx;
@@ -272,24 +272,46 @@ static void ptrace_detach_task(struct ta
 
ctx = ptrace_context(engine);
 
-   if (sig) {
+   if (!explicit) {
+   int err;
 
+   /*
+* We are going to detach, the tracee can be running.
+* Ensure ptrace_report_signal() won't report a signal.
+*/
+   ctx-resume = UTRACE_DETACH;
+   err = utrace_barrier_uninterruptible(tracee, engine);
+
+   if (!err  ctx-siginfo) {
+   /*
+* The tracee has already reported a signal
+* before utrace_barrier().
+*
+* Resume it like we do in PTRACE_EVENT_SIGNAL
+* case below. The difference is that we can race
+* with ptrace_report_signal() if the tracee is
+* running but this doesn't matter. In any case
+* UTRACE_SIGNAL_REPORT must be pending and it
+* can return nothing but UTRACE_DETACH.
+*/
+   action = UTRACE_RESUME;
+   }
+
+   } else if (sig) {
switch (get_stop_event(ctx)) {
case PTRACE_EVENT_SYSCALL:
-   if (voluntary)
-   send_sig_info(sig, SEND_SIG_PRIV, tracee);
+   send_sig_info(sig, SEND_SIG_PRIV, tracee);
break;
 
case PTRACE_EVENT_SIGNAL:
-   if (voluntary)
-   ctx-signr = sig;
+   ctx-signr = sig;
ctx-resume = UTRACE_DETACH;
action = UTRACE_RESUME;
break;
}
}
 
-   ptrace_wake_up(tracee, engine, action, voluntary);
+   ptrace_wake_up(tracee, engine, action, explicit);
 
if (action != UTRACE_DETACH)
ctx-options = PTRACE_O_DETACHED;
@@ -553,6 +575,11 @@ static u32 ptrace_report_signal(u32 acti
}
 
WARN_ON(ctx-siginfo);
+
+   /* Raced with the exiting tracer ? */
+   if (resume == UTRACE_DETACH)
+   return action;
+
ctx-siginfo = info;
/*
 * ctx-siginfo points to the caller's stack.



Re: [PATCH] utrace_barrier(detached_engine) must not sping without checking -reporting

2010-10-25 Thread Oleg Nesterov
On 10/22, Oleg Nesterov wrote:

 On 10/19, Roland McGrath wrote:
 
   Yes. But, with this patch, in this case
   utrace_barrier()-get_utrace_lock(attached = false) returns success
   and then we check utrace-reporting == engine.
 
  I guess that's OK if -reporting == engine is always set when any kind of
  callback might be in progress.

 I think utrace_maybe_reap() is the only exception,

   (Hmm. Probably utrace-reap = T case needs more attention,
utrace_maybe_reap() doesn't set utrace-reporting).
 
  That would be a problem.

 Yes, but the current code has the same problem? Hmm, I need to
 recheck this all once again tomorrow. Suddenly I started to suspect
 that even -ops == NULL case is not right. And the more I read this
 code now, the more I confused.

Yes. And get_utrace_lock() must not succeed if utrace-reap == T
c93fecc9 makes things worse. With this patch utrace_barrier() can
return -ESRCH while -report_death() or report_reap() is in progress.

But even before that comment, I do not think that !engine-ops check
can prevent the race with -report_reap(), we need at least mb()
in utrace_maybe_reap() before engine-ops = NULL.

 But in any case. If engine-ops == utrace_detached_ops and
 utrace-reporting != engine, I do not see why utrace_barrier() can't
 return success, even if utrace-reap is set.

No! it must not return success! I still think it should not spin if
-reporting != engine, but it should not return zero. Somehow I forgot
that utrace_barrier() != reporting != engine.

This means that my patch is very wrong.

Oleg.



Re: [PATCH] utrace: utrace_reset() should clear TIF_SINGLESTEP if no more engines

2010-10-22 Thread Oleg Nesterov
On 10/22, Roland McGrath wrote:

  Hmm. If task is not stopped then it is current (except
  utrace_control(DETACH) can play with the dying task).

 Right, asynchronous detach was the problematic case I was concerned with.

but asynchronous detach doesn't do utrace_reset(), unless the tracee
is stopped or exiting (-exit_state != 0).

 But the whole problem we have is that we aren't getting to
 that path when we've done a detach, right?

Confused. We already discussed this before,

 OK, I'll do some testing and resend right now. In UTRACE_DETACH case
 reset can be true but the tracee is running. But I don't think it
 makes sense to check target-exit_state == 0, correct?

I think it's OK if it's running with -exit_state set (or even with just
PF_EXITING set), i.e. already in kernel mode and never going back to
user mode.  (Then it's essentially equivalent to calling user_*_step
while racing with a SIGKILL death, which has to be OK.)  Any other kind
of running would not be OK.

Probably we misunderstood each other. In any case I agree, that patch
was not good.

Oleg.



Re: utrace unwanted traps

2010-10-21 Thread Oleg Nesterov
On 10/19, Roland McGrath wrote:

   But that's how it used to be, and then we changed it.
   So it merits digging up our old discussion threads that led to doing that
   and see what all we had in mind then.
 
  I tried to grep my mail archives and google, but failed to find anything
  related.

 Based on the date of the commit you just mentioned, I browsed in:
   https://www.redhat.com/archives/utrace-devel/2009-October/thread.html
 and saw:
   https://www.redhat.com/archives/utrace-devel/2009-October/msg00287.html
 which is the day before that commit.  I didn't read through all that to
 refamiliarize myself with all the details.

Yes, I saw this thread, but didn't read it carefully. When I read it again,
I started to believe that yes, it can explain the current -resume logic.

However, I think this is false alarm. With a lot of efforts, I seem to
fully understand our discussion.

In short: I believe we never discussed why utrace-resume should be
*STEP, especially if task is stopped or it is going to stop. Although
I recall I thought this is obviously nice when I tried to review
utrace: sticky resume action.


As for this thread. Yes, I tried to suggest the (ugly)
utrace-set_singlestep which should be checked after wakeup. But my
motivation was quite different, what I was trying to preserve was
TIF_SINGLESTEP, not resume_action. This is why the top email says
I no longer think utrace_control() should just turn SINGLESTEP into
REPORT silently.


Please recall that, by that time

1. utrace_control(SINGLESTEP) called user_enable_single_step()

2. To handle the stepping over syscall correctly, both ptrace
   and ptrace-utrace relied on syscall_trace_leave() paths
   which checked TIF_SINGLESTEP to send SIGTRAP if needed.

1 was wrong, user_enable_single_step() was called under utrace-lock.
It was decided that the tracee itself should call this helper before
it returns to user-mode. To implement this, we could change
utrace_control(SINGLESTEP) to set TIF_NOTIFY_RESUME, so that
-report_quiesce() could return UTRACE_SINGLESTEP.

However, this breaks 2, the stepping over syscall. By the time
-report_quiesce() is called we already passed syscall_trace_leave()
without TIF_SINGLESTEP, and -report_quiesce() can't just return
UTRACE_SINGLESTEP but otoh it can't know that we should synthesize
a trap before return to user-mode.


So. I do not think there is any reason to ever keep UTRACE_*STEP in
utrace-resume. Except, if utrace_control(STEP) sets -resume = STEP
before wakeup, we avoid the unnecessary reporting loop in utrace_resume.
But at the same time, this leads to the problems we are discussing.


I even tested the kernel with the patch below, everything _seems_
to work (not that I think this can proves something).

What do you think?

Oleg.

--- kstub/kernel/utrace.c~XXX_RESUME2010-10-20 18:18:40.0 +0200
+++ kstub/kernel/utrace.c   2010-10-21 18:48:52.0 +0200
@@ -768,11 +768,13 @@ relock:
/*
 * Ensure a reporting pass when we're resumed.
 */
-   utrace-resume = action;
if (action == UTRACE_INTERRUPT)
set_thread_flag(TIF_SIGPENDING);
-   else
+   else {
+   action = UTRACE_REPORT;
set_thread_flag(TIF_NOTIFY_RESUME);
+   }
+   utrace-resume = action;
}
 
/*
@@ -1195,6 +1197,7 @@ int utrace_control(struct task_struct *t
 * In that case, utrace_get_signal() will be reporting soon.
 */
clear_engine_wants_stop(engine);
+   action = UTRACE_REPORT;
if (action  utrace-resume) {
utrace-resume = action;
set_notify_resume(target);
@@ -1381,11 +1384,13 @@ static void finish_report(struct task_st
 
if (resume  utrace-resume) {
spin_lock(utrace-lock);
-   utrace-resume = resume;
if (resume == UTRACE_INTERRUPT)
set_tsk_thread_flag(task, TIF_SIGPENDING);
-   else
+   else {
+   resume = UTRACE_REPORT;
set_tsk_thread_flag(task, TIF_NOTIFY_RESUME);
+   }
+   utrace-resume = resume;
spin_unlock(utrace-lock);
}
 



Re: utrace unwanted traps

2010-10-19 Thread Oleg Nesterov
On 10/14, Roland McGrath wrote:

  In short: I no longer understand why utrace-resume can be UTRACE_*STEP
  when the tracee is stopped, and in fact I think this is not right.

 I think we didn't have that originally.

Yes.

 The rationale that I remember is
 the idea that you should be able to implement a debugger interface feature
 like PTRACE_SINGLESTEP without noticeably more overhead than it had in
 pre-utrace ptrace.  That is, in vanilla ptrace, the tracer thread calls
 user_enable_single_step itself and then just lets the tracee resume
 directly to user mode.  We wanted to avoid always having an additional trip
 through tracehook_notify_resume and the utrace reporting loop and then back
 through the return-to-user assembly path a second time, between the tracee
 waking up and actually going to user mode.

Probably, I am not sure. I can't recall the previous discussions. IIRC,
the main problem with user_enable_single_step() was: we wanted to move
the callsite from tracer to tracee by many reasons. I can't recall if
avoid another reporting loop was the target, and additional trip
through tracehook_notify_resume() is not avoidable anyway.

Afaics, this was introduced by 26fefca955cc7c3c83014be2e10b773209d32eea
utrace: sticky resume action. Before that patch the stopped tracee
could only remember the pending UTRACE_REPORT or UTRACE_INTERRUPT.

 Right.  Said another way, any time an engine that previously used
 UTRACE_*STEP now uses UTRACE_RESUME, we must degrade utrace-resume back to
 UTRACE_REPORT at most.  Since we don't keep track of which engine it was
 that put UTRACE_*STEP into utrace-resume before, we'd have to just always
 degrade it any time anybody uses UTRACE_RESUME.

Agreed.

 That has almost the whole effect of punting utrace-resume back to just a
 utrace-report flag.

(see below)

 But that's how it used to be, and then we changed it.
 So it merits digging up our old discussion threads that led to doing that
 and see what all we had in mind then.

I tried to grep my mail archives and google, but failed to find anything
related.

_Perhaps_ this was not intended actually. Before the commit above, another
problem was fixed by 8ad60bbd4c665a11be179a0bff41483cca3b3560
utrace_stop: preserve report/interrupt requests across stop/resume, until
this one it was possible to lose UTRACE_INTERRUPT request if it races
with UTRACE_STOP from another engine. So, perhaps cleanup was the main
reason for 26fefca9.

See also http://www.mail-archive.com/utrace-devel@redhat.com/msg01647.html

Oleg.



Re: [PATCH] utrace_barrier(detached_engine) must not sping without checking -reporting

2010-10-15 Thread Oleg Nesterov
On 10/12, Roland McGrath wrote:

  If engine is detached (has utrace_detached_ops), utrace_barrier(engine)
  spins until engine-ops becomes NULL. This is just wrong.

 Yes, this happens because get_utrace_lock returns -ERESTARTSYS and
 utrace_barrier checks for this and loops.  I agree these long-spin
 scenarios would be wrong.

 The reason it tries to wait for fully detached state is that after
 utrace_control(task,engine,UTRACE_DETACH), @task could still be in the
 middle of a callback for @engine.

Yes. But, with this patch, in this case
utrace_barrier()-get_utrace_lock(attached = false) returns success
and then we check utrace-reporting == engine.

This patch only makes a difference if ops == utrace_detached_ops,

Before this patch

spin until -ops goes away

After this patch

spin until -ops goes away OR -reporting != engine

(Hmm. Probably utrace-reap = T case needs more attention,
 utrace_maybe_reap() doesn't set utrace-reporting).

  Also, it is not clear why utrace_barrier() needs utrace-lock,
  except to ensure it is safe to dereference target/utrace.

 Well, wouldn't that be reason enough?  The comment in utrace_barrier
 talks about needing the lock.  This corresponds to the comment in the
 UTRACE_DETACH case of finish_callback_report.

Yes agreed. Honestly, I can't even recall what I meant.

  Note: we should also reconsider() utrace_barrier()-signal_pending() check.

 IMHO it is badly wrong to have utrace_barrier do an uninterruptible wait
 (even moreso since it's really a spin).  If a buggy callback gets stuck
 blocking or spinning and fails to return promptly, then you wedge any
 debugger thread trying to synchronize with it via utrace_barrier.  If
 you can't even interrupt that debugger thread, then there will really be
 no chance to recover from the deadlock.

Completely agreed.

But, unfortunately, this signal_pending() check assumes the caller can
always handle this ERESTARTSYS. But this is not true, one example is
the exiting debugger doing exit_ptrace().

Oleg.



Re: gdbstub initial code, v7

2010-10-15 Thread Oleg Nesterov
On 10/13, Roland McGrath wrote:

  On 09/10, Roland McGrath wrote:
  
ugdb sets please stop flag and does utrace_control(INTERRUPT). 
However,
in unlikely case the tracee can stop before -report_signal() reporting
  
   I don't think this is the right thing to do.  When the intent is 
   explicitly
   to interrupt, there is no reason to stop before the interruption is
   complete, i.e. report_signal.
 
  This means that ugdb_report_quiesce() should never return UTRACE_STOP,
  and that is all.

 I'm not sure about this.

Ignoring the problems below, why?

  But what about multitracing? Suppose that (gdb) interrupt happens
  just before, say, do_report_syscall_entry() and another engine wants
  to stop. If ugdb_report_quiesce() doesn't return UTRACE_STOP, then
  gdb will wait until another debugger resumes the tracee.

 Yes, I do think that's a problem.  We want gdb to report back promptly.
 One possibility is to have report_quiesce notice its argument is
 UTRACE_EVENT(SYSCALL_ENTRY) and roll back to before the syscall.
 That is, it enables SYSCALL_ENTRY and SYSCALL_EXIT reporting, then
 its report_syscall_entry uses UTRACE_SIGNAL_ABORT, report_syscall_exit
 does syscall_set_return_value(-ERESTARTNOHAND, 0) and then returns
 UTRACE_INTERRUPT.  Now, we'll reenter a UTRACE_SIGNAL_REPORT callback
 before the system call, and we can stop there without being in any
 sticky situation.

Well, but this doesn't look friendly to other engines...

And at first glance this looks a bit too hairy. And, this doesn't
cover another case: gdb asks to stop the tracee when it is already
stopped by another engine and sleeps in utrace_resume() path.

So, I think ugdb should be changed so that signal SIG always works
(without reporing this signal) even when the stopped tracee doesn't
have the signal context.

As for $_siginfo (qXfer:siginfo:read::), I do not know what ugdb
should do. Probably it can just report the all-zeroes siginfo or
report si_signo = SIGSTOP.

Oleg.



[PATCH] utrace: utrace_resume()-start_callback() must clear -reporting

2010-10-12 Thread Oleg Nesterov
utrace_resume()-start_callback() can return without clearing
-reporting, this is very wrong. The bug was introduced by me in
47c593ee avoid the unnecessary utrace_resume()-utrace_reset()
commit.

Revert this patch and change start_callback() to check event right
after we call -report_quiesce(). If it is zero we can just clear
-spurious and return without playing with -reporting and -flags.
No need to worry about ENGINE_STOP, finish_callback() has already
updated engine-flags and report-action correctly.

Signed-off-by: Oleg Nesterov o...@redhat.com
---

 kernel/utrace.c |8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

--- kstub/kernel/utrace.c~10_47c593ee_fix   2010-10-11 12:48:51.0 
+0200
+++ kstub/kernel/utrace.c   2010-10-12 21:19:33.0 +0200
@@ -1528,6 +1528,12 @@ static const struct utrace_engine_ops *s
   engine, event)))
return NULL;
 
+   if (!event) {
+   /* We only got here to report QUIESCE */
+   report-spurious = false;
+   return NULL;
+   }
+
/*
 * finish_callback() reset utrace-reporting after the
 * quiesce callback.  Now we set it again (as above)
@@ -1543,7 +1549,7 @@ static const struct utrace_engine_ops *s
if (want  ENGINE_STOP)
report-action = UTRACE_STOP;
 
-   if (want  (event ?: UTRACE_EVENT(QUIESCE))) {
+   if (want  event) {
report-spurious = false;
return ops;
}



Re: [PATCH?] avoid the unnecessary utrace_resume()-utrace_reset()

2010-10-11 Thread Oleg Nesterov
On 08/18, Oleg Nesterov wrote:

 utrace_resume(UTRACE_REPORT) always calls utrace_reset() because
 start_callback() obviously can't clear report-spurious when
 event == 0.

 Change start_callback() to correctly clear -spurious in this case.

Correctly? I am stupid, and this patch is wrong (47c593ee in
your tree).

 --- kstub/kernel/utrace.c~10_utrace_resume_and_spurious   2010-08-18 
 19:00:50.0 +0200
 +++ kstub/kernel/utrace.c 2010-08-18 19:41:05.0 +0200
 @@ -1540,7 +1540,7 @@ static const struct utrace_engine_ops *s
   if (want  ENGINE_STOP)
   report-action = UTRACE_STOP;

 - if (want  event) {
 + if (want  (event ?: UTRACE_EVENT(QUIESCE))) {
   report-spurious = false;
   return ops;
   }

with this change utrace_resume()-start_callback() returns with
utrace-reporting != NULL !!! This obviously breaks utrace_barrier(),
it can hang forever.

I noticed this by accident, when I was trying to understand the
problems with vCont changes.

I'll send the fix tomorrow. Damn, the fix is trivial but I'd like
to avoid another ugly check in start_callback(), and I don't think
utrace_resume() should clear -reporting.

Oleg.



Re: gdbstub initial code, v13

2010-10-11 Thread Oleg Nesterov
On 10/08, Oleg Nesterov wrote:

   - full vCont support.

 Well, this was easy, except even after 3 hours
 of debugging I can't understand why this change
 breaks the stepping over pthread_create :/

Oh, it doesn't break. I did a lot of mistakes when I was trying
to understand what happens. In short, GDBCAT was buggy, it used
the wrong socket for TCP_NODELAY.

Damn, it took me almost 2 days.

Oleg.



Re: BUG: gdb notification packets (Was: gdbstub initial code, v12)

2010-10-06 Thread Oleg Nesterov
On 10/05, Pedro Alves wrote:


(reordered)

 On Tuesday 05 October 2010 18:27:29, Oleg Nesterov wrote:

  So, I strongly believe gdb is buggy and should be fixed.

 Fix your stub to implement vCont;s/c(/S/C).

First of all, I confirm that when I added the (incomplete right
now) support for vCont;s the problem goes away, gdb never forgets
to send the necessary vStopped to consume all stop-reply packets.

Thanks Pedro.

  The more or less typical transcript is:
 
  [... snip ...]
  = s

 This is already wrong.

 The stub must support @samp{vCont} if it reports support for
 multiprocess extensions (@pxref{multiprocess extensions}).

Cough. Previously I was told here (on arc...@sourceware.org) that
Hc + s/c is enough and I shouldn't worry about vCont;s/c ;)

Currently ugdb only supports vCont;t because otherwise there is
obviously no way to stop all threads.

 The stub must also support vCont for non-stop, though I'll give you
 that it doesn't appear to be mentioned in the manual,

Yes, the manual doesn't explain this. Quite contrary, the decsription
of 'vCont?' definitely looks as if the stub is not obliged to implement
all vCont commands.

And, if the stub must support vCont for non-stop, then why gdb
doesn't complain after 'vCont?' but falls back to '$s' ?

 Look at remote.c:remote_resume, and you'll see that gdb does not
 wait for the OK after 'c'/'s'/'S'/'C' in non-stop mode.

Then gdbserver should be fixed? It does send OK in response to '$s',
that is why ugdb does this.

And what should be replied back to '$s', nothing? Very strange.
But seems to work... And yes, this explains the hang, gdb thinks
that this OK is connected to vStopped.

Again, the documentation is very confusing. Looking at
remote_resume()-remote_vcont_resume()-getpkt() I think that
vCont;s needs OK. Looking at D.3 Stop Reply Packets in
gdb.info I do not see any difference between `s' and `vCont'.

So. I still belive that something is wrong with gdb/gdbserever
but I don't care.


In any case ugdb should fully support vCont, hopefully I'll finish
this tomorrow. Could you answer a couple of questions?

1. Say, $vCont;s or $vCont;s:p-1.-1

   I assume, this should ignore the running threads, correct?
   IOW, iiuc this 's' applies to all threads which we already
   reported as stopped.

2. Say, $vCont;c:pPID.TID;s:p-1.-1

   Can I assume that gdb can never send this request as

$vCont;s:p-1.-1;c:pPID.TID ?

   If yes, then the implementation will be much simpler, I can
   add something like gencounters to ugdb_thread/process. Otherwise
   this needs more complications to figure out what should be done
   with each tracee.

Thanks,

Oleg.



utrace unwanted traps

2010-09-26 Thread Oleg Nesterov
On 09/23, Roland McGrath wrote:

  OK, finish_callback_report() and utrace_control(DETACH) can set
  TIF_NOTIFY_RESUME.

 Right.  Those utrace_resume has the report.action==UTRACE_RESUME bail-out
 case.  So either that would change or detach would also do UTRACE_REPORT.

  But what if there are no more attached engines?
  Looks like, utrace_resume(UTRACE_RESUME) needs to handle this special
  case. And utrace_reset() shouldn't clear task-utrace_flags, otherwise
  utrace_resume/utrace_get_signal won't be called.

 Right.  Or else tracehook_notify_resume could call utrace_resume
 unconditionally, but I'm not at all sure that is not worse.  The
 original theory was that it should always be OK to have some
 utrace_flags bits stay set when they are stale, because any kind of
 reporting pass that got enabled would hit the report-spurious case
 and clean the state up synchronously when it's safe.

I tried to make the patch which addresses this issue, but surprisingly
I failed. Because I think there are more (minor) problems here. When
it comes to multitracing, we can have the unwanted trap even without
detaching.

In short: I no longer understand why utrace-resume can be UTRACE_*STEP
when the tracee is stopped, and in fact I think this is not right.

For example. Consider two engines, E1 and E2. To simplify the discussion
suppose that both engines have engine-data-resume and -report_quiesce()
just returns -resume. So, if the debugger wants, say, single-step, it
does

engine-data-resume = UTRACE_SINGLESTEP;
utrace_control(UTRACE_SINGLESTEP);

Actually, any engine should do something like this.

Now suppose that E1 wants UTRACE_SINGLESTEP, E2 asks for UTRACE_STOP.
In this case utrace_resume() results in utrace_stop() which sets
utrace-resume = UTRACE_SINGLESTEP before sleeping.

First of all - why? Yes, the theory is that we have another reporting
loop after wake_up, but utrace-resume = UTRACE_REPORT is enough, E1
should always acquire UTRACE_SINGLESTEP if needed or it is buggy.

But more importantly, this doesn't look right or I missed something.
Suppose that, before E2 resumes the tracee, E1 does utrace_control(STOP)
which immediately succeeds and allows E1 to play with the stopped tracee.
Again, to simplify the discussion, suppose that E2 does UTRACE_RESUME
and nothing more after that.

Now. E1 wants to resume the tracee too and it does

engine-data-resume = UTRACE_RESUME;
utrace_control(UTRACE_RESUME);

utrace_control(RESUME) correctly clears TIF_SINGLESTEP, but it doesn't
change utrace-resume. This means utrace_resume() doesn't do the
reporting loop, it just calls user_enable_single_step() and returns
to user-mode.

So. Could you explain why utrace_stop() should ever keep UTRACE_STEP
in utrace-resume? finish_report() does the same but this looks fine.


Also. I am not sure utrace_control(UTRACE_STEP) should always set
utrace-resume = UTRACE_STEP. If the tracee is stopped but there
is another ENGINE_STOP engine (iow, the tracee won't be resumed
right now) we have the same problem.



Off-topic,

#define UTRACE_RESUME_BITS  (ilog2(UTRACE_RESUME_MAX) + 1)

why UTRACE_RESUME_MAX? it should be (ilog2(UTRACE_RESUME_MAX - 1) + 1), no?

Oleg.



Re: gdbstub initial code, v11

2010-09-22 Thread Oleg Nesterov
On 09/22, Frank Ch. Eigler wrote:

 oleg wrote:

  [...]  Honestly, I don't really know how do the right thing here.
  Anyway, most probably this code will be changed. Like ptrace, ugdb
  uses -report_syscall_exit() to synthesize a trap. Unlike ptrace,
  ugdb_report_signal() doesn't send SIGTRAP to itself but reports
  SIGTRAP using siginfo_t we have. In any case, whatever we do,
  multiple tracers can confuse each other.

 (It seems to me that a pure gdb report, without a synthetic
 self-injected SIGTRAP, should be fine.)

What do you mean?

  Next: fully implement g/G/p/P, currently I am a bit confused...
  But what about features? [...]

 You could dig out the old fishing plan.  One demonstrated
 improvement was from simulating (software) watchpoints within the
 gdb stub, instead of having gdb fall back to issing countless
 single-steps with memory-fetch inquiries in between.

When I do 'watch', gdb sends '$Z2'. I am a bit confused, iirc it
was decided I shouldn't play with Z packets now. But I won't
argue.

Oleg.



Re: [PATCH] utrace: utrace_reset() should clear TIF_SINGLESTEP if no more engines

2010-09-22 Thread Oleg Nesterov
On 09/21, Roland McGrath wrote:

  OK, so what should we do for now?

 If we can't come to a straightforward answer for the general case
 fairly quickly, then we can do the simple change to start with.

I think we should start with changing utrace_control(DETACH) anyway,
then try to improve. I'll ressend the one-liner I already showed.

  Without the multitracing utrace_resume()-user_disable_single_step()
  fixes the problem. Otherwise we probably need ENGINE_WANTS_SINGLESTEP.

 No, no, we don't want that.  We don't need more state.  And, remember,
 we really don't need to microoptimize cases when single-step was used
 recently--the cost of taking one more single-step trap and percolating
 through the signal paths was going to be pretty large anyway.

OK,

 It's better to have a spurious report (preferably just spurious
 TIF_NOTIFY_RESUME with no actual callbacks) following any detach,
 or whatever it takes to ensure user_disable_single_step() always
 runs if user_enable_*_step did before and there is no engine around
 to see a report_quiesce callback pass.  If there is a report_quiesce
 or report_signal callback pass where nobody returns UTRACE_*STEP,
 then we should never leave stepping enabled when we return to user mode.

Hmm. I'll try to think more. Right now I don't really understand
how to do this correctly.

OK, finish_callback_report() and utrace_control(DETACH) can set
TIF_NOTIFY_RESUME. But what if there are no more attached engines?
Looks like, utrace_resume(UTRACE_RESUME) needs to handle this special
case. And utrace_reset() shouldn't clear task-utrace_flags, otherwise
utrace_resume/utrace_get_signal won't be called.

So, probably detach should set TIF_NOTIFY_RESUME, but utrace_reset()
should do user_disable_single_step() too if no more engines. Confused.

And in fact I don't understand why this is important. When it comes
to multiracing, any engine can hit the unwanted/unexpected trap
because another engine can ask for UTRACE_*STEP. The only really
important (I think) case is when the last engine detaches.

IOW. Suppose that eninge E does utrace_control(STEP) or its callback
returns UTRACE_*STEP. If we do not detach this engine, other engines
will see the trap. So why it is so important to clear X86_EFLAGS_TF
if we detach E ?

Oleg.



Re: [PATCH] utrace: utrace_reset() should clear TIF_SINGLESTEP if no more engines

2010-09-22 Thread Oleg Nesterov
On 09/23, Oleg Nesterov wrote:

 On 09/21, Roland McGrath wrote:
 
  It's better to have a spurious report (preferably just spurious
  TIF_NOTIFY_RESUME with no actual callbacks) following any detach,
  or whatever it takes to ensure user_disable_single_step() always
  runs if user_enable_*_step did before and there is no engine around
  to see a report_quiesce callback pass.  If there is a report_quiesce
  or report_signal callback pass where nobody returns UTRACE_*STEP,
  then we should never leave stepping enabled when we return to user mode.

 Hmm. I'll try to think more. Right now I don't really understand
 how to do this correctly.

 OK, finish_callback_report() and utrace_control(DETACH) can set
 TIF_NOTIFY_RESUME. But what if there are no more attached engines?
 Looks like, utrace_resume(UTRACE_RESUME) needs to handle this special
 case. And utrace_reset() shouldn't clear task-utrace_flags, otherwise
 utrace_resume/utrace_get_signal won't be called.

 So, probably detach should set TIF_NOTIFY_RESUME, but utrace_reset()
 should do user_disable_single_step() too if no more engines. Confused.

Or, detach can always do user_disable_single_step() and set
TIF_NOTIFY_RESUME. If another engine wants stepping its report_quiesce()
should re-ack UTRACE_SINGLESTEP anyway, otherwise it is buggy.

 And in fact I don't understand why this is important. When it comes
 to multiracing, any engine can hit the unwanted/unexpected trap
 because another engine can ask for UTRACE_*STEP. The only really
 important (I think) case is when the last engine detaches.

Aaah. please ignore. Somehow I assumed every engine should hook
SIGTRAP.

Oleg.



Re: [PATCH] utrace: utrace_reset() should clear TIF_SINGLESTEP if no more engines

2010-09-21 Thread Oleg Nesterov
On 09/21, Roland McGrath wrote:

  Change utrace_reset() to do user_disable_single_step(). Alternatively
  we could change ptrace layer, but I think this is not ptrace specific.

 Yes, it's right to fix this in the utrace layer.
 But I'm not sure that's the right place in the code to fix it.

 The expectation is that either we'll get to finish_resume_report,
 which calls user_*_step, or that utrace_control is resuming us
 without any expectation of a report.  For UTRACE_RESUME in that
 case, utrace_control calls user_disable_single_step.  So probably
 the UTRACE_DETACH case there should just do it to.

Yes, initially I was going to change utrace_control(DETACH), and this
certainly looks more clean.

I was worried about the case when the TIF_SINGLESTEP tracee detaches
itself and escapes finish_resume_report()-user_disable_single_step(),
say, utrace_report_exec(). But probably we can ignore this.

Another concern was implicit detach, but thinking more I do not see
why utrace_resume() is better.

OK, I'll do some testing and resend right now. In UTRACE_DETACH case
reset can be true but the tracee is running. But I don't think it
makes sense to check target-exit_state == 0, correct?

Oleg.



Re: [PATCH] utrace: utrace_reset() should clear TIF_SINGLESTEP if no more engines

2010-09-21 Thread Oleg Nesterov
On 09/21, Roland McGrath wrote:

  I was worried about the case when the TIF_SINGLESTEP tracee detaches
  itself and escapes finish_resume_report()-user_disable_single_step(),
  say, utrace_report_exec(). But probably we can ignore this.

 No, I think that is indeed a problem.  If no engine is still attached
 whose last callback returned UTRACE_SINGLESTEP, we should never return
 to user mode with single-step enabled.

OK, so what should we do for now?

Without the multitracing utrace_resume()-user_disable_single_step()
fixes the problem. Otherwise we probably need ENGINE_WANTS_SINGLESTEP.

  Another concern was implicit detach, but thinking more I do not see
  why utrace_resume() is better.
 
  OK, I'll do some testing and resend right now. In UTRACE_DETACH case
  reset can be true but the tracee is running. But I don't think it
  makes sense to check target-exit_state == 0, correct?

 I think it's OK if it's running with -exit_state set (or even with just
 PF_EXITING set), i.e. already in kernel mode and never going back to
 user mode.  (Then it's essentially equivalent to calling user_*_step
 while racing with a SIGKILL death, which has to be OK.)  Any other kind
 of running would not be OK.

Yes, my concern was running and !current, not exiting. This is OK on
x86 but user_disable_single_step() is arch specific.

Oleg.



Re: [PATCH] utrace: utrace_reset() should clear TIF_SINGLESTEP if no more engines

2010-09-21 Thread Oleg Nesterov
On 09/22, Oleg Nesterov wrote:

 On 09/21, Roland McGrath wrote:
 
   I was worried about the case when the TIF_SINGLESTEP tracee detaches
   itself and escapes finish_resume_report()-user_disable_single_step(),
   say, utrace_report_exec(). But probably we can ignore this.
 
  No, I think that is indeed a problem.  If no engine is still attached
  whose last callback returned UTRACE_SINGLESTEP, we should never return
  to user mode with single-step enabled.

 OK, so what should we do for now?

 Without the multitracing utrace_resume()-user_disable_single_step()
 fixes the problem. Otherwise we probably need ENGINE_WANTS_SINGLESTEP.

As expected, the patch below equally fixes the problem and passes
ptrace-tests.

Which one do you prefer?

Note that, since we are going to change UTRACE_RESUME to remove
ENGINE_STOP from utrace_flags, we can probably unify DETACH/RESUME

case UTRACE_DETACH:
mark_engine_detached(engine);
reset = reset || utrace_do_stop(target, utrace);
if (!reset) {
/*
 * As in utrace_set_events(), this barrier ensures
 * that our engine-flags changes have hit before we
 * examine utrace-reporting, pairing with the barrier
 * in start_callback().  If @target has not yet hit
 * finish_callback() to clear utrace-reporting, we
 * might be in the middle of a callback to @engine.
 */
smp_mb();
if (utrace-reporting == engine)
ret = -EINPROGRESS;
}
/* fall */

case UTRACE_RESUME:
clear_engine_wants_stop(engine);
target-utrace_flags = ~ENGINE_STOP;

/*
 * This and all other cases imply resuming if stopped.
 * There might not be another report before it just
 * resumes, so make sure single-step is not left set.
 */
if (likely(reset))
user_disable_single_step(target);
break;


--- kstub/kernel/utrace.c~10_utrace_reset_should_clear_ss   2010-09-20 
03:55:12.0 +0200
+++ kstub/kernel/utrace.c   2010-09-22 01:54:24.0 +0200
@@ -1139,7 +1139,9 @@ int utrace_control(struct task_struct *t
target-utrace_flags = ~ENGINE_STOP;
mark_engine_detached(engine);
reset = reset || utrace_do_stop(target, utrace);
-   if (!reset) {
+   if (reset) {
+   user_disable_single_step(target);
+   } else {
/*
 * As in utrace_set_events(), this barrier ensures
 * that our engine-flags changes have hit before we



gdbstub initial code, v11

2010-09-21 Thread Oleg Nesterov
Changes: syscall stepping + minor cleanups.

Seems to work, more or less, but surely there are some bugs.

Honestly, I don't really know how do the right thing here.
Anyway, most probably this code will be changed. Like ptrace,
ugdb uses -report_syscall_exit() to synthesize a trap. Unlike
ptrace, ugdb_report_signal() doesn't send SIGTRAP to itself
but reports SIGTRAP using siginfo_t we have. In any case,
whatever we do, multiple tracers can confuse each other.

Next: fully implement g/G/p/P, currently I am a bit confused...

But what about features? What should I do next? all-stop,
thread-specific breakpoints (currently I have no idea what
this means), or what?

Oleg.

#include linux/module.h
#include linux/proc_fs.h
#include linux/utrace.h
#include linux/poll.h
#include linux/mm.h
#include linux/regset.h
#include asm/uaccess.h

static int o_remote_debug;
module_param_named(echo, o_remote_debug, bool, 0);

#define BUFFER_SIZE 1024
#define PACKET_SIZE 1024

struct pbuf {
char*cur, *pkt;
charbuf[BUFFER_SIZE];
};

static inline void pb_init(struct pbuf *pb)
{
pb-cur = pb-buf;
pb-pkt = NULL;
}

enum {
U_STOP_IDLE = 0,
U_STOP_PENDING,
U_STOP_SENT,
};

struct ugdb {
struct list_headu_processes;
struct list_headu_stopped;

int u_stop_state;

struct mutexu_mutex;
spinlock_t  u_slock;

struct ugdb_thread
*u_cur_tinfo,
*u_cur_hg,
*u_cur_hc;

wait_queue_head_t   u_wait;

int u_err;

struct pbuf u_pbuf;
charu_cbuf[PACKET_SIZE];
int u_clen;

sigset_tu_sig_ign;

unsigned int
u_no_ack:1,
u_allstop:1;
};

static inline void ugdb_ck_stopped(struct ugdb *ugdb)
{
spin_lock(ugdb-u_slock);
WARN_ON(!list_empty(ugdb-u_stopped) 
ugdb-u_stop_state == U_STOP_IDLE);
WARN_ON(list_empty(ugdb-u_stopped) 
ugdb-u_stop_state == U_STOP_PENDING);
spin_unlock(ugdb-u_slock);
}

static struct ugdb *ugdb_create(void)
{
struct ugdb *ugdb;
int err;

err = -ENODEV;
// XXX: ugly. proc_reg_open() should take care.
if (!try_module_get(THIS_MODULE))
goto out;

err = -ENOMEM;
ugdb = kzalloc(sizeof(*ugdb), GFP_KERNEL);
if (!ugdb)
goto put_module;

INIT_LIST_HEAD(ugdb-u_processes);
INIT_LIST_HEAD(ugdb-u_stopped);

mutex_init(ugdb-u_mutex);
spin_lock_init(ugdb-u_slock);

init_waitqueue_head(ugdb-u_wait);

pb_init(ugdb-u_pbuf);

return ugdb;

put_module:
module_put(THIS_MODULE);
out:
return ERR_PTR(err);
}

#define P_DETACHING (1  1)
#define P_ZOMBIE(1  2)

struct ugdb_process {
int p_pid;
int p_state;

struct list_headp_threads;

struct ugdb *p_ugdb;
struct list_headp_processes;
};

static inline bool process_alive(struct ugdb_process *process)
{
return !(process-p_state  P_ZOMBIE);
}

static inline void mark_process_dead(struct ugdb_process *process)
{
process-p_state |= P_ZOMBIE;
}

static struct ugdb_process *ugdb_create_process(struct ugdb *ugdb, int pid_nr)
{
struct ugdb_process *process;

process = kzalloc(sizeof(*process), GFP_KERNEL);
if (!process)
return NULL;

process-p_pid = pid_nr;
process-p_ugdb = ugdb;
INIT_LIST_HEAD(process-p_threads);
list_add_tail(process-p_processes, ugdb-u_processes);

return process;
}

#define T_STOP_RUN  0
#define T_STOP_REQ  (1  0)/* requested by gdb */
#define T_STOP_ALL  (1  1)/* vCont;c:pX.-1, for report_clone */
#define T_STOP_ACK  (1  2)/* visible to vStopped */
#define T_STOP_STOPPED  (1  3)/* reported as stopped to gdb */
/* TASK_TRACED + deactivated ? */

#define T_EV_NONE   0
#define T_EV_EXIT   (1  24)
#define T_EV_SIGN   (2  24)

#define T_EV_TYPE(event)((0xff  24)  (event))
#define T_EV_DATA(event)(~(0xff  24)  (event))

struct ugdb_thread {
int t_tid;
int t_stop_state;
int t_stop_event;
boolt_step; // XXX: move me into 
t_stop_event

siginfo_t   *t_siginfo;

struct ugdb *t_ugdb;
struct ugdb_process *t_process;

struct 

[PATCH] utrace: utrace_reset() should clear TIF_SINGLESTEP if no more engines

2010-09-20 Thread Oleg Nesterov
Test-case:

#include stdio.h
#include signal.h
#include unistd.h
#include sys/ptrace.h
#include sys/wait.h
#include assert.h

int main(void)
{
int pid, status;

pid = fork();
if (!pid) {
assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
kill(getpid(), SIGSTOP);
return 0x23;
}

assert(wait(status) == pid);
assert(WIFSTOPPED(status)  WSTOPSIG(status) == SIGSTOP);

assert(ptrace(PTRACE_SINGLESTEP, pid, 0,0) == 0);
assert(wait(status) == pid);
assert(WIFSTOPPED(status)  WSTOPSIG(status) == SIGTRAP);

assert(ptrace(PTRACE_DETACH, pid, 0,0) == 0);
assert(wait(status) == pid);

if (WIFEXITED(status)  WEXITSTATUS(status) == 0x23)
return 0;

printf(ERR!! exit status: %04x\n, status);
return 1;
}

It fails because TIF_SINGLESTEP is not cleared after PTRACE_DETACH and
the tracee gets another trap after resume.

Change utrace_reset() to do user_disable_single_step(). Alternatively
we could change ptrace layer, but I think this is not ptrace specific.

Signed-off-by: Oleg Nesterov o...@redhat.com
---

 kernel/utrace.c |1 +
 1 file changed, 1 insertion(+)

--- kstub/kernel/utrace.c~10_utrace_reset_should_clear_ss   2010-09-20 
03:55:12.0 +0200
+++ kstub/kernel/utrace.c   2010-09-20 21:33:47.0 +0200
@@ -709,6 +709,7 @@ static bool utrace_reset(struct task_str
 */
utrace-resume = UTRACE_RESUME;
utrace-signal_handler = 0;
+   user_disable_single_step(task);
}
 
/*



gdbstub initial code, v10

2010-09-19 Thread Oleg Nesterov
Back to ugdb.

Changes: implement stepi, this also means breakpoints work too.

Notes:

- almost untested, probably needs more fixes

- syscall-stepping doesn't work correctly (should be simple)

- don't look at handle_setregs/handle_set_one_reg, I did
  this on purpose to be sure I really understand what gdb
  actually does

Well. This wasn't simple because I nearly got heart attack twice
when I was writing this change ;)

However, it turns out that ptrace-utrace (old or recently changed)
is innocent.

I found 2 problems, the 1st one is /usr/bin/gdb bug, another one
(I believe) is utrace core bug. I'll report tomorrow.

Oleg.
#include linux/module.h
#include linux/proc_fs.h
#include linux/utrace.h
#include linux/poll.h
#include linux/mm.h
#include linux/regset.h
#include asm/uaccess.h

static int o_remote_debug;
module_param_named(echo, o_remote_debug, bool, 0);

#define BUFFER_SIZE 1024
#define PACKET_SIZE 1024

struct pbuf {
char*cur, *pkt;
charbuf[BUFFER_SIZE];
};

static inline void pb_init(struct pbuf *pb)
{
pb-cur = pb-buf;
pb-pkt = NULL;
}

enum {
U_STOP_IDLE = 0,
U_STOP_PENDING,
U_STOP_SENT,
};

struct ugdb {
struct list_headu_processes;
struct list_headu_stopped;

int u_stop_state;

struct mutexu_mutex;
spinlock_t  u_slock;

struct ugdb_thread
*u_cur_tinfo,
*u_cur_hg,
*u_cur_hc;

wait_queue_head_t   u_wait;

int u_err;

struct pbuf u_pbuf;
charu_cbuf[PACKET_SIZE];
int u_clen;

sigset_tu_sig_ign;

unsigned int
u_no_ack:1,
u_allstop:1;
};

static inline void ugdb_ck_stopped(struct ugdb *ugdb)
{
spin_lock(ugdb-u_slock);
WARN_ON(!list_empty(ugdb-u_stopped) 
ugdb-u_stop_state == U_STOP_IDLE);
WARN_ON(list_empty(ugdb-u_stopped) 
ugdb-u_stop_state == U_STOP_PENDING);
spin_unlock(ugdb-u_slock);
}

static struct ugdb *ugdb_create(void)
{
struct ugdb *ugdb;
int err;

err = -ENODEV;
// XXX: ugly. proc_reg_open() should take care.
if (!try_module_get(THIS_MODULE))
goto out;

err = -ENOMEM;
ugdb = kzalloc(sizeof(*ugdb), GFP_KERNEL);
if (!ugdb)
goto put_module;

INIT_LIST_HEAD(ugdb-u_processes);
INIT_LIST_HEAD(ugdb-u_stopped);

mutex_init(ugdb-u_mutex);
spin_lock_init(ugdb-u_slock);

init_waitqueue_head(ugdb-u_wait);

pb_init(ugdb-u_pbuf);

return ugdb;

put_module:
module_put(THIS_MODULE);
out:
return ERR_PTR(err);
}

#define P_DETACHING (1  1)
#define P_ZOMBIE(1  2)

struct ugdb_process {
int p_pid;
int p_state;

struct list_headp_threads;

struct ugdb *p_ugdb;
struct list_headp_processes;
};

static inline bool process_alive(struct ugdb_process *process)
{
return !(process-p_state  P_ZOMBIE);
}

static inline void mark_process_dead(struct ugdb_process *process)
{
process-p_state |= P_ZOMBIE;
}

static struct ugdb_process *ugdb_create_process(struct ugdb *ugdb, int pid_nr)
{
struct ugdb_process *process;

process = kzalloc(sizeof(*process), GFP_KERNEL);
if (!process)
return NULL;

process-p_pid = pid_nr;
process-p_ugdb = ugdb;
INIT_LIST_HEAD(process-p_threads);
list_add_tail(process-p_processes, ugdb-u_processes);

return process;
}

#define T_STOP_RUN  0
#define T_STOP_REQ  (1  0)/* requested by gdb */
#define T_STOP_ALL  (1  1)/* vCont;c:pX.-1, for report_clone */
#define T_STOP_ACK  (1  2)/* visible to vStopped */
#define T_STOP_STOPPED  (1  3)/* reported as stopped to gdb */
/* TASK_TRACED + deactivated ? */

#define T_EV_NONE   0
#define T_EV_EXIT   (1  24)
#define T_EV_SIGN   (2  24)

#define T_EV_TYPE(event)((0xff  24)  (event))
#define T_EV_DATA(event)(~(0xff  24)  (event))

struct ugdb_thread {
int t_tid;
int t_stop_state;
int t_stop_event;
boolt_step; // XXX: move me into 
t_stop_event

siginfo_t   *t_siginfo;

struct ugdb *t_ugdb;
struct ugdb_process *t_process;

struct 

Re: exit_ptrace() ptrace_report_signal() problems

2010-09-15 Thread Oleg Nesterov
On 09/14, Oleg Nesterov wrote:

 Oh, please take a look at this (untested) patch.

I tried to test the cleanuped version, seems to work.

 But I think I should make another attempt, probably spinlock (-siglock ?)
 would be cleaner.

No. I think more locking buys nothing.

I'll split this patch and resend today.

Oleg.



Re: exit_ptrace() ptrace_report_signal() problems

2010-09-14 Thread Oleg Nesterov
On 09/14, Oleg Nesterov wrote:

 On 09/13, Roland McGrath wrote:
 
  Locks just between the tracer and ptrace_report_signal are not bad.

 OK, but let me think a bit more. I'd really like to avoid adding the
 new lock to avoid the very unlikely race with the exiting tracer.

Oh, please take a look at this (untested) patch.

But I think I should make another attempt, probably spinlock (-siglock ?)
would be cleaner.

Oleg.


--- kstub/kernel/ptrace-utrace.c~9_EXIT_PTRACE_CAN_LOSE_SIG 2010-08-24 
14:31:17.0 +0200
+++ kstub/kernel/ptrace-utrace.c2010-09-14 23:37:59.0 +0200
@@ -67,6 +67,7 @@ struct ptrace_context {
 #define PT_UTRACED 0x1000
 
 #define PTRACE_O_SYSEMU0x100
+#define PTRACE_O_REUSABLE  0x200
 
 #define PTRACE_EVENT_SYSCALL   (1  16)
 #define PTRACE_EVENT_SIGTRAP   (2  16)
@@ -115,7 +116,7 @@ ptrace_reuse_engine(struct task_struct *
return engine;
 
ctx = ptrace_context(engine);
-   if (unlikely(ctx-resume == UTRACE_DETACH)) {
+   if (unlikely(ctx-options == PTRACE_O_REUSABLE)) {
/*
 * Try to reuse this self-detaching engine.
 * The only caller which can hit this case is ptrace_attach(),
@@ -246,24 +247,48 @@ static void ptrace_detach_task(struct ta
 */
bool voluntary = (sig = 0);
struct utrace_engine *engine = ptrace_lookup_engine(tracee);
+   struct ptrace_context *ctx = ptrace_context(engine);
enum utrace_resume_action action = UTRACE_DETACH;
 
if (unlikely(IS_ERR(engine)))
return;
 
-   if (sig) {
-   struct ptrace_context *ctx = ptrace_context(engine);
+   if (!voluntary) {
+   int err;
+
+   ctx-resume = UTRACE_DETACH;
+   /* synchronize with ptrace_report_signal() */
+   do {
+   err = utrace_barrier(tracee, engine);
+   } while (err == -ERESTARTSYS);
+
+   if (!err) {
+   /*
+* The tracee has already reported a signal. If we
+* see -siginfo != NULL it is safe to mark this ctx
+* as reusable and resume the tracee. We can race
+* with ptrace_report_signal(UTRACE_SIGNAL_REPORT)
+* already in progress but this doesn't matter, it
+* must see -resume = UTRACE_DETACH.
+*/
+   if (ctx-siginfo) {
+   smp_wmb();
+   ctx-options = PTRACE_O_REUSABLE;
+   action = UTRACE_RESUME;
+   }
+   }
 
+   } else if (sig) {
switch (get_stop_event(ctx)) {
case PTRACE_EVENT_SYSCALL:
-   if (voluntary)
-   send_sig_info(sig, SEND_SIG_PRIV, tracee);
+   send_sig_info(sig, SEND_SIG_PRIV, tracee);
break;
 
case PTRACE_EVENT_SIGNAL:
-   if (voluntary)
-   ctx-signr = sig;
+   ctx-signr = sig;
ctx-resume = UTRACE_DETACH;
+   smp_wmb();
+   ctx-options = PTRACE_O_REUSABLE;
action = UTRACE_RESUME;
break;
}
@@ -410,6 +435,9 @@ static u32 ptrace_report_syscall_exit(u3
return UTRACE_STOP;
 
if (ctx-resume != UTRACE_RESUME) {
+   if (ctx-resume == UTRACE_DETACH)
+   return UTRACE_RESUME;
+
WARN_ON(ctx-resume != UTRACE_BLOCKSTEP 
ctx-resume != UTRACE_SINGLESTEP);
ctx-resume = UTRACE_RESUME;
@@ -502,6 +530,9 @@ static u32 ptrace_report_signal(u32 acti
ctx-siginfo = NULL;
 
if (resume != UTRACE_RESUME) {
+   if (resume == UTRACE_DETACH)
+   return action;
+
WARN_ON(resume != UTRACE_BLOCKSTEP 
resume != UTRACE_SINGLESTEP);
 
@@ -531,6 +562,11 @@ static u32 ptrace_report_signal(u32 acti
}
 
WARN_ON(ctx-siginfo);
+
+   /* Raced with the exiting tracer ? */
+   if (resume == UTRACE_DETACH)
+   return action;
+
ctx-siginfo = info;
/*
 * ctx-siginfo points to the caller's stack.
@@ -759,7 +795,7 @@ void exit_ptrace(struct task_struct *tra
 static int ptrace_set_options(struct task_struct *tracee,
struct utrace_engine *engine, long data)
 {
-   BUILD_BUG_ON(PTRACE_O_MASK  PTRACE_O_SYSEMU);
+   BUILD_BUG_ON(PTRACE_O_MASK  (PTRACE_O_SYSEMU | PTRACE_O_REUSABLE

Re: gdbstub initial code, v9

2010-09-10 Thread Oleg Nesterov
On 09/10, Roland McGrath wrote:

   Oleg == Oleg Nesterov o...@redhat.com writes:
 
  Oleg   (gdb) set var 0
 
  You need:  set variable var = 0
  The variable can be abbreviated.

 I've always just used:

   (gdb) set var=0

No, I tried this too, doesn't work.

(gdb) set var=0
A syntax error in expression, near `=0'.

But, it turns out I choosed a bad name for the variable when
I tested the fix in unxex().

(gdb) set xxx=0

This works.

(gdb) set var var=0

This works too. I guess, when gdb sees set var it expects the
full set variable ... command.

Oleg.



Re: gdbstub initial code, v7

2010-09-10 Thread Oleg Nesterov
On 09/10, Roland McGrath wrote:

  But I meant another case, when the stopped tracee doesn't have siginfo.
  Currently ugdb just sends this signal to tracee, and then it will be
  reported to gdb. Not sure if this is right or not, I can change this.
  (or perhap this doesn't matter, I dunno).

 What do you mean by doesn't have siginfo?  You mean non-signal stops?

Yes.

 What non-signal stops does ugdb report?

(gdb) interrupt

ugdb sets please stop flag and does utrace_control(INTERRUPT). However,
in unlikely case the tracee can stop before -report_signal() reporting
loop (especially in multitracing case). Or it can be already stopped
(note: this needs a separate discussion, currently ugdb intentionally
doesn't handle this case).

And. With the current implementation, even if the tracee stops after
ugdb_report_signal() was called, it doesn't setup -t_siginfo.

IOW. If the tracee actually recieves a signal, then

- qXfer:siginfo:read works

- signal SIG works as expected (delivered to tracee)

Otherwise

- qXfer:siginfo:read reports E01

- signal XX means TXX report.

Once again, this can be changed (fixed?), but I am not sure this
should be changed.

Oleg.



Re: gdbstub initial code, v8 ptrace

2010-09-10 Thread Oleg Nesterov
On 09/10, Roland McGrath wrote:

  I am a bit confused... OK, ugdb is wrong wrt multitracing.
  UTRACE_SIGNAL_REPORT case shouldn't return UTRACE_STOP | 
  UTRACE_SIGNAL_IGN,
  it should return UTRACE_STOP | UTRACE_SIGNAL_REPORT to keep 
  report-result.

 No, UTRACE_SIGNAL_REPORT is not meant for a return value.  Its only use is
 in the incoming argument to tell you that a given report_signal call is
 standing in for a report_quiesce(0) call.

Yes, that is why initially ugdb returned  | UTRACE_SIGNAL_IGN. But
you misunerstood my concerns (or me your reply ;)

But. Suppose we have to attached engines. The first engine gets
UTRACE_SIGNAL_REPORT and returns UTRACE_STOP | UTRACE_SIGNAL_IGN.

Or,

  But it needs to return UTRACE_SIGNAL_DELIVER?

 That's what you return when you want the signal delivered.

or yes, it returns UTRACE_SIGNAL_DELIVER after gdb does signal XX.

Now. The second engine gets UTRACE_SIGNAL_DELIVER or UTRACE_SIGNAL_IGN,
not UTRACE_SIGNAL_REPORT.


That is why ugdb_signal_report(UTRACE_SIGNAL_REPORT) tries to return
UTRACE_STOP | utrace_signal_action(action) to not change report-result
(passed to the next tracee) inside the reporting loop.

The worst case is UTRACE_SIGNAL_HANDLER. Single-stepping should know
about this case. This means that any engine should always return
UTRACE_resume_action | UTRACE_SIGNAL_HANDLER.

Unless we are going to change utrace_get_signal().

  Probably we can check orig_ka != NULL and treat orig_ka == NULL as
  UTRACE_SIGNAL_REPORT. Hopefully this can't be confused with
  UTRACE_SIGNAL_HANDLER.

 I'm not really sure what you mean here.

If -report_signal(orig_ka) was calles with orig_ka == NULL, then,
whatever utrace_signal_action(action) we see, originally it was
either UTRACE_SIGNAL_HANDLER or UTRACE_SIGNAL_REPORT but another engine
returned, say, UTRACE_SIGNAL_DELIVER/UTRACE_SIGNAL_IGN to deliver/stop.

  Not sure about UTRACE_SIGNAL_HOLD, but this is very unlikely race.

 You probably don't want to use that, but I'm not entirely sure.  ptrace
 doesn't use it, and the signal interception model is pretty much the same.

Yes, agreed. But there is one corner case. Until we generalize
utrace_stop()-ptrace_notify_stop() hook, the tracee can be reported as
stopped to gdb, but before it actually stops it can recieve a signal.

The remote protocol doesn't allow to send TSIG after we alreay sent
T00 (at least this actually confuses gdb), and we can't just stop, the
tracee should report this signal to debugger.

So, currently ugdb stops but uses UTRACE_SIGNAL_HOLD to report this
signal later.

Oleg.



Re: gdbstub initial code, v7

2010-09-10 Thread Oleg Nesterov
On 09/10, Roland McGrath wrote:

  ugdb sets please stop flag and does utrace_control(INTERRUPT). However,
  in unlikely case the tracee can stop before -report_signal() reporting

 I don't think this is the right thing to do.  When the intent is explicitly
 to interrupt, there is no reason to stop before the interruption is
 complete, i.e. report_signal.

This means that ugdb_report_quiesce() should never return UTRACE_STOP,
and that is all.

But what about multitracing? Suppose that (gdb) interrupt happens
just before, say, do_report_syscall_entry() and another engine wants
to stop. If ugdb_report_quiesce() doesn't return UTRACE_STOP, then
gdb will wait until another debugger resumes the tracee.

What do you think?

 If you only stop there, then you can always
 process a signal injection with complete flexibility.

Yes, sure (again, currently ugdb does not injection a signal even
if the tracee was stopped in report_signal, but of course we can
change this).

Oleg.



Re: gdbstub initial code, v9

2010-09-09 Thread Oleg Nesterov
On 09/09, Frank Ch. Eigler wrote:

 Oleg Nesterov o...@redhat.com writes:

  [...]
  But, Jan. Implementing the memory writes does not mean breakpoints
  automatically start to work!

 It approximately should though.

  Yes, gdb writes cc, and yes the tracee reports SIGTRAP. But after
  that continue does nothing except $c, and the tracee naturally
  gets SIGILL. I expected that, since ugdb doesn't even know the code
  was changed, gdb should write the original byte back before continue,
  but this doesn't happen.

 In normal all-stop mode,

Currently ugdb only supports non-stop

 gdb does normally replace the old
 instruction, in order to single-step over it with the 's' packet.

Yes, probably single-stepping is needed... I am still trying to
understand how this works with gdbserver, but I see vCont:s packets.

 Perhaps you're testing some buggy non-stop aspect that only works
 with 'Z' breakpoint management packets?

No. Just a trivial test-case which printfs in a loop.

 A fuller packet trace
 would help explain.

Please see below. But the only important part is:

$M4005ba,1:cc   --- set bp
$c  --- resume

of course, this can't work.

Full trace:

= qSupported:multiprocess+
= PacketSize=400;QStartNoAckMode+;QNonStop+;multiprocess+;QPassS...
= QStartNoAckMode
= OK
= !
= OK
= Hgp0.0
= E01
= QNonStop:1
= OK
= qfThreadInfo
= E01
= ?
= OK
= qSymbol::
=
= vAttach;95b
= OK
= qfThreadInfo
= mp95b.95b
= qsThreadInfo
= l
= Hgp95b.95b
= OK
= vCont?
= vCont;t
= vCont;t:p95b.-1
= OK
= %Stop:T00thread:p95b.95b;
= vStopped
= OK
= g
= fcfd90ad5329ff7f00...
= m600880,8
= 403c6d7d007f
= m7f007d6d3c48,8
= 00106d7d007f
= m7f007d6d1000,28
= f6e04c7d007fe807600080156d7d007f00...
= m7f007d6d1580,28
= 00f0ef29ff7ff6e04c7d007f50f45f29ff7f00c06c7d007f00...
= m7f007d4ce0f4,4
= 090a0069
= m7f007d6cc000,28
= 0030167d007f781f6d7d007f400b4b7d007fe8346d7d007f00...
= m7f007d6d1f78,4
= 2f6c6962
= m7f007d6d1f7c,4
= 2f6c6962
= m7f007d6d1f80,4
= 632e736f
= m7f007d6d1f84,4
= 2e36
= m7f007d6d34e8,28
= 00704b7d007f0002482e6d7d007f00...
= m400200,4
= 2f6c6962
= m400204,4
= 2f6c642d
= m400208,4
= 6c696e75
= m40020c,4
= 782d7838
= m400210,4
= 362d3634
= m400214,4
= 2e736f2e
= m400218,4
= 3200
= m7f007d6d3c40,4
= 0100
= m7f007d6d3c48,8
= 00106d7d007f
= m7f007d6d3c50,8
= c04e4c7d007f
= Z0,7f007d4c4ec0,1
=
= m7f007d4c4ec0,1
= f3
= X7f007d4c4ec0,0:
=
= M7f007d4c4ec0,1:cc
= OK
= m600880,8
= 403c6d7d007f
= m7f007d6d3c48,8
= 00106d7d007f
= m7f007d6d1000,28
= f6e04c7d007fe807600080156d7d007f00...
= m7f007d6d1580,28
= 00f0ef29ff7ff6e04c7d007f50f45f29ff7f00c06c7d007f00...
= m7f007d4ce0f4,4
= 090a0069
= m7f007d6cc000,28
= 0030167d007f781f6d7d007f400b4b7d007fe8346d7d007f00...
= m7f007d6d1f78,4
= 2f6c6962
= m7f007d6d1f7c,4
= 2f6c6962
= m7f007d6d1f80,4
= 632e736f
= m7f007d6d1f84,4
= 2e36
= m7f007d6d34e8,28
= 00704b7d007f0002482e6d7d007f00...
= m400200,4
= 2f6c6962
= m400204,4
= 2f6c642d
= m400208,4
= 6c696e75
= m40020c,4
= 782d7838
= m400210,4
= 362d3634
= m400214,4
= 2e736f2e
= m400218,4
= 3200
= m7f007d6d3c40,4
= 0100
= vCont;t:p95b.-1
= OK
= m7f007d201f40,1
= 48
= m7f007d201f40,1
= 48
= g
= fcfd90ad5329ff7f00...
= m7f007d201f40,1
= 48
= m7f007d201f40,1
= 48
= m40056c,12
= 554889e5e8e3fe89c6ba0700bfdc
= m40056c,1
= 55
= m40056d,3
= 4889e5
= m40056c,12
= 554889e5e8e3fe89c6ba0700bfdc
= m40056c,1
= 55
= m40056d,3
= 4889e5
= m4005ba,1
= e8
= m4005ba,1
= e8

(gdb) b BP.c:13
Breakpoint 1 at 0x4005ba: file BP.c

Re: gdbstub initial code, v9

2010-09-09 Thread Oleg Nesterov
On 09/09, Oleg Nesterov wrote:

 the tracee hits this bp and reports SIGTRAP

   = vStopped
   = OK
   = g
   = 00064000401f207d007f00...
   = P10=ba054000
   =
   = G00064000401f207d007f0...
   =

May be this can explain...

Probably I need to implement G/P first, otherwise gdb can't change ip.

Still, I'd appreciate if someone can explain me what gdb needs/expects
to handle breakpoints before I start to read the sources.

Oleg.



gdbstub initial code, v9

2010-09-08 Thread Oleg Nesterov
Changes:

- partly fix the multitracing problems.

  ugdb still can't work with ptrace, ptrace-utrace.c needs
  changes. But at least multiple udgb tracers can coexist,
  more or less.

  But of course they can confuse each other anyway, no matter
  what ugdb does.

- implement memory writes ($M).

- refactor memory reads to avoid the Too late to report the
  error case.

But, Jan. Implementing the memory writes does not mean breakpoints
automatically start to work!

Yes, gdb writes cc, and yes the tracee reports SIGTRAP. But after
that continue does nothing except $c, and the tracee naturally
gets SIGILL. I expected that, since ugdb doesn't even know the code
was changed, gdb should write the original byte back before continue,
but this doesn't happen.

Tried to understand how this works with gdbserver, but failed so far.
Will continue, but any hint is very much appreciated ;)

Oleg.
#include linux/module.h
#include linux/proc_fs.h
#include linux/utrace.h
#include linux/poll.h
#include linux/mm.h
#include linux/regset.h
#include asm/uaccess.h

static int o_remote_debug;
module_param_named(echo, o_remote_debug, bool, 0);

#define BUFFER_SIZE 1024
#define PACKET_SIZE 1024

struct pbuf {
char*cur, *pkt;
charbuf[BUFFER_SIZE];
};

static inline void pb_init(struct pbuf *pb)
{
pb-cur = pb-buf;
pb-pkt = NULL;
}

enum {
U_STOP_IDLE = 0,
U_STOP_PENDING,
U_STOP_SENT,
};

struct ugdb {
struct list_headu_processes;
struct list_headu_stopped;

int u_stop_state;

struct mutexu_mutex;
spinlock_t  u_slock;

struct ugdb_thread
*u_cur_tinfo,
*u_cur_hg,
*u_cur_hc;

wait_queue_head_t   u_wait;

int u_err;

struct pbuf u_pbuf;
charu_cbuf[PACKET_SIZE];
int u_clen;

sigset_tu_sig_ign;

unsigned int
u_no_ack:1,
u_allstop:1;
};

static inline void ugdb_ck_stopped(struct ugdb *ugdb)
{
spin_lock(ugdb-u_slock);
WARN_ON(!list_empty(ugdb-u_stopped) 
ugdb-u_stop_state == U_STOP_IDLE);
WARN_ON(list_empty(ugdb-u_stopped) 
ugdb-u_stop_state == U_STOP_PENDING);
spin_unlock(ugdb-u_slock);
}

static struct ugdb *ugdb_create(void)
{
struct ugdb *ugdb;
int err;

err = -ENODEV;
// XXX: ugly. proc_reg_open() should take care.
if (!try_module_get(THIS_MODULE))
goto out;

err = -ENOMEM;
ugdb = kzalloc(sizeof(*ugdb), GFP_KERNEL);
if (!ugdb)
goto put_module;

INIT_LIST_HEAD(ugdb-u_processes);
INIT_LIST_HEAD(ugdb-u_stopped);

mutex_init(ugdb-u_mutex);
spin_lock_init(ugdb-u_slock);

init_waitqueue_head(ugdb-u_wait);

pb_init(ugdb-u_pbuf);

return ugdb;

put_module:
module_put(THIS_MODULE);
out:
return ERR_PTR(err);
}

#define P_DETACHING (1  1)
#define P_ZOMBIE(1  2)

struct ugdb_process {
int p_pid;
int p_state;

struct list_headp_threads;

struct ugdb *p_ugdb;
struct list_headp_processes;
};

static inline bool process_alive(struct ugdb_process *process)
{
return !(process-p_state  P_ZOMBIE);
}

static inline void mark_process_dead(struct ugdb_process *process)
{
process-p_state |= P_ZOMBIE;
}

static struct ugdb_process *ugdb_create_process(struct ugdb *ugdb, int pid_nr)
{
struct ugdb_process *process;

process = kzalloc(sizeof(*process), GFP_KERNEL);
if (!process)
return NULL;

process-p_pid = pid_nr;
process-p_ugdb = ugdb;
INIT_LIST_HEAD(process-p_threads);
list_add_tail(process-p_processes, ugdb-u_processes);

return process;
}

#define T_STOP_RUN  0
#define T_STOP_REQ  (1  0)/* requested by gdb */
#define T_STOP_ALL  (1  1)/* vCont;c:pX.-1, for report_clone */
#define T_STOP_ACK  (1  2)/* visible to vStopped */
#define T_STOP_STOPPED  (1  3)/* reported as stopped to gdb */
/* TASK_TRACED + deactivated ? */

#define T_EV_NONE   0
#define T_EV_EXIT   (1  24)
#define T_EV_SIGN   (2  24)

#define T_EV_TYPE(event)((0xff  24)  (event))
#define T_EV_DATA(event)(~(0xff  24)  (event))

struct ugdb_thread {
int t_tid;
int 

Re: gdbstub initial code, v8

2010-09-08 Thread Oleg Nesterov
On 09/06, Frank Ch. Eigler wrote:

 Oleg Nesterov o...@redhat.com writes:

  [...]
  Therefore until you track some ugdb-specific software(*)
  breakpoints ugdb does not need to support Z0 IMO.  I guess ugdb
  will never have to support these: thread-related(?) and tracepoint
  ones.

  Good! I thought ugdb should somehow handle this all transparently
  for gdb. I thought (I don't know why) that writing int 3 from gdb
  side should be avoided in favour of some better method unknown to me.

 Please note that last year's gdbstub prototype used kernel uprobes as
 an optional gdb breakpoint implementation (i.e., a backend for the Z
 packets).  When/if the lkml uprobes patches actually get merged, ugdb
 should also use them.

Yes, agreed.

Oleg.



Re: [PATCH 0/3] UTRACE_DETACH fixes

2010-09-06 Thread Oleg Nesterov
On 09/03, Roland McGrath wrote:

   So, it is technically kosher enough to use UTRACE_INTERRUPT without
   UTRACE_EVENT(QUIESCE) set, if you really know the situation and are sure
   about all those caveats above.
 
  How? see below.

 I mentioned the examples.

I misunderstood the UTRACE_INTERRUPT without UTRACE_EVENT(QUIESCE) above
as if you mean it is possible to get UTRACE_SIGNAL_REPORT without QUIESCE.

   But, it's only UTRACE_EVENT(QUIESCE) that
   gives you any expectation of an extra callback for no event from 
   either
   UTRACE_REPORT or UTRACE_INTERRUPT.
 
  Yes, this is clear too.

 Apparently not. ;-)

Perhaps ;) At least I certainly missed your point.

 QUIESCE is the only event type for no event.  If you only asked for
 signal events, then you only get a callback when there is an actual signal
 event.  If you want an extra callback before dequeuing any signal, then
 that is what QUIESCE is for.

OK. This means ugdb should set QUIESCE, just to ensure its -report_signal()
will be called.

  Once again, I am not asking to change utrace now, but could you explain
  what is wrong with the patch below?

 That gives an extra unrequested report_signal callback to every engine that
 is only interested in some signal event.  Consider a crash-catcher
 engine.  It would only track UTRACE_EVENT(SIGNAL_CORE) (and perhaps CLONE
 for its bookkeeping).  This engine never asked for a callback before each
 and every attempt to dequeue any signal,

I see your point (I hope).

 which is what you would give it
 with your change.

No ;) My change was wrong. event == 0 in this case, and then later
utrace_get_signal() checks

if ((want  (event | UTRACE_EVENT(QUIESCE))) == 0)
continue;



OK, please forget. I must admit, this still looks a bit, well, unnatural
to me. May be it makes sense to add

_UTRACE_EVENT_SIGNAL_REPORT,
_UTRACE_EVENT_SIGNAL_HANDLER,

into utrace_events. Then ugdb could ask for UTRACE_SIGNAL_REPORT and
avoid the unnecessary -report_quiesce() calls.

But at least I can see the logic now, thanks.



Roland, could you also comment this patch?

https://www.redhat.com/archives/utrace-devel/2010-August/msg00108.html

Again, this looks like a bug to me, but I won't insist if it is not.

Oleg.



Re: gdbstub initial code, v8

2010-09-06 Thread Oleg Nesterov
On 09/05, Jan Kratochvil wrote:

 On Sat, 04 Sep 2010 00:40:47 +0200, Oleg Nesterov wrote:
  - implement qXfer:siginfo:read
 
  - implement continue with signal.

 OK, thanks, just it was a bit premature to ask for it I see.  I miss at least
 memory writes

Yes. This is simple.

 (also to put in breakpoints):

And this is not clear to me, I need your help ;)

What should ugdb know about breakpoints? I played with the real
gdbserver, and afaics gdb just changes the tracee's memory and
inserts 0xcc (int 3). But gdb.info mentions Z TYPE,ADDR,KIND
packets.

So, what should ugdb do? Just implement memory write (M or X)
and then report SIGTRAP like gdbserver does?

Oleg.



Re: gdbstub initial code, v8

2010-09-06 Thread Oleg Nesterov
On 09/06, Jan Kratochvil wrote:

 On Mon, 06 Sep 2010 20:18:08 +0200, Oleg Nesterov wrote:
  On 09/05, Jan Kratochvil wrote:
 
   (also to put in breakpoints):
 
  And this is not clear to me, I need your help ;)

 Sorry, I just meant that by implementing the memory writes breakpoints
 automatically start to work.


  What should ugdb know about breakpoints? I played with the real
  gdbserver, and afaics gdb just changes the tracee's memory and
  inserts 0xcc (int 3). But gdb.info mentions Z TYPE,ADDR,KIND
  packets.

 I believe it is described by:
   /* If GDB wanted this thread to single step, we always want to
  report the SIGTRAP, and let GDB handle it.  Watchpoints should
  always be reported.  So should signals we can't explain.  A
  SIGTRAP we can't explain could be a GDB breakpoint --- we may or
  not support Z0 breakpoints.  If we do, we're be able to handle
  GDB breakpoints on top of internal breakpoints, by handling the
  internal breakpoint and still reporting the event to GDB.  If we
  don't, we're out of luck, GDB won't see the breakpoint hit.  */


  So, what should ugdb do? Just implement memory write (M or X)
  and then report SIGTRAP like gdbserver does?

 Therefore until you track some ugdb-specific software(*) breakpoints ugdb does
 not need to support Z0 IMO.  I guess ugdb will never have to support these:
 thread-related(?) and tracepoint ones.

Good! I thought ugdb should somehow handle this all transparently
for gdb. I thought (I don't know why) that writing int 3 from gdb
side should be avoided in favour of some better method unknown to me.

Thanks Jan.

Oleg.



Re: gdbstub initial code, v7

2010-09-02 Thread Oleg Nesterov
Sorry for the delay, I was distracted. Trying to switch back to ugdb.

On 08/31, Jan Kratochvil wrote:

  ugdb should support qXfer:siginfo, currently accessible only via $_siginfo
  print/set, though.

 Still sure this feature should be also implemented one day.

Yes sure. This should be simple, although I didn't expect qXfer
needs remote_escape_output() and x86_siginfo_fixup(). I assume
that qXfer:siginfo:read always mean Hg thread. It is not clear
to me what should ugdb report if there is no a valid siginfo.
linux_xfer_siginfo() return E01, but gdbserver uses SIGSTOP to
stop the tracee, so it always has something to report. But ugdb
stop the tracee somewhere else, not in get_signal_to_deliver()
path.

Likewise, it is not clear what should ugdb do if gdb sends
$CSIG in this case. But this all is minor, I think.

I was going to send v8 which implements qXfer:siginfo:read and
continue with signal, but (oh, as always) hit the unexpected
problems. To the point, I had to add printk's to utrace.c to
understand what is wrong. Hopefully tomorrow.

Oleg.



Re: [PATCH 0/3] UTRACE_DETACH fixes

2010-09-02 Thread Oleg Nesterov
I was going to ping you ;) This is connected to the problem I hit today.
Despite the fact I already complained about utrace_get_signal()  QUIESCE,
I forgot about this and figured out it doesn't work in practice.

On 09/02, Roland McGrath wrote:

   If an engine used UTRACE_INTERRUPT without having first set QUIESCE,
   then it has no rightful expectation of getting any callback.
 
  Hmm. This is certainly new to me. Could you confirm?

 Well, I didn't say it precisely correctly.  UTRACE_INTERRUPT serves two
 purposes.  First, it just interrupts things like a signal would.  You could
 use that on its own without even tracing any events at all, just do achieve
 fault injection or similar.  Second, it's like UTRACE_REPORT.

Yes, this is clear.

 As with UTRACE_REPORT, after UTRACE_INTERRUPT you can expect to get some
 report_quiesce callback even if there is no other event you were tracing
 that happens.  For this to actually happen, you need to have
 UTRACE_EVENT(QUIESCE) set.

Hmm. I am not sure I understand. If -report_signal != NULL, then
you can't expect -report_quiesce() after utrace_control(INTERRUPT) ?

 So, it is technically kosher enough to use UTRACE_INTERRUPT without
 UTRACE_EVENT(QUIESCE) set, if you really know the situation and are sure
 about all those caveats above.

How? see below.

 But, it's only UTRACE_EVENT(QUIESCE) that
 gives you any expectation of an extra callback for no event from either
 UTRACE_REPORT or UTRACE_INTERRUPT.

Yes, this is clear too.


So. Now I am even more confused. First of all, ugdb (at least currently)
does not need report_quiesce() and UTRACE_EVENT(QUIESCE) at all. OK, this
is not 100% true due to multitracing/etc, lets ignore this. Anyway it
makes sense to clear QUIESCE after $c to avoid the unnecessary callbacks.

All it needs is -report_signal(). But, the problem is, it is not called
after utrace_control(INTERRUPT) ! You need QUIESCE to ensure report_signal()
will be called, and this looks at least strange to me.

Once again, I am not asking to change utrace now, but could you explain
what is wrong with the patch below?

Oleg.

--- x/kernel/utrace.c
+++ x/kernel/utrace.c
@@ -2029,7 +2029,8 @@ int utrace_get_signal(struct task_struct
report.spurious = false;
finish_resume_report(task, utrace, report);
return -1;
-   } else if (!(task-utrace_flags  UTRACE_EVENT(QUIESCE))) {
+   } else if (!(task-utrace_flags 
+   (UTRACE_EVENT(QUIESCE) | 
UTRACE_EVENT_SIGNAL_ALL))) {
/*
 * We only got here to clear utrace-signal_handler.
 */



gdbstub initial code, v7

2010-08-30 Thread Oleg Nesterov
Changes:

- report signals. A bit more code changes than I expected.

- implement QPassSignals, trivial.

Note: $CSIG is not supported yet, and I am not sure I understand
how it should work. Next time...

#include linux/module.h
#include linux/proc_fs.h
#include linux/utrace.h
#include linux/poll.h
#include linux/mm.h
#include linux/regset.h
#include asm/uaccess.h

static int o_remote_debug;
module_param_named(echo, o_remote_debug, bool, 0);

#define BUFFER_SIZE 1024
#define PACKET_SIZE 1024

struct pbuf {
char*cur, *pkt;
charbuf[BUFFER_SIZE];
};

static inline void pb_init(struct pbuf *pb)
{
pb-cur = pb-buf;
pb-pkt = NULL;
}

enum {
U_STOP_IDLE = 0,
U_STOP_PENDING,
U_STOP_SENT,
};

struct ugdb {
struct list_headu_processes;
struct list_headu_stopped;

int u_stop_state;

struct mutexu_mutex;
spinlock_t  u_slock;

struct ugdb_thread
*u_cur_tinfo,
*u_cur_hg,
*u_cur_hc;

wait_queue_head_t   u_wait;

int u_err;

struct pbuf u_pbuf;
charu_cbuf[PACKET_SIZE];
int u_clen;

sigset_tu_sig_ign;

unsigned int
u_no_ack:1,
u_allstop:1;
};

static inline void ugdb_ck_stopped(struct ugdb *ugdb)
{
spin_lock(ugdb-u_slock);
WARN_ON(!list_empty(ugdb-u_stopped) 
ugdb-u_stop_state == U_STOP_IDLE);
WARN_ON(list_empty(ugdb-u_stopped) 
ugdb-u_stop_state == U_STOP_PENDING);
spin_unlock(ugdb-u_slock);
}

static struct ugdb *ugdb_create(void)
{
struct ugdb *ugdb;
int err;

err = -ENODEV;
// XXX: ugly. proc_reg_open() should take care.
if (!try_module_get(THIS_MODULE))
goto out;

err = -ENOMEM;
ugdb = kzalloc(sizeof(*ugdb), GFP_KERNEL);
if (!ugdb)
goto put_module;

INIT_LIST_HEAD(ugdb-u_processes);
INIT_LIST_HEAD(ugdb-u_stopped);

mutex_init(ugdb-u_mutex);
spin_lock_init(ugdb-u_slock);

init_waitqueue_head(ugdb-u_wait);

pb_init(ugdb-u_pbuf);

return ugdb;

put_module:
module_put(THIS_MODULE);
out:
return ERR_PTR(err);
}

#define P_DETACHING (1  1)
#define P_ZOMBIE(1  2)

struct ugdb_process {
int p_pid;
int p_state;

struct list_headp_threads;

struct ugdb *p_ugdb;
struct list_headp_processes;
};

static inline bool process_alive(struct ugdb_process *process)
{
return !(process-p_state  P_ZOMBIE);
}

static inline void mark_process_dead(struct ugdb_process *process)
{
process-p_state |= P_ZOMBIE;
}

static struct ugdb_process *ugdb_create_process(struct ugdb *ugdb, int pid_nr)
{
struct ugdb_process *process;

process = kzalloc(sizeof(*process), GFP_KERNEL);
if (!process)
return NULL;

process-p_pid = pid_nr;
process-p_ugdb = ugdb;
INIT_LIST_HEAD(process-p_threads);
list_add_tail(process-p_processes, ugdb-u_processes);

return process;
}

#define T_STOP_RUN  0
#define T_STOP_REQ  (1  0)/* requested by gdb */
#define T_STOP_ALL  (1  1)/* vCont;c:pX.-1, for report_clone */
#define T_STOP_ACK  (1  2)/* visible to vStopped */
#define T_STOP_STOPPED  (1  3)/* reported as stopped to gdb */
/* TASK_TRACED + deactivated ? */

#define T_EV_NONE   0
#define T_EV_EXIT   (1  24)
#define T_EV_SIGN   (2  24)

#define T_EV_TYPE(event)((0xff  24)  (event))
#define T_EV_DATA(event)(~(0xff  24)  (event))

struct ugdb_thread {
int t_tid;
int t_stop_state;
int t_stop_event;

struct ugdb *t_ugdb;
struct ugdb_process *t_process;

struct list_headt_threads;
struct list_headt_stopped;

struct pid  *t_spid;

struct utrace_engine*t_engine;
};

static inline bool thread_alive(struct ugdb_thread *thread)
{
WARN_ON((thread-t_tid != 0) != process_alive(thread-t_process));

return thread-t_tid != 0;
}

static inline void mark_thread_dead(struct ugdb_thread *thread)
{
mark_process_dead(thread-t_process);
thread-t_tid = 0;
}

/*
 * The thread should be alive, and it can't pass ugdb_report_death()
 * if the caller holds ugdb-u_mutex. However 

gdbstub initial code, v6

2010-08-27 Thread Oleg Nesterov
Changes:

- adapt to no more utrace changes. Whatever I think
  about utrace I have to agree, it is not practical to
  change utrace now

- attach to the thread-group with the dead leader

- fix the double-attaching bug

- do not assume we can trust pid_task(), the tracee can
  be reaped even if it can't pass -report_death()

- size_t is unsigned! -EAGAIN from -read() makes this
  fd unusable from rw_verify_area() pov

Next: report signals.   
#include linux/module.h
#include linux/proc_fs.h
#include linux/utrace.h
#include linux/poll.h
#include linux/mm.h
#include linux/regset.h
#include asm/uaccess.h

static int o_remote_debug;
module_param_named(echo, o_remote_debug, bool, 0);

#define BUFFER_SIZE 1024
#define PACKET_SIZE 1024

struct pbuf {
char*cur, *pkt;
charbuf[BUFFER_SIZE];
};

static inline void pb_init(struct pbuf *pb)
{
pb-cur = pb-buf;
pb-pkt = NULL;
}

enum {
U_STOP_IDLE = 0,
U_STOP_PENDING,
U_STOP_SENT,
};

struct ugdb {
struct list_headu_processes;
struct list_headu_stopped;

int u_stop_state;

struct mutexu_mutex;
spinlock_t  u_slock;

struct ugdb_thread
*u_cur_tinfo,
*u_cur_hg,
*u_cur_hc;

wait_queue_head_t   u_wait;

int u_err;

struct pbuf u_pbuf;
charu_cbuf[PACKET_SIZE];
int u_clen;

unsigned int
u_no_ack:1,
u_allstop:1;
};

static inline void ugdb_ck_stopped(struct ugdb *ugdb)
{
spin_lock(ugdb-u_slock);
WARN_ON(!list_empty(ugdb-u_stopped) 
ugdb-u_stop_state == U_STOP_IDLE);
WARN_ON(list_empty(ugdb-u_stopped) 
ugdb-u_stop_state == U_STOP_PENDING);
spin_unlock(ugdb-u_slock);
}

static struct ugdb *ugdb_create(void)
{
struct ugdb *ugdb;
int err;

err = -ENODEV;
// XXX: ugly. proc_reg_open() should take care.
if (!try_module_get(THIS_MODULE))
goto out;

err = -ENOMEM;
ugdb = kzalloc(sizeof(*ugdb), GFP_KERNEL);
if (!ugdb)
goto put_module;

INIT_LIST_HEAD(ugdb-u_processes);
INIT_LIST_HEAD(ugdb-u_stopped);

mutex_init(ugdb-u_mutex);
spin_lock_init(ugdb-u_slock);

init_waitqueue_head(ugdb-u_wait);

pb_init(ugdb-u_pbuf);

return ugdb;

put_module:
module_put(THIS_MODULE);
out:
return ERR_PTR(err);
}

#define P_DETACHING (1  1)
#define P_ZOMBIE(1  2)

struct ugdb_process {
int p_pid;
int p_state;

struct list_headp_threads;

struct ugdb *p_ugdb;
struct list_headp_processes;
};

static inline bool process_alive(struct ugdb_process *process)
{
return !(process-p_state  P_ZOMBIE);
}

static inline void mark_process_dead(struct ugdb_process *process)
{
process-p_state |= P_ZOMBIE;
}

static struct ugdb_process *ugdb_create_process(struct ugdb *ugdb, int pid_nr)
{
struct ugdb_process *process;

process = kzalloc(sizeof(*process), GFP_KERNEL);
if (!process)
return NULL;

process-p_pid = pid_nr;
process-p_ugdb = ugdb;
INIT_LIST_HEAD(process-p_threads);
list_add_tail(process-p_processes, ugdb-u_processes);

return process;
}

#define T_STOP_RUN  0
#define T_STOP_REQ  (1  0)/* requested by gdb */
#define T_STOP_ALL  (1  1)/* vCont;c:pX.-1, for report_clone */
#define T_STOP_ACK  (1  2)/* visible to vStopped */
#define T_STOP_STOPPED  (1  3)/* reported as stopped to gdb */
/* TASK_TRACED + deactivated ? */
struct ugdb_thread {
int t_tid;
int t_stop_state;
int t_stop_code;

struct ugdb *t_ugdb;
struct ugdb_process *t_process;

struct list_headt_threads;
struct list_headt_stopped;

struct pid  *t_spid;

struct utrace_engine*t_engine;
};

static inline bool thread_alive(struct ugdb_thread *thread)
{
WARN_ON((thread-t_tid != 0) != process_alive(thread-t_process));

return thread-t_tid != 0;
}

static inline void mark_thread_dead(struct ugdb_thread *thread)
{
mark_process_dead(thread-t_process);
thread-t_tid = 0;
}

/*
 * The thread should be alive, and it can't pass ugdb_report_death()
 * if the caller 

Re: [PATCH 3/4] utrace_set_events: fix UTRACE_EVENT(REAP) case

2010-08-26 Thread Oleg Nesterov
On 08/19, Roland McGrath wrote:

  On 08/16, Roland McGrath wrote:
  
  Now that you merged c93fecc925ea7567168f0c94414b9021de2708c5
  get_utrace_lock() must not succeed if utrace-reap == T, this becomes
  a bit off-topic. However, I thought about relaxing the dead check in
  get_utrace_lock(), instead of utrace-reap we could check
  utrace-reap  !utrace-death. In fact, initially I was going to
  do this, but then decided to make the simpler patch for now.

 When utrace-reap is set, it means release_task() has been called.  The
 API caller has no way to know reaping hasn't already begun--except if
 its report_death callback has not returned yet.

No! the task can be already reaped even before -report_death() was
called. The task_struct itself can't go away, but that is all (perhaps
you meant this).

And this is the problem for ugdb. Damn, I knew this, that is why
ugdb_process_exit() doesn't try to read -group_exit_code. Still I
managed to forget that this has other implications.

 But I think that the
 API definition of once release_task() has been called (i.e. entered,
 maybe not returned yet), any utrace call will get -ESRCH, is a clear
 and comprehensible constraint.  We should not open any holes in that.

I am not sure. I had another reason in mind to check reap  !death in
get_utrace_lock(), synchronization with -report_death() callback.

But OK. Let's forget about more utrace changes for now.

Oleg.



Re: [PATCH 2/3] fix utrace_control(DETACH) utrace-death interaction

2010-08-26 Thread Oleg Nesterov
On 08/19, Roland McGrath wrote:

  Wait. It doesn't break this. It only breaks -EALREADY contract. And
  I don't understand why this is bad.

 The -EALREADY contract lets you have a report_death callback that does all
 your cleanup and then returns UTRACE_DETACH.

I think this is possible without the current -EALREADY contract.

 To have that plan, you need a
 way for an asynchronous detach attempt to be excluded once report_death has
 begun, and for that asynchronous caller to know that's the situation.

The problem is, with the current conract detach is never simple if you
have report_death. You always have to synchronize with this callback.

But see below.

 It breaks the documented API.  I am of course open to changing the API.
 But we need to get completely clear on what the new range of plans we'll
 support is, and make the documentation and the implementation match.

Yes. I thought it makes sense to change the API and docs if we can improve
things, before utrace is widely used.

But OK, lets's forget about this.

  Why? What is the point? What makes UTRACE_EVENT(DEATH) so special?
  I do not see the logic at all.

 It is special because it is the last interesting event to a traditional
 user such as ptrace.  Hence it is attractive to write a report_death
 callback that returns UTRACE_DETACH spontaneously, i.e. without an
 asynchronous caller (i.e. debugger thread) having requested the detach.

Sure. And this is still possible.

  If -report_death() does something which the caller of utrace_control()
  should know, then they should take care of synchronization anyway.
  With or without this patch, utrace_control(DEATH) can return 0 after
  -report_death() was already called.

 If your engine's report_death always returns UTRACE_DETACH, then
 utrace_control(UTRACE_DETACH) can never return 0 once report_death
 is about to be called nor any time thereafter.

I don't undestand this. OK, probably I missed something, this doesn't
matter.

 But, the current state of play from the broader perspective is that we
 really have no idea about the future viability of the utrace API.  I don't
 think it makes a lot of sense to put a great deal of effort into perfecting
 the corners of the API much further when we don't really have any good
 reasons to think that this API will be the basis of anything that survives
 in the long run.

Yes. I have to agree, this is good point.

So, lets forget about these changes, at least for now.

 The current focus of work is your ugdb prototype.  If some utrace changes
 make it greatly easier to get that to kind-of-working status, then they are
 worthwhile.

No. I didn't think about ugdb at all. I'll find the workaround for ugdb.
If nothing else, I can use utrace_engine_ops-release(). Or something else.

 If it
 works in practice but has disturbing robustness holes, it is OK to just
 comment that loudly now and move on to the next battle, IMHO.

OK, agreed.

Oleg.



Re: [PATCH 0/3] UTRACE_DETACH fixes

2010-08-26 Thread Oleg Nesterov
On 08/19, Roland McGrath wrote:

  OK, instead of filling utrace_detached_ops{} we can change start_callback()
 
  - if (want  UTRACE_EVENT(QUIESCE)) {
  + if ((want  UTRACE_EVENT(QUIESCE)) || ops == detached_ops) {

 If we're going to have a special-case check for utrace_detached_ops, then
 it can just short-circuit entirely and there is no need for any callbacks
 in utrace_detached_ops.  (Then we might even use NULL or (void *)-1l
 instead of utrace_detached_ops.)  All it needs to do is:

   report-detaches = true;
   utrace-reporting = NULL;
   return NULL;

Yes, I thought about this too, see the changelog.

But probably we need utrace_detached_ops-report_reap() anyway.
-report_death can return UTRACE_DETACH, and after that utrace_report_death()
can skip utrace_reset() and do -report_reap().

  And, in particular, I don't understand this code in utrace_get_signal:
 
  } else if (!(task-utrace_flags  UTRACE_EVENT(QUIESCE))) {
  /*
   * We only got here to clear utrace-signal_handler.
   */
  return -1;
  }
 
  How so? What if we were called because of utrace_control(INTERRUPT)
  and QUIESCE is not set?

 If an engine used UTRACE_INTERRUPT without having first set QUIESCE,
 then it has no rightful expectation of getting any callback.

Hmm. This is certainly new to me. Could you confirm?

If yes, shouldn't we change utrace_control()

- if (action = UTRACE_REPORT  action  UTRACE_RESUME 
+ if (action = UTRACE_INTERRUPT  action  UTRACE_RESUME 
  unlikely(!(engine-flags  UTRACE_EVENT(QUIESCE {

then?

I must admit, I do not understand why INTERRUPT needs QUIESCE.
utrace_get_signal() should respect QUIESCE, this is clear. But why
it is needed?

Oleg.



Re: [PATCH] fix mark_engine_detached() vs start_callback() race

2010-08-26 Thread Oleg Nesterov
On 08/19, Roland McGrath wrote:

  I think in the longer term mark_engine_detached() should not change
  engine-flags at all but add QUIESCE to -utrace_flags. However, this
  breaks utrace_maybe_reap(reap = true) and we should avoid the race
  with finish_callback() which clears -reporting after report_quiesce().

 What's the benefit to adding QUIESCE?  If any utrace code gets entered at
 all, then any callback run will be able to do the special-case ops check
 for detached engines.

Well yes, but otoh it could be the last engine. We shouldn't miss, say,
start_callback() from utrace_resume() (although currently -spurious helps).

Anyway, this needs more thinking, and probably you are right and QUIESCE
is not needed.

Oleg.



Re: gdbstub initial code, v5

2010-08-24 Thread Oleg Nesterov
On 08/23, Oleg Nesterov wrote:

 However. I spent all Monday trying to resolve the new bug, and
 so far I do not understand what happens. Extremely hard to reproduce,
 and the kernel just hangs silently, without any message.
 So far I suspect the proble in utrace.c, but this time I am not sure.

Solved. This was scheduler bug fixed in 2.6.35, but I used 2.6.34.
This is really funny. This bug (PF_STARTING lockup) was found and
fixed by me  Peter.

Oh. But I hit yet another problem, BUG_ON() in __utrace_engine_release().
Again, it is not reproducible, I saw it only once in dmesg and I do
not even know for sure what I was doing.

I'll contiue tomorrow, but if I won't be able to quickly resolve
this problem I am going to ignore it for now. This time I think
ugdb is wrong.

Oleg.



Re: gdbstub initial code, v5

2010-08-23 Thread Oleg Nesterov
Just a small report to explain what I am doing...

On 08/20, Oleg Nesterov wrote:

   - I forgot to implement the attach to the thread group
 with the dead leader. Next time.

Almost done, but we should avoid the races with exec somehow.
But this is minor.

I tried to test this code as much as I can. Again, I do not use
gdb at all, I am using the scripts which try to really stress ugdb.

Found 2 bugs in ugdb.ko, the second one is not nice but at least
I have the temporary fix.

However. I spent all Monday trying to resolve the new bug, and
so far I do not understand what happens. Extremely hard to reproduce,
and the kernel just hangs silently, without any message.
So far I suspect the proble in utrace.c, but this time I am not sure.

Will continue tomorrow...

Oleg.



gdbstub initial code, v5

2010-08-20 Thread Oleg Nesterov
On 08/19, Oleg Nesterov wrote:

 Next step: handle exit correctly and report W/S. I misunderstood
 what gdbserver does when the main thread exits, it is not stupid
 as I wrongly thought.

Yes, in non-stop mode gdbserver reports W/X;process:PID when the
last thread exits. This makes sense, so does ugdb.

But,

==
All, please ack/nack the behavioral difference!

When the main thread exits, gdbserver still exposes it to gdb as
a running process. It is visible via info threads, you can switch
to this thread, $Tp or $Hx result in OK as if this thread is alive.
gdbserver even pretends that $vCont;x:DEAD_THEAD works, although
this thread obviously can never report something.

I don't think this is really right. This just confuses the user, and
imho this should be considered like the minor bug.

ugdb doesn't do this. If the main thread exits - it exits like any
other thread. I played with gdb, it seems to handle this case fine.

==

Problems:

- I forgot to implement the attach to the thread group
  with the dead leader. Next time.

- The exit code (Wxx) can be wrong in mt-case.

  The problem is, -report_death can't safely access
  -group_exit_code with kernel  2.6.35. This is
  solveable.

Roland, sorry, I ignored your emails for today. It is not easy to me
to switch between ugdb an utrace ;)

Oleg.
#include linux/module.h
#include linux/proc_fs.h
#include linux/utrace.h
#include linux/poll.h
#include linux/mm.h
#include linux/regset.h
#include asm/uaccess.h

static int o_remote_debug;
module_param_named(echo, o_remote_debug, bool, 0);

#define BUFFER_SIZE 1024
#define PACKET_SIZE 1024

struct pbuf {
char*cur, *pkt;
charbuf[BUFFER_SIZE];
};

static inline void pb_init(struct pbuf *pb)
{
pb-cur = pb-buf;
pb-pkt = NULL;
}

enum {
U_STOP_IDLE = 0,
U_STOP_PENDING,
U_STOP_SENT,
};

struct ugdb {
struct list_headu_processes;
struct list_headu_stopped;

int u_stop_state;

struct mutexu_mutex;
spinlock_t  u_slock;

struct ugdb_thread
*u_cur_tinfo,
*u_cur_hg,
*u_cur_hc;

wait_queue_head_t   u_wait;

int u_err;

struct pbuf u_pbuf;
charu_cbuf[PACKET_SIZE];
int u_clen;

unsigned int
u_no_ack:1,
u_allstop:1;
};

static inline void ugdb_ck_stopped(struct ugdb *ugdb)
{
// XXX: temporary racy check
WARN_ON(!list_empty(ugdb-u_stopped) 
ugdb-u_stop_state == U_STOP_IDLE);
WARN_ON(list_empty(ugdb-u_stopped) 
ugdb-u_stop_state == U_STOP_PENDING);
}

static struct ugdb *ugdb_create(void)
{
struct ugdb *ugdb;
int err;

err = -ENODEV;
// XXX: ugly. proc_reg_open() should take care.
if (!try_module_get(THIS_MODULE))
goto out;

err = -ENOMEM;
ugdb = kzalloc(sizeof(*ugdb), GFP_KERNEL);
if (!ugdb)
goto put_module;

INIT_LIST_HEAD(ugdb-u_processes);
INIT_LIST_HEAD(ugdb-u_stopped);

mutex_init(ugdb-u_mutex);
spin_lock_init(ugdb-u_slock);

init_waitqueue_head(ugdb-u_wait);

pb_init(ugdb-u_pbuf);

return ugdb;

put_module:
module_put(THIS_MODULE);
out:
return ERR_PTR(err);
}

#define P_DETACHING (1  1)
#define P_ZOMBIE(1  2)

struct ugdb_process {
int p_pid;
int p_state;

struct list_headp_threads;

struct ugdb *p_ugdb;
struct list_headp_processes;
};

static inline bool process_alive(struct ugdb_process *process)
{
return !(process-p_state  P_ZOMBIE);
}

static inline void mark_process_dead(struct ugdb_process *process)
{
process-p_state |= P_ZOMBIE;
}

static struct ugdb_process *ugdb_create_process(struct ugdb *ugdb, int pid)
{
struct ugdb_process *process;

process = kzalloc(sizeof(*process), GFP_KERNEL);
if (!process)
return NULL;

process-p_pid = pid;
process-p_ugdb = ugdb;
INIT_LIST_HEAD(process-p_threads);
list_add_tail(process-p_processes, ugdb-u_processes);

return process;
}

#define T_STOP_RUN  0
#define T_STOP_REQ  (1  0)/* requested by gdb */
#define T_STOP_ALL  (1  1)/* vCont;c:pX.-1, for report_clone */
#define T_STOP_ACK  (1  2)/* visible to vStopped */
#define T_STOP_STOPPED  (1

Re: gdbstub initial code, v4

2010-08-20 Thread Oleg Nesterov
On 08/19, Roland McGrath wrote:

  Note the second attachment, GDBCAT. It is just the simple perl
  script which connects /proc/ugdb to tcp port. It can be used for
  remote debugging via tcp, or with (unpatched) gdb which can't do
  select() on /proc/ugdb.

 bash$ nc -l 2000  /proc/ugdb 

Yes, this works,

  Actually, it is more convenient to use
  it in any case, at least for logging purposes.

 (gdb) set remote debug 1

set debug remote 1. Yes, I tried this. Very, very inconvenient.
In this case the debugging output is mixed with the normal output,

Also: you can't save to file, can't filter packets, can't add
the packet for debugging (not implemented yet).

But yes, it is not strictly necessary, nc should work too.

Oleg.



Re: [PATCH 2/3] fix utrace_control(DETACH) utrace-death interaction

2010-08-18 Thread Oleg Nesterov
On 08/17, Oleg Nesterov wrote:

 On 08/16, Roland McGrath wrote:
 
   The problem is, utrace_control(DETACH) does nothing and returns
   -EALREADY if utrace-death is set, this is not right. We can and
   should detach in this case, we only should skip utrace_reset() to
   avoid the race with utrace_report_death()-REPORT_CALLBACKS().
 
  This behavior is the original (minimal) synchronization scheme from before
  we had utrace_barrier.  See Interlock with final callbacks in
  Documentation/DocBook/utrace.tmpl.

 OK, I agree my patch breaks this part

   Normally
   functionutrace_control/function called with
   constantUTRACE_DETACH/constant returns zero, and this means that no
   more callbacks will be made.

 of documentation (which I never read ;)

Wait. It doesn't break this. It only breaks -EALREADY contract. And
I don't understand why this is bad.

 But in any case, personally I dislike the current behaviour anyway,
 I think this certainly complicates the life for module writers.
 Instead of simple detach + barrier, you always need the nontrivial
 code if report_quiesce/death ever touches engine-data! This can't
 be good.

Yes.

Roland, could you explain once again why do you dislike this patch?


Once again. Currently utrace_control(DETACH) refuses to even try to
detach the engine if utrace-death is set.

Why? What is the point? What makes UTRACE_EVENT(DEATH) so special?
I do not see the logic at all.

If -report_death() does something which the caller of utrace_control()
should know, then they should take care of synchronization anyway.
With or without this patch, utrace_control(DEATH) can return 0 after
-report_death() was already called.


Currently, utrace_control(DETACH) returns -EALREADY when this callback
was alredy called, or it can be called later, or it may be running. And
utrace_barrier() can't help.

With this patch utrace_control(DETACH) returns either 0 with the same
meaning (will not be called later, but probably was already called
before utrace_control), or -EINPROGRESS which suggests to use
utrace_barrier().


Please correct me, but I think this certainly makes things much simpler.
Otherwise UTRACE_DETACH is never trivial (and iiuc it is better to
avoid utrace_engine_ops-release() hook).

What do you think?

Oleg.



[PATCH] fix mark_engine_detached() vs start_callback() race

2010-08-18 Thread Oleg Nesterov
Suppose that engine-flags == UTRACE_EVENT(EXEC), QUIESCE bit is not set.

1. start_callback() reads want = engine-flags (== EXEC)

2. mark_engine_detached() sets engine-ops = utrace_detached_ops

3. start_callback() gets ops = utrace_detached_ops

After that start_callback() skips if (want  UTRACE_EVENT(QUIESCE))
block and returns utrace_detached_ops, then -report_exec == NULL
will be called.

This is the minimal temporary ugly fix for now, we should certainly
cleanup and simplify this logic. The barriers in mark_engine_detached()
and in start_callback() can't help and should be removed. If we ignore
utrace_get_signal() we do not even need utrace_detached_quiesce(),
start_callback() could just do

ops = engine-ops;

if (ops == utrace_detached_ops) {
report-detaches = true;
return NULL;
}

I think in the longer term mark_engine_detached() should not change
engine-flags at all but add QUIESCE to -utrace_flags. However, this
breaks utrace_maybe_reap(reap = true) and we should avoid the race
with finish_callback() which clears -reporting after report_quiesce().

A bit off-topic, but I don't think finish_callback() should check
engine-ops == utrace_detached_ops before return. Instead we should
change finish_callback_report() to return the boolean. We shouldn't
return true without setting report-detaches.

Signed-off-by: Oleg Nesterov o...@redhat.com
---

 kernel/utrace.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- kstub/kernel/utrace.c~8_fix_mark_detached_without_quiesce   2010-08-18 
16:46:08.0 +0200
+++ kstub/kernel/utrace.c   2010-08-18 17:47:53.0 +0200
@@ -1522,7 +1522,7 @@ static const struct utrace_engine_ops *s
smp_rmb();
ops = engine-ops;
 
-   if (want  UTRACE_EVENT(QUIESCE)) {
+   if ((want  UTRACE_EVENT(QUIESCE)) || ops == utrace_detached_ops) {
if (finish_callback(task, utrace, report, engine,
(*ops-report_quiesce)(report-action,
   engine, event)))



[PATCH?] avoid the unnecessary utrace_resume()-utrace_reset()

2010-08-18 Thread Oleg Nesterov
utrace_resume(UTRACE_REPORT) always calls utrace_reset() because
start_callback() obviously can't clear report-spurious when
event == 0.

Change start_callback() to correctly clear -spurious in this case.
We could probably clear it in utrace_resume() unconditionally after
reporting loop, but this is not exactly right if there are no engines
which have QUIESCE in engine-flags.

utrace_get_signal() probably has the same problem.

Note: utrace_control(DETACH) does utrace_do_stop() and sets UTRACE_REPORT
if the tracee is not stopped. It also does mark_engine_detached() which
does not set QUIESCE in target-utrace_flags. This means we rely on
report.spurious which should provoke utrace_reset() from utrace_resume()
if target-utrace_flags doesn't have QUIESCE. A bit too subtle, imho.
Also, UTRACE_REPORT can be lost because of UTRACE_INTERRUPT or normal
signal: utrace_get_signal() checks utrace_flags  UTRACE_EVENT(QUIESCE)
and returns otherwise. This should be fixed somehow. This check is wrong
anyway, but it is not clear how we can fix the race with DETACH.

Signed-off-by: Oleg Nesterov o...@redhat.com
---

 kernel/utrace.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- kstub/kernel/utrace.c~10_utrace_resume_and_spurious 2010-08-18 
19:00:50.0 +0200
+++ kstub/kernel/utrace.c   2010-08-18 19:41:05.0 +0200
@@ -1540,7 +1540,7 @@ static const struct utrace_engine_ops *s
if (want  ENGINE_STOP)
report-action = UTRACE_STOP;
 
-   if (want  event) {
+   if (want  (event ?: UTRACE_EVENT(QUIESCE))) {
report-spurious = false;
return ops;
}



Re: [PATCH 3/4] utrace_set_events: fix UTRACE_EVENT(REAP) case

2010-08-17 Thread Oleg Nesterov
On 08/16, Roland McGrath wrote:

  - It is possible that both -death and -reap are true. In this
case it is OK to clear UTRACE_EVENT(REAP), but set_events fails.

 No, it's not OK to clear it.  Once -reap is set, then the engine's
 ops-report_reap might or might not have been called already.

Afaics - no.

If utrace-death is set (and we check it under utrace-lock) we can
ignore utrace-reap.

In short, if ops-report_reap can be called before -death is cleared,
then 2 possible callers of utrace_maybe_reap() can race with each other,
but this can't happen.

utrace-death == T means:

- (utrace_flags  _UTRACE_DEATH_EVENTS) == T

- utrace-death was set by utrace_report_death() which will take
  utrace-lock later and clear -death, only then it may call
  ops-report_reap().

- until utrace_report_death() clears -death, _UTRACE_DEATH_EVENTS
  must be set in -utrace_flags, otherwise utrace_maybe_reap(true)
  is buggy.

  Note that both utrace-death and _UTRACE_DEATH_EVENTS are cleared
  atomically from utrace-lock pov.

IOW. utrace-death is true, then

IF (utrace-reap)

tracehook_prepare_releas()-utrace_maybe_reap(true) was
already called, this is how utrace-reap was set.

But utrace_maybe_reap() did nothing and returned.
We rely on the subsequent utrace_maybe_reap(false) from
utrace_report_death() - but this can't happen until
we drop utrace-lock

ELSE
utrace-reap can't be set until we drop utrace-lock.   


Now that you merged c93fecc925ea7567168f0c94414b9021de2708c5
get_utrace_lock() must not succeed if utrace-reap == T, this becomes
a bit off-topic. However, I thought about relaxing the dead check in
get_utrace_lock(), instead of utrace-reap we could check
utrace-reap  !utrace-death. In fact, initially I was going to
do this, but then decided to make the simpler patch for now.

Oleg.



Re: [PATCH 2/4] utrace_set_events: consolidate setting _UTRACE_DEATH_EVENTS checks

2010-08-17 Thread Oleg Nesterov
On 08/16, Roland McGrath wrote:

  In short: this double check allows to avoid tasklist when utrace_flags
  already has these bits while engine-flags doesn't.
 
  But multitracing is unlikely case, in the likely case if we add
  _UTRACE_DEATH_EVENTS to engine-flags we set these bits in utrace_flags
  too. I think it makes sense to consolidate these checks to make the
  code a bit more understandable.

 I don't really agree about unlikely case.  In many uses, the systemtap
 task-finder will have a utrace engine on every task in the system, for
 example.  Moreover, this is an importantly distinct particular kind of
 micro-optimization to be doing here: avoiding a system-wide lock.  Any
 place that we take tasklist_lock at all, we are introducing a system-wide
 slowdown or limitation on scaling, just because of our tracing of one task.
 So optimizing the minimize that is qualitatively different and really much
 more important than avoiding taking the utrace lock, for example.

 But with that note in your mind, I am happy to take this patch now as a
 simplification and let reoptimization come back later.

OK, agreed.

Oleg.



Re: [PATCH 0/3] UTRACE_DETACH fixes

2010-08-17 Thread Oleg Nesterov
On 08/17, Oleg Nesterov wrote:

 Suppose that, say, engine-flags == UTRACE_EVENT(EXEC) (no QUIESCE).

   1. start_callback() reads want = engine-flags (== EXEC)

   2. mark_engine_detached() sets engine-ops = utrace_detached_ops

   3. start_callback() gets ops = utrace_detached_ops

 After that start_callback() skips if (want  UTRACE_EVENT(QUIESCE)) block
 and returns utrace_detached_ops, utrace_detached_ops-report_exec == NULL
 will be called.

OK, instead of filling utrace_detached_ops{} we can change start_callback()

- if (want  UTRACE_EVENT(QUIESCE)) {
+ if ((want  UTRACE_EVENT(QUIESCE)) || ops == detached_ops) {


 But this is minor. Roland, I can't explain this, but I have the strong
 gut feeling there is something else we should worry about. I am still
 reading this code...

Or not... I'll recheck once again tomorrow and try to make the patch.

In particular, I'd like to think if we can simplify this somehow.
Say, avoid the barriers somewhere, or avoid changing engine-flags in
mark_engine_detached().

I am worried that mark_engine_detached() sets engine-flags, but it
doesn't add QUIESCE to utrace_flags. I can't convince myself that
utrace_do_stop() closes all possible races.


And, in particular, I don't understand this code in utrace_get_signal:

} else if (!(task-utrace_flags  UTRACE_EVENT(QUIESCE))) {
/*
 * We only got here to clear utrace-signal_handler.
 */
return -1;
}

How so? What if we were called because of utrace_control(INTERRUPT)
and QUIESCE is not set?

Oleg.



[PATCH 2/3] fix utrace_control(DETACH) utrace-death interaction

2010-08-16 Thread Oleg Nesterov
Suppose that we want to detach the engine and free engine-data. To
avoid the races with our calbacks which can use -data we should do
either

err = utrace_set_events(0);
if (err == ...)
utrace_barrier();
utrace_control(DETACH);

or

err = utrace_control(DETACH);
if (err = ...)
utrace_barrier();

Neither variant works if we should care about report_quiesce() or
report_death(). Let's discuss the 2nd case, the 1st one have the
similar problems.

The problem is, utrace_control(DETACH) does nothing and returns
-EALREADY if utrace-death is set, this is not right. We can and
should detach in this case, we only should skip utrace_reset() to
avoid the race with utrace_report_death()-REPORT_CALLBACKS().

This was the main problem I hit during the testing. Consider the
exiting tracee with the valid engine-ops, suppose that
utrace_control(DETACH) is called right after utrace_report_death()
unlocks utrace-lock and before it does REPORT_CALLBACKS().

utrace_control() fails, but utrace_barrier() does not help after
that. It takes get_utrace_lock() successfully and returns 0.
The caller frees engine-data, but utrace_report_death() calls
-report_death() after that.

Kill utrace_control_dead(). Change utrace_control() to always
allow UTRACE_DETACH if engine has the valid -ops. This means that
mark_engine_detached() can race with REPORT_CALLBACKS() but there
is nothing new. utrace_do_stop() is unnecessary and pointless in
this case, but it doesn't hurt.

Signed-off-by: Oleg Nesterov o...@redhat.com
---

 kernel/utrace.c |   49 ++---
 1 file changed, 10 insertions(+), 39 deletions(-)

--- kstub/kernel/utrace.c~6_fix_control_dead2010-08-16 11:17:05.0 
+0200
+++ kstub/kernel/utrace.c   2010-08-16 11:17:51.0 +0200
@@ -928,37 +928,6 @@ void utrace_maybe_reap(struct task_struc
}
 }
 
-/*
- * You can't do anything to a dead task but detach it.
- * If release_task() has been called, you can't do that.
- *
- * On the exit path, DEATH and QUIESCE event bits are set only
- * before utrace_report_death() has taken the lock.  At that point,
- * the death report will come soon, so disallow detach until it's
- * done.  This prevents us from racing with it detaching itself.
- *
- * Called only when @target-exit_state is nonzero.
- */
-static inline int utrace_control_dead(struct task_struct *target,
- struct utrace *utrace,
- enum utrace_resume_action action)
-{
-   lockdep_assert_held(utrace-lock);
-
-   if (action != UTRACE_DETACH || unlikely(utrace-reap))
-   return -ESRCH;
-
-   if (unlikely(utrace-death))
-   /*
-* We have already started the death report.  We can't
-* prevent the report_death and report_reap callbacks,
-* so tell the caller they will happen.
-*/
-   return -EALREADY;
-
-   return 0;
-}
-
 /**
  * utrace_control - control a thread being traced by a tracing engine
  * @target:thread to affect
@@ -1115,18 +1084,20 @@ int utrace_control(struct task_struct *t
ret = 0;
 
/*
-* -exit_state can change under us, this doesn't matter.
-* We do not care about -exit_state in fact, but we do
-* care about -reap and -death. If either flag is set,
-* we must also see -exit_state != 0.
+* -exit_state can change under us, this doesn't matter. But,
+* if utrace-death is set we must also see -exit_state != 0,
+* this is what we care about.
 */
if (unlikely(target-exit_state)) {
-   ret = utrace_control_dead(target, utrace, action);
-   if (ret) {
+   /* You can't do anything to a dead task but detach it */
+   if (action != UTRACE_DETACH) {
spin_unlock(utrace-lock);
-   return ret;
+   return -ESRCH;
}
-   reset = true;
+
+   /* If it is not set, we can safely do utrace_reset() */
+   if (likely(!utrace-death))
+   reset = true;
}
 
switch (action) {



[PATCH 3/3] utrace_set_events: kill now unnecessary exit_state/reap checks

2010-08-16 Thread Oleg Nesterov
Now that get_utrace_lock() can never succeed if utrace-reap is true,
we can kill this check in utrace_set_events(). This also means we can
kill the target-exit_state check, it buys nothing now.

Signed-off-by: Oleg Nesterov o...@redhat.com
---

 kernel/utrace.c |   13 -
 1 file changed, 4 insertions(+), 9 deletions(-)

--- kstub/kernel/utrace.c~7_set_events_kill_reap2010-08-16 
11:17:51.0 +0200
+++ kstub/kernel/utrace.c   2010-08-16 11:28:00.0 +0200
@@ -539,15 +539,10 @@ int utrace_set_events(struct task_struct
old_utrace_flags = target-utrace_flags;
old_flags = engine-flags  ~ENGINE_STOP;
 
-   /* If -death or -reap is true we must see exit_state != 0. */
-   if (target-exit_state) {
-   if (utrace-death) {
-   if ((old_flags  ~events)   _UTRACE_DEATH_EVENTS)
-   goto unlock;
-   } else if (utrace-reap) {
-   if ((old_flags ^ events)  UTRACE_EVENT(REAP))
-   goto unlock;
-   }
+   if ((old_flags  ~events)   _UTRACE_DEATH_EVENTS) {
+   /* Too late if utrace_report_death() is in progress */
+   if (utrace-death)
+   goto unlock;
}
 
/*



Re: [PATCH 0/3] UTRACE_DETACH fixes

2010-08-16 Thread Oleg Nesterov
On 08/16, Oleg Nesterov wrote:

 Well, I tried to test these changes, seems to work.

 But... From 2/3 changelog:

   This means that
   mark_engine_detached() can race with REPORT_CALLBACKS() but there
   is nothing new.

 Yes, there is nothing new. But, Roland, this is WRONG!

 With or without these fixes utrace_control()-mark_engine_detached()
 was always wrong if it races with REPORT/REPORT_CALLBACKS. Not sure
 what we were thinking about :/

 Look. Suppose that utrace_control(DETACH) is called right after
 start_callback() returns ops != NULL. After that
 finish_callback(...,  ops-callback(...)) can hit -callback == NULL!

Cough. I guess I spent too much time with utrace and testing today ;)

Let me try again. mark_engine_detached() does

engine-ops = utrace_detached_ops;
smp_wmb();
engine-flags = UTRACE_EVENT(QUIESCE);

Whatever we do, start_callback() can see the old engine-flags but
the new -ops = utrace_detached_ops. Just suppose that the caller
of UTRACE_DETACH is interrupted right after setting engine-ops.

 The obvious and simplest solution: utrace_detached_ops{} should
 have the dummy handler for every callback. But I'd like to think
 a bit more.

Yes. Perhaps (unlikely) we can reverse the order, but I can no longer
think properly today.

Or do you think I miss something and this is false alarm?

Oleg.



  1   2   3   4   5   6   >