Re: utrace unwanted traps

2010-10-21 Thread Oleg Nesterov
On 10/19, Roland McGrath wrote:

   But that's how it used to be, and then we changed it.
   So it merits digging up our old discussion threads that led to doing that
   and see what all we had in mind then.
 
  I tried to grep my mail archives and google, but failed to find anything
  related.

 Based on the date of the commit you just mentioned, I browsed in:
   https://www.redhat.com/archives/utrace-devel/2009-October/thread.html
 and saw:
   https://www.redhat.com/archives/utrace-devel/2009-October/msg00287.html
 which is the day before that commit.  I didn't read through all that to
 refamiliarize myself with all the details.

Yes, I saw this thread, but didn't read it carefully. When I read it again,
I started to believe that yes, it can explain the current -resume logic.

However, I think this is false alarm. With a lot of efforts, I seem to
fully understand our discussion.

In short: I believe we never discussed why utrace-resume should be
*STEP, especially if task is stopped or it is going to stop. Although
I recall I thought this is obviously nice when I tried to review
utrace: sticky resume action.


As for this thread. Yes, I tried to suggest the (ugly)
utrace-set_singlestep which should be checked after wakeup. But my
motivation was quite different, what I was trying to preserve was
TIF_SINGLESTEP, not resume_action. This is why the top email says
I no longer think utrace_control() should just turn SINGLESTEP into
REPORT silently.


Please recall that, by that time

1. utrace_control(SINGLESTEP) called user_enable_single_step()

2. To handle the stepping over syscall correctly, both ptrace
   and ptrace-utrace relied on syscall_trace_leave() paths
   which checked TIF_SINGLESTEP to send SIGTRAP if needed.

1 was wrong, user_enable_single_step() was called under utrace-lock.
It was decided that the tracee itself should call this helper before
it returns to user-mode. To implement this, we could change
utrace_control(SINGLESTEP) to set TIF_NOTIFY_RESUME, so that
-report_quiesce() could return UTRACE_SINGLESTEP.

However, this breaks 2, the stepping over syscall. By the time
-report_quiesce() is called we already passed syscall_trace_leave()
without TIF_SINGLESTEP, and -report_quiesce() can't just return
UTRACE_SINGLESTEP but otoh it can't know that we should synthesize
a trap before return to user-mode.


So. I do not think there is any reason to ever keep UTRACE_*STEP in
utrace-resume. Except, if utrace_control(STEP) sets -resume = STEP
before wakeup, we avoid the unnecessary reporting loop in utrace_resume.
But at the same time, this leads to the problems we are discussing.


I even tested the kernel with the patch below, everything _seems_
to work (not that I think this can proves something).

What do you think?

Oleg.

--- kstub/kernel/utrace.c~XXX_RESUME2010-10-20 18:18:40.0 +0200
+++ kstub/kernel/utrace.c   2010-10-21 18:48:52.0 +0200
@@ -768,11 +768,13 @@ relock:
/*
 * Ensure a reporting pass when we're resumed.
 */
-   utrace-resume = action;
if (action == UTRACE_INTERRUPT)
set_thread_flag(TIF_SIGPENDING);
-   else
+   else {
+   action = UTRACE_REPORT;
set_thread_flag(TIF_NOTIFY_RESUME);
+   }
+   utrace-resume = action;
}
 
/*
@@ -1195,6 +1197,7 @@ int utrace_control(struct task_struct *t
 * In that case, utrace_get_signal() will be reporting soon.
 */
clear_engine_wants_stop(engine);
+   action = UTRACE_REPORT;
if (action  utrace-resume) {
utrace-resume = action;
set_notify_resume(target);
@@ -1381,11 +1384,13 @@ static void finish_report(struct task_st
 
if (resume  utrace-resume) {
spin_lock(utrace-lock);
-   utrace-resume = resume;
if (resume == UTRACE_INTERRUPT)
set_tsk_thread_flag(task, TIF_SIGPENDING);
-   else
+   else {
+   resume = UTRACE_REPORT;
set_tsk_thread_flag(task, TIF_NOTIFY_RESUME);
+   }
+   utrace-resume = resume;
spin_unlock(utrace-lock);
}
 



Re: utrace unwanted traps

2010-10-19 Thread Oleg Nesterov
On 10/14, Roland McGrath wrote:

  In short: I no longer understand why utrace-resume can be UTRACE_*STEP
  when the tracee is stopped, and in fact I think this is not right.

 I think we didn't have that originally.

Yes.

 The rationale that I remember is
 the idea that you should be able to implement a debugger interface feature
 like PTRACE_SINGLESTEP without noticeably more overhead than it had in
 pre-utrace ptrace.  That is, in vanilla ptrace, the tracer thread calls
 user_enable_single_step itself and then just lets the tracee resume
 directly to user mode.  We wanted to avoid always having an additional trip
 through tracehook_notify_resume and the utrace reporting loop and then back
 through the return-to-user assembly path a second time, between the tracee
 waking up and actually going to user mode.

Probably, I am not sure. I can't recall the previous discussions. IIRC,
the main problem with user_enable_single_step() was: we wanted to move
the callsite from tracer to tracee by many reasons. I can't recall if
avoid another reporting loop was the target, and additional trip
through tracehook_notify_resume() is not avoidable anyway.

Afaics, this was introduced by 26fefca955cc7c3c83014be2e10b773209d32eea
utrace: sticky resume action. Before that patch the stopped tracee
could only remember the pending UTRACE_REPORT or UTRACE_INTERRUPT.

 Right.  Said another way, any time an engine that previously used
 UTRACE_*STEP now uses UTRACE_RESUME, we must degrade utrace-resume back to
 UTRACE_REPORT at most.  Since we don't keep track of which engine it was
 that put UTRACE_*STEP into utrace-resume before, we'd have to just always
 degrade it any time anybody uses UTRACE_RESUME.

Agreed.

 That has almost the whole effect of punting utrace-resume back to just a
 utrace-report flag.

(see below)

 But that's how it used to be, and then we changed it.
 So it merits digging up our old discussion threads that led to doing that
 and see what all we had in mind then.

I tried to grep my mail archives and google, but failed to find anything
related.

_Perhaps_ this was not intended actually. Before the commit above, another
problem was fixed by 8ad60bbd4c665a11be179a0bff41483cca3b3560
utrace_stop: preserve report/interrupt requests across stop/resume, until
this one it was possible to lose UTRACE_INTERRUPT request if it races
with UTRACE_STOP from another engine. So, perhaps cleanup was the main
reason for 26fefca9.

See also http://www.mail-archive.com/utrace-devel@redhat.com/msg01647.html

Oleg.



Re: utrace unwanted traps

2010-10-19 Thread Roland McGrath
 Probably, I am not sure. I can't recall the previous discussions. 

Me either (except hazily the one notion I mentioned), but that's why
we have mailing list archives.

 Afaics, this was introduced by 26fefca955cc7c3c83014be2e10b773209d32eea
 utrace: sticky resume action. Before that patch the stopped tracee
 could only remember the pending UTRACE_REPORT or UTRACE_INTERRUPT.

Correct.

  But that's how it used to be, and then we changed it.
  So it merits digging up our old discussion threads that led to doing that
  and see what all we had in mind then.
 
 I tried to grep my mail archives and google, but failed to find anything
 related.

Based on the date of the commit you just mentioned, I browsed in:
https://www.redhat.com/archives/utrace-devel/2009-October/thread.html
and saw:
https://www.redhat.com/archives/utrace-devel/2009-October/msg00287.html
which is the day before that commit.  I didn't read through all that to
refamiliarize myself with all the details.


Thanks,
Roland



utrace unwanted traps

2010-09-26 Thread Oleg Nesterov
On 09/23, Roland McGrath wrote:

  OK, finish_callback_report() and utrace_control(DETACH) can set
  TIF_NOTIFY_RESUME.

 Right.  Those utrace_resume has the report.action==UTRACE_RESUME bail-out
 case.  So either that would change or detach would also do UTRACE_REPORT.

  But what if there are no more attached engines?
  Looks like, utrace_resume(UTRACE_RESUME) needs to handle this special
  case. And utrace_reset() shouldn't clear task-utrace_flags, otherwise
  utrace_resume/utrace_get_signal won't be called.

 Right.  Or else tracehook_notify_resume could call utrace_resume
 unconditionally, but I'm not at all sure that is not worse.  The
 original theory was that it should always be OK to have some
 utrace_flags bits stay set when they are stale, because any kind of
 reporting pass that got enabled would hit the report-spurious case
 and clean the state up synchronously when it's safe.

I tried to make the patch which addresses this issue, but surprisingly
I failed. Because I think there are more (minor) problems here. When
it comes to multitracing, we can have the unwanted trap even without
detaching.

In short: I no longer understand why utrace-resume can be UTRACE_*STEP
when the tracee is stopped, and in fact I think this is not right.

For example. Consider two engines, E1 and E2. To simplify the discussion
suppose that both engines have engine-data-resume and -report_quiesce()
just returns -resume. So, if the debugger wants, say, single-step, it
does

engine-data-resume = UTRACE_SINGLESTEP;
utrace_control(UTRACE_SINGLESTEP);

Actually, any engine should do something like this.

Now suppose that E1 wants UTRACE_SINGLESTEP, E2 asks for UTRACE_STOP.
In this case utrace_resume() results in utrace_stop() which sets
utrace-resume = UTRACE_SINGLESTEP before sleeping.

First of all - why? Yes, the theory is that we have another reporting
loop after wake_up, but utrace-resume = UTRACE_REPORT is enough, E1
should always acquire UTRACE_SINGLESTEP if needed or it is buggy.

But more importantly, this doesn't look right or I missed something.
Suppose that, before E2 resumes the tracee, E1 does utrace_control(STOP)
which immediately succeeds and allows E1 to play with the stopped tracee.
Again, to simplify the discussion, suppose that E2 does UTRACE_RESUME
and nothing more after that.

Now. E1 wants to resume the tracee too and it does

engine-data-resume = UTRACE_RESUME;
utrace_control(UTRACE_RESUME);

utrace_control(RESUME) correctly clears TIF_SINGLESTEP, but it doesn't
change utrace-resume. This means utrace_resume() doesn't do the
reporting loop, it just calls user_enable_single_step() and returns
to user-mode.

So. Could you explain why utrace_stop() should ever keep UTRACE_STEP
in utrace-resume? finish_report() does the same but this looks fine.


Also. I am not sure utrace_control(UTRACE_STEP) should always set
utrace-resume = UTRACE_STEP. If the tracee is stopped but there
is another ENGINE_STOP engine (iow, the tracee won't be resumed
right now) we have the same problem.



Off-topic,

#define UTRACE_RESUME_BITS  (ilog2(UTRACE_RESUME_MAX) + 1)

why UTRACE_RESUME_MAX? it should be (ilog2(UTRACE_RESUME_MAX - 1) + 1), no?

Oleg.