Re: Q: ->attaching && REPORT_CALLBACKS()

2009-03-12 Thread Oleg Nesterov
On 03/11, Roland McGrath wrote:
>
> > But not vise versa. I misunderstood the comment as if the new engine
> > should not be notified if it is attached by another task while target
> > is inside callback.
>
> That is indeed what happens in that case.  But that one is not a
> specific "should not", it's just what happens to be true given what we
> say about the "asynchronous" attach case in general.  That is, that an
> "asynchronous" attach + set_events makes no guarantees about how
> instantly you start to get event reports.

Yes, yes, I understand. In short: I greatly misinterpreted the comment.

> > Still can't understand... If (say) ->report_exec() attaches the new
> > engine to the same task and does utrace_set_events(EXEC), then it looks
> > logical the new engine gets the notification too. But OK, I agree, either
> > way is correct, and perhaps the current behaviour is more intuitive.
>
> As you can see in the cited thread, that's what I thought too.  Jim
> convinced me that the (new) current behavior is more useful.
> ...
> Jim's use seems fairly representative of situations where this might
> come up.  He's concerned with the EXEC event as the "old address space
> is gone, new one is here" event.  It's also the "my name changed"
> event that may be triggering a new tracing setup.

Aha, thanks!

> > But this means that "When target == current it would be safe just to call
> > splice_attaching() right here" part of the comment is not right, no?
> > Except for report_reap() target == current.
>
> It would be "safe", meaning it doesn't have race problems like the
> target != current case does for touching ->attached here.

Yes, I see. Again, I confused "safe" with "not what the user wants".

> > Yes, yes, I see. But I meant another case. Suppose that the debugger D
> > attaches to T and does
> >
> > engine = utrace_attach_task(T, ...);
> > utrace_set_events(T, engine, XXX);
> >
> > It is possible that ->report_xxx() is called before utrace_set_events()
> > completes. But afaics currently this is not a problem.
>
> As far as the API guarantees are concerned, there is no "completes".
> When you call utrace_set_events, it becomes possible your callbacks
> get made.

Yes sure.

Thanks!

Oleg.



Re: Q: ->attaching && REPORT_CALLBACKS()

2009-03-11 Thread Roland McGrath
> But not vise versa. I misunderstood the comment as if the new engine
> should not be notified if it is attached by another task while target
> is inside callback.

That is indeed what happens in that case.  But that one is not a
specific "should not", it's just what happens to be true given what we
say about the "asynchronous" attach case in general.  That is, that an
"asynchronous" attach + set_events makes no guarantees about how
instantly you start to get event reports.  It might be as long as the
time it takes to get back to user mode from whereever the thread is
now, or the time it takes it to process an interrupt and then get back
to user mode.  It's like you did "thread->events |= events" but there
has not been any kind of memory barrier--it might see it or might not,
until you do something affirmative to make sure (i.e. put it through
UTRACE_STOP, or else get some other callback you're sure happens after
your utrace_set_events call).

For this purpose an "asynchronous attach" means one by a third task (not
the thread itself or the creator during its report_clone), and done when
that third task did not already have some engine that completed a UTRACE_STOP.

This applies even if it is literally synchronous, i.e. if a callback
arranged for the third task to do the attach and set_events and then
blocked waiting for the third task to report its success, we'd call this an
"asynchronous attach" because it didn't synchronize using UTRACE_STOP.

> Still can't understand... If (say) ->report_exec() attaches the new
> engine to the same task and does utrace_set_events(EXEC), then it looks
> logical the new engine gets the notification too. But OK, I agree, either
> way is correct, and perhaps the current behaviour is more intuitive.

As you can see in the cited thread, that's what I thought too.  Jim
convinced me that the (new) current behavior is more useful.  The most
important thing to me is that it's clearly specified one way or the
other for the synchronous case.  It's obviously straightforward to do:

report_exec(engine, ...)
{
new_engine = utrace_attach_task(current, &new_ops);
utrace_set_events(new_engine, UTRACE_EVENT(EXEC));
new_ops.report_exec(new_engine, ...);
}

if you want one of your own callback functions to get another call
there.  OTOH, it's much more cumbersome to make the report_exec
callback used by your new engine keep flags and whatnot to distinguish
the first exec event that preceded that engine's setup from the next
one (which is what the new engine is really there to respond to).

Jim's use seems fairly representative of situations where this might
come up.  He's concerned with the EXEC event as the "old address space
is gone, new one is here" event.  It's also the "my name changed"
event that may be triggering a new tracing setup.  The former use just
wants report_exec to do "wipe out our state and go away" stuff.  The
latter use might want to set up a new incarnation of that sort of
tracing setup--a new engine whose report_exec callback does clean up.
It's obvious how the new engine getting its "clean up now" callback
immediately as a consequence of where the call to set it up came from
is not helpful.  I'm sure this sort of scenario will not be unique
either to Jim's work or to EXEC callbacks in particular.

> But this means that "When target == current it would be safe just to call
> splice_attaching() right here" part of the comment is not right, no?
> Except for report_reap() target == current.

It would be "safe", meaning it doesn't have race problems like the
target != current case does for touching ->attached here.  That's what
the comment says (and that's what the code used to do).  The reason we
don't do it (any more) is the explicit choice for API semantics, not
any implementation reason (in the implementation it is indeed an
obvious optimization if you are understanding the code).  That's why
the comment is there.

> Yes, yes, I see. But I meant another case. Suppose that the debugger D
> attaches to T and does
> 
>   engine = utrace_attach_task(T, ...);
>   utrace_set_events(T, engine, XXX);
> 
> It is possible that ->report_xxx() is called before utrace_set_events()
> completes. But afaics currently this is not a problem.

As far as the API guarantees are concerned, there is no "completes".
When you call utrace_set_events, it becomes possible your callbacks
get made.  The return value (a failure return, not -EINPROGRESS) can
say that you are now sure no callback was made or will be.  But when
you called, you wanted it to be possible.  If you didn't, then you
should have made sure it was fully stopped via UTRACE_STOP before you
called utrace_set_events.


Thanks,
Roland



Re: Q: ->attaching && REPORT_CALLBACKS()

2009-03-11 Thread Oleg Nesterov
On 03/10, Roland McGrath wrote:
>
> > The comment:
> >
> > * When target == current, it would be safe just to call
> > * splice_attaching() right here.  But if we're inside a
> > * callback,
> >
> > just to clarify, "inside a callback" means inside utrace_report_xxx(),
> > not only inside utrace_engine_ops->report_xxx(), right?
>
> Certainly what I mean when I say "a callback" is one of the functions whose
> pointer lives in struct utrace_engine_ops.  But I don't see how the
> distinction you make could even be meaningful here.

Yes, I wasn't clear.

> A utrace_attach_task()
> call "inside utrace_report_foo()" could only possibly mean one made by a
> ->report_foo() function utrace_report_foo() calls, since obviously there
> are no hard-wired utrace_attach_task() calls in utrace.c itself.

But not vise versa. I misunderstood the comment as if the new engine
should not be notified if it is attached by another task while target
is inside callback.

I was confused by "When target == current" part of the comment, please
see below.

> >  This is not what the user wants.
> >
> > It it not clear to me why the user doesn't want this.
>
> Jim Keniston is the user who doesn't want this.
> https://www.redhat.com/archives/utrace-devel/2008-December/msg00051.html

Still can't understand... If (say) ->report_exec() attaches the new
engine to the same task and does utrace_set_events(EXEC), then it looks
logical the new engine gets the notification too. But OK, I agree, either
way is correct, and perhaps the current behaviour is more intuitive.

But this means that "When target == current it would be safe just to call
splice_attaching() right here" part of the comment is not right, no?
Except for report_reap() target == current.

> > I understand this as follows. If we add the new engine to the ->attached
> > list, and if the target is inside a callback, the target can later race
> > with (say) utrace_set_events(). The target can see "engine->flags & event"
> > and call start_callback/finish_callback before utrace_set_events() 
> > completes.
>
> It's not a race question.  There are no guarantees for such races.  (That
> is, utrace_set_events() calls on target!=current make no guarantee about
> reporting an event that might already have started.  Only if the target was
> already stopped by your engine when you made the call can you be sure that
> no such event can be in progress.)
>
> The scenario we are talking about here is fully synchronous.  The target
> itself is inside a callback, calling utrace_set_events() on itself.

Yes, yes, I see. But I meant another case. Suppose that the debugger D
attaches to T and does

engine = utrace_attach_task(T, ...);
utrace_set_events(T, engine, XXX);

It is possible that ->report_xxx() is called before utrace_set_events()
completes. But afaics currently this is not a problem.

> > Another question. In any case I don't understand why do we really need
> > two lists.
>
> [... big snip ...]

Thanks for your explanations!

And, in any case,

> This is an optimization to consider later on

Yes, yes, sure. I didn't mean we should do this change right now even
_if_ it is good, and I didn't mean I think it is necessary good ;)

Oleg.



Re: Q: ->attaching && REPORT_CALLBACKS()

2009-03-10 Thread Roland McGrath
> The comment:
> 
>   * When target == current, it would be safe just to call
>   * splice_attaching() right here.  But if we're inside a
>   * callback,
> 
> just to clarify, "inside a callback" means inside utrace_report_xxx(),
> not only inside utrace_engine_ops->report_xxx(), right?

Certainly what I mean when I say "a callback" is one of the functions whose
pointer lives in struct utrace_engine_ops.  But I don't see how the
distinction you make could even be meaningful here.  A utrace_attach_task()
call "inside utrace_report_foo()" could only possibly mean one made by a
->report_foo() function utrace_report_foo() calls, since obviously there
are no hard-wired utrace_attach_task() calls in utrace.c itself.

>   that would mean the new engine also gets
>   * notified about the event that precipitated its own
>   * creation.
> 
> engine->flags == 0, so it should not be notified until the caller
> does utrace_set_events() later, right?

Right.  The case in question is a callback doing:

new_engine = utrace_attach_task(current, ...);
utrace_set_events(new_engine, );

Then new_engine would get the "this event" callback at the end of the very
same reporting loop containing its creation.  (This is what happened before
I changed the code as the comment describes.)

>This is not what the user wants.
> 
> It it not clear to me why the user doesn't want this.

Jim Keniston is the user who doesn't want this.
https://www.redhat.com/archives/utrace-devel/2008-December/msg00051.html

> I understand this as follows. If we add the new engine to the ->attached
> list, and if the target is inside a callback, the target can later race
> with (say) utrace_set_events(). The target can see "engine->flags & event"
> and call start_callback/finish_callback before utrace_set_events() completes.

It's not a race question.  There are no guarantees for such races.  (That
is, utrace_set_events() calls on target!=current make no guarantee about
reporting an event that might already have started.  Only if the target was
already stopped by your engine when you made the call can you be sure that
no such event can be in progress.)

The scenario we are talking about here is fully synchronous.  The target
itself is inside a callback, calling utrace_set_events() on itself.

> Another question. In any case I don't understand why do we really need
> two lists.

We want that in the common case a reporting pass takes no locks.  The
->attached list is never touched when the target is not quiescent.  (The
target uses the lock to synchronize when it transitions between being
quiescent and not.)  Once you believe the quiescence logic, this makes
it easy to be confident about the unlocked use of that list in reporting
passes.  It's used in totally vanilla ways, and modified in totally
vanilla ways.

> Let's suppose we implement the new trivial helper,
> 
>   list_for_each_entry_xxx(pos, head, tail, member)
> 
> it stops when "pos" reaches "tail", not "head". Then REPORT_CALLBACKS()
> can just read "tail = utrace->attached->prev" (under ->lock, or
> utrace_add_engine() can use list_add_rcu) before list_for_each_entry_xxx.
> 
> This way we can kill ->attaching, no?

This is a lot like what the old utrace code did before the introduction
of the two lists (when the engine struct was managed using RCU).  This
is just an optimization over what we have now.  It saves the ->attaching
space (i.e. two words in struct utrace), the splice_attaching() logic
(pretty cheap), and the sometimes-superfluous resume report after an
attach.  The cost for this is some very touchy fine-grained complexity
in convincing ourselves (and reviewers) that the list traversal and
modification is always correct.

I've already implied that anything taking any locks for every vanilla
reporting pass is a non-starter.  I'm asserting preemptive optimization
here because it's the case that is most important to optimize.  The
overhead of a reporting pass applies to situations like every system
call entry, with an engine callback that quickly filters out the vast
majority of calls (i.e. "if (regs->foo != __NR_bar) return;" or
something about that cheap).  So we think about the reporting pass
overhead as something that might be done a million times a second, and
accordingly think carefully about that hot path.  In contrast, we are
talking here about optimizing attach, and saving a couple of words of
data structure space that will already be cache-hot.

We are not actually following any RCU rules at all, so to use
list_add_tail_rcu would really just mean that we are relying on our own
fancy special list mutation scheme and proving/documenting that it is
correct.  It just happens to have the same implementation details as
list_add_tail_rcu, and we must either copy those innards and document
why they are right in our uses, or document how the list_add_tail_rcu
innards happen to match what is right for our uses and kee